bensmail asked Erick Ramirez answered

Does OpsCenter Repair Service repair all DCs?

Hello,

We are running a cluster with two DCs, and some keyspaces are replicated in both DCs.

The Repair Service is also enabled in OpsCenter.

My question is:

1. Does the service repair all nodes of the cluster, or only the nodes in each DC?

2. If I have a table in a keyspace replicated in DC1 and DC2, and for some reason some rows in that table are present in DC1 but not in DC2, will the Repair Service repair the nodes in DC2 so that the missing rows are recovered?

Regards Salah


bensmail commented:

Hello, another question regarding repair:

What if I configure the Repair Service to complete within 5 days and it finishes repairing the cluster in 2 days? Will the repair stop and start again after the remaining 3 days, or will it start the next repair immediately after the first one ends?

Regards

steve.lacerda answered

Hi,

Answering inline:

1) OpsCenter repairs based on token ranges, so whichever nodes own those ranges will get repaired regardless of DC.

2) Yes, a repair started in DC1 will also repair the replicas in DC2.

3) The repair will cycle back through and start over again once it's complete. It won't wait the remaining 3 days to start another repair.

Thanks

Erick Ramirez answered

The repair operation will repair all replicas unless you specifically tell it to only perform a "local" repair (with the -local or --in-local-dc flags).
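To illustrate why a normal repair touches replicas in both DCs while a local repair does not, here is a minimal sketch of replica selection, assuming a simplified NetworkTopologyStrategy that ignores racks and vnodes. The ring, node names and tokens are hypothetical and chosen only for readability; this is not Cassandra's actual implementation.

```python
from bisect import bisect_left

# Hypothetical ring of (token, node, dc) entries; tokens are illustrative only.
RING = sorted([
    (-6000, "dc1-n1", "DC1"),
    (-2000, "dc2-n1", "DC2"),
    (1000, "dc1-n2", "DC1"),
    (4000, "dc2-n2", "DC2"),
    (7000, "dc1-n3", "DC1"),
    (9000, "dc2-n3", "DC2"),
])


def replicas_for(token, rf_per_dc, local_dc=None):
    """Return the replicas for a token under a simplified NetworkTopologyStrategy.

    Walks the ring clockwise from the first node whose token is >= the given
    token (wrapping around), picking the first rf nodes seen in each DC.
    If local_dc is set, only that DC's replicas are returned, mimicking the
    scope of a "local" repair. Racks are ignored (simplifying assumption).
    """
    tokens = [t for t, _, _ in RING]
    start = bisect_left(tokens, token) % len(RING)
    wanted = dict(rf_per_dc)
    if local_dc is not None:
        wanted = {local_dc: wanted[local_dc]}
    counts = {dc: 0 for dc in wanted}
    chosen = []
    for i in range(len(RING)):
        _, node, dc = RING[(start + i) % len(RING)]
        if dc in wanted and counts[dc] < wanted[dc]:
            chosen.append(node)
            counts[dc] += 1
    return chosen


if __name__ == "__main__":
    rf = {"DC1": 2, "DC2": 2}
    print(replicas_for(500, rf))                  # replicas in both DCs
    print(replicas_for(500, rf, local_dc="DC1"))  # only DC1 replicas
```

With replication in both DCs, the default call returns nodes from DC1 and DC2, which is the set a full (non-local) repair of that range would synchronise; passing local_dc narrows it to one DC, the same effect the -local flag has on scope.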

The OpsCenter Repair Service works exactly the same way as repairs you run manually. The difference is that OpsCenter automates them for you, using the same nodetool repair commands.

So to respond to your points directly:

  1. OpsCenter runs sub-range repairs to efficiently repair token ranges of all replicas regardless of which DC they belong to. In your case, this means replicas in both DC1 and DC2 are repaired.
  2. As in point 1 above, yes.
  3. By default, OpsCenter optimises repairs by splitting the full token range into small segments such that each segment gets repaired within the time-to-completion value. If you've configured it to 5 days, OpsCenter will spread out the repair tasks (threads that each repair one segment of a token subrange) over the 5 days to minimise the load on the cluster.
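The segmenting idea in point 3 can be sketched as follows. This is a minimal illustration assuming the Murmur3Partitioner token range (-2^63 to 2^63-1), a hypothetical fixed segment count, and evenly spaced start times; OpsCenter's real segmentation and scheduling logic is more involved and is not reproduced here. The command builder uses nodetool repair's -st/-et (start/end token) options for subrange repair; the keyspace name is a placeholder.

```python
from datetime import timedelta

# Murmur3Partitioner token bounds.
MIN_TOKEN = -2**63
MAX_TOKEN = 2**63 - 1


def split_ring(num_segments):
    """Split the full token range into contiguous (start, end] segments."""
    span = MAX_TOKEN - MIN_TOKEN
    bounds = [MIN_TOKEN + span * i // num_segments for i in range(num_segments + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


def schedule(num_segments, time_to_completion):
    """Space segment repairs evenly across the time-to-completion window.

    Returns (offset_from_start, (start_token, end_token)) pairs.
    Even spacing is a simplifying assumption for illustration.
    """
    interval = time_to_completion / num_segments
    return [(i * interval, seg) for i, seg in enumerate(split_ring(num_segments))]


def nodetool_cmd(keyspace, start, end):
    """Build a subrange repair command for one segment."""
    return ["nodetool", "repair", "-st", str(start), "-et", str(end), keyspace]


if __name__ == "__main__":
    plan = schedule(1000, timedelta(days=5))
    for offset, (start, end) in plan[:3]:
        print(offset, " ".join(nodetool_cmd("my_keyspace", start, end)))
```

With 1000 segments and a 5-day window, one segment starts roughly every 7.2 minutes, which shows why a longer time-to-completion means less repair work running on the cluster at any moment.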

As a side note, it is not recommended to reduce the time-to-completion as a means of getting your nodes repaired quicker to address inconsistencies. In most cases, replicas have data missing because they missed writes as a result of dropped mutations.

Nodes drop mutations because they are overloaded and the commitlog disk cannot keep up with writes. Increasing the frequency of repairs puts additional load on nodes that are already overloaded, so it will not achieve the outcome you're after.

The correct way to deal with an overloaded cluster is to increase the capacity by adding nodes. If you need assistance with sizing your cluster correctly, please log a ticket with DataStax Support so one of our engineers can assist you directly. Cheers!
