azim_91_184236 asked:

What is the recommendation for running repairs when decommissioning a datacenter?

I am trying to find best practices for running 'nodetool repair' as part of migrating large on-premises Cassandra clusters to the cloud and decommissioning the old cluster.

The blog suggests running a full repair on each node of the new DC, but the DataStax documentation doesn't say whether we need to run it in the old DC or the new DC, or on all nodes or just a single node. I know there are possibly multiple ways of running it, so I was looking for guidance on how best to run nodetool repair as part of the decommission-datacenter step.

My questions are:

1. Given that nodetool repair can be quite resource-intensive, is there an alternative way to ensure data consistency after we stop new writes to the old DC?

2. If running nodetool repair is the best option for confirming that all the data has migrated correctly from the original DC:

a. Should we run it in the old DC or new DC?

b. Is it better to run it one node at a time, or to run a full repair?

I am looking for a safe way to run it with minimal impact on the production workload and application. Thanks for any guidance you can provide!


1 Answer

Erick Ramirez answered:

That's a good question to ask!

We recommend that you run a full round of repairs before you decommission a data centre. The most efficient way of doing this is by running a partitioner-range repair (with the -pr flag), one node at a time:

$ nodetool repair -pr

I suggest you start by running it on the nodes in the old DC, then run it on the nodes in the new DC.
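A minimal sketch of that sequence as a bash script, assuming nodetool is on the PATH and can reach each node over JMX with `-h` (no JMX credentials). The host addresses, the `DRY_RUN` and `REPAIR_FLAGS` variables and the function name are hypothetical placeholders, not part of nodetool. By default it only prints the commands, so you can review the order before running it with `DRY_RUN=0`:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Assumption: nodetool is on the PATH and each node accepts JMX via -h.
NODETOOL="${NODETOOL:-nodetool}"
# Safety default: only print the commands. Run with DRY_RUN=0 to repair.
DRY_RUN="${DRY_RUN:-1}"
# -pr repairs only the primary ranges of the node it runs on. On versions
# where incremental repair is the default (Cassandra 2.2 and later), you
# may want to add -full here; check the docs for your version.
REPAIR_FLAGS=(-pr)

# Hypothetical addresses: replace with the nodes in each DC.
OLD_DC_HOSTS=(10.0.1.1 10.0.1.2 10.0.1.3)
NEW_DC_HOSTS=(10.0.2.1 10.0.2.2 10.0.2.3)

# Repair the given hosts strictly one after another.
repair_one_at_a_time() {
  local host
  for host in "$@"; do
    if [[ "$DRY_RUN" == "1" ]]; then
      echo "DRY RUN: $NODETOOL -h $host repair ${REPAIR_FLAGS[*]}"
    else
      echo "Repairing primary ranges on $host" >&2
      # The call blocks until this node's repair ends; with set -e a
      # non-zero exit status stops the script before the next host.
      "$NODETOOL" -h "$host" repair "${REPAIR_FLAGS[@]}"
    fi
  done
}

# Old DC first, then the new DC.
repair_one_at_a_time "${OLD_DC_HOSTS[@]}"
repair_one_at_a_time "${NEW_DC_HOSTS[@]}"
```

Because each nodetool repair call only returns once that node's repair has finished, the loop never has more than one partitioner-range repair running at a time, which helps limit the impact on production traffic compared with repairing several nodes in parallel.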

A partitioner-range repair is very efficient because it only repairs each node's primary token ranges (the ranges for which that node is the first replica). It doesn't repair the ranges for which the node is just another replica, so no range gets repaired more than once. But be warned: you must run it on all nodes in all data centres to make sure all token ranges are repaired. If you only run it in one DC, the primary ranges of the nodes in the other DC won't be repaired, so some data could be missed.
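Since a partitioner-range repair has to run on every node in every DC, it can help to build the host list from the cluster itself rather than by hand. A rough sketch, assuming the usual `nodetool status` layout (a `Datacenter:` line followed by node rows that start with a two-letter status/state code such as `UN`); the function names are hypothetical. Nodes that are not Up/Normal are flagged, since repairs generally can't complete while a replica is down:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Read `nodetool status` output on stdin and print one line per node:
#   <datacenter> <address> <status/state code>
list_nodes_by_dc() {
  awk '
    BEGIN                 { dc = "unknown" }
    /^Datacenter:/        { dc = $2; next }
    /^[UD][NLJM][ \t]/    { print dc, $2, $1 }
  '
}

# Succeed only if every node is Up/Normal ("UN"). Otherwise list the
# other nodes on stderr and return 1.
all_nodes_up_normal() {
  local not_ready
  not_ready=$(list_nodes_by_dc | awk '$3 != "UN"')
  if [[ -n "$not_ready" ]]; then
    echo "Nodes that are not Up/Normal:" >&2
    echo "$not_ready" >&2
    return 1
  fi
}
```

For example, `nodetool status | list_nodes_by_dc` prints every node with its DC, and `nodetool status | all_nodes_up_normal` can be run as a pre-check before starting the repairs.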

If you're interested, Jeremiah Jordan explains this in detail in the blog post "Repair in Cassandra". Cheers!


azim_91_184236 commented:

Thank you @Erick Ramirez, appreciate your help and guidance!

Erick Ramirez commented:

Not a problem. Good luck with the decommission. Cheers!
