acscott asked Erick Ramirez commented

How do we safely decommission a data center?

We have two datacenters (DCs). One is production; the other is being migrated to using replication. However, we encountered data corruption in one keyspace in the 2nd DC after the migration completed.

We tried repairing the table in that keyspace, but the row counts come out different each time.

We tried scrubbing the table on all the nodes in the 2nd datacenter, but the row counts are still different each time, and there are definitely no writes going on.

If we drop the table and recreate it, the drop would propagate to the production DC as well, since schema changes apply cluster-wide.

So, what is the lowest-risk way of rebuilding the table from the 1st DC? Decommissioning the 2nd DC and re-adding it seems like the surest way.

How did the table become corrupt? We ran nodetool rebuild on each node when creating the 2nd DC but interrupted the rebuild with Ctrl-C. It is the only explanation we have.

(We used the instructions here to create the 2nd DC.)

So, how do we *safely* decommission the 2nd DC?




smadhavan answered smadhavan edited

@acscott, you'd basically follow the Decommissioning a datacenter documentation to safely decommission the 2nd DC from your cluster.

As for your other questions, such as how the table got corrupted, we don't have enough information here to help triage: the version of Cassandra/DSE you're running, the exact configuration and steps performed on this cluster prior to adding, repairing, and expanding the DCs, error logs, how exactly you are counting the rows, and so on.

There was a great explanation in another post about counting in Cassandra, and as part of it Erick wrote about how to use DSBulk to safely perform the count operation:

... along came DataStax Bulk Loader (aka DSBulk). It is a tool for efficiently loading and unloading data from Apache Cassandra though that is not the extent of its abilities.

DSBulk has a nice feature for counting data in large tables in a distributed manner. It is the recommended tool for loading or unloading data in CSV or JSON format. It performs up to 4x faster than the cqlsh COPY command.

And yes, DataStax made it freely available to open-source Apache Cassandra users. For details, see Counting data in tables with DSBulk. Cheers!
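To illustrate the counting feature mentioned above, here is a minimal sketch of a DSBulk count invocation; the keyspace, table, and contact point are placeholders for your own cluster:

```shell
# Count rows in a table using DSBulk's distributed count workflow.
# my_keyspace, my_table, and 10.0.0.1 are placeholders.
dsbulk count \
  -k my_keyspace \
  -t my_table \
  -h 10.0.0.1
```

Running this against the same table in each DC (by pointing `-h` at a node in that DC and restricting the driver to the local DC) lets you compare counts without the timeout and accuracy problems of `SELECT COUNT(*)` in cqlsh.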


Erick Ramirez answered Erick Ramirez commented

The safe way to decommission a Cassandra DC is to:

  1. reconfigure all apps to connect to another DC
  2. run a rolling repair with the -pr flag one node at a time on all nodes in all DCs
  3. remove the DC from replication of all keyspaces
  4. run nodetool decommission on each node in the DC, one node at a time, with no two nodes being decommissioned in parallel.
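The steps above can be sketched as commands; the keyspace, DC names, and replication factors below are placeholders for your own topology:

```shell
# Step 2: rolling primary-range repair, run on each node in every DC,
# one node at a time.
nodetool repair -pr

# Step 3: remove the DC being retired (here "DC2") from replication of
# every keyspace that references it, e.g. in cqlsh:
#   ALTER KEYSPACE my_keyspace
#     WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3};

# Step 4: on each node in DC2, one node at a time:
nodetool decommission
```

The ordering matters: repairing before removing the DC from replication ensures the surviving DC holds all the data, and altering replication before decommissioning means the departing nodes have no ranges left to stream.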

I realise that you've already gone ahead and decommissioned/recommissioned the nodes but, for the record, that is not the correct way of handling the issue you were facing. Rebuilding nodes is not the appropriate way of addressing issues in Cassandra. It may work for a single-node DC, or perhaps even a 3-node DC with very little data, but in most use cases rebuilding nodes takes hours to days depending on how much data there is in the cluster.

What you should have done instead is identify the cause of the corruption and address the cause. In my experience, most don't know how to count rows in Cassandra and the practice is flawed in most cases as I've explained in Why COUNT() is bad in Cassandra.

If you have the details of why you think the data is corrupted, we'd be happy to review them for a chance at identifying the root cause, so you could perhaps avoid it happening again. Cheers!


acscott commented:

Thank you very much for the reply and the steps to decommission a Cassandra DC.

We haven't decommissioned/recommissioned the nodes yet. We did counts using the wonderful dsbulk utility on other read-only tables between the two DCs and they came out exactly equal. The corruption is probably due to operator error somewhere, I'm afraid; I don't recall the precise steps anymore. But I don't suspect Cassandra at all, it's been replicating like a charm. (It's a good reminder to write down the steps as you go as an audit trail.)

Just for clarity, we only ran nodetool rebuild to add the new DC (step 10 here).

acscott commented:

Our plan is to use a maintenance window to truncate the table and reload it using dsbulk. At 1.4B rows with just a few columns, it doesn't take very long with dsbulk in our tests. We have flexibility with the length of the window if something goes awry. In the worst case we can re-run our batch jobs, so we have a way forward.
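A hedged sketch of that truncate-and-reload plan, assuming the data is exported from the healthy DC before truncating (TRUNCATE, like DDL, applies cluster-wide); all names, hosts, and paths are placeholders:

```shell
# 1. Unload the table from a node in the healthy DC to local CSV files.
dsbulk unload -k my_keyspace -t my_table -h 10.0.0.1 -url ./export

# 2. During the maintenance window, truncate the table
#    (this removes the data in BOTH DCs).
cqlsh 10.0.0.1 -e "TRUNCATE my_keyspace.my_table;"

# 3. Reload the exported data; normal writes replicate to both DCs.
dsbulk load -k my_keyspace -t my_table -h 10.0.0.1 -url ./export
```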

Thank you very much for your support of the community!


Erick Ramirez commented:

It's not obvious to me why you are truncating a table and bulk-loading it. I think you're creating more problems than you are trying to solve.

If you tell us what issue you are facing, we might be able to assist you without having to go through this [what seems like an] unnecessary operation. Cheers!
