Asked by tarikh; answered by Erick Ramirez

Is there a way to wipe data from a new DC without decommissioning it and run rebuild again?

Hi,

We've added a new DC to a cluster, which now has 2 DCs.

Currently, no applications are pointing to the new DC.

We followed the procedure for adding a new DC, including the rebuild. However, during the rebuild we noticed connectivity/network issues on some nodes, and we ended up with a DC that is not in sync.

After resolving the network issues, we ran the rebuild again, followed by repair -pr on all nodes.

But the DC still appears to be out of sync.

Is there a way to wipe the data from this new DC without decommissioning it, and then run the rebuild again?

The application is configured to use LOCAL_QUORUM and the RF is DC1:3, DC2:3, so we will be safe if the new DC is down or has no data.

1 comment

@tarikh, could you please update the question to state which version of Cassandra this is, and the exact steps you performed when adding the new DC? Also, how did you determine that the DC is out of sync? Thanks!


1 Answer

Erick Ramirez answered

Yes, you should be able to rebuild a node without decommissioning it.

Here are the high-level steps:

STEP 1 - Run nodetool drain to force all memtables to be flushed to disk.

STEP 2 - Shut down Cassandra.

STEP 3 - Delete the SSTables on disk for the application keyspace you want to rebuild. For example, I have a keyspace called community which contains 2 tables:

/path/to/data
  community/
    answers-UUID/
      md-1-big-CompressionInfo.db
      md-1-big-Data.db
      md-1-big-Digest.crc32
      md-1-big-Filter.db
      md-1-big-Index.db
      md-1-big-Statistics.db
      md-1-big-Summary.db
      md-1-big-TOC.txt
    questions-UUID/
      md-3-big-CompressionInfo.db
      md-3-big-Data.db
      md-3-big-Digest.crc32
      md-3-big-Filter.db
      md-3-big-Index.db
      md-3-big-Statistics.db
      md-3-big-Summary.db
      md-3-big-TOC.txt

To delete the data files of the 2 tables:

$ rm /path/to/data/community/answers-*/*
$ rm /path/to/data/community/questions-*/*

WARNING - Do NOT delete data files for system keyspaces. Only perform this step on application keyspaces or risk corrupting the system.
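To make the warning above harder to trip over, the deletion can be wrapped in a small guard. This is only a sketch: wipe_keyspace_sstables is a hypothetical helper name, not something that ships with Cassandra, and it assumes the data directory layout shown above. It refuses any keyspace named system or starting with system_, and it leaves subdirectories such as snapshots untouched.

```shell
# Hypothetical helper (not part of Cassandra): delete the SSTable files of one
# application keyspace, refusing to touch Cassandra's own system keyspaces.
wipe_keyspace_sstables() {
    local data_dir="$1" keyspace="$2"

    # An empty argument would widen the search to every keyspace, so reject it.
    if [ -z "$data_dir" ] || [ -z "$keyspace" ]; then
        echo "usage: wipe_keyspace_sstables DATA_DIR KEYSPACE" >&2
        return 1
    fi

    # Never delete data files that belong to system keyspaces.
    case "$keyspace" in
        system|system_*)
            echo "refusing to delete system keyspace: $keyspace" >&2
            return 1
            ;;
    esac

    if [ ! -d "$data_dir/$keyspace" ]; then
        echo "no such keyspace directory: $data_dir/$keyspace" >&2
        return 1
    fi

    # Remove only regular files directly inside each table directory,
    # i.e. the same files that `rm DATA_DIR/KEYSPACE/TABLE-*/*` targets.
    find "$data_dir/$keyspace" -mindepth 2 -maxdepth 2 -type f -delete
}
```

With Cassandra stopped, it would be called as `wipe_keyspace_sstables /path/to/data community`.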

STEP 4 - Restart Cassandra.

STEP 5 - Delete the app keyspace from the list of available ranges that are ready to be served by the node. This information is stored in the local system.available_ranges table.

For example, to make the community keyspace unavailable:

cqlsh> DELETE FROM system.available_ranges WHERE keyspace_name = 'community';

STEP 6 - Rebuild the keyspace on the node. For example, to rebuild the community keyspace from the DC1 data centre:

$ nodetool rebuild -ks community -- DC1

HINT - If the keyspace wasn't deleted from the available ranges table, the node won't request the data from other nodes because it thinks it already has them. In this case, you will see a log entry like this:

INFO  [RMI TCP Connection(34)-127.0.0.1] 2021-07-16 18:49:08,845 StorageService.java:1253 - rebuild from dc: DC1, community, (All tokens)
INFO  [RMI TCP Connection(34)-127.0.0.1] 2021-07-16 18:49:08,848 RangeStreamer.java:384 - Some ranges of [...] are already available. Skipping streaming those ranges.

If so, go back to STEP 5 to delete the keyspace from the available ranges, then run the rebuild again.
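A quick way to check for that log entry is to grep for it. This is a minimal sketch: rebuild_was_skipped is a hypothetical helper name, and the log path in the usage line is an assumption that depends on how Cassandra was installed.

```shell
# Hypothetical helper: exit 0 if the given log file shows that rebuild skipped
# streaming because the ranges were still marked as available.
rebuild_was_skipped() {
    local log_file="$1"
    grep -q 'Skipping streaming those ranges' "$log_file"
}
```

For example: `rebuild_was_skipped /var/log/cassandra/system.log && echo "repeat STEP 5"`.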

Repeat the steps above on the other nodes in the DC until all nodes have been rebuilt. Cheers!
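Since the same sequence is repeated on every node, it may help to print it as a plan first and review it. The sketch below strings the six steps together behind a dry-run switch. The function names, the DRY_RUN variable and the systemctl service name cassandra are assumptions for illustration; adjust them to your installation. It is intended for dry runs: a real run would also need to wait for the node to be fully up between STEP 4 and STEP 5, which this sketch does not do.

```shell
# Sketch of the per-node sequence. With DRY_RUN=1 (the default) each command
# is printed instead of executed, so the plan can be reviewed first.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Hypothetical wrapper around STEP 1 to STEP 6 for one keyspace on one node.
rebuild_keyspace_on_node() {
    local data_dir="$1" keyspace="$2" source_dc="$3"

    run nodetool drain &&                        # STEP 1
    run sudo systemctl stop cassandra &&         # STEP 2 (assumed service name)
    run find "$data_dir/$keyspace" -mindepth 2 -maxdepth 2 -type f -delete &&  # STEP 3
    run sudo systemctl start cassandra &&        # STEP 4 (assumed service name)
    run cqlsh -e "DELETE FROM system.available_ranges WHERE keyspace_name = '$keyspace';" &&  # STEP 5
    run nodetool rebuild -ks "$keyspace" -- "$source_dc"   # STEP 6
}
```

Running `rebuild_keyspace_on_node /path/to/data community DC1` with the default DRY_RUN=1 only prints the six commands in order.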
