Bringing together the Apache Cassandra experts from the community and DataStax.

a.gheshlaghy_177282 asked:

Can nodetool cleanup be run after all nodes are added?

Hi,

The documentation for open-source Cassandra says:

"Cleanup can be safely postponed for low-usage hours"

URL: https://docs.datastax.com/en/cassandra-oss/3.0/cassandra/operations/opsAddNodeToCluster.html

The DSE documentation also says:

"Failure to run nodetool cleanup after adding a node may result in data inconsistencies including resurrection of previously deleted data."

URL: https://docs.datastax.com/en/dse/6.7/dse-admin/datastax_enterprise/operations/opsAddNodeToCluster.html

How can data be resurrected once the token ranges have been recalculated? Shouldn't the nodes that streamed data to the new node ignore the token ranges they are no longer responsible for?

The first page covers open-source Cassandra and the second covers DSE, and the DSE page does not mention postponing cleanup. Does that mean that in DataStax Enterprise we need to run nodetool cleanup on the rest of the cluster as soon as a node is added, while in open-source Cassandra it can be postponed?

Also, when adding multiple nodes to a DSE cluster, should we run nodetool cleanup after adding each node before adding the next one, or can cleanup be done once at the end, after all the nodes have been added?


1 Answer

Erick Ramirez answered:

Thanks for pointing out those differences in the documentation. The behaviour is the same: there is no difference between Apache Cassandra and DSE as far as cleanup is concerned. I will get the DSE page updated accordingly.

As it says in the OSS version of the document, there is no requirement to immediately run nodetool cleanup. In fact, you don't need to run it unless you desperately need to reclaim disk space on the existing nodes because the data which is no longer owned by nodes will naturally get compacted out in the coming days.
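To make "data which is no longer owned by nodes" concrete, here is a toy sketch of a token ring. It is not Cassandra code: it assumes one token per node and a replication factor of 1, and the node names, token values, and helper functions are all made up for illustration. It only shows which tokens an existing node still holds on disk after a new node takes over part of its range.

```python
from bisect import bisect_left

def owner(ring, token):
    """Return the node that owns `token` in a toy ring.

    `ring` maps each node's token to its name. A node owns the range
    (previous token, its own token], wrapping around at the end.
    """
    tokens = sorted(ring)
    idx = bisect_left(tokens, token) % len(tokens)
    return ring[tokens[idx]]

def stale_tokens(ring, node, stored_tokens):
    """Tokens that `node` still stores but no longer owns."""
    return sorted(t for t in stored_tokens if owner(ring, t) != node)

# Hypothetical three-node ring; node_b owns (100, 200].
ring_before = {100: "node_a", 200: "node_b", 300: "node_c"}
node_b_data = {110, 150, 190}

# node_d joins with token 150 and takes over (100, 150] from node_b.
ring_after = dict(ring_before)
ring_after[150] = "node_d"

print(stale_tokens(ring_before, "node_b", node_b_data))  # -> []
print(stale_tokens(ring_after, "node_b", node_b_data))   # -> [110, 150]
```

In this toy model node_b keeps its copies of tokens 110 and 150 after node_d joins, even though reads for them are now routed to node_d. That leftover data is what nodetool cleanup removes.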

We recommend running the cleanup outside of peak hours for your cluster so the additional disk I/O doesn't impact the performance of your application. Cheers!
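If you postpone cleanup until all the new nodes have joined, one way to run it is one existing node at a time during a quiet window. The sketch below is only an illustration under some assumptions: the host addresses and keyspace name are placeholders, the helper functions are hypothetical, and it assumes nodetool is on the PATH and can reach each node over JMX with `-h`. Adjust it to however you normally invoke nodetool (for example over SSH on each node).

```python
import subprocess

def build_cleanup_commands(hosts, keyspace=None):
    """Build one `nodetool cleanup` command per host.

    `hosts` are the pre-existing nodes (placeholder addresses here).
    If `keyspace` is given, cleanup is limited to that keyspace.
    """
    commands = []
    for host in hosts:
        cmd = ["nodetool", "-h", host, "cleanup"]
        if keyspace:
            cmd.append(keyspace)
        commands.append(cmd)
    return commands

def run_cleanup(hosts, keyspace=None, runner=subprocess.run):
    """Run cleanup on each host in turn, stopping at the first failure.

    Running sequentially keeps the extra disk I/O on one node at a time.
    `runner` can be swapped out for a dry run.
    """
    for cmd in build_cleanup_commands(hosts, keyspace):
        runner(cmd, check=True)

if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    existing_nodes = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]  # placeholders
    run_cleanup(existing_nodes, runner=lambda cmd, check: print(" ".join(cmd)))
```

Cleanup only matters on the nodes that were already in the cluster, since those are the ones that gave up token ranges to the new nodes.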


Dear Erick,
No problem.
Thanks for the answer. Could you please explain how skipping cleanup can result in data resurrection? According to the docs, it can.


I don't think it's possible for that to happen, which is why I'm getting the DSE page corrected. Cheers!
