Bringing together the Apache Cassandra experts from the community and DataStax.

cache_drive asked Erick Ramirez edited

What are the steps for bringing a node back into service after it's been powered down for 2 days?

I have an 8-node production SMS cluster running a fairly old version of Cassandra. One of the nodes was powered down over the weekend, so it could not communicate with the rest of the ring during that time. Now that the VM is back up, what do we need to do?
Should I run a cleanup on each node in a specific order and, once that is done, go back around the ring and run repair -pr? I'd appreciate any advice on how to proceed.

cassandra

1 Answer

Erick Ramirez answered Erick Ramirez edited

@cache_drive If the node has been down for less than the smallest gc_grace_seconds of any table in the cluster, it should be as simple as starting Cassandra on the node and then running a repair on it.

If the node has been down for longer than the smallest gc_grace_seconds, you will need to wipe the node clean, including deleting all the contents of data/, commitlog/ and saved_caches/. Then replace the node "with itself" by adding the replace_address flag and specifying its own IP address. For details, see Replacing a dead node. Cheers!
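The decision rule above can be sketched in a few lines of Python. This is only an illustration, not a tool that ships with Cassandra: the function name `recovery_action` and its arguments are hypothetical, and it assumes you have already collected the gc_grace_seconds value of each table yourself. The fallback of 864000 seconds (10 days) is Cassandra's default gc_grace_seconds.

```python
from datetime import timedelta

# Cassandra's default gc_grace_seconds (10 days); individual tables may override it.
DEFAULT_GC_GRACE_SECONDS = 864_000


def recovery_action(downtime, gc_grace_seconds_per_table):
    """Return "repair" or "replace" for a node that was down for `downtime`.

    `gc_grace_seconds_per_table` is a list of the gc_grace_seconds values of
    the tables in the cluster (hypothetical input, gathered by the operator).
    """
    # The smallest value across all tables is the one that matters.
    smallest = min(gc_grace_seconds_per_table, default=DEFAULT_GC_GRACE_SECONDS)
    if downtime.total_seconds() < smallest:
        return "repair"   # start Cassandra, then run a repair on the node
    return "replace"      # wipe the node and replace it "with itself"


print(recovery_action(timedelta(days=2), [864_000]))
```

With every table at the default, a node that was down for 2 days falls under the "repair" case. A single table with a much shorter gc_grace_seconds would change the outcome, which is why the smallest value is the one to check.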

4 comments

Thank you! So would I ONLY run repair -pr on the node that was down, or run it on every ring member listed in nodetool status?

@cache_drive Just a regular repair (without any flags) on the problem node should suffice and it will repair all the token ranges it owns plus the replica ranges. Cheers!

Hi, I have a somewhat similar situation, but in my case the node has been down for 2 months (so definitely longer than gc_grace_seconds). The node has also lost one of its drives. Would I follow the same process of replacing the node with itself? Do I have to wipe it clean first? What would happen if I just ran a nodetool repair in this case? I noticed there's a --pull option that should make it only stream data into the node, which seemed like what I would want, but there's no documentation for it.

Btw, we're using Cassandra version 2.0.17, in case that changes the answer. Also, we're using the open source version at the moment.

Thanks!

You would need to wipe the node in order to bring it back in. DO NOT run a repair, even with --pull, because you will still have old data that could be resurrected in your cluster through read repairs and other repairs that are running. You should wipe the node and then replace it using replace_address, as described above.
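As a rough illustration of the wipe-and-replace sequence, the Python sketch below only builds a checklist of text steps; it does not touch the node. The helper names are hypothetical, the directory names are the ones mentioned in the answer (their actual location depends on your cassandra.yaml), and the IP address is a placeholder. The JVM option `-Dcassandra.replace_address=<ip>` is the usual way to pass the replace_address flag at startup.

```python
import ipaddress


def replace_address_jvm_opt(node_ip):
    """Build the JVM option that tells Cassandra to replace a node by IP."""
    ipaddress.ip_address(node_ip)  # raises ValueError if the IP is malformed
    return f"-Dcassandra.replace_address={node_ip}"


def replacement_checklist(node_ip, dirs=("data", "commitlog", "saved_caches")):
    """Return the ordered steps for replacing a node "with itself".

    `dirs` are the directory names from the answer above; their real paths
    are an assumption and must be taken from your own configuration.
    """
    steps = ["Stop Cassandra on the node if it is running"]
    steps += [f"Delete all contents of {d}/" for d in dirs]
    steps.append(f"Start Cassandra with {replace_address_jvm_opt(node_ip)}")
    return steps


# 10.0.0.5 is a placeholder address, not taken from the thread.
for number, step in enumerate(replacement_checklist("10.0.0.5"), start=1):
    print(f"{number}. {step}")
```

Note that no repair appears anywhere before the wipe: the stale data is removed first, so it cannot be streamed back into the cluster.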