vranganathan asked Erick Ramirez edited

Cassandra v3.0.9 bootstrap failing and stuck in JOINING state even after the streaming is complete

We are in the process of scaling the cluster from 30 nodes to about 40.

Configuration for bootstrap:

auto_bootstrap: true (Default)
-Dcassandra.consistent.rangemovement=false (Default)

The bootstrap appears to complete, with all the nodes finishing their streams to the new node. Yet the node stays in JOINING state and the bootstrap eventually times out (after 3 hrs; streaming_socket_timeout_in_ms), leaving the new node stuck in UJ state indefinitely. I tried nodetool bootstrap resume, which also hangs indefinitely. I checked nodetool netstats and none of the nodes are streaming data to the new node.

Now, since I know that this node has all the data that belongs to it, I tried setting auto_bootstrap: false in cassandra.yaml and restarting the cassandra process. My expectation was that with auto_bootstrap: false the node would not stream data from other nodes, but it seems I am missing something here.

The node still receives data from other nodes, and the bootstrap starts all over again.

I went a step further and added -Dcassandra.consistent.rangemovement=false along with auto_bootstrap: false (I did this because I intermittently got a RuntimeException saying "A node required to move the data consistently is down", although all the nodes were UN).

I still see the node trying to stream data from other nodes. Am I missing something here? I would really appreciate it if someone could help me out.

1 comment

vranganathan commented

Looks like `nodetool bootstrap resume` was taking a long time to complete and I assumed it was stuck (I can't explain why I didn't see any streaming in progress earlier).

All I did was restart the cassandra process with the defaults (`auto_bootstrap: false` & `-Dcassandra.consistent.rangemovement=false`) and wait until it got stuck again (I was consistently getting a `SocketTimeoutException`).

After it got stuck, I ran `nodetool bootstrap resume` again and let it run for some time. Eventually it completed and the node joined the ring.
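
For anyone following along, the "let it run and keep checking" part can be scripted rather than watched by hand. This is only a minimal sketch, assuming `nodetool` is on the PATH and that `nodetool status` prints one row per node beginning with a two-letter status/state code (such as `UJ` or `UN`) followed by the node address. The function names `node_states` and `wait_until_joined` are hypothetical helpers, not part of Cassandra.

```python
import subprocess
import time


def node_states(status_output):
    """Map node address -> two-letter status/state code from `nodetool status` text."""
    states = {}
    for line in status_output.splitlines():
        parts = line.split()
        # Node rows start with a code such as UN, UJ or DN: first letter is
        # Up/Down, second is Normal/Leaving/Joining/Moving. Header and
        # separator lines do not match this shape and are skipped.
        if (len(parts) >= 2 and len(parts[0]) == 2
                and parts[0][0] in "UD" and parts[0][1] in "NLJM"):
            states[parts[1]] = parts[0]
    return states


def wait_until_joined(address, poll_seconds=60, max_polls=180):
    """Poll `nodetool status` until `address` reports UN; return True if it does."""
    for _ in range(max_polls):
        result = subprocess.run(
            ["nodetool", "status"], capture_output=True, text=True, check=True
        )
        if node_states(result.stdout).get(address) == "UN":
            return True
        time.sleep(poll_seconds)
    return False
```

The idea would be to run `nodetool bootstrap resume` on the new node first, then call `wait_until_joined` with that node's address instead of assuming the resume is stuck. The polling interval and limit are arbitrary placeholders.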


1 Answer

Erick Ramirez answered

@vranganathan Since you have disabled consistent range movement, the likely scenario is that the node's bootstrap streams were interrupted when you added nodes simultaneously, so the streams never actually completed. When you ran the resume command, it restarted them. Cheers!
