sharathsai666_167144 asked Erick Ramirez commented

Node didn't join the cluster after bootstrapping

Hi all,
We are having trouble scaling out our cluster. It previously had 7 nodes and we planned to add 7 more, one node at a time. The first 4 nodes were added successfully. When we added the 5th node, the bootstrap process started, but after 36+ hours the node had still not joined the cluster. There is no connectivity to that node and nodetool commands do not work on it. We checked its storage and it is using 1.5 TB out of 3 TB, like the other nodes.

What steps do we need to follow to recover that node?

Log entries like the following appear in system.log:

2021-08-23 04:48:36:309*[ERROR]*STREAM-OUT-/10.24.13.107:7000*o.a.c.s.StreamSession*logError*[Stream #07aaa190-0376-11ec-9159-45ce30811155] Streaming error occurred on session with peer 10.24.13.107
java.io.IOException: Connection reset by peer
  at sun.nio.ch.FileDispatcherImpl.write0(Native Method) ~[na:1.8.0_181]
  at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47) ~[na:1.8.0_181]
  at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93) ~[na:1.8.0_181]
  at sun.nio.ch.IOUtil.write(IOUtil.java:51) ~[na:1.8.0_181]
  at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:471) ~[na:1.8.0_181]
  at org.apache.cassandra.io.util.BufferedDataOutputStreamPlus.doFlush(BufferedDataOutputStreamPlus.java:323) ~[apache-cassandra-3.11.3.jar:3.11.3]
  at org.apache.cassandra.io.util.BufferedDataOutputStreamPlus.flush(BufferedDataOutputStreamPlus.java:331) ~[apache-cassandra-3.11.3.jar:3.11.3]
  at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:409) [apache-cassandra-3.11.3.jar:3.11.3]
  at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:380) [apache-cassandra-3.11.3.jar:3.11.3]
  at java.lang.Thread.run(Thread.java:748) [na:1.8.0_181]

1 Answer

Erick Ramirez answered Erick Ramirez commented

The error you posted indicates that the stream was interrupted, which means the bootstrap didn't complete. A node cannot join the cluster until its bootstrap completes, which explains what you are seeing.

You need to review the logs, paying particular attention to entries related to the stream ID, which in your case is 07aaa190-0376-11ec-9159-45ce30811155. Cheers!
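As a rough illustration of that log review, the entries for one stream can be pulled out with a fixed-string search. This is only a sketch: the helper name `stream_log_lines` is made up for this example, and the default log path is an assumption (typical for package installs) that may differ on your nodes.

```shell
# Hypothetical helper: print every log line that mentions a given stream ID.
# The default log path is an assumption; pass your own path as the 2nd argument.
stream_log_lines() {
    stream_id="$1"
    log_file="${2:-/var/log/cassandra/system.log}"
    # -F treats the ID as a fixed string rather than a regular expression.
    grep -F -- "$stream_id" "$log_file"
}

# Example invocation (commented out so the snippet can be sourced safely):
# stream_log_lines 07aaa190-0376-11ec-9159-45ce30811155
```

It is probably worth running this on the joining node and on the peer named in the error (10.24.13.107 in the log above), since each side logs its own view of the session. Stack-trace lines do not carry the stream ID, so adding grep's `-A` option with a line count would also show the lines that follow each match.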

2 comments

Thanks for the reply, @Erick Ramirez. The attached logs are what I found after tracing that stream ID.
So, what next steps do I need to follow?

  • Are there any steps to recover that node?
  • Should I delete the node and add a new one?
  • Any other recommendations?
log.txt (801 B)
Erick Ramirez ♦♦ (in reply to sharathsai666_167144)

You need to investigate why the stream failed. In the text file that you posted, there was an error reading a partition being streamed and that would have been the reason for the bootstrap to fail.

You can attempt to re-bootstrap the node with nodetool bootstrap resume but unless you identify the root cause of the failure and fix it, the bootstrap is likely to fail again. Cheers!
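A minimal sketch of that resume attempt is shown below. It assumes the Cassandra process on the joining node is still running and that nodetool can reach it, which was not the case in the original report and may need fixing first. The wrapper name `resume_bootstrap` is made up for this example; the nodetool subcommands are the standard ones.

```shell
# Sketch only: resume an interrupted bootstrap, then check the node's state.
# Run on the joining node itself.
resume_bootstrap() {
    # Ask the node to resume streaming the data it has not received yet.
    nodetool bootstrap resume || return 1
    # List the cluster members. UJ (Up/Joining) means the node is still
    # bootstrapping; UN (Up/Normal) means it has joined.
    nodetool status
}

# While the resume is in progress, `nodetool netstats` run from another
# session shows the active streaming sessions.
```

If the resume fails again, the new stream ID in system.log is the place to start looking for the root cause.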
