kajarvine_115939 asked Erick Ramirez commented

Issues with joining node, stream failed

We have a problem joining a new (previously dropped) node into an Apache Cassandra cluster.

The version is 3.11.2, with 128 vnodes.

During the bootstrap the streams seem to flow OK at first, but at the end (after some time) we get "Stream failed".

One ERROR we have seen is:

ERROR [Native-Transport-Requests-1] 2021-01-25 14:54:21,621 - Unexpected exception during request; channel = [id: 0xf5c8dc6c, L:/ - R:/]
java.lang.NullPointerException: null

..67 is the node in question; the 172 address is the Docker address of a sending node.

We suspect network failures. But how can we prove that the network is at fault?

Or what kind of traces should we run?


1 Answer

Erick Ramirez answered Erick Ramirez commented

The common cause of stream failures is the connection between the source and the receiving node getting interrupted, either because of (a) a network connectivity issue, or (b) a failure reading the SSTable being streamed on the source node.

The error message you posted has no relation to the streaming failure at all. It relates to a failed client connection between your application (running on a remote server) and the coordinator node (the local server).

You will need to review the logs on both the source and destination nodes to determine why the stream failed. For the record, I'm not asking you to send me the logs. I'm just giving you pointers on how to identify the cause of the issue. Cheers!
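As a starting point, a minimal sketch of how to locate the failure in the logs (the log path is the typical package-install default and the helper name is made up for illustration; adjust both to your deployment):

```shell
#!/bin/sh
# Print every line reporting a failed stream, with line numbers, so the
# surrounding exception can be read in context. Run this against both
# system.log and debug.log, on the source AND the joining node.
find_stream_failures() {
  grep -n 'Stream failed' "$1"
}

# Typical usage (default package-install path, an assumption):
# find_stream_failures /var/log/cassandra/system.log
```

While the node is still joining, `nodetool netstats` on both sides also shows which streams are in flight and to which peers.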

5 comments

Hello Erick, again :)
We did a lot after your exact answer: tried to inspect the TCP layers, etc., and did a lot of logging.
So I will attach the log between the two instances here as well.


One is of course the bootstrapping node, and the other a streaming (source) node.

We have done tests with AWS instances using a similar Docker image and schema definitions, and they bootstrap just fine.
That makes this even more disturbing: why? It seems that everything stops after all streaming has completed and it is time to update the schema version.



[part 1/2]

And one more, on the schema during joining/streaming:

We have inspected the logs in the environment where joining works. Here are a few lines showing that the node gets the correct schema version:

DEBUG [main] 2021-01-27 09:35:45,648 - Gossiping my 3.11 schema version (empty)
DEBUG [RequestResponseStage-1] 2021-01-27 09:35:47,311 - Immediately submitting migration task for /, schema versions: local/real=(empty), local/compatible=(empty), remote=0ba43fce-642d-3ec5-a8e4-b837de32458e
DEBUG [InternalResponseStage:1] 2021-01-27 09:35:56,475 - Gossiping my 3.11 schema version 0ba43fce-642d-3ec5-a8e4-b837de32458e
DEBUG [main] 2021-01-27 09:35:56,717 - got schema: 0ba43fce-642d-3ec5-a8e4-b837de32458e

It seems like you are sidetracked. The schema version is not a problem and is not related at all to the stream failure.

You need to trace the stream in the logs based on the streaming ID and figure out why the streaming failed. Cheers!
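As a sketch of what that looks like: in 3.x the stream-related log lines are tagged with the session ID in the form `[Stream #<uuid>]`, so you can list the sessions first and then follow one UUID across both nodes (the helper name and log path are illustrative assumptions):

```shell
#!/bin/sh
# List the unique streaming session IDs mentioned in a log file.
# Cassandra tags stream-related log lines with "[Stream #<uuid>]".
list_stream_sessions() {
  grep -oE 'Stream #[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}' "$1" | sort -u
}

# Then pull the full history of one session on each node, e.g.:
# grep '<uuid from the list above>' /var/log/cassandra/debug.log
```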


A follow-up on what kajarvine_115939 posted. The streaming works now, but the schema version is different on the joining node, as seen in the log snippet attached to kajarvine_115939's post. The other nodes can be changed to that schema version with the resetlocalschema command. Is there any risk involved in doing so? And why would the joining node decide to create a new schema version?
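For reference, the commands in question (assuming `nodetool` is on the PATH of each node):

```shell
# Show which schema versions exist in the cluster and which nodes hold
# each one (see the "Schema versions" section of the output):
nodetool describecluster

# On a node holding the stale version: drop the local schema and
# re-request it from the other live nodes (the node must be up and
# in normal state for this to work):
nodetool resetlocalschema
```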


[part 2/2]

INFO [main] 2021-01-27 09:35:56,717 - JOINING: waiting for schema information to complete
INFO [main] 2021-01-27 09:35:57,718 - JOINING: schema complete, ready to bootstrap
DEBUG [pool-1-thread-1] 2021-01-27 09:36:10,769 - Schemas are in agreement.

Any ideas where to look? Why does the joining node get the wrong schema version in the other environment? The logs don't really say anything about that.
