
noelle.heerink-wijnja_161326 asked:

Cassandra repair threads are hanging forever

We have a 21-node Cassandra ring across 3 datacenters, running version 3.11.5 on CentOS, on premises. Every once in a while, repair threads on a node start hanging forever, and they then block later repairs of the same keyspace on other nodes. We run full and primary-range (PR) repairs on all 21 nodes on a schedule designed so they don't interfere with each other, but once one repair hangs, they do.


What could be causing this, and what should we investigate?

Below is some Grafana/Prometheus output from the first node where a repair thread hung:

[Screenshots: 1579630068031.png, 1579630089811.png]

1 Answer

Erick Ramirez answered:

@noelle.heerink-wijnja_161326 The most common reason repair streams hang is that they get interrupted. Check the logs on the affected nodes to find the actual cause.
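As a starting point for that log check, here is a minimal sketch that pulls WARN and ERROR entries mentioning repair or streaming out of a node's log. It assumes the log lives at /var/log/cassandra/system.log (the usual location for package installs) and that each entry starts with its level, as in the default logback pattern shipped with 3.11; both are assumptions to adjust for your setup. The helper names are hypothetical, and stack-trace continuation lines are not matched, so open the log around each hit to see the full exception.

```python
import re
from pathlib import Path
from typing import Iterable, Iterator, List

# Assumption: default log location for package installs; adjust as needed.
DEFAULT_LOG = Path("/var/log/cassandra/system.log")

# Assumption: each entry starts with its level, e.g. "ERROR [thread] ...".
LEVEL_RE = re.compile(r"^(WARN|ERROR)\b")
# Case-insensitive match on repair- or streaming-related text.
KEYWORD_RE = re.compile(r"repair|stream", re.IGNORECASE)


def suspicious_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield WARN/ERROR lines that mention repair or streaming (hypothetical helper)."""
    for line in lines:
        if LEVEL_RE.match(line) and KEYWORD_RE.search(line):
            yield line.rstrip("\n")


def scan_log(path: Path = DEFAULT_LOG) -> List[str]:
    """Read a log file and return the matching lines (hypothetical helper)."""
    with path.open(encoding="utf-8", errors="replace") as fh:
        return list(suspicious_lines(fh))
```

Running `for hit in scan_log(): print(hit)` on the first node that hung, around the time shown in the graphs, should narrow down which session failed and why.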

One thing to check is whether a firewall is cutting idle connections between nodes. If it is, make sure TCP keepalive is configured on the nodes. See this KB article for instructions. Cheers!
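To see whether a node's kernel keepalive settings could outlast a firewall's idle timeout, a minimal Linux-only sketch like the one below reads the current values from /proc/sys/net/ipv4 and suggests sysctl lines for any that are higher than a target. The target values here (60 s idle, 10 s between probes, 3 probes) are assumptions chosen for illustration, not taken from the KB article; the key requirement is that tcp_keepalive_time stays below the firewall's idle-connection timeout. The helper names are hypothetical.

```python
from pathlib import Path
from typing import Dict, List

PROC_DIR = Path("/proc/sys/net/ipv4")

# Assumed example targets; confirm against your firewall's idle timeout.
TARGETS: Dict[str, int] = {
    "tcp_keepalive_time": 60,   # seconds idle before the first probe is sent
    "tcp_keepalive_intvl": 10,  # seconds between unanswered probes
    "tcp_keepalive_probes": 3,  # unanswered probes before the connection is dropped
}


def read_current(proc_dir: Path = PROC_DIR) -> Dict[str, int]:
    """Read the kernel's current keepalive settings (Linux only, hypothetical helper)."""
    return {name: int((proc_dir / name).read_text().strip()) for name in TARGETS}


def sysctl_fixes(current: Dict[str, int], targets: Dict[str, int] = TARGETS) -> List[str]:
    """Return a sysctl line for every setting that is missing or above its target."""
    fixes = []
    for name, target in targets.items():
        # A missing value is treated as "too high" so it gets reported.
        if current.get(name, target + 1) > target:
            fixes.append(f"net.ipv4.{name}={target}")
    return fixes
```

With the stock Linux defaults (7200, 75 and 9), all three settings are reported. The printed lines can go into /etc/sysctl.conf and be loaded with `sysctl -p`, after which the nodes need to open new connections for the settings to take effect on them.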

2 comments

Unfortunately, the KB article is only visible to registered DataStax support users, and we are not one. Is it possible to get the KB article content some other way?

Erick Ramirez replied to noelle.heerink-wijnja_161326:

@noelle.heerink-wijnja_161326 Our apologies. The article wasn't published correctly. I've fixed it now, so you should be able to access it. Cheers!
