
Nike asked:

Cassandra keeps crashing without errors

Hi everyone

We have a 3-node cluster running open-source Cassandra 3.11.6. Our problem is that one of the nodes keeps crashing (1-3 times per week) without leaving any trace in the logs: there are no errors in the system, debug, GC, Linux, or OOM logs.

The last messages in debug.log before a crash look something like this:

DEBUG [ReadRepairStage:18238] 2021-10-03 06:55:57,358 ReadCallback.java:244 - Digest mismatch:
org.apache.cassandra.service.DigestMismatchException: Mismatch for key DecoratedKey(-3667414724057301858, 323032315f31305f30325f30345f30305f30335f63725f6272616e645f39315f3138305f75735f5f6c73705f6f74615f72656e74616c636172735f63617272656e74616c5f776562) (8f4e680e106aa195f71f9f450ef029fe vs 5799f2c017c91de04030a075a3f4b935)
        at org.apache.cassandra.service.DigestResolver.compareResponses(DigestResolver.java:92) ~[apache-cassandra-3.11.9.jar:3.11.9]
        at org.apache.cassandra.service.ReadCallback$AsyncRepairRunner.run(ReadCallback.java:235) ~[apache-cassandra-3.11.9.jar:3.11.9]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_292]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_292]
        at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:84) [apache-cassandra-3.11.9.jar:3.11.9]
        at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_292]
DEBUG [ReadRepairStage:18236] 2021-10-03 06:55:57,363 ReadCallback.java:244 - Digest mismatch:
org.apache.cassandra.service.DigestMismatchException: Mismatch for key DecoratedKey(-3667414724057301858, 323032315f31305f30325f30345f30305f30335f63725f6272616e645f39315f3138305f75735f5f6c73705f6f74615f72656e74616c636172735f63617272656e74616c5f776562) (84909145131237c795cc11ea541d63ee vs 401c0fcd17a0a9db78e95fa851e1686f)
        at org.apache.cassandra.service.DigestResolver.compareResponses(DigestResolver.java:92) ~[apache-cassandra-3.11.9.jar:3.11.9]
        at org.apache.cassandra.service.ReadCallback$AsyncRepairRunner.run(ReadCallback.java:235) ~[apache-cassandra-3.11.9.jar:3.11.9]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_292]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_292]
        at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:84) [apache-cassandra-3.11.9.jar:3.11.9]
        at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_292]

We tried raising the log level to TRACE, but again we saw no errors. One moment the Cassandra process is running, and the next it has simply disappeared from memory. And again, this keeps happening on only one of the three nodes.

Guys, can you suggest what we can do?


1 Answer

Erick Ramirez answered:

If the log entries stop abruptly as you described, it's an indication that the Cassandra process is being terminated (killed) by an external process, daemon, monitor, or person outside of Cassandra's control.

In my experience, the most common cause is the Linux oom-killer terminating the Cassandra process because the server has run out of memory.
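One way to double-check the oom-killer theory on the affected node is to search the kernel log for its kill messages. Below is a minimal sketch; `find_oom_kills` is a hypothetical helper, not part of any tool, and it assumes the usual kernel wording ("Out of memory: Kill process <pid> (<name>)" on older kernels, "Killed process" on newer ones), which can vary between kernel versions.

```python
import re

# Matches older ("Kill process") and newer ("Killed process") kernel wording,
# including the cgroup variant ("Memory cgroup out of memory: Killed process ...").
OOM_KILL_RE = re.compile(
    r"out of memory: kill(?:ed)? process (\d+) \(([^)]+)\)", re.IGNORECASE
)


def find_oom_kills(kernel_log: str) -> list[tuple[int, str]]:
    """Return (pid, process_name) for every oom-killer kill line in the text."""
    kills = []
    for line in kernel_log.splitlines():
        match = OOM_KILL_RE.search(line)
        if match:
            kills.append((int(match.group(1)), match.group(2)))
    return kills
```

You could feed it the output of `dmesg -T` or `journalctl -k`; a killed Cassandra node would normally show up with the process name `java`. Keep in mind that the kernel ring buffer read by `dmesg` does not survive a reboot.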

See if "Why nodes have increased memory usage" applies to your cluster. Cheers!


I don't believe it's the oom-killer. Here is a RAM monitoring screenshot for this server (you can see where the Cassandra process crashes):

[Screenshot: casram.jpg, RAM monitoring for this server]

The server has 80 GB of RAM in total, and the heap is set to 45 GB (-Xms/-Xmx).
What else could help us investigate the root cause?
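A quick back-of-the-envelope check of these figures: with 80 GB of RAM and a 45 GB heap, everything outside the heap (the OS, Cassandra's off-heap structures, thread stacks, direct buffers) has to fit in what remains. The minimal sketch below reads `MemTotal` from `/proc/meminfo`-style text and subtracts the heap size; `non_heap_budget_gib` is a hypothetical name, and the sample assumes the 80 GB is 80 GiB.

```python
import re

# /proc/meminfo reports sizes in "kB" (really KiB), e.g. "MemTotal:  83886080 kB".
MEMINFO_LINE = re.compile(r"^([\w()]+):\s+(\d+)\s*kB")


def non_heap_budget_gib(meminfo_text: str, heap_gib: float) -> float:
    """Return the RAM (GiB) left for everything outside the Java heap."""
    fields = {}
    for line in meminfo_text.splitlines():
        match = MEMINFO_LINE.match(line)
        if match:
            fields[match.group(1)] = int(match.group(2))
    total_gib = fields["MemTotal"] / (1024 * 1024)  # KiB -> GiB
    return total_gib - heap_gib


# Hypothetical sample matching the figures in the comment: 80 GiB total.
sample = "MemTotal:       83886080 kB\nMemAvailable:   10485760 kB\n"
print(non_heap_budget_gib(sample, heap_gib=45))  # 35.0
```

On a real node `MemTotal` is a little below the installed RAM because the kernel reserves some memory, so the actual budget is slightly smaller. Tracking the Cassandra process's resident memory against this budget over time is one way to see whether the node is heading for exhaustion.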

You've just proven my point. :)
Nike replied to Erick Ramirez:

Excuse me, could you please explain what I've proven?
