IGonzalez asked Erick Ramirez answered

What could be the cause of "Unexpected exception killed worker"?

My Cassandra nodes raise the following error:

ERROR [RequestResponseStage-3] 2021-02-15 11:36:37,325 SEPWorker.java:147 - Unexpected exception killed worker: {}
java.lang.NullPointerException: null
    at java.lang.Long.compareTo(Long.java:1234) ~[na:1.8.0_181]
    at java.lang.Long.compareTo(Long.java:54) ~[na:1.8.0_181]
    at java.util.concurrent.ConcurrentSkipListMap.cpr(ConcurrentSkipListMap.java:655) ~[na:1.8.0_181]
    at java.util.concurrent.ConcurrentSkipListMap.findPredecessor(ConcurrentSkipListMap.java:682) ~[na:1.8.0_181]
    at java.util.concurrent.ConcurrentSkipListMap.doRemove(ConcurrentSkipListMap.java:960) ~[na:1.8.0_181]
    at java.util.concurrent.ConcurrentSkipListMap.remove(ConcurrentSkipListMap.java:1992) ~[na:1.8.0_181]
    at org.apache.cassandra.concurrent.SEPWorker.doWaitSpin(SEPWorker.java:245) ~[apache-cassandra-3.11.3.jar:3.11.3]
    at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:78) ~[apache-cassandra-3.11.3.jar:3.11.3]
    at java.lang.Thread.run(Thread.java:748) [na:1.8.0_181]

Env: k8s; Nodes: 3

Despite these errors, the cluster appears to be up and active (at least according to nodetool status), but applications using Cassandra cannot connect.

What could be the root cause of the error?

(Note: it occurs only occasionally, once or twice a month.)

Thanks in advance.

cassandra

1 Answer

Erick Ramirez answered

This indicates that a shared pool worker thread was terminated and it couldn't be assigned a new task.

This occurs when the JVM has run into problems, usually an out-of-memory error, and is no longer able to create new threads to service requests. This is why applications can no longer connect to the cluster.
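One way to check whether an out-of-memory condition came before the killed worker is to scan the node's log for both messages. The following is only a minimal sketch: the timestamp pattern is taken from the log line in the question, `java.lang.OutOfMemoryError` is the standard JVM error name, and the function names are hypothetical helpers, not part of Cassandra.

```python
import re

# Assumptions: the message text and timestamp layout match the log line
# quoted in the question; adjust them if your logging format differs.
WORKER_KILLED = "Unexpected exception killed worker"
OOM = "java.lang.OutOfMemoryError"
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}")


def find_events(log_text):
    """Return (timestamp, kind) tuples for OOM and killed-worker lines."""
    events = []
    last_ts = None  # stack-trace lines carry no timestamp, so reuse the last one seen
    for line in log_text.splitlines():
        match = TIMESTAMP.search(line)
        if match:
            last_ts = match.group(0)
        if OOM in line:
            events.append((last_ts, "oom"))
        elif WORKER_KILLED in line:
            events.append((last_ts, "worker_killed"))
    return events


def oom_preceded_worker_death(events):
    """True if an OOM event appears earlier in the log than a killed worker."""
    seen_oom = False
    for _, kind in events:
        if kind == "oom":
            seen_oom = True
        elif kind == "worker_killed" and seen_oom:
            return True
    return False
```

To use it, read the node's log file into a string (the file location depends on your deployment) and pass it to `find_events`. If no out-of-memory line shows up before the killed worker, the memory explanation is less likely for that incident and other causes are worth investigating.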

In most cases, this is an indication that your cluster is overloaded and you need to review your configuration. There is a good chance that the nodes do not have sufficient RAM allocated to the heap and/or there aren't enough nodes in the cluster. We recommend a minimum of 32GB of RAM per node, with at least 16GB allocated to the heap if using CMS.
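As a quick sanity check against the 16GB figure, you can look at the `-Xmx` flag in the node's JVM options. The sketch below assumes the heap is set with a line such as `-Xmx16G` in a jvm.options-style file. It handles only values with a K, M, or G suffix, and the helper names are hypothetical. The heap may also be set elsewhere (for example through environment variables in a container), which this sketch does not look at.

```python
import re

# Threshold taken from the recommendation above (heap size when using CMS).
RECOMMENDED_HEAP_GB = 16

# Conversion factors from each supported suffix to gigabytes.
_UNIT_TO_GB = {"k": 1 / (1024 * 1024), "m": 1 / 1024, "g": 1}


def max_heap_gb(jvm_options_text):
    """Return the active -Xmx value in GB, or None if no such flag is set."""
    value = None
    for line in jvm_options_text.splitlines():
        line = line.strip()
        if line.startswith("#"):
            continue  # skip commented-out flags
        match = re.fullmatch(r"-Xmx(\d+)([kKmMgG])", line)
        if match:
            # Keep the last occurrence if the flag is repeated.
            value = int(match.group(1)) * _UNIT_TO_GB[match.group(2).lower()]
    return value


def heap_meets_recommendation(jvm_options_text):
    """True if an explicit -Xmx is present and at least the recommended size."""
    heap = max_heap_gb(jvm_options_text)
    return heap is not None and heap >= RECOMMENDED_HEAP_GB
```

For example, `heap_meets_recommendation("-Xms8G\n-Xmx8G\n")` returns `False`, which would point at an undersized heap relative to the recommendation.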

Here are other resources you should have a look at if you're looking to get the most out of your cluster:

I realise a couple of the docs are for DataStax Enterprise but most of the recommendations apply to open-source Apache Cassandra.

Since you're running on Kubernetes, consider using K8ssandra, a ready-made platform for running Apache Cassandra on Kubernetes. It uses the DataStax Cassandra Operator (cass-operator) under the hood and comes with the following tooling built in:

  • Reaper for automated repairs
  • Medusa for backups and restores
  • Metrics Collector for monitoring with Prometheus + Grafana
  • Traefik templates for k8s cluster ingress

Cheers!
