Bringing together the Apache Cassandra experts from the community and DataStax.

graemenewlands asked · Erick Ramirez commented

Is there a way to mitigate a node with low resources from being picked as a coordinator?

Cluster: 3 DC, 6 Nodes per DC, RF: 3

Client: datastax java driver 3.11

Situation: after patching, one host launches a monitoring process (completely unrelated to C*) that has a memory leak. Over a one-week interval, the leak consumes more and more of the available system memory (the leak is not visible via running top).

The host with the memory leak remains visible on the network: it responds to ping, accepts connections from clients, and displays as Normal when viewing

nodetool status

What we saw happening was that, since the node was unable to allocate memory for reads, requests for which the node was both the coordinator and a replica became disproportionately slower than those handled by its peer coordinators. At the point where almost no memory was available to C*, the node would still accept client connections but would respond with the "not enough replicas" error (two of three replicas responded).

We're using the DataStax Java client libraries; they work well.

I'm trying to understand how this situation could be remediated with zero knowledge of the underlying system infrastructure:

  1. Under what circumstances would the LoadBalancingPolicy used by the driver return HostDistance.IGNORED?
  2. Is there anything internal to C* (i.e. that appears in the logs as a warning or error) that would identify hosts that gradually degrade relative to their peers?
  3. Aside from the obvious system monitoring that should flag rogue processes with memory leaks, what monitoring (e.g. via JMX) would start to flag the gradual performance decline of a single node?
java driver

1 Answer

Erick Ramirez answered · Erick Ramirez commented

Unfortunately, it's very difficult for the driver to deal with this kind of scenario because it relies on a Cassandra node becoming unresponsive to mark it as down. As you've already stated, the underlying issue is external to Cassandra, so the node keeps reporting itself as "operational".

The driver sends heartbeats to nodes on the CQL port and if the node responds, the driver sees it as UP.

To respond to your points directly:

  1. Nodes will be ignored by the DC-aware policy if they are not in the local DC and you don't explicitly allow remote nodes.
  2. In theory you should see the read or write latencies spike if a node is getting overloaded. But in the scenario you described, where the node is accepting requests but not actually processing them, latency wouldn't come into play.
  3. As per (2) above.
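To illustrate point 1, here is a toy model of the DC-aware rule (this is an illustrative sketch, not the driver's actual DCAwareRoundRobinPolicy implementation; the class, enum and parameter names are invented): a host in the local DC is LOCAL, the first usedHostsPerRemoteDc hosts seen in each remote DC are REMOTE, and everything else is IGNORED.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the DC-aware distance rule described above.
// Illustrative only: the real logic lives in the driver's
// DCAwareRoundRobinPolicy; these names are invented for the sketch.
public class DcAwareDistanceSketch {
    enum Distance { LOCAL, REMOTE, IGNORED }

    private final String localDc;
    private final int usedHostsPerRemoteDc;          // 0 by default: remote DCs ignored
    private final Map<String, Integer> remoteSeen = new HashMap<>();

    DcAwareDistanceSketch(String localDc, int usedHostsPerRemoteDc) {
        this.localDc = localDc;
        this.usedHostsPerRemoteDc = usedHostsPerRemoteDc;
    }

    // Returns the distance assigned to the next host seen in the given DC.
    Distance distance(String hostDc) {
        if (localDc.equals(hostDc)) return Distance.LOCAL;
        int seen = remoteSeen.merge(hostDc, 1, Integer::sum);
        return seen <= usedHostsPerRemoteDc ? Distance.REMOTE : Distance.IGNORED;
    }

    public static void main(String[] args) {
        DcAwareDistanceSketch policy = new DcAwareDistanceSketch("DC1", 0);
        System.out.println(policy.distance("DC1")); // LOCAL
        System.out.println(policy.distance("DC2")); // IGNORED: no remote hosts allowed
    }
}
```

Note that in your situation this doesn't help: the degraded node is in the local DC, so it stays LOCAL and remains a candidate coordinator.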

As you've already pointed out, you need to monitor the OS and infrastructure to pick up external issues that Cassandra itself or the driver would not. Cheers!
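A minimal sketch of the kind of JMX-based monitoring point 3 asks about: poll each node's coordinator read latency via Cassandra's standard ClientRequest metric MBean and flag any node that drifts well above its peers. The host names, the 7199 port, the 2x threshold and the helper names are illustrative assumptions, not recommendations.

```java
import java.util.HashMap;
import java.util.Map;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Sketch: poll each node's coordinator read latency over JMX and flag
// outliers relative to the peer average. Threshold and hosts are
// illustrative assumptions.
public class NodeLatencyWatch {

    // Pure decision rule: flag nodes whose latency is more than `factor`
    // times the average of all sampled nodes.
    static Map<String, Boolean> flagDegraded(Map<String, Double> latencies, double factor) {
        double avg = latencies.values().stream()
                .mapToDouble(Double::doubleValue).average().orElse(0);
        Map<String, Boolean> flags = new HashMap<>();
        latencies.forEach((node, l) -> flags.put(node, avg > 0 && l > factor * avg));
        return flags;
    }

    // Read one node's 99th-percentile coordinator read latency from
    // Cassandra's standard ClientRequest metric MBean (JMX on 7199).
    static double readLatencyP99(String host) throws Exception {
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://" + host + ":7199/jmxrmi");
        try (JMXConnector jmxc = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbsc = jmxc.getMBeanServerConnection();
            ObjectName name = new ObjectName(
                    "org.apache.cassandra.metrics:type=ClientRequest,scope=Read,name=Latency");
            return (Double) mbsc.getAttribute(name, "99thPercentile");
        }
    }

    public static void main(String[] args) {
        // Simulated sample: one node much slower than its peers.
        Map<String, Double> sample = Map.of(
                "10.0.0.1", 2.1, "10.0.0.2", 2.3, "10.0.0.3", 9.8);
        System.out.println(flagDegraded(sample, 2.0));
    }
}
```

In the memory-leak scenario above, though, note the caveat from point 2: a node that fails fast with "not enough replicas" may not show a latency spike at all, so an OS-level memory check is still the more reliable signal.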

4 comments

Thanks! Out of interest, do you know of any community efforts to flag underperforming nodes? I could see how the Java driver could maybe pull some data from Zookeeper to enhance its decision on available coordinators; whether that would be worthwhile or not is an entirely different question, however!

Unfortunately, no. It's difficult for the driver to have a view of external factors like server load and RAM utilisation so it can't account for those.

There is also no appetite to use Zookeeper since it introduces another point of failure in the ecosystem. In fact, there are a few Apache projects moving away from Zookeeper in favour of other solutions. Cheers!


Thanks Erick. What alternatives to Zookeeper are being used? Consul?
