denis.koblov asked:

Why am I seeing unexpected behavior like lots of read-repairs and hint handoffs?

I have a Cassandra cluster (version 3.11.2) spanning 4 DCs.

Sometimes the nodes in each DC suddenly become very busy: there are a lot of read-repair operations and a lot of succeeded hints.


- org.apache.cassandra.metrics.HintsService.HintsSucceeded.count
- org.apache.cassandra.metrics.ReadRepair.RepairedBlocking.count
- org.apache.cassandra.metrics.ThreadPools.TotalBlockedTasks.transport.Native-Transport-Requests.count
- org.apache.cassandra.metrics.ThreadPools.PendingTasks.request.MutationStage.value


This goes on for about an hour. During that time, nodes lose connections with each other and even with themselves:

[cluster3-timeouter-0] com.datastax.driver.core.Host.STATES - [] Defuncting Connection[/{{local_ip}}:9042-1, inFlight=0, closed=false]
com.datastax.driver.core.exceptions.ConnectionException: [/{{local_ip}}:9042] Heartbeat query timed out
    at com.datastax.driver.core.Connection$11.onTimeout(
    at com.datastax.driver.core.Connection$ResponseHandler$
    at io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(
    at io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(
    at io.netty.util.HashedWheelTimer$
    at io.netty.util.concurrent.DefaultThreadFactory$

I don't understand why this happens. It isn't compaction, because there are no anomalies in the compaction metrics. Are there any ideas why it happens?

Tags: performance, hinted handoff, read repair
Attachments: s2vjg.png, kedlm.png, vzsld.png, gtnq4.png

1 Answer

Erick Ramirez answered:

The symptoms you described indicate to me that the traffic from your application is bursty, meaning that it spikes from time to time instead of arriving as a regular, constant stream of user activity.

You see hinted handoffs because the nodes become unresponsive when they are overloaded: they cannot keep up with the burst of requests from the app. The graphs you posted show that there is close to no traffic, then suddenly a massive spike in writes of between 100K and 200K around 21:00.

When the nodes get overloaded with writes, they end up dropping mutations (inserts, updates, deletes) because the commitlog disk cannot keep up with the IO. There are only so many writes a disk can take, and you're hitting the maximum.

When a replica is unresponsive, the coordinator node will store hints (missed writes) for that replica. When the replica comes back online, the coordinator node will replay (handoff) the missed writes (hints) to that replica -- this is what is called hinted handoff in Cassandra. See Hinted Handoff: repair during write path for a more detailed explanation.
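As a rough illustration of the mechanism described above (a simplified sketch, not Cassandra's actual implementation), a coordinator that stores hints for a down replica and replays them on recovery might look like this:

```python
# Minimal sketch of hinted handoff (illustrative only, not Cassandra's code).
# The coordinator stores a "hint" for every write a down replica misses,
# then replays the queued hints once the replica comes back online.

class Replica:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value

class Coordinator:
    def __init__(self, replicas):
        self.replicas = replicas
        self.hints = {r.name: [] for r in replicas}  # missed writes per replica

    def write(self, key, value):
        for r in self.replicas:
            if r.up:
                r.apply(key, value)
            else:
                self.hints[r.name].append((key, value))  # store a hint instead

    def replay_hints(self, replica):
        # the "handoff": called when the replica is seen as up again
        for key, value in self.hints[replica.name]:
            replica.apply(key, value)
        self.hints[replica.name] = []

r1, r2 = Replica("r1"), Replica("r2")
coord = Coordinator([r1, r2])
r2.up = False                  # r2 is overloaded/unresponsive
coord.write("k", "v")          # r1 gets the write, r2 gets a hint
r2.up = True
coord.replay_hints(r2)         # hinted handoff: r2 catches up
```

The real implementation persists hints to disk with an expiry window, but the store-then-replay flow is the same idea.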

When read requests come through and the data is out-of-sync between replicas, a read-repair is triggered by the coordinator to sync the missing data on a bad replica before returning the results to the client (app). Replicas are out-of-sync because they've missed writes while they were overloaded. See Read Repair: repair during read path for a more detailed explanation.
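The blocking read-repair flow can be sketched the same way (again a simplified model, not Cassandra's code): the coordinator compares replica responses by write timestamp, pushes the newest value to any stale replica, and only then answers the client.

```python
# Simplified model of a blocking read repair (illustrative only).

class Replica:
    def __init__(self, data):
        self.data = data  # key -> (value, write_timestamp)

    def get(self, key):
        return self.data[key]

    def put(self, key, value, ts):
        self.data[key] = (value, ts)

def read_with_repair(replicas, key):
    responses = [(r, r.get(key)) for r in replicas]
    # the value with the newest write timestamp wins
    winner_value, winner_ts = max((resp for _, resp in responses),
                                  key=lambda resp: resp[1])
    for r, (_, ts) in responses:
        if ts < winner_ts:
            r.put(key, winner_value, winner_ts)  # repair the stale replica
    return winner_value  # returned to the client only after the repair

r1 = Replica({"k": ("new", 2)})   # up-to-date replica
r2 = Replica({"k": ("old", 1)})   # replica that missed a write
result = read_with_repair([r1, r2], "k")  # syncs r2, then returns the value
```

This is why a burst of missed writes shows up later as a burst of RepairedBlocking events on reads.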

When nodes are overloaded they become unresponsive, and this explains why the driver loses connections to them. Compaction doesn't cause this problem -- this issue isn't caused by Cassandra itself.

The root cause of all these symptoms is that your cluster doesn't have enough capacity to deal with the peak application load. You need to review your storage infrastructure to make sure you're using the right disks, such as local SSDs (NVMe SSDs are ideal for low latency) rather than SAN disks, for example.

You also need to review the size of your cluster; you might not have enough nodes. In Cassandra, you need to provision capacity for peak load, not average load. Cheers!

4 comments

denis.koblov commented:

Erick, thank you for your detailed answer, but I have two questions:

1) I have other metrics in my application that show there were no spikes in load during this period. Perhaps my metrics are wrong, but I thought the spikes on the MutationStage chart were caused by internal operations rather than client requests (I mean requests from the application to Cassandra). So my question is: do org.apache.cassandra.metrics.CQL.PreparedStatementsExecuted.count and org.apache.cassandra.metrics.ThreadPools.PendingTasks.request.MutationStage.value show only client requests, or internal (Cassandra) operations too?

2) Let's imagine the load was unexpectedly huge in one DC. As a result, the nodes in that DC would be unavailable, which explains the lost connections to those nodes. But why did the other nodes also lose connections among themselves?

Erick Ramirez commented:
  1. Both -- those metrics count whatever is in the pool at the time it was polled, whether it came from client requests or internal operations.
  2. With respect to writes, all mutations are sent to all replicas in all DCs -- not just the local DC. As the screenshot shows, the mutations queued up in all 4 DCs, since I suspect the keyspace is replicated to all DCs.

I expect you'll have follow-up questions, but consider this first -- if your app were not sending writes to Cassandra, there would be no reason for mutations to build up on the nodes, whichever DC they belong to. And the nodes lose connectivity with each other because they are overloaded.

Also, writes in C* are fast because there is no disk seek involved. Commits are appended to the end of the log, which is why it is called a commitlog. Any new insert, update, or delete (in C* all mutations are inserts under the hood, even deletes) is simply appended to the end of the commitlog. Cheers!
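The append-only behaviour described above can be sketched like this (an illustrative toy, not Cassandra's actual storage code; the file path is made up for the example):

```python
# Why commitlog writes are fast: every mutation, including a delete,
# is appended to the end of a log file, so the disk never has to seek
# back to an existing record to modify it in place.

import os
import tempfile

log_path = os.path.join(tempfile.mkdtemp(), "commitlog.bin")  # hypothetical path

def commit(mutation: bytes):
    # append-only: open in "ab" mode, so writes always land at the end
    with open(log_path, "ab") as log:
        log.write(mutation + b"\n")
        log.flush()
        os.fsync(log.fileno())  # make it durable before acking the write

commit(b"INSERT k=v")
commit(b"DELETE k")  # a delete is just another record appended to the log
```

Sequential appends like this are the reason a single spinning disk can absorb far more commitlog traffic than random-access writes, but even appends hit a ceiling when the burst is large enough -- which is the overload described above.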

denis.koblov commented:
"if your app is not sending writes to Cassandra, there would be no reason the mutations would build up on the nodes whichever DC they belong to."

The repair procedure, for example. Or, if I run repair in one DC, won't it affect the org.apache.cassandra.metrics.ThreadPools.PendingTasks.request.MutationStage.value metric?
