lavaraja.padala_150810 asked Erick Ramirez answered

Getting WriteTimeoutException using Reaper, "Cassandra timeout during CAS write query at consistency SERIAL"

We are using Cassandra Reaper to run repairs on our Cassandra cluster, with Cassandra as the Reaper storage backend. We are seeing the messages below in the Reaper logs, and the repair is not progressing.

Caused by: com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout during CAS write query at consistency SERIAL (5 replica were required but only 0 acknowledged the write)
    at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:87)
    at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:65)
    at com.datastax.driver.core.Message$ProtocolDecoder.decode(Message.java:297)
    at com.datastax.driver.core.Message$ProtocolDecoder.decode(Message.java:268)
    at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:88)

Our current timeout settings in Cassandra are as follows.

Cassandra Version: Apache Cassandra 3.10

### cassandra yaml settings: ###

read_request_timeout_in_ms: 5000
range_request_timeout_in_ms: 10000
write_request_timeout_in_ms: 2000
counter_write_request_timeout_in_ms: 5000
cas_contention_timeout_in_ms: 1000
truncate_request_timeout_in_ms: 60000
request_timeout_in_ms: 10000
slow_query_log_timeout_in_ms: 500
cross_node_timeout: false

Regards,

Lavaraja.


1 Answer

Erick Ramirez answered

There is too little information in what you provided for a meaningful analysis of the problem, but the error (0 of the 5 required replicas acknowledged the write) indicates that the replicas are unresponsive, so the compare-and-set (CAS) query is failing.
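As a side note on the "5 replica were required" part of the error: a CAS write at consistency SERIAL needs a quorum of all replicas of the partition, counted across every data center. The replication settings of the Reaper keyspace are not given in the question, so the sketch below only works backwards from the number in the error message. The function names are illustrative and not part of any Cassandra or Reaper API.

```python
from typing import List


def quorum(total_replicas: int) -> int:
    """Replicas needed for a quorum: a strict majority of all replicas."""
    return total_replicas // 2 + 1


def totals_matching(required: int, max_total: int = 15) -> List[int]:
    """Total replica counts (summed over all data centers) whose quorum equals `required`."""
    return [n for n in range(1, max_total + 1) if quorum(n) == required]


if __name__ == "__main__":
    # The error message says 5 replicas were required.
    print(totals_matching(5))  # [8, 9]
```

A requirement of 5 therefore points to a total replication factor of 8 or 9 for the keyspace being written to, for example three data centers with a replication factor of 3 each (an assumption, since the question does not say). Every one of those CAS writes has to reach a majority of replicas in all data centers, which makes it sensitive to any overloaded or slow node.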

The most common cause of this is commitlog disks that cannot keep up with the I/O. If your cluster is overloaded, running repairs just adds more load to an already overloaded system.

If you're seeing dropped mutations and/or high GC pauses, these are signs that your cluster cannot keep up with the traffic and you need to review the capacity of your cluster. In such situations, adding more nodes will alleviate the symptoms and allow you to investigate the problem further. Cheers!
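One way to check for dropped mutations is to look at the "Message type / Dropped" section printed by `nodetool tpstats` on each node. The following is a minimal sketch that pulls those counters out of captured output. It assumes the two-column layout used by Cassandra 3.x (other versions may print extra columns), and the `SAMPLE` text with its numbers is made up for illustration, not taken from the cluster in the question.

```python
from typing import Dict


def parse_dropped(tpstats_text: str) -> Dict[str, int]:
    """Return {message_type: dropped_count} from `nodetool tpstats` output text."""
    dropped: Dict[str, int] = {}
    in_section = False
    for line in tpstats_text.splitlines():
        if line.startswith("Message type"):
            # Header of the dropped-messages table; counters follow.
            in_section = True
            continue
        if not in_section:
            continue
        parts = line.split()
        # Expect "<TYPE> <count> ..."; skip blank or unexpected lines.
        if len(parts) < 2 or not parts[1].isdigit():
            continue
        dropped[parts[0]] = int(parts[1])
    return dropped


# Illustrative, shortened output; the values are invented.
SAMPLE = """\
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
MutationStage                     0         0        1234567         0                 0

Message type           Dropped
READ                         0
MUTATION                   412
"""

if __name__ == "__main__":
    counts = parse_dropped(SAMPLE)
    nonzero = {name: n for name, n in counts.items() if n > 0}
    print(nonzero)  # {'MUTATION': 412}
```

A non-zero and growing MUTATION count on several nodes would be consistent with the overload described above. The counters reset when a node restarts, so compare readings taken some time apart rather than relying on a single snapshot.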
