scano_183208 asked ·

Server timeout during read query at consistency LocalOne (0 replica(s) responded over 1 required)

Hello,

We have been experiencing the following consistency level error on read attempts. It does not happen constantly, and after a few minutes we stop receiving it. I am wondering if I could get some pointers on where to look for a more detailed description of what is actually going on. Here is the error we see:

2020-09-21T15:08:54.8521630Z Server timeout during read query at consistency LocalOne (0 replica(s) responded over 1 required)
    at Cassandra.Data.Linq.CqlQueryBase`1.InternalExecuteWithProfileAsync(String executionProfile, String cqlQuery, Object[] values)
    at Cassandra.Data.Linq.CqlQueryBase`1.ExecuteCqlQueryAsync(String executionProfile)

I have also checked the DSE server logs (system.log and debug.log), but I do not see anything that correlates with this.

Thank you

Simon

david.cao answered ·

You can set up monitoring for this issue so that you can tell which node is timing out. See the post "How to fix a Cassandra timeout issue", which describes exactly the same problem as yours.

It might also be related to poor server performance, such as a slow disk. The key thing is to set up monitoring for the Cassandra cluster. This will save you a lot of troubleshooting time.

Erick Ramirez answered ·

The server-side timeout indicates to me that the queried replica did not respond within read_request_timeout_in_ms.

The most common causes for this are:

  • expensive queries, e.g. full table scans, using IN() clause with tens of partitions, use of ALLOW FILTERING
  • high-delete workload so queries need to scan over thousands of tombstones
  • overloaded nodes, long GC pauses

I would recommend that you pinpoint the queries that are timing out, then correlate the driver log entries with the replica that timed out.
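As a rough starting point for narrowing down when the timeouts occur, here is a minimal Python sketch that parses client-side log lines in the format quoted in the question and counts timeouts per minute. The function names (parse_timeouts, timeouts_per_minute) are hypothetical, and the regular expression assumes the exact timestamp prefix and message wording shown in the question; other driver versions or logging setups may format this differently.

```python
import re
from collections import Counter
from datetime import datetime

# Assumption: each timeout is logged as one line starting with an ISO-style
# timestamp followed by the driver message quoted in the question.
TIMEOUT_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})\S*\s+"
    r"Server timeout during read query at consistency (?P<cl>\w+) "
    r"\((?P<received>\d+) replica\(s\) responded over (?P<required>\d+) required\)"
)


def parse_timeouts(lines):
    """Return one dict per read-timeout line; stack-trace lines are skipped."""
    events = []
    for line in lines:
        match = TIMEOUT_RE.match(line)
        if match is None:
            continue
        events.append({
            # Fractional seconds and the trailing zone marker are dropped.
            "time": datetime.strptime(match.group("ts"), "%Y-%m-%dT%H:%M:%S"),
            "consistency": match.group("cl"),
            "received": int(match.group("received")),
            "required": int(match.group("required")),
        })
    return events


def timeouts_per_minute(events):
    """Count timeouts per minute so that bursts stand out."""
    return Counter(e["time"].strftime("%Y-%m-%d %H:%M") for e in events)
```

The per-minute counts can then be compared against entries logged at the same times in system.log and debug.log on the suspect replica, for example long GC pauses or tombstone warnings.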

Once you have a better idea of the queries and replicas, execute the query locally in cqlsh with TRACING ON. The trace output will give you clues as to why the query is failing. Cheers!

@Erick Ramirez


Thank you for your suggestion. I found the replica and the queries that cause this issue. The only problem is that after enabling TRACING ON in cqlsh and submitting the query that causes the failure, the system does not give me a trace record. It only throws the error, so now I'm back to square one.


-Simon

You need to check the logs on the node for clues on why you can't execute the query. By the way, what is the exact query?
