cassandrauser asked

How do I fix slow queries reported in the logs as "slow timeout 500 msec/cross-node"?

I'm seeing a few queries that take more than 500 milliseconds. They are logged with the message:

slow timeout 500 msec/cross-node

These queries are issued from the Java driver, and the high latency affects less than 1% of the traffic for the same query.

The same query takes a few milliseconds 99% of the time; I validated this by running experiments over multiple days. I see no bottleneck in CPU, memory, or I/O.

How do I fix these slow queries?

java driver
steve.lacerda answered

Slow queries at p99 and higher latencies typically indicate data model issues that affect GC performance. Check the logs for long GC pauses, or for a large number of consecutive GCs.

To resolve this kind of issue, look for:

1) High tombstone counts

2) Tables with blobs eating up memory

3) Large partitions

4) Large collections
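
As a rough way to do the GC log check mentioned above, here is a minimal Java sketch that pulls GC pause durations out of a node's log. It assumes the GCInspector lines look roughly like "... GCInspector.java:284 - G1 Young Generation GC in 236ms. ..."; that line shape, the class name GcPauseScan, and the 200 ms default threshold are assumptions for illustration, so adjust the pattern to whatever your own logs contain.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.OptionalLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GcPauseScan {
    // Assumed line shape: "... GCInspector.java:284 - <collector name> GC in 236ms. ..."
    private static final Pattern GC_LINE =
            Pattern.compile("GCInspector.*?\\bGC in (\\d+)ms");

    // Returns the pause in milliseconds if the line is a GCInspector entry.
    public static OptionalLong pauseMillis(String line) {
        Matcher m = GC_LINE.matcher(line);
        return m.find()
                ? OptionalLong.of(Long.parseLong(m.group(1)))
                : OptionalLong.empty();
    }

    // Keeps only the GC lines whose pause is at least thresholdMillis.
    public static List<String> longPauses(List<String> lines, long thresholdMillis) {
        List<String> hits = new ArrayList<>();
        for (String line : lines) {
            OptionalLong pause = pauseMillis(line);
            if (pause.isPresent() && pause.getAsLong() >= thresholdMillis) {
                hits.add(line);
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: java GcPauseScan <log-file> [thresholdMillis]");
            return;
        }
        // Hypothetical default threshold; pick one that suits your latency target.
        long threshold = args.length > 1 ? Long.parseLong(args[1]) : 200L;
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        longPauses(lines, threshold).forEach(System.out::println);
    }
}
```

If the long pauses line up with the times the slow queries were logged, that points towards the data model issues listed above rather than the query itself.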

Erick Ramirez answered

To add to Steve's answer, the first thing I would do is run the query in cqlsh with TRACING ON. This lets you quickly identify whether the query itself is problematic or whether there is some other issue in your cluster, e.g. a slow node.

If tracing shows that the query isn't problematic, then the most likely cause is that your cluster was overloaded at the time the slow query messages were logged, not at the time you traced the query.

Unfortunately, when nodes become overloaded, most queries get logged as slow because they exceed the slow query threshold. Make sure you don't overload your cluster, and size it correctly by adding more nodes as appropriate. Cheers!
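
One way to check whether the slow queries cluster around periods of overload is to count them per minute from the log. The Java sketch below assumes the slow query report starts with a header line such as "DEBUG [ScheduledTasks:1] 2019-05-08 10:15:02,345 MonitoringTask.java:173 - 12 operations were slow in the last 5000 msecs:"; that line shape and the class name SlowQueryBuckets are assumptions for illustration, so verify the pattern against your own logs first.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SlowQueryBuckets {
    // Group 1: timestamp truncated to the minute. Group 2: reported slow operation count.
    // Assumed header shape: "<yyyy-MM-dd HH:mm:ss,SSS> ... - N operations were slow in the last ..."
    private static final Pattern HEADER = Pattern.compile(
            "(\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}):\\d{2}.*?(\\d+) operations were slow");

    // Sums the reported slow operations per minute, sorted by time.
    public static Map<String, Integer> perMinute(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            Matcher m = HEADER.matcher(line);
            if (m.find()) {
                counts.merge(m.group(1), Integer.parseInt(m.group(2)), Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: java SlowQueryBuckets <log-file>");
            return;
        }
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        perMinute(lines).forEach((minute, n) -> System.out.println(minute + "  " + n));
    }
}
```

If the counts spike in a few minutes and are near zero the rest of the time, compare those minutes against your traffic and compaction activity; a steady trickle is more likely to point at specific partitions or queries.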
