penky28_147901 asked

Observed timeouts after upgrading from Cassandra 3.11.3 to 3.11.13

Hi,

We recently upgraded our Cassandra servers from 3.11.3 to 3.11.13. After the upgrade, we started seeing timeouts in debug.log and in the application server logs.

DEBUG [Native-Transport-Requests-21] 2022-05-24 04:30:52,151 ReadCallback.java:133 - Timed out; received 1 of 2 responses (including data)

This was not the case when we were on Cassandra 3.11.3.

We ran the same set of jobs on 3.11.3 and they succeeded, with no timeouts in debug.log or in the application server logs.

One thing I noticed in tpstats is that the "Completed" value for Native-Transport-Requests (NTR) is lower on 3.11.13 than on 3.11.3.

3.11.3 - Native-Transport-Requests 0 0 21079745 0 0

3.11.13 - Native-Transport-Requests 0 0 12364724 0 0

On further reading, I see that 3.11.5 changed how NTR requests are handled, under this JIRA:

Prevent client requests from blocking on executor task queue (CASSANDRA-15013)

Why are client requests being blocked or left unprocessed because of this change in Cassandra 3.11.5? How do I tune the server, and with which parameters, so that it accepts and processes the requests coming from clients?

Thanks for your help.

Erick Ramirez answered

What the log entry indicates is that the coordinator was waiting on responses from two replicas for a read request (of 2 responses) but only one replica responded in time (received 1). The most common cause of this is nodes becoming unresponsive when the cluster is overloaded.

I'm not sure what your "jobs" do, but my best guess is that they are overloading the cluster, which leads to the timeouts. You'll need to review the logs for clues.
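
As a starting point for that log review, here is a minimal sketch that tallies read-timeout entries by how many responses arrived versus how many were required. It only assumes the message format of the single debug.log line quoted in the question; the class name, the method name and the second sample line are hypothetical, made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReadTimeoutTally {
    // Message format copied from the debug.log entry quoted in the question.
    private static final Pattern TIMED_OUT = Pattern.compile(
            "ReadCallback\\.java:\\d+ - Timed out; received (\\d+) of (\\d+) responses");

    /** Counts read-timeout entries, keyed by "received/required". */
    static Map<String, Integer> tally(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            Matcher m = TIMED_OUT.matcher(line);
            if (m.find()) {
                counts.merge(m.group(1) + "/" + m.group(2), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                // Real entry from the question.
                "DEBUG [Native-Transport-Requests-21] 2022-05-24 04:30:52,151 "
                        + "ReadCallback.java:133 - Timed out; received 1 of 2 responses (including data)",
                // Hypothetical unrelated entry, to show that non-matching lines are skipped.
                "DEBUG [ExampleThread-1] 2022-05-24 04:30:53,000 Example.java:1 - unrelated entry");
        System.out.println(tally(sample)); // prints {1/2=1}
    }
}
```

To run it against a real log, the sample list could be replaced with the lines read from your own debug.log. Grouping the matches by timestamp alongside the counts would help show whether the timeouts line up with the periods when the jobs run.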

Also, it looks like you misunderstood the change in C* 3.11.5. CASSANDRA-15013 is supposed to prevent requests from getting blocked. Cheers!

steve.lacerda answered

Not sure if this is your issue, but I did notice that after the changes in 3.11.5 we had to increase the number of connections, and that resolved our timeout issues. Refer to the Java driver documentation, or whatever is relevant for your environment:

https://docs.datastax.com/en/developer/java-driver/3.11/manual/pooling/

Specifically, these parameters:

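// Java driver 3.x: PoolingOptions and HostDistance are in com.datastax.driver.core.
// Example values; tune them for your workload.
PoolingOptions poolingOptions = new PoolingOptions();
poolingOptions
    .setCoreConnectionsPerHost(HostDistance.LOCAL,  4)
    .setMaxConnectionsPerHost( HostDistance.LOCAL, 10)
    .setCoreConnectionsPerHost(HostDistance.REMOTE, 2)
    .setMaxConnectionsPerHost( HostDistance.REMOTE, 4);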