Question

kallmekunal_131749 asked · Erick Ramirez commented

How do I handle ReadTimeoutException when a query spans thousands of partitions?

I have a situation where I need to run queries against a table whose partition key is the hour.

Previously, the solution used an IN query, with possibly thousands of hours listed in the IN clause.

Something like:

SELECT field1,field2,hour FROM tablex WHERE hour IN (hour1, hour2, hour3, ..., hour2000);

The query was timing out at times. I went looking online and found a suggestion to leverage Cassandra's ability to identify the owning node from the partition key's hash, and to split the IN query into individual single-hour queries. So now I have some 2000 async queries, something like:

SELECT field1,field2,hour FROM tablex WHERE hour = hour1;
SELECT field1,field2,hour FROM tablex WHERE hour = hour2;
SELECT field1,field2,hour FROM tablex WHERE hour = hour3;
...
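
Roughly what that fan-out looks like in my code (a simplified sketch; the session handle, the hours list, and the text type for hour are placeholders, not my real schema):

import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.AsyncResultSet;
import com.datastax.oss.driver.api.core.cql.PreparedStatement;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletionStage;

class HourlyFanOut {
    // Issues one async query per hour key and returns the in-flight futures.
    static List<CompletionStage<AsyncResultSet>> queryHours(
            CqlSession session, List<String> hours) {
        // Prepared once; token-aware routing can then send each bound
        // statement straight to a replica that owns that partition.
        PreparedStatement ps = session.prepare(
            "SELECT field1, field2, hour FROM tablex WHERE hour = ?");
        List<CompletionStage<AsyncResultSet>> futures = new ArrayList<>();
        for (String hour : hours) {
            futures.add(session.executeAsync(ps.bind(hour)));
        }
        return futures;
    }
}

Each future completes with just the first page of rows; AsyncResultSet.fetchNextPage() has to be called if a partition spans multiple pages.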

I did that with the async session API, but I am still getting this error:

com.datastax.oss.driver.api.core.servererrors.ReadTimeoutException: Cassandra timeout during read query at consistency LOCAL_QUORUM (2 responses were required but only 1 replica responded)
    at com.datastax.oss.driver.api.core.servererrors.ReadTimeoutException.copy(ReadTimeoutException.java:111)
    at com.datastax.oss.driver.internal.core.util.concurrent.CompletableFutures.getUninterruptibly(CompletableFutures.java:149)
    at com.datastax.oss.driver.internal.core.cql.CqlRequestSyncProcessor.process(CqlRequestSyncProcessor.java:53)
    at com.datastax.oss.driver.internal.core.cql.CqlRequestSyncProcessor.process(CqlRequestSyncProcessor.java:30)
    at com.datastax.oss.driver.internal.core.session.DefaultSession.execute(DefaultSession.java:230)

I am still not fully convinced by this new approach, since I have lots of queries to execute, but presumably it will distribute the queries across multiple nodes.

Requesting experts:

  1. How do I get rid of this read timeout exception? It seems there is some cluster-level read timeout that may be causing it. At the driver level my timeout is 12s (DSE driver version 4.9), set with .withDuration(DefaultDriverOption.REQUEST_TIMEOUT, Duration.ofSeconds(12)) (see the sketch after this list).
  2. Is there a better way to do this?
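
For completeness, here is roughly how that timeout is applied when building the session (a simplified sketch; contact points and other options are omitted):

import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.config.DefaultDriverOption;
import com.datastax.oss.driver.api.core.config.DriverConfigLoader;
import java.time.Duration;

class SessionFactory {
    // Raises the driver-side request timeout to 12s. Note this only changes
    // how long the driver waits; the server-side read timeout on each node
    // still applies independently.
    static CqlSession build() {
        DriverConfigLoader loader = DriverConfigLoader.programmaticBuilder()
            .withDuration(DefaultDriverOption.REQUEST_TIMEOUT, Duration.ofSeconds(12))
            .build();
        return CqlSession.builder()
            .withConfigLoader(loader)
            .build();
    }
}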

Please help

java driver
1 comment

kallmekunal_131749 commented ·


Waiting for some expert advice on this!

Next I am going to try increasing the read timeout in cassandra.yaml...

but it may also be related to GC pauses, which I have yet to analyze. Any further suggestions on GC tuning or other settings?
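
(If it helps anyone: the setting I plan to change is read_request_timeout_in_ms in cassandra.yaml on every node, followed by a rolling restart. The value below is just an illustrative example, not a recommendation:)

read_request_timeout_in_ms: 10000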


1 Answer

Erick Ramirez answered · Erick Ramirez commented

This isn't about "getting rid of the read timeout exception" or increasing the timeout, because those are just symptoms of the underlying issue.

The original issue you had is using the IN() operator with thousands of keys, which indicates that you're doing something other than an OLTP query; perhaps you have an analytics workload.

When you use the IN() operator with thousands of keys, the coordinator gets overloaded: it has to coordinate thousands of individual read requests and wait for every one of them to complete before it can respond to the client.

Only 2-3 keys are recommended when using the IN() operator. As you already found out, splitting the query into individual asynchronous requests is the better way, since the driver will distribute those requests to different coordinators.

In your situation, though, you are still getting this exception despite issuing individual asynchronous requests, because you are overloading your cluster:

ReadTimeoutException: Cassandra timeout during read query at consistency LOCAL_QUORUM (2 responses were required but only 1 replica responded)

It's an indication that you have reached the maximum capacity of your cluster and overloaded it. If this is the case, you need to review the size of your cluster and consider adding nodes to increase capacity to handle the throughput that your application requires. Cheers!
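
As a stop-gap on the client side (not a substitute for adding capacity), you can also cap how many of those 2000 requests are in flight at once, so the excess queues in the driver rather than piling onto the coordinators. A sketch using the 4.x driver's built-in ConcurrencyLimitingRequestThrottler; the limits shown are illustrative, not recommendations:

import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.config.DefaultDriverOption;
import com.datastax.oss.driver.api.core.config.DriverConfigLoader;

class ThrottledSessionFactory {
    // Limits how many requests are in flight at once; excess async queries
    // queue inside the driver instead of overwhelming the cluster.
    static CqlSession build() {
        DriverConfigLoader loader = DriverConfigLoader.programmaticBuilder()
            .withString(DefaultDriverOption.REQUEST_THROTTLER_CLASS,
                "ConcurrencyLimitingRequestThrottler")
            .withInt(DefaultDriverOption.REQUEST_THROTTLER_MAX_CONCURRENT_REQUESTS, 128)
            .withInt(DefaultDriverOption.REQUEST_THROTTLER_MAX_QUEUE_SIZE, 10000)
            .build();
        return CqlSession.builder()
            .withConfigLoader(loader)
            .build();
    }
}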

2 comments

kallmekunal_131749 commented ·

Thanks for the reply, Erick.

But does that mean a 3-node Cassandra cluster on fairly average hardware (I think 8 GB of RAM and 2 CPU cores per node) is only capable of handling approximately 2000 read queries at a time (maybe per second)? If so, that would be a surprise to me!

Apart from increasing the cluster size, is there anything else we could do?

Probably reconfiguring the read timeout could help here, but what other approaches are there? Any suggestions?

Erick Ramirez ♦♦ commented ·

An 8GB system with only 2 vCPUs is too small to do any meaningful work. A lot of professional laptops/desktops have better specs than that.

For production workloads, we recommend running on a system with a minimum of 8 vCPUs (4 physical cores) + 30GB RAM. Cheers!
