question

Dayanand8143 avatar image
Dayanand8143 asked Erick Ramirez answered

Why does count(*) or count(1) time out?

Hi All,

I have two tables which has 167342 rows in table A and 1126736 records in table B.

When Im trying to check the count of rows using the query, i see timedout error in cassandra.

SELECT COUNT(*) FROM A;

As a workaround i am using nodetool tablestats schemaname.A. But the number of rows in this is not matching with the number of loaded rows.

can anybody help me to fix the count(*) issue.

Version of Cassandra: 3.10

Network topology

Exact error message:

ReadTimeout: Error from server: code=1200 [Coordinator node timed out waiting for replica nodes' responses] message="Operation timed out - received only 0 responses." info={'received_responses': 0, 'required_responses': 1, 'consistency': 'ONE'}

Regards

Daya

count
10 |1000

Up to 8 attachments (including images) can be used with a maximum of 1.0 MiB each and 10.0 MiB total.

1 Answer

Erick Ramirez avatar image
Erick Ramirez answered

The COUNT() function performs a full table scan and has to query every single node in the cluster to get a result. Unless you have a small table and running a very small cluster with three nodes at the most, the query will eventually time out.

I've explained the reasons why this is not a good idea in Why COUNT() is bad in Cassandra.

As I stated in that post, use the DataStax Bulk Loader tool (DSBulk) to perform a count if you must. But unless the apps are switched off, the counts are valid for a point in time only because your data is constantly mutating particularly for production loads. Cheers!

Share
10 |1000

Up to 8 attachments (including images) can be used with a maximum of 1.0 MiB each and 10.0 MiB total.