question

igor.rmarinho_185445 avatar image
igor.rmarinho_185445 asked smadhavan edited

Why does COUNT() return different results?

Hi,

So I'm logging in the node machine and performing a row count, sometimes I got 906 rows and others 915.

I already did a repair -pr, tried repair --full, disable nodeSync and try to repair it but the issue persists. how can I fix it?

DSE 6.8.3

select count(*) from  keyspace.table

 count
-------
   915

select count(*) from  keyspace.table
 count
-------
   906
count
10 |1000

Up to 8 attachments (including images) can be used with a maximum of 1.0 MiB each and 10.0 MiB total.

1 Answer

Erick Ramirez avatar image
Erick Ramirez answered smadhavan edited

You are running a query which does a full table scan. Unless you run the COUNT() on:

  1. a single node, and
  2. have no writes hitting the cluster,

you will continue to get inconsistent results. I've explained the reasons why this is not a good measure in Why COUNT() is bad in Cassandra.

As I stated in that post, use the DataStax Bulk Loader tool (DSBulk) to perform a count if you must. But unless the apps are switched off, the counts are valid for a point in time only because your data is constantly mutating particularly for production loads. Cheers!

6 comments Share
10 |1000

Up to 8 attachments (including images) can be used with a maximum of 1.0 MiB each and 10.0 MiB total.

igor.rmarinho_185445 avatar image igor.rmarinho_185445 commented ·

Hi Erick.

I'm running it in a 3 nodes cluster, and it have no writes hitting the tables I'm performing Count();

I'm testing a manual backup/restore, and I getting inconsistencies when I restore it.

All keyspaces use RF:3 so they supposed to have the same number of rows, no?

0 Likes 0 ·
igor.rmarinho_185445 avatar image igor.rmarinho_185445 commented ·

Even using Dsbulk I have different row count in between the nodes.


For example:

node 1

total | failed | rows/s | p50ms | p99ms | p999ms
  929 |      0 |  1,741 |  8.93 | 33.42 |  33.42


total | failed | rows/s | p50ms | p99ms | p999ms
8,927 |      0 | 12,382 | 18.89 | 36.96 |  36.96

node 2


total | failed | rows/s | p50ms | p99ms | p999ms
  930 |      0 |  1,580 | 10.28 | 31.98 |  31.98


total | failed | rows/s | p50ms | p99ms | p999ms
8,924 |      0 | 11,205 | 24.95 | 54.26 |  54.26
Operation COUNT_20201016-175215-908537 completed successfully in .


0 Likes 0 ·
Erick Ramirez avatar image Erick Ramirez ♦♦ igor.rmarinho_185445 commented ·

If you run a single-partition query (not COUNT(), not unbounded) in cqlsh with (1) consistency set to ALL and (2) tracing enabled, you will see read repairs getting triggered which would confirm that data among your nodes are inconsistent.

I realise you already said you ran a repair but I suspect there was an issue in the way you ran it. You should run this command one node at a time for it to be effective:

$ nodetool repair -pr -- ks_name table_name

Cheers!

1 Like 1 ·
igor.rmarinho_185445 avatar image igor.rmarinho_185445 Erick Ramirez ♦♦ commented ·

I ran nodetool repair -pr and after nodetool repair -pr - ks_name table_name but it did'nt change.

I'll try the select with consistency all.

0 Likes 0 ·
Show more comments