Bringing together the Apache Cassandra experts from the community and DataStax.

question

Valuser asked Valuser commented

Why is the number of keys an estimate in the nodetool tablestats output?

Hi,

In the nodetool tablestats output, I can see 'Number of keys (estimate)'. I understand it is the same as the number of partitions (correct me if I am wrong), but why is it an estimate? Say we create a table with a primary key of (states, date), with states as the partition key. There would then be a total of 50 partitions (one per US state), with the rows under each of those partitions spanning multiple nodes. So where does the 'estimate' in the tablestats 'Number of keys' come from?

I am also in a scenario where I need to calculate the count of rows (an estimate is enough) for a big table. I cannot use dsbulk count at the moment.

Is there any other fast way to get just an estimated row count, perhaps using another tool?

cassandra

1 Answer

Erick Ramirez answered Valuser commented

The reason is quite simple -- data in a Cassandra cluster changes at high velocity, and partitions can be partially changed too, which makes it difficult to reconcile the count without doing a read-before-write.

Let me illustrate with some examples using this table:

CREATE TABLE community.users (
    name text PRIMARY KEY,
    address text,
    email text,
    mobile text
);

EXAMPLE 1 - INSERT over-count

If I add a new user to the table:

cqlsh:community> INSERT INTO users (name, address) VALUES ('alice', '1 Main Rd');

I end up with 1 partition in the table as expected:

cqlsh:community> SELECT * FROM users;

 name  | address   | email | mobile
-------+-----------+-------+--------
 alice | 1 Main Rd |  null |   null

$ nodetool tablestats community.users | grep partitions
Number of partitions (estimate): 1

But when I add Alice's email:

cqlsh:community> INSERT INTO users (name, email) VALUES ('alice', 'alice@mail.com');

there should still be the 1 partition in the table:

cqlsh:community> SELECT * FROM users ;

 name  | address   | email          | mobile
-------+-----------+----------------+--------
 alice | 1 Main Rd | alice@mail.com |   null

But the table metrics show an estimate of 2 partitions:

$ nodetool tablestats community.users | grep partitions
Number of partitions (estimate): 2

This is because Cassandra doesn't "know" that the name = 'alice' partition already exists -- C* does not do a read-before-write. Therefore it estimates that there are 2 partitions based on the 2 INSERT statements.
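A rough way to picture this bookkeeping: every write bumps the estimate because there is no check for an existing key, and duplicates only collapse when fragments get merged. This is a toy Python model for illustration only, not Cassandra's actual implementation (the class and method names are made up):

```python
# Toy model of why the key estimate over-counts: each write increments
# the estimate (no read-before-write), and duplicates are only
# reconciled when fragments are merged at flush time.
class TableStats:
    def __init__(self):
        self.writes = []      # pending (unmerged) write fragments
        self.estimate = 0     # "Number of keys (estimate)"

    def write(self, key, **cols):
        self.writes.append((key, cols))
        self.estimate += 1    # always +1, even for an existing key

    def flush(self):
        # Merging collapses duplicate keys, reconciling the estimate.
        self.estimate = len({key for key, _ in self.writes})

stats = TableStats()
stats.write('alice', address='1 Main Rd')
stats.write('alice', email='alice@mail.com')  # same partition, but...
print(stats.estimate)   # 2 -- over-counted, as in the example above
stats.flush()
print(stats.estimate)   # 1 -- correct after the merge
```

The two INSERTs into the 'alice' partition produce an estimate of 2 until the flush merges them, mirroring the cqlsh output above.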

EXAMPLE 2 - Updates

But what about updates? If we're updating an existing partition, then it shouldn't be counted as a new partition, right?

Updates in C* are inserts under the hood. And since C* doesn't do a read-before-write, it doesn't know whether the partition already exists, so it counts the update as a new partition:

cqlsh:community> UPDATE users SET mobile = '61-456-789-123' WHERE name = 'bob';

And we end up with an estimate of 3 partitions:

$ nodetool tablestats community.users | grep partitions
Number of partitions (estimate): 3

when we know there are only 2 partitions in the table:

cqlsh:community> SELECT * FROM users ;

 name  | address   | email          | mobile
-------+-----------+----------------+----------------
   bob |      null |           null | 61-456-789-123
 alice | 1 Main Rd | alice@mail.com |           null

But if we flush the memtables to disk:

$ nodetool flush

the act of flushing coalesces the data in the memtable with what's already on disk. All the fragments of the Alice partition get merged and the table count is now correct:

$ nodetool tablestats community.users | grep partitions
Number of partitions (estimate): 2

Conclusion

In a production environment, mutations (inserts, updates, deletes) take place at high velocity, so the partition counts on a node are constantly being updated and reconciled.

Depending on when you look at the table metrics, the number of partitions (keys) will change. This is the reason it is labelled as an estimate.

In response to your other question, the DataStax Bulk Loader (DSBulk) is the correct tool to perform a count. But even then, it is a count at a point-in-time and only true at the time that you performed the count because in production environments, data is changing constantly. I've explained this in detail in Why COUNT() is bad in Cassandra. Cheers!
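If DSBulk is off the table for now, one cheap (and very rough) trick is to scrape the partition estimate out of nodetool tablestats on each node, sum the values, and divide by the replication factor. Note this yields an estimated partition count, not a row count, so it only approximates rows when partitions hold a single row. A hedged Python sketch -- the sample text mirrors the output format shown above, and the node count and RF=3 are assumptions you would replace with your own:

```python
import re

# Hypothetical helper: pull "Number of partitions (estimate)" out of
# `nodetool tablestats <keyspace>.<table>` output.
def parse_partition_estimate(tablestats_output: str) -> int:
    match = re.search(r"Number of partitions \(estimate\):\s*(\d+)",
                      tablestats_output)
    if match is None:
        raise ValueError("estimate line not found in tablestats output")
    return int(match.group(1))

# Sample fragment in the same format as the answer above.
sample = """\
Table: users
Number of partitions (estimate): 2
"""

# Assumption: you ran tablestats on each of 3 nodes with RF=3.
per_node = [parse_partition_estimate(sample)] * 3
REPLICATION_FACTOR = 3
print(sum(per_node) // REPLICATION_FACTOR)  # rough cluster-wide estimate: 2
```

Since each replica counts its copy of a partition, dividing the cluster-wide sum by the replication factor avoids counting the same partition several times.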

2 comments

In your explanation, when you added the email for the 'alice' partition key, there will be 2 rows in the table for the same partition key, right? Because after this addition the total number of keys estimate is 2.

So eventually, is 'Number of partitions (estimate)' = the total number of rows (estimate) in a table?

I had a test table with 20k records, and 'Number of keys (estimate)' was 3. I believe this 3 comes from the partition key.

I think I am missing something. Please update me.



@Erick Ramirez, could you explain the difference between total partitions and the total number of rows in a table, based on the example context?
