question

pwozniak_45657 asked ·

How do I calculate the maximum number of rows in Cassandra 2.x vs 3.x?

Hi, I would like to calculate the maximum number of rows for a table that I have in Cassandra v2.

I found the formula (and calculator) in multiple places:

https://www.datastax.com/resources/ebook/oreilly-cassandra-definitive-guide

https://cql-calculator.herokuapp.com/

https://medium.com/@johnny_width/calculating-cassandra-partition-disk-size-43276d1dcb34

But it looks like each place describes the calculations for Cassandra v3. Can I use the same formula for Cassandra v2? If not, what is the formula for Cassandra v2, and is it described anywhere?

Regards

storage

1 Answer

Artem Chebotko answered ·

Hi @pwozniak_45657, thanks for your question.

I am the original author of the formulas. It is important to understand that the formulas can give you estimates of the number of values in a partition and partition size on disk. I intended them to be used for data modeling purposes to detect large partitions. They are not to be used for storage planning.

TIP 1: For storage planning, consider loading a representative dataset and running nodetool flush followed by nodetool tablestats to get the actual size of your data on disk. That figure includes metadata size, column name size, partition key size, clustering key size, overhead, compression, etc.
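As a rough illustration of TIP 1, the two nodetool steps could be scripted along these lines. This is only a sketch: the function names and the keyspace/table names are hypothetical, it assumes nodetool is on the PATH of a node that holds the data, and on older 2.x releases the stats command may be called cfstats instead of tablestats, so check your version.

```python
import subprocess
from typing import List


def nodetool_commands(keyspace: str, table: str) -> List[List[str]]:
    """Build the two commands from TIP 1: flush memtables, then read table stats."""
    return [
        ["nodetool", "flush", keyspace, table],
        # Assumption: older 2.x releases may need "cfstats" here instead.
        ["nodetool", "tablestats", f"{keyspace}.{table}"],
    ]


def measure_table(keyspace: str, table: str) -> str:
    """Run the commands in order and return the raw output of the last one."""
    output = ""
    for cmd in nodetool_commands(keyspace, table):
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
        output = result.stdout
    return output


if __name__ == "__main__":
    # Hypothetical keyspace and table names; replace with your own.
    print(measure_table("my_keyspace", "my_table"))
```

The stats output is printed as-is; the on-disk size is typically reported on the "Space used" lines.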

TIP 2: While working on numerous data modeling problems, I found myself rarely using the formulas. In most cases, it is sufficient to estimate a partition size by multiplying the number of rows in a partition by the worst-case row size estimate. The number of rows is usually easy to estimate. It is generally good if it is << 100,000. Estimating the row size is usually straightforward, too, unless you have dozens of columns, store paragraphs of text, or store large BLOB values. It is a good sign when the resulting partition size is << 100 MB.
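The back-of-the-envelope estimate from TIP 2 can be written down in a few lines. This is a minimal sketch, not an official tool: the function names and the example numbers are made up for illustration, and the two limits are simply the 100,000 rows and 100 MB figures mentioned above, treated here as hard ceilings even though the advice is to stay well below them.

```python
# Ceilings taken from TIP 2; the advice is to be *much* lower than these.
MAX_ROWS_PER_PARTITION = 100_000
MAX_PARTITION_BYTES = 100 * 1000 * 1000  # 100 MB


def estimate_partition_size(rows_per_partition: int, worst_case_row_bytes: int) -> int:
    """Partition size estimate: number of rows times worst-case row size."""
    return rows_per_partition * worst_case_row_bytes


def within_guidelines(rows_per_partition: int, worst_case_row_bytes: int) -> bool:
    """True when both the row count and the estimated size are under the ceilings."""
    size = estimate_partition_size(rows_per_partition, worst_case_row_bytes)
    return rows_per_partition < MAX_ROWS_PER_PARTITION and size < MAX_PARTITION_BYTES


if __name__ == "__main__":
    # Hypothetical table: 50,000 rows per partition, about 200 bytes per row.
    rows, row_bytes = 50_000, 200
    print(estimate_partition_size(rows, row_bytes), "bytes")
    print("within guidelines:", within_guidelines(rows, row_bytes))
```

With the hypothetical numbers above, the estimate is 10,000,000 bytes (10 MB), which sits comfortably under both ceilings.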


The formulas for Cassandra 2

Number of values in a partition:

1616118084658.png

Partition size on disk:
1616118157004.png


The formulas for Cassandra 3

Number of values in a partition:

1616118717289.png


Partition size on disk:

1616118764199.png

