Bringing together the Apache Cassandra experts from the community and DataStax.

marcin_175159 asked ·

Absolute partition size and related implications

Hi there,


In Cassandra 2, the recommendation was not to exceed 100MB per partition. I have been told that this is no longer the case with Cassandra 3, but that it is still recommended to keep partitions small because sync performance suffers if a partition gets too big.


The question is: how big is too big, and how can we measure the impact of a large partition on sync/replication latency as the partition grows?

cassandra partition limit

Cedrick Lunven answered ·

+1 to what @Erick Ramirez just said:

  • Too many rows can slow down your queries because of partition scanning: the recommendation is about 100,000 rows per partition
  • Too much data can lead to streaming issues, depending on your network load and use cases: the recommendation is about 100MB per partition
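
As a rough illustration of the two guidelines above, here is a minimal Python sketch that estimates a partition's size from a row count and an average row size, then compares it against the ~100,000 rows and ~100MB figures. The function name, the constants, and the example numbers are hypothetical, "100MB" is read as 100 MiB, and the estimate ignores storage overhead and compression, so treat the result as a first approximation only.

```python
# Guideline figures quoted in the answer above; rules of thumb, not hard limits.
MAX_ROWS_GUIDELINE = 100_000
MAX_BYTES_GUIDELINE = 100 * 1024 * 1024  # assumption: "100MB" read as 100 MiB


def estimate_partition(rows: int, avg_row_bytes: int) -> dict:
    """Rough estimate: rows * average row size, ignoring overhead and compression."""
    est_bytes = rows * avg_row_bytes
    return {
        "rows": rows,
        "estimated_bytes": est_bytes,
        "over_row_guideline": rows > MAX_ROWS_GUIDELINE,
        "over_size_guideline": est_bytes > MAX_BYTES_GUIDELINE,
    }


if __name__ == "__main__":
    # Hypothetical example: 250,000 rows of about 600 bytes each.
    print(estimate_partition(250_000, 600))
```

A partition flagged here is not necessarily a problem; as noted elsewhere in this thread, the real impact depends on the data model and access patterns and has to be confirmed by testing.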



Thanks @Cedrick Lunven, but how can we measure the impact of partition size on scanning and streaming performance/time? Is there something nodetool can report?


You measure it by monitoring read latencies while running load tests that simulate production-like data and production-like access patterns. Cheers!
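
One way to picture this kind of measurement is a small timing harness that runs the same read many times and summarises the latencies as percentiles, so that runs against small and large partitions can be compared. The Python sketch below is only an illustration under stated assumptions: `time_reads`, `summarize`, and `fake_read` are hypothetical names, and `fake_read` is a stand-in that would need to be replaced by a real query issued through your driver against production-like data.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_reads(read_fn: Callable[[], object], iterations: int) -> List[float]:
    """Call read_fn repeatedly and return per-call latencies in milliseconds."""
    latencies = []
    for _ in range(iterations):
        start = time.perf_counter()
        read_fn()
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def summarize(latencies_ms: List[float]) -> Dict[str, float]:
    """Summarise latencies as median, 95th and 99th percentiles, and maximum."""
    # quantiles(n=100) returns 99 cut points; index 94 is p95, index 98 is p99.
    cuts = statistics.quantiles(latencies_ms, n=100, method="inclusive")
    return {
        "p50": statistics.median(latencies_ms),
        "p95": cuts[94],
        "p99": cuts[98],
        "max": max(latencies_ms),
    }


def fake_read() -> None:
    """Hypothetical stand-in for a real partition read issued through a driver."""
    time.sleep(0.001)


if __name__ == "__main__":
    print(summarize(time_reads(fake_read, 50)))
```

Running the harness once per partition size and comparing the p95/p99 values gives a simple view of how read latency changes as partitions grow. A single-threaded loop like this does not replace a proper load test with realistic concurrency.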

Erick Ramirez answered ·

@marcin_175159 there are no absolutes. The same 100MB partition size recommendation still applies, but you can have partitions larger than that -- it all depends on your use case, data model, access patterns, etc.

The only way you can determine the impact of large partitions is by performing exhaustive testing with production-like workloads. Cheers!


Hi @Erick Ramirez, thanks for that, but is there some way to measure how sync timings are affected as partition size increases? Can nodetool handle it?

You measure the performance of your cluster by performing read/write load tests on production-like data. Not sure what you mean by "can nodetool handle it?".
