Bringing together the Apache Cassandra experts from the community and DataStax.


question

mishra.anurag643_153409 asked ·

What happens when most of the data is in one single partition?

Suppose I have designed a data model in which one partition key value is associated with most of the data. As I understand it, rows with the same partition key go to the same partition, so in this case a single partition would hold most of the data, which could lead to performance issues. I understand that the partition key choice is not right here, but if this happens, what are the ways to avoid the problem?

data modeling

1 Answer

Erick Ramirez answered ·

When most of the data in your cluster belongs to one partition, the replicas for that partition constantly take the load, regardless of the size of your cluster.

For example, if the keyspace has a replication factor of 3, only three nodes in your cluster hold that partition and serve all of its reads and writes, even when you have 10, 100 or 1000 nodes in your cluster.
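
To make the point above concrete, here is a minimal Python sketch of a toy cluster. It is not Cassandra's actual partitioner or replication strategy: the `token` and `replicas` helpers and the key `customer_42` are hypothetical, and placement is simplified to "hash the key, then take the next nodes in order". It only illustrates that the number of nodes serving a single partition is bounded by the replication factor, not by the cluster size.

```python
import hashlib

def token(key: str) -> int:
    # Toy stand-in for a partitioner: hash the partition key to an integer.
    # (Cassandra's default partitioner is Murmur3; MD5 is used here only
    # because it ships with the standard library.)
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

def replicas(key: str, num_nodes: int, rf: int = 3) -> list[int]:
    # Simplified placement: the first replica is picked by the token,
    # the remaining rf - 1 replicas are the next nodes in order.
    first = token(key) % num_nodes
    return [(first + i) % num_nodes for i in range(rf)]

def nodes_touched(keys: list[str], num_nodes: int, rf: int = 3) -> set[int]:
    # Collect every node that has to serve at least one of the requests.
    touched: set[int] = set()
    for key in keys:
        touched.update(replicas(key, num_nodes, rf))
    return touched

# 10,000 requests that all target the same (hypothetical) partition key.
hot_requests = ["customer_42"] * 10_000

for cluster_size in (10, 100, 1000):
    busy = nodes_touched(hot_requests, cluster_size, rf=3)
    print(f"{cluster_size} nodes in cluster, {len(busy)} nodes doing the work")
```

In this sketch, the count of busy nodes stays at three for every cluster size, which is why adding nodes does not relieve a hot partition.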

To avoid this, you must model your data correctly. There is no other way around it. Cheers!
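
The answer does not name a specific remodeling technique. One commonly used option, shown here only as an assumption and not as the answer's recommendation, is to add a "bucket" component to the partition key so that one logical key is split across several partitions. The sketch below uses the same toy placement model as a real cluster would not: `token`, `replicas`, `bucketed_key`, `NUM_BUCKETS` and the key `customer_42` are all hypothetical names chosen for illustration.

```python
import hashlib

NUM_BUCKETS = 16  # hypothetical bucket count; a real value depends on data volume

def token(key: str) -> int:
    # Toy stand-in for a partitioner (not Cassandra's Murmur3).
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

def replicas(key: str, num_nodes: int, rf: int = 3) -> list[int]:
    # Simplified placement: token picks the first node, then the next rf - 1.
    first = token(key) % num_nodes
    return [(first + i) % num_nodes for i in range(rf)]

def bucketed_key(key: str, row_id: int, num_buckets: int = NUM_BUCKETS) -> str:
    # Derive the bucket deterministically from the row id so that a reader
    # can recompute which partition a given row lives in.
    return f"{key}:{row_id % num_buckets}"

NUM_NODES = 100

# Without bucketing: every row of the hot key lands on the same replicas.
single = set(replicas("customer_42", NUM_NODES))

# With bucketing: rows of the same logical key spread over several partitions.
spread: set[int] = set()
for row_id in range(10_000):
    spread.update(replicas(bucketed_key("customer_42", row_id), NUM_NODES))

print(f"unbucketed key: {len(single)} nodes, bucketed key: {len(spread)} nodes")
```

The trade-off in this design is on the read side: fetching everything for one logical key now means querying every bucket and merging the results, so the bucket count should follow the access pattern rather than being chosen arbitrarily.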
