Question

pankajgajjar_39995 asked · Erick Ramirez edited

Cassandra is able to write more than 2 billion records per partition. What is the impact on read/write and overall performance?

Cassandra: 3.5

We were able to write more than 2 billion records into a single partition of our table.

Some interesting observations:

Query: SELECT COUNT(*) FROM table does not return a value above 2 billion, even when the table holds more than 2 billion records.

Also, since counting is a costly operation, it took more than 2 hours to get the count result. Counting may not be a good use case for Cassandra, but how can we improve this?


The main question: what is the impact if we write more than 2 billion records into a single table partition?


Do we have to start splitting our data across additional partitions? This is difficult, and we need to validate it properly.



Tags: performance, partition, limit

Erick Ramirez answered · pankajgajjar_39995 commented

@pankajgajjar_39995 Yes, you can write 2B columns to a partition, but it is very impractical to have such a wide partition. What is your use case? It doesn't make sense to have a partition that wide, and you will most likely need to fix your data model. In relation to COUNT, see this blog post I wrote. We recommend you use DSBulk for counting. Cheers!
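For reference, counting with DSBulk looks roughly like this (the keyspace and table names are placeholders, not from this thread):

    dsbulk count -k my_keyspace -t my_table

DSBulk's count workflow splits the token ring into ranges and counts them in parallel, so it is far faster than a single SELECT COUNT(*) that funnels the entire scan through one coordinator.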

4 comments

Erick Ramirez commented:

@pankajgajjar_39995 just a friendly note that you need to post comments under my answer instead of starting a "new answer". :)

pankajgajjar_39995 commented:

Yes, because there is a 400-character limit in the comment box. :)

Erick Ramirez commented:

Yes, reading a partition with 2B rows is definitely problematic because that query will be slow. Will your application really read 2B catalog items and display them to a user? Even if you page through the results, it doesn't sound plausible for your app to display that many results to a user. Cheers!
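(As context, drivers page through a wide partition automatically, but the same effect can be sketched manually in CQL by resuming from the last clustering key seen. The table and values below are hypothetical:)

    -- First page of a hypothetical catalog partition, clustered by item_id:
    SELECT item_id, name FROM catalog
    WHERE catalog_id = 'summer-2020'
    LIMIT 100;

    -- Next page, resuming after the last item_id returned:
    SELECT item_id, name FROM catalog
    WHERE catalog_id = 'summer-2020' AND item_id > 12345
    LIMIT 100;

Even at 100 rows per page, walking a 2B-row partition would take 20 million round trips, which illustrates the point above.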

pankajgajjar_39995 commented:

Of course, there is pagination!

Reads are important, but another concern is writes: our partition has more than 2B rows, and I just learned that a node can crash at any time due to heap memory pressure.

So the simple solution is to split the partition so that no single partition stores that many rows; please correct me if I am wrong. And to do this, we need to make the data model flexible enough to adopt such changes well (a sketch of this bucketing approach follows below).
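A minimal sketch of that splitting idea, often called bucketing, against a hypothetical product table (every name here is illustrative, not from this thread):

    CREATE TABLE product_catalog (
        tenant_id  text,
        bucket     int,      -- e.g. computed as hash(product_id) % 100
        product_id uuid,
        category   text,
        name       text,
        PRIMARY KEY ((tenant_id, bucket), product_id)
    );

    -- Each query targets one bucket; reading everything for a tenant
    -- means fanning out over all 100 buckets:
    SELECT * FROM product_catalog
    WHERE tenant_id = 'acme' AND bucket = 42;

With 100 buckets, a tenant that would otherwise approach 2B rows in one partition tops out at roughly 20M rows per partition, which is still large but far more manageable.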


pankajgajjar_39995 answered

@Erick Ramirez Thanks for your quick feedback. Yes, I understand this is a wide partition and we need to fix the data model.

About the use case: we at ContentServ provide a PIM solution. We have billions of products stored in a Cassandra database and fetch that data from different viewpoints. To generate the different viewpoints, we use the Cassandra feature called materialized views, which lets us query a view and get records faster. However, a view only returns the unique IDs, not the complete information, so we have to fetch the complete info from our master table, which is also stored in Cassandra. We have a partition key there, but it will soon reach Cassandra's limit of 2B columns/rows per partition. To cover this case, we need to know: if we store more than 2B columns/rows in a partition, would it create problems or slow down reads and writes?
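For readers following along, a hedged sketch of what such a materialized view might look like, reusing the hypothetical product_catalog table from the comments above (the names are illustrative, not ContentServ's actual schema):

    CREATE MATERIALIZED VIEW products_by_category AS
        SELECT category, tenant_id, bucket, product_id
        FROM product_catalog
        WHERE category IS NOT NULL
          AND tenant_id IS NOT NULL
          AND bucket IS NOT NULL
          AND product_id IS NOT NULL
        PRIMARY KEY (category, tenant_id, bucket, product_id);

    -- The view carries only key columns, so the full product record
    -- still has to be fetched from the base table by primary key.

Note that the view's own partitions (keyed by category here) can grow just as wide as the base table's if one category dominates, so the same partition-size concern applies to the views.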


We would appreciate any further views on this use case: whether we are heading in the right direction or whether data model changes are needed, as you already mentioned, and whether this is practical and will work well.

