Bringing together the Apache Cassandra experts from the community and DataStax.


NiD asked ·

How do I aggregate events from lots of devices (lots of partitions)?

How do I aggregate a Cassandra (3.11) table that has too many partitions?

I need a table to store events from devices. To keep the partition size below 100 MB, I used a combination of device_id and hour as the partition key, so one partition holds at most about 3,600 events (one event per second). But if I need to aggregate all the events that arrived in the last hour, I have to hit many partitions, depending on how many devices were active in that hour.

Is Cassandra a good option in a case like this, where the data won't fit in one partition?

Or should I distribute the data across many partitions and later read it back from all of those partitions in a single query?

Can I design the table some other way? How can I improve the design?

If I use Spark to read multiple partitions, how will it perform?

Tags: spark · data modelling · spark-cassandra-connector

1 Answer

Erick Ramirez answered ·

Partitioning by hour, with about 3,600 events per partition, is just about right. You are also correct that Spark is a good fit for doing the aggregation.
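As a concrete sketch, a table along the lines you describe might look like this (the table and column names are illustrative, assuming the hour bucket is stored as a text column inside a composite partition key):

```cql
CREATE TABLE events_by_device_hour (
    device_id   text,
    hour        text,        -- hour bucket, e.g. '2020-05-04-09'
    event_time  timestamp,
    payload     text,
    PRIMARY KEY ((device_id, hour), event_time)
) WITH CLUSTERING ORDER BY (event_time DESC);
```

With one event per second, each `(device_id, hour)` partition holds at most ~3,600 rows, which keeps it comfortably under the 100 MB guideline.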

We recommend running two data centres: one DC for the OLTP workload from the application and another DC for the analytics workload. Isolating the workloads in separate DCs ensures that the Spark jobs don't affect the app traffic. Cheers!
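To make the read fan-out concrete, here is a minimal sketch in plain Python (the helper names and the `YYYY-MM-DD-HH` bucket format are hypothetical, matching the illustrative schema above): reading the last hour means querying exactly one partition per active device, so the fan-out grows linearly with the device count — which is the access pattern Spark parallelises well.

```python
from datetime import datetime, timedelta, timezone

def hour_bucket(ts: datetime) -> str:
    # Encode a timestamp as the hour-granularity bucket used in the partition key.
    return ts.strftime("%Y-%m-%d-%H")

def partition_keys_for_last_hour(device_ids, now=None):
    # Enumerate the (device_id, hour) partition keys a reader must hit
    # to cover the previous full hour of events.
    now = now or datetime.now(timezone.utc)
    bucket = hour_bucket(now - timedelta(hours=1))
    return [(device_id, bucket) for device_id in device_ids]

keys = partition_keys_for_last_hour(
    ["dev-1", "dev-2", "dev-3"],
    now=datetime(2020, 5, 4, 10, 30, tzinfo=timezone.utc),
)
# One partition key per active device for the 09:00 hour bucket.
```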
