scherian_188962 asked

Which compaction strategy should I use for a table that tracks user activity which requires a read-before-write?

Sample table:

CREATE TABLE user_activity (
    user_id int,
    activity_id uuid,
    activity_timestamp timestamp,
    score int,
    PRIMARY KEY (user_id, activity_timestamp)
) WITH CLUSTERING ORDER BY (activity_timestamp DESC);

Whenever a user performs a new activity, their previous activities are read first, a score is calculated, and the new activity is then persisted along with the calculated score.

For example, when a new user USER1 performs their first activity, the SELECT issued before persisting returns zero records. When they perform their second activity, the previous record is read, a score of 1 is given, and the second activity is persisted with a score value of 1.

A TTL is also set on every insert. A small share of the overall records (about 30%) will be updated, exactly once each, and the interval between an insert and its update is between 10 minutes and 1 hour.
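To make the access pattern concrete, the flow described above could look roughly like the following CQL. This is only a sketch: the user id, the timestamps, the 86400-second TTL, the score values, and the choice to put a TTL on the update are illustrative assumptions, not values from the actual application.

```cql
-- 1. Read-before-write: fetch the user's previous activities
--    (returns zero rows for a brand-new user).
SELECT activity_timestamp, activity_id, score
FROM user_activity
WHERE user_id = 1;

-- 2. Persist the new activity with the score computed by the application.
--    The TTL value (1 day) is a placeholder.
INSERT INTO user_activity (user_id, activity_timestamp, activity_id, score)
VALUES (1, '2020-05-01 10:00:00+0000', uuid(), 1)
USING TTL 86400;

-- 3. One-time update, 10 minutes to 1 hour later, for roughly 30% of rows.
--    Assumption: the update carries its own TTL. Without USING TTL, the
--    updated cell falls back to the table's default_time_to_live, if any.
UPDATE user_activity USING TTL 86400
SET score = 2
WHERE user_id = 1
  AND activity_timestamp = '2020-05-01 10:00:00+0000';
```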

In this scenario, which compaction strategy is preferred?

Version: Apache Cassandra 3.11.4

@scherian_188962, can you also update your original post with the version of Cassandra/DSE that you're running? Cheers!

I have updated the post.

@scherian_188962, I don't think I'm seeing the version of C*/DSE being updated in the original post. Where did you update that info?

1 Answer

smadhavan answered

@scherian_188962, assuming that you won't be upserting (i.e. editing/updating) the data once it has been written to the table with a TTL, TimeWindowCompactionStrategy (TWCS) is best suited for this use case. As always, you'll have to test this with a production-like load in a lower environment to gauge which strategy works better for your workloads. For further reading, you can refer to the following resources on choosing a compaction strategy:
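As a rough illustration of what switching the sample table to TWCS could look like, here is a minimal sketch. The window unit and size (1 hour) and the default TTL (1 day) are placeholder assumptions. They should be chosen to match the actual TTL and write pattern and validated under a production-like load.

```cql
-- Hypothetical settings: 1-hour windows with a 1-day default TTL.
-- Adjust both so that the TTL spans a manageable number of windows.
ALTER TABLE user_activity
WITH compaction = {
    'class': 'TimeWindowCompactionStrategy',
    'compaction_window_unit': 'HOURS',
    'compaction_window_size': '1'
}
AND default_time_to_live = 86400;
```

With TWCS, SSTables are grouped by the time window in which their data was written, so that a whole SSTable can be dropped once everything in it has expired.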

@scherian_188962, you could also "Accept" this answer if you're done. Thanks!

@smadhavan, I assume your answer remains the same after the post was updated with the version and the data upsert details?

@scherian_188962, yes that's correct!
