samuel_191829 asked:

Is SizeTieredCompactionStrategy better than LeveledCompactionStrategy for a write-once with never-update access pattern?

Hello.

I have a question regarding compaction strategy.

Let's say I have a workload where data is inserted once, or upserted (as a batch of inserts for a given partition), but never updated afterwards (no column updates).

I'm trying to figure out whether SizeTieredCompactionStrategy (STCS) is a better choice than LeveledCompactionStrategy (LCS).

My concern is that Size Tiered Compaction does not group data by rows, which matters if I want to fetch an entire partition (it seems the rows are spread over many SSTables).

On the other hand, a partition will not be updated or given new columns after the initial write (a batch), so with STCS it may end up in only a single SSTable.

Thanks

compaction

1 Answer

Erick Ramirez answered:

If you're only ever inserting the whole partition once and never updating it, in whole or in part, then the compaction strategy makes no difference, and sticking with the default SizeTieredCompactionStrategy is the right choice.

These two assumptions are incorrect:

"Because Size Tiered Compaction does not group data by rows"

"(it seems the rows are spread over many SSTables)"

STCS does coalesce fragments of a partition into a single SSTable when it can. The difference is that LCS does this much more aggressively, which requires significant disk I/O for write-heavy workloads.
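To illustrate what "coalescing fragments" means, here is a minimal toy sketch in Python. It is not Cassandra code: SSTables are modeled as plain dictionaries, and the names `merge_sstables` and `sstables_containing` are hypothetical helpers made up for this example. Real compaction reconciles cells by write timestamp and picks candidates by size bucket, both of which this sketch ignores.

```python
from collections import defaultdict


def merge_sstables(sstables):
    """Merge several toy SSTables into one, coalescing rows per partition.

    Simplification: later SSTables in the list overwrite earlier ones.
    Cassandra itself reconciles by write timestamp, not list order.
    """
    merged = defaultdict(dict)
    for sstable in sstables:
        for partition_key, rows in sstable.items():
            merged[partition_key].update(rows)
    return dict(merged)


def sstables_containing(partition_key, sstables):
    """Count how many SSTables hold a fragment of the given partition."""
    return sum(1 for sstable in sstables if partition_key in sstable)


# Four flushed SSTables; partition "p1" has one fragment in each of them.
sstables = [
    {"p1": {"row1": "a"}, "p2": {"row1": "x"}},
    {"p1": {"row2": "b"}},
    {"p1": {"row3": "c"}, "p3": {"row1": "y"}},
    {"p1": {"row4": "d"}},
]

before = sstables_containing("p1", sstables)
compacted = [merge_sstables(sstables)]
after = sstables_containing("p1", compacted)
print(before, after)
```

In this model a read of "p1" has to consult four SSTables before the merge and only one afterwards, which is the effect a compaction has on a fragmented partition regardless of the strategy that triggered it.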

In your case where the whole partition is written all at once with no updates, there is no chance for the partition or its rows to be fragmented across multiple SSTables because it will only ever get written to one SSTable when the mutation is flushed to disk. There is no reason for Cassandra to split the data that is already stored together in memtables into separate SSTables. Cheers!
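The reasoning above can be sketched with a small toy model in Python. This is only an illustration under simplifying assumptions: `ToyTable` and its methods are hypothetical, a memtable is a plain dictionary, and a flush simply turns the current memtable into one immutable SSTable.

```python
class ToyTable:
    """Toy model of a table: one in-memory memtable plus flushed SSTables."""

    def __init__(self):
        self.memtable = {}
        self.sstables = []

    def write(self, partition_key, rows):
        # Writes to the same partition are kept together in the memtable.
        self.memtable.setdefault(partition_key, {}).update(rows)

    def flush(self):
        # A flush writes the whole memtable out as a single SSTable.
        if self.memtable:
            self.sstables.append(self.memtable)
            self.memtable = {}

    def fragments(self, partition_key):
        # Number of SSTables that hold part of this partition.
        return sum(1 for sstable in self.sstables if partition_key in sstable)


table = ToyTable()

# Write-once partition: all rows arrive together, before the first flush.
table.write("once", {"r1": 1, "r2": 2, "r3": 3})

# Partition that keeps receiving rows across several flushes.
table.write("updated", {"r1": 1})
table.flush()
table.write("updated", {"r2": 2})
table.flush()
table.write("updated", {"r3": 3})
table.flush()

write_once_fragments = table.fragments("once")
updated_fragments = table.fragments("updated")
print(write_once_fragments, updated_fragments)
```

In this model the write-once partition ends up in a single SSTable without any compaction at all, while the partition written across several flushes is the one that needs compaction to bring its fragments back together.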


Not a problem. Cheers!