leehuihua asked

Does the compaction strategy impact Cassandra read path?

Hi all,

I am not sure whether my understanding is right:

If the Size Tiered Compaction Strategy (STCS) is used, the SSTables on disk are not ordered, so if the target row is not in memory, you have to go through all the SSTables to find all the records (with different timestamps) and compare them to get the newest one.

If the Leveled Compaction Strategy (LCS) is used, the SSTables on a lower level are newer than those on a higher level, so if the target row is not in memory, you just have to traverse from level 0 to level N. Once you find the record, you can stop traversing and return it, because it is the newest one.

So is the read path with the Leveled Compaction Strategy shorter and more efficient?

compaction

1 Answer

Erick Ramirez answered

In my experience, the idea that LCS performs better is a myth.

LCS is only suitable for very read-heavy workloads where there are hardly any or no updates to the data. Otherwise, LCS is constantly running compactions trying to coalesce partitions into single SSTables. The constant compactions end up competing for the same IO bandwidth of the data disks as read operations.

I field too many questions about tuning LCS because of performance issues, so in my opinion it's just not worth it. There are very limited use cases where LCS is a good fit.

For STCS, your understanding is incorrect. STCS does not "have to go through all the SSTables". Cassandra uses data structures including bloom filters, partition key caches, partition summaries, and partition indexes to retrieve data very quickly without having to scan over all the files on disk.
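To make the bloom filter point concrete, here is a minimal sketch in Python. It is a toy model, not Cassandra's actual implementation: the `BloomFilter`, `SSTable`, and `read_newest` names, the filter sizing, and the sample data are all assumptions made up for illustration. It only shows the general idea that a cheap in-memory check lets a read skip SSTables that cannot contain the partition, and that the rows from the remaining candidates are reconciled by write timestamp.

```python
import hashlib


class BloomFilter:
    """Tiny illustrative Bloom filter (hypothetical, not Cassandra's code)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit set

    def _positions(self, key):
        # Derive several bit positions from the key by salting the hash input.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


class SSTable:
    """Toy SSTable: maps partition key -> (write_timestamp, value)."""

    def __init__(self, name, rows):
        self.name = name
        self.rows = dict(rows)
        self.bloom = BloomFilter()
        for key in self.rows:
            self.bloom.add(key)
        self.disk_reads = 0  # counts simulated disk lookups

    def read(self, key):
        self.disk_reads += 1
        return self.rows.get(key)


def read_newest(sstables, key):
    """Return the (timestamp, value) with the highest timestamp, or None."""
    candidates = []
    for table in sstables:
        # In-memory check first: skip tables that cannot hold the key.
        if not table.bloom.might_contain(key):
            continue
        row = table.read(key)
        if row is not None:
            candidates.append(row)
    return max(candidates, key=lambda row: row[0], default=None)


sstables = [
    SSTable("sstable-1", {"user:42": (100, "old@example.com")}),
    SSTable("sstable-2", {"user:42": (250, "new@example.com")}),
    SSTable("sstable-3", {"user:7": (180, "other@example.com")}),
]

print(read_newest(sstables, "user:42"))  # (250, 'new@example.com')
```

In this sketch every bloom filter is still consulted, but that is a memory-only operation. Only the SSTables whose filter says "possibly present" incur a simulated disk read, which is the distinction between checking every SSTable and scanning every SSTable.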

I recommend having a look at "How Cassandra reads data" for details. Cheers!


leehuihua commented

Thanks!

I have read the document you recommended, but I am still wondering whether the SSTables are ordered on disk.

For STCS, maybe you don't have to go through every SSTable's data blocks, but you still have to check all the bloom filters (every SSTable has its own bloom filter) or other components to determine whether each SSTable contains the target row. My point is that for STCS, you have to get all the records to pick the newest one, whereas for LCS, the first one you get is the newest.

Am I getting it wrong?
