sunilrpawar7_183464 asked Erick Ramirez edited

Which compaction strategy should I choose for big tables showing high read latency?

In our current setup we have one table, 150+ GB in size, that stores user profiles. It currently uses STCS with max and min thresholds of 32 and 4 respectively. A few more details:

1. caching = { 'keys' : 'ALL' , 'rows_per_partition' : 'NONE'}

2. The SSTable count per node is approximately 16-18.

3. gc_grace_seconds = 864000

Cassandra version details:- 3.11.2

Total number of nodes:- 15

Replication factor for keyspace:- 4

Please suggest if we can change the compaction strategy to something else to enhance the performance of the system.


bettina.swynnerton answered Erick Ramirez edited


There could be a few different factors contributing to the read latency, and in my experience the compaction strategy impact is secondary to other factors, such as partition sizes, query design, GC pauses, read-repairs and tombstones.

Unless compactions are lagging behind (you can check for pending compactions with nodetool compactionstats), I would first check the maximum partition sizes and the number of tombstones read in the last five minutes, both reported by nodetool tablestats. You can also trace representative queries in cqlsh to see how many SSTables are being touched and what is contributing to the latency. nodetool tablehistograms for the problematic table will also show how many SSTables are read per query at each percentile.
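As a rough sketch, the checks above can be collected into one small Python helper. The keyspace and table names (`my_keyspace`, `user_profiles`) are placeholders, not taken from this thread, and the helper only builds and prints the command lines rather than running them.

```python
import shlex
from typing import List


def diagnostic_commands(keyspace: str, table: str) -> List[List[str]]:
    """Build the nodetool invocations for the checks described above."""
    return [
        # Pending compactions: is compaction lagging behind?
        ["nodetool", "compactionstats"],
        # Partition sizes and tombstones per slice for one table.
        ["nodetool", "tablestats", f"{keyspace}.{table}"],
        # SSTables touched per read and latency percentiles.
        ["nodetool", "tablehistograms", keyspace, table],
    ]


# Hypothetical names; substitute the real keyspace and table.
for argv in diagnostic_commands("my_keyspace", "user_profiles"):
    print(shlex.join(argv))
```

Run the printed commands on each node in turn. For tracing, enter `TRACING ON` in cqlsh before executing a representative query.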

Also ensure that the table is repaired regularly and that the reads are not triggering a lot of read repairs.

Perhaps you have looked at these already; let us know what you found.

3 comments

Hi @bettina.swynnerton,

Thanks for your valuable feedback.

1. nodetool compactionstats doesn't show any pending tasks.

2. nodetool tablestats shows the number of partitions as 1200041569; the partition minimum, maximum and mean sizes are 51, 1331 and 192 bytes respectively.

3. Average tombstones per slice are 1.0 and the maximum is also 1.0

4. nodetool tablehistograms provides the following information:

Percentile    SSTables
50%           2.00








Interesting tablehistograms output. The fact that the queries involve lots of SSTables (up to 17) indicates to me that your application is doing multi-partition reads.

Do you know what the queries look like? I suspect they include the IN() operator with lots of partition keys involved. This kind of query is very inefficient and unpredictable since the coordinator needs to fire a separate request for each partition. The compaction strategy has no influence over it.
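To illustrate why a multi-partition IN() read tends to be unpredictable, here is a back-of-the-envelope sketch. It assumes, purely for illustration, that each per-partition read is independent and has a 1% chance of being slow (a p99 outlier), and that the whole query is only as fast as its slowest partition read. Neither number comes from this cluster.

```python
def prob_any_slow(num_partitions: int, p_slow: float = 0.01) -> float:
    """Chance that at least one of `num_partitions` independent reads is slow.

    Simplified model (an assumption, not a measurement): the coordinator
    waits for every partition in the IN() list, so one slow read makes the
    whole query slow.
    """
    if num_partitions < 0:
        raise ValueError("num_partitions must be non-negative")
    if not 0.0 <= p_slow <= 1.0:
        raise ValueError("p_slow must be a probability")
    return 1.0 - (1.0 - p_slow) ** num_partitions


for n in (1, 10, 100):
    print(f"{n:>3} partition keys -> P(query hits a slow read) = {prob_any_slow(n):.3f}")
```

Under these assumptions the chance of hitting an outlier grows quickly with the number of keys in the IN() list, which is one reason single-partition queries issued individually are usually easier to reason about.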


[Follow up question re-posted as #5735]

Erick Ramirez answered Erick Ramirez commented

Speaking from experience, it's very rare that the compaction strategy is the source of latency. The factors which affect latency the most are:

  • use case
  • data model
  • access patterns
  • hardware configuration
  • node density
  • cluster size

For example, a bad data model means that a partition read has to iterate over deleted rows to get to the data. Read latency will also be negatively affected when the data/ and commitlog/ directories are on the same disk/volume because they're competing with each other for IO.

Insufficiently provisioned EBS volumes will be inherently slow. And once the maximum throughput per node has been reached, it's time to add more nodes to increase the capacity of the cluster.

Switching compaction strategies isn't going to solve these problems. In some edge cases, it might be possible to improve read performance with the LeveledCompactionStrategy since its goal is to coalesce fragments of a partition into a few files so that only 1 or 2 SSTables are involved in a read.

But in my personal experience, this is only effective for some edge cases. If the partitions are regularly updated, LCS spends so much time constantly compacting and re-compacting that most of the IO goes to compaction instead of reads. Cheers!
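For readers who still want to trial LCS on a test table after ruling out the causes above, here is a minimal sketch that renders the CQL for switching strategies and for rolling back. The keyspace and table names are hypothetical; the STCS thresholds are the ones quoted in the question. The helper only builds the statement text and does not connect to a cluster.

```python
from typing import Dict


def alter_compaction_cql(keyspace: str, table: str, options: Dict[str, str]) -> str:
    """Render an ALTER TABLE statement that sets the compaction options map."""
    rendered = ", ".join(f"'{key}': '{value}'" for key, value in options.items())
    return f"ALTER TABLE {keyspace}.{table} WITH compaction = {{{rendered}}};"


# Trial: switch the (hypothetical) table to LCS.
to_lcs = alter_compaction_cql(
    "my_keyspace", "user_profiles", {"class": "LeveledCompactionStrategy"}
)

# Rollback: STCS with the thresholds from the question.
to_stcs = alter_compaction_cql(
    "my_keyspace",
    "user_profiles",
    {"class": "SizeTieredCompactionStrategy", "min_threshold": "4", "max_threshold": "32"},
)

print(to_lcs)
print(to_stcs)
```

Changing the strategy causes existing SSTables to be recompacted, so a change like this is best tried on a test cluster or during a quiet period.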

2 comments

Hi @Erick Ramirez,

Thanks for your answer.

1. data/ and commitlog/ are on different disks.

2. So far we have not observed any nodes running out of capacity for the workload.

We are using caching = { 'keys' : 'ALL' , 'rows_per_partition' : 'NONE'} for the current table, and the observed cache hit ratio for the table is around 30-50%. Could that be one of the causes of the read latency?


For what it's worth, I was just giving examples of what could cause latency. I wasn't suggesting they applied to your situation. :)

The caching you mentioned is for the partition key cache. The hit ratio is within the expected range and doesn't necessarily cause latency. Cheers!
