awhdesmond asked · Erick Ramirez answered

LCS compaction is performing very slowly


We are running a 3-node Cassandra cluster with one of our nodes facing some issues with compaction.

  • LCS is used for the table where compaction is lagging
  • currently, there are 1600+ pending compactions and the number is decreasing only slowly
  • this started after we tuned that node to increase its JVM heap, concurrent_writes, concurrent_compactors and compaction_throughput
  • the other nodes, which were not tuned, are performing fine
  • for now, we have set compaction_throughput to 0 to unthrottle compaction
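For reference, the unthrottling in the last bullet can be applied at runtime with nodetool (no restart needed); a minimal sketch, assuming default JMX access on the node:

```shell
# Check the current compaction throughput cap (MB/s; 0 means unthrottled)
nodetool getcompactionthroughput

# Remove the throttle entirely
nodetool setcompactionthroughput 0
```

Note that a value set via nodetool is not persisted across restarts; to make it survive, the corresponding setting in cassandra.yaml (compaction_throughput_mb_per_sec in older versions) also has to be updated.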

Originally, we thought that increasing compaction_throughput would make compaction run faster. However, that is not the case: after tracking various disk I/O statistics (iostat, dstat), we realised that neither the CPU nor the disk I/O is saturated.

Output from iostat:

avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
           1.07   6.01     0.74     0.17    0.00  92.02

Device:      tps  MB_read/s  MB_wrtn/s    MB_read    MB_wrtn
sda      1135.75      35.43      28.82  519157965  422340399
sda1     1135.75      35.43      28.82  519157472  422340399

Therefore, we are not sure where the bottleneck is. Does anyone have any suggestions on what we can do to resolve this? Thanks!
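For anyone digging into a similar backlog, a few nodetool commands give a quick read on where compaction stands on the affected node; the keyspace/table name below is a placeholder:

```shell
# Pending/active compactions and per-task progress (human-readable sizes)
nodetool compactionstats -H

# Thread-pool stats: look for pending/blocked CompactionExecutor tasks
nodetool tpstats

# Per-table SSTable counts and LCS level layout
nodetool tablestats my_keyspace.my_table
```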


1 Answer

Erick Ramirez answered

Unfortunately, it's impossible to figure out what the underlying issue is without diagnostic information. And that level of detail is challenging in a Q&A format.

If the other nodes are operating fine, my suggestion is to roll back the configuration changes you made to the problematic node to bring it in line with the other nodes.

LCS can be problematic in that it takes a bit of time for it to catch up once it has fallen behind. The usual workaround for this is to temporarily switch to STCS to force a rewrite of the SSTables then switch back to LCS again.
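A sketch of that workaround in cqlsh; the keyspace/table name is a placeholder, and the sstable_size_in_mb value shown when switching back is just the LCS default, so substitute whatever options your table originally used:

```shell
# Temporarily switch the table to STCS so the backlog is rewritten
# into size-tiered SSTables (my_keyspace.my_table is a placeholder)
cqlsh -e "ALTER TABLE my_keyspace.my_table
          WITH compaction = {'class': 'SizeTieredCompactionStrategy'};"

# ...wait for the pending compactions to drain (nodetool compactionstats)...

# Then switch back to LCS; SSTables will be re-levelled over time
cqlsh -e "ALTER TABLE my_keyspace.my_table
          WITH compaction = {'class': 'LeveledCompactionStrategy',
                             'sstable_size_in_mb': 160};"
```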

However, this workaround is not applicable in your case since only one node is problematic. We only recommend the workaround if the issue is cluster-wide.

If you provide some background information on why the node fell behind, I'm happy to provide some further suggestions. Cheers!
