What are the downsides of forcing major compaction on a table (with
nodetool compact) and what is the best practice recommendation?
When forcing a major compaction on a table configured with the SizeTieredCompactionStrategy (STCS), all the SSTables on the node get compacted together into a single large SSTable. Due to its size, the resulting SSTable is unlikely to ever be compacted again, since no similar-sized SSTables exist as compaction candidates. This creates additional issues for the node: tombstones in the giant SSTable are never evicted and keep accumulating, degrading the cluster's performance over time.
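To see why the giant SSTable gets stranded, here is a hedged sketch of how STCS groups SSTables of similar size into buckets (using the default thresholds, bucket_low=0.5 and bucket_high=1.5, where a bucket needs min_threshold=4 members to be eligible for compaction). The function and the example sizes are illustrative, not Cassandra's actual implementation:

```python
def bucket_sstables(sizes, bucket_low=0.5, bucket_high=1.5):
    """Group SSTable sizes (MB) into buckets of 'similar' size,
    mimicking STCS bucketing: a size joins a bucket if it falls
    within [bucket_low * avg, bucket_high * avg] of that bucket."""
    buckets = []
    for size in sorted(sizes):
        for bucket in buckets:
            avg = sum(bucket) / len(bucket)
            if bucket_low * avg <= size <= bucket_high * avg:
                bucket.append(size)
                break
        else:
            buckets.append([size])  # no similar bucket; start a new one
    return buckets

# Before a major compaction: four similar-sized SSTables form one
# bucket, so they are eligible to be compacted together.
print(bucket_sstables([160, 150, 155, 145]))
# -> [[145, 150, 155, 160]]

# After a major compaction: the single 600 MB SSTable sits alone in
# its own bucket, with no similar-sized candidates, so it is never
# compacted again even as fresh small SSTables pile up.
print(bucket_sstables([600, 40, 45, 50, 42]))
# -> [[40, 42, 45, 50], [600]]
```

Because the lone large SSTable never reaches the four-member minimum for its bucket, the tombstones inside it have no compaction opportunity to be purged.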
We understand that cluster administrators use major compaction as a way of evicting tombstones that have accumulated under high-delete workloads, which in most cases stem from an incorrect data model.
The recommendation in this post does not constitute a solution to the underlying issue users face. It should not be considered a long-term fix to the data model problem.
In Apache Cassandra 2.2, CASSANDRA-7272 introduced a significant improvement for tables using STCS: it splits the output of nodetool compact into multiple SSTables sized at 50%, then 25%, then 12.5% of the original table size, and so on, until the smallest chunk reaches 50 MB.
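The halving sequence above can be sketched numerically. This is an illustrative model of the split sizes (not Cassandra's code), assuming each output SSTable is half the remaining data until the 50 MB floor is hit:

```python
def split_sizes(total_mb, floor_mb=50):
    """Approximate output SSTable sizes for a major compaction with
    --split-output: successive halves (50%, 25%, 12.5%, ...) of the
    original table size until the smallest chunk would drop below
    floor_mb; the final chunk keeps whatever remains."""
    sizes = []
    remaining = total_mb
    while remaining / 2 >= floor_mb:
        remaining /= 2
        sizes.append(remaining)
    sizes.append(remaining)  # last chunk: the leftover data
    return sizes

# An 800 MB table splits into a descending series of SSTables that
# together still hold all 800 MB of data.
print(split_sizes(800))
# -> [400.0, 200.0, 100.0, 50.0, 50.0]
```

The payoff is that each output SSTable lands in a populated size tier, so normal STCS compaction (and tombstone eviction) can continue on all of them, instead of stranding one monolithic file.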
When using major compaction as a last resort for evicting tombstones, use the --split-output flag (shorthand -s) to take advantage of this feature:
$ nodetool compact --split-output -- <keyspace> <table>
NOTE - This feature is only available in Cassandra 2.2 and later.
[Re-published from DataStax Support KB article "Why is forcing major compaction on a table not ideal?"]