What are the downsides of forcing major compaction on a table (with nodetool compact) and what is the best practice recommendation?
When you force a major compaction on a table configured with the SizeTieredCompactionStrategy (STCS), all the SSTables on the node are compacted together into a single large SSTable. Because of its size, the resulting SSTable will likely never be compacted again, since no similar-sized SSTables are available as compaction candidates. This creates additional problems for the node: tombstones in that SSTable are not evicted and keep accumulating, which affects the cluster's performance.
We understand that cluster administrators use major compaction to evict tombstones that have accumulated from high-delete workloads, which in most cases are the result of an incorrect data model.
The recommendation in this post does not constitute a solution to the underlying issue users face. It should not be considered a long-term fix to the data model problem.
In Apache Cassandra 2.2, CASSANDRA-7272 introduced a significant improvement for tables using STCS: it splits the output of nodetool compact into multiple files that are 50%, then 25%, then 12.5% of the original table size, and so on, until the smallest chunk is 50MB.
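To get a feel for the halving sequence described above, here is a minimal sketch. It only illustrates the 50% / 25% / 12.5% progression with a 50MB floor as stated in this post; it is not Cassandra's actual implementation. The function name split_sizes is hypothetical, sizes are whole megabytes with integer division, and how any leftover data is handled is not modelled.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the approximate sizes (in MB) of the output
# files for a table of the given total size, halving each time and
# stopping once the next chunk would be smaller than 50 MB.
# Assumption: integer MB arithmetic; the remainder is ignored.
split_sizes() {
  local total_mb=$1
  local min_mb=50
  local chunk=$(( total_mb / 2 ))
  while [ "$chunk" -ge "$min_mb" ]; do
    echo "$chunk"
    chunk=$(( chunk / 2 ))
  done
}

# Example: a 1000 MB table prints 500, 250, 125 and 62, one per line.
split_sizes 1000
```

The point of the sketch is that the output is a small set of differently sized SSTables rather than one very large file, so later STCS compactions are more likely to find candidates.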
When using major compaction as a last resort for evicting tombstones, use the --split-output option (or its shorthand, -s) to take advantage of this feature:
$ nodetool compact --split-output -- <keyspace> <table>
NOTE - This feature is only available in Cassandra 2.2 and later versions.
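If you script this, one option is to guard the command with a version check. The following is a sketch under stated assumptions: supports_split_output is a hypothetical helper, it expects a plain numeric version string such as 3.11.4, and the commented usage assumes that nodetool version prints a line of the form "ReleaseVersion: x.y.z".

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the given Cassandra version is 2.2 or later.
# Assumption: the version starts with numeric "major.minor" components.
supports_split_output() {
  local version=$1
  local major=${version%%.*}   # text before the first dot
  local rest=${version#*.}     # text after the first dot
  local minor=${rest%%.*}      # second component
  [ "$major" -gt 2 ] || { [ "$major" -eq 2 ] && [ "$minor" -ge 2 ]; }
}

# Possible usage on a node (left commented out so the sketch runs anywhere):
# version=$(nodetool version | awk '{print $2}')
# if supports_split_output "$version"; then
#   nodetool compact --split-output -- <keyspace> <table>
# else
#   echo "--split-output requires Cassandra 2.2 or later" >&2
# fi
```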
Also see How to split large SSTables on another server.
[Re-published from DataStax Support KB article "Why is forcing major compaction on a table not ideal?"]