DataStax Academy FAQ

DataStax Academy migrated to a new learning management system (LMS) in July 2020. We are also moving to a new Cassandra Certification process so there are changes to exam bookings, voucher system and issuing of certificates.

Check out the Academy FAQ pages for answers to your questions:


question

Erick Ramirez avatar image
Erick Ramirez asked ·

Why is forcing major compaction on a table not ideal?

What are the downsides of forcing major compaction on a table (with nodetool compact) and what is the best practice recommendation?

compactionstcs
10 |1000 characters needed characters left characters exceeded

Up to 8 attachments (including images) can be used with a maximum of 1.0 MiB each and 10.0 MiB total.

1 Answer

Erick Ramirez avatar image
Erick Ramirez answered ·

Background

When forcing a major compaction on a table configured with the SizeTieredCompactionStrategy (STCS), all the SSTables on the node get compacted together into a single large SSTable. Due to its size, the resulting SSTable will likely never get compacted out since similar-sized SSTables are not available as compaction candidates. This creates additional issues for the nodes since tombstones do not get evicted and keep accumulating, affecting the cluster's performance.

Caveats

We understand that cluster administrators use major compaction as a way of evicting tombstones which have accumulated as a result of high-delete workloads which in most cases is due to an incorrect data model.

The recommendation in this post does not constitute a solution to the underlying issue users face. It should not be considered a long-term fix to the data model problem.

Recommendation

In Apache Cassandra 2.2, CASSANDRA-7272 introduced a huge improvement which splits the output of nodetool compact into multiple files which are 50% then 25% then 12.5% of the original table size until the smallest chunk is 50MB for tables using STCS.

When using major compaction as a last resort for evicting tombstones, use the --split-output (or shorthand -s) to take advantage of this new feature:

$ nodetool compact --split-output -- <keyspace> <table>

NOTE - This feature is only available from Cassandra 2.2 and newer versions.

Also see How to split large SSTables on another server.

[Re-published from DataStax Support KB article "Why is forcing major compaction on a table not ideal?"]

Share
10 |1000 characters needed characters left characters exceeded

Up to 8 attachments (including images) can be used with a maximum of 1.0 MiB each and 10.0 MiB total.