In STCS, whenever a bucket reaches its max_threshold value, the SSTables are trimmed to 32 by default. What happens to the rest of the SSTables? Are they dropped? Is data lost?
Hi @pranali.khanna101994_189965,
Size-tiered compaction (STCS) merges sets of SSTables that are approximately the same size. Cassandra compares each SSTable size to the average of all SSTable sizes on the node. It merges SSTables whose sizes in KB are within [average-size × bucket_low] and [average-size × bucket_high].
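To make the size range above concrete, here is a minimal Python sketch of the bucketing idea. It is an illustration only, not Cassandra's actual code: the function name `group_into_buckets` is hypothetical, the inclusive boundary handling and the greedy smallest-first grouping are assumptions, and other STCS options such as min_sstable_size are ignored. The values 0.5 and 1.5 are the default bucket_low and bucket_high.

```python
def group_into_buckets(sizes_kb, bucket_low=0.5, bucket_high=1.5):
    """Greedily group SSTable sizes (in KB) into buckets of similar size.

    A size joins the first bucket whose current average puts it within
    [average * bucket_low, average * bucket_high]; otherwise it starts
    a new bucket. Simplified illustration, not Cassandra's implementation.
    """
    buckets = []
    for size in sorted(sizes_kb):
        for bucket in buckets:
            average = sum(bucket) / len(bucket)
            if average * bucket_low <= size <= average * bucket_high:
                bucket.append(size)
                break
        else:
            # No existing bucket is close enough in size.
            buckets.append([size])
    return buckets


if __name__ == "__main__":
    # Four ~100 KB SSTables end up together; the outliers get their own buckets.
    print(group_into_buckets([100, 110, 95, 105, 1000, 1100, 10]))
```

Only buckets that collect enough SSTables become candidates for a minor compaction, which is where min_threshold and max_threshold come in.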
The subproperties min_threshold and max_threshold control how a minor compaction is triggered and how many SSTables can be merged as part of one minor compaction.
A minor compaction is triggered when the number of similarly sized SSTables reaches min_threshold. By default that is 4.
Should you have more than max_threshold SSTables of similar size, a first compaction would merge no more than max_threshold SSTables together. If after that compaction the min_threshold is still met (i.e. you still have at least min_threshold SSTables of similar size), the next compaction would merge the next set, again up to a maximum of max_threshold.
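No data is lost. The SSTables that are not picked for a given compaction are not dropped; they stay on disk and remain readable until a later compaction merges them.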
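The batching described above can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not Cassandra's implementation: the function name `plan_minor_compactions` is hypothetical, SSTables are taken in list order, and the merged output of each round is assumed to land in a different (larger) bucket, so it is not fed back in.

```python
def plan_minor_compactions(bucket, min_threshold=4, max_threshold=32):
    """Split one bucket of similar-sized SSTables into compaction rounds.

    Each round merges at most max_threshold SSTables. Rounds continue
    while at least min_threshold SSTables remain. Anything left over
    simply stays on disk, untouched, until more SSTables arrive.
    """
    remaining = list(bucket)
    rounds = []
    while len(remaining) >= min_threshold:
        batch = remaining[:max_threshold]
        remaining = remaining[max_threshold:]
        rounds.append(batch)
    return rounds, remaining


if __name__ == "__main__":
    # 70 similar-sized SSTables, e.g. after compaction was disabled for a while.
    rounds, left_over = plan_minor_compactions([f"sstable-{i}" for i in range(70)])
    print([len(r) for r in rounds], len(left_over))  # prints [32, 32, 6] 0
```

Every SSTable appears either in exactly one round or in the left-over list, which is the point of the answer: max_threshold caps the size of a single merge, it does not discard anything.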
I can't see many situations where the default max_threshold of 32 would be hit, unless automatic compaction was disabled for a longer period of time, allowing a significant number of SSTables to build up, or the compaction strategy was changed. In those cases you want to avoid merging too many SSTables into one very large SSTable, and max_threshold lets you set an upper limit on the number that can be merged. In most cases, there is no need to change these default settings.
I hope this clarifies how the minor compaction is triggered (by min_threshold) and what the max_threshold setting defines.