How do I split a large SSTable file on a server that is not part of a running cluster?
When forcing a major compaction on a table configured with the
SizeTieredCompactionStrategy, all the SSTables on the node are compacted together into a single large SSTable. Due to its size, the resulting SSTable will likely never be compacted again, since similar-sized SSTables are not available as compaction candidates. This causes additional problems on the node, since tombstones are never evicted and keep accumulating, degrading the cluster's performance.
The large SSTables need to be split into multiple smaller SSTables so they can be compacted as normal, using the
sstablesplit tool. However, sstablesplit is an offline tool which requires Cassandra to be shut down on the node. The steps in this article provide a workaround which does not require downtime.
WARNING - Although it may be possible to run a compatible
sstablesplit from another Cassandra version, e.g. splitting C* 3.0 SSTables with C* 3.11, this is not a tested configuration and is not recommended.
Follow these steps to split a large SSTable on another server that is not part of a cluster.
Step 1 - Copy the large SSTable and all its component files from the source node to the alternate server. For example, if splitting SSTable generation 5678 from a C* 3.11 cluster, copy the whole set:
md-5678-big-CompressionInfo.db
md-5678-big-CRC.db
md-5678-big-Data.db
md-5678-big-Digest.crc32
md-5678-big-Filter.db
md-5678-big-Index.db
md-5678-big-Statistics.db
md-5678-big-Summary.db
md-5678-big-TOC.txt
WARNING - Only copy SSTables from one source node at a time. DO NOT mix SSTables from multiple source nodes.
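Since every component of a generation shares the md-<generation>- prefix, the copy in Step 1 can be done with a single glob. The sketch below demonstrates this with a local copy on throwaway files; all paths and the generation number are examples only, and on a real system the cp would be replaced with scp or rsync to the alternate server.

```shell
# Demonstration with a local copy; in practice, replace `cp` with
# scp/rsync to the alternate server. Paths and generation are examples.
mkdir -p /tmp/sstable-src /tmp/sstable-work
cd /tmp/sstable-src
# Stand-ins for the nine component files of generation 5678
touch md-5678-big-{CompressionInfo,CRC,Data,Filter,Index,Statistics,Summary}.db \
      md-5678-big-Digest.crc32 md-5678-big-TOC.txt
# The md-5678-* glob picks up every component of that generation
cp md-5678-* /tmp/sstable-work/
ls /tmp/sstable-work | wc -l    # 9 files copied
```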
Step 2 - Here is a recommended way of running the tool:
$ tools/bin/sstablesplit --debug --no-snapshot -v /path/to/large/sstable/*
The -v flag reports additional troubleshooting information back to the console. The
--no-snapshot flag skips the need for a snapshot since the tool is operating on a secondary copy of the SSTable.
By default, the tool generates multiple 50MB SSTables. Alternatively, it is possible to specify a target size using the
-s flag, e.g.
-s 100 to generate multiple 100MB SSTables.
Step 3 - Copy all the new files (including all component files, e.g.
*-Statistics.db) back to the source node.
WARNING - Only copy the new files to the owner of the original large SSTable. Each node owns a portion of the data and copying files onto a node which does not own the data will result in data loss.
Step 4 - Check file permissions on the newly copied files to make sure they match the rest of the SSTables on the node.
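One way to make the permissions line up is to use an existing SSTable on the node as a reference. The sketch below uses throwaway files and assumed modes (many packaged installs use cassandra:cassandra ownership and mode 644, but check your own node); on the real node the reference file would be an existing SSTable in the table's data directory, and chown would need root.

```shell
# Demonstration on throwaway files; directory, filenames and modes are
# assumptions for illustration.
mkdir -p /tmp/perm-demo && cd /tmp/perm-demo
touch existing-Data.db new-Data.db
chmod 644 existing-Data.db                       # mode of the node's SSTables
chmod 600 new-Data.db                            # wrong mode, as copied in
chmod --reference=existing-Data.db new-Data.db   # GNU coreutils: copy the mode
stat -c '%a' new-Data.db                         # 644
```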
Step 5 - On the source node, run
nodetool drain, then temporarily stop Cassandra.
Step 6 - Move the original large SSTable (and all its component files) out of the data directory.
Step 7 - Start Cassandra.
After starting Cassandra, check the
debug.log to confirm that the new SSTables were opened and read.
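A quick way to check is to grep debug.log for the new generation numbers. The exact log wording varies by Cassandra version, but SSTable opens are typically recorded in "Opening ..." lines; the simulated log line, path, and generation below are hypothetical.

```shell
# Simulated log for demonstration; on a real node, grep the actual
# /var/log/cassandra/debug.log instead.
printf 'DEBUG [main] SSTableReader.java - Opening md-5701-big (50MB)\n' > /tmp/debug.log
grep -c 'Opening md-' /tmp/debug.log    # 1 matching line
```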
Also run nodetool cfstats against the table and check statistics such as data size and/or estimated keys.
In circumstances where an SSTable is excessively large or contains large partitions, the
sstablesplit utility can fail with an
OutOfMemoryError. In this situation, increase the JVM heap size used by the tool. For example, to increase the heap to 8GB, modify the default heap setting in the
tools/bin/sstablesplit shell script.
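As an assumption based on the stock tool script (the exact wording varies by Cassandra version), the relevant fragment looks like the following, with the default heap raised to 8G:

```shell
# Hypothetical excerpt from tools/bin/sstablesplit; check your own version.
if [ "x$MAX_HEAP_SIZE" = "x" ]; then
    MAX_HEAP_SIZE="8G"    # was 256M by default
fi
```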