phofegger_148429 asked:

How do I prevent SSTables from growing too large?

As part of the Cassandra migration (see question: https://community.datastax.com/questions/3042/migrate-cassandra-to-new-service-provider-and-redu.html?childToView=3074#answer-3074, currently at step 7), I have noticed that I have rapidly growing SSTables (> 500 GB) after the data sync from an existing DC. Because of these big SSTables I will run into trouble with compactions and nodetool garbagecollect due to limited free disk space. I have to add a third virtual DC (DC2new) to the existing C* cluster and I would like to avoid such large SSTables. Here are my questions:

a) Is it possible to set a maximum SSTable size for the data sync?

b) On the existing DC I would like to split some SSTables into smaller pieces of around 100 GB, or at most 150 GB. Is this possible, and does it make sense?

Many thanks in advance.

cheers

Patrick

Tags: cassandra, compaction, stcs

1 Answer

Erick Ramirez answered:

For nodes to have a single SSTable larger than 500 GB indicates that your cluster has super-dense nodes, which can be problematic on its own.

In any case, I don't believe a standard nodetool rebuild would generate large SSTables. My guess is that you are manually running a major compaction (with nodetool compact) as a workaround for a tombstone problem. This workaround is an issue in itself because it only hides an underlying data model problem, one that I've previously written about in Why forcing major compaction on a table is not ideal.

If you must run a major compaction, you should run it with the "split output" flag (--split-output, or -s for short) so you don't end up with one single giant SSTable. This enhancement was implemented in Cassandra 2.2 (CASSANDRA-7272); for tables using STCS, it splits the output of nodetool compact into multiple files which are 50%, then 25%, then 12.5% of the original table size, and so on until the smallest chunk is 50 MB.
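
For example, a minimal sketch (keyspace01 and table01 are placeholder names, not from the post):

    # Major compaction whose output is split into multiple SSTables
    # (50%, 25%, 12.5%, ... of the original size, down to 50 MB):
    nodetool compact --split-output keyspace01 table01

    # Equivalent short form:
    nodetool compact -s keyspace01 table01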

If you would like to break up the large SSTables on the nodes, you can do so using the sstablesplit utility. I have previously documented a workaround which does not require downtime and involves copying the large SSTable to another server which is not part of the cluster. The high-level steps are as follows (a command-level sketch follows the list):

  1. Copy a single SSTable generation and its components to another server where Cassandra is installed (but not running).
  2. Run the sstablesplit utility on the SSTable.
  3. Copy the output files back to the source node.
  4. Temporarily shut down C*.
  5. Move the original [problematic] large SSTable out to another directory.
  6. Start C*.
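
As a rough sketch of those steps (hostnames, paths, and the mc-1234 generation are illustrative placeholders; sstablesplit lives under the Cassandra tools/bin directory, and service manager commands may differ in your environment):

    # 1. Copy one SSTable generation (all component files) to a spare server
    #    with the same Cassandra version installed but not running:
    scp /var/lib/cassandra/data/ks1/tbl1-*/mc-1234-* sparehost:/var/lib/cassandra/data/ks1/tbl1/

    # 2. On the spare server, split it (sizes are in MB, so 102400 = 100 GB):
    sstablesplit --no-snapshot -s 102400 /var/lib/cassandra/data/ks1/tbl1/mc-1234-big-Data.db

    # 3. Copy the resulting smaller SSTables back to the source node, then:
    systemctl stop cassandra                                  # 4. temporarily shut down C*
    mv /var/lib/cassandra/data/ks1/tbl1-*/mc-1234-* /backup/  # 5. move the original aside
    systemctl start cassandra                                 # 6. start C* again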

For the detailed steps, see How to split large SSTables on another server. Cheers!

4 comments

phofegger_148429 commented:

Hi Erick, thank you for the answer.

I use garbagecollect for deleting tombstones, e.g. `nodetool garbagecollect -j 1 keyspace01 table01`. I tried to avoid running a major compaction; I think the auto-compaction after the rebuild created these big SSTables. Is it possible to set a maximum SSTable size to avoid compaction creating big SSTables? I have seen that it is possible to run a major compaction on a single SSTable, but not with the --split-output option. Is that right?
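
For reference, the single-SSTable compaction I mean is user-defined compaction, which, if I understand correctly, requires Cassandra 3.4 or later and has no split-output option (a sketch; the path is a placeholder):

    # Compact only the named SSTable via user-defined compaction (C* 3.4+):
    nodetool compact --user-defined /appl/data/<keyspace>/<table>/mc-2289-big-Data.db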

Is it possible to split SSTables on the same node? My plan would be (see the sketch after this list):

  • stop Cassandra
  • run sstablesplit on one SSTable and split it into 100 GB pieces (after the split, I think the original SSTable still exists)
  • move the original SSTable (the whole component set) to another directory as a backup
  • start Cassandra again.
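
A sketch of that plan as commands (paths and the mc-2289 generation are placeholders; note that sstablesplit's -s option takes megabytes, so 100 GB pieces would be -s 102400):

    # stop Cassandra (service manager command may differ):
    systemctl stop cassandra

    # split one SSTable into ~100 GB pieces; by default the tool snapshots
    # the original first unless --no-snapshot is given:
    /appl/cassandra/tools/bin/sstablesplit -s 102400 /appl/data/<keyspace>/<table>/mc-2289-big-Data.db

    # move the original generation (whole component set) aside as a backup,
    # if it is still present after the split:
    mv /appl/data/<keyspace>/<table>/mc-2289-* /appl/backup/

    # start Cassandra again:
    systemctl start cassandra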

Is this plan possible? Thanks.

cheers, Patrick

phofegger_148429 commented:

I tested sstablesplit in a Cassandra DEV environment.

SSTable split on the same node:

  • stopped Cassandra
  • copied an SSTable with the whole component set to /appl/data/split/
  • ran: /appl/cassandra/tools/bin/sstablesplit --debug -s 100 --no-snapshot /appl/data/split/mc-2289-big-Data.db

I got the following message:

Exception in thread "main" java.lang.AssertionError: Unknown keyspace data

I changed the directory structure to /appl/data/split/<keyspace>/<table>/ and ran:

/appl/cassandra/tools/bin/sstablesplit --debug -s 100 --no-snapshot /appl/data/split/<keyspace>/<table>/*

Then it worked, but the output files were written to the original data path.

I removed the old SSTable.

I started Cassandra again and it was up and running, but then Cassandra started a compaction and the new SSTable was bigger than before. :-)

How can I prevent the new SSTable files from being compacted?

Erick Ramirez commented:
How can I prevent the new SSTable files from being compacted?

You can't prevent it. Compaction is part of Cassandra's normal operation.

Is the table configured with TWCS? If it is, the SSTables will get compacted with STCS into one SSTable during the first "window". That's expected behaviour.
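
If you're not sure, you can check the table's configured strategy (a sketch; keyspace01 and table01 are placeholders):

    # Show the table definition, including its compaction strategy:
    cqlsh -e "DESCRIBE TABLE keyspace01.table01;" | grep -A 3 compaction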

Erick Ramirez commented:
Is it possible to set a maximum SSTable size to avoid compaction creating big SSTables?

No, it isn't possible to set a maximum size.
