Bringing together the Apache Cassandra experts from the community and DataStax.

Question asked by amolskh@gmail.com

How can I minimise the impact of TWCS compaction of large SSTables with 45-day intervals?

I have a Cassandra cluster where CPU usage is normally 10-20%, with occasional spikes to 40-50%. I am storing time-series data, which can sometimes arrive out of order by some time delta (a typical IoT use case).

Currently I have configured a 45-day window for time window compaction (TWCS).

At the end of each 45-day window, the data for that window gets compacted. This currently takes about 3-4 hours for an SSTable of 70-80 GB (with partition sizes of 10-20 MB).
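The sizing in the question can be sketched as simple arithmetic. This assumes ingest is roughly uniform over time and that the ~75 GB (midpoint of the reported 70-80 GB) accumulated over one 45-day window; the "ingest x2" row is a hypothetical growth scenario, not a measured figure.

```python
# Rough sizing sketch for one TWCS window.
# ASSUMPTIONS: ingest is roughly uniform over time, and ~75 GB
# (midpoint of the reported 70-80 GB) accumulated over one 45-day window.

OBSERVED_WINDOW_GB = 75.0
OBSERVED_WINDOW_DAYS = 45


def daily_ingest_gb(window_gb: float, window_days: int) -> float:
    """Average GB written per day, inferred from one full window."""
    return window_gb / window_days


def window_volume_gb(window_days: int, ingest_gb_per_day: float) -> float:
    """Approximate data compacted when a window of this length closes."""
    return window_days * ingest_gb_per_day


if __name__ == "__main__":
    rate = daily_ingest_gb(OBSERVED_WINDOW_GB, OBSERVED_WINDOW_DAYS)
    for growth in (1, 2):  # current load, then a hypothetical doubled ingest
        for days in (5, 15, 45):
            gb = window_volume_gb(days, rate * growth)
            print(f"ingest x{growth}, {days:>2}-day window: ~{gb:.0f} GB to compact")
```

Under these assumptions the end-of-window volume grows linearly with both the window length and the ingest rate, which is why the window size is the main lever.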

As load and the amount of data ingested increase, the SSTable size will also increase. What options do I have to deal with these long-running compactions? What is the ideal way to handle end-of-window compaction gracefully? Is there any configuration that would let me push it to weekends instead, when load is very low?

Also, would there be a significant difference in compaction time between an 80 GB SSTable with 5 MB partitions and an 80 GB SSTable with 20 MB partitions?

Tags: compaction, twcs

1 Answer

Answer by Erick Ramirez

The underlying cause of your problem is the large time window you've set. Data accumulates over the whole 45-day window, which will always leave a large amount of data in SSTables to compact when the window closes.

To respond to your questions directly:

  1. I recommend reducing the time window so there's less data to compact. For example, a 5-day window would be more manageable than a 45-day one.
  2. Compactions are part of the normal operation of a cluster and take place automatically, so there's no manual "... way to handle [it] gracefully".
  3. No, there's no way to delay compactions to some preferred timeslot.
  4. The partition sizes won't matter much if you still have 80GB of data to compact in both situations.
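Reducing the window from point 1 is a table-level compaction setting. The sketch below only builds the CQL string; `iot.sensor_readings` is a hypothetical keyspace and table name, and the helper `twcs_alter_statement` is illustrative, not part of any driver. It uses the TWCS options `compaction_window_unit` and `compaction_window_size`.

```python
# Sketch: build a CQL statement that switches a table's TWCS window
# from 45 days to a smaller value. "iot.sensor_readings" is a
# HYPOTHETICAL table name; substitute your own keyspace and table.

VALID_UNITS = {"MINUTES", "HOURS", "DAYS"}  # units accepted by TWCS


def twcs_alter_statement(table: str, window_size: int, window_unit: str = "DAYS") -> str:
    """Return an ALTER TABLE statement that sets the TWCS window options."""
    unit = window_unit.upper()
    if unit not in VALID_UNITS:
        raise ValueError(f"unsupported compaction_window_unit: {window_unit}")
    if window_size < 1:
        raise ValueError("compaction_window_size must be a positive integer")
    return (
        f"ALTER TABLE {table} WITH compaction = {{"
        f"'class': 'TimeWindowCompactionStrategy', "
        f"'compaction_window_unit': '{unit}', "
        f"'compaction_window_size': {window_size}}};"
    )


if __name__ == "__main__":
    print(twcs_alter_statement("iot.sensor_readings", 5))
```

The printed statement can be run in cqlsh. Trying the change in a non-production cluster first is prudent, since it alters how future data is grouped into windows.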

All things being equal, you should consider using machines with locally attached NVMe SSDs if you are sensitive to I/O throughput. Cheers!

2 comments

Since I want to store 1-2 years of data, reducing the window will lead to too many SSTables, which will impact read performance. Would splitting the table into 2 tables, so that each produces a 40 GB SSTable per window, help resolve the issue?
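The trade-off in this comment can be quantified roughly. The sketch assumes about one compacted SSTable per closed window, a 1- or 2-year retention, and an even split of ingest across tables; `windows_for_retention` and `per_table_window_gb` are illustrative helpers, not Cassandra APIs.

```python
import math

# Sketch: windows held over a retention period, and per-table window size
# when ingest is split across tables.
# ASSUMPTIONS: roughly one compacted SSTable per closed window, ~75 GB per
# 45-day window today, and ingest split evenly across tables.


def windows_for_retention(retention_days: int, window_days: int) -> int:
    """Number of windows spanning the retention period (a partial window counts)."""
    return math.ceil(retention_days / window_days)


def per_table_window_gb(window_gb: float, tables: int) -> float:
    """Data per window in each table if ingest is split evenly."""
    return window_gb / tables


if __name__ == "__main__":
    for retention in (365, 730):
        for window in (5, 15, 45):
            n = windows_for_retention(retention, window)
            print(f"retention {retention}d, {window:>2}-day window: {n:>3} windows")
    print(f"2 tables, 45-day window: ~{per_table_window_gb(75.0, 2):.1f} GB each")
```

One caveat on the split: if both tables keep the same 45-day window, their windows close at the same time, so the node would still compact roughly the same total amount of data at that point, just as two smaller jobs.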


Also, two more questions:

- For a 2-core, 8 GB RAM instance, what would be the ideal SSTable size?

- If my compaction runs for a very long time, e.g. 12 hours or a day, how much impact will there be on compaction of newly ingested data, which would be blocked?
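Whether a compaction could reach 12 hours or a day can be estimated from the single data point in the question (70-80 GB in 3-4 hours). This is a linear extrapolation under stated assumptions: constant effective throughput, 1 GB = 1024 MB, and midpoints of the reported ranges; real throughput depends on disk, CPU, compaction throttling, and concurrent load.

```python
# Sketch: extrapolate end-of-window compaction time from the observation
# in the question. ASSUMPTION: time scales linearly with bytes compacted
# at the same effective throughput.

OBSERVED_GB = 75.0     # midpoint of 70-80 GB
OBSERVED_HOURS = 3.5   # midpoint of 3-4 hours


def effective_throughput_mb_s(gb: float, hours: float) -> float:
    """Observed compaction rate in MB/s (1 GB = 1024 MB)."""
    return gb * 1024 / (hours * 3600)


def estimated_hours(gb: float, mb_per_s: float) -> float:
    """Hours needed to compact `gb` at a constant `mb_per_s`."""
    return gb * 1024 / mb_per_s / 3600


if __name__ == "__main__":
    rate = effective_throughput_mb_s(OBSERVED_GB, OBSERVED_HOURS)
    print(f"observed: ~{rate:.1f} MB/s")
    for gb in (75, 150, 300):
        print(f"{gb:>3} GB window -> ~{estimated_hours(gb, rate):.1f} h")
```

At the observed ~6 MB/s, a window of about 300 GB (four times today's volume) would take roughly 14 hours, which is the range this comment is worried about.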
