DavidN asked Erick Ramirez answered

Do I need to change default_time_to_live with TimeWindowCompactionStrategy?

Good Day,

I want to build a practical example that shows the effect of using a specific compaction strategy versus the default one on a table holding a few days' worth of data.

I have a table called buckettest in my database and I would like to use TimeWindowCompactionStrategy with a time window of 3 days. Because the table already contains sample data, I would like to create a new table called buckettestTWCS that uses this compaction strategy.

I basically ran a describe command for the existing table and modified the create table script to include this strategy. Below is the modified script:

CREATE TABLE buckettestTWCS(
    customer text,
    assetid text,
    ...
    PRIMARY KEY ((customer),assetid, timestamp_modified)
) WITH CLUSTERING ORDER BY (assetid ASC, timestamp_modified ASC)
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy', 'compaction_window_size': '3', 'compaction_window_unit': 'DAYS'}
    AND default_time_to_live = 0
    ... ;

My question is about the default_time_to_live property: do I need to change this default value, or can I leave it as is? I don't want to change any property that would affect query performance on the table.

Any feedback and insight would be greatly appreciated. I am still researching different ways of implementing compaction strategies, as I want to understand them in depth.

Kind Regards,

David.

compaction

1 Answer

Erick Ramirez answered

Whether you need to set a default TTL on a table depends on your business requirements.

If (a) your business requires that you don't keep data older than a certain threshold, or (b) you have no need to read old data, then you can set an appropriate TTL. If neither applies, you can leave default_time_to_live at 0, which means data does not expire by default.

If you set a default TTL with TWCS, SSTables whose contents have fully expired automatically get dropped (deleted from the filesystem).
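
As a minimal sketch of what setting a default TTL could look like on the table from the question, assuming a 30-day retention period (the 30 days is purely an illustrative value, not something stated in the question):

```cql
-- Assumption: 30-day retention, i.e. 30 * 86400 = 2592000 seconds.
-- A value of 0 (as in the posted script) means data does not expire by default.
ALTER TABLE buckettestTWCS
    WITH default_time_to_live = 2592000;
```

Note that a table-level default TTL only applies to data written after the change; rows already in the table keep whatever TTL (or lack of one) they were written with.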

If you're learning about compaction strategies, "How data is maintained in Cassandra" is recommended reading. That said, I suggest it's not something you should be too concerned about. As a general recommendation:

  • STCS is almost always the right choice which is why it is the default
  • TWCS is the right choice if you have a true time-series use case
  • there are very limited edge cases where LCS is a good fit
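
For the side-by-side comparison described in the question, the strategy is set per table through the compaction option. A sketch, assuming the original buckettest table is meant to stay on the default STCS while buckettestTWCS uses TWCS:

```cql
-- Assumption: buckettest is the baseline table for the comparison.
-- This explicitly sets the default size-tiered strategy on it.
ALTER TABLE buckettest
    WITH compaction = {'class': 'SizeTieredCompactionStrategy'};
```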

In my experience, users choose LCS without understanding the pros/cons. If you don't know, STCS is the right choice. Cheers!
