sagis asked

What are primary key requirements for TWCS?

We have multiple tables storing all kinds of statistics, all currently using the default SizeTieredCompactionStrategy (STCS). We're seeing some stability issues, so we're thinking about switching to TimeWindowCompactionStrategy (TWCS), which, according to the available information, is the best option for our use case.

My question is: what requirements should the primary key meet to take advantage of TWCS? Currently it's a little bit messy: ((user_id, month), namespace, data_id, timestamp), where all columns besides timestamp are strings.

Should the timestamp be a part of the partition key?


Erick Ramirez answered

The primary key for a table is determined by the application query -- the primary key has no bearing on the choice of compaction strategy.

The only consideration for using TimeWindowCompactionStrategy (TWCS) is that you have a pure time-series use case, particularly one where the data has a default expiration (TTL). By definition, time-series data is written once and not updated; otherwise the writes would be out of sequence.

It is not strictly required to have a default TTL on TWCS tables, but time-series data can grow quickly, and since there are no updates or deletes, without a TTL there is no way to manage the size of the data.
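As a rough illustration of the point above, here is a minimal sketch that builds the CQL statement for switching a table to TWCS with a default TTL. The helper name `twcs_alter_statement`, the table name `stats.user_stats`, the one-day window and the 90-day TTL are all assumptions chosen for the example, not values from this thread; pick a window and TTL that match your own retention needs.

```python
def twcs_alter_statement(table, window_unit, window_size, default_ttl_seconds):
    """Build an ALTER TABLE statement that enables TWCS and a default TTL.

    Hypothetical helper for illustration only; it just formats a string.
    """
    allowed_units = {"MINUTES", "HOURS", "DAYS"}
    if window_unit not in allowed_units:
        raise ValueError(f"window_unit must be one of {sorted(allowed_units)}")
    if window_size < 1:
        raise ValueError("window_size must be at least 1")
    if default_ttl_seconds < 0:
        raise ValueError("default_ttl_seconds must not be negative")
    return (
        f"ALTER TABLE {table} WITH compaction = {{"
        f"'class': 'TimeWindowCompactionStrategy', "
        f"'compaction_window_unit': '{window_unit}', "
        f"'compaction_window_size': '{window_size}'}} "
        f"AND default_time_to_live = {default_ttl_seconds};"
    )


if __name__ == "__main__":
    # Assumed example: one-day windows, data expires after 90 days.
    print(twcs_alter_statement("stats.user_stats", "DAYS", 1, 90 * 24 * 3600))
```

Note that the primary key does not appear anywhere in the statement, which matches the point that the key is driven by the queries rather than by the compaction strategy.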

For more information, see the Compaction Strategies section of How data is maintained in Cassandra. Cheers!


steve.lacerda answered

That's not a question that I can easily answer because it depends on the data. Are you experiencing large partitions with STCS? If so, the same will hold for TWCS, so you may need to add the timestamp or some sort of time bucket to the partition key. You just want to make sure your partitions stay below that 400MiB size, so the choice of partition key depends on keeping within those bounds.
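To make the bucketing idea concrete, here is a small sketch of deriving the partition key from a user ID plus a time bucket. The function names and the `"month"` / `"day"` granularities are assumptions for illustration; the month bucket mirrors the existing `(user_id, month)` partition key, and a day bucket is one way to shrink partitions if monthly ones turn out too large.

```python
from datetime import datetime, timezone


def time_bucket(ts, granularity="month"):
    """Return a string bucket for a timezone-aware datetime, computed in UTC."""
    ts = ts.astimezone(timezone.utc)
    if granularity == "month":
        return ts.strftime("%Y-%m")
    if granularity == "day":
        return ts.strftime("%Y-%m-%d")
    raise ValueError("granularity must be 'month' or 'day'")


def partition_key(user_id, ts, granularity="month"):
    """Compose the (user_id, bucket) pair used as the partition key."""
    return (user_id, time_bucket(ts, granularity))


if __name__ == "__main__":
    when = datetime(2024, 3, 15, 12, 0, tzinfo=timezone.utc)
    print(partition_key("user-1", when))          # monthly bucket
    print(partition_key("user-1", when, "day"))   # smaller daily bucket
```

A finer bucket means smaller partitions, at the cost of reading more partitions for queries that span a long time range.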

With TWCS, be sure you're not issuing UPDATE or DELETE statements and all you're doing is inserting and reading. If that's the use case, then TWCS will fit perfectly. Also, if you're TTL'ing data, TWCS typically works better when all of your records have the same TTL, because TWCS can then drop entire SSTables once they have expired instead of only purging expired data during compaction.
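The effect of a uniform TTL can be sketched with a toy model. This is a simplification, not Cassandra's actual logic: it assumes one SSTable per time window and ignores `gc_grace_seconds` and the checks for overlapping SSTables. All names here are hypothetical.

```python
DAY = 24 * 3600


def window_start(write_ts, window_seconds):
    """Start (in epoch seconds) of the time window containing write_ts."""
    return write_ts - (write_ts % window_seconds)


def droppable_windows(writes, window_seconds, now):
    """Return the windows in which every write has already expired.

    writes: iterable of (write_ts, ttl_seconds) pairs.
    A window can only be dropped whole once its latest expiry has passed.
    """
    latest_expiry = {}
    for write_ts, ttl in writes:
        start = window_start(write_ts, window_seconds)
        expiry = write_ts + ttl
        latest_expiry[start] = max(latest_expiry.get(start, 0), expiry)
    return sorted(s for s, exp in latest_expiry.items() if exp <= now)


if __name__ == "__main__":
    uniform = [(10, 7 * DAY), (DAY + 10, 7 * DAY)]
    mixed = [(10, 7 * DAY), (20, 30 * DAY)]
    print(droppable_windows(uniform, DAY, now=8 * DAY))
    print(droppable_windows(mixed, DAY, now=8 * DAY))
```

In the mixed case a single long-lived record keeps the whole window alive, which is why one TTL for all records tends to suit TWCS better.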

You didn't explain your issue, so I hope the above helps.


sagis commented

Sorry for not having enough details, I've "inherited" the schema and I'm still quite shaky on details myself.

The data stored in the DB are statistics: the average number of events per day / month, per user / location. They fulfill the requirements you've listed. I'm only inserting data (there are no UPDATEs or DELETEs), and it's TTLed. Since that fits TWCS's strengths quite well, I wanted to make the switch regardless of the issue (though I hoped it would help with that too).

The main point of my question was to learn how important it is to define the correct primary key when using TWCS. My doubts mostly come from the blog post, especially the part: "This only works efficiently if the primary key of your data is time-based"

I was wondering if my primary key is "time-based enough" for TWCS to deliver its benefits.

steve.lacerda commented

I'm not sure why they are concerned about the primary key being time-based, as that really doesn't matter. The way TWCS works is by bucketing over time, so the most important thing is that your partitions are not large.
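A toy sketch of that bucketing, assuming rows are grouped purely by their write timestamp (the function and the sample keys are hypothetical, and real TWCS groups SSTables rather than individual rows):

```python
from collections import defaultdict


def group_by_window(rows, window_seconds):
    """Group primary keys by the time window of their write timestamp.

    rows: iterable of (primary_key, write_ts) pairs, write_ts in epoch seconds.
    The primary key is carried along but plays no part in choosing the window.
    """
    windows = defaultdict(list)
    for primary_key, write_ts in rows:
        windows[write_ts // window_seconds].append(primary_key)
    return dict(windows)


if __name__ == "__main__":
    rows = [
        (("alice", "2024-03", "ns", "d1"), 100),
        (("bob", "2024-01", "ns", "d9"), 200),
        (("alice", "2024-03", "ns", "d2"), 90000),
    ]
    for window, keys in sorted(group_by_window(rows, 86400).items()):
        print(window, keys)
```

Rows with unrelated keys written around the same time share a window, while rows from the same partition written a day apart land in different windows.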
