Bringing together the Apache Cassandra experts from the community and DataStax.

ebonfortress asked:

Reducing compactions after bulk loading

I have a table that looks like this:

CREATE TABLE t (
    id INT,
    ts TIMESTAMP,
    /* 90 other small fields */,
    PRIMARY KEY (id, ts)
) WITH CLUSTERING ORDER BY (ts DESC)
  AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy',
                    'max_threshold': '32',
                    'min_threshold': '4'};

The table is meant to be queried as:

SELECT * FROM t WHERE id = 1234 LIMIT 1;

and:

SELECT * FROM t WHERE id = 1234 AND ts < some_ts LIMIT 1;

The table has 1.5B rows across 200M unique ids and is populated with sstableloader from some source data. After the data is loaded into Cassandra, the cluster starts compacting it, which takes quite some time. So, two questions:

1. Is the data model correct for the intended purpose — write once, read many, fetching either the latest version of a row or the version as of a given timestamp?

2. Could something be done on the CQLSSTableWriter side to reduce compactions? I have full control over how the data is written.
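For context, CQLSSTableWriter's builder has a sorted() option: if rows are fed in the partitioner's token order, the writer can stream them straight to disk instead of buffering and re-sorting. A minimal sketch of that API — the keyspace/table names, the extra column v, and the /tmp/sstables path are placeholders, and it assumes the cassandra-all jar on the classpath:

```java
import org.apache.cassandra.io.sstable.CQLSSTableWriter;

public class BulkWrite {
    public static void main(String[] args) throws Exception {
        // Schema and insert statement mirroring the table in the question
        // (reduced to one payload column "v" for brevity).
        String schema = "CREATE TABLE ks.t (id int, ts timestamp, v text, "
                      + "PRIMARY KEY (id, ts)) WITH CLUSTERING ORDER BY (ts DESC)";
        String insert = "INSERT INTO ks.t (id, ts, v) VALUES (?, ?, ?)";

        CQLSSTableWriter writer = CQLSSTableWriter.builder()
                .inDirectory("/tmp/sstables")   // hypothetical output directory
                .forTable(schema)
                .using(insert)
                .sorted()   // caller promises rows arrive in token order
                .build();

        // Rows for the same id must be written contiguously; note that with
        // Murmur3Partitioner, token order is NOT the same as numeric id order.
        writer.addRow(1234, new java.util.Date(), "value");
        writer.close();
    }
}
```

Whether pre-sorted input also changes how much compaction happens after loading is a separate question from write-side performance, which is what the answer below addresses.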

compaction

1 Answer

Erick Ramirez answered:

Compactions are part of the normal operation of a Cassandra node.

When SSTables are identified as candidates based on the compaction strategy, Cassandra runs compaction on them. There isn't anything wrong with this, and assuming that you have to "reduce" or avoid compactions is an incorrect conclusion. Cheers!
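If you want to observe that normal behavior rather than fight it, nodetool exposes compaction progress and a throughput knob. A sketch of a few relevant commands — ks.t is a placeholder for your keyspace and table:

```shell
# Show active compactions and the pending-task backlog on this node
nodetool compactionstats

# Per-table SSTable count and pending compactions
nodetool tablestats ks.t

# Compaction can be throttled rather than avoided; value is in MB/s,
# 0 means unthrottled
nodetool setcompactionthroughput 64
```

Watching the pending count drain after a bulk load is usually enough to confirm the node is simply working through the expected post-load compaction, not misbehaving.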
