ebonfortress asked:

Poor performance when using clustering key and compression

I've encountered some strange performance characteristics while evaluating Cassandra as storage for our project. The table is created like this:

CREATE TABLE t (id INT, ts TIMESTAMP, a TEXT, b TEXT, c TEXT, PRIMARY KEY (id, ts))

It has 1.5B rows with 200M unique ids, and the text fields are quite short (< 30 characters). If compression is enabled, the performance for queries like

SELECT * FROM t WHERE id = 123 LIMIT 1

is atrocious, around 1 second. If compression is disabled, the latency is way lower (50 ms mean) but still not good. The bottleneck is IO. The cluster consists of 5 nodes with 8 cores and 32 GB RAM with spinning disks limited to 300 IOPS. Interestingly, a pure key-value workload (e.g. a table with 120M rows) works just fine, with a latency of 1 ms and 50K reads/sec.


The question is, obviously: what am I doing wrong? Is the schema incorrect, or are there some settings I should change?

performance

1 Answer

Erick Ramirez answered:

Yes, you're right. The main issue here is the spinning disks with very low IOPS.

Features such as compression are at the mercy of disk IO. Our recommendation when latency matters is to use SSDs.

We've noted from working with hundreds of customers that 3K and even 5K IOPS aren't sufficient for demanding applications. We recommend 10K IOPS for production workloads for optimum performance.

Have you considered lowering chunk_length_in_kb? If you're not reading a large amount of data, it makes sense to reduce the chunk size so there's less data to decompress on each read.
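As a rough sketch (assuming the table uses the default LZ4Compressor; 4 KB is just an illustrative value, and the right size depends on your read pattern), the chunk size can be changed with:

ALTER TABLE t WITH compression = {'class': 'LZ4Compressor', 'chunk_length_in_kb': 4};

Note that existing SSTables keep the old chunk size until they are rewritten, for example by compaction or by running nodetool upgradesstables -a on each node.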

Also try different read-ahead values for the block device on the data disk to see if it makes a difference. But in my experience, there's only so much you can do with spinning disks. Cheers!


Thanks. What's surprising is that the key-value workload works great. Is the table schema correct for the intended purpose - getting the version of the data at a specified interval, with each key having on average 6 and a maximum of 500 versions?
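(For reference, one way to express that access pattern with this schema - assuming "version at a specified interval" means the most recent row at or before a given timestamp, and using an illustrative timestamp literal - is a range query on the clustering column:

SELECT * FROM t WHERE id = 123 AND ts <= '2020-06-01 00:00:00' ORDER BY ts DESC LIMIT 1;

Declaring the table WITH CLUSTERING ORDER BY (ts DESC) would store the newest rows first within each partition, so the explicit ORDER BY wouldn't be needed.)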
