I've encountered some strange performance characteristics while evaluating Cassandra as storage for our project. The table is created like this
CREATE TABLE t (id INT, ts TIMESTAMP, a TEXT, b TEXT, c TEXT, PRIMARY KEY (id, ts))
It has 1.5B rows across 200M unique ids, and the text fields are quite short (< 30 characters). With compression enabled, the latency of queries like
SELECT * FROM t WHERE id = 123 LIMIT 1
is atrocious: around 1 second. With compression disabled the latency is much lower (50ms mean) but still not good. The bottleneck is IO. The cluster consists of 5 nodes, each with 8 cores, 32GB of RAM, and spinning disks limited to 300 IOPS. Interestingly, a pure key-value workload (e.g. a table with 120M rows) works just fine: 1ms latency at 50K reads/sec.
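I'm using the default compression settings, and I suspect the default 64KB compression chunk size means every point read has to pull in a whole chunk, which hurts badly on these disks. One thing I plan to try (the 4KB value below is a guess I haven't measured, not a recommendation) is shrinking the chunk size:

ALTER TABLE t WITH compression = {'class': 'LZ4Compressor', 'chunk_length_in_kb': 4};

As I understand it, this only affects newly written SSTables, so existing data would need to be rewritten (e.g. via nodetool upgradesstables) before the change shows up in read latency.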
The question, obviously, is: what am I doing wrong? Is the schema incorrect, or are there settings I should change?
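For completeness, one schema variant I've been considering (untested so far) clusters by ts in descending order, so that the LIMIT 1 query only has to read the newest row of a partition instead of scanning from the oldest:

CREATE TABLE t (id INT, ts TIMESTAMP, a TEXT, b TEXT, c TEXT, PRIMARY KEY (id, ts)) WITH CLUSTERING ORDER BY (ts DESC);

I don't know whether this matters in practice for partitions this small (~7–8 rows per id on average), so I'd appreciate input on that as well.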