Hi!
I'm curious whether Cassandra can perform reads faster than what I'm currently getting.
I have a single-node development cluster with 16 GB of RAM, 6 cores, and an SSD. Cassandra is allocated 8 GB of RAM.
The only table in my keyspace is the following:
CREATE TABLE test.records (
    source text,
    record_date date,
    record_time time,
    id text,
    name text,
    type text,
    z_content text,
    PRIMARY KEY ((source, record_date), record_time, id)
) WITH CLUSTERING ORDER BY (record_time ASC, id ASC)
    AND bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99PERCENTILE';
It contains one partition with 100 000 rows and a total size of ~180 MB. The data was loaded into the table via cqlsh COPY.
With the following simple code, which targets that one partition, I read the whole partition in 1.5-2 seconds:
// Assuming the DataStax Java driver 3.x:
import com.datastax.driver.core.Row;

int count = 0;
long start = System.nanoTime();
// The driver pages through the partition transparently as we iterate.
for (Row row : session.execute("SELECT * FROM test.records WHERE source='Source1' AND record_date='2021-09-29'")) {
    count++;
}
long duration = System.nanoTime() - start;
System.out.println(count + " rows got in " + duration + " nanoseconds");
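One thing I'm not sure about is the driver's paging: with the default fetch size of 5000 rows per page (in the DataStax Java driver 3.x), the 100 000-row result needs about 20 round trips. Would raising the page size along these lines help? (A sketch assuming driver 3.x; 50000 is an arbitrary value I picked for illustration.)

import com.datastax.driver.core.Row;
import com.datastax.driver.core.SimpleStatement;

// Same query, but with a larger page so the driver makes fewer round trips.
SimpleStatement stmt = new SimpleStatement(
        "SELECT * FROM test.records WHERE source='Source1' AND record_date='2021-09-29'");
stmt.setFetchSize(50000); // driver 3.x default is 5000 rows per page

int count = 0;
for (Row row : session.execute(stmt)) {
    count++;
}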
Is there any way to improve the performance of dumping a partition? I would expect it to read 5-10 times faster.
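For example, would splitting the partition read into several concurrent clustering-range queries on record_time be a sensible approach? A rough sketch of what I mean (again assuming the DataStax Java driver 3.x; the four 6-hour buckets are arbitrary and assume record_time values are spread across the day):

import com.datastax.driver.core.ResultSetFuture;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.SimpleStatement;
import java.util.ArrayList;
import java.util.List;

String base = "SELECT * FROM test.records WHERE source='Source1' AND record_date='2021-09-29'";
String[] starts = {"00:00:00", "06:00:00", "12:00:00", "18:00:00"};

// Fire one async query per record_time range; they run concurrently.
List<ResultSetFuture> futures = new ArrayList<>();
for (int i = 0; i < starts.length; i++) {
    String cql = base + " AND record_time >= '" + starts[i] + "'";
    if (i + 1 < starts.length) {
        cql += " AND record_time < '" + starts[i + 1] + "'";
    }
    futures.add(session.executeAsync(new SimpleStatement(cql)));
}

// Drain the results; each future blocks only until its own range is ready.
int count = 0;
for (ResultSetFuture f : futures) {
    for (Row row : f.getUninterruptibly()) {
        count++;
    }
}

I realize this changes the row ordering across ranges and only helps if the single node can actually serve the ranges in parallel, so I'm not sure it's the right direction.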