Bringing together the Apache Cassandra experts from the community and DataStax.




vpanarin84_192084 asked

Can read performance of a single partition on a single-node cluster be improved?


I'm curious whether Cassandra can perform reads faster than what I'm currently seeing.

I have a single-node development cluster with 16 GB RAM, 6 cores and an SSD. Cassandra is allocated 8 GB of RAM.

The only table in my keyspace is the following:

CREATE TABLE test.records (
    source text,
    record_date date,
    record_time time,
    id text,
    name text,
    type text,
    z_content text,
    PRIMARY KEY ((source, record_date), record_time, id)
) WITH bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '64', 'class': ''}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99PERCENTILE';

It contains 1 partition of 100,000 rows, ~180 MB in total. The data was loaded into the table via CQLSH COPY.

With the following simple code, targeting that single partition, I get the whole partition's data in 1.5-2 seconds:

int count = 0;
long start = System.nanoTime();
for (Row row : session.execute("SELECT * FROM test.records WHERE source='Source1' AND record_date='2021-09-29'")) {
    count++;
}
long duration = System.nanoTime() - start;
System.out.println(count + " rows got in " + duration + " nanoseconds");

Is there a chance to improve the performance of dumping a partition? I would expect it to read 5-10 times faster.
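For reference, the figures above work out to the following effective throughput (a quick back-of-the-envelope sketch using the ~180 MB and 1.5-2 s measurements from this post):

```java
public class ThroughputCheck {
    // effective throughput in MB/s, given a dump size in MB and elapsed seconds
    static double mbPerSec(double sizeMb, double seconds) {
        return sizeMb / seconds;
    }

    public static void main(String[] args) {
        // figures from the measurement above: ~180 MB dumped in 1.5-2 seconds
        System.out.printf("Effective throughput: %.0f-%.0f MB/s%n",
                mbPerSec(180.0, 2.0), mbPerSec(180.0, 1.5)); // 90-120 MB/s
    }
}
```

That 90-120 MB/s range is the baseline any tuning would need to beat.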

[attachment: read latency]

1 Answer

Erick Ramirez answered

In the scenario you described, Cassandra is hardly doing any work. The performance is really constrained by the physics of reading from the disk.

When you're iterating over the rows within a partition, all that happens is a scan over contiguous sectors of the data disk; there aren't any disk seeks involved.

If you updated your code to fetch larger pages of rows (the default page size is 5000), it might provide a slight improvement, with larger chunks being read off the disk, but it will still be constrained by the speed of your data disk. Cheers!
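To illustrate why a larger page size can only help so much: with N rows and page size P, the driver makes roughly ceil(N/P) paging round trips to the node, and each round trip adds latency on top of the disk read. A minimal sketch of that arithmetic (in the Java driver 3.x the page size itself is set with `Statement.setFetchSize(int)`):

```java
public class PagingMath {
    // number of paging round trips the driver makes: ceil(rows / pageSize)
    static long roundTrips(long rows, long pageSize) {
        return (rows + pageSize - 1) / pageSize;
    }

    public static void main(String[] args) {
        long rows = 100_000; // rows in the partition from the question
        System.out.println(roundTrips(rows, 5_000));  // default page size: 20 trips
        System.out.println(roundTrips(rows, 50_000)); // larger pages: 2 trips
    }
}
```

Going from 20 round trips to 2 saves some per-request latency, but the bulk of the time is still spent streaming ~180 MB off the disk, which is why the improvement is only slight.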

2 comments

Erick, thank you!

Yes, I understand that reading is restricted by disk performance. What I'm curious about is Cassandra's read performance in this scenario. Since I don't see any disk access limit being hit, I'm trying to find out whether Cassandra reads rows from disk without utilizing the available disk speed. Maybe some node settings need to be changed for this, but I can't find them. Using a larger pageSize doesn't change the performance.

@Erick Ramirez Changing the pageSize didn't improve the performance. I don't see any limit (disk, network, CPU) being hit, so it looks like the performance is limited by some setting. Is there any setting in Cassandra which limits CPU or disk utilization? Currently it doesn't go above 67% of 1 core of 4, i.e. the CPU has spare capacity that Cassandra could utilize.