Bringing together the Apache Cassandra experts from the community and DataStax.

stephan_178911 asked:

How can I improve the read throughput for individual queries to single nodes?

I'm evaluating Cassandra as a binary object store, but in my experiments the read throughput for a single query against a single node isn't good enough. Even on a 2018 15-inch MacBook Pro, reading a 128 MiB blob in 64 KiB chunks with the Cassandra C++ driver from a single node over the loopback interface takes about 300 ms. I was hoping to saturate a 10 Gbit/s port on a fast enough server with single requests for large blobs. Are there any C++ driver, Cassandra, or JVM tuning options that could improve read throughput in this scenario?
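To put the numbers above in perspective, here is a rough back-of-the-envelope sketch. It assumes the target is a 10 Gbit/s network link; the helper names `mib_per_second`, `gbit_per_second`, and `chunk_count` are made up for illustration and are not part of any driver API. With the figures from the question, 128 MiB in 300 ms works out to roughly 427 MiB/s, or about 3.6 Gbit/s, which is a bit over a third of such a link, and the blob occupies 2048 rows of 64 KiB each.

```cpp
#include <cassert>
#include <cstdint>

// Throughput in MiB/s for `bytes` transferred in `millis` milliseconds.
inline double mib_per_second(std::uint64_t bytes, double millis) {
    assert(millis > 0.0);
    return (static_cast<double>(bytes) / (1024.0 * 1024.0)) / (millis / 1000.0);
}

// Convert MiB/s to Gbit/s (decimal gigabits, as network links are usually rated).
inline double gbit_per_second(double mib_per_s) {
    return mib_per_s * 1024.0 * 1024.0 * 8.0 / 1e9;
}

// Number of chunk rows needed to hold a blob, rounding up.
inline std::uint64_t chunk_count(std::uint64_t blob_bytes, std::uint64_t chunk_bytes) {
    assert(chunk_bytes > 0);
    return (blob_bytes + chunk_bytes - 1) / chunk_bytes;
}
```

For example, `gbit_per_second(mib_per_second(128ull << 20, 300.0))` evaluates to roughly 3.58.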

The blob table has the following schema:

CREATE TABLE blobs.blob (
    id uuid,
    chunk_id int,
    blob_size int,
    creation_time timestamp,
    data blob,
    data_type text,
    hash blob,
    info text,
    update_time timestamp,
    PRIMARY KEY (id, chunk_id)
) WITH CLUSTERING ORDER BY (chunk_id ASC)
    AND bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'enabled': 'false'}
    AND crc_check_chance = 0.0
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99PERCENTILE';



Tags: cassandra, performance, c++

1 Answer

Erick Ramirez answered:

@stephan_178911 your test isn't ideal because the client and the cluster are on the same host, so they compete for the same resources (CPU, disk, network I/O).

We recommend at least 3 nodes in your cluster (on separate servers), with the application running on a completely separate server. You will also find that increasing the number of app instances (even when they all run on the same server) significantly increases the throughput of your cluster, all else being equal. Cheers!
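As a rough illustration of spreading the load over several app instances (or several concurrent requests), one possible approach is to give each worker a contiguous range of `chunk_id` values for the same blob. The sketch below only computes the ranges; `ChunkRange` and `split_chunks` are hypothetical names, and how each worker actually issues its query is left to the driver. It assumes chunks are numbered from 0, which the question does not state.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Half-open range [first, last_exclusive) of chunk_id values for one worker.
struct ChunkRange {
    std::int32_t first;
    std::int32_t last_exclusive;
};

// Split chunk ids 0..total_chunks-1 into at most `workers` contiguous ranges
// whose sizes differ by no more than one chunk.
std::vector<ChunkRange> split_chunks(std::int32_t total_chunks, std::int32_t workers) {
    assert(total_chunks >= 0 && workers > 0);
    std::vector<ChunkRange> ranges;
    const std::int32_t base = total_chunks / workers;
    const std::int32_t extra = total_chunks % workers;
    std::int32_t begin = 0;
    for (std::int32_t w = 0; w < workers; ++w) {
        const std::int32_t len = base + (w < extra ? 1 : 0);
        if (len == 0) break;  // more workers than chunks: stop early
        ranges.push_back({begin, begin + len});
        begin += len;
    }
    return ranges;
}
```

Each range could then back a query along the lines of `SELECT chunk_id, data FROM blobs.blob WHERE id = ? AND chunk_id >= ? AND chunk_id < ?`. For a 2048-chunk blob and 4 workers, this yields four ranges of 512 chunks each.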
