stephan_178911 asked Erick Ramirez answered

How can I improve the read throughput for individual queries to single nodes?

I'm evaluating Cassandra as a binary object store. In my experiments, read throughput for a single query against a single node is not good enough. Even on a 2018 15-inch MacBook Pro, reading a 128 MiB blob in 64 KiB chunks with the Cassandra C++ driver from a single node over the loopback interface takes about 300 ms. I was hoping to saturate a 10 Gbit/s port on a fast enough server with single requests for large blobs. Are there any C++ driver, Cassandra, or Java VM tuning options that could improve read throughput in this scenario?
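To put those figures in perspective, here is a rough back-of-the-envelope sketch of the arithmetic. It uses only the numbers quoted above (128 MiB blob, 64 KiB chunks, about 300 ms). The function and constant names are made up for illustration and are not part of any driver API.

```cpp
#include <cassert>
#include <cstdint>

// Figures from the question: one 128 MiB blob stored as 64 KiB chunks.
constexpr std::uint64_t kBlobBytes  = 128ull * 1024 * 1024;
constexpr std::uint64_t kChunkBytes = 64ull * 1024;

// Number of chunk rows needed to hold the blob (rounded up).
constexpr std::uint64_t chunk_count(std::uint64_t blob_bytes,
                                    std::uint64_t chunk_bytes) {
    return (blob_bytes + chunk_bytes - 1) / chunk_bytes;
}

// Achieved throughput in decimal Gbit/s for `bytes` moved in `seconds`.
double throughput_gbit_per_s(std::uint64_t bytes, double seconds) {
    assert(seconds > 0.0);
    return static_cast<double>(bytes) * 8.0 / seconds / 1e9;
}

// Average time per chunk in microseconds, assuming the chunks
// are fetched strictly one after another.
double micros_per_chunk(std::uint64_t chunks, double seconds) {
    assert(chunks > 0);
    return seconds * 1e6 / static_cast<double>(chunks);
}
```

With the numbers above, `chunk_count(kBlobBytes, kChunkBytes)` gives 2048 chunks, `throughput_gbit_per_s(kBlobBytes, 0.3)` comes out at roughly 3.6 Gbit/s (about 427 MiB/s), and `micros_per_chunk(2048, 0.3)` is roughly 146 µs per chunk. That is well short of what a 10-gigabit link can carry, assuming that is the kind of port meant here.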

The blob table has the following schema:

CREATE TABLE blobs.blob (
    id uuid,
    chunk_id int,
    blob_size int,
    creation_time timestamp,
    data blob,
    data_type text,
    hash blob,
    info text,
    update_time timestamp,
    PRIMARY KEY (id, chunk_id)
) WITH bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'enabled': 'false'}
    AND crc_check_chance = 0.0
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99PERCENTILE';


1 Answer

Erick Ramirez answered

@stephan_178911 your test isn't ideal because the client and the cluster are on the same host, so they compete for the same resources (CPU, disk, network IO).

We recommend that you have at least 3 nodes in your cluster (on separate servers) and run the application on a completely separate server. You will find that increasing the number of app instances (even when they all run on the same server) also increases the throughput of your cluster significantly, all else being equal. Cheers!
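As a hedged illustration of spreading the work over several app instances (or threads), here is a minimal sketch that divides a blob's chunk ids into contiguous ranges, one per worker. The helper name `split_chunks` is hypothetical and not part of the C++ driver. Whether this actually helps in your setup is something you would need to measure.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Split chunk ids [0, total_chunks) into `workers` contiguous half-open
// ranges [first, last). Range sizes differ by at most one chunk.
std::vector<std::pair<std::uint32_t, std::uint32_t>>
split_chunks(std::uint32_t total_chunks, std::uint32_t workers) {
    assert(workers > 0);
    std::vector<std::pair<std::uint32_t, std::uint32_t>> ranges;
    ranges.reserve(workers);
    const std::uint32_t base  = total_chunks / workers;
    const std::uint32_t extra = total_chunks % workers;  // first `extra` ranges get one more chunk
    std::uint32_t first = 0;
    for (std::uint32_t w = 0; w < workers; ++w) {
        const std::uint32_t size = base + (w < extra ? 1u : 0u);
        ranges.emplace_back(first, first + size);
        first += size;
    }
    return ranges;
}
```

For a 2048-chunk blob and 3 workers this yields the ranges [0, 683), [683, 1366) and [1366, 2048). Because `chunk_id` is the clustering column in your schema, each worker could read its share with a range query such as `WHERE id = ? AND chunk_id >= ? AND chunk_id < ?`.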
