
stephan_178911 asked:

How can I improve the read throughput for individual queries to single nodes?

I'm evaluating Cassandra as a binary object store. In my experiments, read throughput for single queries to single nodes isn't good enough. Even on a 15-inch MacBook Pro (2018), reading a 128 MiB blob in 64 KiB chunks with the Cassandra C++ driver from a single node over the loopback interface takes about 300 ms. I was hoping to saturate a 10 Gbit/s (10 GbE) port on a fast enough server with single requests for large blobs. Are there any C++ driver, Cassandra, or Java VM tuning options that could improve read throughput in this scenario?
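A quick back-of-the-envelope check of these numbers helps frame the gap. This is a minimal sketch, assuming a 10 Gbit/s line rate with protocol overhead ignored and the chunks being read strictly one after another; the helper names (`chunk_count`, `gbit_per_s`, `seconds_at_line_rate`, `print_budget`) are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstdio>

constexpr double kKiB = 1024.0;
constexpr double kMiB = 1024.0 * 1024.0;

// Number of fixed-size rows needed to store a blob of `blob_bytes`.
double chunk_count(double blob_bytes, double chunk_bytes) {
    assert(chunk_bytes > 0.0);
    return blob_bytes / chunk_bytes;
}

// Achieved throughput in Gbit/s (decimal gigabits, the unit links are rated in).
double gbit_per_s(double bytes, double seconds) {
    assert(seconds > 0.0);
    return bytes * 8.0 / seconds / 1e9;
}

// Time to move `bytes` at a line rate of `gbit` Gbit/s, ignoring protocol overhead.
double seconds_at_line_rate(double bytes, double gbit) {
    assert(gbit > 0.0);
    return bytes * 8.0 / (gbit * 1e9);
}

// Prints the budget for the case in the question: 128 MiB blob, 64 KiB chunks, ~300 ms.
void print_budget() {
    const double blob = 128 * kMiB;
    const double chunk = 64 * kKiB;
    const double observed_s = 0.300;
    const double n = chunk_count(blob, chunk);                 // number of rows per blob
    const double target_s = seconds_at_line_rate(blob, 10.0);  // time per blob at 10 Gbit/s
    std::printf("chunks: %.0f\n", n);
    std::printf("observed: %.2f Gbit/s, %.1f us per chunk\n",
                gbit_per_s(blob, observed_s), observed_s / n * 1e6);
    std::printf("10 Gbit/s target: %.1f ms per blob, %.1f us per chunk\n",
                target_s * 1e3, target_s / n * 1e6);
}
```

With the question's figures, `print_budget()` reports 2048 chunks and about 3.58 Gbit/s observed (roughly 146.5 µs per chunk), versus about 107.4 ms per blob (roughly 52.4 µs per chunk) needed to fill a 10 Gbit/s link. If the chunks are fetched sequentially, each round trip would have to be almost three times faster.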

The blob table has the following schema:

CREATE TABLE blobs.blob (
    id uuid,
    chunk_id int,
    blob_size int,
    creation_time timestamp,
    data blob,
    data_type text,
    hash blob,
    info text,
    update_time timestamp,
    PRIMARY KEY (id, chunk_id)
) WITH CLUSTERING ORDER BY (chunk_id ASC)
    AND bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'enabled': 'false'}
    AND crc_check_chance = 0.0
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99PERCENTILE';
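To make the row layout concrete, here is a minimal sketch, assuming each 64 KiB chunk of a blob is stored as one row in the blob's `id` partition, with `chunk_id` counting up from 0. The `Chunk` struct and both functions are hypothetical helpers, not part of the Cassandra C++ driver; they only show how a blob maps onto `(id, chunk_id)` rows and back.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical in-memory stand-in for one row of blobs.blob: the clustering key
// chunk_id plus the `data` column. Every chunk of a blob shares the same `id`.
struct Chunk {
    std::int32_t chunk_id;
    std::vector<std::uint8_t> data;
};

// Split a blob into fixed-size chunks numbered 0, 1, 2, ... (assumed numbering).
// Every chunk is `chunk_size` bytes except possibly the last one.
std::vector<Chunk> split_into_chunks(const std::vector<std::uint8_t>& blob, std::size_t chunk_size) {
    assert(chunk_size > 0);
    std::vector<Chunk> chunks;
    for (std::size_t off = 0, id = 0; off < blob.size(); off += chunk_size, ++id) {
        const std::size_t len = std::min(chunk_size, blob.size() - off);
        const auto first = blob.begin() + static_cast<std::ptrdiff_t>(off);
        chunks.push_back({static_cast<std::int32_t>(id),
                          std::vector<std::uint8_t>(first, first + static_cast<std::ptrdiff_t>(len))});
    }
    return chunks;
}

// Reassemble a blob. A single-partition SELECT returns rows in chunk_id order
// (CLUSTERING ORDER BY chunk_id ASC), but chunks fetched by separate requests
// can complete in any order, so sort by chunk_id before concatenating.
std::vector<std::uint8_t> reassemble(std::vector<Chunk> chunks) {
    std::sort(chunks.begin(), chunks.end(),
              [](const Chunk& a, const Chunk& b) { return a.chunk_id < b.chunk_id; });
    std::vector<std::uint8_t> blob;
    for (const auto& c : chunks) blob.insert(blob.end(), c.data.begin(), c.data.end());
    return blob;
}
```

For any byte vector `b`, `reassemble(split_into_chunks(b, 64 * 1024))` returns the original bytes, regardless of the order in which the chunks are passed in.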



Tags: cassandra, performance, c++

1 Answer

Erick Ramirez answered:

@stephan_178911 your test isn't ideal because the client and the cluster run on the same host, so they compete for the same resources (CPU, disk, and network I/O).

We recommend that you have at least 3 nodes in your cluster (on separate servers) and run the application on a completely separate server. You will find that increasing the number of app instances (even when they all run on the same server) also increases the throughput of your cluster significantly, all other things being equal. Cheers!
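The point that more concurrent clients raise throughput also applies inside a single client: instead of reading the 64 KiB chunks one after another, keep several chunk requests in flight. The sketch below shows that idea with `std::async`. `FetchChunk` is a hypothetical stand-in for one single-row read (for example `SELECT data FROM blobs.blob WHERE id = ? AND chunk_id = ?`), and `read_blob_concurrently` is an illustrative helper, not a driver API. With the DataStax C++ driver, a similar effect can usually be had without extra threads by issuing several `cass_session_execute()` calls before waiting on their futures.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <future>
#include <vector>

// Hypothetical single-chunk read; must be safe to call from several threads at once.
using FetchChunk = std::function<std::vector<std::uint8_t>(std::int32_t chunk_id)>;

// Read chunks [0, num_chunks) with `workers` concurrent tasks. Worker w handles
// chunk_ids w, w + workers, w + 2 * workers, ..., so requests are spread evenly.
// Each chunk lands in its own slot, so completion order does not matter.
std::vector<std::uint8_t> read_blob_concurrently(const FetchChunk& fetch,
                                                 std::int32_t num_chunks, int workers) {
    assert(num_chunks >= 0 && workers > 0);
    std::vector<std::vector<std::uint8_t>> slots(static_cast<std::size_t>(num_chunks));
    std::vector<std::future<void>> tasks;
    for (int w = 0; w < workers; ++w) {
        tasks.push_back(std::async(std::launch::async, [&, w] {
            for (std::int32_t id = w; id < num_chunks; id += workers)
                slots[static_cast<std::size_t>(id)] = fetch(id);  // distinct slot per chunk
        }));
    }
    for (auto& t : tasks) t.get();  // waits for all workers and rethrows any exception
    std::vector<std::uint8_t> blob;
    for (const auto& s : slots) blob.insert(blob.end(), s.begin(), s.end());
    return blob;
}
```

The number of workers is a tuning knob: too few leaves the link idle between round trips, while too many can overload a single node. On some platforms the program must be built with `-pthread` for `std::async` to start threads.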
