Bringing together the Apache Cassandra experts from the community and DataStax.


question

lewis_142690 asked

30k lines read/s

When reading from a single client, the throughput always seems to be ~30k lines/s. This happens when reading from a 6-node cluster with 8 vCPUs per node, and I get the same result on a single machine with 16 cores / 32 threads, whether the client is on the same machine as the server, on another machine on the same LAN, or even across an internet connection. If I start multiple instances of the reading client in parallel (on the same client machine), I can easily read more than 60k lines/s from the single-machine server.


It seems to me that I should be able to read around 60-100k lines/s without too much trouble. Is there a specific configuration parameter I should look into to fix this?


Note that there is only a single `select * from table` query running.

performance
5 comments

@lewis_142690 what did you mean by "If I start the reading client multiple times"? I'm not sure what you mean so I'm having difficulty answering your question. Cheers!


From a single client machine, I start multiple instances of my Node.js client, which reads from a table using `select * from table` with the streaming API. This just shows that the server is able to provide more than 30k lines/s. The client itself does nothing but deserialize the data and write it to output, which is redirected to `/dev/null`.


@lewis_142690 yes, multiple client instances will be more performant. Single process clients send requests sequentially so the throughput gets capped. Multiple clients send multiple queries to the cluster in parallel so you get better throughput. Cheers!


Also note that some other tables are able to provide more throughput (50k/s)


1 Answer

Erick Ramirez answered

@lewis_142690 multiple client instances will be more performant. Single process clients send requests sequentially so the throughput gets "capped". Multiple clients send multiple queries to the cluster in parallel so you get better throughput. Cheers!
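
One way to get that kind of parallelism from a single process, instead of launching several client instances, might be to split the full-table scan into token-range sub-queries and run them concurrently. The sketch below only generates the CQL strings. It assumes the cluster uses the default Murmur3Partitioner (token range -2^63 to 2^63-1), and the names `splitTokenRing` and `buildRangeQueries`, as well as the table and key names in the usage note, are hypothetical placeholders rather than anything from the thread.

```javascript
// Assumption: Murmur3Partitioner, whose tokens are signed 64-bit integers.
const MIN_TOKEN = -(2n ** 63n);
const MAX_TOKEN = 2n ** 63n - 1n;

// Split the whole token ring into `parts` contiguous ranges.
// Each range covers (start, end]; the first one also includes MIN_TOKEN.
function splitTokenRing(parts) {
  const span = MAX_TOKEN - MIN_TOKEN;
  const ranges = [];
  let start = MIN_TOKEN;
  for (let i = 1; i <= parts; i++) {
    // Use MAX_TOKEN for the last range so integer division cannot leave a gap.
    const end = i === parts
      ? MAX_TOKEN
      : MIN_TOKEN + (span * BigInt(i)) / BigInt(parts);
    ranges.push({ start, end });
    start = end;
  }
  return ranges;
}

// Build one SELECT per token range. `table` and `partitionKey` are
// caller-supplied; for a composite partition key pass e.g. 'a, b'.
function buildRangeQueries(table, partitionKey, parts) {
  return splitTokenRing(parts).map(({ start, end }, i) => {
    const lower = i === 0 ? '>=' : '>';
    return `SELECT * FROM ${table} WHERE token(${partitionKey}) ${lower} ${start} ` +
           `AND token(${partitionKey}) <= ${end}`;
  });
}
```

For example, `buildRangeQueries('my_keyspace.my_table', 'id', 8)` returns eight query strings that could each be passed to the driver's streaming call at the same time. Whether this actually raises throughput depends on where the bottleneck is: if the client is CPU-bound on deserialization, a single Node.js process may still be limited by its one JavaScript thread.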

1 comment

I'm using the Node.js client streaming API and I'm running a single `select * from table`. Doesn't the client asynchronously load multiple pages to speed things up, or would I have to do paging manually to increase performance?
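
On the paging part of this question: each page of a query result carries the paging state needed to request the next page, so the pages of one query cannot be requested in parallel. What manual paging can do is overlap the fetch of the next page with the processing of the current one. Below is a minimal sketch of that idea. It assumes `client` behaves like the Node.js driver's `Client`, where `execute(query, params, options)` resolves to a result with `rows` and `pageState` and accepts `prepare`, `fetchSize` and `pageState` options; `readWithPrefetch` and `onRow` are hypothetical names.

```javascript
// Sketch: manual paging with a one-page prefetch.
// Assumption: client.execute(query, params, options) resolves to
// { rows, pageState }, with pageState empty on the last page.
async function readWithPrefetch(client, query, onRow, fetchSize = 5000) {
  const fetchPage = (pageState) =>
    client.execute(query, [], { prepare: true, fetchSize, pageState });

  let total = 0;
  let pending = fetchPage(undefined); // request the first page
  while (pending) {
    const result = await pending;
    // Ask for the next page before handling this one, so the network
    // round trip overlaps with row processing on the client.
    pending = result.pageState ? fetchPage(result.pageState) : null;
    for (const row of result.rows) {
      onRow(row);
      total++;
    }
  }
  return total; // number of rows handed to onRow
}
```

This keeps at most one extra page in flight, which bounds memory use. It only helps if the time spent waiting for pages is a meaningful share of the total; it does not make a CPU-bound client faster.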
