vpanarin84_192084 asked Erick Ramirez edited

Why are my async queries not executed in parallel?


I'm using Cassandra Java Driver 4.2.0 to perform multiple asynchronous SELECT queries with the following code:

for (int i = 1; i <= 20; i++) {
  String currentKey = "key_"+i;
  SimpleStatement stmt = SimpleStatement.newInstance("SELECT * FROM mykeyspace.table1 WHERE "+
  long start = System.nanoTime();
  CompletableFuture<?> f = session.executeAsync(stmt).toCompletableFuture();
  f.thenRunAsync(() -> System.out.println(currentKey + " SELECT duration: " + (System.nanoTime() - start)));
}

The my_key column is the partition key for table1, and each query returns 10,000 to 20,000 rows.

The output (from the last line of the code above) shows that the measured duration grows from one query to the next, which suggests the results are not retrieved in parallel: while the result of the first query is being transferred to my machine, the result of the second query is not.

My assumption is that this happens because a single network connection to Cassandra is being used. If that is correct, is there a way to use multiple network connections to run the queries and receive the results in parallel? Does the Java driver provide such an option?

java driver

1 Answer

Erick Ramirez answered Erick Ramirez commented

The nature of a for loop is that the code inside it gets executed one at a time, not in parallel.

If you want parallelism to maximise the throughput of your cluster, we recommend that you create multiple app instances so you have multiple clients sending requests to the cluster in parallel. Cheers!


vpanarin84_192084 commented ·

Erick, thank you.

However, the `for` loop in my example only issues asynchronous requests, which are still executing after the loop has ended. The requests complete in a random order, not necessarily the order in which I issued them in the loop.
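The point about the loop can be reproduced without Cassandra. The following is only a minimal sketch: a plain JDK thread pool stands in for the driver, and `AsyncLoopDemo`, `pendingWhenLoopEnds`, and `pause` are hypothetical names that are not part of the driver API. It counts how many asynchronous tasks are still pending right after the loop that started them has finished.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class AsyncLoopDemo {

    // Starts `requests` async tasks in a for loop and returns how many of them
    // were still pending at the moment the loop ended.
    static int pendingWhenLoopEnds(int requests, long workMillis) {
        // Assumption: one worker thread per task, so tasks do not queue behind each other.
        ExecutorService pool = Executors.newFixedThreadPool(requests);
        List<CompletableFuture<Void>> futures = new ArrayList<>();
        try {
            for (int i = 1; i <= requests; i++) {
                // runAsync returns immediately; the loop does not wait for the task.
                futures.add(CompletableFuture.runAsync(() -> pause(workMillis), pool));
            }
            // The loop is over; count the tasks that have not completed yet.
            int pending = 0;
            for (CompletableFuture<Void> f : futures) {
                if (!f.isDone()) {
                    pending++;
                }
            }
            // Wait for everything before shutting the pool down.
            CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
            return pending;
        } finally {
            pool.shutdown();
        }
    }

    // Simulates work that takes some time, such as waiting for a response.
    static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Pending after loop: " + pendingWhenLoopEnds(5, 300));
    }
}
```

If the tasks take noticeably longer than the loop itself, the count should be greater than zero, which is consistent with a loop that only starts requests and does not wait for them.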

When each request completes, its duration is measured in a separate thread. It looks like receiving one query result blocks the others, because the duration of each request is the duration of the previously completed request plus some additional time.
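The pattern of growing durations can be illustrated with a small simulation. This is a sketch under stated assumptions, not a description of the driver's internals: a single-threaded executor plays the role of one channel that delivers results one at a time, and `SerialResultDemo`, `measure`, and `transferMillis` are hypothetical names. All tasks are submitted almost at once, yet each measured duration includes the time spent waiting for the earlier results.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class SerialResultDemo {

    // Submits `requests` tasks at once and returns each task's duration in
    // milliseconds, measured from submission to completion.
    static List<Long> measure(int requests, long transferMillis) {
        // Assumption: one thread models a single channel that handles one result at a time.
        ExecutorService singleChannel = Executors.newSingleThreadExecutor();
        List<CompletableFuture<Long>> futures = new ArrayList<>();
        try {
            for (int i = 0; i < requests; i++) {
                long start = System.nanoTime(); // taken when the request is issued
                futures.add(CompletableFuture.supplyAsync(() -> {
                    pause(transferMillis); // simulated transfer of one result
                    return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
                }, singleChannel));
            }
            List<Long> durations = new ArrayList<>();
            for (CompletableFuture<Long> f : futures) {
                durations.add(f.join());
            }
            return durations;
        } finally {
            singleChannel.shutdown();
        }
    }

    // Simulates the time needed to transfer one result.
    static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Durations (ms): " + measure(5, 50));
    }
}
```

Because the clock for every task starts at submission, later tasks report roughly the sum of all earlier transfer times plus their own. That matches the observation above, although it does not prove where the serialization happens in a real deployment.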

I think this is because only one Netty connection is being used, i.e. all results are received via one socket. I've tried to maximize throughput by increasing the connection pool size, but it hasn't changed the picture at all.

Erick Ramirez ♦♦ commented ·

Correct, there's just one connection so the queries are not really parallel in the true sense. In any case, you wouldn't do a for loop in a real system.

Again, you need multiple app instances if you want to maximise throughput. Cheers!
