sam_192051 asked

What is the recommended way of dealing with the 1024-connection limit per node?

Hello - I am looking for common approaches to limiting concurrency using async operations in the Java CQL driver. I have some large write operations that occur on a lot of different partitions in what is effectively a batch.

In total, a typical job ends up doing between 45,000 and 1,000,000 CQL operations. Some of these are batched, but the majority are not suitable for batching due to the nature of the workload. Most writes are between 1 KB and 10 KB, though some are smaller by necessity.

I've split out each write operation to function somewhat as a tree, since some writes require reads, and some writes are dependent on other writes. This works fine when the number of branches is small, but when the number of branches grows (more than 100), and with it the number of dependent async operations, I get NoNodeAvailableException.

Having done some googling, this appears to be expected when the number of active operations is higher than the number of available slots configured for the connection. However, what isn't apparent is how to address the situation. In my case, it's fine for every operation to simply queue up and "wait its turn." We're using DataStax driver version 4.6.x and Cassandra 3.11, running in Spring Boot 2.3. According to the rate limiting document in the DataStax driver docs, one can use the rate limiter. I've tried a few different permutations of it, but it appears to have no effect and the same exception is still raised.

Here is the configuration I've tried to allow things to queue up (not that I planned on doing this, but just to see what happened--nothing did).

      keyspace-name: test
      schema-action: none
      contact-points: cassandra,localhost
      request:
        throttler:
          type: RATE_LIMITING
          max-queue-size: 999999999
          max-requests-per-second: 32768
          drain-interval: 1ms

I am curious what other teams are doing to combat this 1024 connections/node limitation. The docs indicate that one doesn't necessarily want to increase this limit. In fact, spring-data-cassandra doesn't even expose this configuration, so it's clearly meant to be left alone.

So far I've been able to implement a retry for the async operations. This appears to work fine, though it is a suboptimal solution, since I am putting a lot of additional load on the driver as it turns away writes with this NoNodeAvailableException.
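One client-side way to let operations "wait their turn" instead of retrying after a failure is to cap the number of in-flight async calls and queue the rest. The following is only a minimal sketch in plain Java, not the driver's built-in throttler: the class name `BoundedAsyncExecutor` and the `maxInFlight` parameter are hypothetical, and it assumes each read or write can be wrapped as a `Supplier<CompletionStage<T>>`. It never blocks the calling thread, so it can also be called from callbacks of dependent operations.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;
import java.util.function.Supplier;

/** Hypothetical helper: runs at most maxInFlight async operations at once, queueing the rest. */
final class BoundedAsyncExecutor {
    private final int maxInFlight;
    private final Queue<Runnable> pending = new ArrayDeque<>();
    private int inFlight = 0; // guarded by synchronized (this)

    BoundedAsyncExecutor(int maxInFlight) {
        this.maxInFlight = maxInFlight;
    }

    /** Starts the operation now if a slot is free, otherwise queues it. Never blocks. */
    <T> CompletableFuture<T> submit(Supplier<CompletionStage<T>> operation) {
        CompletableFuture<T> result = new CompletableFuture<>();
        Runnable task = () -> start(operation, result);
        synchronized (this) {
            if (inFlight >= maxInFlight) {
                pending.add(task); // no free slot: wait for a running operation to finish
                return result;
            }
            inFlight++;
        }
        task.run();
        return result;
    }

    private <T> void start(Supplier<CompletionStage<T>> operation, CompletableFuture<T> result) {
        CompletionStage<T> stage;
        try {
            stage = operation.get();
        } catch (RuntimeException e) {
            // Treat a supplier that throws like an operation that failed.
            CompletableFuture<T> failed = new CompletableFuture<>();
            failed.completeExceptionally(e);
            stage = failed;
        }
        stage.whenComplete((value, error) -> {
            if (error != null) {
                result.completeExceptionally(error);
            } else {
                result.complete(value);
            }
            releaseAndRunNext();
        });
    }

    private void releaseAndRunNext() {
        Runnable next;
        synchronized (this) {
            next = pending.poll();
            if (next == null) {
                inFlight--; // nothing queued: free the slot
            }
        }
        if (next != null) {
            // The finished operation's slot is handed directly to the next queued one.
            // Note: operations that complete synchronously would recurse here.
            next.run();
        }
    }
}
```

With the 4.x driver, usage would presumably look like `executor.submit(() -> session.executeAsync(statement))`, with `maxInFlight` kept below the total number of request slots available across connections. The queue here is unbounded, so a job that submits a million operations up front holds them all in memory; a bounded queue would be a reasonable refinement.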

1 Answer

Erick Ramirez answered

Based on your description, your application issues so many queries that it almost behaves like a DDoS attack on the cluster. As you already discovered, increasing max-requests-per-connection yields diminishing returns: the nodes in the cluster can only service so many requests at a time, so raising the limit brings no real benefit.

If your application really does need to fire off more requests than the total max-requests-per-connection allows, you should add nodes to your cluster to increase capacity. For example, if you have a 5-node cluster that can handle 200K transactions per second (tps), doubling the size by adding another 5 nodes also doubles the capacity of your cluster to 400K tps.

I'm going to reach out internally to the Driver engineers at DataStax and request them to respond to you directly with their thoughts. Cheers!

1 comment

I would just use the throttling that is built into the driver: