
nahar.tarun86_182389 asked Erick Ramirez answered

Spark job getting "BusyPoolException: Pool is busy (no available connection and the queue has reached its max size 256)"

We have a Spark job that runs on Skylake and Cascade Lake machines. The executor configuration on both clusters is the same (same memory and cores per executor). However, on the Skylake cluster the job runs fine and completes early, while on the Cascade Lake cluster it takes 2 to 3 times longer to complete.

My cluster is a standalone Spark cluster running Spark 2.4.4 (not YARN or Mesos as the resource manager). There are 3 slaves, one on each node/machine, plus one primary master and one standby master.

When I look at the debug logs on the Cascade Lake machines, I see lots of errors like these:

com.datastax.driver.core.exceptions.BusyPoolException: [host3_priv/] Pool is busy (no available connection and the queue has reached its max size 256)

com.datastax.driver.core.exceptions.BusyPoolException: [host1_priv/] Pool is busy (no available connection and the queue has reached its max size 256)

com.datastax.driver.core.exceptions.BusyPoolException: [host2_priv/] Pool is busy (no available connection and the queue has reached its max size 256)

My Spark Cassandra connector version is 2.4.2.

I saw the post https://community.datastax.com/questions/3749/cassandra-pool-is-busy-no-available-connection-and.html, which discusses this problem and suggests two options, (a) and (b).

For option (a), the fix is available in version 2.0.4 and above, as mentioned in https://datastax-oss.atlassian.net/browse/SPARKC-503?page=com.atlassian.streams.streams-jira-plugin%3Aactivity-stream-issue-tab

But I am still seeing this issue. Is there a solution to this problem?

spark-cassandra-connector

1 Answer

Erick Ramirez answered

It doesn't sound like the Spark connector bug you referenced applies to you. In my experience, the most common cause of the BusyPoolException is your cluster being overloaded.

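Requests to each node go through a connection pool. When no connection is available, new requests wait in a queue that holds up to 256 requests by default (the "max size 256" in your error message). When a request to the cluster completes, the connection is freed up to handle the next request.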

Typically, this happens very quickly. But when you (a) run complex queries which take a long time, or (b) overload the cluster with more queries than it can handle, the pool runs out of available connections, the queue fills up, and the BusyPoolException gets thrown.
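To make the mechanism concrete, here is a toy model in Python. It is only a sketch and not the driver's actual implementation: `ToyPool`, `BusyPoolError` and `count_rejections` are hypothetical names, and the capacity and per-tick rates are made-up numbers. It shows that requests only get rejected when they arrive faster than they complete for long enough to fill the queue.

```python
class BusyPoolError(Exception):
    """Stand-in for the driver's BusyPoolException in this toy model."""


class ToyPool:
    """Hypothetical bounded pool: a fixed in-flight capacity plus a bounded queue."""

    def __init__(self, capacity, max_queue_size=256):
        self.capacity = capacity              # requests that can be in flight at once
        self.max_queue_size = max_queue_size  # requests allowed to wait for a slot
        self.in_flight = 0
        self.queued = 0

    def submit(self):
        if self.in_flight < self.capacity:
            self.in_flight += 1               # a slot is free, send immediately
        elif self.queued < self.max_queue_size:
            self.queued += 1                  # no free slot, wait in the queue
        else:
            raise BusyPoolError("pool is busy and the queue is full")

    def complete(self):
        """One in-flight request finished; a queued request (if any) takes its slot."""
        if self.in_flight == 0:
            return
        if self.queued > 0:
            self.queued -= 1                  # in_flight stays the same
        else:
            self.in_flight -= 1


def count_rejections(pool, submits_per_tick, completions_per_tick, ticks):
    """Count how many requests are rejected over a number of time steps."""
    rejected = 0
    for _ in range(ticks):
        for _ in range(submits_per_tick):
            try:
                pool.submit()
            except BusyPoolError:
                rejected += 1
        for _ in range(completions_per_tick):
            pool.complete()
    return rejected


# The cluster keeps up: nothing is rejected.
healthy = count_rejections(ToyPool(capacity=4, max_queue_size=8), 3, 5, ticks=10)

# Requests arrive faster than they complete: the queue fills and rejections start.
overloaded = count_rejections(ToyPool(capacity=4, max_queue_size=8), 10, 2, ticks=3)
```

In this model, slower responses from the cluster (fewer completions per tick) have the same effect as sending more requests, which is why the same job can succeed on one cluster and fail on another.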

You need to throttle the throughput of your Spark app so it doesn't overload your cluster. See the Spark connector read and write tuning parameters.
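As a starting point, a throttled configuration might look like the sketch below. The property names are the ones I recall from the connector's 2.x configuration reference, so please verify them against the reference for your exact version (some were renamed in 3.x). The values are purely illustrative assumptions, not recommendations, and `to_spark_submit_args` is a hypothetical helper that only renders the settings as `--conf` flags.

```python
# Illustrative throttle settings (values are assumptions; tune them for your cluster).
throttle_settings = {
    # Fewer concurrent write batches per Spark task (the connector default is 5).
    "spark.cassandra.output.concurrent.writes": 2,
    # Cap write throughput per core, in MB/s (unlimited by default).
    "spark.cassandra.output.throughput_mb_per_sec": 5,
    # Cap reads per core per second for joinWithCassandraTable.
    "spark.cassandra.input.reads_per_sec": 500,
    # Standalone mode: limit the total cores the app takes, which lowers parallelism.
    "spark.cores.max": 12,
}


def to_spark_submit_args(settings):
    """Render a settings dict as a list of --conf arguments for spark-submit."""
    args = []
    for key, value in sorted(settings.items()):
        args += ["--conf", f"{key}={value}"]
    return args


# Print the flags so they can be pasted into a spark-submit command line.
print(" ".join(to_spark_submit_args(throttle_settings)))
```

Lower the values gradually until the BusyPoolException stops appearing, then check whether the job duration is still acceptable.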

As an alternative, consider increasing the capacity of your cluster by adding more nodes so it can handle more requests. Cheers!
