
sharathsai666_167144 asked ·

ConnectionException with Cassandra java driver while querying

Hi all,
We are seeing a ConnectionException for some nodes. The exception indicates that the driver could not connect to one particular node. Does that mean the whole cluster is down, or only that this particular node is unavailable?
Can I get more input on this scenario?
Thanks!
Cassandra version - 3.11.6
Cassandra Java driver - 3.10

2020-07-29 11:33:00:747*[ERROR]*transformreg (2/2)*RequestHandler*onFailure*Unexpected error while querying nosqldb-2.nosqldb.default.svc.cluster.local/x.x.x.x
com.datastax.driver.core.exceptions.ConnectionException: [nosqldb-2.nosqldb.default.svc.cluster.local/x.x.x.x:9042] Pool is CLOSING
    at com.datastax.driver.core.HostConnectionPool.borrowConnection(HostConnectionPool.java:241)
    at com.datastax.driver.core.RequestHandler$SpeculativeExecution.query(RequestHandler.java:397)
    at com.datastax.driver.core.RequestHandler$SpeculativeExecution.findNextHostAndQuery(RequestHandler.java:360)
    at com.datastax.driver.core.RequestHandler.startNewExecution(RequestHandler.java:140)
    at com.datastax.driver.core.RequestHandler.sendRequest(RequestHandler.java:122)
    at com.datastax.driver.core.SessionManager.execute(SessionManager.java:697)
    at com.datastax.driver.core.SessionManager.prepareAsync(SessionManager.java:178)
    at com.datastax.driver.core.AbstractSession.prepareAsync(AbstractSession.java:104)
    at com.datastax.driver.mapping.AccessorMapper.prepare(AccessorMapper.java:55)
    at com.datastax.driver.mapping.MappingManager.getAccessor(MappingManager.java:359)
    at com.datastax.driver.mapping.MappingManager.createAccessor(MappingManager.java:321)
    at com.test.cassandra.dao.DetailsDao.fetchDetailsImpl(DetailsDao.java:87)
    at com.test.cloud.util.DAOUtils.fetchDetails(DAOUtils.java:130)
    at com.test.cloud.processing.operators.RegistrationRecordConvertor.processElement(RegistrationRecordConvertor.java:94)
    at com.test.cloud.processing.operators.RegistrationRecordConvertor.processElement(RegistrationRecordConvertor.java:45)
    at org.apache.flink.streaming.api.operators.KeyedProcessOperator.processElement(KeyedProcessOperator.java:85)
    at org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:202)
    at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:105)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:300)
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
    at java.lang.Thread.run(Thread.java:748)

Erick Ramirez answered ·

I've been looking into the issue you reported. Since the requests are made asynchronously, I think you are running into a race condition where the connection pool to a node gets shut down while the client is requesting a new connection.

Two scenarios where I think this could happen are:

  1. Your app closes the session while async transactions are in-flight.
  2. The nodes become unresponsive or are down leading to the driver shutting down the connection pool.

In your case, I think the second scenario is more likely than the first.
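If the first scenario does apply, one way to avoid it is to wait for outstanding async requests before closing the session. The following is only a minimal sketch using plain `CompletableFuture`; `InFlightTracker` and its methods are hypothetical names invented for illustration and are not part of the Java driver. In driver 3.x, `executeAsync` returns a `ResultSetFuture` (a Guava `ListenableFuture`), so you would need to adapt it to this pattern yourself.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper: tracks in-flight async work so a shared resource is closed only after it drains. */
final class InFlightTracker {
    private final List<CompletableFuture<?>> inFlight = new CopyOnWriteArrayList<>();

    /** Registers an async request; it is dropped from the list once it completes. */
    <T> CompletableFuture<T> track(CompletableFuture<T> future) {
        inFlight.add(future);
        future.whenComplete((result, error) -> inFlight.remove(future));
        return future;
    }

    /** Number of requests that have not completed yet. */
    int pending() {
        return inFlight.size();
    }

    /** Waits (up to the timeout) for outstanding requests, then runs the close action. */
    void drainThenClose(Runnable closeAction, long timeout, TimeUnit unit) throws InterruptedException {
        CompletableFuture<?>[] snapshot = inFlight.toArray(new CompletableFuture<?>[0]);
        try {
            CompletableFuture.allOf(snapshot).get(timeout, unit);
        } catch (ExecutionException | TimeoutException e) {
            // A failed or timed-out request should not prevent the close.
        } finally {
            closeAction.run();
        }
    }
}
```

In an application, the close action would be whatever shuts down the shared session, and new requests should stop being submitted before `drainThenClose` is called.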

It's interesting that the screenshot you attached shows a memory limit of about 8GB. Do the pods only have 8GB of RAM, or is that the max heap allocated to Cassandra? Either way, 8GB is too small. It is only suitable for development environments used for functional testing, where you run just 2-3 queries per second. Cheers!


bettina.swynnerton answered ·

Hi @sharathsai666_167144,

I can't tell from the error trace if this is one or more nodes that are not responding in time. Is this the same cluster where you are already experiencing pauses, as reported here?

To get the state of the cluster, you can check with nodetool status

and inspect system.log on each node for evidence of nodes being down.
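If you capture the `nodetool status` output from a script, a small parser can pick out the down nodes for you. This is a rough sketch: it relies on each node row starting with a two-letter code, where the first letter is the status (U for Up, D for Down) and the second is the state (N, L, J or M). The class name `NodetoolStatusParser` is made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that scans captured `nodetool status` output for down nodes. */
final class NodetoolStatusParser {

    /** Returns the address column of every node row whose status flag is 'D' (Down). */
    static List<String> downNodes(String nodetoolStatusOutput) {
        List<String> down = new ArrayList<>();
        for (String line : nodetoolStatusOutput.split("\\R")) {
            String[] cols = line.trim().split("\\s+");
            // Node rows start with a two-letter code: status (U/D), then state (N/L/J/M).
            // Header and separator lines do not match this pattern and are skipped.
            if (cols.length >= 2 && cols[0].matches("[UD][NLJM]") && cols[0].charAt(0) == 'D') {
                down.add(cols[1]);
            }
        }
        return down;
    }
}
```

An empty result means no node was reported as down at the moment the command ran; it does not rule out short pauses between checks.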

However, the pauses that you mentioned in your other post can also cause these client side exceptions, so you really want to look into those.

In your other post you mention that you have a low load, but I still think that 2.5GB of heap is small for running Cassandra. Perhaps you can experiment with larger instance sizes for more reliable performance.

I hope this gives you some ideas where to look next.


Thanks, @bettina.swynnerton, for replying to multiple posts.
This is a different cluster. We also checked system.log and didn't find any node-down messages. I am attaching a few logs from after the ConnectionException.

2020-07-29 11:41:04:352*[INFO]*CompactionExecutor:28657*o.a.c.c.AutoSavingCache*saveCache*Saved RowCache (14 items) in 14 ms

2020-07-29 11:41:38:909*[INFO]*IndexSummaryManager:1*o.a.c.i.s.IndexSummaryRedistribution*redistributeSummaries*Redistributing index summaries

We have more logs for one particular IP, e.g. host1, as below. So can the connection exception also be caused by a problem with a single node?

2020-07-29 11:33:00:738*[DEBUG]*transformconvertcontrolstream (1/2)*STATES*signalConnectionClosed*[nosqldb-2.nosqldb.default.svc.cluster.local/host1:9042] Connection[nosqldb-2.nosqldb.default.svc.cluster.local/host1:9042-2, inFlight=1, closed=true] closed, remaining = 1

hardware usage of this cluster

1602060323862.png (84.4 KiB)

Hi @sharathsai666_167144,

Those two INFO messages about auto-saving the row cache and redistributing index summaries are nothing out of the ordinary and do not indicate a problem, so they don't give us any more information.

How many nodes do you have in this cluster, what are the replication factor and the consistency level of the query with the problem, and what happens in the logs before the connection issue?


sharathsai666_167144 (replying to bettina.swynnerton) ·

Thanks for the response, @bettina.swynnerton.

Our Cluster Details:

Total Nodes - 3
Replication Factor - 3
Consistency - QUORUM
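
For reference, the arithmetic behind these settings can be sketched as follows. QUORUM requires a majority of replicas, floor(RF / 2) + 1, so with a replication factor of 3 a query needs 2 replicas and can still succeed with one replica unavailable. The class and method names are illustrative only, and the sketch assumes a single data center.

```java
/** Illustrative arithmetic only; not part of the Java driver. */
final class QuorumMath {

    /** QUORUM needs a majority of replicas: floor(RF / 2) + 1. */
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    /** Number of replicas that can be unavailable while QUORUM can still be met. */
    static int tolerableReplicaFailures(int replicationFactor) {
        return replicationFactor - quorum(replicationFactor);
    }

    public static void main(String[] args) {
        int rf = 3; // replication factor from the cluster details above
        // prints "quorum=2, tolerable failures=1"
        System.out.println("quorum=" + quorum(rf)
                + ", tolerable failures=" + tolerableReplicaFailures(rf));
    }
}
```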


Cassandra Version Details:
Cassandra - 3.11.6
Cassandra Java Driver - 3.10.0

I am attaching the log sequence along with this:
sequence_logs.txt

This is our pooling options configuration:

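import com.datastax.driver.core.HostDistance;
import com.datastax.driver.core.PoolingOptions;

PoolingOptions poolingOptions = new PoolingOptions();
// Core and max connections per host (LOCAL: 1 to 8, REMOTE: 1 to 4)
poolingOptions.setConnectionsPerHost(HostDistance.LOCAL, 1, 8);
poolingOptions.setConnectionsPerHost(HostDistance.REMOTE, 1, 4);
// Max concurrent requests per connection
poolingOptions.setMaxRequestsPerConnection(HostDistance.LOCAL, 32768);
poolingOptions.setMaxRequestsPerConnection(HostDistance.REMOTE, 2000);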

Note: we are seeing "Pool is CLOSING" messages in the logs. What is the best configuration for the pooling options, and what outcome should we expect from it?
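
One rough way to reason about the values above: the upper bound on concurrent requests the driver can have in flight to a single host is the maximum number of connections multiplied by the maximum requests per connection. The sketch below only does that arithmetic for the posted settings; `PoolCapacity` is a made-up name, and the result is a ceiling, not a tuning recommendation.

```java
/** Illustrative arithmetic only; not part of the Java driver. */
final class PoolCapacity {

    /** Upper bound on in-flight requests to one host for the given pool settings. */
    static long maxInFlightPerHost(int maxConnectionsPerHost, int maxRequestsPerConnection) {
        return (long) maxConnectionsPerHost * maxRequestsPerConnection;
    }

    public static void main(String[] args) {
        // Values taken from the pooling configuration posted above.
        System.out.println("LOCAL  = " + maxInFlightPerHost(8, 32768)); // 262144
        System.out.println("REMOTE = " + maxInFlightPerHost(4, 2000));  // 8000
    }
}
```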

sequence-logs.txt (4.3 KiB)