
teriksson asked ·

How does the ConstantReconnectionPolicy work and when does it retry another host?

Two Cassandra nodes in our n-node cluster went down, and I now see messages like this:

[Control connection] Cannot connect to any host, scheduling retry in 3000 milliseconds

Having read up on it, here is our retry handling:

Cluster.builder
  .addContactPointsWithPorts(hosts.map(new InetSocketAddress(_, port)): _*)
  .withReconnectionPolicy(new ConstantReconnectionPolicy(retryInterval))
  .withSocketOptions(new SocketOptions().setReadTimeoutMillis(readTimeOut))
  .withQueryOptions(new QueryOptions()
    .setConsistencyLevel(defaultConsistencyLevel)
    .setSerialConsistencyLevel(defaultSerialConsistencyLevel))
  .withRetryPolicy(retryPolicy)
  .withPoolingOptions(poolingOptions)

So we retry every 3 seconds, BUT why don't we retry with another host ?

obviously the node is not working, so why try over and over with the same host

Or perhaps that is what it already does; how would I tell?

Either way, the retries never succeed, and our setup kills the microservice after a while.


java driver

teriksson answered ·

Thanks @Erick Ramirez, here are the details you were looking for.

It would have been nice to be able to turn on some sort of printout from the retry mechanism alone, showing which host/port it tries to reach and fails to reach. That would have made it much easier to understand what is really going on.

Whether this is a network problem on the particular host... well, I need to do some analysis.
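On the wish for a printout: a minimal sketch, assuming driver 3.x, of a listener that prints each host state change the driver observes, plus a helper that dumps the hosts the driver currently knows about. `LoggingStateListener`, `HostDiagnostics` and `dumpHosts` are names made up for this example; `Host.StateListener`, `Cluster.register` and `Metadata.getAllHosts` are driver 3.x API. This shows state transitions rather than individual connection attempts; raising the log level for `com.datastax.driver.core` to DEBUG in your logging backend may show more of the latter.

```scala
import com.datastax.driver.core.{Cluster, Host}

// Hypothetical helper: prints every host state change the driver reports.
class LoggingStateListener extends Host.StateListener {
  override def onAdd(host: Host): Unit = println(s"[hosts] added   ${host.getSocketAddress}")
  override def onUp(host: Host): Unit = println(s"[hosts] up      ${host.getSocketAddress}")
  override def onDown(host: Host): Unit = println(s"[hosts] down    ${host.getSocketAddress}")
  override def onRemove(host: Host): Unit = println(s"[hosts] removed ${host.getSocketAddress}")
  override def onRegister(cluster: Cluster): Unit = ()
  override def onUnregister(cluster: Cluster): Unit = ()
}

object HostDiagnostics {
  // Prints address, datacenter and up/down flag for every host in the driver's metadata.
  // Note: getMetadata initializes the cluster if that has not happened yet.
  def dumpHosts(cluster: Cluster): Unit = {
    val hosts = cluster.getMetadata.getAllHosts.iterator()
    while (hosts.hasNext) {
      val h = hosts.next()
      println(s"${h.getSocketAddress} dc=${h.getDatacenter} up=${h.isUp}")
    }
  }
}
```

Usage would be something like `cluster.register(new LoggingStateListener)` right after building the cluster, and `HostDiagnostics.dumpHosts(cluster)` whenever the retry message shows up.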


Driver used:

`cassandra-driver-core` = "com.datastax.cassandra" % "cassandra-driver-core" % "3.10.2"

Cluster topology:

2 DCs, each with 20+ Cassandra nodes

Is local DC set on the driver:

No, it is not explicitly set.

What is set is:

Cluster.builder
  .addContactPointsWithPorts(hosts.map(new InetSocketAddress(_, port)): _*)
  .withReconnectionPolicy(new ConstantReconnectionPolicy(retryInterval))
  .withSocketOptions(new SocketOptions().setReadTimeoutMillis(readTimeOut))
  .withQueryOptions(new QueryOptions()
    .setConsistencyLevel(defaultConsistencyLevel)
    .setSerialConsistencyLevel(defaultSerialConsistencyLevel))
  .withRetryPolicy(retryPolicy)
  .withPoolingOptions(poolingOptions)




Erick Ramirez answered ·

Background

A control connection is a dedicated connection to an arbitrary Cassandra node in the cluster (referred to as the control node). The driver uses it for administrative purposes such as discovering the cluster topology, fetching schema metadata, and tracking node additions and decommissions.

Reconnection policy

When the control node goes down, the driver re-establishes the control connection as follows:

  1. it fetches a new query plan (list of nodes to contact) from the load balancing policy,
  2. it attempts to connect to each node in the plan in sequence until a control connection is established to a node.

If the driver is unable to connect to any node in the query plan, it waits for the delay supplied by the reconnection policy (3000 ms in your log message) and then starts all over again with a new query plan from the load balancing policy, until it has a control node again.
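The loop described above can be sketched in plain Scala without the driver. This is a simplified model, not the driver's code: `Node`, `tryPlan`, `reconnect`, and the `maxRounds` cap are made up for the example (the cap exists only so the sketch terminates). The point it illustrates is that the "Cannot connect to any host" message is printed only after every node in the plan has been tried.

```scala
import scala.annotation.tailrec

object ControlReconnectSketch {
  // Hypothetical stand-in for a node address; not a driver type.
  final case class Node(address: String)

  /** Walks one query plan in order and returns the first node that accepts a connection. */
  def tryPlan(plan: Seq[Node], canConnect: Node => Boolean): Option[Node] =
    plan.find(canConnect)

  /**
   * Requests a fresh query plan per round and tries every node in it.
   * Sleeps delayMs between rounds; gives up after maxRounds rounds.
   */
  @tailrec
  def reconnect(
      newQueryPlan: () => Seq[Node],
      canConnect: Node => Boolean,
      delayMs: Long,
      maxRounds: Int,
      round: Int = 1
  ): Option[Node] =
    tryPlan(newQueryPlan(), canConnect) match {
      case found @ Some(_) => found
      case None if round >= maxRounds => None
      case None =>
        println(s"[sketch] Cannot connect to any host, scheduling retry in $delayMs milliseconds")
        Thread.sleep(delayMs)
        reconnect(newQueryPlan, canConnect, delayMs, maxRounds, round + 1)
    }
}
```

For example, with a plan of three nodes where only the last one accepts connections, `reconnect(() => nodes, n => n.address == "10.0.0.3", delayMs = 10, maxRounds = 3)` returns that third node in the first round without logging anything.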

Your questions

To respond specifically to your questions:

So we retry every 3 seconds, BUT why don't we retry with another host ?

obviously the node is not working, so why try over and over with the same host

You haven't provided evidence that the reconnection policy isn't working as expected. It would help us if you could supply background information to show how you arrived at the conclusion that the driver is only trying the same node over and over.

It would be ideal too if you could provide the following information:

  • driver version used
  • cluster topology, e.g. if multi-DC
  • if local DC is set on the driver
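For reference on the last point, a minimal sketch of setting the local DC explicitly on driver 3.x. As far as I know, when no local DC is given, the default DC-aware policy takes it from the first contact point the driver reaches and ignores hosts in other DCs by default, which matters if the contact points span both DCs. `LocalDcSketch`, `build`, and the parameter names are made up for the example; the DC name passed in must match what the cluster reports.

```scala
import java.net.InetSocketAddress
import com.datastax.driver.core.Cluster
import com.datastax.driver.core.policies.{ConstantReconnectionPolicy, DCAwareRoundRobinPolicy, TokenAwarePolicy}

object LocalDcSketch {
  // Builds a Cluster whose query plans are restricted to the given local DC.
  def build(hosts: Seq[String], port: Int, localDc: String, retryIntervalMs: Long): Cluster = {
    // Token-aware routing on top of a DC-aware round robin pinned to localDc.
    val loadBalancing = new TokenAwarePolicy(
      DCAwareRoundRobinPolicy.builder().withLocalDc(localDc).build()
    )
    Cluster.builder
      .addContactPointsWithPorts(hosts.map(new InetSocketAddress(_, port)): _*)
      .withLoadBalancingPolicy(loadBalancing)
      .withReconnectionPolicy(new ConstantReconnectionPolicy(retryIntervalMs))
      .build()
  }
}
```

The remaining options from the original builder (socket, query, retry, and pooling options) would be chained in the same way.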

It is odd that the driver cannot establish a control connection to any node in the cluster. In my experience, this is usually a network layer issue. It might be a clue worth pursuing in your environment. Cheers!
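One quick way to test the network-layer theory from the machine running the microservice is a plain TCP connect to each contact point. This is only a sketch using the JDK, with no driver involved; `ReachabilityCheck` and `canConnect` are made-up names, and a successful connect only shows that the port is reachable, not that CQL works on it.

```scala
import java.net.{InetSocketAddress, Socket}
import scala.util.Try

object ReachabilityCheck {
  /** True if a plain TCP connection to host:port succeeds within timeoutMs. */
  def canConnect(host: String, port: Int, timeoutMs: Int): Boolean = {
    val socket = new Socket()
    try Try(socket.connect(new InetSocketAddress(host, port), timeoutMs)).isSuccess
    finally socket.close()
  }
}
```

Running something like `hosts.foreach(h => println(s"$h:$port reachable=${ReachabilityCheck.canConnect(h, port, 2000)}"))` with the same `hosts` and `port` that go into the builder would show whether any contact point is reachable at all.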
