Two Cassandra nodes in our n-node cluster went down, and I now see this message:
[Control connection] Cannot connect to any host, scheduling retry in 3000 milliseconds
After reading up on it, this is how our retry handling is configured:
    Cluster.builder
      .addContactPointsWithPorts(hosts.map(new InetSocketAddress(_, port)): _*)
      .withReconnectionPolicy(new ConstantReconnectionPolicy(retryInterval))
      .withSocketOptions(new SocketOptions().setReadTimeoutMillis(readTimeOut))
      .withQueryOptions(new QueryOptions()
        .setConsistencyLevel(defaultConsistencyLevel)
        .setSerialConsistencyLevel(defaultSerialConsistencyLevel))
      .withRetryPolicy(retryPolicy)
      .withPoolingOptions(poolingOptions)
So we retry every 3 seconds, but why doesn't the driver retry with another host?
Obviously the node is not working, so why try the same host over and over?
Or perhaps trying other hosts is what it already does; how would I tell?
Either way, it cannot reconnect, and our solution kills the microservice after a while.
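
For the "how would I tell?" part, one possible approach is to have the driver report what it thinks of each host. The sketch below assumes the DataStax Java driver 3.x (which the builder calls above appear to match) and uses its Host.StateListener interface together with Metadata.getAllHosts. LoggingStateListener and HostStateDebug are hypothetical names made up for illustration, and println stands in for whatever logger the service uses. It needs the driver on the classpath and has not been run against this cluster.

```scala
import com.datastax.driver.core.{Cluster, Host}
import scala.collection.JavaConverters._

// Hypothetical helper (not part of the driver): prints every host state
// change that the driver reports.
class LoggingStateListener extends Host.StateListener {
  override def onAdd(host: Host): Unit = println(s"host added: ${host.getAddress}")
  override def onUp(host: Host): Unit = println(s"host up: ${host.getAddress}")
  override def onDown(host: Host): Unit = println(s"host down: ${host.getAddress}")
  override def onRemove(host: Host): Unit = println(s"host removed: ${host.getAddress}")
  override def onRegister(cluster: Cluster): Unit = ()
  override def onUnregister(cluster: Cluster): Unit = ()
}

object HostStateDebug {
  // Register the listener on an already built Cluster instance.
  def attach(cluster: Cluster): Unit =
    cluster.register(new LoggingStateListener)

  // Print the hosts the driver currently knows about and whether it
  // considers each one up. Assumption: the cluster was able to initialize;
  // getMetadata may throw if no contact point is reachable.
  def printSnapshot(cluster: Cluster): Unit =
    cluster.getMetadata.getAllHosts.asScala.foreach { host =>
      println(s"${host.getAddress} up=${host.isUp}")
    }
}
```

Calling HostStateDebug.attach(cluster) right after the cluster is built, and HostStateDebug.printSnapshot(cluster) when the control connection message appears, should show whether the driver knows about hosts other than the two that went down and whether it has marked them as down too.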