milosz asked joao.reis answered

Java driver NoNodeAvailableException

Hi,

I've already checked all the other questions about NoNodeAvailableException, but none of them matched this case, and the solutions proposed there definitely won't work here.

We have a Spring Boot API serving content from a Cassandra 3.11.6 cluster. It currently uses Java Driver 4.9.0 (older versions were used in the past, and the same thing happens with DBeaver, which also uses the library).

The NoNodeAvailableException occurs when the cluster is fully restarted. I'm not sure whether it happens when the restart proceeds in a rolling, graceful fashion, but once all nodes have been restarted a couple of times, the driver apparently "gives up" and no longer tries to reconnect, even though all nodes are UP/NORMAL in nodetool (the errors_init_connection_total counter increases with every new request attempt).

It's definitely not a cluster issue, because a simple app restart (or a connection from another tool such as DBeaver) works perfectly fine; it's just that the driver doesn't recover. For the same reason I'm sure it's not a network issue: I tried both IP and domain endpoints with the createUnresolved feature. It's also not related to request load: there were no more requests than those I could trigger manually in an HTTP request tool.

I tried changing all the related properties in application.conf, which didn't help. Is there some kind of internal limit on cluster reconnections?
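
For readers unfamiliar with the createUnresolved feature mentioned above, here is a minimal sketch using only the JDK. The hostname `cassandra.example.internal` is a hypothetical placeholder, and 9042 is assumed to be the CQL port in use. An unresolved `InetSocketAddress` keeps the hostname and performs no DNS lookup when it is built, so whatever consumes it can resolve the name again later. How the driver treats such an address depends on its version and configuration, so this only illustrates the JDK side.

```java
import java.net.InetSocketAddress;

class UnresolvedContactPoint {
    // Hypothetical hostname and assumed CQL port; replace with your own values.
    static final String HOST = "cassandra.example.internal";
    static final int PORT = 9042;

    // Builds an address that holds the hostname without triggering a DNS lookup.
    static InetSocketAddress unresolved(String host, int port) {
        return InetSocketAddress.createUnresolved(host, port);
    }

    public static void main(String[] args) {
        InetSocketAddress contactPoint = unresolved(HOST, PORT);
        System.out.println(contactPoint.isUnresolved());  // true: no lookup has happened
        System.out.println(contactPoint.getHostString()); // the hostname, kept as given
        System.out.println(contactPoint.getAddress());    // null: no IP is attached yet
    }
}
```

By contrast, `new InetSocketAddress(host, port)` attempts the lookup immediately and holds on to whatever IP it got at that moment.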


Erick Ramirez answered Erick Ramirez converted comment to answer

It's difficult to comment on what's going on in your environment. If you provide sample code plus the steps to replicate the problem, we might be able to offer some guidance. Cheers!


milosz commented ·

I guess it can be any Cassandra cluster and any application using the DataStax Java Driver, but here you can find the simplest possible setup with a Kubernetes deployment:

https://github.com/miloszszymczak/cassandra-test

To reproduce, restart all the cluster nodes (even all at once) until the client (which mustn't be restarted) starts throwing NoNodeAvailableException even though all the nodes are UN in nodetool. You can cross-check with another client instance or a database client, which should be able to connect to and interact with the cluster.

milosz commented ·

OK, so it looks like after a simulated cluster crash the driver no longer uses the initial contact points (Kubernetes Services with static IP addresses), but instead tries to reach Cassandra using IP addresses obtained during the first sessions (which are no longer valid, because the nodes' addresses changed after the crash). Is it possible to force the driver to use only the addresses provided by the developer? It's not related to the createUnresolved feature, because this value is replaced after a cluster crash anyway.

Erick Ramirez ♦♦ milosz commented ·

It seems like you omitted a very important fact about your environment -- that Cassandra is deployed in a Kubernetes cluster. For future reference, you should state this up front.

As you know, IP addresses are transient in a K8s cluster, so using them as contact points is not recommended. Instead, you should use a service that exposes your cluster to apps/clients for ingress into your cluster.

If you didn't already know, K8ssandra is a ready-made platform for running Apache Cassandra in Kubernetes. It uses the DataStax Cassandra Operator (cass-operator) under the hood and comes with all the tooling built in.

More importantly, it comes with Traefik templates for k8s cluster ingress. It should give you an idea of how to set up your cluster. Cheers!

joao.reis answered

I'm not a Java expert, but it sounds like an issue with DNS caching, either at the OS or the JVM level. Can you confirm that when you restart the nodes, you get a new IP for each Cassandra node?

You are initializing the contact points the correct way for this type of dynamic environment, so the driver should eventually pick up the new IP addresses from DNS resolution of the hostnames. If this is not happening, check the default DNS cache TTL setting for your JVM environment.
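
As a rough way to check the JVM-level setting, here is a minimal sketch using only the JDK. It reads and sets the `networkaddress.cache.ttl` security property, which controls how long successful lookups are cached. The class name and the 30-second value are illustrative assumptions, not recommendations. When the property is unset, the effective default depends on the JVM implementation and on whether a security manager is installed.

```java
import java.security.Security;

class DnsCacheTtl {
    // JVM-wide security property: seconds to cache successful DNS lookups.
    static final String TTL_PROPERTY = "networkaddress.cache.ttl";

    // Returns the configured value, or a note when the property is not set.
    static String currentTtl() {
        String value = Security.getProperty(TTL_PROPERTY);
        return value == null ? "(not set; JVM default applies)" : value;
    }

    // Sets the TTL in seconds; assumed to be called early in application startup.
    static void setTtlSeconds(int seconds) {
        Security.setProperty(TTL_PROPERTY, Integer.toString(seconds));
    }

    public static void main(String[] args) {
        System.out.println("before: " + currentTtl());
        setTtlSeconds(30); // illustrative value only
        System.out.println("after:  " + currentTtl());
    }
}
```

The property generally needs to be set before the JVM performs its first name lookup, for example at the very start of `main` or in the JDK's `java.security` file, otherwise the earlier value may stay in effect.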


milosz commented ·

Yes, the innermost IPs change, but I also tried providing the contact points as the IPs of the relevant individual Kubernetes Services. They are static and point directly to the proper Cassandra node, but that didn't help either. That should rule out any DNS or DNS-caching issues, yet the driver still just "gives up" and only works perfectly again right after an application restart.
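
One way to confirm from inside the application's own JVM that the Service addresses still accept connections is a plain TCP probe. This is a minimal sketch using only the JDK. The IP addresses, the port 9042, and the 2-second timeout are hypothetical placeholders. It only shows that a TCP connection can be opened; it does not speak CQL and says nothing about which addresses the driver itself is trying.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

class ContactPointProbe {
    // Returns true if a TCP connection to host:port succeeds within timeoutMillis.
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            // new InetSocketAddress(host, port) resolves the host at this moment.
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Covers refused connections, timeouts, and unknown hosts.
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical Service IPs; replace with your own contact points.
        String[] hosts = {"10.0.0.11", "10.0.0.12", "10.0.0.13"};
        for (String host : hosts) {
            System.out.println(host + ":9042 -> " + canConnect(host, 9042, 2000));
        }
    }
}
```

If the probe succeeds from the running application while the driver keeps failing, that points toward the addresses the driver has cached rather than toward connectivity.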
