question

acucciarre_144605 asked · Erick Ramirez commented

Why does the C++ driver connect to an IP not in the Kubernetes cluster?

The C++ Cassandra driver (2.12) doesn't seem to work properly in Kubernetes.

After deleting the Cassandra pods for testing, I see the following errors in my app, which uses the C++ driver:

1592907111.602 [WARN] (src/connection_pool.cpp:272:void cass::ConnectionPool::on_reconnect(cass::DelayedConnector*)): Connection pool was unable to reconnect to host 10.244.135.3 because of the following error: Connect error 'connection refused'
1592907113.605 [WARN] (src/connection_pool.cpp:272:void cass::ConnectionPool::on_reconnect(cass::DelayedConnector*)): Connection pool was unable to reconnect to host 10.244.135.3 because of the following error: Connect error 'connection refused'

The IP address 10.244.135.3 no longer exists, so it seems that the C++ driver is using one of the old pod IP addresses:

# kubectl exec -it hf-apiserver-7748dd65cd-ftndw -c apiserver -- env | grep CASSANDRA_HOSTNAME
CASSANDRA_HOSTNAME=hyperfile-cassandra-seed-service.cass-operator.svc.cluster.local
# kubectl exec -it hf-apiserver-7748dd65cd-ftndw -c apiserver -- nslookup hyperfile-cassandra-seed-service.cass-operator.svc.cluster.local
Server: 10.96.0.10
Address: 10.96.0.10#53

Name: hyperfile-cassandra-seed-service.cass-operator.svc.cluster.local
Address: 10.244.166.133
Name: hyperfile-cassandra-seed-service.cass-operator.svc.cluster.local
Address: 10.244.104.7
Name: hyperfile-cassandra-seed-service.cass-operator.svc.cluster.local
Address: 10.244.135.6

Note that the Cassandra pods are healthy and running:

# kubectl get pod -n cass-operator -o wide
NAME                                 READY  STATUS   RESTARTS  AGE   IP              NODE   NOMINATED NODE READINESS GATES
cass-operator-6d89c5f54f-8ljgr       1/1    Running  0         176m  10.244.104.1    node2  <none>         <none>
hyperfile-cassandra-dc1-rack1-sts-0  2/2    Running  0         147m  10.244.104.7    node2  <none>         <none>
hyperfile-cassandra-dc1-rack1-sts-1  2/2    Running  0         136m  10.244.135.6    node3  <none>         <none>
hyperfile-cassandra-dc1-rack1-sts-2  2/2    Running  0         169m  10.244.166.133  node1  <none>         <none>

[EDIT]

My app uses the environment variable CASSANDRA_HOSTNAME, which is set to hyperfile-cassandra-seed-service.cass-operator.svc.cluster.local:

# kubectl exec -it hf-apiserver-7748dd65cd-ftndw -c apiserver -- env | grep CASSANDRA_HOSTNAME
CASSANDRA_HOSTNAME=hyperfile-cassandra-seed-service.cass-operator.svc.cluster.local

The name hyperfile-cassandra-seed-service.cass-operator.svc.cluster.local comes from the service created by the DataStax Cassandra Operator:

# kubectl get svc hyperfile-cassandra-seed-service -n cass-operator -o yaml
apiVersion: v1
kind: Service
  <snip>
  name: hyperfile-cassandra-seed-service
  namespace: cass-operator
  <snip>
Tags: cass-operator, kubernetes, cpp driver
2 comments

bettina.swynnerton ♦♦ commented:

Hi,

How are the cluster contact points defined in your app? What hostnames or ip addresses are you specifying there?

acucciarre_144605 commented:

[Comment reposted in original question]


1 Answer

Erick Ramirez answered

There are three things that stood out to me in reading your post. Let me address them individually.

Timestamp

1592907111 is equivalent to 10:11 GMT, roughly 3 hours after you posted your question. I wanted to confirm that the warning is current, and not from an earlier period before you deleted the pods and created a new cluster.

If it was from before, then the rest of this answer is no longer relevant.

Problematic IP

Can you confirm that IP address 10.244.135.3 is indeed from the cluster you deleted previously? Or was that just speculation?

There is the possibility that one of the STS (StatefulSet) pods ran into an issue and was replaced automatically. When the driver connects to the cluster, it discovers the cluster topology by reading the system.peers table of the contact point. If the replaced node's IP is still in the system.peers table, the driver will attempt to connect to it.

It's quite simple to work out whether this is the problem you're hitting. Just query all the nodes with cqlsh to check the contents of the system.peers table. If IP 10.244.135.3 does exist, the workaround is to simply delete it:

cqlsh> DELETE FROM system.peers WHERE peer = '10.244.135.3';
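To make that check concrete, a query along these lines (run with cqlsh against each node in turn) lists the peers that node knows about; the column selection is just for readability:

```
cqlsh> SELECT peer, data_center, rack FROM system.peers;
```

If 10.244.135.3 appears in any node's output, that node is still advertising the stale address and the DELETE above is the fix.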

Contact points

As you stated, you are using the seed service for the driver contact point. We recommend that you use the Kubernetes headless service for connecting to the C* cluster.

The headless service name has the format <clusterName>-<datacenterName>-service. For example, cluster1-dc1-service. Use this as the contact point when you configure the driver.

For more information, see Using your Cassandra Operator cluster in Kubernetes. Cheers!

2 comments

acucciarre_144605 commented:

Thanks Erick, I will reproduce the issue and let you know the outcome.

Erick Ramirez ♦♦ commented:

Not a problem. Cheers!
