Hi,
I would like to ask whether anybody has faced a similar issue.
I have created a multi-DC setup: 2 Kubernetes clusters, 2 DCs, each DC with 2 pods/nodes.
DC1: pod0(seed node), pod1
DC2: pod0(seed node), pod1
The main point is that, besides their local IP addresses, all the pods also have an interface with a public IP address, so the 2 Kubernetes clusters can see each other via this public interface. I would like the internode communication between the DCs/clusters to go over the public interface, and I have configured listen_interface for this purpose.
If internode_encryption is set to none, everything works perfectly: gossip works and all the nodes are part of the cluster, 2 pods in each DC.
----------------------------------------------------------------------------
If I set internode_encryption to all (or dc), create the certificates (keystores and truststores; the truststore of each node contains the public keys of all the nodes) and upload them to the nodes, everything seems to work. According to system.log, all the nodes are connected to each other over SSL/TLS. nodetool status (executed from all the pods) also shows everything working:
nodetool status
Datacenter: MyCenter
====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 10.94.135.227 74.12 KiB 16 48.5% 427b0b7b-471f-4a42-9b68-e4000d9bc933 rack1
UN 10.94.135.228 128.06 KiB 16 51.6% 54309e13-83f7-470a-9f29-36185b67256f rack1
Datacenter: MyCenter2
=====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 10.94.135.230 88.57 KiB 16 51.2% 87fed0db-cefc-4a41-a04f-fe57c794051f rack1
UN 10.94.135.229 74.04 KiB 16 48.7% 2512b115-8308-4528-8deb-a28eef2caf13 rack1
But after 2 minutes everything dies, or at least nodetool status starts showing incorrect information.
Example, from DC1 pod0:
nodetool status
Datacenter: MyCenter
====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 10.94.135.227 96.24 KiB 16 100.0% 093ba9e2-1167-42c7-921b-bf8c85392d3e rack1
UN 10.94.135.228 91.21 KiB 16 100.0% f853b93b-2e42-4407-b864-d9de71dc8d62 rack1
From DC2 pod1:
nodetool status
Datacenter: MyCenter
====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 10.94.135.227 96.24 KiB 16 68.2% 093ba9e2-1167-42c7-921b-bf8c85392d3e rack1
UN 10.94.135.228 91.21 KiB 16 66.1% f853b93b-2e42-4407-b864-d9de71dc8d62 rack1
Datacenter: MyCenter2
=====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 10.94.135.230 81.48 KiB 16 65.8% b1c8420e-71ba-4604-a350-9b980dcfefea rack1
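To make the difference between the two views explicit, here is a minimal sketch that parses the two outputs above and lists which addresses one pod sees and the other does not. The helper names `parse_status` and `missing_nodes` are hypothetical, and the sketch assumes the plain-text `nodetool status` layout shown in this post (header lines omitted for brevity).

```python
import re

# Node lines look like:
# "UN 10.94.135.227 96.24 KiB 16 100.0% 093ba9e2-... rack1"
# First letter is Up/Down, second is Normal/Leaving/Joining/Moving.
NODE_LINE = re.compile(r"^([UD][NLJM])\s+(\S+)\s")

# The two views quoted in this post.
DC1_POD0_VIEW = """\
Datacenter: MyCenter
====================
UN 10.94.135.227 96.24 KiB 16 100.0% 093ba9e2-1167-42c7-921b-bf8c85392d3e rack1
UN 10.94.135.228 91.21 KiB 16 100.0% f853b93b-2e42-4407-b864-d9de71dc8d62 rack1
"""

DC2_POD1_VIEW = """\
Datacenter: MyCenter
====================
UN 10.94.135.227 96.24 KiB 16 68.2% 093ba9e2-1167-42c7-921b-bf8c85392d3e rack1
UN 10.94.135.228 91.21 KiB 16 66.1% f853b93b-2e42-4407-b864-d9de71dc8d62 rack1
Datacenter: MyCenter2
=====================
UN 10.94.135.230 81.48 KiB 16 65.8% b1c8420e-71ba-4604-a350-9b980dcfefea rack1
"""


def parse_status(text):
    """Return {datacenter: {address: state}} from `nodetool status` text."""
    view = {}
    dc = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Datacenter:"):
            dc = line.split(":", 1)[1].strip()
            view[dc] = {}
            continue
        match = NODE_LINE.match(line)
        if match and dc is not None:
            state, address = match.groups()
            view[dc][address] = state
    return view


def missing_nodes(view_a, view_b):
    """Addresses that appear in view_a but not in view_b, sorted."""
    nodes_a = {addr for nodes in view_a.values() for addr in nodes}
    nodes_b = {addr for nodes in view_b.values() for addr in nodes}
    return sorted(nodes_a - nodes_b)
```

Calling `missing_nodes(parse_status(DC2_POD1_VIEW), parse_status(DC1_POD0_VIEW))` returns the addresses that DC2 pod1 still sees but DC1 pod0 has dropped; running the same comparison against every pod's output would show which nodes disagree about the ring.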
What I don't understand is why this changes after 2 minutes when nothing else has happened.
I don't see any failure in system.log, and according to netstat all the nodes are still connected to each other:
tcp 0 0 0.0.0.0:8080 0.0.0.0:* LISTEN 326/java
tcp 0 0 172.0.143.47:9042 0.0.0.0:* LISTEN 2677/java
tcp 0 0 10.94.135.230:7000 0.0.0.0:* LISTEN 2677/java
tcp 0 0 0.0.0.0:9500 0.0.0.0:* LISTEN 2677/java
tcp 0 0 0.0.0.0:7199 0.0.0.0:* LISTEN 2677/java
tcp 0 0 10.94.135.230:46816 10.94.135.228:7000 ESTABLISHED 2677/java
tcp 0 0 10.94.135.230:7000 10.94.135.228:55424 ESTABLISHED 2677/java
tcp 0 0 10.94.135.230:7000 10.94.135.229:44976 ESTABLISHED 2677/java
tcp 0 0 10.94.135.230:7000 10.94.135.227:56306 ESTABLISHED 2677/java
tcp 0 0 10.94.135.230:44948 10.94.135.229:7000 ESTABLISHED 2677/java
tcp 0 0 172.0.143.47:8080 192.168.142.12:48016 TIME_WAIT -
tcp 0 0 10.94.135.230:46822 10.94.135.228:7000 ESTABLISHED 2677/java
tcp 0 0 10.94.135.230:57214 10.94.135.227:7000 ESTABLISHED 2677/java
tcp 0 0 10.94.135.230:57162 10.94.135.227:7000 ESTABLISHED 2677/java
tcp 0 0 10.94.135.230:46824 10.94.135.228:7000 ESTABLISHED 2677/java
tcp 0 0 10.94.135.230:57712 10.94.135.227:7000 ESTABLISHED 2677/java
tcp 0 0 10.94.135.230:7000 10.94.135.229:44524 ESTABLISHED 2677/java
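For reference, this is roughly how I summarise the netstat output per peer. It is only a sketch: `peers_on_storage_port` is a hypothetical helper name, it assumes the column layout shown above (proto, recv-q, send-q, local address, remote address, state, pid/program), and it takes port 7000 as the internode port because that is what the output shows.

```python
from collections import Counter


def peers_on_storage_port(netstat_text, port=7000):
    """Count ESTABLISHED connections per remote host that involve `port`."""
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Skip short lines and anything that is not an established connection
        # (LISTEN, TIME_WAIT, headers, ...).
        if len(fields) < 6 or fields[5] != "ESTABLISHED":
            continue
        local, remote = fields[3], fields[4]
        local_port = local.rsplit(":", 1)[1]
        remote_host, remote_port = remote.rsplit(":", 1)
        # Count both inbound (local port matches) and outbound (remote port matches).
        if str(port) in (local_port, remote_port):
            counts[remote_host] += 1
    return counts
```

A peer that is missing from the result, or whose count drops to zero over time, would point to a connection problem; in my case all three peers still show established connections even after nodetool status changes.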
Has anybody faced a similar issue with encryption of internode communication?
Thank you for any help or information,
BR
Gabor