question

PGabor asked · Erick Ramirez commented

Multi-DC environment on Kubernetes stops working when internode encryption is enabled

Hi,

I would like to ask for help and find out whether somebody has faced a similar issue.

I have created a multi-DC setup: 2 Kubernetes clusters, 2 DCs, each with 2 pods/nodes.

DC1: pod0 (seed node), pod1

DC2: pod0 (seed node), pod1

The main point here is that besides their local IP addresses, all the pods also have a public IP interface, so that the 2 Kubernetes clusters can see each other via this public interface. I would like the internode communication between the DCs/clusters to go over the public interface, so I have configured listen_interface for this purpose.
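Roughly, the relevant part of my cassandra.yaml looks like this (a minimal sketch; the interface name and the seed addresses below are placeholders, not my exact values):

# cassandra.yaml (sketch): bind internode/gossip traffic to the public interface
listen_interface: eth1      # public interface of the pod (placeholder name)
# listen_address is left unset, since only one of listen_address/listen_interface may be set
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "<DC1-seed-public-IP>,<DC2-seed-public-IP>"   # placeholder seed addresses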

If internode_encryption is set to none, then everything works perfectly: gossip works and all the nodes are part of the cluster, 2 pods in each DC.

----------------------------------------------------------------------------

If I enable internode_encryption (setting it to all, or dc) and create the certificates (keystores and truststores, where each node's truststore contains the certificates of all the nodes), then after uploading the certificates to the nodes everything seems to be working at first. Based on the system.log, all the nodes are connected to each other over SSL/TLS, and based on nodetool status (executed from all the pods) everything also seems to be working.
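The encryption settings I'm enabling look roughly like this (a minimal sketch; the keystore/truststore paths and passwords are placeholders, not my exact values):

# cassandra.yaml (sketch): server-to-server (internode) encryption
server_encryption_options:
  internode_encryption: all                          # also tried: dc
  keystore: /etc/cassandra/conf/keystore.jks         # placeholder path
  keystore_password: changeit                        # placeholder
  truststore: /etc/cassandra/conf/truststore.jks     # contains the certificates of all the nodes
  truststore_password: changeit                      # placeholder

With these settings in place, the initial output looks like this from every pod: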

nodetool status
Datacenter: MyCenter
====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 10.94.135.227 74.12 KiB 16 48.5% 427b0b7b-471f-4a42-9b68-e4000d9bc933 rack1
UN 10.94.135.228 128.06 KiB 16 51.6% 54309e13-83f7-470a-9f29-36185b67256f rack1

Datacenter: MyCenter2
=====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 10.94.135.230 88.57 KiB 16 51.2% 87fed0db-cefc-4a41-a04f-fe57c794051f rack1
UN 10.94.135.229 74.04 KiB 16 48.7% 2512b115-8308-4528-8deb-a28eef2caf13 rack1

But after about 2 minutes everything dies, or at least nodetool status starts showing incorrect information.

Example, from DC1 pod0:

nodetool status
Datacenter: MyCenter
====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 10.94.135.227 96.24 KiB 16 100.0% 093ba9e2-1167-42c7-921b-bf8c85392d3e rack1
UN 10.94.135.228 91.21 KiB 16 100.0% f853b93b-2e42-4407-b864-d9de71dc8d62 rack1

From DC2 pod1:

nodetool status
Datacenter: MyCenter
====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 10.94.135.227 96.24 KiB 16 68.2% 093ba9e2-1167-42c7-921b-bf8c85392d3e rack1
UN 10.94.135.228 91.21 KiB 16 66.1% f853b93b-2e42-4407-b864-d9de71dc8d62 rack1

Datacenter: MyCenter2
=====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 10.94.135.230 81.48 KiB 16 65.8% b1c8420e-71ba-4604-a350-9b980dcfefea rack1

What I don't understand is: if nothing happens, then why does it change after 2 minutes?

In the system.log I don't see any failure, and according to netstat all the nodes are still connected to each other:

tcp 0 0 0.0.0.0:8080 0.0.0.0:* LISTEN 326/java
tcp 0 0 172.0.143.47:9042 0.0.0.0:* LISTEN 2677/java
tcp 0 0 10.94.135.230:7000 0.0.0.0:* LISTEN 2677/java
tcp 0 0 0.0.0.0:9500 0.0.0.0:* LISTEN 2677/java
tcp 0 0 0.0.0.0:7199 0.0.0.0:* LISTEN 2677/java
tcp 0 0 10.94.135.230:46816 10.94.135.228:7000 ESTABLISHED 2677/java
tcp 0 0 10.94.135.230:7000 10.94.135.228:55424 ESTABLISHED 2677/java
tcp 0 0 10.94.135.230:7000 10.94.135.229:44976 ESTABLISHED 2677/java
tcp 0 0 10.94.135.230:7000 10.94.135.227:56306 ESTABLISHED 2677/java
tcp 0 0 10.94.135.230:44948 10.94.135.229:7000 ESTABLISHED 2677/java
tcp 0 0 172.0.143.47:8080 192.168.142.12:48016 TIME_WAIT -
tcp 0 0 10.94.135.230:46822 10.94.135.228:7000 ESTABLISHED 2677/java
tcp 0 0 10.94.135.230:57214 10.94.135.227:7000 ESTABLISHED 2677/java
tcp 0 0 10.94.135.230:57162 10.94.135.227:7000 ESTABLISHED 2677/java
tcp 0 0 10.94.135.230:46824 10.94.135.228:7000 ESTABLISHED 2677/java
tcp 0 0 10.94.135.230:57712 10.94.135.227:7000 ESTABLISHED 2677/java
tcp 0 0 10.94.135.230:7000 10.94.135.229:44524 ESTABLISHED 2677/java

Has anybody faced a similar issue with internode communication encryption?

Thank you for any help or information,

BR

Gabor

kubernetes, encryption
4 comments


steve.lacerda ♦ commented:
My first assumption is that there's something wrong with the certs, but you should see something in the logs. There has to be some error, because the nodes aren't connected, so we'd need to see the logs in order to understand what's happening. There should clearly be some logging around the time the nodes go from UN to DN or whatever state they end up in.
PGabor commented in reply to steve.lacerda ♦:

Hi,

I am attaching the logs from all the nodes/DCs.

After the certificates were uploaded to the nodes, the logs contain entries like this:

URGENT_MESSAGES-05e5cb6e successfully connected, version = 12, framing = LZ4, encryption = encrypted(factory=openssl;protocol=TLSv1.3;cipher=TLS_AES_128_GCM_SHA256)

And netstat also shows that the nodes are connected to each other.

After that I see only one suspicious log entry:

Failed to register collector for MBean org.apache.cassandra.metrics:type=Connection,scope=10.94.135.228_7000,name=LargeMessagePendingTasks.
java.lang.IllegalArgumentException: '10.94.135.228_7000' is not an IP string literal.


But from what I have read on other forums, this only means that some Java metrics will not work. Or could this be a problem? If you can check the logs, maybe you will see more failures.

Thank you so much for your time and help.

Br

Gabor

dcs.zip (130.6 KiB)
PGabor commented:

Hi,

I also have a few questions; maybe the configuration itself is wrong, because it was not straightforward how to configure the system.

So, I have used listen_interface instead of listen_address, where the interface is the public interface rather than the private interface of the pod. Is it OK this way? The official documentation states that listen_address should be the private address and broadcast_address should be the public address, but in that case how can I scale the StatefulSet to deploy more pods?
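For clarity, my understanding of the documented approach is roughly the following per node (a sketch; the addresses are example values from my setup, and they would have to differ for every pod, which is exactly the part I don't know how to handle in a StatefulSet):

# cassandra.yaml (sketch): documented split between private and public addresses
listen_address: 172.0.143.47          # private pod IP (example value)
broadcast_address: 10.94.135.230      # public IP that the other DC should use (example value)
listen_on_broadcast_address: true     # I assume this is also needed, so the public address is bound locally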

Thank you for your help

Br
Gabor

PGabor commented in reply to steve.lacerda ♦:
Hi,

I made an experiment: if I run the configuration with only 1 pod per datacenter and internode encryption enabled, then it works.

So it is strange that when I scale the StatefulSet to 2 replicas, the newly created pods seem to work for just a few minutes, and then it stops working.
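Just to be explicit about what I mean by replicating the StatefulSet: the only change is the replica count, roughly like this (a sketch; the names and image are placeholders, not my actual manifest):

# statefulset.yaml (sketch)
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: cassandra                # placeholder
spec:
  serviceName: cassandra         # placeholder headless service
  replicas: 2                    # with 1 replica it works; the problem appears after going to 2
  selector:
    matchLabels:
      app: cassandra
  template:
    metadata:
      labels:
        app: cassandra
    spec:
      containers:
        - name: cassandra
          image: cassandra:4.0   # placeholder image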

What could be wrong?

Br

Gabor


1 Answer

Erick Ramirez answered · Erick Ramirez commented

The symptoms you described point to an environmental issue that is preventing the nodes from communicating with each other.

The problem could be an issue with the Cassandra configuration, or with Kubernetes itself. Unfortunately, this isn't something we can solve without looking at all the components of your environment. Cheers!

2 comments


PGabor commented:

Hi,
Thank you for your answer. Is it possible to check the Cassandra configuration, or is what is logged in the Cassandra system.log not enough? I would like to rule out anything being misconfigured in Cassandra; I understand that my local Kubernetes environment cannot be checked further.

Can I get more help from somewhere?

Thank you,

BR

Gabor

Erick Ramirez ♦♦ commented in reply to PGabor:

The system.log and debug.log on the pods should give you an idea of why the nodes stop communicating with each other. You can then take action depending on the underlying issue. Cheers!
