Erick Ramirez asked ·

Nodes lose connection with each other during low traffic periods

This article discusses an issue where nodes in a cluster lose connectivity with each other during low traffic periods.


1 Answer

Erick Ramirez answered ·

Symptom

Nodes in a cluster intermittently lose connectivity with each other during low traffic periods. In most cases, this happens because a firewall between the nodes closes the socket when it detects that the connection between two nodes has been idle for too long. Many firewalls are configured by default with an idle timeout of around 5 minutes.

Solution

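By default, Linux waits 7200 seconds (2 hours) before sending the first keepalive probe on an idle connection, which is far longer than a typical firewall idle timeout. We recommend configuring TCP keepalive on every node in the cluster to start probing after 60 seconds of idle time and to send up to 3 probes, 10 seconds apart: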

$ sudo sysctl -w net.ipv4.tcp_keepalive_time=60 net.ipv4.tcp_keepalive_probes=3 net.ipv4.tcp_keepalive_intvl=10

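With these settings, an otherwise idle connection carries a keepalive probe roughly every 60 seconds, so the firewall no longer sees it as idle. A dead TCP connection is detected after 90 seconds (60 seconds of idle time plus 3 unanswered probes sent 10 seconds apart). The probes don't contain data, so the additional traffic on the network is insignificant.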
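The 90-second figure is simple arithmetic, and it can be handy to run the same sum against whatever a node is actually using. Here is a minimal sketch; the function name `keepalive_detect_seconds` is made up for illustration, and the check against `/proc/sys` assumes a Linux node:

```shell
#!/bin/sh
# Seconds before the kernel gives up on an unresponsive idle peer:
# idle wait + (number of probes * interval between probes).
# Arguments: tcp_keepalive_time, tcp_keepalive_probes, tcp_keepalive_intvl
keepalive_detect_seconds() {
    echo $(( $1 + $2 * $3 ))
}

# The values recommended above: prints 90
keepalive_detect_seconds 60 3 10

# The values currently active on this node, if the kernel exposes them.
if [ -r /proc/sys/net/ipv4/tcp_keepalive_time ]; then
    keepalive_detect_seconds \
        "$(cat /proc/sys/net/ipv4/tcp_keepalive_time)" \
        "$(cat /proc/sys/net/ipv4/tcp_keepalive_probes)" \
        "$(cat /proc/sys/net/ipv4/tcp_keepalive_intvl)"
fi
```

If the second number is still 7875 (the result of the Linux defaults of 7200, 9 and 75), the new settings have not been applied on that node.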

Note that you will need to consult the relevant documentation for your Linux distribution on how to persist these changes across reboots. Cheers!
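As one possible illustration only: on distributions that read drop-in files from `/etc/sysctl.d/`, the settings could be persisted along these lines. The file name `99-tcp-keepalive.conf` is a hypothetical choice, and your distribution may use a different mechanism, so check its documentation first:

```shell
# Assumption: this distribution loads *.conf files from /etc/sysctl.d/ at boot.
# The file name below is hypothetical; pick one that fits your conventions.
sudo tee /etc/sysctl.d/99-tcp-keepalive.conf >/dev/null <<'EOF'
net.ipv4.tcp_keepalive_time = 60
net.ipv4.tcp_keepalive_probes = 3
net.ipv4.tcp_keepalive_intvl = 10
EOF

# Reload settings from all sysctl configuration files without rebooting.
sudo sysctl --system
```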
