In this post I'll explain why a new node added to the cluster is unable to communicate with other nodes. In some instances, the node was previously part of the cluster and is still unable to gossip when added back in.
One of the tell-tale signs of this issue is that the node reports in the system.log that it is unable to gossip with other nodes in the cluster, for example:
ERROR [main] 2019-08-15 18:46:32,241 CassandraDaemon.java:749 - Exception encountered during startup
java.lang.RuntimeException: Unable to gossip with any peers
    at org.apache.cassandra.gms.Gossiper.doShadowRound(Gossiper.java:1435) ~[apache-cassandra-3.11.4.jar:3.11.4]
    at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:566) ~[apache-cassandra-3.11.4.jar:3.11.4]
    at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:823) ~[apache-cassandra-3.11.4.jar:3.11.4]
    at org.apache.cassandra.service.StorageService.initServer(StorageService.java:683) ~[apache-cassandra-3.11.4.jar:3.11.4]
    at org.apache.cassandra.service.StorageService.initServer(StorageService.java:632) ~[apache-cassandra-3.11.4.jar:3.11.4]
    at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:388) ~[apache-cassandra-3.11.4.jar:3.11.4]
    at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:620) ~[apache-cassandra-3.11.4.jar:3.11.4]
    at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:732) ~[apache-cassandra-3.11.4.jar:3.11.4]
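To spot this on a node without reading the whole log, a minimal Python sketch like the following can filter system.log for the startup errors quoted in this post. The function name find_gossip_failures and the pattern list are illustrative assumptions, not part of Cassandra or any DataStax tool.

```python
import re
from typing import Iterable, List

# Messages quoted in this post; illustrative, not an exhaustive list of gossip failures.
GOSSIP_FAILURE_PATTERNS = [
    re.compile(r"Unable to gossip with any peers"),
    re.compile(r"Unable to contact any seeds"),
]


def find_gossip_failures(lines: Iterable[str]) -> List[str]:
    """Return the log lines (without trailing newline) matching a known gossip-failure message."""
    return [
        line.rstrip("\n")
        for line in lines
        if any(pattern.search(line) for pattern in GOSSIP_FAILURE_PATTERNS)
    ]
```

For example, on a package install where the log is assumed to live at /var/log/cassandra/system.log, open that file and pass the file handle to find_gossip_failures(); it reads the file line by line.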
In some cases, other nodes are able to see the affected node as operational, but the affected node itself is unable to gossip with other nodes. Here is sample output of nodetool gossipinfo:
/10.1.2.4
  generation:0
  heartbeat:0
/10.1.2.3
  generation:0
  heartbeat:0
/10.1.2.6
  generation:1444263348
  heartbeat:6232
  ...
  DC:DC1
  STATUS:NORMAL,-1041938454866204344
  ...
/10.1.2.5
  generation:0
  heartbeat:0
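In output like this, the endpoints showing generation:0 and heartbeat:0 appear not to have exchanged any gossip heartbeat with this node. The sketch below lists those endpoints. It assumes the usual layout of an unindented endpoint header (optionally prefixed by a hostname) followed by indented key:value lines; parse_gossipinfo and silent_endpoints are hypothetical helper names.

```python
from typing import Dict, List, Optional


def parse_gossipinfo(text: str) -> Dict[str, Dict[str, str]]:
    """Map each endpoint address to its gossip attributes (assumed layout, see lead-in)."""
    endpoints: Dict[str, Dict[str, str]] = {}
    current: Optional[str] = None
    for raw in text.splitlines():
        if not raw.strip():
            continue
        if not raw[0].isspace():
            # Header such as "/10.1.2.4" or "host/10.1.2.4": keep the part after the last "/".
            current = raw.strip().rsplit("/", 1)[-1]
            endpoints[current] = {}
        elif current is not None and ":" in raw:
            # Attribute line such as "  heartbeat:6232"; split on the first colon only.
            key, value = raw.strip().split(":", 1)
            endpoints[current][key] = value
    return endpoints


def silent_endpoints(info: Dict[str, Dict[str, str]]) -> List[str]:
    """Endpoints whose generation and heartbeat are both 0, in the order they were listed."""
    return [
        addr
        for addr, attrs in info.items()
        if attrs.get("generation") == "0" and attrs.get("heartbeat") == "0"
    ]
```

Running this against the gossipinfo output of the affected node gives a quick list of peers to test for connectivity.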
One other symptom is that the affected node sees all other nodes in the cluster as belonging to another DC, as shown in this sample nodetool status output:
Datacenter: r1
==============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address   Load     Tokens  Owns   Host ID                               Rack
DN  10.1.2.5  ?        256     9.0%   5279619a-550c-42b3-8150-61ad24f828f3  r1
DN  10.1.2.3  ?        256     9.1%   5d1fa459-cdac-4658-b68d-c6e0933afcee  r1
DN  10.1.2.4  ?        256     10.5%  a8f35c63-6a76-4e95-99f1-bef65d785366  r1
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address   Load     Tokens  Owns   Host ID                               Rack
UN  10.1.2.6  18.9 GB  256     9.5%   36fdcf57-0274-43b8-a501-c0e475e3e30b  RAC1
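Comparing the reported layout against the topology you expect is easier with the nodes grouped by datacenter. This is a minimal sketch that parses nodetool status text, assuming the column layout shown above (two-letter state code, then address); nodes_by_dc and down_nodes are hypothetical names.

```python
import re
from typing import Dict, List, Optional, Tuple

# Node rows start with a state code such as UN, DN or UJ, followed by the address.
NODE_LINE = re.compile(r"^([UD][NLJM])\s+(\S+)")


def nodes_by_dc(status_output: str) -> Dict[str, List[Tuple[str, str]]]:
    """Group (state, address) pairs under the 'Datacenter:' header they appear beneath."""
    result: Dict[str, List[Tuple[str, str]]] = {}
    dc: Optional[str] = None
    for line in status_output.splitlines():
        line = line.strip()
        if line.startswith("Datacenter:"):
            dc = line.split(":", 1)[1].strip()
            result.setdefault(dc, [])
        elif dc is not None:
            match = NODE_LINE.match(line)
            if match:
                result[dc].append((match.group(1), match.group(2)))
    return result


def down_nodes(grouped: Dict[str, List[Tuple[str, str]]]) -> List[str]:
    """Addresses whose state code starts with D (Down), across all datacenters."""
    return [addr for nodes in grouped.values() for state, addr in nodes if state.startswith("D")]
```

If the datacenter names returned here do not match the names configured for the cluster, that points back at the snitch/topology or gossip problem described in this post.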
The gossip protocol is used by the nodes to communicate information within the cluster. Gossip issues are usually related to problems with either snitch/topology configuration or the network layer.
In this case, the most common cause of the symptoms above is a misconfigured firewall or VLAN.
Use the following checklist to identify the cause of the issue:
- Check iptables or firewalld for misconfiguration.

NOTE - The standard gossip TCP port is 7000, or 7001 for SSL-secured clusters.
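To test the network path itself rather than only the firewall rules, a plain TCP connect to the gossip port is a quick check: run it from the affected node against each peer, and from a peer back to the affected node. This Python sketch uses only the standard socket module; the function name, the host list you pass in, and the 3-second default timeout are assumptions for illustration.

```python
import socket
from typing import Dict, Iterable


def check_gossip_port(hosts: Iterable[str], port: int = 7000, timeout: float = 3.0) -> Dict[str, str]:
    """Attempt a TCP connection to each host's gossip port and record the outcome."""
    results: Dict[str, str] = {}
    for host in hosts:
        try:
            # The connection is closed immediately; only reachability is tested.
            with socket.create_connection((host, port), timeout=timeout):
                results[host] = "open"
        except socket.timeout:
            # No answer within the timeout: packets may be silently dropped.
            results[host] = "timeout"
        except ConnectionRefusedError:
            # Nothing listening on the port, or a rule actively rejecting the connection.
            results[host] = "refused"
        except OSError as exc:
            # Other failures, e.g. no route to host or name resolution errors.
            results[host] = f"error: {exc}"
    return results
```

Pass port=7001 for SSL-secured clusters. A timeout often suggests packets being dropped by a firewall or VLAN misconfiguration, while a refusal usually means nothing is listening on that port or a rule is explicitly rejecting the connection.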
Republished from DataStax Support Knowledge Base article, "New node in cluster unable to gossip, cannot determine workload of other nodes".
Note that the behavior described above can also be caused by the system time differing on the DN nodes, for example:
ERROR [main] 2021-12-17 20:30:55,495 CassandraDaemon.java:909 - Exception encountered during startup
java.lang.IllegalStateException: Unable to contact any seeds: [node-dc1/10.0.0.1:7000, node-dc2/10.1.0.1:7000]
    at org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:1751)
    at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:1056)
    at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:1017)
    at org.apache.cassandra.service.StorageService.initServer(StorageService.java:799)
    at org.apache.cassandra.service.StorageService.initServer(StorageService.java:729)
    at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:420)
    at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:763)
    at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:887)
This can also be detected from log messages like the following:
INFO [ScheduledTasks:1] 2021-12-17 20:30:30,130 MessagingMetrics.java:206 - ECHO_REQ messages were dropped in last 5000 ms: 0 internal and 21 cross node. Mean internal dropped latency: 0 ms and Mean cross-node dropped latency: 91830
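To track these messages over time, a small parser can pull the counts and latencies out of each line. The regular expression below is an assumption derived from the single sample line above, so other Cassandra versions may word the message differently; parse_dropped is a hypothetical name, and the latency values are kept exactly as logged.

```python
import re
from typing import Dict, Optional

# Pattern inferred from one MessagingMetrics sample line; not an official log format.
DROPPED = re.compile(
    r"(?P<verb>\w+) messages were dropped in last (?P<window>\d+) ms: "
    r"(?P<internal>\d+) internal and (?P<cross>\d+) cross node\. "
    r"Mean internal dropped latency: (?P<int_lat>\d+) ms and "
    r"Mean cross-node dropped latency: (?P<cross_lat>\d+)"
)


def parse_dropped(line: str) -> Optional[Dict[str, object]]:
    """Extract dropped-message counts and mean latencies, or None if the line does not match."""
    match = DROPPED.search(line)
    if match is None:
        return None
    return {
        "verb": match.group("verb"),
        "window_ms": int(match.group("window")),
        "internal": int(match.group("internal")),
        "cross_node": int(match.group("cross")),
        "mean_internal_latency": int(match.group("int_lat")),
        "mean_cross_node_latency": int(match.group("cross_lat")),
    }
```

A large, persistent cross-node dropped latency for ECHO_REQ is consistent with, though not proof of, the clock-difference explanation given here.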
Cassandra 4 is less tolerant of time differences between nodes, so check that the time is correct on all hosts and that ntpd or chrony is properly configured.
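One rough way to compare clocks is to sample each host's epoch time at roughly the same moment (for example with GNU date +%s.%N over SSH, an assumed collection method) and measure each host's offset from the median. The 0.5-second tolerance and the helper names below are hypothetical, and network or SSH latency adds noise, so treat the result as a hint and confirm with the status tools of ntpd or chrony.

```python
from statistics import median
from typing import Dict, List


def clock_skew(host_epochs: Dict[str, float]) -> Dict[str, float]:
    """Offset in seconds of each host's sampled epoch time from the median of all samples."""
    reference = median(host_epochs.values())
    return {host: t - reference for host, t in host_epochs.items()}


def skewed_hosts(host_epochs: Dict[str, float], tolerance_s: float = 0.5) -> List[str]:
    """Hosts whose absolute offset exceeds the (hypothetical) tolerance, sorted by name."""
    return sorted(h for h, offset in clock_skew(host_epochs).items() if abs(offset) > tolerance_s)
```

Using the median rather than one chosen host as the reference keeps a single badly skewed host from making every other host look wrong.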
© 2023 DataStax, Titan, and TitanDB are registered trademarks of DataStax, Inc. and its subsidiaries in the United States and/or other countries.