
Erick Ramirez asked:

Why is a new node "Unable to gossip with other peers"?

In this post I'll explain why a new node added to the cluster is unable to communicate with other nodes. In some instances, the node was previously part of the cluster and is still unable to gossip when added back in.


Erick Ramirez answered:

Symptoms

A tell-tale sign of this issue is an error in the node's system.log reporting that it is unable to gossip with any peers during startup, for example:

ERROR [main] 2019-08-15 18:46:32,241 CassandraDaemon.java:749 - Exception encountered during startup
java.lang.RuntimeException: Unable to gossip with any peers
        at org.apache.cassandra.gms.Gossiper.doShadowRound(Gossiper.java:1435) ~[apache-cassandra-3.11.4.jar:3.11.4]
        at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:566) ~[apache-cassandra-3.11.4.jar:3.11.4]
        at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:823) ~[apache-cassandra-3.11.4.jar:3.11.4]
        at org.apache.cassandra.service.StorageService.initServer(StorageService.java:683) ~[apache-cassandra-3.11.4.jar:3.11.4]
        at org.apache.cassandra.service.StorageService.initServer(StorageService.java:632) ~[apache-cassandra-3.11.4.jar:3.11.4]
        at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:388) ~[apache-cassandra-3.11.4.jar:3.11.4]
        at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:620) ~[apache-cassandra-3.11.4.jar:3.11.4]
        at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:732) ~[apache-cassandra-3.11.4.jar:3.11.4]
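
To check whether this error appears in a node's logs, a small scanner like the following can help. It is a minimal sketch that assumes log entries start with the level, thread and timestamp in the layout shown in the sample above; the names find_gossip_failures, GOSSIP_FAILURE and LOG_HEADER are hypothetical and not part of any Cassandra tooling.

```python
import re
from typing import Iterable, List, Optional, Tuple

# Exception message from the startup failure shown above.
GOSSIP_FAILURE = "Unable to gossip with any peers"

# Start of a log entry, e.g. "ERROR [main] 2019-08-15 18:46:32,241 ...".
# Assumption: the log uses the level, [thread], date/time layout seen in the sample.
LOG_HEADER = re.compile(r"^(?P<level>[A-Z]+)\s+\[(?P<thread>[^\]]+)\]\s+(?P<ts>\S+ \S+)")


def find_gossip_failures(lines: Iterable[str]) -> List[Tuple[Optional[str], str]]:
    """Return (timestamp, line) for every line mentioning the gossip failure.

    Exception lines carry no timestamp of their own, so each hit is paired
    with the timestamp of the most recent log entry header above it.
    """
    hits = []
    last_ts: Optional[str] = None
    for line in lines:
        header = LOG_HEADER.match(line)
        if header:
            last_ts = header.group("ts")
        if GOSSIP_FAILURE in line:
            hits.append((last_ts, line.strip()))
    return hits
```

Because it accepts any iterable of lines, an open file object can be passed in directly, for example the system.log under /var/log/cassandra on a typical package install (adjust the path for your environment).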

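In some cases, other nodes see the affected node as operational but the affected node itself cannot gossip with them. Here is a sample nodetool gossipinfo output from the affected node. Only its own entry (10.1.2.6) has a non-zero generation and heartbeat; the entries for its peers show generation:0 and heartbeat:0, indicating that it has not received any gossip state from them: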

/10.1.2.4
  generation:0
  heartbeat:0
/10.1.2.3
  generation:0
  heartbeat:0
/10.1.2.6
  generation:1444263348
  heartbeat:6232
  ...
  DC:DC1
  STATUS:NORMAL,-1041938454866204344
  ...
/10.1.2.5
  generation:0
  heartbeat:0
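
The generation:0 pattern can be picked out programmatically. The sketch below groups the key:value lines of nodetool gossipinfo under each endpoint and lists the endpoints whose generation is 0. It assumes endpoint headers look like "/10.1.2.4" as in the sample above (other Cassandra versions may format them differently); the function names are hypothetical.

```python
from typing import Dict, List, Optional


def parse_gossipinfo(text: str) -> Dict[str, Dict[str, str]]:
    """Group the key:value lines of `nodetool gossipinfo` under each endpoint."""
    endpoints: Dict[str, Dict[str, str]] = {}
    current: Optional[str] = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("/"):
            # Endpoint header, e.g. "/10.1.2.4" (assumed format).
            current = line[1:]
            endpoints[current] = {}
        elif current is not None and ":" in line:
            # Split on the first colon only, so "STATUS:NORMAL,..." keeps its value intact.
            key, _, value = line.partition(":")
            endpoints[current][key] = value
    return endpoints


def peers_without_gossip_state(text: str) -> List[str]:
    """Endpoints reporting generation 0, i.e. no gossip state received from them."""
    return [ep for ep, state in parse_gossipinfo(text).items()
            if state.get("generation") == "0"]
```

Run against the sample above, this would flag 10.1.2.4, 10.1.2.3 and 10.1.2.5, leaving only the node's own entry with real state.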

Another symptom is that the affected node reports all other nodes in the cluster as down (DN) and as belonging to a different DC, as shown in this sample nodetool status output:

Datacenter: r1
==============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address   Load     Tokens  Owns   Host ID                               Rack
DN  10.1.2.5  ?        256     9.0%   5279619a-550c-42b3-8150-61ad24f828f3  r1
DN  10.1.2.3  ?        256     9.1%   5d1fa459-cdac-4658-b68d-c6e0933afcee  r1
DN  10.1.2.4  ?        256     10.5%  a8f35c63-6a76-4e95-99f1-bef65d785366  r1
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address   Load     Tokens  Owns   Host ID                               Rack
UN  10.1.2.6  18.9 GB  256     9.5%   36fdcf57-0274-43b8-a501-c0e475e3e30b  RAC1
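
A quick way to spot this pattern is to parse the nodetool status output and group nodes by the DC the affected node thinks they are in. This is a minimal sketch assuming the layout shown above (a "Datacenter:" line followed by rows that start with a two-letter state code and end with the rack); NodeRow, parse_status and down_nodes are hypothetical names.

```python
from typing import List, NamedTuple, Optional


class NodeRow(NamedTuple):
    dc: Optional[str]
    status: str   # two-letter code, e.g. "UN" (Up/Normal) or "DN" (Down/Normal)
    address: str
    rack: str


def parse_status(text: str) -> List[NodeRow]:
    """Extract (dc, status, address, rack) for each node row of `nodetool status`."""
    rows = []
    dc: Optional[str] = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Datacenter:"):
            dc = line.split(":", 1)[1].strip()
            continue
        fields = line.split()
        # Node rows start with U/D followed by N/L/J/M; header and legend lines do not.
        if (len(fields) >= 2 and len(fields[0]) == 2
                and fields[0][0] in "UD" and fields[0][1] in "NLJM"):
            # The rack is the last column; Load may span two tokens ("18.9 GB").
            rows.append(NodeRow(dc, fields[0], fields[1], fields[-1]))
    return rows


def down_nodes(rows: List[NodeRow]) -> List[str]:
    """Addresses of nodes reported as Down."""
    return [r.address for r in rows if r.status.startswith("D")]
```

The text can be captured with Python's subprocess module (for example subprocess.run(["nodetool", "status"], capture_output=True, text=True).stdout). On an affected node, every peer typically ends up in down_nodes and in a DC other than the node's own.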

Cause

Nodes use the gossip protocol to exchange state information, such as status, DC and rack, with each other. Gossip issues are usually caused by problems with either the snitch/topology configuration or the network layer.

In this case, the most common cause of the symptoms above is a misconfigured firewall or VLAN.

Solution

Use the following checklist to identify the cause of the issue:

  • check software firewalls such as iptables or firewalld for misconfiguration
  • check for missed steps in your organisation's server provisioning process: were security settings inadvertently applied to the node?
  • check port and VLAN configuration on network devices such as switches and routers
  • check network policies such as quality-of-service (QoS) or bandwidth throttling rules for misconfiguration: do they apply to this environment?

NOTE - By default, gossip uses TCP port 7000 (storage_port in cassandra.yaml), or 7001 (ssl_storage_port) for clusters with internode SSL enabled.
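
When working through the checklist, a plain TCP connection test against the gossip port of each peer can quickly show whether a firewall or VLAN is blocking traffic. The sketch below is an assumption-laden helper, not a Cassandra tool: check_gossip_port is a hypothetical name, and a successful result only proves the TCP handshake completes, not that the Cassandra internode protocol works end to end.

```python
import socket
from typing import Dict, Iterable


def check_gossip_port(hosts: Iterable[str], port: int = 7000,
                      timeout: float = 3.0) -> Dict[str, str]:
    """Attempt a TCP connection to each host's gossip (storage) port.

    Returns "open" for hosts that accept the connection, otherwise a short
    description of the error. Use port=7001 for clusters with internode SSL.
    """
    results = {}
    for host in hosts:
        try:
            # The connection is closed immediately; only reachability is tested.
            with socket.create_connection((host, port), timeout=timeout):
                results[host] = "open"
        except OSError as exc:
            # Refused connections, timeouts and DNS failures are all OSError subclasses.
            results[host] = f"unreachable ({exc.__class__.__name__}: {exc})"
    return results
```

For the sample cluster, this could be run on 10.1.2.6 with the hosts 10.1.2.3, 10.1.2.4 and 10.1.2.5. Since other nodes may see the affected node while it cannot reach them, it is worth running the check in both directions: a firewall rule can block traffic one way only.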

Credits

Republished from DataStax Support Knowledge Base article, "New node in cluster unable to gossip, cannot determine workload of other nodes".
