Hi,
We have a 3-node Cassandra cluster (10.179.127.67/68/69) running on AWS EC2 (RHEL). Over the past few days, we have noticed that the 2nd and 3rd nodes crash shortly after starting up, without writing any error messages to the system.log file. Can you please help us understand what might be causing this? Also, what can we do to make Cassandra log all error messages clearly, so that the root cause is easy to identify?
In the snippet below (from 10.179.127.68), I started the node at around 2021-05-24 20:24. After a few seconds, the node crashed again, so I had to restart it again at 2021-05-24 20:34:50. No errors were logged explaining why the node crashed. I also checked the logs on the other 2 nodes, but they did not log any additional messages either. How do I force Cassandra to log more details?
INFO [GossipStage:1] 2021-05-24 20:24:24,367 Gossiper.java:1125 - Node /10.179.127.67 has restarted, now UP
INFO [GossipStage:1] 2021-05-24 20:24:24,372 StorageService.java:2386 - Node /10.179.127.67 state jump to NORMAL
INFO [GossipStage:1] 2021-05-24 20:24:24,378 TokenMetadata.java:497 - Updating topology for /10.179.127.67
INFO [GossipStage:1] 2021-05-24 20:24:24,378 TokenMetadata.java:497 - Updating topology for /10.179.127.67
INFO [GossipStage:1] 2021-05-24 20:24:24,379 Gossiper.java:1125 - Node /10.179.127.69 has restarted, now UP
INFO [GossipStage:1] 2021-05-24 20:24:24,386 StorageService.java:2386 - Node /10.179.127.69 state jump to NORMAL
INFO [GossipStage:1] 2021-05-24 20:24:24,391 TokenMetadata.java:497 - Updating topology for /10.179.127.69
INFO [GossipStage:1] 2021-05-24 20:24:24,391 TokenMetadata.java:497 - Updating topology for /10.179.127.69
INFO [HANDSHAKE-/10.179.127.69] 2021-05-24 20:24:24,450 OutboundTcpConnection.java:561 - Handshaking version with /10.179.127.69
INFO [GossipStage:1] 2021-05-24 20:24:24,505 Gossiper.java:1089 - InetAddress /10.179.127.67 is now UP
INFO [GossipStage:1] 2021-05-24 20:24:24,570 Gossiper.java:1089 - InetAddress /10.179.127.69 is now UP
INFO [HANDSHAKE-/10.179.127.67] 2021-05-24 20:24:25,036 OutboundTcpConnection.java:561 - Handshaking version with /10.179.127.67
WARN [GossipTasks:1] 2021-05-24 20:24:25,192 FailureDetector.java:278 - Not marking nodes down due to local pause of 28740715793 > 5000000000
INFO [main] 2021-05-24 20:24:32,354 Gossiper.java:1811 - No gossip backlog; proceeding
INFO [main] 2021-05-24 20:24:33,020 NativeTransportService.java:68 - Netty using native Epoll event loop
INFO [main] 2021-05-24 20:24:33,115 Server.java:148 - Enabling encrypted CQL connections between client and server
INFO [main] 2021-05-24 20:24:33,154 Server.java:158 - Using Netty Version: [netty-buffer=netty-buffer-4.0.44.Final.452812a, netty-codec=netty-codec-4.0.44.Final.452812a, netty-codec-haproxy=netty-codec-haproxy-4.0.44.Final.452812a, netty-codec-http=netty-codec-http-4.0.44.Final.452812a, netty-codec-socks=netty-codec-socks-4.0.44.Final.452812a, netty-common=netty-common-4.0.44.Final.452812a, netty-handler=netty-handler-4.0.44.Final.452812a, netty-tcnative=netty-tcnative-1.1.33.Fork26.142ecbb, netty-transport=netty-transport-4.0.44.Final.452812a, netty-transport-native-epoll=netty-transport-native-epoll-4.0.44.Final.452812a, netty-transport-rxtx=netty-transport-rxtx-4.0.44.Final.452812a, netty-transport-sctp=netty-transport-sctp-4.0.44.Final.452812a, netty-transport-udt=netty-transport-udt-4.0.44.Final.452812a]
INFO [main] 2021-05-24 20:24:33,155 Server.java:159 - Starting listening for CQL clients on /10.179.127.68:9142 (encrypted)...
INFO [main] 2021-05-24 20:24:33,213 CassandraDaemon.java:556 - Not starting RPC server as requested. Use JMX (StorageService->startRPCServer()) or nodetool (enablethrift) to start it
INFO [Native-Transport-Requests-1] 2021-05-24 20:24:33,710 AuthCache.java:177 - (Re)initializing PermissionsCache (validity period/update interval/max entries) (2000/2000/1000)
INFO [main] 2021-05-24 20:34:50,372 YamlConfigurationLoader.java:89 - Configuration location: file:/data/cassandra/conf/cassandra.yaml
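To pin down exactly when a node goes silent, we put together a small script that scans system.log for the last entry before a long gap and converts the FailureDetector pause values from nanoseconds to seconds. This is only a hypothetical helper (the names `parse`, `find_gaps` and `local_pauses` are our own, not part of Cassandra), and it assumes one log entry per line in the format shown above. Run against the lines above, it flags the roughly 10-minute silence after the 20:24:33 PermissionsCache entry (the last thing logged before the crash) and reports the 20:24:25 WARN as a 28.74 s local pause against a 5 s threshold.

```python
import re
from datetime import datetime
from typing import List, NamedTuple, Tuple

# Matches one system.log entry in the layout seen in this post (assumed):
# LEVEL [thread] yyyy-MM-dd HH:mm:ss,SSS Source.java:line - message
ENTRY_RE = re.compile(
    r"^(?P<level>[A-Z]+)\s+\[(?P<thread>[^\]]+)\]\s+"
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<source>\S+)\s+-\s+(?P<msg>.*)$"
)
# The FailureDetector WARN reports the pause and the threshold in nanoseconds.
PAUSE_RE = re.compile(r"local pause of (\d+) > (\d+)")

# A few entries copied from the log above.
SAMPLE = """\
INFO [GossipStage:1] 2021-05-24 20:24:24,570 Gossiper.java:1089 - InetAddress /10.179.127.69 is now UP
WARN [GossipTasks:1] 2021-05-24 20:24:25,192 FailureDetector.java:278 - Not marking nodes down due to local pause of 28740715793 > 5000000000
INFO [Native-Transport-Requests-1] 2021-05-24 20:24:33,710 AuthCache.java:177 - (Re)initializing PermissionsCache (validity period/update interval/max entries) (2000/2000/1000)
INFO [main] 2021-05-24 20:34:50,372 YamlConfigurationLoader.java:89 - Configuration location: file:/data/cassandra/conf/cassandra.yaml
"""


class Entry(NamedTuple):
    level: str
    thread: str
    ts: datetime
    source: str
    msg: str


def parse(text: str) -> List[Entry]:
    """Parse one entry per line; lines that do not match (e.g. stack traces) are skipped."""
    entries = []
    for line in text.splitlines():
        m = ENTRY_RE.match(line.strip())
        if m:
            ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S,%f")
            entries.append(Entry(m["level"], m["thread"], ts, m["source"], m["msg"]))
    return entries


def find_gaps(entries: List[Entry], threshold_s: float) -> List[Tuple[Entry, Entry, float]]:
    """Return (last entry before the gap, first entry after it, gap in seconds)."""
    gaps = []
    for prev, nxt in zip(entries, entries[1:]):
        gap = (nxt.ts - prev.ts).total_seconds()
        if gap > threshold_s:
            gaps.append((prev, nxt, gap))
    return gaps


def local_pauses(entries: List[Entry]) -> List[Tuple[float, float]]:
    """Return (pause_s, threshold_s) for each FailureDetector local-pause warning."""
    out = []
    for e in entries:
        m = PAUSE_RE.search(e.msg)
        if m:
            out.append((int(m.group(1)) / 1e9, int(m.group(2)) / 1e9))
    return out


if __name__ == "__main__":
    entries = parse(SAMPLE)
    for prev, nxt, gap in find_gaps(entries, threshold_s=60):
        print(f"{gap:.1f}s gap after: {prev.ts} [{prev.thread}] {prev.msg}")
    for pause, limit in local_pauses(entries):
        print(f"local pause {pause:.2f}s (threshold {limit:.0f}s)")
```

To use it on a real node, replace `SAMPLE` with the contents of the node's system.log; a 60 s threshold is an arbitrary choice that separates normal startup chatter from the silence before a restart.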
Thanks,
Yashwanth.