Hello guys!
I performed a thread dump https://fastthread.io/my-thread-report.jsp?p=c2hhcmVkLzIwMTkvMTAvMTcvLS1kdW1wLnR4dC0tMC0zMC00MA==& while try to join the node with-Dcassandra.join_ring=false
because the node spiked in load and latency was affecting the clients.
With or without that flag the node is high in latency and I see the load sky rocketing when the number of TCP established connections increases
Analyzing the /var/log/messages
I am able to read
Oct 17 00:23:39 prod-personalization-live-data-cassandra-08 cassandra: INFO [Service Thread] 2019-10-17 00:23:39,030 GCInspector.java:284 - G1 Young Generation GC in 255ms. G1 Eden Space: 361758720 -> 0; G1 Old Gen: 1855455944 -> 1781007048; G1 Survivor Space: 39845888 -> 32505856;
Oct 17 00:23:40 prod-personalization-live-data-cassandra-08 cassandra: INFO [ScheduledTasks:1] 2019-10-17 00:23:40,352 NoSpamLogger.java:91 - Some operations were slow, details available at debug level (debug.log)
Oct 17 00:23:03 prod-personalization-live-data-cassandra-08 kernel: TCP: request_sock_TCP: Possible SYN flooding on port 9042. Sending cookies. Check SNMP counters.
I don't see anything on debug.log that looks to be relevant
The machine is on aws with 4 cpu with 32GB Ram and 1 TB SSD i3.xlarge
TOP
top - 19:58:32 up 2 days, 19:59, 3 users, load average: 42.77, 39.50, 38.07
Tasks: 137 total, 1 running, 71 sleeping, 0 stopped, 0 zombie
%Cpu(s): 87.3 us, 6.3 sy, 4.8 ni, 0.0 id, 0.0 wa, 0.0 hi, 1.6 si, 0.0 st
KiB Mem : 31391772 total, 256604 free, 10187460 used, 20947708 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 20017184 avail Mem
FREE
total used free shared buff/cache available
Mem: 31391772 10186776 239876 549400 20965120 20018132
Swap: 0 0 0
DEBUG.LOG
INFO [ReadStage-30] 2019-10-17 19:34:20,370 NoSpamLogger.java:91 - Maximum memory usage reached (536870912), cannot allocate chunk of 1048576
INFO [ScheduledTasks:1] 2019-10-17 19:35:19,004 NoSpamLogger.java:91 - Some operations were slow, details available at debug level (debug.log)
INFO [ScheduledTasks:1] 2019-10-17 19:40:19,223 NoSpamLogger.java:91 - Some operations were slow, details available at debug level (debug.log)
INFO [ScheduledTasks:1] 2019-10-17 19:45:19,614 NoSpamLogger.java:91 - Some operations were slow, details available at debug level (debug.log)
INFO [ReadStage-277] 2019-10-17 19:49:20,438 NoSpamLogger.java:91 - Maximum memory usage reached (536870912), cannot allocate chunk of 1048576
INFO [ScheduledTasks:1] 2019-10-17 19:50:19,834 NoSpamLogger.java:91 - Some operations were slow, details available at debug level (debug.log)
bash-4.2$ java -jar sjk.jar ttop -p 26309
Monitoring threads ...
2019-10-17T20:01:14.870+0000 Process summary
process cpu=325.34%
application cpu=321.13% (user=291.77% sys=29.36%)
other: cpu=4.21%
thread count: 476
heap allocation rate 772mb/s
[000101] user=34.76% sys= 0.45% alloc= 12mb/s - CompactionExecutor:1
[000125] user=14.90% sys= 0.73% alloc= 49mb/s - Native-Transport-Requests-1
[000269] user=12.12% sys= 0.30% alloc= 38mb/s - Native-Transport-Requests-3
[000267] user=10.92% sys= 0.34% alloc= 33mb/s - ReadStage-4
[000584] user= 9.14% sys= 0.33% alloc= 30mb/s - Native-Transport-Requests-14
[000126] user= 8.54% sys= 0.16% alloc= 25mb/s - Native-Transport-Requests-2
[000271] user= 8.34% sys= 0.26% alloc= 26mb/s - Native-Transport-Requests-6
[000525] user= 7.15% sys= 0.51% alloc= 22mb/s - Native-Transport-Requests-12
[000582] user= 7.35% sys= 0.25% alloc= 22mb/s - Native-Transport-Requests-16
[000304] user= 7.15% sys= 0.41% alloc= 22mb/s - Native-Transport-Requests-7
[000524] user= 6.55% sys= 0.30% alloc= 20mb/s - Native-Transport-Requests-13
[000268] user= 5.36% sys= 0.35% alloc= 13mb/s - Native-Transport-Requests-5
[002461] user= 3.97% sys= 1.49% alloc= 2655kb/s - RMI TCP Connection(309)-10.1.19.198
[000545] user= 4.97% sys= 0.20% alloc= 14mb/s - ReadStage-10
[000581] user= 4.97% sys= 0.19% alloc= 15mb/s - ReadStage-18
[000531] user= 4.97% sys= 0.14% alloc= 16mb/s - Native-Transport-Requests-26
[000572] user= 4.97% sys= 0.13% alloc= 16mb/s - Native-Transport-Requests-37
[000321] user= 4.57% sys= 0.37% alloc= 12mb/s - Native-Transport-Requests-8
[000573] user= 4.57% sys= 0.19% alloc= 14mb/s - Native-Transport-Requests-35
[000583] user= 4.57% sys= 0.15% alloc= 13mb/s - ReadStage-15
DSTAT
^Cbash-4.2$ dstat
You did not select any stats, using -cdngy by default.
----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read writ| recv send| in out | int csw
28 1 70 0 0 0| 380k 5912k| 0 0 | 0 0 |3245 2629
90 7 1 0 0 2|1128k 917k| 803k 1010k| 0 0 | 21k 25k
92 6 0 0 0 1| 120k 0 | 836k 1048k| 0 0 | 20k 24k
90 5 4 0 0 1| 952k 0 | 488k 539k| 0 0 | 16k 15k
96 3 0 0 0 1| 128k 0 | 252k 248k| 0 0 | 12k 12k
89 8 1 0 0 2| 256k 2064k| 807k 1088k| 0 0 | 22k 30k
92 6 0 0 0 1|1164k 29k| 707k 905k| 0 0 | 17k 21k
Do you advise to increase the Heap?
cassand+ 26309 365 50.6 69878708 15914212 ? Ssl 17:24 574:49 java -Xloggc:/var/log/cassandra/gc.log -Dcassandra.max_queued_native_transport_requests=4096 -ea -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -XX:+HeapDumpOnOutOfMemoryError -Xss256k -XX:StringTableSize=1000003 -XX:+AlwaysPreTouch -XX:-UseBiasedLocking -XX:+UseTLAB -XX:+ResizeTLAB -XX:+UseNUMA -XX:+PerfDisableSharedMem -Djava.net.preferIPv4Stack=true -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 -XX:+UseG1GC -XX:G1RSetUpdatingPauseTimePercent=5 -XX:MaxGCPauseMillis=200 -XX:InitiatingHeapOccupancyPercent=45 -XX:G1HeapRegionSize=0 -XX:-ParallelRefProcEnabled -Xms7664M -Xmx7664M -XX:CompileCommandFile=/etc/cassandra/conf/hotspot_compiler -Dcom.sun.management.jmxremote.port=7199 -Dcom.sun.management.jmxremote.rmi.port=7199 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.password.file=/etc/cassandra/conf/jmxremote.password -Dcom.sun.management.jmxremote.access.file=/etc/cassandra/conf/jmxremote.access -Djava.library.path=/usr/share/cassandra/lib/sigar-bin -Djava.rmi.server.hostname=10.1.19.198 -XX:+CMSClassUnloadingEnabled -javaagent:/usr/share/cassandra/lib/jmx_prometheus_javaagent-0.3.1.jar=10100:/etc/cassandra/default.conf/jmx-export.yml -Dcassandra.join_ring=false -Dlogback.configurationFile=logback.xml -Dcassandra.logdir=/var/log/cassandra -Dcassandra.storagedir= -Dcassandra-pidfile=/var/run/cassandra/cassandra.pid -Dcassandra-foreground=yes -cp /etc/cassandra/conf:/usr/share/cassandra/lib/airline-0.6.jar:/usr/share/cassandra/lib/antlr-runtime-3.5.2.jar:/usr/share/cassandra/lib/asm-5.0.4.jar:/usr/share/cassandra/lib/caffeine-2.2.6.jar:/usr/share/cassandra/lib/cassandra-driver-core-3.0.1-shaded.jar:/usr/share/cassandra/lib/commons-cli-1.1.jar:/usr/share/cassandra/lib/commons-codec-1.9.jar:/usr/share/cassandra/lib/commons-lang3-3.1.jar:/usr/share/cassandra/lib/commons-math3-3.2.jar:/usr/share/cassandra/lib/compress-lzf-0.8.4.jar:/usr/share/cassandra/lib/concurrentlinkedhashmap-lru-1.4.jar:/usr/share/cassandra/lib/concurrent-trees-2.4.0.jar:/usr/share/cassandra/lib/disruptor-3.0.1.jar:/usr/share/cassandra/lib/ecj-4.4.2.jar:/usr/share/cassandra/lib/guava-18.0.jar:/usr/share/cassandra/lib/HdrHistogram-2.1.9.jar:/usr/share/cassandra/lib/high-scale-lib-1.0.6.jar:/usr/share/cassandra/lib/hppc-0.5.4.jar:/usr/share/cassandra/lib/jackson-core-asl-1.9.13.jar:/usr/share/cassandra/lib/jackson-mapper-asl-1.9.13.jar:/usr/share/cassandra/lib/jamm-0.3.0.jar:/usr/share/cassandra/lib/javax.inject.jar:/usr/share/cassandra/lib/jbcrypt-0.3m.jar:/usr/share/cassandra/lib/jcl-over-slf4j-1.7.7.jar:/usr/share/cassandra/lib/jctools-core-1.2.1.jar:/usr/share/cassandra/lib/jflex-1.6.0.jar:/usr/share/cassandra/lib/jmx_prometheus_javaagent-0.3.1.jar:/usr/share/cassandra/lib/jna-4.2.2.jar:/usr/share/cassandra/lib/joda-time-2.4.jar:/usr/share/cassandra/lib/json-simple-1.1.jar:/usr/share/cassandra/lib/jstackjunit-0.0.1.jar:/usr/share/cassandra/lib/libthrift-0.9.2.jar:/usr/share/cassandra/lib/log4j-over-slf4j-1.7.7.jar:/usr/share/cassandra/lib/logback-classic-1.1.3.jar:/usr/share/cassandra/lib/logback-core-1.1.3.jar:/usr/share/cassandra/lib/lz4-1.3.0.jar:/usr/share/cassandra/lib/metrics-core-3.1.5.jar:/usr/share/cassandra/lib/metrics-jvm-3.1.5.jar:/usr/share/cassandra/lib/metrics-logback-3.1.5.jar:/usr/share/cassandra/lib/netty-all-4.0.44.Final.jar:/usr/share/cassandra/lib/ohc-core-0.4.4.jar:/usr/share/cassandra/lib/ohc-core-j8-0.4.4.jar:/usr/share/cassandra/lib/reporter-config3-3.0.3.jar:/usr/share/cassandra/lib/reporter-config-base-3.0.3.jar:/usr/share/cassandra/lib/sigar-1.6.4.jar:/usr/share/cassandra/lib/slf4j-api-1.7.7.jar:/usr/share/cassandra/lib/snakeyaml-1.11.jar:/usr/share/cassandra/lib/snappy-java-1.1.1.7.jar:/usr/share/cassandra/lib/snowball-stemmer-1.3.0.581.1.jar:/usr/share/cassandra/lib/ST4-4.0.8.jar:/usr/share/cassandra/lib/stream-2.5.2.jar:/usr/share/cassandra/lib/thrift-server-0.3.7.jar:/usr/share/cassandra/apache-cassandra-3.11.4.jar:/usr/share/cassandra/apache-cassandra-thrift-3.11.4.jar:/usr/share/cassandra/stress.jar: org.apache.cassandra.service.CassandraDaemon
OS Linux 4.16.13-1.el7.elrepo.x86_64 #1 SMP Wed May 30 14:31:51 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux
nodetool compactionstats
pending tasks: 584
nodetool netstats
Mode: STARTING
Not sending any streams.