PLANNED MAINTENANCE

Hello, DataStax Community!

We want to make you aware of a few operational updates which will be carried out on the site. We are working hard to streamline the login process to integrate with other DataStax resources. As such, you will soon be prompted to update your password. Please note that your username will remain the same.

As we work to improve your user experience, please be aware that login to the DataStax Community will be unavailable for a few hours on:

  • Wednesday, July 15 16:00 PDT | 19:00 EDT | 20:00 BRT
  • Thursday, July 16 00:00 BST | 01:00 CEST | 04:30 IST | 07:00 CST | 09:00 AEST

For more info, check out the FAQ page. Thank you for being a valued member of our community.


question

Sergio avatar image
Sergio asked ·

3.11.4 Node the load starts to increase after few minutes to 40 on 4 CPU machine. It is due to compaction. What can I do to help?

Hello guys!

I performed a thread dump https://fastthread.io/my-thread-report.jsp?p=c2hhcmVkLzIwMTkvMTAvMTcvLS1kdW1wLnR4dC0tMC0zMC00MA==& while try to join the node with

-Dcassandra.join_ring=false


because the node spiked in load and latency was affecting the clients.

With or without that flag the node is high in latency and I see the load sky rocketing when the number of TCP established connections increases

Analyzing the /var/log/messages I am able to read

Oct 17 00:23:39 prod-personalization-live-data-cassandra-08 cassandra: INFO [Service Thread] 2019-10-17 00:23:39,030 GCInspector.java:284 - G1 Young Generation GC in 255ms. G1 Eden Space: 361758720 -> 0; G1 Old Gen: 1855455944 -> 1781007048; G1 Survivor Space: 39845888 -> 32505856;

Oct 17 00:23:40 prod-personalization-live-data-cassandra-08 cassandra: INFO [ScheduledTasks:1] 2019-10-17 00:23:40,352 NoSpamLogger.java:91 - Some operations were slow, details available at debug level (debug.log)

Oct 17 00:23:03 prod-personalization-live-data-cassandra-08 kernel: TCP: request_sock_TCP: Possible SYN flooding on port 9042. Sending cookies. Check SNMP counters.

I don't see anything on debug.log that looks to be relevant

The machine is on aws with 4 cpu with 32GB Ram and 1 TB SSD i3.xlarge


TOP

top - 19:58:32 up 2 days, 19:59, 3 users, load average: 42.77, 39.50, 38.07

Tasks: 137 total, 1 running, 71 sleeping, 0 stopped, 0 zombie

%Cpu(s): 87.3 us, 6.3 sy, 4.8 ni, 0.0 id, 0.0 wa, 0.0 hi, 1.6 si, 0.0 st

KiB Mem : 31391772 total, 256604 free, 10187460 used, 20947708 buff/cache

KiB Swap: 0 total, 0 free, 0 used. 20017184 avail Mem


FREE

total used free shared buff/cache available

Mem: 31391772 10186776 239876 549400 20965120 20018132

Swap: 0 0 0


DEBUG.LOG

INFO [ReadStage-30] 2019-10-17 19:34:20,370 NoSpamLogger.java:91 - Maximum memory usage reached (536870912), cannot allocate chunk of 1048576

INFO [ScheduledTasks:1] 2019-10-17 19:35:19,004 NoSpamLogger.java:91 - Some operations were slow, details available at debug level (debug.log)

INFO [ScheduledTasks:1] 2019-10-17 19:40:19,223 NoSpamLogger.java:91 - Some operations were slow, details available at debug level (debug.log)

INFO [ScheduledTasks:1] 2019-10-17 19:45:19,614 NoSpamLogger.java:91 - Some operations were slow, details available at debug level (debug.log)

INFO [ReadStage-277] 2019-10-17 19:49:20,438 NoSpamLogger.java:91 - Maximum memory usage reached (536870912), cannot allocate chunk of 1048576

INFO [ScheduledTasks:1] 2019-10-17 19:50:19,834 NoSpamLogger.java:91 - Some operations were slow, details available at debug level (debug.log)


bash-4.2$ java -jar sjk.jar ttop -p 26309

Monitoring threads ...


2019-10-17T20:01:14.870+0000 Process summary

process cpu=325.34%

application cpu=321.13% (user=291.77% sys=29.36%)

other: cpu=4.21%

thread count: 476

heap allocation rate 772mb/s

[000101] user=34.76% sys= 0.45% alloc= 12mb/s - CompactionExecutor:1

[000125] user=14.90% sys= 0.73% alloc= 49mb/s - Native-Transport-Requests-1

[000269] user=12.12% sys= 0.30% alloc= 38mb/s - Native-Transport-Requests-3

[000267] user=10.92% sys= 0.34% alloc= 33mb/s - ReadStage-4

[000584] user= 9.14% sys= 0.33% alloc= 30mb/s - Native-Transport-Requests-14

[000126] user= 8.54% sys= 0.16% alloc= 25mb/s - Native-Transport-Requests-2

[000271] user= 8.34% sys= 0.26% alloc= 26mb/s - Native-Transport-Requests-6

[000525] user= 7.15% sys= 0.51% alloc= 22mb/s - Native-Transport-Requests-12

[000582] user= 7.35% sys= 0.25% alloc= 22mb/s - Native-Transport-Requests-16

[000304] user= 7.15% sys= 0.41% alloc= 22mb/s - Native-Transport-Requests-7

[000524] user= 6.55% sys= 0.30% alloc= 20mb/s - Native-Transport-Requests-13

[000268] user= 5.36% sys= 0.35% alloc= 13mb/s - Native-Transport-Requests-5

[002461] user= 3.97% sys= 1.49% alloc= 2655kb/s - RMI TCP Connection(309)-10.1.19.198

[000545] user= 4.97% sys= 0.20% alloc= 14mb/s - ReadStage-10

[000581] user= 4.97% sys= 0.19% alloc= 15mb/s - ReadStage-18

[000531] user= 4.97% sys= 0.14% alloc= 16mb/s - Native-Transport-Requests-26

[000572] user= 4.97% sys= 0.13% alloc= 16mb/s - Native-Transport-Requests-37

[000321] user= 4.57% sys= 0.37% alloc= 12mb/s - Native-Transport-Requests-8

[000573] user= 4.57% sys= 0.19% alloc= 14mb/s - Native-Transport-Requests-35

[000583] user= 4.57% sys= 0.15% alloc= 13mb/s - ReadStage-15


DSTAT


^Cbash-4.2$ dstat

You did not select any stats, using -cdngy by default.

----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--

usr sys idl wai hiq siq| read writ| recv send| in out | int csw

28 1 70 0 0 0| 380k 5912k| 0 0 | 0 0 |3245 2629

90 7 1 0 0 2|1128k 917k| 803k 1010k| 0 0 | 21k 25k

92 6 0 0 0 1| 120k 0 | 836k 1048k| 0 0 | 20k 24k

90 5 4 0 0 1| 952k 0 | 488k 539k| 0 0 | 16k 15k

96 3 0 0 0 1| 128k 0 | 252k 248k| 0 0 | 12k 12k

89 8 1 0 0 2| 256k 2064k| 807k 1088k| 0 0 | 22k 30k

92 6 0 0 0 1|1164k 29k| 707k 905k| 0 0 | 17k 21k



Do you advise to increase the Heap?

cassand+ 26309 365 50.6 69878708 15914212 ? Ssl 17:24 574:49 java -Xloggc:/var/log/cassandra/gc.log -Dcassandra.max_queued_native_transport_requests=4096 -ea -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -XX:+HeapDumpOnOutOfMemoryError -Xss256k -XX:StringTableSize=1000003 -XX:+AlwaysPreTouch -XX:-UseBiasedLocking -XX:+UseTLAB -XX:+ResizeTLAB -XX:+UseNUMA -XX:+PerfDisableSharedMem -Djava.net.preferIPv4Stack=true -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 -XX:+UseG1GC -XX:G1RSetUpdatingPauseTimePercent=5 -XX:MaxGCPauseMillis=200 -XX:InitiatingHeapOccupancyPercent=45 -XX:G1HeapRegionSize=0 -XX:-ParallelRefProcEnabled -Xms7664M -Xmx7664M -XX:CompileCommandFile=/etc/cassandra/conf/hotspot_compiler -Dcom.sun.management.jmxremote.port=7199 -Dcom.sun.management.jmxremote.rmi.port=7199 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.password.file=/etc/cassandra/conf/jmxremote.password -Dcom.sun.management.jmxremote.access.file=/etc/cassandra/conf/jmxremote.access -Djava.library.path=/usr/share/cassandra/lib/sigar-bin -Djava.rmi.server.hostname=10.1.19.198 -XX:+CMSClassUnloadingEnabled -javaagent:/usr/share/cassandra/lib/jmx_prometheus_javaagent-0.3.1.jar=10100:/etc/cassandra/default.conf/jmx-export.yml -Dcassandra.join_ring=false -Dlogback.configurationFile=logback.xml -Dcassandra.logdir=/var/log/cassandra -Dcassandra.storagedir= -Dcassandra-pidfile=/var/run/cassandra/cassandra.pid -Dcassandra-foreground=yes -cp /etc/cassandra/conf:/usr/share/cassandra/lib/airline-0.6.jar:/usr/share/cassandra/lib/antlr-runtime-3.5.2.jar:/usr/share/cassandra/lib/asm-5.0.4.jar:/usr/share/cassandra/lib/caffeine-2.2.6.jar:/usr/share/cassandra/lib/cassandra-driver-core-3.0.1-shaded.jar:/usr/share/cassandra/lib/commons-cli-1.1.jar:/usr/share/cassandra/lib/commons-codec-1.9.jar:/usr/share/cassandra/lib/commons-lang3-3.1.jar:/usr/share/cassandra/lib/commons-math3-3.2.jar:/usr/share/cassandra/lib/compress-lzf-0.8.4.jar:/usr/share/cassandra/lib/concurrentlinkedhashmap-lru-1.4.jar:/usr/share/cassandra/lib/concurrent-trees-2.4.0.jar:/usr/share/cassandra/lib/disruptor-3.0.1.jar:/usr/share/cassandra/lib/ecj-4.4.2.jar:/usr/share/cassandra/lib/guava-18.0.jar:/usr/share/cassandra/lib/HdrHistogram-2.1.9.jar:/usr/share/cassandra/lib/high-scale-lib-1.0.6.jar:/usr/share/cassandra/lib/hppc-0.5.4.jar:/usr/share/cassandra/lib/jackson-core-asl-1.9.13.jar:/usr/share/cassandra/lib/jackson-mapper-asl-1.9.13.jar:/usr/share/cassandra/lib/jamm-0.3.0.jar:/usr/share/cassandra/lib/javax.inject.jar:/usr/share/cassandra/lib/jbcrypt-0.3m.jar:/usr/share/cassandra/lib/jcl-over-slf4j-1.7.7.jar:/usr/share/cassandra/lib/jctools-core-1.2.1.jar:/usr/share/cassandra/lib/jflex-1.6.0.jar:/usr/share/cassandra/lib/jmx_prometheus_javaagent-0.3.1.jar:/usr/share/cassandra/lib/jna-4.2.2.jar:/usr/share/cassandra/lib/joda-time-2.4.jar:/usr/share/cassandra/lib/json-simple-1.1.jar:/usr/share/cassandra/lib/jstackjunit-0.0.1.jar:/usr/share/cassandra/lib/libthrift-0.9.2.jar:/usr/share/cassandra/lib/log4j-over-slf4j-1.7.7.jar:/usr/share/cassandra/lib/logback-classic-1.1.3.jar:/usr/share/cassandra/lib/logback-core-1.1.3.jar:/usr/share/cassandra/lib/lz4-1.3.0.jar:/usr/share/cassandra/lib/metrics-core-3.1.5.jar:/usr/share/cassandra/lib/metrics-jvm-3.1.5.jar:/usr/share/cassandra/lib/metrics-logback-3.1.5.jar:/usr/share/cassandra/lib/netty-all-4.0.44.Final.jar:/usr/share/cassandra/lib/ohc-core-0.4.4.jar:/usr/share/cassandra/lib/ohc-core-j8-0.4.4.jar:/usr/share/cassandra/lib/reporter-config3-3.0.3.jar:/usr/share/cassandra/lib/reporter-config-base-3.0.3.jar:/usr/share/cassandra/lib/sigar-1.6.4.jar:/usr/share/cassandra/lib/slf4j-api-1.7.7.jar:/usr/share/cassandra/lib/snakeyaml-1.11.jar:/usr/share/cassandra/lib/snappy-java-1.1.1.7.jar:/usr/share/cassandra/lib/snowball-stemmer-1.3.0.581.1.jar:/usr/share/cassandra/lib/ST4-4.0.8.jar:/usr/share/cassandra/lib/stream-2.5.2.jar:/usr/share/cassandra/lib/thrift-server-0.3.7.jar:/usr/share/cassandra/apache-cassandra-3.11.4.jar:/usr/share/cassandra/apache-cassandra-thrift-3.11.4.jar:/usr/share/cassandra/stress.jar: org.apache.cassandra.service.CassandraDaemon


OS Linux 4.16.13-1.el7.elrepo.x86_64 #1 SMP Wed May 30 14:31:51 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux


nodetool compactionstats

pending tasks: 584


nodetool netstats

Mode: STARTING

Not sending any streams.

cassandra load
10 |1000 characters needed characters left characters exceeded

Up to 8 attachments (including images) can be used with a maximum of 1.0 MiB each and 10.0 MiB total.

Sergio avatar image
Sergio answered ·

For all the readers I solved that problem by changing the policy at the Cassandra driver level for all the clients. https://docs.datastax.com/en/drivers/java/2.2/com/datastax/driver/core/policies/LatencyAwarePolicy.html

Share
10 |1000 characters needed characters left characters exceeded

Up to 8 attachments (including images) can be used with a maximum of 1.0 MiB each and 10.0 MiB total.

Erick Ramirez avatar image
Erick Ramirez answered ·

@Sergio this sounds like a follow up to the same symptom you reported in this post. My response to that was the symptoms appear to be due to node not being able to cope with the traffic, particularly the disk being overloaded with IO.

You need to check the disk based on the suggestions I made. But for all intents and purposes, the ttop output shows the node is serving a lot of requests. You should also check that the clients/driver/application is configured correctly in case all requests are getting routed to this node instead of being load balanced to the rest of the cluster. Cheers!

Share
10 |1000 characters needed characters left characters exceeded

Up to 8 attachments (including images) can be used with a maximum of 1.0 MiB each and 10.0 MiB total.