
question

Beck asked ·

Why is compaction running slowly on a few nodes?

Hi,


I have noticed that compaction sometimes runs slowly on a few nodes.


My cluster config: (DSE 6.7.4)

data disk: one (/var/lib/cassandra/data)
disk usage: 26%

concurrent compactors: 3
compaction throughput: 150 MB/s

compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}


I checked "debug.log" and the disk metrics on this node; the system metrics look normal.


Why is the Row Throughput not steady?


Normal:

681.954MiB to 688.352MiB (~100% of original) in 30,470ms.  Read Throughput = 22.381MiB/s, Write Throughput = 22.591MiB/s, Row Throughput = ~212,723/s.

Slow:

1.032GiB to 1.041GiB (~100% of original) in 1,101,984ms.  Read Throughput = 982.182KiB/s, Write Throughput = 990.530KiB/s, Row Throughput = ~9,292/s.
896.792MiB to 903.236MiB (~100% of original) in 1,168,173ms.  Read Throughput = 786.111KiB/s, Write Throughput = 791.761KiB/s, Row Throughput = ~7,588/s.
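To put the gap in numbers, the reported read throughput can be recomputed as input size divided by duration. This is a minimal sketch using the first "Normal" figure and the second "Slow" figure quoted above; the function name is only illustrative.

```python
def throughput_mib_s(size_mib, duration_ms):
    """Throughput in MiB/s from an input size in MiB and a duration in ms."""
    return size_mib / (duration_ms / 1000)

# Figures copied from the log summaries quoted above.
normal = throughput_mib_s(681.954, 30_470)     # normal compaction
slow = throughput_mib_s(896.792, 1_168_173)    # slow compaction

print(f"normal: {normal:.3f} MiB/s")
print(f"slow:   {slow * 1024:.3f} KiB/s")
print(f"ratio:  {normal / slow:.1f}x")
```

The recomputed values agree with the log (about 22.38 MiB/s and about 786 KiB/s), so the slow compaction moved data roughly 29 times more slowly for a similar amount of input.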


A part of debug.log:

DEBUG [CompactionExecutor:148] 2019-07-19 03:45:03,837  CompactionTask.java:289 - Compacted (77525850-a994-11e9-83f5-974421971a0f) 4 sstables to [/var/lib/cassandra/data/becktest/extend-776a0872730111e99b8811d05e233dab/aa-22324-bti,] to level=0.  681.954MiB to 688.352MiB (~100% of original) in 30,470ms.  Read Throughput = 22.381MiB/s, Write Throughput = 22.591MiB/s, Row Throughput = ~212,723/s.  3,297,218 total partitions merged to 3,296,040.  Partition merge counts were {1:3297218, }
DEBUG [CompactionExecutor:142] 2019-07-19 04:04:36,261  CompactionTask.java:289 - Compacted (822d6da0-a994-11e9-83f5-974421971a0f) 4 sstables to [/var/lib/cassandra/data/becktest/extend-776a0872730111e99b8811d05e233dab/aa-22326-bti,] to level=0.  3.172GiB to 3.184GiB (~100% of original) in 1,184,680ms.  Read Throughput = 2.742MiB/s, Write Throughput = 2.752MiB/s, Row Throughput = ~26,292/s.  15,578,111 total partitions merged to 15,551,606.  Partition merge counts were {1:15578109, 2:1, }
DEBUG [CompactionExecutor:148] 2019-07-19 04:05:13,532  CompactionTask.java:289 - Compacted (444d1910-a997-11e9-83f5-974421971a0f) 6 sstables to [/var/lib/cassandra/data/becktest/extend-776a0872730111e99b8811d05e233dab/aa-22332-bti,] to level=0.  826.671MiB to 834.403MiB (~100% of original) in 37,273ms.  Read Throughput = 22.178MiB/s, Write Throughput = 22.386MiB/s, Row Throughput = ~212,833/s.  4,043,830 total partitions merged to 4,042,672.  Partition merge counts were {1:4043824, 2:3, }
DEBUG [CompactionExecutor:154] 2019-07-19 05:12:34,107  CompactionTask.java:289 - Compacted (0d06d4c0-a99a-11e9-83f5-974421971a0f) 4 sstables to [/var/lib/cassandra/data/becktest/extend-776a0872730111e99b8811d05e233dab/aa-22337-bti,] to level=0.  609.340MiB to 613.505MiB (~100% of original) in 2,882,094ms.  Read Throughput = 216.496KiB/s, Write Throughput = 217.976KiB/s, Row Throughput = ~2,092/s.  3,016,709 total partitions merged to 3,015,373.  Partition merge counts were {1:3016707, 2:1, }
DEBUG [CompactionExecutor:152] 2019-07-19 05:12:34,110  CompactionTask.java:289 - Compacted (320e5dc0-a99e-11e9-83f5-974421971a0f) 6 sstables to [/var/lib/cassandra/data/becktest/extend-776a0872730111e99b8811d05e233dab/aa-22344-bti,] to level=0.  1.032GiB to 1.041GiB (~100% of original) in 1,101,984ms.  Read Throughput = 982.182KiB/s, Write Throughput = 990.530KiB/s, Row Throughput = ~9,292/s.  5,120,348 total partitions merged to 5,118,682.  Partition merge counts were {1:5120340, 2:4, }
DEBUG [CompactionExecutor:156] 2019-07-19 05:12:59,002  CompactionTask.java:289 - Compacted (c2e91ef1-a9a0-11e9-83f5-974421971a0f) 4 sstables to [/var/lib/cassandra/data/becktest/extend-776a0872730111e99b8811d05e233dab/aa-22349-bti,] to level=0.  587.660MiB to 588.035MiB (~100% of original) in 24,858ms.  Read Throughput = 23.641MiB/s, Write Throughput = 23.656MiB/s, Row Throughput = ~232,556/s.  2,923,738 total partitions merged to 2,906,197.  Partition merge counts were {1:2890191, 2:16733, 3:27, }
DEBUG [CompactionExecutor:156] 2019-07-19 05:55:00,160  CompactionTask.java:289 - Compacted (d1ba2690-a9a0-11e9-83f5-974421971a0f) 4 sstables to [/var/lib/cassandra/data/becktest/extend-776a0872730111e99b8811d05e233dab/aa-22350-bti,] to level=0.  2.660GiB to 2.672GiB (~100% of original) in 2,521,157ms.  Read Throughput = 1.081MiB/s, Write Throughput = 1.085MiB/s, Row Throughput = ~10,515/s.  13,260,282 total partitions merged to 13,245,947.  Partition merge counts were {1:13260276, 2:3, }
DEBUG [CompactionExecutor:161] 2019-07-19 05:55:00,162  CompactionTask.java:289 - Compacted (f82b9540-a9a3-11e9-83f5-974421971a0f) 5 sstables to [/var/lib/cassandra/data/becktest/extend-776a0872730111e99b8811d05e233dab/aa-22356-bti,] to level=0.  896.792MiB to 903.236MiB (~100% of original) in 1,168,173ms.  Read Throughput = 786.111KiB/s, Write Throughput = 791.761KiB/s, Row Throughput = ~7,588/s.  4,435,345 total partitions merged to 4,434,488.  Partition merge counts were {1:4435341, 2:2, }
DEBUG [CompactionExecutor:161] 2019-07-19 05:55:24,131  CompactionTask.java:289 - Compacted (b074a310-a9a6-11e9-83f5-974421971a0f) 4 sstables to [/var/lib/cassandra/data/becktest/extend-776a0872730111e99b8811d05e233dab/aa-22361-bti,] to level=0.  553.353MiB to 557.069MiB (~100% of original) in 23,967ms.  Read Throughput = 23.087MiB/s, Write Throughput = 23.242MiB/s, Row Throughput = ~224,346/s.  2,692,155 total partitions merged to 2,691,546.  Partition merge counts were {1:2692155, }



update:

I checked the compaction stats on this node.

The compactions with low throughput are stuck at 100%.


I am now tailing the compaction transaction log:

tail -f aa_txn_compaction_b9c38100-a9d0-11e9-83f5-974421971a0f.log

ADD:[/var/lib/cassandra/data/becktest/extend-776a0872730111e99b8811d05e233dab/aa-22442-bti-,0,8][3680451155]
REMOVE:[/var/lib/cassandra/data/becktest/extend-776a0872730111e99b8811d05e233dab/aa-22437-bti-,1563503364000,8][3827712236]
REMOVE:[/var/lib/cassandra/data/becktest/extend-776a0872730111e99b8811d05e233dab/aa-22438-bti-,1563503793000,8][618203301]
REMOVE:[/var/lib/cassandra/data/becktest/extend-776a0872730111e99b8811d05e233dab/aa-22439-bti-,1563504214000,8][4257559006]

My guess is that the REMOVE tasks are slow once a compaction reaches 100%.

Tags: compaction, dse 6.7.4

cache_drive answered ·

I see you're using DSE. Does DataStax support assist with issues like this? We're interested in DSE but curious why someone using DSE is posting online for help. Is their support terrible?


@cache_drive thanks for joining us on the DataStax Community. Just a friendly note that you have posted a comment as an "answer" to a question. Please edit and re-post so other users in the future don't mistake it as an answer. Thanks for your understanding. Cheers!


@cache_drive Beck is most likely trying out DSE (maybe to learn how DSE and Apache Cassandra work) and won't necessarily have a subscription. For this reason, we try to help out here. Feel free to try DSE for yourself and let us know if you have any questions. Cheers!

Erick Ramirez answered ·

A quick glance over the log output you posted indicates that the slow compactions had very low read throughput. For example, the read throughput at 05:12 was less than 1 MB/s. Without knowing much about the cluster config, I'd say the likely reason is that the disks were overloaded and busy. Check that data/ and commitlog/ are on separate disks so writes don't affect reads and vice versa. Cheers!
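To pick out the slow compactions without reading every line, something like the following could scan debug.log for "Compacted" entries and list those whose read throughput falls below a cut-off. It is only a sketch: the regular expression is written against the line format shown in the question, the two sample lines are copied from that log (trimmed after the Row Throughput figure), and the 5 MiB/s threshold is an arbitrary assumption, not a Cassandra setting.

```python
import re

# Pattern written against the "Compacted (...)" lines quoted in the question.
LINE_RE = re.compile(
    r"^DEBUG \[(?P<thread>[^\]]+)\] "
    r"(?P<end>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3}"
    r".*? in (?P<ms>[\d,]+)ms\.\s+"
    r"Read Throughput = (?P<rate>[\d.]+)(?P<unit>KiB|MiB|GiB)/s"
)

TO_MIB = {"KiB": 1 / 1024, "MiB": 1.0, "GiB": 1024.0}

def parse_compaction(line):
    """Return thread, end time, duration and read rate, or None if no match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    return {
        "thread": m.group("thread"),
        "end": m.group("end"),
        "seconds": int(m.group("ms").replace(",", "")) / 1000,
        "read_mib_s": float(m.group("rate")) * TO_MIB[m.group("unit")],
    }

def slow_compactions(lines, threshold_mib_s=5.0):
    """Keep compactions slower than the (assumed) threshold in MiB/s."""
    parsed = (parse_compaction(line) for line in lines)
    return [p for p in parsed if p and p["read_mib_s"] < threshold_mib_s]

# Two lines from the posted log, trimmed after "Row Throughput".
SAMPLE = [
    "DEBUG [CompactionExecutor:154] 2019-07-19 05:12:34,107  CompactionTask.java:289 - "
    "Compacted (0d06d4c0-a99a-11e9-83f5-974421971a0f) 4 sstables to "
    "[/var/lib/cassandra/data/becktest/extend-776a0872730111e99b8811d05e233dab/aa-22337-bti,] "
    "to level=0.  609.340MiB to 613.505MiB (~100% of original) in 2,882,094ms.  "
    "Read Throughput = 216.496KiB/s, Write Throughput = 217.976KiB/s, Row Throughput = ~2,092/s.",
    "DEBUG [CompactionExecutor:156] 2019-07-19 05:12:59,002  CompactionTask.java:289 - "
    "Compacted (c2e91ef1-a9a0-11e9-83f5-974421971a0f) 4 sstables to "
    "[/var/lib/cassandra/data/becktest/extend-776a0872730111e99b8811d05e233dab/aa-22349-bti,] "
    "to level=0.  587.660MiB to 588.035MiB (~100% of original) in 24,858ms.  "
    "Read Throughput = 23.641MiB/s, Write Throughput = 23.656MiB/s, Row Throughput = ~232,556/s.",
]

for c in slow_compactions(SAMPLE):
    print(c["end"], c["thread"], f"{c['seconds']:.0f}s", f"{c['read_mib_s']:.3f} MiB/s")
```

On a real node the same functions could be fed the lines of debug.log instead of SAMPLE; the end time and duration of each slow compaction give the window to compare against disk statistics.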


Hi @Erick Ramirez,


> Check that data/ and commitlog/ are on separate disks

Yes, they are on separate disks.

data/ => /dev/sde1
commitlog/ => /dev/sdf1


I restarted Cassandra on this node, and the node has been running normally since.

I still don't understand what caused it.


You would have needed to monitor the node at the time to analyse what was happening, e.g. by using the Linux utility iostat to track disk utilisation. You would then need to correlate the disk stats with the log entries at a given point in time to determine what the node was doing. Cheers!
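If iostat output was not being captured, a small script can record timestamped utilisation figures that line up with the debug.log timestamps. The sketch below is not iostat itself: it assumes a Linux node, reads /proc/diskstats directly, and derives busy percentage from the "time spent doing I/Os" counter, which is comparable in spirit to iostat's %util. The device name "sde" is taken from the /dev/sde1 data disk mentioned above; the function names and the sampling defaults are illustrative.

```python
import time
from datetime import datetime

def io_ticks_ms(diskstats_text, device):
    """Return the 'time spent doing I/Os' counter (ms) for one device."""
    for line in diskstats_text.splitlines():
        fields = line.split()
        # Layout: major, minor, name, then the per-device counters;
        # the 10th counter (fields[12]) is milliseconds spent doing I/O.
        if len(fields) >= 13 and fields[2] == device:
            return int(fields[12])
    raise ValueError(f"device {device!r} not found")

def utilisation_pct(before_text, after_text, device, interval_s):
    """Percentage of the interval during which the device was busy."""
    busy_ms = io_ticks_ms(after_text, device) - io_ticks_ms(before_text, device)
    return 100.0 * busy_ms / (interval_s * 1000.0)

def watch(device="sde", interval_s=5.0, samples=12):
    """Print one timestamped utilisation figure per interval (Linux only)."""
    def read():
        with open("/proc/diskstats") as f:
            return f.read()

    before, t0 = read(), time.monotonic()
    for _ in range(samples):
        time.sleep(interval_s)
        after, t1 = read(), time.monotonic()
        stamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
        pct = utilisation_pct(before, after, device, t1 - t0)
        print(stamp, device, f"{pct:.1f}%")
        before, t0 = after, t1
```

Calling `watch("sde")` while a compaction is running and redirecting the output to a file gives a record that can later be matched against the "Compacted" entries in debug.log by timestamp.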
