Hi,
I have found that compaction sometimes runs very slowly on a few nodes.
My cluster config (DSE 6.7.4):
data disk: one (/var/lib/cassandra/data)
disk usage: 26%
concurrent compactors: 3
compaction throughput: 150 MB/s
compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
I checked debug.log and the disk metrics on this node; the system metrics look normal.
Why is the Row Throughput so uneven from one compaction to the next?
Normal:
681.954MiB to 688.352MiB (~100% of original) in 30,470ms. Read Throughput = 22.381MiB/s, Write Throughput = 22.591MiB/s, Row Throughput = ~212,723/s.
Slow:
1.032GiB to 1.041GiB (~100% of original) in 1,101,984ms. Read Throughput = 982.182KiB/s, Write Throughput = 990.530KiB/s, Row Throughput = ~9,292/s.
896.792MiB to 903.236MiB (~100% of original) in 1,168,173ms. Read Throughput = 786.111KiB/s, Write Throughput = 791.761KiB/s, Row Throughput = ~7,588/s.
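As a sanity check, the logged read throughput can be recomputed from the input size and the elapsed time. The sketch below assumes the logged figure is simply input size divided by the task's total wall-clock duration (the numbers above are consistent with that, but I have not confirmed it in the source code). If so, any time the task spends waiting would lower the reported throughput even while the disk is idle. The helper name `read_throughput_mib_s` is made up for this example.

```python
# Recompute the average read throughput from the logged input size and elapsed time.
# Assumption: logged throughput = input size / wall-clock duration of the whole task.

UNIT_TO_MIB = {"KiB": 1 / 1024, "MiB": 1.0, "GiB": 1024.0}


def read_throughput_mib_s(size, unit, elapsed_ms):
    """Average read throughput in MiB/s over the task's full duration."""
    return size * UNIT_TO_MIB[unit] / (elapsed_ms / 1000)


# Values copied from the "Normal" and first "Slow" lines above.
normal = read_throughput_mib_s(681.954, "MiB", 30_470)
slow = read_throughput_mib_s(1.032, "GiB", 1_101_984)

print(f"normal: {normal:.3f} MiB/s")      # logged: 22.381MiB/s
print(f"slow:   {slow * 1024:.1f} KiB/s")  # logged: 982.182KiB/s
```

The slow value differs slightly from the logged one because the size in the log is rounded to three decimals.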
Part of debug.log:
DEBUG [CompactionExecutor:148] 2019-07-19 03:45:03,837 CompactionTask.java:289 - Compacted (77525850-a994-11e9-83f5-974421971a0f) 4 sstables to [/var/lib/cassandra/data/becktest/extend-776a0872730111e99b8811d05e233dab/aa-22324-bti,] to level=0. 681.954MiB to 688.352MiB (~100% of original) in 30,470ms. Read Throughput = 22.381MiB/s, Write Throughput = 22.591MiB/s, Row Throughput = ~212,723/s. 3,297,218 total partitions merged to 3,296,040. Partition merge counts were {1:3297218, }
DEBUG [CompactionExecutor:142] 2019-07-19 04:04:36,261 CompactionTask.java:289 - Compacted (822d6da0-a994-11e9-83f5-974421971a0f) 4 sstables to [/var/lib/cassandra/data/becktest/extend-776a0872730111e99b8811d05e233dab/aa-22326-bti,] to level=0. 3.172GiB to 3.184GiB (~100% of original) in 1,184,680ms. Read Throughput = 2.742MiB/s, Write Throughput = 2.752MiB/s, Row Throughput = ~26,292/s. 15,578,111 total partitions merged to 15,551,606. Partition merge counts were {1:15578109, 2:1, }
DEBUG [CompactionExecutor:148] 2019-07-19 04:05:13,532 CompactionTask.java:289 - Compacted (444d1910-a997-11e9-83f5-974421971a0f) 6 sstables to [/var/lib/cassandra/data/becktest/extend-776a0872730111e99b8811d05e233dab/aa-22332-bti,] to level=0. 826.671MiB to 834.403MiB (~100% of original) in 37,273ms. Read Throughput = 22.178MiB/s, Write Throughput = 22.386MiB/s, Row Throughput = ~212,833/s. 4,043,830 total partitions merged to 4,042,672. Partition merge counts were {1:4043824, 2:3, }
DEBUG [CompactionExecutor:154] 2019-07-19 05:12:34,107 CompactionTask.java:289 - Compacted (0d06d4c0-a99a-11e9-83f5-974421971a0f) 4 sstables to [/var/lib/cassandra/data/becktest/extend-776a0872730111e99b8811d05e233dab/aa-22337-bti,] to level=0. 609.340MiB to 613.505MiB (~100% of original) in 2,882,094ms. Read Throughput = 216.496KiB/s, Write Throughput = 217.976KiB/s, Row Throughput = ~2,092/s. 3,016,709 total partitions merged to 3,015,373. Partition merge counts were {1:3016707, 2:1, }
DEBUG [CompactionExecutor:152] 2019-07-19 05:12:34,110 CompactionTask.java:289 - Compacted (320e5dc0-a99e-11e9-83f5-974421971a0f) 6 sstables to [/var/lib/cassandra/data/becktest/extend-776a0872730111e99b8811d05e233dab/aa-22344-bti,] to level=0. 1.032GiB to 1.041GiB (~100% of original) in 1,101,984ms. Read Throughput = 982.182KiB/s, Write Throughput = 990.530KiB/s, Row Throughput = ~9,292/s. 5,120,348 total partitions merged to 5,118,682. Partition merge counts were {1:5120340, 2:4, }
DEBUG [CompactionExecutor:156] 2019-07-19 05:12:59,002 CompactionTask.java:289 - Compacted (c2e91ef1-a9a0-11e9-83f5-974421971a0f) 4 sstables to [/var/lib/cassandra/data/becktest/extend-776a0872730111e99b8811d05e233dab/aa-22349-bti,] to level=0. 587.660MiB to 588.035MiB (~100% of original) in 24,858ms. Read Throughput = 23.641MiB/s, Write Throughput = 23.656MiB/s, Row Throughput = ~232,556/s. 2,923,738 total partitions merged to 2,906,197. Partition merge counts were {1:2890191, 2:16733, 3:27, }
DEBUG [CompactionExecutor:156] 2019-07-19 05:55:00,160 CompactionTask.java:289 - Compacted (d1ba2690-a9a0-11e9-83f5-974421971a0f) 4 sstables to [/var/lib/cassandra/data/becktest/extend-776a0872730111e99b8811d05e233dab/aa-22350-bti,] to level=0. 2.660GiB to 2.672GiB (~100% of original) in 2,521,157ms. Read Throughput = 1.081MiB/s, Write Throughput = 1.085MiB/s, Row Throughput = ~10,515/s. 13,260,282 total partitions merged to 13,245,947. Partition merge counts were {1:13260276, 2:3, }
DEBUG [CompactionExecutor:161] 2019-07-19 05:55:00,162 CompactionTask.java:289 - Compacted (f82b9540-a9a3-11e9-83f5-974421971a0f) 5 sstables to [/var/lib/cassandra/data/becktest/extend-776a0872730111e99b8811d05e233dab/aa-22356-bti,] to level=0. 896.792MiB to 903.236MiB (~100% of original) in 1,168,173ms. Read Throughput = 786.111KiB/s, Write Throughput = 791.761KiB/s, Row Throughput = ~7,588/s. 4,435,345 total partitions merged to 4,434,488. Partition merge counts were {1:4435341, 2:2, }
DEBUG [CompactionExecutor:161] 2019-07-19 05:55:24,131 CompactionTask.java:289 - Compacted (b074a310-a9a6-11e9-83f5-974421971a0f) 4 sstables to [/var/lib/cassandra/data/becktest/extend-776a0872730111e99b8811d05e233dab/aa-22361-bti,] to level=0. 553.353MiB to 557.069MiB (~100% of original) in 23,967ms. Read Throughput = 23.087MiB/s, Write Throughput = 23.242MiB/s, Row Throughput = ~224,346/s. 2,692,155 total partitions merged to 2,691,546. Partition merge counts were {1:2692155, }
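To pick the slow compactions out of a longer debug.log, the "Compacted" lines can be parsed with a small script. This is only a sketch: the regular expression is built from the lines in the excerpt above and may not match other Cassandra/DSE versions, the start time is derived as completion time minus the logged duration (it is not logged on this line), and the names `parse_compactions` and `SLOW_ROWS_PER_S` as well as the 50,000 rows/s cutoff are made up for this example.

```python
import re
from datetime import datetime, timedelta

# Pattern built from the "Compacted ..." lines in the excerpt above; other
# Cassandra/DSE versions may log a different layout.
LINE_RE = re.compile(
    r"(?P<end>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) CompactionTask\.java:\d+ - "
    r"Compacted \((?P<task>[0-9a-f-]+)\) (?P<sstables>\d+) sstables .*? "
    r"in (?P<ms>[\d,]+)ms\. .*?Row Throughput = ~(?P<rows>[\d,]+)/s"
)

SLOW_ROWS_PER_S = 50_000  # hypothetical cutoff, chosen only for this example


def parse_compactions(text):
    """Return one dict per completed compaction found in the log text."""
    entries = []
    for m in LINE_RE.finditer(text):
        end = datetime.strptime(m["end"], "%Y-%m-%d %H:%M:%S,%f")
        elapsed = timedelta(milliseconds=int(m["ms"].replace(",", "")))
        rows = int(m["rows"].replace(",", ""))
        entries.append({
            "task": m["task"],
            "start": end - elapsed,  # derived: completion time minus duration
            "end": end,
            "rows_per_s": rows,
            "slow": rows < SLOW_ROWS_PER_S,
        })
    return entries


# Two lines copied from the debug.log excerpt above (one normal, one slow).
SAMPLE = """\
DEBUG [CompactionExecutor:148] 2019-07-19 03:45:03,837 CompactionTask.java:289 - Compacted (77525850-a994-11e9-83f5-974421971a0f) 4 sstables to [/var/lib/cassandra/data/becktest/extend-776a0872730111e99b8811d05e233dab/aa-22324-bti,] to level=0. 681.954MiB to 688.352MiB (~100% of original) in 30,470ms. Read Throughput = 22.381MiB/s, Write Throughput = 22.591MiB/s, Row Throughput = ~212,723/s. 3,297,218 total partitions merged to 3,296,040. Partition merge counts were {1:3297218, }
DEBUG [CompactionExecutor:152] 2019-07-19 05:12:34,110 CompactionTask.java:289 - Compacted (320e5dc0-a99e-11e9-83f5-974421971a0f) 6 sstables to [/var/lib/cassandra/data/becktest/extend-776a0872730111e99b8811d05e233dab/aa-22344-bti,] to level=0. 1.032GiB to 1.041GiB (~100% of original) in 1,101,984ms. Read Throughput = 982.182KiB/s, Write Throughput = 990.530KiB/s, Row Throughput = ~9,292/s. 5,120,348 total partitions merged to 5,118,682. Partition merge counts were {1:5120340, 2:4, }
"""

entries = parse_compactions(SAMPLE)
for e in entries:
    flag = "SLOW" if e["slow"] else "ok"
    print(f'{e["start"]:%H:%M:%S} -> {e["end"]:%H:%M:%S}  {e["rows_per_s"]:>8}/s  {flag}  {e["task"]}')
```

Running it over the full debug.log (for example `parse_compactions(open("debug.log").read())`, with the path adjusted to your setup) lists the derived start and end time of every compaction, which makes it easier to see which tasks overlapped with the slow ones.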
Update:
I looked at the compaction stats on this node.
The compactions with low throughput appear stuck at 100%.
I am now tailing the compaction transaction log:
tail -f aa_txn_compaction_b9c38100-a9d0-11e9-83f5-974421971a0f.log
ADD:[/var/lib/cassandra/data/becktest/extend-776a0872730111e99b8811d05e233dab/aa-22442-bti-,0,8][3680451155]
REMOVE:[/var/lib/cassandra/data/becktest/extend-776a0872730111e99b8811d05e233dab/aa-22437-bti-,1563503364000,8][3827712236]
REMOVE:[/var/lib/cassandra/data/becktest/extend-776a0872730111e99b8811d05e233dab/aa-22438-bti-,1563503793000,8][618203301]
REMOVE:[/var/lib/cassandra/data/becktest/extend-776a0872730111e99b8811d05e233dab/aa-22439-bti-,1563504214000,8][4257559006]
My guess is that the REMOVE steps are what run slowly while the compaction sits at 100%.