vranganathan asked ·

Unable to reason about cleanup's progress

We have recently doubled our host count in the ring. There is one strange thing that we are not able to reason about.

Now that scaling is complete, I am running `nodetool cleanup` on the cluster, and the output of `nodetool compactionstats` does not make sense to me.

nodetool compactionstats -H
pending tasks: 1
id                                     compaction type   keyspace          table          completed   total       unit    progress
9a587000-0d85-11ea-bee2-2b2f32752d4c   Cleanup           <keyspace_name>   <table_name>   188.56 GB   850.93 GB   bytes   22.16%

`compactionstats` suggests that the total data to be cleaned up is ~`850GB`, but `df -h` on the box shows that the host does not hold that much data.

> df -h
Filesystem                    Size   Used   Avail   Use%   Mounted on
devtmpfs                      7.6G   64K    7.6G    1%     /dev
tmpfs                         7.6G   32K    7.6G    1%     /dev/shm
/dev/nvme0n1p1                9.8G   4.4G   5.3G    46%    /
/dev/mapper/vg-data--master   2.9T   438G   2.5T    15%    /mnt

As you can see, this host holds only 438G of data. Why does Cassandra claim that it will clean up 850GB?
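To make the mismatch concrete, here is a minimal Python sketch that only redoes the arithmetic on the figures quoted above. The variable names are illustrative, it does not query Cassandra, and it assumes the "GB" from `nodetool compactionstats -H` and the "G" from `df -h` are comparable units.

```python
# Figures copied from the output in this post; nothing here talks to Cassandra.
completed_gb = 188.56   # "completed" column of nodetool compactionstats -H
total_gb = 850.93       # "total" column of nodetool compactionstats -H
disk_used_gb = 438.0    # "Used" column of df -h for /mnt (assumed comparable units)

# The reported progress should simply be completed / total.
progress_pct = round(completed_gb / total_gb * 100, 2)

# How much larger the reported total is than what the filesystem shows as used.
ratio = round(total_gb / disk_used_gb, 2)

print(f"progress: {progress_pct}%")
print(f"reported total / on-disk usage: {ratio}x")
```

The progress column is consistent with completed / total, so the puzzle is confined to the total itself, which is close to twice the on-disk usage. One possibility worth checking (an assumption, not something this post confirms) is that the two tools measure different things, for example uncompressed bytes versus compressed on-disk bytes.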

I would appreciate it if someone could explain what is happening here.

Tags: nodetool, 3.0.9, apache cassandra, cleanup

0 Answers