We recently doubled the number of hosts in our ring, and there is one strange thing we are not able to explain.
Now that our scaling is complete, I am running `nodetool cleanup` on the cluster, but the output of `nodetool compactionstats` does not make any sense:

    nodetool compactionstats -H
    pending tasks: 1
    id                                     compaction type   keyspace          table          completed   total       unit    progress
    9a587000-0d85-11ea-bee2-2b2f32752d4c   Cleanup           <keyspace_name>   <table_name>   188.56 GB   850.93 GB   bytes   22.16%

`compactionstats` suggests that the total amount of data to be cleaned up is ~`850GB`, but when I run `df -h` on the box, the host does not have nearly that much data on it:

    > df -h
    Filesystem                   Size  Used Avail Use% Mounted on
    devtmpfs                     7.6G   64K  7.6G   1% /dev
    tmpfs                        7.6G   32K  7.6G   1% /dev/shm
    /dev/nvme0n1p1               9.8G  4.4G  5.3G  46% /
    /dev/mapper/vg-data--master  2.9T  438G  2.5T  15% /mnt

As you can see, this host holds only 438G of data. How can Cassandra claim that it will clean up 850GB of data?
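To put a number on the mismatch, here is a quick Python sketch that only does the arithmetic on the figures from the two outputs in this post. The constant and function names are hypothetical, and it assumes that nodetool's `GB` and df's `G` refer to the same unit and that the table being cleaned up lives entirely on `/mnt` (so 438G is an upper bound for its on-disk size).

```python
# Figures copied from the nodetool compactionstats -H and df -h output in this post.
# Assumption: nodetool's "GB" and df's "G" are the same unit.
CLEANUP_DONE_GB = 188.56   # "completed" column of the Cleanup task
CLEANUP_TOTAL_GB = 850.93  # "total" column of the Cleanup task
DISK_USED_GB = 438.0       # "Used" for /dev/mapper/vg-data--master mounted on /mnt


def ratio(on_disk_gb: float, reported_gb: float) -> float:
    """Return the on-disk size as a fraction of the size compactionstats reports."""
    return on_disk_gb / reported_gb


if __name__ == "__main__":
    # Reproduces the 22.16% "progress" column, as a sanity check on the figures.
    print(f"progress: {CLEANUP_DONE_GB / CLEANUP_TOTAL_GB:.2%}")
    # A value well below 1.0 means the reported total exceeds what is on disk.
    print(f"on-disk / reported total: {ratio(DISK_USED_GB, CLEANUP_TOTAL_GB):.3f}")
```

This gives a ratio of roughly 0.51, so the reported total is about twice what is physically on the data volume. One unconfirmed guess is that `compactionstats` counts uncompressed bytes while `df` sees the compressed SSTables on disk, but I have not verified that.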
I would appreciate it if someone could explain what is happening here.