vranganathan asked:

Why does nodetool compactionstats show 850GB of data to cleanup when node only has 438GB?

We have recently doubled our host count in the ring. There is one strange thing that we are not able to reason about.

I am currently running `nodetool cleanup` across the cluster now that our scaling is complete, and the output of `nodetool compactionstats` does not make sense:

```
$ nodetool compactionstats -H
pending tasks: 1
                                   id   compaction type          keyspace          table   completed       total   unit   progress
 9a587000-0d85-11ea-bee2-2b2f32752d4c           Cleanup   <keyspace_name>   <table_name>   188.56 GB   850.93 GB  bytes     22.16%
```

`compactionstats` suggests that the total data to be cleaned up is ~`850GB`, but when I run `df -h` on the box, the host itself does not hold that much data:

```
$ df -h
Filesystem                   Size  Used  Avail  Use%  Mounted on
devtmpfs                     7.6G   64K   7.6G    1%  /dev
tmpfs                        7.6G   32K   7.6G    1%  /dev/shm
/dev/nvme0n1p1               9.8G  4.4G   5.3G   46%  /
/dev/mapper/vg-data--master  2.9T  438G   2.5T   15%  /mnt
```
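
For reference, the size of the Cassandra data directory itself can be confirmed with `du`. This is just a sketch; the path below assumes the data directory lives under the `/mnt` mount shown above, so adjust it to match `data_file_directories` in `cassandra.yaml`:

```
# Summarize the on-disk size of the Cassandra data directory
# (the path is an assumption; check data_file_directories in cassandra.yaml)
du -sh /mnt/cassandra/data
```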

As you can see, this host holds only 438G of data. How can Cassandra claim that it will clean up 850GB?

I would appreciate it if someone could explain what is happening here.


1 Answer

Erick Ramirez answered:

I suspect there's another problem here. My best guess is that something else occurred on the node, causing a discrepancy between the `df` output and the set of SSTables Cassandra thinks it owns at that point in time.

The `total` column simply reports the aggregate size of all the SSTable files the node owns.
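
As a rough cross-check, the aggregate on-disk size of the table's SSTables can be summed directly and compared against the `total` column. A minimal sketch, assuming the default data directory; the keyspace, table name and table ID are placeholders for your own values:

```
# On-disk size of all SSTable components for the table being cleaned up
# (path and <...> placeholders are assumptions; substitute your own values)
du -sh /var/lib/cassandra/data/<keyspace_name>/<table_name>-<table_id>/
```

If that figure is close to the 438G reported by `df` rather than 850GB, the discrepancy lies in what Cassandra believes it owns rather than in what is actually on disk.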

Without a list of the open files (from `lsof` output, for example) and a listing of the files on disk for the table in question, it is impossible to determine what is going on in your case. Cheers!
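
For example, the list of SSTable files Cassandra currently holds open could be captured like this. A sketch only; the `pgrep` pattern assumes a single Cassandra process started via the standard `CassandraDaemon` main class:

```
# List the Data.db files the Cassandra process has open
# (pgrep pattern is an assumption; it expects exactly one matching process)
lsof -p "$(pgrep -f CassandraDaemon)" | grep -- '-Data.db'
```

Comparing that list with an `ls` of the table's directory on disk would show whether the node is tracking files that no longer exist, or vice versa.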
