This is a follow-up question to https://stackoverflow.com/questions/68896064/ and I'm re-posting my answer here for context.
nodetool cleanup won't make a difference either since there's nothing to clean up. In any case, major compactions are a bad idea in C* as I've explained in question #6396.
So how do you deal with low disk space on existing nodes? You need to increase the capacity of your cluster by adding more nodes. As you add nodes one by one, you can run nodetool cleanup on the existing nodes to immediately free up space, as in the sketch below.
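To illustrate the per-node routine (the keyspace name below is just a placeholder), after each new node finishes joining it would look something like this:

```sh
# After the new node reports UN (Up/Normal) in the ring...
nodetool status

# ...reclaim the token ranges the existing node no longer owns.
# Run this on each pre-existing node, one node at a time.
nodetool cleanup my_keyspace    # omit the keyspace to clean up all keyspaces
```

Note that cleanup rewrites SSTables to drop the data the node no longer owns, so it is IO-intensive; running it one node at a time limits the impact on your app.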
I've done some rough calculations based on the average node density of 1153GB across all 12 nodes. Since the same data set gets spread across more nodes as you expand, the density per node drops. If you add 1 node, it will free up ~89GB per node on average. If you add 2 nodes, it should free up ~165GB per node on average. Adding 3 nodes drops it by about 231GB per node, and 4 nodes by about 288GB. Cheers!
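For transparency, here is the simple arithmetic behind those numbers. It assumes the data is redistributed perfectly evenly across the enlarged cluster; replication factor and token allocation will make the real figures vary a little:

```python
AVG_DENSITY_GB = 1153        # current average data per node
CURRENT_NODES = 12
total_gb = AVG_DENSITY_GB * CURRENT_NODES

for added in range(1, 5):
    # same total data spread over more nodes
    new_density = total_gb / (CURRENT_NODES + added)
    freed = AVG_DENSITY_GB - new_density
    print(f"+{added} node(s): ~{new_density:.0f}GB per node, ~{freed:.0f}GB freed")
```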
Now let me respond to your follow-up questions.
You wrote: "But held back running the same on Prod as 1 node spiked up the disc space from 73% to 99% during the compact process."

This is expected because a major compaction reads all of a table's SSTables and rewrites them into a single new SSTable. The old and new SSTables coexist on disk until the compaction completes, so you temporarily need up to double the table's footprint in free disk space, which is exactly why the node jumped from 73% to 99%.
That's why it's called a major compaction. It requires a lot of IO and has the potential to completely slow your app down, which is why it isn't recommended.
The only long-term solution is to add nodes. As soon as the data density goes above 500GB per node, you need to start provisioning new servers so they are ready to deploy and add to the cluster. As soon as you get close to 1TB per node, you need to add nodes.
Cassandra is completely different from running a traditional RDBMS like Oracle. Trust me -- I was an Oracle architect for years. :) As soon as you hit capacity issues, Oracle tells you to scale your servers vertically by adding more RAM/CPU/disks. The opposite applies to Cassandra -- you scale horizontally by adding more nodes.
In relation to the old data, the only way to get rid of it is to issue a DELETE. You will need to write an ETL job, preferably with Spark so you can scan through the tables efficiently, and delete whole partitions (not rows within the partitions); see the sketch below.
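As a rough illustration only -- the keyspace, table schema and cutoff below are all hypothetical, and a production job would use the Spark Cassandra Connector to parallelise the scan across the cluster -- a partition-level purge looks like this with the DataStax Python driver:

```python
from cassandra.cluster import Cluster

# Hypothetical schema: events(sensor_id int, day int, ts timestamp, value double,
#                             PRIMARY KEY ((sensor_id, day), ts))
CUTOFF_DAY = 20200101  # hypothetical business cutoff (yyyymmdd)

cluster = Cluster(["127.0.0.1"])          # contact point is a placeholder
session = cluster.connect("my_keyspace")  # hypothetical keyspace

# Delete the whole partition, not individual rows within it
delete_stmt = session.prepare(
    "DELETE FROM events WHERE sensor_id = ? AND day = ?"
)

# SELECT DISTINCT returns only the partition keys, so this scan never
# materialises the clustering rows inside each partition
for row in session.execute("SELECT DISTINCT sensor_id, day FROM events"):
    if row.day < CUTOFF_DAY:
        session.execute(delete_stmt, (row.sensor_id, row.day))

cluster.shutdown()
```

Deleting whole partitions matters because it generates a single partition tombstone instead of one tombstone per row, which keeps the read path healthy while the data expires out of the SSTables.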
Finally, if you have a valid Support subscription then by all means log a ticket with DataStax Support so one of our engineers can assist you directly. Cheers!
@Radhika, which version of Apache Cassandra® and/or DataStax Enterprise (DSE) are you running?
As explained in the Stack Overflow thread, the best first approach here is to expand (scale out) the cluster horizontally by adding nodes, so that the disk space per node is reduced as the new nodes take over a share of the token ranges and the data that goes with them. Provided you've properly sized the cluster, accounting for parameters such as (but not limited to) throughput, latency, data growth and data time-to-live, you can adjust the table properties (or set a TTL on the ingestion side) so that newly inserted data expires automatically; see the example below. For clearing out existing data, based on your business logic, you can write a one-time ad hoc program (e.g. with Spark) to delete it and reduce the disk usage per node.
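For example, assuming a hypothetical table my_keyspace.events and a 90-day retention requirement, the TTL can be set either as a table default or per write on the ingestion side:

```
-- Table-level default: every newly written row expires after 90 days
ALTER TABLE my_keyspace.events WITH default_time_to_live = 7776000;

-- Or per write, on the ingestion side (overrides the table default)
INSERT INTO my_keyspace.events (sensor_id, day, ts, value)
VALUES (42, 20210901, toTimestamp(now()), 1.5)
USING TTL 7776000;
```

Keep in mind the TTL only applies to data written after it is set; it does not expire rows that already exist, which is why the one-time purge is still needed for the old data.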
If you have further questions or need hands-on help with this situation, please log a ticket with DataStax Support so one of our engineers can work with you directly.