@Erick Ramirez We have a 12 node cassandra cluster in Production. In the recent past, almost all the nodes are using higher than 85% disc space. We tried to add default_time_to_live, gc_grace_seconds for few tables. But there seems to be no effect on the count of records or the disc space. There are suggestions to perform nodetool compact and cleanup. But this also mentions that it is not suggested to be run on production environment.
Some specific questions,
- Tried setting the TTL as 100 days and gc as 3hours. Expectation was records older than 90 days should get deleted after 3 hours. But it was still intact. Is there anything else to be taken care to delete records older than 100 days using TTL settings? Disc space is also expected to be freed up. Again what else should be done to free up disc space after deleting records.
ALTER TABLE my_keyspace.my_item WITH default_time_to_live=8640000
ALTER TABLE my_keyspace.my_item WITH gc_grace_seconds=10800
- Is it ok to run nodetool compact followed by nodetool cleanup on a prod environment with all instances over 85% disc space utilized?
Please share other suggestions as well to free up disc space utilized by Cassandra.
Status=Up/Down |/ State=Normal/Leaving/Joining/Moving -- Address Load Tokens Owns (effective) Host ID Rack UN 10.1.x.x 997.26 GiB 256 24.7% erff8abf-16a1-4a72-b63e-5c4rg2c8d003 rack1 UN 10.2.x.x 1.22 TiB 256 26.1% a8auuj76-f635-450f-a2fd-7sdfg0ss713e rack1 UN 10.3.x.x 1.21 TiB 256 25.4% 8ebas25c-4c0b-4be9-81e3-013fasdas255 rack1 UN 10.4.x.x 1.27 TiB 256 25.1% wwwdba15-16f3-41a8-b3d1-2d2b6e35715d rack1 UN 10.5.x.x 975.67 GiB 256 24.7% 72ed4df7-fb65-4332-b8ac-e7461699f633 rack1 UN 10.6.x.x 1.01 TiB 256 24.8% 39803f58-127f-453b-b102-ed7bdfb8afb2 rack1 UN 10.7.x.x 1.18 TiB 256 25.9% b6e692a6-249f-433d-8b54-1d20d4bc4962 rack1 UN 10.8.x.x 1.12 TiB 256 24.5% 8ed8c306-9ac9-4130-bff1-97f7d5d9a02f rack1 UN 10.9.x.x 973.26 GiB 256 24.4% f7489923-3cc3-43ec-83ca-42bbdeb0cbb7 rack1 UN 10.10.x.x 1.13 TiB 256 26.0% ea694224-ds0b-42f5-9acf-ff4ddfb450e0 rack1 UN 10.11.x.x 1.22 TiB 256 24.0% ddde4bce-553e-4246-9920-47sdfdf324ed rack1 UN 10.12.x.x 1.28 TiB 256 24.4% 0222d40f-edb8-4710-9bae-39dsfd87e18db rack1
https://stackoverflow.com/questions/68896064/cassandra-high-disc-space-utilization
We have a 3 node cluster in Test environment and running nodetool compact(one keyspace alone) reduced disc space and data load as below. But held back running the same on Prod as 1 node spiked up the disc space from 73% to 99% during compact process.
Used Disc Space Before compact After compact Cassandra 01 73% 60% - During compact spiked upto 99% Cassandra 02 58% 46% Cassandra 03 61% 43% Data Load Before compact After compact Cassandra 01 114.8 GiB 98.08 GiB Cassandra 02 152.77 GiB 88.57 GiB Cassandra 03 132.93GiB 89.33 GiB
@Erick Ramirez I have reposted the question here for your direct advice and suggestions.
1) Please suggest if just adding extra node helps or increasing the disc space of existing instance/s would work as well. We already have 12 nodes and we do not want to end up having more nodes to manage !
2) And we are also looking at a long term solution for having cassandra instance/s space under control. I can understand that the suggestions to add extra nodes comes in as the state of the cluster is beyond control. But how do I ensure that I don't end up in the same situation!
3) I also want to ensure that records older than 100 days are cleaned up automatically. If setting the TTL today will start taking effect only after 100 days, how can I clear up the records older than that. Will deleting the records manually and then enabling a lesser gc_grace_seconds(say minimum 3 hours), ensure that the records and the tombstones are deleted?Please suggest if I need to create any support ticket to get immediate attention on the issue.