We host a five-node Cassandra cluster (version 3.11.6) with around 2 TB of data. We run weekly repairs scheduled by Cassandra Reaper. This morning the main keyspace was suddenly dropped, and all of its data was deleted. We have no idea why this happened. Looking through the logs, we found this entry:
INFO [Native-Transport-Requests-13] 2021-03-01 09:46:42,275 MigrationManager.java:495 - Drop Keyspace 'helios'
There were also some errors related to insufficient disk space:
WARN [CompactionExecutor:83038] 2021-03-01 09:21:35,070 CompactionTask.java:356 - Not enough space for compaction, 101305.734MB estimated. Reducing scope.
ERROR [CompactionExecutor:83038] 2021-03-01 09:21:35,118 CassandraDaemon.java:235 - Exception in thread Thread[CompactionExecutor:83038,1,main] java.lang.RuntimeException: Not enough space to write 57.776GiB to /var/lib/cassandra/data (47.092GiB available)
Could this error have caused the keyspace to be dropped? It seems like unlikely behaviour.
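To see how the two events line up, here is a rough Python sketch that pulls the thread name and timestamp out of the log lines quoted above and measures the gap between the compaction error and the keyspace drop. The regular expression is only inferred from the layout of those excerpts, and the names parse_line and gap_between are made up for illustration, so treat this as a starting point rather than a complete log parser.

```python
import re
from datetime import datetime

# Layout inferred from the excerpts above (an assumption, not a full spec):
# LEVEL [thread] yyyy-MM-dd HH:mm:ss,SSS Source.java:NNN - message
LOG_LINE = re.compile(
    r"^(?P<level>[A-Z]+)\s+\[(?P<thread>[^\]]+)\]\s+"
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<source>\S+)\s+-\s+(?P<message>.*)$"
)

# Two of the lines quoted in this post, copied verbatim.
DROP_LINE = (
    "INFO [Native-Transport-Requests-13] 2021-03-01 09:46:42,275 "
    "MigrationManager.java:495 - Drop Keyspace 'helios'"
)
ERROR_LINE = (
    "ERROR [CompactionExecutor:83038] 2021-03-01 09:21:35,118 "
    "CassandraDaemon.java:235 - Exception in thread "
    "Thread[CompactionExecutor:83038,1,main] java.lang.RuntimeException: "
    "Not enough space to write 57.776GiB to /var/lib/cassandra/data "
    "(47.092GiB available)"
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    entry = match.groupdict()
    # %f accepts the three-digit millisecond field after the comma.
    entry["ts"] = datetime.strptime(entry["ts"], "%Y-%m-%d %H:%M:%S,%f")
    return entry


def gap_between(earlier_line, later_line):
    """Return the time elapsed between two log lines as a timedelta."""
    return parse_line(later_line)["ts"] - parse_line(earlier_line)["ts"]


if __name__ == "__main__":
    drop = parse_line(DROP_LINE)
    error = parse_line(ERROR_LINE)
    print("drop logged by thread: ", drop["thread"])
    print("error logged by thread:", error["thread"])
    print("gap between error and drop:", gap_between(ERROR_LINE, DROP_LINE))
```

For these two lines the gap is just over 25 minutes, and the two entries were logged by different thread pools (CompactionExecutor for the error, Native-Transport-Requests for the drop). The same parse_line function could be run over the full system.log on each node to list every entry that mentions the keyspace.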
Fortunately, we were able to recover the data because of the auto_snapshot setting. But we are still very worried about how this could have happened. We have since IP-restricted all incoming traffic to the nodes.