pi_165798 asked Erick Ramirez answered

Cluster suddenly dropped a keyspace


We are hosting a Cassandra cluster with 5 nodes and around 2 TB of data. Cassandra version 3.11.6. We run weekly repairs scheduled by Cassandra Reaper. This morning the main keyspace was suddenly dropped, and all data was deleted. We have no idea why this happened. Looking through the logs, we found this entry:

INFO  [Native-Transport-Requests-13] 2021-03-01 09:46:42,275 - Drop Keyspace 'helios'

There were also some errors related to insufficient disk space:

WARN  [CompactionExecutor:83038] 2021-03-01 09:21:35,070 - Not enough space for compaction, 101305.734MB estimated.  Reducing scope.

ERROR [CompactionExecutor:83038] 2021-03-01 09:21:35,118 - Exception in thread Thread[CompactionExecutor:83038,1,main]
java.lang.RuntimeException: Not enough space to write 57.776GiB to /var/lib/cassandra/data (47.092GiB available)

Could this error have caused the keyspace to be dropped? It seems like unlikely behaviour.

Thankfully, we managed to recover the data thanks to the auto_snapshot parameter. But we are still very worried about how this could have happened. We have since IP-restricted all incoming traffic to the nodes.


1 Answer

Erick Ramirez answered

There is no operation or feature in Cassandra that drops a keyspace on its own. If there were, that would be catastrophic behaviour for a database.

If you look closely at the log entry, the thread that reported the keyspace getting dropped was Native-Transport-Requests:

INFO  [Native-Transport-Requests-13] 2021-03-01 09:46:42,275 - Drop Keyspace 'helios'

If you're not already aware, native transport in Cassandra is the native binary protocol (aka CQL). This means that the keyspace drop was initiated by a CQL client -- cqlsh, CQL tools such as DevCenter or DataStax Studio, application, etc.
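To make that check repeatable across all five nodes, here is a minimal Python sketch that scans log lines for keyspace drops and reports which thread logged each one. It assumes the log layout shown in the entries quoted in this thread (level, thread in square brackets, timestamp, message); the names `DropEvent`, `find_keyspace_drops` and `came_from_client` are made up for illustration and are not part of Cassandra.

```python
import re
from typing import Iterable, Iterator, NamedTuple


class DropEvent(NamedTuple):
    level: str
    thread: str
    timestamp: str
    keyspace: str


# Assumed layout, taken from the log lines quoted above:
#   LEVEL  [thread-name] YYYY-MM-DD HH:MM:SS,mmm ... Drop Keyspace 'name'
DROP_RE = re.compile(
    r"^(?P<level>[A-Z]+)\s+"
    r"\[(?P<thread>[^\]]+)\]\s+"
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})"
    r".*?Drop Keyspace '(?P<keyspace>[^']+)'"
)


def find_keyspace_drops(lines: Iterable[str]) -> Iterator[DropEvent]:
    """Yield one DropEvent for every line that records a keyspace drop."""
    for line in lines:
        match = DROP_RE.match(line)
        if match:
            yield DropEvent(**match.groupdict())


def came_from_client(event: DropEvent) -> bool:
    """True when the drop was logged by a native transport (CQL) thread."""
    return event.thread.startswith("Native-Transport-Requests")


if __name__ == "__main__":
    sample = [
        "WARN  [CompactionExecutor:83038] 2021-03-01 09:21:35,070 - "
        "Not enough space for compaction, 101305.734MB estimated.  Reducing scope.",
        "INFO  [Native-Transport-Requests-13] 2021-03-01 09:46:42,275 - "
        "Drop Keyspace 'helios'",
    ]
    for event in find_keyspace_drops(sample):
        origin = "CQL client" if came_from_client(event) else "other thread"
        print(f"{event.timestamp} {event.keyspace} dropped via {event.thread} ({origin})")
```

In practice you would pass an open file handle for each node's system log instead of the sample list. The log location depends on how Cassandra was installed, so no path is hard-coded here. The node whose log contains the entry is the one the client was connected to when it sent the request, which narrows down where to look next.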

Interestingly, I responded to an identical question on the Cassandra user mailing list, so I'll provide the same answer here.

Since it came in as a CQL request, the keyspace didn't get randomly dropped -- some operator, developer, daemon, orchestration tool or other client initiated it, either intentionally or by accident.

I've seen this happen a number of times where a developer thought they were connecting to a dev/staging/test environment and issued a DROP or TRUNCATE not realising they were connected to production. Not saying this is what happened in your case but this should give you some ideas on where to focus your investigation. Cheers!
