nagasree963_193876 asked nagasree963_193876 commented

Does the cass-operator support disaster recovery?

I am using operator version 1.3 and application version 3.11.6, with 2 data centers that have the same number of nodes. Does cass-operator support disaster recovery? If it does, how do I achieve it?


jim.dickinson_187342 answered

Could you clarify the support you're looking for? For losing k8s workers? For losing volumes? Something else?


Erick Ramirez answered nagasree963_193876 commented

Your question is too open-ended for us to be able to provide a meaningful answer but I'll try my best to respond based on some assumptions.

The concept of disaster recovery (DR) is an old paradigm rooted in near-obsolete practices from a time when an organisation's infrastructure was geographically distributed across just 2 or 3 locations. A remote DC classified as a DR site would get activated in the event of a catastrophic failure at the primary site. These days, most organisations cannot tolerate interruptions to their business operations and so have their systems running active-active all the time.

A multi-DC cluster in Cassandra operates in active-active fashion -- there is no primary, there is no secondary. All DCs receive writes from the application(s) in real-time. If DC1 is unavailable (for whatever reason), the remaining DCs continue to operate and the traffic gets diverted to the operational DCs.

Specifically with the cass-operator, when a node goes down, the operator will attempt to automatically recover the StatefulSets (STS), provided the outage isn't due to a failure in the underlying infrastructure (e.g. physical host servers going down). Jim Dickinson will correct me if I'm mistaken, but there is a limitation with Kubernetes at this point where multi-region deployments aren't a supported feature, so clusters can only be deployed to a single region right now. Cheers!


nagasree963_193876 commented ·

Thanks @Erick Ramirez. I wrote data to a node in DC1, then deleted DC1 and tried to connect to DC2. I can see the keyspaces and tables, but not the data. When I try to retrieve the data, I get an error like "Host not found". But from your explanation, the data should still be retrievable when DC1 goes down. How can I get the data when one DC goes down? Can you please explain?

Erick Ramirez ♦♦ nagasree963_193876 commented ·

As I thought, your question wasn't really about the cass-operator. This is really more about how replication works. Will you please post the schema for your keyspace? I just need to look at the line which has the replication settings.



nagasree963_193876 Erick Ramirez ♦♦ commented ·

This is the one I used:

CREATE KEYSPACE mykeyspace WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': '3', 'dc2': '3'} AND durable_writes = true
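
For anyone following along, here is a rough sketch of what these replication settings imply when one DC is lost. It is plain Python arithmetic, not driver code; the helper names `quorum` and `can_satisfy` are made up for illustration. It assumes the names `dc1` and `dc2` in the keyspace exactly match the datacenter names the cluster itself reports (they are case-sensitive), since otherwise no replicas would be placed in that DC at all.

```python
# Replication settings copied from the keyspace definition above.
REPLICATION = {"dc1": 3, "dc2": 3}


def quorum(replicas: int) -> int:
    """Replicas that must respond for a quorum: floor(n / 2) + 1."""
    return replicas // 2 + 1


def can_satisfy(level: str, local_dc: str, live: dict) -> bool:
    """Hypothetical helper: are enough replicas alive for this consistency level?

    `live` maps each DC name to the number of replicas currently up.
    """
    if level == "LOCAL_ONE":
        return live.get(local_dc, 0) >= 1
    if level == "LOCAL_QUORUM":
        return live.get(local_dc, 0) >= quorum(REPLICATION[local_dc])
    if level == "QUORUM":
        # QUORUM counts replicas across all DCs (3 + 3 = 6 here).
        return sum(live.values()) >= quorum(sum(REPLICATION.values()))
    raise ValueError(f"level not covered by this sketch: {level}")


# dc1 has been deleted, dc2 is fully up.
live_replicas = {"dc1": 0, "dc2": 3}
for level in ("LOCAL_ONE", "LOCAL_QUORUM", "QUORUM"):
    print(level, can_satisfy(level, "dc2", live_replicas))
```

Under these assumptions, a client connected to dc2 can still be served at the local consistency levels, while `QUORUM` needs 4 of the 6 replicas and cannot be met with only the 3 in dc2. This only covers the replica counting; it does not explain a "Host not found" error on its own.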
