bakisahin_185308 asked med.amghari_30102 commented

How do you configure Apache Cassandra for disaster recovery?

How do we configure Apache Cassandra for disaster recovery?

If Cassandra in the primary data center goes down, we want to fail over to the DR data center.

We also want to be able to switch over, using the DR data center for production and the production site as DR.

How can we configure this architecture in Apache Cassandra?

Regards.

cassandra, disaster recovery

Cedrick Lunven answered med.amghari_30102 commented

Hi, @bakisahin_185308,


TL;DR:

  • Solution 1: 2 DCs with the same number of nodes and RF=3 in each DC to ensure resilience.
  • Solution 2: A single logical DC, using racks to tell Cassandra which nodes are colocated and should not store the same data. This is how you set it up in AWS, for instance: 1 DC per region, with nodes distributed across multiple AZs.
  • The consistency level depends on your requests and use cases. LOCAL_QUORUM seems a good candidate for you.
  • There is nothing very special about configuring a multi-DC installation. Simply install node by node, defining the DC name, the rack, and the seed nodes for each. When a node starts, it joins the correct DC and streams its portion of the data. Here is some documentation: https://docs.datastax.com/en/ddac/doc/datastax_enterprise/production/DDACmultiDCperWorkloadType.html

More Details:

  • Apache Cassandra is an active-active distributed system. Every DC is up and accepts requests. You CAN route requests to both DCs at any time to distribute the load.


  • In the driver, you set `localDataCenter` to the name of the DC closest to the application. On first connection, the client gets the full topology of the cluster (all IPs, all DCs) and is able to fail over for you; there is nothing to set up at the server level for "DR".


  • When you create a keyspace (similar to an Oracle schema), you define the DCs where the keyspace lives and how many replicas it has in each.
CREATE KEYSPACE IF NOT EXISTS killrvideo 
WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'dc1' : 3, 'dc2': 3 }
AND DURABLE_WRITES = true;


  • With such a keyspace, data is replicated in both directions in real time. Having 2 DCs protects you against the loss of an entire site.


  • The recommended size is about 1 TB per node, with about 3000 transactions/second/core. The recommended replication factor is 3. If both DCs hold the same data, it is recommended that they also have the same number of nodes.
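As a rough illustration of the sizing guidance above, here is a minimal sketch that estimates how many nodes one DC needs. The function name and the 2 TB example figure are assumptions for illustration only; the 1 TB per node and RF=3 values come from the answer. It ignores headroom for compaction and growth, so treat the result as a lower bound.

```python
import math

def nodes_needed(unique_data_tb: float, rf: int = 3, per_node_tb: float = 1.0) -> int:
    """Estimate the nodes one DC needs to hold `unique_data_tb` at replication factor `rf`."""
    # Each DC stores rf copies of every row, so raw storage is data * rf.
    raw_tb = unique_data_tb * rf
    # Never go below rf nodes: each replica must live on a different node.
    return max(rf, math.ceil(raw_tb / per_node_tb))

# Hypothetical example: 2 TB of unique data, RF=3, about 1 TB per node.
print(nodes_needed(2.0))
```

Because both DCs hold a full copy of the data in the keyspace shown earlier, the same estimate applies to each DC independently.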


Is it possible to have 6 nodes in the primary DC and 4 nodes in the DR DC?

  • Technically speaking, this is possible. You have to ensure that 4 nodes provide enough capacity (1 TB per node, divided by 3 because of the RF) to store all the data of your 6 "main" nodes.
  • Reducing the RF for DC2 in order to save space and nodes is a bad idea. With RF=1, losing a node means losing data. With RF=2, a QUORUM requires both replicas, so a single node failure makes QUORUM unreachable.
  • DC2 is always running, so you won't save any CPU or cost there.
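The point about reduced replication factors can be checked with the usual quorum arithmetic (a quorum is floor(RF / 2) + 1 replicas). This is only a small sketch; the function names are made up for illustration.

```python
def quorum(rf: int) -> int:
    """Replicas that must respond for a quorum: floor(rf / 2) + 1."""
    return rf // 2 + 1

def tolerated_failures(rf: int) -> int:
    """Replicas that can be down while a quorum is still reachable."""
    return rf - quorum(rf)

# Compare the replication factors discussed above.
for rf in (1, 2, 3):
    print(f"RF={rf}: quorum={quorum(rf)}, tolerates {tolerated_failures(rf)} replica(s) down")
```

RF=3 is the smallest replication factor that still reaches a quorum with one replica down, which is why it is the usual recommendation for each DC.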


Training


Best Regards


bakisahin_185308 commented ·

Thank you so much, Cedrick.

med.amghari_30102 commented ·

Hi Cedric,

It seems that automatic failover is no longer used in the 4.x versions of the driver.


https://docs.datastax.com/en/developer/java-driver/4.7/manual/core/load_balancing/

It says:

In previous driver versions, you could configure application-level failover, such as: “if all the Cassandra nodes in DC1 are down, allow app1 to connect to the nodes in DC2”. We now believe that this is not the right place to handle this: if a whole datacenter went down at once, it probably means a catastrophic failure happened in Region1, and the application node is down as well. Failover should be cross-region instead (handled by the load balancer in this example).


Can you please clarify?


Best regards

Erick Ramirez answered Erick Ramirez commented

@bakisahin_185308 There is nothing special that you need to do to your Cassandra cluster. We recommend that you do the failovers at the application layer so that when the primary DC is unavailable, you switch the traffic to the app servers in DR, which use the DR Cassandra DC as their local DC (their primary DC). For all intents and purposes, the failover operation is transparent to Cassandra and it will continue to operate as normal.

For more information, see the DataStax Java Driver Load-balancing document. I realise that you may not necessarily be using the Java driver but the same concept applies regardless of the driver you are using. Cheers!
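To make the idea of application-layer failover concrete, here is a minimal sketch of the routing decision that a load balancer or traffic manager would make. Everything in it is hypothetical: the site names, the DC names (`dc1`, `dc2`), and the health flags are assumptions for illustration, and no Cassandra driver API is involved.

```python
from typing import Mapping, Optional, Sequence

# Hypothetical mapping: each application site talks to the Cassandra DC next to it.
SITE_TO_LOCAL_DC = {"primary": "dc1", "dr": "dc2"}

def pick_active_site(site_healthy: Mapping[str, bool],
                     preference: Sequence[str] = ("primary", "dr")) -> Optional[str]:
    """Return the first healthy site in preference order, or None if all are down."""
    for site in preference:
        if site_healthy.get(site, False):
            return site
    return None

# Primary site is down, so traffic goes to the DR app servers, which use dc2 locally.
active = pick_active_site({"primary": False, "dr": True})
print(active, SITE_TO_LOCAL_DC[active])
```

A planned switchover is the same decision with the preference order reversed, for example `preference=("dr", "primary")`; nothing changes on the Cassandra side.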


bakisahin_185308 commented ·

Hi Erick,

Thank you so much for your answer. What should we consider for this architecture?

Is it possible to have 6 nodes in the primary DC and 4 nodes in the DR DC? What should the consistency level and replication factor be? Do you have a link or documentation about these issues, with step-by-step configuration for this architecture?

Regards.

Erick Ramirez ♦♦ bakisahin_185308 commented ·

@bakisahin_185308 For replication, we recommend using NetworkTopologyStrategy with a replication factor of 3 in each DC. For more info, see Data Replication.

For the best consistency, we recommend using LOCAL_QUORUM for both reads and writes. For more info, see Data Consistency. Cheers!
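A short sketch of why LOCAL_QUORUM on both reads and writes works well with RF=3: a read is guaranteed to overlap the latest acknowledged write when the number of replicas read plus the number written exceeds the replication factor. The helper names below are assumptions for illustration, and the guarantee applies within the local DC only.

```python
def local_quorum(rf_local: int) -> int:
    """Replicas in the local DC that must acknowledge a LOCAL_QUORUM request."""
    return rf_local // 2 + 1

def read_overlaps_write(read_replicas: int, write_replicas: int, rf_local: int) -> bool:
    """True when every read set must share at least one replica with every write set."""
    return read_replicas + write_replicas > rf_local

rf = 3
lq = local_quorum(rf)

# LOCAL_QUORUM reads with LOCAL_QUORUM writes: 2 + 2 > 3, so the sets overlap.
print(read_overlaps_write(lq, lq, rf))

# LOCAL_ONE reads with LOCAL_QUORUM writes: 1 + 2 is not > 3, so a read may be stale.
print(read_overlaps_write(1, lq, rf))
```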
