
question

pranali.khanna101994_189965 asked ·

How does Cassandra handle a whole data center failing?

Failover and replication in Cassandra

Can a whole data center fail? If yes, how is that handled?

cassandra
2 comments

can you be more specific - what answer are you looking for? what do you want to understand?


I meant: when all the nodes in a data center are down, how are requests handled?

Erick Ramirez answered ·
Erick Ramirez answered ·

Data center failover is not handled in the database layer, so Cassandra does not perform any action to recover a DC. If a node is down or unavailable during a write request, Cassandra handles this with hinted handoff -- a mechanism where the coordinator node managing the write stores hints (write mutations) and replays them to the replica when it comes back online. But if a whole DC is down, this mechanism isn't relevant since there would be no nodes in the DC to coordinate requests.
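To make the mechanism concrete, here is a minimal toy sketch of the hinted-handoff idea in plain Java. This is illustrative only, not Cassandra's actual implementation; the `Replica` and `Coordinator` classes and all names in them are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Toy model: a coordinator stores hints (pending mutations) for a replica
// that is down, then replays them when the replica comes back online.
public class HintedHandoffSketch {
    static class Replica {
        final String name;
        boolean up = true;
        final Map<String, String> data = new HashMap<>();
        Replica(String name) { this.name = name; }
        void apply(String key, String value) { data.put(key, value); }
    }

    static class Coordinator {
        // Hints are queued per replica while that replica is unavailable.
        final Map<Replica, Queue<String[]>> hints = new HashMap<>();

        void write(Replica replica, String key, String value) {
            if (replica.up) {
                replica.apply(key, value);           // normal write path
            } else {
                // Replica unavailable: store the mutation as a hint.
                hints.computeIfAbsent(replica, r -> new ArrayDeque<>())
                     .add(new String[]{key, value});
            }
        }

        // Called when a replica comes back online: replay stored hints.
        void replayHints(Replica replica) {
            Queue<String[]> pending = hints.remove(replica);
            if (pending == null) return;
            for (String[] mutation : pending) {
                replica.apply(mutation[0], mutation[1]);
            }
        }
    }

    public static void main(String[] args) {
        Replica r = new Replica("node1");
        Coordinator c = new Coordinator();

        r.up = false;               // node goes down
        c.write(r, "k1", "v1");     // write is stored as a hint, not applied

        r.up = true;                // node recovers
        c.replayHints(r);           // hint is replayed to the replica
        System.out.println(r.data.get("k1")); // prints v1
    }
}
```

Note how this breaks down for a full DC outage: every `Replica` in the DC is down at once, so there is nothing local left to coordinate or receive the writes.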

In older versions of the DataStax drivers, a DC outage was handled by the DC-aware load-balancing policy. To use Java driver 3.9 as an example, DCAwareRoundRobinPolicy builds a query plan with contact points from the local DC first and adds nodes from remote DCs to the end of the plan. This means that if no nodes in the local DC are available, the driver will connect to nodes in remote DCs, effectively "failing over".
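The ordering behavior described above can be sketched in plain Java. This is not the driver's actual code, just an illustration of how a DC-aware policy like DCAwareRoundRobinPolicy arranges a query plan; the node addresses and DC names are made up.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of DC-aware query-plan ordering: local-DC nodes come first,
// remote-DC nodes are appended as a last-resort fallback.
public class DcAwareQueryPlan {
    record Node(String address, String dc) {}

    static List<Node> queryPlan(List<Node> allNodes, String localDc) {
        List<Node> plan = new ArrayList<>();
        // Local-DC nodes are tried first...
        for (Node n : allNodes) {
            if (n.dc().equals(localDc)) plan.add(n);
        }
        // ...and remote-DC nodes go to the end of the plan, so the driver
        // only reaches them when every local node is unavailable.
        for (Node n : allNodes) {
            if (!n.dc().equals(localDc)) plan.add(n);
        }
        return plan;
    }

    public static void main(String[] args) {
        List<Node> nodes = List.of(
            new Node("10.0.1.1", "dc1"),
            new Node("10.0.2.1", "dc2"),
            new Node("10.0.1.2", "dc1"));
        // With "dc1" as the local DC, both dc1 nodes precede the dc2 node.
        System.out.println(queryPlan(nodes, "dc1"));
    }
}
```

The "failover" is implicit: the driver simply walks the plan in order, so remote nodes are only contacted once every local entry has failed.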

We no longer think that is the ideal way of handling a DC outage. Consider an app querying at the LOCAL_QUORUM consistency level: if the local DC goes down, the query suddenly runs in a remote DC instead. Rather than the driver failing over at the application layer, the failover should be handled at the infrastructure layer.

In newer versions of the drivers (Java driver 4.x for example), the default load-balancing policy will only ever connect to a single DC -- the local DC. If the local C* DC (local to the app instances) is down or unavailable, chances are it's a full site outage and the app instances are unavailable as well. In this instance, the infrastructure load-balancer should failover to another site/region. This approach means that consistency guarantees are not compromised and that local CLs will always be local. Cheers!
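As a sketch of what pinning the local DC looks like in driver 4.x, here is a minimal `application.conf` fragment. The contact point and the DC name `dc1` are placeholders; substitute your own values.

```hocon
datastax-java-driver {
  basic {
    contact-points = [ "10.0.1.1:9042" ]
    load-balancing-policy {
      # The default policy only ever routes requests to the local DC.
      class = DefaultLoadBalancingPolicy
      local-datacenter = dc1
    }
  }
}
```

Because the driver never silently reroutes to a remote DC, a full-site failover has to be an explicit decision made by the infrastructure load balancer, which keeps LOCAL_* consistency levels meaningful.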


smadhavan answered ·

@pranali.khanna101994_189965, yes, Cassandra's architecture is designed with that level of failure in mind. You can get a good understanding of this concept by reading through the following resources,
