yun asked:

What happens if the network goes down between two DCs?

I have a Cassandra cluster with 2 datacenters of 3 nodes each - DC1 and DC2.

During the disaster test, the network will be down between DC1 and DC2, and the application will connect to DC2.

Question 1:

Do we need to shut down the Cassandra service in DC1 during the test, then bring it back and run nodetool repair one node at a time after the test? Or should we leave it running and just run nodetool repair on all nodes after the test?

Question 2:

Once the network has recovered, but before the data is fully synced from DC2 back to DC1, will the application get outdated data from DC1 if it uses consistency level ONE?

thanks


smadhavan answered:

@yun, here is my take on your questions.

  1. There is no need to shut down the C* service on the DC1 nodes during your test if you can control the outage at the network level while the application is connected to DC2. Since you haven't mentioned how long the DR test will last, I am going to assume it is no more than a few hours. The data will get synced automatically, but if you prefer, it is fine to run nodetool repair -pr on every node in DC1, one node at a time (see the sketch below).
  2. If your application connects to DC1 after the network recovers but before the data is fully in sync, reads at consistency level ONE can return stale or outdated data. It is therefore recommended to use LOCAL_QUORUM as the consistency level for the application's reads and writes.
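
For what it's worth, here is a minimal sketch of that rolling repair. The host names are placeholders for your actual DC1 node addresses, and it assumes you can SSH to each node and run nodetool there:

```bash
#!/usr/bin/env bash
# Rolling primary-range repair of DC1, one node at a time.
# dc1-node1..3 are placeholder host names for the three DC1 nodes.
for node in dc1-node1 dc1-node2 dc1-node3; do
  echo "Repairing ${node}..."
  # -pr repairs only the token ranges this node is primary for, so running
  # it on every node in the DC covers each range exactly once.
  ssh "${node}" nodetool repair -pr || { echo "Repair failed on ${node}"; exit 1; }
done
```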


I hope that helps!


Thanks @smadhavan,

1. The test will run for a few days (greater than `max_hint_window_in_ms` but less than `gc_grace_seconds`), so I think we need to run nodetool repair (a quick way to check both settings is sketched after this comment).

2. Since the application uses `DCAwareRoundRobinPolicy` with DC1 as the localDc, it will reconnect to DC1 once the network recovers, and even if it uses `LOCAL_QUORUM` I guess it will still get stale data from DC1, right? Would it be better to shut down the Cassandra services in DC1 during the test, and once the test has finished, start one node and repair it (since DC1 alone can't meet the LOCAL_QUORUM consistency level, reads will get the latest data from DC2), then start and repair the second node, then the third? Does this make sense?

Thanks for your help.
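
As an aside, a rough sketch of how to check those two thresholds; the cassandra.yaml path is the typical package location and `my_app_ks` is a placeholder for an application keyspace:

```bash
#!/usr/bin/env bash
# How long hints are kept for a down replica (default 10800000 ms = 3 hours).
grep max_hint_window_in_ms /etc/cassandra/cassandra.yaml

# Per-table gc_grace_seconds (default 864000 s = 10 days) for an application
# keyspace; 'my_app_ks' is a placeholder name.
cqlsh -e "SELECT table_name, gc_grace_seconds FROM system_schema.tables WHERE keyspace_name = 'my_app_ks';"
```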



Erick Ramirez answered:

If there is no network connectivity between the two DCs, several problems could arise:

  • Replication between the 2 DCs will not work. Also consider that coordinators in the operational DC (DC2 in your case) will be storing hints (writes which could not be delivered to remote replicas), so be aware that this could have a significant impact on the disk space of the nodes in that DC (see the sketch after this list).
  • One important thing to consider is that if the outage is longer than the smallest gc_grace_seconds (each table has its own GC grace set in the table schema), you should not bring the nodes back online or you risk resurrecting deleted data. Instead, you need to completely erase the contents of the DC1 nodes, rebuild them, then add them back into the cluster.
  • Any queries that require a non-local consistency such as QUORUM or SERIAL (in the case of lightweight transactions) will fail.
  • Queries that use the default cassandra superuser (if authentication is enabled) will fail since it requires a QUORUM of replicas to be available.
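
To keep an eye on the hint build-up mentioned above, something along these lines on each DC2 coordinator (the hints directory shown is the package default and may differ in your installation):

```bash
#!/usr/bin/env bash
# Is hinted handoff enabled and running on this coordinator?
nodetool statushandoff

# How much disk space are the stored hints consuming?
# /var/lib/cassandra/hints is the default hints_directory; adjust to match cassandra.yaml.
du -sh /var/lib/cassandra/hints
```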

To respond to your questions directly:

  • It makes no difference whether you shut down the nodes in DC1.
  • You need to run a rolling repair, one node at a time, until all nodes are repaired.
  • Queries against DC1 with any local consistency (LOCAL_ONE, LOCAL_QUORUM) or ONE will not return the latest results until the nodes are repaired (see the example below).
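
To illustrate that last point, you could run a test read through a DC1 coordinator before and after the repair; a sketch, with a placeholder node address, keyspace and table:

```bash
# Connect to a DC1 node (placeholder address) and read at LOCAL_QUORUM.
# Until DC1 is repaired, this can return stale data because every DC1
# replica missed the writes made during the outage.
cqlsh dc1-node1 <<'EOF'
CONSISTENCY LOCAL_QUORUM;
SELECT * FROM my_app_ks.my_table LIMIT 10;
EOF
```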

Cheers!


Thanks Erick!


Hi Erick,

I have a question about gc_grace_seconds: I found that gc_grace_seconds is 0 in the system keyspaces below. Do we need to erase all nodes in DC1 and rebuild them? Thanks

  • dse_system
  • dse_leases
  • OpsCenter
  • system
  • dse_perf
  • system_traces
No, I was referring to application keyspaces only. Not system keyspaces. Cheers!
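
If it helps, a rough way to list GC grace only for the application keyspaces (filtering out the system, DSE and OpsCenter keyspaces mentioned above); adjust the filter if any of your own keyspaces share those prefixes:

```bash
#!/usr/bin/env bash
# List gc_grace_seconds for every table, then drop the internal keyspaces
# so only application tables remain.
cqlsh -e "SELECT keyspace_name, table_name, gc_grace_seconds FROM system_schema.tables;" |
  grep -Ev '^ *(system|dse_|OpsCenter)'
```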

cool, thanks


Hi Erick,

If the outage is greater than gc_grace_seconds, I found the rebuild steps in https://community.datastax.com/answers/11883/view.html

We only need to remove the data files of the application keyspaces and rebuild them, and once the rebuild has finished on all nodes in DC1, run nodetool repair to sync the system keyspaces (like OpsCenter), right?



Or could we raise gc_grace_seconds so we can avoid the rebuild and just run nodetool repair? Does this make sense?


We only need to remove the data files of the application keyspaces and rebuild them, right?

No, you need to wipe the disks completely and uninstall Cassandra. You need to start from scratch as if the servers were brand new (the final streaming step is sketched at the end of this comment).

Or could we raise gc_grace_seconds so we can avoid the rebuild and just run nodetool repair? Does this make sense?

No, increasing GC grace creates more problems than it solves. Follow the recommended procedure. Cheers!
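
For reference, once the DC1 nodes have been wiped and re-added to the cluster following the procedure in the linked answer, the data is typically streamed back from the surviving datacenter with nodetool rebuild. A minimal sketch, assuming the nodes were re-added with auto_bootstrap set to false and that the placeholder host names map to the three DC1 nodes:

```bash
#!/usr/bin/env bash
# Run nodetool rebuild on each re-added DC1 node, one at a time,
# streaming its data from the healthy datacenter (DC2).
for node in dc1-node1 dc1-node2 dc1-node3; do
  echo "Rebuilding ${node} from DC2..."
  ssh "${node}" nodetool rebuild -- DC2
done
```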


Thanks Erick. Just to double-confirm: if we increase gc_grace_seconds before the network goes down, could that help, or would it also create more problems?
