nicholasamorim asked Erick Ramirez answered

Shipping Disk to New DC in order to sync 50TB of data

We're adding a new datacenter to our Cassandra cluster. Currently we have a 15-node DC with RF=3, holding about 50TB of data.

We are adding another datacenter in a different country and we want both datacenters to contain all the data. Obviously, synchronising 50TB of data across the internet will take a gargantuan amount of time.

Is it possible to copy a full backup to a few disks, ship them to the new DC and then restore from them? I'm wondering what the procedure would be.

  • Could someone give me a few pointers on this operation, if possible at all?
  • Or any other tips?

Not sure if it matters but our new DC is going to be smaller (6 nodes) for the time being, although space will be available. The new DC is mostly meant as a live-backup/failover and will not be the primary cluster for writing, generally speaking.


1 Answer

Erick Ramirez answered

@nicholasamorim it isn't really possible to copy the data files from one DC and restore them in another, because each partition (the equivalent of a record in an RDBMS) is distributed differently between the existing DC and the new DC. For example, partition XYZ, which is owned by node J in the source DC, could be owned by node B in the new DC.
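To make the "partition XYZ" example concrete, here is a deliberately simplified sketch of token ownership, not Cassandra's actual code. It assumes one token per node, evenly spaced tokens, and simple clockwise replica placement within each DC; it uses md5 as a stand-in hash (Cassandra's default is the Murmur3Partitioner), and the node names are hypothetical. Real clusters usually use vnodes and NetworkTopologyStrategy, which also considers racks.

```python
import bisect
import hashlib

# Simplified model: a token ring of size 2**64, one evenly spaced token per node.
# md5 is only a stand-in; Cassandra's default partitioner is Murmur3.
RING_SIZE = 2 ** 64


def partition_token(key: str) -> int:
    """Hash a partition key onto the ring (md5 stand-in, not Murmur3)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")  # value in 0 .. 2**64 - 1


def build_ring(node_names):
    """Give each node one evenly spaced token; returns (token, node) pairs sorted by token."""
    step = RING_SIZE // len(node_names)
    return [(i * step, name) for i, name in enumerate(node_names)]


def replicas(ring, key, rf):
    """The first node whose token is >= the key's token owns it (wrapping around);
    the next rf - 1 nodes clockwise hold the other replicas."""
    tokens = [t for t, _ in ring]
    start = bisect.bisect_left(tokens, partition_token(key)) % len(ring)
    return [ring[(start + i) % len(ring)][1] for i in range(rf)]


# Hypothetical node names: 15 nodes in the existing DC, 6 in the new one.
source_dc = build_ring([f"dc1-node{i}" for i in range(1, 16)])
new_dc = build_ring([f"dc2-node{i}" for i in range(1, 7)])

for key in ["XYZ", "user:42", "order:1001"]:
    print(key, replicas(source_dc, key, 3), replicas(new_dc, key, 3))
```

Because each node in the 6-node DC covers a much larger slice of the ring than each node in the 15-node DC, the SSTables on any one source node do not line up with any one new node, which is why a file-level copy does not map cleanly between the two DCs.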

If you were building a completely separate cluster in another country, then it is possible to bulk load a snapshot into the new cluster using the sstableloader utility. The steps are documented in Restoring from a snapshot. WARNING: this method is NOT valid if the new DC is part of the same cluster as the existing (source) DC. Cheers!
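For the separate-cluster case, a minimal sketch of preparing the sstableloader runs is below. It assumes the snapshot files from the shipped disks have already been copied into a hypothetical layout of `<restore_root>/<keyspace>/<table>/` (sstableloader expects the directory path to end in keyspace/table) and that the target schema already exists. It only builds commands using the `-d` (initial hosts) option; the paths and host IPs are placeholders, and authentication or throttling options are left out.

```python
import shlex
from pathlib import Path


def sstableloader_commands(restore_root, keyspace, target_hosts):
    """Build one sstableloader invocation per table directory under
    restore_root/keyspace/ (hypothetical layout: .../<keyspace>/<table>/)."""
    hosts = ",".join(target_hosts)  # -d takes a comma-separated list of initial hosts
    ks_dir = Path(restore_root) / keyspace
    commands = []
    for table_dir in sorted(p for p in ks_dir.iterdir() if p.is_dir()):
        commands.append(["sstableloader", "-d", hosts, str(table_dir)])
    return commands


if __name__ == "__main__":
    # Hypothetical mount point for the shipped disks and hypothetical target nodes.
    root, keyspace = Path("/mnt/shipped/restore"), "my_keyspace"
    if (root / keyspace).is_dir():
        for cmd in sstableloader_commands(root, keyspace, ["10.1.0.11", "10.1.0.12"]):
            print(shlex.join(cmd))  # review, then run each command at the new site
    else:
        print(f"{root / keyspace} not found; adjust the paths first")
```

sstableloader streams each SSTable to whichever nodes own its token ranges in the target cluster, so running it from a machine at the new site keeps that streaming on the local network rather than across the internet.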
