anson asked

How will the replicas be handled when restoring data from a 5-node cluster to a 2-node cluster?

Hi, I have a 5-node cluster with RF=3. I took snapshots of these 5 nodes and have them in a folder. Each of these 5 node snapshots will also contain the replica data of other nodes, right?

So when restoring the 5-node snapshots to a 2-node cluster, how will the replicas be handled, given that I am restoring to a cluster with fewer nodes?

Also, when taking a backup using snapshots, is it possible to leave out the replica data and keep only the original data of those 5 nodes in the snapshots?


1 Answer

Erick Ramirez answered

Since you are restoring data to a cluster with a different topology, I assume you are using the conventional restore procedure with the sstableloader utility.

The Cassandra bulk loader utility sstableloader loads table snapshots to a cluster by streaming relevant parts of SSTables to destination nodes. In this context, "relevant parts" means the data which belongs in the token range(s) owned by the destination nodes.
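As a rough illustration of what that looks like in practice, here is a minimal sketch that builds one sstableloader command per source-node backup folder. The folder layout (`/backups/node1` ... `/backups/node5`), the keyspace and table names, and the target IP addresses are all hypothetical. It assumes the SSTable files from each snapshot have already been copied into a directory whose path ends in `keyspace/table`, which is the layout sstableloader expects, and it only builds the commands rather than running them.

```python
from pathlib import PurePosixPath


def build_sstableloader_commands(node_dirs, keyspace, table, target_hosts):
    """Return one sstableloader command (as an argument list) per source folder.

    node_dirs    -- hypothetical folders, one per source node, each containing
                    <keyspace>/<table>/ with that node's snapshot SSTables
    target_hosts -- contact points in the destination cluster
    """
    hosts = ",".join(target_hosts)
    commands = []
    for node_dir in node_dirs:
        # sstableloader takes the keyspace and table names from the last
        # two components of this path.
        table_dir = PurePosixPath(node_dir) / keyspace / table
        commands.append(["sstableloader", "-d", hosts, str(table_dir)])
    return commands


# Hypothetical example: five source-node folders, two destination nodes.
commands = build_sstableloader_commands(
    [f"/backups/node{i}" for i in range(1, 6)],
    "my_ks",
    "my_table",
    ["10.0.0.1", "10.0.0.2"],
)
for cmd in commands:
    print(" ".join(cmd))
```

Each argument list could be passed to `subprocess.run` on a machine that has the Cassandra tools installed and can reach the destination nodes.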

Think of sstableloader as just another client to the cluster sending mutations (writes) like any other client/app. Just like writes from any other app, the cluster will replicate the data as defined in the keyspace schema. So regardless of the replication settings in the source cluster, if the destination cluster has a replication factor of two then Cassandra will send the mutations to two replicas.
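To make the replica placement concrete, here is a simplified toy model, not Cassandra's actual code. It assumes one evenly spaced token per node (no vnodes), uses MD5 instead of the real partitioner, and places replicas the way SimpleStrategy does, by walking the ring clockwise from the node that owns the token. The node names and keys are made up.

```python
import hashlib
from bisect import bisect_left

RING_SIZE = 2**32  # toy token space, far smaller than Cassandra's


def token_for(key):
    """Map a partition key to a token (MD5 here purely for illustration)."""
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:4], "big")


def build_ring(node_names):
    """Assign one evenly spaced token per node; the result is sorted by token."""
    step = RING_SIZE // len(node_names)
    return [(i * step, name) for i, name in enumerate(node_names)]


def replicas_for(key, ring, rf):
    """Pick the owner of the key's token, then the next nodes clockwise."""
    tokens = [token for token, _ in ring]
    # Owner is the first node whose token is >= the key's token, wrapping around.
    start = bisect_left(tokens, token_for(key)) % len(ring)
    count = min(rf, len(ring))  # cannot place more replicas than there are nodes
    return [ring[(start + i) % len(ring)][1] for i in range(count)]


source_ring = build_ring(["src1", "src2", "src3", "src4", "src5"])
target_ring = build_ring(["dst1", "dst2"])

for key in ["user:1", "user:2", "user:3"]:
    print(key,
          "source RF=3:", replicas_for(key, source_ring, 3),
          "target RF=2:", replicas_for(key, target_ring, 2))
```

In this model every key lands on three of the five source nodes, and on both destination nodes when the destination keyspace uses a replication factor of two. Where the data lived in the source cluster plays no part in where it ends up.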

For the second part of your question, snapshots are local to each node so there's no facility for de-duplicating data across nodes since they all operate independently. Cheers!
