Bringing together the Apache Cassandra experts from the community and DataStax.


satvantsingh_190085 asked:

How do I migrate tables to a new cluster with identical configuration as the source cluster?

How do I migrate a subset of tables (30 tables out of 110) in a single keyspace holding terabytes of data from one cluster to another?

The source cluster and target cluster have the same configuration (replication factor) and number of nodes.


Erick Ramirez answered:

For the purposes of this question, two clusters have identical configuration if:

  • the cluster topologies are identical -- same number of DCs, same number of nodes in each DC
  • the token assignments are identical -- the tokens assigned to each node in one cluster mirror those of the corresponding node in the other cluster

If the conditions above are true, you can use the "refresh method" where you copy the SSTable snapshot from the source cluster to the target cluster then run nodetool refresh.

Sample topology

Let me illustrate with an example. Consider a cluster with 2 data centres, each with 3 nodes and the following token assignments:

Datacentre: AlphaDC

  • nodeA: tokenA1, tokenA2, tokenA3
  • nodeB: tokenB1, tokenB2, tokenB3
  • nodeC: tokenC1, tokenC2, tokenC3

Datacentre: BetaDC

  • nodeD: tokenD1, tokenD2, tokenD3
  • nodeE: tokenE1, tokenE2, tokenE3
  • nodeF: tokenF1, tokenF2, tokenF3

This cluster has an identical copy that has the following DCs and nodes:

Datacentre: CharlieDC

  • nodeU: tokenA1, tokenA2, tokenA3
  • nodeV: tokenB1, tokenB2, tokenB3
  • nodeW: tokenC1, tokenC2, tokenC3

Datacentre: DeltaDC

  • nodeX: tokenD1, tokenD2, tokenD3
  • nodeY: tokenE1, tokenE2, tokenE3
  • nodeZ: tokenF1, tokenF2, tokenF3

In this example nodeA (from the first cluster) and nodeU (second cluster) have the same tokens assigned, nodeB and nodeV have the same tokens, and so on.

The "refresh" method

Follow this procedure to migrate data from the first cluster to the second cluster in the example.

STEP 1 - For the first table, create the schema on the second cluster.
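STEP 1 can be scripted with cqlsh. A minimal sketch, assuming cqlsh connectivity to one node in each cluster; the hostnames, keyspace, and table names are placeholders, and the target keyspace must already exist on the second cluster:

```shell
# Hypothetical helper: export a table's DDL from the source cluster and
# replay it on the target cluster. All names below are placeholders.
copy_table_schema() {
  local src_host=$1 dst_host=$2 ks=$3 tbl=$4
  # DESCRIBE prints the full CREATE TABLE statement for the table
  cqlsh "$src_host" -e "DESCRIBE TABLE $ks.$tbl" > "/tmp/$tbl.cql"
  # Apply the statement on the target cluster
  cqlsh "$dst_host" -f "/tmp/$tbl.cql"
}

# Usage for the example topology:
#   copy_table_schema nodeA nodeU ks_name table_name
```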

STEP 2 - Take a snapshot of the first table on nodeA and copy it to the corresponding table directory on nodeU. Note that table directory suffixes are time-based UUIDs and will differ between clusters, since they are generated when the table is created.
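In shell terms, STEP 2 might look like the sketch below. It assumes the default data directory /var/lib/cassandra/data and passwordless rsync/SSH between the clusters; the table directory suffixes and helper names are placeholders you must substitute with the real values on each cluster:

```shell
# Build the snapshot path under a table directory. The table directory's
# UUID suffix differs between clusters, so it is passed in explicitly.
snapshot_path() {
  local data_dir=$1 ks=$2 table_dir=$3 tag=$4
  echo "$data_dir/$ks/$table_dir/snapshots/$tag"
}

# Hypothetical wrapper for STEP 2: snapshot one table on the local (source)
# node, then rsync the snapshot's SSTables into the destination table directory.
take_and_copy_snapshot() {
  local ks=$1 tbl=$2 src_table_dir=$3 dst_host=$4 dst_table_dir=$5 tag=$6
  # Flush and snapshot just this table on the source node
  nodetool snapshot -t "$tag" -cf "$tbl" "$ks"
  # Copy the SSTables into the destination table directory
  rsync -av "$(snapshot_path /var/lib/cassandra/data "$ks" "$src_table_dir" "$tag")/" \
        "$dst_host:/var/lib/cassandra/data/$ks/$dst_table_dir/"
}
```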

STEP 3 - On nodeU, force Cassandra to read and load the new SSTables from the disk:

$ nodetool refresh -- ks_name table_name

STEP 4 - Check the logs on the node to verify that the new SSTables were opened.
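For STEP 4 you can grep the Cassandra system log. The log location and the exact message wording vary by version and packaging, so treat this as a sketch:

```shell
# Look for evidence that "nodetool refresh" picked up the new SSTables.
# The default log path and the message text are assumptions; adjust as needed.
check_refresh_logs() {
  local log=${1:-/var/log/cassandra/system.log}
  grep -i "loading new sstables" "$log" | tail
}
```

If the function prints nothing, the refresh may not have found any files in the table directory.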

STEP 5 - Repeat the steps above on the next table and keep repeating until all the tables have been migrated.

STEP 6 - Repeat the steps above by migrating the snapshots from nodeB to nodeV. Keep repeating until all nodes in the second cluster have their snapshots restored.
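Since steps 2-4 repeat per table and per node pair, a loop along these lines can drive the whole migration. The node pairs come from the example topology; the table list and the commented-out commands are placeholders:

```shell
# Pair each source node with its token-identical destination node
NODE_PAIRS="nodeA:nodeU nodeB:nodeV nodeC:nodeW nodeD:nodeX nodeE:nodeY nodeF:nodeZ"
TABLES="table_1 table_2 table_3"   # stand-ins for the 30 tables being migrated

for pair in $NODE_PAIRS; do
  src=${pair%%:*}; dst=${pair##*:}
  for tbl in $TABLES; do
    echo "migrating $tbl: $src -> $dst"
    # Copy the snapshot of $tbl from $src to $dst (STEP 2), then:
    #   ssh "$dst" nodetool refresh -- ks_name "$tbl"
  done
done
```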

Things to know

  • This is an online procedure. It does not require the destination nodes to be shut down.
  • This procedure only applies to clusters with identical topologies and token assignments.
  • It will not work for non-identical clusters because the partitions in the source SSTables will not necessarily fall in the token range(s) owned by the destination nodes.
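Given the last point, it is worth double-checking that the token assignments really do mirror each other before restoring anything. A small order-insensitive sketch; collecting each node's token list (for example, from `nodetool ring` output) is left as a placeholder:

```shell
# Compare two newline-separated token lists, ignoring order.
tokens_match() {
  [ "$(printf '%s\n' "$1" | sort)" = "$(printf '%s\n' "$2" | sort)" ]
}

# Usage sketch (placeholder variables): gather the tokens for nodeA and
# nodeU from each cluster, then:
#   tokens_match "$tokens_nodeA" "$tokens_nodeU" && echo "safe to refresh"
```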

There is no need to use OpsCenter cloning or sstableloader for this scenario. Cheers!


Thank you very much @Erick Ramirez for the detailed answer. I understand all the steps but have one doubt: a single table can be spread across multiple nodes, so assume the first table is spread across nodeA, nodeB and nodeC.

In that case, do I have to repeat Step 2 & Step 3 for the same table on each node?

table_xyz = nodeA -> nodeU,

nodeB -> nodeV,

nodeC-> nodeW

smadhavan replied to satvantsingh_190085:

@satvantsingh_190085, I don't think you really have to, as the replication factor will kick in and replicate the data to the other nodes to take care of the replicas.

Erick Ramirez replied:

"... you don't really have to as the replication factor will kick in and apply it on other nodes to take care of the replicas."

You must copy the respective snapshots to all nodes in the new DC. Replication does not come into play in this situation because data is not getting streamed to the other DC. In the "refresh method", we are not bulk loading data to the cluster.

If you don't copy the snapshots to the nodes in DeltaDC, the data directory on those nodes will be empty and you don't want to "bootstrap" the data with repairs because that would be an expensive operation. You need to follow the procedure exactly as I documented. Cheers!


Erick Ramirez replied:

Yes, you have to repeat steps 1-3 for ALL tables and ALL nodes in the new cluster.

By "all nodes", I mean you need to do the steps on nodes X, Y and Z in the DeltaDC. Cheers!

smadhavan answered:

@satvantsingh_190085, you could also refer to the answer that was posted to a similar question earlier in this thread. Also, if you're using DSE, you could perform this using OpsCenter. Cheers!


saravanan.chinnachamy_185977 answered:

@satvantsingh_190085 Can you please update the Cassandra version details so that we can provide you with specific tools to use?

Having said that, Cassandra provides a number of tools to copy/migrate data between clusters, as detailed at Migrating data to DataStax Enterprise. I will describe a couple of such tools.

  1. sstableloader - Streams a set of SSTable data files to a live cluster. It provides the ability to:
    • Bulk load external data into a cluster.
    • Restore snapshots.
    • Load existing SSTables into another cluster with a different number of nodes or replication strategy.

    You can find details about using this tool at sstableloader.

  2. DataStax Bulk Loader - Efficiently and reliably loads small or large amounts of data into a Cassandra cluster. It supports the following databases.
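To make the two options concrete, here is a hedged sketch of invoking each tool. Hostnames, paths, keyspace and table names are placeholders, and the flags should be checked against the documentation for your version:

```shell
# sstableloader streams SSTables from a directory laid out as
# <parent>/<keyspace>/<table>/ to the contact nodes given with -d.
load_with_sstableloader() {
  local hosts=$1 sstable_dir=$2
  sstableloader -d "$hosts" "$sstable_dir"
}

# DataStax Bulk Loader (dsbulk) loading a CSV file into a table.
load_with_dsbulk() {
  local ks=$1 tbl=$2 csv=$3
  dsbulk load -k "$ks" -t "$tbl" -url "$csv"
}

# Usage sketches (placeholder arguments):
#   load_with_sstableloader target_node1,target_node2 /path/to/ks_name/table_name
#   load_with_dsbulk ks_name table_name data.csv
```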

