How to migrate a subset of tables (30 tables out of 110) from a single keyspace holding terabytes of data from one cluster to another cluster?
The source and target clusters have the same configuration (replication factor) and number of nodes.
For the purposes of this question, two clusters have an identical configuration if they have the same number of data centres and nodes, the same replication settings, and the same token assignments on corresponding nodes.
If the conditions above are true, you can use the "refresh method": copy the SSTable snapshots from the source cluster to the target cluster, then run nodetool refresh.
Let me illustrate with an example. Consider a cluster with 2 data centres, each with 3 nodes and the following token assignments:

Datacentre: AlphaDC
- nodeA: tokenA1, tokenA2, tokenA3
- nodeB: tokenB1, tokenB2, tokenB3
- nodeC: tokenC1, tokenC2, tokenC3

Datacentre: BetaDC
- nodeD: tokenD1, tokenD2, tokenD3
- nodeE: tokenE1, tokenE2, tokenE3
- nodeF: tokenF1, tokenF2, tokenF3

This cluster has an identical copy with the following DCs and nodes:

Datacentre: CharlieDC
- nodeU: tokenA1, tokenA2, tokenA3
- nodeV: tokenB1, tokenB2, tokenB3
- nodeW: tokenC1, tokenC2, tokenC3

Datacentre: DeltaDC
- nodeX: tokenD1, tokenD2, tokenD3
- nodeY: tokenE1, tokenE2, tokenE3
- nodeZ: tokenF1, tokenF2, tokenF3

In this example, nodeA (from the first cluster) and nodeU (second cluster) have the same tokens assigned, nodeB and nodeV have the same tokens, and so on.
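Whether the token assignments really line up can be checked with nodetool ring before migrating anything. A rough sketch (the file paths are placeholders, and the commands are assembled and printed here rather than executed):

```shell
# Compare token ownership between the two clusters. `nodetool ring`
# prints one line per token with the token value in the last column.
CHECKS=(
  "nodetool ring | awk '{print \$NF}' | sort > /tmp/cluster1_tokens.txt"   # run on cluster 1
  "nodetool ring | awk '{print \$NF}' | sort > /tmp/cluster2_tokens.txt"   # run on cluster 2
  "diff /tmp/cluster1_tokens.txt /tmp/cluster2_tokens.txt"                 # empty diff = matching token sets
)
printf '%s\n' "${CHECKS[@]}"
```

Note that an empty diff only shows the clusters share the same token set; the per-node pairing still has to match as in the example above.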
Follow this procedure to migrate data from the first cluster to the second cluster in the example.
STEP 1 - For the first table, create the schema on the second cluster.
STEP 2 - Take a snapshot of the first table on nodeA and copy it to the corresponding table directory on nodeU. Note that the suffix of each table directory is a UUID and will differ between clusters, since it is based on the time the table was created.
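As a sketch, step 2 could look like the following. The host name, snapshot tag, UUID suffixes and data directory are placeholders, and the commands are only collected and printed so they can be reviewed before running them on a real cluster:

```shell
# Dry-run sketch: snapshot a table on nodeA and copy it to nodeU.
KS=ks_name
TABLE=table_name
SNAP=migration1                      # snapshot tag (placeholder)
DATA=/var/lib/cassandra/data         # default data directory; adjust to your install

PLAN=()
plan() { PLAN+=("$*"); echo "$*"; }

# Take the snapshot of just this table on the source node:
plan "nodetool snapshot -t $SNAP --table $TABLE -- $KS"

# Copy the snapshot SSTables into the matching table directory on the
# target node. The UUID suffix of the table directory differs between
# clusters, so each side's suffix must be resolved separately:
plan "rsync -av $DATA/$KS/$TABLE-<source-uuid>/snapshots/$SNAP/ nodeU:$DATA/$KS/$TABLE-<target-uuid>/"
```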
STEP 3 - On nodeU, force Cassandra to read and load the new SSTables from disk:
$ nodetool refresh -- ks_name table_name
STEP 4 - Check the logs on the node to verify that the new SSTables were opened.
STEP 5 - Repeat the steps above on the next table and keep repeating until all the tables have been migrated.
STEP 6 - Repeat the steps above by migrating the snapshots from nodeB to nodeV. Keep repeating until all nodes in the second cluster have their snapshots restored.
There is no need to use OpsCenter cloning or sstableloader for this scenario. Cheers!
Thank you very much @Erick Ramirez for the detailed answer. I understand all the steps but have one doubt: since a single table can be spread across multiple nodes, assume the first table is spread across nodeA, nodeB and nodeC.
Do I have to repeat Step 2 & Step 3 for the same table on each node?
table_xyz = nodeA -> nodeU, nodeB -> nodeV, nodeC -> nodeW
@satvantsingh_190085, I think you don't really have to, as the replication factor will kick in and take care of the replicas on the other nodes.
... you don't really have to as the replication factor will kick in and apply it on other nodes to take care of the replicas.
You must copy the respective snapshots to all nodes in the new DC. Replication does not come into play in this situation because data is not getting streamed to the other DC. In the "refresh method", we are not bulk loading data to the cluster.
If you don't copy the snapshots to the nodes in DeltaDC, the data directory on those nodes will be empty and you don't want to "bootstrap" the data with repairs because that would be an expensive operation. You need to follow the procedure exactly as I documented. Cheers!
Yes, you have to repeat steps 1-3 for ALL tables and ALL nodes in the new cluster.
By "all nodes", I mean you need to do the steps on nodes X, Y and Z in the DeltaDC. Cheers!
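To make the "all tables, all nodes" point concrete, the whole migration can be driven by a nested loop over the node pairs from the example. Table names, snapshot tag, UUID suffixes and paths are placeholders, and the commands are only collected and printed rather than executed:

```shell
# Every (table, node-pair) combination gets its own snapshot/copy/refresh.
KS=ks_name
DATA=/var/lib/cassandra/data
TABLES=(table_01 table_02)                      # ...through all 30 tables
# Each source node is paired with the target node owning the same tokens:
declare -A PAIR=([nodeA]=nodeU [nodeB]=nodeV [nodeC]=nodeW
                 [nodeD]=nodeX [nodeE]=nodeY [nodeF]=nodeZ)

CMDS=()
for t in "${TABLES[@]}"; do
  for src in "${!PAIR[@]}"; do
    dst=${PAIR[$src]}
    CMDS+=("ssh $src nodetool snapshot -t mig --table $t -- $KS")
    CMDS+=("rsync -av $src:$DATA/$KS/$t-<src-uuid>/snapshots/mig/ $dst:$DATA/$KS/$t-<dst-uuid>/")
    CMDS+=("ssh $dst nodetool refresh -- $KS $t")
  done
done
printf '%s\n' "${CMDS[@]}"
```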
@satvantsingh_190085 Can you please update the question with your Cassandra version details so that we can suggest specific tools?
Having said that, Cassandra provides a number of tools to copy/migrate data between clusters, as detailed at Migrating data to DataStax Enterprise. I will mention a couple of them.
- sstableloader - Loads existing SSTables into another cluster, even one with a different number of nodes or replication strategy. You can find details about using this tool at sstableloader.
- DataStax Bulk Loader (dsbulk) - Efficiently and reliably loads small or large amounts of data into a Cassandra cluster.
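For reference, invocations of those two tools might look roughly like this. The host names, data directory, UUID suffix and keyspace/table names are placeholders, and the command strings are only assembled and printed here; check each tool's documentation for the exact flags in your version:

```shell
KS=ks_name
TABLE=table_name
SNAP_DIR="/var/lib/cassandra/data/$KS/$TABLE-<uuid>/snapshots/migration"  # placeholder path

ALT=(
  # sstableloader streams SSTables into a cluster whose topology/RF may differ:
  "sstableloader -d target_node1,target_node2 $SNAP_DIR"
  # DataStax Bulk Loader round-trips the data through CSV files:
  "dsbulk unload -h source_node -k $KS -t $TABLE -url ./export"
  "dsbulk load -h target_node -k $KS -t $TABLE -url ./export"
)
printf '%s\n' "${ALT[@]}"
```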
@satvantsingh_190085, you could also refer to the answer that was posted to a similar question earlier in this thread. Also, if you're using DSE, you could perform this using OpsCenter. Cheers!