azim_91_184236 asked azim_91_184236 commented

How do I migrate data from Cassandra 2.1/3.0 clusters to 3.11 clusters?

I have a migration scenario where the source clusters run C* 2.1.x to 3.0.x and the destination clusters run C* 3.11.6. If I understand the DataStax documentation (https://docs.datastax.com/en/cassandra-oss/3.x/cassandra/operations/opsMoveCluster.html) correctly, we can use sstableloader to stream the data from the source cluster to the destination cluster without worrying about the SSTable version difference, since sstableloader (from C* 3.0.5 or later) supports earlier SSTable versions.

Questions:

1. Is the above understanding correct? I also looked at the article https://community.datastax.com/questions/4477/how-to-migrate-a-subset-of-tables30-tables-out-of.html

2. If yes, do we need to run sstableloader from a different machine, since the nodes of the source cluster may have an sstableloader version older than C* 3.0.5?

I would appreciate any guidance and clarity of the steps for this scenario.

migration

1 Answer

Erick Ramirez answered azim_91_184236 commented

That's correct. You will need to use the Cassandra 3.11 sstableloader to load the C* 2.1/3.0 SSTables into a C* 3.11 cluster.

Because of this, you cannot run sstableloader in place on the source nodes. You would not be able to run it straight off the source cluster anyway, since the SSTables will be in the wrong path for the loader, for example:

data/
  ks_name/
    table_name-6140f420a4a411ea9212efde68e7dd4b/
      snapshots/
        1591083719993/
          [data_files]

The data files will need to be located in the table directory in the path ks_name/table_name/ for it to work.
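To illustrate the path requirement, here is a minimal Python sketch that copies one snapshot into a staging directory laid out as ks_name/table_name/ and builds the loader command. The helper names (table_name_from_dir, stage_snapshot, loader_command), the staging directory, and the host list are assumptions for illustration only; the sketch also assumes the sstableloader on your PATH is the 3.11 one and that -d takes the comma-separated target nodes.

```python
import re
import shutil
from pathlib import Path


def table_name_from_dir(dir_name):
    """Strip the "-<32 hex chars>" table ID suffix, if the directory has one."""
    match = re.fullmatch(r"(.+)-[0-9a-f]{32}", dir_name)
    return match.group(1) if match else dir_name


def stage_snapshot(data_dir, keyspace, table_dir, snapshot, staging_root):
    """Copy a snapshot's files into <staging_root>/<keyspace>/<table_name>/.

    All arguments are hypothetical inputs; adjust them to your own layout.
    """
    src = Path(data_dir) / keyspace / table_dir / "snapshots" / snapshot
    dest = Path(staging_root) / keyspace / table_name_from_dir(table_dir)
    dest.mkdir(parents=True, exist_ok=True)
    for entry in src.iterdir():
        if entry.is_file():
            shutil.copy2(entry, dest / entry.name)
    return dest


def loader_command(staged_dir, target_hosts):
    """Build (but do not run) the sstableloader command for a staged table."""
    return ["sstableloader", "-d", ",".join(target_hosts), str(staged_dir)]
```

With the example layout above, stage_snapshot("data", "ks_name", "table_name-6140f420a4a411ea9212efde68e7dd4b", "1591083719993", "staging") would copy the files into staging/ks_name/table_name/, and the list returned by loader_command could then be passed to subprocess.run on the machine that hosts the staged files.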

We recommend that you run sstableloader on a server that is not part of the source or destination clusters for optimum performance. This means that you will need to copy the data from the source nodes. Cheers!


azim_91_184236 commented

Thanks @Erick Ramirez! I understand the copy requirement, but if I were able to use sstableloader on a source cluster node, we could potentially have copied the data files to the table directory path on that same node.

I understand your recommendation to run sstableloader on a third server (or set of servers), and I have seen the article https://community.datastax.com/questions/5979/sstableloader-1.html

However, my source cluster is on-premises and my target cluster is in the cloud (IaaS), so I need to decide whether to put the third server on-premises or in the cloud.

1. With the third server(s) on-premises, the data copy is easier, and we can let sstableloader stream the data over the network to the cloud.

2. With the third server(s) in the cloud, we will have to copy all the files over the network to the third server and then run sstableloader there to stream the data to the cluster.

Do you have a recommendation between options 1 and 2? Also, the third server needs to have the same cassandra.yaml file as the target server, correct?

Erick Ramirez ♦♦ commented, replying to azim_91_184236

You are better off copying the data to where you need to load it, so that you're not streaming with the loader across the WAN. Cheers!

azim_91_184236 commented, replying to Erick Ramirez ♦♦

@Erick Ramirez, one more clarification: the approach (using sstableloader to migrate to a higher version) should work for a C* 2.0.x source cluster as well, correct?
