
satvantsingh_190085 asked (edited by Erick Ramirez)

How do I migrate tables to a new cluster with a different configuration to the source cluster?

How do I migrate a subset of tables (30 out of 110) from a single keyspace holding terabytes of data from one cluster to another?

The source and target clusters have different configurations (replication factor) and different numbers of nodes.


Erick Ramirez answered (edited)

I've modified your original question and split it into 2 parts:

  • this question specifically relates to cloning data to a non-identical cluster
  • the second part that relates to cloning data to a cluster with identical configuration is in question #4534

Preparation

On the source cluster, take a snapshot of the relevant keyspaces using the nodetool snapshot command. For example:

$ nodetool snapshot <keyspace_name>
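Since only 30 of the 110 tables need to be migrated, you may not want to snapshot the whole keyspace. A minimal sketch, assuming Cassandra 3.x or later where `nodetool snapshot` accepts `-t <tag>` and `-kt <keyspace.table,...>` (confirm with `nodetool help snapshot` on your version). The `snapshot_tables` helper and the `orders` table are hypothetical; setting `DRY_RUN=1` only prints the command.

```shell
#!/bin/sh
# Hypothetical helper: snapshot only selected tables of one keyspace under a
# fixed tag, so every node ends up with the same snapshot directory name.
# Assumes nodetool supports -t/--tag and -kt/--kt-list (Cassandra 3.x+).
snapshot_tables() {
  local tag="$1" keyspace="$2"
  shift 2
  local kt_list="" table
  # Build a comma-separated list such as community.users,community.orders
  for table in "$@"; do
    kt_list="${kt_list:+$kt_list,}$keyspace.$table"
  done
  if [ "${DRY_RUN:-0}" = 1 ]; then
    # Print the command instead of running it
    echo "nodetool snapshot -t $tag -kt $kt_list"
  else
    nodetool snapshot -t "$tag" -kt "$kt_list"
  fi
}
```

For example, `snapshot_tables migration community users orders` would create a snapshot named `migration` instead of a timestamp.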

Here is an example where I take a snapshot of the community keyspace.

STEP B1 - Create a snapshot:

$ nodetool snapshot community
Requested creating snapshot(s) for [community] with snapshot name [1591083719993] and options {skipFlush=false}
Snapshot directory: 1591083719993

The directory name 1591083719993 is a Unix timestamp (in milliseconds) of when the snapshot was created, equivalent to June 2, 2020 7:41am GMT. There is one table called users in my example keyspace, and the snapshot is located in the following directory structure:

data/
  community/
    users-6140f420a4a411ea9212efde68e7dd4b/
      snapshots/
        1591083719993/
          manifest.json
          mc-1-big-CompressionInfo.db
          mc-1-big-Data.db
          mc-1-big-Digest.crc32
          mc-1-big-Filter.db
          mc-1-big-Index.db
          mc-1-big-Statistics.db
          mc-1-big-Summary.db
          mc-1-big-TOC.txt
          schema.cql
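The default snapshot name counts milliseconds since the Unix epoch, so dividing by 1000 gives seconds that `date` can format. A small sketch assuming GNU `date` (the `-d @SECONDS` form); `ms_to_utc` is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: convert a default snapshot name (epoch milliseconds)
# into a UTC date. Assumes GNU date, which accepts -d @<seconds>.
ms_to_utc() {
  local seconds=$(( $1 / 1000 ))   # integer division drops the milliseconds
  date -u -d "@$seconds" '+%Y-%m-%d %H:%M:%S UTC'
}
```

Running `ms_to_utc 1591083719993` prints `2020-06-02 07:41:59 UTC`.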

For more info, see Taking a snapshot.

Tooling

The snapshot needs to be taken on all nodes in the cluster. It is preferable to create the snapshots in parallel, which makes it simpler to identify the snapshot folders.

To achieve this, I recommend using tools you already have in your environment. If you already use an orchestration tool like Ansible, create the snapshots in parallel by running the command on all nodes simultaneously. You can also script the restore operation and execute it in parallel with Ansible.

If you are not using orchestration tools, consider using Cluster SSH (cssh) or Parallel SSH (pssh) so you can run commands simultaneously on all nodes in your cluster.
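Without Ansible, cssh or pssh, a plain `ssh` loop with background jobs can do the same job. A minimal sketch, assuming passwordless SSH to each node and `nodetool` on each node's PATH; the host names and the `parallel_snapshot` helper are hypothetical. Passing the same `-t` tag to every node gives the snapshot folder an identical name everywhere instead of a per-node timestamp.

```shell
#!/bin/sh
# Hypothetical helper: run "nodetool snapshot" on every node at the same time.
# Assumes key-based SSH access and nodetool on the remote PATH.
parallel_snapshot() {
  local tag="$1" keyspace="$2"
  shift 2
  local host pids="" pid rc=0
  for host in "$@"; do
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "ssh $host nodetool snapshot -t $tag $keyspace"
    else
      # Start each snapshot in the background so all nodes run in parallel
      ssh "$host" nodetool snapshot -t "$tag" "$keyspace" &
      pids="$pids $!"
    fi
  done
  # Wait for every background job; return non-zero if any node failed
  for pid in $pids; do
    wait "$pid" || rc=1
  done
  return "$rc"
}
```

For example: `parallel_snapshot migration community node1 node2 node3`.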

Cloning a table

PREPARATION - Create the keyspace and table schema on the destination cluster. If necessary, use the schema.cql file in the snapshots folder as a guide.
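Because the destination cluster uses a different replication factor, create the keyspace there with the destination's own settings rather than copying the source definition verbatim; sstableloader streams each partition to the replicas defined by the destination cluster's ring and keyspace replication. A minimal sketch assuming `NetworkTopologyStrategy`; the data center name `dc1`, the RF of 3 and the `keyspace_cql` helper are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: print a CREATE KEYSPACE statement for the destination
# cluster's own data center name and replication factor.
keyspace_cql() {
  local keyspace="$1" dc="$2" rf="$3"
  printf "CREATE KEYSPACE IF NOT EXISTS %s WITH replication = {'class': 'NetworkTopologyStrategy', '%s': %s};\n" \
    "$keyspace" "$dc" "$rf"
}
```

You could then run `cqlsh -e "$(keyspace_cql community dc1 3)" dest_node_ip1`, followed by `cqlsh -f schema.cql dest_node_ip1` once you have reviewed the table definition in `schema.cql`.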

Once the keyspace and table schema have been created, follow the procedure below to restore the tables.

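STEP 1 - Copy the snapshot to a temporary location so that the SSTable files are in a directory path ending in keyspace_name/table_name, since sstableloader infers the keyspace and table names from the last two components of the path. For example: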

$ mkdir -p /path/to/community/users
$ cp -p data/community/users-6140f420a4a411ea9212efde68e7dd4b/snapshots/1591083719993/* /path/to/community/users/.
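Table directories carry a table ID suffix (`users-6140f420...`), so a script has to strip it when building the `keyspace_name/table_name` path. A minimal sketch that finds the snapshot by keyspace, table and snapshot name and copies it into a staging root; `stage_snapshot` and the paths are hypothetical. As a safety check it stops if it finds no matching snapshot or more than one.

```shell
#!/bin/sh
# Hypothetical helper:
#   stage_snapshot <data_dir> <keyspace> <table> <snapshot_name> <staging_root>
# Copies <data_dir>/<keyspace>/<table>-<id>/snapshots/<snapshot_name>/*
# into <staging_root>/<keyspace>/<table>/ and prints the staged directory.
stage_snapshot() {
  local data_dir="$1" keyspace="$2" table="$3" tag="$4" staging_root="$5"
  local src="" match dest
  # The glob matches the table ID suffix; table names cannot contain '-'
  for match in "$data_dir/$keyspace/$table"-*/snapshots/"$tag"; do
    [ -d "$match" ] || continue
    if [ -n "$src" ]; then
      echo "more than one snapshot directory for $keyspace.$table" >&2
      return 1
    fi
    src="$match"
  done
  if [ -z "$src" ]; then
    echo "no snapshot '$tag' found for $keyspace.$table" >&2
    return 1
  fi
  dest="$staging_root/$keyspace/$table"
  mkdir -p "$dest"
  cp -p "$src"/* "$dest"/
  echo "$dest"
}
```

For example: `stage_snapshot data community users 1591083719993 /path/to` prints `/path/to/community/users`.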

STEP 2 - Load the data files to the destination cluster with the sstableloader utility as follows:

$ sstableloader -d dest_node_ip1,dest_node_ip2 /path/to/community/users/
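The `-d` option takes a single comma-separated list of initial hosts with no spaces. A sketch that builds that list from separate host arguments; `load_table` is a hypothetical helper and `DRY_RUN=1` prints the command instead of running it:

```shell
#!/bin/sh
# Hypothetical helper: stream one staged keyspace/table directory to the
# destination cluster. sstableloader's -d expects "host1,host2" (no spaces).
load_table() {
  local table_dir="$1"
  shift
  local hosts="" host
  for host in "$@"; do
    hosts="${hosts:+$hosts,}$host"
  done
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "sstableloader -d $hosts $table_dir"
  else
    sstableloader -d "$hosts" "$table_dir"
  fi
}
```

For example: `load_table /path/to/community/users dest_node_ip1 dest_node_ip2`.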

STEP 3 - Repeat steps 1 & 2 on the next node in the source cluster until the snapshots on ALL nodes have been loaded to the destination cluster.

Repeat the steps above on each table that you want to clone to the destination cluster.
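With 30 tables, it may be easier to keep the list in a file and loop over it on each source node. A minimal sketch, assuming every table has already been staged under `<staging_root>/<keyspace>/<table>` and that the hosts are passed as one comma-separated string; the file name `tables.txt` and the `load_table_list` helper are hypothetical. It stops at the first table that fails to load so you can fix the problem before continuing.

```shell
#!/bin/sh
# Hypothetical helper: read "keyspace.table" entries (one per line) and run
# sstableloader for each staged directory. Blank lines and lines starting
# with '#' are skipped. Stops at the first failed load.
load_table_list() {
  local list_file="$1" staging_root="$2" hosts="$3"
  local line keyspace table
  while IFS= read -r line || [ -n "$line" ]; do
    case "$line" in
      ''|'#'*) continue ;;
    esac
    keyspace="${line%%.*}"   # text before the first dot
    table="${line#*.}"       # text after the first dot
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "sstableloader -d $hosts $staging_root/$keyspace/$table"
    else
      sstableloader -d "$hosts" "$staging_root/$keyspace/$table" || return 1
    fi
  done < "$list_file"
}
```

For example: `load_table_list tables.txt /path/to dest_node_ip1,dest_node_ip2`.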

For more info, see Cassandra bulk loader. Cheers!


sunilrpawar7_183464 commented:

Hi @Erick Ramirez,

I am new to Cassandra and I have questions based on the scenarios below:

[Follow up questions posted in #7935 and #7936]

dmngaya answered

For case 1 or case 2, you can use sstableloader.

How it works: it reads the SSTable files of each table and streams their data to the nodes of another cluster.

Prerequisites: the cassandra.yaml file used by sstableloader should have essentially the same settings as the cluster you are loading data into.

You can find information about this tool here:

https://cassandra.apache.org/doc/latest/tools/sstable/sstableloader.html

How to do your migration:

1) Take a snapshot of each table on the source cluster

2) Copy the snapshots of each table from the source cluster to the target cluster

3) Don't forget to create the same table schema on the target cluster

4) Use sstableloader to restore them on the target cluster



alex.ott commented:

You don't need to do step 2; just stream the data directly from the source cluster. If necessary, you can provide the target cluster's `cassandra.yaml` with the -f flag of sstableloader.

satvantsingh_190085 commented:

Thanks @dmngaya for your swift response. I have a few further queries:

Just assume I have a 6-node source cluster with RF=3, and the data for all my tables (33 tables) is spread across all 6 nodes (token ranges like 0-10, 11-20, 21-30, 31-40, 41-50, 51-60). I will take a snapshot using nodetool, and it will create a snapshot in each table directory (33*3=99) for all 33 tables. The structure will look like this:

/node1/data/keyspace/table1/snp*,/node1/data/keyspace/table2/snp*,...
/node2/data/keyspace/table1/snp*,/node2/data/keyspace/table2/snp*,...
...
/node6/data/keyspace/table1/snp*,/node6/data/keyspace/table2/snp*,...

How will I know which target node, and which table directory, I need to copy the snapshot of each table to? Do I need to copy all 99 snapshots to each table directory on the target cluster?

Kindly explain!
