


satvantsingh_190085 asked ·

How do I migrate tables to a new cluster with a different configuration to the source cluster?

How do I migrate a subset of tables (30 tables out of 110) from a single keyspace containing terabytes of data from one cluster to another?

The source and target clusters have different configurations (replication factor) and numbers of nodes.

cassandrarestore

dmngaya answered ·

In either case, you can use sstableloader.

How it works: sstableloader takes the SSTable files of each table and streams them to load the data into another cluster.

Prerequisites: the cassandra.yaml used with sstableloader should contain essentially the same information as that of the cluster you are loading data into.

You can find information about this tool here:

https://cassandra.apache.org/doc/latest/tools/sstable/sstableloader.html

How to do your migration:

1) Take a snapshot of each table on the source cluster

2) Move the snapshots for each table from the source to the target cluster

3) Create the same table schema on the target cluster (don't forget this step)

4) Use sstableloader to restore the snapshots on the target cluster
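The staging layout in these steps matters: sstableloader infers the keyspace and table from the last two directory components of the path it is given. A minimal local simulation of the staging step, using a hypothetical keyspace `ks1`, table `t1`, and a dummy SSTable file (no Cassandra required):

```shell
# Recreate a snapshot layout with a dummy SSTable file, then stage it
# under a keyspace/table directory as sstableloader expects.
mkdir -p src/data/ks1/t1-0001/snapshots/snap1 /tmp/load/ks1/t1
touch src/data/ks1/t1-0001/snapshots/snap1/mc-1-big-Data.db
cp -p src/data/ks1/t1-0001/snapshots/snap1/* /tmp/load/ks1/t1/
ls /tmp/load/ks1/t1
```

With real data, the final step would then be `sstableloader -d <target_node_ip> /tmp/load/ks1/t1/`.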



You don't need to do step 2; you can stream the data directly from the source cluster. If necessary, you can provide the `cassandra.yaml` of the target cluster via the -f flag of sstableloader.


Thanks @dmngaya for your swift response. I have a few further queries:

Assume I have a 6-node source cluster with RF=3, and the data for all 33 tables is spread across all 6 nodes (token ranges like 0-10, 11-20, 21-30, 31-40, 41-50, 51-60). I take a snapshot using nodetool, which creates a snapshot in each table's directory (33*3=99) for all 33 tables. The structure looks like this:

/node1/data/keyspace/table1/snp*,/node1/data/keyspace/table2/snp*,...
/node2/data/keyspace/table1/snp*,/node2/data/keyspace/table2/snp*,...
...
/node6/data/keyspace/table1/snp*,/node6/data/keyspace/table2/snp*,...

How will I know which target node and which table directory I have to copy each table's snapshot into? Do I need to copy all 99 snapshots to each table directory on the target cluster?

Kindly explain!

Erick Ramirez answered ·

I've modified your original question and broken it up into 2 parts:

  • this question specifically relates to cloning data to a non-identical cluster
  • the second part that relates to cloning data to a cluster with identical configuration is in question #4534

Preparation

On the source cluster, take a snapshot of the relevant keyspaces using the nodetool snapshot command. For example:

$ nodetool snapshot <keyspace_name>

Here is an example where I take a snapshot of the community keyspace.

STEP B1 - Create a snapshot:

$ nodetool snapshot community
Requested creating snapshot(s) for [community] with snapshot name [1591083719993] and options {skipFlush=false}
Snapshot directory: 1591083719993

The directory name 1591083719993 is a Unix timestamp (in milliseconds) for when the snapshot was created, equivalent to June 2, 2020 7:41am GMT. There is one table called users in my example keyspace and the snapshot is located in the following directory structure:

data/
  community/
    users-6140f420a4a411ea9212efde68e7dd4b/
      snapshots/
        1591083719993/
          manifest.json
          mc-1-big-CompressionInfo.db
          mc-1-big-Data.db
          mc-1-big-Digest.crc32
          mc-1-big-Filter.db
          mc-1-big-Index.db
          mc-1-big-Statistics.db
          mc-1-big-Summary.db
          mc-1-big-TOC.txt
          schema.cql

For more info, see Taking a snapshot.
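As a quick sanity check, the snapshot name can be converted back to a human-readable date. The value is a Unix timestamp in milliseconds, so trim the last three digits before handing it to `date` (GNU coreutils assumed here):

```shell
# 1591083719993 ms -> 1591083719 s; print as a UTC date.
date -u -d @1591083719 +'%Y-%m-%d %H:%M'
# -> 2020-06-02 07:41
```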

Tooling

Taking a snapshot needs to be carried out on all nodes in the cluster. It is preferable to create the snapshots in parallel so the snapshot folders are simpler to identify.

To achieve this, I recommend using tools you already have in your environment. If you are already using orchestration tools like Ansible, create the snapshots in parallel by running the command on all nodes simultaneously. Similarly, you can also script the restore operation so you can execute it in parallel using Ansible.

If you are not using orchestration tools, consider using Cluster SSH (cssh) or Parallel SSH (pssh) so you can run commands simultaneously on all nodes in your cluster.
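If none of those tools are available, even a plain shell loop over ssh will do. A minimal sketch with hypothetical host names; the commands are echoed rather than executed so the loop can be checked locally:

```shell
# Print (instead of running) the per-node snapshot command for each host.
# Replace the echo with a real ssh invocation to run it for real.
for host in node1 node2 node3; do
  echo ssh "$host" nodetool snapshot community
done
```

Note that a sequential loop runs the snapshots one node at a time; the parallel tools above are preferred for large clusters.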

Cloning a table

PREPARATION - Create the keyspace and table schema on the destination cluster. If necessary, use the schema.cql file in the snapshots folder as a guide.

Once the keyspace and table schema has been created, follow the procedure below to restore the tables.

STEP 1 - Copy the snapshot to a temporary location so that the SSTable files sit in a directory whose last two path components are keyspace_name/table_name. For example:

$ cp -p data/community/users-6140f420a4a411ea9212efde68e7dd4b/snapshots/1591083719993/* /path/to/community/users/.
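Because sstableloader derives the keyspace and table from those last two path components, the staging path can be computed from the snapshot path itself. A sketch using the example paths above (the UUID suffix on the table directory is stripped; `/path/to` is a placeholder staging root):

```shell
# Derive the keyspace/table staging path from a snapshot directory path.
snapshot_dir="data/community/users-6140f420a4a411ea9212efde68e7dd4b/snapshots/1591083719993"
table_dir=$(basename "$(dirname "$(dirname "$snapshot_dir")")")             # users-<uuid>
keyspace=$(basename "$(dirname "$(dirname "$(dirname "$snapshot_dir")")")") # community
table=${table_dir%-*}                                                       # drop the "-<uuid>" suffix
echo "/path/to/$keyspace/$table"
# -> /path/to/community/users
```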

STEP 2 - Load the data files to the destination cluster with the utility as follows:

$ sstableloader -d dest_node_ip1,dest_node_ip2 /path/to/community/users/

STEP 3 - Repeat steps 1 & 2 on the next node in the source cluster until the snapshots on ALL nodes have been loaded to the destination cluster.

Repeat the steps above on each table that you want to clone to the destination cluster.

For more info, see Cassandra bulk loader. Cheers!


Hi @Erick Ramirez,

I am a novice with Cassandra and I have questions based on the scenarios below:

https://docs.datastax.com/en/ddac/doc/datastax_enterprise/operations/opsMoveCluster.html

1. I have a 5-node source cluster and I want to migrate it to a new cluster with 3 nodes. Is this scenario possible, or does the 1:1 node ratio (as described in the link above) already rule it out?

2. If the source and destination clusters are identical in size, I would take a backup on each individual node (nodetool snapshot) and then use sstableloader to migrate that data to the new nodes of the destination cluster. Does this have to be done node-for-node, from each source node to its corresponding destination node?

3. How do the tokens based on the partition key get distributed among the destination cluster nodes in scenarios 1 and 2?
