I need to migrate a keyspace (the employees keyspace, which contains only one table) in production from one existing source cluster to another existing target cluster. Here are the details:
Source Cluster -- Number of nodes 6 (1 DC, Replication Factor - 3) -- has a total of 7 keyspaces (including employees keyspace)
Target Cluster -- Number of nodes 6 (1 DC, Replication Factor - 3) -- has a total of 4 keyspaces (not including employees keyspace)
Since the source and target clusters already host other keyspaces, we are not allowed to change the cassandra.yaml configuration on either cluster. We may only copy/migrate the entire employees keyspace (one table) from source to target without modifying the existing cluster configuration. The clusters are also not identical (they have different token range assignments), so I followed a migration article's procedure to move the data for the one table:
1. Created a snapshot on node1 of the source cluster and copied the snapshot files into the /path/to/<keyspace>/<table> directory structure on node1 of the target.
2. Ran sstableloader on node1 of the target cluster.
3. Copied the snapshot from node2 of the source cluster into the same directory on node1 of the target.
4. Repeated this for the snapshots from every source node, always running sstableloader on node1 of the target only.
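Per source node, the steps above look roughly like this (the snapshot tag emp_mig, the staging path /tmp/load, and the <...> placeholders are illustrative, not my actual values):

```shell
# On each source node: snapshot only the employees keyspace
nodetool snapshot -t emp_mig employees

# Snapshot files land under the table's data directory, e.g.:
#   /var/lib/cassandra/data/employees/<table>-<uuid>/snapshots/emp_mig/

# Copy the snapshot SSTables to a staging directory on target node1
# that mirrors the <keyspace>/<table> layout sstableloader expects:
rsync -av /var/lib/cassandra/data/employees/<table>-<uuid>/snapshots/emp_mig/ \
      target-node1:/tmp/load/employees/<table>/

# On target node1: stream the staged SSTables into the target cluster;
# sstableloader re-splits the data according to the target's token ranges
sstableloader -d <target_node1_ip> /tmp/load/employees/<table>/
```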
The problem is that on the source, the table size on each node is approximately 33-35 GB, but on the target the table has grown to 60-85 GB on each node.
1. Why has the table size increased so much on every node of the target?
2. Do I need to run nodetool repair or a compaction to bring it back to the 33-35 GB range seen on the source?
3. How do I validate that all the data was copied correctly and that there are no discrepancies between the source and target tables?
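For question 3, one lightweight check I was considering is exporting the table from each cluster with cqlsh COPY TO and comparing order-independent digests of the two CSV files (a sketch; the file names and COPY commands in the comments are illustrative, and this assumes the table is small enough to export):

```python
import csv
import hashlib

def table_digest(csv_path):
    """Hash the sorted rows of a CSV export (e.g. from cqlsh COPY TO),
    so row order and node layout don't affect the result."""
    with open(csv_path, newline="") as f:
        rows = sorted(tuple(r) for r in csv.reader(f))
    h = hashlib.sha256()
    for row in rows:
        # Unit/record separators keep field boundaries unambiguous
        h.update("\x1f".join(row).encode("utf-8"))
        h.update(b"\x1e")
    return h.hexdigest()

# Example usage, after exporting from each cluster:
#   cqlsh <source_host> -e "COPY employees.<table> TO 'source.csv'"
#   cqlsh <target_host> -e "COPY employees.<table> TO 'target.csv'"
# then: table_digest('source.csv') == table_digest('target.csv')
```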
Appreciate your inputs.