virajut asked · Erick Ramirez commented

What is the best way to move a cluster to new hardware?

I have a Cassandra cluster running 4 nodes, one of which is a seed node. The database is loaded with ~10 GB of data. Now I am moving the disks over to new hardware and trying to run the cluster there. I am getting a couple of errors like:

  • Received an invalid gossip generation for peer
  • Nodes are not reachable to one another even if they're part of same cluster

I ran `nodetool drain` on each node before killing the Cassandra process on it. Then I stopped all the nodes of the cluster, with the seed node being last.

Things tried:

  • Removed a node from the cluster and tried to add it back
  • Removed a file from the commit log, as one of the errors was related to the commit log
  • Tried to update the gossip generation time, but with no effect

What would be the best approach to stop the whole cluster and start it again on a new set of hardware? (I am running Cassandra built from source, version 4.0.1)


steve.lacerda ♦ commented:

By new h/w I'm assuming you mean you're moving the disks over to new servers? If so, are the IPs changing? How about hostnames? If so, then you may be having issues because the system.local and system.peers tables are probably incorrect. You can verify by doing a `SELECT *` on system.local and the same on the peers table.
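To run that check (assuming `cqlsh` is available on the node; these are the standard system tables in Cassandra 4.x):

```sql
-- Addresses Cassandra has recorded for this node
SELECT key, broadcast_address, listen_address FROM system.local;

-- Addresses recorded for the other nodes in the cluster
SELECT peer, rpc_address FROM system.peers;
```

If any of these don't match the addresses the servers actually have now, that mismatch is a likely source of the gossip errors.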

virajut replied to steve.lacerda ♦:
No, the IPs and hostnames stay the same; it's managed on an internal network.
Erick Ramirez answered:

Replacing hardware is pretty straightforward in Cassandra provided the data directories are intact. Cassandra is capable of handling a new IP if necessary and will automatically update all nodes in the cluster as long as the original data directories exist.

It is recommended to start with the seed nodes as always so the rest of the nodes have a seed to gossip with on startup. The high level procedure is:

  1. On the first seed node, force all memtables to be flushed to disk with `nodetool drain` so there are no commit logs to replay on startup.
  2. Shut down Cassandra.
  3. Prevent the node from accidentally re-joining the cluster by setting the following in `cassandra.yaml`:
    • `cluster_name: REPLACED`
    • set the seeds list to `""`
  4. Unmount the data disk, then mount it on the new server.
  5. Start Cassandra.
  6. Monitor the startup sequence with `tail -f system.log`.
  7. When the node has started successfully, confirm that it is listening for client connections on port 9042 with `netstat -tnlp`.
  8. Confirm the node is reporting as Up/Normal (UN) by running `nodetool status` on the other nodes.
  9. Repeat steps 1 to 8 until all seed nodes are done.
  10. Repeat steps 1 to 8 until all other nodes are done.

This approach doesn't require downtime as long as the nodes are replaced one at a time. Cheers!
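For reference, steps 1 to 8 on a single node might look like this as shell commands (a sketch only; the process-kill command, start command, and log path assume a source install run from the install directory, so adjust for your setup):

```shell
nodetool drain                      # 1. flush memtables; no commit log replay
pkill -f CassandraDaemon            # 2. stop the Cassandra JVM (source install)

# 3. in cassandra.yaml on the old server:
#       cluster_name: REPLACED
#       seeds: ""
# 4. unmount the data disk, mount it on the new server

bin/cassandra                       # 5. start Cassandra on the new server
tail -f logs/system.log             # 6. watch the startup sequence
netstat -tnlp | grep 9042           # 7. confirm it is listening for clients
nodetool status                     # 8. run on another node; look for UN
```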


virajut commented:

Wow! This is the best and most standard approach of all. I was kind of hoping for something like this.

Thanks a lot for sharing this, @Erick Ramirez.

Erick Ramirez ♦♦ replied to virajut:

Happy to help. Cheers!

starlord answered:

Instead of manually updating gossip, try a rolling restart of any nodes that are currently online; that may work better. And like Steve said, check for hostname/IP changes.
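A rolling restart here just means restarting one node at a time and waiting for it to come back Up/Normal before moving on; a sketch, assuming a systemd-managed service and hypothetical hostnames:

```shell
for host in node1 node2 node3 node4; do       # hypothetical hostnames
  ssh "$host" 'nodetool drain && sudo systemctl restart cassandra'
  # wait here until the node shows UN in `nodetool status` before continuing
done
```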


virajut commented:

None of the nodes are online at the moment. I've shut down all nodes and am now trying to get back to a healthy cluster state.

starlord ♦ replied to virajut:

The next time you do this, I'd swap the hardware one node at a time, setting the following in jvm-server.options as you start each node on the new hardware:
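The option itself was omitted above; for a node-at-a-time hardware swap this is presumably Cassandra's replace-address property (an assumption on my part, and only needed if the node comes up with a different IP — with identical IPs and intact data directories it shouldn't be required):

```
# jvm-server.options — assumed option; substitute the address being replaced
-Dcassandra.replace_address_first_boot=<ip_of_node_being_replaced>
```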


For now, you could bootstrap your seed node to the cluster, but save your data from any user-created keyspaces/tables first.

After the node is freshly bootstrapped to the cluster without any data, create your schema again.

Once the directory structure exists for the user-created tables, copy the following into the appropriate data directories:

  • the SSTables you saved from the original seed node
  • the SSTables from all the user-created data directories on the other three nodes that are down

Now this one node has all of the SSTables for the user-created data, and assuming your RF was 3, it will have some redundant data as well.

Run `nodetool refresh <keyspace> <table>` for every user-created table to populate the node with the data.
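Scripted, that per-table refresh might look like the following (keyspace and table names are placeholders):

```shell
KS=my_keyspace                       # hypothetical keyspace name
for tbl in users orders events; do   # hypothetical table names
  nodetool refresh "$KS" "$tbl"      # pick up the copied SSTables
done
```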

starlord ♦ commented:

After the seed node is online with the data, ensure the replication factor is 3 for the keyspace, then clear the data directories of the remaining 3 nodes and bootstrap them into the cluster fresh.

After all nodes have joined the cluster, run `nodetool cleanup` on the seed node that had the redundant data, and hopefully that gets you back to a functional state.
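Checking and setting the replication factor can be done in cqlsh before the other nodes rejoin (keyspace and datacenter names below are placeholders):

```sql
-- Confirm the current replication settings
SELECT keyspace_name, replication FROM system_schema.keyspaces
 WHERE keyspace_name = 'my_keyspace';

-- Ensure RF=3 (hypothetical keyspace/datacenter names)
ALTER KEYSPACE my_keyspace
  WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3};
```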
