penky28_147901 asked pihu_0206 edited

What are the steps for reducing vnodes in a production cluster?

Hi All,

We are currently running Apache Cassandra 3.11.3 and are planning to upgrade to 4.0.

I see that Cassandra 4.0 recommends using fewer vnodes. Can anyone please outline the high-level steps to reduce the number of virtual nodes on a production cluster?

On 3.11.3, num_tokens = 256.

We propose to use num_tokens = 8 on 4.0.

One way I can think of is to create a new datacenter with the new vnode setting and migrate the data to it, then decommission the existing datacenter once the migration is complete. This requires extra hardware, which means additional cost.

Can you suggest any other approach? Downtime is acceptable.

The cluster has 5 nodes with ~200 GB of data per node. Only 1 datacenter is in use.


virtual nodes

1 Answer

Erick Ramirez answered pihu_0206 edited

It isn't possible to change the number of virtual nodes (num_tokens) once a node is part of a cluster. You can only set it for new nodes.

There are 2 possible options to implement this:

  1. Add a new DC with the new vnode configuration, then decommission the old DC.
  2. Add new nodes with 8 tokens to the existing DC and decommission the old nodes.

For option 1, the procedure is the same as switching from single-token nodes described in Enabling virtual nodes on an existing cluster.

For option 2, you could implement it whichever way suits you. For example:

  1. Install/configure C* on new server with num_tokens: 8.
  2. Add node to the DC.
  3. Decommission one of the existing nodes in the DC.
  4. Completely wipe contents of data/, commitlog/, saved_caches/ subdirectories.
  5. Reconfigure node with num_tokens: 8.
  6. Add node back to the DC.
  7. Repeat steps 3 to 6 above until all nodes have been reconfigured.
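
As a rough aid for planning, the steps above can be written out as an ordered checklist. This is only a sketch: the function name `rolling_vnode_plan` and the node names are made up for illustration, and it prints steps rather than running anything against a cluster.

```python
from typing import List


def rolling_vnode_plan(new_node: str, old_nodes: List[str], num_tokens: int = 8) -> List[str]:
    """Return the option 2 procedure as an ordered list of human-readable steps.

    Hypothetical helper: it only builds strings and does not touch any node.
    """
    # Steps 1-2: bring in one new server that already uses the new token count.
    steps = [
        f"{new_node}: install Cassandra with num_tokens: {num_tokens}",
        f"{new_node}: start and let it join the DC",
    ]
    # Steps 3-6, repeated for each existing node, one node at a time.
    for node in old_nodes:
        steps += [
            f"{node}: run 'nodetool decommission' and wait for it to finish",
            f"{node}: wipe data/, commitlog/, saved_caches/",
            f"{node}: set num_tokens: {num_tokens} in cassandra.yaml",
            f"{node}: start and let it rejoin the DC",
        ]
    return steps


if __name__ == "__main__":
    for i, step in enumerate(rolling_vnode_plan("new1", ["old1", "old2"]), start=1):
        print(f"{i}. {step}")
```

Working through one node at a time keeps the rest of the cluster serving requests while each node streams its data away and back.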

WARNING: Make sure when adding a seed node back into the cluster that it doesn't have its own IP in its seeds list or it will not bootstrap any data when it joins the cluster. Cheers!
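
A quick pre-flight check for the seed warning above could look something like this. It is a minimal sketch that assumes the seeds value is a comma-separated string of plain IPv4 addresses without ports, as it typically appears in cassandra.yaml; the function name `seeds_include_self` is hypothetical.

```python
def seeds_include_self(seeds: str, own_address: str) -> bool:
    """Return True if own_address appears in the comma-separated seeds string.

    Assumption: entries are plain addresses (no ports), compared as exact strings.
    """
    entries = [entry.strip() for entry in seeds.split(",") if entry.strip()]
    return own_address.strip() in entries


if __name__ == "__main__":
    # Example values only; substitute the node's real seeds list and address.
    if seeds_include_self("10.0.0.1, 10.0.0.2", "10.0.0.2"):
        print("Remove this node's own IP from its seeds list before starting it.")
```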


penky28_147901 commented ·
Thanks, Erick.

I was wondering if we could follow the steps below:

1. Run nodetool snapshot on each node.

2. Copy the snapshots to a different drive (/backup) on each node.

3. Remove the data and start Cassandra with 8 vnodes (num_tokens: 8).

4. Copy the data from the /backup drive back to each node.

5. Start Cassandra.

Let me know if the above approach works.


Erick Ramirez ♦♦ penky28_147901 commented ·
Nope, or I would've given it as an option. :)

The data in the SSTables is only valid for the tokens that the node owns at the time. Once you change the token assignments, the data in those SSTables becomes unreadable, resulting in data loss. Cheers!
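
To illustrate the point about token ownership, here is a toy model of a token ring. It is a simplified sketch only: the tokens are small made-up integers rather than real partitioner tokens, each node has a single token, and replication is ignored. The function name `owner` is hypothetical.

```python
from bisect import bisect_left
from typing import Dict


def owner(token_map: Dict[int, str], key_token: int) -> str:
    """Return the node owning key_token in a toy ring.

    A key belongs to the node holding the first ring token >= key_token,
    wrapping around to the smallest token at the end of the ring.
    """
    ring = sorted(token_map)
    idx = bisect_left(ring, key_token) % len(ring)
    return token_map[ring[idx]]


if __name__ == "__main__":
    before = {100: "A", 200: "B", 300: "C"}
    # Node B is wiped and restarted, and ends up with a different token.
    after = {100: "A", 50: "B", 300: "C"}
    print(owner(before, 150), owner(after, 150))
```

In this toy ring, a key with token 150 belongs to B before the change and to C afterwards, so an SSTable copied back onto B would hold data for a range that B no longer owns.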

pihu_0206 Erick Ramirez ♦♦ commented ·

Hi Erick, I followed the same steps you suggested for option 2, but I can see that the data did not get balanced. (Attachment: screenshot-2021-10-08-at-23524-am.png)

Just for your information, I did not set bootstrap = true. I have RF = 2.

penky28_147901 commented ·

Thanks, Erick.
