arnulf.hanauer_193730 asked smadhavan answered

Is rebalancing required on a vnode based cluster post node removal?

I started with a perfectly token-range-balanced Cassandra 3.11.8 cluster of 6 nodes (vnodes=4, RF=3) but needed to remove one node from one of the DCs. After a successful "nodetool decommission", the cluster has 5 nodes as expected. However, when I look at the token ranges each node is responsible for, one of the remaining 5 nodes now has double the range for each vnode (almost as if a single node took over the entire range of the decommissioned node).

nodetool status (post decom)

nodetool ring (post decom)

Each node is a physical server and is defined as its own rack. In the "nodetool ring" output, the ranges for rack#20 are double those of the other nodes. Node#20 is currently suffering from many "dropped messages" and appears overloaded, I suspect because of the larger range it has to serve.
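The doubling described above can be reproduced with a small sketch. It assumes the Murmur3Partitioner token space and that the original tokens were evenly spaced and assigned to the nodes in a fixed round-robin order; the real layout is only visible in the attached screenshots, so the node names (`n1` to `n6`) and the ordering are hypothetical. It also looks only at primary ranges and ignores replication, racks and multiple DCs.

```python
RING_SIZE = 2**64      # Murmur3Partitioner token space: -2**63 .. 2**63 - 1
MIN_TOKEN = -2**63


def even_ring(nodes, num_tokens):
    """Evenly spaced tokens handed out round-robin: n1, n2, ..., n6, n1, ..."""
    total = len(nodes) * num_tokens
    step = RING_SIZE // total
    return {MIN_TOKEN + i * step: nodes[i % len(nodes)] for i in range(total)}


def primary_ownership(ring):
    """Fraction of the ring each node owns as primary replica.

    A token owns the range (previous token, token], wrapping around the ring.
    """
    tokens = sorted(ring)
    owned = {}
    for idx, tok in enumerate(tokens):
        prev = tokens[idx - 1]                 # idx 0 wraps to the last token
        width = (tok - prev) % RING_SIZE
        owned[ring[tok]] = owned.get(ring[tok], 0) + width
    return {node: width / RING_SIZE for node, width in owned.items()}


def decommission(ring, node):
    """Drop a node's tokens; each freed range merges into the next token's range."""
    return {tok: owner for tok, owner in ring.items() if owner != node}


if __name__ == "__main__":
    nodes = ["n1", "n2", "n3", "n4", "n5", "n6"]   # hypothetical names
    ring = even_ring(nodes, num_tokens=4)
    for node, share in sorted(primary_ownership(decommission(ring, "n3")).items()):
        print(f"{node}: {share:.3f}")
```

Under these assumptions every token of the removed node is followed on the ring by a token of the same neighbour, so that single neighbour absorbs all of the freed ranges while the other nodes keep their original share.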

The documentation says that rebalancing is not required after node removal in a vnode-based cluster, but is a must for single-token nodes.

So the question is: Do I need to rebalance my vnode based cluster to get the load percentages and token range distribution back to balanced numbers?

If the answer is yes, do I use "nodetool decommission" and then re-add the node (with a new initial token calculated and configured)? I also have "allocate_tokens_for_keyspace" correctly configured, so would that negate the need for the "initial_token" setting?

virtual nodes
nodetool-status.png (23.9 KiB)
nodetool-ring.png (77.5 KiB)

Erick Ramirez answered arnulf.hanauer_193730 commented

No, there is no option to rebalance tokens when nodes are configured with vnodes.

As you already stated, one of the main reasons for using virtual nodes is that rebalancing is handled automatically when you add or remove nodes.

In your case, the nodes are placed in 5 different racks but your keyspaces are configured with a replication factor of 3. In this situation, the unbalanced distribution is expected.

In most cases, a single-rack configuration is sufficient. If you really want to distribute nodes across different racks, the number of racks should equal the number of replicas. That is difficult to achieve in your situation, since the number of nodes N in the DC must then be divisible by 3 (the replication factor).

With 3 racks, N must be a multiple of 3 (3, 6, 9, 12, etc.) for an even distribution of data. Otherwise, default to a single-rack configuration. Cheers!
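To illustrate the multiple-of-3 point, here is a rough sketch. It assumes the number of racks equals the replication factor, in which case each rack ends up holding one full copy of the data, and it assumes tokens are balanced within each rack. The function names and the round-robin rack placement are hypothetical and only serve the arithmetic.

```python
def spread_over_racks(node_count, rack_count=3):
    """Place nodes on racks round-robin and return the node count per rack."""
    counts = {f"rack{r + 1}": 0 for r in range(rack_count)}
    for i in range(node_count):
        counts[f"rack{i % rack_count + 1}"] += 1
    return counts


def per_node_share(nodes_per_rack):
    """Share of the full data set held by each node in a rack.

    Assumes racks == RF, so every rack stores one complete replica that is
    split evenly among the nodes of that rack.
    """
    return {rack: 1.0 / count for rack, count in nodes_per_rack.items()}


if __name__ == "__main__":
    for n in (5, 6):
        print(n, "nodes:", per_node_share(spread_over_racks(n)))
```

With 6 nodes every node carries the same share, while with 5 nodes the rack that has only one node carries a full copy on its own.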


arnulf.hanauer_193730 commented

Thanks Erick,

I believe you are referring to the load distribution, which, as you say, would not be equal because the node count is not a multiple of 3 (the RF). I accept that.

There seems to be a secondary problem that can possibly be rebalanced, and that is the token range per vnode. Initially I had a perfect distribution of range across all vnodes; now I am left with a lopsided range distribution for one node (4 vnodes). I believe that if I decommissioned the entire DC and reallocated it with a perfect token range distribution (using initial_token calculations), I could create a better solution, as each vnode would be responsible for exactly the same number of tokens (again, I'm not worried about the load distribution).
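For reference, the kind of initial_token calculation mentioned above could look like the following sketch. It assumes the Murmur3Partitioner (token range -2**63 to 2**63 - 1), 5 nodes with 4 tokens each, and a round-robin assignment of the evenly spaced tokens; the function name is hypothetical. Each resulting string is a comma-separated list of the form that `initial_token` in cassandra.yaml accepts when `num_tokens` is greater than 1.

```python
def initial_tokens(node_count, num_tokens):
    """Return one comma-separated initial_token string per node.

    Tokens are spaced evenly over the Murmur3 range and dealt out
    round-robin, so every vnode gets a range of (almost) the same width.
    """
    total = node_count * num_tokens
    step = 2**64 // total
    per_node = [[] for _ in range(node_count)]
    for i in range(total):
        per_node[i % node_count].append(-2**63 + i * step)
    return [",".join(str(t) for t in tokens) for tokens in per_node]


if __name__ == "__main__":
    for idx, value in enumerate(initial_tokens(node_count=5, num_tokens=4), start=1):
        print(f"node {idx}: initial_token: {value}")
```

Note that a round-robin layout like this is evenly balanced only while all nodes are present; removing one node later merges each of its ranges into a neighbour's range again.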

I can clearly see that the one node#20 that took the ranges of the decommissioned node in my case is suffering badly with performance since that move.

Any further comments would be appreciated.

smadhavan answered

@arnulf.hanauer_193730, the rebalancing questions are already being handled in the other thread, but I just wanted to remind you that after removing one or more nodes, you might want to run nodetool cleanup on all of the remaining nodes in the cluster. This cleans up the partition keys of keyspaces and tables that no longer belong to a node once the token ranges have been adjusted for the removal.
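A minimal sketch of running the cleanup suggested above on each node in turn, assuming nodetool is on the PATH and can reach every node over JMX via `-h`. The host addresses are hypothetical placeholders, and running one node at a time is a cautious choice made for this sketch rather than a requirement.

```python
import subprocess


def cleanup_commands(hosts, keyspace=None):
    """Build one 'nodetool cleanup' command line per host."""
    commands = []
    for host in hosts:
        cmd = ["nodetool", "-h", host, "cleanup"]
        if keyspace:
            cmd.append(keyspace)       # optional: restrict to one keyspace
        commands.append(cmd)
    return commands


def run_cleanup(hosts, keyspace=None):
    """Run cleanup sequentially; stop at the first failing node."""
    for cmd in cleanup_commands(hosts, keyspace):
        print("running:", " ".join(cmd))
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical addresses; replace with the remaining nodes of the cluster.
    run_cleanup(["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4", "10.0.0.5"])
```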
