rashokkumartce_193569 asked Erick Ramirez answered

How is data split across nodes in a DC when a new node joins the cluster?

When a new node joins the cluster:

1. Will the token ranges of each node change? For example, if node A is responsible for the partition token range 1-10, will it be responsible for only 1-8 after the new node joins?

2. If so, won't that be an expensive operation, since data would be shifted between all the nodes? Will there be downtime as a result?



1 Answer

Erick Ramirez answered

The token assigned to a node determines the token range (and its underlying data) the node owns. It also determines its position around the ring (the data centre).
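To make the ownership rule concrete, here is a minimal sketch under simplifying assumptions: one token per node (no vnodes), no replication, and small integer tokens instead of the 64-bit range the Murmur3Partitioner actually uses. The node names and token values are hypothetical. Each node owns the range from the previous node's token (exclusive) up to its own token (inclusive), and the ring wraps around past the highest token.

```python
import bisect


def owner(ring, token):
    """Return the node that owns `token` on a ring given as {node_name: node_token}.

    Simplified model: a node owns (previous_token, own_token]; tokens above the
    highest node token wrap around to the node with the lowest token.
    """
    # Sort nodes by token to get their positions around the ring.
    nodes = sorted(ring.items(), key=lambda item: item[1])
    tokens = [t for _, t in nodes]
    # The first node whose token is >= the partition token owns it.
    i = bisect.bisect_left(tokens, token)
    if i == len(tokens):
        i = 0  # wrap around past the highest token
    return nodes[i][0]


# Hypothetical 3-node ring with made-up tokens.
ring = {"orange": 0, "green": 30, "yellow": 90}
print(owner(ring, 25))  # green
print(owner(ring, 50))  # yellow
print(owner(ring, 95))  # orange (wraps around the ring)
```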

Consider this example 3-node cluster where the yellow node at the top owns the token range shown as yellow:

When a new light-blue node is added whose token bisects the yellow token range, the light-blue node takes ownership of half the data that the yellow node owned (section of the ring shown in light-blue):
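The bisection can be sketched by listing each node's range before and after the new node joins. This assumes the same simplified model (one token per node, no vnodes, no replication, small integer tokens); the node names and tokens are hypothetical. Only the node whose range was split changes, and the new node takes the lower half of it.

```python
def ranges(ring):
    """Map each node to the (start, end] token range it owns, assuming one token per node."""
    nodes = sorted(ring.items(), key=lambda item: item[1])
    result = {}
    for i, (name, token) in enumerate(nodes):
        # For i == 0, nodes[-1] is the highest-token node: the ring wraps around.
        prev_token = nodes[i - 1][1]
        result[name] = (prev_token, token)
    return result


# Hypothetical ring; "light_blue" gets a token halfway through yellow's range (30, 90].
before = {"orange": 0, "green": 30, "yellow": 90}
after = {**before, "light_blue": 60}

r_before = ranges(before)
r_after = ranges(after)
for name in sorted(r_after):
    if r_before.get(name) != r_after[name]:
        print(name, r_before.get(name), "->", r_after[name])
# light_blue None -> (30, 60)
# yellow (30, 90) -> (60, 90)
```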

To answer your questions explicitly:

  1. Yes, the token you assigned to the new node will change the token range of the adjacent node in the ring.
  2. Yes, portions of the data that another node used to own will be streamed to the new node when it bootstraps. But no, there is no downtime. Cassandra is an always-on database. There aren't any operations in C* that require downtime, not even upgrades.
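To illustrate why adding a node does not reshuffle data across every node, the sketch below walks a hypothetical token space (0 to 99) and counts which tokens change owner when the new node joins. It assumes one token per node, no replication, and made-up node names and tokens. In this model every moved token goes from the one node whose range was split to the new node. With vnodes (`num_tokens` greater than 1) the new node instead takes many small ranges from several existing nodes, and with replication more replicas are involved, so treat this only as an illustration of the principle.

```python
import bisect


def owner(ring, token):
    """Return the node owning `token`; a node owns (previous_token, own_token]."""
    nodes = sorted(ring.items(), key=lambda item: item[1])
    tokens = [t for _, t in nodes]
    i = bisect.bisect_left(tokens, token)
    return nodes[i % len(nodes)][0]  # i == len(nodes) wraps to the first node


before = {"orange": 0, "green": 30, "yellow": 90}  # hypothetical ring
after = {**before, "light_blue": 60}  # new node bisecting yellow's range

moves = {}
for token in range(100):  # hypothetical token space 0..99
    src, dst = owner(before, token), owner(after, token)
    if src != dst:
        moves[(src, dst)] = moves.get((src, dst), 0) + 1

print(moves)  # {('yellow', 'light_blue'): 30}
```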

There are additional details in the "How data is distributed across a cluster" documentation.

If you haven't done it already, I recommend the DS201 Cassandra Foundations course at DataStax Academy, which explains these concepts in detail. Cheers!

