
bhupalreddy1992_162660 asked ·

Does each datacenter have its own token range?

In a multi-datacenter Cassandra cluster, does each datacenter have its own token range?

Or do all nodes across the different datacenters share the same token range?

cassandra, replication, partitioner, token ranges

Erick Ramirez answered ·

A Cassandra ring

Generally speaking, each data centre covers the full range of tokens, which wraps around on itself to form a ring. In the case of the default and recommended Murmur3Partitioner, the possible [hash] token values range from -2^63 to 2^63 - 1. For a cluster with 2 DCs, there are 2 rings, each with the full token range.
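To make the "wraps around into a ring" idea concrete, here is a minimal Python sketch. It assumes single-token nodes and the usual convention that a node owns the range from the previous node's token (exclusive) up to its own token (inclusive). The node names and token values are hypothetical, and `owner_of` is an illustrative helper, not a Cassandra API.

```python
from bisect import bisect_left

# Murmur3Partitioner token bounds, as quoted above.
MIN_TOKEN = -2**63
MAX_TOKEN = 2**63 - 1

def owner_of(token, ring):
    """Return the node that owns `token`.

    `ring` maps each node's token to a node name. A node owns the range
    (previous_token, its_token]; anything past the highest token wraps
    around to the node with the lowest token.
    """
    tokens = sorted(ring)
    idx = bisect_left(tokens, token)
    return ring[tokens[idx % len(tokens)]]

# Hypothetical three-node ring with evenly spaced tokens.
ring = {
    MIN_TOKEN: "node1",
    MIN_TOKEN + 2**64 // 3: "node2",
    MIN_TOKEN + 2 * (2**64 // 3): "node3",
}

print(owner_of(0, ring))          # node3: 0 lies between node2's and node3's tokens
print(owner_of(MAX_TOKEN, ring))  # node1: past the highest token, so it wraps
```

The modulo on the index is what turns the sorted list of tokens into a ring.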

However, this is a simplistic view and the complete answer to your question is a lot more nuanced.

Data distribution

The more complete answer is -- it depends on the replication strategy defined for a keyspace.

LOCAL STRATEGY

In the case of these system keyspaces, all data is stored on each node locally regardless of the token range(s) owned by the nodes:

CREATE KEYSPACE system_schema WITH replication = {'class': 'LocalStrategy'}  AND durable_writes = true;
CREATE KEYSPACE system WITH replication = {'class': 'LocalStrategy'}  AND durable_writes = true;

Data in these keyspaces is not partitioned, so token range ownership does not apply.

SIMPLE STRATEGY

This replication strategy does not take the topology of the cluster into account, so the location of the nodes doesn't matter. In a multi-DC cluster, all nodes are "placed" into one giant ring, so the cluster behaves like a single-DC cluster.

Effectively, keyspaces configured with this strategy combine their token ownership to form one full token range (one ring).
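As a rough illustration of the "one giant ring" behaviour, the sketch below picks replicas by walking clockwise from the token's owner and never consults the DC. It is a simplified model, not Cassandra's implementation; the node names, DC names and small integer tokens are made up for readability.

```python
from bisect import bisect_left

def simple_strategy_replicas(token, ring, rf):
    """Pick `rf` replicas by walking the ring clockwise from the token's owner.

    `ring` maps token -> (node, dc). The dc is carried along only to show
    that this strategy never looks at it.
    """
    tokens = sorted(ring)
    start = bisect_left(tokens, token) % len(tokens)
    replicas = []
    for step in range(len(tokens)):
        node, _dc = ring[tokens[(start + step) % len(tokens)]]
        if node not in replicas:
            replicas.append(node)
        if len(replicas) == rf:
            break
    return replicas

# Hypothetical two-DC cluster: a1/a2 are in dc1, b1/b2 are in dc2.
ring = {
    0: ("a1", "dc1"), 50: ("a2", "dc1"),
    100: ("b1", "dc2"), 150: ("b2", "dc2"),
}

print(simple_strategy_replicas(10, ring, rf=2))  # ['a2', 'b1']
print(simple_strategy_replicas(0, ring, rf=2))   # ['a1', 'a2'], both in dc1
```

The second call shows the consequence: nothing guarantees that each DC ends up with a replica.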

NETWORK TOPOLOGY STRATEGY

By definition, this strategy takes into account the nodes' network topology and places each node into the relevant ring based on its configured DC.

Keyspaces have a full copy of data (full token range) in each DC they are replicated to.
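Here is a simplified Python sketch of DC-aware placement. It assumes single-token nodes and ignores racks; the node names, DC names and tokens are hypothetical, and `nts_replicas` is an illustrative helper rather than anything from Cassandra itself.

```python
from bisect import bisect_left

def nts_replicas(token, ring, rf_per_dc):
    """Walk the ring clockwise, taking nodes until each DC has its quota.

    `ring` maps token -> (node, dc); `rf_per_dc` maps dc -> replication
    factor. Racks are ignored in this simplified sketch.
    """
    tokens = sorted(ring)
    start = bisect_left(tokens, token) % len(tokens)
    chosen = {dc: [] for dc in rf_per_dc}
    for step in range(len(tokens)):
        node, dc = ring[tokens[(start + step) % len(tokens)]]
        # Skip nodes in DCs the keyspace is not replicated to, or DCs
        # that already have enough replicas.
        if dc in chosen and len(chosen[dc]) < rf_per_dc[dc] and node not in chosen[dc]:
            chosen[dc].append(node)
    return chosen

# Hypothetical two-DC cluster: a1/a2 are in dc1, b1/b2 are in dc2.
ring = {
    0: ("a1", "dc1"), 50: ("a2", "dc1"),
    100: ("b1", "dc2"), 150: ("b2", "dc2"),
}

print(nts_replicas(0, ring, {"dc1": 1, "dc2": 1}))  # {'dc1': ['a1'], 'dc2': ['b1']}
print(nts_replicas(0, ring, {"dc2": 2}))            # {'dc2': ['b1', 'b2']}
```

In the first call every token gets one replica in each DC, which is what "a full copy in each DC" means. In the second call the keyspace is replicated only to dc2, so the walk passes over the dc1 nodes even though the token falls in a range owned by a1.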

To answer your questions:

  • Each Cassandra data centre has its own copy of the full token range for keyspaces configured with NetworkTopologyStrategy.
  • Nodes across all DCs belong to one "shared" full token range for keyspaces configured with SimpleStrategy.
  • Token ranges don't apply for keyspaces configured with LocalStrategy.

Cheers!


bettina.swynnerton answered ·

This is a really interesting question, and I had to think it through quite a bit before writing this answer.

The short answer is that all datacenters within a cluster can cover the entire token range. The token range is not divided between the datacenters in a way that would mean only the data for certain tokens can be stored in a particular datacenter. Let's not forget that the tokens are derived from the partition key, the partition key is user data, and each datacenter needs to be able to store any kind of data that the user wants to store.

However, when it comes to only the tokens, no two nodes in the entire cluster (and that is all datacenters combined) own the same token. You can think of it as a big global token range that is distributed across all nodes in the cluster.

If two nodes owned the same token, this would lead to a token collision. (I did some tests with two datacenters where I configured the same initial token for each node, and the nodes never discovered each other due to the token collision.)

So, it is possible that the hashing algorithm resolves to a token that is owned by a node in datacenter1, but only datacenter2 has data for this keyspace (according to the replication strategy). The token calculated by the hashing algorithm serves only as the starting point for the determination of the nodes that have replicas for a keyspace, table and partition key. This is where the replication strategy comes in, and this is why it is essential that NetworkTopologyStrategy is used when we have more than one datacenter, as this replication strategy is datacenter aware.

When assigning tokens manually to nodes, the tokens for a second datacenter need to be offset to avoid token collisions.

This document, for example, describes token assignment in this case:

https://docs.datastax.com/en/dse/5.1/dse-admin/datastax_enterprise/production/calcTokens.html#Calculatingtokensforamultipledatacentercluster
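As a rough sketch of what offsetting looks like, the Python below computes evenly spaced Murmur3Partitioner tokens for each datacenter as if it were its own ring, then shifts the second datacenter by a fixed amount. It assumes single-token nodes; `initial_tokens` is an illustrative helper, and the offset of 100 is an assumed value chosen for the example, not a required one.

```python
def initial_tokens(num_nodes, offset=0):
    """Evenly spaced Murmur3 tokens for one datacenter, shifted by `offset`."""
    return [(2**64 // num_nodes) * i - 2**63 + offset for i in range(num_nodes)]

dc1 = initial_tokens(3)
dc2 = initial_tokens(3, offset=100)  # offset value is an assumption for illustration

# No token may appear in both datacenters, otherwise the nodes collide.
assert not set(dc1) & set(dc2)

for t1, t2 in zip(dc1, dc2):
    print(t1, t2)
```

Each datacenter still spans the whole token range, but every token in the cluster is unique.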

1 comment

It's unfortunate that I cannot accept more than one answer because your response should go along with @Erick Ramirez's as well IMO.
