In a multi-datacenter Cassandra cluster, will each datacenter have its own token range?
Or do all nodes across the different datacenters share the same token range?
Generally speaking, each data centre holds the full range of tokens, wrapped around on itself into a ring. In the case of the default and recommended Murmur3Partitioner, the possible hash token values range from -2^63 to 2^63 - 1. For a cluster with 2 DCs, there are 2 rings, each with the full token range.
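As a quick illustration (the demo keyspace and users table below are assumptions for the example, not from the question), cqlsh can show the Murmur3 token that a given partition key hashes to via the built-in token() function:

CREATE KEYSPACE IF NOT EXISTS demo WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
CREATE TABLE IF NOT EXISTS demo.users (id text PRIMARY KEY, name text);
INSERT INTO demo.users (id, name) VALUES ('alice', 'Alice');
-- token(id) returns the Murmur3 hash of the partition key, a value in the range -2^63 to 2^63 - 1
SELECT id, token(id) FROM demo.users;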
However, this is a simplistic view and the complete answer to your question is a lot more nuanced.
The more complete answer is -- it depends on the replication strategy defined for a keyspace.
LocalStrategy

In the case of these system keyspaces, all data is stored on each node locally regardless of the token range(s) owned by the nodes:

CREATE KEYSPACE system_schema WITH replication = {'class': 'LocalStrategy'} AND durable_writes = true;
CREATE KEYSPACE system WITH replication = {'class': 'LocalStrategy'} AND durable_writes = true;

Data in these keyspaces is not partitioned, so token range ownership does not apply.
SimpleStrategy

This replication strategy does not take the topology of the cluster into account, so the location of the nodes doesn't matter. In a multi-DC cluster, all nodes are "placed" into one giant ring, so it behaves like a single-DC cluster.
Effectively, keyspaces configured with this strategy combine all the nodes' token ownership into one full token range (one ring).
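A minimal sketch (the keyspace name and replication factor are assumptions): with SimpleStrategy the replication factor is a single cluster-wide number, and replicas are placed around the one combined ring without regard to which DC a node belongs to:

CREATE KEYSPACE simple_ks WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3} AND durable_writes = true;

This is also why SimpleStrategy is generally discouraged for multi-DC clusters.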
NetworkTopologyStrategy

By definition, this strategy takes into account the nodes' network topology and places each node into the relevant ring based on its configured DC.
Keyspaces have a full copy of the data (full token range) in each DC they are replicated to.
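For example (the DC names dc1/dc2 and the replication factors are assumptions; they must match the data centre names reported by the snitch), a keyspace replicated to both DCs keeps a full copy of its token range in each of them:

CREATE KEYSPACE multi_dc_ks WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3, 'dc2': 3} AND durable_writes = true;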
To answer your questions:

- Each DC has its own full token range (its own ring) for keyspaces configured with NetworkTopologyStrategy.
- All nodes across the DCs share one token range (one giant ring) for keyspaces configured with SimpleStrategy.
- Token range ownership does not apply to keyspaces configured with LocalStrategy.

Cheers!
This is a really interesting question, and I had to think it through quite a bit before writing this answer.
The short answer is that all datacenters within a cluster can cover the entire token range. The token range is not divided between the datacenters in a way that would mean only the data for certain tokens can be stored in a particular datacenter. Let's not forget that tokens are derived from the partition key, the partition key is user data, and each datacenter needs to be able to store whatever data the user wants to store.
However, when it comes to the tokens themselves, no two nodes in the entire cluster (that is, all datacenters combined) own the same token. You can think of it as one big global token range that is distributed across all nodes in the cluster.
If two nodes owned the same token, this would lead to a token collision. (I did some tests with two datacenters where I configured the same initial token for each node, and the nodes never discovered each other due to the token collision. )
So, it is possible that the hashing algorithm resolves to a token that is owned by a node in datacenter1, but only datacenter2 has data for this keyspace (according to the replication strategy). The token calculated by the hashing algorithm serves only as the starting point for the determination of the nodes that have replicas for a keyspace, table and partition key. This is where the replication strategy comes in, and this is why it is essential that NetworkTopologyStrategy is used when we have more than one datacenter, as this replication strategy is datacenter aware.
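To make that concrete (a sketch; the keyspace name and the DC name dc2 are assumptions), a keyspace can be replicated to only one of the datacenters, in which case a node in datacenter1 may own the token for a partition key yet hold no replica for that keyspace:

CREATE KEYSPACE dc2_only_ks WITH replication = {'class': 'NetworkTopologyStrategy', 'dc2': 3} AND durable_writes = true;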
When assigning tokens manually to nodes, the tokens for a second datacenter need to be offset to avoid token collisions.
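As a rough sketch of that offsetting (hypothetical 4-node DCs with Murmur3Partitioner; the evenly spaced values and the +100 offset are illustrative only), the initial_token values for the second DC are simply shifted so that no node claims a token already owned in the first DC:

dc1 initial_token values (evenly spaced across -2^63 .. 2^63-1): -9223372036854775808, -4611686018427387904, 0, 4611686018427387904
dc2 initial_token values (same spacing, offset by +100): -9223372036854775708, -4611686018427387804, 100, 4611686018427388004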
This document for example describes the token assignment in this case:
It's unfortunate that I cannot accept more than one answer because your response should go along with @Erick Ramirez's as well IMO.