aravinth_chakravarthyr_173918 asked Erick Ramirez edited

How is data distributed and replicated in a cluster?

Each node is assigned a partition range, and a row that is inserted goes to the node that owns the range for its partition key. What determines which other nodes receive the replicas of that data? For example, if I have a 6-node cluster with a test keyspace at replication factor 3, and I insert a record into the test.test table that is stored on node 1 based on its partition key, where are the other two replicas stored, and what decides which nodes hold them? If it is based on the replica placement strategy, please explain how it works in detail, or share a document that covers this.


1 Answer

Erick Ramirez answered

@aravinth_chakravarthyr_173918 The partitioner hashes the partition key to a token, and that token determines where the partition is placed in the ring. Once that is known, the remaining replicas are placed on the adjacent nodes in the ring.

For example, consider a cluster with nodes A to F where:

  • the replication factor is 3,
  • B is to the right of A in the ring,
  • C to the right of B,
  • D to the right of C, and so on.

Suppose that for a given partition, the partition key's token value falls in the range owned by node C. The first replica (copy) is placed on this node. The second replica is placed on the next node, D, and the third replica on the node after that, which is E.

It gets a bit more complicated if there are racks in the topology. In brief, C* (Cassandra) will try to place each copy (replica) on a different rack, meaning that C* will keep going around the ring until the "next node" is in another rack.
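A simplified sketch of that rack-aware walk is below. It is an approximation for illustration only: the function rack_aware_replicas, the node names and the rack assignments are all hypothetical, and the real placement logic handles more cases (multiple datacenters, virtual nodes) than this does.

```python
def rack_aware_replicas(start, ring, racks, rf=3):
    """Pick rf replicas walking the ring from index `start`, preferring new racks.

    ring  -- node names in ring order (hypothetical)
    racks -- dict mapping node name to rack name (hypothetical)
    """
    n = len(ring)
    replicas, seen_racks, skipped = [], set(), []
    for i in range(n):
        if len(replicas) == rf:
            break
        node = ring[(start + i) % n]
        if racks[node] not in seen_racks:
            # First node found on this rack: it gets a replica.
            replicas.append(node)
            seen_racks.add(racks[node])
        else:
            # Rack already holds a replica: keep going around the ring.
            skipped.append(node)
    # Fewer racks than rf: fall back to the skipped nodes in ring order.
    for node in skipped:
        if len(replicas) == rf:
            break
        replicas.append(node)
    return replicas


ring = ["A", "B", "C", "D", "E", "F"]
racks = {"A": "r1", "B": "r1", "C": "r2", "D": "r2", "E": "r3", "F": "r3"}
print(rack_aware_replicas(ring.index("C"), ring, racks))
```

In this assumed layout the first replica is still on C, but D is skipped because it shares a rack with C, so the other two copies go to E and then A.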

For more info, see Data distribution and replication. Cheers!
