


krishna_r_nutulapati_81343 asked Lewisr650 answered

Questions related to partitioner responsibilities

1) Tokens are distributed equally across all nodes soon after the cluster is up and running (before any keyspaces are even created). The Murmur3 algorithm is used to create and assign tokens.

Who assigns these tokens? Does DataStax assign them automatically using the above algorithm, or is there a partitioner on the server that handles it?

  1. Is the partitioner a piece of software that is part of the application connectivity driver, or is it a server-side component?

  1. When inserting a record, the partitioner can generate a hash of the partition key value (for example, EmployeeNumber = '123'), compare that hash with the token ranges of every node to find the responsible node, and send the data to that node. The driver performs this task if the DC-aware policy is set. Does that mean the partitioner is integrated into the driver?
  2. It is also possible for one node to hold more data if the partition allocated to it has more records. The partitioner has no control over this. Hence the partitioner can't distribute data equally among nodes; it can only assign data to the responsible node. Is this correct?

1 Answer

Lewisr650 answered

That isn't quite right, but close.

The Murmur3 partitioner identifies which node owns the data by hashing the value of the partition key. There is a separate algorithm that determines which nodes own which tokens when the cluster is first provisioned. After that, you can clearly identify which node will own any given piece of data.
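To make the lookup concrete, here is a minimal Python sketch of mapping a partition key to an owning node on a token ring. It is an illustration only, not Cassandra's code: the hash is an MD5-based stand-in rather than real Murmur3, and the four node names and their evenly spaced tokens are hypothetical.

```python
import bisect
import hashlib

# Hypothetical 4-node ring. Each node owns the range (previous token, its token].
RING = sorted([
    (-2**63, "node1"),
    (-2**62, "node2"),
    (0, "node3"),
    (2**62, "node4"),
])
TOKENS = [token for token, _ in RING]


def token_for(partition_key: str) -> int:
    """Stand-in hash (NOT Murmur3): map a key to a signed 64-bit token."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


def owner_for_token(token: int) -> str:
    """Return the node whose token is the first one >= the given token."""
    idx = bisect.bisect_left(TOKENS, token)
    # Tokens above the highest node token wrap around to the first node.
    return RING[idx % len(RING)][1]


def owner_for_key(partition_key: str) -> str:
    return owner_for_token(token_for(partition_key))


if __name__ == "__main__":
    print(owner_for_key("123"))
```

The same key always hashes to the same token, so once the token ranges are fixed the owner of a given partition key is fully determined.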

The partitioner is a library that identifies which node owns the data based on the value of the partition key being passed down. This library can be swapped out in the cassandra.yaml file if desired, but the Murmur3 partitioner is the preferred partitioner for most applications.
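For reference, the relevant setting in cassandra.yaml looks like the excerpt below. The value shown is the default in current Cassandra releases; the rest of the file is omitted.

```yaml
# cassandra.yaml (excerpt)
# Must be the same on every node in the cluster.
partitioner: org.apache.cassandra.dht.Murmur3Partitioner
```

Note that the partitioner has to be chosen before data is loaded; it cannot be changed on an existing cluster without reloading the data.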

Each DC has its own distribution of tokens across the nodes in that DC. The driver is not performing that task; the coordinator that received the connection for the transaction performs it. You can avoid an extra network hop by using the driver's token-aware load balancing policy (TokenAwarePolicy in the DataStax drivers), which sends the request to a coordinator that also owns the first replica of the data. However, if your data has low cardinality, you can overwhelm that node, so it's important to know your data in that scenario. You don't want your app to always connect to the same node, as you will over-utilize that node and under-utilize the others. Data modeling plays a key role in data distribution.
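The effect of cardinality on distribution can be seen in a small simulation. This is a rough sketch under stated assumptions: the node names are hypothetical, the hash is an MD5-based stand-in rather than Murmur3, and placement uses a simple modulo instead of real token ranges. It only shows the general pattern.

```python
import hashlib
from collections import Counter

# Hypothetical node names; placement below is a simplification, not real token ranges.
NODES = ["node1", "node2", "node3", "node4"]


def owner_for_key(partition_key: str) -> str:
    """Stand-in placement (NOT Murmur3): hash the key and pick a node."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return NODES[int.from_bytes(digest[:8], "big") % len(NODES)]


def rows_per_node(partition_keys) -> Counter:
    """Count how many rows land on each node for the given partition keys."""
    return Counter(owner_for_key(key) for key in partition_keys)


# High cardinality: every row has its own partition key (e.g. an employee number).
by_employee = rows_per_node(f"emp-{i}" for i in range(10_000))

# Low cardinality: 10,000 rows share only two partition key values.
by_country = rows_per_node(["US", "IN"][i % 2] for i in range(10_000))

if __name__ == "__main__":
    print("high cardinality:", dict(by_employee))
    print("low cardinality: ", dict(by_country))
```

With many distinct partition keys the rows spread across all nodes, while two key values can reach at most two nodes no matter how many rows are written. A token-aware client would then send all of that traffic to the same one or two coordinators.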

One node can have more data than other nodes if the data model ends up storing low-cardinality data, forcing more data into one partition rather than spreading it across multiple partitions. You can learn more about the implications for data modeling here:
