Primary key Vs Clustering column vs Partition key vs Partition token, Need some more help.
The relation between them.
Bringing together the Apache Cassandra experts from the community and DataStax.
Want to learn? Have a question? Want to share your expertise? You are in the right place!
Not sure where to begin? Getting Started
A table's primary key is one or more columns that uniquely identify:
The primary key's first element is the partition key. It is used to determine which node holds a given table's row(s) by hashing its value into a partition token (done by the default
Murmur3Partitioner which uses the MurmurHash algorithm).
A simple primary key uses just one column as the partition key. When there are 2 or more columns enclosed in parenthesis at the start of a primary key, it is known as a composite partition key.
For tables with a compound primary key, the primary key has both a partition key and one or more clustering columns.
In this table, there is only one column in the partition key (
CREATE TABLE users ( username text, realname text, email text, PRIMARY KEY (username) )
COMPOSITE PARTITION KEY
CREATE TABLE videos ( title text, year int, description text, ... PRIMARY KEY ((title, year)) )
The video title on its own is not unique. For example, the 1950 release of Superman with Kirk Alyn in the leading role is not the same Superman movie released in 1978 starring Christopher Reeve so we need to append the year next to the title to make the partition key unique -- "
Superman:1950" and "
This table has a single-column partition key (
userid) and a clustering column (
CREATE TABLE user_emails ( username text, email_type text, email_address text ... PRIMARY KEY (userid, email_type) )
A user can have multiple emails -- personal, work, etc.
This table has a composite partition key (
(title, year)) and 2 clustering columns (
CREATE TABLE comments_by_video_title ( title text, year int, commented_at timestamp, comment text, username text, PRIMARY KEY ((title, year), commented_at, comment) ) WITH CLUSTERING ORDER BY (commented_at DESC)
Comments are sorted with most recent as the first row. In this case, we can retrieve the 10 most recent comments about a video with the following query:
SELECT comment FROM comments_by_video_title \ WHERE title = 'Superman' AND year = 1978;
Primary key: uniquely identify a row. A primary key will definitely have partition key and can optionally have clustering column
Partition key: Partition key defines where your data will locate on a node
Clustering column: defines how the data is sorted within a partition
Partition token: It is the output number of a hash function which takes partition key value as an input. Each node has a token range and hence based on partition key (or token) data will go to the node which owns it.
Example: PRIMARY KEY( city, date, id ) ... in this primary key, city is the partition key, and date and id are clustering keys. within a city, data will be sorted based on date and then id.
Let's assume a Cassandra node A which owns range (0-50), and token of London is hash(London) = 10 .....so all the data which has the city as London will go to node A.
Hope it helps.
6 People are following this question.