Primary key Vs Clustering column vs Partition key vs Partition token, Need some more help.
The relation between them.
Primary key Vs Clustering column vs Partition key vs Partition token, Need some more help.
The relation between them.
Just letting you know that we've had a high volume of questions in the last 48 hours and we've been caught up with the FREE Cassandra Workshop Series so our responses are delayed but we will get to them in the next few hours. Cheers!
A table's primary key is one or more columns that uniquely identify:
The primary key's first element is the partition key. It is used to determine which node holds a given table's row(s) by hashing its value into a partition token (done by the default Murmur3Partitioner
which uses the MurmurHash algorithm).
A simple primary key uses just one column as the partition key. When there are 2 or more columns enclosed in parenthesis at the start of a primary key, it is known as a composite partition key.
For tables with a compound primary key, the primary key has both a partition key and one or more clustering columns.
In this table, there is only one column in the partition key (username
).
CREATE TABLE users ( username text, realname text, email text, PRIMARY KEY (username) )
COMPOSITE PARTITION KEY
In this table, there are 2 columns that form the partition key enclosed in parenthesis ((title, year)
):
CREATE TABLE videos ( title text, year int, description text, ... PRIMARY KEY ((title, year)) )
The video title on its own is not unique. For example, the 1950 release of Superman with Kirk Alyn in the leading role is not the same Superman movie released in 1978 starring Christopher Reeve so we need to append the year next to the title to make the partition key unique -- "Superman:1950
" and "Superman:1978
".
This table has a single-column partition key (userid
) and a clustering column (email
):
CREATE TABLE user_emails ( username text, email_type text, email_address text ... PRIMARY KEY (userid, email_type) )
A user can have multiple emails -- personal, work, etc.
This table has a composite partition key ((title, year)
) and 2 clustering columns (commented_at
and comment
):
CREATE TABLE comments_by_video_title ( title text, year int, commented_at timestamp, comment text, username text, PRIMARY KEY ((title, year), commented_at, comment) ) WITH CLUSTERING ORDER BY (commented_at DESC)
Comments are sorted with most recent as the first row. In this case, we can retrieve the 10 most recent comments about a video with the following query:
SELECT comment FROM comments_by_video_title \ WHERE title = 'Superman' AND year = 1978;
Cheers!
Primary key: uniquely identify a row. A primary key will definitely have partition key and can optionally have clustering column
Partition key: Partition key defines where your data will locate on a node
Clustering column: defines how the data is sorted within a partition
Partition token: It is the output number of a hash function which takes partition key value as an input. Each node has a token range and hence based on partition key (or token) data will go to the node which owns it.
Example: PRIMARY KEY( city, date, id ) ... in this primary key, city is the partition key, and date and id are clustering keys. within a city, data will be sorted based on date and then id.
Let's assume a Cassandra node A which owns range (0-50), and token of London is hash(London) = 10 .....so all the data which has the city as London will go to node A.
Hope it helps.
6 People are following this question.
DataStax Enterprise is powered by the best distribution of Apache Cassandra ™
© 2023 DataStax, Titan, and TitanDB are registered trademarks of DataStax, Inc. and its subsidiaries in the United States and/or other countries.
Apache, Apache Cassandra, Cassandra, Apache Tomcat, Tomcat, Apache Lucene, Lucene, Apache Solr, Apache Hadoop, Hadoop, Apache Spark, Spark, Apache TinkerPop, TinkerPop, Apache Kafka and Kafka are either registered trademarks or trademarks of the Apache Software Foundation or its subsidiaries in Canada, the United States and/or other countries.
Privacy Policy Terms of Use