Bringing together the Apache Cassandra experts from the community and DataStax.

Want to learn? Have a question? Want to share your expertise? You are in the right place!

Not sure where to begin? Getting Started

 

question

amitmund_177224 avatar image
amitmund_177224 asked ·

Primary key vs Clustering column vs Partition key vs Partition token

Primary key Vs Clustering column vs Partition key vs Partition token, Need some more help.

The relation between them.

data modeling
1 comment
10 |1000 characters needed characters left characters exceeded

Up to 8 attachments (including images) can be used with a maximum of 1.0 MiB each and 10.0 MiB total.

Just letting you know that we've had a high volume of questions in the last 48 hours and we've been caught up with the FREE Cassandra Workshop Series so our responses are delayed but we will get to them in the next few hours. Cheers!

0 Likes 0 · ·
Erick Ramirez avatar image
Erick Ramirez answered ·

Definitions

A table's primary key is one or more columns that uniquely identify:

  1. the location of the data in a cluster of nodes, and
  2. the order of the stored data.

The primary key's first element is the partition key. It is used to determine which node holds a given table's row(s) by hashing its value into a partition token (done by the default Murmur3Partitioner which uses the MurmurHash algorithm).

A simple primary key uses just one column as the partition key. When there are 2 or more columns enclosed in parenthesis at the start of a primary key, it is known as a composite partition key.

For tables with a compound primary key, the primary key has both a partition key and one or more clustering columns.

Examples

SIMPLE PRIMARY KEY

In this table, there is only one column in the partition key (username).

CREATE TABLE users (
    username text,
    realname text,
    email text,
    PRIMARY KEY (username)
)

COMPOSITE PARTITION KEY

In this table, there are 2 columns that form the partition key enclosed in parenthesis ((title, year)):

CREATE TABLE videos (
    title text,
    year int,
    description text,
    ...
    PRIMARY KEY ((title, year))
)

The video title on its own is not unique. For example, the 1950 release of Superman with Kirk Alyn in the leading role is not the same Superman movie released in 1978 starring Christopher Reeve so we need to append the year next to the title to make the partition key unique -- "Superman:1950" and "Superman:1978".

COMPOUND PRIMARY KEY

This table has a single-column partition key (userid) and a clustering column (email):

CREATE TABLE user_emails (
    username text,
    email_type text,
    email_address text
    ...
    PRIMARY KEY (userid, email_type)
)

A user can have multiple emails -- personal, work, etc.

This table has a composite partition key ((title, year)) and 2 clustering columns (commented_at and comment):

CREATE TABLE comments_by_video_title (
    title text,
    year int,
    commented_at timestamp,
    comment text,
    username text,
    PRIMARY KEY ((title, year), commented_at, comment)
) WITH CLUSTERING ORDER BY (commented_at DESC)

Comments are sorted with most recent as the first row. In this case, we can retrieve the 10 most recent comments about a video with the following query:

SELECT comment FROM comments_by_video_title \
    WHERE title = 'Superman'
    AND year = 1978;

Cheers!

Share
10 |1000 characters needed characters left characters exceeded

Up to 8 attachments (including images) can be used with a maximum of 1.0 MiB each and 10.0 MiB total.

laxmikant.hcl_32751 avatar image
laxmikant.hcl_32751 answered ·

Primary key: uniquely identify a row. A primary key will definitely have partition key and can optionally have clustering column

Partition key: Partition key defines where your data will locate on a node

Clustering column: defines how the data is sorted within a partition

Partition token: It is the output number of a hash function which takes partition key value as an input. Each node has a token range and hence based on partition key (or token) data will go to the node which owns it.


Example: PRIMARY KEY( city, date, id ) ... in this primary key, city is the partition key, and date and id are clustering keys. within a city, data will be sorted based on date and then id.

Let's assume a Cassandra node A which owns range (0-50), and token of London is hash(London) = 10 .....so all the data which has the city as London will go to node A.

Hope it helps.

Share
10 |1000 characters needed characters left characters exceeded

Up to 8 attachments (including images) can be used with a maximum of 1.0 MiB each and 10.0 MiB total.