Is the primary key part of the partition key when designing a table at creation time?
A primary key uniquely identifies a row.
A composite key is a key formed from multiple columns.
A partition key is the primary lookup to find a set of rows, i.e. a partition.
A clustering key is the part of the primary key that isn't the partition key (and it defines the ordering within a partition).
The PRIMARY KEY definition is made up of two parts: The Partition Key and the Clustering Columns. The first part maps to the storage engine row key, while the second is used to group columns in a row.
The primary key consists of two parts - the partition key and the clustering key.
The partition key determines data locality, i.e. where the data is stored, and is also needed to query the data. When specified, the clustering key orders the data within a single partition. The primary key must still be 'primary', i.e. it must uniquely identify a record.
Patrick has a great write-up: https://www.datastax.com/blog/2016/02/most-important-thing-know-cassandra-data-modeling-primary-key
Looks like you have a lot of fans with all the answers here. :)
The primary key must include a partition key and can optionally include one or more clustering keys. And yes, the primary key must be defined when you create the table because it cannot be changed once the table is created.
I've answered a similar question recently (see #6171) so let me add to the list of awesome responses by reposting it here.
A table's primary key is one or more columns that uniquely identify a row.
The primary key's first element is the partition key. It is used to determine which node holds a given table's row(s) by hashing its value into a partition token (done by the default Murmur3Partitioner, which uses the MurmurHash algorithm).
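To make the token idea concrete, here is a toy sketch in Python. It is not how Cassandra is implemented: SHA-256 from the standard library stands in for Murmur3, the token space is shrunk to 32 bits, and the three node names and their token ranges are made up for illustration.

```python
import hashlib
from bisect import bisect_left

RING_SIZE = 2**32  # toy token space (an assumption, far smaller than the real one)

# Hypothetical ring: each node owns every token up to and including its own.
RING = [
    (RING_SIZE // 3, "node-a"),
    (2 * RING_SIZE // 3, "node-b"),
    (RING_SIZE - 1, "node-c"),
]


def token_for(partition_key):
    """Hash the partition key values into a token (SHA-256 stand-in, not Murmur3)."""
    raw = ":".join(str(value) for value in partition_key).encode("utf-8")
    return int.from_bytes(hashlib.sha256(raw).digest()[:4], "big")


def node_for(partition_key):
    """Return the first node on the ring whose token is >= the key's token."""
    tokens = [token for token, _ in RING]
    return RING[bisect_left(tokens, token_for(partition_key))][1]


# The same partition key always hashes to the same token, hence the same node.
print(node_for(("Superman", 1978)), token_for(("Superman", 1978)))
```

Only the partition key goes into the hash, which is why every row sharing a partition key ends up on the same node.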
A simple primary key uses just one column as the partition key. When there are 2 or more columns enclosed in parentheses at the start of a primary key, it is known as a composite partition key.
For tables with a compound primary key, the primary key has both a partition key and one or more clustering columns.
In this table, there is only one column in the partition key (username):
CREATE TABLE users ( username text, realname text, email text, PRIMARY KEY (username) )
CREATE TABLE videos ( title text, year int, description text, ... PRIMARY KEY ((title, year)) )
The video title on its own is not unique. For example, the 1950 release of Superman with Kirk Alyn in the leading role is not the same Superman movie released in 1978 starring Christopher Reeve, so we need to append the year to the title to make the partition key unique: "Superman:1950" and "Superman:1978".
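A rough way to see why the year matters, sketched in Python with plain dictionaries standing in for tables (an analogy only; the variable names are made up here). With the title alone as the key, a second write for the same key replaces the first, which is also what happens in Cassandra when two writes share a primary key.

```python
films = [
    ("Superman", 1950, "Kirk Alyn"),
    ("Superman", 1978, "Christopher Reeve"),
]

by_title = {}           # models PRIMARY KEY (title)
by_title_and_year = {}  # models PRIMARY KEY ((title, year))

for title, year, lead in films:
    by_title[title] = {"year": year, "lead": lead}
    by_title_and_year[(title, year)] = {"lead": lead}

print(len(by_title))           # 1: the 1978 film overwrote the 1950 one
print(len(by_title_and_year))  # 2: one entry per film
```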
This table has a single-column partition key (userid) and a clustering column (email_type):
CREATE TABLE user_emails ( userid text, email_type text, email_address text ... PRIMARY KEY (userid, email_type) )
A user can have multiple emails -- personal, work, etc.
This table has a composite partition key ((title, year)) and 2 clustering columns (commented_at, comment):
CREATE TABLE comments_by_video_title ( title text, year int, commented_at timestamp, comment text, username text, PRIMARY KEY ((title, year), commented_at, comment) ) WITH CLUSTERING ORDER BY (commented_at DESC)
Comments are sorted with the most recent as the first row. In this case, we can retrieve the 10 most recent comments about a video with the following query:
SELECT comment FROM comments_by_video_title WHERE title = 'Superman' AND year = 1978 LIMIT 10;
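The effect of the clustering order on that query can be sketched in Python. This is a toy in-memory model, not Cassandra's storage engine; the `insert` and `latest_comments` helpers and the sample data are invented for illustration. Each partition keeps its rows sorted by commented_at descending (ties broken by comment ascending), so reading the most recent comments is just reading from the front of the partition.

```python
from datetime import datetime, timedelta

table = {}  # (title, year) -> list of rows kept in clustering order


def insert(title, year, commented_at, comment, username):
    """Add a row to its partition and restore the clustering order.

    Overwriting a row that has the same primary key is not modelled here.
    """
    rows = table.setdefault((title, year), [])
    rows.append({"commented_at": commented_at, "comment": comment, "username": username})
    # Two stable sorts: comment ASC first, then commented_at DESC on top.
    rows.sort(key=lambda row: row["comment"])
    rows.sort(key=lambda row: row["commented_at"], reverse=True)


def latest_comments(title, year, limit=10):
    """Return up to `limit` comments from one partition, newest first."""
    return [row["comment"] for row in table.get((title, year), [])[:limit]]


start = datetime(2020, 1, 1)
for i in range(12):
    insert("Superman", 1978, start + timedelta(days=i), f"comment {i}", "user1")
insert("Superman", 1950, start, "a different partition", "user2")

print(latest_comments("Superman", 1978))  # 'comment 11' first, 'comment 2' last
```

Because the full partition key is supplied, only one partition is touched, and the limit of 10 simply cuts off the already sorted rows.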
Every Cassandra table definition needs a primary key.
Every primary key needs a partition key. You can also add clustering keys to the primary key, but this is optional; the partition key is essential. The first field listed in the primary key is the partition key. The partition key is responsible for determining the data locality in your cluster, a key concept for Cassandra as a distributed database.
In its simplest form, with basic primary keys, the partition key is the primary key.
For example here:
CREATE TABLE my_keyspace.table1 ( id text, data text, PRIMARY KEY (id) )
If you add a clustering key:
CREATE TABLE my_keyspace.table2 ( id text, ckey text, data text, PRIMARY KEY (id, ckey) )
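To tie this back to the question, the split between partition key and clustering columns can be read straight off the PRIMARY KEY clause. The Python helper below is a hypothetical sketch: the name `split_primary_key` is made up, and it only handles the simple forms shown in this thread, not the full CQL grammar. It takes the text found inside PRIMARY KEY (...).

```python
def split_primary_key(definition):
    """Split the inside of PRIMARY KEY (...) into (partition_key, clustering_columns)."""
    inner = definition.strip()
    if inner.startswith("("):
        # Composite partition key: the columns inside the inner parentheses.
        close = inner.index(")")
        partition_key = [col.strip() for col in inner[1:close].split(",")]
        rest = inner[close + 1:]
    else:
        # Otherwise the first column listed is the partition key.
        first, _, rest = inner.partition(",")
        partition_key = [first.strip()]
    clustering_columns = [col.strip() for col in rest.split(",") if col.strip()]
    return partition_key, clustering_columns


print(split_primary_key("id"))        # (['id'], [])
print(split_primary_key("id, ckey"))  # (['id'], ['ckey'])
print(split_primary_key("(title, year), commented_at, comment"))
# (['title', 'year'], ['commented_at', 'comment'])
```

In the first case the partition key is the whole primary key; in the other two it is only the leading part.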
This is still a good blog on the topic of primary keys:
I hope this helps to answer your question.