Bringing together the Apache Cassandra experts from the community and DataStax.

Want to learn? Have a question? Want to share your expertise? You are in the right place!

Not sure where to begin? Getting Started

 

question

rashokkumartce_193569 avatar image
rashokkumartce_193569 asked ·

Can we choose a primary key in which both the partition key and clustering column defines uniqueness in the table?

If i have a Primary key statement like Primary Key((year),moviename) for a table. will i be able to insert 2014, Interstellar in a row and 2017, Interstellar in another row. Same movie name on different years

Should the clustering key/keys defines uniqueness or we can rely on year which is a partition key in addition to the clustering column to define uniqueness

data modeling
10 |1000 characters needed characters left characters exceeded

Up to 8 attachments (including images) can be used with a maximum of 1.0 MiB each and 10.0 MiB total.

1 Answer

Erick Ramirez avatar image
Erick Ramirez answered ·

Definitions

In Cassandra, a primary key uniquely identifies a row in a table whereas a partition key uniquely identifies a partition and its location the ring (which node stores it).

If you're interested, I go into this into a bit more detail in question #6171.

Partitions

To answer your question, year is not a good choice for a partition key because you potentially end up with hundreds or even thousands of rows of movies in each partition. As an example, the-numbers.com lists 1,525 movies in 2014 and 3,611 movies in 2017 [1].

The movie title is also not a good choice because it is not universally unique. A quick internet search returns these movies with the same title but were released in different years [2]:

  • Bad Boys (1983) and Bad Boys (1995)
  • The Avengers (1998) and The Avengers (2012)
  • Twilight (1998) and Twilight (2008)

Recommendation

If your goal is uniqueness, then I suggest you model your table as follows:

CREATE TABLE movies (
    movie_title text,
    year int,
    ...
    PRIMARY KEY ((movie_title, year))
)

In this case, the partition key is (movie_title, year).

In this model, the combination of movie title and year make partitions unique. For example, Bad Boys:1983 and Bad Boys:1995 are universally unique. Similarly, The Avengers:1998 and The Avengers:2012 are 2 distinct partitions in the table. Cheers!

[1] https://www.the-numbers.com/movies/#tab=year

[2] https://www.ifc.com/2015/07/movies-that-share-the-same-name-and-not-much-else

Share
10 |1000 characters needed characters left characters exceeded

Up to 8 attachments (including images) can be used with a maximum of 1.0 MiB each and 10.0 MiB total.