animesh.sharma.pandit0_178934 asked Erick Ramirez edited

How do I choose the primary key when data modeling?

I am trying out data modeling with Cassandra and I am confused about what I should choose as my primary key. My table looks like this:

CREATE TABLE mykeyspace.mytable (
    id UUID,
    A text,
    B text,
    C text,
    D text,
    ... other columns
    PRIMARY KEY (id)
);

I have introduced an id column in my table and made it the primary key so that querying by id is fast, since most of my queries will be by id.

The problem I am facing is that the set of columns (A, B, C, D) uniquely identifies the data. When a create request arrives with a combination of (A, B, C, D) that already exists, it should not create a new record. Instead, it should return a response containing the id of the existing record and tell the client to use that id to update the record.

I am currently generating the id randomly. These are the approaches I have thought of to solve the problem:

  1. The first approach is to hash the 4 columns to generate the id. That would solve the problem, but I am skeptical about how the data would be distributed if I derive the id from a hash of the 4 columns.
  2. The second approach is to create a secondary index on the (A, B, C, D) columns. Here I am a bit skeptical about having to search through the secondary index before every insert.
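For illustration, the first approach could be sketched on the client side as below. This is only a minimal sketch under assumptions: the `NAMESPACE` value and the `record_id` helper are hypothetical names, and `uuid5` (a SHA-1 based name UUID) is just one possible way to hash the 4 columns. On the distribution concern: with the default Murmur3Partitioner, Cassandra places rows by a hash of the partition key, so a hash-derived id should spread about as evenly as a random one.

```python
import uuid

# Hypothetical fixed namespace for this table. Any UUID works, but it must
# never change, otherwise the same (A, B, C, D) would map to a different id.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "mykeyspace.mytable")


def record_id(a: str, b: str, c: str, d: str) -> uuid.UUID:
    """Derive a deterministic id from the four identifying columns."""
    # Length-prefix each value so that ("ab", "c") and ("a", "bc")
    # produce different inputs to the hash.
    name = "".join(f"{len(v)}:{v}" for v in (a, b, c, d))
    return uuid.uuid5(NAMESPACE, name)


# The same four values always yield the same id.
first = record_id("a1", "b1", "c1", "d1")
second = record_id("a1", "b1", "c1", "d1")
print(first == second)
```

Because the id is a pure function of (A, B, C, D), a repeated create request maps to the same row. Note that if any of the four columns is later updated, the derived id would no longer match the stored one.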

Which of the above approaches is more suitable for this data model, or is there a better approach?

data modeling

1 Answer

Erick Ramirez answered animesh.sharma.pandit0_178934 commented

@animesh.sharma.pandit0_178934 Since the columns A, B, C and D together uniquely identify each partition in the table, you should use them as the partition key. In your table definition, it would look like this:

CREATE TABLE mykeyspace.mytable (
    ... column definitions
    PRIMARY KEY ( (A, B, C, D) )
);

Note that all 4 columns are enclosed in their own inner pair of parentheses, which marks them together as the full (composite) partition key. Cheers!


animesh.sharma.pandit0_178934 commented ·

Each of them is text, some of them can be long, and they can be updated. Do you still think using all 4 of them as the partition key is a good idea? Every client query would then need to supply all 4 fields. Don't you think having an id would simplify querying?
