I am trying out data modeling with Cassandra and I am unsure what to choose as my primary key. My table looks like this:
```sql
CREATE TABLE mykeyspace.mytable (
    id UUID,
    A text,
    B text,
    C text,
    D text,
    -- ... other columns
    PRIMARY KEY (id)
);
```
I introduced an id column in my table and made it the primary key so that queries by id are fast, since most of my queries will be by id.
The problem I am facing is that the set of columns (A, B, C, D) uniquely identifies the data. Whenever a create request arrives with a combination of (A, B, C, D) that already exists, it should not create a new record. Instead, it should return the id of the existing record and suggest that the client use that id to update the record.
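To make the requirement concrete, here is a minimal in-memory sketch of the intended "create or return existing" behaviour. All names (`InMemoryTable`, `create`, `CreateResult`) are hypothetical, and plain Python dictionaries stand in for the table and for a lookup keyed by (A, B, C, D). It is not Cassandra code and ignores concurrency; it only pins down the semantics the data model has to support.

```python
import uuid
from dataclasses import dataclass, field

# Hypothetical in-memory model of the required behaviour (NOT Cassandra code).


@dataclass
class CreateResult:
    id: uuid.UUID
    created: bool  # False means a record with the same (A, B, C, D) already existed


@dataclass
class InMemoryTable:
    rows: dict = field(default_factory=dict)            # id -> row
    by_natural_key: dict = field(default_factory=dict)  # (A, B, C, D) -> id

    def create(self, a, b, c, d, **other_columns) -> CreateResult:
        key = (a, b, c, d)
        existing = self.by_natural_key.get(key)
        if existing is not None:
            # Do not insert; the client should update the existing record instead.
            return CreateResult(existing, created=False)
        new_id = uuid.uuid4()  # random id, as in the current design
        self.rows[new_id] = {"A": a, "B": b, "C": c, "D": d, **other_columns}
        self.by_natural_key[key] = new_id
        return CreateResult(new_id, created=True)
```

A second `create` call with the same (A, B, C, D) returns the first call's id with `created=False` and leaves the stored row untouched.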
I am generating the id randomly. These are the approaches I thought of to solve the problem:
- First approach: hash the 4 columns to generate the id. That would solve the problem, but I am skeptical about how the data would be distributed if the id is derived from a hash of the 4 columns.
- Second approach: create a secondary index on the (A, B, C, D) columns. Here I am a bit skeptical about having to search via a secondary index before every insertion.
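The first approach can be sketched as a name-based (version 5, SHA-1) UUID derived from the four columns, so the same (A, B, C, D) always yields the same id and the result still fits the existing `id UUID` column. This is a minimal sketch under assumptions: `MYTABLE_NAMESPACE` and `id_from_natural_key` are hypothetical names, and the namespace value is arbitrary but must never change once data exists. It only derives the id; on its own it does not check whether a row with that id already exists.

```python
import json
import uuid

# Hypothetical, fixed namespace for this table; changing it would change every derived id.
MYTABLE_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "mytable.example.com")


def id_from_natural_key(a: str, b: str, c: str, d: str) -> uuid.UUID:
    """Derive a deterministic version-5 UUID from the four identifying columns."""
    # json.dumps of a list encodes the columns unambiguously, so ("ab", "c", ...) and
    # ("a", "bc", ...) give different names; plain string concatenation would collide.
    name = json.dumps([a, b, c, d], ensure_ascii=False)
    return uuid.uuid5(MYTABLE_NAMESPACE, name)
```

Regarding distribution: assuming the default Murmur3Partitioner, a row's placement is decided by the token hash of the partition key, not by how the UUID value itself was produced, so a hash-derived id is hashed again just like a random one.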
Which of these approaches is more suitable for data modeling, or is there another approach?