question

milind.jivtode_158531 avatar image
milind.jivtode_158531 asked Erick Ramirez edited

Can I use a column whose value can be updated in the partition key?

A table which contains a particular field whose value can be updated ... and the same field need to be used in where clause without using "allow filtering"

EDIT - I am managing this by creating a secondary index on that column in question. This column is not part of the PK (partition + clustering).

Using the partition-key and the secondary index, I do the query without using "allow filtering".

EDIT - My scenario is different.

CREATE TABLE users (
    city text,
    name text,
    age int,
    gender text,
    favourite_colour text
    PRIMARY KEY (city, name)
)

1. city+name is unique

2. favourite_colour is updated periodically using :

UPDATE users SET favourite_colour='XXX' WHERE city='CCC' AND name='John'

3. I want to query --- find all the people from LA whose fav color is RED.

4. How about PRIMARY KEY (city, name, favourite_colour) ?

data modeling
10 |1000

Up to 8 attachments (including images) can be used with a maximum of 1.0 MiB each and 10.0 MiB total.

vikram.singh.chouhan_187371 avatar image
vikram.singh.chouhan_187371 answered vikram.singh.chouhan_187371 commented

Hi @milind.jivtode_158531: This is not possible in Cassandra or any hashing based system/database. "field need to be used in where clause without using allow filtering" is only possible if the field is part of the primary key in the table.
Actually, when you insert a record in Cassandra it evaluates the hash key for it based on the primary keys designed during table creation (which can be used in where clause in cql queries). Hence changing one or more primary key mean inserting a new record in the database corresponding to a new hash value generated for it. Basically, this is not the right use case for which Cassandra has been designed.

If you have such use cases I would suggest not to use Cassandra. otherwise, This is a good article to start understanding data modelling basics in Cassandra.

5 comments Share
10 |1000

Up to 8 attachments (including images) can be used with a maximum of 1.0 MiB each and 10.0 MiB total.

milind.jivtode_158531 avatar image milind.jivtode_158531 commented ·

Thanks @vikram.singh.chouhan_187371:

I am managing this by creating a secondary index on that column in question. This column is not part of the PK (partition + clustering).

Using the partition-key and the secondary index, I do the query without using "allow filtering".

Hope I will not get in trouble.


0 Likes 0 ·
vikram.singh.chouhan_187371 avatar image vikram.singh.chouhan_187371 milind.jivtode_158531 commented ·

Cassandra community always recommended not to use secondary Indexes. You can check my other answer to your question to get more insight.

0 Likes 0 ·
milind.jivtode_158531 avatar image milind.jivtode_158531 commented ·

Thanks @vikram.singh.chouhan_187371 @Erick Ramirez : I really appreciate your help.

I am probably not able to ask my question properly.

My issue is that I have a field (favourite_colour) that application keeps on updating. Also I want to add it part of my query. I am looking for a modeling solution.

I don't want to add it to PK (partition+clustering) because I can't update it without DELETE and INSERT which would result in TombStone.

Only option I see is to create sec. Index on favourite_colour column but that is not the best option from performance point of view.

0 Likes 0 ·
vikram.singh.chouhan_187371 avatar image vikram.singh.chouhan_187371 milind.jivtode_158531 commented ·

@milind.jivtode_158531: Thanks !!! Cassandra is very helpful in solving use cases with a huge amount of write operation with best available throughput time in industry ~2ms. For your problem, it is better to create separate tables to solve separate use cases. Cheers !!

0 Likes 0 ·
Erick Ramirez avatar image Erick Ramirez ♦♦ milind.jivtode_158531 commented ·

@milind.jivtode_158531 A secondary index on a column that gets updated frequently is not a good idea either because tombstones get created which you already know you need to avoid.

The solution is what I provided in my answer -- you need to create a separate table where the favourite_colour is the partition key if you want to query it. Cheers!

0 Likes 0 ·
Erick Ramirez avatar image
Erick Ramirez answered vikram.singh.chouhan_187371 commented

@milind.jivtode_158531 The partition key uniquely identifies your records and where they are placed around the ring in the cluster. This means that they have to be globally unique in your cluster and changing their values doesn't make sense.

Consider this table of users which records their age, gender and favourite colour:

CREATE TABLE users (
    name text,
    age int,
    gender text,
    favourite_colour text
    PRIMARY KEY (name)
)

For simplicity, let's say it contains these records:

name age gender favourite colour
Alice 20 F red
Bob 33 M orange
Carol 24 F yellow

In the table, the partition key is the column name. It doesn't make sense to change a user's name because that's what identifies them in the table. If you need to change someone's name, you're really inserting a new record in the DB and need to delete the existing record.

I recommend that since you're new to Cassandra you should do the free courses at DataStax Academy, in particular the DS220 course on Data Modelling. This 9-minute video from DS220 on Partitioning and Storage Structure would help you understand the concepts in more detail. Cheers!

UPDATE - In your table of users where the primary key is (city, name), you won't be able to query (filter) based on favourite_colour. You can only specify columns in the WHERE clause if:

  • it is part of the partition key
  • it is a clustering column (but must preceded by other clustering columns as defined in the primary key)
  • it is indexed (with CREATE INDEX)

If the primary key is (city, name, favourite_colour), you still will not be able to query for people from LA whose favourite colour is red because you must specify all columns in the primary key (as noted in the second bullet point above), i.e. you can't avoid "skipping" the name column.

The most important rule of data modelling is that you need to create a table for each application query. This means you need a [duplicate] table whose primary key is (city, favourite_colour) so you end up with partitions of cities which have clusters of colours of rows of people. Logically, it looks like this:

partition: 'LA'
- clustering column: 'red'
  - name: 'Alice', age: 20, gender: 'F'
  - name: 'Bob', age: 33, gender: 'M'
  - name: 'Carol', age 24, gender: 'F'

When you query this table with:

SELECT name, age, gender FROM users_by_city \
  WHERE city = 'LA' AND favourite_colour = 'red'

you get:

 name  | age | gender
-------+--------------
 Alice |  20 |      F
   Bob |  33 |      M
 Carol |  24 |      F

I'm going to predict that you'll have lots of follow-up questions so I'll reiterate again that it's very important that you at least have a look at some of the videos from the DS220 course I previously mentioned. It will explain a lot of what you don't understand right now being a new Cassandra user. Cheers!

1 comment Share
10 |1000

Up to 8 attachments (including images) can be used with a maximum of 1.0 MiB each and 10.0 MiB total.

Erick Ramirez avatar image Erick Ramirez ♦♦ commented ·

@milind.jivtode_158531 I've updated my answer based on your follow up questions. Cheers!

0 Likes 0 ·