Asked by nag9sri_139755

Will data be stored on the same node if 2 tables have the same partition key?

I have the following two tables (taken from Cassandra: The Definitive Guide, https://gist.github.com/jeffreyscarpenter/761ddcd1c125dfb194dc02d753d31733). What is guaranteed about these tables, given that they have the same partition key?

1. Since both tables use the same partition key, can we safely assume that the data for a given partition key value in both tables is stored on the same node?

2. Since the tables are different from each other, will their data be stored in different partitions or in the same partition on the "same" node?

CREATE TABLE hotel.pois_by_hotel (
    poi_name text,
    hotel_id text,
    description text,
    PRIMARY KEY ((hotel_id), poi_name)
) WITH comment = 'Q3. Find pois near a hotel';
CREATE TABLE hotel.available_rooms_by_hotel_date (
    hotel_id text,
    date date,
    room_number smallint,
    is_available boolean,
    PRIMARY KEY ((hotel_id), date, room_number)
) WITH comment = 'Q4. Find available rooms by hotel / date';


Tags: cassandra

1 Answer

Answered by alex.ott

1. If both tables have the same partition key, then the same key value is mapped to the same token in both tables. If the tables are in the same keyspace, then yes, that data will be on the same node(s). If they are in different keyspaces, there may be only a partial overlap when the replication factors differ, for example when one keyspace has a higher RF than the other.

2. Each table has its own set of files on disk, so although the tables share the same "logical partitions", on disk their data is in different files. You can always look at the data files yourself; they are in a directory like /var/lib/cassandra/data/<keyspace>/<table>-<table-uuid>/
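To make both points concrete, here is a toy Python model. It is a hedged sketch, not Cassandra's implementation. Assumptions: tokens come from an MD5-based stand-in hash rather than the default Murmur3Partitioner; the six node names and the hotel_id value "AZ123" are hypothetical; and replica placement is simplified to SimpleStrategy-style (primary replica plus the next RF-1 nodes clockwise, with no vnodes, racks, or datacenters). The key idea is that `replicas()` depends only on the keyspace and the key, never on the table name, while each node's store is keyed by (keyspace, table).

```python
import bisect
import hashlib
from collections import defaultdict


def token_for(partition_key: str) -> int:
    # Stand-in partitioner: first 8 bytes of an MD5 digest as an unsigned int.
    # Cassandra's default Murmur3Partitioner computes different token values.
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


# Hypothetical 6-node ring, sorted by each node's token.
RING = sorted((token_for(f"node-{i}"), f"node-{i}") for i in range(1, 7))
RING_TOKENS = [token for token, _ in RING]

# Replication is configured per keyspace; tables only inherit it.
KEYSPACE_RF = {"hotel": 3}


def replicas(keyspace: str, partition_key: str) -> list[str]:
    # Point 1: the primary replica is the first node whose token is >= the
    # key's token (wrapping around the ring); the next RF-1 nodes clockwise
    # hold the other copies. Note that the table name is not an input.
    rf = min(KEYSPACE_RF[keyspace], len(RING))
    start = bisect.bisect_left(RING_TOKENS, token_for(partition_key)) % len(RING)
    return [RING[(start + i) % len(RING)][1] for i in range(rf)]


# Point 2: each node keeps a separate store per (keyspace, table), standing
# in for the separate per-table data directories on disk.
storage = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))


def write(keyspace: str, table: str, partition_key: str, row: dict) -> None:
    for node in replicas(keyspace, partition_key):
        storage[node][(keyspace, table)][partition_key].append(row)


# Same partition key value written to both tables (row values are made up).
write("hotel", "pois_by_hotel", "AZ123", {"poi_name": "Museum"})
write("hotel", "available_rooms_by_hotel_date", "AZ123",
      {"date": "2024-01-01", "room_number": 101, "is_available": True})

for node in replicas("hotel", "AZ123"):
    print(node, sorted(table for _, table in storage[node]))
```

Each of the three replica nodes should list both tables, each kept in its own store: same nodes because the keyspace and key match, separate storage because the tables differ.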


nag9sri_139755 commented:

So does this mean that the combination of keyspace and partition key is what determines which node a table's data is allocated to (although there could be an overlap depending on RF)?

alex.ott replied to nag9sri_139755:

The partition key value is used to calculate the token. The token belongs to a token range that is mapped to a specific host (the primary replica). If the keyspace has RF > 1, other hosts also store replicas. So it's always guaranteed that all tables have the same primary replica for a given key, and are stored on the same host, as long as their keyspaces are replicated to that DC. If we have one keyspace with RF=2 and another with RF=5, then replicas for the first will be on, say, nodes 1 and 2 (just an example), and for the second on nodes 1, 2, 3, 4, and 5, so there is some overlap, but not a complete one.
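The RF=2 vs RF=5 example can be sketched in a few lines of Python. This is a simplified model, assuming SimpleStrategy-style placement on a hypothetical six-node ring listed in token order, with the key's token assumed to fall in node1's range. NetworkTopologyStrategy additionally takes racks and datacenters into account, so real placement can look different.

```python
def replicas(ring: list[str], primary_index: int, rf: int) -> list[str]:
    # SimpleStrategy-style placement: the primary replica plus the next
    # rf - 1 nodes clockwise around the ring (racks/DCs not modeled).
    rf = min(rf, len(ring))
    return [ring[(primary_index + i) % len(ring)] for i in range(rf)]


# Hypothetical 6-node ring, listed in token order.
RING = ["node1", "node2", "node3", "node4", "node5", "node6"]

# Assume the partition key's token falls in node1's range, so node1 is the
# primary replica for that key in every keyspace.
PRIMARY = 0

ks_rf2 = replicas(RING, PRIMARY, 2)
ks_rf5 = replicas(RING, PRIMARY, 5)
print(ks_rf2)                              # ['node1', 'node2']
print(ks_rf5)                              # ['node1', 'node2', 'node3', 'node4', 'node5']
print(sorted(set(ks_rf2) & set(ks_rf5)))   # ['node1', 'node2']
```

Both keyspaces share the primary replica (node1); the RF=5 keyspace simply extends further around the ring, which is the partial overlap described above.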

smadhavan replied to nag9sri_139755:

@nag9sri_139755, you might also want to read the resources below for a better understanding:
