pranali.khanna101994_189965 asked smadhavan edited

Why does nodetool getendpoints return the same IP for 2 different partitions?

Hi, I have created a 2-node cluster and am following the killrvideo keyspace example. I have created a table as described there:

videos_by_tag with primary key (tag, added_date, video_id), where tag is the partition key.

In my cassandra.yaml I have kept all the default settings, such as num_tokens set to 256 on each node.

As I understand it, that means node 1 and node 2 each own 256 token ranges.

Now I have some data in the table. Let's say 5 rows are present, each tagged with either 'datastax' or 'cassandra'.

When I run the command to find out which node holds the data for a given partition key (tag) value, I receive the IP address of the same node for 'cassandra' as well as 'datastax'. Why?

nodetool getendpoints killrvideos videos_by_tag 'cassandra'

> returns Node 1 IP

nodetool getendpoints killrvideos videos_by_tag 'datastax'

> returns Node 1 IP

Why is it the same IP? Since the partition key values are different, shouldn't the data be partitioned across both nodes 1 and 2?


1 Answer

smadhavan answered smadhavan commented

@pranali.khanna101994_189965, in short, your question is: why does getendpoints return node 1 for both partition key values, cassandra and datastax? A partition is placed according to its token, which is the hash of the partition key value, and the two values may well have hashed into token ranges owned by the same node. With only two nodes, each owning roughly half of the ring, that is not unusual. Also, this command lists all of the replicas for the given partition key, so the number of IPs returned depends on the keyspace's replication factor.

For example, I have a four-node, two-DC cluster where the keyspace holding the videos_by_tag table has an RF of 2 in each DC.

CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': '2', 'DC2': '2'}  AND durable_writes = true;

cqlsh> desc table test.videos_by_tag;

CREATE TABLE test.videos_by_tag (
    tag text,
    added_date timestamp,
    videoid uuid,
    name text,
    preview_image_location text,
    tagged_date timestamp,
    userid uuid,
    PRIMARY KEY (tag, added_date, videoid)
) WITH CLUSTERING ORDER BY (added_date ASC, videoid ASC);

which contains the following data,

cqlsh:test> select * from videos_by_tag ;

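 tag       | added_date                      | videoid                              | name  | preview_image_location | tagged_date                     | userid
-----------+---------------------------------+--------------------------------------+-------+------------------------+---------------------------------+--------------------------------------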
  datastax | 2020-05-31 00:00:00.000000+0000 | ea2eacbc-a4a2-4340-a4e1-ce2e66955eae | name2 |                preview | 2020-05-31 00:00:00.000000+0000 | 7f6156ab-c87c-4e68-9779-cb09329d3698
 cassandra | 2020-05-31 00:00:00.000000+0000 | 947fdf95-babf-4602-a1cc-a7cece6eb1f8 | name1 |                preview | 2020-05-31 00:00:00.000000+0000 | 8d1eeb43-4b78-4052-a4f1-a44113d4622a

(2 rows)

Now, when I invoke the getendpoints for the two different tags, I get the following (notice all 4 nodes listed),

$ nodetool getendpoints test videos_by_tag datastax
$ nodetool getendpoints test videos_by_tag cassandra
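
To make the hashing point concrete, here is a minimal Python sketch of a token ring. It is not Cassandra's actual code: the helper names `token`, `build_ring` and `endpoints` are made up for illustration, MD5 stands in for the default Murmur3 partitioner, the vnode positions are derived by hashing instead of being chosen at random, and datacenters and racks are ignored (so it behaves more like SimpleStrategy than NetworkTopologyStrategy).

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Hash a key to a 64-bit integer (MD5 as a stand-in for Murmur3)."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


def build_ring(nodes, num_tokens=256):
    """Give every node num_tokens positions (vnodes) on the ring."""
    return sorted(
        (token(f"{node}#{i}"), node) for node in nodes for i in range(num_tokens)
    )


def endpoints(ring, key, rf=1):
    """Walk clockwise from the key's token, collecting rf distinct nodes."""
    tokens = [t for t, _ in ring]
    # The first vnode whose token is >= the key's token owns the key.
    start = bisect.bisect_left(tokens, token(key))
    wanted = min(rf, len({node for _, node in ring}))
    owners = []
    for step in range(len(ring)):
        if len(owners) == wanted:
            break
        node = ring[(start + step) % len(ring)][1]  # wrap around the ring
        if node not in owners:
            owners.append(node)
    return owners


# Two nodes with the default 256 vnodes each, as in the question.
ring = build_ring(["node1", "node2"])
for tag in ("cassandra", "datastax"):
    print(tag, "->", endpoints(ring, tag, rf=1))

# With rf equal to the number of nodes, every node is a replica.
print("cassandra, rf=2 ->", endpoints(ring, "cassandra", rf=2))
```

With rf=1 and two nodes, any two tags end up on the same node roughly half of the time, purely by chance, because each node owns about half of the ring. With rf equal to the number of nodes, every node is returned for every key, which is the situation in my four-node cluster with an RF of 2 in each of the two DCs.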
