penky28_147901 asked ·

Why is there uneven data distribution in my 3-node cluster with RF=3?

Hi guys,

I have a cluster with 3 nodes, and each keyspace has a replication factor of 3. In the nodetool status output, node 1 (12.2.20.10) and node 3 (12.2.20.12) are almost the same size, but node 2 (12.2.20.11) holds less data than the other two.

I can't work out why node 2 has less data than the other nodes. Any suggestions?

Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load        Tokens  Owns (effective)  Host ID                               Rack
UN  12.2.20.10  106.94 GiB  8       100.0%            f9fe8dd2-d02e-4ec7-9af5-ba341149104c  rack1
UN  12.2.20.12  105.07 GiB  8       100.0%            24b02f44-abc9-40e0-8d62-434a8c377106  rack1
UN  12.2.20.11  96.01 GiB   8       100.0%            f9da1cc0-8c34-41b9-8f3d-dace28e3c09f  rack1
1 comment

num_tokens is set to 8.


1 Answer

Erick Ramirez answered ·

Cassandra nodes share nothing with each other and operate independently. Operations such as flushing memtables to disk and running compactions do not happen identically, or at the same moments, on different nodes.

As a result, the way data fragments are spread across SSTables on one node will be completely different from the SSTables on another node, so the reported data size (load) will differ by some margin even though every node owns 100% of the data.

As the density (data size) of the nodes increases, the amount of variance will decrease. Cheers!

2 comments

Thanks Erick. How much variance is considered acceptable? When should I start investigating further? Is there a hard limit? Currently the load difference between the largest and the smallest node is about 11 GiB, which is a variance of more than 10%.
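
For reference, the gap mentioned here can be checked with quick arithmetic on the loads from the nodetool status output in the question. This is only a rough sketch: the helper name load_spread is made up for illustration, and measuring the gap against the largest node is one possible convention, not an official Cassandra metric.

```python
# Loads in GiB, copied from the nodetool status output in the question.
loads_gib = {
    "12.2.20.10": 106.94,
    "12.2.20.12": 105.07,
    "12.2.20.11": 96.01,
}


def load_spread(loads):
    """Return (absolute gap, gap as a fraction of the largest load).

    Hypothetical helper for illustration; dividing by the largest load
    is an assumption about how to express the spread.
    """
    largest = max(loads.values())
    smallest = min(loads.values())
    gap = largest - smallest
    return gap, gap / largest


gap, fraction = load_spread(loads_gib)
print(f"gap = {gap:.2f} GiB, {fraction:.1%} of the largest node")
# prints: gap = 10.93 GiB, 10.2% of the largest node
```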


There is no "acceptable" variance, since it differs from one cluster to another. What you should be concerned about is whether the nodes are dropping mutations because they are overloaded. Cheers!
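
One way to look for dropped mutations is the dropped-message section of nodetool tpstats. The sketch below only parses text that has already been captured from that command. It assumes each dropped-message row starts with the message type followed by the dropped count, and that mutation rows start with MUTATION; the exact layout and type names vary between Cassandra versions, so treat both as assumptions. The function name and the sample text are made up for illustration and are not real output from this cluster.

```python
def dropped_mutations(tpstats_text):
    """Sum the dropped counts of rows whose message type starts with MUTATION.

    Assumes rows look like "<MESSAGE_TYPE> <dropped count> ...".
    Returns None when no such row is found.
    """
    counts = []
    for line in tpstats_text.splitlines():
        fields = line.split()
        if (
            len(fields) >= 2
            and fields[0].startswith("MUTATION")  # does not match "MutationStage"
            and fields[1].isdigit()
        ):
            counts.append(int(fields[1]))
    return sum(counts) if counts else None


# Illustrative text only, shaped like the dropped-message table.
sample = """\
Message type           Dropped
READ                         0
MUTATION                    42
"""

print(dropped_mutations(sample))  # prints: 42
```

A count that keeps growing between checks on one node is the kind of signal to investigate, rather than the load figure on its own.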
