Question

Sergio asked · Erick Ramirez edited

Cassandra 3.11.4 Unbalanced cluster problem

Hello guys!

Given the following `nodetool status` output, how can I solve this unbalanced cluster problem?


Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load        Tokens  Owns  Host ID                               Rack
UN  10.1.20.49   860.9 GiB   256     ?     be5a0193-56e7-4d42-8cc8-5d2141ab4872  us-east-1a
UN  10.1.30.112  299.56 GiB  256     ?     e5108a8e-cc2f-4914-a86e-fccf770e3f0f  us-east-1b
UN  10.1.19.163  741.8 GiB   256     ?     3c2efdda-8dd4-4f08-b991-9aff062a5388  us-east-1a
UN  10.1.26.181  373.72 GiB  256     ?     0a8f07ba-a129-42b0-b73a-df649bd076ef  us-east-1b
DN  10.1.17.213  572.24 GiB  256     ?     71563e86-b2ae-4d2c-91c5-49aa08386f67  us-east-1a
UN  10.1.31.60   499.07 GiB  256     ?     3647fcca-688a-4851-ab15-df36819910f4  us-east-1b
UN  10.1.25.206  489.81 GiB  256     ?     f43532ad-7d2e-4480-a9ce-2529b47f823d  us-east-1b


1) How do I find how many partitions there are per node?
2) How can I identify why all my records end up on that node?

I tried to run nodetool cleanup but it gave me a SocketTimeoutException after a while.
I also got the following warnings:

WARN  18:19:25 Only 61.739GiB free across all data volumes. Consider adding more capacity to your cluster or removing obsolete snapshots
WARN  18:19:25 index_interval has been deprecated and should be removed from cassandra.yaml



Thanks

virtual nodes

1 Answer

Erick Ramirez answered · Erick Ramirez commented

@Sergio you didn't specify what the replication factor is for the application keyspace(s). If the app keyspace has a replication factor of 1 then there is only 1 copy of the data and it will be randomly distributed around the ring so each node will hold a random amount of data.
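If you're not sure what it is set to, you can check the keyspace's replication settings from cqlsh. A minimal sketch (`my_keyspace` is a placeholder for your app keyspace):

```
# Show the replication strategy and factor for a keyspace
# ("my_keyspace" is a placeholder -- substitute your own).
cqlsh -e "DESCRIBE KEYSPACE my_keyspace" | grep -i replication
```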

Assuming that the app keyspace has a replication factor of 3, the load of each node in your cluster will be unbalanced because the topology is unbalanced: 3 nodes in rack us-east-1a vs 4 nodes in us-east-1b. With NetworkTopologyStrategy, each rack stores at least one full copy of the data, so the 3 nodes in us-east-1a have to share roughly the same volume of data as the 4 nodes in us-east-1b. This isn't necessarily the only reason, but it is the main one.

If you want the load to be balanced across all nodes, we recommend that you have a balanced topology where:

  • the number of racks is a multiple of the replication factor, i.e. 3, 6, 9 and so on
  • the number of nodes in each rack is identical.

How do I find how many partitions there are per node?

Use the "number of partitions" statistic from the nodetool tablestats output for all tables to get an estimate on each node.

How can I identify why all my records end up on that node?

Partitions are placed on nodes by the partitioner based on the token equivalent of the hashed partition key. For example, let's say node A owned the token range 0-1000. If partition key "John" had a token value of 567, then it would be stored on node A (token 567 is within the range 0-1000). It would take too long to explain it fully here, so let me refer you to Consistent hashing and Partitioners for details.
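If you want to check which nodes own a specific partition, nodetool can tell you directly. A minimal sketch, assuming a keyspace `my_keyspace` with a table `users` keyed on a text column (all three names are placeholders for your own schema):

```
# List the replica nodes that own the partition with key 'John'
# (my_keyspace, users and John are placeholder names).
nodetool getendpoints my_keyspace users John
```

Cheers!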

5 comments

Sergio commented:

Yes, it is a replication factor of 3. Should I add an extra node in us-east-1a?

Erick Ramirez commented:

@Sergio that would still be unbalanced because you would still only have 2 racks:

...we recommend that you have a balanced topology where:
- the number of racks is a multiple of the replication factor, i.e. 3, 6, 9
Sergio commented:

Oh, the number of racks should be a multiple of the replication factor.
I am wondering whether keeping only one rack would solve the problem.
Could I stop the nodes in a rolling fashion, change the rack, and restart?
I would keep 1 rack and 9 nodes to be fully balanced. Does that make sense?
Or maybe I can have 3 racks with 3 nodes per rack.
I also have the same keyspace replicated on another datacenter with the same configuration.
I use one datacenter for writes and the other one for reads. What's the best approach?

Sergio commented:

Thanks, @Erick Ramirez. Yes, the replication factor is 3.

Do you think there is any way to keep a number of nodes that is not a multiple of 3 and still avoid such a large discrepancy in load between nodes?
Should I change the number of vnodes assigned per node and restart the Cassandra node if I want to rebalance without spinning up a new node?

Thank you

Erick Ramirez commented:

I didn't realise you posted updates to this question so apologies for the really late response.

You can choose to have whatever number of nodes in each rack but you can't expect the data distribution to be balanced. That's just maths. :)

You cannot change the number of tokens allocated to a node when it is already part of the cluster. You have to decommission it, wipe its contents clean, then add it back to the cluster as if it were a brand new server.
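Roughly, the cycle looks like this. A sketch only -- the data paths below are the package defaults and may differ in your installation:

```
# On the node being rebuilt:
nodetool decommission          # stream its data to the remaining replicas
# stop the Cassandra service, then wipe the old state:
sudo rm -rf /var/lib/cassandra/data/* \
            /var/lib/cassandra/commitlog/* \
            /var/lib/cassandra/saved_caches/*
# adjust num_tokens in cassandra.yaml if desired, then start Cassandra
# again so the node bootstraps as if it were a brand new member.
```

Cheers!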
