
Asked by Gangadhara M.B

Should the number of nodes be multiples of replication factor?

Three months ago we built a 33-node cluster (AWS EC2), and each application-specific keyspace has a replication factor of 3.

All 33 nodes are in a single region, US-WEST-2, spread across multiple AZs: 11 nodes in rack usw2a, 11 nodes in rack usw2b and 11 nodes in rack usw2c.

The current load (CPU/memory) and disk usage on this cluster are very low, so we decided to decommission 9 nodes from the cluster.

After we decommission those 9 nodes, the cluster will have 24 nodes.

My question: I read somewhere that the number of nodes should be a multiple of the replication factor. Is this correct?

If that is correct, we have a problem: we will end up with 8 nodes in each rack, which is not a multiple of the replication factor.

Is it acceptable (or recommended) to have 8 nodes in each rack in a production cluster, even though 8 is not a multiple of the replication factor? Or should we always keep the number of nodes in each rack a multiple of the replication factor in production?

Comment from sunilrpawar7_183464:

It is recommended that the number of nodes in each rack be greater than or equal to the replication factor of the cluster, so that data is distributed evenly across the cluster.

I haven't come across any document that says the number of nodes should be a multiple of the replication factor. Can you share the link if you have one?

Answer from Erick Ramirez:

No, that isn't correct. I suspect you are referring to my responses in other questions such as #1128.

What I said was:

... the number of racks [should be] a multiple of the replication factor ...

For clarity, I was talking about achieving a balanced distribution of load (data size) across nodes in the cluster. To achieve this, I recommend a balanced topology where:

  • the number of racks is a multiple of the replication factor (for example, 3, 6 or 9 racks and so on), and
  • the number of nodes in each rack is identical.
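The two rules above can be written as a quick check. This is a minimal Python sketch, not part of any Cassandra tool; `is_balanced_topology` and its input format (a dict mapping rack name to node count for one data center) are hypothetical. Note that neither rule requires the node count per rack to be a multiple of the replication factor, so 8 nodes in each of 3 racks passes.

```python
def is_balanced_topology(nodes_per_rack, rf):
    """Check the two balanced-topology rules for one data center.

    nodes_per_rack maps a rack name to its node count (hypothetical input format).
    """
    rack_count = len(nodes_per_rack)
    # Rule 1: the number of racks is a multiple of the replication factor.
    racks_multiple_of_rf = rack_count > 0 and rack_count % rf == 0
    # Rule 2: every rack has the same number of nodes.
    same_size_racks = len(set(nodes_per_rack.values())) == 1
    return racks_multiple_of_rf and same_size_racks


before = {"usw2a": 11, "usw2b": 11, "usw2c": 11}
after = {rack: count - 3 for rack, count in before.items()}  # remove 3 per rack
print(is_balanced_topology(before, rf=3))  # True
print(is_balanced_topology(after, rf=3))   # True: 8 nodes in each of 3 racks
print(is_balanced_topology({"usw2a": 11, "usw2b": 8, "usw2c": 5}, rf=3))  # False
print(is_balanced_topology({f"rack{i}": 3 for i in range(8)}, rf=3))  # False: 8 racks
```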

In your case, the number of racks (3) is already a multiple of the replication factor (also 3), so the first condition holds.

If you want a balanced distribution of data, decommission an equal number of nodes from each rack: take away 3 nodes each from usw2a, usw2b and usw2c, leaving 8 nodes in every rack. Cheers!
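To illustrate why equal rack sizes matter, here is a minimal Python simulation. It is a simplified model loosely based on rack-aware placement in NetworkTopologyStrategy, not Cassandra's actual code. It assumes a single data center, one evenly spaced token per node (no vnodes) and racks interleaved round-robin around the ring; `build_ring`, `replicas_for_range` and `load_per_node` are hypothetical helpers. With RF 3 and 3 racks, every range gets exactly one replica in each rack, so each rack stores a full copy of the data. A rack with fewer nodes therefore means more data per node in that rack.

```python
from collections import Counter


def build_ring(nodes_per_rack):
    """Hypothetical ring: one evenly spaced token per node, racks interleaved."""
    queues = {rack: [f"{rack}-{n}" for n in range(count)]
              for rack, count in nodes_per_rack.items()}
    ring = []  # list of (node, rack) in clockwise token order
    while any(queues.values()):
        for rack, queue in queues.items():
            if queue:
                ring.append((queue.pop(0), rack))
    return ring


def replicas_for_range(ring, owner, rf, num_racks):
    """Replicas for the token range owned by ring[owner].

    Walk clockwise from the owner, taking the first node seen in each rack;
    a rack is reused only when the data center has fewer racks than RF.
    """
    replicas, seen_racks = [], set()
    repeats_left = rf - num_racks  # <= 0 means never reuse a rack
    for step in range(len(ring)):
        node, rack = ring[(owner + step) % len(ring)]
        if rack not in seen_racks:
            seen_racks.add(rack)
            replicas.append(node)
        elif repeats_left > 0:
            repeats_left -= 1
            replicas.append(node)
        if len(replicas) == rf:
            break
    return replicas


def load_per_node(ring, rf):
    """Number of (equal-sized) token ranges each node stores a replica of."""
    num_racks = len({rack for _, rack in ring})
    load = Counter()
    for owner in range(len(ring)):
        load.update(replicas_for_range(ring, owner, rf, num_racks))
    return load


for label, layout in [("8/8/8", {"usw2a": 8, "usw2b": 8, "usw2c": 8}),
                      ("11/8/5", {"usw2a": 11, "usw2b": 8, "usw2c": 5})]:
    load = load_per_node(build_ring(layout), rf=3)
    print(label, "min", min(load.values()), "max", max(load.values()))
# 8/8/8 min 3 max 3
# 11/8/5 min 1 max 12
```

Under these assumptions, removing 3 nodes from each rack (8/8/8) gives every node the same share, while removing the same 9 nodes unevenly (11/8/5) leaves the most loaded node holding 12 ranges versus 1 for the least loaded.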

[UPDATE] I had a conversation about my answer in this post and I wanted to clarify the motivation for my comments.

To be clear:

  • The point I wanted to get across is that you shouldn't have a replication factor of 3 with 8 racks, because you will end up with unbalanced data.
  • We do not recommend configuring lots of Cassandra racks: 3 at the most for an RF of 3.
  • Our recommendation is to configure a single Cassandra rack for simplicity.
