Bringing together the Apache Cassandra experts from the community and DataStax.


Gangadhara M.B asked:

Should the number of nodes be a multiple of the replication factor?

Three months ago we built a 33-node cluster on AWS EC2. Each application-specific keyspace has a replication factor of 3.

All 33 nodes are in a single region (US-WEST-2), spread across three availability zones: 11 nodes in rack usw2a, 11 in rack usw2b and 11 in rack usw2c.

The current load (CPU/memory) and disk space usage on this cluster are very low, so we decided to decommission 9 nodes from the cluster.

After decommissioning those 9 nodes, the cluster will have 24 nodes.

My question: I read somewhere that the number of nodes should be a multiple of the replication factor. Is this correct?

If that is correct, we would end up with 8 nodes in each rack, which is not a multiple of the replication factor.

Is it acceptable, or even recommended, to have 8 nodes in each rack in a production cluster, given that 8 is not a multiple of the replication factor? Or should a production cluster always have a multiple of the replication factor as the number of nodes in each rack?

Comment:

It is recommended that the number of nodes in each rack be greater than or equal to the replication factor of the cluster, so that data is distributed evenly across the cluster.

I haven't come across any documentation saying that the number of nodes should be a multiple of the replication factor. Can you share the link if you have one?


1 Answer

Erick Ramirez answered:

No, that isn't correct. I suspect you are referring to my responses in other questions such as #1128.

What I said was:

... the number of racks [should be] a multiple of the replication factor ...

For clarity, I was talking about achieving a balanced distribution of load (data size) across nodes in the cluster. To achieve this, I recommend a balanced topology where:

  • the number of racks is a multiple of the replication factor (for example, 3, 6 or 9 racks and so on), and
  • the number of nodes in each rack is identical.
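The two conditions can be written down as a quick check. This is a minimal sketch: the function name and the rack-to-node-count mapping are hypothetical and not part of any Cassandra tool. Note that the per-rack node count only has to be the same across racks; it does not have to be a multiple of the replication factor.

```python
def is_balanced_topology(nodes_per_rack, rf):
    """Check the two conditions for a balanced single-DC topology.

    nodes_per_rack: hypothetical mapping of rack name -> number of nodes in it.
    rf: replication factor of the keyspaces in this data centre (assumed >= 1).
    """
    # Condition 1: the number of racks is a multiple of the replication factor.
    racks_ok = len(nodes_per_rack) > 0 and len(nodes_per_rack) % rf == 0
    # Condition 2: every rack has the same number of nodes (any number).
    sizes_ok = len(set(nodes_per_rack.values())) == 1
    return racks_ok and sizes_ok


print(is_balanced_topology({"usw2a": 11, "usw2b": 11, "usw2c": 11}, rf=3))  # True
print(is_balanced_topology({"usw2a": 8, "usw2b": 8, "usw2c": 8}, rf=3))     # True
print(is_balanced_topology({"usw2a": 9, "usw2b": 9, "usw2c": 6}, rf=3))     # False
```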

In your case, the number of racks (3) is a multiple of the replication factor (also 3).
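The effect of removing nodes evenly or unevenly can be illustrated with a rough simulation. This is a minimal sketch, not Cassandra's actual code: it assumes a single data centre, random tokens (16 per node), and a simplified rack-aware placement in the spirit of NetworkTopologyStrategy (walk the ring clockwise and take at most one replica per rack), with the number of racks equal to the replication factor as in this cluster. The function and node names are hypothetical.

```python
import bisect
import random
from collections import Counter


def build_ring(nodes_per_rack, vnodes=16, seed=0):
    """Give every node `vnodes` random tokens; return entries (token, node, rack) sorted by token."""
    rng = random.Random(seed)
    ring = []
    for rack, count in nodes_per_rack.items():
        for i in range(count):
            node = f"{rack}-n{i}"  # hypothetical node name
            for _ in range(vnodes):
                ring.append((rng.getrandbits(63), node, rack))
    ring.sort()
    return ring


def replicas_for(key_token, ring, tokens, rf):
    """Simplified rack-aware placement (assumes the number of racks >= rf).

    Start at the first ring token >= key_token (wrapping around) and walk
    clockwise, taking a node only if its rack does not already hold a replica.
    """
    start = bisect.bisect_left(tokens, key_token)
    chosen, used_racks = [], set()
    for step in range(len(ring)):
        _, node, rack = ring[(start + step) % len(ring)]
        if rack not in used_racks:
            chosen.append(node)
            used_racks.add(rack)
            if len(chosen) == rf:
                break
    return chosen


def mean_load_per_rack(nodes_per_rack, rf=3, keys=24_000, seed=0):
    """Place `keys` random partitions; return the average replicas per node, by rack."""
    ring = build_ring(nodes_per_rack, seed=seed)
    tokens = [t for t, _, _ in ring]
    rng = random.Random(seed + 1)
    load = Counter()
    for _ in range(keys):
        for node in replicas_for(rng.getrandbits(63), ring, tokens, rf):
            load[node] += 1
    return {
        rack: sum(load[f"{rack}-n{i}"] for i in range(n)) / n
        for rack, n in nodes_per_rack.items()
    }


even = {"usw2a": 8, "usw2b": 8, "usw2c": 8}    # 3 nodes removed from each rack
uneven = {"usw2a": 9, "usw2b": 9, "usw2c": 6}  # 2 + 2 + 5 nodes removed
print(mean_load_per_rack(even))    # {'usw2a': 3000.0, 'usw2b': 3000.0, 'usw2c': 3000.0}
print(mean_load_per_rack(uneven))  # usw2c nodes average 4000 replicas vs about 2667 elsewhere
```

In this model each of the 3 racks receives exactly one replica of every partition, so each rack stores a full copy of the data and the per-node share is that copy divided by the number of nodes in the rack. That is why 8 nodes per rack stays balanced, while a 9/9/6 split leaves the usw2c nodes carrying about 1.5 times as much data as the others.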

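If you want a balanced distribution of data, you should decommission an equal number of nodes from each rack, so take away 3 nodes each from usw2a, usw2b and usw2c. That leaves 8 nodes in each rack (24 in total), which is still a balanced topology even though 8 is not a multiple of 3. Cheers!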
