somrcha1_186924 asked Erick Ramirez edited

What is the recommended replication factor for a 2-node x 2-DC cluster?

We have a requirement to create a Cassandra cluster with 2 datacenters, each having 2 nodes.

What would be the best replication factor here, and why?

Can I go ahead with a replication factor of 2 for each datacenter?

replication

bettina.swynnerton answered somrcha1_186924 commented

Hi @somrcha1_186924,

With Cassandra, you replicate data for fault tolerance and reliability.

With only two nodes per datacenter, you don't have much choice: if you want to achieve some resilience against nodes being unresponsive, you should go for a replication factor of 2 for each datacenter.
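As a rough illustration of that setting, here is a small Python sketch that builds the CQL statement for a keyspace with a replication factor of 2 in each datacenter. The keyspace name `app_data` and the datacenter names `dc1` and `dc2` are placeholders, not names from this thread; the datacenter names must match what your cluster reports (for example in `nodetool status`).

```python
def create_keyspace_cql(keyspace, rf_per_dc):
    """Build a CREATE KEYSPACE statement that uses NetworkTopologyStrategy.

    rf_per_dc maps a datacenter name to the replication factor for that DC.
    """
    options = ", ".join(f"'{dc}': {rf}" for dc, rf in rf_per_dc.items())
    return (
        f"CREATE KEYSPACE IF NOT EXISTS {keyspace} "
        f"WITH replication = {{'class': 'NetworkTopologyStrategy', {options}}};"
    )


# Hypothetical names: replace them with your own keyspace and datacenter names.
statement = create_keyspace_cql("app_data", {"dc1": 2, "dc2": 2})
print(statement)
```

You would run the printed statement in `cqlsh`. With two nodes per datacenter and a replication factor of 2, every node holds a full copy of the data for its datacenter.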

It also depends on the consistency level at which you want to read and write your data. If you read or write at a local consistency level higher than ONE, you lose that failure tolerance, because every node in the local datacenter needs to be up.

If you need strong local consistency, you will need to read at LOCAL_QUORUM. With a replication factor of 2, the quorum is 2, which again means that all nodes in a two-node datacenter need to be up.
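The arithmetic behind this can be sketched in a few lines of Python. Cassandra computes a quorum as the replication factor divided by two, rounded down, plus one; the function names below are only illustrative.

```python
def quorum(replication_factor):
    """Replicas that must respond for a quorum: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def tolerated_outages(replication_factor, replicas_required):
    """Replicas that can be down while a request can still succeed."""
    return replication_factor - replicas_required


for rf in (2, 3):
    at_one = tolerated_outages(rf, 1)
    at_quorum = tolerated_outages(rf, quorum(rf))
    print(
        f"RF={rf}: LOCAL_QUORUM needs {quorum(rf)} replicas, "
        f"tolerates {at_quorum} down (LOCAL_ONE tolerates {at_one})"
    )
```

For a replication factor of 2, LOCAL_QUORUM needs both replicas and tolerates no outage, while LOCAL_ONE tolerates one replica being down. For a replication factor of 3, LOCAL_QUORUM needs 2 replicas and tolerates one being down.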

To read more about consistency levels, see here:

https://docs.datastax.com/en/cassandra-oss/3.0/cassandra/dml/dmlConfigConsistency.html

I hope this helps with your question.


somrcha1_186924 commented ·

Thanks Bettina,

Thanks a lot. It helps.

I am planning a single-datacenter configuration in Cassandra with 4 nodes.

We have 2 physical data centers just 25 miles apart, with less than 2 ms latency between them. We are going to put 2 nodes in each physical data center.

I am primarily thinking of a replication factor of 3 and consistency level QUORUM.

Please let me know your thoughts on this.

If we face latency issues (because the data centers are physically apart), we would go ahead with 3 nodes in each of 2 datacenters configured in Cassandra.


Erick Ramirez answered Erick Ramirez edited

To take Bettina's response further, two nodes per DC is insufficient to support high availability and fault tolerance.

We recommend a minimum of three nodes per DC in production. We recommend using a strong consistency level of LOCAL_QUORUM which is applicable in most cases. This strong consistency can tolerate an outage to one replica when you have three nodes in each DC.

As Bettina stated, two nodes per DC cannot tolerate an outage, and operating on a single node effectively halves the capacity of your cluster, so the application's performance is also severely degraded.

In addition, I wouldn't recommend spanning nodes across physical DCs if latency between the physical sites is a concern. It's best to set up each physical DC as a Cassandra DC.

For more info, see the Cassandra doc on Data replication. Cheers!


somrcha1_186924 commented ·

Hi Erick,

Thanks a lot. It helps!

2 DCs with 3 nodes per DC was my actual proposal, but we had to deviate from it (to 2 DCs with 2 nodes per DC, etc.) due to budget constraints.

I am currently planning for a single DC with 4 nodes in a single rack. If we suffer from latency during testing, we will probably reconsider our original recommendation.

[Follow up question posted in #9110]

Regards

Somraj
