We have requirements to create Cassandra cluster with 2 datacenters and each having 2 nodes.
What would be the best replication factor here and why
Can I go ahead with Replication factor 2 for each datacenter
Hi @somrcha1_186924,
with Cassandra, you replicate data for fault tolerance and reliability.
With only two nodes per datacenter, you don't have much choice: if you want to achieve some resilience against nodes being unresponsive, you should go for a replication factor of 2 for each datacenter.
It also depends on the consistency level at which you want to read and write your data. If you read and write with local consistency levels higher than ONE, you lose that failure tolerance, because all nodes in the local datacenter will need to be up.
If you need strong local consistency, you will need to read at LOCAL_QUORUM, which is two in a two-node datacenter, which again means that all your nodes need to be up.
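To make the trade-off concrete, here is a minimal sketch (not from the thread, just standard Cassandra arithmetic) of how the quorum size and the number of tolerable node failures follow from the per-datacenter replication factor:

```python
# Quorum math per datacenter: quorum = floor(RF / 2) + 1.
# A DC can lose (RF - quorum) replicas and still serve
# LOCAL_QUORUM reads and writes.

def local_quorum(rf: int) -> int:
    """Number of replicas that must respond for LOCAL_QUORUM."""
    return rf // 2 + 1

def tolerated_failures(rf: int) -> int:
    """Replicas that can be down while LOCAL_QUORUM still succeeds."""
    return rf - local_quorum(rf)

for rf in (2, 3):
    print(f"RF={rf}: quorum={local_quorum(rf)}, "
          f"tolerated failures={tolerated_failures(rf)}")
# RF=2: quorum=2, tolerated failures=0
# RF=3: quorum=2, tolerated failures=1
```

This is why RF 2 gives replication for durability but no availability headroom at LOCAL_QUORUM: both replicas must be up.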
To read more about consistency levels, see here:
https://docs.datastax.com/en/cassandra-oss/3.0/cassandra/dml/dmlConfigConsistency.html
I hope this helps with your question.
Thanks a lot, Bettina. It helps.
I am planning a single-datacenter configuration in Cassandra with 4 nodes.
We have 2 physical data centers just 25 miles apart with less than 2 ms latency. We are going to put 2 nodes in each physical data center.
I am primarily thinking of replication factor 3 and consistency level QUORUM.
Please let me know your thoughts on this.
In case we face latency issues (because of the physically separate data centers), we would go ahead with 3 nodes each in 2 datacenters configured in Cassandra.
To take Bettina's response further, two nodes per DC is insufficient to support high availability and fault tolerance.
We recommend a minimum of three nodes per DC in production, along with a strong consistency level of LOCAL_QUORUM,
which is applicable in most cases. With three nodes in each DC, this strong consistency can tolerate an outage of one replica.
As Bettina stated, two nodes per DC cannot tolerate an outage, and operating on a single node effectively halves the capacity of your cluster, so the application's performance is also severely degraded.
In addition, I wouldn't recommend spanning nodes across physical DCs if latency between the physical sites is a concern. It's best to set up each physical DC as a Cassandra DC.
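As a sketch of what that recommendation looks like in CQL (the keyspace and datacenter names here are assumptions for illustration; the DC names must match those reported by `nodetool status`):

```sql
-- Hypothetical keyspace with RF 3 in each Cassandra DC.
-- 'DC1' and 'DC2' are placeholder datacenter names.
CREATE KEYSPACE my_app
  WITH replication = {
    'class': 'NetworkTopologyStrategy',
    'DC1': 3,
    'DC2': 3
  };
```

With this layout, LOCAL_QUORUM needs 2 of 3 replicas, so each DC can lose one node and keep serving strongly consistent reads and writes.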
For more info, see the Cassandra doc on Data replication. Cheers!
Hi Erick,
Thanks a lot. It helps!!
2 DCs with 3 nodes per DC was my actual proposal, but we had to deviate (2 DCs with 2 nodes per DC, etc.) due to budget constraints.
I am currently planning a single DC with 4 nodes in a single rack. If we suffer from latency during testing, we will probably reconsider our original recommendation.
[Follow up question posted in #9110]
Regards
Somraj