question

krishna_r_nutulapati_81343 avatar image
krishna_r_nutulapati_81343 asked Erick Ramirez edited

Multi DC Environment -coordinator role meeting CL level of quorum.

In Multi DC Environment, for write request, if CL level is quorum, will coordinator node update other DC Nodes as well , till CL reaches quorum OR coordinator node hand over data to any specific node of other DC. If coordinator node hand over data to any node of other DC, then how will coordinator node respond to client, immediately after CL reaches quorum. (How will any node of other DC respond to original coordinator immediately after CL at cluster level is reached).

consistency level
10 |1000

Up to 8 attachments (including images) can be used with a maximum of 1.0 MiB each and 10.0 MiB total.

ben.krug_85176 avatar image
ben.krug_85176 answered

With a CL of QUORUM, the coordinator will attempt to send the mutation to all replicas, and wait for an ack from a quorum, whether the cluster has one DC or multiple. If it doesn't get the acks soon enough, it will send a timeout to the client. (As in any distributed update, a timeout could mean your update was applied at CL, or it wasn't, but the coordinator can't say whether it was.)

Of course, this can lead to timeouts if there's much inter-DC latency, which is why there's also a LOCAL_QUORUM choice.

Share
10 |1000

Up to 8 attachments (including images) can be used with a maximum of 1.0 MiB each and 10.0 MiB total.

Lewisr650 avatar image
Lewisr650 answered

CL of Quorum is a global consistency check. So, if you have 2 DCs and each DC has a Replication Factor of 3, that means that Quorum requires 4 acknowledgements by the nodes that the transaction has been received. If you constrain that to the local DC using Local_Quorum then the coordinator only requires 2 acknowledgements from the nodes in the local DC. All replication is a Coordinator to Coordinator hand off and each coordinator will validate consistency for the DC based on the level of consistency level checking you provide. The coordinator maintaining the connection for the transaction will wait until he receives acknowledgments from the number of nodes identified by the consistency level before moving forward with any response to the client request.

Share
10 |1000

Up to 8 attachments (including images) can be used with a maximum of 1.0 MiB each and 10.0 MiB total.

krishna_r_nutulapati_81343 avatar image
krishna_r_nutulapati_81343 answered krishna_r_nutulapati_81343 published

Thanks for your prompt response.

Understood that, original coordinator maintains connection, till it receives response from other DC's till CL level reaches quorum. However. Consider following scenario.

DC1 = 4 Nodes, 2 replicas

DC2 = 4 Nodes, 2 replicas

Total replicas in cluster = 4 CL = 3 (QUORAM)

Request is sent to DC1 - NODE1.

Option 1 ) Does Node1 Sends request to 2 replicas of DC1 as well as 2 replicas of DC2 and keep connection open till it receive and validate at-least 3 responses (Very straight forward and no confusion here)

OR

Option 2) Does Node1(Original coordinator) sends request to 2 replicas of DC1 and any one node of DC2, so that, that node of DC2 acts a coordinator for DC2 and send request to qualified replicas of DC2. (Document is closely matching with this possibility).

If option 2 is correct, then when will coordinator node of DC2 will respond to original coordinator node of DC1?

If original coordinator of DC1 expecting response from only one replica of DC2, but if coordinator of DC2 wait till it get responses from 2 replicas of DC2 and the respond to DC1, is it not increasing latency? (I believe this is not the case, but just trying understand actual behavior). How does DC2 coordinator . node know, when to respond to DC1 coordinator node . In other words, How does DC2 Coordinator node know, how many responses DC1 coordinator node is expecting . (If DC1 replicas respond fast, it may have little dependency on DC2 other wise more).

Appreciate your response.


Share
10 |1000

Up to 8 attachments (including images) can be used with a maximum of 1.0 MiB each and 10.0 MiB total.