teriksson asked Erick Ramirez answered

Why does C* do read repair on both DCs instead of doing DC2 in the background?

We have two DCs with 20+ Cassandra nodes in each, and the data has RF=3.

From time to time we read too quickly after a write, so not all 3 replicas have the data yet, and this triggers a READ REPAIR.

Somehow Cassandra is configured to do a READ REPAIR to all DCs, so DC1 and DC2 in my case.

Occasionally something goes wrong here and the READ REPAIR times out.

1. How can I figure out why that READ REPAIR timed out? Which node did not respond, or was there some other reason?

2. Is it possible to do the READ REPAIR only in DC1, and then let Cassandra sync up with DC2 in the background?

read repair

1 Answer

Erick Ramirez answered

Read repairs are generally carried out on the replicas involved in the read request. This means they only apply to reads with a consistency level (CL) higher than ONE or LOCAL_ONE: at those two levels only one replica is involved, so no digest request is sent to other replicas and there is nothing to compare.

For example with a replication factor (RF) of 3 and a read consistency level of LOCAL_QUORUM, 2 replicas in the local DC are contacted by the coordinator. The coordinator (a) requests the data from one replica node, and (b) sends a digest request to another replica -- (c) the request is NOT sent to all replicas.

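In your case, the likely scenario is that you are reading with a CL of QUORUM. With 3 replicas in each DC there are 6 replicas in total, so a quorum is 4 replicas. These 4 replicas can come from either DC, and since the local DC only holds 3 of them, at least one replica in the remote DC is always involved in the read.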
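The arithmetic behind those numbers can be checked with a small sketch. It assumes the topology described in the question (two DCs, RF=3 in each) and the usual quorum formula of floor(replicas / 2) + 1; the function and variable names are made up for illustration.

```python
def quorum(replicas: int) -> int:
    """Replicas that must respond for a quorum: floor(replicas / 2) + 1."""
    return replicas // 2 + 1


# Assumed topology from the question: two DCs with RF=3 in each.
rf_per_dc = {"DC1": 3, "DC2": 3}

# QUORUM counts replicas across all DCs.
quorum_needed = quorum(sum(rf_per_dc.values()))

# LOCAL_QUORUM counts only replicas in the coordinator's DC (DC1 here).
local_quorum_needed = quorum(rf_per_dc["DC1"])

print("QUORUM needs", quorum_needed, "replicas")
print("LOCAL_QUORUM needs", local_quorum_needed, "replicas")
```

Because QUORUM needs more replicas than DC1 holds on its own, the read (and any read repair it triggers) cannot stay inside one DC, whereas a LOCAL_QUORUM read can.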

In addition to the above, there are table options which determine whether read repairs are triggered:

  • dclocal_read_repair_chance - probability that a read operation triggers a read repair on replicas in the same DC
  • read_repair_chance - probability that a read operation triggers a read repair on replicas in all DCs

If you have read_repair_chance set to any value other than 0, then a corresponding fraction of reads will also trigger read repairs across DCs, regardless of the consistency level.
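As a rough illustration of how the two probabilities interact, here is a simplified model. It is a sketch, not Cassandra's actual code: the function name, the return labels and the single random roll per read are assumptions made for the example.

```python
import random


def read_repair_scope(read_repair_chance, dclocal_read_repair_chance,
                      rng=random.random):
    """Hypothetical model: pick the scope of a chance-based read repair.

    One random roll in [0, 1) is compared against both table options.
    """
    roll = rng()
    if read_repair_chance > roll:
        return "GLOBAL"      # replicas in all DCs
    if dclocal_read_repair_chance > roll:
        return "DC_LOCAL"    # replicas in the coordinator's DC only
    return "NONE"            # no chance-based read repair for this read


# With read_repair_chance at 0, the scope can never be GLOBAL.
scopes = {read_repair_scope(0.0, 0.1) for _ in range(1000)}
print(scopes)
```

In this model, setting read_repair_chance to 0 while keeping dclocal_read_repair_chance above 0 confines chance-based read repairs to the local DC.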

To respond to your questions directly:

  1. The timeout is almost always caused by nodes being busy or overloaded. If this is the case for your cluster, consider increasing the capacity of your cluster by adding more nodes.
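  2. As I've explained above, this is the effect of the consistency level set for the read request. A read at QUORUM has to involve replicas in both DCs, whereas a read at LOCAL_QUORUM only involves replicas in the local DC.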

If you're interested, this topic is discussed in more detail in "How read requests are accomplished" and "Examples of read consistency levels". Cheers!
