Sergio asked Erick Ramirez edited

The default values are dclocal_read_repair_chance = 0.1 and read_repair_chance = 0. Should I set both to 0?

I have a column family in a keyspace with replication factor 3.
The client reads it with LOCAL_QUORUM. Does this mean that every read should kick off a read repair, or not?
Are the dclocal_read_repair_chance and read_repair_chance parameters meaningful only with the LOCAL_ONE or ONE consistency levels, then?

I also have an application that converts some data to SSTable format and prepares it to be streamed to the cluster with sstableloader.
This operation is done to update the column family mentioned above.
Can I avoid repairing the column family, since the clients are using the LOCAL_QUORUM consistency level?
If I use LOCAL_ONE, should I repair the table with Reaper, or can I skip the repair as long as all the nodes are up and running?
Reading the most recent data is not a concern for me, and I believe stale reads should be really unlikely, so even with LOCAL_ONE I think I could skip running repair with Reaper.
I would like to achieve consistency, if possible without running an expensive repair cycle with Reaper.

What do you think about it?

Reference: SLIDE 85. Not repair everything.

Thanks everyone!

Have a great weekend!


1 Answer

Lewisr650 answered Sergio commented

read_repair_chance and dclocal_read_repair_chance are table options, not something derived from the compaction strategy. They set the probability that a read also queries the replicas beyond those required by the consistency level (all replicas, or only those in the local datacenter) and repairs any inconsistencies it finds in the background. So, with the default dclocal_read_repair_chance = 0.1, about 10% of reads are checked against every replica in the local datacenter. With LOCAL_ONE the coordinator waits for a single replica response, so an ordinary read has no other replica to compare against; a read repair can only happen on the fraction of reads where these settings bring in additional replicas. To identify a read repair opportunity you need at least 2 replica responses to compare, which is why a LOCAL_QUORUM read can detect and repair a mismatch between the replicas it contacts regardless of these settings.

Repair is not so critical upon loading the data. Repair is an opportunity to validate consistency of replicas across the cluster. The critical constraint is that each repair completes in less than gc_grace_seconds, so that tombstones reach every replica before they become eligible for removal; otherwise deleted data can reappear. Trying to avoid repair operations exposes you to serving stale data. Repair and compaction are regular data management processes that should not be avoided.
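As a rough illustration of the gc_grace_seconds constraint, here is a minimal Python sketch. The function name and its parameters are hypothetical, and the worst-case model (gap between repairs plus the time one repair takes) is a simplifying assumption; 864000 seconds (10 days) is Cassandra's default gc_grace_seconds.

```python
# Hypothetical helper, not part of Cassandra or Reaper.
DEFAULT_GC_GRACE_SECONDS = 864_000  # Cassandra's default: 10 days
SECONDS_PER_DAY = 86_400


def repair_fits_gc_grace(days_between_repairs: float,
                         repair_duration_days: float,
                         gc_grace_seconds: int = DEFAULT_GC_GRACE_SECONDS) -> bool:
    """Return True if a repair started on schedule still finishes
    before gc_grace_seconds has elapsed since the previous one began."""
    worst_case_seconds = (days_between_repairs + repair_duration_days) * SECONDS_PER_DAY
    return worst_case_seconds < gc_grace_seconds


if __name__ == "__main__":
    # Weekly repair that takes about one day, default gc_grace_seconds.
    print(repair_fits_gc_grace(7, 1))
    # Repair every 10 days that takes about one day.
    print(repair_fits_gc_grace(10, 1))
```

If the check fails, the usual options are to repair more often or to raise gc_grace_seconds on the table.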

If you are writing with LOCAL_QUORUM and reading with LOCAL_ONE, you open yourself to a window of opportunity to serve stale data. The best starting pattern is to write with LOCAL_QUORUM (or QUORUM) and read with LOCAL_QUORUM. If you need to read with LOCAL_ONE for latency reasons, then you should write with ALL (Cassandra has no LOCAL_ALL level). Vice versa, if you need to write data with low latency, then you should write with LOCAL_ONE but read with ALL. Consistency management is tunable based on application requirements (read-heavy versus write-heavy workloads).
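The pairing of read and write levels above follows the usual overlap rule: a read is guaranteed to touch at least one replica that acknowledged the latest write when R + W > RF. Here is a minimal Python sketch of that arithmetic. The function names are hypothetical, and it assumes a single datacenter so that LOCAL_QUORUM and QUORUM count the same replicas.

```python
# Hypothetical helpers for illustration; single-datacenter assumption.

def quorum(replication_factor: int) -> int:
    """Replicas needed for a quorum: a strict majority."""
    return replication_factor // 2 + 1


def replicas_for(level: str, replication_factor: int) -> int:
    """Replicas that must respond at the given consistency level."""
    counts = {
        "ONE": 1,
        "LOCAL_ONE": 1,
        "QUORUM": quorum(replication_factor),
        "LOCAL_QUORUM": quorum(replication_factor),
        "ALL": replication_factor,
    }
    return counts[level]


def reads_see_latest_write(write_level: str, read_level: str,
                           replication_factor: int) -> bool:
    """True when the read and write replica sets must overlap (R + W > RF)."""
    w = replicas_for(write_level, replication_factor)
    r = replicas_for(read_level, replication_factor)
    return r + w > replication_factor


if __name__ == "__main__":
    rf = 3
    for write_level, read_level in [
        ("LOCAL_QUORUM", "LOCAL_QUORUM"),
        ("LOCAL_QUORUM", "LOCAL_ONE"),
        ("ALL", "LOCAL_ONE"),
        ("LOCAL_ONE", "ALL"),
    ]:
        ok = reads_see_latest_write(write_level, read_level, rf)
        print(f"write={write_level:<12} read={read_level:<12} overlap={ok}")
```

With RF = 3, only the LOCAL_QUORUM write plus LOCAL_ONE read pair fails the check (2 + 1 is not greater than 3), which is the stale-read window described above.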

You can review the documentation here:


Sergio commented:

Yes, I agree with what you said, but I was also watching a talk that suggested not repairing everything.

So I was wondering whether I can avoid running repair on column families where the clients read and write with LOCAL_QUORUM, since the repair is going to be performed by Cassandra behind the scenes.
Thanks again for your reply.
