Gangadhara M.B asked Erick Ramirez edited

How do we know when to run repairs?

Is there any logic, method, program, or script to find out when we really need to run a repair on a keyspace or table?

In general, if we say we need to schedule periodic repairs on each node, the business asks us how we determine when a keyspace or table needs a repair.

We don't have any monitoring tools like OpsCenter or Prometheus; we depend entirely on nodetool or other command-line tools, if any exist.


1 Answer

Erick Ramirez answered Erick Ramirez commented

You need to run repairs regularly, at least once every gc_grace_seconds. By default, tables are configured with a gc_grace_seconds of 10 days (864,000 seconds). If you check each table's schema, you'll see this as one of the options:

    AND gc_grace_seconds = 864000
    ... ;
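As a rough sketch of how this could be checked from the command line only: the first function below lists gc_grace_seconds per table, assuming cqlsh can reach the node and the cluster runs Cassandra 3.0 or later (where the system_schema.tables table exists). The second function is plain arithmetic on a last-repair timestamp that you would have to record yourself; both function names are hypothetical.

```shell
#!/bin/sh
# List gc_grace_seconds for every table in one keyspace.
# Assumption: cqlsh is on the PATH and can connect to the local node.
list_gc_grace() {
    keyspace="$1"
    cqlsh -e "SELECT table_name, gc_grace_seconds FROM system_schema.tables WHERE keyspace_name = '$keyspace';"
}

# Succeed (exit status 0) when the time since the last repair has reached
# or exceeded gc_grace_seconds.
# Arguments: last repair time (epoch seconds), gc_grace_seconds, current time (epoch seconds).
# Assumption: the last repair time is something you log yourself after each run.
repair_overdue() {
    last_repair_epoch="$1"
    gc_grace_seconds="$2"
    now_epoch="$3"
    [ $(( now_epoch - last_repair_epoch )) -ge "$gc_grace_seconds" ]
}
```

For example, `repair_overdue "$last" 864000 "$(date +%s)" && echo "repair is overdue"` would flag a table on the default setting once 10 days have passed since the recorded repair.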

Running repairs isn't a matter of when you need them or not -- you should just run them regularly. We recommend that you run the following repair in a rolling fashion, one node at a time:

$ nodetool repair -pr

This command will only repair the token ranges for which the node is the primary owner so it's very important to run it on ALL nodes in ALL data centres, one node at a time so as not to overload your cluster.
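A minimal sketch of what "rolling, one node at a time" could look like as a script, assuming passwordless SSH to every node and nodetool on each node's PATH. The function name, the host names, and the REPAIR_RUNNER variable are hypothetical; setting REPAIR_RUNNER to echo gives a dry run that only prints the commands.

```shell
#!/bin/sh
# Command used to reach each node. Defaults to ssh; set to "echo" for a dry run.
REPAIR_RUNNER="${REPAIR_RUNNER:-ssh}"

# Run a primary-range repair on each host in turn, never in parallel.
# Arguments: the host names of ALL nodes in ALL data centres.
rolling_repair() {
    for host in "$@"; do
        echo "repairing primary ranges on $host"
        # Stop at the first failure so a problem node is noticed
        # before the loop moves on.
        if ! "$REPAIR_RUNNER" "$host" nodetool repair -pr; then
            echo "repair failed on $host, stopping" >&2
            return 1
        fi
    done
}
```

It could be invoked as `rolling_repair node1 node2 node3` from cron or by hand. Because each call waits for the previous node to finish, only one node is repairing at any time.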

This blog post by Jeremiah Jordan explains it in great detail -- Repair in Cassandra. I also recommend this section of the docs for more info -- When to run anti-entropy repair.

You don't have to manually manage repairs. If you are running open-source Apache Cassandra, we recommend you use the Cassandra Reaper tool for scheduling repairs in your cluster. It is open-source and free to use. Cheers!


Gangadhara M.B commented:

Q1) What about cases where none of the nodes in the Cassandra cluster is ever expected to go down, and we assume no missing or corrupted data, since cloud providers like AWS guarantee an SLA of 99.95% uptime?

Q2) Do we still need to run a repair at least once within gc_grace_seconds even when we have a guaranteed SLA of 99.95% uptime for the EC2 instances?

Q3) What scenarios can cause data inconsistency between the replica nodes, given that no nodes are expected to go down (99.95% uptime SLA) and there is no real data corruption?

If we really need to run repairs, then I will consider going with Reaper.

Erick Ramirez replied to Gangadhara M.B:

Yes, you still need to run repairs regardless. It's a must in Cassandra.

As a side note, you would have noticed that I keep converting your posts in various questions to comments since they're not "answers". Please do not post your questions as "answers" since they will be confusing to other users. Cheers!
