pranali.khanna101994_189965 asked

What is the difference between partitioner range repair and subrange repair?

To ease the load on a node during a full repair, we can opt for either primary (partitioner) range repair or subrange repair. What is the difference between the two?

saravanan.chinnachamy_185977 answered

@pranali.khanna101994_189965 Each node in Cassandra manages the following data:

  1. Data that belongs to the token range assigned to the node according to the token setup within the ring. This is the node's primary range, also called the "partitioner range".
  2. Replicas of the primary ranges of other nodes. These are stored to satisfy a replication factor greater than 1.
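To make the two kinds of data concrete, here is a minimal Python sketch of range ownership. It is only an illustration under stated assumptions: one token per node (no vnodes), replicas placed on the next nodes clockwise as SimpleStrategy does, and hypothetical node names. The token values are the usual evenly spaced Murmur3 tokens for a 3-node ring.

```python
from typing import Dict, List, Tuple

# A token range (start, end]: start is exclusive, end is inclusive.
Range = Tuple[int, int]


def primary_ranges(tokens: Dict[str, int]) -> Dict[str, Range]:
    """Map each node to its primary range: (previous node's token, own token]."""
    ordered = sorted(tokens.items(), key=lambda kv: kv[1])
    result: Dict[str, Range] = {}
    for i, (node, token) in enumerate(ordered):
        # For i == 0 the index -1 wraps around to the last node in the ring.
        prev_token = ordered[i - 1][1]
        result[node] = (prev_token, token)
    return result


def ranges_held(tokens: Dict[str, int], rf: int) -> Dict[str, List[Range]]:
    """List every range a node stores: its primary range first, then replicas.

    Assumption: replicas of a range go to the next rf - 1 nodes clockwise,
    so a node also stores the primary ranges of the rf - 1 nodes before it.
    """
    ordered = [node for node, _ in sorted(tokens.items(), key=lambda kv: kv[1])]
    primary = primary_ranges(tokens)
    n = len(ordered)
    return {
        node: [primary[ordered[(i - k) % n]] for k in range(min(rf, n))]
        for i, node in enumerate(ordered)
    }


# Hypothetical 3-node ring.
ring = {
    "node1": -9223372036854775808,
    "node2": -3074457345618258603,
    "node3": 3074457345618258602,
}

if __name__ == "__main__":
    for node, held in ranges_held(ring, rf=2).items():
        print(node, "primary:", held[0], "replicas of:", held[1:])
```

With a replication factor of 2, each node in this sketch holds its own primary range plus a replica of its predecessor's primary range, which is the split described in items 1 and 2 above.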

Now we can run repair with the following options.

  • (Primary or partitioner range repair): nodetool repair -pr - This repairs only the primary range that the node is responsible for, not the other (replica) ranges stored on this node. It reduces the load on the node by cutting down excessive data streaming across the network. One caveat: since each node repairs only its own primary range, this option must be run on ALL nodes in the ring in order to repair all the data.
  • (Sub-range repair): nodetool repair -st -9223372036854775808 -et -3074457345618258603 - Conceptually, sub-range repair is much like primary range repair, except that each sub-range repair operation focuses on an even smaller subset of data. You choose the token ranges that you want to repair. This is an advanced operation for DevOps teams.
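As a rough illustration of how sub-range commands can be generated, the Python sketch below splits one token range into contiguous slices and prints a nodetool repair -st ... -et ... command for each. The helper names and the keyspace name my_keyspace are hypothetical, and the sketch assumes a range that does not wrap around the ring. It only builds command strings; it does not run nodetool.

```python
from typing import List, Tuple


def split_range(start: int, end: int, parts: int) -> List[Tuple[int, int]]:
    """Split (start, end] into `parts` contiguous sub-ranges with no gaps."""
    width = end - start
    if parts < 1 or width < parts:
        raise ValueError("need start < end and at most (end - start) parts")
    # Integer arithmetic only: tokens are 64-bit values, floats would lose precision.
    bounds = [start + (width * i) // parts for i in range(parts)] + [end]
    return [(bounds[i], bounds[i + 1]) for i in range(parts)]


def repair_commands(start: int, end: int, parts: int, keyspace: str) -> List[str]:
    """Build one sub-range repair command per slice (strings only, nothing is executed)."""
    return [
        f"nodetool repair -st {s} -et {e} {keyspace}"
        for s, e in split_range(start, end, parts)
    ]


if __name__ == "__main__":
    # The range from the example above, cut into 4 slices.
    for cmd in repair_commands(
        -9223372036854775808, -3074457345618258603, 4, "my_keyspace"
    ):
        print(cmd)
```

Each slice ends exactly where the next one starts, so running all of the generated commands covers the original range once.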

Please refer to our Repair documentation for more details.

Erick Ramirez answered

Partitioner range (-pr) repair only repairs the range for which the node is the primary owner, so each range gets repaired only once (recommended).

Subrange repair (-st & -et) repairs the range between the given start and end tokens.

We recommend running partitioner range repairs (-pr) because they are the most efficient type of repair and the most foolproof.

Subrange repairs are not recommended because most users do not know how to pick and calculate the correct ranges. In a lot of cases, they miss certain ranges and then wonder why their data is out of sync. There are advantages to running subrange repairs, but they are best left to experts. If you really want to run them, we recommend that you use a free open-source tool like Cassandra Reaper.
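To show what "missing a range" looks like in practice, here is a small Python sketch that checks a hand-picked list of subranges against the range they are supposed to cover and reports any gaps. The function name find_gaps is hypothetical, the ranges are assumed not to wrap around the ring, and this is not how Reaper or any other tool does it; it is just a sanity check you could run before trusting your own list.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # (start, end) token pair with start < end


def find_gaps(target: Range, subranges: List[Range]) -> List[Range]:
    """Return the parts of `target` that no subrange covers."""
    start, end = target
    gaps: List[Range] = []
    cursor = start  # everything from `start` up to `cursor` is covered so far
    for s, e in sorted(subranges):
        if s > cursor:
            # Nothing covers the tokens between cursor and s.
            gaps.append((cursor, min(s, end)))
        cursor = max(cursor, e)
        if cursor >= end:
            break
    if cursor < end:
        gaps.append((cursor, end))
    return gaps


if __name__ == "__main__":
    full = (-9223372036854775808, 9223372036854775807)
    picked = [
        (-9223372036854775808, -3074457345618258603),
        # The middle third of the ring is missing from this list.
        (3074457345618258602, 9223372036854775807),
    ]
    print(find_gaps(full, picked))
```

An empty result means the list covers the whole target range. Any tuple in the result is a span of tokens that would never be repaired.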

For more info on why partitioner range repairs are recommended, see Repairs in Cassandra. Cheers!
