376752150_179413 asked RashidSajjad answered

Is restricting total partitions to a small range helpful for accelerating the node repair process?

We run a repair on a weekly basis.
At the beginning, it took 3 hours to repair, but as our data grew, the repair time grew with it.
It now takes around 20 hours.
I found that creating an artificial partition key to restrict the total range reduces the repair time for each table by at least a factor of 2 or 3.


For example,

table A1 {
    id
    PRIMARY KEY (id)
};

table A2 {
    cursor // the artificial partition key, calculated according to id manually, the range is restricted 0-7
    id
    PRIMARY KEY (cursor, id)
};
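To make the "calculated according to id manually" step concrete, here is a minimal sketch of one way such a cursor value could be derived. It assumes that id is a string and that a CRC32 checksum modulo 8 is an acceptable mapping; the function name `cursor_for` and the constant `NUM_BUCKETS` are hypothetical and not part of the actual schema.

```python
import zlib

NUM_BUCKETS = 8  # assumption: matches the 0-7 range of the cursor column


def cursor_for(id_value: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Map an id to a stable bucket number in [0, num_buckets)."""
    # crc32 is deterministic across processes and machines,
    # unlike Python's built-in hash() for strings.
    return zlib.crc32(id_value.encode("utf-8")) % num_buckets


if __name__ == "__main__":
    # Count how many of 10,000 sample ids land in each bucket.
    counts = [0] * NUM_BUCKETS
    for i in range(10_000):
        counts[cursor_for(f"id-{i}")] += 1
    print(counts)
```

With this kind of mapping every row of table A2 lands in one of only 8 partitions, so each partition holds roughly one eighth of the table.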


Why does creating an artificial partition key to restrict the total range reduce the repair time for each table by at least a factor of 2 or 3?

Any insights would be much appreciated!

repair

Lewisr650 answered 376752150_179413 commented

That would be an unintended outcome; your data model should be driven by your query patterns and then refined to optimize partition sizes. It's reasonable that repair takes longer the more data you have. The critical constraint is that repair completes in less than gc_grace_seconds, which is 10 days by default. You also don't want to force repair to consume compute resources in a way that hurts the read and write performance of the cluster. The important factor here is that repair completes before tombstones are deleted, so that data consistency within partitions is validated while the tombstones can still be propagated to every replica.
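As a rough illustration of the timing constraint, the sketch below checks a repair schedule against gc_grace_seconds. It uses the 10-day default (864,000 seconds) and the weekly, 20-hour repair from the question. The worst-case model (repair interval plus repair duration) is a conservative simplification, and the function name `repair_margin_hours` is hypothetical.

```python
DEFAULT_GC_GRACE_SECONDS = 864_000  # 10 days, the default value


def repair_margin_hours(interval_hours: float,
                        duration_hours: float,
                        gc_grace_seconds: int = DEFAULT_GC_GRACE_SECONDS) -> float:
    """Return the hours of headroom before gc_grace_seconds is exceeded.

    Assumed worst case: a tombstone is written just after a repair has
    started, so it is only covered once the *next* repair has finished,
    i.e. after one full interval plus one full repair duration.
    """
    gc_grace_hours = gc_grace_seconds / 3600
    worst_case_hours = interval_hours + duration_hours
    return gc_grace_hours - worst_case_hours


if __name__ == "__main__":
    # Weekly repair (168 h) that takes 20 h, with the default gc_grace_seconds.
    margin = repair_margin_hours(interval_hours=168, duration_hours=20)
    print(f"headroom: {margin} hours")  # a negative value would mean the schedule is too slow
```

Under these assumptions a weekly 20-hour repair still fits inside the default window, but the headroom shrinks as repair time keeps growing.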


376752150_179413 commented ·

Thanks a lot for your reply! Would you please share more insight into why creating an artificial partition key to restrict the total range reduces the repair time for each table by at least a factor of 2 or 3?

RashidSajjad answered

You haven't really figured out what is causing the long repair process, so artificial partition keys at this point will only be a suggestion, not a fix.

By itself, reducing the number of partitions is irrelevant. The other part is your replication factor: the more copies of the data, the more concurrency is needed.

Look at your metrics first and review them against the following to figure out why repair is taking so long:
https://academy.datastax.com/support-blog/diagnostic-tarball-gold-mine
