
penky28_147901 asked (edited by Erick Ramirez)

What changed in Cassandra 4.0 to cause repair jobs to fail?

Hi,

I have an Apache Cassandra cluster of 5 nodes running version 3.11.3. We used to run Cassandra subrange repair, and it completed in less than 150 minutes per node with 300+ GB of data.

I am testing a Cassandra upgrade to 4.0, so I downloaded 4.0 and upgraded the cluster. After the upgrade, I ran the subrange repair using the token ranges I had from the 3.11.3 version, and encountered the following error:

Repair command #8 failed with error Incremental repair session 88232520-0049-11ec-8e8e-291479d814da has failed
[2021-08-18 10:28:15,867] Repair command #8 finished with error
error: Repair job has failed with the error message: Repair command #8 failed with error Incremental repair session 88232520-0049-11ec-8e8e-291479d814da has failed. Check the logs on the repair participants for further details
-- StackTrace --
java.lang.RuntimeException: Repair job has failed with the error message: Repair command #8 failed with error Incremental repair session 88232520-0049-11ec-8e8e-291479d814da has failed. Check the logs on the repair participants for further details
 at org.apache.cassandra.tools.RepairRunner.progress(RepairRunner.java:133)
 at org.apache.cassandra.utils.progress.jmx.JMXNotificationProgressListener.handleNotification(JMXNotificationProgressListener.java:77)
 at com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.dispatchNotification(ClientNotifForwarder.java:583)
 at com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.doRun(ClientNotifForwarder.java:533)
 at com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.run(ClientNotifForwarder.java:452)
 at com.sun.jmx.remote.internal.ClientNotifForwarder$LinearExecutor$1.run(ClientNotifForwarder.java:108)

Were any changes made to Cassandra subrange repair in 4.0?


1 Answer

Erick Ramirez answered

There isn't anything in the error you posted that suggests it's an issue with Cassandra 4.0.

I would suggest that you review the logs of the replicas involved in the repair for clues on the root cause of the failure. Cheers!
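As a starting point for that review, here is a minimal sketch that searches a node's logs for the failed session ID from the error above. It assumes the default package-install log location `/var/log/cassandra/system.log`, which may differ on your nodes; the function name `find_repair_session` is hypothetical. Run it on each replica that took part in the repair.

```shell
#!/usr/bin/env bash
# Sketch: print log lines mentioning a given repair session on this node.
# Assumption: logs live in /var/log/cassandra/system.log (package default);
# pass other log paths explicitly if your installation differs.

find_repair_session() {
  local session_id="$1"
  shift
  local logs=("$@")
  # Fall back to the assumed default location when no files are given.
  if [ "${#logs[@]}" -eq 0 ]; then
    logs=(/var/log/cassandra/system.log)
  fi
  # -F: match the session ID as a fixed string, not a regex.
  grep -F -- "$session_id" "${logs[@]}"
}

# Example, using the session from the error message above:
# find_repair_session 88232520-0049-11ec-8e8e-291479d814da
```

Rotated logs can be passed as extra arguments; with more than one file, grep prefixes each match with its file name.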

2 comments

penky28_147901 commented:

Thanks Erick for the reply.

In 3.11.3, the following syntax worked without any error:

nodetool repair -st -09183985968492985983 -et -09163001156344214513


However, in 4.0, if I use the above syntax, I run into the error mentioned above.


I modified the syntax as follows in 4.0, and it worked:


nodetool repair -pr -full -st -09183985968492985983 -et -09163001156344214513


Can you please confirm whether there is any syntax change to the repair command?


Erick Ramirez replied to penky28_147901:

Those two are unrelated. In your original question, you were running an incremental repair:

Repair command #8 failed with error Incremental repair session 88232520-0049-11ec-8e8e-291479d814da has failed

But the command you "modified" for 4.0 runs a full repair:

$ nodetool repair -pr -full -st ... -et ...

You can run either an incremental repair or a full repair, but not both at once. See this document for an explanation of full vs. incremental repairs. In any case, neither has anything to do with Cassandra 4.0.
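To make that choice explicit rather than relying on the default, here is a minimal sketch that prints (dry run) one `nodetool repair` command per saved token subrange, adding `-full` unless incremental mode is requested. The ranges file format (one `start end` pair per line) and the function name `subrange_repair_cmds` are hypothetical; the `-full`, `-st` and `-et` options are the ones used in the commands above. The error message in the question shows that the command without `-full` ran as an incremental repair session.

```shell
#!/usr/bin/env bash
# Sketch: turn saved token subranges into explicit nodetool repair commands.
# Hypothetical input: a file with one "start_token end_token" pair per line.
# Commands are only printed (dry run); pipe the output to bash to run them.

subrange_repair_cmds() {
  local ranges_file="$1"    # hypothetical ranges file
  local mode="${2:-full}"   # "full" (default in this sketch) or "incremental"
  local keyspace="${3:-}"   # optional keyspace; empty means all keyspaces
  local mode_flag=""
  if [ "$mode" = "full" ]; then
    mode_flag="-full"       # request a full repair explicitly
  fi                        # no flag: nodetool's default (incremental here)
  local st et
  while read -r st et; do
    # Skip blank lines and comment lines.
    [ -z "$st" ] && continue
    case "$st" in \#*) continue ;; esac
    echo "nodetool repair ${mode_flag:+$mode_flag }-st $st -et $et${keyspace:+ $keyspace}"
  done < "$ranges_file"
}
```

With a file containing `-09183985968492985983 -09163001156344214513`, the default mode prints the same `-full -st ... -et ...` form as the command that worked in 4.0 (that command also adds `-pr`).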

Again, you need to review the logs to find out why the incremental repair failed. Cheers!
