wederbrand asked Erick Ramirez commented

What is the correct way of upgrading from 2 to 3 replicas, and from LOCAL_ONE to LOCAL_QUORUM?

We have some keyspaces with 2 replicas that we query with LOCAL_ONE to tolerate node failures.

Now we're about to upgrade those to have 3 replicas and LOCAL_QUORUM but I struggle to find a process that works.

These are my findings, and I guess questions. Please correct me if I got something wrong.

  • Doing the alter first means 1 out of 3 replicas does not have the data until "nodetool repair --full" finishes. Querying that replica with LOCAL_ONE will result in the data not being found.
  • Changing to LOCAL_QUORUM first puts us at risk of losing availability until the alter has happened, since with 2 replicas a quorum needs both of them.
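To make the trade-off in the two points above concrete, here is a minimal sketch of the quorum arithmetic. It assumes the standard quorum rule (more than half of the replicas in the local DC must respond); the function names are made up for illustration and are not part of any driver API.

```python
def quorum(rf: int) -> int:
    """Replicas that must respond for a LOCAL_QUORUM request (floor(rf / 2) + 1)."""
    return rf // 2 + 1


def tolerated_failures(rf: int, required: int) -> int:
    """Replicas that may be down while a request needing `required` replicas still succeeds."""
    return rf - required


for rf in (2, 3):
    needed = quorum(rf)
    print(
        f"RF={rf}: LOCAL_ONE tolerates {tolerated_failures(rf, 1)} replica(s) down, "
        f"LOCAL_QUORUM needs {needed} and tolerates {tolerated_failures(rf, needed)} down"
    )
```

With 2 replicas, LOCAL_QUORUM needs both of them, so any single replica outage fails the request; with 3 replicas it needs 2 and tolerates one outage.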

Is there some other way of doing this?

Is it weird that the new node accepts reads before it has been correctly repaired?

Tags: replication, consistency level

Erick Ramirez answered Erick Ramirez commented

There is no easy way out of this situation. This is why we recommend 3 replicas in each DC in a production cluster right from the start.

My recommendation is to [1] switch to LOCAL_QUORUM first. With only 2 replicas, LOCAL_QUORUM needs both of them to respond so, as you already know, your app cannot tolerate any node outage until the keyspace has 3 replicas.

Then [2] alter the keyspace to have 3 replicas, and [3] run a rolling repair with -pr (do not use the --full flag), one node at a time so as not to impact the performance of your cluster.
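As a rough sketch of the order of operations described above, the snippet below only builds and prints the statements; it does not run anything against a cluster. The keyspace name `my_keyspace`, the DC name `dc1`, and the node addresses are hypothetical placeholders, and it assumes the keyspace uses NetworkTopologyStrategy, which the question does not state.

```python
def upgrade_plan(keyspace: str, dc: str, nodes: list[str]) -> list[str]:
    """Return the steps, in order, for going from 2 to 3 replicas."""
    steps = [
        # Step 1: an application/driver change, not a cluster command.
        "Switch the application's consistency level to LOCAL_QUORUM",
        # Step 2: CQL statement raising the replication factor in the DC.
        f"ALTER KEYSPACE {keyspace} WITH replication = "
        f"{{'class': 'NetworkTopologyStrategy', '{dc}': 3}};",
    ]
    # Step 3: rolling primary-range repair, one node at a time.
    for node in nodes:
        steps.append(f"nodetool -h {node} repair -pr {keyspace}")
    return steps


if __name__ == "__main__":
    # Hypothetical keyspace, DC, and node addresses.
    plan = upgrade_plan("my_keyspace", "dc1", ["10.0.0.1", "10.0.0.2", "10.0.0.3"])
    for number, step in enumerate(plan, start=1):
        print(f"{number}. {step}")
```

Each repair should finish before the next one starts, and the repair has to run on every node in the DC for all token ranges to be covered.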

Between step [2] and step [3], there will be a temporary performance impact for read requests as a read-repair is triggered in the background whenever the app reads a partition which hasn't been repaired yet.
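A simplified model of that window, as a sketch only: it assumes the two original replicas (called A and B here) are consistent with each other, the newly assigned replica (C) is still empty, and the coordinator may pick any combination of replicas. Real replica selection is more involved, so treat the output as an illustration rather than a prediction.

```python
from itertools import combinations

# True = the replica already holds the partition. A and B are the original
# replicas; C is the replica added by the ALTER and not yet repaired.
replicas = {"A": True, "B": True, "C": False}


def read_outcomes(replicas_needed: int) -> dict:
    """Map each possible replica group to (data_found, replicas_disagree)."""
    outcomes = {}
    for group in combinations(sorted(replicas), replicas_needed):
        found = any(replicas[r] for r in group)
        disagree = found and not all(replicas[r] for r in group)
        outcomes[group] = (found, disagree)
    return outcomes


for label, needed in (("LOCAL_ONE", 1), ("LOCAL_QUORUM", 2)):
    for group, (found, disagree) in read_outcomes(needed).items():
        print(f"{label} via {'+'.join(group)}: found={found}, disagree={disagree}")
```

In this model every 2-replica read includes at least one populated replica, so the data is always found, while the groups that include C are the ones where the replicas disagree and a read-repair is needed. A LOCAL_ONE read that lands on C alone finds nothing.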

As a side note, the partitioner range repair (with the -pr flag) is the most efficient way to run repairs manually since it will only repair the primary token ranges owned by a node and not the ranges for which it is a non-primary replica. For details, see Jeremiah Jordan's blog post Apache Cassandra Maintenance and Repair. Cheers!


wederbrand commented ·

Thanks. This aligns with what we came up with internally, after failing with one of the other alternatives.


Is it documented somewhere that the new replica will be without data until repair is finished? Perhaps this could be added as a warning on the "ALTER KEYSPACE" pages.

I for one "knew" otherwise, probably confusing it with adding nodes.

Erick Ramirez ♦♦ wederbrand commented ·

That's a good call. I'll make the suggestion to the Docs team. Cheers!

ion_145132 answered Erick Ramirez commented

@wederbrand can you afford to build a parallel Data Center with RF=3?
When the new DC has all the data just point the app to it.


Erick Ramirez ♦♦ commented ·

That's certainly an option, but it can be an expensive exercise and, in my experience, no organisation I've worked with has gone down that path since it also requires a change to the app.

But more importantly, thanks for being part of the community. We hope to see more contributions from you. Cheers!
