Bringing together the Apache Cassandra experts from the community and DataStax.

ted.petersson_164115 asked ·

How do I revert changes to a pod in CrashLoopBackOff status?

I'm testing the cass-operator and have a very simple 3-node cluster setup.

Then I try to give the operator a serverImage configuration that does not work, e.g.

serverImage: "cassandra:3.11"

Now my pods (and statefulset) look like this:

> kubectl -n cass-operator get pods,statefulset
NAME                                 READY   STATUS             RESTARTS   AGE
pod/cass-operator-78884f4f84-lmkbj   1/1     Running            0          18m
pod/cluster1-dc1-default-sts-0       2/2     Running            0          18m
pod/cluster1-dc1-default-sts-1       2/2     Running            0          18m
pod/cluster1-dc1-default-sts-2       1/2     CrashLoopBackOff   6          7m26s

NAME                                        READY   AGE
statefulset.apps/cluster1-dc1-default-sts   2/3     18m

The statefulset tries to roll out the new image to pod #2, but it fails!

And the operator status is

> kubectl -n cass-operator describe cassdc
...
Status:
  Cassandra Operator Progress:  Updating
...

So even though I try to fall back (by applying an old config without the failing image), the operator is "stuck" in the Updating state and will not "feed" the old (working) image down to the statefulset...

> kubectl -n cass-operator describe statefulset |grep Image
    Image:      datastax/cass-config-builder:1.0.0  <--- Init container
    Image:      cassandra:3.11  <--- non-working image, not rolled back
    Image:      busybox  <--- I'm using a sidecar image as well, that's why there are 2 containers per pod
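
To make the mismatch described above easier to see, the image requested in the CassandraDatacenter spec can be printed next to the images currently in the StatefulSet template. This is only a sketch: `show_images` is a hypothetical helper name, and the datacenter name `dc1` used in the usage line is an assumption inferred from the StatefulSet name `cluster1-dc1-default-sts`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare the image requested in the cassdc spec with
# the images the StatefulSet template currently carries.
# Usage: show_images <namespace> <cassdc-name> <statefulset-name>
show_images() {
  local ns="$1" dc="$2" sts="$3"

  echo "cassdc serverImage:"
  kubectl -n "$ns" get cassdc "$dc" \
    -o jsonpath='{.spec.serverImage}{"\n"}'

  echo "statefulset container images:"
  # Lists the regular containers only (init containers are not included).
  kubectl -n "$ns" get statefulset "$sts" \
    -o jsonpath='{.spec.template.spec.containers[*].image}{"\n"}'
}
```

Example call, assuming the names above: `show_images cass-operator dc1 cluster1-dc1-default-sts`. If the two outputs differ after the old config has been re-applied, the operator has not yet propagated the change to the StatefulSet.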

So HOW can I force the operator to ignore its updating state and feed the correct image to the statefulset?

I want to do this without deleting the statefulset (then all pods will be deleted and traffic will be lost).

/BR Ted

kubernetes cass-operator

1 Answer

Erick Ramirez answered ·

This looks more like a Kubernetes configuration issue than a problem with the cass-operator. CrashLoopBackOff means k8s is repeatedly trying to restart a container that keeps crashing.

You'll need to investigate the root cause and resolve it if you don't want to delete the statefulset. I'd recommend you run kubectl describe on the problematic pod and review the events for clues. Also get the logs with kubectl logs so you can review for a possible cause.
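
The two commands mentioned above could be wrapped up roughly as follows. This is a minimal sketch, not an official procedure: `inspect_pod` is a hypothetical helper name, and the default container name `cassandra` is an assumption, so pass the real container name if it differs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: gather the usual clues for a pod in CrashLoopBackOff.
# Usage: inspect_pod <namespace> <pod> [container]
inspect_pod() {
  # "cassandra" is an assumed default container name.
  local ns="$1" pod="$2" container="${3:-cassandra}"

  # The Events section at the end of the output shows image pull errors,
  # failed probes, and similar problems.
  kubectl -n "$ns" describe pod "$pod"

  # Logs of the current container instance.
  kubectl -n "$ns" logs "$pod" -c "$container"

  # Logs of the previous (crashed) instance, which often hold the real error.
  kubectl -n "$ns" logs "$pod" -c "$container" --previous
}
```

For the pod in the question this would be `inspect_pod cass-operator cluster1-dc1-default-sts-2`.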

I don't believe there's a way to force configuration changes. I'm going to reach out to the authors of the operator internally and ask them to respond. Cheers!

4 comments

I'm testing an in-service software rollback scenario, so the pod failing to start is deliberate. I don't want it to start; I want to roll back to the previous image without affecting the other pods that are currently taking traffic.

Is this possible somehow?

Erick Ramirez ♦♦ ted.petersson_164115 ·

There's a concept of a canary upgrade like in this example:

  # Using canaryUpgrade will limit config changes that directly impact the
  # underlying StatefulSets resources (which is most of them) to only updating
  # the first StatefulSet / rack. Users can use this to test configuration
  # changes before rolling them out to the whole cluster.
  canaryUpgrade: false

But I don't think there's a rollback facility. Cheers!
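
As a rough illustration of how the flag could be toggled on a running datacenter, the sketch below patches the resource with `kubectl patch`. It assumes that `canaryUpgrade` sits directly under `spec` of the CassandraDatacenter (as the indentation of the example suggests) and that the datacenter is named `dc1`; `set_canary` is a hypothetical helper name. Note that this limits where a change is rolled out; it is not a rollback mechanism.

```shell
#!/usr/bin/env bash
# Hypothetical helper: switch canaryUpgrade on or off for a CassandraDatacenter.
# Assumes the field lives at .spec.canaryUpgrade.
# Usage: set_canary <namespace> <cassdc-name> <true|false>
set_canary() {
  local ns="$1" dc="$2" enabled="$3"

  # A JSON merge patch only touches the listed field and leaves the rest
  # of the spec as it is.
  kubectl -n "$ns" patch cassdc "$dc" --type merge \
    -p "{\"spec\":{\"canaryUpgrade\":${enabled}}}"
}
```

Example call, under the assumptions above: `set_canary cass-operator dc1 true` before applying a risky image change.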


Maybe failback is a better name for what I'm testing.

The scenario is an in-service upgrade where something fails (a faulty image) during the pod rollout. An in-service failback should then be performed to return to the previous (working) image.

Erick Ramirez ♦♦ ted.petersson_164115 ·

Rollback, failback, revert -- I understood your intent the first time. :)
