Build Cloud-Native apps with Apache Cassandra


quinn.wong_194182 asked:

Running into same "Failed to connect" issue when running "nodetool status" from week 5 exercises

[FOLLOW UP QUESTION TO #7257]

Hi David, I'm running into the same issue as well during Step 4 of the Week 5 homework. Here is what I get when I execute:

$ kubectl -n cass-operator get pods
NAME                             READY   STATUS    RESTARTS   AGE
cass-operator-56fcb9ff47-jrrz6   1/1     Running   4          23h
cluster1-dc1-default-sts-0       1/2     Running   2          54m
cluster1-dc1-default-sts-1       1/2     Running   0          24m
cluster1-dc1-default-sts-2       1/2     Running   13         23h

I did not run into any issues when starting up a single node.

[UPDATE] I completely removed and deleted the cluster with the provided instructions and then reset the cluster. I have the same issues still.

I saw a status of 2/2 for cluster1-dc1-default-sts-0 when I first scaled up to 3 nodes. However, the cluster would cycle through initializing and terminating the nodes:

cass-operator-56fcb9ff47-8lnmc   1/1     Running       27         172m
cass-operator-56fcb9ff47-zjv6p   1/1     Terminating   0          19h
cluster1-dc1-default-sts-0       1/2     Terminating   4          19h
cluster1-dc1-default-sts-1       1/2     Running       98         19h
cluster1-dc1-default-sts-2       1/2     Terminating   74         12h

Where would I go to view the log files for the cluster? Also, could this be a memory or JVM issue? I am running 1.8.

Tags: workshop, cass-operator

Erick Ramirez answered:

To add to David's response, in my experience the issue you're seeing occurs when there is not enough memory on the machine so the Cassandra instances cannot start.

Here is an example output for a 3-pod cluster:

$ kubectl -n cass-operator get pods
NAME                             READY   STATUS    RESTARTS   AGE
cass-operator-56fcb9ff47-kb9lf   1/1     Running   0          6d15h
cluster1-dc1-default-sts-0       2/2     Running   0          6d15h
cluster1-dc1-default-sts-1       2/2     Running   0          6d15h
cluster1-dc1-default-sts-2       2/2     Running   0          6d15h

Note that if the value in the READY column is 1/2, only one of the pod's two containers is ready. In my experience this means the Cassandra instance in that pod has not started successfully because there aren't enough resources available to Cassandra.
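To answer the question about where to look for logs, here is a rough sketch of how you could pick out the pods that are not fully ready and then inspect them. The helper names not_ready_pods and inspect_pod are made up for illustration, and the container name "cassandra" is an assumption: check the container names listed by kubectl describe pod before relying on it.

```shell
# Read `kubectl get pods` output on stdin and print the names of pods
# whose READY count is incomplete (for example 1/2).
not_ready_pods() {
  awk '$1 != "NAME" && NF >= 2 { split($2, r, "/"); if (r[1] != r[2]) print $1 }'
}

# Show the events and recent logs for one pod.
# ASSUMPTION: the Cassandra container is named "cassandra"; adjust if yours differs.
# The namespace defaults to cass-operator, as used in this thread.
inspect_pod() {
  pod="$1"
  ns="${2:-cass-operator}"
  kubectl -n "$ns" describe pod "$pod"
  kubectl -n "$ns" logs "$pod" -c cassandra --tail=100
}
```

For example, kubectl -n cass-operator get pods | not_ready_pods lists the affected pods, and inspect_pod cluster1-dc1-default-sts-0 prints the details for one of them. In the describe output, a "Last State" of Terminated with reason OOMKilled would point to a memory problem.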

We recommend that you only attempt running a 3-node cluster on machines that have at least 16 GB of memory. In some circumstances, it might be possible to run the exercises with 12 GB, but you will eventually run into issues.
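If you are not sure how much memory Docker (and therefore KiND) can actually use, something like the following may help. It is only a sketch: bytes_to_gib is a hypothetical helper, and it assumes your KiND cluster runs on a local Docker daemon, which reports its total memory in bytes.

```shell
# Convert a byte count to GiB with one decimal place.
bytes_to_gib() {
  awk -v b="$1" 'BEGIN { printf "%.1f\n", b / 1024 / 1024 / 1024 }'
}
```

You could call it as bytes_to_gib "$(docker info --format '{{.MemTotal}}')". On Docker Desktop the reported figure is the memory allocated to Docker, which can be much lower than the physical memory of the machine.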

I've also noted that you've installed the operator twice:

cass-operator-56fcb9ff47-8lnmc   1/1     Running       27         172m
cass-operator-56fcb9ff47-zjv6p   1/1     Terminating   0          19h

I recommend that you delete your KiND cluster completely and start from scratch. Cheers!


David Jones-Gilardi answered:

Do you ever see a status of 2/2 for any of the nodes? Hrmmm, looking at the output you posted, these should never take 24 minutes to start up. Seems like something is off.

Have you tried to completely remove and delete the whole cluster with these instructions -> https://github.com/DataStax-Academy/cassandra-workshop-series/blob/master/week5-Cass-in-k8s/README_METRICS_DSE.MD#8-stop-and-clean-up? If not, I would stop and delete everything, then go back to the kind cluster setup step here -> https://github.com/DataStax-Academy/cassandra-workshop-series/tree/master/week5-Cass-in-k8s#6-create-a-kind-cluster and start fresh.

What's interesting to me is that you said you can start and run a single node just fine, yet in the output you posted all 3 nodes are stuck at 1/2. If one node was already running, updating the config to expand to 3 nodes simply adds 2 nodes to the existing one. So I'm curious whether something is happening to the node that was previously working.
