
Asked by prylandsmoore_193617 (edited by Erick Ramirez)

Had big problems with cloud instance for workshop 5


Please look at the cut-and-paste job below and see if you can analyze what happened. Right now my instance is cleaned up. I’d like to try again, IF things are working OK!


Off to a not-so-good start on my instance. Everything was fine until I tried to scale Cassandra to a 3-node cluster. The processes showed up, but the nodes never appeared in the data center.



[8:30 PM]
    NAME                             READY   STATUS    RESTARTS   AGE
    cass-operator-56fcb9ff47-zr7bp   1/1     Running   0          56m
    cluster1-dc1-default-sts-0       2/2     Running   0          47m
    cluster1-dc1-default-sts-1       1/2     Running   0          31m
    cluster1-dc1-default-sts-2       1/2     Running   0          31m



[8:32 PM] I noticed the 1/2 in the READY column for the last two pods. When I executed step 4e, it showed only 1 node, not 3. I think the output above points to the problem. I went ahead and changed the replication factor and the consistency level (QUORUM) per the instructions, and when I went to run the select, I got a "NoHostAvailable:" error. Sigh. I suspect a sick 3-node cluster.
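The READY column shows ready/total containers per pod, so 1/2 means one container in that pod is not ready. A minimal sketch for spotting such pods, assuming the `kubectl get pods` column layout shown above; `not_ready_pods` is a hypothetical helper name, not part of the workshop:

```shell
# Print pods whose READY column (ready/total) shows fewer ready
# containers than the total. Reads `kubectl get pods` output on stdin
# and skips the header line.
not_ready_pods() {
  awk 'NR > 1 { split($2, r, "/"); if (r[1] != r[2]) print $1, $2, $3 }'
}

# Usage (namespace taken from this thread):
#   kubectl -n cass-operator get pods | not_ready_pods
```

For each pod it reports, `kubectl describe pod <name>` shows which container is not ready and why.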



[8:32 PM] OK, on to Grafana & Prometheus...



[8:33 PM] I was fine until I went to add Prometheus. Got the following error:



[8:33 PM]
    ec2-user@ip-172-31-21-37:~/kubernetes-workshop-online/week5-Cass-in-k8s> kubectl -n cass-operator apply -f ./prometheus_grafana/prometheus/instance.yaml
    serviceaccount/prometheus created
    clusterrole.rbac.authorization.k8s.io/prometheus created
    clusterrolebinding.rbac.authorization.k8s.io/prometheus created
    error: unable to recognize "./prometheus_grafana/prometheus/instance.yaml": no matches for kind "Prometheus" in version "monitoring.coreos.com/v1"



[8:33 PM] Went to step 4e and, guess what: a bunch of Grafana services, but no Prometheus!



[8:34 PM]
    ec2-user@ip-172-31-21-37:~/kubernetes-workshop-online/week5-Cass-in-k8s> kubectl get svc -n cass-operator --show-labels=true
    NAME                                  TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE    LABELS
    cass-operator-metrics                 ClusterIP   10.103.217.192   <none>        8383/TCP,8686/TCP   73m    name=cass-operator
    cassandradatacenter-webhook-service   ClusterIP   10.103.247.162   <none>        443/TCP             73m    name=cass-operator-webhook
    grafana-operator-metrics              ClusterIP   10.96.48.44      <none>        8080/TCP            116s   name=grafana-operator
    grafana-service                       ClusterIP   10.101.246.99    <none>        3000/TCP            16s    <none>



[8:34 PM] I was trying to get done with my instance, but I guess not...



[8:37 PM] I did run the cleanup, so nothing is running there now.
[8:39 PM] Sigh...



Tried again the next day. This time I started at step 7 and got the following:


    ec2-user@ip-172-31-21-37:~/kubernetes-workshop-online/week5-Cass-in-k8s> kubectl create ns cass-operator
    error: Missing or incomplete configuration info. Please point to an existing, complete config file:

      1. Via the command-line flag --kubeconfig
      2. Via the KUBECONFIG environment variable
      3. In your home directory as ~/.kube/config

    To view or setup config directly use the 'config' command.
    ec2-user@ip-172-31-21-37:~/kubernetes-workshop-online/week5-Cass-in-k8s>
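The error lists three places kubectl looks for its configuration. The `--kubeconfig` flag is set per command, but the other two can be checked directly. A minimal sketch with the hypothetical helper name `find_kubeconfig`; it treats a non-empty file as "present", which is only a rough proxy for "complete":

```shell
# Report which kubeconfig source from the error message is available.
# Checks each path in $KUBECONFIG (colon-separated), then ~/.kube/config.
# A non-empty file is treated as present; contents are not validated.
find_kubeconfig() {
  if [ -n "${KUBECONFIG:-}" ]; then
    old_ifs=$IFS
    IFS=:
    for kc_path in $KUBECONFIG; do
      if [ -s "$kc_path" ]; then
        IFS=$old_ifs
        echo "KUBECONFIG: $kc_path"
        return 0
      fi
    done
    IFS=$old_ifs
  fi
  if [ -s "$HOME/.kube/config" ]; then
    echo "default: $HOME/.kube/config"
    return 0
  fi
  echo "no kubeconfig found" >&2
  return 1
}
```

If it reports nothing, the cluster's config file was never written or was removed (for example by a cleanup step), which would explain why even `kubectl create ns` fails.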


1 Answer

Answer by David Jones-Gilardi

Hi there @prylandsmoore_193617. Thank you for your post and sorry you are having troubles.

OK, first thing: could you send me the instance you are working on? I'd like to take a look and troubleshoot it directly.


Responding to some of the issues you reported above:

[8:32 PM]I notice the 1/2 on the last 2. When I executed step 4e, it only showed the 1 node, not the 3. I think the above output indicates the problem. I went ahead and changed the replication factor and the quorum per the instructions, and then when I went to perform the select, I got a "NoHostAvailable:" Sigh. I suspect a sick 3-node cluster.

Right. If 2 of your 3 nodes were not completely initialized and running properly, I would expect setting RF = 3 and using QUORUM to be problematic, because QUORUM needs 2 of the 3 replicas to respond. So yes, I agree that you might have a sick 3-node cluster.
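The arithmetic behind this: within a single data center, QUORUM needs floor(RF / 2) + 1 replicas to answer. A minimal sketch (the helper names are hypothetical):

```shell
# Replicas that must respond for QUORUM in a single data center:
# floor(RF / 2) + 1 (shell integer division truncates).
quorum_size() {
  echo $(( $1 / 2 + 1 ))
}

# Succeeds if a QUORUM request can be satisfied.
# $1 = replication factor, $2 = number of live replicas.
quorum_reachable() {
  [ "$2" -ge "$(quorum_size "$1")" ]
}

# Usage:
#   quorum_size 3          -> prints 2
#   quorum_reachable 3 1   -> fails (only 1 of the 2 required replicas is up)
```

With RF = 3 and only one healthy Cassandra node, QUORUM cannot be met, which is consistent with the NoHostAvailable error on the select.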


[8:33 PM]
    ec2-user@ip-172-31-21-37:~/kubernetes-workshop-online/week5-Cass-in-k8s> kubectl -n cass-operator apply -f ./prometheus_grafana/prometheus/instance.yaml
    serviceaccount/prometheus created
    clusterrole.rbac.authorization.k8s.io/prometheus created
    clusterrolebinding.rbac.authorization.k8s.io/prometheus created
    error: unable to recognize "./prometheus_grafana/prometheus/instance.yaml": no matches for kind "Prometheus" in version "monitoring.coreos.com/v1"

This is a direct result of the Prometheus operator not running yet: the `Prometheus` kind in `monitoring.coreos.com/v1` is not available to kubectl until the operator is up. In my experience the operator takes at least 5-8 minutes to start, and you will keep getting this error until it is fully up and running. Did your Prometheus operator EVER come up at all?
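Rather than re-running the apply by hand, one option is to retry it until the operator has registered the `Prometheus` kind. A minimal sketch; `retry_until` is a hypothetical helper, and the retry budget in the usage line is an assumption based on the 5-8 minute estimate above:

```shell
# Retry a command until it succeeds or the attempt budget runs out.
# $1 = max attempts, $2 = seconds to wait between attempts,
# remaining arguments = the command to run.
retry_until() {
  max_tries=$1
  delay=$2
  shift 2
  attempt=1
  while [ "$attempt" -le "$max_tries" ]; do
    if "$@"; then
      return 0
    fi
    echo "attempt $attempt/$max_tries failed" >&2
    # Only wait if another attempt remains.
    if [ "$attempt" -lt "$max_tries" ]; then
      sleep "$delay"
    fi
    attempt=$((attempt + 1))
  done
  return 1
}

# Usage (60 attempts x 10 s, roughly 10 minutes):
#   retry_until 60 10 kubectl -n cass-operator apply -f ./prometheus_grafana/prometheus/instance.yaml
```

Once the `prometheuses.monitoring.coreos.com` CRD object exists, `kubectl wait --for condition=established crd/prometheuses.monitoring.coreos.com` is another way to block until the kind is usable; it fails immediately if the CRD has not been created yet, which is why a retry loop is shown here.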


    ec2-user@ip-172-31-21-37:~/kubernetes-workshop-online/week5-Cass-in-k8s> kubectl create ns cass-operator
    error: Missing or incomplete configuration info. Please point to an existing, complete config file:

This... is strange, and it's exactly why I'd like to take a look at your instance.


David Jones-Gilardi ♦ commented:

Using your instance, I was able to run the full set of operations. However, I could also reproduce each of the issues you listed above, and the responses I gave for each one fit those scenarios.

I did run into an issue where the amount of space the Docker volumes were using prevented one node from coming up. After clearing things out and starting over, I was able to do a full run again, but I noticed the disk was very close to running out of space. I've raised this with the team to see how we should handle it.
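To catch this before a node fails to start, a quick free-space check before each run may help. A minimal sketch; the helper names and the 90% threshold are assumptions, and `/var/lib/docker` in the usage line is Docker's default data directory on Linux:

```shell
# Print the used-space percentage of the filesystem holding a path.
# `df -P` prints one header line, then one line per filesystem with the
# capacity (e.g. "87%") in column 5.
disk_used_pct() {
  df -P "${1:-/}" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Warn and fail if usage is at or above a threshold (default 90, an
# arbitrary assumption).
check_disk() {
  pct=$(disk_used_pct "$1")
  if [ "$pct" -ge "${2:-90}" ]; then
    echo "WARNING: $1 is ${pct}% full" >&2
    return 1
  fi
}

# Usage:
#   check_disk /var/lib/docker || docker system df
```

`docker system df` reports how much space images, containers, and volumes use; `docker volume prune` removes unused volumes and should be used with care.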
