Bringing together the Apache Cassandra experts from the community and DataStax.

oscneira asked

cass-operator does not create PVC/PV: pods stuck in Pending state (netapp.io/trident)

Hello, I'm having trouble deploying Cassandra on my cluster with the operator. The pods stay Pending with "Warning FailedScheduling default-scheduler pod has unbound immediate PersistentVolumeClaims". Deploying Cassandra manually with the same StorageClass works. Any ideas?

K8s version: 1.13

StorageClass provisioner: netapp.io/trident

kdp cluster1-dc1-default-sts-0 -n cass-operator
Name:               cluster1-dc1-default-sts-0
Namespace:          cass-operator
Priority:           0
PriorityClassName:  <none>
Node:               <none>
Labels:             app.kubernetes.io/managed-by=cass-operator
                    cassandra.datastax.com/cluster=cluster1
                    cassandra.datastax.com/datacenter=dc1
                    cassandra.datastax.com/node-state=Ready-to-Start
                    cassandra.datastax.com/rack=default
                    controller-revision-hash=cluster1-dc1-default-sts-6dcbf85bb5
                    statefulset.kubernetes.io/pod-name=cluster1-dc1-default-sts-0
Annotations:        <none>
Status:             Pending
IP:
Controlled By:      StatefulSet/cluster1-dc1-default-sts
Init Containers:
  server-config-init:
    Image:      datastax/cass-config-builder:1.0.1
    Port:       <none>
    Host Port:  <none>
    Environment:
      CONFIG_FILE_DATA:  {"cassandra-yaml":{"authenticator":"org.apache.cassandra.auth.PasswordAuthenticator","authorizer":"org.apache.cassandra.auth.CassandraAuthorizer","role_manager":"org.apache.cassandra.auth.CassandraRoleManager"},"cluster-info":{"name":"cluster1","seeds":"cluster1-seed-service"},"datacenter-info":{"name":"dc1"},"jvm-options":{"initial_heap_size":"800M","max_heap_size":"800M"}}
      POD_IP:             (v1:status.podIP)
      RACK_NAME:         default
      PRODUCT_VERSION:   3.11.6
      PRODUCT_NAME:      cassandra
      DSE_VERSION:       3.11.6
    Mounts:
      /config from server-config (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-mv8lz (ro)
Containers:
  cassandra:
    Image:       datastax/cassandra-mgmtapi-3_11_6:v0.1.5
    Ports:       9042/TCP, 8609/TCP, 7000/TCP, 7001/TCP, 8080/TCP
    Host Ports:  0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/TCP
    Liveness:    http-get http://:8080/api/v0/probes/liveness delay=15s timeout=1s period=15s #success=1 #failure=3
    Readiness:   http-get http://:8080/api/v0/probes/readiness delay=20s timeout=1s period=10s #success=1 #failure=3
    Environment:
      DS_LICENSE:               accept
      DSE_AUTO_CONF_OFF:        all
      USE_MGMT_API:             true
      MGMT_API_EXPLICIT_START:  true
      DSE_MGMT_EXPLICIT_START:  true
    Mounts:
      /config from server-config (rw)
      /var/lib/cassandra from server-data (rw)
      /var/log/cassandra from server-logs (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-mv8lz (ro)
  server-system-logger:
    Image:      busybox
    Port:       <none>
    Host Port:  <none>
    Args:
      /bin/sh
      -c
      tail -n+1 -F /var/log/cassandra/system.log
    Environment:  <none>
    Mounts:
      /var/log/cassandra from server-logs (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-mv8lz (ro)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  server-data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  server-data-cluster1-dc1-default-sts-0
    ReadOnly:   false
  server-config:
    Type:    EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
  server-logs:
    Type:    EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
  default-token-mv8lz:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-mv8lz
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason            Age                    From               Message
  ----     ------            ----                   ----               -------
  Warning  FailedScheduling  114s (x190 over 112m)  default-scheduler  pod has unbound immediate PersistentVolumeClaims (repeated 7 times)

Here is the StorageClass:

allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
  name: server-storage
parameters:
  backendType: ontap-nas-economy
  encryption: "true"
  media: ssd
  provisioningType: thin
provisioner: netapp.io/trident
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer


Deploying Cassandra manually with the same StorageClass works: it creates new PVCs, and they get bound to new PVs.

✦2 ➜ kgp
NAME                        READY   STATUS    RESTARTS   AGE
busybox                     1/1     Running   0          20h
spectrum-77f9f76c58-gwstc   1/1     Running   0          31h
spectrum-cassandra-0        1/1     Running   0          3d7h
spectrum-cassandra-1        1/1     Running   0          3d7h
spectrum-cassandra-2        1/1     Running   0          3d7h
spectrum-elasticsearch-0    1/1     Running   0          3d21h


Here is the StatefulSet:

apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: spectrum-cassandra
  labels:
    app: spectrum-cassandra
spec:
  serviceName: spectrum-cassandra
  replicas: 3
  template:
    metadata:
      labels:
        app: spectrum-cassandra
        zone: bb-hsec
    spec:
      containers:
        - name: spectrum-cassandra
          imagePullPolicy: IfNotPresent
          image: spectrum-cassandra:latest
          ports:
            - containerPort: 7000
              name: internal
            - containerPort: 7001
              name: internaltls
            - containerPort: 7199
              name: jmx
            - containerPort: 9042
              name: cql
            - containerPort: 9160
              name: thrift
          livenessProbe:
            exec:
              command:
                - /bin/bash
                - -c
                - nodetool statusgossip |grep "^running$"
            initialDelaySeconds: 60
            periodSeconds: 20
            failureThreshold: 3
            successThreshold: 1
            timeoutSeconds: 20
          readinessProbe:
            exec:
              command:
                - /bin/bash
                - -c
                - if [[ $(nodetool status | grep $POD_IP) == *"UN"* ]]; then exit 0; else exit 1; fi
            initialDelaySeconds: 10
            periodSeconds: 10
            failureThreshold: 1
            successThreshold: 1
            timeoutSeconds: 10
          resources:
            limits:
              cpu: "2000m"
              memory: 4Gi
            requests:
              cpu: "500m"
              memory: 2Gi
          securityContext:
            capabilities:
              add:
                - IPC_LOCK
          lifecycle:
            preStop:
              exec:
                command:
                  - /bin/sh
                  - -c
                  - nodetool drain
          envFrom:
            - configMapRef:
                name: spectrum-cassandra
          env:
            - name: POD_IP
              valueFrom:
                fieldRef:
                  fieldPath: status.podIP
          volumeMounts:
            - name: ca-pv-data
              mountPath: /cassandra_data
      terminationGracePeriodSeconds: 300
  volumeClaimTemplates:
    - metadata:
        name: ca-pv-data
      spec:
        accessModes:
          - ReadWriteOnce
        storageClassName: trident-nfs
        resources:
          requests:
            storage: 10Gi

PVCs:

✦2 ➜ kg pvc
NAME                                  STATUS   VOLUME                                                     CAPACITY      ACCESS MODES   STORAGECLASS   AGE
ca-pv-data-spectrum-cassandra-0       Bound    oneira-sandbox-ca-pv-data-spectrum-cassandra-0-1f88d       10737418240   RWO            trident-nfs    3d19h
ca-pv-data-spectrum-cassandra-1       Bound    oneira-sandbox-ca-pv-data-spectrum-cassandra-1-742ba       10737418240   RWO            trident-nfs    3d19h
ca-pv-data-spectrum-cassandra-2       Bound    oneira-sandbox-ca-pv-data-spectrum-cassandra-2-aa2ad       10737418240   RWO            trident-nfs    3d19h
es-pv-data-spectrum-elasticsearch-0   Bound    oneira-sandbox-es-pv-data-spectrum-elasticsearch-0-4e172   10737418240   RWO            trident-nfs    4d2h


cass-operator

1 Answer

jim.dickinson_187342 answered

Can you show the YAML for the PVCs generated by cass-operator, and for the PVCs you created? Or is there a difference between the trident-nfs storage class and the server-storage class? You don't have to use the name server-storage; we just needed a name for examples, tests, and tutorials.
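Looking at the PVCs in the operator's own namespace is what surfaced the real error further down this thread. A minimal Python sketch of that check, assuming `kubectl` is on the PATH and pointed at the right cluster; `fetch_pvcs` and `unbound_pvcs` are hypothetical helper names, not part of cass-operator or kubectl. It always passes the namespace explicitly and reports every PVC whose phase is not `Bound`.

```python
import json
import subprocess
from typing import Any


def fetch_pvcs(namespace: str) -> dict[str, Any]:
    # Pass -n explicitly: without it, kubectl only looks in the namespace
    # of the current kubeconfig context, which is easy to forget.
    out = subprocess.run(
        ["kubectl", "get", "pvc", "-n", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)


def unbound_pvcs(pvc_list: dict[str, Any]) -> list[tuple[str, str]]:
    """Return (name, phase) for every PVC whose phase is not 'Bound'.

    pvc_list is the parsed output of `kubectl get pvc -n <ns> -o json`.
    """
    result = []
    for item in pvc_list.get("items", []):
        name = item["metadata"]["name"]
        phase = item.get("status", {}).get("phase", "Unknown")
        if phase != "Bound":
            result.append((name, phase))
    return result
```

For example, `unbound_pvcs(fetch_pvcs("cass-operator"))` lists the pending claims; `kubectl describe pvc <name> -n cass-operator` then shows the provisioner events for each one.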


All storage classes are the same, just different names.

✦3 ➜ kg storageclass
NAME                         PROVISIONER         AGE
cass-trident-nfs (default)   netapp.io/trident   28h
server-storage (default)     netapp.io/trident   5h19m
trident-nfs (default)        netapp.io/trident   10d
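All three StorageClasses in this listing are marked `(default)`. Kubernetes intends at most one default class, and how a PVC without a `storageClassName` is handled when several are marked depends on the Kubernetes version. A minimal Python sketch for spotting this, assuming `kubectl` access; `fetch_storage_classes` and `default_storage_classes` are hypothetical helpers. It checks the annotation used in the StorageClass above and also the older beta annotation, which clusters of this era may still carry.

```python
import json
import subprocess
from typing import Any

# GA annotation (as in the StorageClass above) and the older beta form.
DEFAULT_ANNOTATIONS = (
    "storageclass.kubernetes.io/is-default-class",
    "storageclass.beta.kubernetes.io/is-default-class",
)


def fetch_storage_classes() -> dict[str, Any]:
    out = subprocess.run(
        ["kubectl", "get", "storageclass", "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)


def default_storage_classes(sc_list: dict[str, Any]) -> list[str]:
    """Names of StorageClasses annotated as a cluster default.

    sc_list is the parsed output of `kubectl get storageclass -o json`.
    """
    names = []
    for item in sc_list.get("items", []):
        annotations = item["metadata"].get("annotations") or {}
        if any(annotations.get(key) == "true" for key in DEFAULT_ANNOTATIONS):
            names.append(item["metadata"]["name"])
    return names
```

`default_storage_classes(fetch_storage_classes())` returning more than one name means several classes are flagged as default, so it is safer to set `storageClassName` explicitly wherever a PVC or template is defined.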
I found the problem after you asked about the PVCs: I had forgotten to add the namespace to my query (duh!). Now I can see the error "name exceeds the limit of 64 characters".
  Normal     ExternalProvisioning  2m43s (x1285 over 5h22m)  persistentvolume-controller  waiting for a volume to be created, either by external provisioner "netapp.io/trident" or manually created by system administrator
  Normal     ProvisioningFailed    2m (x323 over 5h22m)      netapp.io/trident            encountered error(s) in creating the volume: [Failed to create volume cass-operator-server-data-cluster1-dc1-default-sts-0-44412 on storage pool bp2_nas02_l003_trident_ssd from backend liaison_cfcr_nas: volume liaison_cass_operator_server_data_cluster1_dc1_default_sts_0_44412 name exceeds the limit of 64 characters]

Thanks for the help!
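The event above shows how the backend volume name grows: the PV name (`cass-operator-server-data-cluster1-dc1-default-sts-0-44412`) has its dashes turned into underscores and the `liaison_` prefix prepended (presumably a prefix configured on the Trident backend). A minimal Python sketch that reproduces this arithmetic, assuming the naming patterns observed in this thread hold in general: a PV name of `<namespace>-<pvc>-<5-char suffix>` and cass-operator PVC names of `server-data-<cluster>-<dc>-<rack>-sts-<ordinal>`. The function names are hypothetical, and the manual deployment's namespace `oneira-sandbox` is inferred from its PV names.

```python
TRIDENT_NAME_LIMIT = 64  # limit reported in the ProvisioningFailed event


def cass_operator_pvc_name(cluster: str, dc: str, rack: str, ordinal: int) -> str:
    # Pattern observed in this thread: server-data-<cluster>-<dc>-<rack>-sts-<n>
    return f"server-data-{cluster}-{dc}-{rack}-sts-{ordinal}"


def backend_volume_name(prefix: str, namespace: str, pvc: str, suffix: str) -> str:
    # Assumed from the event: PV name is <namespace>-<pvc>-<suffix>; dashes
    # become underscores and the backend prefix is prepended.
    pv_name = f"{namespace}-{pvc}-{suffix}"
    return prefix + pv_name.replace("-", "_")


def fits(name: str, limit: int = TRIDENT_NAME_LIMIT) -> bool:
    return len(name) <= limit


# cass-operator pod from this thread: 66 characters, over the limit.
operator_vol = backend_volume_name(
    "liaison_", "cass-operator",
    cass_operator_pvc_name("cluster1", "dc1", "default", 0), "44412")

# Manually deployed StatefulSet: 60 characters, under the limit.
manual_vol = backend_volume_name(
    "liaison_", "oneira-sandbox", "ca-pv-data-spectrum-cassandra-0", "1f88d")
```

Under these assumptions, the fix is to shorten the namespace, cluster, datacenter, or rack name: for instance, using `c1` instead of `cluster1` removes six characters and brings the cass-operator volume name down to 60.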


That wasn't going to be my next guess but I'm glad you found it! I'll add that to my checklist of places you can get stuck.


Got it moving forward. However, the cluster has no access to public repositories, and I cannot find where to override the image for the busybox container:

server-system-logger:
    Container ID:
    Image:         busybox
    Image ID:
    Port:          <none>

Making that image unconfigurable was an oversight, and a fix is in the queue: https://github.com/datastax/cass-operator/issues/241
