sangeeth_talluri_163565 asked Erick Ramirez answered

Data missing from Grafana dashboards, some nodes have no metrics

The DSE Metrics Collector is not giving me all metrics. When I create a Grafana dashboard, I see many graphs with no data.

With the large schema, I believe collectd is not collecting all the metrics on some nodes. Is there any way we can ensure that metrics from all nodes are consistent?

I tried adding some filters, but no luck.

Tags: dse, monitoring, metrics collector
1 comment

bhisey.swapnil commented:

Hi Sangeeth,

Can you please give more detail?

  • Which of the metrics you are looking for are missing?
  • If you have enabled the Prometheus plugin to expose metrics, you can run curl http://hostname:9103/metrics to see which metrics are collected on an individual node. The plugin is enabled with this collectd configuration:

LoadPlugin write_prometheus
<Plugin write_prometheus>
  Port "9103"
</Plugin>

1 Answer

Erick Ramirez answered Erick Ramirez edited

@sangeeth_talluri_163565 There are possibly several things not working, but the primary thing you need to resolve is that metrics collection is not working on some nodes.

Prometheus plugin

On each DSE node, confirm that the Prometheus plugin (collectd/prometheus.conf) loaded successfully. Assuming you had configured the plugin to run on port 9103, you should see nodes listening on the port as follows:

$ netstat -lnt | grep 9103
tcp6       0      0 :::9103                 :::*                    LISTEN  

Each monitored node should be listening on this port. If a node is not, the plugin wasn't configured on it and you need to deploy the configuration to that node.
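If you have many nodes, running netstat on each one by hand gets tedious. Here is a rough sketch of the same check done remotely from one machine. It assumes port 9103 as above and that the port is reachable from where you run it; the script name and the node addresses you pass in are placeholders, not anything DSE provides.

```python
import socket
import sys

PLUGIN_PORT = 9103  # assumed plugin port, as in the netstat check above


def is_listening(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable hosts.
        return False


def nodes_not_listening(nodes, port, timeout=3.0):
    """Return the nodes where nothing accepts connections on the port."""
    return [node for node in nodes if not is_listening(node, port, timeout)]


def main(argv):
    if not argv:
        print("usage: check_plugin_port.py NODE [NODE ...]")
        return
    for node in nodes_not_listening(argv, PLUGIN_PORT):
        print(f"{node}: nothing listening on port {PLUGIN_PORT}")


if __name__ == "__main__":
    main(sys.argv[1:])
```

A firewall between you and the node will also show up as "not listening", so confirm with netstat on the node itself before redeploying anything.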

Once you've deployed the configuration to all nodes, disable the collector on ONE node in the cluster with:

$ dsetool insights_config --mode DISABLED

You should see the following entries in the system.log:

INFO  [RMI TCP Connection(4)-] 2020-04-24 19:29:23,228 - Generating new scribe config
INFO  [RMI TCP Connection(4)-] 2020-04-24 19:29:23,233 - Stopping Insights Client...
INFO  [RMI TCP Connection(4)-] 2020-04-24 19:29:23,235 - Stopping collectd

Wait a minimum of 30 seconds, even up to a minute to be safe. Then re-enable the collector with:

$ dsetool insights_config --mode ENABLED_WITH_LOCAL_STORAGE

Wait 30 seconds to a minute then check that collectd is listening on the plugin port for each node in the cluster.
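If you end up repeating the disable/wait/enable cycle, it can be wrapped in a small script. This is only a sketch: it runs the two dsetool commands shown above on the local node, assumes dsetool is on the PATH, and uses a 60-second wait that you can adjust. The function name restart_collector is my own.

```python
import subprocess
import time

# The two commands from the steps above.
DISABLE = ["dsetool", "insights_config", "--mode", "DISABLED"]
ENABLE = ["dsetool", "insights_config", "--mode", "ENABLED_WITH_LOCAL_STORAGE"]


def restart_collector(wait_seconds=60, run=subprocess.run, sleep=time.sleep):
    """Disable the collector, wait, re-enable it, then wait again.

    run and sleep are parameters only so the sequence can be dry-run
    without touching a real node.
    """
    run(DISABLE, check=True)  # raises CalledProcessError if dsetool fails
    sleep(wait_seconds)
    run(ENABLE, check=True)
    sleep(wait_seconds)       # give collectd time to start listening again
```

Call restart_collector() on the node you are cycling, then re-check the plugin port as described.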

Prometheus server

Check the targets on this URL -- http://prometheus_server_ip:9090/targets. Verify that all DSE nodes are up.

If there are nodes showing as DOWN, you will need to go back through the steps and confirm that the plugin is configured correctly.
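The same information on the targets page is available as JSON from the Prometheus HTTP API at /api/v1/targets, which is handy if you want to script the check. A minimal sketch, assuming the server is reachable at the base URL you pass in (for example http://prometheus_server_ip:9090); the script and function names are hypothetical.

```python
import json
import sys
import urllib.request


def unhealthy_targets(payload):
    """Return (scrape URL, health, last error) for every target not 'up'."""
    targets = payload.get("data", {}).get("activeTargets", [])
    return [
        (t.get("scrapeUrl", ""), t.get("health", "unknown"), t.get("lastError", ""))
        for t in targets
        if t.get("health") != "up"
    ]


def fetch_targets(base_url, timeout=10):
    """Fetch the targets document from a Prometheus server."""
    url = f"{base_url.rstrip('/')}/api/v1/targets"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def main(argv):
    if not argv:
        print("usage: check_targets.py PROMETHEUS_BASE_URL")
        return
    for scrape_url, health, error in unhealthy_targets(fetch_targets(argv[0])):
        print(f"{scrape_url}: {health} {error}")


if __name__ == "__main__":
    main(sys.argv[1:])
```

A node that never appears in the output and is also missing from the targets page was probably never added to the Prometheus scrape configuration, which is a separate problem from the plugin not running.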

Metrics data

To check that metrics are getting collected, you can view the raw metrics data for the node by visiting this URL -- http://dse_node_ip:9103/metrics.

If the URL isn't available for a node, you will need to go back through the steps again.
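Since the original question was about metrics being inconsistent between nodes, it may also help to compare which metric names each node exposes. The sketch below fetches /metrics from each node, pulls the metric names out of the Prometheus text format, and lists what each node lacks relative to the others. It assumes port 9103 as above; the node addresses are whatever you pass on the command line, and the function names are my own. It only reports differences, it doesn't explain why a metric is missing.

```python
import sys
import urllib.request

PLUGIN_PORT = 9103  # assumed plugin port, as above


def metric_names(exposition_text):
    """Extract the set of metric names from Prometheus text-format output."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and HELP/TYPE comment lines
        parts = line.split("{", 1)[0].split()
        if parts:
            names.add(parts[0])
    return names


def missing_by_node(names_by_node):
    """For each node, list metric names seen on other nodes but not on it."""
    all_names = set().union(*names_by_node.values())
    return {node: sorted(all_names - names) for node, names in names_by_node.items()}


def fetch_metrics(node, port=PLUGIN_PORT, timeout=10):
    """Download the raw metrics text from one node."""
    url = f"http://{node}:{port}/metrics"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def main(argv):
    if not argv:
        print("usage: compare_metrics.py NODE [NODE ...]")
        return
    names_by_node = {node: metric_names(fetch_metrics(node)) for node in argv}
    for node, missing in missing_by_node(names_by_node).items():
        print(f"{node}: {len(missing)} metric name(s) missing")
        for name in missing:
            print(f"  {name}")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Some differences are expected (for example, per-table metrics for tables a node has not touched), so treat the output as a starting point for investigation.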

I hope this helps. Cheers!
