JoshPerryman asked Erick Ramirez edited

DSE (C*, Spark, Graph) Monitoring Starting Point, Best Practices

We're setting up additional monitoring for a couple of DSE clusters hosted in AWS. The "how" part of this seems pretty well documented, but the "what" part seems oddly missing in the publicly available material.

By "the what part" I mean: what DS metrics should definitely be watched, and which are good candidates to keep an eye on as well? The clusters run Cassandra + Spark + Graph, so we're interested in metrics pertaining to those specifically.

In a former job I could always just "look at Mike's doc on DSE monitoring" and go from there. But that job being former precludes that approach. Also, in other circumstances I might install OpsCenter, take its starting defaults, and go from there. But OpsCenter isn't a good fit for where we are with our current monitoring.

So, is there a list of "the top 20 metrics you should be watching in DSE" somewhere which could serve as a good starting place for our team? Or should I just have an intern go through these files: and pull out the metrics in use there?


1 Answer

Erick Ramirez answered Erick Ramirez edited

@JoshPerryman You're on the right path there by using the DSE Grafana dashboards as a reference point.

There isn't a one-size-fits-all "top 20" list of metrics, but we think the condensed cluster metrics dashboard (dse-cluster-condensed.json) is a good place to start. The rest of the dashboards contain the metrics we think are relevant to most users.
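If you do want to pull the metric names out of the dashboard files rather than read them by hand, something along these lines might save the intern some time. It is only a rough sketch: it assumes the dashboards are standard Grafana JSON exports whose queries are stored under "expr" keys (as with a Prometheus data source), and the regular expression is a simplification that only catches metric names followed by a label selector or range, so a bare metric name with no selector would be missed. The function names are made up for illustration.

```python
import json
import re

# Assumption: a metric name is an identifier immediately followed by "{" or "[".
# This is a heuristic, not a real PromQL parser.
METRIC_NAME = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*(?=\s*[\{\[])")


def iter_exprs(node):
    """Yield every string stored under an "expr" key, at any nesting depth."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "expr" and isinstance(value, str):
                yield value
            else:
                yield from iter_exprs(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_exprs(item)


def metrics_in_dashboard(dashboard):
    """Return the sorted, de-duplicated metric names found in one dashboard."""
    names = set()
    for expr in iter_exprs(dashboard):
        names.update(METRIC_NAME.findall(expr))
    return sorted(names)


def metrics_in_file(path):
    """Load a dashboard JSON file from disk and list the metric names in it."""
    with open(path, encoding="utf-8") as fh:
        return metrics_in_dashboard(json.load(fh))
```

Calling metrics_in_file("dse-cluster-condensed.json") from the directory where the dashboards were downloaded would then give a first-pass list to review. Treat the output as a starting point and check it against the panels themselves.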

We find that as users get more familiar with the dashboards over several days, they tend to pick out the metrics they care about and build their own dashboards to meet their requirements. Cheers!
