I am currently running a 7-node DSE cluster (with Analytics, Graph, and Search enabled) and I am getting inconsistent results when doing a simple vertex count. I have tried running repair and cleanup on all the nodes, but that didn't seem to make a difference. For fun I stood up another cluster with 5 nodes and loaded it with the same data, and I get the exact same results. At least the numbers are consistent!
When looking at the Spark jobs I do see that when I get the wrong count, very little data is being read. That makes me think the tables are just returning a cached/pre-calculated count, but I haven't proven that yet. Below are the different ways I am trying to get the count.
The following give the correct results:
scala> spark.dseGraph("my_graph").cache().V().hasLabel("labelA").count.show()
scala> spark.sql("SELECT count(DISTINCT(labelA_id)) FROM my_graph.labelA").show()
gremlin> g.V().label().groupCount()
These queries give me a smaller, incorrect number (but it is consistent):
scala> spark.dseGraph("my_graph").V().hasLabel("labelA").count.show()
scala> spark.sql("SELECT count(1) FROM my_graph.labelA").show()
gremlin> g.V().groupCount().by(label)
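One detail that seems to support the cached-count theory: over the same set of rows, a plain count can never come back smaller than a distinct count, so the smaller number above implies the scan is reading fewer rows rather than the data containing duplicates. A minimal sketch of that invariant in plain Python (the row values are made up for illustration, nothing DSE-specific):

```python
# For any fixed set of rows: count(1) >= count(DISTINCT col).
# So if count(1) comes back SMALLER than count(DISTINCT col) over the
# same table, the plain count must have scanned fewer rows (e.g. a
# stale cached/pre-computed total), not found duplicate rows.

rows = [{"labelA_id": i % 8} for i in range(10)]  # 10 rows, 8 distinct ids

full_count = len(rows)                               # like count(1) on a full scan
distinct_count = len({r["labelA_id"] for r in rows})  # like count(DISTINCT labelA_id)

# The invariant holds whenever both counts see every row:
assert full_count >= distinct_count

# A hypothetical short-circuited scan returning fewer rows than there
# are distinct ids would violate it, which is what I seem to be seeing:
stale_count = 5
print(stale_count < distinct_count)  # True: rows were skipped, not duplicated
```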
Has anyone seen anything like this? Not sure what could be in the data to cause this.