Hi, we have requirements to run some drop commands on our graph , so we decide to run OLAP queries for the same, we have 2 nodes in cluster and both have configured for spark (they are in mix load ). but queries were slow taking almost 7-8 min for data set of around 10lac data.
so I configured some spark commands by gremlin console in my graph.
those are :
:remote config alias g example_graph.a; g.graph.configuration.setProperty("spark.cores.max", 10); g.graph.configuration.setProperty("spark.executor.memory", "4g"); g.graph.configuration.setProperty("spark.executor.cores", "1"); g.graph.configuration.setProperty("spark.sql.shuffle.partitions", 500); g.graph.configuration.setProperty("spark.dynamicAllocation.enabled", "true"); g.graph.configuration.setProperty("spark.shuffle.service.enabled", "true"); g.graph.configuration.setProperty("spark.shuffle.service.port", "7437");
1. after configuring these commands our queries works in 2-3 min on avg. my concern is that suppose I only run these commands on one graph then should they will also effect our all graphs as well ( I want those conf. only one graph ).
2. how can I configure those commands using java api.
3. if there any best practice to increase speed of olap query also share those as well.
4. I also read about snapshot it would be better if you can elaborate about this as well if this can help.
sharing one query profile for better understanding of our dataset.
gremlin> g.V().hasLabel('Entitlement').out('Is').in('Of_Entity').profile() ==>Traversal Metrics Step Count Traversers Time (ms) % Dur============================================================================================================= GraphStep(vertex,[]) 1010654 1010654 39234.918 76.15 HasStep([~label.eq(Entitlement)]) 703 703 6218.585 12.07 VertexStep(OUT,[Is],vertex) 588 588 6049.714 11.74 VertexStep(IN,[Of_Entity],vertex) 640 640 17.395 0.03 >TOTAL - - 51520.614
So as we can see now visiting specific Entity from huge data it took around 52 sec. ( although this is only visiting , drop may take 1-2 min extra on top of this calculation ) . so we can assume an olap drop query with this amount of data will run in around 3 min.
I want to reduce this as low as possible.