Bringing together the Apache Cassandra experts from the community and DataStax.

Want to learn? Have a question? Want to share your expertise? You are in the right place!

Not sure where to begin? Getting Started

 

question

sean.fennell_168661 avatar image
sean.fennell_168661 asked ·

Is there a more efficient way to prune cycles without simplePath() ?

Hello,

I'm using the .simplePath() step to prune cycles when traversing:

g.V().has(...).repeat(outE("dependency").simplePath().inV()).emit()

However, simplePath is more expensive because it must keep path history. Is there a more efficient approach to prune cycles without using the .simplePath() step?

graphgremlin
10 |1000 characters needed characters left characters exceeded

Up to 8 attachments (including images) can be used with a maximum of 1.0 MiB each and 10.0 MiB total.

1 Answer

stephen.mallette_131353 avatar image
stephen.mallette_131353 answered ·

I think that simplePath() is probably the best approach you can take to generally remove cycles within a traversal and virtually any other approach one could think of would likely turn on the tracking of path history to accomplish the same goal. Simply put, I think you would need to know the history of the path in order to determine if it cycled on itself unless of course your data was such that you knew no such cycle could exist. In that case though, you probably wouldn't be using simplePath() at all.

This is an interesting approach:

outE("dependency").simplePath().inV() 

where you filter the edges with simplePath() as opposed to what is more commonly seen with:

out("dependency").simplePath()

I'd be curious if there was any performance improvement as a result of doing one over the other and if that difference was more pronounced for different graph implementations. Unless you need the edges in your result I would guess that, generally speaking, for especially deep or long run traversals that the latter would outperform the former because there would be less objects in the path to evaluate for the cycle. On the other hand, the former filters on the outE() which would avoid more reads from the database in gathering the other vertex, so that might perform better in this specific case for DSE Graph.

3 comments Share
10 |1000 characters needed characters left characters exceeded

Up to 8 attachments (including images) can be used with a maximum of 1.0 MiB each and 10.0 MiB total.

I believe I tested this a long while ago and found that simplePath on edges was faster. I don't have the numbers anymore, but at the time I assumed it was faster since it didn't have to read inVertex rows. I found overall with cassandra that edge filtering is faster than vertex filtering.


How about storing the inVertices such that I can check if a vertex was already visited, as I repeat, skip the traversal if the inVertex was already visited? I cannot store a visited property on my graph because it is immutable, and many concurrent traversals could overlap.

0 Likes 0 · ·

I think that your suggestion is an example that's a bit outside of the scope of what I could have generally suggested just because whether or not you can do it is probably based on your graph structure and the nature of your specific traversal. You could certainly store() the vertex you visited in a list and only continue to traverse if you hadn't seen that vertex before, but I think that will only assure that you only end up with paths of unique vertices. For example, you might find a path of [v[1],v[3],v[4]], but by just storing traversed vertices in `repeat()` you would then miss a path of [v[1],v[4],v[3]]. Now, it's possible that for your use case you don't care about the latter path and just knowing a path exists among v[1], v[3] and v[4] exist is enough, in which case that approach might be fine.

0 Likes 0 · ·

How do you see the cost of simplePath() manifesting as a problem? Is it showing up as a larger than expected memory cost? Or is this more related to just query speed?

0 Likes 0 · ·