Bringing together the Apache Cassandra experts from the community and DataStax.


question

aparajitha.neelakanta_192572 asked

How do we do a traversal on one-to-many and many-to-many edges, and get the count?

I need to get the total count of businesses in a particular state. In my schema, each business vertex is connected to a suburb vertex via an edge, each suburb to a region, and each region to a state. Each state has multiple regions, and each region has multiple suburbs. Given the state name, I need to get all the businesses in it by going through its regions and suburbs. Is there a way to fetch this in one query?

dsegraph

1 Answer

bettina.swynnerton answered

Hi @aparajitha.neelakanta_192572,

Do you only want to count the businesses?

If so, you could count the number of unique businesses with the query below. Depending on the direction of your edges, you might have to use the in() step instead of out():

g.V().hasLabel("state").has("state_name", "x").out("connected_to_region").out("connected_to_suburb").out("connected_to_business").dedup().count()
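To see what each step of that traversal contributes, here is a minimal sketch in plain Python rather than Gremlin. It is not DSE Graph code: the `edges` dictionary, the `out` helper, and all vertex names are hypothetical stand-ins, and only the three edge labels are taken from the query. It shows why dedup() matters when a business can be reached through more than one suburb.

```python
# Hypothetical toy graph: edge label -> {source vertex: [target vertices]}.
# The edge labels mirror the query above; the vertices are made-up sample data.
edges = {
    "connected_to_region": {"NSW": ["Sydney Metro", "Hunter"]},
    "connected_to_suburb": {
        "Sydney Metro": ["Newtown", "Glebe"],
        "Hunter": ["Newcastle"],
    },
    "connected_to_business": {
        "Newtown": ["cafe-1", "bakery-2"],
        "Glebe": ["bakery-2"],  # the same business is reachable via two suburbs
        "Newcastle": ["garage-3"],
    },
}


def out(traversers, label):
    """Follow outgoing edges with the given label from every current vertex."""
    adjacency = edges.get(label, {})
    return [target for vertex in traversers for target in adjacency.get(vertex, [])]


def count_businesses(state_name):
    """Return (number of paths reaching a business, number of unique businesses)."""
    regions = out([state_name], "connected_to_region")
    suburbs = out(regions, "connected_to_suburb")
    businesses = out(suburbs, "connected_to_business")
    # len(set(...)) plays the role of dedup().count() in the Gremlin query.
    return len(businesses), len(set(businesses))


total_paths, unique_businesses = count_businesses("NSW")
print(total_paths, unique_businesses)
```

In this sample data the path count and the unique count differ, because one business hangs off two suburbs. If every business belongs to exactly one suburb in your graph, the two numbers are the same and the dedup() step is not needed for correctness.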

Let me know if I understood your question correctly. Also, what version of Graph are you using?

Cheers!

2 comments

Hi,

You understood the question correctly, and this query returns the data. However, because of the amount of data it takes a long time and sometimes hits a database timeout. Is there a more efficient way to write this query?

bettina.swynnerton replied to aparajitha.neelakanta_192572

Yes, that was also my concern. This is not really an OLTP query. If you have Spark available, you can run it in OLAP mode or with graphframes; count queries of this type are better done with Spark. The default timeout for OLAP is also much more generous than for OLTP (168 hours vs. 30 seconds). It might still not be very fast, but you won't run into any timeouts.

Have a look at this and see if this gets you started:

https://docs.datastax.com/en/dse/6.8/dse-dev/datastax_enterprise/graph/graphAnalytics/graphAnalyticsSparkGraphComputer.html
