question

Yeikel.ValdesSantana_186477 asked ·

Is using edge properties preferred over a new edge to support Graph time traversals?

Data modeling question regarding the use of edge properties versus a new edge to support "Time traversals" (SubgraphStrategy).

In our graph, we have the following schema:

Person(vertex) => has_phone(edge) => Phone(vertex)

where the has_phone edge has the following properties:

  • create_timestamp
  • end_timestamp (to denote an "old phone")
  • update_timestamp

We implemented timestamps because we'd like to support queries like "Show me the phone numbers that this person had" and to know when each relationship ended. We do not intend to ask questions like "Show me the phone numbers that this person had in 2019".

In general, we will have the following queries.

  1. Show the phones that this person has at the moment (80-90% of our queries)
  2. Show me the phones that this person had at some point
  3. 1 and 2 together
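
To make the three query shapes concrete, here is a minimal in-memory sketch in plain Python. It is not Gremlin and not the DSE Graph API; the `HasPhone` class, the sample data, and the function names are all hypothetical. It assumes that a missing `end_timestamp` marks a current phone.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class HasPhone:
    """One has_phone edge; end_timestamp is None while the phone is current (assumption)."""
    person: str
    phone: str
    create_timestamp: int
    end_timestamp: Optional[int] = None


# Hypothetical sample data, for illustration only.
EDGES = [
    HasPhone("alice", "555-0100", create_timestamp=100, end_timestamp=200),
    HasPhone("alice", "555-0101", create_timestamp=200),
    HasPhone("bob", "555-0199", create_timestamp=150),
]


def current_phones(edges, person):
    """Query 1: phones the person has right now (no end_timestamp)."""
    return [e.phone for e in edges if e.person == person and e.end_timestamp is None]


def past_phones(edges, person):
    """Query 2: phones the person had at some point (end_timestamp set)."""
    return [e.phone for e in edges if e.person == person and e.end_timestamp is not None]


def all_phones(edges, person):
    """Query 3: current and past phones together (no timestamp filter)."""
    return [e.phone for e in edges if e.person == person]
```

Note that none of the three queries compares timestamps against a range; they only test whether `end_timestamp` is present.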

With that in mind, our initial idea was to use "Time traversals"[1] using a SubgraphStrategy.

But I was wondering if that's how you would normally implement this, considering the cardinality of the timestamps, the possibility of having to create an index to support the workload, and the type of queries we are planning to run (as I described, 80-90% will be about the current state of the graph, not the past).

In that sense, I was wondering if this makes more sense:

Person(vertex) => has_phone(edge) => Phone(vertex) - for all the current phones

Person(vertex) => had_phone(edge) => Phone(vertex) - for all the "past" phones
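
A rough sketch of this two-label alternative, again in plain Python rather than Gremlin. The `PhoneGraph` class and its methods are hypothetical; the point is that ending a relationship moves the edge from `has_phone` to `had_phone`, so reads select by edge label instead of filtering on a timestamp.

```python
from collections import defaultdict


class PhoneGraph:
    """Toy adjacency store keyed by edge label (hypothetical, not a DSE Graph API)."""

    def __init__(self):
        # label -> person -> {phone: edge properties}
        self._edges = {
            "has_phone": defaultdict(dict),
            "had_phone": defaultdict(dict),
        }

    def add_phone(self, person, phone, create_timestamp):
        """Create a has_phone edge for a current phone."""
        self._edges["has_phone"][person][phone] = {"create_timestamp": create_timestamp}

    def retire_phone(self, person, phone, end_timestamp):
        """Move the edge from has_phone to had_phone; raises KeyError if it is not current."""
        props = self._edges["has_phone"][person].pop(phone)
        props["end_timestamp"] = end_timestamp
        self._edges["had_phone"][person][phone] = props

    def out(self, person, *labels):
        """Phones reachable from person over the given edge labels, in label order."""
        return [phone for label in labels for phone in self._edges[label][person]]
```

The trade-off in this sketch is that the write path does more work (a delete plus an insert when a phone is retired) so that the common read, `out(person, "has_phone")`, needs no property filter.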

If it helps, we are using DSE Graph 6.0.4.

[1] https://www.datastax.com/blog/2016/09/gremlins-time-machine

dsegraph

1 Answer

jeromatron answered ·

I don't think you would use the subgraph strategy in this case.

You might have the current/primary phone denormalized as a property on the person vertex. That way the most common use case is a simple vertex lookup. Then you could have all phones, past and current, as separate vertices as you've described, so the past phone lookup is a single hop across all had_phone edges.
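
As a rough illustration of that layout (plain Python, not Gremlin; the `Person` class and its attribute names are hypothetical), the sketch below keeps one primary phone as a property on the person and records replaced phones as `had_phone` entries. It assumes a single current phone per person and that `had_phone` holds only past phones.

```python
class Person:
    """Toy person vertex with a denormalized current phone (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.current_phone = None  # denormalized property on the vertex
        self.had_phone = []        # past edges as (phone, end_timestamp) pairs

    def set_phone(self, phone, timestamp):
        """Replace the current phone; the previous one becomes a had_phone edge."""
        if self.current_phone is not None:
            self.had_phone.append((self.current_phone, timestamp))
        self.current_phone = phone

    def past_phones(self):
        """Single hop over had_phone edges."""
        return [phone for phone, _ in self.had_phone]
```

With this shape, the 80-90% case reads one property (`person.current_phone`) and touches no edges at all.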

You mention indexes - vertex-centric or edge indexes wouldn't really be useful in this case unless you had at least hundreds of past phones.

With DSE 6.8, you also have access to collection types on vertices and edges. That would open up your options to have a map of phones on the vertex itself if you wanted.

Hope that helps.

Jeremy


1 comment

As I explained on the gremlin-users mailing list, you might also consider a DSL to help encapsulate your filters and prepare the three graph views you described.
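
To show the idea of hiding the filters behind named steps, here is a small fluent sketch in plain Python. It is not a TinkerPop DSL and uses no Gremlin API; the `PhoneTraversal` class, its step names, and the dictionary edge layout are hypothetical stand-ins.

```python
class PhoneTraversal:
    """Chainable wrapper over has_phone edges (hypothetical stand-in for a DSL)."""

    def __init__(self, edges):
        self._edges = list(edges)

    def of(self, person):
        """Restrict to edges leaving the given person."""
        return PhoneTraversal(e for e in self._edges if e["person"] == person)

    def current(self):
        """View 1: edges with no end_timestamp."""
        return PhoneTraversal(e for e in self._edges if e.get("end_timestamp") is None)

    def past(self):
        """View 2: edges whose end_timestamp is set."""
        return PhoneTraversal(e for e in self._edges if e.get("end_timestamp") is not None)

    def numbers(self):
        """Terminal step: the phone numbers on the remaining edges."""
        return [e["phone"] for e in self._edges]


# Hypothetical sample edges.
SAMPLE_EDGES = [
    {"person": "alice", "phone": "555-0100", "end_timestamp": 200},
    {"person": "alice", "phone": "555-0101"},
]
```

Callers would write `PhoneTraversal(SAMPLE_EDGES).of("alice").current().numbers()` for the first view, swap in `.past()` for the second, and drop the filter step for the combined view, so the timestamp logic lives in one place.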
