Bringing together the Apache Cassandra experts from the community and DataStax.

Yeikel.ValdesSantana_186477 asked:

When do we need indexes on edges? How are they implemented?

In our graph, we are planning to run queries like this one:

g.V().hasLabel("customer").has("id","123").outE("has_phone").has("creat_ts","2019-10-18T21:21:21Z")

We are also planning to implement a "time machine" similar to the one explained here: https://www.datastax.com/blog/2016/09/gremlins-time-machine. In short, we need to support finding "phone edges" that do not have an "end_ts" (end_ts is stored on the edge).
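To make the requirement concrete, here is a minimal Python sketch of the "time machine" semantics as we read them: an edge with no end_ts is current, and an edge is valid at time t if creat_ts <= t and end_ts is either missing or later than t. The PhoneEdge class, the valid_at and current helpers, and the sample data are hypothetical illustrations, not DSE Graph APIs.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class PhoneEdge:
    # Hypothetical in-memory stand-in for a "has_phone" edge.
    phone: str
    creat_ts: datetime
    end_ts: Optional[datetime] = None  # None means the edge is still current


def valid_at(edge: PhoneEdge, t: datetime) -> bool:
    """An edge is valid at time t if it was created at or before t
    and either has no end_ts or ended strictly after t."""
    return edge.creat_ts <= t and (edge.end_ts is None or edge.end_ts > t)


def current(edges):
    """'Current' edges are the ones with no end_ts at all."""
    return [e for e in edges if e.end_ts is None]


def utc(*args):
    # Helper to build timezone-aware UTC datetimes for the sample data.
    return datetime(*args, tzinfo=timezone.utc)


edges = [
    PhoneEdge("555-0100", utc(2018, 1, 1), utc(2019, 6, 1)),
    PhoneEdge("555-0199", utc(2019, 6, 1)),
]
print([e.phone for e in current(edges)])                          # ['555-0199']
print([e.phone for e in edges if valid_at(e, utc(2019, 1, 1))])  # ['555-0100']
```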

I noticed that the generated CQL is similar whether or not we build an index:

SELECT * FROM "label"."label_e" WHERE "id" = ? AND "partition_key" = ? AND "~~edge_label_id" = ? AND "~creat_ts" = ? LIMIT ? ALLOW FILTERING; with params (java.lang.String) xxx, (java.lang.String) fc768f6eebd170e0fc81e6516001510fb78c0b8f, (java.lang.Integer) 67726, (java.time.Instant) 2019-10-18T21:21:21Z, (java.lang.Integer) 50000
SELECT * FROM "graph_name"."label_e_OUT_search_by_ts_e" WHERE "id" = ? AND "partition_key" = ? AND "~~edge_label_id" = ? AND "~creat_ts" = ? LIMIT ? ALLOW FILTERING; with params (java.lang.String) xx, (java.lang.String) fc768f6eebd170e0fc81e6516001510fb78c0b8f, (java.lang.Integer) 67726, (java.time.Instant) 2019-10-18T21:21:21Z, (java.lang.Integer) 50000

The documentation [1] recommends creating different types of indexes depending on cardinality:

  • Materialized view: most efficient index for high-cardinality, high-selectivity vertex properties and equality predicates.

  • Secondary index: efficient index for low-cardinality, low-selectivity vertex properties and equality predicates.

  • Search index: efficient and versatile index for vertex properties with a wide range of cardinality and selectivity. A search index supports a variety of predicates:
      ◦ Full-text and string searches
      ◦ Fuzzy search, both tokenized and non-tokenized
      ◦ Phrase search
      ◦ Spatial (geospatial, Cartesian) searches

Considering that timestamps have high cardinality, do we need an index?

We are also planning to run queries like this :

g.V().hasLabel("label").has("id","123").outE("has_address").has("type","home")

Here the cardinality of the address type is low (the number of values is > 3). Do we need an index there?

We noticed that, unlike vertex queries, these edge queries do not fail with or without an index, which makes us wonder whether edge indexes have any impact or are needed at all.

Lastly, it does not seem that we can set the type of index for edges, and the documentation does not match what we found when experimenting. According to our interpretation of the documentation, edge indexes are secondary views, but inspecting the tables we only see secondary views. Could you please clarify?

[1] https://docs.datastax.com/en/dse/6.0/dse-dev/datastax_enterprise/graph/using/indexing.html


@Yeikel.ValdesSantana_186477 just acknowledging your question and I've reached out to our Graph engineers for a response. Cheers!

polandll answered:

Are you using DSE Graph 6.0? In DSE Graph 5.1-6.7, you cannot choose whether an edge index is implemented as a materialized view or a secondary index. You need to use an edge index as detailed here: https://docs.datastax.com/en/dse/6.0/dse-dev/datastax_enterprise/graph/reference/schema/refAddEdgeIndex.html

You don't show the index that you built, but I presume you did something like:

schema.vertexLabel('customer').index('has_PhoneByEnd_ts').outE('has_phone').by('end_ts').add()

for the edge index, and that "id" is the partition key of the "customer" vertex. If you have this edge index, you should be able to take advantage of it in the query that you want to run.


Querying for edges where "end_ts" doesn't exist will take a bit of fancy work, because indexes generally look up a given value for a field (for example, end_ts = 2020-01-01T01:01:01Z). But give the edge index a try!
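A toy Python model of why "end_ts doesn't exist" is awkward for a value index: an equality index maps each value to the edges that carry it, so edges with no value never get an entry and the only way to find them in this model is a full scan. The edge list, the index dictionary, and the helper names are hypothetical; this illustrates the general idea, not how DSE Graph stores its indexes.

```python
from collections import defaultdict

# Hypothetical edges: (edge_id, end_ts or None). ISO strings keep it short.
edges = [
    ("e1", "2019-06-01T00:00:00Z"),
    ("e2", None),
    ("e3", "2020-01-01T01:01:01Z"),
    ("e4", None),
]

# Toy equality index: each end_ts value maps to the edges that have it.
# Edges without end_ts are never added, so there is no key to look up.
index = defaultdict(list)
for edge_id, end_ts in edges:
    if end_ts is not None:
        index[end_ts].append(edge_id)


def lookup(value):
    """Equality lookup: direct, but only for values that are present."""
    return index.get(value, [])


def missing_end_ts():
    """'end_ts does not exist' cannot use the index; it scans every edge."""
    return [edge_id for edge_id, end_ts in edges if end_ts is None]


print(lookup("2020-01-01T01:01:01Z"))  # ['e3']
print(missing_end_ts())               # ['e2', 'e4']
```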


Hi @polandll,

Thank you for your response. Sorry, I forgot to specify the version, but your assumption is correct: we are using 6.0.4.

Unfortunately your response does not answer my question.

Do we really need an index for that scenario? Does it help? The output of profile() looks similar in both cases, so I am not sure how it helps. Please clarify if you can.


Yeikel.ValdesSantana_186477 replied:

@Erick Ramirez, please let me know if you can help get a follow-up reply on this.


Thanks!

polandll replied to Yeikel.ValdesSantana_186477:

It is, in fact, difficult in DSE Graph 6.0.x to see whether an edge index is used, even with profile(). The CQL statement that you include is for the first step in the Gremlin query, and whether or not you have an edge index, that step will never use an index, because you are using the partition key to get the customer (g.V().has('customer', 'id', '123')). What you want to look at is the shorter time reported by profile() for the second step, the outE(...).has(...), after adding an edge index. I believe you will see that, even though profile() in older versions doesn't report the index use. I did test this, but the Studio notebook file is apparently too large to attach here.


I did not see any difference in time. Could you please share your analysis somewhere else (GitHub, for example)?


Thanks!

yukim answered:
> g.V().hasLabel("customer").has("id","123").outE("has_phone").has("creat_ts","2019-10-18T21:21:21Z")

You seem to have a "creat_ts" property on the "has_phone" edge and want to use that property to traverse to the adjacent vertex.

If you do not have an edge index on "creat_ts" for the "has_phone" edge, DSE Graph has to scan all the edges connected to the "customer" vertex whose "id" equals "123" and check each edge's label and creat_ts value.

If you have an edge index, DSE Graph can directly look up the edges with the given label ("has_phone") and "creat_ts".

Even though the generated CQL looks the same in both cases, this is the difference between them.
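A toy Python model of the difference described above, assuming the edge index behaves like a lookup keyed by (edge label, creat_ts): without it, every edge of the vertex is read and filtered; with it, only the matching entries are read. The functions and data are hypothetical and only count rows examined; they say nothing about actual DSE Graph timings.

```python
from collections import defaultdict


def scan(edges, label, ts):
    """No edge index: read every edge of the vertex and filter."""
    examined = 0
    hits = []
    for edge_label, creat_ts, edge_id in edges:
        examined += 1
        if edge_label == label and creat_ts == ts:
            hits.append(edge_id)
    return hits, examined


def build_index(edges):
    """Toy edge index keyed by (label, creat_ts)."""
    idx = defaultdict(list)
    for edge_label, creat_ts, edge_id in edges:
        idx[(edge_label, creat_ts)].append(edge_id)
    return idx


def indexed(idx, label, ts):
    """Indexed lookup: only the matching entries are read."""
    hits = idx.get((label, ts), [])
    return hits, len(hits)


# 1,000 hypothetical has_phone edges with distinct timestamps, plus 3 others.
edges = [("has_phone", f"ts{i:04d}", f"p{i}") for i in range(1000)]
edges += [("has_address", "ts0500", f"a{i}") for i in range(3)]

idx = build_index(edges)
print(scan(edges, "has_phone", "ts0500"))   # (['p500'], 1003)
print(indexed(idx, "has_phone", "ts0500"))  # (['p500'], 1)
```

With 1,000 has_phone edges the scan examines every row while the lookup reads one; with only a handful of edges per customer the two counts barely differ, which fits the point that a small dataset may not show a measurable difference.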

So, when should you use an edge index? It is useful when a vertex has many connected edges (high cardinality) and you know the exact value to look for.

If you don't see a difference in performance, you may not have enough "has_phone" edges for it to show.

> Lastly, it does not seem that we can set the type of index for edges, and the documentation does not match what we found when experimenting. According to our interpretation of the documentation, edge indexes are secondary views, but inspecting the tables we only see secondary views. Could you please clarify?

I'm not sure I understand your question, but edge indexes are implemented using Cassandra materialized views. In your second CQL statement, "graph_name"."label_e_OUT_search_by_ts_e" is the materialized view.
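One way to picture a materialized view is as a second copy of the same rows, sorted by a different clustering key. The Python sketch below assumes, for illustration only, that the view orders one vertex's edges by edge label and then creat_ts (matching the columns in the second CQL statement); the real DSE Graph table layout may differ. The 67726 value is the edge label id from that CQL log; the rows and the view_range helper are hypothetical. Timestamps are compared as strings, which works only because they all share the same fixed-width ISO-8601 format.

```python
import bisect

# Hypothetical base-table rows for one vertex partition:
# (edge_label_id, edge_id, creat_ts)
base_rows = [
    (67726, "e3", "2019-10-18T21:21:21Z"),
    (67726, "e1", "2019-10-01T08:00:00Z"),
    (67726, "e2", "2019-10-10T12:30:00Z"),
]

# Toy "view": the same rows, re-keyed so the indexed property follows
# the edge label in sort order.
view_rows = sorted((label, ts, edge_id) for label, edge_id, ts in base_rows)


def view_range(label, start, end):
    """Edge ids with start <= creat_ts < end, found by binary search over
    the view's sort order instead of scanning every row."""
    lo = bisect.bisect_left(view_rows, (label, start))
    hi = bisect.bisect_left(view_rows, (label, end))
    return [edge_id for _, _, edge_id in view_rows[lo:hi]]


print(view_range(67726, "2019-10-05T00:00:00Z", "2019-10-19T00:00:00Z"))  # ['e2', 'e3']
```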
