question

JoshPerryman asked · alioli commented

Get all vertices of a vertex label for a given partition key

This is on DSE 6.0.9, both locally (single-node tarball install) and in a QA environment (3 nodes, package install).

I have a domain which partitions nicely by owner, so I'm using owner_id as the partition key for all vertex labels. An example is the person vertex label, with a schema along these lines:

schema.vertexLabel("person").
  partitionKey("pkey").
  clusteringKey("ckey").
  properties("owner_id", "person_id", "name").
  create()

schema.vertexLabel("person").
  index("person_by_person_id").
  materialized().
  by("person_id").
  add()

where pkey = owner_id and ckey = person_id.

(yes, I still prefer to have pkey/ckey properties separate from my domain objects)

In most cases, we should have between 1 and 8 persons for a given pkey.

I would like to just grab them (or the cars, or addresses) by the pkey value. Something like the following would be ideal:

g.V().has('person','pkey','pkey_value').toList()

However, when I attempt this I get warnings about:

  • missing indexes (Could not find an index on vertices labeled 'person' to answer the condition: '((label = person & pkey WITHIN [[pkey_value]))'. Current indexes are: person_by_person_id(Materialized)->person_id)
  • allow_scan (Graph scan is NOT suitable...)
  • timeouts (<SELECT ~~vertex_exists FROM person_p WHERE ~~vertex_exists = true LIMIT 200000>, time 7408 msec - slow timeout 500 msec)

But... I can get something pretty close to what I'm looking for with:

SELECT * FROM person_p WHERE pkey = 'pkey_value';


So my question: Should it be possible to query based just on the configured partitionKey() property in DSE 6.0.9?


I'm considering multiple workarounds, including:

  • Build a more complex traversal that starts at a specific vertex identified by a label/pkey/ckey set, then walks the partition and collects the desired type
  • Query the Cassandra tables directly
  • Query the source system, which now also has knowledge of this ownership partition value
  • Add a materialized() index on the owner_id property (though the cardinalities aren't ideal)
  • Move up the schedule for our upgrade to 6.7 (though I was hoping to go to 6.8 or later next)
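
The index-based workaround can be sketched against the same schema API used above; the index name here is hypothetical, chosen to mirror the existing person_by_person_id index:

// hypothetical: materialized index on the owner_id property
schema.vertexLabel("person").
  index("person_by_owner_id").
  materialized().
  by("owner_id").
  add()

// the partition lookup would then be answerable by the index:
g.V().has('person', 'owner_id', 'pkey_value').toList()

Note that owner_id duplicates pkey in this model, which is why the cardinality trade-off mentioned above applies to this option.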


Oddly, the traversal works fine in my test suite with a toy graph. It only seems to fail when I run the queries against production-sized data.

Tags: graph, graph-traversal

jlacefield answered · alioli commented

Okay ... this is a known limitation of the existing graph engine. I did some quick research and found the JIRA related to this. This type of traversal pattern is supported in the "experimental" graph engine available through DataStax Labs.

The current workaround is to create an index for the partition key field.
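
A minimal sketch of that workaround, assuming the person schema from the question (the index name is hypothetical):

// hypothetical: index the configured partition key property
schema.vertexLabel("person").
  index("person_by_pkey").
  materialized().
  by("pkey").
  add()

With the index in place, the original g.V().has('person','pkey','pkey_value') traversal should be answerable without a full graph scan.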


JoshPerryman commented:

We can add the index. Thanks!

alioli commented:

Happy to hear that a solution is available for testing.

I updated my question on StackOverflow accordingly: https://stackoverflow.com/questions/47435810/why-does-a-traversal-that-filters-on-partition-key-need-a-full-table-scan

jlacefield answered · JoshPerryman commented

Josh, I agree that this "should" work in Gremlin, but I need to check with Bryn and co., as well as look through the JIRAs, in case there are reasons it does not.


JoshPerryman commented:

Thanks Jonathan! Appreciate the quick response.
