This is on DSE 6.0.9, both locally (single-node tarball), and in a QA env (3 nodes, package install).
I have a domain which partitions nicely by owner, so I'm using owner_id as the partitioning for all types. An example is the person vertex label with schema along the lines of:
schema.vertexLabel("person"). partitionKey("pkey"). clusteringKey("ckey"). properties("owner_id", "person_id", "name"). create() schema.vertexLabel("person"). index("person_by_person_id"). materialized(). by("person_id"). add()
in which: pkey = owner_id and ckey = person_id
(yes, I still prefer to have pkey/ckey properties separate from my domain objects)
In most cases, we should have between 1 and 8 persons for a given pkey.
I would like to just grab them (or the cars, or addresses) by the pkey value. Something like the following would be nice:
g.V().has('person','pkey','pkey_value').toList()
However, when I attempt this I get alerts about:
- missing indexes (Could not find an index on vertices labeled 'person' to answer the condition: '((label = person & pkey WITHIN [[pkey_value]))'. Current indexes are: person_by_person_id(Materialized)->person_id)
- allow_scan (Graph scan is NOT suitable...)
- timeouts (<SELECT ~~vertex_exists FROM person_p WHERE ~~vertex_exists = true LIMIT 200000>, time 7408 msec - slow timeout 500 msec)
But... I can get something pretty close to what I'm looking for with:
SELECT * FROM person_p WHERE pkey = 'pkey_value';
So my question: Should it be possible to query based just on the configured partitionKey() property in DSE 6.0.9?
I'm considering multiple work-arounds, including:
- Build a more complex traverser to start a specific vertex based on a label/pkey/ckey set and then go around the partition and collect the type
- Query the Cassandra tables directly
- Query the source system, which now also has knowledge of this ownership partition value.
- Add a materialized() index on the owner_id property (though the cardinalities aren't ideal)
- move up the schedule for our upgrade to 6.7 (though I was hoping to go to 6.8 or later next)
Oddly, it works fine in my test suite with a toy graph. It just seems to fail when I try to run the queries against production-sized data.