josef.schauer_169242 asked

How do I index UDTs for Graph in DSE 6.8?

Hello,

What is the right way to create indexes on UDTs so that they can be queried?

Schema:

schema.type('names')
  .ifNotExists()
  .property('id', UUID)
  .property('value', Text)
  .property('classification_ids', listOf(UUID))
  .create()
  
schema.type('hairColor')
  .ifNotExists()
  .property('id', UUID)
  .property('value', Text)
  .property('classification_ids', listOf(UUID))
  .create()
  
schema.type('height')
  .ifNotExists()
  .property('id', UUID)
  .property('value', Integer)
  .property('classification_ids', listOf(UUID))
  .create()

schema.vertexLabel('Person').ifNotExists()
.partitionBy('id', UUID)
.property('names', setOf(typeOf('names')))
.property('hairColor', setOf(typeOf('hairColor')))
.property('height', setOf(typeOf('height')))
.create()

schema.vertexLabel('Person').searchIndex().ifNotExists().by('names').create()
schema.vertexLabel('Person').searchIndex().ifNotExists().by('hairColor').create()
schema.vertexLabel('Person').searchIndex().ifNotExists().by('height').create()

Data:

g.addV('Person').property('id', UUID.randomUUID())
.property('names', [ [ id: UUID.randomUUID(), value:'Name1', classification_ids: [UUID.randomUUID()] as List] as names] as Set)
.property('hairColor', [ [ id: UUID.randomUUID(), value:'black', classification_ids: [UUID.randomUUID()] as List] as hairColor] as Set)
.property('height', [ [ id: UUID.randomUUID(), value:160, classification_ids: [UUID.randomUUID()] as List] as height] as Set)

g.addV('Person').property('id', UUID.randomUUID())
.property('names', [ [ id: UUID.randomUUID(), value:'Name2', classification_ids: [UUID.randomUUID()] as List] as names] as Set)
.property('hairColor', [ [ id: UUID.randomUUID(), value:'brown', classification_ids: [UUID.randomUUID()] as List] as hairColor] as Set)
.property('height', [ [ id: UUID.randomUUID(), value:150, classification_ids: [UUID.randomUUID()] as List] as height] as Set)

I want to query for a person with a specific name and a specific classification id:

This works:

g.V().hasLabel('Person').has('names.value', eq('Name1')).properties()

This does not work:

g.V().hasLabel('Person').has('names.value', eq('Name1')).has('names.classification_ids', contains('390e208d-eefc-4b03-ba17-6770336cf31e' as UUID)).properties()

Error:

java.lang.IllegalArgumentException: Inconsistent types found for expression: [names contains 390e208d-eefc-4b03-ba17-6770336cf31e, names = Name1]

How can I get this query to run?


UPDATE:

I also have to delete properties by a specific id (the id inside the UDT), so the id field of the UDT must be searchable, too.


If changes to the data model are necessary, that is still possible.

Thanks in advance,

Josef

dse, graph, user-defined type

1 Answer

bettina.swynnerton answered

Hi @josef.schauer_169242,

I think you are right: there is a problem with retrieving the indexed data from UDTs. The issue seems to come from the query optimiser concatenating the search criteria incorrectly, and I will raise a Jira ticket to have it investigated further.

As a workaround, I inserted a barrier() step, which stops the query optimiser from combining the two has() conditions into a single search expression.

This one worked for me (obviously with my UUID values):

g.V().hasLabel('Person')
.has('names.value', eq('Name1'))
.barrier()
.has('names.classification_ids', contains('ac8e44b9-50aa-43d2-b1ac-6c6a0e689a51' as UUID))
.properties()

Let me know if this works.

And thanks so much for posting the schema and data, this helps a lot!


Edit: The effect of the barrier() step is that the search on the set does not hit the search index (in this particular example). Only the first search term before the barrier() step hits the index.

If you use the search on the set in the first step, the search index is used, so the indexing seems correct in itself:

gremlin> g.V().hasLabel('Person').has('names.classification_ids', contains('ac8e44b9-50aa-43d2-b1ac-6c6a0e689a51' as UUID)).profile()
==>Traversal Metrics
Step                                                               Count  Traversers       Time (ms)    % Dur
=============================================================================================================
__.V().hasLabel("Person").has("names",P.NestedE...                     1           1          25.643    95.87
  CQL statements ordered by overall duration                                                  20.245
    \_1=SELECT * FROM community_6417."Person" WHERE solr_query = '{"q":"*:*", "fq":["{!tuple}names.classifica
        tion_ids:ac8e44b9\\-50aa\\-43d2\\-b1ac\\-6c6a0e689a51"]}' LIMIT 2147483647 / Duration: 20 ms / Count:
         1
HasStep([~label.eq(Person), names.NestedElement...                     1           1           0.777     2.91
ReferenceElementStep                                                   1           1           0.326     1.22
                                            >TOTAL                     -           -          26.747       


Here is the example with the barrier() step:

gremlin> g.V().hasLabel('Person').has('names.value', eq('Name1')).barrier().has('names.classification_ids', contains('ac8e44b9-50aa-43d2-b1ac-6c6a0e689a51' as UUID)).profile()
==>Traversal Metrics
Step                                                               Count  Traversers       Time (ms)    % Dur
=============================================================================================================
__.V().hasLabel("Person").has("names",P.NestedE...                     1           1          22.191    94.77
  CQL statements ordered by overall duration                                                  16.062
    \_1=SELECT * FROM community_6417."Person" WHERE solr_query = '{"q":"*:*", "fq":["{!tuple}names.value:Name
        1"]}' LIMIT 2147483647 / Duration: 16 ms / Count: 1
HasStep([~label.eq(Person), names.NestedElement...                     1           1           0.644     2.75
NoOpBarrierStep                                                        1           1           0.276     1.18
HasStep([names.NestedElementPredicate{path=[cla...                     1           1           0.091     0.39
ReferenceElementStep                                                   1           1           0.211     0.91
                                            >TOTAL                     -           -          23.415        -
gremlin> 
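
A possible variation on the workaround, given only as an untested sketch: since only the condition before barrier() reaches the search index, the two conditions could be swapped so that the classification_ids lookup is the one sent to the index. This assumes the reversed order behaves the same way as the order profiled above; it has not been profiled here. The UUID is the one from the test data in this answer, so substitute your own.

```groovy
// Sketch only: the same barrier() workaround with the two conditions swapped.
// Assumption: the condition before barrier() is the one served by the search index.
g.V().hasLabel('Person')
.has('names.classification_ids', contains('ac8e44b9-50aa-43d2-b1ac-6c6a0e689a51' as UUID)) // expected to hit the index
.barrier()                        // keeps the optimiser from merging the two conditions
.has('names.value', eq('Name1'))  // expected to run as a filter on the returned vertices
.profile()                        // shows which condition ends up in solr_query
```

Comparing the solr_query in the resulting profile with the two profiles above would show which condition was actually pushed down to the index.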

