Bringing together the Apache Cassandra experts from the community and DataStax.

Want to learn? Have a question? Want to share your expertise? You are in the right place!

Not sure where to begin? Getting Started

 

question

raktim.0526_188212 avatar image
raktim.0526_188212 asked ·

With DSE Graph 6.7, is it possible to run CQL search queries on graph vertex tables?

I have created a graph schema(DSE 6.7), a search index, inserted graph data into it.

schema.propertyKey("geoBoundary").Text().single().create()
schema.propertyKey("location").Point().withGeoBounds().single().create()
schema.vertexLabel("region").partitionKey("regionId").properties("name", "location", "geoBoundary").create()
schema.vertexLabel("region").index("search").search().by("geoBoundary").by("location").add()  
graph.addVertex(label,'region', 'regionId','0ab79c08-ef37-11e9-a713-2a2ae2dbcce8','location',Geo.point(149.0124,35.4735),'name','Dummy_ACT','geoBoundary',"MULTIPOLYGON (((-122.756762 49.000239, -117.033359 49.000239)), ((-122.718423 48.310145, -122.586977 48.35396, -122.608885 48.151313, -122.767716 48.227991, -122.718423 48.310145)), ((-123.025132 48.583992, -122.915593 48.715438, -122.767716 48.556607, -122.811531 48.419683, -123.041563 48.458022, -123.025132 48.583992)))");

SELECT name FROM test.region_p where solr_query='{"q":"*:*","fq":"geoBoundary:\"Intersects(MULTIPOLYGON (((-122.718423 48.310145))))\""}';
OR
SELECT name FROM test.region_p where solr_query='{"q":"*:*"}';

Both throws this error,

Search queries are not allowed when the document unique key does not match the CQL primary key unless the query is a count or the selected columns are a subset of the unique key.

I am unable to query MULTIPOLYGONS using Gremlin because, graph only supports Point, Linestring and Polygon datatypes. My data contains MULTIPOLYGONS too. So this eliminates the use to Gremlin. So my question here is, without changing the graph schema and data , can i run CQL query with the solr_query parameter?

dsegraphsearch
10 |1000 characters needed characters left characters exceeded

Up to 8 attachments (including images) can be used with a maximum of 1.0 MiB each and 10.0 MiB total.

1 Answer

bettina.swynnerton avatar image
bettina.swynnerton answered ·

Hi @raktim.0526_188212,

you are running into a limitation on multi-pass Search queries on graph vertex tables.

Due to the underlying layout and schema of the graph tables in Cassandra, which may have multiple rows per vertex, multi-pass Search queries against the vertex table are not permitted in DSE Graph 6.0 and 6.7.

Search queries on graph tables are restricted to single-pass queries. The valid way to issue a direct CQL statement that includes the SOLR API is to retrieve only the primary key fields in the select statement.

For context, here is some additional explanation about single-pass queries, quoted from here:

With single-pass queries, each node reads all rows that satisfy the query and returns them to the coordinator node. An advanced feature, a single-pass distributed query saves one network round trip transfer during the retrieval of queried rows.

A regular distributed query (multi-pass query) performs two network round trips:

- The first one to retrieve IDs from DSE Search that satisfy the query.

- The second one to retrieve only the rows that satisfy the query from the database, based on IDs from the first step.


In your particular case, you can work around this limitation by rolling the property name in to the partition key:

schema.vertexLabel("region2").partitionKey("regionId", "name").properties("location", "geoBoundary").create()

This then lets me do the specified search queries:

cqlsh> SELECT name FROM test2.region2_p where solr_query='{"q":"*:*","fq":"geoBoundary:\"Intersects(MULTIPOLYGON (((-122.718423 48.310145))))\""}';

 name
------
(0 rows)

cqlsh> SELECT name FROM test2.region2_p where solr_query='{"q":"*:*"}';

 name
-----------
 Dummy_ACT
(1 rows)

This might work well in this case.

In general, the drawback of partition keys with a large number of properties is that you will need to provide all of the properties if you want to look up based on the partition key (not using an index). Using an index is generally slower than the data read based on the partition key, take this into consideration. And of course, the question remains whether such a schema change is practical with your existing graph.

On another general note, searching graph vertex tables directly with Search queries can lead to unexpected behaviour due to the way how the properties are represented in the vertex tables. This has been problematic in some cases, especially in the context of paging. It is not something that we generally recommend.

I hope this helps!


P.S. thanks for providing schema and data with your question, this really helped!

2 comments Share
10 |1000 characters needed characters left characters exceeded

Up to 8 attachments (including images) can be used with a maximum of 1.0 MiB each and 10.0 MiB total.

@bettina.swynnerton Thank you very much for your detailed and prompt reply always!

0 Likes 0 · ·

You are very welcome!

0 Likes 0 · ·