Bringing together the Apache Cassandra experts from the community and DataStax.

iamgroot asked ·

Is the DSE Graph Loader able to check if a record exists before performing an update?

We have a use case with geo data such as suburbs, regions, states and countries, which are related to each other as follows:

SUBURB ---belongsto--- REGION ---belongsto--- STATE ---belongsto--- COUNTRY

The data for the vertices above is already loaded with an existing schema, for example:

SuburbID
Name
Boundary

Now we have to add a new property value to the existing suburb vertices. Can we use the Graph Loader in this case to bulk load the additional properties for the existing vertices?

The typical flow would be: while loading, the Graph Loader verifies whether the vertex exists and, if it does, adds the new property to that vertex.

Also, if we cannot use the Graph Loader, what would be the best way to add additional properties to the existing vertices?

graph
jeromatron answered ·

Any updates should ideally be idempotent upserts, as long as you have the IDs of the vertices and edges. In that case, you shouldn't need to check for existence and you can use a table loader like DSBulk to load the graph data (provided you're using DataStax Enterprise 6.8).

Just to give an example that should be okay with blind upserts:

Say I have a simple product rating system in a graph, with a user vertex/table, a product vertex/table and a rating edge/table. Erick (with a unique id, 1) has user information in the user table, such as location "Melbourne, Victoria". Erick rates the product "Basketball" (with its own id, 2), which has data like manufacturer "Spalder". The rating is "4" stars on "16 February 2021".

Later, Erick adds "Professional Basketball Player" as his job title. There is also a typo in the name of the manufacturer, so the owner of the rating website updates it to "Spalding".

If you already have the ids for the user and the product, you shouldn't need to check for existence. For Erick's career, if I have id 1 and "Professional Basketball Player" as the career, I can blindly upsert that data. It won't delete his location, it will simply add his career. If I already have the product id of 2 and the updated "Spalding" manufacturer, I can blindly upsert the data. It will update the existing record with the corrected manufacturer name.
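To make the example above concrete, here is a minimal sketch of the blind-upsert semantics in plain Python. It is only an in-memory model, not DSE, DSBulk or Graph Loader code: the `upsert` helper, the `store` dictionary and the property names (`name`, `location`, `career`, `manufacturer`) are assumptions made up for illustration.

```python
from typing import Any, Dict, Tuple

# Hypothetical in-memory stand-in for vertex tables, keyed by (label, id).
Store = Dict[Tuple[str, int], Dict[str, Any]]


def upsert(store: Store, label: str, vertex_id: int, **props: Any) -> None:
    """Write only the supplied properties; create the record if it is missing."""
    store.setdefault((label, vertex_id), {}).update(props)


store: Store = {}

# Initial load.
upsert(store, "user", 1, name="Erick", location="Melbourne, Victoria")
upsert(store, "product", 2, name="Basketball", manufacturer="Spalder")

# Later updates, written without any existence check.
upsert(store, "user", 1, career="Professional Basketball Player")
upsert(store, "product", 2, manufacturer="Spalding")

# Replaying the same write leaves the record unchanged (idempotent).
upsert(store, "product", 2, manufacturer="Spalding")

print(store[("user", 1)])
print(store[("product", 2)])
```

The point of the sketch is that the write path is the same whether or not the record already exists, and that properties not mentioned in a write (Erick's location) are left alone.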

Is that approximately similar to what you want to do and do you have the ids (natural key or surrogate key) when you want to update the data? I just want to see if your use case fits upserting the data.


Hi,

Thank you for your answer. I have a few concerns though. In my case, we are using DataStax Enterprise (DSE) 6.7, and the data set I am using for the update may contain new suburbs. How do you recommend using the solution above in that case? The suburb vertex has connections to the region vertex and a few other vertices. I am thinking of picking each data item from the CSV and checking whether the suburb is already present in the graph; if it is not, a new suburb vertex needs to be created.

Erick Ramirez answered ·

The Graph Loader API has an exists() method which checks if vertices already exist when edges are created. It can also check if an edge already exists.

For more info, see the graphloader API. Cheers!


Did you have a look at Jeremy's response?

The new question you posted (#10350) doesn't really help clarify your use case. The thing we don't understand is why you need to check if a suburb vertex exists.

If you have properties for the suburb vertex, then just insert the data and the vertex will get added in the process. If the suburb doesn't exist, then why do you have data for it?
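As a rough illustration of that point, the sketch below pushes CSV rows through a single upsert path, so a row for an existing suburb adds properties and a row for an unknown suburb creates it. This is plain Python over a dictionary, not Graph Loader or DSBulk code. The `upsert_suburbs` helper, the `Population` column and the sample suburb values are all hypothetical, and skipping blank cells is an assumption about how you would want empty values handled.

```python
import csv
import io
from typing import Dict


def upsert_suburbs(suburbs: Dict[str, Dict[str, str]], csv_text: str) -> None:
    """Upsert every CSV row by SuburbID, with no existence check."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        suburb_id = row.pop("SuburbID")
        # Assumption: a blank cell means "no new value", so it is skipped
        # rather than overwriting an existing property with an empty string.
        props = {key: value for key, value in row.items() if value}
        suburbs.setdefault(suburb_id, {}).update(props)


# Existing data (hypothetical values).
suburbs = {"S1": {"Name": "Carlton", "Boundary": "boundary-1"}}

# New data set: S1 already exists, S2 does not.
new_data = "SuburbID,Name,Population\nS1,,1000\nS2,Fitzroy,2000\n"

upsert_suburbs(suburbs, new_data)
print(suburbs)
```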

It would be really ideal if you respond directly to the points that Jeremy raised. Cheers!
