danielleplex_185261 asked

When do we need to think in terms of Cassandra vs. graph when working with DSE Graph?

As a general rule, I tell my team to think about DSE Graph as a graph, in terms of how graph schemas and traversals work, rather than starting from Cassandra and underlying technologies like Solr.

In many cases, mentioning Cassandra leads to questions like "Why can't we implement the same using X database?" or "How are the tables structured?"

In general, this adds noise to our conversations and confusion to the project, because it draws attention to what I think is the wrong topic.

In my opinion, we need to concentrate on the graph so that we focus on what I believe are the critical parts of the project: how our relationships are defined, how we build our Gremlin queries, and how we need to retrieve our data.

I see the internals of DSE Graph (how the actual tables are stored in Cassandra, what kind of relationships are built between the internal tables, etc.) as a layer of abstraction rather than something I need to focus on, but I also understand that some concepts, like partitioning, are things we need to be aware of.

For the average stakeholder, I am doing my best to avoid mentioning Cassandra because it adds the noise explained above, but I also understand that some transparency might be needed.

Where should we draw the line? How much should we know about Cassandra before considering an implementation with DSE?


Tags: cassandra, schema, dse graph

David Jones-Gilardi answered

As @Erick Ramirez said, you already hit on some solid points. I agree that a good approach is to start with your graph topology and figure out what that looks like. Then ask whether or not you are truly solving a graph problem.

If you are, and graph is your answer, then, as both you and @Cedrick Lunven stated, you do need to take into account that Cassandra is under the hood to ensure your partitioning scheme is going to perform at scale. The video that @Cedrick Lunven included does a really nice job explaining this.

"How much should we know about Cassandra before considering an implementation with DSE?"

Understanding what partition keys and clustering columns are and why they matter is pretty important. Just like in Cassandra, having the correct data model is key to performance at scale.

Again, the video from @jlacefield listed above gives a nice overview.
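
For anyone who wants a concrete picture of partition keys versus clustering columns, here is a minimal plain-Python sketch. It is not DSE or Cassandra code: the `rows` data, the `sensor-*` keys, and the `build_partitions` helper are all made up for illustration. It only shows the idea that the partition key decides which rows are grouped together, while the clustering column decides the order inside each group.

```python
from collections import defaultdict

# Hypothetical rows: (partition_key, clustering_column, value).
# Names and data are illustrative only, not taken from any real schema.
rows = [
    ("sensor-a", 3, "21.5C"),
    ("sensor-b", 1, "19.0C"),
    ("sensor-a", 1, "20.9C"),
    ("sensor-a", 2, "21.1C"),
]


def build_partitions(rows):
    """Group rows by partition key and order each group by clustering column."""
    partitions = defaultdict(list)
    for partition_key, clustering_col, value in rows:
        partitions[partition_key].append((clustering_col, value))
    for entries in partitions.values():
        # The clustering column defines the order inside a partition.
        entries.sort()
    return dict(partitions)


partitions = build_partitions(rows)
```

Reading `partitions["sensor-a"]` returns that sensor's readings already grouped and ordered, which is roughly why a query that names a single partition key tends to be cheap.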



Cedrick Lunven answered

I will start with my 2 cents then :)


  • As a graph developer, you need to know and focus first on the Gremlin language. You would design your model with vertices and edges based on the relationships in your business. Step 1: define your entities.
  • I believe devs also need to know that DSE Graph is a distributed system whose purpose is to scale as much as you need, with multiple servers under the hood. Everybody can understand that a request that accesses a single node performs better. Data locality and subgraphs are the targets that help you define partition keys.


This is the way things are described in DS330: first the Property Graph Data Model, then Data Locality right after that.

https://academy.datastax.com/units/3306-understanding-graph-partitioning-and-data-locality?resource=ds330-datastax-enterprise-6-graph
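
The data locality point above can be sketched in a few lines of plain Python. This is a toy model under stated assumptions, not how DSE Graph is implemented: `NODE_COUNT`, `node_for`, `nodes_touched`, and the `customer-42` / `order-*` keys are hypothetical, and an MD5 hash with a modulo stands in for the real partitioner. It only illustrates that keys sharing a partition always resolve to one node, while vertices spread across many partition keys may force a request to fan out.

```python
import hashlib

NODE_COUNT = 3  # hypothetical cluster size, for illustration only


def node_for(partition_key: str) -> int:
    """Map a partition key to a node index (a stand-in for a real partitioner)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NODE_COUNT


def nodes_touched(partition_keys):
    """Return the set of nodes a request must contact for the given keys."""
    return {node_for(key) for key in partition_keys}


# Vertices of one subgraph sharing a partition key: a single node serves them.
same_partition = ["customer-42"] * 5
# The same number of vertices, each under its own key: the request may fan out.
separate_partitions = [f"order-{i}" for i in range(5)]

local = nodes_touched(same_partition)
scattered = nodes_touched(separate_partitions)
```

Here `local` always contains exactly one node, whereas `scattered` can contain up to `NODE_COUNT` nodes depending on how the keys hash. That difference is what choosing partition keys around your subgraphs is meant to control.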



Erick Ramirez answered

@danielleplex_185261 I hate to admit it, but you've pretty much covered all the points, and you have a very good understanding of what your teams should focus on and what they can set aside as "noise".

What I'll do is socialise your points internally and I'll get other engineers at DataStax to put in their two cents. Cheers!

1 comment

Thanks for your help. I am looking forward to hearing from them.
