

Ryan Quey asked ·

What are the pros and cons of starting with CQL then migrating to DSE Graph vs starting in Graph?

I'm starting a project that will eventually incorporate DSE Graph, at least on some level. I found that you can actually start with regular Cassandra and add DSE Graph capabilities later. Since DSE Graph is built on top of Cassandra anyway, are there any disadvantages to this? I'm asking especially because I'm already more familiar with Cassandra and CQL than I am with DSE Graph.

On the other hand, if I want to be able to access my data using CQL later, are there any disadvantages to starting with DSE Graph from the start? (e.g., using Gremlin and the Gremlin console, as demonstrated here).

I'm not asking for a product recommendation per se, but rather:

1) Will it change how I model my data if I go with one approach vs the other?

2) Is starting with CQL vs Gremlin a best practice, or are both equally fine?

3) If in the long run I plan on hitting my db with both CQL queries and Gremlin queries (through DSE Graph), will starting with one or the other be better?

graph

1 Answer

David Jones-Gilardi answered ·

Hey there @Ryan Quey, this is a really great question. If you had asked me this before the release of DSE 6.8 and its support for graph-native tables in Cassandra, I would have had a different answer. Since the links you provided all reference DSE 6.8, I assume that is where you are starting. For anything before that version, you can throw out my response. :)


I found that you can actually start with a regular Cassandra and then add [DSE graph capabilities later]. Since DSE Graph is built on top of Cassandra anyway, are there any disadvantages to this?
On the other hand, if I want to be able to access my data using CQL later, are there any disadvantages to starting with DSE Graph from the start?

I'll answer these as a group. With the new native graph support you can essentially go in either direction: create tables in CQL and add native graph support later, or create your schema with Graph and access it with CQL. Either direction will work. When you create a graph schema, it is implemented as C* tables anyway.
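
To make the two directions concrete, here is a minimal Python sketch that only builds the statements as strings and never connects to a cluster. The names `demo` and `person` are hypothetical, and the graph-specific pieces (`graph_engine = 'Core'`, `WITH VERTEX LABEL`, and the `schema.vertexLabel(...)` chain) are written on the assumption that they match DSE 6.8 syntax, so verify them against the DSE 6.8 documentation before running anything.

```python
# Sketch only: builds statement strings, never talks to a cluster.
# "demo" and "person" are hypothetical names; the graph-specific clauses
# are assumptions about DSE 6.8 syntax and should be checked in the docs.


def cql_first(keyspace, table):
    """Direction 1: plain CQL table now, graph support added later."""
    replication = "{'class': 'SimpleStrategy', 'replication_factor': 1}"
    return [
        # Ordinary Cassandra table, no graph involved yet.
        f"CREATE TABLE {keyspace}.{table} (id int PRIMARY KEY, name text);",
        # Later: turn the keyspace into a graph (assumed DSE 6.8 option).
        f"ALTER KEYSPACE {keyspace} WITH replication = {replication} "
        "AND graph_engine = 'Core';",
        # Later: expose the existing table as a vertex label.
        f"ALTER TABLE {keyspace}.{table} WITH VERTEX LABEL {table};",
    ]


def graph_first(label):
    """Direction 2: Gremlin schema now; the label is backed by a table."""
    return (
        f"schema.vertexLabel('{label}').ifNotExists()"
        ".partitionBy('id', Int)"
        ".property('name', Text)"
        ".create()"
    )


for statement in cql_first("demo", "person"):
    print(statement)
print(graph_first("person"))
```

In both cases the end state is meant to be the same kind of thing: a table with `id` as its partition key that is also visible as a `person` vertex label. The difference is only which tool declared it first.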

However, I would argue that retroactively adding native graph support to CQL tables is more for cases where you already have a data model implemented in C* and want to add graph capabilities to it. If you know you want to enable graph for your data model at some point, I would start by designing the data model for your graph use case up front. You could implement that data model with either CQL or Graph perfectly fine, so if you are more familiar with CQL you can definitely use that method.

The key IMO is not so much which method you use, but the intent behind the partition and primary keys you choose for your data model. If you know up front that you are designing for a graph use case, you may very well design your access patterns differently than you would for straight C* tables, which is why I suggest designing with the graph model up front.


Will it change how I model my data if I go with one approach vs the other?

Per my comments above, I would argue that "yes", it could change your data model, depending on what you are trying to accomplish. Again though, that is not because the CQL method is intrinsically different from the Graph method, but because of the "intent" behind the data model itself, if that makes sense.


Is starting with CQL vs Gremlin a best practice, or are both equally fine?

They are both equally fine IMO.


If in the long run I plan on hitting my db with both CQL queries and Gremlin queries (through DSE Graph), will starting with one or the other be better?

The Gremlin language was created specifically for the graph use case, and its nomenclature and syntax provide a rich set of methods and tools to execute against Graph. It's also widely adopted, since TinkerPop is used all over the place, not just in DSE, so there is already a ton of information out there on how to do things with Gremlin. So, IMO, I would start with Gremlin, then use CQL in cases where I just need simple CQL access to data within my graph. But that's me, and I've already spent some time in the Gremlin language, so I'm kinda biased at this point.

One of the things that got me really excited about native graph support in C* when it first came up was the ability to easily run CQL queries against graph-enabled tables. I might really like the Gremlin language, but there is nothing like being able to quickly grab some data from my graph without having to learn and understand a new language.
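
As a rough illustration of that "quick grab", the Python sketch below builds the same lookup twice: once as a CQL query and once as a Gremlin traversal. It only produces strings and does not run them. The keyspace `demo`, the `person` table and vertex label, and the `id` and `name` columns are all hypothetical, and it assumes the vertex label is backed by a table of the same name.

```python
# Sketch only: the same "name of person 1" lookup phrased two ways.
# All names (demo, person, id, name) are hypothetical.


def name_by_id_cql(keyspace, table, person_id):
    """Plain CQL read against the table that backs the vertex label."""
    return f"SELECT name FROM {keyspace}.{table} WHERE id = {int(person_id)};"


def name_by_id_gremlin(label, person_id):
    """Equivalent Gremlin traversal over the same vertex label."""
    return f"g.V().has('{label}', 'id', {int(person_id)}).values('name')"


print(name_by_id_cql("demo", "person", 1))
print(name_by_id_gremlin("person", 1))
```

For a single-vertex lookup like this the two are interchangeable. Gremlin starts to pay off once the query needs to walk edges between vertices.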


BTW, are you familiar with @Denisekgosnell@gmail.com and Matthias's graph book https://github.com/datastax/graph-book? This is a wonderful resource and one of the best places to start with Graph IMO. The GitHub repo in the link provides a whole set of notebooks you can use with Graph for many of the most common use cases out there, not to mention a Docker image that runs DSE Graph with datasets pre-loaded so you can jump right in. Also, take a look at DataStax Desktop and its 6.8 "kitchen sink" stack https://downloads.datastax.com/#desktop. It also loads DSE Graph with a set of notebooks that explore going from CQL to Graph and back. Feel free to reach out if you need any help with any of this.


Finally, I added @Artem Chebotko and @Denisekgosnell@gmail.com to get their input. They are both much heavier hitters in this space than I am and may have something to add on your questions.


This is really helpful, I appreciate the detail you provide. Thanks!
