Bringing together the Apache Cassandra experts from the community and DataStax.




baid_manish_187433 asked:

Does DseGraphFrame load entire graph database?

Hi, we are just starting to use DseGraphFrame.

Having looked at various examples and blogs, I see that the first statement is typically:

DseGraphFrame graph = DseGraphFrameBuilder.dseGraph("test", spark);

Does this load the entire graph into memory?

We need to load data for a subgraph only, ideally one defined by a query. Any references would be helpful.

We are using DSE 6.8.1.



1 Answer

bettina.swynnerton answered:

Hi @baid_manish_187433,

DseGraphFrames are an extension of Spark DataFrames. They are similar to Spark GraphFrames, with some additional methods to support TinkerPop graph traversals.

A DseGraphFrame represents a graph as two virtual tables: a vertex DataFrame and an edge DataFrame (just as a graph can be viewed as a pair of sets, one containing all vertices and one containing all edges).
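To make the "two tables" view concrete, here is a minimal in-memory sketch in plain Java. It is not the DseGraphFrame API: the `Vertex`, `Edge` and `Graph` records and their fields are hypothetical, and DseGraphFrame's real DataFrames have their own column schema. The sketch only mirrors the shape: one row per vertex, one row per edge, with edges referring to vertices by id.

```java
import java.util.List;
import java.util.Map;

// Toy model of a property graph stored as two tables.
// Hypothetical types for illustration only; NOT the DseGraphFrame API.
class GraphAsTables {

    // One row of the hypothetical vertex table.
    record Vertex(String id, String label, Map<String, String> properties) {}

    // One row of the hypothetical edge table; src and dst are vertex ids.
    record Edge(String src, String dst, String label) {}

    // The graph is nothing more than the pair of tables.
    record Graph(List<Vertex> vertices, List<Edge> edges) {}

    static Graph sample() {
        List<Vertex> vertices = List.of(
            new Vertex("p1", "Person", Map.of("department", "345")),
            new Vertex("p2", "Person", Map.of("department", "100")),
            new Vertex("d1", "Department", Map.of("code", "345")));
        List<Edge> edges = List.of(
            new Edge("p1", "d1", "worksIn"),
            new Edge("p1", "p2", "knows"));
        return new Graph(vertices, edges);
    }

    public static void main(String[] args) {
        Graph g = sample();
        // Prints: 3 vertices, 2 edges
        System.out.println(g.vertices().size() + " vertices, " + g.edges().size() + " edges");
    }
}
```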

The principles of Spark apply to reading DseGraphFrames.

An important aspect of Spark is that defining the DseGraphFrame (or a DataFrame, etc.) does not read anything. Spark reads nothing until an action, such as count or show, is called. Once an action is called, Spark loads the data in chunks, known as partitions.
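Lazy evaluation can be illustrated without a cluster. The sketch below is an analogy, not Spark: Java's `java.util.stream` pipelines are also lazy, so intermediate operations (`peek`, `filter`) do nothing until a terminal operation runs, much as Spark transformations do nothing until an action. The counter and the string "rows" are hypothetical stand-ins for reads from the database.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Analogy only: Java streams, like Spark DataFrames, are lazy.
class LazyReadDemo {

    // Builds the pipeline but executes nothing: peek and filter are lazy.
    static Stream<String> buildPipeline(List<String> source, AtomicInteger rowsRead) {
        return source.stream()
            .peek(row -> rowsRead.incrementAndGet()) // counts each row actually pulled
            .filter(row -> row.startsWith("Person"));
    }

    public static void main(String[] args) {
        List<String> table = List.of("Person:p1", "Person:p2", "Department:d1");
        AtomicInteger rowsRead = new AtomicInteger();

        Stream<String> pipeline = buildPipeline(table, rowsRead);
        System.out.println("rows read after definition: " + rowsRead.get()); // 0

        // The terminal operation plays the role of a Spark action.
        List<String> people = pipeline.collect(Collectors.toList());
        System.out.println("rows read after collect: " + rowsRead.get() + ", people: " + people.size()); // 3, 2
    }
}
```

The analogy stops at laziness: a Java stream runs in one JVM, whereas Spark splits the work into partitions spread over executors.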

Spark also divides a job into stages. Within a stage, it applies that stage's transformations to each partition as the partition is loaded. Spark keeps the output, loads further partitions and reapplies the stage's transformations until all partitions have been read and processed; then it moves on to the next stage, and so on. (In practice, executors process many partitions in parallel.)
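The stage-by-partition flow can be sketched as a single-threaded toy. This is a simplification under stated assumptions: `partition` and `runStage` are hypothetical helpers, the job has just two stages (a per-partition filter, then a combine step), and real Spark runs partition tasks in parallel across executors and shuffles data between stages.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

// Simplified, sequential sketch of a job split into stages that run
// partition by partition. Hypothetical helpers; not Spark's scheduler.
class StagesDemo {

    // Split the source into fixed-size chunks ("partitions").
    static <T> List<List<T>> partition(List<T> source, int size) {
        List<List<T>> parts = new ArrayList<>();
        for (int i = 0; i < source.size(); i += size) {
            parts.add(source.subList(i, Math.min(i + size, source.size())));
        }
        return parts;
    }

    // Apply one stage's transformation to each partition in turn,
    // keeping each partition's output before moving on to the next.
    static <A, B> List<List<B>> runStage(List<List<A>> parts, Function<List<A>, List<B>> stage) {
        List<List<B>> outputs = new ArrayList<>();
        for (List<A> part : parts) {
            outputs.add(stage.apply(part));
        }
        return outputs;
    }

    public static void main(String[] args) {
        List<Integer> departments = List.of(345, 100, 345, 200, 345, 100, 345);
        List<List<Integer>> parts = partition(departments, 3); // 3 partitions

        // Stage 1: filter each partition independently.
        List<List<Integer>> filtered = runStage(parts,
            p -> p.stream().filter(d -> d == 345).collect(Collectors.toList()));

        // Stage 2: combine the per-partition results.
        int total = filtered.stream().mapToInt(List::size).sum();
        System.out.println(parts.size() + " partitions, " + total + " matching rows"); // 3 partitions, 4 matching rows
    }
}
```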

So a DseGraphFrame gives you access to the entire graph database, but the data is not all read at once, and nothing is read until an action runs.

Spark is an extensive topic, and if you want to understand more about the Spark internals, let me refer you to the official Spark documentation.

However, you can get started with DseGraphFrames without detailed Spark knowledge. This is a good page for first experiments.

And this is still a good blog about DseGraphFrames:

And yes, by running traversals on these graph frames, you can then construct the vertex and edge tables of subgraphs.
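At the table level, a subgraph comes down to two filters: keep the vertices that match a condition, then keep only the edges whose endpoints both survive. The plain-Java sketch below shows that idea; the `Vertex`, `Edge` and `Graph` records and the `subgraph` method are hypothetical and are not how DseGraphFrame implements traversals.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Sketch of building a subgraph from a vertex table and an edge table.
// Hypothetical types and method; not the DseGraphFrame API.
class SubgraphDemo {

    record Vertex(String id, String label, Map<String, String> properties) {}
    record Edge(String src, String dst, String label) {}
    record Graph(List<Vertex> vertices, List<Edge> edges) {}

    static Graph subgraph(Graph g, String label, String key, String value) {
        // Step 1: filter the vertex table by label and property value.
        List<Vertex> keptVertices = g.vertices().stream()
            .filter(v -> v.label().equals(label))
            .filter(v -> value.equals(v.properties().get(key)))
            .collect(Collectors.toList());

        Set<String> keptIds = keptVertices.stream()
            .map(Vertex::id)
            .collect(Collectors.toSet());

        // Step 2: keep only edges whose two endpoints are both in the subgraph.
        List<Edge> keptEdges = g.edges().stream()
            .filter(e -> keptIds.contains(e.src()) && keptIds.contains(e.dst()))
            .collect(Collectors.toList());

        return new Graph(keptVertices, keptEdges);
    }

    public static void main(String[] args) {
        Graph g = new Graph(
            List.of(new Vertex("p1", "Person", Map.of("department", "345")),
                    new Vertex("p2", "Person", Map.of("department", "345")),
                    new Vertex("p3", "Person", Map.of("department", "100"))),
            List.of(new Edge("p1", "p2", "knows"),
                    new Edge("p1", "p3", "knows")));
        Graph sub = subgraph(g, "Person", "department", "345");
        System.out.println(sub.vertices().size() + " vertices, " + sub.edges().size() + " edges"); // 2 vertices, 1 edges
    }
}
```

Dropping edges that point outside the kept vertices keeps the result a valid graph; otherwise the edge table would reference vertices that no longer exist.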

I hope this answers your question.


Thank you. I have worked with Spark.

Just to reconfirm, with respect to DseGraphFrame: if a traversal is performed, for example

val graph = spark.dseGraph("my_graph")
// use the TinkerPop API
graph.V().hasLabel("Person").has("department", "345")

will a specific query (to get only that subset of persons) be sent to DSE, so that only that data is loaded onto the Spark executors?

FYI: the database is used for both OLTP (70%) and OLAP (30%) workloads.



Hi @baid_manish_187433,

Sorry, I misunderstood your question.

I am quite certain that the whole graph frame will be loaded, not just the table for the label "Person". Let me work out how best to demonstrate the behaviour and get back to you.
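The difference under discussion, loading everything and filtering afterwards versus asking the store only for matching rows, can be sketched with a toy model. Everything here is hypothetical (`Person`, `fullScanThenFilter`, `indexedLookup`, the in-memory index); it does not model DSE's actual read path and makes no claim about which one DseGraphFrame uses. It only shows why the two approaches return the same rows while reading very different amounts of data.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;

// Toy contrast: full scan + filter vs. reading only matching rows via an index.
// Hypothetical names throughout; not a model of DSE internals.
class ScanVsLookup {

    record Person(String id, String department) {}

    // Every row is read, then filtered afterwards.
    static List<Person> fullScanThenFilter(List<Person> table, String dept, AtomicInteger rowsRead) {
        return table.stream()
            .peek(p -> rowsRead.incrementAndGet())
            .filter(p -> p.department().equals(dept))
            .collect(Collectors.toList());
    }

    // Only the rows stored under the requested key are read.
    static List<Person> indexedLookup(Map<String, List<Person>> index, String dept, AtomicInteger rowsRead) {
        List<Person> hits = index.getOrDefault(dept, List.of());
        rowsRead.addAndGet(hits.size());
        return hits;
    }

    public static void main(String[] args) {
        List<Person> table = List.of(
            new Person("p1", "345"), new Person("p2", "100"),
            new Person("p3", "345"), new Person("p4", "200"));
        Map<String, List<Person>> index = table.stream()
            .collect(Collectors.groupingBy(Person::department));

        AtomicInteger scanned = new AtomicInteger();
        AtomicInteger looked = new AtomicInteger();
        int a = fullScanThenFilter(table, "345", scanned).size();
        int b = indexedLookup(index, "345", looked).size();
        System.out.println("full scan: " + a + " results, " + scanned.get() + " rows read"); // 2 results, 4 rows read
        System.out.println("lookup:    " + b + " results, " + looked.get() + " rows read");  // 2 results, 2 rows read
    }
}
```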

baid_manish_187433 replied to bettina.swynnerton:

Thanks, Bettina. Please do share some pointers that demonstrate the behavior.
