tapan.sharma_186956 asked ·

Slow performance of edge insertion in DSE Graph

Hi Team,

I am trying to insert edges in DSE Graph but I am not getting satisfactory performance.

I want to insert 1 million edges but am only able to insert roughly 95K in an hour. Here is what I am doing. Please let me know how I can improve the performance.

1. Batch is executed for 500 edges at a time.

BatchGraphStatementBuilder builder = BatchGraphStatement.builder();

2. Traverse through all the edges to be inserted (Steps 3 - 8)

3. Get Source Vertex.

Vertex sourceVertex = getSourceVertex(edgeRecord.getFromId());

4. Get Target Vertex.

Vertex targetVertex = getTargetVertex(edgeRecord.getToId());

5. Create edge label if not exists.

String createEdgeLabel = "schema.edgeLabel('" + edgeRecord.getName() + "').ifNotExists().create()";

6. Create Graph Traversal Object

GraphTraversal<Vertex, Edge> graphTraversal = g.V(sourceVertex.id()).addE(edgeRecord.getName()).to(targetVertex);

7. For each edge property (there are 5 properties per edge):

  • Create edge index:
    createEdgeIndexIfNoExists(edgeIndex);
  • Add property to graph traversal
    graphTraversal.property(key, fields.get(key));

8. Add Graph traversal object to batch statement builder

builder = builder.addTraversal(graphTraversal);

9. After adding all the edges, execute the batch

GraphResultSet result = cqlSession.execute(batchGraphStatement);

bryn.cooke_101029 answered ·

Hi.

There are a couple of things to bear in mind:

  1. Schema operations are not cheap. You should define your schema up front.
  2. Reading the vertices before writing the edges will be slow. It is better to just add the edge and its vertices together. Assuming your DSE Graph version is earlier than 6.8, DSE Graph will upsert if a vertex or edge already exists, as long as you are using custom IDs and your edge cardinality is single.
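
To illustrate the first point, here is a minimal sketch that collects the distinct edge labels once and produces one schema statement per label before the load starts, rather than issuing a schema call for every edge. The `SchemaPlanner` class and its method are hypothetical names; the statement text is the one from the question. Executing the statements against the graph is left out, and the labels are assumed to be trusted identifiers with no quote characters.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: plans the schema statements once, before loading data.
final class SchemaPlanner {

    // Returns one "create if not exists" statement per distinct edge label,
    // in first-seen order. Assumes labels are trusted (no quote characters).
    static List<String> edgeLabelStatements(List<String> edgeLabels) {
        Set<String> distinct = new LinkedHashSet<>(edgeLabels);
        List<String> statements = new ArrayList<>();
        for (String label : distinct) {
            statements.add("schema.edgeLabel('" + label + "').ifNotExists().create()");
        }
        return statements;
    }
}
```

With a million edges that share a handful of labels, this reduces the schema work to a handful of statements run once up front.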

For instance, here is an example of adding an edge without reading the vertices:

g.addV('person').
     property('name', 'bob').
     as('bob').
 addV('software').
     property('name', 'studio').
     as('studio').
 addE('created').from('bob').to('studio').
     property('weight', 0.8)

You don't have to specify all the properties on the vertices, just those that are part of the primary key.
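
When loading many edges, the same traversal can be reused with different values. The sketch below keeps the traversal above as a fixed script with named parameters and builds the parameter map per edge. `EdgeUpsert`, `fromName`, `toName` and `weight` are hypothetical names chosen for illustration, and how the script and parameters are handed to the driver depends on your driver version, so that part is not shown.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: one fixed script, one parameter map per edge.
final class EdgeUpsert {

    // Same shape as the traversal above, with literals replaced by
    // named parameters (fromName, toName, weight are assumed names).
    static final String SCRIPT =
        "g.addV('person').property('name', fromName).as('from')."
      + "addV('software').property('name', toName).as('to')."
      + "addE('created').from('from').to('to').property('weight', weight)";

    // Builds the parameter values for a single edge.
    static Map<String, Object> params(String fromName, String toName, double weight) {
        Map<String, Object> params = new LinkedHashMap<>();
        params.put("fromName", fromName);
        params.put("toName", toName);
        params.put("weight", weight);
        return params;
    }
}
```

Keeping the script text constant avoids concatenating values into the query string for every edge.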

If you have the option, consider using the 6.8 Core graph as this has much improved performance, especially around edge inserts.


Thanks @bryn.cooke_101029. Let me try and get back to you on this.

Lewisr650 answered ·

You might also consider reducing your batch size to ~100 so you aren't slamming the JVM from one extreme to the other. You'll realize better performance and sustained throughput with smaller batches.
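
A minimal sketch of that idea, assuming the edge records are already in a list: split them into chunks of a configurable size and execute one batch per chunk. `EdgeBatcher` and `partition` are hypothetical names, and executing each chunk against the graph is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits the work into fixed-size batches.
final class EdgeBatcher {

    // Splits items into consecutive batches of at most batchSize elements.
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the sublist so each batch is independent of the source list.
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }
}
```

Making the batch size a parameter lets you try 100, 200 and so on against your own cluster and keep whichever gives the best sustained throughput.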

Erick Ramirez answered ·

@tapan.sharma_186956 Your traversals and queries aside, the performance of the cluster is a function of its capacity. When selecting hardware for DSE Graph, keep the following in mind:

  • large amounts of RAM allow DSE to perform better
  • 60GB of RAM is preferred for production environments
  • allocate 31GB of memory to the heap
  • 16-core machines are the recommended minimum for production workloads

For more info, see Capacity planning and hardware selection for DataStax Enterprise implementations.

In the meantime, I will request one of our Graph engineers to review and comment on your post. Cheers!


Thanks Erick.

I understand the production environment requirements. Right now, I am working in a development environment with a million nodes.
