bretts asked

Is there a limit on edges for a supernode in DSE Graph?

We have a use case where we could potentially have a node with millions of edges on it and I can't seem to find any documentation on limits other than Cassandra.

There are older articles on supernodes but most of them talk about ways to reduce the number of edges by using labels or properties, which is not an option here.

Are there published limits on the number of edges per node in DSE Graph?

dsegraphsupernode

Erick Ramirez answered

@bretts On the About DataStax Graph page, it talks about support for billions of vertices and edges:

Scalable for large graphs and high volumes of users, events, and operations
DSG can contain billions (10^9) of vertices and edges. It takes advantage of the unique scalability of Apache Cassandra to store graph data.

I'm going to reach out internally to the DSE Graph team to find out about limits for supernodes. Cheers!


eddy.wong_186388 answered

There isn't a theoretical limit (in the db), but it is a good practice NOT to navigate a supernode in an online session, because you could easily load too much data into memory. To build an application that can access the graph, you would use an approach called "lambda architecture": https://en.wikipedia.org/wiki/Lambda_architecture

With this architecture, you have several layers of processing: a batch layer, a speed layer, and a serving layer. Each layer operates on the same data, but with different response times.
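To make the three layers concrete, here is a minimal in-memory sketch in Python. It is only an illustration of the idea: the edge list, the `SpeedLayer` class, and the function names are hypothetical and are not DSE Graph or Gremlin APIs. In a real deployment the batch view would more likely come from an OLAP job and the recent updates from the OLTP side.

```python
from collections import Counter

# Hypothetical historical edge list as (source_vertex, target_vertex) pairs.
historical_edges = [("hub", "a"), ("hub", "b"), ("x", "hub"), ("x", "y")]


def batch_view(edges):
    """Batch layer: recompute the out-degree of every vertex from the full edge set."""
    return Counter(src for src, _ in edges)


class SpeedLayer:
    """Speed layer: tracks only the edges that arrived after the last batch run."""

    def __init__(self):
        self.recent = Counter()

    def add_edge(self, src, dst):
        # Only the source vertex's out-degree changes in this sketch.
        self.recent[src] += 1


def serve_out_degree(vertex, batch, speed):
    """Serving layer: merge the precomputed batch view with recent updates."""
    return batch.get(vertex, 0) + speed.recent.get(vertex, 0)


batch = batch_view(historical_edges)
speed = SpeedLayer()
speed.add_edge("hub", "c")  # a new edge written after the batch run
hub_degree = serve_out_degree("hub", batch, speed)
```

The point of the split is that an online request reads a small summary (here, one counter lookup per layer) instead of walking every edge of the supernode.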

DSE Graph has OLTP (Gremlin) and OLAP (Gremlin/Spark) modes that can help implement this architecture. Ideally in an online session, you would navigate a subgraph that is a filter or a summary of the larger graph. There is some design work for coalescing your data though, like making sure that you use consistent vertex ids.

With the upcoming DSE 6.8, there will be more consistent polyglot persistence (and polyglot access), i.e., your data will be accessible via Cassandra CQL as well as via the Gremlin query language. For the supernode case, a Cassandra edge index (partition key + clustering key) will also come in handy for a faster serving layer.


bretts commented

Thanks for the response @eddy.wong_186388. A lambda architecture isn't appropriate for our use of graph in this case; we can avoid Gremlin traversals that would explore all the edges of a potential supernode. We have considered it for other cases!

The question is more about practical limitations of the data model for a node that has a large number of edges. Based on what you're saying, it sounds like that equates to the limitations of a single partition.

eddy.wong_186388 commented

An update to my answer: there is a practical limit of less than 400 MB per partition:

https://thelastpickle.com/blog/2019/01/11/wide-partitions-cassandra-3-11.html

Do the math for how many edges per vertex that allows: if each edge row takes about 1 KB, then a 400 MB partition holds roughly 400k edges at most.
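The arithmetic can be sketched as a small Python helper. It assumes, as discussed above, that all edges of a vertex land in a single partition, and it takes the 400 MB ceiling and the 1 KB row size as rough inputs rather than measured values; the function name is made up for illustration. Actual edge row size depends on your edge properties, so it is worth measuring before relying on the result.

```python
def max_edges_per_partition(partition_limit_bytes, bytes_per_edge):
    """Estimate how many edge rows fit under a partition size ceiling."""
    if bytes_per_edge <= 0:
        raise ValueError("bytes_per_edge must be positive")
    return partition_limit_bytes // bytes_per_edge


# Assumed figures from the comment above, using decimal units.
PARTITION_LIMIT_BYTES = 400 * 1000 * 1000  # about 400 MB practical ceiling
EDGE_ROW_BYTES = 1000                      # about 1 KB per edge row

estimate = max_edges_per_partition(PARTITION_LIMIT_BYTES, EDGE_ROW_BYTES)
print(f"Estimated maximum edges per vertex: {estimate:,}")
```

With smaller edge rows the ceiling rises proportionally, for example about 4 million edges at 100 bytes per row under the same assumptions.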
