bretts asked:

Is there a limit on edges for a supernode in DSE Graph?

We have a use case where a single node could potentially have millions of edges, and I can't find any documented limits other than those for Cassandra itself.

There are older articles on supernodes but most of them talk about ways to reduce the number of edges by using labels or properties, which is not an option here.

Are there published limits on the number of edges per node in DSE Graph?


eddy.wong_186388 answered:

There isn't a theoretical limit (in the db), but it is good practice NOT to traverse a supernode in an online session, because you could easily load too much data into memory. To build an application that can access the graph, you would use an approach called "lambda architecture": https://en.wikipedia.org/wiki/Lambda_architecture

With this architecture, you have several layers of processing: a batch layer, a speed layer, and a serving layer. Each layer operates on the same data, but with different response times.
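To make the three layers concrete, here is a minimal in-memory sketch in Python. It is not DSE Graph code: the function and class names (`build_batch_view`, `SpeedLayer`, `serve_degree`) are hypothetical, and the "summary" being served is simply the edge count per vertex. In a real deployment the batch view would presumably come from an OLAP job and the serving layer would be a database table.

```python
from collections import Counter


def build_batch_view(edges):
    """Batch layer: recompute edge counts per vertex from the full edge log."""
    return Counter(src for src, _dst in edges)


class SpeedLayer:
    """Speed layer: tracks only edges that arrived after the last batch run."""

    def __init__(self):
        self.recent = Counter()

    def add_edge(self, src, dst):
        # Only the count is kept here; dst is accepted for a realistic signature.
        self.recent[src] += 1

    def reset(self):
        # Called once a new batch view has absorbed the recent edges.
        self.recent.clear()


def serve_degree(vertex, batch_view, speed_layer):
    """Serving layer: merge the precomputed view with recent increments."""
    return batch_view.get(vertex, 0) + speed_layer.recent.get(vertex, 0)


# Hypothetical data: a "hub" supernode with 1000 historical edges.
historical_edges = [("hub", f"v{i}") for i in range(1000)] + [("a", "b")]
batch_view = build_batch_view(historical_edges)

speed = SpeedLayer()
speed.add_edge("hub", "v1000")
speed.add_edge("hub", "v1001")

hub_degree = serve_degree("hub", batch_view, speed)
```

The point of the split is that the online query reads two small summaries and never touches the supernode's full edge list.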

DSE Graph has OLTP (Gremlin) and OLAP (Gremlin/Spark) modes that can help implement this architecture. Ideally, in an online session you would navigate a subgraph that is a filtered view or a summary of the larger graph. There is some design work involved in coalescing your data, though, such as making sure that you use consistent vertex ids.

With the upcoming DSE 6.8, there will be more consistent polyglot persistence (and polyglot access), i.e. your data will be accessible via Cassandra CQL as well as via the Gremlin query language. For the supernode case, a Cassandra edge index (partition key + clustering key) will also come in handy for a faster serving layer.
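As a rough illustration of why a partition key + clustering key layout helps with supernodes, here is a toy in-memory model in Python. It is an assumption-laden sketch, not DSE's actual storage or API: the `EdgeIndex` class is hypothetical, with the source vertex id standing in for the partition key and `(label, target id)` for the clustering key. Because rows inside a partition stay sorted, a reader can fetch one bounded page of edges instead of the whole edge list.

```python
import bisect
from collections import defaultdict


class EdgeIndex:
    """Toy model: partition key = source vertex, clustering key = (label, target)."""

    def __init__(self):
        # vertex id -> list of (label, target) rows kept in sorted order
        self._partitions = defaultdict(list)

    def add_edge(self, src, label, dst):
        row = (label, dst)
        rows = self._partitions[src]
        pos = bisect.bisect_left(rows, row)
        # Skip duplicates so each edge appears once in the partition.
        if pos == len(rows) or rows[pos] != row:
            rows.insert(pos, row)

    def page(self, src, label, after=None, limit=100):
        """Return up to `limit` targets for one label, resuming after `after`."""
        rows = self._partitions.get(src, [])
        if after is None:
            # (label,) sorts before every (label, target) row.
            start = bisect.bisect_left(rows, (label,))
        else:
            start = bisect.bisect_right(rows, (label, after))
        targets = []
        for row_label, dst in rows[start:start + limit]:
            if row_label != label:
                break  # ran past the end of this label's range
            targets.append(dst)
        return targets


# Hypothetical usage: page through a hub's "follows" edges three at a time.
index = EdgeIndex()
for i in range(10):
    index.add_edge("hub", "follows", f"v{i:02d}")
index.add_edge("hub", "likes", "v99")

first_page = index.page("hub", "follows", limit=3)
second_page = index.page("hub", "follows", after=first_page[-1], limit=3)
```

Each call touches at most `limit` rows, which is the property you want from a serving layer that sits in front of a supernode.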


Thanks for the response @eddy.wong_186388. A lambda architecture isn't appropriate for our use of graph in this case; we can avoid Gremlin traversals that would explore all the edges of a potential supernode. We have considered it for other cases!

The question is more about practical limitations of the data model for a node that has a large number of edges. Based on what you're saying, it sounds like that equates to the limitations of a single partition.
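If that reading is right and all of a vertex's edges land in one partition, a back-of-the-envelope estimate like the Python sketch below can show where a supernode would sit relative to partition limits. The roughly 2 billion cells per partition figure is Cassandra's documented hard limit; the 100 MB target is a commonly cited rule of thumb rather than a hard limit; and `cells_per_edge` and `bytes_per_edge` are pure assumptions that should be replaced with numbers measured from your own schema.

```python
MAX_CELLS_PER_PARTITION = 2**31        # Cassandra's documented hard limit (~2 billion)
SOFT_PARTITION_BYTES = 100 * 1024**2   # rule-of-thumb target, not a hard limit


def estimate_partition(edge_count, cells_per_edge=4, bytes_per_edge=100):
    """Estimate cell count and size of a partition holding one vertex's edges.

    cells_per_edge and bytes_per_edge are assumed values for illustration.
    """
    cells = edge_count * cells_per_edge
    size_bytes = edge_count * bytes_per_edge
    return {
        "cells": cells,
        "size_mib": size_bytes / 1024**2,
        "within_hard_cell_limit": cells < MAX_CELLS_PER_PARTITION,
        "within_soft_size_target": size_bytes <= SOFT_PARTITION_BYTES,
    }


# Hypothetical supernode with five million edges.
estimate = estimate_partition(5_000_000)
```

Under these assumed per-edge numbers, a five-million-edge vertex stays far below the hard cell limit but is several times over the soft size target, so the practical concern would be wide-partition behaviour (reads, compaction, repair) rather than an outright failure.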

Erick Ramirez answered:

@bretts The About DataStax Graph page mentions support for billions of vertices and edges:

Scalable for large graphs and high volumes of users, events, and operations
DSG can contain billions (10^9) of vertices and edges. It takes advantage of the unique scalability of Apache Cassandra to store graph data.

I'm going to reach out internally to the DSE Graph team to find out about limits for supernodes. Cheers!
