tapan.sharma_186956 asked

Can DSE Graph store and process billions of vertices?

Could you provide information about the scalability limits of DSE (DataStax) Graph? The documentation at https://docs.datastax.com/en/dse/6.0/dse-dev/datastax_enterprise/graph/dseGraphAbout.html states:

"DSE Graph can contain hundreds of millions (10^8) of vertices and billions (10^9) of edges."

Can it store and process billions of vertices? Also, how many such graphs can be stored simultaneously? What should be the ideal cluster size?

Regards

Tapan

dsegraph

jeromatron answered

Both the current graph engine and the new graph engine in DSE 6.8 can store billions of vertices and edges; there is no technical limitation. In both architectures, the number of vertices is tied to the number of possible partitions in Cassandra, which is enormous. In the current version, edges are stored per vertex, in partitions that hold that vertex's incident edges, so a global edge limit doesn't really make sense. You do want to make sure the number of edges on a single vertex doesn't explode, but that is the supernode problem common to any graph, and it comes into play when you traverse through that vertex. In 6.8's graph, edges are stored more like vertices, so the storage limit is similar: the number of Cassandra partitions.
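To make the supernode concern concrete, here is a minimal sketch in plain Python (not DSE Graph or Gremlin code) that counts incident edges per vertex in an edge list and flags vertices above a threshold. The function name `find_supernodes`, the sample edges, and the `max_degree` cutoff are all hypothetical, chosen only for illustration; a suitable threshold depends on your data and traversals.

```python
from collections import Counter

def find_supernodes(edges, max_degree):
    """Return {vertex: degree} for vertices with more than max_degree incident edges.

    `edges` is an iterable of (source, target) pairs. Both endpoints count,
    because an edge is incident to the vertex at each end.
    """
    degree = Counter()
    for src, dst in edges:
        degree[src] += 1
        degree[dst] += 1
    return {v: d for v, d in degree.items() if d > max_degree}

# Hypothetical sample data: vertex "a" touches three edges.
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")]
print(find_supernodes(edges, max_degree=2))
```

Running a check like this on a sample of your data before loading can show whether a few vertices would accumulate a disproportionate share of the edges.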

As Erick and Seb have said, data modeling and the nature of your traversals will matter more than raw vertex and edge counts. In your traversal, how many hops do you expect to take, and are they bounded? What is the expected branching factor at each step? Those sorts of questions.
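The effect of hop count and branching factor can be estimated with simple arithmetic. The sketch below assumes a uniform branching factor at every step, which real graphs rarely have, so treat the result as a rough order-of-magnitude figure. The function name and the example values (50 neighbours per vertex, 1 to 4 hops) are hypothetical.

```python
def estimate_visited(branching_factor: int, hops: int) -> int:
    """Estimate how many elements a traversal touches, assuming every
    vertex expands to `branching_factor` neighbours at each hop.

    This is the sum b + b^2 + ... + b^hops (the start vertex is not counted).
    """
    total = 0
    frontier = 1
    for _ in range(hops):
        frontier *= branching_factor  # size of the next frontier
        total += frontier
    return total

# Hypothetical example: 50 neighbours per vertex.
for hops in (1, 2, 3, 4):
    print(hops, estimate_visited(50, hops))
```

With these assumed numbers, each extra hop multiplies the work by roughly the branching factor, which is why bounding the number of hops usually matters more than the total size of the graph.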

We should probably update our docs. At first we wanted to be conservative with our recommendations, and the team had tested only up to certain sizes. That, along with generated IDs (now deprecated), imposed some potential limitations. Generated IDs shouldn't be used, and the limitations on Graph, both current and in 6.8, are less about the number of vertices and edges (you are fine with billions) and more about data modeling and traversal shape, as noted above.


tapan.sharma_186956 commented:

Thanks everyone for the response!

Thanks @jeromatron for details.

Erick Ramirez answered

@tapan.sharma_186956 As you quoted, the documentation states that DSE Graph can store hundreds of millions of vertices. You can have as many graphs as your cluster can host, depending on your use case, access pattern, data model, etc.; the number is limited only by the size of your cluster (number of nodes).

The ideal cluster size is whatever your application requires, and is a function of (1) the amount of data you're storing, (2) the types of queries you run, and (3) your SLAs for those queries. Note that this isn't an exhaustive list, but it is a start to get you thinking about what your needs are. Cheers!
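As a rough illustration of factor (1) only, the sketch below computes a storage-driven lower bound on node count from data volume, replication factor, and usable capacity per node. All names and numbers here are hypothetical and are not DataStax sizing guidance; query types and SLAs, factors (2) and (3), can push the real requirement higher.

```python
import math

def estimate_min_nodes(raw_data_tb: float,
                       replication_factor: int,
                       usable_tb_per_node: float) -> int:
    """Storage-only lower bound on cluster size.

    Total stored data is the raw data multiplied by the replication factor;
    the cluster also needs at least `replication_factor` nodes to hold
    that many replicas.
    """
    by_storage = math.ceil(raw_data_tb * replication_factor / usable_tb_per_node)
    return max(replication_factor, by_storage)

# Hypothetical inputs: 10 TB of raw data, RF 3, 1 TB usable per node.
print(estimate_min_nodes(10, 3, 1.0))
```

The result is only a starting point: it ignores compaction headroom, query load, and latency targets, all of which should be validated by testing with your own data model.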


tapan.sharma_186956 commented:

Thanks Erick.

Just to clarify: this means I can load billions of vertices, but I may have to add more nodes to my cluster. Is my understanding correct?


sebastian.estevez@datastax.com answered

With the graph version that's in Labs (https://downloads.datastax.com/#labs), a vertex is just a Cassandra (C*) row, so yes, you can do billions. Cassandra data modeling implications apply.
