Ryan Quey asked Ryan Quey edited

What are best practices for dealing with supernodes in DSE 6.7 Graph?

Supernodes cause trouble for any graph database, and in Cassandra in particular they can cause wide partitions. Common solutions for supernodes include "cutting vertices", i.e., locating edges on separate partitions from the supernode, instead of the default behavior, which places edges on the same node as their incident vertices.

For example, this is the solution suggested here:

However, while that was applicable in DSE 5.0 (as per the article) and in 6.8 (according to this answer), it is not supported in 6.7 (according to this answer; note especially the comments, which confirm this).

This being the case, assuming we can't just avoid creating supernodes in the first place, what are the best practices for handling supernodes in 6.7? Specific concerns include but are not limited to:

  • How to avoid creating wide partitions where supernodes exist
  • How to mitigate the effects of wide partitions (i.e., how to improve performance and avoid other problems associated with wide partitions)
  • How to avoid other problems related to supernodes (traffic skew, etc.)
graph, data modeling, best practices

1 Answer

Erick Ramirez answered Ryan Quey edited

Thanks for continuing to work with DSE. I need to point out your Graph questions are difficult to respond to in a Q&A forum since they are nuanced and require more assistance than we can provide in a question-and-answer format, particularly since most of the community members here like me are responding in their spare time.

There isn't a quick answer to the supernodes problem in DSE 6.7 so my suggestion is that you log a ticket with DataStax Support so you can get in touch with the Engineering team at DataStax and with the Graph devs in particular. Cheers!

3 comments

Ryan Quey commented ·

Great, thanks - just knowing that there's not a clear solution for this one is helpful in itself :)

Erick Ramirez ♦♦ Ryan Quey commented ·

I do wish there was a quick answer but it eludes me when it comes to Graph. :)

Ryan Quey Erick Ramirez ♦♦ commented ·

I just came across this SO post from a couple of years ago actually:

(I'm not sure, but it might have been answered by Jonathan Lacefield, judging by the answerer's SO handle.)

It recommends creating an intermediary hop: split the adjacent vertices into groups so that the supernode has fewer adjacent edges. The supernode would still have edges to the intermediate vertices, but substantially fewer than if it had an edge to each connected vertex. This way, the data for a supernode can be split among different partitions rather than having to sit on a single partition.

It's not as clean, and it seems clearly inferior to 6.8's solution, since it adds an extra hop AND you're still splitting the edge from the vertex (rather than just splitting the edge from the vertex, as in 6.8). It could still be helpful, depending on the data model.
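
To make the intermediary-hop idea concrete, here is a rough sketch in plain Python. It is not DSE or Gremlin code and says nothing about how DSE actually lays out partitions; it only models the edge lists. The bucket count, the `#bucket-N` naming scheme, and all function names are assumptions made up for illustration.

```python
import hashlib
from collections import defaultdict

NUM_BUCKETS = 4  # assumption: illustrative value, not a tuned recommendation


def bucket_for(neighbor_id, num_buckets=NUM_BUCKETS):
    """Map a neighbor id to a stable bucket number via a hash of the id."""
    digest = hashlib.md5(neighbor_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_buckets


def build_intermediate_edges(supernode_id, neighbor_ids, num_buckets=NUM_BUCKETS):
    """Replace direct supernode -> neighbor edges with two hops.

    Returns:
      hub_edges:  set of (supernode_id, intermediate_id) pairs
      leaf_edges: dict of intermediate_id -> list of neighbor ids
    """
    hub_edges = set()
    leaf_edges = defaultdict(list)
    for neighbor_id in neighbor_ids:
        bucket = bucket_for(neighbor_id, num_buckets)
        # Hypothetical id scheme for the intermediate vertex.
        intermediate_id = f"{supernode_id}#bucket-{bucket}"
        hub_edges.add((supernode_id, intermediate_id))
        leaf_edges[intermediate_id].append(neighbor_id)
    return hub_edges, dict(leaf_edges)


def neighbors_via_intermediates(supernode_id, hub_edges, leaf_edges):
    """Walk the extra hop: supernode -> intermediate -> neighbor."""
    for source_id, intermediate_id in hub_edges:
        if source_id == supernode_id:
            yield from leaf_edges[intermediate_id]


if __name__ == "__main__":
    neighbors = [f"user-{i}" for i in range(1000)]
    hub_edges, leaf_edges = build_intermediate_edges("celebrity", neighbors)
    print(len(hub_edges), "edges remain on the supernode")
    for intermediate_id, members in sorted(leaf_edges.items()):
        print(intermediate_id, "holds", len(members), "edges")
```

The supernode ends up with at most `NUM_BUCKETS` edges, and each intermediate vertex carries only a share of the original edge list, at the cost of the extra hop on every traversal.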
