question

rkstruts_181775 asked · bettina.swynnerton edited

Reading a lot of vertex data into a DataFrame results in "IllegalArgumentException: Size exceeds Integer.MAX_VALUE"

Hi Team,

We are reading the vertex data from the graph and saving it into a DataFrame. We are getting the error below with large data:

val sourceDf = gsrc.V().df

org.apache.spark.scheduler.TaskSetManager: Lost task 10169.0 in stage 0.0 (TID 10169, IPAddress, executor 2): java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE
    at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:869)
    at ...


dsegraphanalytics


Erick Ramirez answered

There isn't enough information in what you posted to provide any meaningful analysis, but as the exception indicates, you're retrieving too much data: a single block is larger than the 2GB limit (Integer.MAX_VALUE bytes).

You probably have a large graph, and reading all the vertices into a DataFrame won't work unless you limit the result size by breaking it up into smaller chunks (see the sketch below).
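For illustration only, reading one label at a time could look like the sketch below. It assumes gsrc is the traversal source from your snippet, that DseGraphFrame exposes the vertex label as a "~label" column, and that "person" is a placeholder for one of your actual labels:

import org.apache.spark.sql.functions.col

// Sketch: pull the vertices one label at a time instead of the whole graph.
// "person" is a placeholder; check the real labels and column names with
// gsrc.V().df.printSchema() first.
val personDf = gsrc.V().df.filter(col("~label") === "person")
personDf.count()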

If you provide the full error message plus the full stack trace, we would be able to provide a bit more info. It would probably be too big to post here so I suggest that you upload it somewhere like https://gist.github.com/ then post the URL here. Cheers!


bettina.swynnerton answered

This is likely an issue with partition sizes: a single partition is larger than the 2GB block limit. You can try increasing the number of partitions after the read, with .repartition(100) or more. The repartition method can be used to either increase or decrease the number of partitions in a DataFrame.
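For example (a rough sketch only, reusing the sourceDf name from your snippet; the partition count of 1000 is an arbitrary starting point to tune for your data volume):

// Repartition right after the read so no single partition exceeds the 2GB limit.
val sourceDf = gsrc.V().df.repartition(1000)

// Verify the new partition count before caching or running heavy transformations.
println(sourceDf.rdd.getNumPartitions)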

If you are reading the vertices from a graph created with DSE Graph, you could read them into a DseGraphFrame:

val g = spark.dseGraph("your_graphname")
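Continuing from that, here is a minimal sketch that combines it with the repartition suggestion above. It assumes a DSE Analytics Spark shell where import com.datastax.bdp.graph.spark.graphframe._ has been run so that spark.dseGraph is available:

// g is the DseGraphFrame created above: pull the vertices into a DataFrame
// and repartition them before caching or heavy transformations.
val verticesDf = g.V().df.repartition(1000)
verticesDf.printSchema()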

If this doesn't help, give us a bit more context (what kind of graph you are reading, which DSE and Spark versions, etc.).

Thanks!
