
rkstruts_181775 asked:

Reading a lot of vertex data into a DataFrame results in "IllegalArgumentException: Size exceeds Integer.MAX_VALUE"

Hi Team,

We are reading vertex data from the graph and saving it into a DataFrame. With large data we get the error below:

val sourceDf = gsrc.V().df

org.apache.spark.scheduler.TaskSetManager: Lost task 10169.0 in stage 0.0 (TID 10169, IPAddress, executor 2): java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE
    at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:869)
    at ...


bettina.swynnerton answered:

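This is likely an issue with partition size. Spark memory-maps blocks with FileChannel.map (visible in the stack trace), which cannot map more than Integer.MAX_VALUE bytes (about 2 GB), so any single partition larger than that fails. You can try increasing the number of partitions after the read with .repartition(100) or more. The repartition method can either increase or decrease the number of partitions in a DataFrame.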
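To pick a partition count instead of guessing, you can divide an estimate of the total data size by a target partition size. The plain-Scala sketch below does that with integer ceiling division. The object and method names, the 128 MiB target, and the total-size estimate you pass in are all hypothetical assumptions, not part of DSE or Spark. It only bounds the *average* partition size; skewed data can still produce one oversized partition.

```scala
object PartitionSizing {
  // Hard ceiling from the stack trace: FileChannel.map cannot map more than
  // Integer.MAX_VALUE bytes (about 2 GB) in one call.
  val HardLimitBytes: Long = Int.MaxValue.toLong

  // Hypothetical per-partition target (128 MiB), well under the hard limit.
  // Tune this for your own cluster.
  val DefaultTargetBytes: Long = 128L * 1024 * 1024

  /** Smallest partition count whose average partition size is at most targetBytes.
    * totalBytes is your own estimate of the data size (an assumption, not
    * something Spark reports in this error). */
  def minPartitions(totalBytes: Long, targetBytes: Long = DefaultTargetBytes): Int = {
    require(totalBytes >= 0, "totalBytes must be non-negative")
    require(targetBytes > 0 && targetBytes <= HardLimitBytes,
      "targetBytes must be positive and no larger than the ~2 GB limit")
    // Ceiling division using quotient + remainder, so it cannot overflow.
    val n = totalBytes / targetBytes + (if (totalBytes % targetBytes == 0) 0 else 1)
    require(n <= Int.MaxValue, "too many partitions to fit in an Int")
    math.max(1L, n).toInt
  }
}
```

For example, an estimated 500 GiB of vertex data at the 128 MiB target gives 4000 partitions, which you could then pass to sourceDf.repartition(...).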

If you are reading the vertices from a graph created with DSE Graph, you could read them into a DseGraphFrame:

val g = spark.dseGraph("your_graphname")

If this doesn't help, please give us a bit more context (what kind of graph you are reading, your DSE and Spark versions, etc.).

Thanks!

Erick Ramirez answered:

There isn't enough information in what you posted to provide any meaningful analysis, but as you already stated, you're retrieving too much data and it's larger than the 2 GB (Integer.MAX_VALUE bytes) limit.

You probably have a large graph, and reading all the vertices into a DataFrame won't work unless you limit the result size by breaking it up into smaller chunks.
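One way to break the read into smaller chunks is to split a numeric key range into contiguous sub-ranges and process each one separately. The plain-Scala sketch below assumes your vertices have, or can be mapped to, a Long-valued key such as a Cassandra Murmur3 token. That key, and the Chunking.splitRange helper, are hypothetical illustrations, not something stated in this thread or provided by DSE.

```scala
object Chunking {
  /** Split the inclusive range [lo, hi] into at most n contiguous,
    * non-overlapping inclusive sub-ranges of near-equal size.
    * BigInt arithmetic avoids Long overflow when the range spans the whole
    * Long domain (for example, the full Murmur3 token range). */
  def splitRange(lo: Long, hi: Long, n: Int): Seq[(Long, Long)] = {
    require(lo <= hi, "lo must not exceed hi")
    require(n > 0, "n must be positive")
    val start0 = BigInt(lo)
    val size = BigInt(hi) - start0 + BigInt(1)
    // Never create more chunks than there are values in the range.
    val parts = size.min(BigInt(n)).toInt
    (0 until parts).map { i =>
      val from = start0 + size * BigInt(i) / BigInt(parts)
      val to = start0 + size * BigInt(i + 1) / BigInt(parts) - BigInt(1)
      (from.toLong, to.toLong)
    }
  }
}
```

Each (from, to) pair could then drive a filter on that hypothetical key column before writing the slice out, so no single step has to hold every vertex at once.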

If you provide the full error message plus the full stack trace, we would be able to provide a bit more info. It would probably be too big to post here, so I suggest that you upload it somewhere like https://gist.github.com/ and then post the URL here. Cheers!
