

rkstruts_181775 asked:

Reading a lot of vertex data into a DataFrame results in "IllegalArgumentException: Size exceeds Integer.MAX_VALUE"

Hi Team,

We are reading the vertex data from the graph and saving it into a DataFrame. With large data, we are getting the error below.

val sourceDf = gsrc.V().df

org.apache.spark.scheduler.TaskSetManager: Lost task 10169.0 in stage 0.0 (TID 10169, IPAddress, executor 2): java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE
    at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:869)
    at


dsegraphanalytics

bettina.swynnerton answered:

This is likely an issue with partition sizes. You can try increasing the number of partitions after the read, e.g. with .repartition(100) or higher. The repartition method can be used to either increase or decrease the number of partitions of a DataFrame.
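As a minimal sketch (assuming a graph named "my_graph" and an output path, both placeholders), repartitioning right after the read could look like this:

```scala
// Hypothetical sketch: "my_graph" and the output path are placeholders.
val g = spark.dseGraph("my_graph")

// Repartition the vertex DataFrame immediately after reading it so that
// no single partition exceeds Spark's 2 GB block limit.
val sourceDf = g.V().df.repartition(1000)

sourceDf.write.parquet("dsefs:///tmp/vertices")
```

The right partition count depends on the total data volume; aim for partitions in the low hundreds of MB each.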

If you are reading the vertices from a graph created with DSE Graph, you could read them into a DseGraphFrame:

val g = spark.dseGraph("your_graphname")
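For completeness, a sketch of the setup (assuming the DSE Spark shell or a dependency on the dse-graph-frames library): the dseGraph method on SparkSession is provided by an implicit import.

```scala
// This import adds the dseGraph method to SparkSession.
import com.datastax.bdp.graph.spark.graphframe._

val g = spark.dseGraph("your_graphname")
val vertices = g.V().df // vertex data as a DataFrame
```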

If this doesn't help, let us have a bit more context (what kind of graph are you reading, versions, what Spark etc).

Thanks!


Erick Ramirez answered:

There isn't enough information in what you posted to provide any meaningful analysis, but as the error indicates, you're retrieving too much data: a single block is larger than the 2GB limit (Integer.MAX_VALUE bytes).

You probably have a large Graph and reading all the vertices into a DataFrame won't work unless you limit the result size by breaking them up into smaller chunks.
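One way to break the read into smaller chunks (a sketch; the vertex labels and output path are hypothetical, assuming a DseGraphFrame named g) is to read one vertex label at a time:

```scala
// Hypothetical labels; substitute the vertex labels from your graph schema.
val labels = Seq("person", "product", "order")

labels.foreach { label =>
  // Filter to a single vertex label and bump the partition count
  // so each partition stays well under the 2 GB limit.
  val df = g.V().hasLabel(label).df.repartition(500)
  df.write.mode("append").parquet(s"dsefs:///tmp/vertices/$label")
}
```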

If you provide the full error message plus the full stack trace, we would be able to provide a bit more info. It would probably be too big to post here so I suggest that you upload it somewhere like https://gist.github.com/ then post the URL here. Cheers!
