question

gbong_177337 asked · Erick Ramirez edited

Unable to collect the results from the Spark workers.

I am trying to use Spark 2.4.3 built for Hadoop 2.7.3 to talk to Cassandra.

table_df = sqlContext.read \
  .format("org.apache.spark.sql.cassandra")\
  .options(table="xxxx", keyspace="yyy")\
  .load()
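
For context, the session behind that read is created roughly as follows; the host name, port, and connector version are placeholders rather than my actual values.

# Rough sketch of the session setup; hosts, ports and the connector version
# are placeholders. The job is submitted with the connector on the classpath,
# e.g. spark-submit --packages com.datastax.spark:spark-cassandra-connector_2.11:2.4.3 app.py
from pyspark.sql import SparkSession, SQLContext

spark = SparkSession.builder \
    .appName("cassandra-read") \
    .config("spark.cassandra.connection.host", "cassandra.example.com") \
    .config("spark.cassandra.connection.port", "9042") \
    .getOrCreate()

# sqlContext used in the snippet above
sqlContext = SQLContext(spark.sparkContext)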

When I call table_df.show(), it throws the following exception.

Note that the PySpark driver JVM (launched by spark-submit) is running inside a container, and I have a proxy that forwards traffic arriving at spark.driver.port on the HOST to the container. Help me understand what this TorrentBroadcast issue is all about. Thanks

Py4JJavaError: An error occurred while calling o70.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, 172.16.104.84, executor 0): java.io.IOException: org.apache.spark.SparkException: Failed to get broadcast_0_piece0 of broadcast_0
    at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1333)
    at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:207)
    at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:66)
    at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:66)
    at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:96)
    at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:84)
    at org.apache.spark.scheduler.Task.run(Task.scala:121)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:402)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:408)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.spark.SparkException: Failed to get broadcast_0_piece0 of broadcast_0
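
Regarding what TorrentBroadcast is doing here: as far as I understand, executors fetch broadcast pieces such as broadcast_0_piece0 from the driver's block manager, which listens on its own port rather than on spark.driver.port. Below is a sketch of the driver network settings I believe are involved in a containerized setup like mine; all host names and port numbers are placeholders, and I would appreciate confirmation that these are the right knobs.

# Sketch of the driver-side network settings that matter when the driver runs
# inside a container behind a proxy; all values below are placeholders.
from pyspark.sql import SparkSession

builder = SparkSession.builder.appName("cassandra-read")

# Address the executors use to reach the driver (the proxied HOST address).
builder = builder.config("spark.driver.host", "driver-host.example.com")

# Bind on all interfaces inside the container so forwarded traffic is accepted.
builder = builder.config("spark.driver.bindAddress", "0.0.0.0")

# Fix the driver RPC port so the proxy can forward it.
builder = builder.config("spark.driver.port", "35000")

# Broadcast pieces (e.g. broadcast_0_piece0) are served by the driver's block
# manager, which listens on a separate port that also needs to be reachable.
builder = builder.config("spark.driver.blockManager.port", "35001")

spark = builder.getOrCreate()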
spark-cassandra-connector

1 Answer

Erick Ramirez answered

@gbong_177337 it looks like you have an issue with open-source Apache Spark 2.4, which unfortunately we don't support. We only provide support for the version of Spark that ships with DSE. I would suggest you consult the relevant OSS Spark forum/site. Cheers!
