Bringing together the Apache Cassandra experts from the community and DataStax.


Question

gbong_177337 asked ·

Unable to collect the results from the Spark workers

I am trying to use Spark 2.4.3 built for Hadoop 2.7.3 to talk to Cassandra.


    table_df = sqlContext.read \
        .format("org.apache.spark.sql.cassandra") \
        .options(table="xxxx", keyspace="yyy") \
        .load()


When I call table_df.show(), it throws the following exception.

Note that the PySpark driver JVM is running inside a container. I have a proxy that forwards traffic from spark.driver.port on the HOST to the container. Help me understand what this TorrentBroadcast issue is all about. Thanks
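For context on the setup described above: a "Failed to get broadcast_X_pieceY" error generally means an executor could not fetch broadcast blocks back from the driver's block manager. When the driver runs inside a container behind a proxy, forwarding spark.driver.port alone is usually not enough, because broadcast pieces are served from a separate block manager port. Below is a minimal sketch of the Spark network settings typically involved; the host address and port numbers are placeholders, not values from the original post:

```python
# Hedged sketch: Spark network settings relevant when the driver JVM runs
# inside a container. Values below are hypothetical placeholders.
driver_network_conf = {
    # Address executors use to reach the driver (the HOST the proxy listens on,
    # not the container-internal address).
    "spark.driver.host": "10.0.0.5",

    # Address the driver binds to inside the container.
    "spark.driver.bindAddress": "0.0.0.0",

    # Pin both driver ports so the host proxy can forward them to the container.
    "spark.driver.port": "7077",

    # The block manager port serves broadcast blocks (TorrentBroadcast pieces);
    # if it is not reachable from the executors, tasks fail with
    # "Failed to get broadcast_0_piece0 of broadcast_0".
    "spark.driver.blockManagerPort": "7078",
}

# Each entry would be passed to spark-submit as --conf key=value,
# e.g. --conf spark.driver.blockManagerPort=7078
for key, value in driver_network_conf.items():
    print(f"--conf {key}={value}")
```

This is only a sketch of the configuration surface, not a confirmed fix for the trace below; the exact ports to forward depend on the proxy setup.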



Py4JJavaError: An error occurred while calling o70.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, 172.16.104.84, executor 0): java.io.IOException: org.apache.spark.SparkException: Failed to get broadcast_0_piece0 of broadcast_0
    at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1333)
    at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:207)
    at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:66)
    at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:66)
    at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:96)
    at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:84)
    at org.apache.spark.scheduler.Task.run(Task.scala:121)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:402)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:408)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.spark.SparkException: Failed to get broadcast_0_piece0 of broadcast_0
Tags: spark, docker, spark-connector, pyspark

1 Answer

Erick Ramirez answered ·

@gbong_177337 It looks like you have an issue with OSS Apache Spark 2.4, which unfortunately we don't support. We only provide support for the version of Spark that ships with DSE. I would suggest you consult the relevant OSS Spark forum/site. Cheers!
