sridhar.addanki_188870 asked:
Getting java.util.concurrent.TimeoutException for Spark app performing JOIN on Solr-indexed table

Hi Team,

We are running a join query as a spark-submit job and it fails with the timeout error below. The same job was working fine earlier.

We have created a Solr search index on one of the tables involved in the join, and have also set a high value for the 'broadcasttimeout' parameter. Please help us resolve this.
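
For context, a minimal sketch of how such a timeout is usually raised, assuming the parameter in question is Spark SQL's spark.sql.broadcastTimeout (the exact property name is not confirmed in the post, and the 600-second value is illustrative only):

    # Hedged sketch: raising the broadcast join timeout in PySpark.
    # Assumes spark.sql.broadcastTimeout is the parameter being tuned.
    spark.conf.set("spark.sql.broadcastTimeout", "600")

    # Equivalent spark-submit flag:
    #   --conf spark.sql.broadcastTimeout=600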

We tested by passing different parameters to this job and got the same error.

$ dse -u cassandra -p cassandra spark-submit \
    --class com.datastax.spark.example.WriteRead \
    --executor-memory 10G \
    --num-executors 10 \
    --executor-cores 5 \
    --driver-memory 5G \
    test_cassandras3_new.py \
    --master dse://100.23.42.24 \
    --deploy-mode cluster \
    --conf spark.dynamicAllocation.enabled=false \
    --conf spark.shuffle.service.enabled=false


Cassandra test job started
Monday, 01. June 2020 04:44:09AM Statement Number : 10
Monday, 01. June 2020 04:44:12AM Statement Number : 20
Monday, 01. June 2020 04:44:24AM Job failed at statement number 20 immediate action required
An error occurred while calling o181.showString.
: java.util.concurrent.TimeoutException
        at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:411)
        at org.apache.spark.sql.cassandra.SolrPredicateRules$$anonfun$19.apply(SolrPredicateRules.scala:371)
        at org.apache.spark.sql.cassandra.SolrPredicateRules$$anonfun$19.apply(SolrPredicateRules.scala:361)
        at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$withSessionDo$1.apply(CassandraConnector.scala:115)
        at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$withSessionDo$1.apply(CassandraConnector.scala:114)
        at com.datastax.spark.connector.cql.CassandraConnector.closeResourceAfterUse(CassandraConnector.scala:158)
        at com.datastax.spark.connector.cql.CassandraConnector.withSessionDo(CassandraConnector.scala:114)
        at org.apache.spark.sql.cassandra.SolrPredicateRules.convertToSolrQuery(SolrPredicateRules.scala:361)
        at org.apache.spark.sql.cassandra.SolrPredicateRules.apply(SolrPredicateRules.scala:74)
        at org.apache.spark.sql.cassandra.SolrPredicateRules.apply(SolrPredicateRules.scala:91)
        at org.apache.spark.sql.cassandra.CassandraSourceRelation$$anonfun$predicatePushDown$1$$anonfun$20.apply(CassandraSourceRelation.scala:290)
        at org.apache.spark.sql.cassandra.CassandraSourceRelation$$anonfun$predicatePushDown$1$$anonfun$20.apply(CassandraSourceRelation.scala:289)
        at scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:124)
        at scala.collection.immutable.List.foldLeft(List.scala:84)
        at org.apache.spark.sql.cassandra.CassandraSourceRelation$$anonfun$predicatePushDown$1.apply(CassandraSourceRelation.scala:288)
        at org.apache.spark.sql.cassandra.CassandraSourceRelation$$anonfun$predicatePushDown$1.apply(CassandraSourceRelation.scala:269)
        at scala.collection.concurrent.TrieMap.getOrElseUpdate(TrieMap.scala:901)
        at org.apache.spark.sql.cassandra.CassandraSourceRelation.predicatePushDown(CassandraSourceRelation.scala:269)
        at org.apache.spark.sql.cassandra.CassandraSourceRelation.unhandledFilters(CassandraSourceRelation.scala:229)
        at org.apache.spark.sql.execution.datasources.DataSourceStrategy$.selectFilters(DataSourceStrategy.scala:557)
        at org.apache.spark.sql.execution.datasources.DataSourceStrategy.pruneFilterProjectRaw(DataSourceStrategy.scala:358)
        at org.apache.spark.sql.execution.datasources.DataSourceStrategy.pruneFilterProject(DataSourceStrategy.scala:321)
        at org.apache.spark.sql.execution.datasources.DataSourceStrategy.apply(DataSourceStrategy.scala:289)
        at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:63)
        at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:63)
        at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:435)
        at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:441)
        at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
        at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:93)
        at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:78)
        at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:75)
        at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
        at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
        at scala.collection.Iterator$class.foreach(Iterator.scala:891)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
        at scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:157)
        at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1334)
        at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:75)
        at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:67)
        at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:435)
        at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:441)
        at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:93)
        at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:78)
        at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:75)
        at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
        at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
        at scala.collection.Iterator$class.foreach(Iterator.scala:891)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
        at scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:157)
        at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1334)
        at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:75)
        at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:67)
        at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:435)
        at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:441)
        at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:93)
        at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:78)
        at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:75)
        at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
        at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
        at scala.collection.Iterator$class.foreach(Iterator.scala:891)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
        at scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:157)
        at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1334)
        at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:75)
        at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:67)
        at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:435)
        at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:441)
        at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:93)
        at org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:72)
        at org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:68)
        at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:77)
        at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:77)
        at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3360)
        at org.apache.spark.sql.Dataset.head(Dataset.scala:2545)
        at org.apache.spark.sql.Dataset.take(Dataset.scala:2759)
        at org.apache.spark.sql.Dataset.getRows(Dataset.scala:255)
        at org.apache.spark.sql.Dataset.showString(Dataset.scala:292)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:282)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:238)
        at java.lang.Thread.run(Thread.java:748)

WARN 2020-06-01 04:45:19,093 com.datastax.driver.core.RequestHandler: /100.23.42.23:9042 replied with server error (Query response timeout of 60000, missing responses from nodes: [100.23.42.23]), defuncting connection.

The timeout occurs because the Solr query that's required to service the Spark request doesn't complete within the 60-second timeout window. It indicates that the queries are too large or too complex to complete in time.
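
If the Solr predicate pushdown itself is the bottleneck, one option worth testing is to disable the DSE Solr optimization for the affected read, so that filters are evaluated by Spark instead of Solr. A minimal sketch, assuming DSE Analytics with PySpark (the keyspace and table names are placeholders):

    # Hedged sketch: reading the Solr-indexed table with DSE's Solr
    # predicate pushdown disabled. "my_keyspace"/"my_table" are placeholders.
    df = (spark.read
          .format("org.apache.spark.sql.cassandra")
          .options(keyspace="my_keyspace", table="my_table")
          .option("spark.sql.dse.solr.enable_optimization", "false")
          .load())

The same switch can also be applied job-wide via --conf spark.sql.dse.solr.enable_optimization=false on spark-submit.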

Could you provide additional info about your Spark app/job? Please update your original question with details like:

  • a description of what your Spark app is doing
  • a sample table schema
  • the JOIN query
  • any other info that would be relevant to determining why the Solr query takes too long to execute (the query plan helps here; see the sketch below)
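
For that last point, a quick way to capture the plan, assuming PySpark against the same tables (all keyspace, table, and column names below are placeholders for the real ones):

    # Hedged sketch: reproducing the join and printing the query plans so
    # any pushed-down Solr predicates are visible in the output.
    left_df = (spark.read.format("org.apache.spark.sql.cassandra")
               .options(keyspace="my_keyspace", table="left_table").load())
    right_df = (spark.read.format("org.apache.spark.sql.cassandra")
                .options(keyspace="my_keyspace", table="right_table").load())
    joined_df = left_df.join(right_df, "join_key")  # "join_key" is a placeholder
    joined_df.explain(True)  # prints the parsed/analyzed/optimized/physical plans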

Cheers!

Erick Ramirez commented:

@sridhar.addanki_188870 Do you have an update for us? :)


0 Answers