sridhar.addanki_188870 asked · Erick Ramirez answered

Getting java.util.concurrent.TimeoutException for Spark app performing JOIN on Solr-indexed table

Hi Team,

We are running a join query as a spark-submit job and getting the timeout error below. The same job ran fine earlier.

We have created a Solr search index on one of the tables involved in the join. We have also set a high value for the 'broadcastTimeout' parameter. Please help us resolve this.
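
Assuming the parameter meant here is Spark SQL's spark.sql.broadcastTimeout, it is set when the session is built; the value below is illustrative only, since the actual value used is not shown:

    from pyspark.sql import SparkSession

    # spark.sql.broadcastTimeout is the wait, in seconds (default 300),
    # for a broadcast join's table to be built and shipped. The value is
    # illustrative. Note that the 60-second timeout in the trace below
    # comes from the Solr query, not from a broadcast join, so raising
    # this setting alone may not help.
    spark = (SparkSession.builder
             .config("spark.sql.broadcastTimeout", "1200")
             .getOrCreate())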

We tested by passing different parameters to this job and got the same error.
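
For context, a minimal sketch of the kind of job described (the keyspace, table, and column names below are hypothetical; the actual script is not shown):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("WriteRead").getOrCreate()

    # Hypothetical tables; solr_indexed_table stands in for the table
    # that has the Solr search index.
    left = (spark.read.format("org.apache.spark.sql.cassandra")
            .options(keyspace="ks", table="plain_table").load())
    right = (spark.read.format("org.apache.spark.sql.cassandra")
            .options(keyspace="ks", table="solr_indexed_table").load())

    # Filters on the Solr-indexed side are what the connector attempts
    # to rewrite as a Solr query (SolrPredicateRules in the trace below).
    joined = left.join(right, "id").filter(right.status == "ACTIVE")
    joined.show()  # maps to the showString call in the trace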

$ dse -u cassandra -p cassandra spark-submit \
    --class com.datastax.spark.example.WriteRead \
    --master dse://100.23.42.24 \
    --deploy-mode cluster \
    --executor-memory 10G \
    --num-executors 10 \
    --executor-cores 5 \
    --driver-memory 5G \
    --conf spark.dynamicAllocation.enabled=false \
    --conf spark.shuffle.service.enabled=false \
    test_cassandras3_new.py


Cassandra test job started
Monday, 01. June 2020 04:44:09AM Statement Number : 10
Monday, 01. June 2020 04:44:12AM Statement Number : 20
Monday, 01. June 2020 04:44:24AM Job failed at statement number 20 immediate action required
An error occurred while calling o181.showString.
: java.util.concurrent.TimeoutException
        at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:411)
        at org.apache.spark.sql.cassandra.SolrPredicateRules$$anonfun$19.apply(SolrPredicateRules.scala:371)
        at org.apache.spark.sql.cassandra.SolrPredicateRules$$anonfun$19.apply(SolrPredicateRules.scala:361)
        at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$withSessionDo$1.apply(CassandraConnector.scala:115)
        at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$withSessionDo$1.apply(CassandraConnector.scala:114)
        at com.datastax.spark.connector.cql.CassandraConnector.closeResourceAfterUse(CassandraConnector.scala:158)
        at com.datastax.spark.connector.cql.CassandraConnector.withSessionDo(CassandraConnector.scala:114)
        at org.apache.spark.sql.cassandra.SolrPredicateRules.convertToSolrQuery(SolrPredicateRules.scala:361)
        at org.apache.spark.sql.cassandra.SolrPredicateRules.apply(SolrPredicateRules.scala:74)
        at org.apache.spark.sql.cassandra.SolrPredicateRules.apply(SolrPredicateRules.scala:91)
        at org.apache.spark.sql.cassandra.CassandraSourceRelation$$anonfun$predicatePushDown$1$$anonfun$20.apply(CassandraSourceRelation.scala:290)
        at org.apache.spark.sql.cassandra.CassandraSourceRelation$$anonfun$predicatePushDown$1$$anonfun$20.apply(CassandraSourceRelation.scala:289)
        at scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:124)
        at scala.collection.immutable.List.foldLeft(List.scala:84)
        at org.apache.spark.sql.cassandra.CassandraSourceRelation$$anonfun$predicatePushDown$1.apply(CassandraSourceRelation.scala:288)
        at org.apache.spark.sql.cassandra.CassandraSourceRelation$$anonfun$predicatePushDown$1.apply(CassandraSourceRelation.scala:269)
        at scala.collection.concurrent.TrieMap.getOrElseUpdate(TrieMap.scala:901)
        at org.apache.spark.sql.cassandra.CassandraSourceRelation.predicatePushDown(CassandraSourceRelation.scala:269)
        at org.apache.spark.sql.cassandra.CassandraSourceRelation.unhandledFilters(CassandraSourceRelation.scala:229)
        at org.apache.spark.sql.execution.datasources.DataSourceStrategy$.selectFilters(DataSourceStrategy.scala:557)
        at org.apache.spark.sql.execution.datasources.DataSourceStrategy.pruneFilterProjectRaw(DataSourceStrategy.scala:358)
        at org.apache.spark.sql.execution.datasources.DataSourceStrategy.pruneFilterProject(DataSourceStrategy.scala:321)
        at org.apache.spark.sql.execution.datasources.DataSourceStrategy.apply(DataSourceStrategy.scala:289)
        at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:63)
        at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:63)
        at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:435)
        at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:441)
        at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
        at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:93)
        at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:78)
        at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:75)
        at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
        at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
        at scala.collection.Iterator$class.foreach(Iterator.scala:891)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
        at scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:157)
        at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1334)
        at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:75)
        at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:67)
        at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:435)
        at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:441)
        at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:93)
        at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:78)
        at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:75)
        at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
        at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
        at scala.collection.Iterator$class.foreach(Iterator.scala:891)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
        at scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:157)
        at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1334)
        at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:75)
        at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:67)
        at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:435)
        at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:441)
        at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:93)
        at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:78)
        at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:75)
        at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
        at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
        at scala.collection.Iterator$class.foreach(Iterator.scala:891)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
        at scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:157)
        at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1334)
        at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:75)
        at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:67)
        at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:435)
        at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:441)
        at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:93)
        at org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:72)
        at org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:68)
        at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:77)
        at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:77)
        at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3360)
        at org.apache.spark.sql.Dataset.head(Dataset.scala:2545)
        at org.apache.spark.sql.Dataset.take(Dataset.scala:2759)
        at org.apache.spark.sql.Dataset.getRows(Dataset.scala:255)
        at org.apache.spark.sql.Dataset.showString(Dataset.scala:292)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:282)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:238)
        at java.lang.Thread.run(Thread.java:748)

WARN 2020-06-01 04:45:19,093 com.datastax.driver.core.RequestHandler: /100.23.42.23:9042 replied with server error (Query response timeout of 60000, missing responses from nodes: [100.23.42.23]), defuncting connection.
Tags: dsespark

1 Answer

Erick Ramirez answered

The timeout occurs because the Solr query that's required to service the Spark request doesn't complete within the 60-second timeout window. It indicates that the queries are too big or complex and take too long to complete.
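
If the predicate pushdown itself is the bottleneck, one thing to test (a sketch only, assuming the DSE setting spark.sql.dse.solr.enable_optimization is available in your DSE version; verify the setting name against the docs for your release) is reading the table with the Solr optimization disabled, so Spark evaluates the filter itself instead of waiting on a Solr query:

    # Sketch: disables DSE's Solr predicate pushdown for this read only.
    # Keyspace/table names are placeholders.
    df = (spark.read.format("org.apache.spark.sql.cassandra")
          .options(keyspace="ks", table="solr_indexed_table")
          .option("spark.sql.dse.solr.enable_optimization", "false")
          .load())

That would at least isolate whether it's the Solr query, rather than the join itself, that is timing out.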

Could you provide additional info about your Spark app/job? Please update your original question with details like:

  • a description of what your Spark app is doing
  • sample table schema
  • the JOIN query
  • any other info that would be relevant to determine why the Solr query takes too long to execute

Cheers!

1 comment

Erick Ramirez ♦♦ commented

@sridhar.addanki_188870 Do you have an update for us? :)
