question

sridhar.addanki_188870 asked:

Getting java.util.concurrent.TimeoutException for Spark app performing JOIN on Solr-indexed table

Hi Team,

We are running a join query as a spark-submit job and it is failing with the timeout error below. The same job ran fine earlier.

We have created a Solr search index on one of the tables involved in the join, and we have also set a high value for the spark.sql.broadcastTimeout parameter. Please help us resolve this.
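
For context, the shape of the job is roughly as in the sketch below (keyspace, table, and join-key names are placeholders, not our real schema):

# Simplified sketch of the failing job (placeholder names throughout).
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("WriteRead")
    # Raised well above the 300-second default, as mentioned above.
    .config("spark.sql.broadcastTimeout", "600")
    .getOrCreate()
)

# Both tables are read through the spark-cassandra-connector; one of them
# ("solr_table" here) carries the Solr search index.
left_df = (
    spark.read.format("org.apache.spark.sql.cassandra")
    .options(keyspace="my_ks", table="plain_table")
    .load()
)
right_df = (
    spark.read.format("org.apache.spark.sql.cassandra")
    .options(keyspace="my_ks", table="solr_table")
    .load()
)

# The error surfaces when the joined result is displayed (the showString
# call in the stack trace below).
left_df.join(right_df, "id").show()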

We tested by passing different parameters to this job and get the same error.

$ dse -u cassandra -p cassandra spark-submit \
    --class com.datastax.spark.example.WriteRead \
    --master dse://100.23.42.24 \
    --deploy-mode cluster \
    --executor-memory 10G \
    --num-executors 10 \
    --executor-cores 5 \
    --driver-memory 5G \
    --conf spark.dynamicAllocation.enabled=false \
    --conf spark.shuffle.service.enabled=false \
    test_cassandras3_new.py


Cassandra test job started
Monday, 01. June 2020 04:44:09AM Statement Number : 10
Monday, 01. June 2020 04:44:12AM Statement Number : 20
Monday, 01. June 2020 04:44:24AM Job failed at statement number 20 immediate action required
An error occurred while calling o181.showString.
: java.util.concurrent.TimeoutException
        at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:411)
        at org.apache.spark.sql.cassandra.SolrPredicateRules$$anonfun$19.apply(SolrPredicateRules.scala:371)
        at org.apache.spark.sql.cassandra.SolrPredicateRules$$anonfun$19.apply(SolrPredicateRules.scala:361)
        at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$withSessionDo$1.apply(CassandraConnector.scala:115)
        at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$withSessionDo$1.apply(CassandraConnector.scala:114)
        at com.datastax.spark.connector.cql.CassandraConnector.closeResourceAfterUse(CassandraConnector.scala:158)
        at com.datastax.spark.connector.cql.CassandraConnector.withSessionDo(CassandraConnector.scala:114)
        at org.apache.spark.sql.cassandra.SolrPredicateRules.convertToSolrQuery(SolrPredicateRules.scala:361)
        at org.apache.spark.sql.cassandra.SolrPredicateRules.apply(SolrPredicateRules.scala:74)
        at org.apache.spark.sql.cassandra.SolrPredicateRules.apply(SolrPredicateRules.scala:91)
        at org.apache.spark.sql.cassandra.CassandraSourceRelation$$anonfun$predicatePushDown$1$$anonfun$20.apply(CassandraSourceRelation.scala:290)
        at org.apache.spark.sql.cassandra.CassandraSourceRelation$$anonfun$predicatePushDown$1$$anonfun$20.apply(CassandraSourceRelation.scala:289)
        at scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:124)
        at scala.collection.immutable.List.foldLeft(List.scala:84)
        at org.apache.spark.sql.cassandra.CassandraSourceRelation$$anonfun$predicatePushDown$1.apply(CassandraSourceRelation.scala:288)
        at org.apache.spark.sql.cassandra.CassandraSourceRelation$$anonfun$predicatePushDown$1.apply(CassandraSourceRelation.scala:269)
        at scala.collection.concurrent.TrieMap.getOrElseUpdate(TrieMap.scala:901)
        at org.apache.spark.sql.cassandra.CassandraSourceRelation.predicatePushDown(CassandraSourceRelation.scala:269)
        at org.apache.spark.sql.cassandra.CassandraSourceRelation.unhandledFilters(CassandraSourceRelation.scala:229)
        at org.apache.spark.sql.execution.datasources.DataSourceStrategy$.selectFilters(DataSourceStrategy.scala:557)
        at org.apache.spark.sql.execution.datasources.DataSourceStrategy.pruneFilterProjectRaw(DataSourceStrategy.scala:358)
        at org.apache.spark.sql.execution.datasources.DataSourceStrategy.pruneFilterProject(DataSourceStrategy.scala:321)
        at org.apache.spark.sql.execution.datasources.DataSourceStrategy.apply(DataSourceStrategy.scala:289)
        at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:63)
        at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:63)
        at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:435)
        at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:441)
        at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
        at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:93)
        at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:78)
        at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:75)
        at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
        at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
        at scala.collection.Iterator$class.foreach(Iterator.scala:891)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
        at scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:157)
        at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1334)
        at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:75)
        at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:67)
        at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:435)
        at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:441)
        at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:93)
        at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:78)
        at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:75)
        at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
        at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
        at scala.collection.Iterator$class.foreach(Iterator.scala:891)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
        at scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:157)
        at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1334)
        at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:75)
        at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:67)
        at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:435)
        at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:441)
        at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:93)
        at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:78)
        at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:75)
        at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
        at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
        at scala.collection.Iterator$class.foreach(Iterator.scala:891)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
        at scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:157)
        at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1334)
        at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:75)
        at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:67)
        at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:435)
        at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:441)
        at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:93)
        at org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:72)
        at org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:68)
        at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:77)
        at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:77)
        at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3360)
        at org.apache.spark.sql.Dataset.head(Dataset.scala:2545)
        at org.apache.spark.sql.Dataset.take(Dataset.scala:2759)
        at org.apache.spark.sql.Dataset.getRows(Dataset.scala:255)
        at org.apache.spark.sql.Dataset.showString(Dataset.scala:292)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:282)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:238)
        at java.lang.Thread.run(Thread.java:748)

WARN 2020-06-01 04:45:19,093 com.datastax.driver.core.RequestHandler: /100.23.42.23:9042 replied with server error (Query response timeout of 60000, missing responses from nodes: [100.23.42.23]), defuncting connection.
Tags: dse, spark
2 comments

The timeout occurs because the Solr query that's required to service the Spark request doesn't complete within the 60-second timeout window. It indicates that the queries are too big or complex and take too long to complete.
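
As a quick test, it may also be worth bypassing the Solr optimization entirely so the join is planned as a plain Cassandra scan. A minimal sketch, assuming the DSE 6.x property name spark.sql.dse.search.enableOptimization (please verify the exact name against the DSE Analytics docs for your version):

# Workaround sketch: turn off DSE Search (Solr) predicate pushdown so the
# join runs as a regular Cassandra scan instead of issuing a Solr query.
# ASSUMPTION: "spark.sql.dse.search.enableOptimization" is the correct
# property name for your DSE version; check the DSE Analytics docs.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config("spark.sql.dse.search.enableOptimization", "off")
    .getOrCreate()
)

If the join completes with the optimization off, that would confirm the Solr query itself is the bottleneck.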

Could you provide additional info about your Spark app/job? Please update your original question with details like:

  • a description of what your Spark app is doing
  • sample table schema
  • the JOIN query
  • any other info that would be relevant to determining why the Solr query takes too long to execute

Cheers!

Erick Ramirez ♦♦

@sridhar.addanki_188870 Do you have an update for us? :)


0 Answers