Bringing together the Apache Cassandra experts from the community and DataStax.


mishra.anurag643_153409 asked · Erick Ramirez edited

Spark connector read timeout: job aborted due to stage failure

I am reading a Cassandra table (268 MB of data) from a PySpark job running with 16 GB of RAM. The read consistency is set to ONE.

Below is my code to read the Cassandra table data from the PySpark job:

from pyspark import StorageLevel

data = read_data(spark, 'platform_agent_db', 'agent_version')
data = data.persist(StorageLevel.MEMORY_AND_DISK)
data.write.parquet("s3a://dp-cassandra-full-poc/keysapce/table//full_load/", mode="overwrite")

The above code throws the error below:

Caused by: com.datastax.driver.core.exceptions.OperationTimedOutException: [localhost/127.0.0.1:9042] Timed out waiting for server response
  at com.datastax.driver.core.RequestHandler$SpeculativeExecution.onTimeout(RequestHandler.java:772)
    at com.datastax.driver.core.Connection$ResponseHandler$1.run(Connection.java:1374)
    at com.datastax.shaded.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:581)
    at com.datastax.shaded.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:655)
    at com.datastax.shaded.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:367)
    at com.datastax.shaded.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(
[Stage 1:>                                                         (0 + 1) / 64]2020-12-04 11:43:02 WARN  Session:378 - Error creating pool to /172.28.16.43:9042
com.datastax.driver.core.exceptions.ConnectionException: [/cassandraip9042] Pool was closed during initialization
    at com.datastax.driver.core.HostConnectionPool$2.onSuccess(HostConnectionPool.java:148)
[Stage 1:==>                                                       (3 + 8) / 64]2020-12-04 11:54:57 WARN  BlockManager:66 - Putting block rdd_4_5 failed due to exception java.io.IOException: Exception during execution of SELECT cols WHERE token(partition_key) > ? AND token(partition_key) <= ?   ALLOW FILTERING: [localhost/127.0.0.1:9042] Timed out waiting for server response.
2020-12-04 11:54:57 WARN  BlockManager:66 - Block rdd_4_5 could not be removed as it was not found on disk or in memory
2020-12-04 11:54:57 ERROR Executor:91 - Exception in task 9.0 in stage 1.0 (TID 10)


spark

1 Answer

Erick Ramirez answered

It looks like your code is connecting to localhost, and the connection timed out, leading to this exception:

Caused by: com.datastax.driver.core.exceptions.OperationTimedOutException: [localhost/127.0.0.1:9042] Timed out waiting for server response

I suspect your app is not configured correctly. In any case, you haven't provided enough information for us to make a meaningful assessment.

Please try running your code against Astra, and please provide the minimal code that would allow us to replicate the issue. Cheers!
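To make the diagnosis above concrete, here is a minimal sketch of the kind of connector configuration to check. The host address and timeout value are illustrative assumptions (the IP is taken from the log output, which may itself be a placeholder), not settings confirmed by the question:

```python
# A minimal sketch, not the asker's actual code: the contact point and
# timeout below are illustrative assumptions.
connector_settings = {
    # Point the connector at the cluster's contact point(s), not localhost.
    "spark.cassandra.connection.host": "172.28.16.43",
    # Driver-side read timeout in milliseconds; the exact property name
    # varies by connector version, so check your version's reference.
    "spark.cassandra.read.timeout_ms": "120000",
}

# With pyspark installed, the settings would be applied when building the
# session, roughly like this:
#
#   from pyspark.sql import SparkSession
#   builder = SparkSession.builder.appName("cassandra-export")
#   for key, value in connector_settings.items():
#       builder = builder.config(key, value)
#   spark = builder.getOrCreate()
```

If the connector has no explicit `spark.cassandra.connection.host`, it falls back to localhost, which matches the `[localhost/127.0.0.1:9042]` in the stack trace.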
