I am reading a Cassandra table (about 268 MB of data) from a PySpark job running with 16 GB of RAM. Read consistency is set to ONE.
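For reference, the Spark session is built roughly like this (a sketch: the app name is illustrative, and the contact host is assumed from the localhost/127.0.0.1:9042 address in the errors below; spark.cassandra.input.consistency.level is the connector option that pins reads to ONE):

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("cassandra-full-load")  # illustrative name
    # Assumption: Cassandra is reachable on the host shown in the driver logs
    .config("spark.cassandra.connection.host", "127.0.0.1")
    # Connector setting that sets the read consistency level to ONE
    .config("spark.cassandra.input.consistency.level", "ONE")
    .getOrCreate()
)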
Below is the code that reads the Cassandra table data from the PySpark job:
from pyspark import StorageLevel

data = read_data(spark, 'platform_agent_db', 'agent_version')
data = data.persist(StorageLevel.MEMORY_AND_DISK)
data.write.parquet("s3a://dp-cassandra-full-poc/keysapce/table//full_load/", mode="overwrite")
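Here read_data is a thin wrapper over the spark-cassandra-connector DataFrame source, roughly as follows (a sketch; the actual helper may set additional options):

def read_data(spark, keyspace, table):
    # Load the Cassandra table through the connector's DataFrame source
    return (
        spark.read
        .format("org.apache.spark.sql.cassandra")
        .options(keyspace=keyspace, table=table)
        .load()
    )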
This code throws the error below:
Caused by: com.datastax.driver.core.exceptions.OperationTimedOutException: [localhost/127.0.0.1:9042] Timed out waiting for server response
    at com.datastax.driver.core.RequestHandler$SpeculativeExecution.onTimeout(RequestHandler.java:772)
    at com.datastax.driver.core.Connection$ResponseHandler$1.run(Connection.java:1374)
    at com.datastax.shaded.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:581)
    at com.datastax.shaded.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:655)
    at com.datastax.shaded.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:367)
    at com.datastax.shaded.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(
[Stage 1:> (0 + 1) / 64]2020-12-04 11:43:02 WARN Session:378 - Error creating pool to /172.28.16.43:9042
com.datastax.driver.core.exceptions.ConnectionException: [/cassandraip9042] Pool was closed during initialization
    at com.datastax.driver.core.HostConnectionPool$2.onSuccess(HostConnectionPool.java:148)
[Stage 1:==> (3 + 8) / 64]2020-12-04 11:54:57 WARN BlockManager:66 - Putting block rdd_4_5 failed due to exception java.io.IOException: Exception during execution of SELECT cols WHERE token(partition_key) > ? AND token(partition_key) <= ? ALLOW FILTERING: [localhost/127.0.0.1:9042] Timed out waiting for server response.
2020-12-04 11:54:57 WARN BlockManager:66 - Block rdd_4_5 could not be removed as it was not found on disk or in memory
2020-12-04 11:54:57 ERROR Executor:91 - Exception in task 9.0 in stage 1.0 (TID 10)