I am running a PySpark job that reads Cassandra data, but it is throwing an error. I am using spark-cassandra-connector-2.3.2_patched.jar.
Below is the error stack trace:
ERROR:root:exception during cassandra table read >>>>>:An error occurred while calling o380.count.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 438 in stage 19.0 failed 4 times, most recent failure: Lost task 438.3 in stage 19.0 (TID 775, ip-10-221-46-130.ec2.internal, executor 13): com.datastax.driver.core.exceptions.FrameTooLongException: Response frame exceeded maximum allowed length
	at com.datastax.driver.core.exceptions.FrameTooLongException.copy(FrameTooLongException.java:48)
1) Is this issue caused by the connector, meaning I have to upgrade the connector version?
2) If so, is there a workaround that resolves the issue without upgrading the connector? (The sketch at the end of this post shows the kind of tuning I had in mind.)
import logging


def read_data_cassandra(spark_session, keyspace, table):
    """
    Read a Cassandra table.

    :param spark_session: Spark session
    :param keyspace: Cassandra keyspace
    :param table: Cassandra table
    :return: DataFrame object
    """
    try:
        data = spark_session.read \
            .format('org.apache.spark.sql.cassandra') \
            .options(table=table, keyspace=keyspace) \
            .load()
        return data
    except Exception as e:
        print("issue reading cassandra table name:", table)
        logging.exception("exception during cassandra table read >>>>>:" + str(e))
        raise


# spark, c_keyspace and table_name are defined earlier in the job
cassandra_data = read_data_cassandra(spark, c_keyspace, table_name)
cassandra_data.cache()
cassandra_data.count()
The above error occurs while running count().
Versions of the software components used, including the Cassandra cluster:
connector: spark-cassandra-connector-2.3.2
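
For question 2, here is the kind of read tuning I could try while staying on connector 2.3.2: shrinking the CQL page size so each native-protocol response frame carries fewer rows. This is a sketch only, based on my reading of the connector docs that spark.cassandra.input.fetch.size_in_rows controls the page size; the host and the value 100 below are placeholders, and I am not sure this actually avoids FrameTooLongException:

from pyspark.sql import SparkSession

# Sketch only: lower the CQL page size so each response frame carries
# fewer rows. 'spark.cassandra.input.fetch.size_in_rows' is the
# connector's page-size read parameter; host and value are placeholders.
spark = (SparkSession.builder
         .appName('cassandra-read')
         .config('spark.cassandra.connection.host', '127.0.0.1')  # placeholder host
         .config('spark.cassandra.input.fetch.size_in_rows', '100')
         .getOrCreate())

# As far as I can tell from the connector docs, the same parameter can
# also be passed per read through the DataFrameReader options:
cassandra_data = (spark.read
                  .format('org.apache.spark.sql.cassandra')
                  .options(table=table_name, keyspace=c_keyspace)
                  .option('spark.cassandra.input.fetch.size_in_rows', '100')
                  .load())

If shrinking the page size is not enough (for example, if individual rows are themselves very large), I assume the upgrade path from question 1 is the only option.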