I am reading a Cassandra table from Spark, and it is throwing the following error:
com.datastax.driver.core.exceptions.FrameTooLongException: Response frame exceeded maximum allowed length
    at com.datastax.driver.core.exceptions.FrameTooLongException.copy(FrameTooLongException.java:48)
    at com.datastax.driver.core.exceptions.FrameTooLongException.copy(FrameTooLongException.java:24)
    at com.datastax.driver.core.DriverThrowables.propagateCause(DriverThrowables.java:37)
    at com.datastax.driver.core.ArrayBackedResultSet$MultiPage.prepareNextRow(ArrayBackedResultSet.java:313)
I have gone through some articles on this and understand that there is a 256 MB limit, but I do not understand why the job fails on most runs yet succeeds on a few.
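My working theory, which I have not been able to confirm, is that one response frame carries one page of rows, so whether it crosses 256 MB depends on which rows happen to land in a page. A back-of-the-envelope calculation with made-up numbers:

    # Rough arithmetic (illustrative numbers only): if a frame carries one page,
    # its size is approximately rows_per_page * average serialized row size.
    FRAME_LIMIT_MB = 256    # driver-side maximum frame length
    ROWS_PER_PAGE = 1000    # connector default fetch size (my assumption)
    for avg_row_kb in (100, 250, 300):
        frame_mb = ROWS_PER_PAGE * avg_row_kb / 1024
        status = "over" if frame_mb > FRAME_LIMIT_MB else "under"
        print(f"avg row {avg_row_kb} KB -> page ~{frame_mb:.0f} MB ({status} the limit)")

If that theory is right, a table whose rows vary in size would only blow the limit on the runs (or pages) where the larger rows cluster together, which would explain the intermittent failures.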
Spark job details:

    import logging

    def read_data_cassandra(spark_session, keyspace, table):
        try:
            data = spark_session.read \
                .format('org.apache.spark.sql.cassandra') \
                .options(table=table, keyspace=keyspace) \
                .load()
            return data
        except Exception as e:
            print("issue reading cassandra table name:", table)
            logging.exception("exception during cassandra table read >>>>>: " + str(e))
            raise

    cassandra_data = read_data_cassandra(spark, c_keyspace, table_name)
    cassandra_data.cache()
    cassandra_data.count()  # the error surfaces here, when the read is materialized
The error above is thrown while running count().
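For what it is worth, the workaround I am considering (untested, and it assumes a smaller page also means a smaller response frame) is lowering the connector's fetch size so each driver request returns fewer rows. The option name is from the spark-cassandra-connector configuration reference; 100 is an arbitrary guess:

    # Hypothetical workaround sketch: fetch fewer CQL rows per driver request so
    # each response frame stays well under the 256 MB limit. Default is 1000.
    data = spark_session.read \
        .format('org.apache.spark.sql.cassandra') \
        .options(table=table, keyspace=keyspace) \
        .option('spark.cassandra.input.fetch.size_in_rows', '100') \
        .load()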
Versions of all the software components used, including the Cassandra cluster:
cassandra: 3.11.3
spark: 2.4.2
connector: spark-cassandra-connector-2.3.2
Below are my questions:
1. If a partition in Cassandra is bigger than 256 MB and the Spark program reads that partition, how can the response frame length stay below the limit? In most cases the response frame would be larger than 256 MB.
2. How does the Cassandra Spark connector decide the response frame length?
3. What happens when Cassandra partitions are too big? How is the frame length measured in such cases?