mishra.anurag643_153409 asked:

FrameTooLongException error reading table from Spark

I am reading a Cassandra table from Spark, and it is throwing this error:

com.datastax.driver.core.exceptions.FrameTooLongException: Response frame exceeded maximum allowed length
    at com.datastax.driver.core.exceptions.FrameTooLongException.copy(FrameTooLongException.java:48)
    at com.datastax.driver.core.exceptions.FrameTooLongException.copy(FrameTooLongException.java:24)
    at com.datastax.driver.core.DriverThrowables.propagateCause(DriverThrowables.java:37)
    at com.datastax.driver.core.ArrayBackedResultSet$MultiPage.prepareNextRow(ArrayBackedResultSet.java:313)

I have gone through some articles on this and understand that there is a 256 MB limit, but I do not understand why the job succeeds on a few runs yet fails most of the time.

Spark job details:

import logging

def read_data_cassandra(spark_session, keyspace, table):
    try:
        data = spark_session.read \
            .format('org.apache.spark.sql.cassandra') \
            .options(table=table, keyspace=keyspace) \
            .load()
        return data
    except Exception as e:
        print("issue reading cassandra table name:", table)
        logging.exception("exception during cassandra table read >>>>>: " + str(e))
        raise e

cassandra_data = read_data_cassandra(spark, c_keyspace, table_name)
cassandra_data.cache()
cassandra_data.count()

The error above occurs while running count().

Versions of the software components used, including the Cassandra cluster:

Cassandra: 3.11.3

Spark: 2.4.2

Connector: spark-cassandra-connector-2.3.2

Below are my questions:

1. If a partition in Cassandra is bigger than 256 MB and the Spark program reads that partition, how can the response frame length be smaller than the limit? In most cases the response frame would be larger than 256 MB.

2. How does the Spark Cassandra connector decide the response frame length?

3. What happens when Cassandra partitions are too big, and how is the frame length measured in such cases?

spark-cassandra-connector

1 Answer

Erick Ramirez answered:

The Spark Cassandra connector uses the Java driver under the hood. The driver itself generates the FrameTooLongException when it receives a response from the cluster that is too large.

The most common cause of this exception is reading a very large partition with lots of rows. If paging is disabled, all the rows are returned to the client in a single response, leading to this exception.
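
For intuition, here is a rough sizing sketch (a hypothetical calculation, assuming the 256 MB frame limit discussed in this thread and the 2 MB average row size used in the example that follows):

MAX_FRAME_BYTES = 256 * 1024 * 1024   # frame limit discussed in this thread
AVG_ROW_BYTES = 2 * 1024 * 1024       # illustrative average row size (2 MB)

# The largest page (fetch size) that still fits in one response frame.
max_rows_per_page = MAX_FRAME_BYTES // AVG_ROW_BYTES
print(max_rows_per_page)              # 128 -> keep the fetch size below this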

Tune the read parameters and set spark.cassandra.input.fetch.size_in_rows to a lower value. For example, if you have a partition which has 1,000 rows and the average row size is 2 MB, then the partition size is 2 GB, which is too big for the maximum frame size allowed. In this scenario, consider setting spark.cassandra.input.fetch.size_in_rows=100 (equivalent to 100 x 2 MB, or 200 MB per page). This will require a fair bit of trial and error if you don't know the offending partition. Cheers!
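
As a minimal sketch of that tuning (assuming the same table_name and c_keyspace variables as the question; 100 rows per page is an illustrative value, not a general recommendation), the setting can be applied when building the Spark session:

from pyspark.sql import SparkSession

# Cap the number of rows the driver fetches per page so a single response
# frame stays under the limit; 100 is illustrative, not a recommendation.
spark = SparkSession.builder \
    .config('spark.cassandra.input.fetch.size_in_rows', '100') \
    .getOrCreate()

cassandra_data = spark.read \
    .format('org.apache.spark.sql.cassandra') \
    .options(table=table_name, keyspace=c_keyspace) \
    .load()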

2 comments

mishra.anurag643_153409: @Erick Ramirez does this mean that if paging is disabled, Spark would never be able to read a Cassandra partition bigger than 256 MB?
Erick Ramirez:
Correct, yes.