Bringing together the Apache Cassandra experts from the community and DataStax.


mishra.anurag643_153409 asked:

Why does Spark create more partitions when reading a Cassandra table than the input split size implies?

I am reading a Cassandra table in Spark. My Spark job runs on AWS EMR.

I understand that the number of Spark partitions is calculated as Cassandra table size / spark.cassandra.input.split_size_in_mb, and that spark.cassandra.input.split_size_in_mb defaults to 64 MB.

My Cassandra data size is 30 GB, but the number of partitions in Spark is not equal to 30 GB / 64 MB — it is many more than that. What could be the reason the partition count does not match the formula?
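The arithmetic the question relies on can be sketched as below. This is a minimal illustration of the expected-partition formula as stated in the post (table size divided by `spark.cassandra.input.split_size_in_mb`, default 64 MB); the helper function name is mine, not part of the connector's API.

```python
import math

def expected_partitions(table_size_mb: float, split_size_in_mb: float = 64.0) -> int:
    """Partition count the formula predicts: table size / split size, rounded up."""
    return math.ceil(table_size_mb / split_size_in_mb)

# 30 GB of Cassandra data with the default 64 MB split size:
print(expected_partitions(30 * 1024))  # 480 partitions expected by the formula

# In a real job the split size would be set on the Spark config, e.g.:
# spark.conf.set("spark.cassandra.input.split_size_in_mb", "64")
```

Note that the connector estimates table size from Cassandra's own size estimates rather than measuring it exactly, so the actual partition count can differ from this back-of-the-envelope number.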

spark-cassandra-connector

1 Answer

Erick Ramirez answered:

[Closed as duplicate of question #11500]
