I am reading a Cassandra table in Spark. My Spark job runs on AWS EMR.
I understand that the number of Spark partitions is computed as table size / spark.cassandra.input.split_size_in_mb, where spark.cassandra.input.split_size_in_mb defaults to 64 MB.
My Cassandra table is about 30 GB, but the number of partitions in Spark is not 30 GB / 64 MB; it is many more than that. What could be the reason for the partition count not matching the formula?
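For reference, here is a small sketch of the partition count I expected from that formula (this is just the arithmetic, not a guarantee from the connector; the variable names are my own):

```python
import math

# Hypothetical calculation of the partition count the formula predicts.
table_size_mb = 30 * 1024      # 30 GB Cassandra table, expressed in MB
split_size_in_mb = 64          # spark.cassandra.input.split_size_in_mb default

expected_partitions = math.ceil(table_size_mb / split_size_in_mb)
print(expected_partitions)     # 480
```

So I expected roughly 480 partitions, but the actual count is far higher.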