I am using:

- Cassandra 3.11.3
- Spark 2.3.2
- Connector: spark-cassandra-connector 2.3.2
I have run `nodetool cfstats keyspace.table -H` and it shows:

    Space used (live): 33.8 GiB
    Space used (total): 33.8 GiB
Since the replication factor is 3 and this is a three-node cluster, I assume the actual (unreplicated) table size is 33.8 / 3 ≈ 11.3 GiB.
When I read this table from Spark, it creates 1443 partitions. I am wondering why Spark creates so many partitions, given that the default of `spark.cassandra.input.split.sizeInMB` is 64 MB.
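For reference, here is the arithmetic behind my expectation (assuming the unreplicated size estimate above and the connector's 64 MB default split size):

```python
# Expected number of Spark partitions if the connector split the
# ~11.3 GiB of unreplicated data into 64 MB chunks (the default
# value of spark.cassandra.input.split.sizeInMB).
table_size_mb = 33.8 * 1024 / 3   # nodetool total divided by RF = 3, in MiB
split_size_mb = 64                # connector default split size
expected_partitions = table_size_mb / split_size_mb
print(round(expected_partitions))  # roughly 180, nowhere near 1443
```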
I tried to set the value in Spark (with `spark.cassandra.input.split.size_in_mb`, `spark.cassandra.input.split_size_in_mb`, and `spark.cassandra.input.split.size`) but got errors like:

    spark.cassandra.input.split.size_in_mb is not a valid Spark Cassandra Connector variable. No likely matches found.
    spark.cassandra.input.split.size is not a valid Spark Cassandra Connector variable.
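For context, this is roughly how I am passing the setting (a spark-shell sketch; the camel-case name `spark.cassandra.input.split.sizeInMB` is the form documented for connector 2.x, whereas the underscore form comes from older 1.x docs):

```shell
# Fails with "is not a valid Spark Cassandra Connector variable"
# on connector 2.3.2 (old 1.x-style property name):
spark-shell --conf spark.cassandra.input.split.size_in_mb=128

# Camel-case form documented for the 2.x connector:
spark-shell --conf spark.cassandra.input.split.sizeInMB=128 \
  --packages com.datastax.spark:spark-cassandra-connector_2.11:2.3.2
```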