We are using Spark Cassandra Connector 2.5.0 to read from a Cassandra table with approximately 400 columns and 10 million records. The table has 64 partitions, and we can increase or decrease that number if needed.
Our goal is to fetch more than one partition in a single read within the Spark job. We are using an IN clause in Spark SQL on the partition key, e.g.:
spark.sql("select * from <tableName> where <PartitionKey> in ('2021-03-12_0','2021-03-12_1','2021-03-12_2','2021-03-12_3','2021-03-11_4','2021-03-11_5','2021-03-11_6','2021-03-11_7')")
Cassandra: 8 nodes with 3 CPUs each.
Spark: 4 executors, each with 2 cores and 6 GB of memory.
Spark Cassandra Connector configuration used: spark.cassandra.input.split.sizeInMB = 64
We have also tried other values for this parameter, such as 512 and 1024.
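Our understanding (an assumption on our part, based on the connector documentation) is that the connector derives the number of Spark input partitions roughly from the estimated table size divided by input.split.sizeInMB, so changing this value should change the task count. A back-of-the-envelope sketch of that expectation in plain Python (the function name and the table size here are hypothetical, purely for illustration):

```python
import math

def expected_spark_partitions(estimated_table_size_mb: float,
                              split_size_mb: float) -> int:
    """Sketch of our assumption: the connector creates roughly
    (table size estimate / split size) input partitions, minimum 1."""
    return max(1, math.ceil(estimated_table_size_mb / split_size_mb))

# Hypothetical size estimate for 10 million rows x ~400 columns: ~20 GB.
table_size_mb = 20 * 1024

for split in (64, 512, 1024):
    print(f"sizeInMB={split} -> ~{expected_spark_partitions(table_size_mb, split)} tasks")
```

By this reasoning we expected many tasks at sizeInMB = 64 and progressively fewer as we raised it, yet the observed behaviour below does not match.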
However, we have observed in the Spark UI that only one Spark task reads all of the requested partitions for the above query.
We would appreciate a prompt response, as we are stuck on this. Thanks in advance.