When reading data through a Spark job with spark-cassandra-connector 2.4.2, it takes 28 minutes to read 90,000,000 records. We need to reduce this to 5-10 minutes. Also, out of the 5 Spark executors, only one has tasks running on it; the other 4 are idle. Our Cassandra version is apache-cassandra-3.11.3. The configuration used in the Spark job is below:
fs.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem fs.AbstractFileSystem.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS google.cloud.auth.service.account.enable", "true" spark.yarn.maxAppAttempts", "1" spark.memory.offHeap.enabled", "true" spark.memory.offHeap.size", "16g" spark.sql.broadcastTimeout", "36000" spark.network.timeout", "600s" spark.cassandra.input.consistency.level", "LOCAL_QUORUM" spark.cassandra.output.consistency.level", "ANY" spark.sql.shuffle.partitions", "150" spark.shuffle.blockTransferService", "nio" spark.maxRemoteBlockSizeFetchToMem", "2000m" spark.sql.hive.filesourcePartitionFileCacheSize", "0" spark.cassandra.input.split.size_in_mb","512"
@Erick Ramirez Could you please suggest a solution?