Hi,
In a standard Spark + Cassandra deployment, where Spark runs in standalone mode on the same physical nodes as Cassandra, I call repartitionByCassandraReplica from the spark-cassandra-connector API before joining two RDDs, and that works fine.
Now I have deployed the same code on Kubernetes, where Cassandra and Spark run in different pods. In this deployment, the RDD becomes empty when repartitionByCassandraReplica is called on it. I understand that repartitionByCassandraReplica is used before joinWithCassandraTable to obtain data locality, so that each Spark partition only needs to query its local Cassandra node. Is it correct, then, that repartitionByCassandraReplica will always return an empty RDD when Spark and Cassandra are deployed on Kubernetes?
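To make the locality idea concrete, here is a toy model in Python. It is not the connector's implementation. It assumes a hypothetical three-node token ring with made-up IP addresses, replication factor 2 with SimpleStrategy-style placement, and a CRC32 stand-in for Cassandra's Murmur3 partitioner. It groups keys under their owning node, which becomes that group's preferred location, and then measures what fraction of rows would be processed on an executor sharing an address with that node.

```python
import bisect
import zlib
from collections import defaultdict

# Hypothetical token ring: (token, node address), sorted by token.
# Each node owns the range (previous node's token, its own token], wrapping around.
RING = [(-3000, "10.0.0.1"), (0, "10.0.0.2"), (3000, "10.0.0.3")]
REPLICATION_FACTOR = 2  # assumed; replicas are the owner plus the next node clockwise


def token_for(key: str) -> int:
    """Stand-in for Cassandra's Murmur3 partitioner: a stable hash in [-4500, 4499]."""
    return zlib.crc32(key.encode()) % 9000 - 4500


def replicas_for(key: str) -> list[str]:
    """Return the addresses of the nodes holding `key`, owner first."""
    tokens = [t for t, _ in RING]
    # First node whose token is >= the key's token owns it; past the last node, wrap to index 0.
    owner = bisect.bisect_left(tokens, token_for(key)) % len(RING)
    return [RING[(owner + k) % len(RING)][1] for k in range(REPLICATION_FACTOR)]


def repartition_by_replica(keys: list[str]) -> dict[str, list[str]]:
    """Group keys into one partition per owning node (its 'preferred location')."""
    partitions: dict[str, list[str]] = defaultdict(list)
    for key in keys:
        partitions[replicas_for(key)[0]].append(key)
    return dict(partitions)


def local_fraction(partitions: dict[str, list[str]], executor_hosts: set[str]) -> float:
    """Fraction of rows whose partition's preferred host also runs an executor."""
    total = sum(len(rows) for rows in partitions.values())
    local = sum(len(rows) for host, rows in partitions.items() if host in executor_hosts)
    return local / total if total else 0.0


if __name__ == "__main__":
    parts = repartition_by_replica([f"user{i}" for i in range(100)])
    # Co-located deployment: executors share addresses with the Cassandra nodes.
    print(local_fraction(parts, {"10.0.0.1", "10.0.0.2", "10.0.0.3"}))  # 1.0
    # Separate pods: hypothetical executor pod IPs that match no Cassandra node.
    print(local_fraction(parts, {"10.1.0.5", "10.1.0.6"}))  # 0.0
```

In this model the regrouping step keeps every row; running executors in separate pods only changes the locality fraction. Because it does not model the connector's actual code, it cannot by itself account for the empty RDD observed on Kubernetes.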