I am relatively new to the Spark Cassandra Connector, and am using the Java API to read large datasets from a Cassandra table into Spark. In the queries I have tested so far, the connector's predicate pushdown feature is not working.
I have read the documentation of the feature (https://github.com/datastax/spark-cassandra-connector/blob/master/doc/14_data_frames.md) and am using similar code. Although the simple filters I am using fall within the list of pushdown-eligible predicates at the bottom of that doc, the org.apache.spark.sql.cassandra log entries and the Spark UI pages show no evidence that any filters are being pushed down to Cassandra.
I see that the Scala code to read the dataset looks like this:
val df = spark
  .read
  .cassandraFormat("pushdownexample", "pushdowns")
  .load()
I don't see an equivalent cassandraFormat method exposed in the Java API, but based on reading its Scala source I tried code like the following in Java:
Dataset<Row> df = spark
    .read()
    .format("org.apache.spark.sql.cassandra")
    .option("keyspace", "pushdowns")
    .option("table", "pushdownexample")
    .option("pushdown", true)
    .load();
(I read that pushdown is enabled by default, but set the "pushdown" option explicitly anyway.)
Assuming I haven't made a careless coding error elsewhere, should this approach work? Any suggestions for debugging it? I am calling the Dataset.explain method in my code and comparing its output with the Spark UI SQL and job details, but can't find any clues. Those details show that the entire table is scanned and read into Spark before any filters are applied.
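For reference, here is a sketch of how I am applying the filter and inspecting the plan. The column name "userid" is a placeholder for my actual partition key, and the keyspace/table names match the example above:

```java
import static org.apache.spark.sql.functions.col;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class PushdownCheck {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("pushdown-check")
                .getOrCreate();

        Dataset<Row> df = spark
                .read()
                .format("org.apache.spark.sql.cassandra")
                .option("keyspace", "pushdowns")
                .option("table", "pushdownexample")
                .load();

        // Equality filter on a partition-key column ("userid" is a
        // placeholder); this kind of predicate should be pushdown-eligible
        // per the restrictions listed in the SCC documentation.
        Dataset<Row> filtered = df.filter(col("userid").equalTo(1));

        // My expectation: if pushdown works, the physical plan printed by
        // explain should list the predicate under PushedFilters on the scan,
        // rather than only in a separate Filter step above a full scan.
        filtered.explain(true);
    }
}
```

This is roughly what I run; the explain output never shows the predicate attached to the Cassandra scan.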