fwy_187020 asked Erick Ramirez commented

How do I delete rows using the Spark connector Java API?

I am using the Spark Cassandra Connector (SCC) 3.1.0 Java API to both add and delete rows in a Cassandra 3.11.2 database. However, I am having trouble coding the delete operations using the SCC Java classes.

I have read the SCC GitHub documentation as well as other web posts about the Java API and about deleting rows with SCC. While this appears to be fairly straightforward in Scala, I haven't found good guidance on how to do it in Java.

I am simply trying to delete all the rows with a given partition key, which in Scala would be something like this.

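import com.datastax.spark.connector._  // adds cassandraTable/deleteFromCassandra to SparkContext and RDDs

sc.cassandraTable("myKeyspace", "myTable")
  .where("key1 = 'a' AND key2 = 'b' AND key3 = 'c'")
  .deleteFromCassandra("myKeyspace", "myTable")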

In Java, I am trying to use the CassandraJavaUtil.javaFunctions wrapper methods to accomplish the same, something like this.

CassandraTableScanJavaRDD<CassandraRow> rdd = javaFunctions(sc).cassandraTable("myKeyspace", "myTable")
  .where("key1 = 'a' AND key2 = 'b' AND key3 = 'c'");
javaFunctions(rdd).deleteFromCassandra("myKeyspace", "myTable", ...);

However, the RDDJavaFunctions.deleteFromCassandra() signature after the first two parameters seems confusing and verbose:

deleteFromCassandra(keyspace: String, table: String, rowWriterFactory: RowWriterFactory[T], deleteColumns: ColumnSelector, keyColumns: ColumnSelector, conf: WriteConf, connector: CassandraConnector): Unit

In particular, I don't know how to construct the appropriate RowWriterFactory object and why it would even be relevant to a delete. The ColumnSelector entries look like they could be simple maps, although I don't need to specify any mappings in this case. I understand the WriteConf and the CassandraConnector, although I don't know why the latter would be necessary.
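For reference, here is a sketch of where each of those arguments can come from in Java, assuming SCC 3.1.x and a table whose partition key is the three text columns `key1`, `key2`, `key3` from the question. `mapTupleToRow`, `someColumns`, `WriteConf.fromSparkConf` and `CassandraConnector.apply` are existing SCC entry points as far as I know; the class `DeleteByKeyArgs` and method `deletePartitions` are hypothetical names. As far as I can tell from the connector source, the `RowWriterFactory` is what extracts the key values from each RDD element so they can be bound into the generated `DELETE`, which would explain why it is needed even though no data is written.

```java
import static com.datastax.spark.connector.japi.CassandraJavaUtil.javaFunctions;
import static com.datastax.spark.connector.japi.CassandraJavaUtil.mapTupleToRow;
import static com.datastax.spark.connector.japi.CassandraJavaUtil.someColumns;

import com.datastax.spark.connector.cql.CassandraConnector;
import com.datastax.spark.connector.writer.WriteConf;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import scala.Tuple3;

public final class DeleteByKeyArgs {

    private DeleteByKeyArgs() {}

    // Hypothetical helper: deletes each partition whose (key1, key2, key3) appears in `keys`.
    // The keys can come from anywhere (e.g. jsc.parallelize(...)); no table scan is required.
    public static void deletePartitions(SparkConf conf, JavaRDD<Tuple3<String, String, String>> keys) {
        javaFunctions(keys).deleteFromCassandra(
            "myKeyspace", "myTable",
            // rowWriterFactory: how to read values out of each Tuple3 when binding the DELETE
            mapTupleToRow(String.class, String.class, String.class),
            // deleteColumns: an empty selection deletes whole rows, not individual columns
            someColumns(),
            // keyColumns: the WHERE-clause columns, filled in order from _1, _2, _3
            someColumns("key1", "key2", "key3"),
            // conf: write settings taken from the spark.cassandra.output.* properties
            WriteConf.fromSparkConf(conf),
            // connector: connection settings taken from the spark.cassandra.connection.* properties
            CassandraConnector.apply(conf));
    }
}
```

A possible call, assuming a `JavaSparkContext jsc`: `deletePartitions(jsc.getConf(), jsc.parallelize(Arrays.asList(new Tuple3<>("a", "b", "c"))));`. Passing the keys in directly avoids reading the table just to find out which partition to delete.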

It would be easier if the simpler RDDFunctions.deleteFromCassandra() implementation could be used, but that doesn't seem to be an option. Looking at the source code, I see that RDDJavaFunctions.deleteFromCassandra() delegates to RDDFunctions.deleteFromCassandra(), which doesn't use the RowWriterFactory object at all although the parameter is required.

At this point I expect there is something I am missing that would make this easier. Is there another deleteFromCassandra() implementation that can be directly called in Java? What is the best way to process basic deletes in Java using the Spark Cassandra Connector?

fwy_187020 answered Erick Ramirez commented

After much trawling through the SCC source code, I found that the following approach worked for my table, which has three String-type partition key columns:

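import static com.datastax.spark.connector.japi.CassandraJavaUtil.*;
import com.datastax.spark.connector.japi.rdd.CassandraTableScanJavaRDD;
import scala.Tuple3;

// Read only the three partition key columns, each row mapped into a Tuple3
CassandraTableScanJavaRDD<Tuple3<String, String, String>> rdd = javaFunctions(sc)
    .cassandraTable("myKeyspace", "myTable", mapRowToTuple(String.class, String.class, String.class))
    .select("key1", "key2", "key3")
    .where("key1 = 'a' AND key2 = 'b' AND key3 = 'c'");
// Empty deleteColumns deletes whole rows; the key columns form the DELETE's WHERE clause
javaFunctions(rdd).deleteFromCassandra("myKeyspace", "myTable",
    mapTupleToRow(String.class, String.class, String.class),
    someColumns(), someColumns("key1", "key2", "key3"),
    javaFunctions(rdd).defaultWriteConf(), javaFunctions(rdd).defaultConnector());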

So, in order to delete rows using SCC, I had to map the keys to and from Tuple objects.
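If the tuples are the awkward part, the same call should also work with a small JavaBean that holds only the key columns, using `mapRowTo` to read and `mapToRow` to write. This is a sketch under the assumption that SCC's bean mapping matches getter/setter names (`getKey1` to `key1`, and so on) to column names; `BeanKeyDelete`, `PartitionKey` and `deleteMatching` are hypothetical names, and the bean deliberately has only a no-arg constructor so the connector can build it through its setters.

```java
import static com.datastax.spark.connector.japi.CassandraJavaUtil.javaFunctions;
import static com.datastax.spark.connector.japi.CassandraJavaUtil.mapRowTo;
import static com.datastax.spark.connector.japi.CassandraJavaUtil.mapToRow;
import static com.datastax.spark.connector.japi.CassandraJavaUtil.someColumns;

import com.datastax.spark.connector.japi.rdd.CassandraTableScanJavaRDD;
import java.io.Serializable;
import org.apache.spark.api.java.JavaSparkContext;

public final class BeanKeyDelete {

    private BeanKeyDelete() {}

    // Hypothetical bean holding only the partition key; property names match the column names.
    public static class PartitionKey implements Serializable {
        private String key1;
        private String key2;
        private String key3;

        public PartitionKey() {} // no-arg constructor so SCC can instantiate it when reading rows

        public String getKey1() { return key1; }
        public void setKey1(String key1) { this.key1 = key1; }
        public String getKey2() { return key2; }
        public void setKey2(String key2) { this.key2 = key2; }
        public String getKey3() { return key3; }
        public void setKey3(String key3) { this.key3 = key3; }
    }

    public static void deleteMatching(JavaSparkContext sc) {
        // Same read as the tuple version, but each row becomes a PartitionKey bean
        CassandraTableScanJavaRDD<PartitionKey> keys = javaFunctions(sc)
            .cassandraTable("myKeyspace", "myTable", mapRowTo(PartitionKey.class))
            .select("key1", "key2", "key3")
            .where("key1 = ? AND key2 = ? AND key3 = ?", "a", "b", "c");
        // mapToRow reads key1..key3 through the bean's getters when binding the DELETE
        javaFunctions(keys).deleteFromCassandra("myKeyspace", "myTable",
            mapToRow(PartitionKey.class),
            someColumns(),
            someColumns("key1", "key2", "key3"),
            javaFunctions(keys).defaultWriteConf(),
            javaFunctions(keys).defaultConnector());
    }
}
```

This is the same number of arguments as the tuple version; the gain is only that the key columns are named rather than positional.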

I had no interest in mapping the raw data to anything at all, only in deleting it, so the approach still seems awkward and non-intuitive to me. Please post any alternative examples that you can find.
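One alternative for the case in the question, where the partition key values are known up front, is to skip the RDD entirely and run a plain CQL `DELETE` through the driver session that SCC already manages. This swaps in a different technique (a single driver call rather than `deleteFromCassandra`) and is only a sketch, assuming SCC 3.x, whose `CassandraConnector.openSession()` returns a Java driver 4.x `CqlSession`. `DirectPartitionDelete`, `partitionDeleteCql` and `deletePartition` are hypothetical names. The keyspace and table names are left unquoted, so CQL folds them to lowercase; a genuinely mixed-case name such as `myKeyspace` would need double quotes.

```java
import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.spark.connector.cql.CassandraConnector;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import org.apache.spark.SparkConf;

public final class DirectPartitionDelete {

    private DirectPartitionDelete() {}

    // Builds "DELETE FROM ks.table WHERE c1 = ? AND c2 = ? ..." with one bind marker per key column.
    // Identifiers are not quoted, so CQL treats them case-insensitively.
    public static String partitionDeleteCql(String keyspace, String table, List<String> keyColumns) {
        String where = keyColumns.stream()
            .map(column -> column + " = ?")
            .collect(Collectors.joining(" AND "));
        return "DELETE FROM " + keyspace + "." + table + " WHERE " + where;
    }

    // Deletes one whole partition with a single statement; no Spark job or RDD is involved.
    public static void deletePartition(SparkConf conf, String key1, String key2, String key3) {
        String cql = partitionDeleteCql("myKeyspace", "myTable", Arrays.asList("key1", "key2", "key3"));
        // openSession() hands out the connector's cached session; closing it releases that reference
        try (CqlSession session = CassandraConnector.apply(conf).openSession()) {
            session.execute(cql, key1, key2, key3); // values bound positionally to the ? markers
        }
    }
}
```

The trade-off is that this runs on the driver (the Spark driver process, not the executors), so it fits one or a handful of known partitions; for a large set of keys held in an RDD, `deleteFromCassandra` with the key mapping is the better fit.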

Erick Ramirez ♦♦ commented

Sorry for not getting back to you sooner; I missed the update. I'm glad you found a workaround.

Unfortunately, the Java API is not as full-featured as the Scala API, so it can be a bit limited. Cheers!

Erick Ramirez answered

I can't figure out how to do this in Java either.

I've reached out to the Analytics team to ask for a working example. I'll post it here when I find one. Cheers!
