jleichter_160836 asked jaroslaw.grabowski_50515 answered

Can you use Cassandra UDFs/UDAFs from spark-sql?

If I create a table in spark-sql that is backed by a Cassandra table:

CREATE TABLE my_spark_table
       USING org.apache.spark.sql.cassandra
       OPTIONS (
                keyspace "my_keyspace",
                table "my_cassandra_table",
                pushdown "true");

Would it be possible to call Cassandra UDFs from Spark SQL, for example:

select my_cassandra_udf(column_a) from my_spark_table; 


spark-cassandra-connector, user-defined function

jaroslaw.grabowski_50515 answered

Erick Ramirez answered

The quick answer is that you can call a user-defined function (UDF) through the Spark connector using FunctionCallRef (SPARKC-280). Unfortunately, there isn't much detail on it, so I'm not sure whether you can call it from Spark SQL.

There's this example from the User Defined Aggregations with Spark blog post:

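import com.datastax.spark.connector._

// Select sensor and time_bucket, and ask Cassandra to evaluate
// temp_avg(temp), returned under the alias avg_temp.
val result = sensorsAndDaysRDD.
  map( _._1 ).
  joinWithCassandraTable("udfdemo","measurements",
    SomeColumns("sensor",
      "time_bucket",
      FunctionCallRef("temp_avg", Seq(Right("temp")), Some("avg_temp"))))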
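
If it turns out that Cassandra UDFs can't be invoked from Spark SQL, one possible fallback (my assumption, not something confirmed in SPARKC-280) is to register an equivalent function on the Spark side with spark.udf.register and call that from SQL instead. Here is a minimal sketch. The name my_spark_udf, the doubling logic, and the temp view standing in for my_spark_table are all hypothetical placeholders:

```scala
import org.apache.spark.sql.SparkSession

object SparkSideUdfSketch {
  // Hypothetical stand-in for whatever the Cassandra UDF computes.
  val doubleIt: Double => Double = x => x * 2.0

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("spark-side-udf-sketch")
      .master("local[*]") // assumption: local mode, for illustration only
      .getOrCreate()

    // Placeholder data so the sketch runs without Cassandra. In practice,
    // my_spark_table would be the Cassandra-backed table from the question.
    spark.range(3)
      .selectExpr("CAST(id AS DOUBLE) AS column_a")
      .createOrReplaceTempView("my_spark_table")

    // Register the Scala function under a name that Spark SQL can call.
    spark.udf.register("my_spark_udf", doubleIt)

    spark.sql("SELECT my_spark_udf(column_a) FROM my_spark_table").show()

    spark.stop()
  }
}
```

Keep in mind that a function registered this way runs in Spark, not in Cassandra, so the logic has to be reimplemented on the Spark side rather than reusing the UDF defined in the keyspace.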

I'm going to reach out to the Analytics devs in DataStax and get them to respond on this ticket. Cheers!
