jleichter_160836 asked ·

Can you use Cassandra UDFs/UDAFs from spark-sql?

If I create a table in spark-sql that is backed by a Cassandra table:

CREATE TABLE my_spark_table
       USING org.apache.spark.sql.cassandra
       OPTIONS (
                keyspace "my_keyspace",
                table "my_cassandra_table",
                pushdown "true");

Would it be possible to call Cassandra UDFs from Spark SQL, for example:

select my_cassandra_udf(column_a) from my_spark_table; 



jaroslaw.grabowski_50515 answered ·

Erick Ramirez answered ·

The quick answer is that you can call a user-defined function (UDF) from the Spark connector using FunctionCallRef (SPARKC-280). Unfortunately, there isn't much detail available on it, so I'm not sure whether you can call it from Spark SQL.

There's this example from the User Defined Aggregations with Spark blog post:

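import com.datastax.spark.connector._  // joinWithCassandraTable, SomeColumns, FunctionCallRef

// Select sensor and time_bucket, plus the Cassandra function temp_avg
// applied to temp, returned under the alias avg_temp.
val result = sensorsAndDaysRDD.
  map( _._1 ).
  joinWithCassandraTable("udfdemo","measurements",
    SomeColumns("sensor",
      "time_bucket",
      FunctionCallRef("temp_avg", Seq(Right("temp")), Some("avg_temp"))))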

I'm going to reach out to the Analytics devs in DataStax and get them to respond on this ticket. Cheers!
