I'm trying to configure Databricks to use the Spark Cassandra Connector.
Here is my environment:
Databricks Runtime version: 8.4 ML (includes Apache Spark 3.1.2, Scala 2.12)
Installed Maven Library: datastax:spark-cassandra-connector:2.4.0-s_2.11
I configured the catalog in a Python notebook:
spark.conf.set("spark.sql.catalog.catthingsboard", "com.datastax.spark.connector.datasource.CassandraCatalog")
spark.conf.set("spark.sql.catalog.catthingsboard.spark.cassandra.connection.host", "<ip>")
spark.conf.set("spark.sql.catalog.catthingsboard.spark.cassandra.auth.username", "<user>")
spark.conf.set("spark.sql.catalog.catthingsboard.spark.cassandra.auth.password", "<password>")
spark.conf.set("spark.sql.defaultCatalog", "catthingsboard")
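For readability, the same settings can also be assembled with a small helper that builds the long "spark.sql.catalog.<name>..." property keys; this is only a sketch, and the names `catalog_key` and `settings` are made up for illustration:

```python
# Hypothetical helper (not part of the connector API) to assemble the long
# "spark.sql.catalog.<name>..." property keys used above and avoid typos.
CATALOG = "catthingsboard"

def catalog_key(suffix: str = "") -> str:
    """Return the Spark SQL catalog property key, optionally with a suffix."""
    base = f"spark.sql.catalog.{CATALOG}"
    return f"{base}.{suffix}" if suffix else base

settings = {
    catalog_key(): "com.datastax.spark.connector.datasource.CassandraCatalog",
    catalog_key("spark.cassandra.connection.host"): "<ip>",
    catalog_key("spark.cassandra.auth.username"): "<user>",
    catalog_key("spark.cassandra.auth.password"): "<password>",
    "spark.sql.defaultCatalog": CATALOG,
}

# In the notebook one would then apply them against the active session:
# for key, value in settings.items():
#     spark.conf.set(key, value)
```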
Then I try to use the catalog:
%sql show databases
This results in an error:
Error in SQL statement: SparkException: Cannot find catalog plugin class for catalog 'catthingsboard': com.datastax.spark.connector.datasource.CassandraCatalog
Now my question: how do I propagate the Maven library datastax:spark-cassandra-connector:2.4.0-s_2.11 so that its catalog plugin class com.datastax.spark.connector.datasource.CassandraCatalog is found?
If I try to configure the library with:
%sh $SPARK_HOME/bin/spark-shell --packages com.datastax.spark:spark-cassandra-connector:2.4.0-s_2.11
I get this error:
Error: Could not find or load main class org.apache.spark.launcher.Main
/databricks/spark/bin/spark-class: line 101: CMD: bad array subscript
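Instead of launching a second spark-shell, one way to check from the notebook whether the class is visible to the running driver is to try loading it through PySpark's JVM gateway. This is a sketch assuming an active `spark` session; the helper name `class_on_classpath` is made up for illustration:

```python
# Hedged sketch: probe whether a class is loadable on the driver's JVM
# classpath, via PySpark's py4j gateway. Assumes an active SparkSession
# `spark`; the helper name `class_on_classpath` is made up for illustration.
def class_on_classpath(spark, class_name):
    try:
        # py4j call into the driver JVM; raises if the class is not present
        spark._jvm.java.lang.Class.forName(class_name)
        return True
    except Exception:
        return False

# Example (in a notebook):
# class_on_classpath(spark, "com.datastax.spark.connector.datasource.CassandraCatalog")
```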
Thanks a lot for your help.