Jeoppman asked Jeoppman commented

How do I configure Databricks for spark-cassandra-connector?

I'm trying to configure Databricks to use the Spark Cassandra Connector.

Here's my environment:

Databricks Runtime version: 8.4 ML (includes Apache Spark 3.1.2, Scala 2.12)

Installed Maven Library: datastax:spark-cassandra-connector:2.4.0-s_2.11

Configured the catalog with a Python notebook:

spark.conf.set("spark.sql.catalog.catthingsboard", "com.datastax.spark.connector.datasource.CassandraCatalog")
spark.conf.set("spark.sql.catalog.catthingsboard.spark.cassandra.connection.host", "<ip>")

Then I try to use the catalog:

show databases 

This results in an error:

Error in SQL statement: SparkException: Cannot find catalog plugin class for catalog 'catthingsboard': com.datastax.spark.connector.datasource.CassandraCatalog

Now my question: how do I propagate the catalog plugin / Maven library datastax:spark-cassandra-connector:2.4.0-s_2.11 so that the class com.datastax.spark.connector.datasource.CassandraCatalog can be found?

If I try to configure the library with:

$SPARK_HOME/bin/spark-shell --packages com.datastax.spark:spark-cassandra-connector:2.4.0-s_2.11

I get this error:

Error: Could not find or load main class org.apache.spark.launcher.Main /databricks/spark/bin/spark-class: line 101: CMD: bad array subscript

Thanks a lot for your help.


1 Answer

Erick Ramirez answered Jeoppman commented

It looks like you're using the wrong coordinates for the connector. Since you're connecting to a Spark 3.1 cluster, you'll need version 3.1.0 of the connector; spark-cassandra-connector v2.4.0 only works against Spark 2.4. Note also that your runtime (8.4 ML) ships Scala 2.12, while the 2.4.0-s_2.11 artifact is built for Scala 2.11.

You need to update your dependencies with the right coordinates. For example:

libraryDependencies += "com.datastax.spark" % "spark-cassandra-connector_2.12" % "3.1.0"

Alternatively, you can specify the packages on the command line with the Spark shell or submit command:

$ spark-shell --packages com.datastax.spark:spark-cassandra-connector_2.12:3.1.0
$ spark-submit --packages com.datastax.spark:spark-cassandra-connector_2.12:3.1.0

You can test the connection to your Spark cluster with:

$ spark-shell \
  --packages com.datastax.spark:spark-cassandra-connector_2.12:3.1.0 \
  --master <master_url> \
  --conf spark.cassandra.auth.username=<username> \
  --conf spark.cassandra.auth.password=<password> \
  --conf spark.sql.extensions=com.datastax.spark.connector.CassandraSparkExtensions

Then you can load your catalog with:

spark.conf.set("spark.sql.catalog.catthingsboard", "com.datastax.spark.connector.datasource.CassandraCatalog")
spark.sql("SHOW TABLES FROM catthingsboard.your_keyspace;").show
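On Databricks you would typically set these options from a notebook rather than spark-shell. Here is a minimal Python sketch, assuming the 3.1.0 connector library is installed on the cluster; the catalog name matches the question, the host and credentials are placeholders, and the per-catalog keys follow the connector's `spark.sql.catalog.<name>.*` convention:

```python
# Hypothetical Databricks notebook cell: register a Cassandra catalog and
# point it at your cluster. Replace <ip>, <username>, <password>.
catalog = "catthingsboard"
cassandra_conf = {
    f"spark.sql.catalog.{catalog}": "com.datastax.spark.connector.datasource.CassandraCatalog",
    f"spark.sql.catalog.{catalog}.spark.cassandra.connection.host": "<ip>",
    f"spark.sql.catalog.{catalog}.spark.cassandra.auth.username": "<username>",
    f"spark.sql.catalog.{catalog}.spark.cassandra.auth.password": "<password>",
}

# In a Databricks notebook `spark` is predefined; the guard keeps the
# sketch importable outside Databricks as well.
if "spark" in globals():
    for key, value in cassandra_conf.items():
        spark.conf.set(key, value)
    # Equivalent of the question's "show databases", scoped to the catalog:
    spark.sql(f"SHOW NAMESPACES FROM {catalog}").show()
```

Setting all options under the catalog prefix keeps them scoped to that one catalog, so you can register several Cassandra clusters side by side.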

For details, see the spark-cassandra-connector Quick Start Guide. Cheers!

1 comment

Thanks a lot for your answer. Now I'm one step further.

[Post converted to a comment since it's not an answer]

[Follow up question posted in #12493]
