Overview
This article provides the steps for connecting to Astra DB from a Databricks Lakehouse Platform Apache Spark cluster.
Prerequisites
This article assumes you have created a Databricks Community Edition account.
You will also need to generate an application token and download the secure bundle for your Astra DB.
Procedure
DATABRICKS COMMUNITY EDITION
Create a Databricks cluster:
- From the left-hand menu, select Create > Cluster to create a new cluster.
- Create a Spark 3.2 cluster with a Databricks Runtime Version of 10.4 LTS (includes Apache Spark 3.2.1, Scala 2.12).
SPARK-CASSANDRA-CONNECTOR
In the Maven Repository, there are two variants of the connector: the standard spark-cassandra-connector and the shaded spark-cassandra-connector-assembly.
IMPORTANT - In contrast to the standard connector, the shaded assembly JAR includes all the required dependencies with some connector dependencies "shaded out" so they do not conflict with the Spark dependencies. For Spark 3.2, use spark-cassandra-connector-assembly version 3.2. For the full compatibility list, see the Version Compatibility table on the GitHub repository.
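As a convenience, the assembly JAR can also be fetched directly from Maven Central rather than through a browser. The snippet below is a sketch that assumes the standard Maven Central directory layout and the coordinates com.datastax.spark:spark-cassandra-connector-assembly_2.12:3.2.0; verify the version against the connector's compatibility table first.

```shell
# Build the Maven Central URL for the shaded assembly JAR.
# Adjust these two versions to match your cluster's Scala/Spark versions.
SCALA_VER="2.12"
CONNECTOR_VER="3.2.0"
JAR="spark-cassandra-connector-assembly_${SCALA_VER}-${CONNECTOR_VER}.jar"
URL="https://repo1.maven.org/maven2/com/datastax/spark/spark-cassandra-connector-assembly_${SCALA_VER}/${CONNECTOR_VER}/${JAR}"
echo "$URL"
# curl -L -O "$URL"   # uncomment to download the JAR
```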
Install the spark-cassandra-connector:
- For the Databricks Spark 3.2 cluster above, download the spark-cassandra-connector-assembly_2.12-3.2.0.jar.
- On the Databricks site, select the Libraries tab then click on the Install new button:
- Upload the connector JAR:
- Once uploaded, click the Install button. This stores the file in the Databricks File System (DBFS).
CONFIGURE ASTRA
Set up the Spark environment:
- Configure your Databricks cluster by clicking on the cluster Edit button in the top right:
- Click on the Spark tab and configure the following properties:
- spark.cassandra.auth.username: set to the literal string token
- spark.cassandra.auth.password: set to your token value, which begins with AstraCS:
- spark.cassandra.connection.config.cloud.path: the filename of the secure connect bundle
- spark.files: the full DBFS URL to the secure connect bundle
- spark.dse.continuousPagingEnabled: set to false to disable continuous paging (recommended)
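Taken together, the Spark config textbox might look like the fragment below. The bundle filename, DBFS path, and token value are placeholders; substitute the names from your own upload and your own application token.

secure-connect bundle config (example values):
spark.cassandra.auth.username token
spark.cassandra.auth.password AstraCS:<your-application-token>
spark.cassandra.connection.config.cloud.path secure-connect-mydb.zip
spark.files dbfs:/FileStore/secure-connect-mydb.zip
spark.dse.continuousPagingEnabled false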
- Click the Confirm and restart button for the changes to take effect.
Final test
Test connectivity to Astra by creating a new notebook. From the left-hand menu, click Create > Notebook and select Scala as the default language:
Copy the following sample Scala code to your notebook:
import org.apache.spark.sql.functions._
import com.datastax.spark.connector._
import org.apache.spark.sql.cassandra._
import spark.implicits._

val dbName = "HelloCatalog"
val keyspace = "databricks_ks"

spark.conf.set(s"spark.sql.catalog.$dbName", "com.datastax.spark.connector.datasource.CassandraCatalog")
spark.sql(s"use $dbName.$keyspace")
spark.sql("show tables").show()
WARNING - You need to replace this line with the keyspace name for your own Astra DB:
val keyspace = "databricks_ks"
Run the code to retrieve the list of tables from your Astra DB.
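Once the catalog is registered, you can also query an individual table directly through Spark SQL. The sketch below assumes a hypothetical table named my_table exists in your keyspace, and reuses the dbName and keyspace values from the sample above; it must be run in the same notebook against a live cluster:

// Assumes the CassandraCatalog was registered as dbName above and that
// a table named "my_table" (hypothetical) exists in your keyspace.
val df = spark.sql(s"SELECT * FROM $dbName.$keyspace.my_table")
df.printSchema()
df.show(10)

Because the catalog routes the query through the connector, no explicit host or credential options are needed here; they come from the cluster-level Spark properties configured earlier.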
Here is an example output: