
Posted by Erick Ramirez

HOW TO - Connect to Astra DB from the Databricks cloud platform

Overview

This article provides the steps for connecting to Astra DB from a Databricks Lakehouse Platform Apache Spark cluster.

Prerequisites

This article assumes you have created a Databricks Community Edition account.

You will also need to generate an application token and download the secure connect bundle for your Astra DB.

Procedure

DATABRICKS COMMUNITY EDITION

Create a Databricks cluster:

  1. From the left-hand menu, select Create > Cluster to create a new cluster:
     d1-create-cluster.png
  2. Create a Spark 3.2 cluster with Databricks Runtime version 10.4 LTS (includes Apache Spark 3.2.1, Scala 2.12):
     d2-dce-version.png
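Before installing the connector, it can help to confirm from a notebook that the cluster really runs the Spark 3.2 / Scala 2.12 combination that the assembly JAR in the next section targets. The following is a minimal sketch; the object and method names are hypothetical, and in a notebook you would pass spark.version instead of the hard-coded "3.2.1".

```scala
// Hypothetical helper: checks that the runtime matches the Spark 3.2 / Scala 2.12
// build of the spark-cassandra-connector-assembly JAR used in this article.
object RuntimeCheck {
  /** Returns the "major.minor" prefix of a version string, e.g. "3.2.1" -> "3.2". */
  def majorMinor(version: String): String =
    version.split('.').take(2).mkString(".")

  /** True when Spark is 3.2.x and Scala is 2.12.x. */
  def matchesConnectorBuild(sparkVersion: String, scalaVersion: String): Boolean =
    majorMinor(sparkVersion) == "3.2" && majorMinor(scalaVersion) == "2.12"

  def main(args: Array[String]): Unit = {
    // In a Databricks notebook, use spark.version instead of this literal.
    val sparkVersion = "3.2.1"
    // Full Scala version of the running JVM, e.g. "2.12.15".
    val scalaVersion = scala.util.Properties.versionNumberString
    println(s"Spark $sparkVersion, Scala $scalaVersion -> match: " +
      matchesConnectorBuild(sparkVersion, scalaVersion))
  }
}
```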


SPARK-CASSANDRA-CONNECTOR

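In the Maven Repository, there are 2 versions of the connector:

  • spark-cassandra-connector - the standard connector
  • spark-cassandra-connector-assembly - the shaded assembly JAR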

IMPORTANT - Unlike the standard connector, the shaded assembly JAR bundles all the required dependencies, with some of the connector's dependencies "shaded out" (relocated) so they do not conflict with Spark's own dependencies. For Spark 3.2, use spark-cassandra-connector-assembly version 3.2. For the full compatibility list, see the Version Compatibility table in the GitHub repository.
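The JAR file name encodes both the Scala binary version and the connector version. This sketch, with hypothetical helper names, assembles the file name and the Maven coordinate, assuming the connector is published under the group ID com.datastax.spark. Connector version 3.2.0 is the one this article uses for Spark 3.2; for other Spark versions, check the Version Compatibility table rather than this code.

```scala
// Sketch: builds the file name and Maven coordinate of the shaded assembly JAR.
object ConnectorArtifact {
  // Assumed Maven group ID of the Spark Cassandra Connector.
  val GroupId = "com.datastax.spark"

  /** Artifact ID with the Scala binary suffix, e.g. "spark-cassandra-connector-assembly_2.12". */
  def artifactId(scalaBinary: String): String =
    s"spark-cassandra-connector-assembly_$scalaBinary"

  /** File name of the published JAR. */
  def jarName(scalaBinary: String, connectorVersion: String): String =
    s"${artifactId(scalaBinary)}-$connectorVersion.jar"

  /** groupId:artifactId:version coordinate. */
  def coordinate(scalaBinary: String, connectorVersion: String): String =
    s"$GroupId:${artifactId(scalaBinary)}:$connectorVersion"

  def main(args: Array[String]): Unit = {
    println(jarName("2.12", "3.2.0"))    // spark-cassandra-connector-assembly_2.12-3.2.0.jar
    println(coordinate("2.12", "3.2.0")) // com.datastax.spark:spark-cassandra-connector-assembly_2.12:3.2.0
  }
}
```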

Install the spark-cassandra-connector:

  1. For the Databricks Spark 3.2 cluster created above, download spark-cassandra-connector-assembly_2.12-3.2.0.jar.
  2. On the Databricks site, select the cluster's Libraries tab, then click the Install new button:
     c1-libraries.png
  3. Upload the connector JAR:
     c2-upload.png
  4. Once the upload completes, click the Install button. This stores the JAR in the Databricks File System (DBFS).
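To check that the installed library is visible to notebooks, you can try loading one of the connector's classes. The sketch below uses CassandraCatalog, the class referenced in the test code later in this article; the helper object is hypothetical and simply wraps Class.forName in a Try.

```scala
import scala.util.Try

// Sketch: reports whether a class shipped in the connector JAR can be loaded.
object ConnectorOnClasspath {
  // Catalog class used by the test notebook in this article.
  val CatalogClass = "com.datastax.spark.connector.datasource.CassandraCatalog"

  /** True if the class can be found by the current thread's context class loader. */
  def isLoadable(className: String): Boolean =
    Try(Class.forName(className, false, Thread.currentThread.getContextClassLoader)).isSuccess

  def main(args: Array[String]): Unit =
    // Expected to print true on a cluster with the assembly JAR installed, false elsewhere.
    println(s"$CatalogClass loadable: ${isLoadable(CatalogClass)}")
}
```

In a notebook cell attached to the cluster, call ConnectorOnClasspath.isLoadable(ConnectorOnClasspath.CatalogClass) directly; false suggests the library is not installed or the cluster has not been restarted.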


CONFIGURE ASTRA

Set up the Spark environment:

  1. Configure your Databricks cluster by clicking the cluster Edit button in the top right:
     a1-cluster-edit.png
  2. Click the Spark tab and configure the following properties:
    • spark.cassandra.auth.username - set to the literal string token
    • spark.cassandra.auth.password - set to the application token value, which begins with AstraCS:
    • spark.cassandra.connection.config.cloud.path - set to the file name of the secure connect bundle
    • spark.files - set to the full DBFS URL of the secure connect bundle
    • spark.dse.continuousPagingEnabled - set to false to disable continuous paging (recommended)
    a2-spark-config.png
  3. Click the Confirm and restart button for the changes to take effect.
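A mistyped property is easy to miss in the Spark config box, so a quick consistency check can help. The sketch below encodes only the rules stated in this step: username token, a password beginning with AstraCS:, spark.files containing a dbfs:/ URL that ends with the cloud.path file name, and continuous paging set to false. The object name, the bundle file name and the DBFS path in the sample are hypothetical placeholders.

```scala
// Sketch: sanity-checks the Astra-related Spark properties listed in this step.
object AstraConfigCheck {
  /** Returns a list of problems; an empty list means the settings look consistent. */
  def problems(conf: Map[String, String]): List[String] = {
    val username = conf.get("spark.cassandra.auth.username")
    val password = conf.get("spark.cassandra.auth.password")
    val bundle   = conf.get("spark.cassandra.connection.config.cloud.path")
    val files    = conf.get("spark.files").toList.flatMap(_.split(",").toList.map(_.trim))
    val paging   = conf.get("spark.dse.continuousPagingEnabled")

    // spark.files may list several files; one of them must be the DBFS URL of the bundle.
    val bundleShipped =
      bundle.exists(b => files.exists(f => f.startsWith("dbfs:/") && f.endsWith("/" + b)))

    List(
      if (username.contains("token")) None
      else Some("spark.cassandra.auth.username should be the literal string token"),
      if (password.exists(_.startsWith("AstraCS:"))) None
      else Some("spark.cassandra.auth.password should be a token beginning with AstraCS:"),
      if (bundle.nonEmpty) None
      else Some("spark.cassandra.connection.config.cloud.path is not set"),
      if (bundleShipped) None
      else Some("spark.files should contain the dbfs:/ URL of the bundle named in cloud.path"),
      if (paging.contains("false")) None
      else Some("spark.dse.continuousPagingEnabled is not set to false")
    ).flatten
  }

  def main(args: Array[String]): Unit = {
    // Placeholder values for illustration only.
    val sample = Map(
      "spark.cassandra.auth.username" -> "token",
      "spark.cassandra.auth.password" -> "AstraCS:example",
      "spark.cassandra.connection.config.cloud.path" -> "secure-connect-mydb.zip",
      "spark.files" -> "dbfs:/FileStore/tables/secure-connect-mydb.zip",
      "spark.dse.continuousPagingEnabled" -> "false"
    )
    val found = problems(sample)
    if (found.isEmpty) println("Astra settings look consistent")
    else found.foreach(println)
  }
}
```

On the restarted cluster, pass spark.sparkContext.getConf.getAll.toMap instead of the sample map. The check reports only which rule failed and never prints the token itself.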


FINAL TEST

Test connectivity to Astra DB by creating a new notebook. From the left-hand menu, click Create > Notebook and select Scala as the default language.

Copy the following sample Scala code to your notebook:

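import org.apache.spark.sql.functions._
import com.datastax.spark.connector._
import org.apache.spark.sql.cassandra._
import spark.implicits._

// Name under which the Cassandra catalog is registered in this Spark session
val dbName = "HelloCatalog"
// Keyspace in your Astra DB (see the warning below)
val keyspace = "databricks_ks"

// Register the Spark Cassandra Connector catalog; it connects using the Astra settings from the cluster's Spark config
spark.conf.set(s"spark.sql.catalog.$dbName", "com.datastax.spark.connector.datasource.CassandraCatalog")
// Make the keyspace the current namespace, then list its tables
spark.sql(s"use $dbName.$keyspace")
spark.sql("show tables").show()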

WARNING - Replace the keyspace name in this line with the name of a keyspace in your own Astra DB:

val keyspace = "databricks_ks"

Run the code to retrieve the list of tables in that keyspace of your Astra DB.

Here is an example output:

test-output.png
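If you reuse the test code with your own keyspace, a small helper can reject an invalid name before it reaches spark.sql. This sketch assumes Cassandra's rule for unquoted keyspace names (letters, digits and underscores, at most 48 characters) and builds the same use statement as the test code; the helper name and my_keyspace are hypothetical.

```scala
// Sketch: validates a keyspace name and builds the "use <catalog>.<keyspace>" statement.
object UseKeyspace {
  // Assumed rule for unquoted Cassandra keyspace names: letters, digits and
  // underscores, 1 to 48 characters.
  private val KeyspacePattern = "[A-Za-z0-9_]{1,48}".r

  /** Right(sql) for a valid keyspace name, Left(error message) otherwise. */
  def useStatement(catalog: String, keyspace: String): Either[String, String] =
    keyspace match {
      case KeyspacePattern() => Right(s"use $catalog.$keyspace")
      case _                 => Left(s"'$keyspace' is not a valid unquoted keyspace name")
    }

  def main(args: Array[String]): Unit =
    // "my_keyspace" is a placeholder; substitute the keyspace in your Astra DB.
    useStatement("HelloCatalog", "my_keyspace") match {
      case Right(sql) => println(sql) // in the notebook: spark.sql(sql)
      case Left(err)  => println(err)
    }
}
```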


Contributors

Erick Ramirez contributed to this article