Jeoppman asked Erick Ramirez answered

PySpark on Databricks returns IOException: "Could not initialize class com.datastax.oss.driver.internal.core.config.typesafe.TypesafeDriverConfig"

[FOLLOW UP QUESTION TO #12321]

I installed the following four libraries from Maven on my Databricks cluster:

com.datastax.spark:spark-cassandra-connector_2.12:3.1.0
com.datastax.spark:spark-cassandra-connector-driver_2.12:3.1.0
com.datastax.spark:spark-cassandra-connector-assembly_2.12:3.1.0
com.datastax.spark:spark-cassandra-connector-test-support_2.12:3.1.0

Now this code results in a connection error:

from pyspark.sql import SparkSession

spark = SparkSession.builder \
  .appName('SparkCassandraApp') \
  .config('spark.cassandra.connection.host', '51.144.132.139') \
  .config('spark.cassandra.connection.port', '9042') \
  .config("spark.cassandra.auth.username","$$$$")\
  .config("spark.cassandra.auth.password","$$$$") \
  .getOrCreate()

table = 'ts_kv_partitions_cf'
keyspace = 'thingsboard'
df = spark.read.format("org.apache.spark.sql.cassandra") \
  .options(table=table, keyspace=keyspace) \
  .load()

Error message:

java.io.IOException: Failed to open native connection to Cassandra at {51.144.132.139:9042} :: Could not initialize class com.datastax.oss.driver.internal.core.config.typesafe.TypesafeDriverConfig

If I try to connect to Cassandra from the same cluster via "cassandra-driver", I can connect and retrieve data without any problem.

from cassandra.cluster import Cluster
from cassandra.auth import PlainTextAuthProvider,AuthProvider
from cassandra.util import uuid_from_time
import pandas as pd
import uuid

contactPoint = ["51.144.132.139"]  # public IP of the Cassandra DEV VM
port = 9042

username = dbutils.secrets.get('Thingsboard_DEV','cassandra-admin-username')
password = dbutils.secrets.get('Thingsboard_DEV','cassandra-admin-password')

auth_provider = PlainTextAuthProvider(username=username, password=password)
cluster = Cluster(contactPoint, port=port, auth_provider=auth_provider)
session = cluster.connect("thingsboard")

Any idea what's missing? Thanks for any hints.

Best, Jens

spark-cassandra-connector

1 Answer

Erick Ramirez answered

As I stated in my answer to #12321, you only need this one library:

libraryDependencies += "com.datastax.spark" % "spark-cassandra-connector_2.12" % "3.0.1"

Have a look at the Quick Start Guide for details.

You should also load the Cassandra extensions, so your session builder should look something like this:

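import org.apache.spark.sql.SparkSession
import com.datastax.spark.connector.CassandraSparkExtensions

// sparkConf is your existing SparkConf carrying the spark.cassandra.* settings
SparkSession.builder()
  .config(sparkConf)
  .withExtensions(new CassandraSparkExtensions)
  .getOrCreate()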
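
Since the question uses PySpark, where `withExtensions` is not available on the builder in the same way, here is a rough sketch of how the same thing might be done by setting the `spark.sql.extensions` property. The helper names `cassandra_spark_conf` and `build_session` are made up for illustration and are not part of the connector. Note that on Databricks a session usually already exists, so these settings most likely need to go into the cluster's Spark config rather than into the builder.

```python
# Fully qualified class name of the connector's Spark SQL extensions.
CASSANDRA_EXTENSIONS = "com.datastax.spark.connector.CassandraSparkExtensions"


def cassandra_spark_conf(host, port=9042, username=None, password=None):
    """Hypothetical helper: collect the connector settings in one dict."""
    conf = {
        "spark.cassandra.connection.host": host,
        "spark.cassandra.connection.port": str(port),
        # PySpark counterpart of .withExtensions(new CassandraSparkExtensions)
        "spark.sql.extensions": CASSANDRA_EXTENSIONS,
    }
    # Only add credentials when both are supplied.
    if username is not None and password is not None:
        conf["spark.cassandra.auth.username"] = username
        conf["spark.cassandra.auth.password"] = password
    return conf


def build_session(conf, app_name="SparkCassandraApp"):
    """Hypothetical helper: apply the settings to a SparkSession builder."""
    # Imported lazily so the dict helper above works without pyspark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

With the values from the question this would be called as `build_session(cassandra_spark_conf("51.144.132.139", 9042, username, password))`, with the credentials read from `dbutils.secrets` as in the cassandra-driver example.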

Have a look at the Datasets and PySpark with DataFrames pages for examples. Cheers!
