Jeoppman asked

PySpark on Databricks returns IOException: "Could not initialize class com.datastax.oss.driver.internal.core.config.typesafe.TypesafeDriverConfig"

[FOLLOW UP QUESTION TO #12321]

I installed the following four libraries from Maven on my Databricks cluster:

com.datastax.spark:spark-cassandra-connector_2.12:3.1.0
com.datastax.spark:spark-cassandra-connector-driver_2.12:3.1.0
com.datastax.spark:spark-cassandra-connector-assembly_2.12:3.1.0
com.datastax.spark:spark-cassandra-connector-test-support_2.12:3.1.0

Now this code results in a connection error:

from pyspark.sql import SparkSession

spark = SparkSession.builder \
  .appName('SparkCassandraApp') \
  .config('spark.cassandra.connection.host', '51.144.132.139') \
  .config('spark.cassandra.connection.port', '9042') \
  .config("spark.cassandra.auth.username","$$$$")\
  .config("spark.cassandra.auth.password","$$$$") \
  .getOrCreate()

table = 'ts_kv_partitions_cf'
keyspace = 'thingsboard'
df = sqlContext.read.format("org.apache.spark.sql.cassandra").load(keyspace=keyspace, table=table)

Error message:

java.io.IOException: Failed to open native connection to Cassandra at {51.144.132.139:9042} :: Could not initialize class com.datastax.oss.driver.internal.core.config.typesafe.TypesafeDriverConfig

If I try to connect to Cassandra from the same cluster via "cassandra-driver", I can connect and retrieve data without any problem.

from cassandra.cluster import Cluster
from cassandra.auth import PlainTextAuthProvider,AuthProvider
from cassandra.util import uuid_from_time
import pandas as pd
import uuid

contactPoint = ["51.144.132.139"]  # public IP of the Cassandra DEV VM
port = 9042

username = dbutils.secrets.get('Thingsboard_DEV','cassandra-admin-username')
password = dbutils.secrets.get('Thingsboard_DEV','cassandra-admin-password')

auth_provider = PlainTextAuthProvider(username=username, password=password)
cluster = Cluster(contactPoint, port=port, auth_provider=auth_provider)
session = cluster.connect("thingsboard")

Any idea what's missing? Thanks for any hints.

Best, Jens

spark-cassandra-connector

1 Answer

Erick Ramirez answered

As I stated in my answer to #12321, you only need this library:

libraryDependencies += "com.datastax.spark" % "spark-cassandra-connector_2.12" % "3.0.1"

Have a look at the Quick Start Guide for details.

You should also load the Cassandra extensions, so your session builder should look something like this:

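import com.datastax.spark.connector.CassandraSparkExtensions
import org.apache.spark.sql.SparkSession

// sparkConf: your existing SparkConf with the spark.cassandra.* settings
SparkSession.builder()
  .config(sparkConf)
  .withExtensions(new CassandraSparkExtensions)
  .getOrCreate()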
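
Since the question uses PySpark rather than Scala, here is a rough sketch of the same idea in Python. PySpark's builder has no withExtensions() method, so the extensions are enabled through the spark.sql.extensions setting instead. The helper names cassandra_spark_options and build_session are made up for illustration; only the configuration keys and the extensions class name come from the connector. On Databricks a session usually exists before the notebook runs, so the extensions setting may have to go into the cluster's Spark config rather than the builder; treat that as an assumption to verify on your runtime.

```python
# Fully qualified class name of the connector's Catalyst extensions.
CASSANDRA_EXTENSIONS = "com.datastax.spark.connector.CassandraSparkExtensions"


def cassandra_spark_options(host, port, username, password):
    """Collect the Spark settings for a Cassandra-backed session in a dict.

    Hypothetical helper: the name and signature are illustrative only.
    """
    return {
        "spark.cassandra.connection.host": host,
        "spark.cassandra.connection.port": str(port),
        "spark.cassandra.auth.username": username,
        "spark.cassandra.auth.password": password,
        # PySpark counterpart of .withExtensions(new CassandraSparkExtensions)
        "spark.sql.extensions": CASSANDRA_EXTENSIONS,
    }


def build_session(options, app_name="SparkCassandraApp"):
    """Create (or reuse) a SparkSession with every option applied."""
    # Imported here so the options helper can be used without PySpark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in options.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


# Usage on a cluster with the connector attached (not executed here):
#   options = cassandra_spark_options("51.144.132.139", 9042, username, password)
#   spark = build_session(options)
#   df = (spark.read.format("org.apache.spark.sql.cassandra")
#         .options(table="ts_kv_partitions_cf", keyspace="thingsboard")
#         .load())
```

Keeping the settings in a plain dict makes it easy to print them and compare against the cluster's Spark config when a connection fails.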

Have a look at the Datasets and PySpark with DataFrames pages for examples. Cheers!
