narayana.jayanthi_191238 asked smadhavan commented

Running PySpark on Databricks, getting "IOException: Failed to open native connection to Cassandra at {localhost:9042}"

Hi,

I am trying to fetch data from Astra DB into a PySpark DataFrame, but I am getting an error when I do so.

Code:

df = spark.read.format("org.apache.spark.sql.cassandra")\
.options(table="emp", keyspace="kafka").load()
display(df)

Error:

java.io.IOException: Failed to open native connection to Cassandra at {localhost:9042} :: Could not reach any contact point, make sure you've provided valid addresses (showing first 1 nodes, use getAllErrors() for more): Node(endPoint=localhost/127.0.0.1:9042, hostId=null, hashCode=286743cc): [com.datastax.oss.driver.api.core.connection.ConnectionInitException: [s0|control|connecting...] Protocol initialization request, step 1 (OPTIONS): failed to send request (com.datastax.oss.driver.shaded.netty.channel.StacklessClosedChannelException)]

I have installed com.datastax.spark:spark-cassandra-connector-assembly_2.12:3.2.0 in the libraries.

Also, in the cluster config, I have provided the below details:

spark.databricks.delta.preview.enabled true

spark.cassandra.auth.username <>

spark.cassandra.auth.password AstraCS:CO....

spark.files dbfs:/FileStore/tables/secure_connect_kafka.zip

Cluster config:

(includes Apache Spark 3.2.1, Scala 2.12)

Can you please help here?

Thanks

Narayana

astra db, databricks

smadhavan answered smadhavan commented

@narayana.jayanthi_191238 if you're specifying the AstraCS:... value as the password, the username has to be the literal string token. Another option is to use the ClientID (as username) and ClientSecret (as password) from the Generate Token step.

Other missing parameters:

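  • spark.dse.continuousPagingEnabled - set this to false to disable continuous paging (recommended)
  • spark.cassandra.connection.config.cloud.path - the file name of the secure connect bundle. This one is mandatory; in your case it would be secure_connect_kafka.zip. Without it, the connector falls back to its default contact point, which is why the error mentions localhost:9042.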


For additional details, please refer to the project documentation here.
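
For a PySpark notebook, the settings above could be sketched roughly as follows. This is only an illustration, not the documented procedure: the helper names astra_spark_conf and read_table are made up for this example, and it assumes the connector picks up values set at runtime with spark.conf.set. If it does not on your cluster, put the same four keys in the cluster's Spark config instead. The secure connect bundle still has to be shipped through spark.files in the cluster config, as in the question.

```python
def astra_spark_conf(bundle_filename, astra_token):
    """Build the connector settings named in the answer (hypothetical helper)."""
    return {
        # With an AstraCS:... token as the password, the username is the literal "token".
        "spark.cassandra.auth.username": "token",
        "spark.cassandra.auth.password": astra_token,
        # File name only; the bundle itself is distributed via spark.files.
        "spark.cassandra.connection.config.cloud.path": bundle_filename,
        # Disable continuous paging, as recommended in the answer.
        "spark.dse.continuousPagingEnabled": "false",
    }


def read_table(spark, keyspace, table, conf):
    """Apply the settings to the session, then read the table as in the question."""
    for key, value in conf.items():
        spark.conf.set(key, value)  # assumption: runtime settings are honored
    return (
        spark.read.format("org.apache.spark.sql.cassandra")
        .options(table=table, keyspace=keyspace)
        .load()
    )


# The token value is elided here exactly as it is in the question.
conf = astra_spark_conf("secure_connect_kafka.zip", "AstraCS:...")

# In a Databricks notebook, where `spark` is predefined:
# df = read_table(spark, "kafka", "emp", conf)
# display(df)
```

Keeping the settings in one dictionary makes it easy to compare them line by line against the cluster's Spark config when a connection still fails.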


Thank you. The above solution worked. Awesome!

smadhavan ♦ narayana.jayanthi_191238 ·

Glad it worked! @narayana.jayanthi_191238

steve.lacerda answered narayana.jayanthi_191238 commented

Hi,

I did refer to that link. Could you please share an example that uses PySpark rather than Scala? I am using Python in my Databricks notebook. I appreciate your help.

Thanks

Narayana
