Sandeep asked:

Getting "org.apache.spark.sql.cassandra is not a valid Spark SQL Data Source"

Hello Community

It seems that spark-cassandra-connector 3.1 isn't working with the Spark 3.1.2 and Cassandra 3.11 combination. The error suggests that Spark can't find the class offered by the connector.

I have already tried supplying the JARs directly on the command line with '--jars' and passing the dependencies through '--packages'; neither helps.

Can you please confirm whether this connector version is indeed compatible with the Cassandra/Spark versions used, and whether it has worked for anyone?
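For context, the launch below is a sketch of what I tried (the host/port values are placeholders from my environment; the connector artifact for Spark 3.1.x is built against Scala 2.12):

```shell
# Sketch of the attempted launch; host and port values are placeholders.
spark-shell \
  --packages com.datastax.spark:spark-cassandra-connector_2.12:3.1.0 \
  --conf spark.cassandra.connection.host=127.0.0.1 \
  --conf spark.cassandra.connection.port=9042
```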

spark-cassandra-connector
4 comments

Hello,

Please share your start command line. You might have an error in `--packages` usage.

Sandeep replied:

Hello @jaroslaw.grabowski_50515

Thanks for your comment.

Attached herewith is the output. When listing all the JARs from within spark-shell, I can see the connector JAR listed as expected. The same JAR is also present in the spark.jars config in a separate, isolated attempt.

spark-cassandra.txt


Looks good. What's the error that you receive for this exact method?


1 Answer

jaroslaw.grabowski_50515 answered:

Could you add the following configuration parameter to your start command line?

--conf spark.sql.extensions=com.datastax.spark.connector.CassandraSparkExtensions
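For example, a full start command would look roughly like this (the package coordinate and connection host below are illustrative, not taken from your log):

```shell
# Illustrative spark-shell invocation; adjust the version and host to your setup.
spark-shell \
  --packages com.datastax.spark:spark-cassandra-connector_2.12:3.1.0 \
  --conf spark.sql.extensions=com.datastax.spark.connector.CassandraSparkExtensions \
  --conf spark.cassandra.connection.host=127.0.0.1
```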
6 comments

Hello. It makes no difference at all; it fails with the same error.

Just to clarify: is only the connector JAR needed on the classpath, or also the driver, or the complete assembly?


What's the exact command that you used? I want to make sure that the parameter was provided correctly.

Sandeep replied:

Hi @jaroslaw.grabowski_50515

Attached herewith are the complete command and outputs.

Here's a summary:

  • The connector seems to be working through the DataFrame APIs
  • org.apache.spark.sql.cassandra is also recognized as a valid data source by Spark SQL when creating a table from within Spark SQL
  • org.apache.spark.sql.cassandra is not recognized as a valid data source when reading the table created in the Spark catalog with provider = 'org.apache.spark.sql.cassandra', from within the same Spark session

Could you please also confirm whether all of these JARs (the ones downloaded and added by --packages) are mandatory for a basic Spark-Cassandra connection, or whether only some of them should be sufficient? With previous versions, I remember adding only the connector JAR and it worked well.
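To make the three bullets above concrete, here is a minimal spark-shell session in the shape of what I ran (the keyspace and table names are hypothetical, and running this requires a live Cassandra node with the connector on the classpath):

```scala
// 1. DataFrame API: works.
val df = spark.read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "ks", "table" -> "tbl"))
  .load()
df.show()

// 2. Creating a table via Spark SQL with the Cassandra provider: works.
spark.sql("""CREATE TABLE tbl_view
             USING org.apache.spark.sql.cassandra
             OPTIONS (keyspace 'ks', table 'tbl')""")

// 3. Reading that catalog table back in the same session fails with
//    "org.apache.spark.sql.cassandra is not a valid Spark SQL Data Source".
spark.sql("SELECT * FROM tbl_view").show()
```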

spark_cassandra_connector_job_log.txt


Hi @jaroslaw.grabowski_50515
Have you managed to take a look at this by any chance?
