
Sandeep asked ·

Getting "org.apache.spark.sql.cassandra is not a valid Spark SQL Data Source"

Hello Community

It seems that spark-cassandra-connector 3.1 isn't working with the Spark 3.1.2 and Cassandra 3.11 combination. The error suggests that Spark can't find the class provided by the connector.

I have already tried supplying the jars directly on the command line using `--jars` and passing the dependencies through `--packages`; neither helps.

Can you please confirm whether this connector version is indeed compatible with the Cassandra/Spark versions above, and whether it has worked for anyone?

spark-cassandra-connector
4 comments

jaroslaw.grabowski_50515 ♦ commented ·

Hello,

Please share your start command line. You might have an error in your `--packages` usage.

Sandeep commented ·

Hello @jaroslaw.grabowski_50515

Thanks for your comment.

Attached herewith is the output. When listing all the jars from within the spark-shell, I can see the connector jar listed as expected. The same jar also appears in the spark.jars config in a separate attempt.

spark-cassandra.txt

0 Likes 0 ·
jaroslaw.grabowski_50515 ♦ commented ·

Looks good. What's the exact error that you receive with this method?


1 Answer

jaroslaw.grabowski_50515 answered ·

Could you add the following configuration parameter to your start command line?

--conf spark.sql.extensions=com.datastax.spark.connector.CassandraSparkExtensions
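
For reference, a complete start command with the connector pulled in and the extensions enabled might look like the sketch below. The version numbers, the Scala 2.12 build, and the contact point are assumptions; adjust them to your environment:

```shell
# Sketch of a spark-shell launch: the connector is resolved via --packages
# and the Cassandra SQL extensions are enabled. The connector version,
# Scala build suffix, and host are assumptions for illustration.
spark-shell \
  --packages com.datastax.spark:spark-cassandra-connector_2.12:3.1.0 \
  --conf spark.sql.extensions=com.datastax.spark.connector.CassandraSparkExtensions \
  --conf spark.cassandra.connection.host=127.0.0.1
```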
6 comments

Sandeep commented ·

Hello. It makes no difference at all; it fails with the same error.

Just for clarification, is only the connector jar needed on the classpath, or also the driver, or the complete assembly?

jaroslaw.grabowski_50515 ♦ commented ·

What's the exact command that you used? I want to make sure that the parameter was provided correctly.

Sandeep commented ·

Hi @jaroslaw.grabowski_50515

Attached herewith are the complete command and outputs.

Here's a summary:

  • The connector seems to work through the DataFrame APIs
  • org.apache.spark.sql.cassandra is also recognized as a valid data source by Spark SQL when creating a table from within spark-sql
  • org.apache.spark.sql.cassandra is not recognized as a valid data source when reading the table created in the Spark catalog with provider = 'org.apache.spark.sql.cassandra', from within the same Spark session
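
The setup I'm describing can be probed with a sketch like the following; the catalog name, keyspace, and table are hypothetical placeholders, and the catalog class follows the SCC 3.x catalog configuration:

```shell
# Hypothetical spark-sql invocation registering a Cassandra catalog;
# 'cass', 'ks', and 'tbl' are made-up names for illustration.
spark-sql \
  --packages com.datastax.spark:spark-cassandra-connector_2.12:3.1.0 \
  --conf spark.sql.extensions=com.datastax.spark.connector.CassandraSparkExtensions \
  --conf spark.sql.catalog.cass=com.datastax.spark.connector.datasource.CassandraCatalog \
  -e "SELECT * FROM cass.ks.tbl"
```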

Could you please also confirm whether all of these JARs (the ones downloaded and added by `--packages`) are mandatory for a basic Spark-Cassandra connection, or whether only some of them are sufficient? With previous versions, I remember adding only the connector jar and it worked well.

spark_cassandra_connector_job_log.txt

Sandeep commented ·

Hi @jaroslaw.grabowski_50515
Have you managed to take a look at this by any chance?
