
miyone_179431 asked:

Spark Cassandra connector issue?

Hi, I've tried my best to look around for answers, so I'd appreciate your help. I have Apache Spark and Cassandra set up, but any code I try to run gives me an error. I'm new to Scala/Spark.


When I run this code from Jupyter, it seems to work and creates a new keyspace/table in my Cassandra database:

CassandraConnector(conf).withSessionDo { session =>
  session.execute("CREATE KEYSPACE testdemo WITH REPLICATION = {'class': 'SimpleStrategy', 'replication_factor': 3 }")
  session.execute("CREATE TABLE testdemo.words (word text PRIMARY KEY, count int)")
}


Here, reading the schema works, but showing the contents doesn't, and I get warning messages like the one below:

val rating = spark.read
  .format("org.apache.spark.sql.cassandra")
  .option("spark.cassandra.connection.host", "192.168.0.100")
  .options(Map("table" -> "testdemo", "keyspace" -> "testdemo"))
  .load()

rating.schema    // schema loads OK
rating.show()    // this one fails with the warning below
WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources


When I try from spark-shell, I get the error below:

val testDF = spark.read.format("org.apache.spark.sql.cassandra").
    options(Map("table" -> "testdemo", "keyspace" -> "testdemo")).
    load()

testDF.show()
Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.catalyst.package$ScalaReflectionLock$


From looking around, the last error seems to be a compatibility issue. This is how I launch spark-shell:

spark-shell --packages datastax:spark-cassandra-connector:2.4.0-s_2.11

I tried changing this to use other connector versions from https://mvnrepository.com/artifact/com.datastax.spark/spark-cassandra-connector, but spark-shell can't find them:

--packages datastax:spark-cassandra-connector:2.4.1-2.11    // not found


Please help! I'm running the following versions:

  • Scala version 2.11.12
  • Spark version 2.4.4
  • Cassandra 3.11.4



miyone_179431 answered:

Thanks for that. I realise it could be a resource issue, but it only seems to happen when the data source is Cassandra.

And thanks for the correct package string (I'd tried many combinations of it before). This one downloaded fine; however, I keep getting the same/similar error:

Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.catalyst.package$ScalaReflectionLock$
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 46 more

I'll try to fix the resource issue, but I just want to confirm that this is not a connector error. Thanks.


Russell Spitzer answered:

The first error states that your Spark master has no resources to give the new application:

WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources



Check your Spark UI; its location differs depending on the Spark master you are using. For standalone it is port 7080/8080 on the node running the master process.

More info: https://www.datastax.com/blog/2014/10/common-spark-troubleshooting
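
If the UI shows your workers registered but with no free cores or memory, one way to unblock the shell is to cap what the application requests. A rough sketch (the master URL here is a placeholder, and the numbers should be tuned to your hardware):

spark-shell --master spark://<master-host>:7077 \
  --conf spark.cores.max=2 \
  --conf spark.executor.memory=1g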

The second error:

Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.catalyst.package$ScalaReflectionLock$

is caused by a version incompatibility. The reason the package could not be found is that you used the wrong coordinate string; the correct one is:

--packages com.datastax.spark:spark-cassandra-connector_2.11:2.4.1
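
Putting it together, a minimal sketch, assuming the connection host and the testdemo.words table from your question:

spark-shell --packages com.datastax.spark:spark-cassandra-connector_2.11:2.4.1 \
  --conf spark.cassandra.connection.host=192.168.0.100

// then, inside the shell, read the table back
val words = spark.read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "testdemo", "table" -> "words"))
  .load()

words.show()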
