Hi, I've tried my best to look around for answers, so I'd appreciate your help. I have Apache Spark and Cassandra set up, but any code I try to run gives me errors. I'm new to Scala/Spark.
When I run this code from Jupyter, it seems to work and writes a new keyspace/table to my Cassandra db:
CassandraConnector(conf).withSessionDo { session =>
  session.execute("CREATE KEYSPACE testdemo WITH REPLICATION = {'class': 'SimpleStrategy', 'replication_factor': 3}")
  session.execute("CREATE TABLE testdemo.words (word text PRIMARY KEY, count int)")
}
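In case it's useful, here's a quick sanity check along the same lines (the test row is just made-up data) to confirm the writes go through:

import scala.collection.JavaConverters._
import com.datastax.spark.connector.cql.CassandraConnector

// insert one test row and read it back through the same connector session
CassandraConnector(conf).withSessionDo { session =>
  session.execute("INSERT INTO testdemo.words (word, count) VALUES ('hello', 1)")
  val rows = session.execute("SELECT word, count FROM testdemo.words").asScala
  rows.foreach(row => println(s"${row.getString("word")} -> ${row.getInt("count")}"))
}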
In the next snippet, the schema call works, but showing the contents doesn't, and I get repeated warning messages like the one shown after the code:
val rating = spark.read
  .format("org.apache.spark.sql.cassandra")
  .option("spark.cassandra.connection.host", "192.168.0.100")
  .options(Map("table" -> "testdemo", "keyspace" -> "testdemo"))
  .load()

rating.schema  // schema loads OK
rating.show()  // this one throws the error
WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
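In case the notebook setup matters, this is roughly how conf and the session are created there (a paraphrase; the app name, memory size, and master URL below are placeholders, not my exact values):

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

val conf = new SparkConf()
  .setAppName("cassandra-test")                             // placeholder app name
  .set("spark.cassandra.connection.host", "192.168.0.100")
  .set("spark.executor.memory", "1g")                       // placeholder size

val spark = SparkSession.builder()
  .master("spark://192.168.0.100:7077")                     // placeholder master URL
  .config(conf)
  .getOrCreate()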
When I try from spark-shell, I run this and get the error below:
val testDF = spark.read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("table" -> "testdemo", "keyspace" -> "testdemo"))
  .load()

testDF.show()
Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.catalyst.package$ScalaReflectionLock$
From looking around, the last error seems to be a compatibility issue between the connector and my Spark/Scala versions. I'm launching spark-shell with:
spark-shell --packages datastax:spark-cassandra-connector:2.4.0-s_2.11
I tried changing this to use other connector versions from https://mvnrepository.com/artifact/com.datastax.spark/spark-cassandra-connector, but spark-shell can't find them:
--packages datastax:spark-cassandra-connector:2.4.1-2.11 //not found
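From the Maven page I'm guessing the full coordinates would be groupId:artifactId:version with the Scala suffix on the artifact name, i.e. something like the line below, but I haven't confirmed that's the right format for --packages:

spark-shell --packages com.datastax.spark:spark-cassandra-connector_2.11:2.4.1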
Please help! I'm running the following versions:
- Scala version 2.11.12
- Spark version 2.4.4
- Cassandra 3.11.4
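(For reference, the Spark and Scala versions above are what spark-shell reports; the Cassandra version comes from the cqlsh startup banner.)

spark.version                  // String = 2.4.4
util.Properties.versionString  // String = version 2.11.12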