question

subhrangshu01 avatar image
subhrangshu01 asked Erick Ramirez edited

Connecting to Astra DB from Spark 3.1.3, getting ClassNotFoundException

I have used Spark 3.1.3 to connect Astra, but I am getting java.lang.ClassNotFoundException error on spark-submit.

Here is my Main.scala:

import com.datastax.spark.connector.{toSparkContextFunctions}
import org.apache.spark.sql.SparkSession
import org.apache.spark.{SparkConf, SparkContext}


object Main {

  def main(args: Array[String]) = {
    val conf = new SparkConf()
      .set("spark.cassandra.connection.config.cloud.path","/path/astradb-secure-connect.zip")
      .set("spark.cassandra.auth.username","client-id")
      .set("spark.cassandra.auth.password","client-secret")
      .setAppName("SparkTest")
    val sparkCTX = new SparkContext(conf)

    val sparkSess = SparkSession.builder.appName("MyStream").getOrCreate()
    sparkCTX.setLogLevel("Error")
    println("\n\n\n\n************\n\n\n\n")

    val Rdd = sparkCTX.cassandraTable("my_keyspace", "accounts") // Exact Line of Error
    Rdd.foreach(s => {
      println(s)
    })
  }
}

My build.sbt looks like:

name := "Synchronization"

version := "1.0-SNAPSHOT"


scalaVersion := "2.12.15"

idePackagePrefix := Some("info.myapp.synchronization")


val sparkVersion = "3.1.3"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % sparkVersion,
  "org.apache.spark" %% "spark-sql" % sparkVersion,
  "org.apache.spark" %% "spark-mllib" % sparkVersion,
  "org.apache.spark" %% "spark-streaming" % sparkVersion,
  "io.spray" %%  "spray-json" % "1.3.6",
  "org.scalaj" %% "scalaj-http" % "2.4.2"
)

libraryDependencies += "com.datastax.spark" %% "spark-cassandra-connector" % "3.1.0" % "provided"

libraryDependencies += "com.twitter" % "jsr166e" % "1.1.0"

libraryDependencies += "net.liftweb" %% "lift-json" % "3.4.3"

libraryDependencies += "com.sun.mail" % "javax.mail" % "1.6.2"

libraryDependencies += "com.typesafe.akka" %% "akka-stream" % "2.5.22"

libraryDependencies += "com.github.jurajburian" %% "mailer" % "1.2.4"

The code I am using to run this is:

$ sbt package
$ spark-submit --class "Main" --jars $(echo localDependencies/*.jar | tr ' ' ',') target/scala-2.12/*jar

My localDependency folder contains jars downloaded from mvnrepository and looks like:

screenshot-from-2022-07-06-19-10-50.png

My error goes like:

Exception in thread "main" java.lang.NoClassDefFoundError: com/datastax/spark/connector/CassandraRow
        at Main$.main(Main.scala:20)
        at Main.main(Main.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1039)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1048)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: com.datastax.spark.connector.CassandraRow
        at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
        ... 14 more

Please help me fix the issue, or please provide the working versions combinations

spark-cassandra-connector
10 |1000

Up to 8 attachments (including images) can be used with a maximum of 1.0 MiB each and 10.0 MiB total.

yukim avatar image
yukim answered yukim edited

libraryDependencies += "com.datastax.spark" %% "spark-cassandra-connector" % "3.1.0" % "provided"

Since the dependency scope is "provided", it is not included at runtime.

You can either remove "provided" above, or use the following command line option "--packages" to download and use it when submitting the job:

$ spark-submit --packages "com.datastax.spark:spark-cassandra-connector_2.12:jar:3.1.0" ...

Share
10 |1000

Up to 8 attachments (including images) can be used with a maximum of 1.0 MiB each and 10.0 MiB total.

jaroslaw.grabowski_50515 avatar image
jaroslaw.grabowski_50515 answered

Hi! Make sure that all of the spark-cassandra-connector dependencies are on your classpath. spark-cassandra-connector_2.12:jar:3.1.0 is not enough.

Either include the spark-cassandra-connector_2.12 with `--packages` (which downloads and includes the dependencies automatically) or switch to spark-cassandra-connector-assembly_2.12 (this artifact contains all the dependencies inside the jar).

Share
10 |1000

Up to 8 attachments (including images) can be used with a maximum of 1.0 MiB each and 10.0 MiB total.