I'm running a Spark query (against Cassandra) and I want to push the resulting Dataset to Kafka. The code is basically as follows:
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.struct;
import static org.apache.spark.sql.functions.to_json;

dataset
    .withColumn("key", col("pk"))               // Kafka partition key
    .withColumn("value", to_json(struct("*")))  // Kafka message content
    .write()
    .format("kafka")
    .option("kafka.bootstrap.servers", "...")   // broker list elided here
    .option("topic", "...")                     // topic name elided here
    .save();
but I'm getting this error:
DirectJDKLog.java:175 - Servlet.service() for servlet [dispatcherServlet] in context with path [] threw exception [Request processing failed; nested exception is java.lang.ClassNotFoundException: Failed to find data source: kafka. Please find packages at http://spark.apache.org/third-party-projects.html] with root cause java.lang.ClassNotFoundException: kafka.DefaultSource
I have DSE installed; nodetool version reports:
DSE version: 6.7.14
ReleaseVersion: 4.0.0.6714
My jar is an uber-jar. As far as I understand, Spark resolves the short name "kafka" through the META-INF/services/org.apache.spark.sql.sources.DataSourceRegister service file, and only falls back to loading kafka.DefaultSource when nothing is registered, so I have also added the following transformer to the maven-shade-plugin configuration in my pom:
<transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
  <resource>META-INF/services/org.apache.spark.sql.sources.DataSourceRegister</resource>
</transformer>
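For reference, here is roughly where that transformer sits in my shade plugin configuration (the plugin version here is a placeholder, not my exact value):

<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-shade-plugin</artifactId>
      <version>3.2.4</version>
      <executions>
        <execution>
          <phase>package</phase>
          <goals>
            <goal>shade</goal>
          </goals>
          <configuration>
            <transformers>
              <!-- merges all DataSourceRegister service files instead of
                   letting one jar's copy overwrite the others -->
              <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                <resource>META-INF/services/org.apache.spark.sql.sources.DataSourceRegister</resource>
              </transformer>
            </transformers>
          </configuration>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>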
My Kafka-related dependencies are:
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-sql-kafka-0-10_2.11</artifactId>
  <version>2.2.3</version>
  <scope>test</scope>
</dependency>
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-streaming-kafka-0-10_2.11</artifactId>
  <version>2.2.3</version>
</dependency>
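In case it helps with diagnosis, this is how I would check whether the data source actually made it into the uber-jar (the jar path is a placeholder for my artifact):

unzip -p target/my-app.jar META-INF/services/org.apache.spark.sql.sources.DataSourceRegister

If the Kafka source is packaged, I would expect org.apache.spark.sql.kafka010.KafkaSourceProvider to appear in that list.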
I'm running "dse spark-submit" directly on the cluster nodes (for testing purposes, for now).
Does the Kafka data source need to be installed separately on the cluster?
Can I achieve the same thing using the DSE Kafka Connector? Is it possible to get an example?