alfhv asked · Erick Ramirez answered

How do I publish to Kafka from a Spark dataset?

I'm trying to run a Spark query (against Cassandra) and push the resulting dataset to Kafka. The code is basically as follows (broker and topic values are placeholders):

dataset
    .withColumn("key", col("pk"))               // kafka partition key
    .withColumn("value", to_json(struct("*")))  // kafka msg content as JSON
    .write()
    .format("kafka")
    .option("kafka.bootstrap.servers", "...")   // broker list (placeholder)
    .option("topic", "...")                     // target topic (placeholder)
    .save();

but I'm getting this error:

DirectJDKLog.java:175 - Servlet.service() for servlet [dispatcherServlet] in context with path [] threw exception [Request processing failed; nested exception is java.lang.ClassNotFoundException: Failed to find data source: kafka. Please find packages at http://spark.apache.org/third-party-projects.html] with root cause 
java.lang.ClassNotFoundException: kafka.DefaultSource

I have DSE installed; nodetool version reports:
DSE version: 6.7.14
ReleaseVersion: 4.0.0.6714

My jar is an uber-jar, and I have also added the following section to the maven-shade-plugin configuration in my pom:

<transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
    <resource>META-INF/services/org.apache.spark.sql.sources.DataSourceRegister</resource>
</transformer>
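
For context, that transformer sits inside my shade plugin configuration roughly like this (a sketch only; the plugin version and surrounding layout are placeholders, not my exact pom):

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <version>3.2.4</version> <!-- placeholder version -->
    <executions>
        <execution>
            <phase>package</phase>
            <goals>
                <goal>shade</goal>
            </goals>
            <configuration>
                <transformers>
                    <!-- appends the META-INF/services/org.apache.spark.sql.sources.DataSourceRegister
                         entries from all shaded jars so Spark can discover registered data sources -->
                    <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                        <resource>META-INF/services/org.apache.spark.sql.sources.DataSourceRegister</resource>
                    </transformer>
                </transformers>
            </configuration>
        </execution>
    </executions>
</plugin>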

My dependencies are:

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql-kafka-0-10_2.11</artifactId>
    <version>2.2.3</version>
    <scope>test</scope>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming-kafka-0-10_2.11</artifactId>
    <version>2.2.3</version>
</dependency>

I'm running "dse spark-submit" directly on the cluster nodes (for testing purposes, for now).

Does the Kafka data source need to be installed separately on the cluster?

Can I achieve the same thing using the DSE Kafka Connector? Is it possible to get an example?

Tags: spark, kafka

1 Answer

Erick Ramirez answered

Unfortunately, there is no Kafka connector available where DSE or Cassandra is the source. The only connector available is a sink: https://github.com/datastax/kafka-sink.

I'm unable to provide you with assistance for this issue in a Q&A forum. My suggestion is to log a ticket with DataStax Support so one of our engineers can assist you directly. Cheers!
