I am working on a data science project and need to read data from Cassandra in Spark in real time. Does anyone have a solution for this?
I'm not sure whether you're using open-source Cassandra (OSS) or DataStax Enterprise (DSE), but in DSE you can use Spark Streaming:
All streaming relies upon some underlying messaging system, like Kafka or Pulsar. So, you'll need to implement that first in order to receive messages which then get processed and sent to Cassandra.
Here's a doc providing an example of a sink between Kafka and Cassandra:
The Spark Streaming support in the spark-cassandra-connector provides a mechanism for consuming data from sources like Akka and Kafka and storing it in Cassandra. It only works where Cassandra is the destination (sink), not the source.
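To make the "Cassandra as sink" direction concrete, here is a minimal PySpark sketch. It uses Structured Streaming with `foreachBatch` rather than the older DStream API, and it assumes the Kafka source package and the spark-cassandra-connector are on the classpath. The keyspace `demo`, table `events`, topic `events`, broker address and checkpoint path are all hypothetical placeholders, not values from this thread.

```python
def cassandra_sink_options(keyspace, table):
    """Build the keyspace/table options passed to the Cassandra DataFrame writer."""
    return {"keyspace": keyspace, "table": table}


def write_batch_to_cassandra(batch_df, batch_id, keyspace="demo", table="events"):
    """Append one micro-batch to Cassandra; Spark calls this via foreachBatch."""
    (batch_df.write
        .format("org.apache.spark.sql.cassandra")
        .options(**cassandra_sink_options(keyspace, table))
        .mode("append")
        .save())


def start_kafka_to_cassandra(spark,
                             bootstrap_servers="localhost:9092",     # assumed broker
                             topic="events",                          # assumed topic
                             checkpoint_dir="/tmp/checkpoints/events"):
    """Read a Kafka topic as a stream and write every micro-batch to Cassandra."""
    raw = (spark.readStream
           .format("kafka")
           .option("kafka.bootstrap.servers", bootstrap_servers)
           .option("subscribe", topic)
           .load())

    # Kafka delivers key/value as binary; cast both to strings so they match
    # a hypothetical table demo.events(id text PRIMARY KEY, payload text).
    rows = raw.selectExpr("CAST(key AS STRING) AS id",
                          "CAST(value AS STRING) AS payload")

    return (rows.writeStream
            .foreachBatch(write_batch_to_cassandra)
            .option("checkpointLocation", checkpoint_dir)
            .start())
```

You would call `start_kafka_to_cassandra(spark).awaitTermination()` from a session that has `spark.cassandra.connection.host` set. Note that the data flows into Cassandra here, which matches the limitation described above.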
With DataStax CDC, agents installed on the same nodes as Cassandra capture changes (mutations) from the commitlog, deduplicate them, then stream the data to Apache Pulsar. Your Spark app can then subscribe to the relevant Pulsar topic and process the stream.
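As a rough illustration of the subscribing side, the sketch below uses the plain `pulsar-client` Python package instead of a Spark connector, just to show the receive/acknowledge loop. The service URL, topic and subscription name are placeholders you would supply, and decoding the payload is left to the hypothetical `handle` callback because it depends on the schema your CDC setup publishes.

```python
def open_consumer(service_url, topic, subscription):
    """Connect to Pulsar and subscribe to a topic (needs the pulsar-client package)."""
    import pulsar  # imported lazily so the rest of the module works without it

    client = pulsar.Client(service_url)
    consumer = client.subscribe(topic, subscription_name=subscription)
    return client, consumer


def drain(consumer, handle, max_messages):
    """Receive up to max_messages, pass each payload to handle, then acknowledge it."""
    processed = 0
    while processed < max_messages:
        msg = consumer.receive()
        try:
            handle(msg.data())          # raw bytes; decoding is up to the caller
            consumer.acknowledge(msg)   # acknowledge only after successful handling
            processed += 1
        except Exception:
            # Ask Pulsar to redeliver the message later, then surface the error.
            consumer.negative_acknowledge(msg)
            raise
    return processed
```

A typical call would be `client, consumer = open_consumer(url, topic, "spark-app")`, then `drain(consumer, print, 10)`, and finally `client.close()`. Inside a Spark job you would replace this loop with whichever Pulsar source connector fits your Spark version.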
For more info, see the blog post Shatter Data Silos with DataStax Change Data Capture for Apache Cassandra. Cheers!