


shantanughar asked Erick Ramirez answered

Is there a way to track changes in Cassandra 3.7 for incremental data loading?

Hi, I'm very new to the ETL world and want to implement incremental data loading with Cassandra 3.7 and Spark. I'm aware that later versions of Cassandra support CDC, but I can only use Cassandra 3.7. Is there a way to track only the changed records and use Spark to load them, thereby performing incremental data loading?

If it can't be done on the Cassandra end, any suggestions on the Spark side are also welcome :)


1 Answer

Erick Ramirez answered

There is a Kafka sink connector which reads records from a Kafka topic and writes them to Cassandra. For details, see the DataStax Apache Kafka Connector.

To be clear, this is a sink connector which means that Cassandra is not the source of the data but the destination.

There is no out-of-the-box solution available at this point which allows you to consume the mutations in the Change-Data-Capture (CDC) logs in Cassandra to use as a data source for another system. You will need to implement a custom solution to achieve this. I'm sorry that I don't have any examples you could reference.
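For what it's worth, one common shape for such a custom solution is a "high-watermark" load. The sketch below is only an illustration of that idea in plain Python, not something Cassandra or Spark provides: it assumes your application writes a last-modified timestamp on every row (the `updated_at` field here is hypothetical), and that the loader stores the timestamp of the last row it processed between runs. The function name and sample rows are made up for the example.

```python
from datetime import datetime, timezone


def select_changed(rows, watermark):
    """Return rows modified after `watermark`, plus the advanced watermark.

    `rows` is any iterable of dicts carrying an `updated_at` datetime
    (a hypothetical column that the application must maintain itself).
    """
    changed = [r for r in rows if r["updated_at"] > watermark]
    # If nothing changed, keep the old watermark so the next run starts
    # from the same point.
    new_watermark = max((r["updated_at"] for r in changed), default=watermark)
    return changed, new_watermark


# Hypothetical sample data standing in for a table read.
rows = [
    {"id": 1, "updated_at": datetime(2020, 6, 1, tzinfo=timezone.utc)},
    {"id": 2, "updated_at": datetime(2020, 6, 3, tzinfo=timezone.utc)},
    {"id": 3, "updated_at": datetime(2020, 6, 5, tzinfo=timezone.utc)},
]

# Watermark saved by the previous run (assumed to be persisted somewhere).
last_run = datetime(2020, 6, 2, tzinfo=timezone.utc)

changed, last_run = select_changed(rows, last_run)
changed_ids = [r["id"] for r in changed]

# A second run with no new writes picks up nothing and keeps the watermark.
changed_again, last_run_again = select_changed(rows, last_run)
```

In Spark the same comparison could be written as a filter on the timestamp column of the DataFrame read from Cassandra. Two caveats with this approach: rows that were deleted do not show up as changes, and filtering on a column that is not part of the primary key generally means scanning the whole table unless the data model is designed around it (for example, time-bucketed partitions).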

As a side note, Cassandra 3.7 is not a supported version. It was released 4 years ago (June 2016) and is no longer maintained. You will need to install either the older C* 3.0 release or a 3.11 release instead. Cheers!
