Bringing together the Apache Cassandra experts from the community and DataStax.


question

murari.rameshbabu_114828 asked Erick Ramirez edited

How can I save historical data to an S3 bucket?

What are the possible ways to store data in an S3 bucket and, if required, restore it from the S3 bucket into Cassandra?

We run a 3-node Cassandra cluster with 3.5 TB of data on each node across 168 tables, and we delete data using Node.js code.

Requirement:

Every day we delete data that is more than 3 years old.

Before deleting the data, we need to store it in an S3 bucket.

If required, we need to retrieve the data from the S3 bucket and restore it into Cassandra.

backup

steve.lacerda answered murari.rameshbabu_114828 commented

Hello! If you're using DSE, you can use OpsCenter; if you're using OSS Cassandra, you can use Medusa. Both tools let you back up to an S3 bucket and restore from it if necessary. You can find both here:

https://docs.datastax.com/en/opscenter/6.8/opsc/online_help/services/opscBackupServiceAddS3Location.html

https://github.com/thelastpickle/cassandra-medusa


Requirement:

Every day we delete data that is more than 3 years old, and we want this data to be uploaded to an S3 bucket before it is deleted from Cassandra.

If required, the data should be restored from the S3 bucket into Cassandra.

Is this scenario achievable using the Medusa tool?

Erick Ramirez answered

You can't just selectively pull out a subset of records to "store" in S3 without writing an app for it. Cassandra doesn't work that way.
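As a rough idea of what such an app has to do, here is a minimal Python sketch of the "archive first, delete second" ordering. Everything in it is an assumption for illustration: the `id` and `event_date` columns are hypothetical, and `upload` and `delete` are placeholder callables that you would back with your own S3 client and Cassandra driver code.

```python
import csv
import io
from datetime import date


def cutoff_date(today: date, years: int = 3) -> date:
    """Return the date `years` before `today` (Feb 29 falls back to Feb 28)."""
    try:
        return today.replace(year=today.year - years)
    except ValueError:
        return today.replace(year=today.year - years, day=28)


def archive_then_delete(rows, cutoff, upload, delete):
    """Archive rows older than `cutoff` as CSV, then delete them.

    rows   -- iterable of dicts with hypothetical 'id' and 'event_date' keys
    upload -- placeholder callable(name, csv_text), e.g. a wrapper around S3
    delete -- placeholder callable(list_of_ids), e.g. CQL DELETE statements
    """
    old = [r for r in rows if r["event_date"] < cutoff]
    if not old:
        return 0

    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=sorted(old[0]))
    writer.writeheader()
    writer.writerows(old)

    # Upload before deleting: if the upload raises, nothing is deleted.
    upload(f"archive-before-{cutoff.isoformat()}.csv", buf.getvalue())
    delete([r["id"] for r in old])
    return len(old)
```

The point of the ordering is that a failed upload aborts the run before any data is removed. With 3.5 TB per node you would also need to process the data in batches rather than hold it in memory as this sketch does.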

You can only create backups of the SSTables on disk (Cassandra snapshots) and then archive them off-server (including to S3). But this will back up all the data in a table/keyspace, not just the data you're deleting. And if you decide to restore the snapshots, it will recover all the data in those SSTables.
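The snapshot route could be scripted roughly as below. This is a sketch under assumptions, not a finished backup tool: the bucket name, node label, and data directory are placeholders you pass in, and it only builds the `nodetool snapshot` and `aws s3 sync` command lines rather than running them. It relies on snapshots being written under `<data_dir>/<keyspace>/<table>-<id>/snapshots/<tag>`.

```python
from __future__ import annotations

from pathlib import Path


def snapshot_command(keyspace: str, tag: str) -> list[str]:
    """Command that snapshots every table in `keyspace` under the name `tag`."""
    return ["nodetool", "snapshot", "-t", tag, keyspace]


def snapshot_dirs(data_dir: Path, keyspace: str, tag: str) -> list[Path]:
    """Find the per-table snapshot directories created for `tag`."""
    pattern = f"*/snapshots/{tag}"
    return sorted(p for p in (data_dir / keyspace).glob(pattern) if p.is_dir())


def sync_commands(dirs, data_dir: Path, bucket: str, node: str) -> list[list[str]]:
    """Build one `aws s3 sync` command per snapshot directory.

    `bucket` and `node` are placeholders; keeping a per-node prefix avoids
    SSTables from different nodes overwriting each other in the bucket.
    """
    commands = []
    for d in dirs:
        rel = d.relative_to(data_dir).as_posix()
        commands.append(["aws", "s3", "sync", str(d), f"s3://{bucket}/{node}/{rel}"])
    return commands
```

Each command list can be passed to `subprocess.run(cmd, check=True)` on the node itself. This has to be repeated on every node, since each node only holds its own SSTables.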

Alternatively, you can export the data to CSV files using DSBulk and then upload the files to S3. If you want to restore the data, download the CSV files and use DSBulk to bulk-load them into the cluster.
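A minimal Python sketch of that DSBulk round trip might look like this. The keyspace, table, local directory, and S3 URI are placeholders, and the script assumes the `dsbulk` and `aws` command-line tools are installed and configured on the machine running it.

```python
from __future__ import annotations

import subprocess


def dsbulk_unload(keyspace: str, table: str, out_dir: str) -> list[str]:
    """Export a whole table to CSV files in `out_dir`."""
    return ["dsbulk", "unload", "-k", keyspace, "-t", table, "-url", out_dir]


def dsbulk_load(keyspace: str, table: str, in_dir: str) -> list[str]:
    """Bulk-load CSV files from `in_dir` back into the table."""
    return ["dsbulk", "load", "-k", keyspace, "-t", table, "-url", in_dir]


def s3_copy(source: str, destination: str) -> list[str]:
    """Recursively copy between a local directory and an s3:// URI."""
    return ["aws", "s3", "cp", source, destination, "--recursive"]


def run_all(commands, runner=None) -> None:
    """Run commands in order, stopping at the first failure."""
    runner = runner or (lambda cmd: subprocess.run(cmd, check=True))
    for cmd in commands:
        runner(cmd)


def archive_plan(keyspace: str, table: str, local_dir: str, s3_uri: str):
    """Export first, then upload; deleting from Cassandra comes afterwards."""
    return [dsbulk_unload(keyspace, table, local_dir), s3_copy(local_dir, s3_uri)]


def restore_plan(keyspace: str, table: str, local_dir: str, s3_uri: str):
    """Download the CSV files, then load them into the cluster."""
    return [s3_copy(s3_uri, local_dir), dsbulk_load(keyspace, table, local_dir)]
```

For example, `run_all(archive_plan("ks1", "events", "/tmp/events", "s3://my-bucket/events"))` would export and upload one table; with 168 tables you would loop over the table names.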

If you're new to Cassandra, I'd recommend you familiarise yourself with how Backups and Restores work. Cheers!
