davi_prosesor2008_192135 asked

What is the best way to do offsite backup of Cassandra?

I have 6 Cassandra nodes set up as a single cluster, with 3 nodes in datacenter A and the other 3 in datacenter B. I am using the network topology strategy for this cluster. Previously I thought I didn't need offsite backups, but now I need them for my Cassandra cluster. When I checked the DataStax website for how to back up Cassandra, the information was not very detailed. Could anyone help me? And do I need to back up each node?


1 Answer

Erick Ramirez answered

We recommend using Medusa for performing backups of open-source Apache Cassandra clusters. The tool started out at Spotify and is now maintained by The Last Pickle (now a part of DataStax).

Medusa allows you to back up Cassandra clusters, with support for saving backups to local filesystems and to cloud storage (offsite backups), including Google Cloud Storage, AWS S3 and other storage providers supported by Apache Libcloud.

As a side note in case you weren't aware, K8ssandra, which is the ready-made platform for running Apache Cassandra in Kubernetes, has Medusa bundled in as well as:

  • Reaper for automated repairs
  • Metrics Collector for monitoring with Prometheus + Grafana
  • Stargate, a data platform for connecting to Cassandra using REST API, GraphQL API and JSON/Doc API.

All of these components are open-source so they are free to use and supported by DataStax.

For the last part of your question, you need to back up all the nodes in all the DCs for the backup to be considered valid and complete, regardless of the replication factor set. Cheers!
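To make the "all the nodes in all the DCs" rule concrete, here is a minimal Python sketch that checks a backup run against the cluster topology and reports any nodes it missed. The function name, the DC names and the node names are hypothetical and only mirror the 3 + 3 layout in the question; this is not part of Medusa or Cassandra.

```python
def missing_from_backup(topology, backed_up):
    """Return {dc: [nodes]} for every node that has no backup.

    topology  -- dict mapping a datacenter name to its list of node names
    backed_up -- set of node names that completed a backup
    An empty result means every node in every DC is covered.
    """
    missing = {}
    for dc, nodes in topology.items():
        absent = sorted(node for node in nodes if node not in backed_up)
        if absent:
            missing[dc] = absent
    return missing


# Hypothetical layout: 3 nodes in each of two datacenters.
topology = {
    "DC_A": ["a1", "a2", "a3"],
    "DC_B": ["b1", "b2", "b3"],
}

# Backing up only one DC leaves the other DC's nodes uncovered.
print(missing_from_backup(topology, {"a1", "a2", "a3"}))
```

A backup set would only count as complete when the function returns an empty dict.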

Thank you, Mr. Erick Ramirez. I have some questions before I try Medusa. I am still learning how to back up and restore in Cassandra. As you explained, we need to back up all nodes, but what if we only back up one datacenter? Is that possible, since our application uses LOCAL_QUORUM as its consistency level? My next question is how to restore an incremental backup, since I can't find any information on the web about restoring incremental backups. Thank you.
Erick Ramirez replied to davi_prosesor2008_192135:

No, you need to back up the whole cluster for the backup to be valid. There are no guarantees that a backup of just one DC is complete. It's a case of "if you have to ask, you shouldn't do it".

Restoring incremental backups is exactly the same as restoring from snapshots except you are also restoring whatever backups there are in the backups/ directory. Cheers!
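As a rough illustration of that point, the Python sketch below lists the files a restore would need for one table: everything in the chosen snapshot plus everything in the backups/ directory. It assumes the usual on-disk layout, where each table directory contains snapshots/<snapshot name>/ and backups/ subdirectories. The function name and the example path are hypothetical, and the sketch only lists files; copying them back and loading them into Cassandra is not covered.

```python
from pathlib import Path


def files_to_restore(table_dir, snapshot_name):
    """List the files needed to restore one table on one node.

    Collects the files from snapshots/<snapshot_name>/ first, then the
    incremental backup files from backups/. A missing directory is
    skipped rather than treated as an error.
    """
    table_dir = Path(table_dir)
    sources = (
        table_dir / "snapshots" / snapshot_name,  # the full snapshot
        table_dir / "backups",                    # incremental backups
    )
    files = []
    for source in sources:
        if source.is_dir():
            files.extend(sorted(p for p in source.iterdir() if p.is_file()))
    return files


if __name__ == "__main__":
    # Hypothetical table directory and snapshot name; adjust to your node.
    for path in files_to_restore("/var/lib/cassandra/data/my_ks/my_table-1234", "snap1"):
        print(path)
```

The same listing would need to be repeated for every table on every node, in line with the whole-cluster rule above.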
