Gangadhara M.B asked:

What is the recommended tool for backups and restores of open-source Cassandra?

I have a 9-node Apache Cassandra 3.11.6 cluster with a single DC, running on AWS EC2 instances.

Each EC2 instance is an r5.2xlarge with a 1 TB gp2 SSD volume.

The customer runs batches to load data into the Cassandra cluster. There are four batch runs per day, and each batch runs for about 4 hours.

The customer has asked the DBA team for a solid backup and restore/recovery approach covering the following:

1) Take a backup just before each batch run starts, so that if a batch run fails, the database can be restored from the most recent backup. Before a batch runs, the application gateway is put into blackout so that no application traffic reaches the cluster during the batch data load. The gateway is reopened to applications only after the batch job completes successfully.

2) The customer doesn't want to add another DC to use for DR/backup; they are always looking for cost-saving solutions.

3) What is the best backup and recovery method or tool for the above use case, provided it's fast and simple to use?

4) The customer may not want to take EC2 volume-level backups and push them to S3.

Is a Cassandra snapshot still the best option, or is there another open-source tool for Apache Cassandra that fits the above use cases?

Snapshot backups consume less space, are easy to take, and are local to each node, but recovering/restoring from them needs automation and expertise.




1 Answer

Erick Ramirez answered:

You should be able to meet your customer's requirements with nodetool snapshot. The operation is also a lot simpler because, as you stated in (1), there won't be any application traffic during the batch window.

For what it's worth, it isn't necessary to stand up another DC in (2) just to do backups. I should also point out that the EBS volume snapshots you mentioned in (4) are not a recommended way of backing up Cassandra because (a) not all the data files would be consistent and (b) data in memtables would not be included in the backup until it is flushed to disk (which C* snapshots do).

Example backup

Here is an example where I take a snapshot of the community keyspace.

STEP B1 - Create a snapshot:

$ nodetool snapshot community
Requested creating snapshot(s) for [community] with snapshot name [1591083719993] and options {skipFlush=false}
Snapshot directory: 1591083719993

The directory name 1591083719993 is a Unix timestamp in milliseconds for when the snapshot was created and is equivalent to June 2, 2020 7:41am GMT. There is one table called users in my example keyspace, and its snapshot is located in the table's data directory under users-6140f420a4a411ea9212efde68e7dd4b/snapshots/1591083719993/.
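To map a default snapshot name back to a wall-clock time, drop the last three digits (the milliseconds) and pass the remaining seconds to date. This is a minimal sketch that assumes GNU coreutils date; BSD and macOS date take `-r <seconds>` instead of `-d @<seconds>`. The function name snapshot_name_to_utc is hypothetical.

```shell
#!/usr/bin/env bash
# Convert a default nodetool snapshot name (Unix epoch in milliseconds) to a UTC date.
# ASSUMPTION: GNU coreutils date; on BSD/macOS use `date -u -r "$secs"` instead.
snapshot_name_to_utc() {
  local millis="$1"
  local secs=$(( millis / 1000 ))   # drop the milliseconds; date works in seconds
  date -u -d "@${secs}" '+%Y-%m-%d %H:%M:%S UTC'
}

snapshot_name_to_utc 1591083719993   # prints 2020-06-02 07:41:59 UTC
```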


Example restore

Here is an example where I restore the table.

PREPARATION - Shut down the application(s) to temporarily stop traffic to the cluster, then clear out the data from all the target tables. This is the command I used on my table:

cqlsh> TRUNCATE TABLE community.users;
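With 10 or more tables, typing one TRUNCATE per table is error-prone. One option, sketched below under stated assumptions, is to generate the statements from the snapshot itself so that exactly the tables you are about to restore get truncated. The function name truncate_statements is hypothetical, and data_dir must be one of the node's data_file_directories.

```shell
#!/usr/bin/env bash
# Print one TRUNCATE statement per table that has a snapshot with the given name.
# ASSUMPTIONS: data_dir is one of the node's data_file_directories and tag is the
# snapshot name (e.g. 1591083719993, or whatever was passed to nodetool snapshot -t).
truncate_statements() {
  local data_dir="$1" keyspace="$2" tag="$3"
  local snap table
  for snap in "$data_dir/$keyspace"/*/snapshots/"$tag"; do
    [ -d "$snap" ] || continue                              # no match: glob stays literal
    table="$(basename "$(dirname "$(dirname "$snap")")")"   # e.g. users-6140f420...
    table="${table%-*}"                                     # strip the -<table id> suffix
    printf 'TRUNCATE TABLE %s.%s;\n' "$keyspace" "$table"
  done
}
```

For example, on any one node (TRUNCATE applies cluster-wide, so it only needs to run once): `truncate_statements /var/lib/cassandra/data community 1591083719993 > truncate.cql` followed by `cqlsh -f truncate.cql`. /var/lib/cassandra/data is the default for package installs; adjust it if your cassandra.yaml points elsewhere.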

STEP R1 - To restore a table, copy the data files in the snapshot subfolder to the table's directory. Using my table's snapshot example above:

$ cd users-6140f420a4a411ea9212efde68e7dd4b
$ cp -p snapshots/1591083719993/*.db .

STEP R2 - Force C* to reload the SSTables on disk:

$ nodetool refresh community users

Check the debug.log on the node to confirm that the SSTables were loaded.

STEP R3 - Repeat steps R1-R2 until all tables have been restored.
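For a keyspace with many tables, steps R1-R3 can be folded into a loop over every table directory that contains the named snapshot. The sketch below is one way to do it, not an official tool; restore_keyspace and the NODETOOL override are hypothetical names, and it copies only the *.db files, exactly as in step R1.

```shell
#!/usr/bin/env bash
# Restore every table of one keyspace on THIS node from a named snapshot by looping
# steps R1-R2. Run it on each node after the tables have been truncated.
# ASSUMPTIONS: data_dir is one of the node's data_file_directories; tag is the snapshot
# name; NODETOOL can be overridden (e.g. NODETOOL=echo) for a dry run.
restore_keyspace() {
  local data_dir="$1" keyspace="$2" tag="$3"
  local nodetool="${NODETOOL:-nodetool}"
  local snap table_dir table
  for snap in "$data_dir/$keyspace"/*/snapshots/"$tag"; do
    [ -d "$snap" ] || continue
    table_dir="$(dirname "$(dirname "$snap")")"   # .../<keyspace>/<table>-<id>
    table="$(basename "$table_dir")"
    table="${table%-*}"                           # strip the -<table id> suffix
    # R1: copy the snapshot's SSTable files back into the live table directory.
    set -- "$snap"/*.db
    if [ -e "$1" ]; then
      cp -p "$@" "$table_dir"/ || return 1
    fi
    # R2: make Cassandra load the copied SSTables without a restart.
    "$nodetool" refresh "$keyspace" "$table" || return 1
  done
}
```

Run it on every node after the TRUNCATE, as the user that owns the data files (typically cassandra), for example `restore_keyspace /var/lib/cassandra/data community 1591083719993`, then check debug.log on each node as described in step R2.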


Both the backup and restore examples I posted only do the job on one node. With the exception of TRUNCATE (which applies cluster-wide, so it only needs to be run once from any node), all the steps need to be carried out on all nodes in the cluster.

To achieve this, I recommend using tools you already have in your environment. If you are already using orchestration tools like Ansible, create the snapshots in parallel by running the command on all nodes simultaneously. Similarly, you can also script the restore operation so you can execute it in parallel using Ansible.

If you are not using orchestration tools, consider using Cluster SSH (cssh) or Parallel SSH (pssh) so you can run commands simultaneously on all nodes in your cluster.
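If neither Ansible nor pssh is available, plain ssh with background jobs gives a similar effect. This is a minimal sketch, assuming passwordless SSH to every node; run_on_all_nodes, the SSH override and nodes.txt are hypothetical names.

```shell
#!/usr/bin/env bash
# Run the same command on every node at once over SSH; report which nodes failed.
# ASSUMPTIONS: the hosts file (hypothetical, e.g. nodes.txt) lists one node per line and
# passwordless SSH is already set up. SSH can be overridden (e.g. SSH=echo) for a dry run.
run_on_all_nodes() {
  local hosts_file="$1" cmd="$2"
  local ssh_cmd="${SSH:-ssh}"
  local failed_log host
  failed_log="$(mktemp)"
  while IFS= read -r host; do
    [ -n "$host" ] || continue
    # </dev/null stops ssh from swallowing the rest of the hosts file.
    { "$ssh_cmd" "$host" "$cmd" </dev/null || echo "$host" >> "$failed_log"; } &
  done < "$hosts_file"
  wait   # block until the command has finished on every node
  if [ -s "$failed_log" ]; then
    echo "FAILED on: $(tr '\n' ' ' < "$failed_log")" >&2
    rm -f "$failed_log"
    return 1
  fi
  rm -f "$failed_log"
}
```

For example, `run_on_all_nodes nodes.txt "nodetool snapshot -t prebatch-$(date -u +%Y%m%d-%H%M) community"` takes a named pre-batch snapshot on all 9 nodes. Because $(date ...) is expanded once on the machine running the script, every node gets the same snapshot name (set with nodetool's -t option), which makes it easy to pick the right snapshot when restoring.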

Backup archive

I realise your customer does not have a requirement for off-server backups, but I'll mention this anyway for completeness. For open-source Apache Cassandra, we recommend using the Medusa backup software maintained by the team at The Last Pickle (who coincidentally joined DataStax in early 2020).

Medusa provides a facility for archiving backups to NFS via a local mount point as well as off-server locations including AWS S3 and Google Cloud Storage. It also makes it easy to perform restores on the same cluster or another remote cluster (cloning) from off-server backup archives.

For more info, see Backing up and restoring data. Cheers!


Gangadhara M.B commented:

Thanks a lot for the detailed explanation of taking a backup of the community keyspace and later restoring it.

My question is how to restore an entire keyspace from its keyspace-specific snapshot, saved under the backup directory, across all nodes in a 9-node single-DC cluster.

Erick Ramirez commented:

Have a read through my answer again. The concept of restoring a keyspace in fact involves restoring all the tables within that keyspace. Cheers!

Gangadhara M.B commented:

Again, thanks a lot. Since my cluster has 9 nodes, do I need to run the same set of commands (copy the SSTables from each table's snapshot directory to <data_dir>/<keyspace>/<table>) on all nodes in parallel, and then run the refresh command on each node?

My requirement is not just to recover one table; it is to restore an entire keyspace, which may have 10 or more tables.
