Cerberus asked Erick Ramirez commented

Is it possible to take a snapshot of only the original data and not the replica data?

Hi all,

I have a 4-node cluster (2 tables) with a replication factor (RF) of 3. I have inserted 50k records into the cluster (test data; the actual size will be much larger). Since RF is 3, there should ideally be a total of 50k * 3 records stored across the cluster.

If I go to the Cassandra data directory on one of the 4 nodes and open the folders for the respective tables, I can see the SSTable files. There is only one data.db file. Is the replica data that the node is responsible for also stored in that same data.db file, or are the replicas stored in other data.db files?

I currently need to take a backup snapshot of this 4-node cluster. In order to reduce disk usage, I only want to back up the original data, meaning I want to avoid backing up the replicas (they are just redundant, right?).

Is the snapshot tool capable of doing this? Or is it possible to achieve this manually? If the replica data is stored in a different data.db file, I could remove it manually.

Any help is appreciated.



1 Answer

Erick Ramirez answered Erick Ramirez commented

It is not possible to select specific partitions (records) when backing up a cluster.

Cassandra does not have a concept of master/slave, primary/secondary or active/standby; all nodes are the same in a shared-nothing architecture, and therefore there is no single point of failure. Because of this, all replica nodes in a cluster are equal. They all own the data equally, so it isn't possible to be selective about what to back up.
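To make the numbers from the question concrete, here is a toy sketch of replica placement on a 4-node ring with RF = 3. It is an illustration only, not Cassandra's actual partitioner or replication strategy: the node names, the token values and the evenly spread "hash" are all assumptions made up for this example.

```python
from collections import Counter

def replicas_for(token, ring, rf):
    """Return the rf nodes that store a key with this token.

    Toy model: find the first node whose token is >= the key's token,
    then keep walking the ring until rf nodes have been collected.
    """
    start = next((i for i, (t, _) in enumerate(ring) if t >= token), 0)
    return [ring[(start + k) % len(ring)][1] for k in range(rf)]

# Hypothetical ring: (node token, node name), sorted by token.
ring = [(25, "node1"), (50, "node2"), (75, "node3"), (100, "node4")]
rf = 3
records = 50_000

counts = Counter()
for key in range(records):
    token = key % 100 + 1  # toy stand-in for a hash, tokens 1..100
    for node in replicas_for(token, ring, rf):
        counts[node] += 1  # every replica stores the full row

total = sum(counts.values())
print(dict(counts), total)
```

In this model every node ends up storing 37,500 rows and the cluster stores 150,000 in total (50k * 3). Each node simply writes every row it receives to its own SSTables, which is why a per-node snapshot cannot separate "original" rows from "replica" rows.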

In addition, the snapshot command uses the Java IO file utilities under the hood to create backups of SSTables. Unlike backups in traditional RDBMSs, no database operation takes place.

Under the hood, the snapshot command simply creates hard links to the original SSTable inodes at the filesystem level. This is what makes it really quick to create a snapshot of data files: no filesystem copy takes place, just a hard link, similar to the way the Linux ln command works. Cheers!
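The hard-link behaviour can be seen with a small filesystem experiment. This is a sketch in Python, not Cassandra's own (Java) code; the file name `example-Data.db`, the `snapshots/demo` directory and the helper `snapshot_by_hard_link` are hypothetical names chosen for illustration.

```python
import os
import tempfile

def snapshot_by_hard_link(sstable_path, snapshot_dir):
    """Hard-link sstable_path into snapshot_dir and return the link's path."""
    os.makedirs(snapshot_dir, exist_ok=True)
    link_path = os.path.join(snapshot_dir, os.path.basename(sstable_path))
    os.link(sstable_path, link_path)  # same effect as `ln src dst`
    return link_path

with tempfile.TemporaryDirectory() as table_dir:
    # Stand-in for an SSTable data file (hypothetical name and contents).
    sstable = os.path.join(table_dir, "example-Data.db")
    with open(sstable, "wb") as f:
        f.write(b"x" * 1024)

    link = snapshot_by_hard_link(
        sstable, os.path.join(table_dir, "snapshots", "demo")
    )

    original_stat = os.stat(sstable)
    link_stat = os.stat(link)
    same_inode = original_stat.st_ino == link_stat.st_ino
    link_count = original_stat.st_nlink  # directory entries pointing at the inode

print(same_inode, link_count)
```

Both paths point at the same inode, so the snapshot takes no extra space for the data itself at the moment it is created. The link operates on whole files, which is another reason a snapshot cannot pick individual rows out of an SSTable.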

2 comments

So let's say we have a keyspace with RF = 2, and I can see a single *data.db file in the data directory. Do both the replica data and the original data of that keyspace on the node reside in that single *data.db file?


Correct, yes. All mutations (writes) are sent to all replicas, and the replicas don't discriminate between them since they have no concept of primary vs copy. Cheers!
