Cerberus asked ·

Is it possible to take a snapshot of only the original data and not the replica data?

Hi all,

I have a 4-node cluster (2 tables) with a replication factor (RF) of 3. I have inserted 50k records into the cluster (this is test data; the actual size will be much larger). Since RF is 3, there should ideally be a total of 50k * 3 = 150k copies of the records across the cluster.

If I go to the Cassandra data directory on one of the 4 nodes and open the folder for each table, I can see the SSTable files. There is only one data.db file. Is the replica data that the node is responsible for also stored in that same data.db file, or are the replicas stored in other data.db files?

I currently need to take a backup snapshot of this 4-node cluster. In order to reduce disk usage, I want to back up only the original data, meaning I want to avoid backing up the replicas (they are just redundant, right?).

Is the snapshot tool capable of doing this? Or is it possible to achieve this manually? If the replica data is stored in a different data.db file, I could remove it manually.

Any help is appreciated.

Thanks

backup

1 Answer

Erick Ramirez answered ·

It is not possible to select specific partitions (records) when backing up a cluster.

Cassandra does not have a concept of master/slave, primary/secondary or active/standby: all nodes are the same in a shared-nothing architecture, so there is no single point of failure. Because of this, every replica node in a cluster is equal. They all own the data equally, so it isn't possible to be selective about what to back up.
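To make the "all replicas are equal" point concrete, here is a minimal sketch in Python using the numbers from the question (4 nodes, RF 3, 50k rows). It is a toy model only: the node names, the `replicas_for` helper and the MD5-modulo placement are assumptions made for illustration, not how Cassandra's partitioner and token ranges actually assign data. What it shows is that every node ends up holding a share of the 150k copies, and nothing in the placement marks any copy as "original".

```python
import hashlib
from collections import Counter

NODES = ["node1", "node2", "node3", "node4"]  # hypothetical node names
RF = 3          # replication factor from the question
ROWS = 50_000   # number of test records from the question


def replicas_for(key, nodes=NODES, rf=RF):
    """Toy placement (an assumption, not Cassandra's real algorithm):
    hash the key to a starting node, then take the next rf-1 nodes
    around the ring."""
    start = int(hashlib.md5(key.encode()).hexdigest(), 16) % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(rf)]


# Count how many row copies each node would store.
copies_per_node = Counter()
for i in range(ROWS):
    for node in replicas_for(f"row-{i}"):
        copies_per_node[node] += 1

total_copies = sum(copies_per_node.values())
print(total_copies)            # 150000, i.e. ROWS * RF
print(dict(copies_per_node))   # roughly ROWS * RF / 4 copies on each node
```

In this model each node stores every row it is a replica for in the same place, which mirrors why a single data.db file on a node cannot be split into "original" and "replica" parts.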

In addition, the snapshot command uses Java's file I/O utilities to create backups of SSTables. No database operation takes place, unlike backups in a traditional RDBMS.

Under the hood, the snapshot command simply creates hard links to the original SSTable inodes at the filesystem level. This is what makes creating a snapshot of the data files so quick: no filesystem copy takes place, just a hard link, similar to the way the Linux ln command works. Cheers!
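The hard-link behaviour described above can be sketched in Python with `os.link`. This is only an illustration of hard links at the filesystem level, not the snapshot tool itself: the file name `example-Data.db`, the `snapshots/my_snapshot` directory and the `demo_hard_link_snapshot` function are all hypothetical, and the demo runs in a temporary directory rather than a real Cassandra data directory.

```python
import os
import tempfile


def demo_hard_link_snapshot():
    """Create a fake data file, hard-link it into a 'snapshot' directory,
    and report what the filesystem says about the two names."""
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical stand-in for an SSTable data file.
        sstable = os.path.join(tmp, "example-Data.db")
        with open(sstable, "wb") as f:
            f.write(b"x" * 1024)

        # Hypothetical snapshot directory layout.
        snapshot_dir = os.path.join(tmp, "snapshots", "my_snapshot")
        os.makedirs(snapshot_dir)
        link = os.path.join(snapshot_dir, "example-Data.db")

        # A hard link adds a second name for the same inode; no data is copied.
        os.link(sstable, link)

        src, dst = os.stat(sstable), os.stat(link)
        result = {
            "same_inode": src.st_ino == dst.st_ino,
            "link_count": src.st_nlink,
        }

        # Removing the original name leaves the data reachable via the link.
        os.remove(sstable)
        result["snapshot_survives"] = os.path.exists(link)
        result["snapshot_size"] = os.path.getsize(link)
        return result


if __name__ == "__main__":
    print(demo_hard_link_snapshot())
```

Because both names point at the same inode, the link takes no extra space for the data when it is created, and the whole file is linked as a unit. There is no step at which individual records inside the file could be filtered out.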

2 comments

So let's say we have a keyspace with RF = 2, and I can see a single *data.db file in the data directory. Do both the replica and the original data of that keyspace on the node reside in that single *data.db file?


Correct, yes. All mutations (writes) are sent to all replicas, and none of those replicas discriminates between them, since there is no concept of primary vs. copy. Cheers!
