Hardlinks are not a Cassandra concept, so this isn't really a Cassandra question, but I'll try to explain.
Hardlinks are implemented by the server's underlying filesystem: each one is a directory entry that points to the same filesystem inode as the original SSTable file. This means they are controlled and managed by the filesystem, not by Cassandra.
In a Linux filesystem, creating a hardlink to a file simply creates a new directory entry that points to the same inode as the original file. The original filename is itself a hardlink, meaning that if you create two more hardlinks, the inode has 3 pointers to it.
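The filesystem keeps a count of those pointers (the inode's link count), which stat can print with its %h format. The sketch below, assuming Linux with GNU coreutils, builds a throwaway scratch directory with illustrative names and shows the count going from 1 to 3 after two hardlinks are added.

```shell
#!/usr/bin/env bash
# Sketch: watch an inode's link count grow as hardlinks are added.
# Assumes Linux with GNU coreutils (stat -c); all names are illustrative.
set -euo pipefail

work=$(mktemp -d)                 # throwaway scratch directory
trap 'rm -rf "$work"' EXIT        # clean up when the script exits
cd "$work"
mkdir datadir snapshot yetanotherdir
echo "some data" > datadir/users.txt

# A new file already has one link: its own directory entry.
before=$(stat -c '%h' datadir/users.txt)

# Two more directory entries pointing at the same inode.
ln datadir/users.txt snapshot/somefile.txt
ln datadir/users.txt yetanotherdir/users.txt

after=$(stat -c '%h' datadir/users.txt)
echo "links before: $before, after: $after"   # prints: links before: 1, after: 3
```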
Let me illustrate with an example. Consider a text file in a directory:
$ ls -i *
datadir:
17337322 users.txt
The text file's inode number is 17337322.
If I create a hardlink in another directory called snapshot and give it a different filename:
$ ln datadir/users.txt snapshot/somefile.txt
The new file somefile.txt has a different filename but the same inode as users.txt:
$ ls -i *
datadir:
17337322 users.txt

snapshot:
17337322 somefile.txt
It may have a different filename, but it's the exact same file. I can also create another hardlink in yet another directory and give it the same filename:
$ ln datadir/users.txt yetanotherdir/users.txt
$ ls -i *
datadir:
17337322 users.txt

snapshot:
17337322 somefile.txt

yetanotherdir:
17337322 users.txt
To be clear, there aren't 3 copies of the file. There is only one file, with 3 directory entries in 3 different directories, all pointing to the same inode.
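One way to confirm that all three names are a single file is to ask find for every path that shares the inode, either with -samefile or with -inum and the number ls -i reports. This is a sketch assuming GNU findutils and coreutils on Linux; it rebuilds the same illustrative layout in a scratch directory.

```shell
#!/usr/bin/env bash
# Sketch: list every directory entry that shares one inode.
# Assumes Linux with GNU findutils/coreutils; all names are illustrative.
set -euo pipefail

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"
mkdir datadir snapshot yetanotherdir
echo "some data" > datadir/users.txt
ln datadir/users.txt snapshot/somefile.txt
ln datadir/users.txt yetanotherdir/users.txt

# -samefile matches any path that refers to the same inode as the given file.
find . -samefile datadir/users.txt | sort
# ./datadir/users.txt
# ./snapshot/somefile.txt
# ./yetanotherdir/users.txt

# Same result by inode number, as shown by ls -i.
inode=$(stat -c '%i' datadir/users.txt)
find . -inum "$inode" | sort

# Writing through one name is visible through the others: it is one file.
echo "more data" >> snapshot/somefile.txt
tail -n 1 yetanotherdir/users.txt   # prints: more data
```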
If I delete the original datadir/users.txt, the other files remain because they are hardlinks:
$ rm datadir/users.txt
$ ls -i *
datadir:

snapshot:
17337322 somefile.txt

yetanotherdir:
17337322 users.txt
To answer your question: no, the snapshot does not get corrupted when SSTables get compacted away. The filesystem does not free an inode while any directory entry still points to it, and the hardlinks in the snapshots subdirectories are exactly such entries. For this reason, you need to manually clean up old snapshots that you no longer require, because they take up disk space, as stated in the Taking a snapshot document:
A single snapshot requires little disk space. However, snapshots can cause your disk usage to grow more quickly over time because a snapshot prevents old obsolete data files from being deleted.
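The snapshot-then-compaction sequence can be imitated with plain shell commands. In this sketch, assuming Linux with GNU coreutils, the directory layout (data/ks/users/snapshots/mysnap) and the SSTable filename are hypothetical stand-ins loosely modelled on a table's data directory; nothing here talks to Cassandra. Deleting the live file plays the role of compaction removing an obsolete SSTable.

```shell
#!/usr/bin/env bash
# Sketch: imitate "snapshot, then compaction" with plain hardlinks.
# Hypothetical layout and filename; no Cassandra involved.
# Assumes Linux with GNU coreutils (head -c, stat -c).
set -euo pipefail

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
table="$work/data/ks/users"              # hypothetical table data directory
mkdir -p "$table/snapshots/mysnap"       # hypothetical snapshot tag "mysnap"

# Stand-in for an SSTable data file: 1 MiB of random bytes.
head -c 1048576 /dev/urandom > "$table/nb-1-big-Data.db"
orig_sum=$(cksum < "$table/nb-1-big-Data.db")

# "Take a snapshot": hardlink the live file into snapshots/<tag>/.
snap="$table/snapshots/mysnap/nb-1-big-Data.db"
ln "$table/nb-1-big-Data.db" "$snap"

# "Compaction": the obsolete SSTable's live name is deleted.
rm "$table/nb-1-big-Data.db"

links=$(stat -c '%h' "$snap")            # only the snapshot entry remains
snap_sum=$(cksum < "$snap")              # checksum of the surviving bytes
if [ "$orig_sum" = "$snap_sum" ]; then
  echo "snapshot intact, link count: $links"   # prints: snapshot intact, link count: 1
fi

# The blocks are released only when the last link goes away,
# i.e. when the snapshot itself is removed.
rm -r "$table/snapshots/mysnap"
```

The matching checksums show that the snapshot's bytes are untouched by the "compaction", and the link count of 1 shows that the snapshot entry is now the only thing keeping those blocks allocated on disk.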
Follow the instructions in Deleting snapshot files for details. Cheers!