Jyothi asked Erick Ramirez answered

Do two incremental snapshots contain duplicate data?

Will two incremental snapshots contain duplicate data?

backup

steve.lacerda answered steve.lacerda edited

Snapshots are point in time. What does that mean? When a node creates an incremental snapshot, the snapshot holds hard links to the SSTable files rather than copies of them. Nothing is ever stored twice: the file is not moved or copied, it just gains another link.

Let me provide an example:

1) Node takes a full snapshot and has 1 sstable file:

sstable-1.db

2) Some data is added and now we have an additional sstable file:

sstable-2.db

3) An incremental snapshot is taken. It captures only the changes made since the full snapshot, so we now have a link to sstable-2.db.

These are hard links, not copies of the files.

4) Some data is added and now we have an additional sstable file:

sstable-3.db

5) Another incremental snapshot is taken, which again captures only the changes since the last full snapshot. You now have links to both sstable-2.db and sstable-3.db. Remember, these are point in time: this snapshot only cares about the changes since the last full snapshot, not about the previous incremental snapshots.

If a compaction occurs and one of the SSTable files is deleted from the live filesystem, its data is not actually removed, because the snapshot's hard links still reference it. Each SSTable therefore takes up space only once on disk; every additional reference is a hard link with minimal overhead.
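To make the hard-link behaviour concrete, here is a minimal Python sketch. It does not touch Cassandra at all: the directory names and the `demo_hard_link` helper are made up for illustration, and it assumes a filesystem that supports hard links, with both directories on the same volume.

```python
import os
import tempfile


def demo_hard_link():
    """Show that data survives deletion of the 'live' file via a hard link."""
    with tempfile.TemporaryDirectory() as root:
        # Hypothetical layout: a live data directory and a snapshot directory.
        live_dir = os.path.join(root, "data")
        snap_dir = os.path.join(root, "snapshots")
        os.makedirs(live_dir)
        os.makedirs(snap_dir)

        live_file = os.path.join(live_dir, "sstable-1.db")
        snap_file = os.path.join(snap_dir, "sstable-1.db")
        with open(live_file, "wb") as f:
            f.write(b"row data")

        # "Snapshot": a second directory entry pointing at the same inode.
        os.link(live_file, snap_file)
        same_inode = os.stat(live_file).st_ino == os.stat(snap_file).st_ino
        links_before = os.stat(snap_file).st_nlink

        # "Compaction" removes the live entry; the snapshot link keeps the data.
        os.remove(live_file)
        links_after = os.stat(snap_file).st_nlink
        with open(snap_file, "rb") as f:
            surviving = f.read()

    return same_inode, links_before, links_after, surviving


same_inode, links_before, links_after, surviving = demo_hard_link()
print(same_inode, links_before, links_after, surviving)
```

The link count drops from two to one when the live file is removed, but the bytes are still readable through the snapshot path. The space is only reclaimed once the last link is deleted.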


Erick Ramirez answered

No two incremental backups (on the same Cassandra node) will ever contain the same SSTables, so data is never duplicated between them. This is because each incremental backup is taken when a memtable is flushed to disk as a new SSTable.

Incremental snapshots use a completely different mechanism from generic backups, which take snapshots of all SSTables already on disk. To put it differently:

  • Incremental backups are snapshots of newly created SSTables when memtables are flushed to disk.
  • Generic backups are snapshots of existing SSTables.
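The difference between the two mechanisms can be sketched as a toy simulation in Python. This is not Cassandra code: `flush`, `snapshot` and `demo` are hypothetical helpers, and the `backups/` and `snapshots/<tag>/` subdirectories are only meant to mirror the general layout under a table's data directory.

```python
import os
import tempfile


def flush(table_dir, name, incremental_backups=True):
    """Simulate a memtable flush: write a new SSTable file and, if
    incremental backups are enabled, hard-link it into backups/."""
    path = os.path.join(table_dir, name)
    with open(path, "wb") as f:
        f.write(name.encode())
    if incremental_backups:
        backups = os.path.join(table_dir, "backups")
        os.makedirs(backups, exist_ok=True)
        os.link(path, os.path.join(backups, name))


def snapshot(table_dir, tag):
    """Simulate a generic snapshot: hard-link every SSTable that
    currently exists in table_dir into snapshots/<tag>/."""
    target = os.path.join(table_dir, "snapshots", tag)
    os.makedirs(target)
    for name in sorted(os.listdir(table_dir)):
        src = os.path.join(table_dir, name)
        if os.path.isfile(src):  # skip the backups/ and snapshots/ directories
            os.link(src, os.path.join(target, name))
    return sorted(os.listdir(target))


def demo():
    with tempfile.TemporaryDirectory() as table_dir:
        flush(table_dir, "sstable-1.db", incremental_backups=False)
        full = snapshot(table_dir, "full")   # links what is already on disk
        flush(table_dir, "sstable-2.db")     # linked at flush time
        flush(table_dir, "sstable-3.db")     # linked at flush time
        backups = sorted(os.listdir(os.path.join(table_dir, "backups")))
        return full, backups


full, backups = demo()
print(full, backups)
```

In this sketch each flushed file is linked into `backups/` exactly once, at the moment it is created, while the snapshot links whatever happens to be on disk when it runs.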

Cheers!
