Bringing together the Apache Cassandra experts from the community and DataStax.


igor.rmarinho_185445 asked

Restoring archived commitlogs does not recover deleted rows

Hi,

I have a question about incremental backups in Cassandra.

My commitlog_archiving.properties is set like this:

# to script multiple commands and add a pointer here.
archive_command=/bin/bash /backup_cassandra/inc_backup.sh %path %name
# Command to execute to make an archived commitlog live again.
# Parameters: %from is the full path to an archived commitlog segment (from restore_directories)
#             %to is the live commitlog directory
# Example: restore_command=/bin/cp -f %from %to
restore_command=cp -f /cassandra/backup/* /cassandra/commitlog/
# Directory to scan the recovery files in.
restore_directories=/cassandra/commitlog/

This part just copies the live logs to /cassandra/backup/:

archive_command=/bin/bash /backup_cassandra/inc_backup.sh (works)

/backup_cassandra/inc_backup.sh (works):

#! /bin/bash

cp /cassandra/commitlog/*.log /cassandra/backup

In this step I'm copying the archived files to the live commitlog directory, but it does not seem to be working: when I check the live commitlog folder, the archived files are not there, only the live commitlogs.

restore_command=cp -f /cassandra/backup/* /cassandra/commitlog/

1 - Am I doing something wrong?

2 - Should I delete the live commitlogs before doing the point-in-time restore?

3 - Should I always leave archive_command= active and enable the rest only when I want to do a restore?

Sorry to ask so many questions, but the DataStax documentation is not clear enough.

EDIT - What I'm doing is:

I create a simple table with names, insert a few names, delete some of them, and flush the memtables to disk to generate archive logs (sometimes I just shut down and start up Cassandra to force the archive logs to be created). After that I perform the step below on all nodes with the same point_in_time date, for example 2020:04:10 13:10:00, but the deleted rows are not being restored.

4 comments

Hi,

Can you read this DataStax article about restoring archived commit logs?

https://support.datastax.com/hc/en-us/articles/115001593706-Manual-Backup-and-Restore-with-Point-in-time-and-table-level-restore-

Best regards.


Hi, I have already read all the available material and it is not clear enough.

I have been trying to make it work the whole week without success.

Thanks


Hi,

Can you send the individual steps of your restore, and any errors you are getting?

Thanks.


Hi dmngaya,

I described all the steps above. I get no error message, but the restore is not working.

What I'm doing is:

I create a simple table with names, insert a few names, delete some of them, and flush the memtables to disk to generate archive logs (sometimes I just shut down and start up Cassandra to force the archive logs to be created). After that I perform the step below on all nodes with the same point_in_time date, for example 2020:04:10 13:10:00, but the deleted rows are not being restored.


Thank you


1 Answer

Erick Ramirez answered

@igor.rmarinho_185445 The issue you're experiencing doesn't appear to be a problem with restoring commit logs. When you delete rows in Cassandra, it is in fact doing an INSERT of a tombstone marker with a timestamp of when the tombstone was inserted.

Restoring data from backups doesn't restore only the raw data. The metadata is restored as well, including the timestamp of each write (INSERT, UPDATE, DELETE). When you try to read the restored rows, they have an older timestamp than the tombstone. In Cassandra, the newest timestamp wins, so C* knows that the rows were deleted and will not return them as live rows in a read request. You can get more insight into this by running the query in cqlsh with tracing enabled (TRACING ON). Cheers!
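To make the "newest timestamp wins" point concrete, here is a minimal Python sketch of that reconciliation rule. It is a toy model, not Cassandra code: the Cell class, the reconcile and read helpers, and the timestamps are all made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Cell:
    """One stored version of a value; value=None stands in for a tombstone."""
    value: Optional[str]
    timestamp: int  # write time (illustrative units)


def reconcile(a: Cell, b: Cell) -> Cell:
    """Return the winning version: the higher timestamp wins.

    On a timestamp tie this toy model prefers the tombstone.
    """
    if a.timestamp != b.timestamp:
        return a if a.timestamp > b.timestamp else b
    return a if a.value is None else b


def read(versions: List[Cell]) -> Optional[str]:
    """Merge all stored versions and return the live value, or None if deleted."""
    winner = versions[0]
    for cell in versions[1:]:
        winner = reconcile(winner, cell)
    return winner.value


insert = Cell("alice", timestamp=1_000)
delete = Cell(None, timestamp=2_000)

# Restoring the old insert again adds nothing newer than the tombstone.
result_after_restore = read([insert, delete, insert])

# Only a write with a timestamp newer than the tombstone becomes visible.
result_after_rewrite = read([insert, delete, Cell("alice", timestamp=3_000)])
```

In this model the restored insert still carries timestamp 1000, so the tombstone at 2000 keeps shadowing it, which matches the behaviour described above.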

2 comments

Hi Erick,

I changed gc_grace_seconds to 1 second to expire all the tombstones in my table, then:
- Set restore_point_in_time=2020:04:14 15:03:00 on all 3 of my nodes, which is right after my inserts.
- After that I did some inserts and forced the archive log to be generated.
- Shut down all nodes and manually ran cp -f /cassandra/backup/* /cassandra/commitlog/ on each of them.
- Started them up again.

The deleted rows are still not there.
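One thing worth keeping in mind about the gc_grace_seconds change: as far as I know, lowering it only makes tombstones eligible for removal, and they are actually dropped when a compaction rewrites the data. The Python sketch below is a simplified model of that idea under stated assumptions; the Entry layout and the is_purgeable and compact helpers are hypothetical, and real compaction has further conditions that are ignored here.

```python
from typing import Dict, List, Optional, Tuple

# (key, value, write_time_seconds); value=None marks a tombstone.
Entry = Tuple[str, Optional[str], int]


def is_purgeable(deleted_at: int, gc_grace_seconds: int, now: int) -> bool:
    """True once more than gc_grace_seconds have passed since the delete."""
    return deleted_at < now - gc_grace_seconds


def compact(entries: List[Entry], gc_grace_seconds: int, now: int) -> List[Entry]:
    """Keep the newest entry per key and drop tombstones past gc_grace."""
    newest: Dict[str, Tuple[Optional[str], int]] = {}
    for key, value, ts in entries:
        if key not in newest or ts > newest[key][1]:
            newest[key] = (value, ts)
    kept: List[Entry] = []
    for key, (value, ts) in newest.items():
        if value is None and is_purgeable(ts, gc_grace_seconds, now):
            continue  # tombstone is old enough: dropped during this compaction
        kept.append((key, value, ts))
    return kept


on_disk: List[Entry] = [("alice", "alice", 100), ("alice", None, 200)]

# Time passing alone changes nothing on disk: the tombstone is still there.
still_has_tombstone = any(value is None for _, value, _ in on_disk)

# Only a compaction run after gc_grace has elapsed removes it.
after_compaction = compact(on_disk, gc_grace_seconds=1, now=300)
```

Note that in this model the compaction removes the shadowed insert together with the tombstone, so purging tombstones does not by itself bring deleted rows back.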

Erick Ramirez (replying to igor.rmarinho_185445)

@igor.rmarinho_185445 It doesn't sound like you've traced your query, so you're not seeing what's going on. You also didn't indicate whether you saw the commit logs being replayed on startup.

You will also need to TRUNCATE the table before restoring the backups to make sure there is no shadowed data. Cheers!
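The effect of truncating first can be sketched with a small Python model. This is only an illustration under assumptions: the replay and live_rows helpers, the in-memory table layout, and the timestamps are hypothetical, and the rule "skip mutations written after the restore point" is a simplification of what a point-in-time replay does.

```python
from typing import Dict, List, Optional, Tuple

# (key, value, timestamp); value=None marks a delete.
Mutation = Tuple[str, Optional[str], int]
Table = Dict[str, Tuple[Optional[str], int]]


def replay(table: Table, log: List[Mutation], restore_point: int) -> None:
    """Apply logged mutations whose timestamp is not after restore_point."""
    for key, value, ts in log:
        if ts > restore_point:
            continue  # written after the restore point: skipped
        current = table.get(key)
        if current is None or ts > current[1]:
            table[key] = (value, ts)  # newest timestamp wins


def live_rows(table: Table) -> List[str]:
    """Keys whose winning version is not a tombstone."""
    return sorted(key for key, (value, _) in table.items() if value is not None)


archived_log: List[Mutation] = [
    ("alice", "alice", 100),
    ("bob", "bob", 110),
    ("bob", None, 200),  # the delete happens after the restore point
]
restore_point = 150

# Case 1: the table still holds the tombstone, which shadows the replayed insert.
existing: Table = {"alice": ("alice", 100), "bob": (None, 200)}
replay(existing, archived_log, restore_point)
rows_without_truncate = live_rows(existing)

# Case 2: the table was truncated first, so nothing shadows the replayed data.
truncated: Table = {}
replay(truncated, archived_log, restore_point)
rows_after_truncate = live_rows(truncated)
```

In the first case the replayed insert for "bob" loses to the tombstone that is already in the table; in the second case the table is empty, the delete is skipped because it is after the restore point, and "bob" comes back.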
