igor.rmarinho_185445 asked ·

Why is the data missing when I restore a full backup on a node?

Hi,

I'm trying to restore a full backup as described on the DataStax website, but it's not recognizing the keyspaces from my backup.

I pass the snapshot tar file and the snapshot name as variables. The files are in the right place, inside /cassandra/data, but when I log in none of my keyspaces are there, only the system keyspaces.

Any ideas?

Thank you.

if [ -z "$1" ]
then
        echo "ERROR: Snapshot_name.tar, keyspace_name or table_name missing"
elif [ -z "$2" ]
then
        echo "ERROR: Snapshot_name.tar, keyspace_name or table_name missing"
else

service dse stop
yum remove "dse-*"

#Completely purge any existing data or configuration directories and files.
rm -rf /etc/dse/*
rm -rf /var/lib/cassandra/*
rm -rf /var/log/cassandra/*
rm -rf /usr/share/dse/*
rm -rf /var/log/spark/*
rm -rf /var/lib/spark/*
rm -rf /var/lib/dsefs/*
rm -rf /var/log/cassandra/*
rm -rf /media/dse/*
rm -rf /media/cassandra/*
rm -rf /media/commitlog/*
rm -rf /media/cdc_raw/*

# Install cassandra from terraform
yum install dse-full-6.8.0 -y

# Extract your backup to the root of your drive
tar -xvf ${BACKUP_PATH}/${BACKUP_NAME} -C /

# Move the snapshot files up two levels to their table directories.
# SNAPSHOT_NAME must hold the name of the snapshot created during the backup.
# The -path pattern is double-quoted so that ${SNAPSHOT_NAME} is expanded by the shell.
find "${CASSANDRA_DATA}" -mindepth 2 -path "*/snapshots/${SNAPSHOT_NAME}/*" -type f \
-exec bash -c 'mv "$1" "${1%/*}/../.."' _ {} \;

# Fix ownership, then start DSE
chown -R cassandra:cassandra /media/
chown -R cassandra:cassandra /var/lib/cassandra/
chown -R cassandra:cassandra /var/log/cassandra/
chown -R cassandra:cassandra /var/run/cassandra/
chown -R cassandra:cassandra /usr/share/dse/
chown -R cassandra:cassandra /var/log/spark/
chown -R cassandra:cassandra /var/lib/spark/
chown -R cassandra:cassandra /var/lib/dsefs/
chown -R cassandra:cassandra /etc/dse/
chown -R cassandra:cassandra /usr/share/dse/
service dse start

fi

media/cassandra/data/system_auth/role_members-0ecdaa87f8fb3e6088d174fb36fe5c0d/snapshots/lab-usa-daily-full-10.20.50.3-202006301802/manifest.json
media/cassandra/data/system_auth/role_permissions-3afbe79f219431a7add7f5ab90d8ec9c/snapshots/lab-usa-daily-full-10.20.50.3-202006301802/manifest.json
media/cassandra/data/system_auth/roles-5bc52802de2535edaeab188eecebb090/snapshots/lab-usa-daily-full-10.20.50.3-202006301802/ba-1-bti-TOC.txt
media/cassandra/data/system_auth/roles-5bc52802de2535edaeab188eecebb090/snapshots/lab-usa-daily-full-10.20.50.3-202006301802/ba-1-bti-Filter.db
media/cassandra/data/system_auth/roles-5bc52802de2535edaeab188eecebb090/snapshots/lab-usa-daily-full-10.20.50.3-202006301802/ba-1-bti-Statistics.db
media/cassandra/data/system_auth/roles-5bc52802de2535edaeab188eecebb090/snapshots/lab-usa-daily-full-10.20.50.3-202006301802/ba-1-bti-Data.db
media/cassandra/data/system_auth/roles-5bc52802de2535edaeab188eecebb090/snapshots/lab-usa-daily-full-10.20.50.3-202006301802/ba-1-bti-Rows.db
media/cassandra/data/system_auth/roles-5bc52802de2535edaeab188eecebb090/snapshots/lab-usa-daily-full-10.20.50.3-202006301802/ba-1-bti-CompressionInfo.db
media/cassandra/data/system_auth/roles-5bc52802de2535edaeab188eecebb090/snapshots/lab-usa-daily-full-10.20.50.3-202006301802/ba-1-bti-Partitions.db
media/cassandra/data/system_auth/roles-5bc52802de2535edaeab188eecebb090/snapshots/lab-usa-daily-full-10.20.50.3-202006301802/ba-1-bti-Digest.crc32
media/cassandra/data/system_auth/roles-5bc52802de2535edaeab188eecebb090/snapshots/lab-usa-daily-full-10.20.50.3-202006301802/manifest.json

Erick Ramirez answered ·

Backups in Cassandra

As Bettina already pointed out, the way you are restoring the data is not valid. A "full backup" in Cassandra isn't the same concept as in relational databases. For example, a full C* backup is not the equivalent of a cold backup in Oracle, so you can't just copy the files to a server as you would when restoring a cold backup.

Bettina's already pointed out the appropriate ways to perform restores in Cassandra so I won't go over them again. Instead, let me answer your question directly.

Root cause

The reason you cannot "read" the data after the restore is that you never created the non-system keyspace(s) and table(s), so the schema doesn't exist. Cassandra will not look for the data files on disk if the tables don't exist.

In addition, if you just unpacked the files with the original directory structure, Cassandra wouldn't find the data files you copied. Each table's directory name on the filesystem is unique and is specific to each cluster's schema version.

In older versions of Cassandra, dropping a table doesn't remove the data files from the disk. If an operator creates a table with the exact same name, the data files left on disk from the old version of the table result in all sorts of errors, particularly if the old schema (primary key, partition key, columns) doesn't match the new schema.

To prevent this from happening, the table's CF ID (column family ID) has been appended to the table's directory name since Cassandra 2.1 (CASSANDRA-5202).
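To illustrate the naming scheme, here is a minimal shell sketch that splits a table directory name into the table name and the ID suffix. The helper names `table_name_of` and `table_id_of` are made up for this example; the directory name is taken from the listing in the question. It relies on the fact that table names cannot contain hyphens, so the last hyphen always separates the name from the ID.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: split "<table>-<id>" into its two parts.
# Accepts a bare directory name or a full path, with or without a trailing slash.
table_name_of() {
  local dir="${1%/}"        # drop a trailing slash, if any
  dir="${dir##*/}"          # keep only the last path component
  printf '%s\n' "${dir%-*}" # everything before the last hyphen
}

table_id_of() {
  local dir="${1%/}"
  dir="${dir##*/}"
  printf '%s\n' "${dir##*-}" # everything after the last hyphen
}

table_name_of "media/cassandra/data/system_auth/roles-5bc52802de2535edaeab188eecebb090"
# prints: roles
table_id_of "media/cassandra/data/system_auth/roles-5bc52802de2535edaeab188eecebb090"
# prints: 5bc52802de2535edaeab188eecebb090
```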

Solution

Create the schema using the schema.cql included in the snapshot of each table.
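As a rough sketch of that step, the helper below lists every schema.cql that belongs to a given snapshot so each file can be fed to `cqlsh -f`. The function name `find_snapshot_schemas` is hypothetical, and the usage comment assumes that the keyspaces have already been created and that `cqlsh` can connect to the node without extra options; adjust both to your environment.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the path of every schema.cql stored under
# snapshots/<snapshot> anywhere below the given data directory.
find_snapshot_schemas() {
  local data_dir="$1" snapshot="$2"
  find "$data_dir" -type f -path "*/snapshots/${snapshot}/schema.cql" | sort
}

# Usage (assumption: keyspaces exist and cqlsh connects with default settings):
#   find_snapshot_schemas /media/cassandra/data "$SNAPSHOT_NAME" |
#     while IFS= read -r schema_file; do
#       cqlsh -f "$schema_file"
#     done
```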

After you've created the schema, you need to move the data files for each table into the new respective directories. For what it's worth, the CF ID in the directory suffix is a time UUID and has to match the CF ID in the new schema. Cheers!
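One way to sketch the file move: for each table in the unpacked backup, look up the directory that Cassandra created for the same keyspace and table name on the new node, and copy the snapshot files into it. The function `restore_snapshot_files` is a hypothetical helper, not a DataStax tool, and it assumes the backup was unpacked somewhere other than the live data directory and that each table has exactly one directory on the target.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy the files of one snapshot into the table
# directories of a target data directory. Directories are matched by
# keyspace and table name only, so the ID suffix may differ.
restore_snapshot_files() {
  local backup_data="$1" target_data="$2" snapshot="$3"
  local snap_dir table_dir keyspace table target
  for snap_dir in "$backup_data"/*/*/snapshots/"$snapshot"; do
    [ -d "$snap_dir" ] || continue
    table_dir="${snap_dir%/snapshots/*}"   # .../<keyspace>/<table>-<id>
    keyspace="$(basename "$(dirname "$table_dir")")"
    table="$(basename "$table_dir")"
    table="${table%-*}"                    # strip the old ID suffix
    # Directory created by CREATE TABLE on the new node (first match wins)
    target="$(find "$target_data/$keyspace" -mindepth 1 -maxdepth 1 \
      -type d -name "${table}-*" 2>/dev/null | head -n 1)"
    if [ -z "$target" ]; then
      echo "skip $keyspace.$table: no table directory (schema not created?)" >&2
      continue
    fi
    # Copy the sstable components, leaving snapshot metadata behind
    find "$snap_dir" -maxdepth 1 -type f \
      ! -name manifest.json ! -name schema.cql \
      -exec cp -p {} "$target"/ \;
  done
}

# Usage (paths are assumptions):
#   restore_snapshot_files /tmp/backup/media/cassandra/data /media/cassandra/data "$SNAPSHOT_NAME"
#   chown -R cassandra:cassandra /media/cassandra/data
```

Copying rather than moving keeps the unpacked backup intact in case the restore has to be repeated.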


Thanks! It makes sense.

Erick Ramirez replied to igor.rmarinho_185445 ·

Glad to help. Cheers!

bettina.swynnerton answered ·

Hi @igor.rmarinho_185445,

There are a couple of things that stand out to me in your post:

First: There are two ways of restoring from backup - copying sstables and loading sstables.

Copying sstables is only applicable if you are restoring onto a cluster with the exact same token layout. This is the case, for example, when you are restoring back onto the same local node.

Since you are doing a new installation of DSE, the token layout is likely to be different. If you now copy sstables, nodes will hold data that they are not responsible for, and they will be missing data that they should own.

If you want to restore your snapshot onto a newly installed cluster, you will need to stream your sstables with sstableloader. This tool will read the sstables and copy the relevant data to the relevant nodes according to the token assignment.
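A minimal sketch of the streaming approach, assuming the snapshot files have already been copied into a staging area laid out as `<staging>/<keyspace>/<table>/` (sstableloader takes the keyspace and table from the last two path components) and that the schema exists on the target cluster. The function name `load_tables`, the `LOADER` override, and the host address are placeholders for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run sstableloader once per staged table directory.
# $1 = address of a node in the target cluster, $2 = staging root.
# Set LOADER=echo to print the commands instead of running them.
load_tables() {
  local host="$1" staging="$2" table_dir
  for table_dir in "$staging"/*/*/; do
    [ -d "$table_dir" ] || continue
    ${LOADER:-sstableloader} -d "$host" "${table_dir%/}"
  done
}

# Usage (address and path are assumptions):
#   load_tables 10.20.50.3 /tmp/staging
```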


Second: You do not mention creating the schema.

Before you can copy or stream the sstables onto the new node, you need to have the matching schema for all copied keyspaces in place.

When you create the schema in CQL, this will in turn create the relevant data directories for the keyspaces and tables. You then copy the sstables into those directories.

After the copy process you can do a full restart or a nodetool refresh to refresh Cassandra's knowledge of these sstables.
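If you go the refresh route, the command has to be run once per table. A small sketch, assuming the sstables are already in place under the node's data directory; `refresh_keyspace` and the `NODETOOL` override are names invented for this example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: call "nodetool refresh <keyspace> <table>" for every
# table directory of one keyspace. The table name is the directory name
# without its ID suffix. Set NODETOOL=echo for a dry run.
refresh_keyspace() {
  local data_dir="$1" keyspace="$2" table_dir table
  for table_dir in "$data_dir/$keyspace"/*/; do
    [ -d "$table_dir" ] || continue
    table="$(basename "$table_dir")"
    ${NODETOOL:-nodetool} refresh "$keyspace" "${table%-*}"
  done
}

# Usage (data directory and keyspace name are assumptions):
#   refresh_keyspace /media/cassandra/data my_keyspace
```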


You mentioned that you followed the instructions from the website, but I am not sure which document exactly. This one is a good starting point for restoring snapshots, and it covers what I mentioned above:

https://docs.datastax.com/en/dse/6.8/dse-admin/datastax_enterprise/operations/opsBackupSnapshotRestore.html

I hope this helps you understand where this might be failing for you.



I’ll try this out! Thanks Bettina!
