azim_91_184236 asked ·

How do I handle streaming failures in sstableloader?

I am looking at this article https://community.datastax.com/questions/4477/how-to-migrate-a-subset-of-tables30-tables-out-of.html to migrate data to a new cluster, and I have a question about step 2 under the section 'cloning a table'.

$ sstableloader -d dest_node_ip1,dest_node_ip2 /path/to/community/users/

Questions:

1. If the command fails partway through streaming data to the new cluster, due to network issues or some other reason, do we re-run the command? Please suggest best practices for handling failures in sstableloader.

2. When we re-run the command on the same table, does it start streaming from the beginning again?


1 Answer

Erick Ramirez answered ·

You will need to re-run sstableloader to load the data if any of the streams are interrupted or fail. I should mention that you need to investigate the cause of the failure before re-running it, or you're likely to end up with the same failure.

Investigating failures will require that you correlate the outputs from stdout/stderr with the logs on the problematic node(s).
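To make that correlation easier, it can help to keep the loader's stdout/stderr in a timestamped file and check the exit status. Here is a minimal sketch, assuming sstableloader exits with a non-zero status when streaming fails (worth confirming on your version). `run_logged` and the `./loader-logs` directory are hypothetical names, not part of any Cassandra tooling:

```shell
# Hypothetical helper: run any command, append its stdout and stderr to a
# timestamped log file, print the log path, and pass the exit status through.
run_logged() {
    # usage: run_logged <log_dir> <command> [args...]
    local log_dir log_file status
    log_dir=$1
    shift
    mkdir -p "$log_dir"
    log_file="$log_dir/load-$(date +%Y%m%dT%H%M%S).log"
    # Keep both streams in one file so they can be lined up with the
    # node logs by time.
    "$@" >>"$log_file" 2>&1
    status=$?
    echo "$log_file"
    return "$status"
}

# Example use (assumes the command from the question):
#   log=$(run_logged ./loader-logs sstableloader -d dest_node_ip1,dest_node_ip2 /path/to/community/users/) \
#       || echo "Load failed; check $log and the destination nodes' logs before re-running"
```

It deliberately does not retry on its own, since the cause should be understood before the loader is run again.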

When you do re-run the loader, it will bulk load all the files in the source directory again. It is safe to do so because the loader only streams the relevant portions of the SSTables, and loading the same data again does not change the result. Cheers!


Thanks @Erick Ramirez for the answer! Follow-up questions:

What happens during the re-run to the data that was already inserted before the failure? Does sstableloader keep track of progress or do any checkpointing?


No, sstableloader just streams relevant portions of SSTables. The operation is idempotent so you can keep reloading the same data over and over and the result is the same.

For example, if an SSTable contains a partition tombstone, it doesn't matter how many times you load the SSTable. The result will be the same, i.e. the partition will stay deleted. Cheers!
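As a toy illustration of why reloading gives the same result (a simplified model, not Cassandra's storage engine): treat each record as `key timestamp value` and let the newest timestamp win for each key. `merge_lww` and the `TOMBSTONE` marker are made up for this sketch:

```shell
# Toy model only: merge records of the form "key timestamp value",
# keeping the record with the highest timestamp for each key.
merge_lww() {
    # usage: merge_lww <file>...
    awk '
        # Take the record if the key is new or its timestamp is newer.
        !($1 in ts) || ($2 + 0) > ts[$1] { ts[$1] = $2 + 0; val[$1] = $3 }
        END { for (k in ts) print k, ts[k], val[k] }
    ' "$@" | sort
}

# Example use, with "existing" holding data already on the cluster and
# "load" holding the records being streamed:
#   merge_lww existing load
#   merge_lww existing load load   # same output: the repeat changes nothing
```

Because the reloaded records carry the same timestamps as the first time, merging them in again cannot change which record wins.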

azim_91_184236, replying to Erick Ramirez ·

Thanks @Erick Ramirez for the clarification and guidance!
