
azim_91_184236 asked azim_91_184236 commented

How do I handle streaming failures in sstableloader?

I am following this article https://community.datastax.com/questions/4477/how-to-migrate-a-subset-of-tables30-tables-out-of.html to migrate data to a new cluster and have a question about step 2 under the section 'cloning a table'.

$ sstableloader -d dest_node_ip1,dest_node_ip2 /path/to/community/users/

Questions:

1. If the command fails in the middle of streaming data to the new cluster, due to network issues or some other reason, do we re-run the command? Please suggest best practices for handling failures in sstableloader.

2. When we re-run the command on the same table, does it start streaming from the beginning again?


1 Answer

Erick Ramirez answered azim_91_184236 commented

You will need to re-run sstableloader to load the data if any of the streams get interrupted or fail. I should mention that you need to investigate the cause of the failure before re-running it, or you're likely to end up with the same failure.

Investigating failures will require that you correlate the outputs from stdout/stderr with the logs on the problematic node(s).
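To make that correlation easier, one option is to wrap the loader so that every run leaves a timestamped record of its stdout/stderr and exit status. This is only a minimal sketch: `run_and_log` and the log file name are hypothetical helpers, not part of Cassandra, and it assumes the loader exits with a non-zero status when a stream fails.

```shell
#!/usr/bin/env bash
# run_and_log (hypothetical helper): run a command, append its stdout and
# stderr to a log file between UTC start/end markers, and return the
# command's own exit status so the caller can tell whether it failed.
run_and_log() {
    local log_file="$1"
    shift
    local status=0
    {
        echo "=== start $(date -u '+%Y-%m-%dT%H:%M:%SZ') : $*"
        "$@" || status=$?
        echo "=== end   $(date -u '+%Y-%m-%dT%H:%M:%SZ') : exit status ${status}"
    } >>"$log_file" 2>&1
    return "$status"
}

# Example usage (hosts and path are the placeholders from the question):
#   run_and_log sstableloader.log \
#       sstableloader -d dest_node_ip1,dest_node_ip2 /path/to/community/users/
#   echo "loader exit status: $?"
```

The start/end markers give you a time window to look up in the logs of the destination nodes before you decide to re-run.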

When you do re-run the loader, it will bulk load all the files in the source directory. It is safe to do so since it effectively streams portions of the SSTables. Cheers!


azim_91_184236 commented ·

Thanks @Erick Ramirez for the answer! A follow-up question:

What happens during the re-run to the data that was already inserted before the failure? Does sstableloader keep track of its progress or do any checkpointing?

Erick Ramirez ♦♦ azim_91_184236 commented ·

No, sstableloader just streams relevant portions of SSTables. The operation is idempotent so you can keep reloading the same data over and over and the result is the same.

For example, if an SSTable contains a partition tombstone, it doesn't matter how many times you load the SSTable. The result will be the same, i.e. the partition will stay deleted. Cheers!
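A toy illustration of why reloading is harmless. This is a deliberately simplified model, not Cassandra's actual code: each line is a hypothetical `key timestamp value` record, the newest timestamp wins per key, and `merge_lww` is a made-up helper name. Under those assumptions, feeding the same "SSTable" in once or twice gives the same merged result.

```shell
#!/usr/bin/env bash
# merge_lww (hypothetical helper): read "key timestamp value" lines from
# all given files and keep, for each key, the entry with the highest
# timestamp. Output is sorted so results are easy to compare.
merge_lww() {
    awk '
        NF >= 3 && (!($1 in ts) || $2 + 0 > ts[$1]) {
            ts[$1] = $2 + 0
            val[$1] = $3
        }
        END { for (k in ts) print k, ts[k], val[k] }
    ' "$@" | sort
}

# Toy data: the destination already holds two rows; the "SSTable" being
# loaded carries a newer tombstone for user1.
table="$(mktemp)"
sstable="$(mktemp)"
printf 'user1 10 alice\nuser2 10 bob\n' >"$table"
printf 'user1 20 TOMBSTONE\n' >"$sstable"

echo "loaded once:"
merge_lww "$table" "$sstable"
echo "loaded twice:"
merge_lww "$table" "$sstable" "$sstable"

rm -f "$table" "$sstable"
```

Both runs print the same two lines: `user1` stays deleted and `user2` is untouched.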

azim_91_184236 Erick Ramirez ♦♦ commented ·

Thanks @Erick Ramirez for the clarification and guidance!
