question

earthling avatar image
earthling asked Erick Ramirez commented

cqlsh.py export table COPY TO using -f hangs and leaves stuck py processes forever

Cassandra version : [cqlsh 5.0.1 | Cassandra 3.11.2 | CQL spec 3.4.4 | Native protocol v4]

We are attempting backup of cassandra tables using cqlsh.py with input file option

cqlsh.py <ip> -u <un> -p <pwd> -f export.cql

export.cql has many lines of COPY TO statements for different tables

This was actually called from

cassandra/bin/cqlsh <ip> -u <un> -p <pwd>  -f export.cql >> export.log 2>&1

At times, this process gets stuck and leaves around 16 hung python processes forever (15 child + 1 parent process on a 16 core linux system running centos 7.4).

Processes do not consume much CPU (less than 1%), around 30MB memory each, no log information providing any hint

On closer look with strace and lsof utilities, it seem child processes stuck on a socket read, parent process indefinitely looping (select) and waiting for child processes to finish

Please advise if there is any known issue or a cause that would hit this. The issue is creating unnecessary load in the system, worse compounded with our periodic backup mechanism. Any pointers towards cause/resolution would be of great help to us. Thanks

Note: Cassandra may be servicing other queries (select, insert..) in parallel from application

cqlsh
10 |1000

Up to 8 attachments (including images) can be used with a maximum of 1.0 MiB each and 10.0 MiB total.

1 Answer

Erick Ramirez avatar image
Erick Ramirez answered Erick Ramirez commented

When you export a table, you're doing a full table scan and the coordinator has to retrieve all the partitions from all nodes in the cluster. When you're exporting multiple tables, the same single coordinator is firing off multiple requests to all the tables which are all doing full table scans.

This is the reason the threads are hanging -- they're waiting for all the nodes to return the result. This may work in single-node clusters or when there's only a few hundred records in the database but it won't work in most cases.

The CQL COPY command is meant to be a development tool that can unload or load several hundred to several thousands of records but not much more than that.

I recommend you use the DataStax Bulk Loader (DSBulk) instead to export data (unload) from your cluster one table at a time. It's an open-source tool that's free to use.

If you're not familiar with DSBulk, you can quickly try it out on Astra. You will need to create a login but it's free to try with no credit card required. Cheers!

6 comments Share
10 |1000

Up to 8 attachments (including images) can be used with a maximum of 1.0 MiB each and 10.0 MiB total.

earthling avatar image earthling commented ·

Thanks for quick revert!

I see - cqlsh is not a production grade tool, agree it shall have issues on production use cases (but hung processes forever!)

Unfortunately, we chose this tool for one of our use case - while it is understandable, that it may not be performant on scale, it is not convincing that this tool shall leave hung processes forever. This is adding unnecessary load to (already loaded) system resources and we end up working on a clean up handling

As you have mentioned, threads are waiting for all nodes to return result - am curious what is causing it to be a indefinite wait and get stuck with it, can there be some timeout handling that shall allow to come out of it gracefully and possibly a retry

I will check on your recommendation (DSBulk), meanwhile please advise if there could be anything done to avoid hanging threads on cqlsh - Thanks

0 Likes 0 ·
Erick Ramirez avatar image Erick Ramirez ♦♦ earthling commented ·

There really is no point. In the time it took you to post that comment, you could have already been using DSBulk. Installing it is really as simple as:

Step 1 - Download the binary tarball:

$ curl -OL https://downloads.datastax.com/dsbulk/dsbulk.tar.gz

Step 2 - Unpack the tarball:

$ tar xzf dsbulk.tar.gz

That's it -- just 2 commands.

And to export the contents of a table:

$ cd dsbulk-*
$ bin/dsbulk unload -h node_ip -k ks_name -t table_name > some_file.csv

For more examples for unloading data with DSBulk, see https://www.datastax.com/blog/datastax-bulk-loader-unloading. Cheers!

1 Like 1 ·
earthling avatar image earthling Erick Ramirez ♦♦ commented ·

Agree @Erick Ramirez, I see there really is no point :-)

Basic use case for us is data migration on upgrades (apart from data backup for disaster recovery), we used a single approach, moreover were comfortable with human readable formats where in we could massage the data on the way while migrating if there is a need. (Note: There may not be differences in cluster topology)

Having said that - DSBulk seemingly fit the bill, as an alternative to cqlsh.

Apart, There seem other options for mentioned use cases, say nodetool/sstableloader seemingly help us but in binary forms, there may be schema migration support but really unsure of data massaging part

Your advice would be of great help for specific use cases in comparison with other tools, Thanks once again for your quick revert and continued support

0 Likes 0 ·
Show more comments