Cassandra version : [cqlsh 5.0.1 | Cassandra 3.11.2 | CQL spec 3.4.4 | Native protocol v4]
We are backing up Cassandra tables by running cqlsh.py with the input-file option (-f):
cqlsh.py <ip> -u <un> -p <pwd> -f export.cql
export.cql contains many COPY TO statements for different tables.
The command is actually invoked as:
cassandra/bin/cqlsh <ip> -u <un> -p <pwd> -f export.cql >> export.log 2>&1
At times this process gets stuck and leaves around 16 hung Python processes behind indefinitely (15 children + 1 parent on a 16-core Linux system running CentOS 7.4).
The processes consume little CPU (less than 1%) and around 30 MB of memory each, and no log output gives any hint.
On closer inspection with strace and lsof, the child processes seem to be stuck on a socket read, while the parent loops indefinitely in select(), waiting for the children to finish.
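For anyone trying to reproduce the observation, here is a minimal sketch that lists the lingering processes and their kernel state by reading /proc. It assumes Linux and Python 3.6+; the function name find_processes and the match string "cqlsh" are illustrative choices, not part of cqlsh itself.

```python
import os


def find_processes(pattern):
    """Return (pid, state, cmdline) for processes whose command line contains pattern."""
    matches = []
    if not os.path.isdir("/proc"):
        return matches  # not Linux: nothing to inspect
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # only numeric directories are processes
        try:
            with open(f"/proc/{entry}/cmdline", "rb") as fh:
                # Arguments are NUL-separated in /proc/<pid>/cmdline.
                raw = fh.read().replace(b"\0", b" ")
            cmdline = raw.decode(errors="replace").strip()
            if pattern not in cmdline:
                continue
            state = "?"
            with open(f"/proc/{entry}/status", errors="replace") as fh:
                for line in fh:
                    if line.startswith("State:"):
                        state = line.split(":", 1)[1].strip()
                        break
        except OSError:
            continue  # process exited (or is unreadable) while we were looking
        matches.append((int(entry), state, cmdline))
    return matches


if __name__ == "__main__":
    # "cqlsh" is an assumed match string; adjust to the actual command line.
    for pid, state, cmdline in find_processes("cqlsh"):
        print(pid, state, cmdline)
```

Each reported PID can then be inspected further with strace and lsof as described above.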
Please advise whether there is a known issue or cause that would trigger this. The hung processes put unnecessary load on the system, and our periodic backup mechanism compounds the problem. Any pointers towards a cause or resolution would be of great help. Thanks.
Note: Cassandra may be serving other queries (SELECT, INSERT, ...) from the application in parallel.
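As a stopgap so that hung processes do not pile up across periodic runs, the export could be launched under a time limit. This is only a sketch and does not address the underlying hang. It assumes Python 3 on the backup host and that the COPY worker processes stay in the parent's process group; run_with_timeout, the one-hour limit, and the log path in the usage comment are hypothetical.

```python
import os
import signal
import subprocess


def run_with_timeout(cmd, timeout_s, log_path=None):
    """Run cmd in its own process group and kill the whole group on timeout.

    Returns (returncode, timed_out).
    """
    log = open(log_path, "ab") if log_path else subprocess.DEVNULL
    try:
        proc = subprocess.Popen(
            cmd,
            stdout=log,
            stderr=subprocess.STDOUT,
            start_new_session=True,  # child becomes leader of a new process group
        )
        try:
            return proc.wait(timeout=timeout_s), False
        except subprocess.TimeoutExpired:
            # The group id equals the child's pid because it started a new session,
            # so this also reaches worker processes that stayed in that group.
            os.killpg(proc.pid, signal.SIGKILL)
            return proc.wait(), True
    finally:
        if log_path:
            log.close()


# Hypothetical usage; substitute the real host and credentials:
# rc, timed_out = run_with_timeout(
#     ["cassandra/bin/cqlsh", "<ip>", "-u", "<un>", "-p", "<pwd>", "-f", "export.cql"],
#     timeout_s=3600,          # assumed limit, tune to the normal export duration
#     log_path="export.log",
# )
```

A timed-out run returns a negative return code (the signal number) together with timed_out set to True, which the scheduler can use to flag the backup as failed.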