I have a 6-node Cassandra cluster (3 nodes per datacenter) where each node holds a full replica of the data.
I'm using cassandra-reaper to perform repairs, and I have to babysit Cassandra, monitoring CPU usage, RAM, and the number of pending tasks shown in:
nodetool compactionstats
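In practice I just re-run that command in a loop, something like this (the interval is arbitrary; the first line of the output is the total pending task count):

# re-check the compaction backlog every 30 seconds
watch -n 30 'nodetool compactionstats | head -n 1'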
During the repair, Cassandra suddenly starts scheduling compactions for completely unrelated keyspaces (ones that aren't being repaired at the moment). This drives CPU usage to 100% on all cores (htop reports a long-term load average above 11.0) and queries start to fail. And it doesn't stop there: the nodes slowly work through the tasks, but new ones arrive faster than they can be processed. Eventually, while watching:
nodetool status
I notice that several nodes keep flapping between up and down, and some queries keep failing until I restart those nodes manually.
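While this is going on I also look at the thread pools; this is roughly what I run (the grep is just to cut down the output):

# per-pool active/pending/blocked counts; CompactionExecutor is the pool handling compactions
nodetool tpstats | grep -E 'Pool Name|CompactionExecutor'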
Each node has the following specs:
RAM: 64 GB of VM memory, of which 32 GB is allocated to the JVM heap (-Xmx)
CPU: 8 vCPUs (2.0 GHz)
Cassandra version: 3.11.4
My data is not particularly large (before the repair my nodes reported a load of 20 GB; now they report 50 GB), but there are some tables in which a single cell can contain a lot of textual data (1000+ lines).
I have read elsewhere that repairing such rows consumes a lot of processing resources, but I'm confused as to why Cassandra starts scheduling 300+ pending compactions on those big tables while a repair is running, even though they are not part of the keyspace being repaired at that moment.
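For what it's worth, this is how I check how big the partitions in those tables actually get (the keyspace and table names here are placeholders):

# partition size and cell count percentiles for one of the tables with large text cells
nodetool tablehistograms my_keyspace my_big_table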
I've considered disabling auto compaction during the repair with:
nodetool disableautocompaction
and then re-enabling it after the repair is done, but I fear the nodes will still end up in this corrupted/confused state.
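What I had in mind is roughly the following (just a sketch, I haven't tried it yet; the keyspace name is a placeholder):

# before the repair: stop automatic compactions for the heavy keyspace
nodetool disableautocompaction my_keyspace

# ... run the repair through cassandra-reaper ...

# after the repair: turn automatic compactions back on
nodetool enableautocompaction my_keyspace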
On the nodes that are very busy with these tasks, the logs keep showing output from StatusLogger.java, but nothing that clearly indicates a problem.
Note that this seems to start happening when nodes are sending/receiving streams. Validation tasks don't seem to have an impact (besides somewhat increased CPU usage, but not 100% on all cores).
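I'm simply watching the streaming phase with:

# lists active streaming sessions (repair streams show up here) along with message stats
nodetool netstats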
Weekly repairs are recommended, but I can't schedule them if the nodes are this fragile, despite those specs.
Something I haven't done yet is switching the garbage collector to G1, which I believe is recommended for heap sizes larger than 16 GB. With the heap sized by the default memory ratio formula, the Cassandra nodes run out of memory during a repair.
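If I understand the stock 3.11 conf/jvm.options correctly, the switch would look roughly like this (abridged; the G1 section ships commented out and the CMS section enabled, so the change is just swapping which block is commented):

### CMS Settings (commented out to disable CMS)
#-XX:+UseParNewGC
#-XX:+UseConcMarkSweepGC

### G1 Settings (uncommented to enable G1)
-XX:+UseG1GC
-XX:G1RSetUpdatingPauseTimePercent=5
-XX:MaxGCPauseMillis=500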
Is this behavior familiar? Is there something I'm missing in terms of configuration? Is it a bug? Is Cassandra supposed to start all these compaction tasks (for seemingly all tables) while a repair is running for a single keyspace?
After a lot of repairs that simply failed, I have already changed some of the parameters recommended in those cases. I also went through the configuration and raised things like the throughput settings somewhat, since the disks and network can handle more, but it seems to have had no effect.
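For reference, these are the kinds of settings I mean in cassandra.yaml (the values here are illustrative, not my exact ones):

compaction_throughput_mb_per_sec: 64                 # default is 16
stream_throughput_outbound_megabits_per_sec: 400     # default is 200

As far as I know, the same values can also be changed at runtime with nodetool setcompactionthroughput and nodetool setstreamthroughput.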
Thanks in advance!