We have a DSE 6.0.4 cluster with 3 DCs and 12 nodes in total:
- DC1 - Search DC - 3 nodes
- DC2 - SearchAnalytics DC - 3 nodes
- DC3 - Analytics DC - 6 nodes
In system.log on the DC3 nodes, we constantly get the following errors:

ERROR [MessagingService-Incoming-/10.1.87.91] 2020-05-11 16:06:03,075 MessagingService.java:825 - java.util.concurrent.RejectedExecutionException while receiving WRITES.WRITE from /10.1.87.91, caused by: Too many pending remote requests!

INFO [ScheduledTasks:1] 2020-05-11 16:06:03,924 DroppedMessages.java:104 - MUTATION messages were dropped in last 5000 ms: 2 internal and 689 cross node. Mean internal dropped latency: 2046 ms and Mean cross-node dropped latency: 2084 ms
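To see how steadily mutations are being dropped over a longer period, the DroppedMessages summary lines can be tallied across the whole system.log. This is a minimal Python sketch that assumes every summary line uses exactly the wording shown in the log excerpt above; the function name tally_dropped is hypothetical and not part of any DSE tooling.

```python
import re
from collections import defaultdict

# Assumption: summary lines follow the wording in the DroppedMessages.java log line above.
DROPPED_RE = re.compile(
    r"(?P<verb>\w+) messages were dropped in last (?P<window>\d+) ms: "
    r"(?P<internal>\d+) internal and (?P<cross>\d+) cross node"
)


def tally_dropped(lines):
    """Sum internal and cross-node dropped counts per message type (e.g. MUTATION)."""
    totals = defaultdict(lambda: [0, 0])
    for line in lines:
        m = DROPPED_RE.search(line)
        if m:
            totals[m["verb"]][0] += int(m["internal"])
            totals[m["verb"]][1] += int(m["cross"])
    # Return plain tuples: (internal_dropped, cross_node_dropped)
    return {verb: tuple(counts) for verb, counts in totals.items()}


if __name__ == "__main__":
    sample = [
        "INFO [ScheduledTasks:1] 2020-05-11 16:06:03,924 DroppedMessages.java:104 - "
        "MUTATION messages were dropped in last 5000 ms: 2 internal and 689 cross node. "
        "Mean internal dropped latency: 2046 ms and Mean cross-node dropped latency: 2084 ms",
    ]
    print(tally_dropped(sample))  # {'MUTATION': (2, 689)}
```

For a real node, pass an open file handle instead of the sample list, pointing at wherever system.log lives on that node.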
nodetool tpstats on one of the DC3 nodes shows pending tasks for HintsDispatcher:

| Pool Name | Active | Pending (w/Backpressure) | Delayed | Completed | Blocked | All time blocked |
|---|---|---|---|---|---|---|
| HintsDispatcher | 2 | 10 (N/A) | N/A | 0 | 0 | 0 |
Also, the /var/lib/cassandra/hints directory is 204G in size.
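To check which target node the accumulated hints are destined for, the hint files can be grouped by the host ID in their names. This Python sketch assumes the Cassandra 3.x naming scheme for hint files (target host ID, creation timestamp and format version, ending in .hints, with a companion .crc32 file), which DSE 6.0 is expected to share; check it against the actual file names first. The function names hints_bytes_by_host and human are hypothetical.

```python
import os
import re
import sys
from collections import defaultdict

# Assumption: hint files are named <host-id-uuid>-<timestamp>-<version>.hints
HINT_FILE_RE = re.compile(
    r"^(?P<host_id>[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})"
    r"-(?P<ts>\d+)-(?P<version>\d+)\.hints$"
)


def hints_bytes_by_host(hints_dir):
    """Return {target_host_id: total_bytes} for .hints files (ignores .crc32 files)."""
    totals = defaultdict(int)
    for name in os.listdir(hints_dir):
        m = HINT_FILE_RE.match(name)
        if m:
            totals[m["host_id"]] += os.path.getsize(os.path.join(hints_dir, name))
    return dict(totals)


def human(n):
    """Format a byte count using binary units."""
    for unit in ("B", "KiB", "MiB", "GiB", "TiB"):
        if n < 1024 or unit == "TiB":
            return f"{n:.1f} {unit}"
        n /= 1024


if __name__ == "__main__":
    directory = sys.argv[1] if len(sys.argv) > 1 else "/var/lib/cassandra/hints"
    by_host = hints_bytes_by_host(directory)
    # Largest backlog first
    for host_id, size in sorted(by_host.items(), key=lambda kv: -kv[1]):
        print(f"{host_id}  {human(size)}")
```

The printed host IDs can then be matched to IP addresses using the Host ID column of nodetool status.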
I have tried running nodetool repair -full on every node in the cluster, one node at a time, but the Cassandra hints directory is still around 200G, and the HintsDispatcher active and pending task counts are not decreasing.
What should we try next?