We have a Cassandra ring of 21 nodes spread over 3 datacenters. We have 10 keyspaces, all with RF=3, and all tables use STCS (SizeTieredCompactionStrategy).
The cluster has been running for more than 8 months. We are getting complaints that Cassandra times out when a certain application issues reads. The read volume is not high, but the reads touch data that is not in cache.
Normally there is no load problem. We do see tombstone warnings during queries.
We have been looking for the cause. We have deployed Reaper for repairs, and it now runs without issues.
What we see is that our Cassandra data directories contain a number of SSTables that seem very old considering how active the application is. Some SSTables (*big-Data.db) are 8 months old, for example, while there are also many newer ones.
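
For reference, a minimal sketch of how the SSTable ages can be listed. It only looks at file modification times, so it shows when each SSTable was written, not how old the data inside it is (the sstablemetadata tool reports the actual timestamps). The function name `sstable_ages` and the path /var/lib/cassandra/data are assumptions for illustration; adjust the path to your own data_file_directories setting.

```python
import time
from pathlib import Path


def sstable_ages(data_dir, now=None):
    """Return (age_in_days, path) for each *-Data.db file under data_dir, oldest first."""
    now = time.time() if now is None else now
    ages = []
    for path in Path(data_dir).rglob("*-Data.db"):
        # Snapshot and incremental-backup copies are not live SSTables, so skip them.
        if "snapshots" in path.parts or "backups" in path.parts:
            continue
        age_days = (now - path.stat().st_mtime) / 86400
        ages.append((age_days, path))
    # Sort on age only, largest (oldest) first.
    ages.sort(key=lambda item: item[0], reverse=True)
    return ages


if __name__ == "__main__":
    # Assumed default data directory; change it to match cassandra.yaml.
    for age_days, path in sstable_ages("/var/lib/cassandra/data")[:20]:
        print(f"{age_days:8.1f} days  {path}")
```

Run on one node, this prints the 20 oldest live SSTables with their age in days.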
We see that minor compactions are running.
Is it correct to assume that we should not have such old SSTables? If so, what can we do to correct it, and how can we detect issues like this?