btenlighted asked:

Cassandra nodes going down with "Too many open files error"

We are using Cassandra DataStax Community Edition 3.9.0 in our demo environment. It is currently running with three nodes in GCP (4 vCPUs, 7.5 GB memory, and 500 GB data disks each, on Ubuntu Linux). We have seen the following exception in system.log a couple of times. It seems to happen during the compaction process.

It seems this error occurs when the process has an insufficient max-open-files limit. I have checked that the Cassandra process has the recommended settings in its /proc/<pid>/limits file:

Max open files 100000 100000 files

Does this exception mean that we need to increase these settings beyond 100000?
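For anyone checking the same thing, one way to compare the process's live descriptor count against its configured limit is a sketch like the following (the pgrep pattern is an assumption and may need adjusting for your setup):

```shell
# Find the Cassandra JVM (pattern is an assumption; adjust if needed)
pid=$(pgrep -f CassandraDaemon | head -n 1)

# Current number of open file descriptors for that process
echo "open fds: $(ls "/proc/$pid/fd" | wc -l)"

# Configured soft/hard limits, for comparison
grep 'Max open files' "/proc/$pid/limits"
```

If the open-fd count is close to the soft limit, the limit is genuinely being exhausted rather than mis-applied.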

WARN [HintsWriteExecutor:1] 2020-04-22 15:28:02,233 - open(/var/lib/cassandra/hints, O_RDONLY) failed, errno (24).
WARN [HintsWriteExecutor:1] 2020-04-22 15:28:22,248 - open(/var/lib/cassandra/hints, O_RDONLY) failed, errno (24).
WARN [HintsWriteExecutor:1] 2020-04-22 15:28:42,251 - open(/var/lib/cassandra/hints, O_RDONLY) failed, errno (24).
ERROR [CompactionExecutor:23417] 2020-04-22 15:28:53,143 - Exception in thread Thread[CompactionExecutor:23417,1,main]
java.lang.RuntimeException: java.nio.file.FileSystemException: /var/lib/cassandra/data/us_where/tagpositionhistory-da427a405fb111e882a03fc4ddb7bb64/mc-143294-big-Data.db: Too many open files
    at ~[apache-cassandra-3.9.0.jar:3.9.0]
    at<init>( ~[apache-cassandra-3.9.0.jar:3.9.0]
    at<init>( ~[apache-cassandra-3.9.0.jar:3.9.0]
    at$Builder.getChannel( ~[apache-cassandra-3.9.0.jar:3.9.0]
    at$Builder.complete( ~[apache-cassandra-3.9.0.jar:3.9.0]
    at$Builder.buildData( ~[apache-cassandra-3.9.0.jar:3.9.0]
    at ~[apache-cassandra-3.9.0.jar:3.9.0]
    at ~[apache-cassandra-3.9.0.jar:3.9.0]
    at ~[apache-cassandra-3.9.0.jar:3.9.0]
    at org.apache.cassandra.db.compaction.writers.DefaultCompactionWriter.realAppend( ~[apache-cassandra-3.9.0.jar:3.9.0]
    at org.apache.cassandra.db.compaction.writers.CompactionAwareWriter.append( ~[apache-cassandra-3.9.0.jar:3.9.0]
    at org.apache.cassandra.db.compaction.CompactionTask.runMayThrow( ~[apache-cassandra-3.9.0.jar:3.9.0]
    at ~[apache-cassandra-3.9.0.jar:3.9.0]
    at org.apache.cassandra.db.compaction.CompactionTask.executeInternal( ~[apache-cassandra-3.9.0.jar:3.9.0]
    at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute( ~[apache-cassandra-3.9.0.jar:3.9.0]
    at org.apache.cassandra.db.compaction.CompactionManager$ ~[apache-cassandra-3.9.0.jar:3.9.0]
    at java.util.concurrent.Executors$ ~[na:1.8.0_72]
    at ~[na:1.8.0_72]
    at java.util.concurrent.ThreadPoolExecutor.runWorker( ~[na:1.8.0_72]
    at java.util.concurrent.ThreadPoolExecutor$ [na:1.8.0_72]
    at [na:1.8.0_72]
Caused by: java.nio.file.FileSystemException: /var/lib/cassandra/data/us_where/tagpositionhistory-da427a405fb111e882a03fc4ddb7bb64/mc-143294-big-Data.db: Too many open files
    at sun.nio.fs.UnixException.translateToIOException( ~[na:1.8.0_72]
    at sun.nio.fs.UnixException.rethrowAsIOException( ~[na:1.8.0_72]
    at sun.nio.fs.UnixException.rethrowAsIOException( ~[na:1.8.0_72]
    at sun.nio.fs.UnixFileSystemProvider.newFileChannel( ~[na:1.8.0_72]
    at ~[na:1.8.0_72]
    at ~[na:1.8.0_72]
    at ~[apache-cassandra-3.9.0.jar:3.9.0]
    ... 20 common frames omitted
ERROR [Reference-Reaper:1] 2020-04-22 15:29:04,918 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@64cb542e) to class org.apache.cassandra.utils.concurrent.WrappedSharedCloseable$Tidy@1779214023:[Memory@[0..3e4), Memory@[0..26e8)] was not released before the reference was garbage collected
ERROR [Reference-Reaper:1] 2020-04-22 15:29:04,920 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@6e333f5e) to class$Cleanup@724675836:/var/lib/cassandra/data/us_where/tagpositionhistory-da427a405fb111e882a03fc4ddb7bb64/mc-143294-big-Index.db was not released before the reference was garbage collected

1 Answer

Erick Ramirez answered:

You are correct. You need to bump the max open files limit up to 200K or 300K to get through the backlog of compactions.
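One way to raise the limit is sketched below; the 200000 value is an example, and the right place to set it depends on how Cassandra is started on your hosts:

```shell
# PAM-based logins (e.g. package installs started via init scripts):
echo 'cassandra - nofile 200000' | sudo tee -a /etc/security/limits.conf

# If Cassandra runs as a systemd service, set the limit in a unit
# override instead (sudo systemctl edit cassandra):
#   [Service]
#   LimitNOFILE=200000
# then reload and restart:
#   sudo systemctl daemon-reload && sudo systemctl restart cassandra
```

After restarting, verify the new values took effect in /proc/<pid>/limits.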

As a side note, the number of open files may or may not be a problem in your cluster, depending on how dense the nodes are (total data size). But if the nodes are constantly overloaded, it's possible that memtables are being flushed constantly to free up memory, leaving the nodes with lots of tiny SSTables (much smaller than 32 MB, and possibly even thousands of *-Data.db files smaller than 1 MB).
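A quick way to check for that symptom is to count the data files below a size threshold. This is a sketch, assuming you point DATA_DIR at the keyspace or table directory (the default path here is illustrative):

```shell
# Directory to scan (illustrative default; override with DATA_DIR=...)
DATA_DIR="${DATA_DIR:-/var/lib/cassandra/data/us_where}"

# Data files smaller than 1 MiB -- a symptom of constant memtable flushes.
# The 'c' (bytes) suffix avoids find's round-up-to-unit behaviour with -size -1M.
find "$DATA_DIR" -name '*-Data.db' -size -1048576c | wc -l

# Total data files, for comparison
find "$DATA_DIR" -name '*-Data.db' | wc -l
```

A high ratio of tiny files to total files suggests the flush/compaction imbalance described above.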

And finally, for something unrelated to the issue you posted: I think you meant DataStax Community Edition (DSC) version 3.0.9. DSC was last released in 2017 and is no longer supported. The version of Apache Cassandra it bundles is very old, and I would recommend upgrading to Apache Cassandra 3.11.6. Cheers!


btenlighted commented:

Thanks. I have changed the max open files limit to 200000 now. I will monitor whether that gets rid of the error.

[Follow up question posted in #6861]

Erick Ramirez replied to btenlighted:

Just to be clear, increasing the max open files parameter is just a workaround. It doesn't fix the underlying issue that the nodes have thousands of tiny data files. Cheers!

P.S. I have converted your post to a comment since it's not an "answer".

btenlighted replied to Erick Ramirez:

Yes, I understand. Actually, when I checked just now, the tagpositionhistory table (for which I originally got this exception on the same node) has only 21 *-Data.db files:

  • the largest is 49 GB,
  • the smallest is 16 MB,
  • the total folder size for this table is 217 GB.

Do you think the compaction already merged the large number of db files before failing?
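The numbers above can be rechecked on the node with something like this sketch (the glob for the table directory is an assumption based on the path in the error log):

```shell
# Data files for the table, largest first
ls -lhS /var/lib/cassandra/data/us_where/tagpositionhistory-*/*-Data.db

# Total on-disk size of the table directory
du -sh /var/lib/cassandra/data/us_where/tagpositionhistory-*
```

`nodetool tablestats us_where.tagpositionhistory` also reports the live SSTable count, which excludes files pending deletion after compaction.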
