lavaraja.padala_150810 asked

Cassandra node is failing with error "Too many open files"

We have an 8-node Apache Cassandra 3.11.11 cluster. Due to disk failures on 2 nodes, we removed those nodes from the cluster. While trying to add them back, the bootstrap process fails with the error below.

ERROR [STREAM-IN-/] 2021-10-10 23:08:03,437 - Exiting forcefully due to file system exception on startup, disk failure policy "stop" /cassandra/data/keyspace1/data_tbl-b4f243f986c711e8a0bc25f553f28aa6/me-62630-big-Filter.db (Too many open files)
 at$IndexWriter.flushBf( ~[apache-cassandra-3.11.11.jar:3.11.11]
 at$IndexWriter.doPrepare( ~[apache-cassandra-3.11.11.jar:3.11.11]
 at org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.prepareToCommit( ~[apache-cassandra-3.11.11.jar:3.11.11]
 at$TransactionalProxy.doPrepare( ~[apache-cassandra-3.11.11.jar:3.11.11]
 at org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.prepareToCommit( ~[apache-cassandra-3.11.11.jar:3.11.11]
 at org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.finish( ~[apache-cassandra-3.11.11.jar:3.11.11]
 at ~[apache-cassandra-3.11.11.jar:3.11.11]
 at ~[apache-cassandra-3.11.11.jar:3.11.11]
 at ~[apache-cassandra-3.11.11.jar:3.11.11]
 at org.apache.cassandra.streaming.StreamReceiveTask.received( ~[apache-cassandra-3.11.11.jar:3.11.11]
 at org.apache.cassandra.streaming.StreamSession.receive( ~[apache-cassandra-3.11.11.jar:3.11.11]
 at org.apache.cassandra.streaming.StreamSession.messageReceived( ~[apache-cassandra-3.11.11.jar:3.11.11]
 at org.apache.cassandra.streaming.ConnectionHandler$ ~[apache-cassandra-3.11.11.jar:3.11.11]
 at ~[na:1.8.0_102]
Caused by: /cassandra/data/keyspace1/data_tbl-b4f243f986c711e8a0bc25f553f28aa6/me-62630-big-Filter.db (Too many open files)
 at Method) ~[na:1.8.0_102]

This issue is caused by the table keyspace1/data_tbl, which has too many SSTables in its data directory; many of the SSTables are smaller than 1 MB.

[cassandra@host data_tbl-b4f243f986c711e8a0bc25f553f28aa6]$ ls -lcrt |wc -l
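To quantify how many of those SSTables are tiny, a quick check like the following can help. The data directory path is taken from the question; the 1 MB threshold matches the observation above and is otherwise an assumption:

```shell
# Path from the question; adjust to your node's data directory.
DATA_DIR=/cassandra/data/keyspace1/data_tbl-b4f243f986c711e8a0bc25f553f28aa6

# Total number of SSTable component files in the directory.
ls -1 "$DATA_DIR" | wc -l

# Number of Data.db components smaller than 1 MiB.
# (-size -1024k matches files under 1024 KiB; note GNU find rounds
# sizes up per unit, so -size -1M would only match empty files.)
find "$DATA_DIR" -name '*-Data.db' -size -1024k | wc -l
```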

As a temporary fix we increased the open file limit, but that didn't resolve the issue: the node again started failing during the bootstrap process with the above error. Is there a solution for this issue?


1 Answer

Erick Ramirez answered

In my experience, when nodes have thousands of tiny files it's usually because the cluster is overloaded. When a node receives a high volume of writes, the JVM heap comes under pressure, so memtables constantly flush their contents to disk to free up memory.

The constant flushing writes small amounts of data into many SSTables. Compaction will eventually catch up and coalesce the small files into larger SSTables.
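You can watch compaction progress with nodetool. A sketch using standard nodetool commands, assuming the keyspace and table names from the question (these run against a live node, so treat them as illustrative):

```shell
# Show active and pending compactions on the node.
nodetool compactionstats

# Optionally trigger a major compaction on the affected table to
# coalesce the small SSTables sooner (this can be I/O intensive).
nodetool compact keyspace1 data_tbl
```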

Until compaction catches up, you need to increase the number of open file descriptors at the operating-system level. Our general recommendation is to set it to one million. You may need to keep increasing it temporarily until you can get through the bootstrap process. Cheers!
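As a sketch of raising the limit and verifying what the running process actually got (the file paths, user name, and `CassandraDaemon` process pattern are assumptions; adjust for your install):

```shell
# Non-systemd installs: raise the limit via pam_limits, e.g. in
# /etc/security/limits.d/cassandra.conf (assumed user "cassandra"):
#   cassandra  -  nofile  1048576
# Under systemd, set LimitNOFILE=1048576 in the unit's [Service]
# section instead, then daemon-reload and restart Cassandra.

# Verify the limit the running Cassandra process actually has
# (the main class name CassandraDaemon is an assumption):
pid=$(pgrep -f CassandraDaemon | head -n1)
grep 'Max open files' "/proc/$pid/limits"

# Count the file descriptors it currently holds open:
ls "/proc/$pid/fd" | wc -l
```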


lavaraja.padala_150810 commented:

Thank you. We will increase the limit and try.
