We have deployed stand-alone, unclustered Cassandra 3.11.2 environments with various keyspaces containing tables used for different purposes. Recently, we have intermittently seen errors like the following in the Cassandra logs, related to one of the keyspaces:
ERROR [MemtableFlushWriter:352] 2021-07-20 00:02:37,691 LogTransaction.java:273 - Transaction log [mc_txn_flush_5224b250-e90f-11eb-80d3-0dc12b55b1d5.log in /qond/apps/mfgpro/databases/cassandra/default/data/data/system_schema/columns-24101c25a2ae3af787c1b40ee1aca33f] indicates txn was not completed, trying to abort it now
ERROR [MemtablePostFlush:307] 2021-07-20 00:02:37,800 CassandraDaemon.java:228 - Exception in thread Thread[MemtablePostFlush:307,5,main]
java.lang.RuntimeException: java.nio.file.FileSystemException: /qond/apps/mfgpro/databases/cassandra/default/data/data/system_schema/columns-24101c25a2ae3af787c1b40ee1aca33f/mc_txn_flush_5224b250-e90f-11eb-80d3-0dc12b55b1d5.log: Operation not permitted
    at org.apache.cassandra.io.util.FileUtils.write(FileUtils.java:590) ~[apache-cassandra-3.11.2.jar:3.11.2]
    at org.apache.cassandra.io.util.FileUtils.appendAndSync(FileUtils.java:571) ~[apache-cassandra-3.11.2.jar:3.11.2]
    at org.apache.cassandra.db.lifecycle.LogReplica.append(LogReplica.java:85) ~[apache-cassandra-3.11.2.jar:3.11.2]
    at org.apache.cassandra.db.lifecycle.LogReplicaSet.lambda$null$5(LogReplicaSet.java:210) ~[apache-cassandra-3.11.2.jar:3.11.2]
    at org.apache.cassandra.utils.Throwables.perform(Throwables.java:113) ~[apache-cassandra-3.11.2.jar:3.11.2]
    at org.apache.cassandra.utils.Throwables.perform(Throwables.java:103) ~[apache-cassandra-3.11.2.jar:3.11.2]
    at org.apache.cassandra.db.lifecycle.LogReplicaSet.append(LogReplicaSet.java:210) ~[apache-cassandra-3.11.2.jar:3.11.2]
    at org.apache.cassandra.db.lifecycle.LogFile.addRecord(LogFile.java:324) ~[apache-cassandra-3.11.2.jar:3.11.2]
    at org.apache.cassandra.db.lifecycle.LogFile.add(LogFile.java:285) ~[apache-cassandra-3.11.2.jar:3.11.2]
    at org.apache.cassandra.db.lifecycle.LogTransaction.trackNew(LogTransaction.java:136) ~[apache-cassandra-3.11.2.jar:3.11.2]
    at org.apache.cassandra.db.lifecycle.LifecycleTransaction.trackNew(LifecycleTransaction.java:529) ~[apache-cassandra-3.11.2.jar:3.11.2]
    at org.apache.cassandra.io.sstable.format.big.BigTableWriter.<init>(BigTableWriter.java:81) ~[apache-cassandra-3.11.2.jar:3.11.2]
    at org.apache.cassandra.io.sstable.format.big.BigFormat$WriterFactory.open(BigFormat.java:92) ~[apache-cassandra-3.11.2.jar:3.11.2]
    at org.apache.cassandra.io.sstable.format.SSTableWriter.create(SSTableWriter.java:102) ~[apache-cassandra-3.11.2.jar:3.11.2]
    at org.apache.cassandra.io.sstable.SimpleSSTableMultiWriter.create(SimpleSSTableMultiWriter.java:119) ~[apache-cassandra-3.11.2.jar:3.11.2]
    at org.apache.cassandra.db.compaction.AbstractCompactionStrategy.createSSTableMultiWriter(AbstractCompactionStrategy.java:587) ~[apache-cassandra-3.11.2.jar:3.11.2]
    at org.apache.cassandra.db.compaction.CompactionStrategyManager.createSSTableMultiWriter(CompactionStrategyManager.java:1027) ~[apache-cassandra-3.11.2.jar:3.11.2]
    at org.apache.cassandra.db.ColumnFamilyStore.createSSTableMultiWriter(ColumnFamilyStore.java:518) ~[apache-cassandra-3.11.2.jar:3.11.2]
    at org.apache.cassandra.db.Memtable$FlushRunnable.createFlushWriter(Memtable.java:504) ~[apache-cassandra-3.11.2.jar:3.11.2]
    at org.apache.cassandra.db.Memtable$FlushRunnable.<init>(Memtable.java:445) ~[apache-cassandra-3.11.2.jar:3.11.2]
    at org.apache.cassandra.db.Memtable$FlushRunnable.<init>(Memtable.java:415) ~[apache-cassandra-3.11.2.jar:3.11.2]
    at org.apache.cassandra.db.Memtable.createFlushRunnables(Memtable.java:316) ~[apache-cassandra-3.11.2.jar:3.11.2]
    at org.apache.cassandra.db.Memtable.flushRunnables(Memtable.java:298) ~[apache-cassandra-3.11.2.jar:3.11.2]
    at org.apache.cassandra.db.ColumnFamilyStore$Flush.flushMemtable(ColumnFamilyStore.java:1140) ~[apache-cassandra-3.11.2.jar:3.11.2]
    at org.apache.cassandra.db.ColumnFamilyStore$Flush.run(ColumnFamilyStore.java:1105) ~[apache-cassandra-3.11.2.jar:3.11.2]
    ...
These errors occur occasionally during daily background processing that drops and creates Cassandra tables in the keyspace. The error seems to occur as part of a 'create table' operation. Once it occurs, the schema for that table is corrupted, and Cassandra cannot be started again until the table's entry is deleted from system_schema.tables. It happens with different tables, seemingly at random, and we haven't found any problem specific to those tables. Debug messages like this are then written on a subsequent Cassandra start:
No columns found for table browses.kpi__654654711_usaa in system_schema.columns. This may be due to corruption or concurrent dropping and altering of a table. If this table is supposed to be dropped, restart cassandra with -Dcassandra.ignore_corrupted_schema_tables=true and run the following query: "DELETE FROM system_schema.tables WHERE keyspace_name = 'browses' AND table_name = 'kpi__654654711_usaa';".If the table is not supposed to be dropped, restore system_schema.columns sstables from backups.
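To check whether the affected tables share a pattern (name, creation time, and so on), their names can be pulled out of these startup messages. The following is a minimal sketch that assumes the message format shown above; the function name affected_tables is hypothetical.

```python
import re

# Matches the "No columns found for table <keyspace>.<table> in system_schema.columns"
# message shown above. The format is assumed from that single sample.
NO_COLUMNS = re.compile(
    r"No columns found for table ([^.\s]+)\.(\S+) in system_schema\.columns"
)


def affected_tables(log_lines):
    """Return (keyspace, table) pairs in first-seen order, without duplicates."""
    seen = []
    for line in log_lines:
        match = NO_COLUMNS.search(line)
        if match and match.groups() not in seen:
            seen.append(match.groups())
    return seen
```

Passing an open log file to affected_tables (it iterates line by line) gives the list of tables to compare against the daily drop/create schedule.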
I have read that "java.nio.file.FileSystemException: ... Operation not permitted" usually implies a problem with permissions on the filesystem, where (for example) root-owned files not accessible by the Cassandra user might have been placed into the Cassandra directories. However, this is not the case in our environments.
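For anyone who wants to repeat the ownership check, this is roughly how it can be scripted. It is a minimal sketch, not the exact procedure we used: the function name find_suspect_entries is hypothetical, the expected UID has to be supplied by the caller, and it only looks at the owner and the owner write bit. It does not inspect ACLs, file attributes, or mount options, which would need separate checks.

```python
import os
import stat
from pathlib import Path


def find_suspect_entries(data_dir, expected_uid):
    """Return (path, reason) pairs for entries under data_dir that are not
    owned by expected_uid or that lack the owner write bit."""
    suspects = []
    for root, dirs, files in os.walk(data_dir):
        for name in dirs + files:
            path = Path(root) / name
            try:
                info = path.lstat()  # lstat: do not follow symlinks
            except OSError as exc:
                suspects.append((str(path), f"stat failed: {exc.strerror}"))
                continue
            if info.st_uid != expected_uid:
                suspects.append((str(path), f"owned by uid {info.st_uid}"))
            elif not info.st_mode & stat.S_IWUSR:
                suspects.append((str(path), "owner write bit not set"))
    return suspects
```

Calling find_suspect_entries with the Cassandra data directory and the UID of the Cassandra user returns an empty list when every file and directory is owned by that user and writable by its owner.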
We plan to remove the need for frequently dropping and re-creating tables in order to reduce churn in the Cassandra schema, but at this point I don't know to what extent that churn contributes to the problem.
Can anyone help diagnose this intermittent error? Thanks in advance!