fwy_187020 asked fwy_187020 commented

FileSystemException and corrupted schema when dropping then recreating tables

We have deployed stand-alone, unclustered Cassandra 3.11.2 environments with various keyspaces containing tables used for different purposes. Recently, we have intermittently seen errors like the following in the Cassandra logs related to one of the keyspaces.

ERROR [MemtableFlushWriter:352] 2021-07-20 00:02:37,691 LogTransaction.java:273 - Transaction log [mc_txn_flush_5224b250-e90f-11eb-80d3-0dc12b55b1d5.log in /qond/apps/mfgpro/databases/cassandra/default/data/data/system_schema/columns-24101c25a2ae3af787c1b40ee1aca33f] indicates txn was not completed, trying to abort it now
ERROR [MemtablePostFlush:307] 2021-07-20 00:02:37,800 CassandraDaemon.java:228 - Exception in thread Thread[MemtablePostFlush:307,5,main]
java.lang.RuntimeException: java.nio.file.FileSystemException: /qond/apps/mfgpro/databases/cassandra/default/data/data/system_schema/columns-24101c25a2ae3af787c1b40ee1aca33f/mc_txn_flush_5224b250-e90f-11eb-80d3-0dc12b55b1d5.log: Operation not permitted
    at org.apache.cassandra.io.util.FileUtils.write(FileUtils.java:590) ~[apache-cassandra-3.11.2.jar:3.11.2]
    at org.apache.cassandra.io.util.FileUtils.appendAndSync(FileUtils.java:571) ~[apache-cassandra-3.11.2.jar:3.11.2]
    at org.apache.cassandra.db.lifecycle.LogReplica.append(LogReplica.java:85) ~[apache-cassandra-3.11.2.jar:3.11.2]
    at org.apache.cassandra.db.lifecycle.LogReplicaSet.lambda$null$5(LogReplicaSet.java:210) ~[apache-cassandra-3.11.2.jar:3.11.2]
    at org.apache.cassandra.utils.Throwables.perform(Throwables.java:113) ~[apache-cassandra-3.11.2.jar:3.11.2]
    at org.apache.cassandra.utils.Throwables.perform(Throwables.java:103) ~[apache-cassandra-3.11.2.jar:3.11.2]
    at org.apache.cassandra.db.lifecycle.LogReplicaSet.append(LogReplicaSet.java:210) ~[apache-cassandra-3.11.2.jar:3.11.2]
    at org.apache.cassandra.db.lifecycle.LogFile.addRecord(LogFile.java:324) ~[apache-cassandra-3.11.2.jar:3.11.2]
    at org.apache.cassandra.db.lifecycle.LogFile.add(LogFile.java:285) ~[apache-cassandra-3.11.2.jar:3.11.2]
    at org.apache.cassandra.db.lifecycle.LogTransaction.trackNew(LogTransaction.java:136) ~[apache-cassandra-3.11.2.jar:3.11.2]
    at org.apache.cassandra.db.lifecycle.LifecycleTransaction.trackNew(LifecycleTransaction.java:529) ~[apache-cassandra-3.11.2.jar:3.11.2]
    at org.apache.cassandra.io.sstable.format.big.BigTableWriter.<init>(BigTableWriter.java:81) ~[apache-cassandra-3.11.2.jar:3.11.2]
    at org.apache.cassandra.io.sstable.format.big.BigFormat$WriterFactory.open(BigFormat.java:92) ~[apache-cassandra-3.11.2.jar:3.11.2]
    at org.apache.cassandra.io.sstable.format.SSTableWriter.create(SSTableWriter.java:102) ~[apache-cassandra-3.11.2.jar:3.11.2]
    at org.apache.cassandra.io.sstable.SimpleSSTableMultiWriter.create(SimpleSSTableMultiWriter.java:119) ~[apache-cassandra-3.11.2.jar:3.11.2]
    at org.apache.cassandra.db.compaction.AbstractCompactionStrategy.createSSTableMultiWriter(AbstractCompactionStrategy.java:587) ~[apache-cassandra-3.11.2.jar:3.11.2]
    at org.apache.cassandra.db.compaction.CompactionStrategyManager.createSSTableMultiWriter(CompactionStrategyManager.java:1027) ~[apache-cassandra-3.11.2.jar:3.11.2]
    at org.apache.cassandra.db.ColumnFamilyStore.createSSTableMultiWriter(ColumnFamilyStore.java:518) ~[apache-cassandra-3.11.2.jar:3.11.2]
    at org.apache.cassandra.db.Memtable$FlushRunnable.createFlushWriter(Memtable.java:504) ~[apache-cassandra-3.11.2.jar:3.11.2]
    at org.apache.cassandra.db.Memtable$FlushRunnable.<init>(Memtable.java:445) ~[apache-cassandra-3.11.2.jar:3.11.2]
    at org.apache.cassandra.db.Memtable$FlushRunnable.<init>(Memtable.java:415) ~[apache-cassandra-3.11.2.jar:3.11.2]
    at org.apache.cassandra.db.Memtable.createFlushRunnables(Memtable.java:316) ~[apache-cassandra-3.11.2.jar:3.11.2]
    at org.apache.cassandra.db.Memtable.flushRunnables(Memtable.java:298) ~[apache-cassandra-3.11.2.jar:3.11.2]
    at org.apache.cassandra.db.ColumnFamilyStore$Flush.flushMemtable(ColumnFamilyStore.java:1140) ~[apache-cassandra-3.11.2.jar:3.11.2]
    at org.apache.cassandra.db.ColumnFamilyStore$Flush.run(ColumnFamilyStore.java:1105) ~[apache-cassandra-3.11.2.jar:3.11.2]
    ...

These errors are happening occasionally during daily background processing in which Cassandra tables are being dropped and created in the keyspace. The error seems to occur as part of a 'create table' operation. Once it occurs, the schema for that table becomes corrupted and Cassandra can no longer be started until the table is deleted from system_schema.tables. It happens with different tables seemingly at random, and we haven't found any problem specific to those tables. Debug messages like this are then written on a subsequent Cassandra start:

No columns found for table browses.kpi__654654711_usaa in system_schema.columns. This may be due to corruption or concurrent dropping and altering of a table. If this table is supposed to be dropped, restart cassandra with -Dcassandra.ignore_corrupted_schema_tables=true and run the following query: "DELETE FROM system_schema.tables WHERE keyspace_name = 'browses' AND table_name = 'kpi__654654711_usaa';".If the table is not supposed to be dropped, restore system_schema.columns sstables from backups.

I have read that java.nio.file.FileSystemException: ... Operation not permitted usually implies a problem with permissions on the filesystem, where (for example) root-owned files not accessible by the Cassandra user might have been placed into the Cassandra directories. However, this is not the case in our environments.

We plan to remove the need for the frequent dropping-creating of tables in order to reduce the churn in the Cassandra schema, but at this point I don't know to what extent this is contributing to the problem.

Can anyone help diagnose this intermittent error? Thx in advance!

schema

1 Answer

Erick Ramirez answered fwy_187020 commented

The symptoms you describe suggest to me that tables are being dropped and recreated almost simultaneously, as if programmatically, without waiting for each DDL statement to propagate before the next schema change is issued.

I suspect this is causing the transaction logs for the schema to get out of sequence, so that the transactions need to be aborted. Similarly, it is causing the system schema to become corrupted. If you are making the schema changes via your app code, make sure that you wait for each schema change to propagate to all nodes in the cluster and check that the nodes have schema agreement before issuing the next one.
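To illustrate the "one schema change at a time" pattern, here is a minimal Python sketch. It is not tied to any particular driver: `execute` and `schema_in_agreement` are hypothetical callables that you would wire to your own driver's statement execution and schema-agreement check, and the timeout and poll interval are arbitrary values chosen for illustration.

```python
import time
from typing import Callable, Iterable


def run_ddl_serially(
    execute: Callable[[str], None],
    schema_in_agreement: Callable[[], bool],
    statements: Iterable[str],
    timeout_s: float = 30.0,
    poll_interval_s: float = 0.5,
) -> None:
    """Run DDL statements one at a time, waiting for schema agreement after each.

    `execute` and `schema_in_agreement` are placeholders (assumptions) for
    whatever your driver provides; nothing here is a real driver API.
    """
    for stmt in statements:
        execute(stmt)
        # Do not issue the next schema change until the previous one has settled.
        deadline = time.monotonic() + timeout_s
        while not schema_in_agreement():
            if time.monotonic() >= deadline:
                raise TimeoutError(f"no schema agreement after: {stmt}")
            time.sleep(poll_interval_s)
```

The point of the design is simply that a DROP and the following CREATE never overlap; if agreement is not reached in time, the loop stops instead of piling more DDL on top.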

For the filesystem exception, you are correct that it happens when (a) the Cassandra process does not have permissions on the file or filesystem. It can also happen when (b) the filesystem is full, or (c) the C* process cannot write to the filesystem for some other reason, including disk failure or overloaded IO. Cheers!
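If it helps to rule out causes (a) and (b), a rough check along the following lines could be run as the same OS user that runs Cassandra, pointed at the data directory named in the log. This is only a sketch: the function name, the 1 GiB free-space threshold, and the shape of the returned dict are my own assumptions, and it does not detect cause (c), disk failure or overloaded IO.

```python
import os
import shutil


def diagnose_dir(path, min_free_bytes=1 << 30):
    """Collect hints for causes (a) and (b) under one directory tree."""
    uid = os.geteuid()  # POSIX only; run this as the Cassandra OS user
    entries = [path]
    for root, dirs, files in os.walk(path):
        entries.extend(os.path.join(root, name) for name in dirs + files)

    foreign_owned, not_writable = [], []
    for entry in entries:
        try:
            owner = os.lstat(entry).st_uid
        except OSError:
            continue  # entry disappeared while scanning (e.g. after a compaction)
        if owner != uid:
            foreign_owned.append(entry)  # (a) owned by a different user
        # os.access checks against the real uid/gid of this process
        if not os.access(entry, os.W_OK):
            not_writable.append(entry)   # (a) not writable by this user

    free = shutil.disk_usage(path).free
    return {
        "foreign_owned": foreign_owned,
        "not_writable": not_writable,
        "free_bytes": free,
        "low_space": free < min_free_bytes,  # (b) filesystem (nearly) full
    }
```

Empty lists and `low_space` being false would not prove the filesystem is healthy; they only make (a) and (b) less likely at the moment the check ran.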

3 comments

fwy_187020 commented ·

Hi Erick,

Thanks for the reply. What you suggest is plausible, and we will try to modify our application to reduce table drop-create activity to address it. Because the Cassandra environments where we see the error are stand-alone and not clustered at all, we didn't expect to hit this kind of problem.

[Follow up question posted in #11949]

Thanks again for responding!

[Post converted to comment since it's not an "answer"]

Erick Ramirez ♦♦ commented ·

Just to clarify, "not clustered" is an incorrect description. It is possible to have single-node clusters -- by definition, it is still a cluster that just happens to have one node.

I'll respond to your follow-up question in a separate post since it's a different issue from your original question. Cheers!

fwy_187020 commented ·

Thanks! Just to clarify, by "unclustered" I meant that SimpleSnitch, not GossipingPropertyFileSnitch, was being used, and that SimpleStrategy rather than NetworkTopologyStrategy was being used in our keyspaces. (We plan to change that in the near future.)
