michael.guissine_30999 asked ·

Nodes going down with AssertionError after upgrading to DSE 6.7.3

After upgrading DSE from 5.1.14 to 6.7.3, we are seeing nodes go down with lots of errors like the one below in the logs. Any thoughts?

ERROR [CoreThread-0] 2019-06-25 10:15:30,519  VerbHandlers.java:77 - Unexpected error during execution of request READS.SINGLE_READ (99097935): /10.16.6.6 -> /10.16.6.4
java.lang.AssertionError: Expected valid buffer or boundary crossed
    at org.apache.cassandra.utils.flow.Flow$ReduceSubscriber.onError(Flow.java:1221)
    at org.apache.cassandra.utils.flow.FlowTransformBase.onError(FlowTransformBase.java:38)
    at org.apache.cassandra.utils.flow.FlatMap.onError(FlatMap.java:133)
    at org.apache.cassandra.utils.flow.FlowTransformBase.onError(FlowTransformBase.java:38)
    at org.apache.cassandra.utils.flow.FlowTransformBase.onError(FlowTransformBase.java:38)
    at org.apache.cassandra.utils.flow.FlowTransformBase.onError(FlowTransformBase.java:38)
    at org.apache.cassandra.utils.flow.FlowTransformBase.onError(FlowTransformBase.java:38)
    at org.apache.cassandra.utils.flow.FlowTransformBase.onError(FlowTransformBase.java:38)
    at org.apache.cassandra.utils.flow.FlowTransformBase.onError(FlowTransformBase.java:38)
    at org.apache.cassandra.utils.flow.FlatMap$FlatMapChild.onError(FlatMap.java:185)
    at org.apache.cassandra.utils.flow.FlowTransformBase.onError(FlowTransformBase.java:38)
    at org.apache.cassandra.utils.flow.FlatMap$FlatMapChild.onError(FlatMap.java:185)
    at org.apache.cassandra.io.sstable.format.AsyncPartitionReader$PartitionReader.onError(AsyncPartitionReader.java:365)

[UPDATE] Thank you @Erick Ramirez. That was (I believe) the full stack trace; however, we are also seeing the errors you mentioned (`o.a.c.utils.memory.buffers.TemporaryBufferPool`):

ERROR [CompactionExecutor:1557] 2019-06-27 10:02:45,022  CassandraDaemon.java:126 - Exception in thread Thread[CompactionExecutor:1557,5,main]
java.lang.AssertionError: Slab should have been unreferenced and all buffers returned before recycling
    at org.apache.cassandra.utils.memory.buffers.MemorySlabWithBumpPtr.recycle(MemorySlabWithBumpPtr.java:174)
    at org.apache.cassandra.utils.memory.buffers.TemporaryBufferPool.newSlab(TemporaryBufferPool.java:321)
    at org.apache.cassandra.utils.memory.buffers.TemporaryBufferPool.switchSharedSlab(TemporaryBufferPool.java:232)
    at org.apache.cassandra.utils.memory.buffers.TemporaryBufferPool.allocateFromShared(TemporaryBufferPool.java:197)
    at org.apache.cassandra.utils.memory.buffers.TemporaryBufferPool.allocate(TemporaryBufferPool.java:128)
    at org.apache.cassandra.io.util.ChunkReader.readScattered(ChunkReader.java:103)
    at org.apache.cassandra.cache.ChunkCacheImpl$MultiBufferChunk.asyncLoad(ChunkCacheImpl.java:215)
    at org.apache.cassandra.cache.ChunkCacheImpl.asyncLoad(ChunkCacheImpl.java:357)
    at org.apache.cassandra.cache.ChunkCacheImpl.asyncLoad(ChunkCacheImpl.java:56)
    at com.github.benmanes.caffeine.cache.LocalAsyncLoadingCache.lambda$get$2(LocalAsyncLoadingCache.java:129)
    at com.github.benmanes.caffeine.cache.BoundedLocalCache.lambda$doComputeIfAbsent$14(BoundedLocalCache.java:2039)
    at java.util.concurrent.ConcurrentHashMap.compute(ConcurrentHashMap.java:1853)
    at com.github.benmanes.caffeine.cache.BoundedLocalCache.doComputeIfAbsent(BoundedLocalCache.java:2037)
    at com.github.benmanes.caffeine.cache.BoundedLocalCache.computeIfAbsent(BoundedLocalCache.java:2020)
    at com.github.benmanes.caffeine.cache.LocalAsyncLoadingCache.get(LocalAsyncLoadingCache.java:128)

as well as `OutOfMemoryError` errors (on a different cluster):

ERROR [CoreThread-0] 2019-06-27 14:06:15,140  VerbHandlers.java:77 - Unexpected error during execution of request READS.SINGLE_READ (330386123): /10.17.6.5 -> /10.17.6.5
java.lang.OutOfMemoryError: Direct buffer memory
    at org.apache.cassandra.utils.flow.Flow$ReduceSubscriber.onError(Flow.java:1221)
    at org.apache.cassandra.utils.flow.FlowTransformBase.onError(FlowTransformBase.java:38)
    at org.apache.cassandra.utils.flow.FlatMap.onError(FlatMap.java:133)
    at org.apache.cassandra.utils.flow.FlowTransformBase.onError(FlowTransformBase.java:38)
    at org.apache.cassandra.utils.flow.FlowTransformBase.onError(FlowTransformBase.java:38)
    at org.apache.cassandra.utils.flow.FlowTransformBase.onError(FlowTransformBase.java:38)
    at org.apache.cassandra.utils.flow.FlowTransformBase.onError(FlowTransformBase.java:38)
    at org.apache.cassandra.utils.flow.FlowTransformBase.onError(FlowTransformBase.java:38)
    at org.apache.cassandra.utils.flow.FlowTransformBase.onError(FlowTransformBase.java:38)
    at org.apache.cassandra.utils.flow.FlatMap.onError(FlatMap.java:133)
    at org.apache.cassandra.utils.flow.FlatMap$FlatMapChild.onError(FlatMap.java:185)
    at org.apache.cassandra.utils.flow.FlatMap$FlatMapChild.onError(FlatMap.java:185)
    at org.apache.cassandra.io.sstable.format.AsyncPartitionReader$PartitionReader.onError(AsyncPartitionReader.java:365)

dse

1 Answer

Erick Ramirez answered ·

@michael.guissine_30999 the stack trace is incomplete, but if it were complete, I'm pretty sure it would include `o.a.c.utils.memory.buffers.TemporaryBufferPool`. If it does, that confirms it is a known issue in DSE 6.7.3 (ticket ID DB-3172). We aim to get the fix included in the next release of DSE (no ETA yet).

You can work around the issue by temporarily downgrading the binaries to DSE 6.7.2; this won't have any impact on the data. I've written about the issue in detail in this KB article: "Compaction fails with CorruptSSTableException, AssertionError recycling a memory buffer". Cheers!

[UPDATE] The solution is to upgrade to DSE 6.7.4 (or newer) where DB-3172/DB-3174 were fixed.
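For anyone checking whether their own nodes show the same symptoms, here is a minimal Python sketch that counts the error fragments quoted in this thread in a log file. It is only an illustration: the labels, the function names, and the log path in the usage note are assumptions, not part of DSE, and a match only suggests (it does not prove) that a node is hitting DB-3172.

```python
from collections import Counter

# Message fragments copied from the stack traces in this thread.
# The dictionary keys are illustrative labels, not DSE terminology.
SIGNATURES = {
    "slab_recycle_assertion": "Slab should have been unreferenced and all buffers returned before recycling",
    "buffer_boundary_assertion": "Expected valid buffer or boundary crossed",
    "direct_memory_oom": "OutOfMemoryError: Direct buffer memory",
    "temporary_buffer_pool_frame": "org.apache.cassandra.utils.memory.buffers.TemporaryBufferPool",
}


def count_signatures(lines):
    """Count how many lines contain each known fragment."""
    counts = Counter()
    for line in lines:
        for label, fragment in SIGNATURES.items():
            if fragment in line:
                counts[label] += 1
    return counts


def scan_log(path):
    """Scan one log file; undecodable bytes are replaced rather than raising."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return count_signatures(handle)
```

Usage would look like `scan_log("/var/log/cassandra/system.log")`, where the path is an assumption about your install. Note that the `temporary_buffer_pool_frame` count is per stack frame, so one exception can contribute several hits.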

8 comments

I am hitting this issue too.

It may be fixed in 6.7.4.


Reference: the DataStax Enterprise 6.7 release notes:

Resolved issues:
AssertionError in temporary buffer pool causes CorruptSSTableException. (DB-3172, DB-3174)

You're correct @Beck. DB-3172, which I wrote about in the article above, was fixed in DSE 6.7.4. When I wrote the article, 6.7.4 had not been released yet. Cheers!


[Update reposted in original question]


@michael.guissine_30999 DSE 6.7.4 is out now so you should be able to do a simple binary upgrade. Cheers!


Thank you @Erick Ramirez, we managed to resolve the issue by downgrading to 6.7.2, disabling asynchronous I/O (`-Ddse.io.aio.enabled=false`), and limiting the file cache size (`file_cache_size_in_mb: 1024`). Next, we will try upgrading to 6.7.4.


For future reference, disabling AIO is not recommended since it will significantly affect the performance of your cluster. It is suggested only in very limited edge cases, by a DataStax expert after exhaustive investigation. Cheers!
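Since the AIO flag and the reduced file cache were applied as temporary workarounds, it may help to check that they are not left behind after moving to 6.7.4. The Python sketch below flags both settings in the text of two config files. It assumes the JVM flag sits on its own line (as in a `jvm.options`-style file) and that `file_cache_size_in_mb` lives in `cassandra.yaml`; the function name and file locations are assumptions, so adjust them for your install.

```python
import re

# Matches the flag quoted in this thread when it is alone on an uncommented line.
AIO_FLAG = re.compile(r"^\s*-Ddse\.io\.aio\.enabled=false\s*$")
# Matches an uncommented file_cache_size_in_mb entry and captures its value.
CACHE_KEY = re.compile(r"^\s*file_cache_size_in_mb\s*:\s*(\d+)\s*(?:#.*)?$")


def find_workaround_settings(jvm_options_text, cassandra_yaml_text):
    """Return human-readable notes for each workaround setting still present."""
    findings = []
    for line in jvm_options_text.splitlines():
        if AIO_FLAG.match(line):
            findings.append("AIO disabled via -Ddse.io.aio.enabled=false")
    for line in cassandra_yaml_text.splitlines():
        match = CACHE_KEY.match(line)
        if match:
            findings.append(f"file_cache_size_in_mb set to {match.group(1)}")
    return findings
```

Pass in the file contents as strings, for example `find_workaround_settings(open(jvm_path).read(), open(yaml_path).read())`, with `jvm_path` and `yaml_path` pointing at your own files. An empty list means neither setting was found in the form this sketch looks for; flags set elsewhere (for example inside a shell script) would not be detected.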
