gmldba_107428 asked:

Shouldn't Cassandra eliminate duplicates in compaction?

I am seeing this error in Cassandra logs:

Detected 1 duplicate rows for release-metric-<uuid>:<table>:2018-11-15 00\:00Z during Compaction

Shouldn't Cassandra eliminate duplicates in compaction?

[UPDATED]

WARN [Native-Transport-Requests-2] 2021-01-15 12:38:43,830 DuplicateRowChecker.java:96 - Detected 1 duplicate rows for release-metric-d2c23ed5-e281-49cf-9970-35e842b9d82e:data_points:2019-01-17 00\:00Z during Read.

1 Answer

Erick Ramirez answered:

In a Cassandra cluster with mixed 2.x/3.x versions, certain operations such as:

  • a sequence of row deletions
  • collection overwrites
  • paging
  • read repair

can cause nodes running Cassandra 3.x to split individual rows into several rows with identical clustering. This happens because of (a) paging and the handling of row tombstones in C* 2.x, and (b) the way C* 3.x handles LegacyLayout.

The warning message gets logged by DuplicateRowChecker when it detects a duplicate row so it can be handled correctly in Cassandra 3.x (CASSANDRA-15789).
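If it helps to see how often the warning fires and at which stage, here is a minimal Python sketch that tallies these log lines. The regular expression is inferred only from the two warning lines quoted in this thread, so treat the pattern as an assumption, not the official log format. The helper names `parse_duplicate_row_warning` and `summarize` are made up for illustration.

```python
import re
from collections import Counter

# Pattern inferred from the warning lines quoted in this thread (an assumption,
# not a documented format): "Detected N duplicate rows for <key> during <stage>"
DUPLICATE_ROW_RE = re.compile(
    r"Detected (?P<count>\d+) duplicate rows for (?P<key>.+?) during (?P<stage>\w+)"
)


def parse_duplicate_row_warning(line):
    """Return count, key and stage for a duplicate-row warning, or None."""
    match = DUPLICATE_ROW_RE.search(line)
    if match is None:
        return None
    return {
        "count": int(match.group("count")),
        "key": match.group("key"),
        "stage": match.group("stage"),
    }


def summarize(lines):
    """Total the reported duplicate rows per stage (e.g. Read, Compaction)."""
    totals = Counter()
    for line in lines:
        parsed = parse_duplicate_row_warning(line)
        if parsed is not None:
            totals[parsed["stage"]] += parsed["count"]
    return dict(totals)
```

Feeding the lines of a log file to `summarize` gives a per-stage total, which shows whether the duplicates are being detected on reads, during compaction, or both.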

To answer your question directly: yes, compaction and other operations such as handling read requests fix the duplicate rows to avoid problems like data corruption. Cheers!

2 comments

WARN [Native-Transport-Requests-2] 2021-01-15 12:38:43,830 DuplicateRowChecker.java:96 - Detected 1 duplicate rows for release-metric-d2c23ed5-e281-49cf-9970-35e842b9d82e:data_points:2019-01-17 00\:00Z during Read.

Cassandra 3.11.9

What do you mean by full stack trace?


I validated that we are not running a mixed-version environment; everything is on 3.11.9.
