jonasmartinpico_114991 asked:

Is it bad to have dropped mutations even when it's less than 1%?

Hi,

We are suffering from dropped mutations on a cluster that is writing about 150 million rows/day.

This is a 2 hour sample of the dropped mutations we have:

INFO  [ScheduledTasks:1] 2020-02-18 08:02:56,908  DroppedMessages.java:156 - MUTATION messages were dropped in the last 5 s: 3 internal and 258 cross node. Mean internal dropped latency: 2192 ms and Mean cross-node dropped latency: 2169 ms
INFO  [ScheduledTasks:1] 2020-02-18 08:07:32,116  DroppedMessages.java:156 - MUTATION messages were dropped in the last 5 s: 0 internal and 396 cross node. Mean internal dropped latency: 2192 ms and Mean cross-node dropped latency: 2162 ms
INFO  [ScheduledTasks:1] 2020-02-18 08:10:47,252  DroppedMessages.java:156 - MUTATION messages were dropped in the last 5 s: 2 internal and 414 cross node. Mean internal dropped latency: 2147 ms and Mean cross-node dropped latency: 2126 ms
INFO  [ScheduledTasks:1] 2020-02-18 08:17:17,610  DroppedMessages.java:156 - MUTATION messages were dropped in the last 5 s: 0 internal and 63 cross node. Mean internal dropped latency: 2147 ms and Mean cross-node dropped latency: 2021 ms
INFO  [ScheduledTasks:1] 2020-02-18 08:25:02,864  DroppedMessages.java:156 - MUTATION messages were dropped in the last 5 s: 0 internal and 13 cross node. Mean internal dropped latency: 2147 ms and Mean cross-node dropped latency: 2013 ms
INFO  [ScheduledTasks:1] 2020-02-18 08:32:32,993  DroppedMessages.java:156 - MUTATION messages were dropped in the last 5 s: 1 internal and 423 cross node. Mean internal dropped latency: 2013 ms and Mean cross-node dropped latency: 2024 ms
INFO  [ScheduledTasks:1] 2020-02-18 08:39:53,218  DroppedMessages.java:156 - MUTATION messages were dropped in the last 5 s: 0 internal and 293 cross node. Mean internal dropped latency: 2013 ms and Mean cross-node dropped latency: 2015 ms
INFO  [ScheduledTasks:1] 2020-02-18 08:47:38,413  DroppedMessages.java:156 - MUTATION messages were dropped in the last 5 s: 0 internal and 108 cross node. Mean internal dropped latency: 2013 ms and Mean cross-node dropped latency: 2018 ms
INFO  [ScheduledTasks:1] 2020-02-18 08:54:58,562  DroppedMessages.java:156 - MUTATION messages were dropped in the last 5 s: 0 internal and 194 cross node. Mean internal dropped latency: 2013 ms and Mean cross-node dropped latency: 2018 ms
INFO  [ScheduledTasks:1] 2020-02-18 09:02:28,767  DroppedMessages.java:156 - MUTATION messages were dropped in the last 5 s: 1 internal and 123 cross node. Mean internal dropped latency: 2013 ms and Mean cross-node dropped latency: 2084 ms
INFO  [ScheduledTasks:1] 2020-02-18 09:07:44,055  DroppedMessages.java:156 - MUTATION messages were dropped in the last 5 s: 4 internal and 240 cross node. Mean internal dropped latency: 2013 ms and Mean cross-node dropped latency: 2081 ms
INFO  [ScheduledTasks:1] 2020-02-18 09:13:49,388  DroppedMessages.java:156 - MUTATION messages were dropped in the last 5 s: 0 internal and 189 cross node. Mean internal dropped latency: 2013 ms and Mean cross-node dropped latency: 2092 ms
INFO  [ScheduledTasks:1] 2020-02-18 09:20:49,678  DroppedMessages.java:156 - MUTATION messages were dropped in the last 5 s: 0 internal and 49 cross node. Mean internal dropped latency: 2013 ms and Mean cross-node dropped latency: 2013 ms
INFO  [ScheduledTasks:1] 2020-02-18 09:28:35,212  DroppedMessages.java:156 - MUTATION messages were dropped in the last 5 s: 1 internal and 48 cross node. Mean internal dropped latency: 2013 ms and Mean cross-node dropped latency: 2013 ms
INFO  [ScheduledTasks:1] 2020-02-18 09:35:00,520  DroppedMessages.java:156 - MUTATION messages were dropped in the last 5 s: 1 internal and 171 cross node. Mean internal dropped latency: 2013 ms and Mean cross-node dropped latency: 2058 ms
INFO  [ScheduledTasks:1] 2020-02-18 09:42:35,663  DroppedMessages.java:156 - MUTATION messages were dropped in the last 5 s: 3 internal and 132 cross node. Mean internal dropped latency: 2013 ms and Mean cross-node dropped latency: 2025 ms
INFO  [ScheduledTasks:1] 2020-02-18 09:49:11,032  DroppedMessages.java:156 - MUTATION messages were dropped in the last 5 s: 0 internal and 346 cross node. Mean internal dropped latency: 2013 ms and Mean cross-node dropped latency: 2025 ms
INFO  [ScheduledTasks:1] 2020-02-18 09:56:11,311  DroppedMessages.java:156 - MUTATION messages were dropped in the last 5 s: 0 internal and 37 cross node. Mean internal dropped latency: 2013 ms and Mean cross-node dropped latency: 2019 ms
INFO  [ScheduledTasks:1] 2020-02-18 10:04:41,485  DroppedMessages.java:156 - MUTATION messages were dropped in the last 5 s: 0 internal and 262 cross node. Mean internal dropped latency: 2013 ms and Mean cross-node dropped latency: 2027 ms


As you can see, the number of dropped messages is not that large, and most of them are cross-node.

I know that dropped mutations are bad :( but looking at the logs, is it really that bad in our current situation (150 million rows/day)?
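To put the "less than 1%" from the title into numbers, the counts in the log lines can be totalled and compared against the expected write volume. The following is only a rough sketch: the function names are hypothetical, the regular expression assumes the exact `DroppedMessages` line format shown in the sample above, and the percentage assumes a uniform write rate and one mutation per row, neither of which is confirmed in this thread.

```python
import re

# Matches DroppedMessages lines in the format shown in the sample above.
DROPPED_RE = re.compile(
    r"MUTATION messages were dropped in the last \d+ s: "
    r"(\d+) internal and (\d+) cross node"
)

def total_dropped(log_lines):
    """Return (internal, cross_node) totals over all matching log lines."""
    internal = cross_node = 0
    for line in log_lines:
        match = DROPPED_RE.search(line)
        if match:
            internal += int(match.group(1))
            cross_node += int(match.group(2))
    return internal, cross_node

def dropped_fraction(dropped, rows_per_day, window_hours):
    """Fraction of writes dropped in the window.

    Assumes a uniform write rate and one mutation per row; both are
    simplifying assumptions, not facts about this cluster.
    """
    expected_writes = rows_per_day * window_hours / 24
    return dropped / expected_writes

# First two lines of the sample, shortened to the part the regex needs.
sample = [
    "MUTATION messages were dropped in the last 5 s: 3 internal and 258 cross node.",
    "MUTATION messages were dropped in the last 5 s: 0 internal and 396 cross node.",
]
internal, cross_node = total_dropped(sample)
fraction = dropped_fraction(internal + cross_node, 150_000_000, 2)
print(f"{internal} internal, {cross_node} cross node, {fraction:.6%} of expected writes")
```

Feeding the full log file through `total_dropped` gives the totals for one node only, and a small fraction does not by itself mean the drops are harmless.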

Regards.

mutations

1 Answer

Erick Ramirez answered:

@jonasmartinpico_114991 The quick answer is yes, dropped mutations are bad. It means that the commit log disks cannot keep up with the writes and you're relying on repairs to keep the data in sync. If you're reading or writing with a consistency level of ONE, there's a risk of reading stale results from the DB if you hit a replica that missed the mutation.

Dropped mutations should never happen that regularly in your cluster. A single dropped mutation once a week may be considered a one-off and is tolerable in a distributed environment, but several dropped mutations every few minutes are an indication that your cluster is overloaded and you need to address it. Cheers!
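The consistency-level risk can be illustrated with the usual overlap rule: a read is guaranteed to contact at least one replica that acknowledged the write only when the read and write replica counts together exceed the replication factor. The sketch below is a simplification under stated assumptions: the helper names are hypothetical, only `ONE`, `QUORUM` and `ALL` are modelled, and the replication factor of 3 is an illustrative value that is not given in this thread.

```python
def replicas_required(consistency_level, replication_factor):
    """Number of replicas that must respond for a consistency level."""
    if consistency_level == "ONE":
        return 1
    if consistency_level == "QUORUM":
        return replication_factor // 2 + 1
    if consistency_level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported consistency level: {consistency_level}")

def read_overlaps_write(write_cl, read_cl, replication_factor):
    """True when any read replica set must include a replica that
    acknowledged the write, i.e. R + W > RF."""
    w = replicas_required(write_cl, replication_factor)
    r = replicas_required(read_cl, replication_factor)
    return r + w > replication_factor

rf = 3  # illustrative assumption only
for write_cl, read_cl in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ONE", "ALL")]:
    print(write_cl, read_cl, read_overlaps_write(write_cl, read_cl, rf))
```

With writes and reads both at `ONE`, the overlap condition does not hold, so a read can land on a replica that dropped the mutation until repair brings it back in sync.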

2 comments

Could the root cause be related to the commitlog "living" on the same disk as the data?

Thanks!

Erick Ramirez replied to jonasmartinpico_114991:

@jonasmartinpico_114991 Yes, it is possible if the volumes are not fast enough. Usually this is not an issue for directly-attached SSDs. Cheers!
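One quick way to check whether the commit log and the data files share a volume is to compare the device IDs of the two directories. This is a minimal sketch, assuming a POSIX-like host; the function name is hypothetical and the directories to pass in are whatever `commitlog_directory` and `data_file_directories` point to in your `cassandra.yaml`.

```python
import os

def same_device(path_a, path_b):
    """True when both paths are on the same filesystem device.

    Compares st_dev from os.stat(). Two partitions of one physical disk
    report different device IDs, so False does not prove that the paths
    are on separate physical disks.
    """
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev
```

For example, `same_device("/var/lib/cassandra/commitlog", "/var/lib/cassandra/data")` would return `True` if both directories sit on one filesystem; those paths are an assumption and should be replaced with the ones from your configuration.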
