Bringing together the Apache Cassandra experts from the community and DataStax.

HDC asked; Erick Ramirez answered

Inconsistent replicas after a Cassandra write

Cassandra version: 2.1.15

The table below has 3 replicas.

CREATE KEYSPACE keyspace1 WITH REPLICATION = {'class': 'NetworkTopologyStrategy','DC1': '3','DC2':'3'};

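Because business data is involved, I have replaced some of the information that does not affect the description of the problem.

We ran into a strange phenomenon in production. When querying data, we found that some fields were missing even though the application always writes them.

We then dumped the SSTables and found that the underlying files really do look like this.

The strangest part is that the writeTime of the 2 records is identical, yet one node has the full set of cells while another node has only some of them.

My understanding is that even if compaction had merged the data, the writeTime on node2 should be newer rather than staying the same. Maybe I have misunderstood.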

Node1:

./sstable2json /disk/data2/cassandra/keyspace1/table-094aa0a0fa4311eab69565744744aa23/keyspace1-table-ka-207621-Data.db -k ID_1

[
{"key": "ID_1",
 "cells": [["000:111:AAA:222:field_1","BBB",1620896500311001],
           ["000:111:AAA:222:field_2","CCC",1620896500311001],
           ["000:111:AAA:222:field_3","2021-05-13 17:01+0800",1620896500311001],
           ["000:111:AAA:222:field_4","2021-05-13 17:01+0800",1620896500311001]]}
]

Node2:

./sstable2json /disk/data2/cassandra/keyspace1/table-094aa0a0fa4311eab69565744744aa23/keyspace1-table-ka-724373-Data.db -k ID_1

[
{"key": "ID_1",
 "cells": [["000:111:AAA:222:field_1","BBB",1620896500311001]]}
]

Node3:

empty


1 Answer

Erick Ramirez answered

A mutation (Cassandra write) is applied to replicas as a single atomic operation. It is not possible for only some rows in the mutation to be applied: all of the rows of a single mutation must be written successfully, or the replica will not send a successful write acknowledgement to the coordinator of the request.

In your case, it is not possible for the same mutation to only apply one row to node2 but four rows to node1 and none for node3. The likely scenario is that those are separate write requests.
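To see exactly which cells differ between the replicas, the sstable2json output quoted in the question can be compared with a few lines of Python. This is only a rough sketch: the cell tuples are copied from the dumps above, the helper names `missing_cells` and `write_time_to_datetime` are made up for illustration, and it assumes the write timestamps are microseconds since the Unix epoch and that the application runs at UTC+8 (as the `+0800` values suggest).

```python
from datetime import datetime, timedelta, timezone

# Cells copied from the sstable2json dumps in the question:
# (cell name, value, write timestamp).
node1_cells = [
    ("000:111:AAA:222:field_1", "BBB", 1620896500311001),
    ("000:111:AAA:222:field_2", "CCC", 1620896500311001),
    ("000:111:AAA:222:field_3", "2021-05-13 17:01+0800", 1620896500311001),
    ("000:111:AAA:222:field_4", "2021-05-13 17:01+0800", 1620896500311001),
]
node2_cells = [
    ("000:111:AAA:222:field_1", "BBB", 1620896500311001),
]


def missing_cells(reference, other):
    """Return the names of cells present in `reference` but absent from `other`."""
    other_names = {name for name, _value, _ts in other}
    return [name for name, _value, _ts in reference if name not in other_names]


def write_time_to_datetime(micros, utc_offset_hours=8):
    """Convert a write timestamp to a timezone-aware datetime.

    Assumes `micros` is microseconds since the Unix epoch; the UTC+8
    default is an assumption based on the +0800 values in the data.
    """
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    local_tz = timezone(timedelta(hours=utc_offset_hours))
    # Integer timedelta arithmetic avoids float rounding of the microseconds.
    return (epoch + timedelta(microseconds=micros)).astimezone(local_tz)


if __name__ == "__main__":
    print(missing_cells(node1_cells, node2_cells))
    print(write_time_to_datetime(1620896500311001).isoformat())
```

Under those assumptions, the shared timestamp decodes to 2021-05-13 17:01:40 at UTC+8, which lines up with the `17:01+0800` values stored in field_3 and field_4, and node2 is missing field_2 through field_4.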

The important thing to focus on is that all 3 replicas are out of sync. For this situation to arise, your application must be writing with a consistency level of ONE or LOCAL_ONE. This is bad practice and leads to data inconsistencies like the one you're experiencing.

Our recommendation is to use LOCAL_QUORUM so that at least 2 replicas must acknowledge each write, which avoids this situation. Cheers!
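The "at least 2 replicas" figure comes from the usual quorum arithmetic: a quorum is a majority of the replicas, and LOCAL_QUORUM applies that to the replication factor of the local data center (3 for each DC in the keyspace above). The sketch below is plain Python for illustration only, not driver code, and the function names are hypothetical.

```python
def quorum(replication_factor):
    """Number of replicas that form a majority: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def replicas_overlap(write_acks, read_acks, replication_factor):
    """True when every read is guaranteed to touch at least one replica
    that acknowledged the write (write_acks + read_acks > RF)."""
    return write_acks + read_acks > replication_factor


if __name__ == "__main__":
    rf = 3  # replication factor per data center in keyspace1
    print(quorum(rf))
    print(replicas_overlap(quorum(rf), quorum(rf), rf))  # quorum writes + quorum reads
    print(replicas_overlap(1, 1, rf))                    # ONE writes + ONE reads
```

With a replication factor of 3, writing and reading at quorum means 2 + 2 > 3, so a read always reaches a replica that holds the write. With ONE or LOCAL_ONE on both sides, 1 + 1 is not greater than 3, and a read can land on a replica that never received the data.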
