zoltan.lorincz asked

Why are rows deleted through UNLOGGED BATCH reappearing?

Hi All,

We have:

  • a 4-node (single DC) C* cluster
  • a keyspace with replication factor 3
  • a table in that keyspace (the table's data directory is about 110G on a single node)
  • the table was created with the following statements:
CREATE TABLE changes (
    id uuid,
    "order" int,
    data_int frozen <map<text,int>>,
    data_list frozen <map<text,frozen <list<text>>>>,
    data_long frozen <map<text,bigint>>,
    data_maps frozen <map<text,frozen <list<frozen <map<text,text>>>>>>,
    data_text frozen <map<text,text>>,
    data_uuid frozen <map<text,uuid>>,
    guid uuid,
    "type" int,
    PRIMARY KEY (id, "order")
);

CREATE INDEX command_guid_index ON changes (guid);
CREATE INDEX command_type_index ON changes ("type");

The consistency level for both reads and writes is set to TWO:

Cluster.Builder builder = Cluster.builder()
        .withQueryOptions(new QueryOptions().setConsistencyLevel(ConsistencyLevel.TWO))
        .withCredentials(usernameCassandra, passwordCassandra)
        .withSocketOptions(socketOptions);

We delete values from the above table with a LOGGED Batch (code not included).

Our problem

Under heavy load, we execute DELETE operations on the table described above (changes) in a LOGGED BATCH with consistency level TWO.

If we execute a SELECT (also with consistency level TWO) immediately afterwards, it sometimes returns rows that were deleted in that batch.
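
For context, the expectation that the SELECT should not see the deleted rows rests on simple replica arithmetic: with replication factor 3, a write acknowledged by 2 replicas and a read that contacts 2 replicas must share at least one replica, provided the delete itself was acknowledged at TWO. A minimal sketch of that calculation follows; the class and method names are illustrative only and not part of any driver API.

```java
// Minimal sketch: how many replicas are guaranteed to be common to a write
// acknowledged by `writeCl` replicas and a read that contacts `readCl` replicas.
final class ReplicaOverlap {

    /** Smallest possible number of replicas shared by the write set and the read set. */
    static int minOverlap(int replicationFactor, int writeCl, int readCl) {
        return Math.max(0, writeCl + readCl - replicationFactor);
    }

    public static void main(String[] args) {
        // Values from the question: RF = 3, reads and writes at ConsistencyLevel.TWO.
        int overlap = minOverlap(3, 2, 2);
        System.out.println("guaranteed overlapping replicas: " + overlap);
    }
}
```

An overlap of at least 1 only guarantees that the read reaches a replica holding the tombstone if the write was actually acknowledged at the requested level; a batch that timed out or failed gives no such guarantee.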

Any ideas how this might happen?

Thank you,

Zoltan.

tombstones

1 Answer

Erick Ramirez answered

Deleted rows don't "reappear" in Cassandra. The symptoms you described indicate that the rows were not deleted at all.

With UNLOGGED batches, you are making a conscious decision that you don't care about some (or all) of the statements in the batch failing.

You didn't provide an example of an UNLOGGED BATCH containing the DELETE statements. If you're using batches as an optimisation, you're actually making things worse: a batch puts pressure on the coordinator, which has to fire off all N statements, and this results in worse performance.

Instead, we recommend that you issue individual DELETE statements to maximise the throughput of your cluster since each request will be coordinated by individual nodes. Cheers!
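
As a rough illustration of issuing the deletes individually, here is a sketch that fires one request per key and collects the keys whose delete failed, so they can be retried. It deliberately avoids any driver types: `deleteOne` is a hypothetical stand-in for whatever asynchronous call your driver offers, and `IndividualDeletes` and `deleteEach` are made-up names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

final class IndividualDeletes {

    /**
     * Issues one delete per key through `deleteOne` (a hypothetical stand-in
     * for an asynchronous driver call), waits for all of them to finish, and
     * returns the keys whose delete failed.
     */
    static <K> List<K> deleteEach(List<K> keys,
                                  Function<K, CompletableFuture<Void>> deleteOne) {
        // Fire every request before waiting on any of them.
        List<CompletableFuture<Void>> inFlight = new ArrayList<>();
        for (K key : keys) {
            inFlight.add(deleteOne.apply(key));
        }

        // Wait for each result and remember which keys failed.
        List<K> failed = new ArrayList<>();
        for (int i = 0; i < keys.size(); i++) {
            try {
                inFlight.get(i).join();
            } catch (RuntimeException e) { // CompletionException or CancellationException
                failed.add(keys.get(i));
            }
        }
        return failed;
    }
}
```

Unlike a batch, this gives no all-or-nothing behaviour, so the caller has to decide what to do with the returned keys; since deletes are idempotent, simply retrying them is usually safe.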


zoltan.lorincz commented

Dear Erick,

Thank you for your answer. I made a mistake in the problem description: I meant to say that we are using LOGGED batches (not UNLOGGED). I am sorry about this.
Do you have any explanation for this case?

We are using LOGGED batches because we need the operations to either all be executed or all fail.
