alex.bai asked Erick Ramirez answered

Can explicitly setting writetime reconcile mixing of lightweight transactions and regular writes?

First of all, I know there is a restriction against mixing LWT and non-LWT operations in Cassandra.

From what I have observed in our application, one of the reasons for this restriction is:
Since Java driver 3.0, a normal insert uses a timestamp generated on the client side, but an LWT insert uses a timestamp from the server side, and Cassandra uses a last-write-wins strategy.

I'm aware of the performance impact of using an LWT (4 round trips, Paxos, etc.), but in our case we keep our DC-level distributed lock in Cassandra.
So when we try to acquire the lock, we use an LWT insert, but to speed up the lock we use a normal delete when releasing it.
We are now facing data corruption caused by mixing LWT and non-LWT operations.
That is, our delete succeeds, but it carries an earlier timestamp than the insert, so it doesn't take effect.

Our first fix was to run a LOCAL_QUORUM query with the writetime() function to retrieve the write timestamp, add 1 millisecond to it, and set it with "USING TIMESTAMP" on the delete.
Then we realized it still doesn't work, because the timestamp retrieved with LOCAL_QUORUM does not seem to be the final write time of the data inserted by the LWT. We still end up issuing a delete with an earlier timestamp.

So I actually have 3 questions:

  1. Does data inserted by an LWT have different timestamps on different replicas, i.e. are the timestamps generated by the Cassandra nodes during the 3rd step of LWT Paxos (propose / accept)?
  2. Does a LOCAL_QUORUM query on data inserted by an LWT return the latest writetime among the replica responses it received? For example, if 3 replicas written by an LWT have 3 different timestamps, does a LOCAL_QUORUM query that reads 2 of them return the later of those 2 timestamps as the write time?
  3. If we insist on doing this (insert by LWT, then a normal delete), can we use the LOCAL_SERIAL consistency level and the writetime() function to retrieve the timestamp, and use it as the timestamp for the normal delete to make sure the delete works?

Or is our only choice to use LWT for both the insert and the delete, or to abandon our distributed lock on Cassandra?

Sample Table:

CREATE TABLE "sample"."distributed_lock" (
  lock_id uuid,
  owner uuid,
  PRIMARY KEY (lock_id)
);

Acquiring the lock with an LWT, CL = LOCAL_SERIAL:

CONSISTENCY LOCAL_SERIAL;
INSERT INTO "sample"."distributed_lock" (lock_id, owner) VALUES(fake-uuid-1, fake-uuid-2) IF NOT EXISTS;

Our previous way of releasing the lock without an LWT, CL = LOCAL_QUORUM:

CONSISTENCY LOCAL_QUORUM;
SELECT WRITETIME(owner), lock_id, owner FROM "sample"."distributed_lock";
 
DELETE FROM "sample"."distributed_lock" USING TIMESTAMP <write-time-fetched-above> WHERE lock_id = <lock-id-fetched-above>;

The delete then doesn't take effect. Moreover, the writetime returned by the SELECT differs from the writetime returned a while later.

So if we change the consistency level of the last step from LOCAL_QUORUM to LOCAL_SERIAL, will it work in all cases?

Any discussion is welcome, and thanks in advance.

lightweight transactions

steve.lacerda answered alex.bai commented

Hi, in response to your questions:

1) No, the write time will not differ between replicas. The Paxos leader defines the write timestamp, not the replica nodes.

2) LOCAL_QUORUM does not exist for LWTs since the writes are serial. I believe you mean LOCAL_SERIAL. If you're using LOCAL_SERIAL then yes, if all 3 nodes had 3 different timestamps, the latest write timestamp would be used. However, LOCAL_SERIAL and LOCAL_QUORUM both guarantee that 2 of the 3 nodes have the same data, so having all 3 nodes with different timestamps would be impossible. The only way this could happen is if you're using ONE or LOCAL_ONE.

3) Can you issue these queries serially, i.e. wait for the ACK from the LWT and then delete the lock?


alex.bai commented ·

Thanks, Steve. I really appreciate your kind answer.

1) It's good to know that the LWT timestamp comes from the Paxos leader and that all replicas have the same write time.

2) & 3)

Sorry, I didn't describe our case clearly. Let me use CQL statements instead:

Sample Table:

CREATE TABLE "sample"."distributed_lock" (
  lock_id uuid,
  owner uuid,
  PRIMARY KEY (lock_id)
);

Acquiring the lock with an LWT, CL = LOCAL_SERIAL:

CONSISTENCY LOCAL_SERIAL;
INSERT INTO "sample"."distributed_lock" (lock_id, owner) VALUES(fake-uuid-1, fake-uuid-2) IF NOT EXISTS;

Our previous way of releasing the lock without an LWT, CL = LOCAL_QUORUM:

CONSISTENCY LOCAL_QUORUM;
SELECT WRITETIME(owner), lock_id, owner FROM "sample"."distributed_lock";

DELETE FROM "sample"."distributed_lock" USING TIMESTAMP <write-time-fetched-above> WHERE lock_id = <lock-id-fetched-above>;

The delete then doesn't take effect.


alex.bai commented ·
So my question is: will changing the CL of the last query (before the deletion) from LOCAL_QUORUM to LOCAL_SERIAL make the subsequent deletion work?
Erick Ramirez answered

Your process for releasing the lock looks flawed to me.

As a reminder, all write operations are INSERTs under the hood: INSERTs, UPDATEs and DELETEs are all INSERTs. This is because Cassandra does not do a read-before-write, which makes writes in Cassandra very, VERY fast. The only exception is LWTs, which by definition must do a read to check the conditional part of the query (IF [NOT] EXISTS) before doing the write. That read is part of the reason LWTs are expensive compared to an ordinary write.

Between the time you read the writetime of the lock and the time you delete the lock, there is a race condition: the lock could have been overwritten one or more times, making the timestamp you read stale. And when you issue a DELETE, Cassandra doesn't care whether the lock exists. It just inserts a tombstone without checking first.

Specifying a timestamp with the DELETE won't achieve anything because it will almost always be in the past compared to new INSERTs which are "flowing in" from the app in the meantime.
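To make the race concrete, here is a minimal Python sketch of last-write-wins reconciliation. It is an illustrative model only, not Cassandra code: the `Cell` class, the `reconcile` and `apply_write` helpers, and the timestamps are all made up for the example, and tie-breaking between equal timestamps is left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Cell:
    timestamp: int            # write timestamp (microseconds in real clusters)
    value: Optional[str]      # None models a tombstone, i.e. a delete marker


def reconcile(a: Cell, b: Cell) -> Cell:
    """Keep the cell with the higher timestamp (ties are not modelled here)."""
    return a if a.timestamp > b.timestamp else b


def apply_write(current: Optional[Cell], incoming: Cell) -> Cell:
    """Writes never read first; the stored and incoming cells are simply reconciled."""
    return incoming if current is None else reconcile(current, incoming)


# The lock row is written at t=100 and the client reads WRITETIME = 100.
row = apply_write(None, Cell(100, "owner-a"))
read_ts = row.timestamp

# Before the client's DELETE arrives, the lock is rewritten at t=200.
row = apply_write(row, Cell(200, "owner-b"))

# The DELETE carries read_ts + 1, which is older than the newest write.
row = apply_write(row, Cell(read_ts + 1, None))

print(row)
```

In this model the tombstone at t=101 loses to the insert at t=200, so the row is still visible after a delete that "succeeded".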

I don't know what your use case is or what you're using the locks for, but if you want to prevent the lock from being updated while you delete it, you should make the delete an LWT as well so that no other in-flight transaction can override what you're doing:

DELETE FROM ... WHERE lock_id = ? IF EXISTS
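The semantics you get from making both operations conditional can be sketched with a small in-memory model in Python. This is an assumption-laden illustration, not a Cassandra client: `LockTable` and its method names are hypothetical, and a local mutex stands in for the serialization that Paxos provides across replicas.

```python
import threading
from typing import Dict


class LockTable:
    """In-memory stand-in for the distributed_lock table (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self._rows: Dict[str, str] = {}
        self._mutex = threading.Lock()  # stands in for Paxos serialization

    def insert_if_not_exists(self, lock_id: str, owner: str) -> bool:
        """Models INSERT ... IF NOT EXISTS: check and write happen as one step."""
        with self._mutex:
            if lock_id in self._rows:
                return False
            self._rows[lock_id] = owner
            return True

    def delete_if_exists(self, lock_id: str) -> bool:
        """Models DELETE ... IF EXISTS: reports whether a row was actually removed."""
        with self._mutex:
            return self._rows.pop(lock_id, None) is not None


table = LockTable()
first = table.insert_if_not_exists("lock-1", "owner-a")    # lock acquired
second = table.insert_if_not_exists("lock-1", "owner-b")   # rejected, lock is held
released = table.delete_if_exists("lock-1")                # lock released
released_again = table.delete_if_exists("lock-1")          # nothing left to release
third = table.insert_if_not_exists("lock-1", "owner-b")    # lock can be taken again
```

Because acquire and release both go through the same serialized check-then-write path, the release can no longer be silently shadowed by a write it never saw.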

To answer your follow-up question, you cannot use SERIAL consistencies for writes. They can only be used for the read phase of the LWT read-before-write. In any case, changing the consistency level won't provide the outcome you're after. Cheers!
