Bringing together the Apache Cassandra experts from the community and DataStax.

Want to learn? Have a question? Want to share your expertise? You are in the right place!

Not sure where to begin? Getting Started

 

question

Beck avatar image
Beck asked ·

What happens if I "DELETE" not existing data? (DML)

Hi all,


I researched and tried to understand "DELETE" operator recently.I already almost understand.

I still have one question about it because I don't know the deep internals of Cassandra.


Question:

What happens if I delete not existing data?


AFAIK, "DELETE" operator is going to write something data.

I am not sure, will I get a big sstable after "nodetool flush" or not?



Example:

table info:
CREATE TABLE data_t (
    id bigint PRIMARY KEY,
    name text
)


This table is empty now.

If I delete not existing data in loop.

Like this:
for
  DELETE FROM data_t WHERE id = $1;
done


So many DML in Cassandra(continuously):

DELETE FROM data_t WHERE id = 1;
DELETE FROM data_t WHERE id = 2;
DELETE FROM data_t WHERE id = 3;
...etc

End of Example.

---


Thanks!

deletedml
10 |1000 characters needed characters left characters exceeded

Up to 8 attachments (including images) can be used with a maximum of 1.0 MiB each and 10.0 MiB total.

1 Answer

Erick Ramirez avatar image
Erick Ramirez answered ·

Great question @Beck. All mutations in C* (INSERT, UPDATE, DELETE) all do an insert under-the-hood. When you delete data, a tombstone is inserted. Since C* does not do a read-before-write, it doesn't know that the data doesn't exist so a tombstone is inserted regardless. In the scenario you described, the table will be full of tombstones even when there isn't any data. Cheers!

4 comments Share
10 |1000 characters needed characters left characters exceeded

Up to 8 attachments (including images) can be used with a maximum of 1.0 MiB each and 10.0 MiB total.

Awesome. I get it.

So, I must compact sstable if I need release disk space after gc_grace_seconds.

0 Likes 0 · ·

In fact, I have a suppose.


Scenario:

There are some data without TTL in my table.


If I want to delete large partition data, I must use one of under cases:

Case 1:

step:
1. Check data is exist.(SELECT ..)
2. Delete data if data exist.

Case 2:

1. Delete data.(by lightweight transaction:if exist and more static_column_conditions)

Case 3:

1. Delete data without checking data exist.


Is any solution better?

0 Likes 0 · ·

Your last comment is a little off-track but there are limited cases where a read-before-write is necessary and it will significantly affect the performance of your application and cluster. Why do you need to check for the existence of the data before deleting it?

0 Likes 0 · ·
Beck avatar image Beck Erick Ramirez ♦♦ ·

> Why do you need to check for the existence of the data before deleting it?

You are right.

It's not necessary unless the scenario needs to filter conditions.


I prefer "Case 3" after you provided the best answer.

I'll care about read-before-write when I need to use it.


Thanks again.

1 Like 1 · ·