Bringing together the Apache Cassandra experts from the community and DataStax.

Want to learn? Have a question? Want to share your expertise? You are in the right place!

Not sure where to begin? Getting Started

 

question

leehuihua avatar image
leehuihua asked Erick Ramirez commented

Is it a good idea to delete partitions using the IN() operator?

hi,All

I have read this question:Is deleting multiple partitions with IN() operator better than deleting partitions one by one?, and I have an idea of deleting multiple partitions by grouping.

For example, there are 10 records :{pk1, pk2,....,pk10};

for replication factor of 3 in a single-DC cluster,after grouping(using token-aware policy), I know :

  • pk1, pk2 and pk3 are located in node1, node2, node3;
  • pk4, pk5 and pk6 are located in node3, node4, node5;
  • pk7, pk8, pk9 and pk10 are located in node4, node5, node6;

Now I can delete these records:

DELETE FROM table_name WHERE pk IN (pk1, pk2, pk3);
DELETE FROM table_name WHERE pk IN (pk4, pk5, pk6);
DELETE FROM table_name WHERE pk IN (pk7, pk8, pk9, pk10);

In this way, we can reduce the number of replicas which the coordinator need to coordinate.

I wonder if this is a better way to delete partitions compared to deleting partitions one by one.

Thanks.

cassandra
10 |1000 characters needed characters left characters exceeded

Up to 8 attachments (including images) can be used with a maximum of 1.0 MiB each and 10.0 MiB total.

smadhavan avatar image
smadhavan answered smadhavan edited

@leehuihua, in short, performing deletes at a partition level in multiple small batches makes sense. See also the answer provided by one of our experts here in a different question. Also, please see how deletion works in Apache Cassandra ® here in this documentation.

4 comments Share
10 |1000 characters needed characters left characters exceeded

Up to 8 attachments (including images) can be used with a maximum of 1.0 MiB each and 10.0 MiB total.

if in my case, it is 10 rows in 10 partitions(every partition has 1 row to delete),does it still make sense?

0 Likes 0 ·

yeah, this works as long as you're not putting all the partition keys into one single IN clause. Individual deletes are fine.

Updated for clarity.

0 Likes 0 ·

in the second documentation you offered here ,it says :"Cassandra does not replay a mutation for a tombstoned record during its grace period." Won't it miss the mutation? For example, node A gets a deletion for a record , a tomb stone for this record is stored .After a few seconds ,node A get an insert for this record, but at this time ,node A is down,so the insert is missed. After a few minutes, node A recovers, it receives the hint for the mutation of this record from node B. but according to the documentation, node A will not replay the mutation, because there is a tomb stone for this record in node A, so node A miss the mutation。

0 Likes 0 ·

@smadhavan is there anything wrong with my description? I don't understand why Cassandra does not replay a mutation for a tombstoned record during its grace period.

0 Likes 0 ·
Erick Ramirez avatar image
Erick Ramirez answered Erick Ramirez commented

No, it is not a good idea to use the IN() operator to delete multiple partitions.

As I already stated in question #7337 which you linked, it isn't recommended to perform multi-partition deletes in a single CQL statement. The IN() operator is not an optimisation and in fact will always perform worse that multiple asynchronous deletes.

It doesn't make sense that you are spending all this time figuring out which replicas own the partition. You could have already deleted the partitions in a fraction of the time. There is no benefit from using theIN() operator, just disadvantages. Cheers!

2 comments Share
10 |1000 characters needed characters left characters exceeded

Up to 8 attachments (including images) can be used with a maximum of 1.0 MiB each and 10.0 MiB total.

@Erick Ramirez

thanks! the reason you gave in question #7337 is that the IN() operator increases the work of the coordinator and that is why i spend the time figuring out which replicas own the partition on the client side. My final goal is to reduce the pressure of my Cassandra cluster.so i would rather spend some time to do the math on client side.But if this does not make sense for the cluster, would you mind giving some more detailed reason?

0 Likes 0 ·

I'm not sure what other justification you're looking for.

  • IN() is not an optimisation.
  • Asynchronous single-partition deletes will always be faster in ALL cases.
  • It is a waste of your time figuring out which replicas own the partition if your aim is performance.

Cheers!

0 Likes 0 ·