question

rajib76 avatar image
rajib76 asked Erick Ramirez edited

Which one is better, INSERT or UPDATE?

We have a table with 30 columns. We get new data for each row every day. As part of the new data only few columns get changed. We were earlier doing an insert for all the columns(irrespective of whether it is changed or not), we are thinking now to only update the required columns. Our understanding is that since Cassandra is a column oriented architecture, updating few columns will create less tombstones compared to if we insert all the columns again. Please let me know if our understanding is correct. We should get a better performance with update.

cql
10 |1000

Up to 8 attachments (including images) can be used with a maximum of 1.0 MiB each and 10.0 MiB total.

steve.lacerda avatar image
steve.lacerda answered Erick Ramirez edited

Hi! I'm not sure what you mean by an insert for each row. Basically, if you have a row, and then you insert with the same primary key, you'd be doing an update and not an insert. Also, updates do not create tombstones, only deletes, ttl's, and null inserts create tombstones. It sounds to me like UPDATE is better because all you want to do is update a few columns, so if you have the primary key info, and UPDATE the cols, then there's less serialization that needs to happen. If you're thinking about using an LWT (if not exists, if exists), so a read before write then I'd continue with the inserts as that will probably be faster and create less problems versus LWT's.

3 comments Share
10 |1000

Up to 8 attachments (including images) can be used with a maximum of 1.0 MiB each and 10.0 MiB total.

rajib76 avatar image rajib76 commented ·
Thanks Steve, does not the second insert of the same row mark the previous version of the row for deletion? My understanding was that INSERTS/UPDATES will mark the previous version of the row for deletion(tombstone) and create a new version of the row
0 Likes 0 ·
Erick Ramirez avatar image Erick Ramirez ♦♦ rajib76 commented ·

It doesn't. Cassandra doesn't care about older versions -- only the latest version of a column/row/partition are returned in the results. Cheers!

0 Likes 0 ·
rajib76 avatar image rajib76 commented ·

I think I have now understood this and hence answering my own question. As mentioned by Steve, inserts or updates do not create tombstones. So, if we have an existing row and if we try to do an insert of the same row again, it actually updates all the columns and also updates the timestamp for the row. Thus during retrieval we get the latest row. When we keep inserting row for the same partition key , it will store data in different locations. Compaction will merge data for a single partition key for disk space and access optimization

0 Likes 0 ·
Erick Ramirez avatar image
Erick Ramirez answered

It doesn't matter whether you do an INSERT or an UPDATE. In Cassandra, all mutations (writes) are INSERTs under the hood -- and yes, even DELETEs are INSERTs of a tombstone marker.

With the exception of CQL counter columns which can only be incremented or decremented with an UPDATE statement, INSERT and UPDATE statements are identical and do the same thing.

The reason is that Cassandra does not perform a read-before-write (with the exception of lightweight transactions). Cassandra doesn't check (read) if a column/row/partition exists before doing an update or delete (write) so whether you are inserting, updating or deleting, Cassandra just inserts whatever the mutation is. This is what makes Cassandra writes very, VERY fast -- all writes are appended to end of the commitlog so there are no disk seeks compared to the way traditional DB files are updated.

When memtables are flushed to disk, they are merged (duplicates/deletions are dropped) and sorted (hence sorted-string tables in the name "SSTables") then written into an SSTable with the corresponding commitlog discarded. It does mean that partitions are fragmented across SSTables files and compaction threads will eventually coalesce the partitions fragments into single SSTables. Cheers!

Share
10 |1000

Up to 8 attachments (including images) can be used with a maximum of 1.0 MiB each and 10.0 MiB total.