question

praveenkg avatar image
praveenkg asked Erick Ramirez answered

Should we always use frozen with UDT?

We are planning to use the UDT for storing various inventory data. Ex:

CREATE TYPE inv_bucket(
soh double,
transit_qty double,
in_progress_sales_qty double
);

and table as

CREATE TABLE inv_by_loc (
item text,
market text,
location int,
inv_bucket bucket,
PRIMARY KEY (item, market, location)
)

Should I use column inv_bucket FROZEN<bucket> or inv_bucket bucket ? Any performance issue without FROZEN?

user-defined type
10 |1000

Up to 8 attachments (including images) can be used with a maximum of 1.0 MiB each and 10.0 MiB total.

steve.lacerda avatar image
steve.lacerda answered Erick Ramirez edited

Hi, not necessarily, it just depends on what you're trying to accomplish. Frozen UDT's act as one field, so you cannot change the specific field within the UDT, you can only change the entire UDT. Whereas, with non-frozen UDT's the fields are open to UPDATE/SET. There are caveats to non-frozen UDT's though. I wouldn't have thousands of items in my collection, but that's general for both types. There are also caveats related to tombstones with non-frozen UDT's:

1) In a non-frozen collection, a tombstone is created for an insert and a non-incremental (UPDATE) update in the collection.

Example of a non-incremental update:

update test.map_test SET b = { '3': 'c'}, c = {'3':'c'} where a='a' ;

2) An incremental update adds a value to an existing value in the collection.

Example of an incremental update:

update test.map_test SET b = b + { '4': 'd'}, c = c + {'4':'d'} where a='a' ;

The inserts and non-incremental updates for a non-frozen collection can cause large numbers of tombstones.

Share
10 |1000

Up to 8 attachments (including images) can be used with a maximum of 1.0 MiB each and 10.0 MiB total.

Erick Ramirez avatar image
Erick Ramirez answered

It isn't necessary to have frozen types but there are benefits for doing so.

The primary reason that you would use frozen is when you want to use a UDT or collection as part of the primary key. It isn't possible to use them otherwise without being frozen because they can mutate (change value).

But even if you are not using a UDT column in the primary key, frozen UDTs have some performance advantages. By definition, frozen UDTs have a single value and are treated as blobs.

By having a single value, the data is stored in a single cell (hence a blob) so it makes it easy (and faster) to serialise them since the cell is read as a whole without having to iterate over every field in the UDT. This gives an added advantage with the heap since there is just one single object to clean up (by GC). Another advantage with storing the data in a single cell is the significant savings on disk space.

By now you're probably thinking this all sounds terrific so why not freeze all UDTs? The answer is it depends on the use case. In a classic example where you have a person's address stored as a UDT:

CREATE TYPE address (
    street_number int,
    unit_number text,
    street_name text,
    city text,
    postcode text
)

If a user needs to change their address, in most cases it's because they've moved houses and will need to update all the fields in the address. Your application would have retrieved the address and displayed it to a user for a user to be able to modify it.

But think of a situation where only one field needs to be updated in a large UDT with lots of fields. You would have to retrieve the whole UDT value then update all the fields in order to modify just one field. Reading the whole column before writing a new version is expensive and can hurt the performance of your cluster. You want to avoid doing a read-before-write whenever possible.

The decision to freeze or not is not always straight forward. You need to understand the advantages and disadvantages for both options. But whenever possible, using a frozen UDT is a good idea if it suits your use case. Cheers!

Share
10 |1000

Up to 8 attachments (including images) can be used with a maximum of 1.0 MiB each and 10.0 MiB total.