
nirjharray asked · gduan2000 commented

How do large amounts of data in a set collection impact performance?

As per the notes, collections are recommended only for small-ish numbers of elements to avoid performance issues.

  1. What is the threshold limit?
  2. Is it right to model tags with the SET data type, given that keeping the set small can be limiting and means compromising on the user experience?
workshop, collections

1 Answer

Erick Ramirez answered · gduan2000 commented

The theoretical limit for the number of elements in a set collection is two billion. In practical terms, you should only store tens or hundreds of elements in a collection. To be clear, it is possible to store lots more in a set but storing thousands of elements isn't ideal.

NOTE: The size of each set value used to be capped at MAX_UNSIGNED_SHORT (64K bytes):

        if (bytes.remaining() > FBUtilities.MAX_UNSIGNED_SHORT)
            throw new InvalidRequestException(String.format("Set value is too long. Set values are limited to %d bytes but %d bytes value provided",
              FBUtilities.MAX_UNSIGNED_SHORT,
              bytes.remaining()));

but the cap was removed in C* 3.0.1 and 3.1 (CASSANDRA-10374).

CQL collections are designed to store small amounts of data such as a person's phone numbers or addresses, or tags/labels in a product catalogue. They are not designed for unbounded datasets.
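
As a minimal sketch of the kind of small, bounded collection described above (the `products` table, its columns, and the UUID are hypothetical, made up for illustration), a set of tags might look like this:

```cql
-- Hypothetical table: each product carries a small, bounded set of tags
CREATE TABLE IF NOT EXISTS products (
    product_id uuid PRIMARY KEY,
    name text,
    tags set<text>
);

-- Add elements to the set without reading it first
UPDATE products SET tags = tags + {'outdoor', 'waterproof'}
WHERE product_id = 5b6962dd-3f90-4c93-8f61-eabfa4a803e2;

-- Remove a single element
UPDATE products SET tags = tags - {'outdoor'}
WHERE product_id = 5b6962dd-3f90-4c93-8f61-eabfa4a803e2;

-- Reading the column returns the whole set
SELECT tags FROM products
WHERE product_id = 5b6962dd-3f90-4c93-8f61-eabfa4a803e2;
```

This works well while each product has tens of tags. The same schema becomes a poor fit if the number of tags per product can grow without bound.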

When you read a CQL collection, the entire collection is retrieved even if you want just one element, because the collection is not indexed. For this reason, if you intend to have large amounts of data in a set, you should model your data differently and store the elements as clustering column values instead. Cheers!
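
One way the clustering-column approach could look, as a sketch only (the `tags_by_product` table, its column names, and the UUID are hypothetical): each tag becomes its own row within the product's partition, so a single tag can be read or deleted without fetching the rest.

```cql
-- Hypothetical table: one row per tag, clustered under the product
CREATE TABLE IF NOT EXISTS tags_by_product (
    product_id uuid,
    tag text,
    PRIMARY KEY ((product_id), tag)
);

-- Adding a tag is a plain insert; inserting the same tag twice is idempotent
INSERT INTO tags_by_product (product_id, tag)
VALUES (5b6962dd-3f90-4c93-8f61-eabfa4a803e2, 'waterproof');

-- Check for one tag without reading the others
SELECT tag FROM tags_by_product
WHERE product_id = 5b6962dd-3f90-4c93-8f61-eabfa4a803e2
  AND tag = 'waterproof';

-- Read all tags for a product; the driver can page through the rows
SELECT tag FROM tags_by_product
WHERE product_id = 5b6962dd-3f90-4c93-8f61-eabfa4a803e2;

-- Remove one tag
DELETE FROM tags_by_product
WHERE product_id = 5b6962dd-3f90-4c93-8f61-eabfa4a803e2
  AND tag = 'waterproof';
```

Because `tag` is part of the primary key, uniqueness is preserved just as in a set, and the rows come back sorted by tag. The partition itself can still grow large, so this layout trades the collection limit for the usual partition-size considerations.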


gduan2000 commented ·
Hi Erick, is there a way to set the maximum number of items for a collection? Say I only want to allow my collection column set<text> to have 1000 items.