Build Cloud-Native apps with Apache Cassandra


question

nirjharray asked ·

How do large amounts of data in a set collection impact performance?

As per the notes, collections are recommended only for small-ish numbers of elements, to avoid performance issues.

  1. What is the threshold limit?
  2. Is it right to model tags with the SET data type? The limit could mean compromising on the user experience.

1 Answer

Erick Ramirez answered ·

The theoretical limit on the number of elements in a set collection is two billion. In practical terms, you should only store tens or hundreds of elements in a collection. A set can hold far more than that, but storing thousands of elements isn't ideal.

NOTE: The size of each set value used to be capped at MAX_UNSIGNED_SHORT (64K bytes):

        if (bytes.remaining() > FBUtilities.MAX_UNSIGNED_SHORT)
            throw new InvalidRequestException(String.format("Set value is too long. Set values are limited to %d bytes but %d bytes value provided",
              FBUtilities.MAX_UNSIGNED_SHORT,
              bytes.remaining()));

but the cap was removed in C* 3.0.1 and 3.1 (CASSANDRA-10374).

CQL collections are designed to store small amounts of data such as a person's phone numbers or addresses, or tags/labels in a product catalogue. They are not designed for unbounded datasets.

When you read a CQL collection, the entire collection is retrieved even if you only want one element, because the collection is not indexed. If you expect a set to hold large amounts of data, model it differently and store the elements as clustering columns instead. Cheers!
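To make the difference concrete, here is a minimal CQL sketch of the two approaches. The table and column names (products_by_id, tags_by_product, product_id, tag) and the UUID are made up for illustration and do not come from the workshop material, so adjust them to your own model.

```cql
-- Option 1: a small, bounded number of tags stored in a set.
-- Reading the tags column returns the whole set every time.
CREATE TABLE products_by_id (
    product_id uuid PRIMARY KEY,
    name       text,
    tags       set<text>
);

-- Add one tag to the existing set
UPDATE products_by_id
   SET tags = tags + {'wireless'}
 WHERE product_id = 5b6962dd-3f90-4c93-8f61-eabfa4a803e2;

-- Option 2: many tags per product, one row per tag.
-- product_id is the partition key and tag is a clustering column.
CREATE TABLE tags_by_product (
    product_id uuid,
    tag        text,
    PRIMARY KEY ((product_id), tag)
);

INSERT INTO tags_by_product (product_id, tag)
VALUES (5b6962dd-3f90-4c93-8f61-eabfa4a803e2, 'wireless');

-- Look up a single tag without reading all the others
SELECT tag
  FROM tags_by_product
 WHERE product_id = 5b6962dd-3f90-4c93-8f61-eabfa4a803e2
   AND tag = 'wireless';

-- Or read a bounded slice of the tags for a product
SELECT tag
  FROM tags_by_product
 WHERE product_id = 5b6962dd-3f90-4c93-8f61-eabfa4a803e2
 LIMIT 100;
```

With the second table each tag is its own row, so a query can fetch one tag or a limited slice of them instead of pulling back the whole collection. For a handful of tags per product, the set in the first table is still the simpler choice.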
