Asked by javas

Cassandra Map collection type read latency

I have a three-node Cassandra cluster with replication factor 3 and consistency level LOCAL_QUORUM. My table has two columns of type MAP<BLOB, BLOB>, and each map contains up to 100 entries. I append entries to both maps and read from one of them (1R/1W per transaction).

After a few hours of writing and reading across 500k partitions, the table statistics were as follows:

Percentile  SSTables  Write Latency  Read Latency  Partition Size  Cell Count
                           (micros)      (micros)         (bytes)                 
       50%     10.00          35.43       2346.80            1597          60
       75%     10.00          51.01       4055.27            2299          72
       95%     12.00         105.78      17436.92            6866         215
       98%     12.00         182.79      36157.19            6866         215
       99%     12.00         454.83      52066.35            8239         215
       Min      5.00           3.31        379.02             104           3
       Max     14.00      186563.16     322381.14            9887         310
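When reading these histograms, it helps to know that the numbers look like bucket boundaries rather than exact measurements. Every partition size and cell count in the tables, and every latency if read as nanoseconds divided by 1000 (35425 ns is reported as 35.43 µs), sits on a sequence where each value is roughly 1.2× the previous one. The sketch below regenerates that sequence, assuming the rule next = round(previous × 1.2), bumped by one when rounding would repeat a value; that rule is my reading of how Cassandra lays out its estimated-histogram buckets, and `HistogramBuckets` and `offsets` are illustrative names only.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of estimated-histogram bucket boundaries (assumed rule: next = round(previous * 1.2)). */
final class HistogramBuckets {
    static List<Long> offsets(int count) {
        List<Long> bounds = new ArrayList<>();
        long last = 1;
        bounds.add(last);
        while (bounds.size() < count) {
            long next = Math.round(last * 1.2);
            if (next == last) {
                next++; // keep boundaries strictly increasing for small values
            }
            bounds.add(next);
            last = next;
        }
        return bounds;
    }

    public static void main(String[] args) {
        List<Long> bounds = offsets(60);
        // Partition sizes (bytes) and cell counts copied from the tables in this thread
        long[] sizesAndCounts = {60, 72, 215, 310, 1597, 1916, 2299, 6866, 8239, 9887};
        // Write latencies 35.43, 42.51, 51.01 and 73.46 micros, read as nanoseconds
        long[] latenciesNanos = {35425, 42510, 51012, 73457};
        for (long v : sizesAndCounts) {
            System.out.println(v + " on a bucket boundary: " + bounds.contains(v));
        }
        for (long v : latenciesNanos) {
            System.out.println(v + " ns on a bucket boundary: " + bounds.contains(v));
        }
    }
}
```

The practical consequence is that two percentiles printing the same value (for example the 1916-byte partition size at p50 through p95 in later runs) only landed in the same bucket, which is roughly 20% wide.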

So far, so good. The next step was to create 30 million new partitions.

After about 15 hours of writing (to random partitions), I noticed a massive TPS drop (about 2k), and the table statistics were as follows:

Percentile  SSTables  Write Latency  Read Latency  Partition Size  Cell Count
                           (micros)      (micros)         (bytes)                 
       50%      2.00          51.01      20924.30            1916          50
       75%      3.00          73.46      43388.63            1916          60
       95%      4.00         126.93      89970.66            1916          60
       98%      4.00         219.34     107964.79            2299          72
       99%      4.00         379.02     129557.75            6866         179
       Min      0.00           3.97         51.01             104           3
       Max      8.00      186563.16     322381.14            9887         310

When I ran the first test again across 500k partitions, read latency remained high:

Percentile  SSTables  Write Latency  Read Latency  Partition Size  Cell Count
                           (micros)      (micros)         (bytes)                 
       50%      5.00          51.01      30130.99            1916          60
       75%      6.00          73.46      62479.63            1916          60
       95%      7.00         152.32     129557.75            1916          60
       98%      8.00         263.21     155469.30            3311         103
       99%      8.00         545.79     186563.16            6866         179
       Min      3.00           3.97        454.83             104           3
       Max     10.00      107964.79     557074.61            9887         310

Read latency for this table increases even more when transactions also write and read a counter column (in another table):

Percentile  SSTables  Write Latency  Read Latency  Partition Size  Cell Count
                           (micros)      (micros)         (bytes)
       50%     10.00          42.51      62479.63            1916          50
       75%     10.00          61.21     107964.79            1916          60
       95%     12.00         105.78     186563.16            1916          60
       98%     12.00         182.79     223875.79            3311         103
       99%     12.00         379.02     268650.95            6866         179
       Min      6.00           4.77        545.79             104           3
       Max     14.00      129557.75     557074.61            9887         310
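To put the four read-latency histograms side by side, the p50 and p99 values can be expressed as multiples of the first 500k-partition run. The sketch below only does that arithmetic on numbers copied from the tables; the class, method and run labels are illustrative, and because nodetool reports bucketed values the ratios are approximate.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Compares p50/p99 read latency of each run with the first 500k-partition run. */
final class ReadLatencyGrowth {
    // {p50, p99} read latency in microseconds, copied from the tablehistograms output above
    static final Map<String, double[]> RUNS = new LinkedHashMap<>();
    static {
        RUNS.put("500k partitions (first run)", new double[] {2346.80, 52066.35});
        RUNS.put("after 30M new partitions", new double[] {20924.30, 129557.75});
        RUNS.put("500k partitions (rerun)", new double[] {30130.99, 186563.16});
        RUNS.put("with counter-table traffic", new double[] {62479.63, 268650.95});
    }

    /** How many times slower a percentile (0 = p50, 1 = p99) is than in the first run. */
    static double growth(String run, int percentileIndex) {
        double baseline = RUNS.get("500k partitions (first run)")[percentileIndex];
        return RUNS.get(run)[percentileIndex] / baseline;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, double[]> e : RUNS.entrySet()) {
            double[] v = e.getValue();
            System.out.printf("%-28s p50 %6.1f ms (x%4.1f)   p99 %6.1f ms (x%3.1f)%n",
                    e.getKey(), v[0] / 1000, growth(e.getKey(), 0),
                    v[1] / 1000, growth(e.getKey(), 1));
        }
    }
}
```

The p50 grows by roughly 8.9×, 12.8× and 26.6× across the later runs, against roughly 2.5×, 3.6× and 5.2× at p99, so the median degraded much more than the tail.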


Why is the read latency so high? Do the `map` columns increase the latency?

Any suggestions (configuration or schema changes)?


I am using prepared statements:

  • Fetch row by partition ID:
      SELECT id,attr,uids FROM user_profile WHERE id=:id
  • Update map entries:
      UPDATE user_profile SET attr=attr+:attr, attr=attr-:attrstoremove, uids=uids+:newuserids, md=md+:metadata, md=md-:attrstoremove, up=:up WHERE id=:id
  • Increase counter:
      UPDATE user_profile_counter SET cnt=cnt+:cnt WHERE cnt_id=:cnt_id AND id=:id;
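On whether the map columns themselves add read cost: as I understand Cassandra's storage model, each map entry is stored as its own cell, `attr=attr+:attr` writes one live cell per added key, and `attr=attr-:attrstoremove` writes one tombstone per removed key. A read then merges the partition's cells from every SSTable holding a fragment of it (the SSTables column in the histograms). The toy model below illustrates that merge; it is a simplification with hypothetical names (`MapCellModel`, `Cell`), strings instead of blobs, and an assumed reconciliation rule of "newest timestamp wins, and a tombstone wins a tie".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model (hypothetical, not Cassandra code) of a MAP column stored as one cell per key. */
final class MapCellModel {
    /** One map entry as written; value == null marks a tombstone for that key. */
    static final class Cell {
        final String key;
        final String value;
        final long timestamp;

        Cell(String key, String value, long timestamp) {
            this.key = key;
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    /** Newer timestamp wins; on a tie a tombstone beats a live value. */
    static Cell reconcile(Cell a, Cell b) {
        if (a.timestamp != b.timestamp) return a.timestamp > b.timestamp ? a : b;
        if (a.value == null) return a;
        if (b.value == null) return b;
        return a.value.compareTo(b.value) >= 0 ? a : b; // stand-in for comparing value bytes
    }

    /** A read merges the partition's fragments from every SSTable, then drops deleted keys. */
    static Map<String, String> read(List<List<Cell>> sstables) {
        Map<String, Cell> winners = new TreeMap<>();
        for (List<Cell> sstable : sstables) {
            for (Cell cell : sstable) {
                winners.merge(cell.key, cell, MapCellModel::reconcile);
            }
        }
        Map<String, String> live = new TreeMap<>();
        for (Cell cell : winners.values()) {
            if (cell.value != null) live.put(cell.key, cell.value);
        }
        return live;
    }

    /** Cells the read had to visit, tombstones included. */
    static int cellsVisited(List<List<Cell>> sstables) {
        int n = 0;
        for (List<Cell> sstable : sstables) n += sstable.size();
        return n;
    }

    /** Three flushes of one partition; the last mimics adding and removing key "b" in one UPDATE. */
    static List<List<Cell>> demo() {
        List<List<Cell>> sstables = new ArrayList<>();
        sstables.add(List.of(new Cell("a", "1", 1), new Cell("b", "2", 1)));
        sstables.add(List.of(new Cell("c", "3", 2)));
        sstables.add(List.of(new Cell("b", "9", 3), new Cell("b", null, 3)));
        return sstables;
    }

    public static void main(String[] args) {
        List<List<Cell>> sstables = demo();
        System.out.println(read(sstables) + " after visiting " + cellsVisited(sstables) + " cells");
    }
}
```

Running it prints `{a=1, c=3} after visiting 5 cells`. In this model the visited-cell count grows with every fragment and tombstone, not just with the live entries the query returns.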

This is my schema:

CREATE TABLE IF NOT EXISTS PROFILING.USER_PROFILE
(
    ID     TEXT,            -- PROFILE ID
    ATTR   MAP<BLOB, BLOB>, --USER ATTRIBUTES
    MD     MAP<BLOB, BLOB>, --METADATA PER ATTRIBUTE
    UIDS   SET<TEXT>, 
    UP     TIMESTAMP,       --LAST_UPDATE
    PRIMARY KEY (ID)
) WITH caching = {
   'keys' : 'ALL',
   'rows_per_partition' : '1' 
};

CREATE TABLE IF NOT EXISTS PROFILES.USER_PROFILE_COUNTER
(
    ID TEXT,      -- PROFILE ID
    CNT_ID BLOB,  -- COUNTER ID
    CNT COUNTER,  -- COUNTER VALUE
    PRIMARY KEY (ID, CNT_ID)
) WITH caching = {
    'keys' : 'ALL',
    'rows_per_partition' : '10'
};

The data is encrypted. Here is a sample row (each map has three entries):

YkceUdD6qEvOLw3Wgd8zWA |
{0x95f56f594522: 0xacb7f42c7f0ac8187f17a8f2c04e5065,
 0xa365a3dc007d: 0x24252727706b5065f9e1f65efec7ced8,
 0xf0d55b110f87: 0x5a5ef3b0a041af8c7acf4040333afc96} |
{0x95f56f594522: 0x000d31363333333334363636323639,
 0xa365a3dc007d: 0x000d31363333333431323938363735,
 0xf0d55b110f87: 0x000d31363333333431323938363735} |
{'46TyNYCKTplibRyAfFsNRPQbvfQINNIIY4WmItuPayfvjDjEp49bnXSXLmD9hAm9'} |
2021-10-04 09:54:58.675000+0000

Cluster info (per node)

- Total memory: 18GB (heap: 5GB)

- 6 CPU cores

Versions

- Cassandra 3.11

- DataStax Java driver 4.11.3

- JDK 16


1 Answer

Answered by Erick Ramirez

It's not obvious to me what the underlying cause of the read latency is, and it isn't something I can troubleshoot for you in a Q&A forum. You'll need support assistance for this type of issue.

In any case, I noted some glaring issues in your post:

  1. The user_profile table is only ever going to have one row per partition, so 'rows_per_partition' : '1' is effectively saying you want to "cache all partitions".
  2. You've also got row-caching enabled for the user_profile_counter table.
  3. Caching rows is only effective if (a) you are repeatedly reading a subset of rows, and (b) your hot data fits in the cache.
  4. If you're not reading the data again, then (c) you pay a penalty for caching on the first read without getting any benefit.
  5. If the hot data does not fit in the cache, (d) you pay a penalty once the cache is full, since rows get evicted to make room for new rows.
  6. As the data in the table grows, the bad effects of (c) and (d) increase.
  7. Caching rows is expensive and has limited use.
  8. You didn't specify the query for the reads so I have no idea if your query is the problem.
  9. Heap size is too small. We recommend allocating at least 8GB to the heap, but 16GB is preferable in production for moderate workloads, which means your system should have 24-30GB of RAM.
  10. Cassandra 3.11 only supports Java 8. C* 4.0 supports both Java 8 and Java 11. Future releases of Cassandra will support Java 11 + Java 17 LTS.
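Points 3 to 6 can be made concrete with a toy simulation: reads drawn uniformly at random over N partitions against an LRU cache holding C rows. Under uniform access the steady-state hit rate is about C/N, so a cache that serves about one read in five over 500k partitions serves only about one in 300 over 30M. This is a simplified model, not Cassandra's row cache: capacity is counted in entries rather than megabytes (row_cache_size_in_mb), the 100,000-entry capacity is an arbitrary assumption, and `RowCacheSim`/`hitRate` are illustrative names.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

/** Toy row-cache model (hypothetical): an LRU map holding at most `capacity` rows. */
final class RowCacheSim {
    static final class Lru<K, V> extends LinkedHashMap<K, V> {
        private final int capacity;

        Lru(int capacity) {
            super(16, 0.75f, true); // accessOrder=true: get() moves an entry to the MRU end
            this.capacity = capacity;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            return size() > capacity; // evict the least recently used row once over capacity
        }
    }

    /** Fraction of reads served from the cache when partition keys are uniform in [0, partitions). */
    static double hitRate(int partitions, int capacity, int reads, long seed) {
        Lru<Integer, Boolean> cache = new Lru<>(capacity);
        Random random = new Random(seed);
        int hits = 0;
        for (int i = 0; i < reads; i++) {
            int key = random.nextInt(partitions);
            if (cache.get(key) != null) {
                hits++;
            } else {
                cache.put(key, Boolean.TRUE); // miss: read from disk, then populate the cache
            }
        }
        return (double) hits / reads;
    }

    public static void main(String[] args) {
        int capacity = 100_000; // assumed number of cached rows, chosen only for illustration
        int reads = 2_000_000;
        System.out.printf("500k partitions: hit rate %.4f%n", hitRate(500_000, capacity, reads, 42L));
        System.out.printf("30M partitions:  hit rate %.4f%n", hitRate(30_000_000, capacity, reads, 42L));
    }
}
```

The first figure comes out a little under 0.2 (the cache starts empty) and the second around 0.003: every miss still pays to populate the cache and evict an older row, which is the cost described in (c) and (d).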
1 comment

Erick, row caching is disabled (row_cache_size_in_mb is set to 0 in cassandra.yaml). I've updated my question to include the queries.
