Tri asked:

How to best use a large amount of memory?

Hardware Selection: https://academy.datastax.com/units/21033-hardware-selections-datastax-enterprise-operations-apache-cassandra

Recommended for production: 16GB to 64GB

More memory means:

  - Better read performance thanks to caching
  - Memtables hold more recently written data

QUESTION 1: Which caching is involved? Is it the file system cache at the OS level or a Cassandra-specific cache? If both are involved, which one is dominant?

QUESTION 2: Let's assume the node has an abundant amount of memory, say 256GB of RAM. How should the system (both Cassandra and Linux) be configured to best benefit from that extra RAM, provided that all the advice about disabling swap and setting ulimits to unlimited has already been followed as recommended in Tuning the Kernel?

I guess that this RAM would be better used for caching. But which cache, Cassandra's or the OS's? Also, would a large cache be detrimental somewhere, for example by causing very long GC pauses?

1 Answer

Erick Ramirez answered:

The answers differ based on the version and distribution of Cassandra deployed on your cluster.

DataStax Enterprise

  1. For DSE 5.1 and earlier versions, DSE uses the Linux page cache to cache SSTables. For DSE 6.0 and newer versions, DSE reserves a portion of RAM to use as a file cache (file_cache_size_in_mb in cassandra.yaml).
  2. As above, only DSE 6.0 and newer versions benefit from the extra RAM, by configuring a larger file cache.

Apache Cassandra

  1. Cassandra uses the Linux page cache.
  2. There is no real benefit beyond 64GB since Cassandra relies on the OS page cache.

Cheers!

2 comments

How about the key cache and row cache set through the table property described in "Enabling and configuring caching"?

CREATE TABLE users (
  userid text PRIMARY KEY,
  first_name text,
  last_name text,
)
WITH caching = { 
  'keys' : 'a lot?',
  'rows_per_partition' : 'how much?'
};

Not sure what exactly you're asking so I'll try my best to respond.

Key cache is enabled by default on ALL keys. It's usually quite small (a few MBs) so there's no disadvantage to having all keys cached.

Row cache is disabled by default. Typically only enable it for (a) small tables which are accessed regularly but hardly change, or (b) on tables where there are hot partitions. There is no benefit to enabling row cache on a table where the access pattern is randomly distributed because the cache hit rate would be low. If there aren't partitions in the table that are "hot" (read heavily compared to the rest of the partitions), the overhead of caching the rows yields no payoff. Cheers!
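
For reference, here is a minimal sketch of what valid values for the caching property could look like in the cases described above. The table names lookup_codes and user_sessions and the value 100 are made up for illustration and are not recommendations. Also, as far as I know, the row cache stays off node-wide unless row_cache_size_in_mb in cassandra.yaml is raised above its default of 0, so the table property alone is not enough.

```cql
-- Hypothetical table names and values, for illustration only.

-- Default behaviour: key cache on for all keys, row cache off.
ALTER TABLE users
  WITH caching = { 'keys' : 'ALL', 'rows_per_partition' : 'NONE' };

-- (a) Small table that is read regularly but hardly changes:
--     cache every row of each partition.
ALTER TABLE lookup_codes
  WITH caching = { 'keys' : 'ALL', 'rows_per_partition' : 'ALL' };

-- (b) Table with hot partitions:
--     cache only the first 100 rows of each partition.
ALTER TABLE user_sessions
  WITH caching = { 'keys' : 'ALL', 'rows_per_partition' : '100' };
```

In other words, 'keys' takes ALL or NONE, and 'rows_per_partition' takes ALL, NONE, or a row count, rather than a memory size.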
