Erick Ramirez posted

FAQ - What is the recommended Linux IO scheduler?


Directories (and by extension filesystems/volumes/disks) used by Apache Cassandra all serve different purposes:

  • data - mostly read-heavy, unless tables use LeveledCompactionStrategy (LCS) with a high-update workload
  • commitlog - mostly writes by the nature of commits
  • solr_data_dir - both reads and writes, purely for Solr (only applies to nodes running DSE with Search enabled)

These directories need to be on separate disks so they are not competing for the same IO bandwidth. However, separating them is not necessary on NVMe SSDs, which provide enough IO bandwidth that contention is rarely an issue.

IO schedulers

The noop scheduler (no operation) uses a first-in-first-out (FIFO) algorithm and is good for volumes backed by multiple disks since the IO is spread across the disks.
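On most Linux distributions the active scheduler can be inspected and changed at runtime through sysfs. A minimal sketch, assuming the volume is /dev/sdb (the device name is an assumption, substitute your own):

```shell
# Show the available schedulers; the active one appears in square brackets
cat /sys/block/sdb/queue/scheduler

# Switch the device to the noop scheduler (takes effect immediately,
# but does not survive a reboot)
echo noop | sudo tee /sys/block/sdb/queue/scheduler
```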

The deadline scheduler splits requests into queues. Each request has a timestamp associated with it and the kernel uses it to calculate an "expiration" on the request, hence the name "deadline". Requests closest to the deadline are prioritised by the scheduler. By default, reads have a shorter expiration (500ms) than writes (5s), effectively prioritising reads over writes.
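These expirations are exposed as per-device tunables under sysfs, so they can be verified or adjusted. A sketch assuming the device is /dev/sdb (device name is an assumption):

```shell
# Deadline tunables live under the device's iosched directory;
# values are in milliseconds (defaults: read_expire=500, write_expire=5000)
cat /sys/block/sdb/queue/iosched/read_expire
cat /sys/block/sdb/queue/iosched/write_expire

# Example: tighten the read deadline further
echo 250 | sudo tee /sys/block/sdb/queue/iosched/read_expire
```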


The Cassandra data directory is best suited for the deadline scheduler since reads are given priority over compactions. The choice of scheduler isn't so relevant for the commitlog since it handles an almost purely write-only workload.

When it comes to the solr_data_dir for DSE Search nodes, choose the noop scheduler when the volume is backed by multiple disks, otherwise use the deadline scheduler as the default.
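Changes made through sysfs are lost on reboot. One common way to make the choice persistent is a udev rule; a sketch (the file name and the device match pattern are assumptions, adjust them to your disks):

```shell
# /etc/udev/rules.d/60-io-scheduler.rules
# Use deadline for the SATA/SAS disks backing the Cassandra data directory
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/scheduler}="deadline"
```

Reload and apply the rule without rebooting with `sudo udevadm control --reload && sudo udevadm trigger`.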

As a final point, the choice of scheduler for servers with NVMe SSDs is largely irrelevant since they are extremely fast and very difficult to saturate. In most cases it is advisable to not use an IO scheduler at all (set it to none), since the kernel would otherwise waste resources scheduling requests for disks that do not need it.
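The same sysfs interface applies to NVMe devices; a sketch assuming the device is /dev/nvme0n1 (device name is an assumption):

```shell
# On recent kernels NVMe devices often default to none already;
# the active scheduler appears in square brackets
cat /sys/block/nvme0n1/queue/scheduler

# Explicitly disable IO scheduling if another scheduler is active
echo none | sudo tee /sys/block/nvme0n1/queue/scheduler
```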
