Directories (and by extension filesystems/volumes/disks) used by Apache Cassandra all serve different purposes:
data - mostly read-heavy, unless tables are on LCS (LeveledCompactionStrategy) with a high-update workload
commitlog - mostly writes, by the nature of commits
solr_data_dir - both reads and writes, purely for Solr (only applies to nodes running DSE with Search enabled)
These directories should be on separate disks so they do not compete for the same IO bandwidth. This is not necessary with NVMe SSDs, which are fast enough that sharing a disk is not a concern.
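As a rough way to check the separation described above, the following sketch groups directories by the device ID reported by os.stat(). The function names are made up for illustration. Note the limitation: st_dev identifies a filesystem, so two partitions on the same physical disk will still look separate.

```python
import os

def device_ids(paths):
    """Map each path to the ID of the device holding it (os.stat().st_dev)."""
    return {path: os.stat(path).st_dev for path in paths}

def shared_devices(paths):
    """Return groups of paths that live on the same device.

    An empty list means every path is on its own device.
    """
    by_device = {}
    for path, dev in device_ids(paths).items():
        by_device.setdefault(dev, []).append(path)
    return [group for group in by_device.values() if len(group) > 1]
```

Call shared_devices() with the data, commitlog and (if applicable) solr_data_dir locations configured on the node. Any group it returns is a set of directories competing for the same device.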
The noop scheduler (no operation) uses a first-in-first-out (FIFO) algorithm and is a good fit for volumes backed by multiple disks, since the IO is spread across the disks.
The deadline scheduler splits requests into queues. Each request has a timestamp, which the kernel uses to calculate an "expiration" for the request, hence the name "deadline". The scheduler prioritises the requests closest to their deadline. By default, reads have a shorter expiration (500ms) than writes (5s), which effectively prioritises reads over writes.
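To make the expiration idea concrete, here is a deliberately simplified model in Python. It only captures "deadline = submit time + expiration" with the default values quoted above. The real kernel scheduler also sorts requests by sector and dispatches them in batches, which this sketch ignores, and the function names are illustrative.

```python
# Default expirations quoted above, in milliseconds.
EXPIRE_MS = {"read": 500, "write": 5000}

def deadline_ms(kind, submitted_ms):
    """Deadline of a request = time it was submitted + expiration for its kind."""
    return submitted_ms + EXPIRE_MS[kind]

def dispatch_order(requests):
    """Order (name, kind, submitted_ms) tuples by earliest deadline first.

    Simplified model only: no sector sorting, no batching.
    """
    return sorted(requests, key=lambda r: deadline_ms(r[1], r[2]))

# A write submitted at t=0 expires at t=5000; a read submitted a full
# second later expires at t=1500, so the read is dispatched first.
queue = [("w1", "write", 0), ("r1", "read", 1000)]
order = [name for name, _, _ in dispatch_order(queue)]
```

This shows why a read that arrives well after a write can still be serviced ahead of it.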
The data directory is best suited to the deadline scheduler, since reads are given priority over compactions. The choice of scheduler isn't so relevant for the commitlog, since it is an almost purely write-only workload.
When it comes to the solr_data_dir for DSE Search nodes, choose the noop scheduler when the volume is backed by multiple disks; otherwise use the deadline scheduler as the default.
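The recommendations above can be summarised as a small lookup. This is just a sketch of the advice in this answer, not an official tool. The function name and the multi_disk flag are made up for illustration, and None is used to mean "no strong preference".

```python
def recommend_scheduler(directory, multi_disk=False):
    """Return the scheduler suggested above for a Cassandra directory.

    directory  -- "data", "commitlog" or "solr_data_dir"
    multi_disk -- True if the volume is backed by multiple disks
    Returns a scheduler name, or None when the choice is not so relevant.
    """
    if directory == "data":
        return "deadline"   # reads get priority over compactions
    if directory == "commitlog":
        return None         # almost purely write-only, scheduler matters little
    if directory == "solr_data_dir":
        return "noop" if multi_disk else "deadline"
    raise ValueError(f"unknown directory: {directory!r}")
```

For example, recommend_scheduler("solr_data_dir", multi_disk=True) gives "noop", while the same call without multi_disk falls back to "deadline".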
As a final point, the choice of scheduler for servers with NVMe SSDs is largely irrelevant, since they are extremely fast and very difficult to saturate. In most cases, it is advisable not to use an I/O scheduler at all (set it to none), since the kernel would otherwise waste resources scheduling I/O requests unnecessarily.
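To see which scheduler a device is currently using, Linux exposes it in /sys/block/<device>/queue/scheduler, with the active one shown in square brackets. Below is a minimal sketch for reading it. The function names are illustrative, and a device name such as nvme0n1 is only an example. On newer multi-queue kernels the names you see are typically none and mq-deadline rather than noop and deadline.

```python
from pathlib import Path

def parse_scheduler_line(line):
    """Parse a line such as 'noop [deadline] cfq'.

    Returns (active, available), where active is the bracketed name
    (or None if no entry is bracketed).
    """
    active = None
    available = []
    for name in line.split():
        if name.startswith("[") and name.endswith("]"):
            name = name[1:-1]
            active = name
        available.append(name)
    return active, available

def read_scheduler(device):
    """Read the scheduler setting for a block device, e.g. 'sda'."""
    path = Path("/sys/block") / device / "queue" / "scheduler"
    return parse_scheduler_line(path.read_text())
```

Changing the scheduler is done by writing one of the available names into that same file as root. The change does not survive a reboot unless you also persist it (for example via a udev rule or boot parameter, depending on your distribution).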