ray_109035 asked Erick Ramirez commented

Which Linux IO scheduler do I use for virtualizing DSE/Cassandra on Nutanix?

In this post, I'll go through the Linux IO settings for vdisks.

1 Answer

ray_109035 answered Erick Ramirez commented

The Nutanix Enterprise Cloud provides a globally accessible, tiered storage pool to ensure data is accessed (read or written) from the most performant storage tier. On hybrid appliances that comprise both SSD and HDD, your DSE/Cassandra instance will always access data via the SSD tier. Similarly, on All Flash and NVMe based appliances, data will always be read from, and written to, the fastest solid-state tier. Whether that is SSD or NVMe, we term this our "hot" tier.

A number of best practices follow from the fact that I/O will always interact with the "hot" tier. The following kernel-level changes will improve database behaviour on the Nutanix platform. So, assuming your vdisks are enumerated sdb, sdc, sdd, sde, sdf, sdg:

# pwd
# cat 99-nutanix.rules
ACTION=="add|change", KERNEL=="sd[b-g]", ATTR{queue/scheduler}="noop"
ACTION=="add|change", KERNEL=="sd[b-g]", ATTR{queue/nr_requests}="128"
ACTION=="add|change", KERNEL=="sd[b-g]", ATTR{queue/max_sectors_kb}="1024"
ACTION=="add|change", KERNEL=="sd[b-g]", ATTR{queue/rotational}="0"

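noop - this scheduler is recommended for virtual machines where the underlying disk infrastructure already performs I/O scheduling, and especially on SSDs that provide their own queueing. On newer kernels that only offer the multi-queue schedulers, the equivalent setting is named none.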

nr_requests - the maximum number of read and write requests that can be queued at one time before the next process requesting a read or write is put to sleep. The default is 128, but try raising it in steps towards, say, 1024 to see whether any further improvement can be obtained.

max_sectors_kb - the maximum allowed size of an I/O request in kilobytes. Many modern Linux kernels may already have this set, so check the current value first.

rotational - by default this is 0 for SSDs. Setting it explicitly removes the need to check whether the underlying device is a spinning disk, which reduces unnecessary scheduler logic.
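If you want to experiment with a value such as nr_requests before committing it to the rules file, one option is to write to sysfs directly. The sketch below is only an illustration: the helper name set_queue_attr and its optional root argument are my own, not part of any tool, and anything written this way is lost on reboot. It needs to be run as root.

```shell
# Hypothetical helper: write one block-queue attribute and echo back what
# the kernel actually stored. The fourth argument (sysfs root) is optional
# and defaults to /sys/block.
set_queue_attr() {
    dev=$1 attr=$2 value=$3 root=${4:-/sys/block}
    path="$root/$dev/queue/$attr"
    if [ ! -w "$path" ]; then
        echo "cannot write $path" >&2
        return 1
    fi
    echo "$value" > "$path"
    echo "$dev $attr=$(cat "$path")"
}

# Example (as root): step nr_requests up on one vdisk, then re-run your test.
# set_queue_attr sdb nr_requests 256
```

Once you find a value that helps, put it into the rules file so that it persists.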

To test the above settings without performing a reboot use:

udevadm control --reload-rules && udevadm trigger

A reboot is still the preferred test, of course, because it also confirms that the changes persist.
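To confirm that the rules took effect, you can read the values back from sysfs. This is a minimal sketch, assuming the same sdb to sdg device names as above; the function name active_scheduler is my own. The scheduler file lists every available scheduler and marks the active one in square brackets, so the function just extracts the bracketed name.

```shell
# Print the active scheduler: the name shown in [brackets] in the sysfs file.
active_scheduler() {
    sed -n 's/.*\[\([^]]*\)\].*/\1/p' "$1"
}

# Device names match the sd[b-g] pattern used in the rules file.
for dev in sdb sdc sdd sde sdf sdg; do
    q="/sys/block/$dev/queue"
    [ -r "$q/scheduler" ] || continue   # skip devices that do not exist here
    echo "$dev scheduler=$(active_scheduler "$q/scheduler")" \
         "nr_requests=$(cat "$q/nr_requests")" \
         "max_sectors_kb=$(cat "$q/max_sectors_kb")" \
         "rotational=$(cat "$q/rotational")"
done
```

Run it once after the udevadm trigger and again after a reboot; the output should be the same both times.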

You can find more best practice ideas for running DataStax DSE/Cassandra on Nutanix in the following freely available guide created by Nutanix Solutions Engineering:

Feel free to send feedback and help us to improve on the current revision.


Ray Hassan - Nutanix Solutions
1 comment

Erick Ramirez ♦♦ commented

Thanks Ray for another nice post. Cheers!
