- Notes on the Generic Block Layer Rewrite in Linux 2.5
- - Generic Block Device Capability (/sys/block/<disk>/capability)
+ - Generic Block Device Capability (/sys/block/<device>/capability)
+ - CFQ IO scheduler tunables
+ - Block data integrity
- Deadline IO scheduler tunables
- Block io priorities (in CFQ scheduler)
+ - Queue's sysfs entries
- The members of struct request (in include/linux/blkdev.h)
- - Block layer statistics in /sys/block/<dev>/stat
+ - Block layer statistics in /sys/block/<device>/stat
- Switching I/O schedulers at runtime
+CFQ (Complete Fairness Queueing)
+The main aim of the CFQ scheduler is to provide a fair allocation of disk
+I/O bandwidth among all processes that request I/O operations.
+CFQ maintains a per-process queue for processes that issue synchronous I/O
+requests. Asynchronous requests from all processes are batched together
+according to each process's I/O priority.
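A minimal Python sketch of the queue layout just described, for illustration only. Synchronous requests go to a queue owned by the issuing process, while asynchronous requests share one queue per I/O priority level. The `Request` type and `classify` function are hypothetical names for this sketch; CFQ itself is implemented in C and its data structures differ.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical request record used only for this illustration."""
    pid: int     # process that issued the request
    prio: int    # I/O priority level, 0 (highest) .. 7 (lowest)
    sync: bool   # True for a synchronous request
    sector: int  # starting sector


def classify(requests):
    """Sort requests into per-process sync queues and per-priority async queues."""
    sync_queues = defaultdict(deque)   # pid  -> that process's sync requests, FIFO order
    async_queues = defaultdict(deque)  # prio -> async requests from every process at this priority
    for rq in requests:
        if rq.sync:
            sync_queues[rq.pid].append(rq)
        else:
            async_queues[rq.prio].append(rq)
    return sync_queues, async_queues
```

For example, two asynchronous writes from different processes at the same priority land in the same async queue, while each process's synchronous requests stay in that process's own queue.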
CFQ ioscheduler tunables
@@ -25,6 +36,72 @@ there are multiple spindles behind single LUN (Host based hardware RAID
controller or for storage arrays), setting slice_idle=0 might end up in better
throughput and acceptable latencies.
+back_seek_max
+This specifies, in kilobytes, the maximum "distance" for backward seeking.
+The distance is the amount of space from the current head location back to
+the sectors of a request that lies behind it.
+This parameter allows the scheduler to anticipate requests in the "backward"
+direction and consider them as being the "next" if they are within this
+distance from the current head location.
+back_seek_penalty
+This parameter is used to compute the cost of backward seeking. If the
+backward distance of a request is just 1/back_seek_penalty of the distance
+to a "front" request, the seeking costs of the two requests are considered
+equivalent, so the scheduler will not bias toward either of them (otherwise
+it biases toward the front request). The default value of back_seek_penalty
+is 2.
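The maximum backward-seek distance (back_seek_max) and back_seek_penalty can be illustrated with a small cost model. A forward request costs its distance from the head; a backward request within the limit costs its distance multiplied by the penalty; anything further back is not considered. This is a sketch of the behaviour described above, not the kernel's code. It assumes 512-byte sectors for the kilobyte-to-sector conversion, and `seek_cost`/`choose` are hypothetical names.

```python
import math

SECTOR_BYTES = 512  # assumption: 512-byte sectors when converting back_seek_max


def seek_cost(head, sector, back_seek_max_kb, back_seek_penalty=2):
    """Modeled cost of moving from `head` to `sector` (both in sectors).

    Returns None when the request lies further behind the head than
    back_seek_max allows.
    """
    back_max = back_seek_max_kb * 1024 // SECTOR_BYTES  # KB -> sectors
    if sector >= head:
        return sector - head                        # forward: plain distance
    if head - sector <= back_max:
        return (head - sector) * back_seek_penalty  # backward: penalized distance
    return None                                     # too far behind the head


def choose(head, a, b, back_seek_max_kb, back_seek_penalty=2):
    """Return whichever of sectors `a` and `b` is cheaper; ties go to `a`."""
    def cost(s):
        c = seek_cost(head, s, back_seek_max_kb, back_seek_penalty)
        return math.inf if c is None else c
    return b if cost(b) < cost(a) else a
```

With the head at sector 1000 and the default penalty of 2, a request 50 sectors behind the head costs the same as one 100 sectors ahead (`seek_cost(1000, 950, 1024)` and `seek_cost(1000, 1100, 1024)` both give 100), which is the equivalence point described above. The tie-breaking rule in `choose` is an arbitrary choice of this sketch.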
+fifo_expire_async
+This parameter sets the timeout of asynchronous requests. The default value
+is 248ms.
+fifo_expire_sync
+This parameter sets the timeout of synchronous requests. The default value
+is 124ms. To favor synchronous requests over asynchronous ones, decrease
+this value relative to fifo_expire_async.
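A sketch of how the two FIFO timeouts translate into per-request deadlines, using the documented defaults. Times are plain milliseconds here, whereas the kernel tracks time in jiffies; `TimedRequest`, `deadline_ms` and `expired` are hypothetical names.

```python
from dataclasses import dataclass

FIFO_EXPIRE_ASYNC_MS = 248  # documented default for fifo_expire_async
FIFO_EXPIRE_SYNC_MS = 124   # documented default for fifo_expire_sync


@dataclass
class TimedRequest:
    """Hypothetical request record: arrival time in ms and sync/async type."""
    arrival_ms: int
    sync: bool


def deadline_ms(rq, expire_sync=FIFO_EXPIRE_SYNC_MS, expire_async=FIFO_EXPIRE_ASYNC_MS):
    """Arrival time plus the FIFO timeout that applies to this request type."""
    return rq.arrival_ms + (expire_sync if rq.sync else expire_async)


def expired(requests, now_ms, **timeouts):
    """Requests whose deadline has been reached at time `now_ms`."""
    return [rq for rq in requests if now_ms >= deadline_ms(rq, **timeouts)]
```

Lowering `expire_sync` relative to `expire_async` makes synchronous requests expire sooner than asynchronous requests that arrived at the same time.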
+slice_async
+This parameter is the same as slice_sync, but for asynchronous queues. The
+default value is 40ms.
+slice_async_rq
+This parameter limits the number of asynchronous requests dispatched to the
+device request queue during a queue's time slice. The maximum number of
+requests allowed to be dispatched also depends on the I/O priority. The
+default value is 2.
+slice_sync
+When a queue is selected for execution, the queue's I/O requests are only
+executed for a certain amount of time (time_slice) before switching to another
+queue. This parameter is used to calculate the time slice of synchronous
+queues. time_slice is computed using the equation below, where prio is the
+queue's I/O priority level (0-7, with 0 the highest):
+time_slice = slice_sync + (slice_sync/5 * (4 - prio))
+To increase the time_slice of synchronous queues, increase the value of
+slice_sync. The default value is 100ms.
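Working the time_slice equation through for the default slice_sync of 100ms shows how strongly priority scales the slice: each priority step moves it by slice_sync/5 = 20ms, from 180ms at priority 0 down to 40ms at priority 7. The sketch assumes prio is an I/O priority level from 0 to 7 and uses integer division; `time_slice` is a hypothetical helper.

```python
def time_slice(slice_ms, prio):
    """time_slice = slice + (slice/5 * (4 - prio)) for I/O priority 0..7.

    Integer division is an assumption of this sketch; it keeps results in
    whole milliseconds.
    """
    if not 0 <= prio <= 7:
        raise ValueError("prio must be between 0 and 7")
    return slice_ms + (slice_ms // 5) * (4 - prio)


if __name__ == "__main__":
    for prio in range(8):
        # slice_sync default of 100ms: prints 180, 160, ..., 40
        print(prio, time_slice(100, prio))
```

Assuming the same equation applies to asynchronous queues, passing the 40ms asynchronous default (slice_async) instead gives 72ms at priority 0 and 16ms at priority 7.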
+quantum
+This specifies the number of requests dispatched to the device queue. In a
+queue's time slice, a request will not be dispatched if the number of requests
+in the device exceeds this parameter. This parameter is used for synchronous
+requests.
+In the case of storage with several disks, this setting can limit the parallel
+processing of requests. Therefore, increasing the value can improve
+performance, although this can cause the latency of some I/O to increase due
+to the larger number of outstanding requests.
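A sketch of the dispatch limit described above (the quantum tunable) treated as a cap on requests outstanding at the device. It assumes dispatching stops once the in-flight count reaches the limit; `dispatch` is a hypothetical helper, and the numbers in the example are illustrative, not defaults.

```python
from collections import deque


def dispatch(pending, in_flight, quantum):
    """Move requests from the front of `pending` to the device until the
    device holds `quantum` requests; returns the requests dispatched."""
    sent = []
    while pending and in_flight < quantum:
        sent.append(pending.popleft())
        in_flight += 1
    return sent
```

With `quantum=4` and one request already in flight, `dispatch(deque(range(10)), 1, 4)` sends three requests. On storage with several spindles, raising the limit lets more requests be processed in parallel, at the latency cost mentioned above.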
CFQ IOPS Mode for group scheduling
Basic CFQ design is to provide priority based time slices. Higher priority
Files denoted with a RO postfix are readonly and the RW postfix means
+add_random (RW)
+This file allows turning off the disk's entropy contribution. The default
+value of this file is '1' (on).
+discard_granularity (RO)
+This shows the size of the device's internal allocation unit in bytes, if
+reported by the device. A value of '0' means the device does not support
+discard functionality.
+discard_max_bytes (RO)
+Devices that support discard functionality may have internal limits on
+the number of bytes that can be trimmed or unmapped in a single operation.
+The discard_max_bytes parameter is set by the device driver to the maximum
+number of bytes that can be discarded in a single operation. Discard
+requests issued to the device must not exceed this limit. A discard_max_bytes
+value of 0 means that the device does not support discard functionality.
+discard_zeroes_data (RO)
+When read, this file shows whether discarded blocks are zeroed by the
+device. If its value is '1', the blocks are zeroed; otherwise they are not.
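To illustrate how discard_granularity and discard_max_bytes constrain a discard, the sketch below splits a byte range into pieces no larger than discard_max_bytes, placing interior boundaries on multiples of discard_granularity where it can. The alignment policy and the `split_discard` name are assumptions of this sketch, not a description of what the block layer does; only the meaning of the two attributes (including a value of 0) comes from the text.

```python
def split_discard(offset, length, granularity, max_bytes):
    """Split a discard of `length` bytes at byte `offset` into (offset, length)
    pieces that never exceed `max_bytes`."""
    if granularity == 0 or max_bytes == 0:
        raise ValueError("device does not support discard")
    # Largest piece that is a whole number of granules, falling back to
    # max_bytes when the limit is smaller than one granule.
    step = (max_bytes // granularity) * granularity or max_bytes
    end = offset + length
    pieces = []
    pos = offset
    while pos < end:
        limit = pos + step                              # furthest this piece may reach
        aligned = (limit // granularity) * granularity  # round down to a granule boundary
        stop = aligned if aligned > pos else limit      # no boundary fits: use the raw limit
        stop = min(stop, end)
        pieces.append((pos, stop - pos))
        pos = stop
    return pieces
```

For a 10000-byte discard starting at byte 1000 on a device with a 4096-byte granularity and an 8192-byte limit, this yields `[(1000, 7192), (8192, 2808)]`: the first piece stops at the first granule boundary that fits, and the remainder starts aligned.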
hw_sector_size (RO)
This is the hardware sector size of the device, in bytes.
+iostats (RW)
+This file is used to control (on/off) the iostats accounting of the
+disk.
+logical_block_size (RO)
+This is the logical block size of the device, in bytes.
max_hw_sectors_kb (RO)
This is the maximum number of kilobytes supported in a single data transfer.
+max_integrity_segments (RO)
+When read, this file shows the maximum number of integrity segments, as set
+by the block layer, that a hardware controller can handle.
max_sectors_kb (RW)
This is the maximum number of kilobytes that the block layer will allow
for a filesystem request. Must be smaller than or equal to the maximum
size allowed by the hardware.
+max_segments (RO)
+Maximum number of segments of the device.
+max_segment_size (RO)
+Maximum segment size of the device.
+minimum_io_size (RO)
+This is the smallest preferred io size reported by the device.
nomerges (RW)
This enables the user to disable the lookup logic involved with IO
@@ -38,11 +89,31 @@ read or write requests. Note that the total allocated number may be twice
this amount, since it applies only to reads or writes (not the accumulated
+To avoid priority inversion through request starvation, a request
+queue maintains a separate request pool for each cgroup when
+CONFIG_BLK_CGROUP is enabled, and this parameter applies to each such
+per-block-cgroup request pool. In other words, if there are N block
+cgroups, each request queue may have up to N request pools, each
+independently regulated by nr_requests.
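Reading the per-direction note and the per-cgroup pools together gives a simple worst-case bound: each pool may hold up to twice nr_requests (reads and writes are limited separately), and there can be up to N pools. A sketch of that arithmetic with a hypothetical helper name; with nr_requests set to 128 and 4 block cgroups as example values, a queue could allocate up to 2 * 128 * 4 = 1024 requests.

```python
def max_allocated_requests(nr_requests, n_block_cgroups, blk_cgroup_enabled=True):
    """Upper bound on requests one queue may allocate.

    nr_requests limits reads and writes separately, so a pool may hold up
    to 2 * nr_requests; with CONFIG_BLK_CGROUP each block cgroup has its
    own pool, otherwise there is a single pool.
    """
    pools = n_block_cgroups if blk_cgroup_enabled else 1
    return 2 * nr_requests * pools
```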
+optimal_io_size (RO)
+This is the optimal io size reported by the device.
+physical_block_size (RO)
+This is the physical block size of the device, in bytes.
read_ahead_kb (RW)
Maximum number of kilobytes to read-ahead for filesystems on this block
+rotational (RW)
+This file is used to state whether the device is of rotational or
+non-rotational type.
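The attributes above can be read from a program as well as from the shell. A minimal sketch, assuming they live under /sys/block/<device>/queue/ (the usual location of the queue's sysfs directory); `read_queue_attrs` and `is_rotational` are hypothetical helpers. Values that are plain decimal numbers are converted to int, and anything else is returned as a string.

```python
from pathlib import Path


def read_queue_attrs(dev, names, sysfs_root="/sys/block"):
    """Read the named attributes from <sysfs_root>/<dev>/queue/ into a dict."""
    qdir = Path(sysfs_root) / dev / "queue"
    attrs = {}
    for name in names:
        raw = (qdir / name).read_text().strip()
        # Most queue attributes are decimal numbers; keep anything else as text.
        attrs[name] = int(raw) if raw.isdigit() else raw
    return attrs


def is_rotational(dev, sysfs_root="/sys/block"):
    """True if the device reports itself as rotational (rotational == 1)."""
    return read_queue_attrs(dev, ["rotational"], sysfs_root)["rotational"] == 1
```

For example, `read_queue_attrs("sda", ["rotational", "logical_block_size", "physical_block_size"])` returns the three values as integers on a system with a device named sda (an example name). Writing an RW attribute such as rotational works the same way in reverse but normally requires root.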
rq_affinity (RW)
If this option is '1', the block layer will migrate request completions to the