summaryrefslogtreecommitdiffstats
path: root/align/virt-alignment-scan.pod
blob: dcc71b4a223bb20d2685c1f9a4c952dd1e3eb1fe (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
=encoding utf8

=head1 NAME

virt-alignment-scan - Check alignment of virtual machine partitions

=head1 SYNOPSIS

 virt-alignment-scan [--options] -d domname

 virt-alignment-scan [--options] -a disk.img [-a disk.img ...]

=head1 DESCRIPTION

When older operating systems install themselves, the partitioning
tools place partitions at a sector misaligned with the underlying
storage (commonly the first partition starts on sector C<63>).
Misaligned partitions can result in an operating system issuing more
I/O than should be necessary.

The virt-alignment-scan tool checks the alignment of partitions in
virtual machines and disk images and warns you if there are alignment
problems.

Currently there is no virt tool for fixing alignment problems.  You
can only reinstall the guest operating system.  The following NetApp
document summarises the problem and possible solutions:
L<http://media.netapp.com/documents/tr-3747.pdf>

=head1 OUTPUT

To run this tool on a disk image directly, use the I<-a> option:

 $ virt-alignment-scan -a winxp.img
 /dev/sda1        32256          512    bad (alignment < 4K)

 $ virt-alignment-scan -a fedora16.img
 /dev/sda1      1048576         1024K   ok
 /dev/sda2      2097152         2048K   ok
 /dev/sda3    526385152         2048K   ok

To run the tool on a guest known to libvirt, use the I<-d> option and
possibly the I<-c> option:

 # virt-alignment-scan -d RHEL5
 /dev/sda1        32256          512    bad (alignment < 4K)
 /dev/sda2    106928640          512    bad (alignment < 4K)

 $ virt-alignment-scan -c qemu:///system -d Win7TwoDisks
 /dev/sda1      1048576         1024K   ok
 /dev/sda2    105906176         1024K   ok
 /dev/sdb1        65536           64K   ok

The output consists of 4 or more whitespace-separated columns.  Only
the first 4 columns are signficant if you want to parse this from a
program.  The columns are:

=over 4

=item col 1

the device and partition name (eg. C</dev/sda1> meaning the
first partition on the first block device)

=item col 2

the start of the partition in bytes

=item col 3

the alignment in bytes or Kbytes (eg. C<512> or C<4K>)

=item col 4

C<ok> if the alignment is best for performance, or C<bad> if the
alignment can cause performance problems

=item cols 5+

optional free-text explanation.

=back

The exit code from the program changes depending on whether poorly
aligned partitions were found.  See L</EXIT STATUS> below.

If you just want the exit code with no output, use the I<-q> option.

=head1 OPTIONS

=over 4

=item B<--help>

Display brief help.

=item B<-a> file

=item B<--add> file

Add I<file> which should be a disk image from a virtual machine.

The format of the disk image is auto-detected.  To override this and
force a particular format use the I<--format=..> option.

=item B<-c> URI

=item B<--connect> URI

If using libvirt, connect to the given I<URI>.  If omitted, then we
connect to the default libvirt hypervisor.

If you specify guest block devices directly (I<-a>), then libvirt is
not used at all.

=item B<-d> guest

=item B<--domain> guest

Add all the disks from the named libvirt guest.  Domain UUIDs can be
used instead of names.

=item B<--format=raw|qcow2|..>

=item B<--format>

The default for the I<-a> option is to auto-detect the format of the
disk image.  Using this forces the disk format for I<-a> options which
follow on the command line.  Using I<--format> with no argument
switches back to auto-detection for subsequent I<-a> options.

For example:

 virt-alignment-scan --format=raw -a disk.img

forces raw format (no auto-detection) for C<disk.img>.

 virt-alignment-scan --format=raw -a disk.img --format -a another.img

forces raw format (no auto-detection) for C<disk.img> and reverts to
auto-detection for C<another.img>.

If you have untrusted raw-format guest disk images, you should use
this option to specify the disk format.  This avoids a possible
security problem with malicious guests (CVE-2010-3851).

=item B<-q>

=item B<--quiet>

Don't produce any output.  Just set the exit code
(see L</EXIT STATUS> below).

=item B<-v>

=item B<--verbose>

Enable verbose messages for debugging.

=item B<-V>

=item B<--version>

Display version number and exit.

=item B<-x>

Enable tracing of libguestfs API calls.

=back

=head1 RECOMMENDED ALIGNMENT

Operating systems older than Windows 2008 and Linux before ca.2010
place the first sector of the first partition at sector 63, with a 512
byte sector size.  This happens because of a historical accident.
Drives have to report a cylinder / head / sector (CHS) geometry to the
BIOS.  The geometry is completely meaningless on modern drives, but it
happens that the geometry reported always has 63 sectors per track.
The operating system therefore places the first partition at the start
of the second "track", at sector 63.

When the guest OS is virtualized, the host operating system and
hypervisor may prefer accesses aligned to one of:

=over 4

=item * 512 bytes

if the host OS uses local storage directly on hard drive partitions,
and the hard drive has 512 byte physical sectors.

=item * 4 Kbytes

for local storage on new hard drives with 4Kbyte physical sectors; for
file-backed storage on filesystems with 4Kbyte block size; or for some
types of network-attached storage.

=item * 64 Kbytes

for high-end network-attached storage.  This is the optimal block size
for some NetApp hardware.

=item * 1 Mbyte

see L</1 MB PARTITION ALIGNMENT> below.

=back

Partitions which are not aligned correctly to the underlying
storage cause extra I/O.  For example:

                       sect#63
                       +--------------------------+------
                       |         guest            |
                       |    filesystem block      |
 ---+------------------+------+-------------------+-----+---
    |  host block             |  host block             |
    |                         |                         |
 ---+-------------------------+-------------------------+---

In this example, each time a 4K guest block is read, two blocks on the
host must be accessed (so twice as much I/O is done).  When a 4K guest
block is written, two host blocks must first be read, the old and new
data combined, and the two blocks written back (4x I/O).

=head2 LINUX HOST BLOCK AND I/O SIZE

New versions of the Linux kernel expose the physical and logical block
size, and minimum and recommended I/O size.

For a typical hard drive with 512 byte sectors:

 $ cat /sys/block/sda/queue/physical_block_size
 512
 $ cat /sys/block/sda/queue/logical_block_size
 512
 $ cat /sys/block/sda/queue/minimum_io_size
 512
 $ cat /sys/block/sda/queue/optimal_io_size
 0

For a NetApp LUN:

 $ cat /sys/block/sdc/queue/logical_block_size
 512
 $ cat /sys/block/sdc/queue/physical_block_size
 512
 $ cat /sys/block/sdc/queue/minimum_io_size
 4096
 $ cat /sys/block/sdc/queue/optimal_io_size
 65536

The NetApp allows 512 byte accesses (but they will be very
inefficient), prefers a minimum 4K I/O size, but the optimal I/O size
is 64K.

For detailed information about what these numbers mean, see
L<http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Storage_Administration_Guide/newstorage-iolimits.html>

[Thanks to Mike Snitzer for providing NetApp data and additional
information.]

=head2 1 MB PARTITION ALIGNMENT

Microsoft picked 1 MB as the default alignment for all partitions
starting with Windows 2008 Server, and Linux has followed this.

Assuming 512 byte sectors in the guest, you will now see the first
partition starting at sector 2048, and subsequent partitions (if any)
will start at a multiple of 2048 sectors.

1 MB alignment is compatible with all current alignment requirements
(4K, 64K) and provides room for future growth in physical block sizes.

=head2 SETTING ALIGNMENT

Currently there is no virt tool for fixing alignment problems in
guests.  This is a difficult problem to fix because simply moving
partitions around breaks the bootloader, necessitating either manual
reinstallation of the bootloader using a rescue disk, or complex and
error-prone hacks.

L<virt-resize(1)> does not change the alignment of the first
partition, but it does align the second and subsequent partitions to a
multiple of 64 or 128 sectors (depending on the version of
virt-resize, 128 in virt-resize E<ge> 1.13.19).  For operating systems
that have a separate boot partition, virt-resize could be used to
align the main OS partition, so that the majority of OS accesses
except at boot will be aligned.

The easiest way to correct partition alignment problems is to
reinstall your guest operating systems.  If you install operating
systems from templates, ensure these have correct partition alignment
too.

For older versions of Windows, the following NetApp document contains
useful information: L<http://media.netapp.com/documents/tr-3747.pdf>

For Red Hat Enterprise Linux E<le> 5, use a Kickstart script that
contains an explicit C<%pre> section that creates aligned partitions
using L<parted(8)>.  Do not use the Kickstart C<part> command.  The
NetApp document above contains an example.

=head1 SHELL QUOTING

Libvirt guest names can contain arbitrary characters, some of which
have meaning to the shell such as C<#> and space.  You may need to
quote or escape these characters on the command line.  See the shell
manual page L<sh(1)> for details.

=head1 EXIT STATUS

This program returns:

=over 4

=item *

0

successful exit, all partitions are aligned E<ge> 64K for best performance

=item *

1

an error scanning the disk image or guest

=item *

2

successful exit, some partitions have alignment E<lt> 64K which can result
in poor performance on high end network storage

=item *

3

successful exit, some partitions have alignment E<lt> 4K which can result
in poor performance on most hypervisors

=back

=head1 SEE ALSO

L<guestfs(3)>,
L<guestfish(1)>,
L<virt-filesystems(1)>,
L<virt-rescue(1)>,
L<virt-resize(1)>,
L<http://libguestfs.org/>.

=head1 AUTHOR

Richard W.M. Jones L<http://people.redhat.com/~rjones/>

=head1 COPYRIGHT

Copyright (C) 2011 Red Hat Inc.

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.