website/src/learning/advanced-virt-df/index.html



[% topdir = "../.." -%]
[% PROCESS globals -%]
[% WRAPPER page
   title = "Advanced use of virt-df"
   h1 = "Advanced use of virt-df"
   section = "learning"
%]

<p>
This tutorial discusses advanced use of
<a href="http://www.libguestfs.org/virt-df.1.html">virt-df</a>.
Some of the topics covered are:
</p>

<ul>
<li> Using CSV output to import data into spreadsheets, databases
  and monitoring tools. </li>
<li> Graphing and using trends to predict future disk usage of guests. </li>
<li> Generating alerts when virtual machines are close to running
  out of disk space. </li>
<li> Using virt-df safely with untrusted and malicious guests. </li>
</ul>

[% WRAPPER h2 h2="CSV output" anchor="csv" %]

<p>
If you plan to do anything with virt-df output beyond reading it
yourself, use the <code>--csv</code> flag so that virt-df
produces machine-readable
<a href="http://en.wikipedia.org/wiki/Comma-separated_Values">comma-separated
values (CSV)</a> output.  The output looks like this:
</p>

<pre>
# <b>virt-df --csv</b>
Virtual Machine,Filesystem,1K-blocks,Used,Available,Use%
"CentOS5x64","/dev/sda1",101086,19290,76577,19.1%
"CentOS5x64","/dev/VolGroup00/LogVol00",8030648,3116144,4499992,38.8%
<i>[etc]</i>
</pre>
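
<p>
As a rough illustration of why the CSV form is convenient, the following
Python sketch parses output shaped like the sample above using the
standard <code>csv</code> module.  The function name
<code>parse_virt_df</code> and the dictionary keys are made up for this
example, and the sketch assumes the column headers are exactly those shown.
</p>

```python
import csv
import io

# Sample text copied from the virt-df --csv output shown above.
SAMPLE = '''Virtual Machine,Filesystem,1K-blocks,Used,Available,Use%
"CentOS5x64","/dev/sda1",101086,19290,76577,19.1%
"CentOS5x64","/dev/VolGroup00/LogVol00",8030648,3116144,4499992,38.8%
'''


def parse_virt_df(text):
    """Parse virt-df --csv text into a list of dicts with integer sizes.

    Hypothetical helper: the key names are chosen for this example only.
    """
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({
            "vm": row["Virtual Machine"],
            "filesystem": row["Filesystem"],
            "blocks": int(row["1K-blocks"]),
            "used": int(row["Used"]),
            "available": int(row["Available"]),
        })
    return rows


if __name__ == "__main__":
    for r in parse_virt_df(SAMPLE):
        print(r["vm"], r["filesystem"], r["used"])
```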

[% END %]

[% WRAPPER h2 h2="Using cron" anchor="cron" %]

<p>
You can write a cron job to collect virt-df output periodically
(I collect it once a day).
</p>

<pre>
# <b>cat &gt; /etc/cron.daily/local-virt-df</b>
#!/bin/bash -
date=$(date +%F)
virt-df --csv &gt; /var/local/virt-df.$date
# <b>chmod 0755 /etc/cron.daily/local-virt-df</b>
</pre>

<p>
The cron job creates one file per day in <code>/var/local</code>,
named <code>virt-df.YYYY-MM-DD</code> after the date it was collected.
</p>
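
<p>
To work with the accumulated files, something like the following Python
sketch could gather them into one time series per filesystem.  It assumes
the <code>virt-df.YYYY-MM-DD</code> naming produced by the cron job above;
<code>load_history</code> is a hypothetical helper name, and files that do
not match the naming scheme are skipped.
</p>

```python
import csv
import datetime
import pathlib
import tempfile

PREFIX = "virt-df."


def load_history(directory):
    """Return {(vm, filesystem): [(date, used_kb), ...]}, sorted by date."""
    history = {}
    for path in pathlib.Path(directory).glob(PREFIX + "*"):
        # The cron job names files virt-df.YYYY-MM-DD (date +%F).
        try:
            day = datetime.date.fromisoformat(path.name[len(PREFIX):])
        except ValueError:
            continue  # not one of the daily files
        with open(path, newline="") as f:
            for row in csv.DictReader(f):
                key = (row["Virtual Machine"], row["Filesystem"])
                history.setdefault(key, []).append((day, int(row["Used"])))
    for series in history.values():
        series.sort()
    return history


if __name__ == "__main__":
    # Demonstrate on a temporary directory rather than /var/local.
    header = "Virtual Machine,Filesystem,1K-blocks,Used,Available,Use%\n"
    with tempfile.TemporaryDirectory() as d:
        pathlib.Path(d, "virt-df.2010-07-01").write_text(
            header + '"vm1","/dev/sda1",1000,200,800,20.0%\n')
        pathlib.Path(d, "virt-df.2010-07-02").write_text(
            header + '"vm1","/dev/sda1",1000,300,700,30.0%\n')
        for key, series in load_history(d).items():
            print(key, series)
```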

[% END %]

[% WRAPPER h2
   h2="Importing the data into spreadsheets and databases" anchor="import" %]

<p>
CSV files can be loaded directly into spreadsheets and databases:
</p>

<p>
<img src="df-openoffice.png" width="721" height="642"
  longdesc="Screenshot showing virt-df output in OpenOffice Calc"/>
</p>

<pre>
<b>CREATE TABLE df_data (
  "Virtual Machine" TEXT NOT NULL,
  "Filesystem" TEXT NOT NULL,
  "1K-blocks" BIGINT NOT NULL,
  "Used" BIGINT NOT NULL,
  "Available" BIGINT NOT NULL,
  "Use%" TEXT
);</b>
<b>COPY df_data FROM 'df.csv' WITH DELIMITER ',' CSV HEADER;</b>
</pre>
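
<p>
If a PostgreSQL server is not to hand, the same table layout can be tried
out with SQLite from Python.  This is only a stand-in for the
<code>COPY</code> approach above, not a replacement for it: the function
name <code>load_into_sqlite</code> is hypothetical, and SQLite's
<code>INTEGER</code> type is used in place of <code>BIGINT</code>.
</p>

```python
import csv
import io
import sqlite3

# Sample text in the format produced by virt-df --csv.
SAMPLE = '''Virtual Machine,Filesystem,1K-blocks,Used,Available,Use%
"CentOS5x64","/dev/sda1",101086,19290,76577,19.1%
"CentOS5x64","/dev/VolGroup00/LogVol00",8030648,3116144,4499992,38.8%
'''


def load_into_sqlite(conn, text):
    """Create df_data if needed and insert every CSV data row into it."""
    conn.execute('''CREATE TABLE IF NOT EXISTS df_data (
        "Virtual Machine" TEXT NOT NULL,
        "Filesystem" TEXT NOT NULL,
        "1K-blocks" INTEGER NOT NULL,
        "Used" INTEGER NOT NULL,
        "Available" INTEGER NOT NULL,
        "Use%" TEXT)''')
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row, as COPY ... CSV HEADER does
    rows = (row for row in reader if row)  # ignore blank lines
    conn.executemany("INSERT INTO df_data VALUES (?, ?, ?, ?, ?, ?)", rows)
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_into_sqlite(conn, SAMPLE)
    for row in conn.execute('SELECT "Filesystem", "Used" FROM df_data'):
        print(row)
```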

[% END %]

[% WRAPPER h2 h2="Sorting and querying the data" anchor="query" %]

<p>
Once your data has been imported, you can start to process it,
for example finding out which virtual machines are running
out of space:
</p>

<p>
<img src="df-openoffice-sorted.png" width="653" height="143"
  longdesc="Screenshot showing virt-df output in OpenOffice Calc"/>
</p>

<p>
The following PostgreSQL query on the previously imported data
shows all filesystems with over 60% usage:
</p>

<pre>
<b>SELECT "Virtual Machine", "Filesystem"
  FROM df_data
 WHERE (100. * "Used" / "1K-blocks") &gt; 60;</b>

 Virtual Machine |              Filesystem              
-----------------+--------------------------------------
 Debian5x64      | /dev/debian5x64.home.annexia.org/usr
 OpenSUSE11x64   | /dev/sda2
 VBox            | /dev/vg_f13x64/lv_root
(3 rows)
</pre>

[% END %]

[% WRAPPER h2 h2="Graphs and trends" anchor="graphs" %]

<p>
You can use daily historical data to graph disk usage.
In theory you could also use trends in this data
to predict future requirements, although in my experience
usage tends to be <q>lumpy</q>: installing OpenOffice
in a VM, for example, causes a big jump in usage that does not
indicate any trend.
</p>

<p>
In any case, here is a graph of usage data for one VM filesystem
over approximately one month, generated using OpenOffice Calc
(Insert&nbsp;&rarr;&nbsp;Chart) with a linear trend line
(select chart and do Insert&nbsp;&rarr;&nbsp;Trend&nbsp;Lines):
</p>

<p>
<img src="df-centos-graph.png" width="583" height="340"
  longdesc="Output of virt-df for one VM graphed over one month" />
</p>
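
<p>
The same kind of linear trend line can be computed outside a spreadsheet.
The Python sketch below fits an ordinary least-squares line to
(day, used) points and estimates how many days remain until the line
reaches the filesystem's capacity.  The function names are hypothetical,
and given how lumpy real usage is, the estimate should be treated as a
rough guide at best.
</p>

```python
def linear_trend(points):
    """Return (slope, intercept) of the least-squares line through points.

    points is a list of (x, y) pairs, e.g. (day number, used 1K-blocks).
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def days_until_full(points, capacity_kb):
    """Days after the last sample until the trend line hits capacity.

    Returns None when usage is flat or shrinking, since the line then
    never reaches capacity.
    """
    slope, intercept = linear_trend(points)
    if slope <= 0:
        return None
    last_day = max(x for x, _ in points)
    return max(0.0, (capacity_kb - intercept) / slope - last_day)


if __name__ == "__main__":
    # Made-up data: usage grows by 100 blocks per day.
    samples = [(0, 100), (1, 200), (2, 300)]
    print(linear_trend(samples))
    print(days_until_full(samples, 1000))
```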

[% END %]

[% WRAPPER h2 h2="Alerts" anchor="alerts" %]

XXX
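
<p>
One possible approach, sketched in Python under a few assumptions: read
the CSV output, compute usage as Used divided by 1K-blocks (as in the
SQL query earlier), and report every filesystem above a threshold.  The
function name <code>find_alerts</code> and the default threshold of 90%
are arbitrary choices for this example.
</p>

```python
import csv
import io

# Sample text in the format produced by virt-df --csv.
SAMPLE = '''Virtual Machine,Filesystem,1K-blocks,Used,Available,Use%
"CentOS5x64","/dev/sda1",101086,19290,76577,19.1%
"CentOS5x64","/dev/VolGroup00/LogVol00",8030648,3116144,4499992,38.8%
'''


def find_alerts(csv_text, threshold=90.0):
    """Return (vm, filesystem, percent_used) for usage above threshold."""
    alerts = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        blocks = int(row["1K-blocks"])
        if blocks == 0:
            continue  # avoid dividing by zero on empty filesystems
        percent = 100.0 * int(row["Used"]) / blocks
        if percent > threshold:
            alerts.append((row["Virtual Machine"], row["Filesystem"], percent))
    return alerts


if __name__ == "__main__":
    # A low threshold is used here only so the sample data triggers it.
    for vm, fs, percent in find_alerts(SAMPLE, threshold=30.0):
        print("WARNING: %s %s is %.1f%% full" % (vm, fs, percent))
```

<p>
A script along these lines could be run from the same daily cron job.
Most cron implementations mail any output of a job to its owner, so
printing a warning may be all the alerting that is needed.
</p>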


[% END %]

[% WRAPPER h2 h2="Safety" anchor="safety" %]

<p>
virt-df is generally safe to use with untrusted or malicious guests,
but there are some things to be aware of.
</p>

<p>
An untrusted guest can present any disk data that it wants
to the host.  By simple manipulations of the filesystem it can show
the disk as full when it is empty, or empty when it is full.  This
is not important in itself; it only becomes an issue if a guest
could manipulate the statistics of another, unrelated VM.
</p>

<p>
Older versions of virt-df ran a separate libguestfs appliance for each
guest.  This is safe, because one guest cannot possibly interfere with
the statistics from another, but it is also slow.  Since virt-df 1.5.0,
several unrelated guests may share a single libguestfs appliance.
This is much faster, but there is a (slim) possibility that one guest
might corrupt the appliance, leading to misreported statistics for
another guest.
</p>

<p>
You can get the old, safest possible behaviour by adding
the <code>--one-per-guest</code> flag to the virt-df command line.
</p>

[% END %]

[% END -%]