1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
|
<html><head>
<title>Writing syslog Data to MySQL</title>
<meta name="KEYWORDS" content="syslog, mysql, syslog to mysql, howto">
</head>
<body>
<h1>Writing syslog messages to MySQL</h1>
<P><small><i>Written by
<a href="http://www.adiscon.com/en/people/rainer-gerhards.php">Rainer
Gerhards</a> (2005-08-02)</i></small></P>
<h2>Abstract</h2>
<p><i><b>In this paper, I describe how to write
<a href="http://www.monitorware.com/en/topics/syslog/">syslog</a>
messages to a <a href="http://www.mysql.com">MySQL</a> database.</b> Having
syslog messages in a database is often handy, especially when you intend to set
up a front-end for viewing them. This paper describes an approach
with <a href="http://www.rsyslog.com/">rsyslogd</a>, an alternative enhanced
syslog daemon natively supporting MySQL. I describe the components needed
to be installed and how to configure them.</i></p>
<h2>Background</h2>
<p>In many cases, syslog data is simply written to text files. This approach has
some advantages, most notably it is very fast and efficient. However, data
stored in text files is not readily accessible for real-time viewing and analysis.
To do that, the messages need to be in a database. There are various
ways to store syslog messages in a database. For example, some have the syslogd
write text files which are later feed via a separate script into the database.
Others have written scripts taking the data (via a pipe) from a
non-database-aware syslogd and store them as they appear. Some others use
database-aware syslogds and make them write the data directly to the database.
In this paper, I use that "direct write" approach. I think it is superior,
because the syslogd itself knows the status of the database connection and thus
can handle it intelligently (well ... hopefully ;)). I use rsyslogd to acomplish
this, simply because I have initiated the rsyslog project with
database-awareness as one goal.</p>
<p><b>One word of caution:</b> while message storage in the database provides an
excellent foundation for interactive analysis, it comes at a cost. Database i/o
is considerably slower than text file i/o. As such, directly writing to
the database makes sense only if your message volume is low enough to allow a)
the syslogd, b) the network, and c) the database server to catch
up with it. Some time ago, I have written a paper on
<a href="http://www.monitorware.com/Common/en/Articles/performance-optimizing-syslog-server.php">
optimizing syslog server performance</a>. While this paper talks about
Window-based solutions, the ideas in it are generic enough to apply here, too.
So it might be worth reading if you anticipate medium high to high traffic. If you
anticipate really high traffic (or very large traffic spikes), you should
seriously consider forgetting about direct database writes - in my opinion, such
a situation needs either a very specialised system or a different approach (the
text-file-to-database approach might work better for you in this case).
</p>
<h2>Overall System Setup</h2>
<P>In this paper, I concentrate on the server side. If you are thinking about
interactive syslog message review, you probably want to centralize syslog. In
such a scenario, you have multiple machines (the so-called clients) send their
data to a central machine (called server in this context). While I expect such a
setup to be typical when you are interested in storing messages in the database,
I do not describe how to set it up. This is beyond the scope of this paper. If
you search a little, you will probably find many good descriptions on how to
centralize syslog. If you do that, it might be a good idea to do it securely, so
you might also be interested in my paper on <a href="rsyslog_stunnel.html">
ssl-encrypting syslog message transfer</a>.</P>
<P>No matter how the messages arrive at the server, their processing is always the
same. So you can use this paper in combination with any description for
centralized syslog reporting.</P>
<P>As I already said, I use rsyslogd on the server. It has intrinsic support for
talking to MySQL databases. For obvious reasons, we also need an instance of
MySQL running. To keep us focussed, the setup of MySQL itself is also beyond the
scope of this paper. I assume that you have successfully installed MySQL and
also have a front-end at hand to work with it (for example,
<a href="http://www.phpmyadmin.net/">phpMyAdmin</a>). Please make sure that this
is installed, actually working and you have a basic understanding of how to
handle it.</P>
<h2>Setting up the system</h2>
<p>You need to download and install rsyslogd first. Obtain it from the
<a href="http://www.rsyslog.com/">rsyslog site</a>. Make sure that you disable
stock syslogd, otherwise you will experience some difficulties.</p>
<p>It is important to understand how rsyslogd talks to the database. In rsyslogd,
there is the concept of "templates". Basically, a template is a string that
includes some replacement characters, which are called "properties" in rsyslog.
Properties are accessed via the "<a href="property_replacer.html">Property
Replacer</a>". Simply said, you access properties by including their name
between percent signs inside the template. For example, if the syslog message is
"Test", the template "%msg%" would be expanded to "Test". Rsyslogd supports
sending template text as a SQL statement to MySQL. As such, the template must be
a valid SQL statement. There is no limit in what the statement might be, but
there are some obvious and not so obvious choices. For example, a template "drop
table xxx" is possible, but does not make an awful lot of sense. In practice,
you will always use an "insert" statment inside the template.</p>
<p>An example: if you would just like to store the msg part of the full syslog
message, you have probably created a table "syslog" with a single column "message".
In such a case, a good template would be "insert into syslog(message) values ('%msg%')".
With the example above, that would be expanded to "insert into syslog(message)
values('Test')". This expanded string is then sent to the database. It's that
easy, no special magic. The only thing you must ensure is that your template
expands to a proper SQL statement and that this statement matches your database
design.</p>
<p>Does that mean you need to create database schema yourself and also must
fully understand rsyslogd's properties? No, that's not needed. Because we
anticipated that folks are probably more interested in getting things going instead
of designing them from scratch. So we have provided a default schema as well
as build-in support for it. This schema also offers an additional
benefit: rsyslog is part of <a href="http://www.adiscon.com/en/">Adiscon</a>'s
<a href="http://www.monitorware.com/en/">MonitorWare product line</a> (which
includes open source and closed source members). All of these tools share the
same default schema and know how to operate on it. For this reason, the default
schema is also called the "MonitorWare Schema". If you use it, you can simply
add <a href="http://www.phplogcon.org/">phpLogCon, a GPLed syslog web interface</a>,
to your system and have instant interactive access to your database. So there
are some benefits in using the provided schema.</p>
<p>The schema definition is contained in the file "createDB.sql". It comes with
the rsyslog package. Review it to check that the database name is acceptable for
you. Be sure to leave the table and field names unmodified, because
otherwise you need to customize rsyslogd's default sql template, which we do not
do
in this paper. Then, run the script with your favourite MySQL tool. Double-check
that the table was successfully created.</p>
<p>Next, we need to tell rsyslogd to write data to the database. As we use
the default schema, we do NOT need to define a template for this. We can use the
hardcoded one (rsyslogd handles the proper template linking). So all we need to
do is add a simple selector line to /etc/rsyslog.conf:</p>
<blockquote>
<p><code>*.*
>database-server,database-name,database-userid,database-password</code></p>
</blockquote>
<p>In many cases, MySQL will run on the local machine. In this case, you can
simply use "127.0.0.1" for <i>database-server</i>. This can be especially
advisable, if you do not need to expose MySQL to any process outside of the
local machine. In this case, you can simply bind it to 127.0.0.1, which provides
a quite secure setup. Of course, also supports remote MySQL instances. In that
case, use the remote server name (e.g. mysql.example.com) or IP-address. The <i>
database-name</i> by default is "syslog". If you have modified the default, use
your name here. <i>Database-userid</i> and <i>-password</i> are the credentials
used to connect to the database. As they are stored in clear text in
rsyslog.conf, that user should have only the least possible privileges. It is
sufficient to grant it INSERT privileges to the systemevents table, only. As a
side note, it is strongly advisable to make the rsyslog.conf file readable by
root only - if you make it world-readable, everybody could obtain the password
(and eventually other vital information from it). In our example, let's assume
you have created a MySQL user named "syslogwriter" with a password of
"topsecret" (just to say it bluntly: such a password is NOT a good idea...). If
your MySQL database is on the local machine, your rsyslog.conf line might look
like in this sample:</p>
<blockquote>
<p><code>*.*
>127.0.0.1,syslog,syslogwriter,topsecret</code></p>
</blockquote>
<p>Save rsyslog.conf, restart rsyslogd - and you should see syslog messages
being stored in the "systemevents" table!</p>
<p>The example line stores every message to the database. Especially if you have
a high traffic volume, you will probably limit the amount of messages being
logged. This is easy to acomplish: the "write database" action is just a regular
selector line. As such, you can apply normal selector-line filtering. If, for
example, you are only interested in messages from the mail subsystem, you can
use the following selector line:</p>
<blockquote>
<p><code>mail.*
>127.0.0.1,syslog,syslogwriter,topsecret</code></p>
</blockquote>
<p>Review the <a href="rsyslog_conf.html">rsyslog.conf</a> documentation for
details on selector lines and their filtering.</p>
<p><b>You have now completed everything necessary to store syslog messages to
the MySQL database.</b> If you would like to try out a front-end, you might want
to look at <a href="http://www.phplogcon.org/">phpLogCon</a>, which displays
syslog data in a browser. As of this writing, phpLogCon is not yet a powerful
tool, but it's open source, so it might be a starting point for your own
solution.</p>
<h2>On Reliability...</h2>
<p>Rsyslogd writes syslog messages directly to the database. This implies that
the database must be available at the time of message arrival. If the database
is offline, no space is left or something else goes wrong - rsyslogd can not
write the database record. If rsyslogd is unable to store a message, it performs
one retry. This is helpful if the database server was restarted. In this case,
the previous connection was broken but a reconnect immediately succeeds. However,
if the database is down for an extended period of time, an immediate retry does
not help. While rsyslogd could retry until it finally succeeds, that would have
negative impact. Syslog messages keep coming in. If rsyslogd would be busy
retrying the database, it would not be able to process these messages.
Ultimately, this would lead to loss of newly arrived messages.</p>
<p>In most cases, rsyslogd is configured not only to write to the database but
to perform other actions as well. In the always-retry scenario, that would mean
no other actions would be carried out. As such, the design of rsyslogd is
limited to a single retry. If that does not succeed, the current message is will
not be written to the database and the MySQL database writer be suspended for a
short period of time. Obviously, this leads to the loss of the current message
as well as all messages received during the suspension period. But they are only
lost in regard to the database, all other actions are correctly carried out.
While not perfect, we consider this to be a better approach then the potential
loss of all messages in all actions.</p>
<p><b>In short: try to avoid database downtime if you do not want to experience
message loss.</b></p>
<p>Please note that this restriction is not rsyslogd specific. All approachs to
real-time database storage share this problem area.</p>
<h2>Conclusion</h2>
<P>With minumal effort, you can use rsyslogd to write syslog messages to a MySQL
database. Once the messages are arrived there, you can interactivley review and
analyse them. In practice, the messages are also stored in text files for
longer-term archival and the databases are cleared out after some time (to avoid
becoming too slow). If you expect an extremely high syslog message volume,
storing it in real-time to the database may outperform your database server. In
such cases, either filter out some messages or think about alternate approaches
involving non-real-time database writing (beyond the scope of this paper).</P>
<P>The method outlined in this paper provides an easy to setup and maintain
solution for most use cases, especially with low and medium syslog message
volume (or fast database servers).</P>
<h3>Feedback Requested</h3>
<P>I would appreciate feedback on this paper. If you have additional ideas,
comments or find bugs, please
<a href="mailto:rgerhards@adiscon.com">let me know</a>.</P>
<h2>References and Additional Material</h2>
<ul>
<li><a href="http://www.rsyslog.com">www.rsyslog.com</a> - the rsyslog site</li>
<li>
<a href="http://www.monitorware.com/Common/en/Articles/performance-optimizing-syslog-server.php">
Paper on Syslog Server Optimization</a></li>
</ul>
<h2>Revision History</h2>
<ul>
<li>2005-08-02 *
<a href="http://www.adiscon.com/en/people/rainer-gerhards.php">Rainer Gerhards</a> *
initial version created</li>
<li>2005-08-03 *
<a href="http://www.adiscon.com/en/people/rainer-gerhards.php">Rainer Gerhards</a>
* added references to demo site</li>
<li>2007-06-13 *
<a href="http://www.adiscon.com/en/people/rainer-gerhards.php">Rainer Gerhards</a>
* removed demo site - was torn down because too expensive for usage count</li>
</ul>
<h2>Copyright</h2>
<p>Copyright (c) 2005-2007
<a href="http://www.adiscon.com/en/people/rainer-gerhards.php">Rainer Gerhards</a>
and <a href="http://www.adiscon.com/en/">Adiscon</a>.</p>
<p>Permission is granted to copy, distribute and/or modify this document under
the terms of the GNU Free Documentation License, Version 1.2 or any later
version published by the Free Software Foundation; with no Invariant Sections,
no Front-Cover Texts, and no Back-Cover Texts. A copy of the license can be
viewed at <a href="http://www.gnu.org/copyleft/fdl.html">
http://www.gnu.org/copyleft/fdl.html</a>.</p>
</body>
</html>
|