From c34e7d21146a3e6722bbf159a06750fdef23b4f8 Mon Sep 17 00:00:00 2001 From: Rainer Gerhards Date: Mon, 8 Feb 2010 14:30:08 +0100 Subject: integrated new PostgreSQL doc converted to html, linked, etc... --- doc/rsyslog_pgsql.html | 336 +++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 336 insertions(+) create mode 100644 doc/rsyslog_pgsql.html (limited to 'doc/rsyslog_pgsql.html') diff --git a/doc/rsyslog_pgsql.html b/doc/rsyslog_pgsql.html new file mode 100644 index 00000000..dcb9dc3a --- /dev/null +++ b/doc/rsyslog_pgsql.html @@ -0,0 +1,336 @@ + + + + + + + + + + + + + + + + + +

Writing syslog messages +to MySQL, PostgreSQL or any other supported Database

Written by Rainer +Gerhards with some additions by Marc +Schiffbauer (2008-02-28)

Abstract

In this paper, I describe +how to write syslog +messages to a MySQL +or PostgreSQL +database. Having syslog messages +in a database is often handy, especially when you intend to set up a +front-end for viewing them. This paper describes an approach with +rsyslogd, +an alternative enhanced syslog daemon natively supporting MySQL and +PostgreSQL. I describe the components needed to be installed and how +to configure them. Please note that as of this writing, rsyslog +supports a variety of databases. While this guide is still MySQL- and +PostgreSQL-focused, you can probably use it together with other ones +too. You just need to modify a few settings.

Background

In many cases, syslog data is simply +written to text files. This approach has some advantages, most +notably it is very fast and efficient. However, data stored in text +files is not readily accessible for real-time viewing and analysis. +To do that, the messages need to be in a database. There are various +ways to store syslog messages in a database. For example, some have +the syslogd write text files which are later feed via a separate +script into the database. Others have written scripts taking the data +(via a pipe) from a non-database-aware syslogd and store them as they +appear. Some others use database-aware syslogds and make them write +the data directly to the database. In this paper, I use that "direct +write" approach. I think it is superior, because the syslogd +itself knows the status of the database connection and thus can +handle it intelligently (well ... hopefully ;)). I use rsyslogd to +acomplish this, simply because I have initiated the rsyslog project +with database-awareness as one goal.

One word of caution: +while message storage in the database provides an excellent +foundation for interactive analysis, it comes at a cost. Database i/o +is considerably slower than text file i/o. As such, directly writing +to the database makes sense only if your message volume is low enough +to allow a) the syslogd, b) the network, and c) the database server +to catch up with it. Some time ago, I have written a paper on +optimizing +syslog server performance. While this paper +talks about Window-based solutions, the ideas in it are generic +enough to apply here, too. So it might be worth reading if you +anticipate medium high to high traffic. If you anticipate really high +traffic (or very large traffic spikes), you should seriously consider +forgetting about direct database writes - in my opinion, such a +situation needs either a very specialized system or a different +approach (the text-file-to-database approach might work better for +you in this case). +

Overall System Setup

In this paper, I concentrate on +the server side. If you are thinking about interactive syslog message +review, you probably want to centralize syslog. In such a scenario, +you have multiple machines (the so-called clients) send their data to +a central machine (called server in this context). While I expect +such a setup to be typical when you are interested in storing +messages in the database, I do not describe how to set it up. This is +beyond the scope of this paper. If you search a little, you will +probably find many good descriptions on how +to centralize syslog. If you do that, it might be a good idea to do +it securely, so you might also be interested in my paper on +ssl-encrypting +syslog message transfer.

No matter how the messages arrive at +the server, their processing is always the same. So you can use this +paper in combination with any description for centralized syslog +reporting.

As I already said, I use +rsyslogd on the server. It has intrinsic support for talking to the +supported databases. For obvious reasons, we also need an instance of +MySQL or PostgreSQL running. To keep us focused, the setup of the +database itself is also beyond the scope of this paper. I assume that +you have successfully installed the database and also have a +front-end at hand to work with it (for example, phpMyAdmin +or phpPgAdmin. +Please make sure that this is installed, actually working and you +have a basic understanding of how to handle it.

Setting up the system

You need to download and +install rsyslogd first. Obtain it from the rsyslog +site. Make sure that you disable stock +syslogd, otherwise you will experience some difficulties. On some +distributions (Fedora 8 and above, for example), rsyslog may +already by the default syslogd, in which case you obviously do not +need to do anything specific. For many others, there are prebuild +packages available. If you use either, please make sure that you have +the required database plugins for your database available. It usually +is a separate package and typically not +installed by default.

It is important to understand +how rsyslogd talks to the database. In rsyslogd, there is the concept +of "templates". Basically, a template is a string that +includes some replacement characters, which are called "properties" +in rsyslog. Properties are accessed via the "Property +Replacer". Simply said, you access +properties by including their name between percent signs inside the +template. For example, if the syslog message is "Test", the +template "%msg%" would be expanded to "Test". +Rsyslogd supports sending template text as a SQL statement to the +database. As such, the template must be a valid SQL statement. There +is no limit in what the statement might be, but there are some +obvious and not so obvious choices. For example, a template "drop +table xxx" is possible, but does not make an awful lot of sense. +In practice, you will always use an "insert" statement +inside the template.

An example: if you would just like to +store the msg part of the full syslog message, you have probably +created a table "syslog" with a single column "message". +In such a case, a good template would be "insert into +syslog(message) values ('%msg%')". With the example above, that +would be expanded to "insert into syslog(message) +values('Test')". This expanded string is then sent to the +database. It's that easy, no special magic. The only thing you must +ensure is that your template expands to a proper SQL statement and +that this statement matches your database design.

Does that mean you need to +create database schema yourself and also must fully understand +rsyslogd's properties? No, that's not needed. Because we anticipated +that folks are probably more interested in getting things going +instead of designing them from scratch. So we have provided a default +schema as well as build-in support for it. This schema also offers an +additional benefit: rsyslog is part of Adiscon's +MonitorWare product +line (which includes open source and closed +source members). All of these tools share the same default schema and +know how to operate on it. For this reason, the default schema is +also called the "MonitorWare Schema". If you use it, you +can simply add phpLogCon, +a GPLed syslog web interface, to your system +and have instant interactive access to your database. So there are +some benefits in using the provided schema.

The schema definition is contained in +the file "createDB.sql". It comes with the rsyslog package +and one can be found for each supported database type (in the plugins +directory). Review it to check that the database name is acceptable +for you. Be sure to leave the table and field names unmodified, +because otherwise you need to customize rsyslogd's default sql +template, which we do not do in this paper. Then, run the script with +your favorite SQL client. Double-check that the table was +successfully created.

It is important to note that the +correct database encoding must be used so that the database will +accept strings independend of the string encoding. This is an +important part because it can not be guarantied that all syslog +messages will have a defined character encoding. This is especially +true if the rsyslog-Server will collect messages from different +clients and different products. +

For example PostgreSQL may refuse to +accept messages if you would set the database encoding to “UTF8” +while a client is sending invalid byte sequences for that encoding. +

Database support in rsyslog is +integrated via loadable plugin modules. To use the database +functionality, the database plugin must be enabled in the config file +BEFORE the first database table action is used. This is done by +placing the

$ModLoad ommysql

directive at the begining of /etc/rsyslog.conf for +MySQL and

$ModLoad ompgsql

for +PostgreSQL.

For +other databases, use their plugin name (e.g. omoracle).

Next, we need to tell rsyslogd to write data to +the database. As we use the default schema, we do NOT need to define +a template for this. We can use the hardcoded one (rsyslogd handles +the proper template linking). So all we need to do e.g. for MySQL is +add a simple selector line to /etc/rsyslog.conf:

*.* +:ommysql:database-server,database-name,database-userid,database-password

Again, other databases have other selector names, +e.g. ":ompgsql:" instead of ":ommysql:". See the +output plugin's documentation for details.

In many cases, the database will run +on the local machine. In this case, you can simply use "127.0.0.1" +for database-server. This can be especially advisable, if you +do not need to expose the database to any process outside of the +local machine. In this case, you can simply bind it to 127.0.0.1, +which provides a quite secure setup. Of course, rsyslog also supports +remote database instances. In that case, use the remote server name +(e.g. mydb.example.com) or IP-address. The database-name by +default is "Syslog". If you have modified the default, use +your name here. Database-userid and -password are the +credentials used to connect to the database. As they are stored in +clear text in rsyslog.conf, that user should have only the least +possible privileges. It is sufficient to grant it INSERT privileges +to the systemevents table, only. As a side note, it is strongly +advisable to make the rsyslog.conf file readable by root only - if +you make it world-readable, everybody could obtain the password (and +eventually other vital information from it). In our example, let's +assume you have created a database user named "syslogwriter" +with a password of "topsecret" (just to say it bluntly: +such a password is NOT a good idea...). If your database is on the +local machine, your rsyslog.conf line might look like in this sample:

*.* +:ommysql:127.0.0.1,Syslog,syslogwriter,topsecret

Save rsyslog.conf, restart rsyslogd - and you +should see syslog messages being stored in the "systemevents" +table!

The example line stores every message +to the database. Especially if you have a high traffic volume, you +will probably limit the amount of messages being logged. This is easy +to accomplish: the "write database" action is just a +regular selector line. As such, you can apply normal selector-line +filtering. If, for example, you are only interested in messages from +the mail subsystem, you can use the following selector line:

mail.* :ommysql:127.0.0.1,syslog,syslogwriter,topsecret

Review the rsyslog.conf +documentation for details on selector lines and their filtering.

You have now completed +everything necessary to store syslog messages to the a database. +If you would like to try out a front-end, you might want to look at +phpLogCon, +which displays syslog data in a browser. As of this writing, +phpLogCon is not yet a powerful tool, but it's open source, so it +might be a starting point for your own solution.

On Reliability...

Rsyslogd writes syslog messages +directly to the database. This implies that the database must be +available at the time of message arrival. If the database is offline, +no space is left or something else goes wrong - rsyslogd can not +write the database record. If rsyslogd is unable to store a message, +it performs one retry. This is helpful if the database server was +restarted. In this case, the previous connection was broken but a +reconnect immediately succeeds. However, if the database is down for +an extended period of time, an immediate retry does not help.

Message loss in this scenario +can easily be prevented with rsyslog. All you need to do is run the +database writer in queued mode. This is now described in a generic +way and I do not intend to duplicate it here. So please be sure to +read "Handling +a massive syslog database insert rate with Rsyslog", +which describes the scenario and also includes configuration +examples.

Conclusion

With minimal effort, you can use +rsyslogd to write syslog messages to a database. You can even make it +absolutely fail-safe and protect it against database server downtime. +Once the messages are arrived there, you can interactively review and +analyze them. In practice, the messages are also stored in text files +for longer-term archival and the databases are cleared out after some +time (to avoid becoming too slow). If you expect an extremely high +syslog message volume, storing it in real-time to the database may +outperform your database server. In such cases, either filter out +some messages or used queued mode (which in general is recommended +with databases).

The method outlined in this paper +provides an easy to setup and maintain solution for most use cases.

Feedback Requested

I would appreciate feedback on this paper. If you +have additional ideas, comments or find bugs, please let +me know.

References and Additional Material

www.rsyslog.com + - the rsyslog site +
+
Paper + on Syslog Server Optimization +
+

Revision History

2005-08-02 * + Rainer + Gerhards * initial version created +
+
2005-08-03 * + Rainer + Gerhards * added references to demo site +
+
2007-06-13 * + Rainer + Gerhards * removed demo site - was torn down because too + expensive for usage count +
+
2008-02-21 * + Rainer + Gerhards * updated reliability section, can now be done with + on-demand disk queues
+
2008-02-28 * Rainer + Gerhards * added info on other databases, updated syntax to more + recent one +
+
2010-01-29 * Marc Schiffbauer * added some + PostgreSQL stuff, made wording more database generic, fixed some + typos
+

Copyright

+ + \ No newline at end of file -- cgit