Handling a massive syslog database insert rate with Rsyslog

Written by Rainer Gerhards (2008-01-28)

Abstract

In this paper, I describe how to log massive amounts of syslog messages to a database. This HOWTO is currently under development and thus a bit brief. Updates are promised ;).

The Intention

Database updates are inherently slow when it comes to storing syslog messages. However, there are a number of applications where it is handy to have the messages inside a database. Rsyslog supports native database writing via output plugins. As of this writing, there are plugins available for MySQL and PostgreSQL. Additional plugins may have become available by the time you read this. Be sure to check.

In order to successfully write messages to a database backend, the backend must be capable of recording messages at the expected average arrival rate. This is the rate you get when you take all messages that can arrive within a day and divide that number by 86,400 (the number of seconds per day). Let's say you expect 43,200,000 messages per day. That's an average rate of 500 messages per second (mps). Your database server MUST be able to handle that number of messages per second at a sustained rate. If it can't, you either need to add an additional server, lower the number of messages, or forget about it.

However, this is probably not your peak rate. Let's simply assume your systems work only half a day, that's 12 hours (and, yes, I know this is unrealistic, but you'll get the point soon). So your average rate is actually 1,000 mps during work hours and 0 mps during non-work hours. To make matters worse, workload is not divided evenly during the day. So you may have peaks of up to 10,000 mps while at other times the load may go down to maybe just 100 mps. Peaks may stay well above 2,000 mps for a few minutes.

So how the heck will you be able to handle all of this traffic (including the peaks) with a database server that is only capable of inserting a maximum of 500 mps?

The key here is buffering. Messages that the database server is not able to handle right away will be buffered until it is. Of course, that means database inserts are NOT real-time. If you need real-time inserts, you need to make sure your database server can handle traffic at the actual peak rate. But let's assume you are OK with some delay.

Buffering is fine. But how about these massive amounts of data? That can't be held in memory, so aren't we out of luck with buffering? The key here is that rsyslog can not only buffer in memory but also buffer to disk (this may remind you of "spooling", which is the right idea). There are several queuing modes available, offering different throughput. In general, the idea is to buffer in memory until the memory buffer is exhausted and switch to disk buffering when needed (and only as long as needed). All of this is handled automatically and transparently by rsyslog.

With our above scenario, the disk buffer would build up during the day and rsyslog would use the night to drain it. Obviously, this is an extreme example, but it shows what can be done. Please note that queue content survives rsyslogd restarts, so even a reboot of the system will not cause any message loss.
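
To put numbers on this scenario, here is the arithmetic worked out. The symbols are my own shorthand and not rsyslog terminology: λ_work is the average arrival rate during work hours, μ the sustained insert rate of the database, B the backlog at the end of the work day, and T_work the 12 working hours expressed in seconds. The calculation uses only the average work-hour rate, so peaks change when the buffer fills, not how much is queued by evening.

```latex
\begin{align*}
\lambda_{\text{work}} &= \frac{43{,}200{,}000\ \text{messages}}{43{,}200\ \text{s}} = 1{,}000\ \text{mps} \\
B &= (\lambda_{\text{work}} - \mu)\, T_{\text{work}}
   = (1{,}000 - 500)\ \text{mps} \times 43{,}200\ \text{s}
   = 21{,}600{,}000\ \text{messages} \\
T_{\text{drain}} &= \frac{B}{\mu}
   = \frac{21{,}600{,}000\ \text{messages}}{500\ \text{mps}}
   = 43{,}200\ \text{s} = 12\ \text{h}
\end{align*}
```

In other words, with these figures the queue holds about 21.6 million messages when work ends and is empty again just as the next work day begins. Any sustained increase in volume would leave a backlog that grows from day to day.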

How To Set Up

Frankly, it's quite easy. All you need to do is instruct rsyslog to use a disk queue and then configure your action. There is nothing else to do. With the following simple config file, you log anything you receive to a MySQL database and have buffering applied automatically.
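
A minimal sketch of such a configuration, assuming rsyslog's legacy `$`-directive syntax (newer releases also accept a different configuration syntax, so check the documentation for your version). The work directory, queue file name, database host, database name, user and password are placeholders to be replaced with your own values.

```conf
# Sketch only: paths, host, database name and credentials are placeholders.
$ModLoad ommysql                    # MySQL output plugin (use ompgsql for PostgreSQL)
$ModLoad imudp                      # UDP syslog reception
$UDPServerRun 514                   # listen for syslog messages on UDP port 514
$ModLoad imuxsock                   # local message reception

$WorkDirectory /var/spool/rsyslog   # directory for queue (spool) files
$MainMsgQueueFileName mainq         # giving the main queue a file name lets it spill to disk
$ActionResumeRetryCount -1          # retry indefinitely if the insert fails

# Log everything to the database (for PostgreSQL, replace :ommysql: with :ompgsql:)
*.* :ommysql:dbhost,dbname,dbuser,dbpassword
```

The two queue-related lines are what enable buffering here: without a work directory and a queue file name, rsyslog has nowhere to place messages that do not fit in memory.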

Note that you can modify a lot of queue performance parameters, but the above config will get you going with default values. If you consider using this on a really busy server, it is strongly recommended that you invest some time in setting the tuning parameters to appropriate values.

Feedback requested

I would appreciate feedback on this tutorial. If you have additional ideas or comments, or find bugs (I *do* bugs - no way... ;)), please let me know.

Revision History

Copyright

Copyright (c) 2008 Rainer Gerhards and Adiscon.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license can be viewed at http://www.gnu.org/copyleft/fdl.html.