From 765575e41f366135fe32ce888cfcc396617747c3 Mon Sep 17 00:00:00 2001 From: Rainer Gerhards Date: Tue, 23 Sep 2008 13:26:57 +0200 Subject: added doc on malformed messages, cause and how to work-around, to the doc set --- ChangeLog | 2 + doc/imfile.html | 8 +- doc/syslog_parsing.html | 196 ++++++++++++++++++++++++++++++++++++++++++++++++ doc/troubleshoot.html | 5 +- 4 files changed, 205 insertions(+), 6 deletions(-) create mode 100644 doc/syslog_parsing.html diff --git a/ChangeLog b/ChangeLog index acffe09e..da644d28 100644 --- a/ChangeLog +++ b/ChangeLog @@ -1,5 +1,7 @@ --------------------------------------------------------------------------- Version 3.18.5 (rgerhards), 2008-10-?? +- added doc on malformed messages, cause and how to work-around, to the + doc set - bugfix: imudp input module could cause segfault on HUP It did not properly de-init a variable acting as a linked list head. That resulted in trying to access freed memory blocks after the HUP. diff --git a/doc/imfile.html b/doc/imfile.html index 6fc6c2e2..5bdbce5c 100644 --- a/doc/imfile.html +++ b/doc/imfile.html @@ -125,10 +125,8 @@ $InputFilePollingInterval 10

[rsyslog.conf overview] [manual index] [rsyslog site]

This documentation is part of the -rsyslog -project.
-Copyright © 2008 by Rainer -Gerhards and -Adiscon. +rsyslog project.
+Copyright © 2008 by Rainer +Gerhards and Adiscon. Released under the GNU GPL version 3 or higher.

diff --git a/doc/syslog_parsing.html b/doc/syslog_parsing.html new file mode 100644 index 00000000..57da6657 --- /dev/null +++ b/doc/syslog_parsing.html @@ -0,0 +1,196 @@ + +syslog parsing in rsyslog + + +

syslog parsing in rsyslog

+

Written by Rainer Gerhards +(2008-09-23)

+

We regularly receive messages asking why rsyslog +parses this or that message incorrectly. Of course, it turns out that rsyslog does +the right thing, but the message sender does not. And also of course, this is not even +of the slightest help to the end user experiencing the problem ;). So I thought I write this +paper. It describes the problem source and shows potential solutions (aha!). +

Syslog Standardization

+The syslog protocol has not been standardized until relatively recently.The first document "smelling" a bit +like a standard is RFC 3164, which dates back +to August 2001. The problem is that this document is no real standard. It has assigned "informational" +status by the IETF which means it provides some hopefully +useful information but does not demand anything. It is impossible to "comply" to an informational +document. This, of course, doesn't stop marketing guys from telling they comply to RFC3164 and +it also does not stop some techs to tell you "this and that does not comply to RFC3164, so it is +<anybody else but them>'s fault". +

Then, there is RFC3195, which is +a real standard. In it's section 3 it makes (a somewhat questionable) reference to (informational) +RFC 3164 which may be interpreted in a way that RFC3195 standardizes the format layed out +in RFC 3164 by virtue of referencing them. So RFC3195 seems to extend its standardization +domain to the concepts layed out in RFC 3164 (which is why I tend to find that refrence +questionable). In that sense, RFC3195 standardizes the format informationally described in +RFC3164, Section 4. But it demands it only for the scope of RFC3195, which is syslog over +BEEP - and NOT syslog over UDP. So one may argue whether or not the RFC3164 format could +be considered a standard for any non-BEEP (including UDP) syslog, too. In the strict view +I tend to have, it does not. Refering to the RFC3195 context usually does not help, +because there are virtually no RFC3195 implementations available (at this time, +I would consider this RFC a failure). +

Now let's for a short moment assume that RFC3195 would somehow be able to demand +RFC3164 format for non-BEEP syslog. So we could use RFC3164 format as a standard. But does +that really help? Let's cite RFC 3164, right at the begining of section 4 (actually, this +is the first sentence): +

+
+   The payload of any IP packet that has a UDP destination port of 514
+   MUST be treated as a syslog message. 
+
+
+

Think a bit about it: this means that whatever is send to port 514 must be considered +a valid syslog message. No format at all is demanded. So if "this is junk" is sent to +UDP port 514 - voila, we have a valid message (interestingly, it is no longer a syslog +message if it is sent to port 515 ;)). You may now argue that I am overdoing. So let's +cite RFC 3164, Section 5.4, Example 2: +

+
+  Example 2
+
+        Use the BFG!
+
+   While this is a valid message, it has extraordinarily little useful
+   information.
+
+
+

As you can see, RFC3164 explicitely states that no format at all is required. +

Now a side-note is due: all of this does not mean that the RFC3164 authors +did not know what they were doing. No, right the contrary is true: RFC3164 mission +is to describe what has been seen in practice as syslog messages and the +conclusion is quite right that there is no common understanding on the +message format. This is also the reason why RFC3164 is an informational document: +it provides useful information, but does not precisely specify anything. +

After all of this bashing, I now have to admit that RFC3164 has some format +recommendations layed out in section 4. The format described has quite some +value in it and implementors recently try to follow it. This format is usually meant +when someone tells you that a software is "RFC3164 compliant" or expects "RFC3164 compliant messages". +I also have to admit that rsyslog also uses this format and, in the sense outlined here, +expects messages received to be "RFC3164 compliant" (knowingly that such a beast does not +exist - I am simply lying here ;)). +

Please note that there is some relief of the situation in reach. There is a new normative +syslog RFC series upcoming, and it specifies a standard message format. At the time of +this writing, the main documents are sitting in the RFC editor queue waiting for a transport +mapping to be completed. I personally expect them to be assigned RFC numbers in 2009. +

Practical Format Requirements

+

From a practical point of view, the message format expected (and generated by +default in legacy mode) is: +


+<PRI>TIMESTAMP SP HOST SP TAG MSG(Freetext)
+
+

SP is the ASCII "space" character and the definition of the rest of the fields +can be taken from RFC3164. Please note that there also is a lot of confusion on what +syntax and semantics the TAG actually has. This format is called "legacy syslog" because +it is not well specified (as you know by now) and has been "inherited from the real world". +

Rsyslog offers two parsers: one for the upcoming RFC series and one for legacy format. We +concentrate on the later. That parser applies some logic to detect missing hostnames, +is able to handle various ways the TIMESTAMP is typically malformed. In short it applies +a lot of guesswork in trying to figure out what a message really means. I am sure the +guessing algorithm can be improved, and I am always trying that when I see new malformed +messages (and there is an ample set of them...). However, this finds its limits where +it is not possible to differentiate between two entities which could be either. +For example, look at this message: +


+<144>Tue Sep 23 11:40:01 taghost sample message
+
+

Does it contain a hostname? Mabye. The value "taghost" is a valid hostname. Of course, it is +also a valid tag. If it is a hostname, the tag's value is "sample" and the msg value is "message". +Or is the hostname missing, the tag is "taghost" and msg is "sample message"? As a human, I tend +to say the later interpretation is correct. But that's hard to tell the message parser (and, no, I do +not intend to apply artificial intelligence just to guess what the hostname value is...). +

One approach is to configure the parser so that it never expects hostnames. This becomes problematic +if you receive messages from multiple devices. Over time, I may implement parser conditionals, +but this is not yet available and I am not really sure if it is needed comlexity... +

Things like this, happen. Even more scary formats happen in practice. Even from mainstream +vendors. For example, I was just asked about this message (which, btw, finally made me +write this article here): +


+"<130> [ERROR] iapp_socket_task.c 399: iappSocketTask: iappRecvPkt returned error"
+
+

If you compare it with the format RFC3164 "suggests", you'll quickly notice that +the message is "a bit" malformed. Actually, even my human intelligence is not sufficient +to guess if there is a TAG or not (is "[ERROR]" a tag or part of the message). I may not be +the smartest guy, but don't expect me to program a parser that is smarter than me. +

To the best of my konwledge, these vendor's device's syslog format can be configured, so it +would proabably be a good idea to include a (sufficiently well-formed) timestamp, +the sending hostname and (maybe?) a tag to make this message well parseable. +I will also once again take this sample and see if we can apply some guesswork. +For example, "[" can not be part of a well-formed TIMESTAMP, so logic can conclude +there is not TIMESTAMP. Also, "[" can not be used inside a valid hostname, so +logic can conclude that the message contains no hostname. Even if I implement this +logic (which I will probably do), this is a partial solution: it is impossible to +guess if there is a tag or not (honestly!). And, even worse, it is a solution only for +those set of messages that can be handled by the logic described. Now consider this +hypothetical message: +


+"<130> [ERROR] host.example.net 2008-09-23 11-40-22 PST iapp_socket_task.c 399: iappSocketTask: iappRecvPkt returned error"
+
+

Obviously, it requires additional guesswork. If we iterate over all the cases, we +can very quickly see that it is impossible to guess everything correct. In the example above +we can not even surely tell if PST should be a timezone or some other message property. +

A potential solution is to generate a parser-table based parser, but this requires +considerable effort and also has quite some runtime overhead. I try to avoid this for +now (but I may do it, especially if someone sponsors this work ;)). Side-note: if you want +to be a bit scared about potential formats, you may want to have a look at my paper +"On the Nature of Syslog Data". +

Work-Around

+

The number one work-around is to configure your devices so that they emit +(sufficiently) well-formed messages. You should by now know what these look +like. +

If that cure is not available, there are some things you can do in rsyslog to +handle the situation. First of all, be sure to read about +rsyslog.conf format +and the property replacer and properties specifically. +You need to understand that everything is configured in rsyslog. And that the message is parsed +into properties. There are also properties available which do not stem back directly to parsing. +Most importantly, %fromhost% property holds the name of the system rsyslog received +the message from. In non-relay cases, this can be used instead of hostname. In relay cases, +there is no cure other than to either fix the orginal sender or at least one of the +relays in front of the rsyslog instance in question. Similarly, you can use %timegenerated% +instead of %timereported%. Timegenerated is the time the message hit rsyslog for the first +time. For non-relayed, locally connected peers, Timegenerated should be a very close approximation +of the actual time a message was formed at the sender (depending, of course, on potential +internal queueing inside the sender). +Also, you may use the +%rawmsg% property together with the several extraction modes the property replacer supports. +Rawmsg contains the message as it is received from the remote peer. In a sense, you can +implement a post-parser with this method. +

To use these properties, you need to define your own templates and assign them. Details +can be found in the above-quoted documentation. Just let's do a quick example. Let's say +you have the horrible message shown above and can not fix the sending device for +some good reason. In rsyslog.conf, you used to say: +


+*.* /var/log/somefile
+
+

Of course, things do not work out well with that ill-formed message. So you decide +to dump the rawmsg to the file and pull the remote host and time of message generation +from rsyslog's internal properties (which, btw, is clever, because otherwise there is no +indication of these two properties...). So you need to define a template for that and +make sure the template is used with your file logging action. This is how it may look: +


+$template, MalfromedMsgFormater,"%timegenerated% %fromhost% %rawmsg:::drop-last-lf%\n"
+*.* /var/log/somefile;MalformedMsgFormatter
+
+

This will make your log much nicer, but not look perfect. Experiment a bit +with the available properties and replacer extraction options to fine-tune it +to your needs. +

Wrap-Up

+

Syslog message format is not sufficiently standardized. There exists a weak +"standard" format, which is used by a good number of implementations. However, there +exist many others, including mainstream vendor implementations, which have a +(sometimes horribly) different format. Rsyslog tries to deal with anomalies but +can not guess right in all instances. If possible, the sender should be configured +to submit well-formed messages. If that is not possible, you can work around these +issues with rsyslog's property replacer and template system. +

I hope this is a useful guide. You may also have a look at the +rsyslog troubleshooting guide for further help and places where +to ask questions. +

[manual index] [rsyslog site]

+

This documentation is part of the +rsyslog project.
+Copyright © 2008 by Rainer +Gerhards and Adiscon. +Released under the GNU GPL version 3 or higher.

+ diff --git a/doc/troubleshoot.html b/doc/troubleshoot.html index f2e9206b..b1e9c4ae 100644 --- a/doc/troubleshoot.html +++ b/doc/troubleshoot.html @@ -5,13 +5,16 @@

Having trouble with rsyslog? This page provides some tips on where to look for help and what to do if you need to ask for assistance. This page is continously being expanded. -

Useful troublehshooting ressources are: +

Useful troubleshooting ressources are:

Asking for Help

If you can't find the answer yourself, you should look at these places for -- cgit