summaryrefslogtreecommitdiffstats
path: root/doc/troubleshoot.html
blob: 0f0c7fcae2528020e3938cc7c3ee8d364d6d725d (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html><head><title>troubleshooting rsyslog</title></head>
<body>
<h2>troubleshooting rsyslog</h2>
<p><b>Having trouble with <a href="http://www.rsyslog.com">rsyslog</a>?</b>
This page provides some tips on where to look for help and what to do
if you need to ask for assistance. This page is continously being expanded.
<p>Useful troubleshooting ressources are:
<ul>
<li>The <a href="http://www.rsyslog.com/doc">rsyslog documentation</a> - note that the online version always covers
the most recent development version. However, there is a version-specific
doc set in each tarball. If you installed rsyslog from a package, there usually
is a rsyslog-doc package, that often needs to be installed separately.
<li>The <a href="http://wiki.rsyslog.com">rsyslog wiki</a> provides user tips and experiences.
<li>Check <a href="http://bugzilla.adiscon.com">the bugzilla</a> to see if your problem is a known
(and even fixed ;)) bug.
</ul>
<p><b>Malformed Messages and Message Properties</b>
<p>A common trouble source are <a href="syslog_parsing.html">ill-formed syslog messages</a>, which
lead to to all sorts of interesting problems, including malformed hostnames and dates.
Read the quoted guide to find relief. A common symptom is that the %HOSTNAME% property is
used for generating dynafile names, but some glibberish shows up. This is caused by the
malformed syslog messages, so be sure to read the 
<a href="syslog_parsing.html">guide</a> if you face that problem. Just let me add that the
common work-around is to use %FROMHOST% or %FROMHOST-IP% instead. These do not take the
hostname from the message, but rather use the host that sent the message (taken from
the socket layer). Of course, this does not work over NAT or relay chains, where the
only cure is to make sure senders emit well-formed messages.
<p><b>Configuration Problems</b>
<p>Rsyslog 3.21.1 and above has been enhanced to support extended configuration checking.
It offers a special command line switch (-N1) that puts it into "config verfication mode".
In that mode, it interprets and check the configuration file, but does not startup. This
mode can be used in parallel to a running instance of rsyslogd.
<p>To enable it, run rsyslog interactively as follows:
<p><b><i>/path/to/rsyslogd -f/path/to/config-file -N1</i></b>
<p>You should also specify other options you usually give (like -c3 and whatever else).
Any problems experienced are reported to stderr [aka "your screen" (if not redirected)].
<p><b>Configuration Graphs</b>
<p>Starting with rsyslog 4.3.1, the
&quot;<a href="rsconf1_generateconfiggraph.html">$GenerateConfigGraph</a>&quot;
command is supported, a very valuable troubleshooting tool. It permits to
generate a graph of how rsyslogd understood its configuration file. It is assumed that
many configuration issues can easily be detected just by looking at the configuration graph.
Full details of how to generate the graphs, and what to look for can be found in the
&quot;<a href="rsconf1_generateconfiggraph.html">$GenerateConfigGraph</a>&quot;
manual page.
<p><b>Asking for Help</b>
<p>If you can't find the answer yourself, you should look at these places for
community help.
<ul>
<li>The <a href="http://kb.monitorware.com/rsyslog-f40.html">rsyslog forum</a>. This is
the preferred method of obtaining support.
<li>The <a href="http://lists.adiscon.net/mailman/listinfo/rsyslog">rsyslog mailing list</a>.
This is a low-volume list which occasional gets traffic spikes.
The mailing list is probably a good place for complex questions.
</ul>
<p><b>Debug Log</b>
<p>If you ask for help, there are chances that we need to ask for an rsyslog debug log.
The debug log is a detailled report of what rsyslog does during processing. As such, it may
even be useful for your very own troubleshooting. People have seen things inside their debug
log that enabled them to find problems they did not see before. So having a look at the
debug log, even before asking for help, may be useful.
<p>Note that the debug log contains most of those things we consider useful. This is a lot
of information, but may still be too few. So it sometimes may happen that you will be asked
to run a specific version which has additional debug output. Also, we revise from time to
time what is worth putting into the standard debug log. As such, log content may change
from version to version. We do not guarantee any specific debug log contents, so do not
rely on that. The amount of debug logging can also be controlled via some environment
options. Please see <a href="debug.html">debugging support</a> for further details.
<p>In general, it is advisable to run rsyslogd in the foreground to obtain the log.
To do so, make sure you know which options are usually used when you start rsyslogd
as a background daemon. Let's assume "-c3" is the only option used. Then, do the following:
<ul>
<li>make sure rsyslogd as a daemon is stopped (verify with ps -ef|grep rsyslogd)
<li>make sure you have a console session with root permissions
<li>run rsyslogd interactively: /sbin/rsyslogd ..your options.. -dn &gt; logfile
<br>where "your options" is what you usually use. /sbin/rsyslogd is the full path
to the rsyslogd binary (location different depending on distro).
In our case, the command would be
<br>/sbin/rsyslogd -c3 -dn &gt; logfile
<li>press ctrl-C when you have sufficient data (e.g. a device logged a record)
<br><b>NOTE: rsyslogd will NOT stop automatically - you need to ctrl-c out of it!</b>
<li>Once you have done all that, you can review logfile. It contains the debug output.
<li>When you are done, make sure you re-enable (and start) the background daemon!
</ul>
<p>If you need to submit the logfile, you may want to check if it contains any
passwords or other sensitive data. If it does, you can change it to some <b>consistent</b>
meaningless value. <b>Do not delete the lines</b>, as this renders the debug log
unusable (and makes Rainer quite angry for wasted time, aka significantly reduces the chance
he will remain motivated to look at your problem ;)). For the same reason, make sure
whatever you change is change consistently. Really!
<p>Debug log file can get quite large. Before submitting them, it is a good idea to zip them.
Rainer has handled files of around 1 to 2 GB. If your's is larger ask before submitting. Often, 
it is sufficient to submit the first 2,000 lines of the log file and around another 1,000 around
the area where you see a problem. Also,
ask you can submit a file via private mail. Private mail is usually a good way to go for large files
or files with sensitive content. However, do NOT send anything sensitive that you do not want
the outside to be known. While Rainer so far made effort no to leak any sensitive information,
there is no guarantee that doesn't happen. If you need a guarantee, you are probably a
candidate for a <a href="professional_support.html">commercial support contract</a>. Free support
comes without any guarantees, include no guarantee on confidentiality
[aka "we don't want to be sued for work were are not even paid for ;)].
<b>So if you submit debug logs, do so at your sole risk</b>. By submitting them, you accept
this policy.
<p><b>Segmentation Faults</b>
<p>Rsyslog has a very rapid development process, complex capabilities and now gradually gets
more and more exposure. While we are happy about this, it also has some bad effects: some
deployment scenarios have probably never been tested and it may be impossible to test
them for the development team because of resources needed. So while we try to avoid this,
you may see a serious problem during deployments in demanding, non-standard, environments
(hopefully not with a stable version, but chances are good you'll run into troubles with
the development versions).
<p>Active support from the user base is very important to help us track down those things.
Most often, serious problems are the result of some memory misadressing. During development,
we routinely use valgrind, a very well and capable memory debugger. This helps us to create
pretty clean code. But valgrind can not detect everything, most importantly not code pathes 
that are never executed. So of most use for us is information about aborts and abort locations.
<p>Unforutnately, faults rooted in adressing errors typically show up only later, so the
actual abort location is in an unrelated spot. To help track down the original spot,
<a href="http://www.gnu.org/software/hello/manual/libc/Heap-Consistency-Checking.html">libc
later than 5.4.23 offers support</a> for finding, and possible temporary relief from it,
by means of the MALLOC_CHECK_ environment variable. Setting it to 2 is a useful troubleshooting
aid for us. It will make the program abort as soon as the check routines detect anything
suspicious (unfortunately, this may still not be the root cause, but hopefully closer to it).
Setting it to 0 may even make some problems disappear (but it will NOT fix them!). 
With functionality comes cost, and so exporting MALLOC_CHECK_ without need comes at
a performance penalty. However, we strongly recommend adding this instrumentation to your
test environment should you see any serious problems. Chances are good it will help us
interpret a dump better, and thus be able to quicker craft a fix.
<p>In order to get useful information, we need some backtrace of the abort. First, you need
to make sure that a core file is created. Under Fedora, for example, that means you need
to have an "ulimit -c unlimited" in place.
<p>Now let's assume you got a core file (e.g. in /core.1234). So what to do next? Sending a
core file to us is most often pointless - we need to have the exact same system configuration in
order to interpret it correctly. Obviously, chances are extremely slim for this to be. So we would
appreciate if you could extract the most important information. This is done as follows:
<ul>
<li>$gdb /path/to/rsyslogd
<li>$info thread
<li>you'll see a number of threads (in the range 0 to n with n being the highest number). For
    <b>each</b> of them, do the following (let's assume that i is the thread number):
    <ul>
    <li>$ thread i  (e.g. thread 0, thread 1, ...)
    <li>$bt
    </ul>
<li>then you can quit gdb with "$q"
</ul>
<p>Then please send all information that gdb spit out to the development team. It is best to first
ask on the forum or mailing list on how to do that. The developers will keep in contact with you
and, I fear, will probably ask for other things as well ;)
<p>Note that we strive for highest reliability of the engine even in unusual deployment scenarios.
Unfortunately, this is hard to achieve, especially with limited resources. So we are depending on
cooperation from users. This is your chance to make a big contribution to the project without the
need to program or do anything else except get a problem solved ;)
<p>[<a href="manual.html">manual index</a>]
[<a href="http://www.rsyslog.com/">rsyslog site</a>]</p>
<p><font size="2">This documentation is part of the
<a href="http://www.rsyslog.com/">rsyslog</a> project.<br>
Copyright &copy; 2008-2010 by <a href="http://www.gerhards.net/rainer">Rainer Gerhards</a> and
<a href="http://www.adiscon.com/">Adiscon</a>. Released under the GNU GPL 
version 3 or higher.</font></p>
</body>
</html>