logging: stop rotating taskotron.log
ClosedPublic

Authored by kparal on Apr 17 2015, 1:55 PM.

Details

Summary

This caused truncated logs in buildbot for tasks with a lot of debug
output.


I've spent most of today hunting for the reason why taskotron.log seems to get cropped for certain tasks, like this:
http://taskotron-dev.fedoraproject.org/taskmaster/builders/x86_64/builds/9063

If you look at the timestamps, you can easily see that taskotron.log suddenly stops at the last line, but stdio
continues showing logging messages. After many failed theories, I logged to the buildslave client and watched
an upgradepath run live. At certain point, the taskotron.log got overwritten with a new content. And then I saw this:

[kparal@taskotron-client28 ~]$ ll /var/log/taskotron
total 2820
-rw-r--r--. 1 buildslave buildslave  175746 Apr 17 12:08 taskotron.log
-rw-r--r--. 1 buildslave buildslave  499699 Apr 17 12:02 taskotron.log.1
-rw-r--r--. 1 buildslave buildslave 1109526 Apr 17 11:42 taskotron.log.2
-rw-r--r--. 1 buildslave buildslave   97944 Apr 17 11:42 taskotron.log.3
-rw-r--r--. 1 buildslave buildslave  498420 Apr 17 11:29 taskotron.log.4
-rw-r--r--. 1 buildslave buildslave  499992 Apr 17 09:28 taskotron.log.5

I suspected logrotate, then buildbot, but it turned out to be us. See the patch. When the log file got larger than
500kB (happened for upgradepath with a lot of scheduled updates), it got rotated, and that cut off buildbot file tracking.

I disabled log rotation completely. It is only enabled in the production mode, where we don't need it. If somebody uses it
in developer mode as well, they can still install the tmpfiles.d config file we have provided, and unused logs will be deleted
after a week. If that's not sufficient, they can always configure logrotate to handle that file, or we can prepare a snippet
for them.

With this fixed, we will be hopefully able to better debug some of those timeout issues we were seeing, but we missing
the debug messages for it (because they got rotated).

Test Plan

test suite works. I tried saving over 3MB of text into taskotron.log and didn't get rotated (it got rotated
before the patch).

Diff Detail

Repository
rLTRN libtaskotron
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.
kparal retitled this revision from to logging: stop rotating taskotron.log.Apr 17 2015, 1:55 PM
kparal updated this object.
kparal edited the test plan for this revision. (Show Details)
kparal added reviewers: tflink, jskladan, mkrizek.
mkrizek accepted this revision.Apr 17 2015, 5:53 PM

Nice find. With the current approach/hack when we delete taskotron.log before a task execution, we don't need log rotation for now. Looks good to me.

BTW If the log rotated up to taskotron.log.5, I wonder if we make too much noise in outputs.

This revision is now accepted and ready to land.Apr 17 2015, 5:53 PM
In D339#6180, @mkrizek wrote:

BTW If the log rotated up to taskotron.log.5, I wonder if we make too much noise in outputs.

That seems to be caused by different runs. Those suffix number get bumped every time a log rotation happens, to a maximum number of 5.

But yes, we do produce a lot of output in the debug log. If there are 500 updates to check, and we log every koji and bodhi request, it's a lot of lines. On the other, it's the debug log, it's supposed to be detailed, and it helps us a lot when debugging problems. I think we currently have a reasonable verbosity level.

This revision was automatically updated to reflect the committed changes.