This caused truncated logs in buildbot for tasks with a lot of debug
output.
I've spent most of today hunting for the reason why taskotron.log seems to get cropped for certain tasks, like this:
http://taskotron-dev.fedoraproject.org/taskmaster/builders/x86_64/builds/9063
If you look at the timestamps, you can easily see that taskotron.log suddenly stops at the last line, but stdio
continues showing logging messages. After many failed theories, I logged to the buildslave client and watched
an upgradepath run live. At certain point, the taskotron.log got overwritten with a new content. And then I saw this:
[kparal@taskotron-client28 ~]$ ll /var/log/taskotron total 2820 -rw-r--r--. 1 buildslave buildslave 175746 Apr 17 12:08 taskotron.log -rw-r--r--. 1 buildslave buildslave 499699 Apr 17 12:02 taskotron.log.1 -rw-r--r--. 1 buildslave buildslave 1109526 Apr 17 11:42 taskotron.log.2 -rw-r--r--. 1 buildslave buildslave 97944 Apr 17 11:42 taskotron.log.3 -rw-r--r--. 1 buildslave buildslave 498420 Apr 17 11:29 taskotron.log.4 -rw-r--r--. 1 buildslave buildslave 499992 Apr 17 09:28 taskotron.log.5
I suspected logrotate, then buildbot, but it turned out to be us. See the patch. When the log file got larger than
500kB (happened for upgradepath with a lot of scheduled updates), it got rotated, and that cut off buildbot file tracking.
I disabled log rotation completely. It is only enabled in the production mode, where we don't need it. If somebody uses it
in developer mode as well, they can still install the tmpfiles.d config file we have provided, and unused logs will be deleted
after a week. If that's not sufficient, they can always configure logrotate to handle that file, or we can prepare a snippet
for them.
With this fixed, we will be hopefully able to better debug some of those timeout issues we were seeing, but we missing
the debug messages for it (because they got rotated).