Init logger the same way in disposable and non-disposable modes
ClosedPublic

Authored by mkrizek on Jun 9 2015, 11:47 AM.

Details

Summary

This patch allows logger initialization to occur as early as possible.
Also logs are being written to <artifacts_dir>/<uuid>/<uuid>.log instead
of /var/log/taskotron.log in both modes. When running in disposable
clients mode, the initiator log is named <uuid>-initiator.log.

Test Plan

Tests pass

Diff Detail

Repository
rLTRN libtaskotron
Branch
feature/T495-fix-logtest
Lint
Lint OK
Unit
No Unit Test Coverage
mkrizek retitled this revision from to Init logger the same way in disposable and non-disposable modes.Jun 9 2015, 11:47 AM
mkrizek updated this object.
mkrizek edited the test plan for this revision. (Show Details)
mkrizek added reviewers: tflink, kparal, jskladan.
kparal added inline comments.Jun 9 2015, 1:35 PM
libtaskotron/runner.py
228

This needs some research into whether it is safe to do during execution; it moves the log file while it is being used.

315

If I understand this change properly, we've basically stopped using /var/log/taskotron.log, and always include <uuid.log> in the artifacts dir. This will need some documentation changes, I assume. It will most probably also need buildbot step changes. And maybe some tmpfiles.d config changes. And of course some of the config options no longer apply (logdir, log_name). I'm not saying they are needed now, but we should file a ticket to make sure we don't forget about them :)

316

If we hardcode filelog=True and filelog_path=path, the config options conf.log_file_enabled and os.path.join(conf.logdir, conf.log_name) no longer apply. So this should probably be different for local and remote execution. Which again requires us to parse the formula before initializing the logging...

Alternatively, we could deprecate those config options and log always and into a standardized location (always artifactsdir/uuid/). The original idea of logging only in production mode (and having it configurable) was, I think, that we don't want to spam taskotron.log during development (writing a task consists of re-running it over and over again). But having it clearly separated per task in artifactsdir/uuid/ might change the situation and we can consider logging always, into artifactsdir. What do you think?

tflink added inline comments.Jun 9 2015, 2:07 PM
libtaskotron/runner.py
228

I'd prefer to change the path or name of the file we're grabbing that might conflict - changing the path of the logfile like that seems risky to me. At the least, it seems like we'd have the potential for log information to be split between files and new information appended to the downloaded logfile, leading to some confusion. If I'm just misunderstanding, please correct me.

316

My vote would be for keeping the logging config options that we already have unless there's a good reason to drop those options.

mkrizek added inline comments.Jun 9 2015, 2:38 PM
libtaskotron/runner.py
228

I see your concern, but if we rename the file we're grabbing, we won't have the same log file naming when running in disposable and non-disposable modes. This way it's always artifacts_dir/<uuid>/<uuid>.log for the machine that actually runs the task.

mkrizek added inline comments.Jun 9 2015, 2:41 PM
libtaskotron/runner.py
228

For what it's worth, it works on my machine :)

Can somebody confirm that renaming a file while it's being written to does not actually rename it until it's first closed?

kparal added inline comments.Jun 10 2015, 8:45 AM
libtaskotron/runner.py
228

I would also like to keep the consistency here. I want to be able to say that <uuid>.log always contains the debug output from your task. I don't want to say that <uuid>.log contains the debug output from your task when run in local mode, but it is <uuid>-client.log when run in remote mode (official taskotron infrastructure).

Can somebody confirm that renaming a file while it's being written to does not actually rename it until it's first closed?

The rename always works on Linux (it won't work on Windows, but we don't care). Also, as long as the file handler is kept open, the new data will be written to the "new" (freshly renamed) file. The problem would occur if the logging framework for some reason released the file handle and then tried to open the file again. Then the data would get written to a file with the old name. This could happen for example when rotating logs (we have it disabled at the moment, so it shouldn't happen).
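To illustrate, here's a quick throwaway check of this behaviour (the file names are made up, this is not patch code):

  import os

  # On Linux the open handle follows the inode, so data written after the
  # rename still ends up in the renamed file.
  with open("before.log", "w") as f:
      f.write("first line\n")
      os.rename("before.log", "after.log")  # the handle stays valid
      f.write("second line\n")              # lands in after.log

  print(open("after.log").read())  # prints both lines; before.log is gone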

I also see this as potentially risky. OTOH, as long as we keep it as the very last operation in the runner process (it seems to be right now), I don't see that risk as too great. If we still want to be safe, we need to initialize file logging after we know the correct name of the log file, i.e. after we know if we run remotely, i.e. after we parse the task formula (as the current code stands). The easiest way here seems to be to pull something like logger.set_file_log() out of logger.init() and make it standalone. We will then be able to configure the file log even after logger.init() is called. We can even make it smarter and allow setting the file log in both methods. If set_file_log() is called and some file log is already configured, it will disable the handler, close the file, rename it, and then re-enable the handler again. It might sound complicated, but I think it should be fairly easy (I can do it, if you want).
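For illustration, a minimal sketch of what a standalone set_file_log() could look like - the rename-and-reattach part is just my rough idea, not existing libtaskotron code, and the argument names are made up:

  import logging
  import os

  log = logging.getLogger('libtaskotron')
  _file_handler = None

  def set_file_log(filelog_path, level=logging.DEBUG):
      """Attach a file handler once the final log path is known (i.e. after
      the task formula has been parsed). If a file log is already configured,
      close it, rename its file and re-attach a handler for the new path."""
      global _file_handler
      if _file_handler is not None:
          old_path = _file_handler.baseFilename
          log.removeHandler(_file_handler)
          _file_handler.close()
          os.rename(old_path, filelog_path)
      _file_handler = logging.FileHandler(filelog_path, encoding='utf-8')
      _file_handler.setLevel(level)
      log.addHandler(_file_handler)
      return _file_handler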

316

We will probably need to drop conf.log_name='taskotron.log' in any case. We used to run just a single task at a time. Now we will be running multiple tasks at once (multiple "initiator" runs on a single buildbot slave), so we can no longer use a single taskotron.log and log into it from several processes - that would be a mess. I like the idea of using <uuid>.log.

Now the question is whether to use existing /var/log/taskotron/, or /var/lib/taskotron/artifacts/<uuid>/ for that. If we choose the former one, we're more consistent with our past, but the task output is effectively split into two places. If we choose the latter one, we've changed the log location completely, but the benefit is that all the task files are grouped in a single place, which is an idea I quite like. The fact that you have a single directory containing everything important (task standard output, task debug output, custom task artifacts, initiator logs), and you can easily manage it (copy it, archive it, delete it) without hunting for pieces in other system directories (/var/log/taskotron) is quite appealing to me.

We can still keep conf.log_file_enabled and we can still default to False in developer mode, so that very long debug files are not generated during development, unless configured. And we can still keep conf.logdir='/var/log/taskotron' if we think it will be useful for something else in the future. So the only casualty could be conf.log_name. Do you think that's something that needs to stay configurable? I can't come up with any reasons. We have other hardcoded names like task_output/ (and that's actually hardcoded in a buildbot step, not in libtaskotron). For the same reason, I would remove conf.remote_stdio_name='stdio.log' and have it hardcoded as well, for simplicity and consistency. Then we can always rely on those names being the same, and we can adjust e.g. the resultsdb frontend to distinguish those names from other output files, without having to synchronize configuration between many tools. My thinking is that e.g. in autotest you could also configure the path where the directories got created, but then you had dozens of specific files inside that directory and you weren't able to configure the name of each one of them - that would be excessive.

jskladan added inline comments.Jun 11 2015, 11:16 AM
libtaskotron/runner.py
316

Now the question is whether to use existing /var/log/taskotron/, or /var/lib/taskotron/artifacts/<uuid>/ for that. If we choose the former one, we're more consistent with our past, but the task output is effectively split into two places. If we choose the latter one, we've changed the log location completely, but the benefit is that all the task files are grouped in a single place, which is an idea I quite like. The fact that you have a single directory containing everything important (task standard output, task debug output, custom task artifacts, initiator logs), and you can easily manage it (copy it, archive it, delete it) without hunting for pieces in other system directories (/var/log/taskotron) is quite appealing to me.

I'd just like to point out that going with just /var/lib/taskotron/artifacts still means hunting for the logs in the end, as "the general user" expects logs to be in /var/log. That is not to say I'm against having the logs in one place, just please make sure to symlink /var/lib/taskotron/artifacts to the /var/log/taskotron dir, so it is easily discoverable.

I'm not really understanding why the kind of consistency you're talking about here is valuable and worth doing. True, the actual task execution logs would always be in the same place, but I fail to understand why that's a good thing. If I have a task that is executed with a remote client, why shouldn't the log/artifact file structure reflect how it was actually executed instead of trying to pretend that "it's all the same, nothing to see here, ignore the man behind the curtain".

It just feels like we're trying to add complexity and potential failure vectors (that I suspect would be a whole barrel full of monkeys to triage if they ever did happen) for very little real advantage. If I'm missing something here, can someone point out what that is? Why would we not want to have the taskotron.log file to reflect what is being done on the local machine and some other file (artifacts/remote-taskotron.log, remote/taskotron.log, etc.) to represent the elements of execution which happened on another machine?

In D382#7152, @tflink wrote:

I'm not really understanding why the kind of consistency you're talking about here is valuable and worth doing. True, the actual task execution logs would always be in the same place, but I fail to understand why that's a good thing. If I have a task that is executed with a remote client, why shouldn't the log/artifact file structure reflect how it was actually executed instead of trying to pretend that "it's all the same, nothing to see here, ignore the man behind the curtain".

I think there's really nothing more to it than I described in D382#7050, so you have the full picture. I wanted simplicity in documentation, so that people looking for debug logs always know where to look for it. Also, if we wanted to provide a simple link to debug log from the ResultDB/ExecDB frontend in the future, it would be much easier if we had a static file name regardless of the method of execution. It might also introduce some difficulty when grepping the logs and looking for something. I agree that it adds a man behind the curtain.

It just feels like we're trying to add complexity and potential failure vectors (that I suspect would be a whole barrel full of monkeys to triage if they ever did happen) for very little real advantage. If I'm missing something here, can someone point out what that is? Why would we not want to have the taskotron.log file to reflect what is being done on the local machine and some other file (artifacts/remote-taskotron.log, remote/taskotron.log, etc.) to represent the elements of execution which happened on another machine?

Do you think those reasons described above are not real benefits, or that they are not worth the risk/benefit ratio? My initial impression was that the risk isn't that great, esp. when we don't move logs live, but start logging to the correctly named file from the beginning. But if you believe otherwise, I'm not going to dispute that much. I think both approaches are good enough here.

libtaskotron/runner.py
316

Would /var/log/taskotron/README be sufficient to fix discoverability?

If not, which logs would you like to see in /var/log/taskotron? We're going to have multiple log files now - local taskotron.log, remote taskotron.log, task stdio.log (only when running remotely). How would you like to see that structured and named? Subdirectories <uuid>/ containing symlinks to the log files? Something else?

In D382#7166, @kparal wrote:
In D382#7152, @tflink wrote:

I'm not really understanding why the kind of consistency you're talking about here is valuable and worth doing. True, the actual task execution logs would always be in the same place, but I fail to understand why that's a good thing. If I have a task that is executed with a remote client, why shouldn't the log/artifact file structure reflect how it was actually executed instead of trying to pretend that "it's all the same, nothing to see here, ignore the man behind the curtain".

I think there's really nothing more to it than I described in D382#7050, so you have the full picture. I wanted simplicity in documentation, so that people looking for debug logs always know where to look for it. Also, if we wanted to provide a simple link to debug log from the ResultDB/ExecDB frontend in the future, it would be much easier if we had a static file name regardless of the method of execution. It might also introduce some difficulty when grepping the logs and looking for something. I agree that it adds a man behind the curtain.

Sorry, I managed to misread the other comments. I guess I'm of two minds here - one part of me agrees that it would be nice to have a single directory which contains all relevant output from the initiator and executor in one place. However, the admin part of me would rather see a single log file (for the initiator at least) in a static, known place so that monitoring is easier.

I'm almost wondering if duplicating the taskotron.log data would work better here - put one copy of the output into the task's artifacts dir but also have a "global" log that's put into a file that behaves similarly to how we're currently doing taskotron.log. That way, the user-facing concerns and the ease of administration concerns would both be addressed. As long as we're reasonably good about purging old log files, storage space consumption shouldn't be too much of an issue.
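Roughly what I have in mind, as a sketch (the paths and the uuid are illustrative only, and it assumes both directories already exist):

  import logging

  uuid = '20150618T102200-example'  # placeholder task UUID
  log = logging.getLogger('libtaskotron')
  log.setLevel(logging.DEBUG)

  # per-task copy kept next to the other artifacts
  task_handler = logging.FileHandler(
      '/var/lib/taskotron/artifacts/%s/taskotron.log' % uuid)
  # "global" copy in a static, known place for monitoring
  global_handler = logging.FileHandler('/var/log/taskotron/taskotron.log')

  for handler in (task_handler, global_handler):
      handler.setLevel(logging.DEBUG)
      log.addHandler(handler)

  log.info('every record now goes to both files')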

It just feels like we're trying to add complexity and potential failure vectors (that I suspect would be a whole barrel full of monkeys to triage if they ever did happen) for very little real advantage. If I'm missing something here, can someone point out what that is? Why would we not want to have the taskotron.log file to reflect what is being done on the local machine and some other file (artifacts/remote-taskotron.log, remote/taskotron.log, etc.) to represent the elements of execution which happened on another machine?

Do you think those reasons described above are not real benefits, or that they are not worth the risk/benefit ratio? My initial impression was that the risk isn't that great, esp. when we don't move logs live, but start logging to the correctly named file from the beginning. But if you believe otherwise, I'm not going to dispute that much. I think both approaches are good enough here.

My issue was mostly with changing the filename while everything was running. If we're setting the filename during init, don't change it during execution and at least appear to be consistent from a user's perspective, I don't have an issue with most of the high level changes being discussed.

libtaskotron/runner.py
316

I agree that we can't really have a single taskotron.log in our deployments if we go forward with our current plans but I was hoping to have one taskotron.log per buildslave - something along the lines of adding ~/.config/taskotron/ to the config dirs and generating a config file with a different log name for each buildslave user.

If we do away with /var/log/taskotron/taskotron.log, discoverability may be a minor issue but I think that at least printing the artifacts dir at the end of execution in a way that's hard to miss would be a reasonable workaround.

kparal added inline comments.Jun 15 2015, 10:40 AM
libtaskotron/runner.py
316

I was hoping to have one taskotron.log per buildslave - something along the lines of adding ~/.config/taskotron/ to the config dirs and generating a config file with a different log name for each buildslave user.

That's an interesting idea. I have to point out, though, that we don't currently merge config files, we simply pick the first one found. There were some complications with config file merging, but I no longer remember the details - we can look it up. So this will involve further changes to how config file parsing works.

Alternatively, we could add a --config FILE cmdline option, generate the same but slightly different config files for each buildslave (different log locations) and run them with these arguments. It might be simpler, provided that we can somehow pass different arguments to different buildslaves when scheduling the job.
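The cmdline part could look roughly like this (just a sketch; the default search paths and file names below are assumptions, not what the current config handling actually does):

  import argparse
  import os

  DEFAULT_CONF_FILES = ['/etc/taskotron/taskotron.yaml',
                        os.path.expanduser('~/.config/taskotron/taskotron.yaml')]

  parser = argparse.ArgumentParser()
  parser.add_argument('--config', metavar='FILE',
                      help='use FILE instead of the default config search path '
                           '(allows per-buildslave overrides)')
  args, _ = parser.parse_known_args()

  # an explicit file wins; otherwise pick the first existing default (no merging)
  config_path = args.config or next(
      (p for p in DEFAULT_CONF_FILES if os.path.exists(p)), None)
  print('using config: %s' % config_path)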

If we do away with /var/log/taskotron/taskotron.log, discoverability may be a minor issue but I think that at least printing the artifacts dir at the end of execution in a way that's hard to miss would be a reasonable workaround.

Definitely. We already do that during initialization (but it quickly scrolls away), it sounds reasonable to print it after the execution has finished as well.

mkrizek updated this revision to Diff 1063.Jun 18 2015, 10:22 AM
  • Amend log files structure
mkrizek updated this revision to Diff 1064.Jun 18 2015, 10:24 AM

Using the correct base branch

Copying the commit message from the last change:

Amend log files structure

This patch introduces the following logging structure:

  1. /var/log/taskotron/taskotron-<username>.log
  2. /var/lib/taskotron/artifacts/<uuid>/taskotron.log
  3. /var/lib/taskotron/artifacts/<uuid>/taskotron-initiator.log
  4. /var/lib/taskotron/artifacts/<uuid>/taskotron-stdio.log

  1. Written in both disposable and non-disposable modes; logs are appended after each run. It's unique for each user (buildslave) and always contains duplicate log messages of either 2. (non-disposable) or 3. (disposable).
  2. In disposable mode, it's the log file from the VM running a task. In non-disposable mode, it's a log file equivalent to the current /var/log/taskotron/taskotron.log.
  3. Only in disposable mode.
  4. Only in disposable mode; stdio from the VM running a task.

Please note that tests currently fail. I'll update them once we agree to go forward with this approach.

mkrizek added inline comments.Jun 18 2015, 10:28 AM
libtaskotron/config_defaults.py
69–70

Er, I guess this could be "hardcoded" as well, just like taskotron.log and taskotron-initiator.log.

I really like how your new proposal simplifies and resolves the problems we encountered. The logdir always contains logs of what is actually running on that machine (that should allow us to do easy monitoring). The username in the filename solves the problem of different buildslaves on a single machine (I assume they will run under different user names). And the artifactsdir contains all the logs, properly named, so you don't need to hunt for different pieces of information in several places. I think it's great. There might be a few implementation details to tackle, though... we'll see.

Please note that before finalizing this patch, we should first merge develop into disposable-develop, because there were some heavy changes in this area (the memhandler). And btw, I just realized we will need to replay the memhandler to two different log files at different points in time - that could be an interesting challenge.

libtaskotron/config_defaults.py
69–70

Yes, to me it would make sense to use the same approach as for the other log names.

libtaskotron/logger.py
52–53

This has been left here, but it is not used anymore (I don't see a place where this would be assigned a value). So either it should be removed, or it should be used somewhere.

The original idea was that we use these global variables to be able to access handlers at any time and tweak them/disable them if needed. Your new approach with add_filehandler() no longer uses the global variables; it adds a "noname" handler and returns it, and the caller needs to remember/store it if it wants to access it in the future. That's a fair approach, especially since we now use two file handlers.

I wonder which approach is better:

  1. Delete this global variable and have all file handlers "noname". We won't be able to distinguish them easily if we need it.
  2. Keep using this variable and create one more - let's call them file_handler_logdir and file_handler_artifactsdir. That would be specific to our use case while not limiting other use cases. But maybe it's unnecessary code.
  3. Keep and use this one for logging into logdir. Have the artifactsdir handler unnamed.
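To illustrate option 2, a sketch (the function and variable names are mine, not the actual logger.py API):

  import logging

  log = logging.getLogger('libtaskotron')

  # module-level references so other code can tweak or disable them later
  file_handler_logdir = None
  file_handler_artifactsdir = None

  def add_filehandler(filelog_path, level=logging.DEBUG):
      """Create, attach and return an (unnamed) file handler."""
      handler = logging.FileHandler(filelog_path, encoding='utf-8')
      handler.setLevel(level)
      log.addHandler(handler)
      return handler

  def init_file_logging(logdir_log_path, artifactsdir_log_path):
      """Remember the two handlers that matter for our own use case."""
      global file_handler_logdir, file_handler_artifactsdir
      file_handler_logdir = add_filehandler(logdir_log_path)
      file_handler_artifactsdir = add_filehandler(artifactsdir_log_path)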
58–59

The meaning of "all" will depend on what we decide to do on line 55.

220–254

Here we could easily assign to global file_handler variable, if we wanted to.

237–238

Here we would also probably disable the memhandler. If we decided to not disable it for some reason... what happens when its capacity is exceeded?

258–266

I think we can consider simply dropping this now. We now crash hard if conf.logdir or conf.artifactsdir doesn't exist anyway. I think there's no point in working around the situation where the directories exist but we can't write to them. It made sense in the beginning, but I don't see the need anymore.

libtaskotron/runner.py
313–317

I wonder, do we need to call this before running logger.init()? It's best to have proper logging running as soon as possible, and I think we don't need artifactsdir to exist beforehand. We just need it before running logger.add_filehandler().

320

I wouldn't wrap this into TaskotronError. It's a filesystem error, not really related to us, and it's in the main runner - let's just do

raise
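In other words, something along these lines (a sketch; the surrounding names are placeholders, not the actual runner code):

  import logging
  import os

  log = logging.getLogger('libtaskotron')
  artifactsdir = '/var/lib/taskotron/artifacts/example-uuid'  # placeholder path

  try:
      os.makedirs(artifactsdir)
  except OSError:
      log.exception('Failed to create artifacts dir %s', artifactsdir)
      raise  # propagate the filesystem error instead of wrapping it in TaskotronError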

Putting "merge develop into disposable-develop" in TODO list for this patch.

libtaskotron/logger.py
52–53

My intent was to have file handlers unnamed, but I just forgot to remove this line (that will teach me not to send review requests at the last minute before leaving to catch a train). Unless we can now think of a reason to have at least one of the file handlers named, I'd rather not solve something that might never happen (using a file handler directly).

258–266

Makes sense to me.

libtaskotron/runner.py
313–317

Yeah, let's call it after logger.init(). Good catch.

320

Ok, makes sense.

kparal added inline comments.Jun 19 2015, 7:49 AM
libtaskotron/logger.py
52–53

OK, let's try going that way.

mkrizek updated this revision to Diff 1076.Jun 23 2015, 1:51 PM
  • Amend log files structure
  • Address issues in the review
  • Add code accidentally removed in rebase
  • Remove memory handler only after the last file handler is added
kparal accepted this revision.Jun 30 2015, 12:01 PM

Looks good. A few change suggestions before you push it.

libtaskotron/logger.py
239

This should probably be log.error, because we're raising it now.

240

Could you please document IOError being raised in function docstring?

testing/functest_logger.py
22–28

This method can now be integrated into setup(), right? It serves the same purpose and both run before any test method call; the only difference is that the old xunit-style setup_method() cannot accept fixtures as arguments. Let's merge it.
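Something along these lines should do (a sketch only; the class, fixture body and test are illustrative, not the actual test code):

  # testing/functest_logger.py
  import pytest

  class TestFileLogging(object):

      @pytest.fixture(autouse=True)
      def setup(self, tmpdir):
          # runs before every test method and, unlike the xunit-style
          # setup_method(), can accept fixtures such as tmpdir
          self.log_path = tmpdir.join('taskotron.log').strpath

      def test_logfile_path(self):
          assert self.log_path.endswith('taskotron.log')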

testing/test_logger.py
63

s/and/is/

This revision is now accepted and ready to land.Jun 30 2015, 12:01 PM
tflink accepted this revision.Jul 2 2015, 7:39 AM

+1 on what @kparal pointed out but other than that, looks good to me