This patch allows logger initialization to occur as early as possible.
Logs are now written to <artifacts_dir>/<uuid>/<uuid>.log instead of
/var/log/taskotron.log in both modes. When running in disposable
clients mode, the initiator log is named <uuid>-initiator.log.
Details
- Reviewers: kparal, tflink, jskladan
- Maniphest Tasks: T495: Tests fail on disposable-develop branch
Tests pass
Diff Detail
- Repository: rLTRN libtaskotron
- Branch: feature/T495-fix-logtest
- Lint: Lint OK
- Unit: No Unit Test Coverage
libtaskotron/runner.py
228 | This needs some research into whether it is safe to do during execution, since it moves the log file while it is being used.
315 | If I understand this change properly, we've basically stopped using /var/log/taskotron.log, and always include <uuid>.log in the artifacts dir. This will need some documentation changes, I assume. It will most probably also need buildbot step changes, and maybe some tmpfiles.d config changes. And of course some of the config options no longer apply (logdir, log_name). I'm not saying these changes are needed now, but we should file a ticket to make sure we don't forget about them :)
316 | If we hardcode filelog=True and filelog_path=path, the config options conf.log_file_enabled and os.path.join(conf.logdir, conf.log_name) no longer apply. So this should probably be different for local and remote execution, which again requires us to parse the formula before initializing the logging... Alternatively, we could deprecate those config options and always log into a standardized location (artifactsdir/uuid/). The original idea of logging only in production mode (and having it configurable) was, I think, that we don't want to spam taskotron.log during development (writing a task consists of re-running it over and over again). But having it clearly separated per task in artifactsdir/uuid/ might change the situation, and we can consider logging always, into artifactsdir. What do you think?
libtaskotron/runner.py
228 | I'd prefer to change the path or name of the file that we're grabbing which might conflict; changing the path of the logfile like that seems risky to me. At the least, it seems like we'd have the potential for log information to be split between files and new information appended to the downloaded logfile, leading to some confusion. If I'm just misunderstanding, please correct me.
316 | My vote would be for keeping the logging config options that we already have unless there's a good reason to drop them.
libtaskotron/runner.py
228 | I see your concern, but if we rename the file we're grabbing, we won't have the same log file naming when running in disposable and non-disposable mode. This way it's always artifacts_dir/<uuid>/<uuid>.log for the machine that actually runs the task.
libtaskotron/runner.py
228 | For what it's worth, it works on my machine :) Can somebody confirm that renaming a file while it's being written to does not actually rename it until it's first closed?
libtaskotron/runner.py
228 | I would also like to keep the consistency here. I want to be able to say <uuid>.log always contains the debug output from your task. I don't want to say <uuid>.log contains the debug output from your task when run in local mode, but it is <uuid>-client.log when run in remote mode (official taskotron infrastructure).
The rename always works on Linux (it won't work on Windows, but we don't care). Also, as long as the file handler is kept open, the new data will be written to the "new" (freshly renamed) file. The problem would occur if the logging framework for some reason released the file handle and then tried to open the file again; then the data would get written to a file with the old name. This could happen for example when rotating logs (we have it disabled at the moment, so it shouldn't happen). I also see this as potentially risky. OTOH, as long as we keep it as the very last operation in the runner process (as it seems to be right now), I don't see the risk as too great. If we still want to be safe, we need to initialize file logging after we know the correct name of the log file, i.e. after we know whether we run remotely, i.e. after we parse the task formula (as the current code stands). The easiest way here seems to be to pull something like logger.set_file_log() out of logger.init() and make it standalone. We will then be able to configure the file log even after logger.init() is called. We can even make it smarter and allow setting the file log in both methods: if set_file_log() is called and some file log is already configured, it will disable the handler, close the file, rename it, and then re-enable the handler again. It might sound complicated, but I think it should be fairly easy (I can do it, if you want).
316 | We will probably need to drop conf.log_name='taskotron.log' in any case. We used to run just a single task at a time; now we will be running multiple tasks at once (multiple "initiator" runs on a single buildbot slave). So we can no longer use a single taskotron.log and log into it from several processes, that would be a mess. I like the idea of using <uuid>.log. Now the question is whether to use the existing /var/log/taskotron/, or /var/lib/taskotron/artifacts/<uuid>/ for that. If we choose the former, we're more consistent with our past, but the task output is effectively split into two places. If we choose the latter, we've changed the log location completely, but the benefit is that all the task files are grouped in a single place, which is an idea I quite like. The fact that you have a single directory containing everything important (task standard output, task debug output, custom task artifacts, initiator logs), and you can easily manage it (copy it, archive it, delete it) without hunting for pieces in other system directories (/var/log/taskotron), is quite appealing to me. We can still keep conf.log_file_enabled and we can still default to False in developer mode, so that very long debug files are not generated during development unless configured. And we can still keep conf.logdir='/var/log/taskotron' if we think it will be useful for something else in the future. So the only casualty would be conf.log_name. Do you think that's something that needs to stay configurable? I can't come up with any reasons. We have other hardcoded names like task_output/ (and that's actually hardcoded in a buildbot step, not in libtaskotron). For the same reason, I would remove conf.remote_stdio_name='stdio.log' and have it hardcoded as well, for simplicity and consistency. Then we can always rely on those names being the same, and we can adjust e.g. the resultsdb frontend to distinguish those names from other output files, without having to synchronize configuration between many tools. My thinking is that e.g. in autotest you also could configure the path where the directories got created, but then you had dozens of specific files inside that directory and you weren't able to configure the name of each one of them; that would be excessive.
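The rename semantics described above can be checked with a quick sketch (Linux/POSIX assumed; the paths are throwaway temp files, not taskotron's real ones):

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
old_path = os.path.join(workdir, "taskotron.log")
new_path = os.path.join(workdir, "renamed.log")

f = open(old_path, "w")
f.write("before rename\n")
f.flush()

# On POSIX the open file descriptor follows the inode, not the name,
# so the rename does not interrupt writing.
os.rename(old_path, new_path)

f.write("after rename\n")
f.close()

print(os.path.exists(old_path))            # False
print(open(new_path).read().splitlines())  # ['before rename', 'after rename']
```

Both writes end up in the renamed file, which is exactly why data only gets lost if the logging framework closes and reopens the file under the old name (e.g. during rotation).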
libtaskotron/runner.py
316 | > Now the question is whether to use the existing /var/log/taskotron/, or /var/lib/taskotron/artifacts/<uuid>/ for that. If we choose the former, we're more consistent with our past, but the task output is effectively split into two places. If we choose the latter, we've changed the log location completely, but the benefit is that all the task files are grouped in a single place, which is an idea I quite like. The fact that you have a single directory containing everything important (task standard output, task debug output, custom task artifacts, initiator logs), and you can easily manage it (copy it, archive it, delete it) without hunting for pieces in other system directories (/var/log/taskotron) is quite appealing to me.

I'd just like to point out that going with just /var/lib/taskotron/artifacts is actually a kind of hunting for the logs in the end, as "the general user" expects logs to be in /var/log. That is not to say I'm against having the logs in one place; just please make sure to symlink /var/lib/taskotron/artifacts to the /var/log/taskotron dir, so it is easily discoverable.
I'm not really understanding why the kind of consistency you're talking about here is valuable and worth doing. True, the actual task execution logs would always be in the same place, but I fail to understand why that's a good thing. If I have a task that is executed with a remote client, why shouldn't the log/artifact file structure reflect how it was actually executed, instead of trying to pretend that "it's all the same, nothing to see here, ignore the man behind the curtain"?
It just feels like we're trying to add complexity and potential failure vectors (that I suspect would be a whole barrel full of monkeys to triage if they ever did happen) for very little real advantage. If I'm missing something here, can someone point out what that is? Why would we not want to have the taskotron.log file to reflect what is being done on the local machine and some other file (artifacts/remote-taskotron.log, remote/taskotron.log, etc.) to represent the elements of execution which happened on another machine?
I think there's really nothing more to it than I described in D382#7050, so you have the full picture. I wanted simplicity in documentation, so that people looking for debug logs always know where to look for it. Also, if we wanted to provide a simple link to debug log from the ResultDB/ExecDB frontend in the future, it would be much easier if we had a static file name regardless of the method of execution. It might also introduce some difficulty when grepping the logs and looking for something. I agree that it adds a man behind the curtain.
> It just feels like we're trying to add complexity and potential failure vectors (that I suspect would be a whole barrel full of monkeys to triage if they ever did happen) for very little real advantage. If I'm missing something here, can someone point out what that is? Why would we not want to have the taskotron.log file to reflect what is being done on the local machine and some other file (artifacts/remote-taskotron.log, remote/taskotron.log, etc.) to represent the elements of execution which happened on another machine?
Do you think those reasons described above are not real benefits, or that they are not worth the risk/benefit ratio? My initial impression was that the risk isn't that great, esp. when we don't move logs live, but start logging to the correctly named file from the beginning. But if you believe otherwise, I'm not going to dispute that much. I think both approaches are good enough here.
libtaskotron/runner.py
316 | Would /var/log/taskotron/README be sufficient to fix discoverability? If not, which logs would you like to see in /var/log/taskotron? We're going to have multiple log files now: local taskotron.log, remote taskotron.log, task stdio.log (only when running remotely). How would you like to see that structured and named? Subdirectories <uuid>/ containing symlinks to the log files? Otherwise?
Sorry, I managed to misread the other comments. I guess I'm of two minds here - one part of me agrees that it would be nice to have a single directory which contains all relevant output from the initiator and executor in one place. However, the admin part of me would rather see a single log file (for the initiator at least) in a static, known place so that monitoring is easier.
I'm almost wondering if duplicating the taskotron.log data would work better here - put one copy of the output into the task's artifacts dir but also have a "global" log that's put into a file that behaves similarly to how we're currently doing taskotron.log. That way, the user-facing concerns and the ease of administration concerns would both be addressed. As long as we're reasonably good about purging old log files, storage space consumption shouldn't be too much of an issue.
>> It just feels like we're trying to add complexity and potential failure vectors (that I suspect would be a whole barrel full of monkeys to triage if they ever did happen) for very little real advantage. If I'm missing something here, can someone point out what that is? Why would we not want to have the taskotron.log file to reflect what is being done on the local machine and some other file (artifacts/remote-taskotron.log, remote/taskotron.log, etc.) to represent the elements of execution which happened on another machine?

> Do you think those reasons described above are not real benefits, or that they are not worth the risk/benefit ratio? My initial impression was that the risk isn't that great, esp. when we don't move logs live, but start logging to the correctly named file from the beginning. But if you believe otherwise, I'm not going to dispute that much. I think both approaches are good enough here.
My issue was mostly with changing the filename while everything was running. If we're setting the filename during init, don't change it during execution and at least appear to be consistent from a user's perspective, I don't have an issue with most of the high level changes being discussed.
libtaskotron/runner.py
316 | I agree that we can't really have a single taskotron.log in our deployments if we go forward with our current plans, but I was hoping to have one taskotron.log per buildslave; something along the lines of adding ~/.config/taskotron/ to the config dirs and generating a config file with a different log name for each buildslave user. If we do away with /var/log/taskotron/taskotron.log, discoverability may be a minor issue, but I think that at least printing the artifacts dir at the end of execution in a way that's hard to miss would be a reasonable workaround.
libtaskotron/runner.py
316 | That's an interesting idea. I have to point out, though, that we don't currently merge config files; we simply pick the first one found. There were some complications with config file merging, but I no longer remember the details - we can look it up. So this will involve further changes to how config file parsing works. Alternatively, we can add a --config FILE cmdline option, generate the same but slightly different config files for each buildslave (different log locations), and run them with these arguments. It might be simpler, provided that we can somehow pass different arguments to different buildslaves when scheduling the job.
Definitely. We already do that during initialization (but it quickly scrolls away); it sounds reasonable to print it after the execution has finished as well.
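The --config FILE idea could be sketched with argparse; the flag name, prog, and the example path are hypothetical, not an existing libtaskotron option:

```python
import argparse

# Hypothetical sketch of a per-buildslave config override: instead of
# merging config files, each slave is launched pointing at its own file.
parser = argparse.ArgumentParser(prog="runtask")
parser.add_argument("--config", metavar="FILE", default=None,
                    help="load configuration from FILE instead of the "
                         "default search path")

# Each buildslave would be started with its own file, e.g.:
args = parser.parse_args(["--config", "/etc/taskotron/slave1.conf"])
print(args.config)  # /etc/taskotron/slave1.conf
```

When --config is omitted, args.config stays None and the runner would fall back to the existing config search path.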
Copying the commit message from the last change:
Amend log files structure
This patch introduces the following logging structure:
1. /var/log/taskotron/taskotron-<username>.log
2. /var/lib/taskotron/artifacts/<uuid>/taskotron.log
3. /var/lib/taskotron/artifacts/<uuid>/taskotron-initiator.log
4. /var/lib/taskotron/artifacts/<uuid>/taskotron-stdio.log

1. In both disposable and non-disposable modes, logs are appended after each run. It's unique for each user (buildslave). It always contains duplicate log messages of either 2. (non-disposable) or 3. (disposable).
2. In disposable mode, it's the log file from the vm running a task. In non-disposable mode, it's a log file equivalent to the current /var/log/taskotron/taskotron.log.
3. Only in disposable mode.
4. Only in disposable mode; stdio from the vm running a task.
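A minimal sketch of that dual-destination setup using the stdlib logging module (paths are shortened to a temp dir and the uuid is a placeholder; this is illustrative, not the actual libtaskotron code):

```python
import logging
import os
import tempfile

# Stand-ins for /var/log/taskotron and /var/lib/taskotron/artifacts.
base = tempfile.mkdtemp()
uuid = "0000-placeholder"  # placeholder run id
os.makedirs(os.path.join(base, "artifacts", uuid))

log = logging.getLogger("taskotron-sketch")
log.setLevel(logging.DEBUG)

# 1. per-user log: mode="a" so consecutive runs append to it
peruser = logging.FileHandler(
    os.path.join(base, "taskotron-%s.log" % os.environ.get("USER", "user")),
    mode="a")

# 2./3. per-run log inside the artifacts/<uuid>/ directory
perrun = logging.FileHandler(
    os.path.join(base, "artifacts", uuid, "taskotron.log"), mode="w")

log.addHandler(peruser)
log.addHandler(perrun)

# Each record is duplicated into both destinations, matching the
# "always contains duplicate log messages" note above.
log.info("task started")
```

Attaching two handlers to one logger is what produces the duplication: every record above the logger's level is passed to each handler independently.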
Please note that tests currently fail. I'll update them once we agree to go forward with this approach.
libtaskotron/config_defaults.py
69–70 | Er, I guess this could be "hardcoded" as well, like taskotron.log and taskotron-initiator.log.
I really like how your new proposal simplifies and resolves the problems we encountered. The logdir always contains logs of what is actually running on that machine (that should allow us to do easy monitoring). The username in the filename solves the problem of different buildslaves on a single machine (I assume they will run under different user names). And the artifactsdir contains all the logs, properly named, so you don't need to hunt for different pieces of information in several places. I think it's great. There might be a few implementation details to tackle, though... we'll see.
Please note that before finalizing this patch, we should first merge develop into disposable-develop, because there were some heavy changes in this area (the memhandler). And btw I just realized we will need to replay the memhandler to two different log files at different points of time - that could be an interesting challenge.
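One way to replay a MemoryHandler buffer to two different log files at different points in time, as a rough sketch: the stock MemoryHandler.flush() drains its buffer into a single target, so a helper that copies records without draining is assumed here (illustrative names, not libtaskotron API; StringIO streams stand in for the two log files):

```python
import io
import logging
import logging.handlers

# Buffer everything in memory until file handlers exist. target=None and
# a large capacity keep records from being flushed or dropped early.
mem = logging.handlers.MemoryHandler(capacity=10000, target=None)
log = logging.getLogger("sketch")
log.setLevel(logging.DEBUG)
log.addHandler(mem)

log.debug("early message, before any file handler exists")

def replay(memhandler, filehandler):
    # Re-emit every buffered record without draining the buffer,
    # so a later replay to a second handler still sees everything.
    for record in memhandler.buffer:
        filehandler.handle(record)

first = logging.StreamHandler(io.StringIO())
replay(mem, first)   # replay to the first log file

log.debug("later message")

second = logging.StreamHandler(io.StringIO())
replay(mem, second)  # a later replay gets the full history
```

The second replay includes both messages because the buffer is never cleared; if the default flush behavior were used instead, the first replay would empty the buffer and the second file would miss the early records.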
libtaskotron/config_defaults.py
69–70 | Yes, to me it would make sense to use the same approach as for the other log names.
libtaskotron/logger.py
52–53 | This has been left here, but it is not used anymore (I don't see a place where this would be assigned a value). So either it should be removed, or it should be used somewhere. The original idea was that we use these global variables to be able to access handlers at any time and tweak or disable them if needed. Your new approach with add_filehandler() no longer uses the global variables; it adds a "noname" handler and returns it, and the caller needs to remember/store it if it wants to access it in the future. That's a fair approach, especially since we now use two file handlers. I wonder which approach is better:
58–59 | The meaning of "all" will depend on what we decide to do on line 55.
220–254 | Here we could easily assign to the global file_handler variable, if we wanted to.
237–238 | Here we would also probably disable the memhandler. If we decided to not disable it for some reason... what happens when its capacity is exceeded?
258–266 | I think we can consider simply dropping this now. We now crash hard if conf.logdir or conf.artifactsdir doesn't exist anyway. I think there's no point in working around the situation where the directories exist but we can't write to them. It made sense in the beginning, but I don't see the need anymore.
libtaskotron/runner.py
313–317 | I wonder, do we need to call this before running logger.init()? It's best to have proper logging running as soon as possible, and I think we don't need artifactsdir to exist beforehand. We just need it before running logger.add_filehandler().
320 | I wouldn't wrap this in TaskotronError. It's a filesystem error, not really related to us, and it's in the main runner - let's just do a bare raise.
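The add_filehandler() approach under discussion might look roughly like this; the signature and format string are assumptions, not the exact libtaskotron code:

```python
import logging

def add_filehandler(log_path, level=logging.DEBUG,
                    fmt="[%(name)s] %(asctime)s %(levelname)-7s %(message)s"):
    """Attach a file handler to the root logger and return it, so the
    caller can later remove or reconfigure it.

    Sketch only: instead of stashing the handler in a module-level
    global, the caller keeps the returned reference (needed now that
    two file handlers can exist at once).
    """
    handler = logging.FileHandler(log_path, mode="a", encoding="utf-8")
    handler.setLevel(level)
    handler.setFormatter(logging.Formatter(fmt))
    logging.getLogger().addHandler(handler)
    return handler

# Caller keeps the reference and detaches the handler when done:
# fh = add_filehandler("/var/lib/taskotron/artifacts/<uuid>/taskotron.log")
# ...
# logging.getLogger().removeHandler(fh)
# fh.close()
```

Returning the handler avoids the unused global discussed at lines 52–53, at the cost of every call site having to store the reference itself.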
Putting "merge develop into disposable-develop" in TODO list for this patch.
libtaskotron/logger.py
52–53 | My intent was to have file handlers unnamed; I just forgot to remove this line (that will teach me not to send review requests last minute before leaving to catch a train). Unless we can now think of a reason to have at least one of the file handlers named, I'd rather not solve something that might never happen (using a file handler directly).
258–266 | Makes sense to me.
libtaskotron/runner.py
313–317 | Yeah, let's call it after logger.init(). Good catch.
320 | Ok, makes sense.
libtaskotron/logger.py
52–53 | OK, let's try going that way.
- Amend log files structure
- Address issues in the review
- Add code accidentally removed in rebase
- Remove memory handler only after the last file handler is added
Looks good. A few change suggestions before you push it.
libtaskotron/logger.py
239 | This should probably be log.error, because we're raising it now.
240 | Could you please document IOError being raised in the function docstring?
testing/functest_logger.py
22–28 | This method can now be integrated into setup(), right? It serves the same purpose; both run before any test method call, just the old xunit-style setup_method() cannot accept fixtures as arguments. Let's merge them.
testing/test_logger.py
63 | s/and/is/