Add remote execution for disposable clients
ClosedPublic

Authored by mkrizek on Apr 29 2015, 1:59 PM.

Details

Summary

The patch assumes that a virtual machine runs with the libtaskotron rpm
installed and a public key (of a user that runs runtask) in
/root/.ssh/authorized_keys. The ip address of the vm is hardcoded in
libtaskotron/runner.py. All of this will be replaced with actual code
that handles virtual machines.

Overview of output files:

Artifacts: <artifacts_dir>/<uuid>/
Std out/err of the task ran in the vm: <artifacts_dir>/<uuid>/stdio.log
VM taskotron.log: <artifacts_dir>/<uuid>/taskotron.log
Buildslave taskotron.log: <artifacts_dir>/<uuid>/taskotron-initiator.log

Test Plan

Create a vm as described in the commit message. Update hardcoded ip
address of the vm in libtaskotron/runner.py. Run a task whose yaml file
contains something like:
environment:

machine:
  dummy: ''

Diff Detail

Repository
rLTRN libtaskotron
Branch
feature/remote_exec
Lint
Lint OK
Unit
No Unit Test Coverage
mkrizek retitled this revision from to Add remote execution for disposable clients.Apr 29 2015, 1:59 PM
mkrizek updated this object.
mkrizek edited the test plan for this revision. (Show Details)
mkrizek added reviewers: tflink, kparal, jskladan.
tflink requested changes to this revision.May 5 2015, 8:57 PM

In general, I like it but have some questions/concerns:

  • I'd like to see less generic filenames the log files
  • I had a hard time figuring out what was going on - it would just fail with no explanation. I'm not sure we want log messages for every remote action, though - that'd get pretty verbose pretty quick. Other thoughts?
  • it dies very easily, some examples of what I hit:
    • libtaskotron is already installed on the vm
    • /etc/taskotron/taskotron.yaml exists on the vm

we could hold off and put the "resiliancy", "verbosity" and "logfile name" bits into tickets as enhancements, though.

libtaskotron/remote_exec.py
84

I like the idea of saving stdio to a logfile but if we're running multiple VMs on a virthost, that logfile is gonna get messy and quasi-unusable pretty quick. I'd rather see the output saved in <hostname>.stdio.log or something that identifies with the buildslave name or remote machine name.

This could be added in another ticket, though - I'm fine with either way

115

fun fact (and new to me): if the package is already installed, yum will exit with a code of 1 so this will blow up rather silently if the package required is already installed on the remote machine.

Not sure that's the behavior that we want here

145

do we want to handle the case of a pre-existing file? right now, this silently fails if the destination file already exists

libtaskotron/runner.py
242

This blows up with exception if there is no taskotron.log on the remote machine. While not the most common case, it can still be valid and should log an error message instead of blowing up

346

If we keep the concept of having different taskotron.log files for local/remote execution, I'd like to see this be less generic so that it's not confusing to have multiple runners on the same virthost

This revision now requires changes to proceed.May 5 2015, 8:57 PM
mkrizek added inline comments.May 6 2015, 2:20 PM
libtaskotron/remote_exec.py
84

Good point.

115

Hmm, not on my (fc20) or kparal's (fc21) machines:

# yum -y install libtaskotron-config
Loaded plugins: langpacks, refresh-packagekit
Package libtaskotron-config-0.3.15-1.fc20.noarch already installed and latest version
Nothing to do
# echo $?
0
145

Does overwriting the file make more sense?

libtaskotron/runner.py
242

Thanks, I knew about that, just didn't fix it for some reason :D

346

We delete taskotron.log before execution anyway on buildmaster (and so we should delete remote_logdir as well), no? Or is it appropriate time to come up with better solution that deleting the logs?

tflink added a comment.May 6 2015, 3:57 PM

Some of this is getting outside the scope of the communication method and I'm fine with breaking them off as enhancements if they're out of scope for this review

libtaskotron/remote_exec.py
115

Of course now that I've written it down, I can't reproduce it :)

I wonder what I was doing differently yesterday but it looks like this can be ignored

145

I think that making overwriting an option would be great. There are some cases where we want to be overwriting (taskotron.yaml) and there are other cases where we might not want to be overwriting the existing target.

On the other hand, this isn't a huge issue and we could just wait and see if it ends up causing problems. I'm fine with either

libtaskotron/runner.py
240

I think that we want to do something with the stdout/stderr here. I finally figured out why the task I was testing with was failing - it was missing something and blowing up before creating the log file. The error showed up in stdout on the vm but not on the task initiator.

346

If we end up going forward with what I had in mind for virthosts - we'll need something that's a bit more specific than what we're doing now. If we have multiple buildslaves running on the same machine, that'll create multiple logfiles and decent odds that one of them will be running when another is wrapping up.

It makes sense to me to either a) start putting logs in a per-buildslave location or b) having names that are unique enough so that each buildslave can figure out which logs belong to it.

mkrizek added inline comments.May 7 2015, 1:04 PM
libtaskotron/runner.py
240

Wasn't it saved in the stdio file on the task initiator? I am not sure I understand.

346

Ok, I didn't realize we were going to have multiple runners on the same machine. In that case, how about we store everything in the artifacts dir right away and not copy everything after the run is over?

tflink added inline comments.May 11 2015, 1:35 PM
libtaskotron/runner.py
240

I don't see any output in the stdio.log - the only file that got output was the remote-exec/taskotron.log file. I'll poke at this a bit more today to make sure, though

346

if the permissions work out, that sounds reasonable.

I think that we'll have more things to rework/tweak if we go forward with the multiple runners per machine plan, though. if storing everything in the artifacts dir right away doesn't work well, I think this would be better as an enhancement ticket that we deal with after figuring out some of the other details.

I've poked at this a bit more using rpmlint modified to work as a remote task. If I execute that task on a build (drbd-8.9.2-3.fc23.x86_64 is the one I happened to use) and I run the task with python runtask.py -a x86_64 -i drbd-8.9.2-3.fc23 -t koji_build ../task-rpmlint/rpmlint.yml I get a bunch of output where it's installing repos and packages (which I'll skip for brevity) until I get to:

[libtaskotron:remote_exec.py:142] 2015-05-12 14:16:00 DEBUG   Putting file (/etc/taskotron/taskotron.yaml) on to remote host (/etc/taskotron/taskotron.yaml)
[libtaskotron:remote_exec.py:158] 2015-05-12 14:16:00 DEBUG   Getting file from remote host /home/tflink/code/taskotron/libtaskotron/taskotron_logs/taskotron.log to /home/tflink/code/taskotron/libtaskotron/taskotron_logs/taskotron.log
[libtaskotron:remote_exec.py:162] 2015-05-12 14:16:00 ERROR   Could not get file root@192.168.124.202:/home/tflink/code/taskotron/libtaskotron/taskotron_logs/taskotron.log to /home/tflink/code/taskotron/libtaskotron/taskotron_logs/taskotron.log: No such file
[libtaskotron:logger.py:89] 2015-05-12 14:16:00 CRITICAL Traceback (most recent call last):
  File "runtask.py", line 10, in <module>
    runner.main()
  File "/home/tflink/code/taskotron/libtaskotron/libtaskotron/runner.py", line 357, in main
    task_runner.get_output()
  File "/home/tflink/code/taskotron/libtaskotron/libtaskotron/runner.py", line 242, in get_output
    self.ssh.get_file(logfile, logfile)
  File "/home/tflink/code/taskotron/libtaskotron/libtaskotron/remote_exec.py", line 163, in get_file
    raise exc.TaskotronRemoteError(e)
TaskotronRemoteError: [Errno 2] No such file

There is no real indication to me on what happened other than something around putting the taskotron.yaml file on to the remote host failed for, if I look at the stdio.log, I see nothing indicating what the problem is - that does have output this time but that stops with the successful yum command.

After adding some debug code of my own, turns out that the problem is local - I don't have an /etc/taskotron/taskotron.yaml on my task initiator and the ssh.put_file('/retc/taskotron/taskotron.yaml', '/etc/taskotron/taskotron.yaml') is failing because it can't find the source file.

I think that adding some global-ish error handling to this would help debug issues.

libtaskotron/runner.py
199

we don't require taskotron.yaml to be in /etc/ for normal operation, I'd like to see more flexibility here but that can be added as a later enhancement since it's a bit outside the scope of what this ticket was supposed to accomplish

354

I'd like to see an except block here, even if it just logs the exception that it catches - right now it's just squelching errors that cause problems

mkrizek updated this revision to Diff 995.May 20 2015, 12:15 PM
  • Address issues in the review
mkrizek updated this object.May 20 2015, 12:17 PM
mkrizek updated this revision to Diff 996.May 20 2015, 12:23 PM
  • Copy vm taskotron.log to artifacts directory
mkrizek updated this object.May 20 2015, 12:23 PM
mkrizek updated this revision to Diff 997.May 20 2015, 12:26 PM

Using the correct base branch

mkrizek added inline comments.May 20 2015, 12:32 PM
libtaskotron/remote_exec.py
91

I suppose we need to make writing stdio into a file optional but to move things forward let's do it in a separate review. Thoughts?

libtaskotron/runner.py
342

We need to create uuid directory inside artifacts directory as soon as possible. If we do that outside (Remote)Runner class, we prevent others from using the class because it would fail on artifacts dir not existing. Any thoughts on this?

tflink added inline comments.May 20 2015, 2:39 PM
libtaskotron/remote_exec.py
91

yeah, I'm fine with adding stuff later. I'm sure there's stuff that we haven't thought of yet, anyways :)

libtaskotron/runner.py
342

I can't think of any use cases where someone would want to use the (Remote)Runner class without the other stuff in main(). If we document the need for those directories to exist, I don't think it'd be much of a problem anyways - can you think of any use cases where that isn't the case?

mkrizek added inline comments.May 20 2015, 2:58 PM
libtaskotron/runner.py
342

Can't think of anything either. Just wanted to raise that concern in case someone would come up with something.

Overall, it looks really good - works well in my testing.

I have 2 concerns:

  1. no test coverage - this kind of code is not the easiest stuff to test but I'm a little nervous about adding code that has no test coverage
  2. there are some minor error checking things that aren't addressed - like remote_exec.write_file() that doesn't check for an existing file before overwriting

I think that 2) isn't worth dealing with right now - that stuff can be dealt with as needed with enhancements. I don't think it's worth trying for perfection right off the bat, especially this early on.

For 1), part of me wants to just put it off so we can get this code in disposable-develop but I'm leaning more towards waiting until there's some test coverage in remote_exec. Any other thoughts on this?

libtaskotron/runner.py
231

This throws a TB if there is a missing type or item. I'm not sure that's a valid use case but it'd be better not to blow up. I suspect that we're going to need some more arg validation or verify that what we have works for stuff like your concern about rm -rf being in item.

That being said, I also think that this is out of scope for this review.

In D356#6842, @tflink wrote:

Overall, it looks really good - works well in my testing.

I have 2 concerns:

  1. no test coverage - this kind of code is not the easiest stuff to test but I'm a little nervous about adding code that has no test coverage
  2. there are some minor error checking things that aren't addressed - like remote_exec.write_file() that doesn't check for an existing file before overwriting

File is created if it doesn't exist. Do you think we should not write to the file if it doesn't exist or?

I think that 2) isn't worth dealing with right now - that stuff can be dealt with as needed with enhancements. I don't think it's worth trying for perfection right off the bat, especially this early on.

For 1), part of me wants to just put it off so we can get this code in disposable-develop but I'm leaning more towards waiting until there's some test coverage in remote_exec. Any other thoughts on this?

I have already started writing tests. I am not really sure how to approach this as there would be a lot of mocking paramiko methods. Any ideas?

mkrizek updated this revision to Diff 999.May 25 2015, 2:08 PM
  • Make writing vm's output into a file optional
mkrizek updated this revision to Diff 1000.May 25 2015, 2:11 PM

Er, forgot to use the correct base branch...again

mkrizek updated this revision to Diff 1005.May 27 2015, 2:14 PM
  • Merge branch 'disposable-develop' into feature/remote_exec
  • Use actual location of taskotron.yaml
In D356#6847, @mkrizek wrote:
In D356#6842, @tflink wrote:

Overall, it looks really good - works well in my testing.

I have 2 concerns:

  1. no test coverage - this kind of code is not the easiest stuff to test but I'm a little nervous about adding code that has no test coverage
  2. there are some minor error checking things that aren't addressed - like remote_exec.write_file() that doesn't check for an existing file before overwriting

File is created if it doesn't exist. Do you think we should not write to the file if it doesn't exist or?

Doesn't it overwrite anything that already exists in that file or am I missing something? It's not huge, just not sure that's ideal.

I think that 2) isn't worth dealing with right now - that stuff can be dealt with as needed with enhancements. I don't think it's worth trying for perfection right off the bat, especially this early on.

For 1), part of me wants to just put it off so we can get this code in disposable-develop but I'm leaning more towards waiting until there's some test coverage in remote_exec. Any other thoughts on this?

I have already started writing tests. I am not really sure how to approach this as there would be a lot of mocking paramiko methods. Any ideas?

Not many specific ideas at the moment but I suspect building a stub paramiko object that's reused in many places might work well here instead of just creating a bunch of dinguses over and over in the unit tests. I don't know of a technique to avoid heavy mocking/stubbing for stuff like this

mkrizek updated this revision to Diff 1009.May 28 2015, 2:08 PM
  • Add remote_exec tests WIP
mkrizek updated this revision to Diff 1010.May 29 2015, 1:38 PM
  • Add more tests for remote_exec
tflink accepted this revision.Jun 1 2015, 6:02 PM

Looks good to me.

I suspect that some parts of this may need some tweaking as we complete the feature but we'll cross that bridge if and when we get there. At the very least this is a great start

This revision is now accepted and ready to land.Jun 1 2015, 6:02 PM

I know this has already been committed (do I get the "missed the train" badge?), but since I've spent a few hours studying the changes, I decided to provide some more feedback here. Most of it is cosmetic, but there's one serious issue there as well (the config initialization adjustment).

conf/taskotron.yaml.example
94

Maybe task files (git checkout)?

libtaskotron.spec
26

This should be >= 1.15.1 to be in sync with requirements.txt.

libtaskotron/config_defaults.py
69

I'm a bit confused about the remote part here. In the remote_stdio_name, I understand it, it's the stdio of the VM launched, so remote client. But in this case, it's the local taskotron log, isn't it? The remote taskotron log is saved as taskotron.log, right? So shouldn't this be local_log_name or initiator_log_name? (We really need to agree on some terminology in disposable clients mode, so that we're consistent and know what we mean).

libtaskotron/remote_exec.py
32

Would it be possible to say which file? I know not every file handle has a file name attached, but maybe just for those that do? Maybe even throw an exception?

90

There's definitely missing something here :-)

230

Copy paste error from get_dir().

242

Copy paste error from get_dir().

libtaskotron/runner.py
31

You cannot call config.get_config() in the top level scope. If you do it like this, it is executed before running main() and before parsing cmdline args and running logger.init_prior_config(). If the config parsing fails for some reason, the default logging is not yet initialized, and you will not receive any helpful error messages.

You have to move config.get_config() to the class methods where you need to use it (or run it once in __init__() and save it).

33

On line 61 we call this artifactsdir (meaning the task-particular artifacts dir, i.e. <global artifacts dir>/<uuid>), here we call it uuid_dir. I find that mildly confusing. Can we use the same term in both places?

189

If you wanted to make it simpler, curl is always available ;)

190

Please put a line break here, and at least on line 217 as well. Thank you.

226

This is a great comment. But the item is provided by taskotron trigger, so the very fedmsg would have to be fake, right? So we need to make sure we check fedmsg certificates. And even if this happened, the only thing you control is a disposable client.

238–248

Would it make sense to swap these two operations? So that if somebody's task, by accident, creates a file called taskotron.log, it does not overwrite our own file?

337–340

You can also use

remote = task_data.get('environment', {}).get('machine')

if you like it more than handling the KeyError.

346

I'm again confused by the word remote here, a few comments would definitely improve it here. If remote is true, does it mean that this runtask script is running on a remote machine (i.e. the VM client)? Or does it mean we will schedule it remotely, i.e. we're running on the buildslave initiator?

By reading the code below, it seems that the latter is the case. Would it make sense to rename remote to run_remotely? It seems clearer to me.

testing/test_remote_exec.py
23

A comment would help here.

24

A comment would help here.

89

A comment would help here.

91

A comment would help here.

@kparal Thanks for the review, there are some good catches. I addressed the issues in D380.

libtaskotron/runner.py
190

OK. Left it there as it is since those lines will go away (hopefully) soon anyway.

226

Right. Or maybe someone could send a (force) build directly to buildbot with that item. But the attacker would need to know the password to the buildbot instance I think. Either way, the comment is just a result of me becoming more paranoid. :)