Destroy VM when execution is aborted
ClosedPublic

Authored by mkrizek on Sep 29 2015, 11:25 AM.

Details

Summary

Also add --no-destroy cli option

Test Plan

Tests amended. Run few tasks to check if vms are (not) being destroyed.

Diff Detail

Repository
rLTRN libtaskotron
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.
mkrizek retitled this revision from to Destroy VM when execution is aborted.Sep 29 2015, 11:25 AM
mkrizek updated this object.
mkrizek edited the test plan for this revision. (Show Details)
mkrizek added reviewers: tflink, kparal, jskladan, lbrabec.
tflink accepted this revision.Sep 30 2015, 6:53 AM

Not sure I understand why the uuid_dir change is grouped in with this one but it makes sense.

Works well for me locally.

This revision is now accepted and ready to land.Sep 30 2015, 6:53 AM
In D603#11425, @tflink wrote:

Not sure I understand why the uuid_dir change is grouped in with this one but it makes sense.

Works well for me locally.

The reason being that if we failed before creating RemoteRunner (e.g. on connecting to the machine), we would get tb (keyerror on 'artifactsdir') on:

# finalization
log.info('Execution finished at: %s. Task artifacts were saved in: %s',
         datetime.datetime.utcnow().strftime('%Y-%m-%d %H:%M:%S UTC'),
	​ task_runner.arg_data['artifactsdir'])
kparal requested changes to this revision.Sep 30 2015, 9:07 AM

Please note this interacts heavily with D550 handled by @lbrabec. One of these diffs will need to take the other one into account.

libtaskotron/runner.py
123–124

Your approach here is to stop remote execution in case of keyboard interrupt or a remote error, clean up, and then continue execution as if nothing has happened (it can even return 0 as an exit code). Is there a reason for that?

I would rather do this:

  1. the whole remote execution code should be wrapped in try-finally
  2. in the finally block, there should be the cleanup code (minion teardown)
  3. during teardown, we need to expect that it might not finish successfully (e.g. if there was an error during minion spawning, there might be nothing to tear down). We should definitely log all issues, we might even consider raising those issues (maybe only in cases when there was no issue during remote execution). I don't know the best practices here, would be interesting to find out.
  4. if there was an error during remote execution, don't suppress them, but re-raise them (that will happen automatically if there's no except block). that will ensure that no unnecessary code will get executed afterwards and that we will exit a non-zero status.
  5. if we want to go real fancy, you can wrap the main function, catch the keyboard interrupt signal, and print a pretty message into the console instead of a traceback before exiting with a non-zero status
129

Please get rid of debug here. I see no reason to avoid teardown when running in debug mode. That was a temporary measure before --no-destroy was implemented. Now we have it and don't need to keep the hack around.

424

"do not destroy disposable client at the end of task execution" might be clearer

500–501

In D550 I recommended Lukas to move this line to process_args(). I think it fits there.

This revision now requires changes to proceed.Sep 30 2015, 9:07 AM
In D603#11436, @kparal wrote:

Please note this interacts heavily with D550 handled by @lbrabec. One of these diffs will need to take the other one into account.

True but that's been stalled for a while and I'm not sure it's worth waiting for unless it picks up momentum again.

mkrizek updated this revision to Diff 1572.Sep 30 2015, 4:17 PM
  • Address issues from the review
tflink accepted this revision.Sep 30 2015, 4:38 PM
tflink added inline comments.
libtaskotron/runner.py
151

trivial nit: "or when --no-destroy"

500–501

shall we handle that in D550, then?

mkrizek added inline comments.Oct 1 2015, 5:50 AM
libtaskotron/runner.py
151

Ah, will fix that before commiting, thanks.

500–502

I moved only this "args['artifactsdir'] = os.path.join(config.get_config().artifactsdir, args['uuid'])" there. Does it make sense to move also the makedirs there? (I think it's fine as it is)

jskladan accepted this revision.Oct 2 2015, 10:15 AM
kparal accepted this revision.Oct 2 2015, 10:55 AM
kparal added inline comments.
libtaskotron/runner.py
123–124

Looks good now, thanks.

500–502

It's fine as it is. You could have even ignored my comment - I got confused by all those uuid changes, but it would be fine to move that line as part of D550. Whatever you prefer.

This revision is now accepted and ready to land.Oct 2 2015, 10:55 AM
This revision was automatically updated to reflect the committed changes.