Switch job scheduling over to pungi4 + fedmsg
Closed, Public

Authored by adamwill on Feb 24 2016, 1:32 AM.

Details

Summary

This is a big diff with all the changes from the 'pungi4'
branch - that branch has the changes split into multiple commits.

This pretty much entirely rewrites scheduling so instead of using
fedfind to find composes and wait for them to exist and find images
in them, and having systemd timers with implied knowledge about when
various types of compose show up, we listen out for fedmsg messages
to tell when a new compose has appeared, and we use the Pungi 4
metadata to decide what images we want to download and test.
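
To illustrate the "use the Pungi 4 metadata to decide what images we want" idea, here is a minimal sketch. It assumes the compose's image metadata has already been loaded into a plain dict shaped roughly like productmd's images.json (payload, then images, then variant, then arch, then a list of image dicts). The WANTED set, the pick_images helper and the sample paths are hypothetical and are not taken from this diff.

```python
# Hypothetical selection rules: (variant, image type, arch) combinations to test.
WANTED = {
    ("Server", "dvd", "x86_64"),
    ("Workstation", "live", "x86_64"),
}


def pick_images(metadata, wanted=WANTED):
    """Return the relative paths of images matching the wanted combinations.

    Assumes metadata looks like:
    {"payload": {"images": {variant: {arch: [image_dict, ...]}}}}
    """
    picked = []
    images = metadata.get("payload", {}).get("images", {})
    # Sort so the result order is stable regardless of dict ordering.
    for variant, arches in sorted(images.items()):
        for arch, imgs in sorted(arches.items()):
            for img in imgs:
                if (variant, img.get("type"), arch) in wanted:
                    picked.append(img["path"])
    return picked


# Made-up sample metadata, for illustration only.
SAMPLE = {
    "payload": {
        "images": {
            "Server": {
                "x86_64": [
                    {"type": "dvd", "path": "Server/x86_64/iso/example-dvd.iso"},
                    {"type": "boot", "path": "Server/x86_64/iso/example-boot.iso"},
                ],
            },
            "Workstation": {
                "i386": [
                    {"type": "live", "path": "Workstation/i386/iso/example-live.iso"},
                ],
                "x86_64": [
                    {"type": "live", "path": "Workstation/x86_64/iso/example-live.iso"},
                ],
            },
        }
    }
}

selected = pick_images(SAMPLE)
```

The point of the sketch is only that the decision becomes a lookup over metadata the compose publishes, rather than guessing at file names; the real scheduler's rules may look quite different.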

Ideally we'd like to have the fedmsg listening bits be a Taskotron
trigger and task, but that's still going through review at present
so we're just going to use a simple standalone consumer for now.

One annoying issue is that we still want to test the daily 'two week
Atomic' test composes, and those will not be done with Pungi 4 yet.
So we have a couple of small hacks in the fedmsg consumer and some
more dumb hacks in the scheduler to cope with those: we listen out
for the fedmsgs from that compose process as well as Pungi 4
fedmsgs, and we just hard code the expected location and metadata
of the single ISO we actually want to test within such a compose. At
first I wrote a whole 'clever' layer in fedfind to synthesize Pungi
4-y metadata for a non-Pungi-4 compose, but it was way too much code
for the job we really need to do; this is much simpler.

Test Plan

Install it (you'll need to clean up the old systemd units
manually, unfortunately, setuptools is pretty dumb), start the new
consumer service, wait for a compose to happen and see if you get
some jobs. fedmsg-dg-replay may help test the fedmsg consumer-y bits,
and to test the rest of it you can use the CLI (it's been rejigged
a lot and now simply accepts a compose location), or just hook right
into jobs_from_compose.

There will be a companion diff for openqa_fedora that updates the
flavor names (and fixes a few other things up to work in a world
where we use the Pungi 4 'compose ID' as the openQA 'build').

Diff Detail

Repository
rOPENQA fedora_openqa
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.
adamwill retitled this revision to Switch job scheduling over to pungi4 + fedmsg. Feb 24 2016, 1:32 AM
adamwill updated this object.
adamwill edited the test plan for this revision.
adamwill added reviewers: jskladan, garretraziel.

So here's why I sent this now:

<adamw> dgilmore: so what's your current thinking wrt pungi4 switchover?
<dgilmore> adamw: branched is not enabled
 adamw: and rawhide is disabled
<dgilmore> I am going to change it
 and will have to work as we go on the missing things
<dgilmore> adamw: I may run it manually tomorrow
<dgilmore> I will need to get nirik to have someone reinstall branched and rawhide composer boxes
<adamw> when you say 'rawhide is disabled', you mean at this point we are getting no more old-style rawhide nightlies?
<dgilmore> adamw: correct
<adamw> ok
 and we will never get any old-style 24 branched nightlies?
 (unless we decide all this is awful and we have to change our plans)
<dgilmore> correct
<adamw> okay.

So we should have a pretty low bar for merging this, since the current code will basically never manage to test anything but two-week Atomic composes ever again. Even the current bit is very unlikely to work any more once we get around to figuring out how TCs and RCs are going to work: based on my poke through the Pungi 4 code today, they're gonna look different.

Using ISOURL here is correct for now, as the openqa update which would make us have to use ISO_URL is still in updates-testing ATM. I'll push it stable ASAP, and then we'll have to update this.

garretraziel accepted this revision. Feb 24 2016, 8:57 AM

Other than my comments, lgtm.

scheduler/setup.py
18

We should also bump the version :-).

32

I think that fedfind is still needed for fedfind.helpers.

This revision is now accepted and ready to land. Feb 24 2016, 8:57 AM
jskladan accepted this revision. Feb 24 2016, 9:06 AM
jskladan added a subscriber: jdulaney.

ACK, code looks good. Get rid of the stupid copyright/license clause, or replace it by the short version - having a header longer than the code is silly...

scheduler/fedora_openqa_schedule/consumer.py
2–21 ↗(On Diff #1935)

This is really unnecessary, and we don't do it anywhere in the rest of the code.
If @jdulaney really needs to have "copyright" on a piece of code that is basically just a copy of the fedmsg consumer example script, then so be it, but do it this way:

# Copyright 2016 John Dulaney
# License: GPL-2.0+ <http://spdx.org/licenses/GPL-2.0+>
# Authors: John Dulaney ...
#          Adam Williamson ...

Also please update config sample.

garretraziel requested changes to this revision. Feb 24 2016, 11:11 AM

Not sure whether it's true in production, but I tried to use fedmsg-dg-replay to replay this message, and the consume() function doesn't receive only the "msg" part; it receives the whole message, like this:

{u'username': u'jsedlak',
 u'i': 1,
 u'timestamp': 1456312058,
 u'msg_id': u'2016-1db3386a-5b35-4f69-8a1d-1d23d0bcc1ae',
 u'topic': u'org.fedoraproject.dev.pungi.compose.status.change',
 u'msg': {u'status': u'FINISHED_INCOMPLETE',
   u'location': u'http://kojipkgs.fedoraproject.org/compose//rawhide/Fedora-Rawhide-20160222.n.0/compose',
   u'compose_id': u'Fedora-Rawhide-20160222.n.0'}
}

Please verify it.
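
One defensive way to cope with this, sketched below, is to unwrap the envelope when it is present and pass a bare payload through untouched. The helper name message_body is hypothetical, and the sketch only covers the two shapes discussed here (the full message as replayed above, or just its "msg" part); whether the production hub wraps the message any further is not confirmed in this thread.

```python
def message_body(message):
    """Return the inner payload whether or not the fedmsg envelope is present.

    Hypothetical helper: treats a dict with both 'topic' and 'msg' keys as
    the full envelope, and anything else as the bare payload.
    """
    if isinstance(message, dict) and "topic" in message and "msg" in message:
        return message["msg"]
    return message


# The replayed message quoted above, written with plain string literals.
ENVELOPE = {
    "username": "jsedlak",
    "i": 1,
    "timestamp": 1456312058,
    "msg_id": "2016-1db3386a-5b35-4f69-8a1d-1d23d0bcc1ae",
    "topic": "org.fedoraproject.dev.pungi.compose.status.change",
    "msg": {
        "status": "FINISHED_INCOMPLETE",
        "location": "http://kojipkgs.fedoraproject.org/compose//rawhide/Fedora-Rawhide-20160222.n.0/compose",
        "compose_id": "Fedora-Rawhide-20160222.n.0",
    },
}

body = message_body(ENVELOPE)
compose_id = body["compose_id"]
location = body["location"]
```

Calling message_body() on the already-unwrapped payload returns it unchanged, so the same consume() code would work with either shape.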

This revision now requires changes to proceed. Feb 24 2016, 11:11 AM

Also, when I use msg = msg['msg'] at the beginning of the consume function, it schedules jobs, but then it shows:

logger.info("Jobs run on %s: %s", compose, ' '.join(jobs))
TypeError: sequence item 0: expected string, int found

The problem is that jobs is a list of ints, but join only works on a list of strings.
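
A minimal sketch of one possible fix: convert each job ID to a string before joining. The format_jobs helper and the job IDs are made up for illustration; the logging call mirrors the one in the traceback.

```python
import logging

logger = logging.getLogger(__name__)


def format_jobs(jobs):
    """Join job IDs into one space-separated string.

    str.join() only accepts strings, so each ID is converted first.
    """
    return ' '.join(str(job) for job in jobs)


compose = "Fedora-Rawhide-20160222.n.0"
jobs = [101, 102, 103]  # made-up job IDs, for illustration only
logger.info("Jobs run on %s: %s", compose, format_jobs(jobs))
```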

Thanks for testing! You may be able to tell I didn't ;) (well, I tested the bits below fedmsg, but not the fedmsg part itself; it was too late). I will clean up the problems and test before merge, and check with threebean what format the message actually arrives in.

adamwill added inline comments. Feb 24 2016, 4:02 PM
scheduler/fedora_openqa_schedule/consumer.py
2–21 ↗(On Diff #1935)

"This is really unnecessary, and we don't do it anywhere in the rest of the code."

Actually we do, the rest of the scheduler uses it too. I took the format from some 'best practices' thing somewhere. Since the consumer is part of the scheduler package now I think it makes sense to keep the format consistent...

if we decide to change it, let's change the whole scheduler as one separate commit.

This revision was automatically updated to reflect the committed changes.