rewrite createhdds in python, make it shinier
ClosedPublic

Authored by adamwill on Dec 11 2015, 6:48 AM.

Details

Summary

createhdds.sh was just too damn simple and understandable, so
I thought I'd make it three times longer, object oriented,
and hard to understand!

OK, OK, that's not great sales. Alright. The main thing was to
make it smarter. This rewrite lets it do these things:

  • Only create the images that are missing (not rebuild all)
  • Work out the releases to build images for
  • Rename images when appropriate
  • Rebuild images when they need rebuilding
  • Remove old / abandoned images

It can figure out what images ought to be present - including
working out the 'next' release and figuring out from that what
releases it needs images for - and build only the missing ones.
There's a 'version' concept for images; if the existing image
is older than the version given in the data file, it'll be
rebuilt. The data file can list 'rename' pairs, allowing images
to be renamed (like when we move from a single image to multiple
label/filesystem variants). This code uses fedfind's ability
to find the current release version to figure out what releases
we need virtbuilder images for (so you don't have to pass it
in). And it can find image files that aren't in the 'currently
expected' set and wipe them. Images can also have a 'maxage',
triggering a rebuild when they exceed it - this is intended
for the virtbuilder images, so we get a rebuild with the
latest updates every so often (default is two weeks).

The point of all this is to help with unattended deployment/
maintenance, i.e. the ansible deployment we have in infra;
the idea is that we can just set that up to run the 'all'
subcommand every so often, and it'll remove old images, create
new ones, and rebuild ones that are outdated.

I kept the ability to build a single image (or a whole image
'group'), and included the ability to just run a check without
actually doing a rebuild. There's a few little weird things
and holes here as it's not really the focus of the tool.

Test Plan

Build all images, do a full test run, and see if
it works OK. Test out all the variations of building single
images / image groups, and using the 'check' command.

Diff Detail

Repository
rOPENQA fedora_openqa
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.
adamwill retitled this revision from to rewrite createhdds in python, make it shinier.Dec 11 2015, 6:48 AM
adamwill updated this object.
adamwill edited the test plan for this revision. (Show Details)
adamwill added reviewers: jskladan, garretraziel.
adamwill added a comment.EditedDec 11 2015, 6:50 AM

I should probably add a little readme or something explaining hdds.json and the .commands files - for now they're documented in the main script's docstrings, I'll throw something together tomorrow if I can. Hopefully you can follow how it works from the docstrings.

The disk size handling is a bit half-assed; I was using blivet's cool Size class for it, which works great, but blivet's deps are pretty heavy to pull in just for this use, so I hacked together something good enough.

adamwill updated this revision to Diff 1773.Dec 11 2015, 6:59 AM

rename 'apiver' to 'imgver', it's really not an api

adamwill updated this revision to Diff 1774.Dec 11 2015, 7:02 AM

more 'API' renaming

adamwill updated this revision to Diff 1775.Dec 11 2015, 7:13 AM

sigh, i screwed up the apiver/imgver sed, fixing

Overall, it is probably OK, although it is overly-complicated for what it does.

Thorough documentation of the hdds.json is a must, though - I could probably get to the gist of it by following the code carefully, but that is not the right way (but I guess you'll be adding the docs soon anyway).

Some comments for the first-pass of the code in-line

tools/createhdds.py
489

Why return current, when it's never really used in any of the func-calls?

543

hdds is not used at all in the method, but the comments mention the hdds.json on multiple occasions - this is my first pass of the code, so take it with a grain of salt, but it sometimes does not make sense to me. For example:

If the user passed in label or filesystem, we pass them on
to get_guestfs_images as single-item lists, otherwise we
​just pass 'None', causing it to use the values from hdds.json

How is it overriden by hdds.json, when the hdds value (the parsed hdds.json file) is not used at all?

683–691

This could IMHO be embedded in the main() directly, the run() is (AFAIK) not called from anywhere else, anyway...

691

Since hdds is not used at all in cli_image(), how about changing the order of the arguments (args.func(args, hdds))? This would then allow the cli_image() to have def cli_image(args, *_) signature.
Absolutely not necessary, just a food for thought.

tools/hdds.json
5

Why imgver instead of imgver?

garretraziel accepted this revision.Dec 11 2015, 11:38 AM

Other than my and Josef's concerns, looks good to me.

tools/createhdds.py
254

It's not necessary ATM, but could you make root password also configurable?

266–278

I was hoping that expect is only temporary solution, but oh well, I guess that there is still no better way how to do this.

This revision is now accepted and ready to land.Dec 11 2015, 11:38 AM
jskladan requested changes to this revision.Dec 11 2015, 11:59 AM

Nacking just so the 'Ack' status from @garretraziel does not make it seem like it's ready to land

This revision now requires changes to proceed.Dec 11 2015, 11:59 AM
adamwill updated this revision to Diff 1776.Dec 11 2015, 3:01 PM

fix up ' imgver' in hdds.json too

adamwill added inline comments.Dec 11 2015, 3:23 PM
tools/createhdds.py
490

because I wrote check() quite early - before I had the actual CLI code in place - and was just printing everything it returned to make sure it was sane :) i.e., no particularly great reason. I can change it.

544

so what happens here is that we actually pulled a dict out of hdds *in parse_args()*, and stuffed it into args; note where this function reads args.imggrp. I originally had it set up so parse_args() would just send the image 'type' and the name of the image group, and this function would find the appropriate group in hdds itself, then thought 'wait, that's silly, since parse_args() already got the dict it can just return the whole thing'. This is part of the 'clever clever' trick to set up a named subcommand parser for each individual image group. It may be that there's a cleaner/more obvious way to set this up, I'll think about it a bit, and comment it if not.

684–692

first comment - yeah, probably, I just boilerplate this stuff around from script to script (it's all there in relval and fedfind too). second comment - sure, it's a tiny thing but it'd give me more pylint points, I guess I'll do it :)

tools/hdds.json
6

Because I did a bad sed at midnight after a few drinks :) I fixed it up in createhdds.py but forgot to fix the json too until this morning.

On complexity - so I was trying to strike a balance between a somewhat-sensible and extensible design, and simplicity. I could write it much simpler just keeping all the image definitions in-line in the python and not worry so much about genericness, but I kinda hate having a bunch of what's essentially configuration in-line in code. I actually have an impulse to make it much *more* generic - right now there's a decent amount of special-casing in terms of how the 'grouped' things in the image groups are handled (filesystems and labels and release and arch), if we suddenly decide we want some other form of image group variation I'll have to make quite a few tweaks.

What I like about this design is that you can add a new image or image group just by editing hdds.json (and adding a .commands file if appropriate) - you don't have to touch the code at all.

tools/createhdds.py
255

could do, sure.

267–279

yeah, I couldn't think of any. I tried without it and found the same problem you did - virt-builder can't do the selinux relabel itself and just dumps a .autorelabel to make it happen on first boot. rwmjones pointed me to https://bugzilla.redhat.com/show_bug.cgi?id=1049656 , which is in the general neighbourhood but not exactly what I saw when I ran it with -x -v - when I did that I saw it complaining about the policy version (which is why I wrote that in my comment).

Given the requirement 'boot the image, make sure the relabel finished, and shut it down cleanly' I couldn't think of any better option than expect :/ I did at least move all other stuff out of expect so this is the only thing we're using it for now. if anyone has any ideas I'm happy to try...

adamwill updated this revision to Diff 1777.Dec 12 2015, 6:35 AM

address review concerns, add README, do more stuff in commands

So this mostly addresses the issues raised in review, and adds
a README file explaining various stuff including the JSON file
format.

Note that instead of making the root password configurable in
the JSON (as suggested), I decided to be more consistent:
update, root-password, and selinux-relabel are all actually
virt-builder customization commands we were passing in as
arguments, so I went ahead and moved them into the .commands
files instead (adding a minimal.commands). This achieves the
same goal and is consistent - we only handle non-customization
arguments in the JSON / python.

adamwill updated this revision to Diff 1779.Dec 12 2015, 8:22 PM

document 'maxage'

adamwill updated this revision to Diff 1780.Dec 12 2015, 8:26 PM

comment the 'imggrp' argparse stuff a bit more

as this threw jskladan off, let's document it as hard as
possible.

adamwill updated this revision to Diff 1781.Dec 12 2015, 10:11 PM

drop the 'image version' files, simplify some other stuff

in the shower this morning I realized a way to simplify this
quite a bit. We don't need special 'image version' files -
we can just stuff the image version into the filename, and
then everything we want to happen will just happen magically.
The old files will now be considered 'unknown' (so all will
clean them up) and the new files 'missing' (so all will build
them). Simples! As a special case, the version isn't included
if it's < 2, so the current filenames are still 'valid'.

That kinda led to a few other re-arranagements: I prefer this
approach to mutating the guestfs image file names (providing
an argument that specifies image attributes that should be
appended to the file name) as it's more generic and it all
happens at instantiation, it doesn't need special methods.
And quite a bit of the stuff on the top-level Image class
just didn't really seem necessary any more. That whole class
could actually just go and be replaced with a size-setting
function now I guess...

adamwill updated this revision to Diff 1782.Dec 12 2015, 10:14 PM

clean up guestfs image temp files if user ctrl-c's

adamwill updated this revision to Diff 1783.Dec 13 2015, 5:03 AM

drop Image class in favour of a simple size string function

about the only useful thing the Image class did any more was
handle size strings, so let's ditch it and just have a simple
function for the size strings. (I actually would like to use
blivet's awesome Size class for this, but it's a heavy dep
just for this purpose).

jskladan requested changes to this revision.Dec 15 2015, 12:02 PM

Definitelly much better than the first pass. I still have some food for thought though.

tools/createhdds.py
99

This does not really make much sense. First of all, why not store the 1 in the first place? Next up, the content of the imgver does not really need to be integer at all. The way your code works now, the images get rebuilt if imgver changes to any kind of new value. For example if I had a json file with imgver set to 20, and "updated" the imgver to 3, the code would still say that the said image needs to be rebuilt, as there is no file with _v3_ in the directory.

So it would IMHO only make sense to remove the int() check, and just plain store whatever is imgver set to, into the filename.
Change the documentation to reflect the fact, that the code does not at all care about it being an integer, or an ascending number, and just say something in the lines of "When imgver changes, the image gets rebuilt".

Ditto for the VirtBuilderImage.

101–104

see comment on get_guestfs_images

195–196

See comment on GuestfsImage.

287–293

This could really be replaced with:

name_extras = []
if len(imggrp.get('filesystems', [])) > 1 or len(filesystems) > 1:
    name_extras.append(filesystem)
if len(imggrp.get('labels', [])) > 1 or len(labels) > 1:
    name_extras.append(label)
img = GuestfsImage(​name, imgver, size, filesystem, label, parts, writes, uploads, name_extras)

and then in GuestfsImage one could easily replace the:

if name_extras:
    for item in name_extras:
        getter = attrgetter(item)
        self.filename = "{0}_{1}".format(self.filename, getter(self))

with much simpler:

if name_extras:
    for item in name_extras:
        self.filename = "{0}_{1}".format(self.filename, item)

and get rid of the (very hard to read, and unnecessary) attrgetter "magic".

I get why it was written this way, but since the default values of filesystem and label are never really used, as this is the only place where the GuestfsImage is instantiated, and the filesystem and label parameters are passed to the constructor.

This revision now requires changes to proceed.Dec 15 2015, 12:02 PM
adamwill added inline comments.Dec 15 2015, 5:24 PM
tools/createhdds.py
99

Why not store the 1 in the first place - so *currently existing* images aren't considered invalid, i.e. we can switch over to this new script without configuring renames for every existing image or renaming them manually. We can do that, though, if you'd prefer.

Integer comparison stuff - yeah, you're right, we can ditch that now.

287–293

yeah, in fact I actually meant to change that already but got sidetracked into something else (I tried ditching the classes entirely, but that didn't work out well and I gave up on it) and forgot it...

adamwill updated this revision to Diff 1790.Dec 15 2015, 5:58 PM

drop imgver concept entirely, simplify name_extras

So how about this: we don't really need imgver at all. We
can just do it with 'name': if we change an image group
incompatibly, just change its name to 'desktop_v2' or
whatever. The only slightly odd thing about that is that
it will also become the subcommand name, but meh, that doesn't
seem like a very big deal. I wasn't planning to automate use
of the subcommand names or anything, only 'check' and 'all'.

Also this simplifies name_extras as suggested by jskladan.

garretraziel requested changes to this revision.Dec 16 2015, 1:38 PM

Looks good in overall, name_extras is much more readable now, but I would (and I think that @jskladan would agree) leave imgver in there - just drop int() type conversion and append it every time. Although you could control it with name, changing programs' subcommand feels weird :-).

This revision now requires changes to proceed.Dec 16 2015, 1:38 PM

Well, I honestly don't really love the design of an 'imgver' that's actually just an arbitrary string that gets appended to the image name, but if you really want that, fine. I think it's icky.

adamwill updated this revision to Diff 1804.Dec 16 2015, 10:32 PM

restore imgver, by popular demand

it works a bit more simply now; it's just a string that gets
included in the filename (with the usual _ separators) if
specified, if not specified it is entirely omitted (this
preserves the current names).

jskladan accepted this revision.Dec 17 2015, 7:40 AM

Great, thanks!

garretraziel accepted this revision.Dec 17 2015, 12:11 PM

Yup, seems fine to me.

This revision is now accepted and ready to land.Dec 17 2015, 12:11 PM
This revision was automatically updated to reflect the committed changes.