createhdds: work around #1351352, fix support_server fails(?)
Abandoned · Public

Authored by adamwill on Jun 29 2016, 9:43 PM.

Details

Summary

First, this works around #1351352 for all virt-builder images.
As described in the bug, building an updated f24 image with an
selinux-relabel currently leaves it stuck in a boot loop. The
workaround is to set SELinux to permissive for the first boot by
editing /etc/selinux/config, then restore enforcing mode with a
firstboot-command. That command runs when createhdds lets the
system reboot after the relabel, logs in, and shuts down, so
SELinux should be in enforcing mode by the time openQA boots the
image.
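
As a rough illustration of that workaround (not the actual createhdds change), the extra virt-builder arguments might be assembled like this. The function names, the `fedora-24` template name, and the output path are hypothetical; `--selinux-relabel`, `--edit FILE:EXPR` and `--firstboot-command` are standard virt-builder options.

```python
import shlex

SELINUX_CONFIG = "/etc/selinux/config"


def selinux_workaround_args():
    """Return extra virt-builder arguments for the permissive-first-boot workaround.

    Hypothetical helper: a sketch of the idea, not the code in this diff.
    """
    return [
        # Ask virt-builder to schedule an SELinux relabel of the guest.
        "--selinux-relabel",
        # Edit the config at build time so the relabel boot runs permissive.
        "--edit", f"{SELINUX_CONFIG}:s/^SELINUX=.*/SELINUX=permissive/",
        # Inside the guest, switch the config back to enforcing for later boots.
        "--firstboot-command",
        f"sed -i -e 's/^SELINUX=.*/SELINUX=enforcing/' {SELINUX_CONFIG}",
    ]


def build_command(template, output):
    """Build a full virt-builder command line (template/output are examples)."""
    return ["virt-builder", template, "-o", output] + selinux_workaround_args()


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(shlex.join(build_command("fedora-24", "disk.img")))
```

Returning a list of arguments rather than a shell string avoids quoting problems with the sed expression when the command is eventually passed to something like `subprocess`.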

Second, support.commands now patches out of grub.cfg some kernel
params that virt-builder puts there (again using a
firstboot-command). The support_server jobs sometimes fail in
prod, and from the video it looks like the login prompt shows
briefly in one video mode, then the console switches modes and
the prompt disappears; if openQA doesn't run a needle match while
the prompt is briefly visible, it fails to log in. I think
dropping these params (especially console=tty0) should avoid
that.
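
To make the intended transformation concrete, here is a minimal sketch of dropping kernel params from the kernel lines of a grub.cfg. It only illustrates the text edit; the diff itself does this inside the guest with a firstboot-command. The function name is hypothetical, and any parameter other than `console=tty0` passed to it is an assumption, since the summary names only that one.

```python
# GRUB commands that load a kernel; other lines are left untouched.
KERNEL_COMMANDS = {"linux", "linux16", "linuxefi"}


def drop_kernel_params(grub_cfg_text, unwanted):
    """Return grub.cfg text with the unwanted params removed from kernel lines.

    Hypothetical helper: matches params by exact word, e.g. "console=tty0".
    """
    unwanted = set(unwanted)
    result = []
    for line in grub_cfg_text.splitlines():
        words = line.split()
        if words and words[0] in KERNEL_COMMANDS:
            # Preserve the original indentation of the kernel line.
            indent = line[: len(line) - len(line.lstrip())]
            kept = [w for w in words if w not in unwanted]
            line = indent + " ".join(kept)
        result.append(line)
    return "\n".join(result) + "\n"


if __name__ == "__main__":
    sample = (
        "menuentry 'Fedora' {\n"
        "\tlinux16 /vmlinuz root=/dev/sda3 ro console=tty0\n"
        "\tinitrd16 /initramfs.img\n"
        "}\n"
    )
    print(drop_kernel_params(sample, ["console=tty0"]), end="")
```

Matching whole words keeps the sketch from accidentally trimming a parameter that merely contains one of the unwanted strings.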

Test Plan

Check that creation of all virt-builder images works and that
the support_server test runs with the new support server image
(I didn't bump the imgver because I don't feel like doing a PR
for openqa_fedora too...). The failure is fairly intermittent,
though.

Diff Detail

Repository
rOPENQA fedora_openqa
Branch
relabel-permissive
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 694
Build 694: arc lint + arc unit
adamwill retitled this revision to createhdds: work around #1351352, fix support_server fails(?). Jun 29 2016, 9:43 PM
adamwill updated this object.
adamwill edited the test plan for this revision.
adamwill added a reviewer: garretraziel.

Staging is on this branch of the tools at the moment, and I've also ninja'd a rebuild of the support-server disk image with these changes onto prod (though prod's git checkout is still develop). I want to see if the fix actually works, and the bug only seems to happen often on prod (probably a speed thing).

Here are some cases of the support-server fail:

https://openqa.fedoraproject.org/tests/23985
https://openqa.fedoraproject.org/tests/23884
https://openqa.fedoraproject.org/tests/23790
https://openqa.fedoraproject.org/tests/23950

If you watch the videos carefully, you can see the login prompt appear briefly around 3 seconds into boot; some kind of mode switch occurs very shortly afterwards and the prompt is lost.

The other fix is the important one here, though - without that, all virt-builder image builds fail, so when the upgrade images age out in a few days they'll fail to rebuild.

adamwill added a comment. Edited Jun 29 2016, 10:38 PM

Interestingly, the same issue also affects the minimal image, but we're shielded from it there (at least through upgrade_preinstall) because the test that uses that image never actually checks for a login prompt on tty1; it uses $self->boot_to_login_screen(), which winds up just doing a wait_still_screen and then switching to tty3. The support_server test uses _console_wait_login, which *does* log in on tty1. The minimal upgrade test does use _console_wait_login after the upgrade, though, so I wonder if we've ever failed there...

adamwill updated this revision to Diff 2326. Jun 30 2016, 6:00 AM

replace virt-builder kernel args with 'rhgb quiet'

Is this really necessary if we're switching to virt-install in D917?

Right, as I said in D917, it makes this unnecessary. But if you want to take a bit of time to review D917, we should merge this in the meantime.

D917 seems to be working fine, except that a change in ansible's git behaviour caused us to lose all the goddamn images it had built on stg :( Now rebuilding them.

garretraziel resigned from this revision. Jul 8 2016, 12:03 PM
garretraziel removed a reviewer: garretraziel.

This is no longer needed.

adamwill abandoned this revision. Aug 3 2016, 8:34 PM

Indeed, no longer needed.