add reporting to ResultsDB
Closed Public

Authored by garretraziel on Jul 7 2016, 10:34 AM.

Details

Summary

This adds the possibility of reporting results to ResultsDB. It
changes the CLI args a little bit (now you specify either --wiki or
--resultsdb or both, and --submit if you want to submit your
results). It sets more variables in openQA during test scheduling,
namely subvariant and imagetype (we COULD retrieve those things
from BUILD, but it's safer this way), as well as the ID of the
ResultsDB job (which has to be created during test scheduling).

In contrast to Wiki reporting, we want to report both passes and
fails to ResultsDB (we probably aren't going to use ResultsDB directly,
we are planning to write some service on top of it).

We are using the shiny new convention of naming tests with a dot
separator, so QA:Testcase_Boot_default_install becomes
openqa.installation.boot_default_install. We are also trying to put
links where appropriate, so the job overview links to openQA's
overview, the testcase links to the testcase page on the wiki and the
result links to the openQA job run.
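
As a rough illustration of the naming convention described above (not the code from this diff), the mapping could be sketched as follows. The lookup table and the function name are assumptions made for this example; the review does not say how the middle component ("installation") is derived.

```python
# Hypothetical lookup table: which group each wiki testcase belongs to.
# The real diff may derive this differently.
TESTCASE_GROUPS = {
    "QA:Testcase_Boot_default_install": "installation",
}

WIKI_PREFIX = "QA:Testcase_"


def resultsdb_name(testcase, groups=TESTCASE_GROUPS):
    """Turn a wiki testcase name into a dot-separated ResultsDB name."""
    if not testcase.startswith(WIKI_PREFIX):
        raise ValueError("not a wiki testcase name: %r" % testcase)
    # Drop the wiki prefix and lowercase the rest.
    short = testcase[len(WIKI_PREFIX):].lower()
    return "openqa.{0}.{1}".format(groups[testcase], short)


print(resultsdb_name("QA:Testcase_Boot_default_install"))
# openqa.installation.boot_default_install
```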

Right now the ResultsDB consumer is missing; I'll add it in a future
DR.

Test Plan

Run ResultsDB and ResultsDB-frontend locally, schedule
some tests, let it report to ResultsDB, observe results.

Diff Detail

Repository
rOPENQA fedora_openqa
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.
garretraziel retitled this revision from to add reporting to ResultsDB. Jul 7 2016, 10:34 AM
garretraziel updated this object.
garretraziel edited the test plan for this revision.
garretraziel added a reviewer: adamwill.
garretraziel added a subscriber: jskladan.
adamwill requested changes to this revision. Jul 8 2016, 9:32 PM

A few comments in-line.

scheduler/fedora_openqa_schedule/cli.py
93

typo.

169

I don't like this, and it will break our production deployments (I'd have to change things so prod explicitly passes --submit). Why did you change it? The existing setup allows for it to be specified by config but overridden on the CLI.

169

The help text here is a bit confusing, as really what the wiki and resultsdb args control is whether results of that type are *produced*, not whether they are *submitted*. submit controls whether whatever results are produced get *submitted* (or just printed).

scheduler/fedora_openqa_schedule/config.py
37

I don't think either of these will actually affect reporting, because the way you changed args.submit, the do_report arg for report_results() and resultsdb_report() will never be None, it will always be True (if --submit was specified) or False (if not). submit_resultsdb does affect whether a resultsdb 'job' gets created at all, though.
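
To make the point about None concrete, here is a minimal sketch of a tri-state submit flag, where the config value applies only when nothing was given on the command line. This is not the project's cli.py; build_parser and resolve_submit are hypothetical names, and only --submit comes from the diff.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="report")
    # default=None (rather than False) keeps "flag not given" distinct
    # from an explicit choice on the command line.
    parser.add_argument("--submit", action="store_true", default=None)
    return parser


def resolve_submit(cli_value, config_value):
    """Return the effective submit setting.

    cli_value is None when --submit was not passed; in that case the
    value from the config file decides.
    """
    if cli_value is None:
        return config_value
    return cli_value


args = build_parser().parse_args([])
print(resolve_submit(args.submit, config_value=True))   # config decides
args = build_parser().parse_args(["--submit"])
print(resolve_submit(args.submit, config_value=False))  # CLI wins
```

With a plain store_true flag the command line can only switch submission on; switching it off from the CLI would need a second flag, which is left out of this sketch.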

scheduler/fedora_openqa_schedule/report.py
224

'additional' (two d's)

scheduler/fedora_openqa_schedule/schedule.py
315

this probably needs to account for the case where the job has already been created, and take the job_id of the existing job?

we also have nothing that will ever change the status *away from* RUNNING, right, since there's no consumer yet? that seems like a problem.

The submit config settings / CLI args are really feeling pretty mushy right around now, what with this config setting being read here. I'm having trouble keeping track of them, they could probably do with a ground-up rethink?

scheduler/schedule.conf.sample
18

I think this is some kinda rebase mistake? Don't think this exists any more.

20

so this - and wiki_url - seems to only be used for producing links that go in the test results; it's not actually the URL we connect to in order to retrieve the test results, the openQA-python-client configuration is used there (as we initiate OpenQA_Client with no args). I can see a use for this - in case we want our OpenQA_Client to connect to http://localhost but we want to use https://openqa.fedoraproject.org in the links that are produced in the results data, for instance. But it needs to be more clearly explained at least, and generally, the mismatches in how all these different URL settings are actually used are a bit confusing I think.
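
A small sketch of the split described here: the URL used for links is just a string that goes into the results, independent of wherever the client connects. The function name is made up for this example, and the /tests/<id> path is an assumption about how openQA job pages are addressed.

```python
def job_link(link_base_url, job_id):
    """Build a public link to an openQA job for use in reported results.

    link_base_url is only used for link text; it does not have to match
    the host the API client actually talks to.
    """
    return "{0}/tests/{1}".format(link_base_url.rstrip("/"), job_id)


# The client might talk to http://localhost while links point at prod.
print(job_link("https://openqa.fedoraproject.org/", 1234))
```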

26

AFAICT this is not used at all.

This revision now requires changes to proceed. Jul 8 2016, 9:32 PM

I'll correct mistakes and resend this.

scheduler/fedora_openqa_schedule/cli.py
169

I thought we were using moksha consumers for reporting in prod, so the CLI is used only by us, during development. We figured that we rarely (if ever) use the CLI for reporting to the wiki. But I don't have a strong opinion either way; I can change it back or make the whole CLI usage more backward compatible.

scheduler/fedora_openqa_schedule/config.py
37

I'll look at it.

scheduler/fedora_openqa_schedule/schedule.py
315

We can get the ID of the existing job, but the question is whether we want to. This way, if we (for whatever reason) run openQA for one compose twice, it will show up as two jobs (but with the same ref_url and name). Or is there any chance this code will run twice for one openQA run for one compose?

I'll ask jskladan how ResultsDB works, but from what I heard, "job status" is more of a relic they want to get rid of, and I am not sure whether it's a problem if we leave the job "RUNNING".

I agree that I should unify and clarify the "submit" behavior and usage.

scheduler/schedule.conf.sample
18

I put it there intentionally because I saw that we are setting this value in config.py. But yeah, I forgot that we aren't using this value anymore, so the correct change is to remove CONFIG.set('report', 'jobs-wait', '360') from config.py.

20

I can change the comment (or the variable name) to make this clearer. OpenQA_Client takes this URL from its own configuration, but I didn't want to read the configuration of some foreign library (and besides, as you said, there might be a use case for it - connecting to localhost vs. creating links that are visitable from outside).

26

You are right, I put it there before I realized that we want to produce links to prod wiki every time.

adamwill added inline comments. Jul 11 2016, 10:47 PM
scheduler/fedora_openqa_schedule/cli.py
169

yeah, that's true actually. and the consumers do their own 'report/don't report' configuration. So it's not that significant, I just got a bit lost trying to trace it all out in review.

scheduler/fedora_openqa_schedule/schedule.py
315

OK, I see. It's a bit of a squishy problem, then.

The 'normal' case is pretty simple: this will just get hit once, automatically, per compose. In which case there's really no problem. But there *are* awkward cases occasionally which require re-submitting the jobs manually for some reason or other, say something goes wrong and the tests all fail when they shouldn't, I might have to log in and re-submit them manually, or something like that.

I guess I don't really know whether it'd be better to try and re-use the existing RDB 'job' (and hence effectively group all openQA submissions for the same compose into the same RDB 'job') or create a new one per submission, I'm really not sure (and I don't know how significant a difference it is). One thing to note, I suppose, would be that if we keep it this way, you'll get a different result for re-submitting openQA jobs via this tool (new RDB job) vs. restarting an openQA job through the web UI (same RDB job as the original openQA job).

  • few changes
  • resultsdb job creation finalization
jskladan added inline comments. Jul 22 2016, 11:30 AM
scheduler/fedora_openqa_schedule/schedule.py
315

We wouldn't really care about the job state - the Job state is kind of a leftover from the times when ResultsDB also stood in as an execution state reporting tool. So having the jobs on RUNNING is not a problem. In the next ResultsDB patch, the constraint on the job status will be removed, so one less problem to think about (it could all be in some random state then).

On the job-id front - I'd rather create a new job - we'll be consuming the results anyway, the (number of) jobs is irrelevant - the most recent results for the specified item(s) are returned by resultsdb anyway. The job acts more as a group - in the next resultsdb version (sometime in the future) I'll make the semantics clearer, and might even make the job/group not required.

So, about changes in newest DR:

  1. I haven't changed the CLI behaviour back. We should discuss it (and if you have a different opinion, please say so!), but AFAIK the CLI is used mostly during development, where you probably don't want to submit results to our wiki/ResultsDB. In production we use fedmsg consumers, and I would argue that this is the preferred way of deployment. So I vote for the CLI not submitting by default and the fedmsg consumer submitting by default - that's the reason I added -s to the CLI. But again, it's possible that I am wrong and I don't have a strong opinion either way - I can change it back and make it somewhat backward-compatible (thoughts on how the CLI should be used to submit/print results to Wiki/ResultsDB?).
  2. I have added a --create_resultsdb_job argument to the CLI. The thing is, you want to create the job in ResultsDB during scheduling, but since schedule and report are different subcommands, there is no way to find out whether the user really wants to submit to ResultsDB before they run the report subcommand. We could do something like "search for an existing job with this compose and if it doesn't exist, create one", but jskladan described that as "not ideal". From the ResultsDB perspective, a job describes "one run" (as in "one CLI schedule command" or "one fedmsg that caused openQA scheduling on a compose") and is somewhat akin to a buildbot job (and, according to jskladan, it is a relic that was replaced by ExecDB). That's the reason we had to set it to the "RUNNING" state, even though we couldn't care less. It would be great (and jskladan said that it's in the works) if the "job" in ResultsDB wasn't mandatory and we could use some different result grouping. I don't think that not being able to submit jobs without a job ID will create any problems, but I can add something like the possibility to specify the job ID in the CLI, if you think that it is necessary. So now it works like this:
    • If the user wants to submit results to ResultsDB, they need to run the CLI with --create_resultsdb_job. It creates a new job in ResultsDB and then adds its ID to the settings during job scheduling.
    • After the job is finished, the user can use report with --resultsdb. If the job was run without --create_resultsdb_job, it doesn't have a job ID in its settings and nothing will be reported to ResultsDB.
  3. Config files and, specifically, submit behaviour. The code above doesn't use the config file for submit in the CLI. The reasoning is that the CLI is used mainly for development, so it either has the default behaviour (don't submit) or you can specify --submit. I would vote for reading the submit value from the config file only for the fedmsg consumer part of the scheduler. But again, this is more of a proposal and an opportunity for discussion.

All in all, I wanted the most often used variant to be the default. There is an unfortunate dichotomy in that we have a fedmsg consumer that runs automatically and is deployed in production, and a CLI tool that is run by hand and more often used for development, and I think we should decide how each should behave under which conditions.

  • another bunch of changes

So I have actually changed the behaviour a little bit.

  • running report without any additional argument will just print out the results, adding --wiki submits results to wiki, adding --resultsdb submits results to ResultsDB (and you can, of course, use both arguments)
  • user can specify ResultsDB job ID from CLI during reporting (so user can report openQA jobs to different ResultsDB job or report job for which ResultsDB job ID wasn't created during scheduling)
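
The behaviour in the two points above can be sketched roughly like this. It is an illustration only, not the code in cli.py: --wiki and --resultsdb are the flags named in this review, while --resultsdb-job-id, the RESULTSDB_JOB_ID settings key and all function names are assumptions made up for the example.

```python
import argparse


def build_report_parser():
    parser = argparse.ArgumentParser(prog="report")
    parser.add_argument("jobs", nargs="+", type=int, help="openQA job IDs")
    parser.add_argument("--wiki", action="store_true")
    parser.add_argument("--resultsdb", action="store_true")
    # Hypothetical option name for passing a ResultsDB job ID by hand.
    parser.add_argument("--resultsdb-job-id", type=int, default=None)
    return parser


def report_targets(args):
    """Return where results get submitted; an empty list means print only."""
    targets = []
    if args.wiki:
        targets.append("wiki")
    if args.resultsdb:
        targets.append("resultsdb")
    return targets


def pick_resultsdb_job_id(cli_job_id, job_settings):
    """Prefer the ID given on the CLI, else the one stored at scheduling.

    Returns None when neither exists, in which case nothing can be
    reported to ResultsDB for this openQA job.
    """
    if cli_job_id is not None:
        return cli_job_id
    stored = job_settings.get("RESULTSDB_JOB_ID")  # hypothetical key
    return int(stored) if stored is not None else None


args = build_report_parser().parse_args(["1234", "--resultsdb"])
print(report_targets(args))
print(pick_resultsdb_job_id(args.resultsdb_job_id, {"RESULTSDB_JOB_ID": "42"}))
```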
adamwill accepted this revision. Aug 3 2016, 9:03 PM

OK, so aside from the inline comments I guess this is fine...the whole thing with the config files and the CLI parameters really comes from when we just had cron jobs running the CLI to submit results, so there weren't the different paths. What you have now is fine, I guess. My main use case for reporting results with the CLI is when something went wrong with the fedmsg stuff and it needs fixing manually, or something like that.

scheduler/fedora_openqa_schedule/schedule.py
315

compose.split('-')[1] is not entirely safe, as we have some composes now where the distribution name is 'Fedora-Atomic' - e.g. 'Fedora-Atomic-24-20160803.0'. I had this problem in fedfind too, which now uses compose.split('-')[-2] instead.
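
A quick sketch of the difference, using the compose ID quoted above plus a second ID in the plain 'Fedora' form, which is made up here for comparison. The function name is hypothetical, and calling the extracted field "the release" is an assumption about what the scheduler wants from it.

```python
def release_from_compose(compose):
    """Return the second-to-last dash-separated field of a compose ID.

    Counting from the end keeps working when the distribution name
    itself contains a dash, as in 'Fedora-Atomic'.
    """
    return compose.split('-')[-2]


for compose in ("Fedora-24-20160803.0", "Fedora-Atomic-24-20160803.0"):
    fields = compose.split('-')
    # fields[1] is what the reviewed code uses; fields[-2] is the fix.
    print(compose, fields[1], release_from_compose(compose))
```

For the first ID both indexes give '24'; for the Atomic one, index 1 gives 'Atomic' while index -2 still gives '24'.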

scheduler/schedule.conf.sample
24

So now you made this be used, but do we actually need it? What's the use case for changing it, ever?

This revision is now accepted and ready to land. Aug 3 2016, 9:03 PM
garretraziel added inline comments. Aug 8 2016, 10:40 AM
scheduler/schedule.conf.sample
24

No, I've removed (unused) wiki_stg_url. wiki_url was actually used before. I didn't want to hardcode wiki URL and figured out that you "might" want testcase detail URL to lead somewhere else (but TBH I don't believe someone is going to change this ever). Do you think I should hardcode this?

This revision was automatically updated to reflect the committed changes.