Allow checks to set result's log_url to specific artifact
ClosedPublic

Authored by jskladan on Jun 11 2015, 2:45 PM.

Details

Summary

We want to allow checks to set log-urls in resultsdb to point onto a selected
artifact file. Thus adding an artifact attribute to the CheckDetail class
and then using the (possible) artifact variable as a storage for a relative
path to a file in the $artifactsdir. resutlsdb_directive then checks for
the presence of this variable in TAP, and replaces the result's log_url
to the URL or the respective artifact file, if needed.

Test Plan

unittests added, not yet tested in real runs

Diff Detail

Repository
rLTRN libtaskotron
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.
jskladan retitled this revision from to Allow checks to set result's log_url to specific artifact.Jun 11 2015, 2:45 PM
jskladan updated this object.
jskladan edited the test plan for this revision. (Show Details)
jskladan added reviewers: kparal, tflink, mkrizek.

Functionally, it looks good to me. I wonder if we should think about supporting multiple artifacts per item but that's outside the scope of this review/ticket.

For the lint errors, I think that the issue is consistency instead of whether or not it looks ok - here are quite a few lint issues in here and the files it touches. I went through and fixed all the lint errors in directives/resultsdb_directive.py (patch against this diff:

) and I don't think that any of those changes make the code that much harder to read.

I'll start a thread on qadevel@ (again?) about coding style but from the conversations we've had before, I suspect that the chances that all of us will agree on a set of exceptions quickly are approximately 0. Therefore, I'm going to propose that we use strict PEP8 with exceptions only when there is absolutely no other way to get the code done - that limits the amount of time we're likely to spend discussing exceptions, general style and customizing rules for flake8 et. al.

testing/test_check.py
53

I think that just using a string for the artifacts path would be better here. it's less complicated and fewer things to go wrong since we're not trying to test config handling here

Thanks, Josef! I have some improvement suggestions ;-) See below.

In D389#7149, @tflink wrote:

I wonder if we should think about supporting multiple artifacts per item but that's outside the scope of this review/ticket.

We discussed it and realized that a directory support should be sufficient for now. Also, the single artifact link can be a generated html page with links to other files present in the artifacts dir, if some check wants to go fancy about presenting results.

I'll start a thread on qadevel@ (again?) about coding style but from the conversations we've had before, I suspect that the chances that all of us will agree on a set of exceptions quickly are approximately 0. Therefore, I'm going to propose that we use strict PEP8 with exceptions only when there is absolutely no other way to get the code done - that limits the amount of time we're likely to spend discussing exceptions, general style and customizing rules for flake8 et. al.

Oh, not again! :/

However, I have to say that the existing lint errors were bugging me in this case. Not because the source code would be unreadable (Josef insists on having spaces between parameters keys and values, I don't, but both seems fine to me), but because the review was hard to read. So it's true I'd like to see one of that fixed in the future reviews - either the spaces removed (I know it's a habit, but you get notified before the patch is submitted), or the particular lint warning disabled. Either works for me.

libtaskotron/check.py
89–93

I think this contains unnecessarily low-level implementation details. Also, I realized that "relative path to" is confusing - relative from where? From the artifacts base directory? From the current working directory? Let's make it simpler:

path to a file or directory in artifacts directory. It will represent task output specific for this particular :attr:``item``.

And we will convert this path to an absolute/canonical path afterwards and check that it really is in the correct directory.

Btw I'm not sure if we can easily hyperlink artifacts directory to one of our howto guides, i.e. here:
https://docs.qadevel.cloud.fedoraproject.org/libtaskotron/latest/taskyaml.html#provided-variables
Would be nice, but I haven't studied how to do it and whether it would not make the source code unreadable. And we're missing the documentation for artifactsdir anyway, reported as T497.

119–120

So if the person sets cd.artifact after creation, the path will not be stripped if needed. I think this either needs to be in a property setter, or we need to do this in the resultsdb_directive. Because this can be potentially set before the file exists, and I'm not sure how os.path.abspath() or os.path.realpath() behave on nonexistent files, the latter approach might be safer (and I think I would prefer it, the fact we need to make the path relative is caused by how we communicate with resultsdb, it's not important for the TAP output itself).

In resultsdb_directive, I'd check that the file exists, canonize the path, check that it is in the required directory, and make it relative. If any of those steps fail, I'd print a warning and not include it in the resultsdb submission call.

libtaskotron/directives/resultsdb_directive.py
231

When looking at this, I realized that whenever we'll want to change the host domain, or some path of the URL, we'll need to modify all existing records in the database. That makes me wonder whether it wouldn't be better to store just the relative path in the db and have a baseurl configuration in the resultsdb frontend which would join the two together. Of course, this is now complicated by the fact that we use the same log_url for both artifacts files (which we store ourselves) and buildbot logs (which buildbot maintains). Just a thought for the future. Not part of this ticket.

232

Out of curiosity, why do you use 8 spaces here when we use 4 spaces indentation everywhere else?

testing/test_check.py
53

TestingConfig has its own value for artifactsdir, which is /var/tmp/taskotron-test/artifacts. If that gets changed in the future, this check will fail. I don't mind either way, but if this is really tested without a config object, I'd rather have a separate function for stripping the basedir (sounds trivial, but might be more complex, as I proposed some changes above) with an input artifactsdir parameter, so that the test will be completely independent on the config value.

kparal added inline comments.Jun 12 2015, 1:34 PM
libtaskotron/check.py
89–93

Reading the comment after myself, I think I should clarify - we would support both relative and absolute paths for artifact. A better docstring:

an absolute path to a file or directory in artifacts directory. Can also be relative, from the artifacts directory.
It will represent task output specific for this particular :attr:``item``.

The reason is that you'll receive ${artifactsdir} variable in the task formula, but it is not a subdirectory in ${workdir}. Providing absolute paths is trivial (you'll need to use it when creating the file anyway). Providing relative paths from ${artifactsdir} to artifact is non-trivial, but I'd still keep the option to do that, it might be handy for someone.

In D389#7149, @tflink wrote:

Therefore, I'm going to propose that we use strict PEP8 with exceptions only when there is absolutely no other way to get the code done - that limits the amount of time we're likely to spend discussing exceptions, general style and customizing rules for flake8 et. al.

Let's agree to disagree - with the limit to 80 chars per line, PEP8 is absolutely impractical. More so, the lint errors in the review correspond with the coding style of the respective files. Fixing lint just one the new code, so it shuts up IMHO reduces the readability of the code as a whole, and changing all the files with the "functional" patch makes the code hard to review.

Not that I disagree with the need for more-or-less unified coding style, but PEP8 taken too seriously, and literally is IMHO one of the worst options.

jskladan added inline comments.Jun 15 2015, 8:57 AM
libtaskotron/check.py
89–93

While the comment as drafted in your comment feels more confusing than explanatory to me, and I do not really get why The reason is that you'll receive ${artifactsdir} variable in the task formula, but it is not a subdirectory in ${workdir}. is of any importance. But hey, I'm not the docstring guy :)

Personaly, I'd rather not get into the whole "is absolute, but can be relative" thing, especially since we do not want to get too implementation-detaily. How about jus simplet::

path to a file or directory in the artifacts directory.
It will represent task output specific for this particular :attr:``item``.
119–120

In resultsdb_directive, I'd check that the file exists, canonize the path, check that it is in the required directory, and make it relative.

Yeah, it probably makes sense to move it to the resultsdb_directive. The only "complicated" thing here being that we allow both absolute and relative paths to the $artifactsdir, so the check that the file exists part will be a bit nasty, but nothing unbearable (consider this pseudocode)::

artifact_path = cd.artifact
if not os.path.exists(artifact_path):
    artifact_path = os.path.join(env_data['artifactsdir'], cd.artifact)
    if not os.path.exists(artifact_path):
        log('Artifact %s not found' % cd.artifact)
        cd.artifact = None
artifact_path = os.path.abspath(artifact_path)
if not artifact_path.startswith(env_data['artifactsdir']):
    log('Artifact %s is outside artifactsdir' % cd.artifact)
    cd.artifact = None
else:
    cd.artifact = artifact_path[len(env_data['artifactsdir']):]

Or do you see any other, simpler way?

libtaskotron/directives/resultsdb_directive.py
231

I'd rather not go that way. First of all, the resultsdb_fronted will not necessarily be the only consumer (think Bodhi2). Secondly, I'm still in the mindset, that ResultsDB should not be "too smart" - knowing too much of the implementation/deployment details of the whole Taskotron stack is not its role. Should we ever decide to change the domain/URL path we can either set a redirect for a finite amount of time (say three months) before the data is useless anyway, or make a batch-update of the DB.
Also, if we end up skipping domains more than once a year, there is probably something seriously wrong with our setup, and fixing it in ResultsDB won't even be the right place to do it.

232

What do you mean? I do not see 8-spaces indentation instead of 4-space anywhere in the code above. This is simply continuation line. But I might be suffering from temporary blindness :)

testing/test_check.py
53

It IMHO makes sense to take the value from config, since the code being tested removes config.artifactsdir from the path's beginning. This IMHO makes the test more implementation independent, but I do not really have a problem with using just a string, if we find it a better solution

kparal added inline comments.Jun 15 2015, 11:42 AM
libtaskotron/check.py
119–120

If you think it's too complicated, we can allow only absolute paths in cd.artifact. And that can be checked easily via a setter.

But yes, I meant something like this. Maybe with small adjustments. Let's suppose we only support absolute paths, or if it is a relative path, it is the usual "relative from cwd" type (as long as the documentation doesn't say that anything else is possible). The code could look like this:

artifact_path = os.path.realpath(cd.artifact)
artifactsdir = os.path.realpath(env_data['artifactsdir'])

if not os.path.exists(artifact_path):
    logger.warn('Artifact %s does not exist, ignoring' % artifact_path)
    artifact_path = None
else if os.path.commonprefix(artifact_path, artifactsdir) != artifactsdir:
    logger.warn('Artifact %s is placed outside of artifacts directory %s, ignoring' % (artifact_path, artifactsdir))
    artifact_path = None
else:
    artifact_path = os.path.relpath(artifact_path, start=artifactsdir)

(I think we should not touch the input object, i.e. not clear cd.artifact inside the resultsdb_directive.)

If we want to support relative path from artifacts dir:

artifactsdir = os.path.realpath(env_data['artifactsdir'])
if os.path.isabs(cd.artifact):
    artifact_path = cd.artifact
else:
    artifact_path = os.path.join(artifactsdir, cd.artifact)
artifact_path = os.path.realpath(artifact_path)

if not os.path.exists(artifact_path):
    logger.warn('Artifact %s does not exist, ignoring' % artifact_path)
    artifact_path = None
else if os.path.commonprefix(artifact_path, artifactsdir) != artifactsdir:
    logger.warn('Artifact %s is placed outside of artifacts directory %s, ignoring' % (artifact_path, artifactsdir))
    artifact_path = None
else:
    artifact_path = os.path.relpath(artifact_path, start=artifactsdir)

The docstring could then look like:

path to a file or directory placed in the artifacts directory, either absolute ``$artifactsdir/report.html`` or relative ``report.html``.
It will represent task output specific for this particular :attr:``item``.

I think with the example it's quite obvious that relative means relative to artifacts dir.

All three approaches are fine, as long as they're clear form the docstring documentation. I prefer enforcing absolute paths in a setter, if you want to make it really simple, or being more user friendly and allowing relative paths from the artifactsdir, if you don't mind having the code more complicated.

libtaskotron/directives/resultsdb_directive.py
231

Those are good arguments.

232

Line 207 starts on column 17, but line 208 starts on column 25 (I would expect 21). I think that's what triggered that lint warning. The same for lines 213 and 214.

jskladan updated this revision to Diff 1035.Jun 15 2015, 1:45 PM
  • Linter is satisfied, adjusted path handling as proposed by @kparal
kparal accepted this revision.Jun 16 2015, 11:37 AM

Please fix the copy paste error, otherwise looks good to me.

testing/test_resultsdb_directive.py
182–183

This looks like a copy paste error.

This revision is now accepted and ready to land.Jun 16 2015, 11:37 AM
jskladan updated this revision to Diff 1045.Jun 17 2015, 7:36 AM
  • fixed according to review
Closed by commit rLTRN5d9b1a6feee3: Allow checks to set result's log_url to specific artifact (authored by Josef Skladanka <jskladan@redhat.com>). · Explain WhyJun 17 2015, 8:13 AM
This revision was automatically updated to reflect the committed changes.