implement yumrepoinfo
ClosedPublic

Authored by kparal on Apr 14 2014, 2:14 PM.

Details

Summary

This ports repoinfo.conf from AutoQA into Taskotron as yumrepoinfo. A lot of adjustments were made. Most important notes:

  1. It again uses the singleton pattern (using a global variable) to avoid parsing config files for every instantiation.
  2. I decided to keep the ConfigParser structure because the string interpolation is convenient. But if we think we should e.g. convert everything to YAML, it can be done.
Test Plan

New tests. Unlike with config.py, I decided to allow config file parsing from disk even for unit tests. The rationale:

  • with Config, it doesn't make sense to load adjustments for test suite (the test results should be the same everywhere => reliable)
  • with Config, we already have a set of defaults in config_defaults.py (and we need them), so we can easily just load that
  • with yumrepoinfo, I'm unclear whether some unit tests (for different modules) won't be written with a particular repoinfo contents in mind. Anything statically provided could quickly become obsolete (yumrepoinfo changes at least once every 6 months)
  • with yumrepoinfo, in order to provide the defaults without config file parsing, I'd have to create yumrepoinfo_defaults.py, which would need to be imported and parsed, therefore causing additional disk access anyway.

So, I think there's no performance hit when allowing other test modules to import yumrepoinfo and load the config file from disk. Because of singleton pattern, it's done only once (per arch). For some heavier disk activity, functional tests should be used, of course.

Diff Detail

Lint
Lint Skipped
Unit
Unit Tests Skipped
jskladan requested changes to this revision.Apr 15 2014, 12:39 PM
jskladan added inline comments.
libtaskotron/yumrepoinfo.py
51

Would it make sense to list the possible keys, that can be used?

54

I know this is a corner case, but maybe something like:

self.defaults['arch'] = rpm_utils.basearch(self.defaults.get('arch', arch))

would make sense. But listing the possible keys in defaults (see line 50), can easily solve the problem.

58

Not sure what the conventions are here, but I consider:

self.parser = ConfigParser.SafeConfigParser(defaults = self.defaults)

more readable.

kparal added inline comments.Apr 15 2014, 1:06 PM
libtaskotron/yumrepoinfo.py
51

I have decided to remove the option completely, nobody has ever used it in AutoQA. We can add it back once needed. Thanks for pointing out the problems.

58

Sorry to disagree :-)
http://legacy.python.org/dev/peps/pep-0008/#other-recommendations

We don't need to follow PEP8 guidelines, not at all, but in this case I actually consider missing spaces more readable than present spaces, in line with the recommendation. So that's why I write it this way.

kparal updated this revision.Apr 15 2014, 1:10 PM
  • remove defaults= from init, because nobody has ever used it
  • add explanation header to yumrepoinfo.conf
kparal updated this revision.Apr 15 2014, 1:12 PM
  • remove init(defaults=) unit test
tflink accepted this revision.Apr 16 2014, 7:38 AM

Sigh, another piece of code that I don't really like but also don't have a better solution for. I don't like the idea of statically defining valid releases in a config file that we're going to have to package. That leaves us in a similar situation as AutoQA where the package will need to be updated every time the repos change but on the other hand, there are other packages that are also required to update as releases change state (mock is one that comes to mind). At least this way, we aren't mandating the existance of /etc/taskotron/yumrepoinfo.conf which was a frustration of mine with AutoQA's repoinfo.

with Config, it doesn't make sense to load adjustments for test suite (the test results should be the same everywhere => reliable)

Sure, but you still run into the potential issue where production will almost always be using a different absolute repo than anything outside the fedora infra. It's also using download.fp.o which returns a different mirror for almost every call - I'm not sure that we're really ensuring consistency here.

with yumrepoinfo, I'm unclear whether some unit tests (for different modules) won't be written with a particular repoinfo contents in mind. Anything statically provided could quickly become obsolete (yumrepoinfo changes at least once every 6 months)

Why are we statically defining all of this? All of this _should_ be algorithmic in the general case so we should be able to have some templates for yum repos that are fed a baseurl and release (special casing rawhide) to get our yum repo information instead of having everything live in a config file.

with yumrepoinfo, in order to provide the defaults without config file parsing, I'd have to create yumrepoinfo_defaults.py, which would need to be imported and parsed, therefore causing additional disk access anyway.

This sounds like premature optimization to me. How many times would repoinfo be parsed per task execution? I can't imagine any more than 2 or 3 unless we end up with some poorly written tasks. The singleton can't cross exec() boundaries unless explicitly passed so unless it's only being called inside tasks, it'll be parsed multiple times either way.

So, in the end, I'm not crazy about this but I don't have a better idea. With that in mind, it looks good to me. Thanks for taking care of this.

conf/yumrepoinfo.conf
9

I'd like to see a comment here about replacing download.fedoraproject.org. Outside of the fedora infra, it is a frontend to mirrormanager and will not always return the same mirror for succesive requests. I think it's the best default we could have but also think that adding a note that it would be better to replace it with a semi-local absolute mirror would be best

Sigh, another piece of code that I don't really like but also don't have a better solution for. I don't like the idea of statically defining valid releases in a config file that we're going to have to package. That leaves us in a similar situation as AutoQA where the package will need to be updated every time the repos change but on the other hand, there are other packages that are also required to update as releases change state (mock is one that comes to mind).

It's very suboptimal, yes. I tried to port an existing solution, not to devise a new one. But if you have some good pointers, I can try. When I think about it, I could have made most of the document dynamic. The only static pieces that need to be updated every 6 months are:

  1. Rawhide to tag mapping:
[rawhide]
tag = fc21
  1. Stable vs development status for each repo. This status changes a) URLs b) probably also some additional semantics some checks might need to have.

I don't really know how to avoid the need to have a configuration file defining at least these two pieces of information and update it regularly.

All the other stuff can probably go away and I can do it if you want. The logic will be in code instead of in a config file, but the logic is very stable (how you compute URL for a repo, or which repo is a parent of which repo). As long as we support just Fedora.

with Config, it doesn't make sense to load adjustments for test suite (the test results should be the same everywhere => reliable)

Sure, but you still run into the potential issue where production will almost always be using a different absolute repo than anything outside the fedora infra. It's also using download.fp.o which returns a different mirror for almost every call - I'm not sure that we're really ensuring consistency here.

With the current setup, any config file modifications will have no effect on the test suite. Are you trying to say they should? For unit tests, I think the URLs don't matter (we won't download anything anyway). For functest, there might be such a need... not sure. I assumed it's better to deal with it once we get there.

Why are we statically defining all of this? All of this _should_ be algorithmic in the general case so we should be able to have some templates for yum repos that are fed a baseurl and release (special casing rawhide) to get our yum repo information instead of having everything live in a config file.

Yes, if we put the logic into code (we take a URL template from the config file and substitute a few variables in it, that's it). We still need to know which release is stable and which is development, though, as mentioned above. This is probably a simple change to do, if you prefer it that way, I can change it.

conf/yumrepoinfo.conf
9

I don't think we had many problems with it, because as long as yum operates on those URLs, it resolves the server first and then performs all operations with a single repo (see baseurl= in /etc/yum.repos.d/fedora.repo, it's the same). But yeah, I remember some kind of issues like that in the past. I'll add a note that it's better to hand-pick a mirror.

kparal closed this revision.Apr 16 2014, 1:18 PM

Closed by commit rLTRN34e39154cccb (authored by @kparal).