Use python-rpm, named pipe and threading
for downloading RPM headers.
Details
- Reviewers
kparal tflink jdulaney
- Maniphest Tasks
T226: depcheck: download only rpm headers?
Add headers_only=true to the koji directive in task-depcheck,
then run depcheck.
Diff Detail
- Repository
rLTRN libtaskotron
- Branch
feature/T226-headers_alt
- Lint
No Linters Available
- Unit
No Unit Test Coverage
Note: msimacek from koschei also implemented this approach, independently of me; here is his solution.
I like this solution much more than D223. That doesn't mean I have no reservations about the current code (I have quite a few), but I think the concept is good.
There are some possible improvements, like using an unnamed pipe, similar to the linked koschei script (which is not easy-to-understand code, however). Also, if there are concerns about using threads, I think we can do this in a single thread:
1. download a fixed amount of data, e.g. 10kB (let's measure the usual rpm header size and use this value)
2. try to pass it to ts.hdrFromFdno()
3. if it fails, download another chunk, e.g. twice the size of the one in step 1
4. try to pass it to ts.hdrFromFdno()
5. if it worked, close and clean up everything
6. if it didn't work, bail out (return None or something) and fall back to full RPM download
All nice and simple, in a single thread. It does not support downloading an arbitrary amount of data, but I think we don't want to support that anyway. If the header is not found in a reasonably small chunk of initial data, we want to print a warning and fall back to regular methods.
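To make the single-threaded idea concrete, here is a rough sketch, not code from this diff. The names read_header, fetch, parse and parse_with_rpm are hypothetical. fetch(start, length) stands in for whatever does the partial download (e.g. an HTTP Range request), and the 10 kB constant is only the example figure from above, not a measured value. parse_with_rpm assumes that ts.hdrFromFdno() raises on truncated input and that signature checks should be skipped; both would need verifying against the real python-rpm behaviour.

```python
import tempfile

INITIAL_CHUNK = 10 * 1024  # example value only; measure real header sizes first


def read_header(fetch, parse, initial=INITIAL_CHUNK):
    """Try to parse an RPM header from at most two downloaded chunks.

    fetch(start, length) -> bytes; hypothetical partial-download callable.
    parse(data) -> header object; must raise if the data is incomplete.
    Returns the header, or None so the caller can fall back to a full download.
    """
    data = b""
    # First attempt uses the initial chunk, second adds a chunk twice as large.
    for size in (initial, 2 * initial):
        data += fetch(len(data), size)
        try:
            return parse(data)
        except Exception:
            continue  # header not complete yet (or broken); try the next chunk
    return None


def parse_with_rpm(data):
    """Assumed parser: hand the bytes to ts.hdrFromFdno() via a temp file."""
    import rpm  # python-rpm bindings, imported lazily so the sketch loads without them

    ts = rpm.TransactionSet()
    ts.setVSFlags(rpm._RPMVSF_NOSIGNATURES)  # assumption: skip signature checks
    with tempfile.TemporaryFile() as tmp:
        tmp.write(data)
        tmp.flush()
        tmp.seek(0)
        return ts.hdrFromFdno(tmp.fileno())
```

A caller would do something like hdr = read_header(my_fetch, parse_with_rpm) and, on None, print a warning and download the full RPM as before. Keeping fetch and parse as parameters also makes the retry logic testable without network access or python-rpm.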
@tflink, you had the largest reservation regarding the RPM header approach; what do you think about this patch? Of course, if we detect a production environment, we can always download full RPMs, if we want to. But with the number of RPMs we test every day on our dev machines, I think that after running this for a week or two there, we will be reasonably sure whether there are potential problems with it or not.
I would still like to incorporate either this or D223. It has not been a pressing issue in the past, because we now have rpm caching in dev mode, but from time to time it is still a PITA to work on depcheck and wait for the initial download of several GB. We also had issues with slow downloads in production, and soon we will need to speed up our checks, because our task queues fill up rapidly, especially during freeze periods. I think the major issue here is not "this might cause weird issues" but rather "can we detect broken headers and, in such a case, download the full file instead?". And it seems to me that the answer is yes. Not to mention we could even store the rpm headers as artifacts now that we have support for it, and use them for reproducing issues if needed.
Sure, but the Diff is almost 8 months old, and nothing happened. If you do not feel like closing it, I'll remove myself from reviewers - it annoyed me on the front page long enough :)
It looks like there are currently other things to do. I will abandon this revision; we can reopen it when we need it.