user stats: print progress to stderr, ignore bot accounts
ClosedPublic

Authored by kparal on Apr 17 2015, 4:46 PM.

Details

Summary

The progress is now sorted and printed to stderr, because the stdout is
supposed to be redirected to a file (containing html output).

Bot accounts are now ignored in user stats.

Text processing is now a bit more efficient (joining a list of strings
instead of appending to a string many times over).

I also removed one try-except block I did not understand, I'm sure we can
come up with a better way if you explain to me what it was for.

Test Plan

tested, this works:

./relval.py user-stats --release 22 --milestone Beta > stats-wiki.html

Diff Detail

Repository
rRELVAL relval
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.
kparal retitled this revision from to user stats: print progress to stderr, ignore bot accounts.Apr 17 2015, 4:46 PM
kparal updated this object.
kparal edited the test plan for this revision. (Show Details)
kparal added a reviewer: adamwill.

Thanks for the submission!

  1. Stripping the whitespace at line 754 is wrong. The canonical form of the email signature separator is -- - two dashes and a space.
  1. I'm not keen on the approach to ignoring bot accounts. Firstly, I don't think it should just be hardcoded. I actually find it useful to know how many results the bots have reported, and compare them to humans. So I think it should be configurable (though I'm OK if ignoring bots is the default). Secondly, I don't like the idea of a hardcoded list in cli.py as a way to decide what accounts are bots. I'd like something better.

I had actually been kicking around the idea of extending the {{result template to indicate bot results; it should be relatively easy to add a named parameter to the template, say 'bot', which could be set true to indicate a result came from a bot. We could then quite easily retroactively edit all results filed by coconut and colada to set that parameter. We can have the template display a little robot icon or something, so it's easy to visually spot results from bots. I'd have to make the wikitcms result parser somewhat more sophisticated in order to handle a template with mixed parameters, but it's possible. The alternative is to stuff it into the username field; something like {{result|pass|bot:coconut}}, and the parser looks out for the 'bot:' prefix - which is easier for wikitcms to handle but doesn't let us change the wiki representation of the template for bots.

Note that wikipedia actually has a mechanism for identifying 'bots', but it's focused around allowing bot accounts to make large numbers of edits, and suppressing them from history, so I don't think it'd be quite appropriate to use that for identifying test bots.

  1. It should be OK for the try/except to go. It dates back to before commit dfed91bde10599ca6b65a2a9257bd6455150d163 - before that commit, it wasn't using the page.results_wikitext property, it was reinventing it, only poorly, in a way that used str.index() (rather than str.find() as page.results_wikitext does) and so would throw an exception if the page didn't actually contain the separator it searched for. I don't think results_wikitext should ever throw an exception.
  1. Sorting pages by name is not going to give the best results; it will sort TCs later than RCs, among other problems. wikitcms provides the sortname property for this purpose. Try page.sortname as the sort key.
  1. Is there a python3-safe way of doing print >> sys.stderr, "Processing page:", page.name? relval doesn't work with python3 yet, but I've been at least half-assedly trying to write it such that making it work with py3 won't be too difficult once mwclient does.
adamwill requested changes to this revision.Apr 21 2015, 11:47 PM

So I've done the bot changes in a different (I hope, better) way:

https://fedoraproject.org/w/index.php?title=Template:Result&diff=410627&oldid=367481
https://www.happyassassin.net/cgit/wikitcms/commit/?id=b56241bc25aa45f7f259e6bc685e5f6d2685978c
https://www.happyassassin.net/cgit/wikitcms/commit/?id=710db546dd2b31fc125714e3d3af27a7e3cde468
https://www.happyassassin.net/cgit/wikitcms/commit/?id=3f231a0f3df072f259b91699acd66596befc1d49
https://www.happyassassin.net/cgit/relval/commit/?id=65e92e64bc8e03b3f4fcf5073f9688c6edec3520

Just needs a change to openqa_fedora_tools to mark results as 'bot' (which I'll submit shortly), and me to knock up a one-off script to edit all previous results submitted by coconut and colada. With these changes, relval user-stats will not include 'bot' results by default, but you can pass -b to include them.

Please revise this to just the other changes (taking account of my other comments). Thanks!

This revision now requires changes to proceed.Apr 21 2015, 11:47 PM
In D341#6179, @adamwill wrote:
  1. Stripping the whitespace at line 754 is wrong. The canonical form of the email signature separator is -- - two dashes and a space.

I have configured my editor to automatically strip whitespace on save. I guess I'll need to take care when editing relval :)

  1. It should be OK for the try/except to go. It dates back to before commit dfed91bde10599ca6b65a2a9257bd6455150d163 - before that commit, it wasn't using the page.results_wikitext property, it was reinventing it, only poorly, in a way that used str.index() (rather than str.find() as page.results_wikitext does) and so would throw an exception if the page didn't actually contain the separator it searched for. I don't think results_wikitext should ever throw an exception.

As a recommendation for future, always catch specific exceptions, like IndexError or ValueError. In the absolutely worst case, use except Exception. Otherwise you'll also consume KeyboardInterrupt (Ctrl+C) and SystemExit, which is not what you want. Especially if you have something like continue or pass directly after the except:, so you consume it silently and don't even know that it happened.

  1. Sorting pages by name is not going to give the best results; it will sort TCs later than RCs, among other problems. wikitcms provides the sortname property for this purpose. Try page.sortname as the sort key.

I just wanted to have *some* kind of ordering (to be able to estimate progress) instead of completely randomly jumping through pages :) But sure, will try it.

  1. Is there a python3-safe way of doing print >> sys.stderr, "Processing page:", page.name? relval doesn't work with python3 yet, but I've been at least half-assedly trying to write it such that making it work with py3 won't be too difficult once mwclient does.

I guess sys.stderr.write() is the easiest solution.

In D341#6281, @adamwill wrote:

So I've done the bot changes in a different (I hope, better) way:
https://www.happyassassin.net/cgit/relval/commit/?id=65e92e64bc8e03b3f4fcf5073f9688c6edec3520

I have one objection here, you use -b / --bot to enable bot mode in user stats, but -b / --no-bot to disable bot mode in auto submission. The short cmdline option feels very inconsistent. My recommendation is to change one of the following for auto submission (any of this is fine):

  • get rid of -b, because in programmatic mode you don't need short options (they are executed by scripts, not humans, so you don't need to save people from typing). Leave only --no-bot.
  • change this to -h / --human
  • or to -h / --no-bot

Please revise this to just the other changes (taking account of my other comments). Thanks!

Will do.

By the way, you can also attach your review comments to specific lines, if you want, just click on the line number (or drag over multiple numbers) in the patch view. I'm mentioning it just in case you don't know about it.

adamwill added a comment.EditedApr 23 2015, 7:22 PM
In D341#6384, @kparal wrote:

As a recommendation for future, always catch specific exceptions, like IndexError or ValueError. In the absolutely worst case, use except Exception. Otherwise you'll also consume KeyboardInterrupt (Ctrl+C) and SystemExit, which is not what you want. Especially if you have something like continue or pass directly after the except:, so you consume it silently and don't even know that it happened.

Yep, I know, you'll note fedfind has none of these. I've just been too lazy to go back and fix them all in relval/wikitcms yet. (I'd do it as part of pylint'ing the whole of both projects, which is going to be a solid few days' work, which is why I didn't do it yet.) In practice they're usually in places where they'll run more or less instantly, so the chances of accidentally catching a ctrl-c are minimal.

  1. Sorting pages by name is not going to give the best results; it will sort TCs later than RCs, among other problems. wikitcms provides the sortname property for this purpose. Try page.sortname as the sort key.

I just wanted to have *some* kind of ordering (to be able to estimate progress) instead of completely randomly jumping through pages :) But sure, will try it.

I could get clever and look at how to set up the classes such that doing comparisons on ValidationEvent and ValidationPage objects 'just works' and gives you the expected ordering, I guess...

In D341#6281, @adamwill wrote:

So I've done the bot changes in a different (I hope, better) way:
https://www.happyassassin.net/cgit/relval/commit/?id=65e92e64bc8e03b3f4fcf5073f9688c6edec3520

I have one objection here, you use -b / --bot to enable bot mode in user stats, but -b / --no-bot to disable bot mode in auto submission. The short cmdline option feels very inconsistent. My recommendation is to change one of the following for auto submission (any of this is fine):

  • get rid of -b, because in programmatic mode you don't need short options (they are executed by scripts, not humans, so you don't need to save people from typing). Leave only --no-bot.
  • change this to -h / --human
  • or to -h / --no-bot

Yeah, I was aware of that when I did it, but I couldn't think of a better short name than 'b' for 'no-bot' (since n was taken). 'human' is neat, I like that. I didn't really care too much as nothing actually uses report-auto now, but I'll try and remember to clean it up with one of your ideas, good note.

kparal updated this revision to Diff 949.Apr 24 2015, 10:36 AM

Adjust to the changes in wikitcms.

It would be nice to somehow identify the bots in the results, but that is not part of this patch.

kparal updated this revision to Diff 953.Apr 27 2015, 11:57 AM
  • make stderr printout python 3 compatible
adamwill accepted this revision.May 4 2015, 9:55 PM

Looks good, will merge. Thanks!

This revision is now accepted and ready to land.May 4 2015, 9:55 PM
This revision was automatically updated to reflect the committed changes.