Wikipedia:Bots/Requests for approval

New to bots on Wikipedia? Read these primers!

If you want to run a bot on the English Wikipedia, you must first get it approved. To do so, follow the instructions below to add a request. If you are not familiar with programming, it may be a good idea to ask someone else to run a bot for you rather than running your own.

 Instructions for bot operators

Current requests for approval

Operator: Novem Linguae (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 18:27, Monday, January 30, 2023 (UTC)

Automatic, Supervised, or Manual: automatic

Programming language(s): JavaScript

Source code available: GANReviewTool, in particular GARCloserController.js, GARCloserWikicodeGenerator.js, and GARCloserWikicodeGenerator.test.js

Function overview: Adapt my user script User:Novem Linguae/Scripts/GANReviewTool, which has a GARCloser feature for closing Wikipedia:Good article reassessments (GARs), so that it can mass-close GARs instead of closing them individually. This is a one-time run to delist the Doug Coldwell good articles.

Links to relevant discussions (where appropriate): Wikipedia:Administrators' noticeboard#Delisting Coldwell GAs, User talk:SandyGeorgia#GARCloser and mass delisting

Edit period(s): One time run

Estimated number of pages affected: 300

Exclusion compliant (Yes/No): No

Already has a bot flag (Yes/No): Yes

Function details: Run the GANReviewTool user script in a new mode that I will write into it, allowing it to be fed a list of GAs to mass-delist instead of being run on each one individually.

The base code is well unit-tested and has been in production for months, so it should be pretty solid. See User:Novem Linguae/Scripts/GANReviewTool/GARLog.

I'll probably add a 10-second throttle for the mass editing mode, and a check for my username before allowing that mode to run. I plan to run it in the browser.

This Doug Coldwell use case will probably require some additional modifications, such as creating pro forma GAR pages with an explanation of the Doug Coldwell situation and a link to the AN discussion with consensus, and perhaps some code to combine DYK talk page templates into the Article History talk page template.

I'll do my development and testing on testwiki.

Discussion

Operator: Usernamekiran (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 16:51, Monday, January 9, 2023 (UTC)

Automatic, Supervised, or Manual: supervised

Programming language(s): AWB

Source code available: AWB custom module using regex; I will upload it to my userspace soon

Function overview: remove references/links en masse (expired/hijacked domains)

Links to relevant discussions (where appropriate): special:permalink/1132589552#pakrail.com at WP:COIN

Edit period(s): mostly one-time runs, one per request (removing spammy links)

Estimated number of pages affected: around 1000 for current request

Exclusion compliant (Yes/No): No

Already has a bot flag (Yes/No): Yes

Function details: Currently, pakrail.com redirects to an online casino website. It has been used in around 1,170 railway-related articles. I created a regex that finds instances of pakrail.com and removes the enclosing <ref ... pakrail.com ... /ref> markup.

I made around 50 edits through my alt Usernamekiran (AWB) account using that regex. Currently it removes the link if it is inside a referencing template.

There is no scope for mistakes; I would like approval to save the edits automatically.

Currently it does not remove plain links from the "External links" section (e.g. * [http://pakrail.com Pakistan Railways official site]). I will remove these links using some other method in AWB, and I will perfect that method soon.
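
For illustration only, a rough Python sketch of the two removals described above; the actual implementation is an AWB custom module, and both patterns here are assumptions reconstructed from the description. Named references that are reused elsewhere (the orphaned-reference issue raised below) are not handled.

import re

# Assumed pattern: a <ref>...</ref> pair whose body mentions pakrail.com.
REF_RE = re.compile(r"<ref[^>/]*>[^<]*pakrail\.com[^<]*</ref>", re.IGNORECASE)

# Assumed pattern: a bulleted external link to the site, as in the
# "External links" example above.
EXTLINK_RE = re.compile(
    r"^\*\s*\[https?://(?:www\.)?pakrail\.com[^\]]*\]\s*$",
    re.IGNORECASE | re.MULTILINE,
)

def strip_pakrail(wikitext):
    """Remove pakrail.com references and bulleted external links."""
    wikitext = REF_RE.sub("", wikitext)
    return EXTLINK_RE.sub("", wikitext)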

PS: previous BRFAs were filed under the bot's old username, UsernamekiranBOT. —usernamekiran (talk) 16:54, 9 January 2023 (UTC) PPS: pakrail.com was never the official website. —usernamekiran (talk) 17:13, 9 January 2023 (UTC)

Discussion

Is there some reason you don't just let GreenC's bot (see Wikipedia:Link rot/URL change requests) do this? * Pppery * it has begun... 16:57, 9 January 2023 (UTC)

@Pppery: Honestly speaking, I did not recall it at the moment, and it makes me feel stupid now. But now that I have the code ready, I would prefer to go with my own AWB editing. —usernamekiran (talk) 17:06, 9 January 2023 (UTC)
This is what we call a "JUDI" site (see WP:JUDI) - there are processes already set up to deal with these; we have processed hundreds of hijacked JUDI domains. You don't want to remove all the references or links. They can be flipped to usurped in some cases, tagged with {{usurped}} in others, etc.; it's a complex process. See WP:USURPURL. Code is already in place to handle it. - GreenC 18:18, 9 January 2023 (UTC)
The archive URLs should not be deleted. See WP:USURPURL for how to deal with usurped domains. You want to maintain the citation as much as possible, by replacing the bad usurped URL with a good archived version. -- GreenC 18:27, 9 January 2023 (UTC)
@GreenC: fortunately I had already stopped after making exactly 150 edits. But the reliability of the current source is also disputed, so I think removing that particular source would be okay. —usernamekiran (talk) 20:09, 9 January 2023 (UTC)
I don't see a dispute discussion in the BRFA. -- GreenC 00:29, 10 January 2023 (UTC)

In Special:Diff/1132588299 you left behind an orphaned ref. It worked out in the end (after AnomieBOT rescued it, you took care of that copy too), but it would have been better not to leave the orphan in the first place. Anomie 04:38, 10 January 2023 (UTC)

@Anomie: Yes, I updated the regex earlier, so now it removes all kinds of links that I could think of or came across. Before that update, it couldn't remove plain external links, as I mentioned above in the original request. Now it does that as well. —usernamekiran (talk) 06:04, 10 January 2023 (UTC)
That's nice, but has nothing to do with what I said. Anomie 12:14, 10 January 2023 (UTC)
I apologise for the confusion. I meant that it now removes plain external links, and by the last statement, "Now it does that as well", I was referring to the defined references, like the first diff you provided, where a fragment was left behind. Now it handles that format as well. —usernamekiran (talk) 12:38, 10 January 2023 (UTC)

{{Bot trial complete}} well, sort of. Using my alt Usernamekiran (AWB) (talk · contribs), I did around 1,100 edits semi-automatically, and all of these edits were okay. The only unexpected one was the edit pointed out above by Anomie (I somehow missed it when I was doing the edits), but it has now been taken care of. —usernamekiran (talk) 15:46, 10 January 2023 (UTC)

all the ~1100 edits. —usernamekiran (talk) 06:25, 11 January 2023 (UTC)
Great and all that you ran tests on your other accounts, but you can't say "trial complete" if it never went to trial. Primefac (talk) 11:48, 11 January 2023 (UTC)

A user has requested the attention of a member of the Bot Approvals Group. Once assistance has been rendered, please deactivate this tag by replacing it with {{t|BAG assistance needed}}. I have already finished this particular task, but would it be possible to get clearance for non-controversial, non-cosmetic, non-judgement-call (non-CONTEXTBOT) one-off find-and-replace tasks? I don't come across such tasks much, but in case I do, it would be convenient to have the "auto save" option on AWB. I will test my regex thoroughly in my sandbox before every task. —usernamekiran (talk) 05:16, 25 January 2023 (UTC)

Operator: Magnus Manske (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 11:17, Wednesday, November 30, 2022 (UTC)

Function overview: The bot finds pages with links to a redirect page that links back to the original page:

[[Page A]] links to [[Page B]] which redirects to [[Page A]]

The bot will try and replace the link in question with plain text.

Automatic, Supervised, or Manual: Automatic

Programming language(s): PHP

Source code available: https://bitbucket.org/magnusmanske/magnustools/src/master/scripts/circular_redirects/circular_redirects.php

Links to relevant discussions (where appropriate): Diff from a recent circular redirect discussion

Edit period(s): Daily or weekly

Estimated number of pages affected: There are ~300K pages that have circular redirect links, but only ~10% (rough estimate) have a "simple" case that can be addressed by the bot as it is now. Capabilities to solve more complex cases might be added in the future.

Namespace(s): Main

Exclusion compliant Yes

Adminbot No

Function details: Example edit, all test edits.
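
A minimal pywikibot-style sketch of the detection and unlinking described in the function overview. This is an illustrative Python reconstruction of the A => B => A rule, not the PHP implementation linked above; the unlink pattern is simplified but keeps a piped label, the case flagged in the rolled-back test edits below.

import re
import pywikibot

site = pywikibot.Site("en", "wikipedia")

def circular_links(page):
    """Yield pages linked from `page` that redirect straight back to it."""
    for linked in page.linkedPages():
        if not linked.isRedirectPage():
            continue
        try:
            target = linked.getRedirectTarget()
        except pywikibot.exceptions.CircularRedirectError:
            continue
        if target == page:
            yield linked

def unlink(wikitext, title):
    """Replace [[title]] or [[title|label]] with plain text, keeping the label."""
    pattern = r"\[\[\s*" + re.escape(title) + r"\s*(?:\|([^\]]*))?\]\]"
    return re.sub(pattern, lambda m: m.group(1) or title, wikitext)

# Usage (hypothetical title):
# page = pywikibot.Page(site, "Page A")
# for redirect in circular_links(page):
#     page.text = unlink(page.text, redirect.title())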

Discussion

  • Note: This bot appears to have edited since this BRFA was filed. Bots may not edit outside their own or their operator's userspace unless approved or approved for trial. AnomieBOT 11:23, 30 November 2022 (UTC)
    Magnus, please do not run the bot again until it has approval to edit. Primefac (talk) 11:45, 30 November 2022 (UTC)
  • Could you please point to a discussion where this is seen as a Good Thing? I seem to recall discussions in the past where circular redirects were usually acceptable as they indicated an {{r with potential}} type situation. Primefac (talk) 11:45, 30 November 2022 (UTC)
  • I think that would depend on who you are discussing the matter with. (I'm actually responsible for prompting Magnus about this problem.) I think that circular redirects are worse than useless. For a reader who clicks on one, there is frustration, just as bad as a page self-link. They probably click again, using the servers uselessly. Where the circular redirect is created from a redlink, rather than a stub being created, WP loses a growth point. I do not buy the argument that {{r with potential}} is any sort of substitute for a redlink, in terms of getting articles created.
Talking to people who've considered the issue solely from a technical point of view, it seems this is an "old chestnut" - no obvious fix. Looking at it socially, there is indeed no fix that does not undo some good-faith edits. But there is a large backlog, now affecting 4% of all articles, I believe.
If the backlog can be cleared, I hope we can move on to a more sensible approach. By that I mean this issue is too large to be referred to Redirects for Discussion in each case. There should be some triage, because some of the redirects created are not that useful, as some of the (red)links introduced are unhelpful. But there has to be an initial clearance. Charles Matthews (talk) 15:57, 30 November 2022 (UTC)
  • As a small data point, I'll add that WP:XFDC unlinks circular redirects when you close a RfD as retarget. Legoktm (talk) 22:11, 30 November 2022 (UTC)
  • Why isn't it better to leave a redlink than to remove the link completely? Mike Christie (talk - contribs - library) 12:16, 1 December 2022 (UTC)
    A redlink to what? A => B => A, removing link A => B, leaving plain text behind. Magnus Manske (talk) 16:10, 1 December 2022 (UTC)
    I was thinking that since a circular redirect isn't red and hence appears to not require an article to be created, it would be better to make it into a red link. Of course that's nothing to do with the wikitext in the article with a redirect, it's a function of whether there's a page (redirect or not) at the target of the link. The bot would have to delete redirect pages, not edit links, to make this happen, and I understand that is not what this bot is designed to do. Mike Christie (talk - contribs - library) 22:03, 2 December 2022 (UTC)
For the avoidance of doubt, this bot is not for removing redirects. Charles Matthews (talk) 21:29, 2 December 2022 (UTC)
  • What about pages that link to a page which itself links to a sub-section on the original page? ProcrastinatingReader (talk) 21:34, 13 December 2022 (UTC)
  • Noting that I've mass rollback'd the test edits, as several of them contained errors (where links contained a pipe, the replacement did not remove the pipe) ProcrastinatingReader (talk) 16:58, 16 December 2022 (UTC)
  • @Magnus Manske: I've got a couple of random comments:
    • I'm generally opposed to using regex to parse wikitext. It's always tempting, but it's usually more complicated than it appears at first, and I strongly suspect wikitext is theoretically impossible to parse correctly in the general case with a regex. The kinds of errors spotted by ProcrastinatingReader will keep cropping up. This kind of thing should be done by a real wiki parser. I don't know what parsing tools are available in PHP, but Parsoid is always an option.
    • I'm not familiar with the history, but it sounds like this is something which has been considered before and rejected. Perhaps a slightly different take would be useful, however. Use the same code to detect when this happens, only on recent edits. Then have the bot drop a note on the talk page of the person who created the cycle: "This recent edit of yours <include link to diff> created a circular redirect. That's not always a problem, but it can be. Please take a look and see if the link you added is correct". Adjust the wording as appropriate. Keep track of how many of those alerts result in the link being removed, and come back with statistics which will tell us if this is actually useful or not. Or perhaps expose some deeper pattern which can be used to filter which cycles are OK and which are not. -- RoySmith (talk) 13:59, 28 January 2023 (UTC)

Operator: Qwerfjkl (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 16:16, Thursday, October 13, 2022 (UTC)

Automatic, Supervised, or Manual: automatic

Programming language(s): AWB (+JavaScript if necessary)

Source code available: AWB (RedirectTagger)

Function overview: Tag redirects with various redirect categories.

Links to relevant discussions (where appropriate): Wikipedia:Bot requests#Adding {{r to category}}

Edit period(s): one time run, with more later if necessary

Estimated number of pages affected: large, in the hundreds of thousands

Exclusion compliant (Yes/No): No

Already has a bot flag (Yes/No): Yes

Function details: Tag applicable redirects with:
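
As one concrete example from the linked request (adding {{r to category}}): a redirect whose target is in the Category namespace can be tagged with {{R to category}}. A minimal pywikibot-style sketch under that assumption; the actual AWB RedirectTagger module and the full list of target rcats are not reproduced here, and real tagging would normally place the rcat inside {{Redirect category shell}}.

import pywikibot

def tag_category_redirects(pages):
    """Append {{R to category}} to redirects that point into Category: space."""
    for page in pages:
        if not page.isRedirectPage():
            continue
        target = page.getRedirectTarget()
        text = page.get(get_redirect=True)
        # Namespace 14 is the Category namespace; skip already-tagged pages.
        if target.namespace() == 14 and "R to category" not in text:
            page.text = text + "\n\n{{R to category}}"
            page.save(summary="Tagging redirect with {{R to category}}", minor=True)

# `pages` would come from a page generator of untagged redirects.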

Discussion

Operator: William Avery (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 18:08, Friday, September 2, 2022 (UTC)

Function overview: A template, {{plain row headers}}, will be placed immediately before every table that currently uses "plain row headers" styling. The name of the CSS class used to achieve the plain row headers styling will be changed from "plainrowheaders" to "plain-row-headers". If a table has the "plainrowheaders" CSS class, but contains no row headers to be thus styled, the "plainrowheaders" CSS class will be removed from the table.

For background to this, and the motivation, see Wikipedia:TemplateStyles.

Automatic, Supervised, or Manual: Automatic

Programming language(s): Python

Source code available: pywikibot script plainrowheaders.py

Links to relevant discussions (where appropriate):

For background see:

Edit period(s): Total edit time will be in the region of 9 days (~125,000 articles ÷ 600 edits per hour ÷ 24 hours a day), but the edits will occur over a longer period than that. My plan is to concentrate on individual, heavily affected, subject areas in turn. Music, films, TV, theatre, sports, lists of historic buildings and species are areas where much data is tabulated in articles. I intend to do trials in each area before running batches of ~10,000 articles. This should also help shorten any period of watchlist disruption for individual editors down to a day or two.

After the initial processing, there will need to be further runs on a smaller scale, as editors will still be using the current construct, and pages may have been deep reverted for reasons unrelated to this processing.

Estimated number of pages affected: 125,000 per this search. Obviously there are very few pages with the {{plain row headers}} template in place as yet.

Namespace(s): Mainspace/Articles

Exclusion compliant: Yes, per pywikibot

Function details: Each table in the page is processed. This only applies to tables started with {|, not templates that output tables.

If the class attribute of the table contains the class name "plainrowheaders", that classname is replaced with "plain-row-headers".

If the table's class attribute now contains "plain-row-headers", several successive steps are taken to discover whether the table in fact makes use of the class, and therefore requires the {{plain row headers}} template.

  1. Each table header in the table, as parsed by mwparserfromhell, is examined for table headers with "scope=row".
  2. Table headers that start with e.g. !scope=row{{Some template| may also be present. mwparserfromhell doesn't see the attribute because there is no following pipe. A regular expression can detect these, and output a warning with the name of the template. (Usually the template should instead be invoked with a parameter such as "rowheader=true", rather than used in this fashion.)
  3. The table body markup may contain a template that is known to emit table headers with "scope=row", such as {{Single chart}}. These can be tested for with a regular expression. Some of these templates, such as {{Episode list}}, are intended for use within templates that emit a whole table, but they turn up in plain tables.
  4. If the markup of the table body looks like it contains templates (i.e. includes "{{"), the templates can be subst'ed and the resultant markup reparsed, as at step one. In practice this is only necessary for relatively few of the tables.

Each table using plain row header styling should be preceded by its own {{plain row headers}}, so the processing keeps track of whether such a template has been encountered since the last table. It assumes that any such template is the one that belongs to the current table.

If no table header cells with scope=row were found, or the "wikitable" class name is not among the table's classes, the "plain-row-headers" class is not producing any effect and is removed from the table. Otherwise, if no such template is already present, the styling template {{plain row headers}} is inserted before the table.
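
A condensed sketch of the core check from step 1 above, using mwparserfromhell as described; this is an illustrative reduction, not the plainrowheaders.py script itself, and it deliberately ignores the template cases covered by steps 2 to 4.

import mwparserfromhell

def has_row_headers(table_markup):
    """Return True if the table contains header cells with scope=row,
    i.e. the plain-row-headers class is actually doing something."""
    code = mwparserfromhell.parse(table_markup)
    for tag in code.filter_tags():
        if str(tag.tag).strip().lower() != "th":
            continue
        for attr in tag.attributes:
            if (str(attr.name).strip().lower() == "scope"
                    and str(attr.value).strip().lower() == "row"):
                return True
    return False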

Care has been taken to ensure that if the processing is run for a second time on a page that has already been processed, it makes no changes.

To simplify checking, no standard fixes or cosmetic code changes are included.

Test edits have been carried out in the bot's userspace. e.g. here and here.

Division of work into batches

I use my own extension of the pywikibot.pagegenerators module, which can generate lists of pages to be processed from a custom database table, in this case prh_batch. I can populate this table using prh_batch.py. The search needs to be divided into slices because there is a limit of 10,000 on results. Once the list of pages is in the database table I can run queries against the enwiki_p database replica to assign them to Wikiprojects.

Discussion

Given the documented caveat that the ability to affect page content outside the template should not be relied on,[1] I do not think this task should proceed. — JJMC89(T·C) 01:19, 3 September 2022 (UTC)

Right, I thought that would be discussed, which is why I added "The wider community was notified and nothing of interest was discussed. In other discussion from the content transform team members, the HTML emitted does not make this an issue." to the documentation of the template. Absolute crickets on the task of interest (phab:T176272) where I made clear what I intended. When I reviewed the background of why that caveat is there, it was more of a "wait and see" than the wording implies on that template. See the revision after the one you linked to (why did you add it as a permalink excluding that revision?) which points to phab:T155813#3037085 and related discussion.
Separately, the discussion there about scoped CSS is more or less irrelevant today and barely relevant when it was first discussed as some sort of thing that would be beneficial for VE. Though it seems to be making a comeback on the design side finally ([1]), it's been practically dead since around the time TemplateStyles was first discussed. Even then, it doesn't seem like a valuable restriction for end-users like us -- it was proposed entirely as a convenience for VE, and TBH looking at what it was suggested for in that context I don't think it's all that pertinent there either.
To go a step further, there is a template (er, templates) that does something similar today, though at much smaller scale, for this exact use case: Template:Import-blanktable/Template:Row hover highlight (~600 uses). Which, coincidentally, was recommended for use by one of the engineers who participated in the RFC linked above for how TemplateStyles would work.
At worst, we have to go through N number of articles to remove this in the future if it completely breaks VE or some other system at some arbitrary time in the future, or WMF can't somehow work around it. Izno (talk) 02:41, 3 September 2022 (UTC)

Against. You can realize this idea simply and quickly with a CSS rule (as global CSS for this local Wikipedia, or in the MediaWiki software):

th[scope="row"] {
	font-weight: normal;
}

✍️ Dušan Kreheľ (talk) 10:47, 28 September 2022 (UTC)

Not relevant. This is a case where we are trying to move what is in MediaWiki:Common.css to WP:TemplateStyles. Izno (talk) 22:50, 28 September 2022 (UTC)

{{BAG assistance needed}} User:Izno, who requested this task, has expressed a concern on the noticeboard about the length of time that this request has gone without attention from a BAG member, and a willingness to provide any further input required here. I am therefore requesting BAG assistance. William Avery (talk) 18:17, 24 October 2022 (UTC)

References

  1. ^ mw:Extension:TemplateStyles#Caveats: Styles included by a template can currently affect content on the page outside of the content generated by that template, but this ability may be removed in the future and should not be relied upon.

Needs wider discussion. Per WP:BOTREQUIRE#4, please show broader consensus to perform this task and to perform this task by bot at large scale. I do not see any immediate discussion that involves uninvolved editors expressing support or opposition to this task. I see a lot of technical details, work steps, todo lists, and work progress notifications, etc. concentrated on technical pages. Being (as far as I know) the first time a bot is "implementing" TemplateStyles this way places this BRFA as a precedent and puts an even larger onus on BAG to establish a clear consensus for the task. I see general support for enabling and careful use of TemplateStyles as a whole. I see general support for making a guideline. Since then it has been expanded to discuss a case like this with tables, although I don't see any direct discussion. It has also been expanded to include even more workflow for conversion, which is again a "how to" rather than a "should". So, as far as I can locate previous discussions I can link to and understand the intent here, this task takes it several steps further from previous explicit consensus - it (1) styles outside its template (i.e. not "only style the associated template's output"), (2) styles tables (i.e. not "a specific template or group of templates"), (3) does this on a case-by-case basis (i.e. only tables that are manually and specifically classed "plainrowheaders") and (4) automates the process (i.e. currently, only this BRFA itself, which besides the proposer and implementer has 2 editors opposing based on arguments with sufficient merit to consider). I'm sure I'm grossly oversimplifying, but that's kind of the point - consensus should be clear and I shouldn't need to dig this deep to understand if the task is appropriate during WP:BOTAPPROVAL. —  HELLKNOWZ  TALK 19:38, 24 October 2022 (UTC)

@Hellknowz Ok. Do you have a recommendation on where? I am inclined to WP:VPT or WT:TemplateStyles but if you think this should be done at WP:VPPRO, I can take it there.
You really did go down a rabbit hole there though. Anyway, the below is for your edification:
Regarding item 3 in that list; if it were all <table>s then MediaWiki:Common.css would be the correct home for it (which is where the relevant CSS lives today, and in MediaWiki:Mobile.css, but I think that is an artifact of when plainrowheaders was added vice when TemplateStyles was added and not any other reason). Regarding item 4 in that list, it is infeasible to do the change relevant to this BRFA any other way (well, I could do it in AWB but it would take a while and be more likely to cause errors). Regarding 2 editors opposing based on arguments with sufficient merit to consider, the latter editor's comment has 0 relevance in that it's basically like "you can put it in Common.css"... which is where it is today and which is sufficiently answered by MediaWiki talk:Common.css/to do#description.
I think item 2 of your list also isn't interesting as this is not a new addition of course, it is moving CSS from place X to place Y. And has existing precedent already in the form of a much lesser-used template.
Again, strictly for edification. I await your recommendation. :) Izno (talk) 02:54, 14 November 2022 (UTC)
@Izno: I guess any VP discussion would probably be sufficient. If you want to future-proof it with an RfC or something, that's cool. It really is up to you guys - I (or, technically, you) just need a discussion or guideline/policy we can point to and say: here is consensus for this task. Also, can I strongly suggest a couple of examples of exactly what the changes will look like, so that no one has to guess what all these technical things mean.
Thanks for clarifying on the various points. As you can probably tell, I didn't try to conclude whether any of these points are actually fulfilled. To be clear, these are not necessarily problems; these are just questions about the scope of the task where I cannot find clear consensus (or at least an obvious answer). It's more to give you an indication of what I saw as an outside observer and what someone else may or may not support or oppose or disregard. —  HELLKNOWZ  TALK 11:01, 14 November 2022 (UTC)

Bots in a trial period

Operator: Legoktm (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 00:45, Sunday, January 22, 2023 (UTC)

Function overview: Automatically fix (low priority) obsolete-tag lint errors

Automatic, Supervised, or Manual: Automatic

Programming language(s): Rust

Source code available: [2]

Links to relevant discussions (where appropriate): Wikipedia:Bots/Noticeboard/Archive_17#MalnadachBot_and_watchlists

Edit period(s): One time run (well, possibly multiple runs, but at some point it will be done)

Estimated number of pages affected: There are 4 million errors, but many pages have multiple lint errors, so I'd estimate less than 1 million pages.

Namespace(s): All

Exclusion compliant (Yes/No): Yes

Function details:

The main difference from other attempts is that Legobot will attempt to fix all obsolete-tag lint errors at once, and if it is unable to fix everything, it will not edit the page. This should ensure that Legobot does not edit a page more than once, which was the main issue in the above-linked BOTN discussion.

For each page that is reporting lint errors, the bot pulls the Parsoid HTML for the page, and:

  • Any <font>...</font> tags are turned into <span>...</span> (or <div>...</div> if it contains block elements) with the appropriate inline styles. The color, face, and size attributes are parsed according to the HTML spec.
    • If the <font>...</font> tag specified a color (color attribute or inline style) and it contains links, then another <span>...</span> will be added inside the link, wrapping the link text.
  • Any <strike>...</strike> are turned into <s>...</s>
  • Any <tt><nowiki>...</nowiki></tt> are turned into <code><nowiki>...</nowiki></code>
  • Any <center>...</center> are turned into <div class="center">...</div>
    • If any of the children are tables, class="center" will be set on those if possible (e.g. can't be done if the table is in a template).

The HTML is converted back to wikitext, which is then checked with the Linter API for any remaining lint errors. If there are remaining obsolete-tag errors, the bot does nothing; if all obsolete-tag errors have been resolved, it saves the page.
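
For the size attribute specifically, the HTML spec's legacy font-size rules reduce to a small lookup plus an offset for relative values. A minimal sketch of that mapping (in Python purely for illustration; the bot itself is written in Rust):

# Keyword sizes for legacy <font size="1"> through <font size="7">,
# following the HTML spec's rules for parsing a legacy font size.
SIZE_KEYWORDS = {
    1: "x-small", 2: "small", 3: "medium", 4: "large",
    5: "x-large", 6: "xx-large", 7: "xxx-large",
}

def legacy_font_size_to_css(value):
    """Translate a <font size=...> value into a font-size declaration.
    Relative values such as "+1" or "-1" are offsets from the default size 3."""
    value = value.strip()
    digits = "".join(ch for ch in value.lstrip("+-") if ch.isdigit())
    if not digits:
        return None
    n = int(digits)
    if value.startswith("+"):
        n = 3 + n
    elif value.startswith("-"):
        n = 3 - n
    n = max(1, min(7, n))
    return "font-size: %s;" % SIZE_KEYWORDS[n]

Under this mapping, size="-1" comes out as small rather than x-small, which lines up with the size-related comment in the discussion below.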

One known limitation is being unable to fix things inside template parameters, e.g. {{1x|<center>foo</center>}}. I haven't decided whether this is worth fixing; the lack of support isn't an issue because the bot simply won't edit those pages.

I've prepared a little over 1,000 edits as a demo where you can see the wikitext change and a side-by-side comparison of the rendered HTML.

Discussion

Approved for trial (100 edits). Please provide a link to the relevant contributions and/or diffs when the trial is complete. If it's possible to also link the pages that were not edited, please list some of them (at the very least as proof-of-concept that the bot skips pages appropriately). Primefac (talk) 12:50, 22 January 2023 (UTC)

Comment: I support this task in general, but the above specification appears incomplete, and the bot would replace tags that should not be replaced. The bot needs to be more selective about which tags it replaces. I looked through the first 30 or so demo edits and found some errors.
  • Turning all <tt>...</tt> tags into <code>...</code> tags is not appropriate; conversion to {{mono}} or <kbd>...</kbd> or other formatting is sometimes the correct change. See mw:Help:Lint errors/obsolete-tag for examples of these conversions and those for other obsolete tags, based on context.
  • Similarly, converting <center>...</center> into a div tag is not always the right change; when it wraps tables, galleries, or other block content, markup like |class=center or wrapping the whole block in table markup is sometimes needed. See this diff and this diff for examples of a div tag that doesn't work properly.
  • Converting font tags into span tags when they contain a color specification and wrap a link is not the correct fix; the span tags should be placed inside the link. See this diff for an example (the font tag wrapping [[/Lobby|•]] is replaced incorrectly). See also this diff, where the tags around the link to User:Damërung/Secret are replaced incorrectly. Also this diff; the original page contains a "font tag wrapping link" error that is not detected (see T294720).
  • Wrapping a table in a span tag is invalid markup and causes a new Linter error. See this diff for an example.
  • This edit appears to have deleted a necessary and valid (although misnested) </big> tag. It's a bit of GIGO, but this sort of thing is all over the place; let's not introduce new errors.
  • This edit appears to have created a new misnested bold tag error.
How will the bot know when to apply the above proposed fixes and when to do something different or leave the code for human editors to fix? – Jonesey95 (talk) 15:49, 22 January 2023 (UTC)
Thank you for the detailed feedback @Jonesey95!
  • Would you recommend just skipping <tt>...</tt> entirely, then? My initial thought was that while there are often better replacements for <tt>...</tt>, as you and the wiki page suggest, <code>...</code> isn't absolutely wrong, and the value gained by using the correct one isn't really worth the human time. But if that disagrees with the current consensus, I can axe it.
  • Right. I'm confused by The bellman's userpage: why isn't the "I agree to multi-license..." text center-aligned despite being inside <center>...</center>? Based on [3], it seems like <div class="center">...</div> would actually be a better replacement than a raw inline text-align style, as the class appears to handle centering of block content.
  • I mostly implemented the wrapping of link contents; the remaining todo is the case where <font style="color:...;">...</font> is used instead of the color attribute.
  • In this case it should be a div rather than a span, right? I can add something to detect if a block element is being used, and to switch the tag based on that.
  • Regarding the final two issues, I switched it to bail out if there are any lint issues, so it will not make things worse. In theory it should've fixed the misnesting bug, but I think there is also a separate Parsoid bug that I raised on IRC; I'll see what they have to say...
In general I'd like the bot to be as defensive as possible to begin with: if it is not confident in the fix, we skip. Then we can iterate on specific cases it can't handle and go back over the skipped pages. For example, if we don't have a good solution for the center issue (2nd bullet), then I could have the bot skip the fix if there are any block elements nested in the center tag.
Either tomorrow or the next day (presumably after we figure out the center issue) I'll generate a new set of demo edits and "categorize" them so it's easy to see specific types of fixes (e.g. link inside of font) and also places where it skipped. Legoktm (talk) 07:48, 23 January 2023 (UTC)
I recommend:
  • Replacing <tt>...</tt> with <code>...</code> when it wraps <nowiki>...</nowiki>. In my experience, that is the only safe tt replacement to do semi-automatically.
  • It may be that div class="center" is a better replacement in some cases; I haven't played with it. In my experience, no method of centering applies correctly to every situation. Even class="center" within gallery tags doesn't work sometimes (I have a phab ticket lying around somewhere). It's exhausting.
  • I haven't tried wrapping a whole table in a div tag to specify the font, but since span tags appear to work (even though they are invalid HTML), a div tag around a block element is probably correct.
I think you're on the right track. Even with a conservative set of patterns, you will be able to fix a lot of pages. There are plenty of pages with just one center tag, or a few easy font tags. You might try starting in the User talk and Project spaces, since there are a lot of short pages with just a few signatures in those. Getting the AFD pages cleaned up, for example, would be pretty nice, since each one is transcluded in a larger page.
I hesitate to expose my poor regex skills, but feel free to look at User:Jonesey95/AutoEd/doi.js and User:Jonesey95/AutoEd/coauthors.js for patterns that I use. They almost never result in errors, but I still preview every edit before saving, just in case. The proposed code at User:SheepLinterBot/1 may also have some value. – Jonesey95 (talk) 14:15, 23 January 2023 (UTC)
  • Done. For those it doesn't replace, it'll still trigger the obsolete tags lint error, preventing save.
  • OK, it seems class="center" is definitely better since it also handles non-text centering, so I've switched to that.
  • And switched to divs when the font tag contains block elements.
Still going through your regexes... did people really try <font colour="">...</font>??? Did that ever work? Legoktm (talk) 02:03, 25 January 2023 (UTC)
If there is one thing I have learned in ten years as a gnome on this site, it is that Wikipedia editors are endlessly creative (currently 481 hits for "font colour" tags) in the way that they make errors. I'm pretty sure that "font colour" has never worked. – Jonesey95 (talk) 03:54, 25 January 2023 (UTC)
@SSastry (WMF) and I discussed the center tag today, he pointed out it was documented that it only centers tables rather than their content. So in theory if we wanted to get as close to identical as possible, we'd have to mark up each child with class="center" and then tables would get style="align-left: auto; align-right: auto;". That would probably get very messy and not always possible if the children are templates. It's also unlikely someone used a center tag knowing that it would center the table but not the contents...
So I think the best option, though not purely identical, is to swap center with div class="center" (as I suggested earlier), and then if any of the children are tables, mark those with class="center", if possible.
I published a new set of demo edits and tried to group them by edge case. I will spend a while tomorrow reviewing them, and if you don't mind peeking at some that would be appreciated. And if all looks good, I'll kick off the trial! Legoktm (talk) 07:17, 26 January 2023 (UTC)
Re: It's also unlikely someone used a center tag knowing that it would center the table but not the contents: If I understand that sentence correctly, I think it is incorrect. I see tables wrapped in center tags all the time, and it is clear that the editors wanted the table to appear normally, without centering any of the interior content, but as a block in the horizontal middle of the page. Maybe I misunderstand your sentence. As for centering tables with |class="center" or style="align-left: auto; align-right: auto;", I wonder why that would be the recommendation when Wikipedia:HTML 5#Tables has shown "margin:1em auto" as the recommended styling for many years. I have found that tables wrapped by center tags are easily updated by adding that "margin:1em auto" style, as recommended. – Jonesey95 (talk) 16:12, 26 January 2023 (UTC)
Comments on this batch of demo edits:
  • Exclude pages in Wikipedia space that contain the string "log/" or "Log/" in the title. Those are compilation pages that do not contain any actual errors; the transcluded pages need to be fixed.
  • I would be wary of including User pages in your initial batches. People don't usually like having their sandboxes and article drafts messed with, and those do not always live at /sandbox. I would stick to pages with discussions on them for a while.
  • This diff wrapped a multi-line bit of content (i.e. content with a hard line break, not just p or br tags) in the "Tools" section using a span tag. LintHint does not detect an error, but wrapping multi-line content with span tags instead of div tags can introduce new errors.
  • This color replacement appears to be very nicely done, preserving the color both inside and outside the wikilinks. This similar situation appears to have missed additional wikilinks that needed interior styling.
  • This edit added nowiki tags.
  • This edit appears to have changed the position of Template:Historical (probably for the better, but just noting it).
  • This edit de-centered the block content in the Invite section.
  • This edit de-centered the block in the International section.
  • This edit replaced font size=-1 with font-size:x-small, which is one size too small, according to the help page.
  • This edit looks promising. You could do a targeted search for this newsletter and hit them all.
  • Likewise, a targeted search for the envelope icon would probably yield a high fix rate. You can look at User:MalnadachBot/Signature submissions for additional common patterns.
That's probably enough from me. Don't get discouraged. As you can see from the regex pages of people who have gone before you, this stuff is hard to get right, especially as you broaden the scope of potential fixes. – Jonesey95 (talk) 17:19, 26 January 2023 (UTC)

Comment: It appears that you are filing a BRFA for obsolete HTML tags including font tags. If you are going to include font tags, please consider the following:

  • Will there be any errors when fixing font tags, i.e. span tags with color left outside a wikilink, which is an error?
  • Will your code for font tags be stronger than my regexes? My regexes' strength is currently in a ratio of about 1:1.55 - 1:1.6 pages (1:1.9 - 1:2 using these safe regexes).

Please consider the following, as I already filed a BRFA in October for font tags. In case you don't know, the ratio (which I call the edit-to-page ratio) is the number of edits made relative to the number of pages checked. Sheep (talk · he/him) 13:57, 24 January 2023 (UTC)

It is valuable to have more than one bot in development for this particular task, since it is so large. – Jonesey95 (talk) 15:43, 24 January 2023 (UTC)
There is also a relatively important difference between how the proposed LegoBot task and the proposed SheepLinterBot task handle the issue of "MalnadachBot makes way too many edits": LegoBot forces a "ratio" of 1:1 by only making edits to pages it can fix in one go, while SheepLinterBot is a finer-combed tool that reduces the number of edits by narrowing the task. There is room for both of them on-wiki, I think. casualdejekyll 20:05, 24 January 2023 (UTC)
My understanding from both BRFAs is that each bot will abandon an edit if it is unable to fix all of the Linter errors that it sets out to fix, so each bot will have a ratio of less than 1:1 (pages fixed : pages examined). This BRFA's description says "if it is unable to fix everything, it will not edit the page". – Jonesey95 (talk) 21:10, 24 January 2023 (UTC)
On a reread, I think you, Sheep, and I all used completely different ideas of what the ratio was supposed to be. Maybe not the greatest measurement, then. casualdejekyll 12:28, 25 January 2023 (UTC)
@Sheep8144402: hi!
  • Yes, it should correctly mark up the inside of links with the correct color if necessary.
  • My font code (and all the other tags) uses an HTML parser. In theory it should cover all possible invocations and use cases because it looks at the structure of the tag rather than how it is laid out in text. The use of HTML increases the confidence in the fixes, but it also means the bot will miss e.g. commented-out wikitext that a regex-based bot would catch (of course, commented-out tags don't trigger lint errors in the first place!).
In the first 5,000 lint errors (not pages) I pulled, my bot prepared a little over 1,000 edits. That number will go down even further as we've changed the task description and guardrails to be more restrictive. Legoktm (talk) 02:21, 25 January 2023 (UTC)
  • While MalnadachBot was busy with Task 13 in the last 6 months, I have substantially improved it. It is back to fixing Lint errors and no longer has the issues with font tags raised in the BOTN discussion. That said, I support this and Sheep's bot task, since all 3 of our bots work in different ways and have their own roles in bringing down Lint errors. I can run MalnadachBot on more complex patterns that are difficult to handle programmatically by the other 2 bots, in addition to overlapping scopes. I have fixed 11 million errors with MalnadachBot, putting a large dent in the backlog, which was at 22 million when I started. Now the number is at 8.784 million; we can bring this down to less than a million by working together! ಮಲ್ನಾಡಾಚ್ ಕೊಂಕ್ಣೊ (talk) 13:06, 28 January 2023 (UTC)

Operator: JPxG (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 01:22, Sunday, January 8, 2023 (UTC)

Automatic, Supervised, or Manual: automatic

Programming language(s): Python

Source code available: https://github.com/jp-x-g/wegweiser

Function overview: This bot's purpose is to carry out some basic tasks for the Signpost, of which I am the editor-in-chief and publication manager. Right now, I am engaged in updating the historical indices at Module:Signpost: this includes things like adding articles that were previously not indexed, and adding tags for untagged articles. I'm doing this by using my Wegweiser scripts to compose data and manually pasting the output into the module's index pages, which is extremely tedious (i.e. I have to navigate to the 2005 index to paste in output for tagging the arbitration reports, then the 2006, then the 2007, then the 2008, etc up to 2015, then navigate back to the 2005 index to paste in output for tagging the discussion reports, etc, etc). Currently, my edits to the modules look like this. I also intend to update the indices with pageview counts for intervals like "30 days after publication", etc, which can be used to tabulate information that Template:Graph:Pageviews cannot handle (it can only generate graphs, it can't output single numbers!).

Another issue is that, currently, the indices are built using User:Mr. Stradivarius's SignpostTagger script: this means that every time we publish a new issue, the articles don't show up in the index until someone individually goes to each of them and runs the SignpostTagger userscript. This is extremely sub-optimal (single-issue comment pages, for example, won't render properly until each article is in the index) and error-prone (some inevitably get missed when individually clicking on 24 articles, opening the tagger for each of them, running it and checking for completion).

Links to relevant discussions (where appropriate): Module_talk:Signpost#Adding_authorship

Edit period(s): Manually, whenever large maintenance tasks are carried out, and at time of publication, to update indices.

Estimated number of pages affected: Unknown (there are only 19 module indices, but some work may need to be done on Signpost articles as well)

Exclusion compliant (Yes/No): Yes

Already has a bot flag (Yes/No): No

Function details: As described above: one function of the bot would be to integrate information into the Module:Signpost indices from the obsolete Signpost article category system (e.g. Category:Wikipedia Signpost Special report archives 2005, Category:Wikipedia Signpost Special report archives 2006, Category:Wikipedia Signpost Special report archives 2007, Category:Wikipedia Signpost Special report archives 2008, Category:Wikipedia Signpost Special report archives 2009, Category:Wikipedia Signpost Special report archives 2010, Category:Wikipedia Signpost Special report archives 2011, Category:Wikipedia Signpost Special report archives 2012, Category:Wikipedia Signpost Special report archives 2013, Category:Wikipedia Signpost Special report archives 2014, Category:Wikipedia Signpost Special report archives 2015, Category:Wikipedia Signpost Technology reports archives, Category:Wikipedia Signpost Technology reports archives 2005, Category:Wikipedia Signpost Technology reports archives 2006, Category:Wikipedia Signpost Technology reports archives 2007, Category:Wikipedia Signpost Technology reports archives 2008, Category:Wikipedia Signpost Technology reports archives 2009, Category:Wikipedia Signpost Technology reports archives 2010, Category:Wikipedia Signpost Technology reports archives 2011, Category:Wikipedia Signpost Technology reports archives 2012, Category:Wikipedia Signpost Technology reports archives 2013, Category:Wikipedia Signpost Technology reports archives 2014, Category:Wikipedia Signpost Technology reports archives 2015, ad nauseam).

The way it does this is by hitting the API to retrieve the categories' contents, formatting it into an array, then hitting the API to retrieve the contents of the Lua tables, and formatting that into JSON. Next, it compares the category contents to the Lua table's contents, checks to see if the Lua module has the appropriate tag for that category (e.g. Category:Wikipedia Signpost Arbitration report archives 2005 would be "arbitrationreport"), and if not, adds them. Then it converts this JSON back to the proper format for a Lua table, and submits this to the server as an edit.
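
A minimal sketch of the first step (pulling a category's members through the standard MediaWiki query API). This is an illustration rather than the Wegweiser code itself, and the "specialreport" tag name in the usage comment is assumed by analogy with the "arbitrationreport" example above.

import requests

API = "https://en.wikipedia.org/w/api.php"

def category_members(category, session=None):
    """Return all page titles in a category, following API continuation."""
    session = session or requests.Session()
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category,
        "cmlimit": "max",
        "format": "json",
        "formatversion": "2",
    }
    titles = []
    while True:
        data = session.get(API, params=params).json()
        titles += [m["title"] for m in data["query"]["categorymembers"]]
        if "continue" not in data:
            return titles
        params.update(data["continue"])

# Pages that would get the assumed "specialreport" tag in the 2005 index:
# category_members("Category:Wikipedia Signpost Special report archives 2005")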

For pageview tracking, it does the same as the above, except instead of getting page names from categories, it gets them from the action=query&list=allpages API endpoint, and instead of adding tags, it adds "views" key-value pairs to the dict.


For fleshing out indices, it does much the same, with respect to the Lua tables; the difference is that, rather than comparing those entries' tags to membership in a category, it compares them to a list of all pages from that year obtained through the action=query&list=allpages API endpoint (and adds new entries to the Lua table if they are not already present).

I have tested this set of scripts at the following diff: [4] (this output is valid and works fine with the module).

Discussion

Trusted user. Approved for trial. Please provide a link to the relevant contributions and/or diffs when the trial is complete. Trial length is whatever you feel is appropriate. Enterprisey (talk!) 04:34, 8 January 2023 (UTC)
@Enterprisey: Despite WegweiserBot's valiant effort, he remains unable to edit the module pages due to not being autoconfirmed. Can he get +AC (or +bot) flags? jp×g 05:13, 8 January 2023 (UTC)
This appears to have been sorted. Primefac (talk) 07:24, 8 January 2023 (UTC)

Operator: Wbm1058 (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 02:36, Saturday, June 25, 2022 (UTC)

Automatic, Supervised, or Manual: automatic

Programming language(s): PHP

Source code available: refreshlinks.php, refreshmainlinks.php

Function overview: Purge pages with recursive link update in order to refresh links which are old

Links to relevant discussions (where appropriate): User talk:wbm1058#Continuing null editing, Wikipedia talk:Bot policy#Regarding WP:BOTPERF, phab:T157670, phab:T135964, phab:T159512

Edit period(s): Continuous

Estimated number of pages affected: ALL

Exclusion compliant (Yes/No): No

Already has a bot flag (Yes/No): Yes

Function details: This task runs two scripts to refresh English Wikipedia page links. refreshmainlinks.php null-edits mainspace pages whose page_links_updated database field is older than 32 days, and refreshlinks.php null-edits pages in all other namespaces whose page_links_updated database field is older than 80 days. The 32- and 80-day figures may be tweaked as needed to ensure more timely refreshing of links or to reduce load on the servers. Each script is configured to edit a maximum of 150,000 pages in a single run, and restarts every three hours if not currently running (thus each script may run up to 8 times per day).
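
A sketch of the underlying API call; the batching into groups of 20 follows the note further down in this discussion, an authenticated requests session is assumed, and the linked PHP scripts are the actual implementation.

import requests

API = "https://en.wikipedia.org/w/api.php"

def purge_with_link_update(titles, session):
    """Purge pages with a recursive link update, 20 titles per request,
    refreshing the pagelinks/categorylinks/imagelinks tables without editing."""
    for i in range(0, len(titles), 20):
        batch = titles[i:i + 20]
        resp = session.post(API, data={
            "action": "purge",
            "titles": "|".join(batch),
            "forcerecursivelinkupdate": "1",
            "format": "json",
        })
        resp.raise_for_status()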

Status may be monitored by these Quarry queries:


Discussion

I expect speedy approval, as a technical request, as this task only makes null edits. Task has been running for over a month. My main reason for filing this is to post my source code and document the process including links to the various discussions about it. – wbm1058 (talk) 03:02, 25 June 2022 (UTC)

  • Comment: This is a very useful bot that works around long-standing feature requests that should have been straightforward for the MW developers to implement. It makes sure that things like tracking categories and transclusion counts are up to date, which helps gnomes fix errors. – Jonesey95 (talk) 13:30, 25 June 2022 (UTC)
  • Comment: My main concerns are related to the edit filter; I'm not sure whether that looks at null edits or not. If it does, it's theoretically possible that we might suddenly be spammed by a very large number of filter log entries, if and when a filter gets added that widely matches null edits (and if null edits do get checked by the edit filter, we would want the account making them to have a high edit count and to be autoconfirmed, because for performance reasons, many filters skip users with high edit counts).

    To get some idea of the rate of null edits: the robot's maximum editing speed is 14 edits per second (150000 × 8 in a day). There are 6,610,568 articles, 57,434,796 pages total (how did we end up with almost ten times as many pages as articles?); this means that the average number of edits that need making per day is around 825000 per day, or around 9.5 per second. Wikipedia currently gets around 160000 edits per day (defined as "things that have an oldid number", so including moves, page creations, etc.), or around 2 per second. So this bot could be editing four times as fast as everyone else on Wikipedia put together (including all the other bots), which would likely be breaking new ground from the point of view of server load (although the servers might well be able to handle it anyway, and if not I guess the developers would just block its IP from making requests) – maybe a bit less, but surely a large proportion of pages rarely get edited.

    As a precaution, the bot should also avoid null-editing pages that contain {{subst: (possibly with added whitespace or comments), because null edits can sometimes change the page content in this case (feel free to null-edit User:ais523/Sandbox to see for yourself – just clicking "edit" and "save" is enough); it's very hard to get the wikitext to subst a template into a page in the first place (because it has a tendency to replace itself with the template's contents), but once you manage it, it can lie there ready to trigger and mess up null edits, and this seems like the sort of thing that might potentially happen by mistake (e.g. Module:Unsubst is playing around in similar space; although that one won't have a bad interaction with the bot, it's quite possible we'll end up creating a similar template in future and that one will cause problems). --ais523 23:06, 6 July 2022 (UTC)

    • While this task does not increase the bot's edit count, it has performed 7 other tasks and has an edit count of over 180,000, which should qualify as "high". wbm1058 (talk) 03:38, 8 July 2022 (UTC)
    • There are far more users than articles; I believe User talk: is the largest namespace and thus the most resource-intensive to purge (albeit perhaps with a smaller average page size). wbm1058 (talk) 03:38, 8 July 2022 (UTC)
    • The term "null edit" is used here for convenience and simplification; technically the bot purges the page cache and forces a recursive link update. This is about equivalent to a null edit, but I'm not sure that it's functionally exactly the same. – wbm1058 (talk) 03:38, 8 July 2022 (UTC)
      • Ah; this seems to be a significant difference. A "purge with recursive link update" on my sandbox page doesn't add a new revision, even though a null edit does. Based on this, I suspect that purging pages is lighter on the server load than an actual null edit would be, and also recommend that you use "purge with recursive link update" rather than "null edit" terminology when describing the bot. --ais523 08:32, 8 July 2022 (UTC)
        • Yes, and just doing a recursive link update would be even lighter on the server load. The only reason my bot forces a purge is that there is currently no option in the API for only updating links. See this Phabricator discussion. – wbm1058 (talk) 12:42, 8 July 2022 (UTC)
    • As I started work on this project March 13, 2022 and the oldest page_links_updated date (except for the Super Six) is April 28, 2022, I believe that every page in the database older than 72 days has now been null-edited at least once, and I've yet to see any reports of problems with unintended substitution. wbm1058 (talk) 03:38, 8 July 2022 (UTC)
      • This is probably a consequence of the difference between purges and null edits; as long as you stick to purges it should be safe from the point of view of unintended substitution. --ais523 08:32, 8 July 2022 (UTC)
    • To make this process more efficient the bot bundles requests into groups of 20; each request sent to the server is for 20 pages to be purged at once. wbm1058 (talk) 03:38, 8 July 2022 (UTC)
  • Comment: I've worked the refreshlinks.php cutoff from 80 down to 70 days; the process may be able to hold it there. I've been trying to smooth out the load so that roughly the same number of pages are purged and link-refreshed each day. – wbm1058 (talk) 11:49, 8 July 2022 (UTC)
  • Note. This process is dependent on my computer maintaining a connection with a Toolforge bastion. Occasionally my computer becomes disconnected for unknown reasons, and when I notice this I must manually log back in to the bastion. If my computer becomes disconnected from the bastion for an extended time, this process may fall behind the expected page_links_updated dates. – wbm1058 (talk) 11:55, 12 July 2022 (UTC)
  • Another note. The purpose/objective of this task is to keep the pagelinks, categorylinks, and imagelinks tables reasonably updated. Regenerating these tables using the rebuildall.php maintenance script is not practical for English Wikipedia due to its huge size. Even just running the RefreshLinks.php component of rebuildall is not practical due to the database size (it may be practical for smaller wikis). The goal of phab:T159512 (Add option to refreshLinks.php to only update pages that haven't been updated since a timestamp) is to make it practical to run RefreshLinks.php on English Wikipedia. My two scripts find the pages that haven't been updated since a timestamp, and then purge these pages with recursive link updates. A recursive link update is what refreshLinks.php does. – wbm1058 (talk) 14:42, 16 July 2022 (UTC)
  • Approved for trial (30 days). Please provide a link to the relevant contributions and/or diffs when the trial is complete. Let's see if anything breaks. Primefac (talk) 16:24, 6 August 2022 (UTC)
    @Primefac: This task just purges the page cache and forces recursive link updates, so there are no relevant contributions and/or diffs for me to provide a link to. But I see that text is coming from the {{BotTrial}} template, so you probably didn't intend to make that request. As to "anything breaking", the bot went down sometime after I left on wikibreak, and now that I'm back it's catching up. In other words, the task as currently configured "breaks" easily and requires a lot of minding to keep it running. Perhaps it would be more reliable if I figured out how to set it up as a tool running from my Toolforge admin console. – wbm1058 (talk) 15:11, 25 August 2022 (UTC)
    To improve reliability, I suggest running the task on the toolforge grid. When running on the grid, the server running your code and the database are on the same high-speed network. You appear to have tunnelled the toolforge database to local port 4711. This setup is only intended for development-time debugging and will be unreliable for long-running tasks, as you have discovered.
    Also, I suggest using a significantly smaller limit than 150000 – that is a very large number of titles to expect from a single database call, and could cause timeouts and/or put too much pressure on the database. Instead, process just 5-10k titles at a time, and run the script more frequently. – SD0001 (talk) 19:18, 29 August 2022 (UTC)
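
To illustrate the smaller-batch idea, here is a rough sketch, with several assumptions: mysqli, the enwiki_p replica reachable either directly on Toolforge or through the local tunnel on port 4711 mentioned above, and placeholder credentials. It is not the bot's actual query, just the shape of it.

<?php
// Illustrative only: fetch a small batch of the pages whose links were
// refreshed longest ago, then close the connection straight away.
$db = new mysqli('127.0.0.1', $dbUser, $dbPass, 'enwiki_p', 4711); // or the replica host when running on Toolforge
$batchSize = 5000;                                                 // 5-10k titles per run, as suggested

$result = $db->query(
    "SELECT page_namespace, page_title, page_links_updated
       FROM page
      WHERE page_links_updated IS NOT NULL
      ORDER BY page_links_updated ASC
      LIMIT $batchSize"
);
$rows = $result->fetch_all(MYSQLI_ASSOC);
$db->close();   // don't hold the connection open between runs
// ... combine namespace + title into full page names, then purge them in groups of 20.
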
    @SD0001 and Primefac: I set up https://toolsadmin.wikimedia.org/tools/id/refreshlinks; now I'm trying to figure out what to do with it. Apparently the grid is legacy and deprecated, and the Jobs framework and Kubernetes are preferred for new bot setups. But before I automate this task on Toolforge I need to set it up there so I can run it manually. Per the Toolforge quickstart guide (which is anything but quick at helping me get started) I created my tool's code/html root directory with mkdir public_html, but I don't need to create my bot's code, I just need to copy it to that directory. One of the files needed to run my bot is the file containing login passwords, and I'm leery of copying that to a directory with "public" in its name! Some guidance on how to do this would be appreciated, since the quickstart authors apparently felt that wasn't necessary. Microsoft Notepad probably isn't installed on Toolforge, and I probably need Linux rather than Windows commands. Can I import the files from wikipages (e.g. User:Bot1058/refreshlinks.php)? wbm1058 (talk) 19:09, 31 August 2022 (UTC)Reply[reply]
    @Wbm1058. All files in the tool directory (not just public_html) are public by default. Passwords, OAuth secrets and the like can be made private by using chmod, e.g. chmod 600 file-with-password.txt.
    Since you're creating a bot rather than a webservice, the files shouldn't go into public_html. They can be in any directory. See wikitech:Help:Toolforge/Grid for submitting jobs to the grid. (The grid is legacy, yes, but the newer k8s-based Jobs framework is not that mature and can be harder to work with, especially for people not familiar with containers.)
    To copy over files from a Windows system, IMO the best tool is WinSCP (see wikitech:Help:Access to Toolforge instances with PuTTY and WinSCP). It's also possible to edit files directly on toolforge, such as by using nano. – SD0001 (talk) 20:39, 31 August 2022 (UTC)Reply[reply]
    I finally got around to installing WinSCP. That was easy since it uses PuTTY and I just told it to use my configuration that I previously installed for PuTTY. I couldn't find any of the three "Advanced Site Settings" screens; it appears those were in a previous version of WinSCP but are not in the current version 5.21.3. Not sure I really need them since the setup seems to all have been automatically imported from PuTTY. I think "Advanced Site Settings" was renamed to "Preferences". Under "Preferences"→"Environment" I see "Interface, Window, Commander, Explorer, Languages" rather than "Directories, Recycle bin, Encryption, SFTP, Shell".

    Now I see I created the directory /mnt/nfs/labstore-secondary-tools-project/refreshlinks for my first "tool",
    and the sub-directory /mnt/nfs/labstore-secondary-tools-project/refreshlinks/public_html (my tool's code/html root directory)
    I also have a personal directory /mnt/nfs/labstore-secondary-tools-home/wbm1058 which has just one file: replica.my.cnf (my database access credentials)
    and when I try to look at other users' personal directories I get "Permission denied" errors, so I assume that any PHP code I put in my personal directory would be private and only I could read it. My tool also has a replica.my.cnf file, which I can't read with WinSCP when logged into my personal account. But if in PuTTY I "become refreshlinks" then I can read my tool's replica.my.cnf file and see that it holds different credentials than my personal replica.my.cnf file.

    All my bots use the botclasses framework (User:RMCD bot/botclasses.php). Should I create another tool named "botclasses" for my framework, to avoid the need to make separate copies for each individual tool that uses it? I see at wikitech:Portal:Toolforge/Tool Accounts#Manage files in Toolforge that I may need to "take ownership" of files or "mount" them. §Sharing files via NFS (what is NFS?) says "Shared config or other files may be placed in the /data/project/shared directory, which is readable (and potentially writeable) by all Toolforge tools and users." Still trying to digest this information. – wbm1058 (talk) 17:41, 15 September 2022 (UTC)Reply[reply]
    answering my own question: NFS = Network File System, a distributed file system protocol originally developed by Sun Microsystems in 1984. – wbm1058 (talk) 19:10, 6 October 2022 (UTC)Reply[reply]
    Yes, personal user directories are private. replica.my.cnf files are different for each user and tool and have the mode -r-------- which means only the owner can read and no one can modify.
    The recommendation to use different tool accounts per "tool" is for webservices (since each tool account can have only one web domain). For bots, just use a single tool account for multiple bots – that's easier to maintain and manage. – SD0001 (talk) 05:53, 18 September 2022 (UTC)Reply[reply]
    Thanks. Then I'd like to rename refreshlinks to a more generic name that covers all my bots, but tools can't be renamed, nor can maintainers delete Tool Accounts. I will follow the steps described at Toolforge (Tools to be deleted). It should be obvious from my experience trying to get a "quick start" on Toolforge why you have such a growing list of tools that have been volunteered for deletion by their maintainers. – wbm1058 (talk) 18:11, 22 September 2022 (UTC)Reply[reply]
    @SD0001: I set up https://toolsadmin.wikimedia.org/tools/id/billsbots and then in PuTTY I "become billsbots" and mkdir php, creating a PHP directory where I can upload needed files from the PHP directory on my Windows PC. Then I go over to WinSCP to try to upload the files. There I can upload botclasses.php into the /billsbots/ root directory, but I don't have permission to upload to the /billsbots/php/ sub-directory I just created. I see "tools.billsbots" is the owner of the /billsbots/php/ sub-directory but wbm1058 is the owner of botclasses.php. I logged into WinSCP the same way I log into PuTTY, as wbm1058. Is there a way inside WinSCP to "become billsbots" analogous to the way I do that in PuTTY? I assume "tools.billsbots" should be the owner of its public PHP files and not "wbm1058"? I'm also unsure what rights settings the php directory, and the files in that directory that don't house passwords, should have. Right now they are just the defaults from mkdir php and the upload. – wbm1058 (talk) 18:52, 24 September 2022 (UTC)Reply[reply]
    There's no need to become the tool in WinSCP – group permissions can be used instead of owner permissions. The group tools.billsbots includes the user wbm1058. Problem in this case is that the group doesn't have write permission. See wikitech:Help:Access_to_Toolforge_instances_with_PuTTY_and_WinSCP#Troubleshooting_permissions_errors. Files which don't have passwords typically should have 774 (owner+group can do everything, public can read) perms. – SD0001 (talk) 05:38, 25 September 2022 (UTC)Reply[reply]

@SD0001: Thank you so much for your help. I've now successfully run refreshlinks.php manually from the command prompt in PuTTY. I need to be logged in as myself for it to work, and not as my tool, because I own and have read permission for my password file, and my tool does not. Per wikitech:Help:Toolforge/Grid#Submitting simple one-off jobs using 'jsub', when I become my tool and run

jsub -N refreshlinks php /mnt/nfs/labstore-secondary-tools-project/billsbots/php/refreshlinks.php

I get this in my refreshlinks.out file:

Warning: include(/mnt/nfs/labstore-secondary-tools-project/billsbots/php/logininfo.php): failed to open stream: Permission denied in /mnt/nfs/labstore-secondary-tools-project/billsbots/php/refreshlinks.php on line 28

wbm1058 (talk) 15:32, 1 October 2022 (UTC)Reply[reply]

@Wbm1058 become the tool, take the file (transfers ownership to tool) and then do chmod 660 – that would give access to both yourself and the tool. – SD0001 (talk) 18:20, 1 October 2022 (UTC)Reply[reply]
  • @SD0001 and Primefac:I just got an email notice for Phabricator T319590: Migrate billsbots from Toolforge GridEngine to Toolforge Kubernetes. Damn, I haven't even gotten anything running on an automated basis yet, just a few one-time runs as I try to familiarize myself with how the GridEngine works, and already I have a bureaucratic nag! I knew going into this that establishing my bots on Toolforge would not be easy, and my expectations have been exceeded! Maybe I just need to bite the bullet and learn how to use the "not that mature" and possibly "harder to work with" Jobs framework, and familiarize myself with containers. – wbm1058 (talk) 16:35, 6 October 2022 (UTC)Reply[reply]
    @Wbm1058 Looks like that was part of mass-creation of tickets so nothing to urgently worry about (they've covered A to D only so my tool hasn't come up yet!). If they're becoming pushy about this, I suppose the Jobs framework is mature now, though there are quite a few things it doesn't support.
    It should be easy enough to migrate – instead of putting a jsub command in crontab for scheduling, use the toolforge-jobs command, passing --image as tf-php74. – SD0001 (talk) 17:53, 6 October 2022 (UTC)Reply[reply]
  • Just noticed now that I got an email on October 9 which I overlooked at first because I didn't recognize the sender.
    sftp-server killed by Wheel of Misfortune on tools bastion
    From Root <root@tools.wmflabs.org>

Your process `sftp-server` has been killed on tools-sgebastion-10 by the Wheel of Misfortune script.

You are receiving this email because you are listed as the shell user running the killed process or as a maintainer of the tool that was.

Long-running processes and services are intended to be run on either the Kubernetes environment or the job grid, not on the bastion servers themselves. In order to ensure that login servers don't get heavily burdened by such processes, this script selects long-running processes at random for destruction.

See <https://phabricator.wikimedia.org/T266300> for more information on this initiative. You are invited to provide constructive feedback about the importance of particular types of long-running processes to your work in support of the Wikimedia movement.

For further support, visit #wikimedia-cloud on libera.chat or <https://wikitech.wikimedia.org>

I guess that explains why the task as currently configured "breaks" easily and requires a lot of minding to keep it running. Thanks, I guess, for this belated message that came only 3½ months after I got my automated process running this way. So I suppose speedy approval isn't merited and won't be forthcoming. I did not know that I was running a process named sftp-server. What is that, and what is it doing? Most of this bot's process is still running on my own PC. Every few hours when a new script-run starts, it logs into the replica database and does a query which, even when it returns 150K results, takes only a couple of minutes. Then it logs out. It's not like this is constantly hitting on bastion resources. The only reason I need to be logged into the bastion 24×7 (via PuTTY) is that, if I'm not, then my bot, when it starts, will not be able to "tunnel" and thus will fail. The vast majority of the time I'm logged into the bastion, I'm just sitting there idle, doing nothing. Not "heavily burdening" the login server. I need to "tunnel" because there is no MediaWiki API for the database query I need to make. Otherwise I don't need the Toolforge because there is an API for making the "null edit" purges. – wbm1058 (talk) 15:53, 14 October 2022 (UTC)Reply[reply]
I think the Wheel of Misfortune sftp-server kills are from my open WinSCP session. I didn't get WinSCP installed and running until September 15, and the first email I saw from the Wheel of Misfortune was sent on October 9 (and I've received several since then). I keep WinSCP open on my desktop for my convenience. I just saw there is a "Disconnect Session" option on the "Session" tab in WinSCP and I just clicked on it. Hopefully that will stop the Wheel of Misfortune's anger. Now I can just click "Reconnect Session" when I go back to use WinSCP again – which saves me the trouble of needing to close and reopen the entire app. As far as I know the Wheel of Misfortune has never actually shut down my bot itself, perhaps because individual bot runs are not sufficiently long-running processes to draw the attention of the "Wheel". Even runs that purge 150,000 pages run in a matter of hours, not days. – wbm1058 (talk) 17:21, 19 December 2022 (UTC)Reply[reply]
  • Perhaps helpful to see how other bots running on Toolforge are configured to find a template for how to set mine up. – wbm1058 (talk) 22:45, 14 October 2022 (UTC)Reply[reply]
    Here's how I set my PHP bots up: User:Novem Linguae/Essays/Toolforge bot tutorial#Running at regular intervals (cronjob, kubernetes, grid). I found kubernetes to have a heavy learning curve, but I suppose getting the code off your local computer and onto Toolforge is the "proper" way to do things. Another method might be setting up a webserver on Toolforge/kubernetes that is an API for the query you need to make. Hope this helps. –Novem Linguae (talk) 08:35, 15 October 2022 (UTC)Reply[reply]
    Being connected to the bastion 24x7 is a no-no. Ideally, the bot process should run on toolforge itself so that no connection is needed at all between your local system and toolforge. If you really want to run the bot on local system, the tunnel connection to the database should be made only when required, and closed immediately after. Creating temporary new connections is cheap, leaving them open indefinitely is not. – SD0001 (talk) 16:51, 16 October 2022 (UTC)Reply[reply]
  • I've got my first Kubernetes one-off job running now, to refresh 40,000 pages. Commands I used to get it started:
wbm1058@tools-sgebastion-10:~$ become billsbots
tools.billsbots@tools-sgebastion-10:~$ toolforge-jobs run refreshlinks-k8s --command "php ./php/refreshlinks.php" --image tf-php74 --wait
ERROR: timed out 300 seconds waiting for job 'refreshlinks-k8s' to complete:
+------------+-----------------------------------------------------------------+
| Job name:  | refreshlinks-k8s                                                |
+------------+-----------------------------------------------------------------+
| Command:   | php ./php/refreshlinks.php                                      |
+------------+-----------------------------------------------------------------+
| Job type:  | normal                                                          |
+------------+-----------------------------------------------------------------+
| Image:     | tf-php74                                                        |
+------------+-----------------------------------------------------------------+
| File log:  | yes                                                             |
+------------+-----------------------------------------------------------------+
| Emails:    | none                                                            |
+------------+-----------------------------------------------------------------+
| Resources: | default                                                         |
+------------+-----------------------------------------------------------------+
| Status:    | Running                                                         |
+------------+-----------------------------------------------------------------+
| Hints:     | Last run at 2022-11-03T16:53:38Z. Pod in 'Running' phase. State |
|            | 'running'. Started at '2022-11-03T16:53:40Z'.                   |
+------------+-----------------------------------------------------------------+
tools.billsbots@tools-sgebastion-10:~$ toolforge-jobs list
Job name:         Job type:    Status:
----------------  -----------  ---------
refreshlinks-k8s  normal       Running
tools.billsbots@tools-sgebastion-10:~$

Will wait a bit for new emails or Phabricators to come in telling me what I'm still doing wrong, before proceeding to the next step, creating scheduled jobs (cron jobs). – wbm1058 (talk) 19:12, 3 November 2022 (UTC)Reply[reply]

One thing I'm apparently still doing wrong: "Login to Wikipedia as Bot1058 from a device you have not recently used" is the title of an email I get every time I run a one-off job on Toolforge. The message says "Someone (probably you) recently logged in to your account from a new device. If this was you, then you can disregard this message. If it wasn't you, then it's recommended that you change your password, and check your account activity." The Help button at the bottom of the email message links to mw:Help:Login notifications, which says "this feature relies on cookies to keep track of the devices you have used to log in". I'm guessing that cookies are not working in my Toolforge account.
The code I use to log in is:
$objwiki = new wikipedia();
$objwiki->login($user, $pass);
    /**
     * This function takes a username and password and logs you into wikipedia.
     * @param $user Username to login as.
     * @param $pass Password that corresponds to the username.
     * @return array
     **/
    function login ($user,$pass) {
    	$post = array('lgname' => $user, 'lgpassword' => $pass);
        $ret = $this->query('?action=query&meta=tokens&type=login&format=json');
        print_r($ret);
        /* This is now required - see https://bugzilla.wikimedia.org/show_bug.cgi?id=23076 */
        $post['lgtoken'] = $ret['query']['tokens']['logintoken'];
        $ret = $this->query( '?action=login&format=json', $post );

        if ($ret['login']['result'] != 'Success') {
            echo "Login error: \n";
            print_r($ret);
            die();
        } else {
            print_r($ret);
            return $ret;
        }
    }
These emails will get very annoying pretty fast if I get this task set up to run frequent, small jobs rather than infrequent, large jobs – as @SD0001: suggests. Help please! wbm1058 (talk) 13:52, 4 November 2022 (UTC)Reply[reply]
The login code looks ok to me. Not sure why the emails didn't stop coming after the first few times, but if necessary you can disable them from Special:Preferences notifications tab. My general tip for botops is to use OAuth, which avoids this and several other problems. – SD0001 (talk) 19:11, 4 November 2022 (UTC)Reply[reply]
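
In case it helps, here is a rough sketch of what SD0001's OAuth suggestion could look like with an owner-only OAuth 2.0 consumer: the access token is sent as an Authorization header on every API request, so there is no action=login step (and no login email). This is illustrative only, not part of the botclasses framework; $accessToken is a placeholder for the token shown when the consumer is created.

<?php
// Illustrative only: call the API with an owner-only OAuth 2.0 access token
// instead of logging in with a bot password.
function apiGet(string $accessToken, array $params): array {
    $url = 'https://en.wikipedia.org/w/api.php?' . http_build_query($params + ['format' => 'json']);
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_HTTPHEADER => ['Authorization: Bearer ' . $accessToken],
        CURLOPT_USERAGENT => 'Bot1058-oauth-example/0.1 (illustrative)',
    ]);
    $response = json_decode(curl_exec($ch), true);
    curl_close($ch);
    return $response;
}

// Example: confirm the API sees the bot account without any login step.
// print_r(apiGet($accessToken, ['action' => 'query', 'meta' => 'userinfo']));
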
I found a relevant Phabricator task and added my issue there. – wbm1058 (talk) 13:08, 6 November 2022 (UTC)Reply[reply]
I think I solved this. Per comments in the Phab, as my bot only logged in and didn't make any edits, the IP(s) weren't recorded in the CheckUser table and every login was treated as being from a "new" IP. To work around this, I did some one-off runs of another task this bot has which does actually make edits. After running that bot task a few times on the Toolforge, the emails stopped coming, even for the task that just refreshes links and doesn't make any edits.
But in the meantime before I figured that out, I searched for OAuth "quick start" links, and am posting my finds here:
At some point while navigating this forest of links, my mind exploded. I'm putting OAuth on my back burner now, to focus on creating scheduled jobs. Meanwhile I have these links saved here so I may come back to this at some point. – wbm1058 (talk) 15:46, 10 November 2022 (UTC)Reply[reply]

Job logs

On my way to creating scheduled jobs, I ran into another issue. Per wikitech:Help:Toolforge/Jobs framework#Job logs, "Subsequent same-name job runs will append to the same files... there is no automatic way to prune log files, so tool users must take care of such files growing too large." What?! How hard can it be to offer a "supersede" option to override the default "append"? – wbm1058 (talk) 22:07, 12 November 2022 (UTC)Reply[reply]

I've raised this issue in T301901. – wbm1058 (talk) 09:59, 13 November 2022 (UTC)Reply[reply]
A "supersede" option sounds like a bad idea as that would mean you can only ever see the logs of the latest job run. – SD0001 (talk) 14:00, 25 January 2023 (UTC)Reply[reply]
@SD0001: I get your point, but wikitech:Help:Toolforge/Jobs framework#Job logs says "Log generation can be disabled with the --no-filelog parameter when creating a new job." If it makes sense to sometimes disable logs entirely, why wouldn't it also make sense to sometimes supersede them? All logs for bots running on my desktop PC are always superseded. That's usually not a problem, but sometimes it would be nice to be able to go back and look at a previous log to see what happened on the run where a bug first surfaced. The logs for this task are quite long though.
I've successfully started running this bot's tasks 3, 4, and 5 as a scheduled hourly task on the jobs framework, see User:Bot1058#Tasks. The logs for those tasks are usually pretty short though, so it does make sense to append there. – wbm1058 (talk) 16:10, 27 January 2023 (UTC)Reply[reply]
Abandoned complicated workaround after T301901 closed

@SD0001: I'm trying to implement the somewhat complicated workaround given at wikitech:Help:Toolforge/Jobs framework#Custom log files. I've added some explanations to this section (see the edit history), so let me know if I added anything that's not correct. I take this as instructions to type the following directly at my PuTTY prompt.

If you save this file as log-wrapper.sh, give it execution permissions:

tools.mytool@tools-sgebastion-11:~$ cat > log-wrapper.sh <<EOF
> #!/bin/sh
> jobname=$1
> command=$2
> mkdir -p logs
> sh -c $command 1>>logs/${jobname}.log 2>>logs/${jobname}.log
> EOF
tools.mytool@tools-sgebastion-11:~$ chmod a+x log-wrapper.sh

After doing that I notice that the $1 and $2, and $command and ${jobname}, were eaten somehow (presumably because the unquoted EOF delimiter let the shell expand those variables while the file was being created). The contents of my log-wrapper.sh file are:

#!/bin/sh
jobname=
command=
mkdir -p logs
sh -c  1>>logs/.log 2>>logs/.log

which doesn't seem right to me. Of course I can just copy-paste the contents of the file from the Help: page directly with WinSCP, rather than type them in with PuTTY (which I did). If this Help: page isn't giving instructions that work, it should be corrected. I've made a couple of unsuccessful attempts, and something was obviously wrong with my syntax. – wbm1058 (talk) 19:06, 17 November 2022 (UTC)Reply[reply]

./php/refreshlinks.php: 1: cannot open ?php: No such file
./php/refreshlinks.php: 2: /bin: Permission denied
./php/refreshlinks.php: 3: log-wrapper.sh: not found
./php/refreshlinks.php: 4: log-wrapper.sh: not found
./php/refreshlinks.php: 5: Syntax error: word unexpected (expecting ")")
Kubernetes' beta phase has been declared done and the new phab:T327254 "next steps in grid engine deprecation" has opened. But Job logs still says "there is no automatic way to prune log files, so tool users must take care of such files growing too large." Huh? I guess I made the mistake of trying to piggyback on an existing Phab rather than opening a new one. – wbm1058 (talk) 13:29, 25 January 2023 (UTC)Reply[reply]
@Wbm1058 For now, I would suggest not worrying about pruning log files. It would take a long time before the logs grow big enough to be of any concern, at which time you could just delete or truncate it manually. – SD0001 (talk) 14:01, 25 January 2023 (UTC)Reply[reply]

Bots that have completed the trial period

Operator: Qwerfjkl (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 05:15, Friday, December 30, 2022 (UTC)

Automatic, Supervised, or Manual: automatic

Programming language(s): Python

Source code available: Based off User:AssumptionBot/code + code to check if the creator is autoconfirmed

Function overview: Add AFC unsubmitted templates to drafts.


Links to relevant discussions (where appropriate): Wikipedia:Village pump (proposals)/Archive 188#Bot proposal (AFC submission templates), Wikipedia:Bots/Requests for approval/AssumptionBot

Edit period(s): Continuous

Estimated number of pages affected: ~100 a day

Exclusion compliant (Yes/No): No

Already has a bot flag (Yes/No): Yes

Function details: Adds AFC unsubmitted templates ({{afc submission/draft}}) to drafts in draftspace that don't already have the relevant templates, i.e. another AFC template.

Discussion

Pinging @ProcrastinatingReader. — Qwerfjkltalk 05:16, 30 December 2022 (UTC)Reply[reply]

Approved for trial (14 days). Please provide a link to the relevant contributions and/or diffs when the trial is complete. ProcrastinatingReader (talk) 17:20, 8 January 2023 (UTC)Reply[reply]

@ProcrastinatingReader, this will probably affect far more than 100 the first time it is run (probably more than 10,000 pages, at least 9000). This trial might need to be edit-based. — Qwerfjkltalk 18:35, 9 January 2023 (UTC)Reply[reply]
As the logic is quite simple, I was aiming to have it running for a little while, while the bot is in trial, in case anything crops up and people from watchlists/reviewers have anything to add. Can you make sure the edit summary has a link to this BRFA?
Regarding edit-based, feel free to do 50 edits first and let me know once those are done, but I'd still keep it in a 14-day trial assuming the first 50 look good. ProcrastinatingReader (talk) 21:06, 9 January 2023 (UTC)Reply[reply]
@ProcrastinatingReader, I've started running this. (Special:Contributions/Qwerfjkl (bot)). To prevent a large number of edits, it will only edit one page per minute at most. — Qwerfjkltalk 18:11, 10 January 2023 (UTC)Reply[reply]
Actually, I'll probably just ignore drafts created a long time ago, i.e. a month or longer. — Qwerfjkltalk 18:39, 10 January 2023 (UTC)Reply[reply]
  • Support this BRFA. From a side effects standpoint, I wonder if... 1) Are these 9,000 untagged drafts normally subject to G13? That is, will adding this template cause a bunch of drafts to enter the G13 queue? 2) If these drafts ARE normally subject to G13 even without the template, adding this template will likely reset the G13 timer for 9,000 drafts. Not a show stopper, but figured I'd mention these possible "side effects" in case we need to consider them or discuss them. Thanks. –Novem Linguae (talk) 23:30, 10 January 2023 (UTC)Reply[reply]
    Everything in the draft space is (in theory) subject to G13. Since it is a bot adding the AFC templates, it will not reset the G13 timer (as bot edits are explicitly excluded pretty much because of this exact circumstance). Primefac (talk) 09:06, 11 January 2023 (UTC)Reply[reply]
  • If it doesn't already, might be good to make sure the bot respects the {{Nobots}} tag with and without parameters, so that experienced users have a way to create exceptions. –Novem Linguae (talk) 15:26, 12 January 2023 (UTC)Reply[reply]
  • There's a discussion at Wikipedia talk:WikiProject Articles for creation#Tagging drafts with Template:afc submission/draft where the bot edit wars with a new user six times, so might want to give some thought to the bot's re-application strategy and how aggressive or not aggressive we want it to be. Maybe it's good that the bot re-applies the tag? Maybe it's not? I'm not sure. –Novem Linguae (talk) 15:26, 12 January 2023 (UTC)Reply[reply]
    I'm actually going to transclude the section below to save some clicks, though obviously folks are welcome to comment there if necessary. Primefac (talk) 09:47, 13 January 2023 (UTC)Reply[reply]

I've started the trial of Wikipedia:Bots/Requests for approval/Qwerfjkl (bot) 15, for tagging untagged drafts with {{afc submission/draft}}. You can see the edits at Special:Contributions/Qwerfjkl (bot). — Qwerfjkltalk 20:38, 10 January 2023 (UTC)Reply[reply]

This is great, I'm glad someone will be adding the tag to pages missing it. Hey man im josh (talk) 20:41, 10 January 2023 (UTC)Reply[reply]
Qwerfjkl, one change I would make is not to add the template to completely blank pages, such as with this. That just seems odd to me :/ KylieTastic (talk) 14:29, 11 January 2023 (UTC)Reply[reply]
I agree, but will note that the redlink was blanked by the creator (no other edits other than Kylie's decline) so I G7'd it. Primefac (talk) 14:43, 11 January 2023 (UTC)Reply[reply]
@KylieTastic, I've updated the code accordingly. — Qwerfjkltalk 16:39, 11 January 2023 (UTC)Reply[reply]
One thing to consider is not re-adding the template when it is removed. For example, at Draft:List of LaRusso cast, the bot added the template six times after being reverted. — Ingenuity (talk • contribs) 13:40, 12 January 2023 (UTC)Reply[reply]
Is there ever a good reason not to have the submission template in the draft space? The only situation I can think of is experienced users who are using draft space to work on something but plan on moving it when ready instead of submitting via AfC. Hey man im josh (talk) 14:29, 12 January 2023 (UTC)Reply[reply]
It could just be someone's preference, but often it indicates to me an editor who may need to be watched or is trying to use draft-space as a webhost. They are removing declines as well on this and other drafts, and generally not looking constructive. KylieTastic (talk) 14:40, 12 January 2023 (UTC)Reply[reply]
@Ingenuity, I'm fairly sure if you add {{bots|deny=Qwerfjkl (bot)}} the bot won't edit; I'm unwilling to check the history if the editor is just being disruptive. — Qwerfjkltalk 17:04, 12 January 2023 (UTC)Reply[reply]
I believe the point is that the bot is doing it in the first place. Sure, that will fix this draft, but what about the next one? And the next? Primefac (talk) 17:44, 12 January 2023 (UTC)Reply[reply]
@Primefac, I can't think of a valid reason that a non-autoconfirmed editor would have to get rid of this template. — Qwerfjkltalk 20:53, 12 January 2023 (UTC)Reply[reply]
If you want to know what will actually happen: the bot will keep on adding the template until the editor gives up. It doesn't seem that problematic.
If this is a major problem, however, I can change the code so it doesn't reëdit pages. — Qwerfjkltalk 20:55, 12 January 2023 (UTC)Reply[reply]
I understand the point of why this bot exists: people create drafts and don't realise they have to submit them in order for them to be moved out of draft space, so they linger and then the creator wonders why their draft got deleted six months later. However, if someone is removing the /draft template, then (much like user talk warnings) we should probably assume that they know about this aspect, and therefore should not edit war with them and force the /draft template to be on the page. In other words, there is no requirement for the /draft tag to be on there, so a user doesn't need a "valid reason" to remove it. Primefac (talk) 09:43, 13 January 2023 (UTC)Reply[reply]
@Primefac, fair enough. The bot should no longer add the template more than once (counting only drafts tagged since now). — Qwerfjkltalk 15:05, 13 January 2023 (UTC)Reply[reply]
See, for example, Draft:Império Alviverde (history). — Qwerfjkltalk 10:37, 14 January 2023 (UTC)Reply[reply]

Drafts that are nominated for CSD

For those with admin goggles, see Draft:Greg Romaguera, which was deleted as a BLP attack page. Within seconds of the nomination, the bot added the draft template, but I am not sure it is a good idea to encourage re-submission of drafts that are actively nominated for deletion. Can the bot be coded somehow to skip actively nominated drafts, then do a "catch-up" at some point later in case the nomination is declined? S0091 (talk) 22:18, 22 January 2023 (UTC)Reply[reply]

  • @S0091, I suppose I could, but I don't see the problem. Either it's deleted, so the page content doesn't matter, or it's not, and submitting it is fine. — Qwerfjkltalk 07:24, 23 January 2023 (UTC)Reply[reply]
    I think the issue in this particular scenario may be because the draft is blanked automatically for G10s, so the bot comes along and adds the draft template. It seems odd to encourage re-submission of a potential attack page. However, G10s are usually handled quickly, so it's probably rare that this would occur, so I agree this is likely a non-issue. S0091 (talk) 18:34, 23 January 2023 (UTC)Reply[reply]
    @S0091, what's the template used? — Qwerfjkltalk 18:11, 24 January 2023 (UTC)Reply[reply]
    @Qwerfjkl, {{db-g10}} and its many redirects ask the user to blank the page the template is applied to, which some tools (I don't know which ones off the top of my head) do automatically. casualdejekyll 20:18, 24 January 2023 (UTC)Reply[reply]
    @Casualdejekyll, @S0091, it won't add the draft template if {{db-attack}} is applied, which is what Twinkle adds. — Qwerfjkltalk 20:21, 24 January 2023 (UTC)Reply[reply]
    Or {{db-negublp}}. — Qwerfjkltalk 20:23, 24 January 2023 (UTC)Reply[reply]
  • Trial complete. @ProcrastinatingReader. — Qwerfjkltalk 18:14, 24 January 2023 (UTC)Reply[reply]

Operator: Sheep8144402 (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 16:08, Friday, October 21, 2022 (UTC)

Function overview: Fix any font tag Linter error

Automatic, Supervised, or Manual: automatic, may be supervised to reduce mistakes

Programming language(s): AWB

Source code available: User:SheepLinterBot/1 for regexes, User:SheepLinterBot/1/Signature submissions#Completed per the table

Links to relevant discussions (where appropriate): 1 (especially this) 2 3 4

Edit period(s): varies

Estimated number of pages affected: varies, usually a few hundred to a few thousand for one sig, millions in total

Namespace(s): any applicable namespace that has obsolete font tag linter errors (updated 26 November 2022)

Exclusion compliant (Yes/No): no

Function details: (This BRFA was originally made to fix TWA-related linter errors but was withdrawn (and postponed) because I kinda changed my mind back then.) Fixes any signature with font tag Linter errors that I may request the bot to fix, so the estimated number of pages may vary per sig. The Linter errors it fixes vary depending on what I put in the queue, although I may use regex expressions to try to clear all the other font tag Linter errors at once.

This task replaces MalnadachBot 12, which drew complaints for making many edits to a single page to fix linter errors; you can see here why that bot makes many edits to a single page. Some of the regexes come from here to start, and then I came up with more to minimize the number of font tags being left over after an edit.

Edit as of 22 December 2022: Originally I planned to fix signatures that have other Linter errors (e.g. missing end tags) as well, but because doing that from my main account on base user talk pages triggers the "you have new messages" notification even when the edit is minor, I was going to request approval for the bot to fix those signatures too. Actually, never mind; that is going to be left for MalnadachBot 12. This bot task aims to take over MalnadachBot's fixing of font tags.
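
To illustrate the kind of replacement this task performs, here is a sketch in PHP's PCRE syntax. It is not the actual AWB module or its exact regexes, just the simplest case: an obsolete font tag with a single color attribute, already inside the wikilink label so the colour still renders after the fix.

<?php
// Illustrative only: convert <font color=...> to a span with inline CSS.
$wikitext = '[[User:Example|<font color="red">Example</font>]] 01:23, 1 January 2023 (UTC)';

$pattern     = '/<font +color *= *"?([#\w]+)"? *>(.*?)<\/font>/i';
$replacement = '<span style="color:$1;">$2</span>';

echo preg_replace($pattern, $replacement, $wikitext);
// [[User:Example|<span style="color:red;">Example</span>]] 01:23, 1 January 2023 (UTC)
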

Discussion

  • Comment: The proposed bot operator should probably read all of this long discussion and explain how they intend to address the concerns raised there. – Jonesey95 (talk) 16:31, 21 October 2022 (UTC)Reply[reply]
    Agreed with the above; are you reducing the number of times the bot will edit the page, or will the bot only make a single edit to each page? Primefac (talk) 08:07, 25 October 2022 (UTC)Reply[reply]
    Edit periods will vary because it's at my discretion when to run the bot; these regex expressions will (hopefully) reduce the number of times the bot edits a page. Most often it'll edit only one time, although if there are at least 100 obsolete HTML tags or so, or there is a mix of color, face and size attributes, then it may take a few edits (hopefully not peaking at MalnadachBot's 10+).

    Also note that it may take a few days to build those regexes because in reality I have a lot of work to get done so I may have less time to build those regexes and put them all at once. Sheep (talk) 11:54, 25 October 2022 (UTC)Reply[reply]

    Update: (it's been a while) OK, so I've done a hundred edits using AWB with these regexes. Note that these mostly catch font tags with one attribute only. This may hopefully reduce the number of outstanding font tags to fix. Also if there are non-Latin characters, or the font tags are outside the link, the regexes may not catch those. Edit: I did read that discussion and I'm now coding in some regexes to minimize the number of font tags being left over after each edit. Sheep (talk) 23:10, 21 November 2022 (UTC)Reply[reply]
    This is a strange (non-)response to the comment above. Has the proposed bot operator read that discussion?
    The edit summary should link to a page that explains the Linter errors that are being fixed.
    Notice at this history page that MalnadachBot visited the page twice, Sheep8144402 visited once, and there are still font errors on the page. While I think that it is good that the number of errors was reduced with each edit, editors at the discussion linked above objected to the multiple visits. Maybe the bot could examine the page for font tags after it makes its proposed edits but before saving; if there are font tags remaining, the bot could abandon the edit. – Jonesey95 (talk) 04:53, 22 November 2022 (UTC)Reply[reply]
    Also, that was before I edited the regexes; previously, if there was any other tag inside the font tag, my regexes wouldn't catch it, so I've now edited them so they catch those cases. Sheep (talk) 13:07, 22 November 2022 (UTC)Reply[reply]
    I checked my edits and found that 383/1968 font tags were left over, meaning 19.46% of the font tags were left over. These edits were made using these regexes. When I used these regexes to make another hundred edits, the number of font tags left over was 171/1254, which is 13.64%. I aim to get as close to zero as possible so I can maximize the number of edits the bot will make (assuming the bot skips edits when it can't get all font tags with the regexes at once). Sheep (talk) 00:21, 23 November 2022 (UTC)Reply[reply]
    That sounds like good progress. I looked through the most recent 30 edits, and I see some easy additions that you could make to your regexes. I think that you can get to about 90 or even 95% with a few more iterations. If the bot abandons edits with font tags remaining on the page, that should result in a pretty successful bot that avoids most of the complaints that plagued the hard-working MalnadachBot. – Jonesey95 (talk) 03:06, 23 November 2022 (UTC)Reply[reply]
    I've now coded in regexes so they can catch font tags with at most two attributes; however, they can't catch any font tag inside a font tag, and using the = character between the regex font tags will not work even when escaped. Again I've made a hundred edits with these regexes and found 224/1521 (14.73%) font tags left over. The unusually high % is because 72 of the missed font tags do not have quotation marks, while my regexes were designed to catch font tags with quotation marks. I've edited these regexes to catch those tags, which reduces the % to 9.99%.
    Note that sometimes, my regexes create more Linter errors (example). So it seems my regexes are not 100% accurate. However I will make the bot skip any edits if the page still has font tags to reduce the number of extra edits needed. Sheep (talk) 01:01, 24 November 2022 (UTC)Reply[reply]
    That percentage looks good to me. I recommend to BAG that this bot be allowed to go to trial, with the understanding that if there are font tags remaining on a given page after the bot has applied its regexes, the bot will skip editing those pages entirely. The bot should be fed a selection of pages that result in some skips and some edits. To help with selection of pages: any VPT archive page is likely result in a skip, since they are full of unusual signatures, and many XFD and User Talk pages should result in successful edits, since they tend to be short. – Jonesey95 (talk) 01:57, 24 November 2022 (UTC)Reply[reply]
    Approved for trial (50 edits). Please provide a link to the relevant contributions and/or diffs when the trial is complete. This is on the recommendations of Jonesey95, who I know is on the front lines in this effort. Primefac (talk) 11:16, 28 November 2022 (UTC)Reply[reply]
    @Primefac: Also since this is going to be an AWB bot, mind adding it to the checkpage so it can run? Sheep (talk) 13:20, 28 November 2022 (UTC)Reply[reply]
    Trial complete. (contribs) Please note that some of the replacements weren't correct, so I had to stop the bot during the trial to fix whatever errors it had caused. That's because some replacements weren't updated when I updated the regexes to account for whether or not there's a double quotation mark, so these errors showed up. It had made 20 edits at the time I had to stop. Those replacements have been fixed, so everything should be AOK.
    The bot has checked 78 pages and made 50 edits, so the edit-to-page ratio is about 1:1.56 using this set of regexes; the bot edits ~64.1% of all pages it checks. It is lower when there are many font tags on the page, but higher when most of the pages have just a few. Sheep (talk) 21:16, 28 November 2022 (UTC)Reply[reply]
    Please modify the edit summary so that it links to this BRFA. I examined the last 30 edits in the set (after the regex fix) and did not find any errors. Nice work. I recommend an extended trial. – Jonesey95 (talk) 21:46, 28 November 2022 (UTC)Reply[reply]
    I've now coded in regexes to catch font tags with 4 or 5 digit color hexadecimal codes and font style tags with color, face or size, and to include every character possible except the equals sign (since including it would break the regexes). Also, before, they could catch font tags with either double or no quotation marks; now they can also catch font tags with single quotation marks (apostrophes). I've done another hundred edits with these regexes, but rather than visiting exactly 100 pages (which ends up with a small % of font tags left over), I decided to skip pages when there are still font tags remaining. The bot will have this same behavior.
    The edit-to-page ratio (which shows the number of edits made after checking that many pages) is 1:1.4 (~71.4% of pages edited). So I'm pretty sure that when this gets approved, my bot should be able to fix about 4-5 million obsolete HTML tag linter errors (including font tags not counted by Linter). Sheep (talk) 22:35, 2 December 2022 (UTC)Reply[reply]

I have checked all of the 100 latest edits. There are some errors:

If you have made more such edits from your main account, please fix them. IMO it is okay if the bot skips more pages; the important thing should be that the bot does not replace one error with another. ಮಲ್ನಾಡಾಚ್ ಕೊಂಕ್ಣೊ (talk) 15:00, 3 December 2022 (UTC)Reply[reply]

  • Welp, just doing the search insource:/\<font/ is not ideal then, so maybe we come up with a plan; I have this page where you can submit signatures that (1) the regexes can't catch, or (2) have font tags that wrap wikilinks so the color doesn't render (updated 3 January 2023: just have font tags). And rather than fixing only those particular signature(s), which would cause something similar to this to happen, the regexes will be applied to fix other font tags, and if there are still font tags remaining, the bot will skip that page. Sheep (talk) 15:51, 3 December 2022 (UTC)Reply[reply]
    If possible, adjust your regexes to be more selective about what they choose to modify. It is better to fix 1,000 pages with no errors than to fix 10,000 pages with 400 errors, IMHO. – Jonesey95 (talk) 21:15, 3 December 2022 (UTC)Reply[reply]
    I have now added regexes to catch font tags wrapping links; some of the ideas were from here to start, and the rest were from my regexes above. The regexes have been adjusted to exclude [ and ] so there would be fewer errors like these. Unfortunately this causes the bot to skip instances like these, but it should resolve most of the issues from above. Sheep (talk) 01:05, 4 December 2022 (UTC)Reply[reply]
    Update: OK, it's been a while, but I've now done three hundred edits (instead of a hundred) using these regexes. The reason I did three hundred is to allow more edits to be checked before this bot task actually runs. Unfortunately, if font tags wrap wikilinks, the page may be skipped. However, to ensure that some such pages can still be edited, I have adjusted the regexes to catch font tags that wrap a wikilink and the text around it.
    The edit-to-page ratio in this case is 1:1.653 (~60.5% pages edited). It seems my bot will average about 1:1.5 based on these tests. Sheep (talk) 00:47, 15 December 2022 (UTC)Reply[reply]
    Also those edits were made when regexes are made to ignore hidden comments, images, internal/external wikilinks, math and nowiki. Sheep (talk) 01:46, 18 December 2022 (UTC)Reply[reply]
  • Update: An extended trial may also be used for this since I also requested approval to fix signatures with other Linter errors as well. Originally the plan was that I would fix signatures with other Linter errors myself, but many such signatures appear on User talk pages, which, when I edit them, trigger a notification to the affected users. Some examples for this:
All of these result in 964 user talk pages edited, which will trigger a notification to most of those users, and which is not a good thing for me. The bot will still skip pages with font tags remaining. Edit: The true number is far higher than this, since the User talk namespace has the most obsolete tags of all namespaces. Sheep (talkhe/him) 21:04, 22 December 2022 (UTC)Reply[reply]
Actually, that's gonna be left for MalnadachBot 12. I don't feel like taking over the entire MalnadachBot task 12 since there is already approval for that task. {{BAG assistance needed}} I would like an extended trial to ensure there are no errors; I also may develop regexes to maximize the number of pages edited while avoiding errors. However, there has been no response from BAG in the few weeks since the trial was completed. Just so you know, the whole purpose of my test runs of edits (usually a hundred; this is an example) is to ensure the bot operates correctly when using regular expressions to fix these tags. Sheep (talkhe/him) 21:42, 25 December 2022 (UTC)Reply[reply]
I'm currently listing a sample of signatures on this page. Basically whatever signatures get submitted are processed and listed in a page similar to this. Mean page size is calculated by adding up the kilobytes of these pages and then dividing by the number of pages with the signature. Either an extended trial through the sample or a bunch of random pages is fine for me. Sheep (talkhe/him) 04:24, 2 January 2023 (UTC)Reply[reply]

Trial 2

Approved for extended trial (100 edits). Please provide a link to the relevant contributions and/or diffs when the trial is complete. Primefac (talk) 11:24, 11 January 2023 (UTC)Reply[reply]

Trial complete. (contribs) No errors this time. I used the sample to demonstrate what I mean by this task. While I was going through the sample, I used the ten signatures' replacements along with the regex replacements so that they could all be replaced at the same time. Some points I want to consider while doing this trial:
  • I skipped base user talk pages since the bot account is currently unflagged and editing base user talk pages would trigger a notification to them, which I don't want.
  • The "font style/class" regex is designed to catch every character, but if it has color, the wikilink color wouldn't render. Because of this, I excluded square brackets from the set of characters the regex would check, and put <font class=... as a separate regex; however, that means the bot will skip pages with <font style="background:... wrapping wikilink(s) and similar instances, but will use the font style/class regex for now. Nevertheless, you can get the bot to fix such instances correctly via User:SheepLinterBot/1/Signature submissions. Input a signature like <font style="color:...">[[wikilink target|something]]</font> and the bot will fix the signature correctly ([[wikilink target|<span style="color:...">something</span>]]). updated 14:10, 12 January 2023 (UTC)
  • The first set of regexes that fix font tags is considered twice, before and after processing the page, and the equal sign is now considered in the set of characters to check between font tags, which is why some consecutive font tags were fixed.
  • <b>[[User:Deiz|<FONT STYLE="verdana" COLOR="#000000">Dei</FONT><FONT COLOR="#FF3300">zio</FONT>]]</b> can be caught by regexes and turns into <b>[[User:Deiz|<span style="verdana;color:#000000;">Dei</span><span style="color:#FF3300;">zio</span>]]</b>, but I don't know if that is the correct replacement for the signature since font style="verdana" isn't correct. In earlier times, whenever I stumbled with that signature, I would replace it with <b>[[User:Deiz|<span style="font-family:verdana;color:#000000;">Dei</span><span style="color:#FF3300;">zio</span>]]</b> since "verdana" is a font face and I thought it would be acceptable to use "font-family" CSS as the replacement. And in later times I thought it was not valid so I just replaced it with <b>[[User:Deiz|<span style="color:#000000;">Dei</span><span style="color:#FF3300;">zio</span>]]</b>. I do not know what I should go with.
Otherwise, everything should be AOK. Sheep (talkhe/him) 01:17, 12 January 2023 (UTC)Reply[reply]
For things like User:Deiz's signature, I would replace it with what it looks like currently with valid CSS. Most users just move markup around in their signature till they get something they like. This seems like they were experimenting with adding font family before settling on something that doesn't render it. I would replace it with <b>[[User:Deiz|<span style="color:#000000;">Dei</span><span style="color:#FF3300;">zio</span>]]</b>. The trial looks good otherwise. ಮಲ್ನಾಡಾಚ್ ಕೊಂಕ್ಣೊ (talk) 02:18, 12 January 2023 (UTC)Reply[reply]
As for why the regexes that fix font tags with only a color attribute are used multiple times: each time they are used, they fix only one instance of consecutive font tags. This is an instance of me using the first four regexes seven times to fix font tags.
Regexes work fine most of the time, but there are edge cases where they sometimes don't work properly with the equal sign to check in the character set. They are useful for fixing some consecutive font tags; however when the equal sign is used, not all font tags get replaced. Using the regex \< *font +size *\= *(\"|\'|) *(0|1|1px|-[2-5]) *(\"|\'|) *\>(.+)\<\/ *font *\>, <font size="-2">foo</font>bar<font size="-2">baz</font> gets replaced with <span style="font-size:x-small;">foo</font>bar<font size="-2">baz</span>. The regex was supposed to catch the first closing font tag but it instead went for the second. I do not know why. Sheep (talkhe/him) 01:09, 13 January 2023 (UTC)Reply[reply]
Just a note that it's because I did not make the quantifier lazy. \< *font +size *\= *(\"|\'|) *(0|1|1px|-[2-5]) *(\"|\'|) *\>(.+)\<\/ *font *\> turns <font size="-2">foo</font>bar<font size="-2">baz</font> into <span style="font-size:x-small;">foo</font>bar<font size="-2">baz</span>, but \< *font +size *\= *(\"|\'|) *(0|1|1px|-[2-5]) *(\"|\'|) *\>(.+?)\<\/ *font *\> turns <font size="-2">foo</font>bar<font size="-2">baz</font> into <span style="font-size:x-small;">foo</span>bar<span style="font-size:x-small;">baz</span>. However, that regex is now edited to include <font size="1.5">foo</font> and similar; it is now \< *font +size *\= *(\"|\'|) *([0-1]|1px|[0-1]\.[0-9]*|-[2-5]) *(\"|\'|) *\>(.+?)\<\/ *font *\>. Problem solved. However that means it can't fix font tags inside and outside wikilink(s) properly unless I reorder the first set of regexes, which I did; font tag regexes with two attributes go first.
There should be no instance of this happening since the bot would skip pages that still have font tag linter errors. In case you don't know, the order for fixing font tags goes as follows: signature replacements, then regex replacements, then the first four regexes six more times. Sheep (talkhe/him) 17:05, 13 January 2023 (UTC)Reply[reply]
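
For anyone following along, the greedy-versus-lazy behaviour described above can be reproduced outside AWB; here is a quick sketch using PHP's PCRE (the same patterns as quoted above, with the escapes trimmed to what PCRE needs – illustrative only, not the bot's AWB module):

<?php
$input       = '<font size="-2">foo</font>bar<font size="-2">baz</font>';
$replacement = '<span style="font-size:x-small;">$4</span>';

// Greedy (.+) swallows everything up to the LAST </font>:
$greedy = '/< *font +size *= *("|\'|) *(0|1|1px|-[2-5]) *("|\'|) *>(.+)<\/ *font *>/i';
echo preg_replace($greedy, $replacement, $input), "\n";
// <span style="font-size:x-small;">foo</font>bar<font size="-2">baz</span>

// Lazy (.+?) stops at the FIRST </font>, so both tags get fixed:
$lazy = '/< *font +size *= *("|\'|) *(0|1|1px|-[2-5]) *("|\'|) *>(.+?)<\/ *font *>/i';
echo preg_replace($lazy, $replacement, $input), "\n";
// <span style="font-size:x-small;">foo</span>bar<span style="font-size:x-small;">baz</span>
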
Also noting that I have now coded regexes to fix font tags with color, face and size. Hopefully that can further increase the edits % on average. Sheep (talkhe/him) 17:32, 13 January 2023 (UTC)Reply[reply]
I've done three hundred edits once again, and I will now compare the two tests with the edit-to-page ratio, which measures the % of checked pages that get edited.
Notice in the comparison that the ratio has increased closer to 1:1. It is not possible to get it to exactly 1:1 (that would require a perfect set of regexes), although I will still code regexes to catch more font tags. I did three hundred rather than one hundred so the ratios could be more accurate. Unfortunately, this will be the last test before the bot task is approved. (Update as of 14:51, 17 January 2023 (UTC): I am going to do the very last test of 300 edits in the upcoming hours, since as of this post I am in high school right now, so I cannot use AWB during my school hours.)
I would like to point out one thing that would make a page harder to read. When using the second set of regexes to fix font tags, for some reason when there's already the same tag in the wikilink, another identical tag would be added. For example, <font color="red">([[foo|<font color="red">bar</font>]])</font> would be replaced with <span style="color:red;">([[foo|<span style="color:red;"><span style="color:red;">bar</span></span>]])</span>. I skipped the page containing it, though it still counted toward the ratio for accuracy reasons. Sheep (talkhe/him) 20:20, 15 January 2023 (UTC)Reply[reply]
While this BRFA is open I will continue to develop regexes that fix more font tags and skip fewer pages while keeping the error rate as low as possible. Before implementing them in AWB, I test the regexes by using a fake signature on another website. In the meantime, since there are no errors in the extended trial, this can be approved, and then the process of fixing font tags can begin. Or, you can approve this for one last extended trial, with a mix of random pages and the sample. updated 14:51, 17 January 2023 (UTC) Sheep (talkhe/him) 02:26, 13 January 2023 (UTC)Reply[reply]

Comparison of three tests of 300 edits

The very last test of 300 edits before this bot task is approved is now complete. Here are the results:

Test of 300 edits    Regexes used                    Edits % (ratio)
15 December 2022     Special:Permalink/1127480757    60.5% (1:1.653)
15 January 2023      Special:Permalink/1133798228    65.6% (1:1.525)
18 January 2023      Special:Permalink/1134207566    63.1% (1:1.585)

1 page was skipped due to characters in the Unicode Private Use Area, and 1 page was skipped due to not having font tags (there was a false positive when trying to get pages with font tags), so they had to be discounted in the ratio. Also, I had to manually skip one page due to two consecutive span tags in a wikilink when trying to fix font tags wrapping one wikilink and text around it. Apparently with the font style/class regex, a signature ended up getting replaced with span tags outside a wikilink. So either I have to make it two separate regexes, or you can submit the signature to my submission page so the bot can get the fix correct.

The regexes are made to ignore external/interwiki links, images, nowiki, math and hidden comments. To gauge how strong my regexes are, I use one of two measures depending on the scenario. If the bot were made not to skip pages with remaining font tags, I would use the font-tag percentage; since it will skip such pages, I use the edit-to-page ratio. Skipping is preferable given other editors' complaints about MalnadachBot making many edits to a single page to fix font tags, it creates fewer errors when editing pages, and the ratio is also easier to calculate. After the last test of 300 edits, this BRFA can either move on to one final extended trial, with half of the edits made on random pages and the other half from the sample, or go straight to approval. Sheep (talk · he/him) 13:29, 18 January 2023 (UTC)
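As a point of reference, here is a small worked example of how the edit percentage and the edit-to-page ratio relate; the page counts are illustrative, chosen only to roughly reproduce the 15 December 2022 row in the table above:

# Illustrative counts only -- not taken from the trial logs.
pages_edited = 300
pages_skipped = 196
pages_checked = pages_edited + pages_skipped

edit_percentage = 100 * pages_edited / pages_checked   # ~60.5%
edit_to_page_ratio = pages_checked / pages_edited      # ~1.653, i.e. 1:1.653
print(f"{edit_percentage:.1f}% (1:{edit_to_page_ratio:.3f})")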

A user has requested the attention of a member of the Bot Approvals Group. Once assistance has been rendered, please deactivate this tag by replacing it with {{t|BAG assistance needed}}. No edit by BAG for seven and a half days. Sheep (talk · he/him) 00:12, 20 January 2023 (UTC)

Operator: Theleekycauldron (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 11:34, Tuesday, December 13, 2022 (UTC)

Function overview: Updates pages within Wikipedia:Did you know/Statistics and notifies DYK nominators when their hooks perform exceptionally well

Automatic, Supervised, or Manual: Automatic

Programming language(s): Python

Source code available: yes

Links to relevant discussions (where appropriate): None obtained, but I've been doing this semi-automatically for about a year and no one's objected. Also, it's not a very controversial task

Edit period(s): Daily

Estimated number of pages affected: 20 pages per month, with the exception of one-time full-sweeps

Namespace(s): Wikipedia, User talk

Exclusion compliant (Yes/No): Not yet

Function details: After hooks appear on Template:Did you know, they are archived to Wikipedia:Recent additions. From there, editors used to add entries to Wikipedia:Did you know/Statistics by hand, somewhat less than comprehensively, to log the best-performing hooks. With the addition of a semi-automatic script and standardized templates about a year ago, updating Wikipedia:Did you know/Statistics/Monthly DYK pageview leaders has become much easier, but it's something I should probably be running a bot for instead of clicking a button every day. I've finally rewritten the code in Python, and it's actually better than the original! It works like this (a rough illustrative sketch follows the list):

  1. Every day, at around 03:00 UTC, it goes through Wikipedia:Recent additions.
  2. For each hook on the page, it creates a Hook object containing the start and end time, the text of the hook, and the image file. From there, the Hook object runs a method to calculate its own pageview data.
  3. For each Hook object, a function renders the objects using the {{DYK stats table}} templates, creating a statistics table of all the hooks and their pageviews, which the bot writes to the stats page. An example of this can be found in the second table at Wikipedia:Did you know/Statistics/Monthly DYK pageview leaders.
  4. Another function analyzes the pageview data from the set of Hook objects and creates a summary table with basic information such as the low, high, median, and the percentage of hooks that clear a certain bar (600 views per hour for non-imaged hooks and 1,000 views per hour for imaged hooks). An example of this table can be found at the top of Wikipedia:Did you know/Statistics/Monthly DYK pageview leaders, as well as in the lines of Wikipedia:Did you know/Statistics/Monthly summary statistics.
  5. Finally, for each Hook object that passes the threshold, the bot uses a search to find anyone given a DYK credit for the hook and gives them a {{DYK views}} template, which just gives them a nice compliment and a link to their hook on the stats page.
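The following is a rough, self-contained Python sketch of the workflow just described. It is not the bot's actual source; the class, method, and field names are hypothetical, and only the thresholds come from step 4 above:

from dataclasses import dataclass
from datetime import datetime
from statistics import median

# Thresholds from step 4: views per hour needed for a hook to earn a compliment.
VPH_BAR_IMAGED = 1000
VPH_BAR_NON_IMAGED = 600

@dataclass
class Hook:
    text: str                  # wikitext of the hook, as archived at Recent additions
    start: datetime            # when the hook went up on Template:Did you know
    end: datetime              # when it was rotated off
    image: str | None = None   # image file, if the hook ran with one
    views: int = 0             # article pageviews accumulated during the run (step 2)

    def views_per_hour(self) -> float:
        hours = (self.end - self.start).total_seconds() / 3600
        return self.views / hours if hours else 0.0

    def clears_bar(self) -> bool:
        bar = VPH_BAR_IMAGED if self.image else VPH_BAR_NON_IMAGED
        return self.views_per_hour() >= bar

def summary_stats(hooks: list[Hook]) -> dict:
    # Step 4: low, high, median, and the share of hooks clearing the bar.
    vph = sorted(h.views_per_hour() for h in hooks)
    return {
        "low": vph[0],
        "high": vph[-1],
        "median": median(vph),
        "percent over bar": 100 * sum(h.clears_bar() for h in hooks) / len(hooks),
    }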

theleekycauldron (talk · contribs) (she/her) 23:39, 16 December 2022 (UTC)

Discussion

  • {{BAG assistance needed}} Forgive me for this buzz, but it seems like this nomination has languished without the attention of a BAG member for a bit too long. The template said to wait a week; it's been, I believe, 8 days, and most of the current BRFAs are getting an initial comment from a BAG member on the day of filing or the day after. That includes a bot in the same field, DYK, that was sent to trial almost immediately. Is there something I'm doing wrong? theleekycauldron (talk · contribs) (she/her) 05:49, 22 December 2022 (UTC)
  • @Theleekycauldron: Approved for trial (7 days). Please provide a link to the relevant contributions and/or diffs when the trial is complete. --TheSandDoctor Talk 23:44, 25 December 2022 (UTC)
  • Thanks, TheSandDoctor! Trial complete. :) Okay, breaking down the edits made, because there were a lot of 'em:
  • Truly sorry about the edit volume! Let me know if there's anything more specific you want to know. theleekycauldron (talk · contribs) (she/her) 05:56, 2 January 2023 (UTC)
  • Oh, and there's an ongoing bug Gerda Arendt has been telling me about – a page got moved during its DYK run, and the script isn't currently set up to count that correctly. I may implement some kind of system for that at a later time. theleekycauldron (talk · contribs) (she/her) 05:58, 2 January 2023 (UTC)
    I see no reason for the bot to take care of that special condition, which simply should not happen. I warned people the very day it last happened, seeing it coming, and sure enough nobody listened. I should change my user name to Cassandra ;) - Private question: if the minimum for stats is 12 × 1,000 = 12,000, why didn't I get the nice compliment for Talia Or? What about stats for the image as well, in relation? ('cause I'm sure in her case the hook didn't matter at all.) For more fine-tuning, you could set the results in relation to the views of the Main page on that specific day, which varies, as seen on User talk:Dank, with a low over the Christmas days and a spike afterwards (which I had also predicted). Happy new year! --Gerda Arendt (talk) 06:35, 2 January 2023 (UTC)
    Hah! Cassandra indeed :) The answer to your question is that Talia Or had a 24-hour run, not a 12-hour run, so the minimum bar would be 24,000 (for a consistent average of 1,000 views per hour). Or scored 545.7 vph, which is about average for images in December 2022. Happy new year! theleekycauldron (talk · contribs) (she/her) 06:48, 2 January 2023 (UTC)
    As it turns out (courtesy ping to @Gerda Arendt), if you've thumbed through the number of views on the Main Page over time, you'll see weird jumps and falls – not short spikes, but sudden, lasting changes of millions of views per day. Those are likely caused by web scrapers or the like, and aren't easily adjusted for when calculating DYK views. theleekycauldron (talk · contribs) (she/her) 06:50, 2 January 2023 (UTC)

Operator: Legoktm (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 01:55, Thursday, September 8, 2022 (UTC)

Function overview: Semi-protect TFAs

Automatic, Supervised, or Manual: Automatic

Programming language(s): Rust

Source code available: [5]

Links to relevant discussions (where appropriate):

Edit period(s): Daily

Estimated number of pages affected: 1 per day

Namespace(s): mainspace

Exclusion compliant (Yes/No): No

Adminbot (Yes/No): Yes

Function details: Note: this task has only been approved for a 30-day trial, at which point, as I understand it, it would need further community consensus to keep running. The logic is as follows (an illustrative sketch of the decision logic follows the list):

  • This is fully independent of the move protection the bot already applies
  • At 23:00 UTC, get the next day's TFA (following any redirect)
  • Get edit protection status:
    • If protection is indefinite, do nothing
    • If protection expires after the article is off TFA, do nothing
    • If protection expires before the article is off TFA, extend the current protection until it is off TFA (keeps existing protection level)
    • If there is no protection, apply semi-protection until it is off TFA
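A minimal Python sketch of that decision logic, for illustration only: the actual bot is written in Rust, and the function and parameter names here are hypothetical.

from datetime import datetime

def plan_expiry(current_expiry: datetime | None, indefinite: bool,
                off_tfa: datetime) -> datetime | None:
    # Returns the protection expiry to set, or None if nothing needs to change.
    # off_tfa is the moment the article comes off Today's Featured Article.
    if indefinite:
        return None                 # already protected indefinitely, do nothing
    if current_expiry is not None and current_expiry >= off_tfa:
        return None                 # existing protection outlasts the TFA run
    # Otherwise: extend the existing protection (keeping its level), or apply
    # semi-protection if the page is currently unprotected, until it is off TFA.
    return off_tfa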

I ran a simulation of the next 30ish TFAs (full logs): here's an example of a page that has no edit protection:

INFO tfa_semi_prot: 55 Wall Street needs protection!
INFO tfa_semi_prot: Protection options: [["action","protect"],["title","55 Wall Street"],["protections","edit=autoconfirmed|move=sysop"],["expiry","2022-09-12T00:00:00Z|2022-09-12T00:00:00Z"],["reason","Upcoming TFA ([[WP:BOT|bot protection]])"]]

And here's an example of a page that already has semi-protection but needs it extended (an illustrative translation of this request follows the log):

INFO tfa_semi_prot: A.C. Monza needs protection to be extended!
INFO tfa_semi_prot: Protection options: [["action","protect"],["title","A.C. Monza"],["protections","edit=autoconfirmed|move=sysop"],["expiry","2022-09-21T00:00:00Z|2022-09-21T00:00:00Z"],["reason","Upcoming TFA ([[WP:BOT|bot protection]])"]]
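For illustration, the logged parameters above correspond to a standard MediaWiki action=protect API call. A hedged Python equivalent might look like the sketch below; the real bot issues the request from Rust, and the API URL and CSRF token handling here are only assumed placeholders:

import requests

def protect_upcoming_tfa(api_url: str, csrf_token: str) -> None:
    # Mirrors the "Protection options" logged above for A.C. Monza.
    requests.post(api_url, data={
        "action": "protect",
        "title": "A.C. Monza",
        "protections": "edit=autoconfirmed|move=sysop",
        "expiry": "2022-09-21T00:00:00Z|2022-09-21T00:00:00Z",
        "reason": "Upcoming TFA ([[WP:BOT|bot protection]])",
        "token": csrf_token,
        "format": "json",
    })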

Discussion

Notifications: Wikipedia_talk:Today's_featured_article#TFA_bot_semi-protection, @Hog Farm, Sdkb, ProcrastinatingReader, SD0001, and Peacemaker67:. Legoktm (talk) 02:09, 8 September 2022 (UTC)

Thanks very much, Legoktm. I can't speak for the code, but your efforts to operationalise this for the trial period are greatly appreciated. Regards, Peacemaker67 (click to talk to me) 03:46, 8 September 2022 (UTC)
Thank you as well. This was very much needed. Hog Farm Talk 13:47, 8 September 2022 (UTC)

The RfC closer approved a 30-day trial, after which we would evaluate how well it went, presumably culminating in another RfC. To do that, we need a mostly equivalent 30-day period to compare against. I'm not sure we can look to the previous month, since it could be affected by seasonal events (e.g. vandalism goes down when school starts), nor to the same period in the previous year (COVID, etc.). One idea I had last night was to run the trial over the next 60 days, only semi-protecting every other day. I think that would give us a reasonable sample of data to compare and to evaluate the effectiveness of the protection. Legoktm (talk) 16:35, 8 September 2022 (UTC)

That sounds reasonable. Hog Farm Talk 20:58, 8 September 2022 (UTC)
Every other day over 60 days sounds reasonable to me. Maybe drop a note saying this at Wikipedia_talk:Today's_featured_article so interested parties are aware? If no objections are forthcoming then I think it's good to proceed with that plan. ProcrastinatingReader (talk) 13:22, 10 September 2022 (UTC)
Done. Legoktm (talk) 06:02, 11 September 2022 (UTC)

{{BotTrial}} Trial for 30 days of bot protection, done every other day, as discussed above. Thanks Legoktm, let me know how it goes. ProcrastinatingReader (talk) 18:47, 16 September 2022 (UTC)

Great! Set the cron for 0 23 */2 * * and a calendar reminder to turn it off in mid-November. Legoktm (talk) 06:36, 18 September 2022 (UTC)
Comment: This trial seems to be largely working, and I support the idea. But it seems that lately there has been an LTA vandalizing TFAs with autoconfirmed accounts, which results in ECP. wizzito | say hello! 23:49, 14 October 2022 (UTC)

{{bot trial complete}} I've turned off the semi-protecting job. For next steps, we need to finish the data collection/analysis that was started at User:TFA Protector Bot/Semi-protection trial (I will aim to make some time in the next few days to start updating that again). Then hold an RfC for discussion on the long-term future of this task. Legoktm (talk) 20:06, 17 November 2022 (UTC)

Sounds good; thanks! {{u|Sdkb}} talk 22:13, 17 November 2022 (UTC)

On hold. Marking this as On Hold for the duration of the RFC. Feel free to disable the template once the RFC has happened. Headbomb {t · c · p · b} 04:00, 18 November 2022 (UTC)


Approved requests

Bots that have been approved for operations after a successful BRFA will be listed here for informational purposes. No other approval action is required for these bots. Recently approved requests can be found here, while old requests can be found in the archives.



Denied requests

Bots that have been denied for operations will be listed here for informational purposes for at least 7 days before being archived. No other action is required for these bots. Older requests can be found in the Archive.

Expired/withdrawn requests

These requests have either expired, because information requested of the operator was not provided, or been withdrawn. These tasks are not authorized to run, but such lack of authorization does not necessarily follow from a finding as to merit. A bot that, having been approved for testing, was not tested by an editor, or one for which the results of testing were not posted, would, for example, appear here. Bot requests should not be placed here if there is an active discussion ongoing above. Operators whose requests have expired may reactivate their requests at any time. The following list shows recent requests (if any) that have expired, listed here for informational purposes for at least 7 days before being archived. Older requests can be found in the respective archives: Expired, Withdrawn.