Closed Bug 1248476 Opened 8 years ago Closed 8 years ago

Train Akismet using historical spam

Categories

(developer.mozilla.org :: Security, enhancement)

enhancement
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: wbamberg, Assigned: jezdez)

Details

(Keywords: in-triage)

Akismet likes us to train it by feeding it known spam. Since we've been deleting spammy pages for over a year, and have (mostly, I hope) been writing "spam" in the revision comment, perhaps we could use this corpus of known spam.

To do this we'd need to:
* somehow retrieve all pages deleted over, say, the last year
* that were marked as "spam" in the deletion comment
* somehow feed this to Akismet
Restricting this to pages marked with "spam" in the deletion comment is a good idea. Templates/macros are currently wiki pages as well, and we have deleted many of them, so be sure not to submit deleted pages that have "Template:" in their slugs to Akismet.
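The selection rule discussed here (deletion comment mentions "spam", slug does not contain "Template:") could be sketched roughly as follows. This is only an illustration: the `DeletedPage` record and its field names are hypothetical and are not taken from the Kuma code base, and a plain substring match would also catch comments such as "not spam", so the result still needs a manual check.

```python
from dataclasses import dataclass


@dataclass
class DeletedPage:
    """Hypothetical record for a deleted wiki page (not Kuma's real model)."""
    slug: str
    title: str
    deletion_comment: str
    content: str


def is_spam_candidate(page):
    """Return True if the page looks like it was deleted as spam."""
    # Templates/macros are wiki pages too; never submit them.
    if "Template:" in page.slug:
        return False
    # Case-insensitive substring match on the deletion comment.
    return "spam" in page.deletion_comment.lower()


def select_spam_candidates(pages):
    """Keep only the pages that pass the spam-candidate rule."""
    return [page for page in pages if is_spam_candidate(page)]
```

Printing the titles of the selected pages would give the list to scan by eye before anything is sent to Akismet.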
Yes, there are definitely other reasons we delete things, and it would be bad to feed it ham by mistake. I don't know about anyone else, but I'm pretty consistent in using "spam" as the deletion comment for spam pages.

If we were to get a list of deleted pages whose comment was "spam", we could manually scan the titles to confirm that they are all spam. Most of our spammy pages have very spammy titles.
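Once a list has been confirmed by hand, each page could be reported through Akismet's submit-spam call. The sketch below only builds the HTTP request and does not send it. It assumes the key-in-hostname form of the Akismet REST endpoint and the `blog`, `user_ip` and `comment_content` parameters as I understand them from Akismet's API documentation; the function name and arguments are made up for illustration, and whether an IP address is available for historical revisions is also an assumption.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_submit_spam_request(api_key, blog_url, content, user_ip, user_agent=""):
    """Build (but do not send) a submit-spam request for one deleted page.

    All parameter names on the Python side are illustrative. The form
    fields follow Akismet's documented API as assumed in the lead-in.
    """
    # Assumed endpoint shape: the API key is part of the host name.
    url = "https://%s.rest.akismet.com/1.1/submit-spam" % api_key
    fields = {
        "blog": blog_url,            # site the content was posted on
        "user_ip": user_ip,          # IP of the original author, if known
        "comment_content": content,  # the spammy page content
    }
    if user_agent:
        fields["user_agent"] = user_agent
    body = urlencode(fields).encode("utf-8")
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    return Request(url, data=body, headers=headers, method="POST")
```

Passing the returned object to `urllib.request.urlopen` would perform the actual submission; keeping construction separate from sending makes it easy to do a dry run over the whole list first.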
I also use "spam" most of the time and "spammer" when banning an account.
Severity: normal → enhancement
Keywords: in-triage
Assignee: nobody → jezdez
Status: NEW → ASSIGNED
PR with a management command we can use: https://github.com/mozilla/kuma/pull/3790
This is now merged, waiting for another dev to be around to do a deploy.
I think this should be marked as RESOLVED/FIXED
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
For bugs that are resolved, we remove the security flag. These haven't had their flag removed, so I'm removing it now.
Group: websites-security