Add a dashboard for training Akismet on blocked edits



3 years ago


(Reporter: jwhitlock, Assigned: jwhitlock)




(Whiteboard: [specification][type:feature])



3 years ago
What problem would this feature solve?
When Akismet incorrectly identifies an edit as spam (a "false positive" in spam-fighting terminology), there is no way to tell Akismet that it was incorrect. A method is needed so that staff can train Akismet and reduce the false positive rate.

Who has this problem?
All contributors to MDN

How do you know that the users identified above have this problem?
Non-staff contributors have had valid edits blocked by the spam check, and have told us about it. From February 18th to March 10th, Akismet blocked 324 contributions and allowed 6174. Some portion of the 324 blocked contributions are false positives.

How are the users identified above solving this problem now?
Some complain to the mdn-admins list [1], as suggested by the message they get for blocked contributions. Others have opened bugs like bug 1254180 to request edits.

[1] (archives are private to list members)

Do you have any suggestions for solving the problem? Please explain in detail.
When Akismet marks a revision as spam, store information about the attempted commit in the database, in the DocumentSpamAttempt table.  Create a dashboard so that MDN staff can review the details, and either confirm Akismet's decision or identify the change as "Ham".  Send the "Ham" results to Akismet to further train spam detection and reduce future false positives.

The spam submission information will contain sensitive information, such as IP addresses. The dashboard should be restricted to staff, and the data deleted 1) when a staff member has confirmed or denied the decision, and 2) after a short time period.  It should be omitted from any anonymized databases.

In order to get it deployed quickly, the first version will show the raw content as submitted, and not a diff against the previous content.  Interface improvements can be made in future iterations.

Is there anything else we should know?
WordPress uses Akismet for checking comments. The official Akismet plugin [2] stores all comments in the database, and marks those that Akismet identified as spam. The blog admin can then review the comments and decide whether a comment is legitimate (false positive) or should have been marked as spam (false negative). The admin's decision is sent to Akismet for training, and the blog's displayed contents are updated accordingly.

Our use case, checking shared wiki content, is different from blog comments. We can train Akismet in order to reduce the false positive rate, but it is hard or impossible to re-submit the content to get it to display on MDN as if it had never been blocked. We will have to ask the user to attempt to resubmit their edit.



3 years ago
Blocks: 1188029
Keywords: in-triage


3 years ago
Assignee: nobody → jwhitlock
Commits pushed to master at
bug 1255609 - Enable training on false positives

Add additional data to wiki's DocumentSpamAttempt:
* data - JSON-serialized data sent to Akismet
* review - Staff review status (default 'Needs Review')
* reviewer - Staff reviewer
* reviewed - Review date

Old DocumentSpamAttempts do not have this additional data, so their
status is set to "Review Unavailable".
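
A minimal sketch of how the new fields and the fallback for old records might fit together. It uses a plain dataclass rather than the actual Django model. The field names follow the list above; the class name, the constant names, and the initial_review helper are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Status labels quoted in the commit message; the constant names are illustrative.
NEEDS_REVIEW = "Needs Review"
REVIEW_UNAVAILABLE = "Review Unavailable"


@dataclass
class SpamAttempt:
    """Stand-in for a DocumentSpamAttempt row (not the real model)."""
    title: str
    slug: str
    data: Optional[str] = None            # JSON-serialized data sent to Akismet
    review: str = NEEDS_REVIEW            # staff review status
    reviewer: Optional[str] = None        # staff reviewer
    reviewed: Optional[datetime] = None   # review date


def initial_review(data: Optional[str]) -> str:
    """Pick a status: records without stored data cannot be reviewed."""
    return NEEDS_REVIEW if data else REVIEW_UNAVAILABLE
```

An old record would then be labeled with initial_review(None), since there is nothing to send back to Akismet for it.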

Update the Django admin for DocumentSpamAttempt so that it works better
as a dashboard for second-guessing Akismet:
* Limit the length of title and slug on the list display
* Allow filtering by "Needs Review", to treat the admin as a review
* Add an editable dropdown to quickly review multiple edits from known
* When a review of "Spam" is chosen, it confirms Akismet's choice, and no
  further API call is needed.
* When a review of "Ham / False Positive" is chosen, submit the
  original data to Akismet's submit_ham API endpoint. On failure, revert
  to unreviewed status so it can be tried again.
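
The two review outcomes above could be sketched roughly as follows. This is not the code from the pull request: the Attempt class and apply_review function are hypothetical, and the Akismet call is passed in as a callable (submit_ham) so the sketch makes no assumptions about the client library in use.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Optional

NEEDS_REVIEW = "Needs Review"
HAM = "Ham / False Positive"
SPAM = "Spam"


@dataclass
class Attempt:
    """Hypothetical stand-in for a DocumentSpamAttempt row."""
    data: str
    review: str = NEEDS_REVIEW
    reviewer: Optional[str] = None
    reviewed: Optional[datetime] = None


def apply_review(attempt: Attempt, choice: str, reviewer: str,
                 submit_ham: Callable[[str], bool]) -> bool:
    """Record a staff review; return True if the review was stored."""
    if choice not in (HAM, SPAM):
        raise ValueError("unknown review choice: %r" % choice)
    if choice == HAM:
        # Only a false positive needs to be reported back to Akismet.
        try:
            ok = bool(submit_ham(attempt.data))
        except Exception:
            ok = False
        if not ok:
            # Revert to unreviewed so the submission can be tried again.
            attempt.review = NEEDS_REVIEW
            attempt.reviewer = None
            attempt.reviewed = None
            return False
    # "Spam" confirms Akismet's choice, so no API call is made.
    attempt.review = choice
    attempt.reviewer = reviewer
    attempt.reviewed = datetime.now(timezone.utc)
    return True
```

Injecting the API call keeps the revert-on-failure path easy to exercise without network access.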

Add a management command and task to drop submission data for old
records, saving database resources and removing unneeded PII.
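
As an illustration of the cleanup step, the sketch below clears the stored submission data on records older than a cutoff. The 30-day retention period, the Attempt class, and the drop_old_data function are assumptions for the example; the commit message does not state the actual period or names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, Optional

RETENTION_DAYS = 30  # assumed value, not taken from the commit


@dataclass
class Attempt:
    """Hypothetical stand-in for a DocumentSpamAttempt row."""
    created: datetime
    data: Optional[str]


def drop_old_data(attempts: Iterable[Attempt], now: datetime,
                  days: int = RETENTION_DAYS) -> int:
    """Clear submission data on records older than the cutoff.

    Returns the number of records that were changed.
    """
    cutoff = now - timedelta(days=days)
    dropped = 0
    for attempt in attempts:
        if attempt.data is not None and attempt.created < cutoff:
            attempt.data = None  # removes the stored PII, keeps the row
            dropped += 1
    return dropped
```

The row itself is kept, so counts of blocked edits remain available after the sensitive data is gone.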

Update the anonymization scripts to drop submission data.
Merge pull request #3816 from mozilla/spam_dashboard_1255609

bug 1255609 - Enable training on false positives


Comment 2

3 years ago
A few issues are apparent from testing in staging:

* The Document column in the list view is too wide to fit in a desktop browser, and needs to be limited
* Reviewing a spam attempt sends a second email to the mailing list

I'll do this work against this bug before closing it.
Commits pushed to master at
bug 1255609 - Don't send email on spam review

Only send an email when a DocumentSpamAttempt is created (when a
contributor's edit is blocked as spam), instead of when the
DocumentSpamAttempt is updated (such as when a reviewer submits an edit
as ham).  Also, reuse the constance config EMAIL_LIST_FOR_FIRST_EDITS.
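
The create-versus-update distinction can be sketched as below. Django's post_save signal passes a created flag to its handlers, and this sketch mirrors that flag in a plain function; the function name, the subject line, and the send_mail callable are hypothetical.

```python
from typing import Callable, List


def notify_on_save(title: str, created: bool, recipients: List[str],
                   send_mail: Callable[[str, List[str]], None]) -> bool:
    """Send a notification only when the record was just created."""
    if not created or not recipients:
        # Updates (such as a staff review) must not trigger a second email.
        return False
    send_mail("Edit blocked as spam: %s" % title, recipients)
    return True
```

Here recipients stands in for the address list read from the constance config.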
bug 1255609 - Doc in DocumentSpamAttempt admin

In the DocumentSpamAttempt admin list, shorten the document name so that
there are no more than 25 characters before a line break. This will help
make it fit well in a desktop browser.
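
One way to get at most 25 characters per line is to split the name into fixed-width chunks, as in this rough sketch. The function name and the newline separator are assumptions; the admin would presumably emit an HTML line break instead.

```python
def break_every(text: str, width: int = 25, sep: str = "\n") -> str:
    """Insert a separator so no run is longer than `width` characters."""
    if width < 1:
        raise ValueError("width must be at least 1")
    chunks = (text[i:i + width] for i in range(0, len(text), width))
    return sep.join(chunks)
```

Fixed-width chunks are used rather than word wrapping because document slugs usually contain no whitespace to break on.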
bug 1255609 - Rename to EMAIL_LIST_SPAM_WATCH

Rename the constance config "EMAIL_LIST_FOR_FIRST_EDITS" to the more
generic "EMAIL_LIST_SPAM_WATCH". Production and staging have not
customized it, so this should be a safe change.
Merge pull request #3825 from mozilla/spam_dashboard_fixes_1255609

bug 1255609 - Fixes for spam dashboard

Comment 4

3 years ago
Changes are deployed to production. I suggest that future changes be incorporated into an interface that isn't the Django admin.
Last Resolved: 3 years ago
Resolution: --- → FIXED