Closed Bug 1124358 Opened 9 years ago Closed 9 years ago

[spam] Experiment with Akismet

Tracking

(Not tracked)

Status:

RESOLVED FIXED

People

(Reporter: hoosteeno, Assigned: hoosteeno)

References

Details

(Whiteboard: [patchwelcome][difficulty=expert])

Justin Crawford [:hoosteeno] [:jcrawford]

Assignee

Description

•

9 years ago

There is a proposal to apply Akismet to some or all content submissions on the MDN wiki. That will be an expensive project to undertake, so we should first run an experiment to learn whether Akismet can help distinguish good content from bad on MDN.

Suggested experiment:

Build an application, run either manually or persistently, that will:
* Look for new revisions (perhaps using the revisions feed https://developer.mozilla.org/en-US/docs/feeds/atom/revisions)
* Post the body, title, and email address associated with each new revision to the Akismet API (http://akismet.com/development/api/#comment-check)
* Capture the response and store it along with the revision's URL
* Visually compare the outcome of Akismet scans to the manual triage underway (over the course of a week) to learn whether Akismet can viably replace human triage
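For illustration only, here is a minimal Python sketch of the "post a revision, capture the response" steps above. It is not the tool used for this bug. The `Revision` shape, the function names, and the choice to prepend the title to `comment_content` are assumptions made for the example; the endpoint form and parameter names follow the Akismet comment-check documentation as I understand it and should be verified against the current docs before use.

```python
from dataclasses import dataclass
from typing import Callable
from urllib import parse, request


@dataclass
class Revision:
    """Hypothetical shape of one MDN revision; field names are illustrative."""
    url: str
    title: str
    body: str
    author_email: str
    user_ip: str


def build_payload(rev: Revision, site_url: str) -> dict:
    """Map a revision onto Akismet comment-check parameters."""
    return {
        "blog": site_url,
        "user_ip": rev.user_ip,
        "permalink": rev.url,
        "comment_type": "comment",
        "comment_author_email": rev.author_email,
        # Assumption: comment-check has no title field, so the title is
        # prepended to the body.
        "comment_content": f"{rev.title}\n\n{rev.body}",
    }


def http_post(api_key: str, payload: dict) -> str:
    """POST form-encoded data to comment-check and return the raw body."""
    endpoint = f"https://{api_key}.rest.akismet.com/1.1/comment-check"
    data = parse.urlencode(payload).encode("utf-8")
    with request.urlopen(request.Request(endpoint, data=data), timeout=10) as resp:
        return resp.read().decode("utf-8").strip()


def check_revision(
    rev: Revision,
    site_url: str,
    api_key: str,
    post: Callable[[str, dict], str] = http_post,
) -> dict:
    """Return the revision URL with Akismet's verdict ("true" means spam)."""
    body = post(api_key, build_payload(rev, site_url))
    if body not in ("true", "false"):
        raise ValueError(f"unexpected Akismet response: {body!r}")
    return {"url": rev.url, "is_spam": body == "true"}
```

The `post` argument is injectable so the verdict handling can be exercised without network access or a real API key; the stored dictionaries could then be written to a spreadsheet next to the human triage result.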

The tool coming out of this experiment might itself be useful to human triagers.

Māris Fogels [:mars] (please needinfo)

Updated

•

9 years ago

Severity: normal → enhancement

Justin Crawford [:hoosteeno] [:jcrawford]

Assignee

Comment 1

•

9 years ago

I wrote a little application using some node libraries that posts MDN revisions to the Akismet API[0]. It is a hack; please ignore the quality of the code and focus on the results.

I used this tool to post the contents and IP addresses of 100 random unbanned revisions (known ham) and 100 random banned revisions (known spam) from a recent MDN database export. I captured the results in a spreadsheet[1]. I did this three different times.

Across all 3 tests: 
* 98%-99% of ham was correctly identified as ham
* 70%-83% of spam was correctly identified as spam
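For readers who want to reproduce the two percentages above from their own labelled sample, here is a small Python sketch of the tally. The function name and input shape are assumptions for the example, and the usage data below is a made-up toy input, not the data from the spreadsheet.

```python
from typing import Iterable, Tuple


def detection_rates(results: Iterable[Tuple[bool, bool]]) -> Tuple[float, float]:
    """Compute (ham_correct, spam_correct) fractions.

    Each item is (known_spam, flagged_as_spam): the human label first,
    then the classifier's verdict.
    """
    ham_total = ham_ok = spam_total = spam_ok = 0
    for known_spam, flagged in results:
        if known_spam:
            spam_total += 1
            if flagged:
                spam_ok += 1  # spam correctly identified as spam
        else:
            ham_total += 1
            if not flagged:
                ham_ok += 1  # ham correctly identified as ham
    if ham_total == 0 or spam_total == 0:
        raise ValueError("need at least one ham and one spam sample")
    return ham_ok / ham_total, spam_ok / spam_total


# Toy input only: 4 known-ham items (1 wrongly flagged) and
# 4 known-spam items (1 missed).
toy = [(False, False)] * 3 + [(False, True)] + [(True, True)] * 3 + [(True, False)]
ham_rate, spam_rate = detection_rates(toy)
```

Reporting the two rates separately matters here: a ham item wrongly flagged blocks a legitimate contributor, while a missed spam item only leaves work for the human triagers.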

Caveats:
* The Akismet API asks for many fields that we can capture but don't currently store -- for example, user agent, time of day, language, etc. If we included those fields, the accuracy would probably go up, since Akismet depends on a host of criteria to identify spam.
* The "known ham" and "known spam" I used are not perfect. Some of the ham I sent might actually have been spam. Some of the spam I sent might not have been true spam, but instead some other kind of objectionable content. Here too, the test is probably less accurate than MDN's real results would be.
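As a rough illustration of the first caveat, the Python sketch below adds the extra signals to a comment-check payload only when they have actually been captured. The helper name is hypothetical; `user_agent`, `blog_lang`, and `comment_date_gmt` are the Akismet parameter names as I understand them from its API documentation, and `submitted_at` is assumed to be a timezone-aware datetime.

```python
from datetime import datetime, timezone
from typing import Optional


def add_optional_signals(
    payload: dict,
    user_agent: Optional[str] = None,
    language: Optional[str] = None,
    submitted_at: Optional[datetime] = None,
) -> dict:
    """Return a copy of payload with any captured optional fields added."""
    extra = {
        "user_agent": user_agent,
        "blog_lang": language,
        # Assumes submitted_at is timezone-aware; it is converted to UTC
        # and sent in ISO 8601 form.
        "comment_date_gmt": (
            submitted_at.astimezone(timezone.utc).isoformat()
            if submitted_at is not None
            else None
        ),
    }
    # Omit fields that were not captured rather than sending empty values.
    return {**payload, **{k: v for k, v in extra.items() if v is not None}}
```

Leaving missing fields out, instead of sending placeholders, keeps a historical export (which lacks them) and future live submissions (which could include them) on the same code path.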

I believe this experiment demonstrates that Akismet could dramatically improve our spam triage.

Leaving this bug open a little while for discussion.

[0] https://github.com/hoosteeno/test_akismet/blob/master/test_akismet.js
[1] https://docs.google.com/spreadsheets/d/1PmvIp9nehcAREsQLzaAYqtQeNONQJ9taRqgeFcSVPjE/edit#gid=713193332

Justin Crawford [:hoosteeno] [:jcrawford]

Assignee

Updated

•

9 years ago

Assignee: nobody → hoosteeno

Eric Shepherd [:sheppy]

Comment 2

•

9 years ago

WANT.

Luke Crouch [:groovecoder]

Comment 3

•

9 years ago

Whoa, thanks :hoosteeno! This is great to know.

Justin Crawford [:hoosteeno] [:jcrawford]

Assignee

Updated

•

9 years ago

Blocks: 1168472

Justin Crawford [:hoosteeno] [:jcrawford]

Assignee

Comment 4

•

9 years ago

I opened a meta bug for spam heuristics where further discussion can occur: bug 1168472.

Status: NEW → RESOLVED

Closed: 9 years ago

Resolution: --- → FIXED

Luke Crouch [:groovecoder]

Updated

•

9 years ago

Comment 5

•

9 years ago

I had no idea this experiment had already been run; I thought you were just getting ready to do it. Well done!

This does sound promising; it certainly reaffirms my feeling that we should go for it as soon as we can reasonably do so.

BMO Automation

Updated

•

4 years ago

Product: developer.mozilla.org → developer.mozilla.org Graveyard


Categories

(developer.mozilla.org Graveyard :: General, enhancement)
