Closed Bug 1259173 Opened 9 years ago Closed 9 years ago

Adjust content sent in Akismet check-content submission

Categories

(developer.mozilla.org Graveyard :: Editing, defect)

All
Other
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: jwhitlock, Assigned: jwhitlock)

References

Details

(Keywords: in-triage, Whiteboard: [specification][type:change])

What feature should be changed? Please provide the URL of the feature if possible.
==================================================================================

An Akismet development lead has suggested changes in the data we send to check edits for spam:

* Omit fields that don't change (for example, a page title that doesn't change with an edit)
* Omit derived fields like the slug
* Remove quotes from tags
* Include the permalink parameter
* Include HTTP headers in submission
* Send only the new content (for example, the lines added to a page, not the whole page)

His example:

So if I edit https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/splice to add a link to my spam site at the bottom, and I add the tag "lottery" with the comment "Improving splice documentation", your submissions would look like this:

```
<a href="http://winthelottteryonline.tld">win the lottery play the lottery win money online now</a>
Improving splice documentation
lottery
```

What problems would this solve?
===============================

These changes should reduce the false positive rate (edits blocked as spam).

Who would use this?
===================

Contributors and staff reviewers

What would users see?
=====================

Contributors would see less legitimate content blocked by Akismet.

What would users do? What would happen as a result?
===================================================

Contributors would not be annoyed by spam protection and be encouraged to make more edits. Staff reviewers would not spend as much time rescuing false positives.

Is there anything else we should know?
======================================

The PHP code can be used as an example of selecting HTTP headers to send:

https://plugins.trac.wordpress.org/browser/akismet/trunk/class.akismet.php?rev=1348427#L122
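The suggested payload can be sketched in Python as follows. This is a minimal illustration, not the Kuma implementation: `added_lines` and `summarize_edit` are hypothetical helpers, and the dictionary keys (`title`, `content`, `comment`, `tags`) are assumed for the example. It uses the standard library's `difflib` to keep only the lines that are new or changed.

```python
import difflib


def added_lines(old_text, new_text):
    """Return the lines of new_text that are new or changed relative to old_text."""
    diff = difflib.ndiff(old_text.splitlines(), new_text.splitlines())
    # ndiff prefixes lines unique to the second sequence with "+ ".
    return [line[2:] for line in diff if line.startswith("+ ")]


def summarize_edit(old, new):
    """Build a comment_content-style summary from two hypothetical edit dicts."""
    parts = []
    # Omit fields that did not change, such as an untouched title.
    if new["title"] != old["title"]:
        parts.append(new["title"])
    # Send only the new content, not the whole page.
    parts.extend(added_lines(old["content"], new["content"]))
    if new["comment"]:
        parts.append(new["comment"])
    # Send only the tags that were added, without surrounding quotes.
    old_tags = set(old["tags"])
    parts.extend(tag for tag in new["tags"] if tag not in old_tags)
    return "\n".join(parts)
```

With an unchanged title, one added line, a revision comment, and one added tag, the summary would contain just those three pieces, one per line, which matches the shape of the example in the description.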
Blocks: 1188029
Assignee: nobody → jwhitlock
Blocks: 1260253
Blocks: 1264390
No longer blocks: 1260253
Commit pushed to master at https://github.com/mozilla/kuma

https://github.com/mozilla/kuma/commit/18c45138753d3cb16b32b552d16eaee9b00e9d7d
bug 1259173 - Rewrite RevisionForm Tests

The RevisionForm is used in different ways in different views. Rewrite the tests to setup the forms correctly for English edits, new pages, new translations, and editing translations. Also, since the Akismet payload will change quite a bit, especially the comment_content, test more of the payload for each scenario.
Commits pushed to master at https://github.com/mozilla/kuma

https://github.com/mozilla/kuma/commit/6afdbf268c16bfc3a316d0a9644b7289dc7c609d
bug 1259173 - Add HTTP headers to Akismet payload

Copy logic from Akismet's Wordpress plugin for adding select headers to the Akismet payload.

https://github.com/mozilla/kuma/commit/6bbc59370b0905b37e50fe9cd087a6eedd267d1a
bug 1259173 - Fix Akismet 'blog' parameter

The 'blog' parameter is supposed to be a full URI with a scheme (like http://). This adds a full URI as the default, and doesn't use it if the caller also passed in a blog parameter.

https://github.com/mozilla/kuma/commit/31fcbacb92a4b5cead5d5d845c9d07a9f72c3909
bug 1259173 - Add blog and permalink to Akismet

Add the blog (the homepage URL) and permalink (the original article URL) to the Akismet parameters.

https://github.com/mozilla/kuma/commit/845b48c19d8d3a10db2a27103194f204eb419341
bug 1259173 - Summarize changes in comment_content

Avoid sending titles, slugs, etc. unless they have changed. For the content, try to just send the lines that are new or that changed.

https://github.com/mozilla/kuma/commit/7b35b1dbf68cd54a86a18e55bd7cfd41c961c924
bug 1259173 - Refactor to AkismetRevisionData

The AkismetRevisionData class includes methods to extract Akismet data from requests, form data, Revisions and Documents, moving this logic outside the RevisionForm.

https://github.com/mozilla/kuma/commit/928c8420cbe93b7d1bf462b5fed95bfe20e98d71
bug 1259173 - Update payload for admin ham/spam

Add AkismetHistoricalData for creating an Akismet payload from a historical revision, using the new style of Akismet payload (setting the blog, content is just the differences, etc.). Use this instead of the function revision_akismet_parameters in the Django admin form and the management command.

https://github.com/mozilla/kuma/commit/752cb29ce873d409ad6a4b4bd7010b90d72e304e
bug 1259173 - Migrate database for RevisionIP.data

This migration should be deployed to production before the code that uses it.
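As a rough sketch of the header and 'blog' commits above, the following Python shows one way to pick request metadata and apply a default blog URI. Everything here is an assumption for illustration: `select_headers` and `with_default_blog` are hypothetical names, the key patterns have not been checked against the WordPress plugin or the Kuma source, and the default URL is a placeholder. The only real API assumed is that a Django request's `META` behaves like a mapping of strings with `HTTP_`-prefixed header keys.

```python
import re

# Assumed patterns for illustration only; not verified against the
# WordPress plugin or the Kuma commits.
SEND_KEY = re.compile(r"^(HTTP_|REMOTE_ADDR|REQUEST_URI|DOCUMENT_URI)")
SKIP_KEY = re.compile(r"^HTTP_COOKIE")


def select_headers(meta):
    """Pick request metadata worth sending from a Django-style META mapping."""
    selected = {}
    for key, value in meta.items():
        if not isinstance(value, str):
            continue  # skip non-string entries such as stream objects
        if SKIP_KEY.match(key):
            continue  # do not forward cookies
        if SEND_KEY.match(key):
            selected[key] = value
    return selected


def with_default_blog(params, default_blog="https://developer.mozilla.org/"):
    """Add a full-URI 'blog' value unless the caller already supplied one."""
    merged = {"blog": default_blog}  # placeholder default with a scheme
    merged.update(params)  # a caller-supplied 'blog' takes precedence
    return merged
```

Keeping the selection in a small pure function makes it easy to test against a plain dictionary instead of a full request object.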
Commit pushed to master at https://github.com/mozilla/kuma

https://github.com/mozilla/kuma/commit/ad74a64ab68b78dc0818f4b0de6770370f6f05eb
bug 1259173 - Fix wiki migration 0031

Rename duplicate numbered migration 0030_add_data_to_revisionip.py.
Commits pushed to master at https://github.com/mozilla/kuma

https://github.com/mozilla/kuma/commit/8901c802092492daee5beb335a53841294862983
bug 1259173 - Save Akismet payload for revisions

When Akismet identifies an edit as not spam, save the submission details in case it has to be submitted as spam.

https://github.com/mozilla/kuma/commit/57728d3bd1c45decc376a3619dad9616f4d3ce93
Merge pull request #3835 from mozilla/akismet-payload-1259173

Bug 1259173 - Adjust Akismet payload
Code is in staging and production. Akismet is enabled on staging, but not production. Holding open until we at least enable training mode in production.
Status: NEW → ASSIGNED
Commits pushed to master at https://github.com/mozilla/kuma

https://github.com/mozilla/kuma/commit/a71fe8cfea474b149fbca19451925ae5fbddd2bb
bug 1259173 - Anonymize Akismet payload

https://github.com/mozilla/kuma/commit/00033418bf665f315b8c4aa2197fa8aeee79a711
bug 1259173 - Explicitly drop legacy tables

A fresh developer database does not have these tables, which will cause an error when they attempt to test the anonymization process. Omit them from the clone_db script, and drop them if encountered.

https://github.com/mozilla/kuma/commit/dd5bbb25fdf08bd4fbea0bf1929f41b8d69bf04a
bug 1259173 - Copy wiki_documentspamattempt

Clone the table for DocumentSpamAttempt, so that anonymize script can scrub it.

https://github.com/mozilla/kuma/commit/128d19f322e14f9ae365e6e36bca2321e7994780
bug 1259173 - Improve error handling

Fix error handling so that errors are reported nicely, instead of raising its own errors due to incorrect attributes, etc.

https://github.com/mozilla/kuma/commit/3f8a25603626c018a79a621b9e61c907e3a1794b
bug 1259173 - Clean up PEP8 issues

Now passes flake8 checks.

https://github.com/mozilla/kuma/commit/6f98dbb88fb231d7d1510bb08d1edeeff4662eba
Merge pull request #3843 from mozilla/anon-akismet-1259173

bug 1259173 - Update anonymize scripts for Akismet, other changes r=jpetto
Our Akismet representative found some unexpected results around tags for this change:

https://developer.allizom.org/en-US/docs/Web/JavaScript$compare?locale=en-US&to=913527&from=913432

The previous tags were:
"JavaScript" "Landing" "JavaScript"

The new tags are:
"JavaScript" "Landing" "cialis" "JavaScript"

Only the new tag "cialis" should be sent to Akismet, but "JavaScript" is also being sent. I am not sure why this page has a duplicate "JavaScript" tag either.
Commits pushed to master at https://github.com/mozilla/kuma

https://github.com/mozilla/kuma/commit/1c746f16b8221c5cfe47d0431fc3e520d5266a65
bug 1259173 - Akismet payload for new translations

Compare new translations to the English version, to minimize differences to the changed strings.

https://github.com/mozilla/kuma/commit/78b7b4db029219a1d348a8ba42cfef665148b40c
bug 1259173 - Use previous revision's tags on edit

The document.tags.names() method returns the names of tags, which includes alternate cases like "javascript" and "JavaScript". The revision's tags uses the same one, like "JavaScript" and "JavaScript". This data issue appears to be fixed in production, but staging has an old copy of the database.
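The tag comparison described above can be illustrated with a short Python sketch. It is not the Kuma code: `parse_tags` and `new_tags` are hypothetical helpers, and the quoted, space-separated tag string format is assumed from the tag lists shown in this bug. The idea is to compare the current revision's tags against the previous revision's tags, so that duplicates and unchanged tags are not reported as new.

```python
import re


def parse_tags(tag_string):
    """Split a string like '"JavaScript" "Landing"' into unquoted tag names."""
    return re.findall(r'"([^"]*)"', tag_string)


def new_tags(previous_tag_string, current_tag_string):
    """Return tags in the current revision that the previous revision lacked."""
    previous = set(parse_tags(previous_tag_string))
    seen = set()
    added = []
    for tag in parse_tags(current_tag_string):
        # Skip tags the previous revision already had, and repeated new tags.
        if tag not in previous and tag not in seen:
            added.append(tag)
            seen.add(tag)
    return added
```

For the revision pair reported earlier in this bug, comparing the two tag strings this way would yield only the added tag rather than the duplicated "JavaScript".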
Over 400 edits have been submitted with the new code, and it seems to be mostly working with the occasional bug. Instead of adding new issues as comments, I'll open new bugs.
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Product: developer.mozilla.org → developer.mozilla.org Graveyard