Closed Bug 910767 Opened 11 years ago Closed 7 years ago

[tracker] reduce spam issues

Categories

(Input Graveyard :: Submission, defect, P1)

defect

Tracking

(Not tracked)

RESOLVED WONTFIX
2014Q3

People

(Reporter: willkg, Unassigned)

References

Details

(Whiteboard: u=dev c=feedback p= s=)

We're getting a non-trivial amount of spam.

This bug covers:

1. specifying what "spam" means with some metrics (i.e., it's not sufficient to define the problem we have now as "OMG! SPAM!!!!!!111!!")

2. identifying one or more trends in the spam we're getting now

3. determining what steps we can take now to alleviate the spam we're getting now

4. writing up bugs to cover those steps
Putting this in the 2013q3 marathon. If I don't get to it, then we'll do it in 2013q4. It's fairly important we work on it sooner rather than later.
Priority: -- → P1
Whiteboard: u=dev c=feedback p= s=2013.q3
Putting this in 2013q4.
Whiteboard: u=dev c=feedback p= s=2013.q3 → u=dev c=feedback p= s=2013.q4
Fixing the sprint.
Whiteboard: u=dev c=feedback p= s=2013.q4 → u=dev c=feedback p= s=input.2013q4
Blocks: 909479
This needs the analyzers group stuff set up before I can start fiddling with things.
Depends on: 907872
One thing I've been doing with input processing is deleting duplicates: if all fields except the date match, I count the response only once. You may get false positives if five people just say "sucks" or "facebook", but the kind of feedback that's detailed and actually useful is very unlikely to be duplicated.

I tend to use small windows (1 week or so), so maybe we could use a cooling-off period, e.g. allowing one duplicate per week, or something like that.
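A minimal sketch of the duplicate counting described above, with a one-per-week cooling-off period. It assumes each response is a dict with a `created` datetime and otherwise hashable field values; `dedup_key` and `count_with_cooling_off` are hypothetical names, not Input code.

```python
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Tuple

# Hypothetical default: one copy of an identical response allowed per week.
COOLING_OFF = timedelta(weeks=1)


def dedup_key(response: dict) -> Tuple:
    """Every field except the creation date, as a hashable tuple."""
    return tuple(sorted((k, v) for k, v in response.items() if k != "created"))


def count_with_cooling_off(responses: Iterable[dict],
                           window: timedelta = COOLING_OFF) -> List[dict]:
    """Keep a response unless an identical one was kept less than `window` earlier."""
    last_kept: Dict[Tuple, datetime] = {}
    kept = []
    for resp in sorted(responses, key=lambda r: r["created"]):
        key = dedup_key(resp)
        prev = last_kept.get(key)
        if prev is None or resp["created"] - prev >= window:
            # The window restarts from the last response we kept.
            last_kept[key] = resp["created"]
            kept.append(resp)
    return kept
```

Measuring the window from the last kept copy (rather than the last seen copy) means a steady trickle of identical submissions still gets counted roughly once per week.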
I've seen a few responses where the description is extremely long (> 2 MB). We probably want users to tell us lots of stuff, but 2 million characters is over the top. We should truncate incoming response descriptions at some length that's the upper bound of what we're ok with. 5000 characters?
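A sketch of the truncation, assuming it runs on the description before the response is saved. The 5000-character limit is only the number floated above, and `truncate_description` is a hypothetical helper. Returning a flag alongside the text makes it easy to count how often truncation actually happens.

```python
from typing import Tuple

# Upper bound suggested in this comment; not a decided value.
MAX_DESCRIPTION_LENGTH = 5000


def truncate_description(description: str,
                         limit: int = MAX_DESCRIPTION_LENGTH) -> Tuple[str, bool]:
    """Clip a description to at most `limit` characters; report whether it was clipped."""
    if limit < 0:
        raise ValueError("limit must be non-negative")
    truncated = len(description) > limit
    return description[:limit], truncated
```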
Depends on: 929647
Assignee: nobody → willkg
Adding this as a note so I don't forget:

Any action that deletes spam from the site should log something somewhere. I think for now we can just log counts. I want to know how much spam we're deleting with the new tools and when. We might even want to move those responses into a separate "spam responses" table so we can keep track of when they were created, when they were deleted, and by whom.
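A rough sketch of logging deletion counts while keeping a record of each deleted response. It assumes an in-memory dict stands in for the responses table and a list stands in for the separate spam table; `DeletedSpam` and `delete_spam` are hypothetical names.

```python
import logging
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Iterable, List

log = logging.getLogger("spam")


@dataclass
class DeletedSpam:
    """Hypothetical row in a separate "spam responses" table."""
    response_id: int
    description: str
    created: datetime
    deleted: datetime
    deleted_by: str


def delete_spam(live: Dict[int, dict], ids: Iterable[int],
                deleted_by: str, archive: List[DeletedSpam]) -> int:
    """Remove responses from `live`, record them in `archive`, and log the count."""
    now = datetime.now(timezone.utc)
    count = 0
    for rid in ids:
        resp = live.pop(rid, None)
        if resp is None:
            continue  # already deleted; nothing to record
        archive.append(DeletedSpam(rid, resp["description"], resp["created"],
                                   now, deleted_by))
        count += 1
    log.info("deleted %d spam responses (deleted_by=%s)", count, deleted_by)
    return count
```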
Status: NEW → ASSIGNED
Depends on: 902032
Depends on: 949464
Depends on: 949461
Update:

> 2. identifying one or more trends in the spam we're getting now

We identified a couple of spam classes:

1. double-submits from Firefox for Android
2. scripts posting the same thing over and over again in a short period of time


> 3. determining what steps we can take now to alleviate the spam we're getting now

1. I added some code to eliminate double-submits by rate-limiting the (IP address, description) tuple; that's working well
2. I added some code to reduce script issues by rate-limiting on IP address

Both of these rate limits generate statsd data, so we can track them in graphite and see whether they're going awry and how often they're getting triggered.
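A sketch of the two rate limits, assuming a per-key sliding window kept in process memory; a deployment with several web nodes would need shared storage (e.g. a cache) instead. The limits are made up, and the `Counter` stands in for the statsd counters. None of these names come from the Input codebase.

```python
import time
from collections import Counter, defaultdict, deque


class SlidingWindowRateLimiter:
    """Allow at most `limit` hits per `period` seconds for each key."""

    def __init__(self, name, limit, period, clock=time.monotonic, metrics=None):
        self.name = name
        self.limit = limit
        self.period = period
        self.clock = clock
        # Stand-in for statsd: counts how often this limit gets triggered.
        self.metrics = metrics if metrics is not None else Counter()
        self._hits = defaultdict(deque)

    def allow(self, key):
        now = self.clock()
        hits = self._hits[key]
        # Forget hits that have slid out of the window.
        while hits and now - hits[0] >= self.period:
            hits.popleft()
        if len(hits) >= self.limit:
            self.metrics[f"{self.name}.ratelimited"] += 1
            return False
        hits.append(now)
        return True


# Hypothetical limits, not the values Input actually used.
metrics = Counter()
double_submit = SlidingWindowRateLimiter("ip_desc", limit=1, period=60, metrics=metrics)
per_ip = SlidingWindowRateLimiter("ip", limit=50, period=3600, metrics=metrics)


def accept_submission(ip, description):
    """Reject exact repeats from one IP first, then cap the overall rate per IP."""
    return double_submit.allow((ip, description)) and per_ip.allow(ip)
```

Keying the first limiter on the tuple catches the Android double-submit case without penalizing someone who sends two different responses, while the per-IP limit catches scripts that vary their text.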


Outstanding things here:

1. the spam duplicates report needs a "delete all but one" button and a "delete all" button for deleting identified spam (bug #949461)

2. we need a way to run the spam duplicates report over different time ranges to identify and remove old spam
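A sketch of the report logic over an arbitrary time range, plus which responses the "delete all but one" and "delete all" buttons would remove. It assumes duplicates are grouped by identical description and that "all but one" keeps the earliest response; both are assumptions, and the function names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, List


def duplicate_groups(responses: List[dict], start: datetime,
                     end: datetime) -> Dict[str, List[dict]]:
    """Group responses created in [start, end) by description; keep groups of 2+."""
    groups = defaultdict(list)
    for resp in responses:
        if start <= resp["created"] < end:
            groups[resp["description"]].append(resp)
    return {desc: sorted(g, key=lambda r: r["created"])
            for desc, g in groups.items() if len(g) > 1}


def ids_to_delete(group: List[dict], keep_one: bool) -> List[int]:
    """'Delete all but one' keeps the earliest response; 'delete all' removes them all."""
    ordered = sorted(group, key=lambda r: r["created"])
    victims = ordered[1:] if keep_one else ordered
    return [r["id"] for r in victims]
```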


I've done all I can do here for now. Turning it into a tracker and bumping it out of this quarter.
Summary: [research] reduce spam issues → [tracker] reduce spam issues
Whiteboard: u=dev c=feedback p= s=input.2013q4 → u=dev c=feedback p= s=input.2014q1
Whiteboard: u=dev c=feedback p= s=input.2014q1 → u=dev c=feedback p= s=
Putting this in the 2014q3 target milestone. We'll work on it then.
Target Milestone: --- → 2014Q3
Unassigning for now. I'm not working on it.
Assignee: willkg → nobody
Status: ASSIGNED → NEW
The Input service has been decommissioned (see bug 1315316) and has been replaced by a redirect to an external vendor (SurveyGizmo). I'm bulk WONTFIXing Input bugs that do not appear to be relevant anymore.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → WONTFIX
Product: Input → Input Graveyard