Closed Bug 972830 Opened 10 years ago Closed 10 years ago

Add adaptive rate limiting to Kitsune

Categories

(support.mozilla.org :: General, defect, P3)

defect

Tracking

(Not tracked)

RESOLVED WONTFIX
Future

People

(Reporter: atopal, Unassigned)

References

Details

(Whiteboard: u=user c=questions p= s=2013.backlog)

Currently we are employing static rate limiting for Kitsune. This is particularly problematic in the support forums, where new contributors frequently hit those limits and have to be added to a special list manually to be exempt from those limits. Thus we are in a situation where we can't lower the limits because it would hurt new contributors and we can't increase them, because that would allow for spamming.

With adaptive rate limiting we'd start with a low limit and relax the limitation based on the age of the account and the number of posts made.  It's important to look at both age and number of posts. That will allow us to recognize spammers while their limits are low and delete their accounts, but at the same time the limits would grow as legitimate contributors become more proficient and thus never limit them. We'd still keep (really high) limits, just in case.

Example: Today the rate limiting in the forums is 4 posts per minute and 100 per day. In the new model it would start with 1 post per minute and 20 posts per day.
The counter starts once you have made 10 posts. 7 days after your 10th post we increase the limit to 5 posts per minute and 50 posts per day. 7 days after that we increase the limit to 10 posts per minute and 200 posts per day.
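Here is a minimal Python sketch of the tiered schedule in the example. It assumes each user record exposes a post count and the timestamp of the 10th post. `post_count`, `tenth_post_at` and `limit_for` are hypothetical names, not Kitsune fields. The figures are the example ones from this comment, and the top tier plays the role of the "(really high) limits" we'd keep just in case.

```python
from datetime import datetime, timedelta
from typing import NamedTuple, Optional


class Limit(NamedTuple):
    per_minute: int
    per_day: int


# Example figures from the proposal; all of them are meant to be adjustable.
STARTING_LIMIT = Limit(per_minute=1, per_day=20)
POSTS_BEFORE_COUNTER = 10  # the clock only starts after the 10th post

# (time since 10th post, limit), checked from the longest threshold down.
TIERS = [
    (timedelta(days=14), Limit(per_minute=10, per_day=200)),
    (timedelta(days=7), Limit(per_minute=5, per_day=50)),
]


def limit_for(post_count: int, tenth_post_at: Optional[datetime], now: datetime) -> Limit:
    """Return the rate limit for a user (hypothetical helper, not Kitsune code)."""
    if post_count < POSTS_BEFORE_COUNTER or tenth_post_at is None:
        return STARTING_LIMIT
    elapsed = now - tenth_post_at
    for threshold, limit in TIERS:
        if elapsed >= threshold:
            return limit
    return STARTING_LIMIT
```

Keeping the tiers in a table rather than in branching logic means the figures can be tuned without touching the lookup.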

Those are just example figures and can be adjusted accordingly.

Also, this is a fairly complex system; there might be simpler ways to achieve the same objective, namely: limiting the damage that spammers can cause while being transparent to legitimate contributors.
I should have mentioned, this is based on Cor-el's suggestion in the forums: https://support.mozilla.org/en-US/forums/forum-moderators/709969?page=2#post-57419
Do we have any metrics on how much spam we're getting and how our attempts to alleviate the incoming spam are working?

If not, then we really should stop implementing new and increasingly complex things and start measuring. Also, it'd be good for someone to spend some time researching what shapes the spam we're getting takes. I did that for Input, and the distinctive shapes helped me reduce the amount of spam significantly.
We don't have any written record of the spam that is being deleted. Bug 939938 should give us an opportunity to collect data on that. 

As to people affected by this: in addition to the forum moderators, 11 people were put on the special list manually. That covers most of our active contributors. In other words, our current rate limiting needs manual intervention that will probably scale with the number of contributors added. Thus today's rate limiting does not achieve the objective of limiting the damage that spammers can cause while being transparent to legitimate contributors.

I do agree that this is a fairly complex solution. If there is a solution that achieves that objective and is less complex, let's do that.
Making this block on bug #939938 because I think we really need to start measuring the spam before we start throwing more time and energy at problems like this that we don't have data for.
Depends on: 939938
Yes to that, but one aspect of the issue is that the current method needs manual intervention and blocks legitimate users, and we do have data for that.
(In reply to Kadir Topal [:atopal] from comment #5)
> Yes to that, but one aspect of the issue is that the current method needs
> manual intervention and blocks legitimate users, and we do have data for
> that.

Where's the data for blocking legitimate users? So far I only see a count of how many people were put on the special list, which is 11, and while that requires manual work, I think that's a wildly smaller amount of work than implementing, tuning and maintaining adaptive rate limiting. Seems like it'd be easier and more fruitful to reduce the amount of work to add someone to a group.
Those 11, together with the forum moderators, are pretty much our forum community. Those people were added manually because they hit the limit, which suggests it's going to be like that for future contributors too. We could make it easier to add people to the group, but it's fairly easy already. The issue is that people don't know about that list, so they hit the limit and are stumped. We could add more information about what to do if you are a legitimate user, and that would certainly help. We could also expand the list of people who can add users to the list to include moderators, but the limit would still stop the user from moving forward until someone unblocks them.

As always we'll need to strike a balance between operationalizing and doing things manually. There is no rush here as far as I can tell, so we can take the time to come up with a solution that would strike that balance.
It sounds like the problems that this bug is to address are these:

1. reduce the very minimal effort required to maintain the special group

2. reduce the problem where new contributors hit the rate limit and are confused


This bug might vaguely handle the first item, but since that effort is minimal now and there's no data to suggest it will be arduous in the future, I don't see why it makes sense to implement adaptive rate limiting.

This bug definitely doesn't handle the second item--that will still be an issue. Sounds like we should spend time on that problem. But before we do that, we really should add some code to measure how often that happens. I suggest writing up a bug for measuring the problem and then a bug for alleviating the problem.
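One way to measure the problem before fixing it is to log one (user, day) record every time the limiter rejects a post, then summarize attempts and distinct users per day. This is a hypothetical Python sketch. Where such a hook would live in Kitsune and how the records are stored are assumptions.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, Tuple


def summarize_hits(hits: Iterable[Tuple[int, date]]) -> Dict[date, Tuple[int, int]]:
    """Summarize rate-limit hits per day.

    `hits` holds one (user_id, day) entry per post attempt rejected by the
    rate limiter. Returns {day: (attempts, distinct_users)}, ordered by day.
    """
    attempts: Dict[date, int] = defaultdict(int)
    users: Dict[date, set] = defaultdict(set)
    for user_id, day in hits:
        attempts[day] += 1
        users[day].add(user_id)
    return {day: (attempts[day], len(users[day])) for day in sorted(attempts)}
```

Counting distinct users separately from attempts distinguishes one stuck contributor retrying repeatedly from many people hitting the limit once.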
For the record, I'm not excited about adding adaptive rate limiting. I think it'll be a huge mess of complexity, edge cases and since it ties into the gamification of SUMO, it'll cause a bunch of other social problems.

Given that this bug is specifically about implementing adaptive rate limiting and not about a problem that SUMO has, my vote is we mark this bug as WONTFIX unless there's compelling data to suggest that the work required to implement, test and maintain adaptive rate limiting is worth it. I don't think there's a compelling case right now or in the near future.

If at some point in the distant future we decide maybe it's worth resurrecting this idea, then we can do so at that point.
> This bug definitely doesn't handle the second item--that will still be an
> issue. Sounds like we should spend time on that problem. But before we do
> that, we really should add some code to measure how often that happens. I
> suggest writing up a bug for measuring the problem and then a bug for
> alleviating the problem.

Are you suggesting we measure how often people hit the limit? If yes, what would be the possible outcomes?

I have no idea what adaptive rate limiting would actually cost. Is it 1 dev day? 10 dev days? How are we supposed to know whether it's worth operationalizing it or maintaining a manual process without at least knowing a ball park figure? 

I'd also expect the champion for this area to be better equipped to judge the benefit of removing manual processes.

This suggestion might very well be overkill, and maintaining a manual process might be a better way to spend our resources, but it's still worth at least a look.
(In reply to Kadir Topal [:atopal] from comment #10)
> > This bug definitely doesn't handle the second item--that will still be an
> > issue. Sounds like we should spend time on that problem. But before we do
> > that, we really should add some code to measure how often that happens. I
> > suggest writing up a bug for measuring the problem and then a bug for
> > alleviating the problem.
> 
> Are you suggesting we measure how often people hit the limit? If yes, what
> would be the possible outcomes?

Yes, we need to measure something otherwise we have no idea what the magnitude of the problem is. Your second question sounds like you're asking me to predict what the data would be--I can't do that.


> I have no idea what adaptive rate limiting would actually cost. Is it 1 dev
> day? 10 dev days? How are we supposed to know whether it's worth
> operationalizing it or maintaining a manual process without at least knowing
> a ball park figure? 

I said "I think it'll be a huge mess of complexity, edge cases and since it ties into the gamification of SUMO, it'll cause a bunch of other social problems." You know I can't provide an estimate until there are specifics, but I will say that I think this is a big project especially when you factor in testing, tuning and maintaining and dealing with all the discussions surrounding edge cases where this isn't working, where cheating in the gamification lets people spam and all that stuff. More importantly, I think this is a really bad road to go down--the fewer automated things we tie to SUMO gamification numbers, the better.


> I'd also expect the champion for this area to be better equipped to judge
> the benefit of removing manual processes.

This statement puzzles me. We have manual processes for all kinds of things. Being manual isn't bad. If it's a manual process that's using up a lot of someone's time--that's bad. I don't think that's the case here if we've got a mostly static group of 11 people. If it's hard to maintain, we should make that easier.


> This suggestion might very well be overkill, and maintaining a manual
> process might be a better way to spend our resources, but it's still worth
> at least a look.

I neither disagree with this nor agree with it. I'm saying it's not worth looking at until we have data that shows it's a compelling problem to solve. We have limited development resources. If we spend our resources on thinking about this, we're not spending them on other things. There are definitely other things that are more tangible than this is right now. Ergo, we need more data here before we should proceed.

I'm also saying that this solution *only* vaguely solves the manual process aspect. It probably doesn't solve what is likely the more important aspect: the bad experience for users who are getting rate limited.
Cross referencing, See: https://support.mozilla.org/en-US/forums/forum-moderators/708566?page=8#post-57462

"Philipp has deactivated 5 users in the past hour with the first being deactivated at  2:06:00 PM PST 2/14/14. (awesome valentine's day present :P) and the next 4 being about 5-15 minutes apart. I deactivated one."

All users have been hitting https://support.mozilla.org/en-US/questions/986372? for some reason....

List of users:
Leslon: https://support.mozilla.org/en-US/user/1057809 (original account)
Lantium: https://support.mozilla.org/en-US/user/1057812
gaidiado: https://support.mozilla.org/en-US/user/1057822
lesira: https://support.mozilla.org/en-US/user/1057817
lesnikk: https://support.mozilla.org/en-US/user/1057816
Lexxium: https://support.mozilla.org/en-US/user/1057813
As far as I can tell, none of that spam would have been blocked by more aggressive rate limiting. The users each only appear to have posted a few times, which is fully within the rate limit. Additionally, even if multiple users come from the same IP address, we don't rate limit them together because sometimes legitimate users share IP addresses.

It sounds like what we really need is not rate limiting, but spam detection. This is not about the rate at which people post, it is about the content of posts.
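To illustrate the per-user point above, here is a fixed-window limiter with a per-minute and a per-day cap, keyed by user. It is an in-memory Python sketch. The `FixedWindowLimiter` class and the `"user:..."` key format are hypothetical, not Kitsune's actual rate limiting code. Because each account gets its own key, the six accounts listed above would each get a full budget even from one IP address. Keying by IP instead (e.g. `"ip:" + address`) would pool them, but would also pool legitimate users behind a shared address.

```python
from collections import defaultdict
from typing import DefaultDict, Tuple


class FixedWindowLimiter:
    """Per-key fixed-window rate limiter with a per-minute and a per-day cap.

    Illustrative in-memory sketch only; it never evicts old windows.
    """

    def __init__(self, per_minute: int, per_day: int) -> None:
        self.per_minute = per_minute
        self.per_day = per_day
        # (key, window length in seconds, window index) -> posts counted so far
        self.counts: DefaultDict[Tuple[str, int, int], int] = defaultdict(int)

    def allow(self, key: str, now: int) -> bool:
        """Record a post for `key` at Unix time `now` if both caps allow it."""
        minute = (key, 60, now // 60)
        day = (key, 86400, now // 86400)
        if self.counts[minute] >= self.per_minute or self.counts[day] >= self.per_day:
            return False
        self.counts[minute] += 1
        self.counts[day] += 1
        return True
```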
I'm looking for a bug that deals with spam but can't find it. It's probably in the Mods forum somewhere but I can't look for it atm. The content of the post was similar to previous posts before these.

Content from previous user: https://support.mozilla.org/en-US/forums/forum-moderators/708566?page=8#post-57113
Content from current spammers: https://support.mozilla.org/en-US/forums/forum-moderators/708566?page=8#post-57462

Now, I gotta go. Keep me updated.
(In reply to Mike Cooper [:mythmon] from comment #13)
> As far as I can tell, none of that spam would have been blocked by more
> aggressive rate limiting. The users each only appear to have posted a few
> times, which is fully with in the rate limit.

Yes, the user was only able to reply a handful of times before the accounts got deactivated, but only because of lucky circumstances: two mods were around, and the replies were always posted in a topic I was subscribed to (so I got instant email notifications).
If there is a window where no mod is around to intervene, our current system wouldn't stop, or even delay, such spammers from vandalizing the whole forum. If a spammer with one account is undisturbed for just 30 minutes, he can reply to every question posted that day, for instance, with email notifications sent out to potentially hundreds of users (and the content in this case wasn't particularly pleasant)...
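The 30-minute scenario can be put in numbers using the limits from the original report: 4 posts per minute and 100 per day today, versus 1 per minute and 20 per day in the proposed starting tier. Here is a quick Python check. It assumes the 30 minutes fall inside a single daily window and that every account has an independent budget. `max_posts` is a hypothetical helper.

```python
def max_posts(per_minute: int, per_day: int, minutes: int, accounts: int = 1) -> int:
    """Upper bound on posts a spammer can make in `minutes` minutes.

    Assumes the whole span falls inside a single daily window and that
    each account has its own, independent budget.
    """
    per_account = min(per_minute * minutes, per_day)
    return per_account * accounts


print(max_posts(4, 100, 30))              # today's limits, one account
print(max_posts(1, 20, 30))               # proposed starting tier, one account
print(max_posts(1, 20, 30, accounts=6))   # starting tier, six accounts
```

So today a single account could reach the 100-post daily cap in about 25 minutes. The proposed starting tier would cap it at 20, but six accounts (as in the list above) would still allow 120.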
I agree with Mike: this bug isn't going to fix the spam problems listed so far.

We should create a new bug that focuses on the spam problem and figure out whether we have multiple different spam problems (each should get its own bug) and discuss the problems, the data and our options there. Dumping more stuff in this bug isn't helpful since this bug is about a solution and not about a problem.
In addition to what Mike and Will mentioned, this new system still won't make it easy for new contributors to post multiple answers on their first day. (On my first day I wouldn't be able to answer many people, which would leave me puzzled by the rate limit message.)

The last "spam" attack (more of a profanity-trolling flame war) is the kind of thing that other platforms block with keyword detection.

There's no silver bullet for spam: a little bit of AI to detect it, a little bit of limitations, plus a little bit of manual work.
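Here is a minimal Python sketch of keyword detection combined with the "little bit of manual work": matching posts are held for a moderator rather than deleted. The term list and the `build_filter`/`needs_review` names are hypothetical placeholders.

```python
import re
from typing import Iterable


def build_filter(terms: Iterable[str]) -> re.Pattern:
    """Compile a case-insensitive, whole-word pattern matching any of `terms`."""
    cleaned = [re.escape(term) for term in terms if term]
    if not cleaned:
        raise ValueError("need at least one non-empty term")
    return re.compile(r"\b(?:" + "|".join(cleaned) + r")\b", re.IGNORECASE)


def needs_review(text: str, pattern: re.Pattern) -> bool:
    """Return True if the post should be held for a moderator instead of published."""
    return pattern.search(text) is not None


# Hypothetical placeholder terms; a real list would be curated by moderators.
SPAM_FILTER = build_filter(["badword", "spam-link"])
```

Matching whole words avoids flagging innocent words that merely contain a blocked term. The cost is missing trivial variants such as plurals, which is one reason keyword lists alone aren't a silver bullet.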

Unless the situation is totally out of control, I'd suggest stepping back and combining all the bugs around the issue into a holistic approach that combines code and processes.
My understanding is that this is a special case, a new kind of spammer that hasn't been a problem before. 
In this case I agree with Will, let's open a bug that identifies this particular issue and figure out solutions for it (we might even have the answer in some of the other bugs but we need a clear definition of the issue first).
(In reply to madalina from comment #18) 
> In this case I agree with Will, let's open a bug that identifies this
> particular issue and figure out solutions for it (we might even have the
> answer in some of the other bugs but we need a clear definition of the issue
> first).

Bug 976076 - [research] spam prevention, detection and user protection
Given we've got a bug for dealing with the spam issue, I'm going to close this one out. This is a solution that's fraught with issues and complexity.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → WONTFIX