I stumbled upon a case of fairly obvious vote abuse (probably generated by a script) in the following article:
https://support.mozilla.org/en-US/kb/getting-started-firefox-os

A machine with the following configuration:
Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)
generated 66 positive votes between 2013-02-09 02:47:15 and 2013-02-09 03:02:06.

The votes don't show up on Google Analytics. I will investigate and report back if I can find similar patterns in other articles.
Some other articles that suffer from the same effect:
Firefox for Android Crashes On Startup - How To Fix
Mozilla News
Sådan bruger du Java, hvis det er blevet blokeret
Superheroes Wanted!

It seems that any article linked from the Home Page was hit by this effect.
I guess we have to move forward with rate limiting. I assume that there are similar issues in the support forums. Also, I guess that this is not necessarily malicious.

Usually the GA vote numbers are about 10% lower than our own numbers, which might also be due to time zone differences, but in the last two days our own numbers exploded while GA stayed level:

Date        GA      Kitsune
2/8/2013    14048   15252
2/9/2013    13153   15947
2/10/2013   12764   14046
2/11/2013   13121   14308
2/12/2013   12980   14429
2/13/2013   12851   14024
2/14/2013   12515   13807
2/15/2013   12289   13373
2/16/2013   12339   13600
2/17/2013   12219   96351
2/18/2013   12501   118311

I'll look into it.
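A minimal sketch of the comparison above, in case it helps to automate the check. The daily counts are copied from this comment; the 1.5 threshold and the function name are assumptions chosen for illustration, not anything Kitsune or GA provides.

```python
# Daily helpful-vote counts copied from the comment above: (date, GA, Kitsune).
DAILY_COUNTS = [
    ("2/8/2013", 14048, 15252),
    ("2/9/2013", 13153, 15947),
    ("2/10/2013", 12764, 14046),
    ("2/11/2013", 13121, 14308),
    ("2/12/2013", 12980, 14429),
    ("2/13/2013", 12851, 14024),
    ("2/14/2013", 12515, 13807),
    ("2/15/2013", 12289, 13373),
    ("2/16/2013", 12339, 13600),
    ("2/17/2013", 12219, 96351),
    ("2/18/2013", 12501, 118311),
]

# Assumption: a day is suspicious when Kitsune exceeds GA by more than 50%.
# Normal days in the table sit well below this ratio.
MAX_RATIO = 1.5


def suspicious_days(rows, max_ratio=MAX_RATIO):
    """Return (date, ratio) for days where Kitsune/GA exceeds max_ratio."""
    flagged = []
    for date, ga, kitsune in rows:
        ratio = kitsune / ga
        if ratio > max_ratio:
            flagged.append((date, round(ratio, 2)))
    return flagged


if __name__ == "__main__":
    for date, ratio in suspicious_days(DAILY_COUNTS):
        print(date, ratio)
```

With the numbers above, only 2/17/2013 and 2/18/2013 are flagged.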
Ah, mystery solved, the excess votes were created by: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; Netsparker) This might be our own security testing team. I'll see if we can confirm their use of Netsparker. I guess GA filters Netsparker automatically. I know I do that for the log analysis.
There's a crazy spike in some articles last Saturday:
https://support.mozilla.org/en-US/kb/getting-started-firefox-os/history
https://support.mozilla.org/en-US/kb/get-started-firefox-overview-main-features/history

Can we remove them?
(In reply to Ibai Garcia [:ibai] from comment #4)
> Can we remove them?

If there is a way to identify all the bogus votes, then we could write a migration to delete them.

(In reply to Kadir Topal [:atopal] from comment #3)
> This might be our own security testing team. I'll see if we can confirm
> their use of Netsparker. I guess GA filters Netsparker automatically. I know
> I do that for the log analysis.

Gah. That's not good. We shouldn't have automated scripts causing data to be created on production.
It is hard to tell which votes are bogus from the DB without doing a manual check. For example, the UA that Kadir shared has created a lot of bad votes, but it has also created what looks like a legit set of votes. I have compiled the votes generated by that UA in the last 2 months and the majority are single votes here and there:
https://docs.google.com/spreadsheet/ccc?key=0AmCjyDM0fEFgdHh2eFFkcG5SYUVNdXpxWDlWZFd3ZlE#gid=0

Hard to tell if they are "malicious". What is true is that when the same UA generates more than a couple of votes per minute per article, this tends to represent abuse (i.e. they don't stop at two, they go on to dozens). So maybe we could take that machine configuration and block it when we detect abuse?

Do we have something more tangible, such as a cookie with a unique identifier? I'm thinking that with FX OS and everyone running the same device, the UA trick is going to be a really bad idea, but maybe we can do it for machines that are not Firefox.
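The "more than a couple of votes per minute per article from the same UA" heuristic could be sketched roughly as follows. This is only an illustration: the tuple layout, the limit of 2, and the function name are assumptions, and the sample rows in the usage note are made up.

```python
from collections import Counter
from datetime import datetime

# Assumption: more than this many votes from one UA on one article within
# the same clock minute counts as a burst ("more than a couple").
MAX_VOTES_PER_MINUTE = 2


def find_bursts(votes, limit=MAX_VOTES_PER_MINUTE):
    """Group votes by (user_agent, article, minute) and keep busy buckets.

    votes: iterable of (user_agent, article_slug, created) tuples, where
    created is a datetime. Returns {(user_agent, article_slug, minute): count}
    for every bucket holding more than `limit` votes.
    """
    buckets = Counter()
    for user_agent, article, created in votes:
        # Truncate the timestamp to the start of its minute.
        minute = created.replace(second=0, microsecond=0)
        buckets[(user_agent, article, minute)] += 1
    return {key: count for key, count in buckets.items() if count > limit}


if __name__ == "__main__":
    # Made-up sample rows for illustration only.
    ua = "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)"
    sample = [(ua, "getting-started-firefox-os", datetime(2013, 2, 9, 2, 47, s))
              for s in range(0, 50, 10)]
    sample.append((ua, "mozilla-news", datetime(2013, 2, 9, 2, 47, 5)))
    for key, count in find_bursts(sample).items():
        print(key[1], key[2], count)
```

Single votes "here and there" from the same UA fall into separate buckets and are not flagged, which matches the observation that most votes from that UA look legitimate.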
(In reply to Ibai Garcia [:ibai] from comment #6)
> Do we have something more
> tangible as a cookie with a unique identifier?

Yes, we set a cookie on all anonymous votes to (try to) prevent multiple votes per user. A script just works around that by not sending the cookie.
How crazy is not to record the vote if we can write in the cookie?
(In reply to Ibai Garcia [:ibai] from comment #8)
> How crazy is not to record the vote if we can write in the cookie?

I am not sure I fully understand this. We record the vote and set the cookie in the response. If they try to vote again, the cookie will be there and the vote won't count. At the time of recording the vote, we have no idea what is going to happen to the cookie.

We could make it a little harder to vote by requiring a CSRF token. This would just make their script slightly harder to write, but it would still be possible to cheat the system.

The only way to make the system cheat-proof is to only allow votes from auth'd users. But that would probably not work so well on the KB, where I assume most votes are from anonymous users.
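To make the cookie behaviour concrete, here is a toy model of the flow described above. It is not Kitsune's actual code: the cookie name, the `VoteStore` class, and the in-memory storage are all hypothetical, chosen only to show why a client that never sends the cookie back is counted every time.

```python
import uuid

VOTE_COOKIE = "anon_voter_id"  # hypothetical cookie name


class VoteStore:
    """In-memory stand-in for the vote table (illustration only)."""

    def __init__(self):
        self.votes = []    # (article, voter_id, helpful)
        self.seen = set()  # (article, voter_id) pairs that already voted

    def vote(self, article, helpful, cookies):
        """Record a vote; return (counted, cookies_to_set_in_response)."""
        voter_id = cookies.get(VOTE_COOKIE)
        if voter_id is None:
            # No cookie in the request: looks like a brand new visitor.
            voter_id = uuid.uuid4().hex
        if (article, voter_id) in self.seen:
            return False, {}
        self.seen.add((article, voter_id))
        self.votes.append((article, voter_id, helpful))
        # The cookie is only *offered* here; the server cannot know
        # whether the client will keep it and send it back.
        return True, {VOTE_COOKIE: voter_id}


if __name__ == "__main__":
    store = VoteStore()

    # A browser keeps the cookie, so its second vote is rejected.
    jar = {}
    counted, new_cookies = store.vote("some-article", True, jar)
    jar.update(new_cookies)
    print(counted, store.vote("some-article", True, jar)[0])

    # A script that drops the cookie is treated as a new voter each time.
    for _ in range(66):
        store.vote("some-article", True, {})
    print(len(store.votes))
```

The server-side check is only as good as the client's willingness to return the cookie, which is why rate limiting or authentication is needed on top of it.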
I meant can't instead of can...my mistake. So your first paragraph answers my suggestion.

We need a method that reduces the "cheating" (I don't think that it is necessarily cheating...it's more like trolling) but still enables votes from non-registered users. And it needs to be friendly (a Captcha doesn't seem like a good option).

I'm inclined to remove votes coming from "fishy" UAs. We can refine the method a little bit.
Sorry, meant to comment here after my meeting with Ricky.

Ibai, Netsparker is a tool that is used to probe sites for security issues. No normal user will have that in their UA. So, if we remove those votes we should not be removing any legitimate votes. Also, I'm only seeing it come up on the 17th and 18th, and a little bit on the 26th:

2013-02-17  82401
2013-02-18  104329
2013-02-26  83

Ricky, here is the SQL query I'd suggest:

First, remove all vote_metadata, so we won't be stuck with it after removing the actual votes.

DELETE FROM `wiki_helpfulvotemetadata`
WHERE `wiki_helpfulvotemetadata`.`vote_id` IN (
    SELECT `wiki_helpfulvote`.`id`
    FROM `wiki_helpfulvote`
    WHERE `wiki_helpfulvote`.`user_agent` LIKE '%Netsparker%'
    AND `created` BETWEEN '2013-02-17 0' AND '2013-02-19 0');

Then deleting the actual votes should be quite straightforward:

DELETE FROM `wiki_helpfulvote`
WHERE `wiki_helpfulvote`.`user_agent` LIKE '%Netsparker%'
AND `created` BETWEEN '2013-02-17 0' AND '2013-02-19 0';

Next step is getting rate limiting activated on SUMO, see bug 785850.
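For anyone who wants to rehearse the two-step delete before it goes into a migration, here is a rough dry run against an in-memory SQLite database. The table and column names come from the queries above; everything else is an assumption for illustration: the trimmed-down schema, the sample rows, the full `00:00:00` timestamps, and the use of SQLite rather than the production MySQL database.

```python
import sqlite3

# Assumption: same window as the queries above, written as full timestamps.
WINDOW = ("2013-02-17 00:00:00", "2013-02-19 00:00:00")


def make_demo_db():
    """Build a tiny stand-in schema with made-up rows (illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE wiki_helpfulvote (
            id INTEGER PRIMARY KEY, user_agent TEXT, created TEXT);
        CREATE TABLE wiki_helpfulvotemetadata (
            id INTEGER PRIMARY KEY, vote_id INTEGER);
    """)
    rows = [
        (1, "Mozilla/4.0 (compatible; Netsparker)", "2013-02-17 10:00:00"),
        (2, "Mozilla/4.0 (compatible; Netsparker)", "2013-02-18 23:59:59"),
        (3, "Mozilla/4.0 (compatible; Netsparker)", "2013-02-26 12:00:00"),
        (4, "Mozilla/5.0 (some ordinary browser)", "2013-02-17 10:00:00"),
    ]
    conn.executemany("INSERT INTO wiki_helpfulvote VALUES (?, ?, ?)", rows)
    conn.executemany(
        "INSERT INTO wiki_helpfulvotemetadata (vote_id) VALUES (?)",
        [(row[0],) for row in rows],
    )
    conn.commit()
    return conn


def purge_netsparker_votes(conn, start=WINDOW[0], end=WINDOW[1]):
    """Delete metadata first, then votes. Returns (metadata, votes) deleted."""
    with conn:  # one transaction: both deletes commit, or neither does
        metadata_deleted = conn.execute(
            "DELETE FROM wiki_helpfulvotemetadata WHERE vote_id IN ("
            " SELECT id FROM wiki_helpfulvote"
            " WHERE user_agent LIKE '%Netsparker%'"
            " AND created BETWEEN ? AND ?)",
            (start, end),
        ).rowcount
        votes_deleted = conn.execute(
            "DELETE FROM wiki_helpfulvote"
            " WHERE user_agent LIKE '%Netsparker%'"
            " AND created BETWEEN ? AND ?",
            (start, end),
        ).rowcount
    return metadata_deleted, votes_deleted


if __name__ == "__main__":
    db = make_demo_db()
    print(purge_netsparker_votes(db))
    print(db.execute("SELECT id FROM wiki_helpfulvote ORDER BY id").fetchall())
```

In this demo the Netsparker row dated the 26th survives, because it falls outside the window used in the queries, and the ordinary-browser vote is untouched.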
I'll test the SQL out and add it to a migration for deploying.
BTW, I've talked to most of you, but just so you know -- we would never (at least intentionally, and definitely not in this case) run automation that would change values in production.
The votes have been deleted.
Meant to comment here, but just so everyone is aware: this IP/DDoS wasn't an accidental or malicious event on Web QA's side -- the attacker just happened to use NetSparker, a tool we (I, mostly) use quite often.
Created attachment 723099 [details]
Spike in votes

(In reply to Ricky Rosario [:rrosario, :r1cky] from comment #15)
> The votes have been deleted.

I still see spikes on some articles (see attachments).
The votes for these two articles come from this machine:
Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)

This matches the articles that I referred to in my previous comment. It seems that we removed part of the effect, but not all of it. And as in the other case, the votes don't show up on GA, so they may be coming from a script or something similar. I can't understand how somebody can be doing this....
Apparently the Netsparker votes had not been removed. They are now. But also, we are adding rate limiting, so at least in the future this should not be an issue anymore. Unless people are launching a sophisticated attack.
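For reference, the general idea behind rate limiting can be sketched with a small sliding-window limiter. This is not the implementation tracked in bug 785850: the class name, the limits, and the choice of key (for example an IP address plus article) are all assumptions made for illustration.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_events per key within any window_seconds span."""

    def __init__(self, max_events, window_seconds, clock=time.monotonic):
        self.max_events = max_events
        self.window = window_seconds
        self.clock = clock  # injectable so the limiter can be tested
        self._events = defaultdict(deque)  # key -> timestamps of allowed events

    def allow(self, key):
        """Return True and record the event if key is under its limit."""
        now = self.clock()
        events = self._events[key]
        # Forget events that have aged out of the window.
        while events and now - events[0] >= self.window:
            events.popleft()
        if len(events) >= self.max_events:
            return False
        events.append(now)
        return True


if __name__ == "__main__":
    # Hypothetical policy: 2 votes per minute per (client, article) pair.
    limiter = SlidingWindowLimiter(max_events=2, window_seconds=60)
    key = ("203.0.113.7", "getting-started-firefox-os")  # example values
    print([limiter.allow(key) for _ in range(5)])
```

A limiter like this caps bursts from a single source, but as noted above it does not stop a more sophisticated attack that spreads votes across many sources.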