Closed Bug 843789 Opened 11 years ago Closed 11 years ago

Add spam prevention features to [Bedrock] forms

Categories

(www.mozilla.org :: Pages & Content, defect, P2)

x86
macOS
defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: cmore, Assigned: jpetto)

References

Details

(Whiteboard: u=dev c=forms p=)

Attachments

(3 files)

A quick grep turned up the following forms *without* a CAPTCHA. Some may not require CAPTCHA, but I wanted to get the complete list to start.

./apps/collusion/templates/collusion/collusion.html

./apps/firefox/templates/firefox/organizations/organizations.html

./apps/mozorg/templates/mozorg/partnerships.html

./apps/privacy/templates/privacy/firefox.html

./apps/privacy/templates/privacy/privacy_contact.html

We should also consider accessibility when choosing how to prevent spam on our website's forms.
Priority: -- → P3
Whiteboard: u=dev c=forms p=
icaaq: Know of any good resources for accessible spam prevention methods?
Flags: needinfo?(icaaaq)
The ./apps/mozorg/templates/mozorg/partnerships.html form is also live at http://www.mozilla.org/en-US/firefox/partners/ 

If possible, I'd like to add a spam prevention tool to the form in both places by the end of March.

Thx,
Jen

PS:  I'm marking 792792 (opened by Greg Jost and Mark Crandon in September) a duplicate of this one.
pmac: I've used text-CAPTCHA and its suggestions on a smaller project, but that's only en-US. http://textcaptcha.com/really

As far as I know, there are no accessible image/voice captchas out there yet.
Flags: needinfo?(icaaaq)
I'm open to any solution.  But I desperately need something.  The volume of garbage is overwhelming.
Priority: P3 → P2
How about we implement a CAPTCHA on these forms until there is a better solution that is accessible? If the spam is high enough that we accidentally overlook potential partnerships, that could be worse than shipping v1 of this bug without an accessible spam prevention method.
I'm Ok with that, Chris.
I haven't used it yet, but reCAPTCHA looks promising: http://www.google.com/recaptcha

Should we sign up for a key?
yes let's do it
(In reply to Jon Petto [:jpetto] from comment #8)
> I haven't used it yet, but reCAPTCHA looks promising:
> http://www.google.com/recaptcha
> 
> Should we sign up for a key?

I've asked about the reCAPTCHA key and I think we were instructed to request a new one. I would ask in #webdev or #it. I remember fox2mike mentioning it before.
Assignee: nobody → jon
We must have a key for this already.

ReCaptcha is already implemented on the get involved form

http://www.mozilla.org/en-US/contribute/

Click in the email field to open up the form.

So I know we can use this easily for Django forms (forms.py). The forms above look to be hardcoded, so I'm not sure how that will all work.
ReCaptcha is indeed already in use on the site. I started a conversation about possible solutions because :icaaq had lamented on IRC that all captchas, including ReCaptcha, are very bad for a11y. If there were non-captcha solutions that could similarly cut down on our spam levels then I was hoping someone would let us know. So far we've not heard anything. Since we do already have ReCaptcha in use, we can just go with it for now. But I'd still like to stay open to the possibility of using another more accessible technique in future.
I agree, Pmac.  Ideally we would use an accessible tool, but since one doesn't seem to currently exist, let's move forward with Recaptcha.
Can we implement reCaptcha on bedrock so that if we find an alternative in the future, we can swap out the code in one central place? It wouldn't be fun to have to go back to all forms and replace it with something new.
(In reply to Chris More [:cmore] from comment #14)
> Can we implement reCaptcha on bedrock so that if we find an alternative in
> the future, we can swap out the code in one central place? It wouldn't be
> fun to have to go back to all forms and replace it with something new.

Totally agree. I've been working on this a bit today - looking at the current implementation and how we could modularize it to one bit of shared code. This is definitely the route we want to take.

I'll probably need a little help navigating bedrock when it comes time to place the code. I'll ping IRC/pmac if/when necessary.
The technical side of this bug is coming along nicely, but we have a problem with the UI. The following pages have breakpoints/views that are too skinny to fit any of the default ReCAPTCHA themes:

http://www.mozilla.org/en-US/collusion/
http://www.mozilla.org/en-US/about/partnerships/
http://www.mozilla.org/en-US/firefox/organizations/

It looks like the skinniest built-in theme for the ReCAPTCHA is 318px wide. We need to get down to 220px. ReCAPTCHA does support custom themes, so we shouldn't be stuck. I can take this on - I would make a theme as close to the existing 'clean' theme (seen here: http://www.mozilla.org/en-US/contribute/) as possible.

Is it okay to proceed this way? Do we need to get a design mocked up?
(In reply to Jon Petto [:jpetto] from comment #16)
> The technical side of this bug is coming along nicely, but we have a problem
> with the UI. The following pages have breakpoints/views that are too skinny
> to fit any of the default ReCAPTCHA themes:
> 
> http://www.mozilla.org/en-US/collusion/
> http://www.mozilla.org/en-US/about/partnerships/
> http://www.mozilla.org/en-US/firefox/organizations/
> 
> It looks like the skinniest built-in theme for the ReCAPTCHA is 318px wide.
> We need to get down to 220px. ReCAPTCHA does support customs themes, so we
> shouldn't be stuck. I can take this on - I would make a theme as close to
> the existing 'clean' theme (seen here:
> http://www.mozilla.org/en-US/contribute/) as possible.
> 
> Is it okay to proceed this way? Do we need to get a design mocked up?

Can you post a screenshot of what it would look like with ReCAPTCHA on the page and at a breakpoint that is too skinny?
From local copy of /firefox/organizations.
Upon reading a bit more of the ReCAPTCHA documentation, it appears the skinniest we can go is 300px (restricted by the image). We could likely use CSS/JS to shrink that down to 220px, though that would almost certainly reduce usability.

I don't think restructuring the pages in question (making the sidebar wider) is feasible, so...

We could move the forms to the main/wide column, though that might not be great for usability/visibility either.

A possible simpler first step would be implementing a CSRF (and perhaps Honey Pot) protection scheme. This wouldn't help much for human spammers, but should help quite a bit against bots.
Here are a few non-captcha methods of fighting spam: http://textcaptcha.com/really

What do we think about any combination of these? I believe CSRF is baked in to bedrock, and we could pretty easily add a honey pot field.

I think we could move forward pretty quickly with a CSRF/honey pot combo. There would be almost zero impact to users (aside from one extra field to ignore for the visually impaired), and should give us pretty quick feedback as to their effectiveness.

From a slightly elevated viewpoint, if CSRF/honey pot *don't* help us significantly, it may mean the spammers are human...which means a captcha may not be much more of a deterrent.

Do we have numbers on how much less spam the captcha'd contribute form (http://www.mozilla.org/en-US/contribute/) has compared to other, non-protected forms?
Can you tell from the attached if spam is human or computer?  I'd guess computer?
I would guess computer as well. Looks like the majority of spam entries have no name or email specified at all, while the rest have randomly generated info.

I presume Salesforce has the ability to filter out entries without a name or email. We could also add a bit of JS to ensure humans aren't clicking the submit button without filling in any info.
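
The blank-name/blank-email pattern described above can be filtered with a very small check. The following is only a sketch in plain Python: the `name` and `email` keys and the sample leads are assumptions made for illustration, not the actual Salesforce or bedrock field names.

```python
def looks_like_blank_spam(lead):
    """Return True when a lead has no usable name or no usable email."""
    # Treat missing keys, None, and whitespace-only values as blank.
    name = (lead.get("name") or "").strip()
    email = (lead.get("email") or "").strip()
    return not name or not email


# Illustrative sample data only (hypothetical leads).
leads = [
    {"name": "", "email": ""},
    {"name": "Ada Example", "email": "ada@example.com"},
    {"name": "   ", "email": "bot@example.com"},
]

# Keep only the leads that carry both a name and an email.
kept = [lead for lead in leads if not looks_like_blank_spam(lead)]
```

This only catches the laziest bots. Entries with randomly generated names and emails would still pass.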

I found a blog post detailing how to use the honey pot technique with Salesforce: http://www.oliverjobson.co.uk/web-development/salesforce-com-web-to-lead-filtering-out-spam/

This would likely help us with the partnerships form (http://www.mozilla.org/en-US/about/partnerships/).

We can use that same technique, as well as an added CSRF, on forms which are handled by bedrock (collusion & organizations).
CSRF and/or Salesforce filtering would probably be "good enough" for now given the reCaptcha issues.
Enabling CSRF on bedrock will take a little configuring as I don't think it's been used to this point. When the site launched we didn't have memcached or a database. Now that there's memcached you should be able to use that for csrf.
Another good Salesforce spam: http://www.interactiveties.com/b_web_to_lead_spam.php
(In reply to Chris More [:cmore] from comment #26)
> Another good Salesforce spam:
> http://www.interactiveties.com/b_web_to_lead_spam.php

Actually, that validation is pretty weak. A CSRF token would be better.
Another way we can handle this is if we add a simple question and validation like:

"What is the name of our web browser?" > "Firefox"

or 

"What is one plus five?" > "six" or "6"

I would guess a simple question like this would resolve almost all spam and is accessible without reCAPTCHA.
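
A question/answer check like the one proposed above could be sketched as follows. This is a plain-Python illustration, not bedrock code: the `ACCEPTED_ANSWERS` table and the `answer_is_correct` helper are hypothetical names, and the questions are the two examples from this comment.

```python
# Hypothetical lookup table: each question maps to its accepted answers,
# stored in lowercase so the comparison can be case-insensitive.
ACCEPTED_ANSWERS = {
    "What is the name of our web browser?": {"firefox"},
    "What is one plus five?": {"6", "six"},
}


def answer_is_correct(question, answer):
    """Return True when the answer matches one accepted for the question."""
    accepted = ACCEPTED_ANSWERS.get(question, set())
    # Ignore surrounding whitespace and letter case in the user's answer.
    return answer.strip().lower() in accepted
```

Normalizing case and whitespace keeps the check forgiving for real users, for example accepting " Six " for the arithmetic question.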

pmac/jpetto: thoughts on asking a question and validating the answer?
It might work for a little while, and should be decently accessible. If someone was targeting our forms they could just get all the questions and answers from GitHub, as we've no other place to put them but in the code.
I think a simple question/answer would work, though I'm not sure it's necessary for our first pass. It might be better to hold off on making life more difficult for real users until we try some other less intrusive tactics.

Based on the massive number of entries with blank names/emails in the PDF jbertsch posted, it doesn't even look like the bots are really trying.

If we just require a name and email (both with JS and on the back end), the majority of the spam in that PDF would be dealt with. If we add CSRF, we'd protect against possible future spam POSTing from outside scripts. If we add a honey pot (a checkbox invisible to users labeled "Check this if you're not human."), we would be protected against smarter scripts (as most bots will check any available boxes).
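
The required-field and honey pot checks described above might look roughly like this on the back end. It is a minimal sketch in plain Python rather than the actual bedrock implementation: the field name `not_human` and the `validate_submission` helper are assumptions for illustration, and CSRF is left out because it would be handled by the framework.

```python
REQUIRED_FIELDS = ("name", "email")
HONEYPOT_FIELD = "not_human"  # hypothetical name for the hidden checkbox


def validate_submission(data):
    """Return a list of error strings; an empty list means accept."""
    errors = []
    # Real users never see the hidden checkbox, so any value here
    # suggests a script that ticks every available box.
    if data.get(HONEYPOT_FIELD):
        errors.append("honeypot checked")
    # Reject submissions where a required field is missing or blank.
    for field in REQUIRED_FIELDS:
        if not (data.get(field) or "").strip():
            errors.append("missing " + field)
    return errors
```

For example, `validate_submission({"name": "Ada", "email": "ada@example.com"})` returns an empty list, while a submission with the hidden box checked is rejected even if every other field is filled in.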
(In reply to Jon Petto [:jpetto] from comment #30)

+1
I'm open to trying less intrusive solutions if we can implement relatively quickly and easily.  If it doesn't work, then we can discuss next steps.
jpetto: +1 to requiring name/email and the honey pot check box. Is that something you can move forward with?
(In reply to Chris More [:cmore] from comment #33)
> jpetto: +1 to requiring name/email and the honey pot check box. Is that
> something you can move forward with?

Yep, after style guide updates are in I can jump on this. May need pmac's help/direction with the bedrock stuff, but we should be good to go early next week.
I should have a PR ready for the partners forms (found at /firefox/partners/ and /about/partnerships/) early next week. However, it's likely that the Salesforce endpoint URL we've been using is stored in the spammers' scripts. Our new protection features won't be able to help against attacks made directly against that URL (as it's handled by Salesforce).

We could do one or two things to help out on the Salesforce end:

1) Add blank field rules to the incoming data. Right now, Salesforce is accepting requests without a name, email, or company. If those were made required on the Salesforce end, things would improve, but likely only temporarily. It would be relatively easy for the spammers to start sending junk data.

2) Change the Salesforce endpoint URL. As we're going to be sending the data through bedrock first (meaning the Salesforce URL won't be visible in the HTML), any new Salesforce URL would only be discoverable by browsing the source code, which I'm guessing/hoping would be a significant hindrance to spammers.

My Salesforce skills are at least 6 years old, but I think doing the above would be relatively quick. I'd be happy to poke around/configure Salesforce if necessary.
Let's do it.  What do you need from me?  Salesforce admin access?
Yep, admin access, and anything in the current Salesforce configuration that you think may be relevant.
Mark - Any word on admin access to Salesforce?
Jon, you should have access now.
Mark - Got in. Looks like we cannot update the endpoint URL, but we can set rules requiring first name, last name, company, and email. Are those correct in terms of fields that should be required?

Any fields we mark required will be so in all instances when creating Leads, not just when coming in from the forms on mozilla.org. Is that okay?
works for me.
I have created and enabled a validation rule in Salesforce. If a lead comes in that doesn't have data for first name, last name, email, and company, then that lead is never added to Salesforce.

I ran a quick test from http://www.mozilla.org/en-US/about/partnerships/ with JavaScript turned off (to avoid front-end validation). The lead submitted without a company value was not created. The lead submitted with data for all 4 required fields was created successfully.

This validation will also run when creating a lead from within Salesforce, but oddly *not* when creating a lead within Salesforce from the "Quick Create" sidebar form (which shouldn't be a big deal, just an FYI).

Please keep tabs on incoming leads today. We should see normal activity for real leads - those with all required data. The validation rule is simple to disable should a problem arise.
This isn't going to work.  Now I get an email for each failed lead created which will generate hundreds of emails.  There is no way to turn it off either. 

Can we enforce it from the web page beforehand?!
We will be adding code shortly to enforce these rules from the mozilla.org side (code is being reviewed). It's hard to say how effective this will be because you don't *have* to go through a form on mozilla.org to submit data to Salesforce.

Web-to-lead forms by default send information directly to Salesforce. The code we're adding will intercept and validate that data on the mozilla.org side first. The problem is that the Salesforce endpoint URL has been easily accessible (visible in the HTML on mozilla.org) for some time now, meaning spammers likely know that URL and are sending data directly at it, circumventing anything to do with the mozilla.org site.

Are the emails unique enough for you to filter them to a folder/the trash? If not, we can disable the Salesforce validation and see how effective our code changes on mozilla.org are.
Should I raise a support request with Salesforce to see if we can change the endpoint URL?
It wouldn't be a terrible idea, though we'd need to be sure there aren't any web-to-lead forms out there we don't know about that would break should the endpoint change.

Currently, the only two I'm aware of are on /about/partnerships/ and /firefox/partners/.

Also, note that what would have to change is the oid value, and not the actual URL the form POSTs to. As far as I can tell, all Salesforce web-to-lead forms POST to a single endpoint. It's the oid that dictates which account/instance gets the lead information.

I believe the oid is associated with an entire Salesforce install (and not just these forms), so I wouldn't be surprised if they can't change it, but it's worth asking.
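
Since the oid decides which Salesforce account receives the lead, one option when proxying through bedrock is to attach it on the server and ignore anything the client sends for it. The sketch below only builds the form-encoded body and makes no network call. The oid value is a placeholder, and the field names and the `build_lead_payload` helper are assumptions for illustration.

```python
from urllib.parse import urlencode

SALESFORCE_OID = "00D000000000000"  # placeholder, not a real oid
ALLOWED_FIELDS = ("first_name", "last_name", "company", "email")  # assumed names


def build_lead_payload(form_data, oid=SALESFORCE_OID):
    """Build the form-encoded body to forward, with the oid set server-side."""
    # Start from the server-held oid; a client-supplied "oid" is never copied.
    payload = {"oid": oid}
    # Forward only the whitelisted fields, trimmed of stray whitespace.
    for field in ALLOWED_FIELDS:
        payload[field] = form_data.get(field, "").strip()
    return urlencode(payload)
```

Keeping the oid out of the HTML does not help if spammers already hold the old value, which is why changing it is raised above as a separate question for Salesforce.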
Commits pushed to master at https://github.com/mozilla/bedrock

https://github.com/mozilla/bedrock/commit/5df6b6afdb72d3505b6adfcf15f340319bd4e7af
Add spam prevention features. Bug 843789.

- Added general use honeypot widget
- Added CSRF to /firefox/partners & /about/partnerships
- Added labels to /firefox/partners form
- Added required field to /collusion form
- Updated /apps/firefox/tests.py to play with updated proxy

https://github.com/mozilla/bedrock/commit/063ee95075ebb790372b22995783ecd1faf8d87a
Merge pull request #768 from jpetto/bug-843789-spam-prevention

Add spam prevention features. Bug 843789.
Can you confirm on https://www.allizom.org/en-US/firefox/partners/ as well?
None of the leads are going through from this entry point.

https://www.mozilla.org/en-US/about/partnerships/
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
This is broken on /firefox/partners/ as well. We're missing the /b/ redirect. PR submitted. Hopefully we can merge soon.

https://github.com/mozilla/bedrock/pull/922
Commit pushed to master at https://github.com/mozilla/bedrock

https://github.com/mozilla/bedrock/commit/dc569acd2bb6d3c5f8fecb6b0dc3cfda9912798e
Add /b/ redirect. Bug 843789.

Add missing bedrock passthrough for partnerships forms AJAX
endpoint (/about/partnerships/contact-bizdev/).
This fix is in production.
Status: REOPENED → RESOLVED
Closed: 11 years ago → 11 years ago
Resolution: --- → FIXED
Works.  I thought we were going to put some type of spam protection on this.  A captcha or something.  The spam is as brutal as ever.
We put two different bits of spam protection on the /about/partnerships form - one to reduce the number of bots, and one to make sure submissions are coming from authorized URLs only.

What's the current rate of spam for this form look like? If it's so high that it looks like an automated process, we'll need to investigate further. If it's at a rate that seems to be human, we're in a more difficult position.

Some spammers employ humans to man terminals and fill out forms. If this is the case, adding a CAPTCHA wouldn't be much of a deterrent (as the purpose of a CAPTCHA is to ensure a human is submitting the form). If humans are behind the spam, we may need to look at something like IP blocking.
I'll attach a printout.  You can see the volume and type.
Attached file Salesforce lead spam
Looks like ~80 entries for 5/29, with ~50 clearly being spam (the Acunetix entries). Definitely could be done by hand.

Can you include a timestamp on the Created Date field? It would be helpful to know how fast those spam entries are coming in.
Sorry, I wasn't able to figure out how to include the time on the list.

We received approximately 30 between yesterday and today.
Hm, ok. That's certainly a number achievable through manual labor. If spammers are sitting at computers and submitting our forms, we'll need to look into more drastic measures - IP blocking/limiting or something similar.

I'll talk with pmac next week to see what our options are.
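
As a rough idea of what the IP limiting mentioned above could involve, here is a sliding-window limiter sketched in plain Python. It keeps state in process memory, which would not be shared across web servers, so a real deployment would likely need a shared store. The class name and the default limits are assumptions for illustration.

```python
import time
from collections import defaultdict, deque


class SubmissionLimiter:
    """Allow at most max_per_window submissions per IP per time window."""

    def __init__(self, max_per_window=5, window_seconds=3600):
        self.max_per_window = max_per_window
        self.window_seconds = window_seconds
        self._seen = defaultdict(deque)  # ip -> timestamps of accepted posts

    def allow(self, ip, now=None):
        """Return True and record the submission if the IP is under its limit."""
        now = time.time() if now is None else now
        stamps = self._seen[ip]
        # Drop timestamps that have aged out of the window.
        while stamps and now - stamps[0] >= self.window_seconds:
            stamps.popleft()
        if len(stamps) >= self.max_per_window:
            return False
        stamps.append(now)
        return True
```

A view would call `limiter.allow(client_ip)` before forwarding a lead and reject the request when it returns False. Spammers rotating through many addresses would still get through.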