Closed Bug 892189 Opened 7 years ago Closed 7 years ago

Add robots.txt to block spiders from download.mozilla.org

Categories

(Webtools :: Bouncer, defect)

Platform: x86, macOS
Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: cmore, Unassigned)

References

(Blocks 1 open bug)

Details

Can we add a robots.txt file to http://download.mozilla.org to block search engines from spidering that URL? That URL should not show up in search results, because the query parameters baked into an indexed link (product, OS, locale) may not match the person searching. It is showing up on Bing. I've blocked it on Bing, but that is only temporary.

For example, if you search Bing for firefox, there is a "deep link" called "Download" that points to: https://download.mozilla.org/?product=firefox-stub&os=win&lang=en-US. That is the Windows build, but my search came from a Mac.

To block all spiders from all URLs on d.m.o, the robots.txt file should be:

User-agent: *
Disallow: /
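To confirm that this two-line file keeps every crawler away from every URL on the host, including query-string URLs like the Bing deep link above, here is a minimal sketch using Python's standard-library urllib.robotparser. The helper name can_crawl and the crawler names are illustrative assumptions; real search engines use their own robots.txt parsers, which may differ in edge cases.

```python
from urllib.robotparser import RobotFileParser

# The two-line robots.txt proposed in this bug.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if `user_agent` may fetch `url` under the given robots.txt body.

    can_crawl is a hypothetical helper for illustration only.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # The deep link Bing was serving; the crawler names are only examples.
    url = "https://download.mozilla.org/?product=firefox-stub&os=win&lang=en-US"
    for agent in ("bingbot", "Googlebot", "SomeOtherCrawler"):
        print(agent, can_crawl(ROBOTS_TXT, agent, url))  # False for each agent
```

"Disallow: /" is a prefix match on the path, so it covers every path and query string on the host, and "User-agent: *" applies it to any crawler without a more specific entry. An empty "Disallow:" line would do the opposite and allow everything.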
Blocks: 745355
Perhaps this should be handled by WebOps as a server setup issue?
jakem: should we switch components for this? I didn't know a better place to start with this one.
Flags: needinfo?(nmaul)
cturra: can you help here? We are getting a lot of invalid downloads from bing.com and robots.txt should be used to prevent this for all search engines.
Assignee: nobody → server-ops-webops
Component: Bouncer → Server Operations: Web Operations
Flags: needinfo?(cturra)
Product: Webtools → mozilla.org
QA Contact: nmaul
Version: Trunk → other
Webops does not manage content. Moving to the webtools component.
Assignee: server-ops-webops → nobody
Component: Server Operations: Web Operations → Bouncer
Flags: needinfo?(nmaul)
Flags: needinfo?(cturra)
Product: mozilla.org → Webtools
QA Contact: nmaul
(In reply to Daniel Maher [:phrawzty] (afk 24-07-2013 through 04-08-2013) from comment #4)
> Webops does not manage content. Moving to the webtools component.

That's where it was initially. :nthomas suggested passing it over to WebOps, and now WebOps is saying to pass it back to the Webtools team.

Laura/Jakem/nthomas: Can we just decide who is going to make this change? Bing is serving up links to en-US Windows builds to all users regardless of their locale or operating system. A robots.txt file is the way to prevent this from happening; d.m.o should not be indexed by robots.
/CC bsavage as someone who touched bouncer this year

Apologies if that was wrong; it was a guess based on download.m.o being a PHP app. Perhaps this is as simple as dropping the robots.txt into https://github.com/mozilla/tuxedo/tree/master/bouncer/php/, or maybe it needs some Apache work too.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
(In reply to Brandon Savage [:brandon] from comment #7)
> Fixed in
> https://github.com/mozilla/tuxedo/commit/226d71cfa6b85103aaf66631b6288e2d2afb92fc
> 
> All this needs now is to be deployed.

Thanks, Brandon! What is the normal deployment for this?

Try searching for site:download.mozilla.org on both Google and Bing. Google ignores the domain entirely, as it should, while Bing has been indexing it all along.
Let me talk with :laura and :jakem this morning and see about getting this shipped. Assuming that we are already serving the latest master (before this change) it should quite literally be trivial.
(In reply to Brandon Savage [:brandon] from comment #9)
> Let me talk with :laura and :jakem this morning and see about getting this
> shipped. Assuming that we are already serving the latest master (before this
> change) it should quite literally be trivial.

Can we get an update here?
Flags: needinfo?(bsavage)
This has been pushed to production.
Flags: needinfo?(bsavage)
QA verified on stage and prod; automation also passes. Thanks to Brandon for adding a test for robots.txt.
Status: RESOLVED → VERIFIED
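The actual test Brandon added is not shown in this bug. As a rough, hypothetical sketch of what an automated robots.txt check could look like, the Python below fetches /robots.txt from a host and checks that a few sample crawlers are disallowed from a few sample paths. The function names, sample paths, and crawler list are assumptions, not the real test.

```python
import urllib.request
from urllib.robotparser import RobotFileParser

# Hypothetical sample paths and crawler names; not taken from the real test.
SAMPLE_PATHS = ("/", "/?product=firefox-stub&os=win&lang=en-US")
CRAWLERS = ("*", "bingbot", "Googlebot")


def fetch_robots_txt(base_url: str, timeout: float = 10.0) -> str:
    """Download /robots.txt from base_url (needs network access).

    urlopen raises urllib.error.HTTPError on an HTTP error status such as 404,
    so a missing file fails loudly instead of being treated as "allow all".
    """
    with urllib.request.urlopen(base_url.rstrip("/") + "/robots.txt", timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def blocks_all_crawlers(robots_txt: str, base_url: str,
                        paths=SAMPLE_PATHS, agents=CRAWLERS) -> bool:
    """Return True if every agent in `agents` is disallowed from every path in `paths`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    root = base_url.rstrip("/")
    return not any(
        parser.can_fetch(agent, root + path)
        for agent in agents
        for path in paths
    )
```

For example, with base = "https://download.mozilla.org", blocks_all_crawlers(fetch_robots_txt(base), base) would be expected to return True if the deployed file matches the one proposed in the report. Keeping the parsing logic separate from the fetch means it can be tested offline against sample robots.txt bodies.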
(In reply to Brandon Savage [:brandon] from comment #11)
> This has been pushed to production.

Thanks!