892189 - Add robots.txt to block spiders to download.mozilla.org

Reporter

Description

•

12 years ago

Can we add a robots.txt file to http://download.mozilla.org to block search engines from spidering that URL? It should not show up in search results, because the query parameters passed to it may not be appropriate for the person searching. It is showing up on Bing; I've blocked it there, but that is only temporary. For example, if you search Bing for firefox, there is a "deep link" called "Download" that points to https://download.mozilla.org/?product=firefox-stub&os=win&lang=en-US. That is the Windows build, and my search came from a Mac.

To block all spiders from all URLs on d.m.o, the robots.txt file should be:

User-agent: *
Disallow: /
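As a quick illustration of what the proposed rules do, the sketch below feeds them to Python's standard-library `urllib.robotparser` and asks whether a crawler may fetch the deep link quoted above. The helper name `is_blocked` and the `bingbot` user-agent string are illustrative assumptions, not part of this bug; real crawlers may interpret robots.txt slightly differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The rules proposed in this bug: every crawler, every path.
ROBOTS_TXT = "User-agent: *\nDisallow: /\n"

# The deep link reported in the description.
DEEP_LINK = "https://download.mozilla.org/?product=firefox-stub&os=win&lang=en-US"


def is_blocked(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt forbids user_agent from fetching url.

    Hypothetical helper for illustration; it parses the text locally
    and makes no network requests.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # With the proposed rules the deep link is off limits to crawlers.
    print(is_blocked(ROBOTS_TXT, "bingbot", DEEP_LINK))
    # With an empty robots.txt nothing is disallowed.
    print(is_blocked("", "bingbot", DEEP_LINK))
```

Note that robots.txt only asks well-behaved crawlers to stay away; it does not remove URLs that a search engine has already indexed.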

Chris More [:cmore]

Reporter

Updated

•

12 years ago

Blocks: 745355

Nick Thomas [:nthomas] (UTC+12)

Comment 1

•

12 years ago

Perhaps this should be handled by WebOps as a server setup issue?

Chris More [:cmore]

Reporter

Comment 2

•

12 years ago

jakem: should we switch components for this? I didn't know a better place to start with this one.

Flags: needinfo?(nmaul)

Chris More [:cmore]

Reporter

Comment 3

•

12 years ago

cturra: can you help here? We are getting a lot of invalid downloads from bing.com and robots.txt should be used to prevent this for all search engines.

Assignee: nobody → server-ops-webops

Component: Bouncer → Server Operations: Web Operations

Flags: needinfo?(cturra)

Product: Webtools → mozilla.org

QA Contact: nmaul

Version: Trunk → other

Daniel Maher [:phrawzty]

Comment 4

•

12 years ago

Webops does not manage content. Moving to the webtools component.

Assignee: server-ops-webops → nobody

Component: Server Operations: Web Operations → Bouncer

Flags: needinfo?(nmaul)

Flags: needinfo?(cturra)

Product: mozilla.org → Webtools

QA Contact: nmaul

Chris More [:cmore]

Reporter

Comment 5

•

12 years ago

(In reply to Daniel Maher [:phrawzty] (afk 24-07-2013 through 04-08-2013) from comment #4)
> Webops does not manage content. Moving to the webtools component.

That's where it was initially. :nthomas said to pass it over to WebOps, and now WebOps is saying to pass it over to the Webtools team.

Laura/Jakem/nthomas: Can we just decide who is going to make this change? Bing is serving up links to en-US Windows builds to all users regardless of their locale or operating system. A robots.txt is the way to prevent this from happening. d.m.o should not be indexed by robots.

Nick Thomas [:nthomas] (UTC+12)

Comment 6

•

12 years ago

/CC bsavage as someone who touched bouncer this year.

Apologies if that was wrong; it was a guess based on download.m.o being a PHP app. Perhaps this is as simple as dumping the robots.txt at https://github.com/mozilla/tuxedo/tree/master/bouncer/php/, or maybe it needs some Apache work too.

Brandon Savage [:brandon]

Comment 7

•

12 years ago

Fixed in https://github.com/mozilla/tuxedo/commit/226d71cfa6b85103aaf66631b6288e2d2afb92fc

All this needs now is to be deployed.

Brandon Savage [:brandon]

Updated

•

12 years ago

Status: NEW → RESOLVED

Closed: 12 years ago

Resolution: --- → FIXED

Chris More [:cmore]

Reporter

Comment 8

•

12 years ago

(In reply to Brandon Savage [:brandon] from comment #7)
> Fixed in
> https://github.com/mozilla/tuxedo/commit/226d71cfa6b85103aaf66631b6288e2d2afb92fc
>
> All this needs now is to be deployed.

Thanks, Brandon! What is the normal deployment process for this?

Try searching for site:download.mozilla.org on both Google and Bing. Google totally ignores the domain, as it should, and Bing has been indexing it all along.

Brandon Savage [:brandon]

Comment 9

•

12 years ago

Let me talk with :laura and :jakem this morning and see about getting this shipped. Assuming that we are already serving the latest master (before this change) it should quite literally be trivial.

Chris More [:cmore]

Reporter

Comment 10

•

12 years ago

(In reply to Brandon Savage [:brandon] from comment #9)
> Let me talk with :laura and :jakem this morning and see about getting this
> shipped. Assuming that we are already serving the latest master (before this
> change) it should quite literally be trivial.

Can we get an update here?

Flags: needinfo?(bsavage)

Brandon Savage [:brandon]

Comment 11

•

12 years ago

This has been pushed to production.

Flags: needinfo?(bsavage)

Matt Brandt [:mbrandt]

Comment 12

•

12 years ago

QA verified on stage and prod - automation also passes. Thanks to Brandon for adding a test for robots.txt
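The test Brandon added is not shown in this bug. As a rough sketch of what such a post-deploy check could look like, the Python below fetches /robots.txt from a given host and confirms it contains exactly the two rules requested in the description. All function names here are hypothetical, and the comparison is deliberately strict (it assumes the deployed file holds only those two directives, ignoring comments and blank lines).

```python
from urllib.request import urlopen

# The two directives requested in the bug description.
EXPECTED_RULES = ["User-agent: *", "Disallow: /"]


def parse_rules(body: str) -> list[str]:
    """Return the directive lines of a robots.txt body.

    Comments (anything after '#'), blank lines, and surrounding
    whitespace are dropped.
    """
    rules = []
    for line in body.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            rules.append(line)
    return rules


def robots_blocks_everything(body: str) -> bool:
    """True if the body consists of exactly the expected two rules."""
    return parse_rules(body) == EXPECTED_RULES


def fetch_robots(base_url: str, timeout: float = 10.0) -> str:
    """Download <base_url>/robots.txt and return it as text (needs network)."""
    with urlopen(base_url.rstrip("/") + "/robots.txt", timeout=timeout) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    # Example run against production; the result depends on what is deployed.
    body = fetch_robots("https://download.mozilla.org")
    print(robots_blocks_everything(body))
```

Keeping the parsing separate from the fetch lets the rule check run offline against a fixture, while the same function can be pointed at stage or prod after a push.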

Status: RESOLVED → VERIFIED

Chris More [:cmore]

Reporter

Comment 13

•

12 years ago

(In reply to Brandon Savage [:brandon] from comment #11)
> This has been pushed to production.

Thanks!

Categories

(Webtools :: Bouncer, defect)

Tracking

(Not tracked)

People

(Reporter: cmore, Unassigned)

References

(Blocks 1 open bug)

Details

Crash Data

Security

(public)
