Closed Bug 857621 Opened 12 years ago Closed 12 years ago

Categories

(Infrastructure & Operations Graveyard :: WebOps: Other, task)

Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: kats, Assigned: cturra)

References

Details

Attachments

(1 file)

https://wiki.mozilla.org/robots.txt disallows everything. I was searching for something on Google today and got the attached (I added the red outline for emphasis). Is this intended?
This is making it hard to find things on wikimo - and killing the site's google-fu. It's probably too late for the latter...
This seems like a major regression.
Severity: normal → major
Seriously.
Severity: major → critical
Andrei, "critical" means "all work stops". Think of it as "I'm willing to have a meeting with Mitchell to explain this". So, not this bug. Callek, what do you think? Who can make this call? Sadly, it's probably too late for the site's pagerank..
Severity: critical → normal
Personally I'm ok if Google indexes/crawls wiki.m.o; nothing on there should ever be private such that a crawl would expose it. The only downside is that it makes the wiki a larger target for spammers (since they can get their pagerank artificially increased), but I think we're in a better state right now re: newly incoming spam than we used to be. Final say lies with webops, imho.
Assignee: nobody → server-ops-webops
Component: wiki.mozilla.org → Server Operations: Web Operations
Product: Websites → mozilla.org
QA Contact: nmaul
Version: unspecified → other
(In reply to Dustin J. Mitchell [:dustin] from comment #4)
> Andrei, "critical" means "all work stops". Think of it as "I'm willing to
> have a meeting with Mitchell to explain this". So, not this bug.

I thought that was blocker :p Anyway, I think it is pretty critical for wikimo to be Google-searchable, because if it's not, that makes it considerably harder for anyone to find anything on it, especially people looking for things they don't *know* are even on the wiki.
I agree this really isn't a blocker -- blockers are reserved for scenarios like "omg, the site is down!!111one". I have applied a new robots.txt file for the wiki that should be a little less restrictive for bots. Note, I have added a Disallow list for the top known "scraper" bots, plus many of our "Special:..." pages. *Interesting note on that: I chose not to use wildcard matching here, since only the Google and Microsoft Bing bots honor wildcards.
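For illustration, here is a minimal sketch of a robots.txt along those lines. The specific bot names and page paths are assumptions for the example, not the exact file that was deployed:

# Allow everyone by default, but keep crawlers out of a few Special: pages
# (paths here are illustrative)
User-agent: *
Disallow: /Special:Search
Disallow: /Special:Random

# Known scraper bots blocked entirely (example entries)
User-agent: MJ12bot
Disallow: /

User-agent: AhrefsBot
Disallow: /

Consistent with the comment above, the sketch avoids wildcard patterns (e.g. "Disallow: /*action="), since only the Google and Bing crawlers honor them.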
Assignee: server-ops-webops → cturra
Status: NEW → RESOLVED
Closed: 12 years ago
OS: Mac OS X → All
Hardware: x86 → All
Resolution: --- → FIXED
Thanks! I apologize for the somewhat aggressive severity setting :p
This seems to have regressed. The current robots.txt disallows all robots:

User-agent: *
Disallow: /
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Looks like a recent MediaWiki update reverted the robots.txt back to the vanilla one that is shipped with the product. I have updated this again and will look into how we can make sure our custom file gets applied with future upgrades.
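This is just a hedged sketch of one way to catch the regression after future upgrades, not what webops actually put in place: fetch the live robots.txt and fail loudly if the blanket disallow has come back. The URL is from this bug; everything else is an assumption for illustration.

import sys
import urllib.request

# Live file to verify after an upgrade (URL taken from this bug)
ROBOTS_URL = "https://wiki.mozilla.org/robots.txt"

def robots_blocks_everything(text):
    """Simple line-based check for the vanilla 'block all crawlers' rule:
    a 'Disallow: /' that applies to 'User-agent: *'."""
    agent = None
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        field, _, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            agent = value
        elif field == "disallow" and agent == "*" and value == "/":
            return True
    return False

def main():
    with urllib.request.urlopen(ROBOTS_URL, timeout=30) as resp:
        text = resp.read().decode("utf-8", errors="replace")
    if robots_blocks_everything(text):
        print("robots.txt has reverted to the blanket 'Disallow: /' for all agents")
        sys.exit(1)
    print("robots.txt looks OK")

if __name__ == "__main__":
    main()

Something like this could run from cron or a monitoring check after each MediaWiki upgrade so a reverted file gets noticed before pagerank takes another hit.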
Status: REOPENED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Component: Server Operations: Web Operations → WebOps: Other
Product: mozilla.org → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard
See Also: → 1970197