Closed Bug 294295 Opened 20 years ago Closed 9 years ago

Permit pages describing Bugzilla's configuration to be indexed by search engines (alter robots.txt)

Categories

(bugzilla.mozilla.org :: General, enhancement, P4)

RESOLVED WONTFIX

People

(Reporter: timeless, Unassigned)

problem: web.archive.org can't index bmo's look and feel.
this means it can't tell me how a guided version of bugzilla's enter bug page
looked a year ago. nor can it tell me who owned a component a year ago.

the following items should be safe for robots to visit
please add:

createaccount.cgi
describecomponents.cgi
describekeywords.cgi 
enter_bug.cgi 
query.cgi 
sidebar.cgi 
page.cgi

mozilla-16.png
mozilla-banner.gif
skins/
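
For illustration, the list above could be turned into robots.txt entries roughly as follows. This is only a sketch: it assumes the existing file blocks everything with `Disallow: /` (the bug does not show the current file), and `PROPOSED_PATHS` and `build_robots_txt` are hypothetical names, not part of Bugzilla.

```python
# Paths proposed in this bug as safe for robots to visit.
PROPOSED_PATHS = [
    "createaccount.cgi",
    "describecomponents.cgi",
    "describekeywords.cgi",
    "enter_bug.cgi",
    "query.cgi",
    "sidebar.cgi",
    "page.cgi",
    "mozilla-16.png",
    "mozilla-banner.gif",
    "skins/",
]


def build_robots_txt(allowed_paths, user_agent="*"):
    """Return robots.txt text that allows only the given paths.

    Assumption: everything else stays blocked by a final 'Disallow: /'.
    """
    lines = [f"User-agent: {user_agent}"]
    # Allow lines come first so first-match parsers see them before the
    # catch-all Disallow.
    lines += [f"Allow: /{path}" for path in allowed_paths]
    lines.append("Disallow: /")
    return "\n".join(lines) + "\n"


print(build_robots_txt(PROPOSED_PATHS))
```

The Allow lines are placed ahead of the catch-all because some older parsers stop at the first matching rule, while parsers that pick the longest match are not affected by the order.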
I'd like some opinions from some folks who are familiar with Bugzilla innards on
what kind of effect this would have.
Priority: -- → P4
"Mean" robots already ignore the robots.txt file.

Right now, Red Hat has a Google-able Bugzilla, on show_bug, and it seems to hold
up OK for them.

I don't know what the robots.txt was designed to stop in the first place.

But on bmo, if they get the guided bug entry page, the robots could hit the dups
query over and over, which might be at least a little irritating.

I'm also not clear on why a robot would ever want to visit sidebar.cgi.
addendum, / needs to be added so that index is reachable ...

the reason i want sidebar and friends is so that i can review L&F when i use
web.archive.org.
sidebar does user-agent, doesn't it?

index.cgi should be included too

Not sure about createaccount; what if someone has a link to a zillion
createaccount.cgi pages and google follows it? I don't think we require POST.
Are all Bugzilla bugs spider-reachable, or not necessarily? I think having
indexing of Bugzilla pages would be a great thing, but I question how well it
would work if we don't really have a fully connected set.
i'm explicitly *not* adding show_bug.cgi to the list. that's a huge kettle of
fish i do *not* want to deal with at the moment.
(In reply to comment #6)
> i'm explicitly *not* adding show_bug.cgi to the list. that's a huge kettle of
> fish i do *not* want to deal with at the moment.

Good choice for your database server.  We allow bots to index our bugs, and crawl*.googlebot.com is consistently the #1 client by a long shot.
Assignee: justdave → justdave
timeless: can you attach your suggested robots.txt?

Gerv
QA Contact: myk → reed
Rearranging this component to not have me as the default assignee, so that it
doesn't appear like I'm intending to work on bugs that other people could be
taking care of if they didn't think I was already doing them.  If this bug is a
software issue on b.m.o and you'd like to fix it, the modified source is now
available for the patching (see bug 373688).  Reassigning to the new default
owner.
Assignee: justdave → nobody
QA Contact: reed → other-bmo-issues
Summary: add items to robots.txt → Permit pages describing Bugzilla's configuration to be indexed by search engines (alter robots.txt)
Component: Bugzilla: Other b.m.o Issues → General
Product: mozilla.org → bugzilla.mozilla.org
This is fixed, right? (on b.m.o at least?)
It is, but we currently do not allow indexing of config.cgi, which has more information about BMO's configuration. We may need to allow this too. Right now we allow:

Allow: /*index.cgi
Allow: /*page.cgi
Allow: /*show_bug.cgi
Allow: /*describecomponents.cgi
Sitemap: http://bugzilla.mozilla.org/page.cgi?id=sitemap/sitemap.xml

dkl
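
To see which URLs the Allow rules quoted above would cover, here is a minimal sketch of wildcard matching. It assumes the common crawler interpretation in which `*` matches any run of characters and a trailing `$` anchors the end of the URL, and it assumes that anything not matched by an Allow line is disallowed (the Disallow lines are not shown in this bug). `pattern_to_regex` and `is_allowed` are hypothetical helpers written for this example.

```python
import re

# Allow rules quoted in the comment above.
ALLOW_RULES = [
    "/*index.cgi",
    "/*page.cgi",
    "/*show_bug.cgi",
    "/*describecomponents.cgi",
]


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters and a trailing '$' anchors the end.
    Everything else is matched literally, as a prefix of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(path, allow_rules=ALLOW_RULES):
    """Return True if any Allow rule matches the start of `path`."""
    return any(pattern_to_regex(rule).match(path) for rule in allow_rules)


for path in ["/show_bug.cgi?id=294295", "/config.cgi", "/describecomponents.cgi"]:
    print(path, is_allowed(path))
```

Under these assumptions config.cgi is not matched by any of the listed rules, which is consistent with the comment that it is currently not indexable.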
Depends on: 839969
allowing config.cgi in its current form is a non-starter; we'd need caching around it.
this isn't worth the effort.
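
As a rough idea of what "caching around it" could mean, here is a minimal time-based cache sketch. It is not how Bugzilla works: `TTLCache` and `render_config` are hypothetical, and the example simply assumes that building the config.cgi output is expensive enough that crawlers should get a stored copy.

```python
import time


class TTLCache:
    """Cache one expensive result and rebuild it at most once per `ttl` seconds."""

    def __init__(self, build, ttl=3600.0, clock=time.monotonic):
        self._build = build      # callable that produces the fresh value
        self._ttl = ttl          # lifetime of a cached value, in seconds
        self._clock = clock      # injectable clock, handy for testing
        self._value = None
        self._expires = None     # None means nothing has been cached yet

    def get(self):
        now = self._clock()
        if self._expires is None or now >= self._expires:
            # Cache is empty or stale: rebuild and restart the timer.
            self._value = self._build()
            self._expires = now + self._ttl
        return self._value


def render_config():
    """Hypothetical stand-in for the expensive page generation."""
    return "configuration dump"


config_cache = TTLCache(render_config, ttl=3600.0)
print(config_cache.get())  # first call builds; later calls reuse the copy
```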
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → WONTFIX