Closed
Bug 127452
Opened 23 years ago
Closed 20 years ago
Change robots.txt to allow search engines to index bugzilla.mozilla.org
Categories
(bugzilla.mozilla.org :: General, enhancement)
bugzilla.mozilla.org
General
Tracking
RESOLVED
DUPLICATE
of bug 187870
People
(Reporter: andrea.monni, Assigned: endico)
Details
I'm filing this bug based on a post by Nicolás Lichtmaier on n.p.m.general; the original can be found here: http://groups.google.com/groups?hl=en&selm=3C76B4AA.7000600%40technisys.com.ar And here follows the text:

<Nicolás Post>
From: Nicolás Lichtmaier (nick@technisys.com.ar)
Subject: Bugzilla and search engines
Newsgroups: netscape.public.mozilla.general
Date: 2002-02-22 13:15:17 PST

Currently bugzilla.mozilla.org disallows indexing by search engines. The robots.txt file has:

User-agent: *
Allow: /index.html
Disallow: /

Having the bug pages indexed by Google is of course a good thing:

* Google could be used to check whether a bug has already been reported, and it will probably do a better job than the current search system.
* If someone searches for a company, Google will probably show an evangelism bug reported against its website.
* If someone is searching for a web feature (e.g. border-collapse), Google will be able to display Mozilla's bug about it.

So I propose changing robots.txt to disallow only those entries which trigger searches in Bugzilla, and to allow at least http://bugzilla.mozilla.org/show_bug.cgi

Thoughts?
</Nicolás Post>

I can only add that, obviously, I second this proposal, and that Nicolás is the only one to blame for it! ;)

Andrea
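For concreteness, a minimal sketch of what the proposed robots.txt could look like (the rule set is an assumption based on the post, not a tested configuration; note that Allow is a Google extension to the original robots.txt convention, which the existing b.m.o file already relies on):

```
User-agent: *
Allow: /show_bug.cgi
Allow: /index.html
Disallow: /
```

With longest-match semantics as implemented by Googlebot, the two Allow lines take precedence over the blanket Disallow, so only bug-display pages and the front page would be crawled.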
Comment 2•22 years ago
I read that discussion, and I also agree Google should be allowed in. The arguments I remember being mentioned against it were:

(a) You can do all kinds of searches on the Bugzilla query page.

Google will answer in half a second, whereas with Bugzilla I always try to limit the search to one or two components and only search the summary, in order to keep the search time reasonable. I never search the description/comments. So, no, in practice you can't search the description/comments with Bugzilla.

(b) Google may return too many results.

I think we all know that Google normally returns thousands of results for our queries, but the order in which it displays them is amazing. If you search the Bugzilla database with it, in most cases you will get what you're looking for on the first result page.

(c) The index will be out of date.

By no more than three months, I think. Most of my searches are for bugs older than that. I understand that many Bugzilla users won't benefit from Google because its index is old, but a large class of users, like me, will benefit.

(d) The bots will put a large load on the server.

I am the administrator of a dynamic web site that totals around 1700 pages. When Google reindexes the entire site, it makes one request every 3 to 5 minutes, and it takes a few days to finish. My server _has_ gone up in flames a couple of times due to bad robots, but never because of Google. To avoid something like that happening here, we could specify in robots.txt that only Google is allowed. Since b.m.o has 200 thousand bugs, it would take around a year to reindex at 3-5 minutes per request, and I don't know whether Google would speed up for that; it would need one request every 5 seconds to finish in under two weeks. But at first we should just let it in, and it would find only those bugs that are referenced on the web and in groups, which are probably far fewer. We can then observe its behavior and decide whether to create pages with links that will lead it to all bugs.
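A quick back-of-the-envelope check of the crawl-rate figures above (the 200,000-bug count and the per-request intervals are taken from the comment itself; everything else is simple arithmetic):

```python
# Rough crawl-time estimates for reindexing every bug on b.m.o.
BUG_COUNT = 200_000  # approximate bug count cited in the comment

def crawl_days(seconds_per_request: float) -> float:
    """Days needed to fetch every bug once at a fixed request interval."""
    return BUG_COUNT * seconds_per_request / 86_400  # 86,400 seconds per day

# One request every 3 to 5 minutes (Google's observed pace on a small site):
print(f"{crawl_days(3 * 60):.0f} to {crawl_days(5 * 60):.0f} days")  # roughly 417 to 694 days
# One request every 5 seconds:
print(f"{crawl_days(5):.1f} days")  # roughly 11.6 days
```

So the "around a year" estimate for the slow pace holds up (it is actually one to two years), and a 5-second interval works out to a bit under two weeks rather than one.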
Comment 3•22 years ago
See bug 145588 for what we're planning to do to implement Google-like search in Bugzilla.
Comment 4•22 years ago
While fixing bug 145588 will make searching easier, I can see advantages to letting Google index show_bug.cgi (and only show_bug.cgi). I think the biggest advantage is that our bugs will be intertwined with the results of other queries. I can't tell you how many times I've had a problem with some Microsoft software, gone to Google to find the answer, and found an article in the MSKB. Even though I could have just gone to the MSKB first, I didn't think of it; Google rescued me. A couple of other advantages, similar in nature, were listed in comment 0. As a side note, we no longer have an index.html to be allowed.
Comment 5•22 years ago
Well, we could allow index.cgi in the robots.txt file. The issue is that the googlebot grabbing all 200,000 bugs would put a really heavy load on b.m.o. Maybe post mod_perl we can look at it, but... Also, without Last-Modified dates there would be bandwidth issues too (I assume the googlebot does conditional fetches). Plus, you'd have to get the list of bugs to Google somehow, and I don't think Google indexes CGI requests - you'd need a static URL (http://bmo/show_bug/12345) via mod_rewrite or similar.
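The static-URL idea could be sketched with an Apache mod_rewrite rule along these lines (the /show_bug/12345 path scheme is the hypothetical one from this comment, not anything b.m.o actually deployed):

```
RewriteEngine On
# Map a crawler-friendly static-looking URL to the real CGI:
#   /show_bug/12345  ->  /show_bug.cgi?id=12345
RewriteRule ^/show_bug/([0-9]+)$ /show_bug.cgi?id=$1 [PT,L]
```

Because the rewrite is internal (no redirect), a crawler only ever sees the static-looking URL, sidestepping any reluctance to index query strings.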
Comment 8•20 years ago
*** This bug has been marked as a duplicate of 187870 ***
Status: NEW → RESOLVED
Closed: 20 years ago
Resolution: --- → DUPLICATE
Updated•13 years ago
Component: Bugzilla: Other b.m.o Issues → General
Product: mozilla.org → bugzilla.mozilla.org