add enter_bug.cgi to the disallow list in robots.txt

Status

RESOLVED WONTFIX
6 years ago
6 years ago

People

(Reporter: glob, Assigned: dkl)

Tracking

Production
x86
Mac OS X

Details

Attachments

(1 attachment)

(Reporter)

Description

6 years ago
this is a result of access log entries like:

> [10/Feb/2013:23:42:35 -0800] "GET /enter_bug.cgi?product=Core&component=WebRTC HTTP/1.0" 301 284 "https://bugzilla.mozilla.org" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" 0 bugzilla.mozilla.org 139
> [10/Feb/2013:23:42:35 -0800] "GET /enter_bug.cgi?product=Core&component=WebRTC HTTP/1.0" 301 284 "https://bugzilla.mozilla.org" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" 0 bugzilla.mozilla.org 140
> [10/Feb/2013:23:42:37 -0800] "GET /enter_bug.cgi?product=Core&component=WebRTC HTTP/1.0" 301 284 "https://bugzilla.mozilla.org" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" 0 bugzilla.mozilla.org 126
> [10/Feb/2013:23:42:37 -0800] "GET /enter_bug.cgi?product=Core&component=WebRTC HTTP/1.0" 301 284 "https://bugzilla.mozilla.org" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" 0 bugzilla.mozilla.org 156
> [10/Feb/2013:23:42:38 -0800] "GET /enter_bug.cgi?product=Core&component=WebRTC HTTP/1.0" 301 284 "https://bugzilla.mozilla.org" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" 0 bugzilla.mozilla.org 120
> [10/Feb/2013:23:42:38 -0800] "GET /enter_bug.cgi?product=Core&component=WebRTC HTTP/1.0" 301 284 "https://bugzilla.mozilla.org" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" 0 bugzilla.mozilla.org 120
> [10/Feb/2013:23:42:39 -0800] "GET /enter_bug.cgi?product=Core&component=WebRTC HTTP/1.0" 301 284 "https://bugzilla.mozilla.org" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" 0 bugzilla.mozilla.org 113
> [10/Feb/2013:23:42:39 -0800] "GET /enter_bug.cgi?product=Core&component=WebRTC HTTP/1.0" 301 284 "https://bugzilla.mozilla.org" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" 0 bugzilla.mozilla.org 117
> [10/Feb/2013:23:42:40 -0800] "GET /enter_bug.cgi?product=Core&component=WebRTC HTTP/1.0" 301 284 "https://bugzilla.mozilla.org" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" 0 bugzilla.mozilla.org 126
> [10/Feb/2013:23:42:40 -0800] "GET /enter_bug.cgi?product=Core&component=WebRTC HTTP/1.0" 301 284 "https://bugzilla.mozilla.org" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" 0 bugzilla.mozilla.org 129
> [10/Feb/2013:23:42:41 -0800] "GET /enter_bug.cgi?product=Core&component=WebRTC HTTP/1.0" 301 284 "https://bugzilla.mozilla.org" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" 0 bugzilla.mozilla.org 127
> [10/Feb/2013:23:42:42 -0800] "GET /enter_bug.cgi?product=Core&component=WebRTC HTTP/1.0" 301 284 "https://bugzilla.mozilla.org" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" 0 bugzilla.mozilla.org 150
> [10/Feb/2013:23:42:42 -0800] "GET /enter_bug.cgi?product=Core&component=WebRTC HTTP/1.0" 301 284 "https://bugzilla.mozilla.org" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" 0 bugzilla.mozilla.org 87
> [10/Feb/2013:23:42:43 -0800] "GET /enter_bug.cgi?product=Core&component=WebRTC HTTP/1.0" 301 284 "https://bugzilla.mozilla.org" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" 0 bugzilla.mozilla.org 90

bots will always get the same response from enter_bug.cgi ("please log in"), so we should just block that cgi in robots.txt.
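
A minimal Python sketch for tallying entries like the ones above per second, to gauge how heavy the crawl is. The regular expression relies only on the leading fields visible in the excerpt (timestamp, request line, status, size, referrer, user agent) and ignores the trailing fields, whose meaning is not stated here. The name `hits_per_second` and its parameters are made up for illustration.

```python
import re
from collections import Counter

# Matches only the leading fields seen in the excerpt; the trailing
# fields are left unparsed because their meaning is not documented here.
LOG_RE = re.compile(
    r'\[(?P<ts>[^\]]+)\]\s+'            # [10/Feb/2013:23:42:35 -0800]
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*"\s+'  # "GET /path HTTP/1.0"
    r'(?P<status>\d{3})\s+\S+\s+'       # status code and response size
    r'"[^"]*"\s+'                       # referrer
    r'"(?P<agent>[^"]*)"'               # user agent
)


def hits_per_second(lines, path_prefix="/enter_bug.cgi", agent_substring="Googlebot"):
    """Count matching requests per timestamp (one-second resolution)."""
    counts = Counter()
    for line in lines:
        match = LOG_RE.search(line)
        if not match:
            continue
        if match["path"].startswith(path_prefix) and agent_substring in match["agent"]:
            counts[match["ts"]] += 1
    return counts


# Three entries copied from the excerpt above.
SAMPLE = [
    '> [10/Feb/2013:23:42:35 -0800] "GET /enter_bug.cgi?product=Core&component=WebRTC HTTP/1.0" 301 284 "https://bugzilla.mozilla.org" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" 0 bugzilla.mozilla.org 139',
    '> [10/Feb/2013:23:42:35 -0800] "GET /enter_bug.cgi?product=Core&component=WebRTC HTTP/1.0" 301 284 "https://bugzilla.mozilla.org" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" 0 bugzilla.mozilla.org 140',
    '> [10/Feb/2013:23:42:37 -0800] "GET /enter_bug.cgi?product=Core&component=WebRTC HTTP/1.0" 301 284 "https://bugzilla.mozilla.org" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" 0 bugzilla.mozilla.org 126',
]

if __name__ == "__main__":
    for timestamp, count in sorted(hits_per_second(SAMPLE).items()):
        print(timestamp, count)
```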
(Reporter)

Comment 1

6 years ago
according to Google's Webmaster Tools, it's already blocked:

http://bugzilla.mozilla.org/enter_bug.cgi
Blocked by line 5: Disallow: /
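
The same check can be reproduced locally with Python's standard `urllib.robotparser`. The robots.txt content below is a hypothetical stand-in (the real file is not quoted in this bug), and the helper name `is_allowed` is made up for illustration. It only shows that a blanket `Disallow: /` already covers enter_bug.cgi, so an explicit entry changes nothing for a compliant crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents; the real file is not shown in this bug.
BLANKET_RULES = """\
User-agent: *
Disallow: /
"""

# Hypothetical variant that names the script explicitly instead.
EXPLICIT_RULES = """\
User-agent: *
Disallow: /enter_bug.cgi
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://bugzilla.mozilla.org/enter_bug.cgi?product=Core&component=WebRTC"
    print("blanket rule allows:", is_allowed(BLANKET_RULES, "Googlebot", url))
    print("explicit rule allows:", is_allowed(EXPLICIT_RULES, "Googlebot", url))
```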
(Assignee)

Comment 2

6 years ago
Created attachment 714118 [details] [diff] [review]
Patch to explicitly disallow enter_bug.cgi in robots.txt (v1)

As you noted, and as I have confirmed with third-party robots.txt analyzers, enter_bug.cgi should already be disallowed by the Disallow: / rule, so it seems Googlebot is not honoring that rule properly. We can add an explicit disallow for a while anyway and see if it helps.

dkl
Assignee: nobody → dkl
Status: NEW → ASSIGNED
Attachment #714118 - Flags: review?(glob)
(Reporter)

Comment 3

6 years ago
thanks for the patch, and for testing with other analysers.
let's leave it as is; the torrent of requests has stopped.

if it ramps up again, we should find the source IP and determine whether these are legitimate Googlebot requests.
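
One common way to make that determination is a reverse DNS lookup on the source IP followed by a forward lookup that must map back to the same IP. The sketch below assumes genuine crawler hostnames end in `googlebot.com` or `google.com`; that suffix list should be checked against Google's current guidance. The function name and the injectable resolver parameters are made up for illustration, the default resolvers handle IPv4 only, and the address and hostname in the demo are placeholders from a documentation range, not real crawler data.

```python
import socket

# Assumed hostname suffixes for genuine Googlebot; verify before relying on them.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_genuine_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Reverse-resolve ip, check the domain, then confirm with a forward lookup."""
    if reverse_lookup is None:
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        forward_lookup = lambda host: socket.gethostbyname_ex(host)[2]

    try:
        host = reverse_lookup(ip)
    except OSError:
        return False  # no PTR record or resolver failure

    if not host.lower().rstrip(".").endswith(GOOGLE_SUFFIXES):
        return False  # user agent claims Googlebot, hostname says otherwise

    try:
        return ip in forward_lookup(host)
    except OSError:
        return False  # hostname does not resolve back


if __name__ == "__main__":
    # Stub resolvers with placeholder data so the demo needs no network access.
    ptr_records = {"192.0.2.10": "crawl-192-0-2-10.googlebot.com"}
    a_records = {"crawl-192-0-2-10.googlebot.com": ["192.0.2.10"]}
    print(is_genuine_googlebot("192.0.2.10", ptr_records.__getitem__, a_records.__getitem__))
```

The forward lookup matters because anyone who controls the reverse zone for their own address range can publish a PTR record that merely looks like a Google hostname.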
Status: ASSIGNED → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → WONTFIX
(Reporter)

Updated

6 years ago
Attachment #714118 - Flags: review?(glob)