Treeherder needs a robots.txt to prevent load from crawlers

RESOLVED FIXED

Status

P3
normal
RESOLVED FIXED
4 years ago
4 years ago

People

(Reporter: emorley, Assigned: fubar)

(Reporter)

Description

4 years ago
We should probably add a robots.txt to stop additional load from search engines crawling treeherder URLs, since they'll be referenced all over the place.

We could add one to the UI repo root, but I guess that will only cover treeherder.m.o/ui/* unless we fiddle with the Apache config and add a redirect from root?
(Reporter)

Comment 1

4 years ago
I meant to add: search engines support JS, so hitting the UI means they do cause API load. They'll also likely stumble upon API URLs in bug comments or, worse, things like the dynamically generated Swagger docs (treeherder-dev.a.org/docs/; disabled on prod).
(Reporter)

Updated

4 years ago
Blocks: 1080757
Summary: Treeherder needs a robots.txt → Treeherder needs a robots.txt to prevent load from crawlers
(Reporter)

Updated

4 years ago
Component: Treeherder → Treeherder: Infrastructure
QA Contact: laura
(Reporter)

Updated

4 years ago
No longer blocks: 1080757
(Assignee)

Comment 2

4 years ago
Going with the default no-bots-here robots.txt unless there's something you'd like excepted:

User-agent: *
Disallow: /

Deployed on stage and prod webheads.
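
For anyone wanting to confirm what the two-line policy above means to a well-behaved crawler, here is a minimal sketch using Python's standard `urllib.robotparser`. The host `treeherder.example`, the sample paths, and the helper name `is_allowed` are placeholders chosen for illustration, not the real deployment.

```python
from urllib.robotparser import RobotFileParser

# The policy quoted in this comment.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical host and paths, used only to exercise the parser.
    for path in ("/", "/ui/", "/api/", "/docs/"):
        url = "https://treeherder.example" + path
        print(path, is_allowed("Googlebot", url))
```

Note that robots.txt is advisory: it only reduces load from crawlers that choose to honour it.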
Assignee: nobody → klibby
Status: NEW → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → FIXED
(Reporter)

Comment 3

4 years ago
sgtm, thank you :-)

Is this currently managed via puppet? Is this something we could get checked into the repo instead? I'm a fan of having as few hidden/magical things as possible, and for most people, if it isn't in the repo, it's invisible to them :-)
(Assignee)

Comment 4

4 years ago
It is in puppet because of the Apache config, which is managed by the webapp module. With the proxy happening, you'd have to have gunicorn handle it if you wanted it in the repo, I think.
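
As a rough illustration of what "have gunicorn handle it" could look like, here is a sketch of a small WSGI wrapper that answers /robots.txt before the request reaches the application. This is an assumption about one possible approach, not what was deployed: `robots_middleware`, `placeholder_app`, and `fetch` are hypothetical names, and `placeholder_app` merely stands in for the real application object.

```python
from wsgiref.util import setup_testing_defaults

# Same policy as the one deployed via puppet.
ROBOTS_TXT = b"User-agent: *\nDisallow: /\n"


def robots_middleware(app):
    """Wrap a WSGI app so /robots.txt is answered without reaching it."""
    def wrapped(environ, start_response):
        if environ.get("PATH_INFO") == "/robots.txt":
            start_response("200 OK", [
                ("Content-Type", "text/plain; charset=utf-8"),
                ("Content-Length", str(len(ROBOTS_TXT))),
            ])
            return [ROBOTS_TXT]
        return app(environ, start_response)
    return wrapped


def placeholder_app(environ, start_response):
    # Hypothetical stand-in for the real application.
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found\n"]


# A WSGI server such as gunicorn would be pointed at this callable.
application = robots_middleware(placeholder_app)


def fetch(path):
    """Call the wrapped app in-process and return (status, body)."""
    environ = {}
    setup_testing_defaults(environ)
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(application(environ, start_response))
    return captured["status"], body
```

The trade-off is the one raised above: the file becomes visible in the repo, but every crawler request for it then passes through the proxy to the app server instead of being served directly by Apache.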
(Reporter)

Updated

4 years ago
Depends on: 1118387