We should probably add a robots.txt to stop additional load from search engines crawling Treeherder URLs, since they'll be referenced all over the place. We could add one to the UI repo root, but I guess that will only cover treeherder.m.o/ui/* unless we fiddle with the Apache config and add a redirect from root?
I meant to add: search engines support JS, so hitting the UI means they do cause API load. They'll also likely stumble upon API URLs in bug comments, or worse, things like the dynamically generated Swagger docs (treeherder-dev.a.org/docs/; disabled on prod).
Summary: Treeherder needs a robots.txt → Treeherder needs a robots.txt to prevent load from crawlers
Component: Treeherder → Treeherder: Infrastructure
QA Contact: laura
No longer blocks: 1080757
Going with the default no-bots-here robots.txt unless there's something you'd like excepted:

User-agent: *
Disallow: /

Deployed on stage and prod webheads.
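For anyone wanting to sanity-check what the deny-all rules above mean for a crawler, here is a minimal sketch using Python's standard `urllib.robotparser`. The hostname `treeherder.example.org`, the `ExampleBot` user agent, and the `build_parser` helper are placeholders made up for illustration; only the two rule lines and the `/ui/` and `/docs/` paths come from this bug.

```python
from urllib.robotparser import RobotFileParser

# The deny-all rules quoted in this bug.
DENY_ALL = "User-agent: *\nDisallow: /\n"


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content held in a string (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


deny_all = build_parser(DENY_ALL)

# Placeholder host; the paths are the ones mentioned in this bug.
for path in ("/", "/ui/", "/docs/"):
    url = "https://treeherder.example.org" + path
    print(path, "allowed" if deny_all.can_fetch("ExampleBot", url) else "blocked")
```

With these rules every path should be reported as blocked for any user agent that honours robots.txt. Note that this only affects well-behaved crawlers; it is not access control.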
Assignee: nobody → klibby
Status: NEW → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → FIXED
sgtm, thank you :-) Is this currently managed via Puppet? Is this something we could get checked into the repo instead? I'm a fan of having as few hidden/magical things as possible, and for most people, if it isn't in the repo, it's invisible to them :-)
It is in Puppet because of the Apache config, which is managed by the webapp module. With the proxy happening, you'd have to have gunicorn handle it if you wanted it in the repo, I think.
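As a rough illustration of what "have gunicorn handle it" could look like, here is a minimal sketch of a plain WSGI wrapper that answers `/robots.txt` itself and passes everything else through. This is not how Treeherder is actually set up; `robots_middleware` and `placeholder_app` are hypothetical names, and it assumes the front-end proxy forwards `/robots.txt` to the gunicorn-served app rather than answering it itself.

```python
# Same deny-all rules as deployed on the webheads.
ROBOTS_TXT = b"User-agent: *\nDisallow: /\n"


def robots_middleware(app):
    """Wrap a WSGI app so /robots.txt is served from code in the repo."""

    def wrapped(environ, start_response):
        if environ.get("PATH_INFO") == "/robots.txt":
            headers = [
                ("Content-Type", "text/plain; charset=utf-8"),
                ("Content-Length", str(len(ROBOTS_TXT))),
            ]
            start_response("200 OK", headers)
            return [ROBOTS_TXT]
        # Anything else goes to the wrapped application unchanged.
        return app(environ, start_response)

    return wrapped


def placeholder_app(environ, start_response):
    """Stand-in for the real application (hypothetical)."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"app response\n"]


# gunicorn would be pointed at this callable, e.g. `gunicorn module:application`.
application = robots_middleware(placeholder_app)
```

The trade-off is the one raised in this comment: the file becomes visible in the repo, but every crawler request for it then reaches the app server instead of being answered by Apache.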