Closed Bug 1360744 Opened 8 years ago Closed 7 years ago

Hardware Survey Report on metrics.mozilla needs to allow a description to show in Google search results

Categories

(Infrastructure & Operations Graveyard :: WebOps: Other, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: hcrince, Assigned: ericz)

Details

(Whiteboard: [kanban:https://webops.kanbanize.com/ctrl_board/2/4902])

When searching for "Firefox Hardware report", the result shows "We would like to show you a description, but the site won't let us". See https://cloud.githubusercontent.com/assets/883721/21422967/bda1b6e0-c83a-11e6-9ac1-c8eca3e67fab.PNG

We would like a search for "Firefox Hardware report" to return this description: "The Firefox Hardware Report is a public weekly report of the hardware used by a representative sample of the population from Firefox's release channel on desktop. This information can be used by developers to improve the Firefox experience for users."

Per openjck in the waffle card (https://waffle.io/mozilla/firefox-hardware-report/cards/58f10252c5fd8c00a6a23b03), a possible solution may be:

Removing the top-level robots.txt would make those other sites indexable. Anything behind password protection would still be protected from search engines, but their URLs and any public content could show up. I would recommend continuing to protect those sites while making the Hardware Report available to search engines. We could do that in two ways: either by updating the top-level robots.txt to manually list the patterns of URLs we want to protect, or by having one catch-all robots.txt in each of the protected directories.
Assignee: nobody → server-ops-webops
Component: Dashboard → WebOps: Other
Product: Web Apps → Infrastructure & Operations
QA Contact: smani
Whiteboard: [kanban:https://webops.kanbanize.com/ctrl_board/2/4902]
shyam@quorra ~ $ curl -I -L https://metrics.mozilla.com/robots.txt
HTTP/1.1 200 OK
X-Backend-Server: TS
Content-Type: text/plain
Date: Fri, 02 Jun 2017 16:49:34 GMT
Connection: Keep-Alive
Content-Length: 28

That's from Zeus... Since this is now hosted on Heroku, I'm wondering if we should fix it (not sure how that'll affect indexing on Google's end since the new domain on Heroku is what should get picked up by search engines anyway) and what's left on this server is just redirects. Bug 1362525 is where we changed to redirects.
Triage - let's pick this up for next sprint - Critical - 2 hours max.
Assignee: server-ops-webops → eziegenhorn
I have added an Allow clause for /firefox-hardware-report that I think should allow Google to crawl it, going by their guidelines: https://developers.google.com/webmasters/control-crawl-index/docs/robots_txt

$ curl https://metrics.mozilla.com/robots.txt
User-agent: *
Allow: /firefox-hardware-report
Disallow: /
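The effect of the Allow/Disallow pair above can be checked offline. The following is a minimal sketch using Python's standard `urllib.robotparser`; it is not part of the original fix. The path `/some-other-dashboard` is a hypothetical placeholder for any other URL on the host. Note that Python's parser applies the first matching rule, whereas Google documents a most-specific-match rule; for this file both approaches should agree, because the Allow line is listed first and is also the longer match.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt content quoted in the comment above.
ROBOTS_TXT = """\
User-agent: *
Allow: /firefox-hardware-report
Disallow: /
"""

BASE = "https://metrics.mozilla.com"


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# The report path matches the Allow rule, so crawling is permitted.
report_allowed = parser.can_fetch("Googlebot", BASE + "/firefox-hardware-report")

# Hypothetical path: it only matches "Disallow: /", so crawling is blocked.
other_allowed = parser.can_fetch("Googlebot", BASE + "/some-other-dashboard")

print("report allowed:", report_allowed)
print("other allowed:", other_allowed)
```

Running the same check against the live file would need `parser.set_url(...)` and `parser.read()` instead of `parse()`, which requires network access.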
(In reply to Eric Ziegenhorn :ericz from comment #3)
> I have added an allow clause for /firefox-hardware-report that I think should
> allow google to crawl it looking at their guidelines:
> https://developers.google.com/webmasters/control-crawl-index/docs/robots_txt
>
> $ curl https://metrics.mozilla.com/robots.txt
> User-agent: *
> Allow: /firefox-hardware-report
> Disallow: /

Thanks, Eric. The URL has changed since the bug was opened. Sorry for not noticing this earlier. The new URL is: https://hardware.metrics.mozilla.com/
John, that is hosted on Heroku so the developer who owns that app there would be best positioned to modify robots.txt there.
Flags: needinfo?(jkarahalis)
Gotcha. I can add a robots.txt for sure, but would it have an effect if metrics.mozilla.com has a robots.txt disallowing all?
Flags: needinfo?(jkarahalis)
Crawlers on hardware.metrics.mozilla.com will use its own robots.txt and will not consider the one on the parent domain metrics.mozilla.com, according to Google's guidelines.
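To illustrate the per-host scoping described above, here is a small sketch that derives which robots.txt URL a crawler would consult for a given page. The helper name `robots_txt_url` is hypothetical, and the sketch assumes the usual convention that robots.txt lives at the root of the exact scheme and host being crawled.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL for the scheme and host of page_url.

    The host is used exactly as given, so a subdomain maps to its own
    robots.txt rather than to the parent domain's file.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


subdomain_robots = robots_txt_url("https://hardware.metrics.mozilla.com/")
parent_robots = robots_txt_url("https://metrics.mozilla.com/firefox-hardware-report")

print(subdomain_robots)
print(parent_robots)
```

Because the two hosts resolve to different robots.txt files, the "Disallow: /" on metrics.mozilla.com should not apply to pages served from hardware.metrics.mozilla.com.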
Oh, great! I misread the guidance on subdomains but it looks like you're right. So we should be all set. Thanks!
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Feel free to remove the "Allow" for /firefox-hardware-report since it's only a redirect anyway.
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard