add search results to robots.txt

NEW
Unassigned

Status

Mozilla Developer Network
General
5 months ago
5 months ago

People

(Reporter: atopal, Unassigned)

Tracking

({in-triage})

Details

(Reporter)

Description

5 months ago
About 2% of our daily Google crawl budget is spent on search results pages, which don't provide value for users. Please disallow them in robots.txt so that Googlebot skips them in the future.

Example: 
/en-US/search?q=%E6%B3%B0%E5%B7%9E%E5%8A%9E%E7%9C%9F%E5%AE%9E%E5%87%BA%E7%94%9F%E8%AF%811300252952Qqfrtznv
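
For illustration, here is a minimal sketch of what such a rule might look like, checked with Python's standard urllib.robotparser. The Disallow line is an assumption, not taken from MDN's actual robots.txt. It covers only the en-US locale, and the standard-library parser does plain prefix matching with no wildcard support, so each other locale would need its own line in this sketch.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rule for illustration; not MDN's real robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /en-US/search
"""


def is_crawlable(path, user_agent="Googlebot"):
    """Return True if the sketch rules above allow user_agent to fetch path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)


# Search result URLs are blocked; ordinary documentation URLs are not.
search_blocked = not is_crawlable("/en-US/search?q=example")
docs_allowed = is_crawlable("/en-US/docs/Web")
```

Because matching is by prefix, this rule would also block any other path that begins with /en-US/search.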
Keywords: in-triage
Search results are currently marked to be excluded from the index:

<meta name="robots" content="noindex, follow">

It may be worth re-reading Google's FAQ on robots.txt:

https://support.google.com/webmasters/answer/6062608?hl=en

The last item explains that, if an external page links to a page blocked by robots.txt, the blocked page may still appear in search results even though it is not crawled.

My worry is that blocking search results in robots.txt will cause them to start appearing in Google's results again, this time with the "Sorry, the site doesn't allow us to show a description here" message, because Googlebot would no longer fetch the pages and so would never see the noindex tag.
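
The concern can be written out as a small decision function. This is a deliberately simplified model based on the FAQ behaviour described in this comment, not a statement of how Google actually ranks or lists pages; the function name and the outcome labels are made up for illustration.

```python
def expected_listing(blocked_by_robots, has_noindex, linked_externally):
    """Rough, assumed model of how a URL might show up in search results."""
    if not blocked_by_robots:
        # The crawler can fetch the page, so it can read the meta robots tag.
        return "not listed" if has_noindex else "listed with description"
    # Blocked pages are never fetched, so a noindex tag on them is never seen.
    if linked_externally:
        return "listed without description"
    return "not listed"


# Today: crawlable, noindex, linked from elsewhere.
current = expected_listing(False, True, True)
# After adding a robots.txt rule: same page, but now blocked.
proposed = expected_listing(True, True, True)
```

In this model the current setup keeps the page out of the results at the cost of crawl budget, while the proposed rule saves the crawl but may bring back a listing with no description.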

It may be worth checking (via GA or the Apache logs) whether we're linking to search results internally, from templates or MDN content, which would make the issue worse.
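
One way to check templates or content for such links is to scan the rendered HTML for anchors that point at a search URL. The sketch below uses only Python's standard library. The URL shape (/<locale>/search) is an assumption taken from the example in the description, and only site-relative hrefs are matched, so absolute links to the same pages would need an extra pattern.

```python
import re
from html.parser import HTMLParser

# Assumed URL shape: /<locale>/search, optionally followed by a query string.
SEARCH_HREF = re.compile(r"^/[A-Za-z]{2,3}(-[A-Za-z]{2,4})?/search(\?|$)")


class SearchLinkFinder(HTMLParser):
    """Collect hrefs of <a> tags that point at a search results page."""

    def __init__(self):
        super().__init__()
        self.hits = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        if SEARCH_HREF.match(href):
            self.hits.append(href)


def find_search_links(html):
    """Return every site-relative search link found in an HTML string."""
    finder = SearchLinkFinder()
    finder.feed(html)
    return finder.hits


sample = (
    '<p><a href="/en-US/search?q=flexbox">search</a> '
    '<a href="/en-US/docs/Web">docs</a></p>'
)
links = find_search_links(sample)
```

Running find_search_links over each template or document body would give a list of the internal links that keep sending crawlers to search results.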