Closed Bug 886577 Opened 9 years ago Closed 9 years ago
search: do not index pages tagged with 'junk'
What feature should be changed? Please provide the URL of the feature if possible.
==================================================================================
Search (https://developer.mozilla.org/search)

What problems would this solve?
===============================
Junk pages in results

Who would use this?
===================
All visitors

What would users see?
=====================
They would *not* see junk pages anymore

What would users do? What would happen as a result?
===================================================
Search; don't see junk pages in results

Is there anything else we should know?
======================================
This bug is a bit vague. Maybe add some examples of what constitutes a "junk" page? Does that include things like spam, user pages, talk pages, etc.? Should we just delete junk pages when we find them?

It also might be better if the tag were named for its purpose, i.e. "noindex", a la the robots meta tag [1]. That could trigger adding noindex to the <head> of pages to clue in Google as well. Could also be useful if we had a configurable set of tags that indicate pages not to index.

[1]: http://en.wikipedia.org/wiki/Noindex
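For illustration only, here is a minimal Python sketch of the "tag triggers a robots meta element" idea. The function name `robots_meta` and the `NOINDEX_TAGS` setting are hypothetical and are not taken from the site's code; only the `<meta name="robots" content="noindex">` element itself is standard.

```python
# Hypothetical configurable set of tags that mark a page as "do not index".
NOINDEX_TAGS = frozenset({"noindex"})


def robots_meta(tags, noindex_tags=NOINDEX_TAGS):
    """Return a robots meta element for the page <head>, or "" if indexable.

    `tags` is an iterable of the page's tag names; matching ignores case.
    """
    lowered = {tag.lower() for tag in tags}
    if lowered & noindex_tags:
        return '<meta name="robots" content="noindex">'
    return ""
```

Passing a different `noindex_tags` set would be one way to make the list of triggering tags configurable without touching the function.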
Sorry, this was spawned by bug 886175. I assumed we would at least skip pages with a tag of 'junk' but you're right - there may be more tags to skip. (https://developer.mozilla.org/en-US/docs/tag/junk) Janet - any info you can add? How do we know which pages to exclude from search indices?
I agree that in theory, pages that need to be deleted are a subset of pages that should not be indexed. So there might be pages that should not have the Junk tag but also should not be indexed. However, I can't think of a use case at the moment. I suppose we could agree on a tag (such as "noindex") for that case if it ever arises, so that the code can look for it. Sheppy? Jean-Yves? Any thoughts?
I can't think of other pages that shouldn't be indexed, but it makes sense to prepare for the eventuality. I'd say that any of these tags should prevent a page from being indexed:

* noindex
* junk
* spam
* delete
* deleteme

In any capitalization.

Also, don't include any pages under <locale>/docs/Trash

I've set up that hierarchy as a place where I can move junk to get it out of the way while reorganizing content, so it makes sense to simply not index those. That last one won't be necessary once we have a real system for handling deletion of pages, etc., but for now it's a good idea, I think.
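The rule proposed above could be sketched roughly as follows. This is only an illustration in Python: `should_index`, `EXCLUDED_TAGS`, and `TRASH_PATH` are hypothetical names, and the path format (a locale segment followed by `/docs/Trash`, matched case-sensitively) is an assumption based on the `<locale>/docs/Trash` pattern in the comment.

```python
import re

# Tags listed in the comment above; compared case-insensitively.
EXCLUDED_TAGS = frozenset({"noindex", "junk", "spam", "delete", "deleteme"})

# Assumed path shape: "<locale>/docs/Trash" or anything beneath it,
# with an optional leading slash. "Trash" is matched case-sensitively.
TRASH_PATH = re.compile(r"^/?[^/]+/docs/Trash(/|$)")


def should_index(path, tags):
    """Return False if the page should be skipped by the search indexer."""
    if TRASH_PATH.match(path):
        return False
    # Skip the page if any of its tags is in the excluded set.
    return not any(tag.strip().lower() in EXCLUDED_TAGS for tag in tags)
```

For example, `should_index("en-US/docs/Web/HTML", ["Junk"])` would be false because of the tag, and `should_index("fr/docs/Trash/Old_page", [])` would be false because of the path.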
Reading through this bug, it appears we have no examples of pages that should remain in existence but that should not appear in search results. Rather than adopting another feature, with another branch of code to test, another thing to explain to contributors, another workflow to support -- with no demonstrated long term benefit -- would it make more sense to just let the page deletion feature cover this use case, at least until a true need for this arises?
(In reply to John Karahalis [:openjck] from comment #5)
> Reading through this bug, it appears we have no examples of pages that
> should remain in existence but that should not appear in search results.
> Rather than adopting another feature, with another branch of code to test,
> another thing to explain to contributors, another workflow to support --
> with no demonstrated long term benefit -- would it make more sense to just
> let the page deletion feature cover this use case, at least until a true
> need for this arises?

Agreed, actually. When this was filed, we didn't know that page deletion was coming anytime in the near future.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → WONTFIX
Product: developer.mozilla.org → developer.mozilla.org Graveyard