search: do not index pages tagged with 'junk'

RESOLVED WONTFIX

Status

Type: defect
Opened: 6 years ago
Closed: 6 years ago

People

Reporter: groovecoder
Assignee: Unassigned

Details

(Whiteboard: [specification][type:change])

Reporter

Description

6 years ago
What feature should be changed? Please provide the URL of the feature if possible.
==================================================================================
Search (https://developer.mozilla.org/search)

What problems would this solve?
===============================
Junk pages show up in search results

Who would use this?
===================
All visitors

What would users see?
=====================
They would *not* see junk pages anymore

What would users do? What would happen as a result?
===================================================
Search; don't see junk pages in results

Is there anything else we should know?
======================================
Reporter

Updated

6 years ago
Blocks: 839214
Comment 1

This bug is a bit vague. Maybe add some examples of what constitutes a "junk" page? Does that include things like spam, user pages, talk pages, etc.? Should we just delete junk pages when we find them?

It might also be better if the tag were named for its purpose, i.e. "noindex", à la the robots meta tag [1]. That tag could also trigger adding a noindex robots meta tag to the <head> of those pages, to tell Google not to index them either.

It could also be useful to have a configurable set of tags that mark pages not to index.

[1]: http://en.wikipedia.org/wiki/Noindex
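A minimal Python sketch of the two suggestions above, assuming a page's tags are available as a list of strings. A configurable set of tags (the hypothetical `NOINDEX_TAGS` setting) decides whether the page's <head> gets `<meta name="robots" content="noindex">`, the standard robots directive asking search engines such as Google not to index the page. All names are illustrative, not MDN's actual code.

```python
# Hypothetical, configurable setting: tags that mark a page as "do not index".
NOINDEX_TAGS = frozenset({"noindex"})


def robots_meta_for(tags, noindex_tags=NOINDEX_TAGS):
    """Return a robots <meta> element for the page's <head>, or "" if it may be indexed.

    Tags are compared case-insensitively via str.casefold().
    """
    wanted = {t.casefold() for t in noindex_tags}
    if any(tag.casefold() in wanted for tag in tags):
        return '<meta name="robots" content="noindex">'
    return ""
```

For example, `robots_meta_for(["CSS", "NoIndex"])` returns the noindex element, while `robots_meta_for(["CSS"])` returns an empty string. Passing a different `noindex_tags` set is how the "configurable set of tags" idea would plug in.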
Reporter

Comment 2

6 years ago
Sorry, this was spawned by bug 886175. I assumed we would at least skip pages with a tag of 'junk', but you're right; there may be more tags to skip.

(https://developer.mozilla.org/en-US/docs/tag/junk)

Janet - any info you can add? How do we know which pages to exclude from search indices?
Flags: needinfo?(jswisher)

Comment 3

6 years ago
I agree that, in theory, pages that need to be deleted are a subset of pages that should not be indexed. So there might be pages that should not be indexed even though they should not carry the Junk tag (i.e., they should not be deleted). However, I can't think of a use case at the moment.

I suppose we could agree on a tag (such as "noindex") for that case if it ever arises, so that the code can look for it.

Sheppy? Jean-Yves? Any thoughts?
Flags: needinfo?(jswisher)
Comment 4

I can't think of other pages that shouldn't be indexed, but it makes sense to prepare for the eventuality.

I'd say that any of these tags should prevent a page from being indexed:

* noindex
* junk
* spam
* delete
* deleteme

These should match in any capitalization.

Also, don't include any pages under <locale>/docs/Trash.

I've set up that hierarchy as a place to move junk to, to get it out of the way while reorganizing content, so it makes sense to simply not index those pages.

That last rule won't be necessary once we have a real system for handling page deletion, etc., but for now I think it's a good idea.
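A minimal Python sketch of the exclusion rules proposed in this comment, assuming the indexer sees each page as a slug such as `en-US/docs/Web/CSS` plus a list of tag strings. The function name `should_index` and the slug format are hypothetical. Tags are compared with `str.casefold()` so any capitalization matches, and the Trash check covers `<locale>/docs/Trash` itself and everything beneath it.

```python
import re

# Tags proposed in this comment; stored lowercase and matched case-insensitively.
EXCLUDED_TAGS = frozenset({"noindex", "junk", "spam", "delete", "deleteme"})

# Hypothetical slug layout: "<locale>/docs/Trash" or anything beneath it,
# with an optional leading slash. "Trash" must be a whole path segment.
TRASH_PATH = re.compile(r"^/?[^/]+/docs/Trash(?:/|$)")


def should_index(path, tags):
    """Return False for pages the search indexer should skip."""
    if TRASH_PATH.match(path):
        return False
    return not any(tag.casefold() in EXCLUDED_TAGS for tag in tags)
```

For example, `should_index("en-US/docs/Trash/Old_page", [])` and `should_index("en-US/docs/Web/CSS", ["JUNK"])` both return `False`, while `should_index("en-US/docs/Trashcan", [])` returns `True` because the pattern requires `Trash` to be a complete path segment.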
Blocks: 886175
Component: General → Site search
Comment 5

Reading through this bug, it appears we have no examples of pages that should remain in existence but that should not appear in search results. Rather than adopting another feature, with another branch of code to test, another thing to explain to contributors, another workflow to support -- with no demonstrated long term benefit -- would it make more sense to just let the page deletion feature cover this use case, at least until a true need for this arises?
Comment 6

(In reply to John Karahalis [:openjck] from comment #5)
> Reading through this bug, it appears we have no examples of pages that
> should remain in existence but that should not appear in search results.
> Rather than adopting another feature, with another branch of code to test,
> another thing to explain to contributors, another workflow to support --
> with no demonstrated long term benefit -- would it make more sense to just
> let the page deletion feature cover this use case, at least until a true
> need for this arises?

Agreed, actually. When this was filed, we didn't know that page deletion was coming anytime in the near future.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WONTFIX