Closed Bug 1198931 Opened 9 years ago Closed 4 years ago

Content in the Archive/ hierarchy should not be indexed by search engines

Categories

(developer.mozilla.org Graveyard :: General, enhancement)

All
Other
enhancement
Not set
normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: sheppy, Unassigned)

References

Details

(Keywords: in-triage, Whiteboard: [specification][type:bug])

What did you do?
================
Sometimes people find obsolete, archived material using Google. They shouldn't use this stuff -- it's just there for old-timers to refer back to, and for historical purposes.

What happened?
==============
People read old stuff.

What should have happened?
==========================
People shouldn't find the old stuff. Hopefully they will find the new stuff instead.

Is there anything else we should know?
======================================
It's suggested that robots.txt may be able to handle this, but I don't know for sure.
I disagree about the need for this feature.

We make it clear in the Archive zone that the content is deprecated: I think finding the information together with the clear deprecation notice is better than finding others' info about the subject without that notice.

Implementing this would have a negative impact: fewer users would be aware that something is deprecated, no longer relevant, …
I actually agree with Jean-Yves and I think we should focus on better zone styling instead.
Hm. I don't agree; I think that we can avoid problems with newbies finding obsolete docs and not realizing what they're dealing with by having this in place; and the on-site search will still reveal this stuff, so the people that actually need to see it will be able to find it.
Severity: normal → enhancement
Keywords: in-triage
Just to have my thoughts expressed more clearly:

I feel that because most of these pages are not useful to everyday folks, due to their historical nature and the fact that the information in many of those pages could even be dangerously out of date, they should be difficult to find.

There are a few options for accomplishing this:

1. This patch takes the simplest course: it asks that pages in the archive not be indexed by search engines at all. Users will still be able to find these pages by searching using MDN's built-in search interface, so discoverability is reduced but not eliminated. That makes this solution, IMO, a pretty good one, if not perfect.

2. Coming up with some method for preventing the archived content from being ranked high on Google, so that people are likely to find more up-to-date or appropriate content first.

3. Injecting some kind of warning into the SEO summary of these pages. This would have to be done automatically, as we cannot rely on contributors to remember to update every page that gets archived (especially if they archive a large tree of pages).
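
As a rough illustration of option 3, here is a minimal sketch of prefixing a warning to a page's SEO summary automatically, based only on its URL path. Everything named here is an assumption made for the example: the `/Archive/` path marker, the warning wording, and the `seo_summary` helper are hypothetical and do not reflect how MDN's platform actually generates summaries.

```python
# Hypothetical marker and wording; neither comes from MDN's real code.
ARCHIVE_MARKER = "/Archive/"
ARCHIVE_WARNING = "[Archived: obsolete content] "


def seo_summary(path: str, summary: str) -> str:
    """Return the summary, prefixed with a warning if the page is archived."""
    if ARCHIVE_MARKER not in path:
        # Not in the archive hierarchy: leave the summary untouched.
        return summary
    if summary.startswith(ARCHIVE_WARNING):
        # Already prefixed: avoid stacking the warning twice.
        return summary
    return ARCHIVE_WARNING + summary


if __name__ == "__main__":
    print(seo_summary("/en-US/docs/Archive/Foo", "How to build X."))
    print(seo_summary("/en-US/docs/Web/HTML", "HTML guide."))
```

Because the check keys off the path, moving a whole tree of pages into the archive would pick up the warning without contributors editing each page.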

We may also want to add a warning message to the pages, explaining what "Archived" means.

And we already have a bug filed for adding a toggle to the options on MDN's search UI to turn on and off inclusion of the archive in searches (bug 969601).

My personal feeling is the most effective and resource-sensible solution to this issue is #1 (hence my decision to go ahead and spend time working on a patch), and I thought we had consensus on this based on discussions in #mdn, but apparently I was mistaken.

Other options involve a lot more work and won't happen anytime soon. Plus, we can always remove this disallow once we implement a better solution anyway.
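
For reference, a minimal sketch of what such a disallow rule does, checked with Python's standard `urllib.robotparser`. The rule text and the `/en-US/docs/Archive/` prefix are assumptions for illustration, not the contents of the actual patch. Note also that `Disallow` tells compliant crawlers not to fetch matching URLs; on its own it does not guarantee that a URL already known to a search engine disappears from its index.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the real patch may use different paths.
ROBOTS_TXT = """\
User-agent: *
Disallow: /en-US/docs/Archive/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_crawlable(parser: RobotFileParser, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the given user agent may fetch the URL under these rules."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    parser = build_parser(ROBOTS_TXT)
    for url in (
        "https://developer.mozilla.org/en-US/docs/Archive/Some_page",
        "https://developer.mozilla.org/en-US/docs/Web/HTML",
    ):
        print(url, is_crawlable(parser, url))
```

Removing the rule later is a one-line change, which is why this approach is cheap to undo.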
Do you have a concrete example (or two) where archived content appears in a SERP ahead of non-archived content?
Flags: needinfo?(eshepherd)
(In reply to Jean-Yves Perrier [:teoli] from comment #6)
> Do you have a concrete example (or two) where archived content appears in a
> SERP ahead of non-archived content?

Googling "mozilla create extension" results in the third result being in the archive area, above "Getting started with Firefox extensions", "Creating custom Firefox extensions with the Mozilla build system", and even "Setting up an extension development environment".

That's not the worst example of this I've seen, but it's pretty bad and it's the one I came up with just now.
Flags: needinfo?(eshepherd)
It looks to me like this specific page should simply be made a redirect to the modern page.
Is this a dupe of bug 969601? Or slightly different?
Bug 969601 is about on-site search results; this one is about indexing by external search engines (Google, Yahoo, etc.).
MDN Web Docs' bug reporting has now moved to GitHub. From now on, please file content bugs at https://github.com/mdn/sprints/issues/ and platform bugs at https://github.com/mdn/kuma/issues/.
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → WONTFIX
Product: developer.mozilla.org → developer.mozilla.org Graveyard