Closed Bug 454967 Opened 16 years ago Closed 12 years ago

Redirects still not working

Categories

(developer.mozilla.org Graveyard :: General, defect)

Type: defect, Priority: Not set, Severity: normal

Tracking

(Not tracked)

RESOLVED INVALID

People

(Reporter: aaronlev, Assigned: sheppy)

Details

(Keywords: access)

Can we finally fix the redirects properly please please?

I keep discovering redirects that aren't working, probably because of punctuation:

For example:
http://developer.mozilla.org/En/ARIA:_Accessible_Rich_Internet_Applications/Relationship_to_HTML_FAQ#Who_supports_ARIA.3F
or
http://developer.mozilla.org/en/docs/AJAX%3aWAI_ARIA_Live_Regions/API_Support

It's a bit frustrating because a lot of good articles on the web about Mozilla a11y and WAI-ARIA link to our docs, and those links no longer reach our resources. This affects what readers get out of those articles, and the page ranking for our docs.

I also keep getting reports from people about broken links, and have to deal with that.

I'm complaining :)
Looking into this.
Assignee: nobody → eshepherd
OS: Windows XP → All
Hardware: PC → All
In addition:

http://developer.mozilla.org/ja/docs/Bugzilla-jp:Guide

is redirecting incorrectly to:

https://developer.mozilla.org/Bugzilla-jp:Guide/ja

That should be going to:

https://developer.mozilla.org/ja/Bugzilla-jp:Guide
Assignee: eshepherd → oremj
We still need this fixed; a lot of links to the site that were created in our MediaWiki days are doing this bizarre redirect with the language code at the end instead of the middle for some reason.
Here's the current block of rewrites that appears to deal with URLs of the type being redirected...  nothing's jumping out at me here, but maybe someone's got better eyes or isn't so tired :)

    ### Begin MDC rewrite rules ###
    RewriteCond %{REQUEST_URI} ^/(.*)$
    RewriteRule ^/([a-z]{2}|[a-z_]{5})/docs/(.*):(.*)$ /$2:$3/$1 [L,QSA,NE,NC,R]

    RewriteCond %{REQUEST_URI} ^/(.*)$
    RewriteRule ^/([a-z]{2}|[a-z_]{5})/docs/(.*)$ /$1/$2 [L,QSA,NE,NC,R]

    RewriteCond %{REQUEST_URI} ^/(.*)$
    RewriteRule ^/docs/(.*):(.*)$ /$1:$2/En [L,QSA,NE,NC,R]

    RewriteCond %{REQUEST_URI} ^/(.*)$
    RewriteRule ^/docs/(.*)$ /En/$1 [L,QSA,NE,NC,R]
    ### End MDC rewrite rules ###
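
To make the behavior of this block easier to inspect, here is a rough Python simulation of the four rules. It is only a sketch: it assumes the rules are tried in order and the first match wins (the `L` flag), uses `re.IGNORECASE` in place of `NC`, and ignores the query string and the redirect itself. The `rewrite` helper and the `CURRENT_RULES` name are made up for illustration and are not part of the server configuration.

```python
import re

# The four rules from the block above, in order: (pattern, substitution).
# Apache's $N backreferences are written as \N for Python's re module.
CURRENT_RULES = [
    (r"^/([a-z]{2}|[a-z_]{5})/docs/(.*):(.*)$", r"/\2:\3/\1"),
    (r"^/([a-z]{2}|[a-z_]{5})/docs/(.*)$", r"/\1/\2"),
    (r"^/docs/(.*):(.*)$", r"/\1:\2/En"),
    (r"^/docs/(.*)$", r"/En/\1"),
]


def rewrite(path, rules):
    """Return the redirect target of the first matching rule, or None."""
    for pattern, substitution in rules:
        match = re.match(pattern, path, flags=re.IGNORECASE)
        if match:
            return match.expand(substitution)
    return None


# The path reported earlier in this bug.
print(rewrite("/ja/docs/Bugzilla-jp:Guide", CURRENT_RULES))
# -> /Bugzilla-jp:Guide/ja
```

Under these assumptions the first rule is the one that matches, and its substitution puts the locale (`$1`) after the page name, which reproduces the reported redirect.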
(In reply to comment #6)
>     ### Begin MDC rewrite rules ###
>     RewriteCond %{REQUEST_URI} ^/(.*)$
>     RewriteRule ^/([a-z]{2}|[a-z_]{5})/docs/(.*):(.*)$ /$2:$3/$1 [L,QSA,NE,NC,R]

This seems incorrect as $1 refers to the locale here and it is being rewritten so it is at the end.  I believe this should be:

RewriteRule ^/([a-z]{2}|[a-z_]{5})/docs/(.*):(.*)$ /$1/$2:$3 [L,QSA,NE,NC,R]


>     RewriteCond %{REQUEST_URI} ^/(.*)$
>     RewriteRule ^/([a-z]{2}|[a-z_]{5})/docs/(.*)$ /$1/$2 [L,QSA,NE,NC,R]
> 
>     RewriteCond %{REQUEST_URI} ^/(.*)$
>     RewriteRule ^/docs/(.*):(.*)$ /$1:$2/En [L,QSA,NE,NC,R]

This also seems incorrect for the same reason.  It should be:

RewriteRule ^/docs/(.*):(.*)$ /En/$1:$2 [L,QSA,NE,NC,R]

> 
>     RewriteCond %{REQUEST_URI} ^/(.*)$
>     RewriteRule ^/docs/(.*)$ /En/$1 [L,QSA,NE,NC,R]
>     ### End MDC rewrite rules ###


Do we even need the first and third RewriteRules here?  It seems like the second and fourth rules would be enough.  Why are we treating pages with a colon in a special manner? Maybe there is some other URL format to redirect that I am missing?
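
One way to explore that question is to simulate the rules outside Apache. The Python sketch below assumes first-match-wins ordering and case-insensitive matching, and it only looks at the path that comes out of the regex substitution. `PROPOSED_RULES`, `GENERIC_ONLY` and `rewrite` are illustrative names, not anything from the server configuration.

```python
import re

# Rules as proposed in this comment: locale first, page path after it.
# Apache's $N backreferences are written as \N for Python's re module.
PROPOSED_RULES = [
    (r"^/([a-z]{2}|[a-z_]{5})/docs/(.*):(.*)$", r"/\1/\2:\3"),
    (r"^/([a-z]{2}|[a-z_]{5})/docs/(.*)$", r"/\1/\2"),
    (r"^/docs/(.*):(.*)$", r"/En/\1:\2"),
    (r"^/docs/(.*)$", r"/En/\1"),
]

# Only the second and fourth rules, i.e. without the colon special cases.
GENERIC_ONLY = [PROPOSED_RULES[1], PROPOSED_RULES[3]]


def rewrite(path, rules):
    """Return the redirect target of the first matching rule, or None."""
    for pattern, substitution in rules:
        match = re.match(pattern, path, flags=re.IGNORECASE)
        if match:
            return match.expand(substitution)
    return None


for path in ("/ja/docs/Bugzilla-jp:Guide", "/docs/CSS:z-index"):
    print(path, rewrite(path, PROPOSED_RULES), rewrite(path, GENERIC_ONLY))
```

For these two sample paths both rule sets produce the same target, so at the regex level the corrected colon rules look redundant. The sketch cannot say whether anything downstream of Apache relies on the original form with the locale at the end.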
(In reply to comment #7)
> RewriteRule ^/docs/(.*):(.*)$ /En/$1:$2 [L,QSA,NE,NC,R]

Lowercase /en/ should be the canonical case, see also bug 492148.
Eric, do you remember why we have special rules for colons?
No, but I seem to recall they were needed...
Maybe the mindtouch people remember why they are needed. Will you ping them?
I've sent them email; will comment again when I hear back.
Hey all - popping in at Eric's request. The original bug which tracked why these colons were necessary can be found here: http://bugs.developer.mindtouch.com/view.php?id=3259

The short version is that namespace prefixes themselves can be localized, and given the number of languages that MDC is available in, we opted to use a catch-all instead of hardcoding the actual localized versions into the rewrite rules.
So what's the solution to this bug?
The short answer: special cases in the mdc_redirect.php pre-processing hook. 

To provide more background on the issue for people tracking it: when we originally ported MDC to MindTouch, maintaining permalinks was a top priority. There were a bunch of issues we had to address:

 * MediaWiki supported localized namespace prefixes - we do not
 * The original implementation of polyglotism in MediaWiki did not match up to MindTouch's - we treat namespaced pages as the absolute root level, not the language level
 * Deprecation of the /docs/ folder (but with permalinks maintained)

There's already a significant amount of business logic in mapping those links, so we split up the redirect rules between Apache and MindTouch itself - MDC actually has a special PHP pre-processing hook that runs before the page is rendered to find its correct location.

We had a list of use cases we had to match that we tested extensively against before launching (see the previously linked bug). 

In order to prevent endless redirects, we explicitly set the language code to the last part - this prevents our pre-processing hook from being executed for "valid" URL entry points and reduces the risk of breaking future pages, but the downside is that for edge cases on old links, it won't work. 

So the solution here is to look at the edge cases that are failing, and add those special cases into mdc_redirects.php. I can take a stab at this after work and submit a new patch to Eric. I'll try to catch the use cases in the bug as filed. 

Regarding the very first link: it's an external redirect, and MindTouch does not automatically redirect for external redirects. (I'm not sure if MediaWiki ever did?)
Let me know when a patch is ready.
Assignee: oremj → eshepherd
Still waiting on this.

Also, any idea why Google still lists out of date URLs in their search results in the first place? Do we have a robots.txt file that's preventing it from updating its search results? These old links should not be showing up in Google searches in the first place.
(In reply to comment #17)
> Still waiting on this.
> 
> Also, any idea why Google still lists out of date URLs in their search results
> in the first place? 

Because a lot of highly ranked pages link to those URLs and the server doesn't send a 404 error.

> Do we have a robots.txt file that's preventing it from
> updating its search results? 

No, please! 

> These old links should not be showing up in Google
> searches in the first place.

The only sane solution is a 301 REDIRECT, as I said in bug 500287.
 - Top-ranked blog entries (and e.g. Google Doctype) link to us;
   users shouldn't get lost and confused.
   You shouldn't expect bloggers to update links in old blog entries.
 - 301 REDIRECTs immediately throw out old URLs from Google's index,
   transferring the old page rank to the new URLs.
Yeah, we should definitely be sending 301s for these ancient paths...  what are we doing instead?
Well, right now, they're being converted into incorrect URLs that result in going to nonexistent pages in the wiki. This is what needs to be fixed.
But if we're really sending 301s, then why doesn't Google repair the links to point to the "incorrect URLs" we're converting to?
The 301s are not being sent yet. I am still putting together a new plug-in to do this.
(In reply to comment #19)
> Yeah, we should definitely be sending 301s for these ancient paths...  what 
> are we doing instead?

See these examples; both should permanently redirect to
 https://developer.mozilla.org/en/CSS/z-index

 https://developer.mozilla.org/en/CSS:z-index
  (will stay in Google's index forever; the canonical page is excluded
   as "duplicate content")

 https://developer.mozilla.org/en/docs/CSS:z-index
  obviously a more serious bug; redirects to
   https://developer.mozilla.org/CSS:z-index/en
(In reply to comment #22)
> The 301s are not being sent yet. I am still putting together a new plug-in to
> do this.

R=301 should do the trick.
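
For context, mod_rewrite's `R` flag sends a 302 when no status is given, so rules flagged with a bare `R` produce temporary redirects. The small Python sketch below models how the flag list maps to a status code. It is an illustration only: `redirect_status` and `SYMBOLIC` are made-up names, and the parsing is deliberately simplified.

```python
from typing import Optional

# Symbolic status names that mod_rewrite accepts after "R=".
SYMBOLIC = {"temp": 302, "permanent": 301, "seeother": 303}


def redirect_status(flags: str) -> Optional[int]:
    """Return the HTTP status implied by a RewriteRule flag list.

    Returns None when the list contains no redirect flag.
    """
    for flag in flags.strip("[]").split(","):
        name, _, value = flag.strip().partition("=")
        if name.upper() in ("R", "REDIRECT"):
            if not value:
                return 302  # a bare R means a temporary redirect
            return SYMBOLIC.get(value.lower()) or int(value)
    return None


print(redirect_status("[L,QSA,NE,NC,R]"))      # flag list with a bare R
print(redirect_status("[L,QSA,NE,NC,R=301]"))  # same list with R=301
```

The first call returns 302 and the second 301, which is the difference a change from `R` to `R=301` makes.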
Component: Deki Infrastructure → Other
Flags: needinfo?(jkarahalis)
Old Deki bug.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → INVALID
Flags: needinfo?(jkarahalis)
Product: developer.mozilla.org → developer.mozilla.org Graveyard