Closed Bug 840092 Opened 11 years ago Closed 11 years ago

Translate a doc (sometimes) prepends 'en' to the slug

Categories

(developer.mozilla.org Graveyard :: Localization, defect, P1)

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: sheppy, Unassigned)

Details

(Whiteboard: [specification][type:bug][specification-comment:1])

Somehow we wound up with a translation of the IndexedDB page here:

https://developer.mozilla.org/fr/docs/en/IndexedDB

But there was also a page here:

https://developer.mozilla.org/fr/docs/IndexedDB

We didn't know about the latter, so we renamed the first page to match the second. We did this in the Django UI, because we couldn't stop the redirect from the latter to the former.

But now it looks like that was the wrong thing to do, as content is missing. We're very confused (as you can tell by this weirdly phrased bug report). Please help us!
Just re-writing this in a format that should be a little easier for us to skim. Sheppy, please let me know if any of this is incorrect.

What did you do?
================
1. Load https://developer.mozilla.org/fr/docs/en/IndexedDB
2. Load https://developer.mozilla.org/fr/docs/IndexedDB

What happened?
==============
The first page redirects to the English version of the page. The second page loads the English version of the document.

What should have happened?
==========================
The first page should not exist (404). The second page should load the French version of the document, which existed previously.

Is there anything else we should know?
======================================
Whiteboard: [specification][type:bug][specification-comment:1]
General point: that page (fr/docs/en/IndexedDB) should never have been created, but it was.

Moins52 created fr/docs/IndexedDB page, then for some reason, saved the same page under a different slug (namely en/IndexedDB) causing the (kuma) redirect one can see here: https://developer.mozilla.org/fr/profiles/moins52 (page is called Redirect 1).

However, because of the legacy URL structure, when a URL contains "en/", it is detected and the URL is rewritten to the equivalent en-US page. This happens before the kuma redirect, which is why Sheppy had to go to the Django UI to rename the page.
But the target page of the renaming already existed, so we lost track of the page that was created second (and of its most recent edit).

As a result, we have an edit that we can no longer track down (it doesn't appear in the fr/docs/IndexedDB history), and an accessible page that does not contain the most recent update. Both share the same slug.
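
The ordering described above can be illustrated with a small toy model. This is only a sketch of one reading of the comment, not Kuma's actual routing code: the names `next_hop` and `follow`, and the `redirects` mapping, are hypothetical and exist only for this example.

```python
from typing import Dict, List, Optional, Tuple


def next_hop(path: str, redirects: Dict[Tuple[str, str], str]) -> Optional[str]:
    """Return where a request for `path` is sent next, or None if it is served as-is."""
    locale, _docs, slug = path.strip("/").split("/", 2)
    # Assumed step 1: the legacy "en/" prefix is detected first and the URL is
    # rewritten to the en-US equivalent, before any stored page is consulted.
    if slug.startswith("en/"):
        return "/en-US/docs/" + slug[len("en/"):]
    # Assumed step 2: only then is a stored (kuma) redirect for the page followed.
    return redirects.get((locale, slug))


def follow(path: str, redirects: Dict[Tuple[str, str], str], max_hops: int = 5) -> List[str]:
    """Collect every URL visited, starting from `path`."""
    hops = [path]
    for _ in range(max_hops):
        target = next_hop(hops[-1], redirects)
        if target is None:
            break
        hops.append(target)
    return hops


# Hypothetical data: the redirect left at fr/docs/IndexedDB points at the "en/" slug.
redirects = {("fr", "IndexedDB"): "/fr/docs/en/IndexedDB"}
for url in follow("/fr/docs/IndexedDB", redirects):
    print(url)
```

In this model the page stored under fr/docs/en/IndexedDB is never served, because the legacy rewrite always wins, which matches the behaviour reported above.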
Summary: Problems with page have resulted in creation of an alternate universe → Translate a doc (sometimes) prepends 'en' to the slug
Same thing happening in bug 836792. Luke, do we expect that fixing this bug will fix these individual problems, or will we need to correct those slugs manually?
Flags: needinfo?(lcrouch)
We will likely have to clean and fix the slugs manually. Not sure if/how we would write a db migration to do it. :/
Flags: needinfo?(lcrouch)
This needs to be our #1 top priority; things are getting out of hand fast. It's getting rapidly worse.
Severity: normal → blocker
(In reply to Luke Crouch [:groovecoder] from comment #5)
> We will likely have to clean and fix the slugs manually. Not sure if/how we
> would write a db migration to do it. :/

Can you please open a bug for fixing these slugs manually? I think you could describe the actual/expected results better than I could.

https://bugzilla.mozilla.org/form.mdn#h=detail|bug
Flags: needinfo?(lcrouch)
Blocks: 841088
wiki/test_views test to reproduce the issue:

https://gist.github.com/groovecoder/4947142

Filed https://bugzilla.mozilla.org/show_bug.cgi?id=841088
Flags: needinfo?(lcrouch)
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → DUPLICATE
Sorry, I made the PR against this bug, so I'm going to mark the other one as a dupe of this one.
Status: RESOLVED → REOPENED
Resolution: DUPLICATE → ---
(From email:) This is blocking Japanese MWC efforts.
Okay, doing some data spelunking on this and looking through the localization logic. I think I have an answer for why this is happening, and will probably have a fix & a cleanup today.

Here are the details, for anyone who's interested:

* Many (but not all) pages in en-US trace their way up the topic path eventually to a page entitled "MDN" with a slug of "en"

* Back when we migrated to Kuma, we left "en/" out of the slugs. So, that root parent "MDN" page is there, but left unexpressed in en-US slugs.

* Back in September, bug 792418 introduced logic for translation that tries to match the en-US topic hierarchy for new translations. That means, if you create a new translation of HTML/HTML5/Image, but HTML and HTML/HTML5 don't yet exist in the target locale, the system tries to fill in the parents for the locale.

* But, there's this hidden "en" at the root of en-US. So, when a new translation is created and the parent-filling logic happens, the hidden "en" from en-US is cloned into the target locale and made visible (and thus part of the URL path).
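
For anyone who wants to see the effect in isolation, here is a minimal sketch of the behaviour described in the bullets above. It is a toy model, not the actual Kuma translation code: the `Doc` class and the `topic_chain` and `translated_slug` functions are hypothetical names invented for this illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Doc:
    locale: str
    slug: str                       # slug as stored, e.g. "HTML/HTML5"
    title: str
    parent_topic: Optional["Doc"] = None


def topic_chain(doc: Doc) -> List[Doc]:
    """Return the ancestors of `doc`, ordered from the topic root down to its parent."""
    chain = []
    node = doc.parent_topic
    while node is not None:
        chain.append(node)
        node = node.parent_topic
    return list(reversed(chain))


def translated_slug(source: Doc) -> str:
    """Build the slug for a new translation by mirroring every topic ancestor.

    Assumption for this sketch: each ancestor contributes its last slug
    segment, including a root whose slug is hidden in en-US URLs.
    """
    parts = [ancestor.slug.split("/")[-1] for ancestor in topic_chain(source)]
    parts.append(source.slug.split("/")[-1])
    return "/".join(parts)


# en-US hierarchy with the hidden "MDN" root whose slug is "en".
root = Doc("en-US", "en", "MDN")
html = Doc("en-US", "HTML", "HTML", parent_topic=root)
html5 = Doc("en-US", "HTML/HTML5", "HTML5", parent_topic=html)
print(translated_slug(html5))       # the hidden root leaks into the slug

# Without the hidden root, the same page translates cleanly.
html.parent_topic = None
print(translated_slug(html5))
```

The first call yields a slug that starts with "en/", and the second does not, which is the reason for removing the hidden root rather than patching each translation.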

At present, it looks like there are 121 documents with "en/" leading the slug:

mysql> select count(*) from wiki_document where substr(slug, 1, 3) = 'en/';
+----------+
| count(*) |
+----------+
|      121 |
+----------+

That should be easy enough to clean up with some quick SQL.

To prevent this in the future, I think the solution is to just nuke that hidden "en" page in en-US. It's not doing anything vital. Then, also nuke its clones in all other locales. Any pages claiming those "en" roots as parents will themselves become topic roots, as expected, and this issue should not reoccur.

Since all the above calls for some possibly scary SQL manipulations / Django migrations, I'm downloading a copy of the site to try all this out on my laptop first. So, the fix will probably be quick, but I want to try it out safely before hitting the real site.
(In reply to Les Orchard [:lorchard] from comment #14)

> * Back when we migrated to Kuma, we left "en/" out of the slugs. So, that
> root parent "MDN" page is there, but left unexpressed in en-US slugs.

And also for the sake of explanation, there are similar hidden roots in most other locales (eg. nl/ for nl, ja/ for ja, etc). But, they haven't been a problem, because we do not support creating translations from any locale other than en-US.

If it doesn't hurt anything, I may remove those hidden roots in other locales, too.
(In reply to Les Orchard [:lorchard] from comment #15)
> If it doesn't hurt anything, I may remove those hidden roots in other
> locales, too.

Probably a good idea, since we have talked about allowing people to use other languages as the "source" for a translation, for example using a Spanish document as the source of a Greek document.

Will this remove "MDN" from the breadcrumb? Removing it could affect our SEO and of course user experience, so if possible it might be nice to hardcode that in as part of this.
We'll find a way to hardcode MDN into the template so it's not lost.
Okay, so I think this is fixed now. Reopen this bug if the en/ thing crops up again.

I did the following:

* Removed en/ from any slugs where it was present.

* In cases where removing en/ from the slug would result in a collision with an existing page, I added "-840092-dup" to the end of the slug.

* The above two things affected about 121 documents across locales.

* All pages affected by the above were tagged "bug-840092". For example, in ja locale:
    https://developer.mozilla.org/ja/docs/tag/bug-840092

* All pages with "-840092-dup" added were tagged "bug-840092-dup". For example, in ja locale:
    https://developer.mozilla.org/ja/docs/tag/bug-840092-dup

* All pages with slug "en" have had their children promoted to the site root, so that future localizations will not catch the "en/" slug prefix.

* All pages with slugs matching their locale have had their children promoted to the site root, so that future localizations from non-en-US locales will not run into this problem.
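
The slug cleanup rules in the list above (strip "en/", suffix on collision, tag what was touched) can be sketched roughly as follows. This is an illustration only, not the SQL or migration that was actually run: the function `clean_slugs` and its input shape are hypothetical, and it ignores the unlikely case where the suffixed slug is itself already taken.

```python
from typing import Dict, List, Tuple

DUP_SUFFIX = "-840092-dup"


def clean_slugs(docs: List[Tuple[str, str]]) -> Dict[Tuple[str, str], Tuple[str, List[str]]]:
    """Map each affected (locale, old_slug) to (new_slug, tags).

    `docs` is a list of (locale, slug) pairs standing in for document rows.
    """
    taken = set(docs)               # slugs currently in use, per locale
    changes = {}
    for locale, slug in docs:
        if not slug.startswith("en/"):
            continue
        new_slug = slug[len("en/"):]
        tags = ["bug-840092"]
        if (locale, new_slug) in taken:
            # Stripping "en/" would collide with an existing page.
            new_slug += DUP_SUFFIX
            tags.append("bug-840092-dup")
        taken.discard((locale, slug))
        taken.add((locale, new_slug))
        changes[(locale, slug)] = (new_slug, tags)
    return changes


docs = [("fr", "en/IndexedDB"), ("fr", "IndexedDB"), ("ja", "en/Apps")]
for (locale, old), (new, tags) in clean_slugs(docs).items():
    print(locale, old, "->", new, tags)
```

Keeping the colliding page under a suffixed slug, rather than overwriting or deleting it, means both versions stay available for someone to merge by hand later.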
Status: REOPENED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Oh, and yeah, the "MDN" breadcrumb is gone now. Here's a bug:

https://bugzilla.mozilla.org/show_bug.cgi?id=841461
WOW, https://developer.mozilla.org/ja/docs/Apps is finally accessible! Thank you :)
Status: RESOLVED → VERIFIED
Oh, and feel free to remove the bug-840092 and bug-840092-dup tags whenever those pages are next edited. They're just there so we have something to collect the pages I touched, in case anything went wrong during the process.
Blocks: 821694
Firefox has detected that the server is redirecting the request for this address in a way that will never complete.


https://developer.mozilla.org/zh-cn/docs/CSS/transform
(In reply to 446240525 from comment #22)
> Firefox has detected that the server is redirecting the request for this
> address in a way that will never complete.

I think that is Bug 818477.
(In reply to 446240525 from comment #22)
> Firefox has detected that the server is redirecting the request for this
> address in a way that will never complete.
> 
> 
> https://developer.mozilla.org/zh-cn/docs/CSS/transform

If you ever see a case like this, try adding ?redirect=no to the URL to see if there's a REDIRECT to itself in the page source. That should allow you to edit the page.
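
If you need to do this for more than one page, a small helper along these lines can build the URL. It is only a convenience sketch using the Python standard library; the function name `with_redirect_disabled` is made up for this example, and the only thing taken from the comment above is the `redirect=no` parameter itself.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_redirect_disabled(url: str) -> str:
    """Return `url` with redirect=no set, preserving any other query parameters."""
    parts = urlsplit(url)
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "redirect"        # drop any existing redirect parameter
    ]
    query.append(("redirect", "no"))
    return urlunsplit(parts._replace(query=urlencode(query)))


print(with_redirect_disabled("https://developer.mozilla.org/zh-cn/docs/CSS/transform"))
```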
Product: developer.mozilla.org → developer.mozilla.org Graveyard