Closed
Bug 797571
Opened 12 years ago
Closed 11 years ago
Repair orphaned translations automatically where possible
Categories
(developer.mozilla.org Graveyard :: Localization, defect, P1)
Tracking
(Not tracked)
RESOLVED
WONTFIX
People
(Reporter: lorchard, Unassigned)
References
Details
(Whiteboard: [localization])
Some notes carried over from bug 792417, describing some data spelunking on a data set that's a few weeks old:
select count(d1.id)
from wiki_document as d1
where d1.parent_id is null and
      d1.locale <> 'en-US' and
      d1.slug in (select d2.slug
                  from wiki_document as d2
                  where d2.locale = 'en-US');
That yields 6758 documents in non-en-US locales whose slugs correspond to documents in the en-US locale, yet which do not claim to be translations. But since the slugs match, they probably *are* translations.
Another interesting stat:
select count(d1.id)
from wiki_document as d1
where d1.parent_id is null and
      d1.locale <> 'en-US' and
      left(d1.html, 8) <> 'REDIRECT' and
      d1.slug in (select d2.slug
                  from wiki_document as d2
                  where d2.locale = 'en-US');
This yields 4523 documents - meaning that 4523 of the 6758 apparent orphaned translations whose slugs correspond to en-US pages are actually redirects.
A quick glance at some of those 4523 tells me that they seem to point at pages that have been moved from en-US-inspired slugs to translated slugs (e.g. /de/docs/Code_snippets/Scrollbar -> /de/docs/Codeschnipsel/Scrollbar). In an arbitrary sampling of these, it looked like the redirect target was itself marked as a translation of an en-US page. So, no real orphan problem here.
That leaves 2235 remaining non-en-US documents whose slugs correspond to en-US documents. Those are probably real orphaned translations, and so we can probably fix those. (Thus, this bug)
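A minimal sketch of what such an automatic repair could look like, assuming only the wiki_document columns used in the queries above (id, locale, slug, parent_id, html). It runs against an in-memory SQLite database with made-up rows; the function name repair_orphans_by_slug and the sample data are hypothetical, and this is not the actual migration code.

```python
import sqlite3


def repair_orphans_by_slug(conn):
    """Point parentless, non-redirect, non-en-US documents at the en-US
    document that has the same slug. Returns the number of rows updated."""
    cur = conn.execute(
        """
        UPDATE wiki_document
        SET parent_id = (
            SELECT d2.id
            FROM wiki_document AS d2
            WHERE d2.locale = 'en-US' AND d2.slug = wiki_document.slug
        )
        WHERE parent_id IS NULL
          AND locale <> 'en-US'
          AND substr(html, 1, 8) <> 'REDIRECT'
          AND slug IN (SELECT d3.slug
                       FROM wiki_document AS d3
                       WHERE d3.locale = 'en-US')
        """
    )
    conn.commit()
    return cur.rowcount


# Hypothetical sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE wiki_document ("
    "id INTEGER PRIMARY KEY, locale TEXT, slug TEXT, "
    "parent_id INTEGER, html TEXT)"
)
conn.executemany(
    "INSERT INTO wiki_document VALUES (?, ?, ?, ?, ?)",
    [
        (1, "en-US", "Code_snippets/Scrollbar", None, "<p>English</p>"),
        (2, "de", "Code_snippets/Scrollbar", None,
         "REDIRECT <a href='/de/docs/Codeschnipsel/Scrollbar'>Scrollbar</a>"),
        (3, "fr", "Code_snippets/Scrollbar", None, "<p>Francais</p>"),
        (4, "fr", "Autre_page", None, "<p>No en-US counterpart</p>"),
    ],
)
updated = repair_orphans_by_slug(conn)
```

Only the fr document with a matching slug is re-parented here; the redirect and the page without an en-US counterpart are left alone. Note that MySQL rejects an UPDATE that selects from its own target table in a subquery, so a run against the real database would need a join or a two-step select-then-update instead.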
Reporter
Comment 1•12 years ago
More research:
select count(id) from wiki_document
where parent_id is null and
      html like '%languages(%';
This yields 6174 documents which contain 'languages(' - a likely indicator that there's a {{ languages() }} or {{ wiki.languages() }} macro in the page that migration failed to parse and use for setting a translation parent.
These are probably translation orphans we can address by trying harder to parse that macro to locate an en-US parent.
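As a rough illustration of "trying harder to parse that macro", the sketch below pulls the English target out of a languages() call with regular expressions. The macro shape it assumes, a {{ languages( { "en": "en/Some/Page", ... } ) }} mapping of locale to path, is an assumption about the migrated content, and the function name english_target is hypothetical.

```python
import re

# Assumed macro shape (not confirmed by this bug):
#   {{ languages( { "en": "en/Some/Page", "ja": "ja/Some/Page" } ) }}
# An optional "wiki." prefix is also accepted.
MACRO_RE = re.compile(
    r"\{\{\s*(?:wiki\.)?languages\s*\((.*?)\)\s*\}\}",
    re.DOTALL | re.IGNORECASE,
)
# One "locale": "path" pair inside the macro argument.
PAIR_RE = re.compile(r"""["']([A-Za-z-]+)["']\s*:\s*["']([^"']+)["']""")


def english_target(html):
    """Return the slug the macro lists for English, or None if the page
    has no languages() macro or the macro has no English entry."""
    match = MACRO_RE.search(html)
    if match is None:
        return None
    for locale, path in PAIR_RE.findall(match.group(1)):
        if locale.lower() in ("en", "en-us"):
            # Drop a leading locale segment such as "en/" if present.
            return re.sub(r"^/?en(?:-US)?/", "", path, flags=re.IGNORECASE)
    return None


sample = '<p>{{ languages( { "en": "en/DOM/window.open", "ja": "ja/DOM/window.open" } ) }}</p>'
slug = english_target(sample)
```

The returned slug could then be looked up among en-US documents to find a candidate parent. Paths containing quote characters are not handled by this sketch.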
Reporter
Comment 2•12 years ago
Actually, let's refine that last query:
select count(id) from wiki_document
where parent_id is null and
      locale <> 'en-US' and
      html like '%languages(%';
That yields 2901 documents - en-US documents are never translations of other en-US documents :)
Reporter
Updated•12 years ago
Whiteboard: s=2012-10-30
Updated•12 years ago
Whiteboard: s=2012-10-30 → s=2012-10-30 p=
Updated•12 years ago
Whiteboard: s=2012-10-30 p= → s=2012-10-30 u=user
Updated•12 years ago
Priority: -- → P2
Updated•12 years ago
Whiteboard: s=2012-10-30 u=user → s=2012-10-30 u=user c=Localization
Updated•12 years ago
Priority: P2 → P1
Updated•12 years ago
Whiteboard: s=2012-10-30 u=user c=Localization → [localization]
Comment 3•11 years ago
Jean-Yves and I spent a few hundred hours fixing pages with lost parents.
Some time ago I implemented a maintenance page that lists pages without a parent for that purpose:
https://developer.mozilla.org/de/docs/without-parent (0 in most locales now)
Les, do we still have a problem here? Do you want to run these queries again and describe whether anything still needs repair?
Flags: needinfo?(lorchard)
Comment 4•11 years ago
mysql> select count(d1.id)
-> from wiki_document as d1
-> where d1.parent_id is null and
-> d1.locale <> 'en-US' and
-> d1.slug in (select d2.slug
-> from wiki_document as d2
-> where d2.locale='en-US');
+--------------+
| count(d1.id) |
+--------------+
| 8663 |
+--------------+
1 row in set (0.28 sec)
So there are even more non-English pages with identical slugs to English pages that don't have an English "parent" page.
mysql> select count(d1.id)
-> from wiki_document as d1
-> where d1.parent_id is null and
-> d1.locale <> 'en-US' and
-> left(d1.html,8) <> 'REDIRECT' and
-> d1.slug in (select d2.slug
-> from wiki_document as d2
-> where d2.locale='en-US');
+--------------+
| count(d1.id) |
+--------------+
| 740 |
+--------------+
1 row in set (0.16 sec)
I'm not sure what accounts for such a big drop in this number and this ratio, unless:
* We've cleaned out old redirects?
* redirects aren't in the first 8 characters anymore?
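One way to probe the second hypothesis is to count where the REDIRECT marker sits in the html of parentless non-en-US documents. The sketch below does that against an in-memory SQLite database with made-up rows; the function name redirect_position_counts and the sample html are hypothetical, and a page that merely mentions the word REDIRECT in its body would also land in the "elsewhere" bucket.

```python
import sqlite3


def redirect_position_counts(conn):
    """Count parentless non-en-US documents whose html starts with
    'REDIRECT' versus those that contain it somewhere later."""
    row = conn.execute(
        """
        SELECT
          SUM(CASE WHEN substr(html, 1, 8) = 'REDIRECT' THEN 1 ELSE 0 END),
          SUM(CASE WHEN substr(html, 1, 8) <> 'REDIRECT'
                    AND instr(html, 'REDIRECT') > 0 THEN 1 ELSE 0 END)
        FROM wiki_document
        WHERE parent_id IS NULL AND locale <> 'en-US'
        """
    ).fetchone()
    # SUM() yields NULL (None) when no rows match, so fall back to 0.
    return {"at_start": row[0] or 0, "elsewhere": row[1] or 0}


# Hypothetical sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE wiki_document ("
    "id INTEGER PRIMARY KEY, locale TEXT, slug TEXT, "
    "parent_id INTEGER, html TEXT)"
)
conn.executemany(
    "INSERT INTO wiki_document VALUES (?, ?, ?, ?, ?)",
    [
        (1, "de", "A", None, "REDIRECT <a href='/de/docs/B'>B</a>"),
        (2, "fr", "A", None, "<p>REDIRECT <a href='/fr/docs/B'>B</a></p>"),
        (3, "fr", "C", None, "<p>Contenu</p>"),
        (4, "en-US", "A", None, "REDIRECT <a href='/en-US/docs/B'>B</a>"),
    ],
)
counts = redirect_position_counts(conn)
```

A large "elsewhere" count on real data would suggest that the left(html, 8) test no longer identifies redirects reliably.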
mysql> select count(id) from wiki_document
-> where parent_id is null and
-> locale <> 'en-US' and
-> html like '%languages(%' ;
+-----------+
| count(id) |
+-----------+
| 585 |
+-----------+
1 row in set (0.34 sec)
So, far fewer non-English pages without a parent now use the "languages" macro. Not sure what that means ...
Flags: needinfo?(lorchard)
Comment 5•11 years ago
I think the first query is not useful because redirects have no parent, so its count increased as we moved pages within the locales.
I guess a query that excludes redirects and deleted pages could use d1.is_redirect = 0 and d1.deleted = 0.
Furthermore, we are not concerned with Talk and User pages.
select count(d1.id)
from wiki_document as d1
where d1.parent_id is null and
      d1.locale <> 'en-US' and d1.is_redirect = 0 and d1.deleted = 0 and
      d1.slug not LIKE "Talk:%" and d1.slug not LIKE "User:%" and
      d1.slug in (select d2.slug
                  from wiki_document as d2
                  where d2.locale = 'en-US');
That gives us 475 pages. A spot check showed that these pages are already listed on our "without-parent" maintenance pages.
As for the languages macro usage: it's going down because we've fixed a bunch of these pages and are removing the macro little by little. Again, there is no need to include redirects, deleted pages, or Talk and User pages.
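To compare such a count with the per-locale "without-parent" listings, the same filters can be grouped by locale. This is a minimal sketch against an in-memory SQLite database with made-up rows; it assumes the is_redirect and deleted columns named in this comment, and the function name orphans_by_locale is hypothetical.

```python
import sqlite3

# Same filters as the query in this comment, grouped by locale.
ORPHANS_BY_LOCALE = """
    SELECT d1.locale, COUNT(d1.id)
    FROM wiki_document AS d1
    WHERE d1.parent_id IS NULL
      AND d1.locale <> 'en-US'
      AND d1.is_redirect = 0
      AND d1.deleted = 0
      AND d1.slug NOT LIKE 'Talk:%'
      AND d1.slug NOT LIKE 'User:%'
      AND d1.slug IN (SELECT d2.slug
                      FROM wiki_document AS d2
                      WHERE d2.locale = 'en-US')
    GROUP BY d1.locale
    ORDER BY d1.locale
"""


def orphans_by_locale(conn):
    """Return {locale: number of likely orphaned translations}."""
    return dict(conn.execute(ORPHANS_BY_LOCALE).fetchall())


# Hypothetical sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE wiki_document ("
    "id INTEGER PRIMARY KEY, locale TEXT, slug TEXT, parent_id INTEGER, "
    "html TEXT, is_redirect INTEGER, deleted INTEGER)"
)
conn.executemany(
    "INSERT INTO wiki_document VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "en-US", "DOM/window.open", None, "<p>en</p>", 0, 0),
        (2, "en-US", "Talk:DOM", None, "<p>en talk</p>", 0, 0),
        (3, "fr", "DOM/window.open", None, "<p>fr</p>", 0, 0),   # counted
        (4, "de", "DOM/window.open", None, "REDIRECT", 1, 0),    # redirect
        (5, "de", "Talk:DOM", None, "<p>de talk</p>", 0, 0),     # Talk page
        (6, "ja", "DOM/window.open", 1, "<p>ja</p>", 0, 0),      # has parent
        (7, "es", "DOM/window.open", None, "<p>es</p>", 0, 1),   # deleted
        (8, "pl", "DOM/window.open", None, "<p>pl</p>", 0, 0),   # counted
    ],
)
per_locale = orphans_by_locale(conn)
```

Locales that are absent from the result have no remaining candidates under these filters.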
select locale, slug from wiki_document
where parent_id is null and is_redirect = 0 and deleted = 0 and
      slug not LIKE "Talk:%" and slug not LIKE "User:%" and
      locale <> 'en-US' and
      html like '%languages(%';
Same here: a spot check shows that these pages are already listed on the "without-parent" maintenance page.
So, unless I'm missing something, this should be covered by the listing on https://developer.mozilla.org/<locale>/docs/without-parent
We have already fixed a lot there and intend to get those lists down to 0 in every locale.
This bug proposes an automatic repair. That never happened and probably never will.
Is there anything else left to do here?
Comment 6•11 years ago
Great comment and information, Florian. We're done here!
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → WONTFIX
Updated•5 years ago
Product: developer.mozilla.org → developer.mozilla.org Graveyard