Closed Bug 792417 Opened 7 years ago Closed 7 years ago

Repair translated docs' breadcrumbs

Categories

(developer.mozilla.org :: Localization, defect)

x86
macOS

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: groovecoder, Assigned: lorchard)

Repair them by transplanting the current en-US topical hierarchy to the other locales.

See https://bugzilla.mozilla.org/show_bug.cgi?id=788823#c6
Kind of hoping that the solution to bug 792418 provides a utility method that can be used by a management command to normalize the topic paths for translated documents that are missing them.
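To make the idea concrete, here is a minimal sketch of what such a normalizing utility might do. It is not the code from bug 792418 or from kuma. The `Doc` class and both function names are hypothetical; only the `parent_id` (translation parent) and `parent_topic_id` (topic parent) fields come from the queries in this bug. It also assumes one particular policy: when the en-US topic parent has no translation in the locale, walk further up the en-US hierarchy until a translated ancestor is found.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Doc:
    # Hypothetical stand-in for a wiki_document row.
    id: int
    locale: str
    slug: str
    parent_id: Optional[int] = None        # translation parent (en-US source)
    parent_topic_id: Optional[int] = None  # topic parent (breadcrumb link)


def find_translated_topic_parent(doc: Doc, docs: Dict[int, Doc]) -> Optional[int]:
    """Return the id of the nearest translated topic ancestor, or None."""
    if doc.parent_id is None:
        return None  # not a translation, nothing to transplant
    # Index translations by (en-US source id, locale).
    translations = {
        (d.parent_id, d.locale): d.id
        for d in docs.values()
        if d.parent_id is not None
    }
    seen = set()  # guards against cycles in the topic hierarchy
    topic_id = docs[doc.parent_id].parent_topic_id
    while topic_id is not None and topic_id not in seen:
        seen.add(topic_id)
        hit = translations.get((topic_id, doc.locale))
        if hit is not None and hit != doc.id:
            return hit
        # Assumed policy: no translation at this level, so try the next ancestor.
        topic_id = docs[topic_id].parent_topic_id
    return None


def repair_breadcrumbs(docs: Dict[int, Doc]) -> int:
    """Fill in missing topic parents on translations; return how many changed."""
    repaired = 0
    for doc in docs.values():
        if doc.parent_id is not None and doc.parent_topic_id is None:
            new_parent = find_translated_topic_parent(doc, docs)
            if new_parent is not None:
                doc.parent_topic_id = new_parent
                repaired += 1
    return repaired
```

A management command could call something like `repair_breadcrumbs` over the translations that have a `parent_id` but no `parent_topic_id`. Translations with no translated ancestor at all are left untouched in this sketch.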
Assignee: nobody → lorchard
So, FWIW: I did a bit of data spelunking in advance of this bug, and was surprised...

select count(*) from wiki_document 
where parent_id is not null 
    and parent_topic_id is null

This yields only 189 documents that are translations without topic parents. As we've thought it out, those are all the documents that would be affected by the proposed management command.

It seems like this was a larger problem than that.
This might be the more interesting problem:

select count(d1.id) 
from wiki_document as d1, wiki_document as d2 
where d1.parent_id is null and d1.locale<>'en-US' and 
    d1.slug=d2.slug and d2.locale='en-US'

That yields 6769 documents in non-en-US locales whose slugs correspond to documents in the en-US locale, yet which do not claim to be translations. But, since the slugs match, they probably *are* translations.
I confirm that a lot of translations have lost their translation parents. There is a bug somewhere where we have been listing the ones we have found so far. Ethertank added quite a few there.

I would say: automatically link them to the translation parents, as we will be able to modify the links if needed. I guess very few of them are wrong.
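As a rough illustration of that suggestion, the sketch below re-links orphans to the en-US document with the same slug. It runs against an in-memory SQLite table, not the production database, and the table definition is a hypothetical stand-in that contains only the columns used in the queries above. `make_demo_db` and `link_orphans_by_slug` are made-up names.

```python
import sqlite3


def make_demo_db():
    """Build an in-memory stand-in for wiki_document (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE wiki_document ("
        " id INTEGER PRIMARY KEY,"
        " locale TEXT NOT NULL,"
        " slug TEXT NOT NULL,"
        " parent_id INTEGER)"
    )
    conn.executemany(
        "INSERT INTO wiki_document (id, locale, slug, parent_id)"
        " VALUES (?, ?, ?, ?)",
        [
            (1, "en-US", "Web/CSS", None),
            (2, "fr", "Web/CSS", None),  # orphan with a matching en-US slug
            (3, "fr", "Accueil", None),  # no en-US counterpart, left alone
            (4, "de", "Web/CSS", 1),     # already linked, left alone
        ],
    )
    return conn


def link_orphans_by_slug(conn):
    """Set parent_id on non-en-US orphans whose slug matches an en-US doc.

    Returns the number of rows that were updated.
    """
    cur = conn.execute(
        """
        UPDATE wiki_document
        SET parent_id = (
            SELECT d2.id FROM wiki_document AS d2
            WHERE d2.locale = 'en-US' AND d2.slug = wiki_document.slug
            LIMIT 1
        )
        WHERE parent_id IS NULL
          AND locale <> 'en-US'
          AND EXISTS (
            SELECT 1 FROM wiki_document AS d2
            WHERE d2.locale = 'en-US' AND d2.slug = wiki_document.slug
          )
        """
    )
    conn.commit()
    return cur.rowcount
```

Because the match is by slug only, any wrong links would still need to be corrected by hand afterwards, as suggested above.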
One more interesting stat:

select count(d1.id) 
from wiki_document as d1 
where d1.parent_id is null and 
    d1.locale <> 'en-US' and 
    d1.slug not in (select d2.slug 
                    from wiki_document as d2 
                    where d2.locale='en-US')

That yields 9214 documents not in the en-US locale whose slugs do not correspond to en-US pages, and don't claim to be translations. 

That's also a larger number than I expected. It suggests to me that other locales have lots of content independent from en-US, and/or that we have lots of orphaned translations and no easy way to automatically associate them with their en-US counterparts (i.e., using matching slugs as the criterion).

Here's the breakdown of those 9214 documents by locale:

pl	2709
fr	2033
es	1274
ja	833
pt-PT	666
zh-CN	535
cs	247
zh-TW	184
de	166
ru	148
ko	139
it	108
nl	50
ca	31
hu	20
pt-BR	17
he	14
fi	11
tr	6
el	5
ka	5
ar	4
ro	3
fa	2
vi	2
id	1
th	1
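The per-locale counts above add up to 9214, so they look like the previous query grouped by locale. The exact query used was not posted. The following is a guess at an equivalent, written as a small function over a SQLite connection, and `orphans_by_locale` is a hypothetical name.

```python
import sqlite3


def orphans_by_locale(conn):
    """Count non-en-US docs with no translation parent and no en-US slug match.

    Returns a list of (locale, count) tuples, largest count first.
    Assumes slug is never NULL; a NULL en-US slug would make NOT IN match nothing.
    """
    cur = conn.execute(
        """
        SELECT d1.locale, COUNT(d1.id) AS n
        FROM wiki_document AS d1
        WHERE d1.parent_id IS NULL
          AND d1.locale <> 'en-US'
          AND d1.slug NOT IN (
            SELECT d2.slug FROM wiki_document AS d2
            WHERE d2.locale = 'en-US'
          )
        GROUP BY d1.locale
        ORDER BY n DESC, d1.locale
        """
    )
    return cur.fetchall()
```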
Commits pushed to master at https://github.com/mozilla/kuma

https://github.com/mozilla/kuma/commit/69483e52e47704b95e283afab1c2753771f4dedd
fix bug 792417: Management command to repair breadcrumbs for translations without

https://github.com/mozilla/kuma/commit/beadf70b692c4660d2d6aac2cdc29d954d4facd6
Merge pull request #639 from lmorchard/792417-repair-l10n-breadcrumbs

fix bug 792417: Management command to repair breadcrumbs for translations without
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Marking as verified. Will file a new bug for anything we find that is related to this.
Status: RESOLVED → VERIFIED