Closed Bug 1168488 Opened 9 years ago Closed 9 years ago

Marketplace websites: automate conversion from Cyrillic to Latin script for Serbian

Categories

(Marketplace Graveyard :: General, enhancement, P3)

Avenir
enhancement

Tracking

(Not tracked)

RESOLVED FIXED
2015-06-16

People

(Reporter: flod, Assigned: clouserw)

Details

(Whiteboard: [qa-])

I'd like to discuss the possibility of automating the conversion from Cyrillic to Latin script for Serbian in the Marketplace projects.

This is the current situation.

1. Projects are translated on Verbatim in Cyrillic (locale code 'sr').

2. From time to time I compare the status of sr against sr-Latn, run this script, and manually open a pull request against each of the Marketplace projects to update the localization.
https://github.com/flodolo/scripts/blob/master/mozilla_l10n/sr-conversion/convert_githubrepo.sh

The script is pretty simple: it relies on icu4c/uconv to transliterate the gettext files. Converting Cyrillic to Latin is a safe operation, since the only Cyrillic text in those files is the translated strings themselves.

We've been using the same process to create Firefox OS localization in Latin for the past 2 years.

3. Wait for someone to merge the PR.
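
To illustrate why the transliteration in step 2 is safe, here is a minimal Python sketch. It is not the linked script (which uses icu4c/uconv); the mapping table and the `transliterate` name are assumptions made for illustration, and all-caps digraphs (for example ЉУБАВ) are not special-cased the way a real tool might handle them.

```python
# Hypothetical sketch: Serbian Cyrillic -> Latin, one character at a time.
# Not the uconv-based script from the bug; names here are illustrative only.
_LOWER = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "ђ": "đ", "е": "e",
    "ж": "ž", "з": "z", "и": "i", "ј": "j", "к": "k", "л": "l", "љ": "lj",
    "м": "m", "н": "n", "њ": "nj", "о": "o", "п": "p", "р": "r", "с": "s",
    "т": "t", "ћ": "ć", "у": "u", "ф": "f", "х": "h", "ц": "c", "ч": "č",
    "џ": "dž", "ш": "š",
}

# Build a translation table covering lowercase and uppercase letters.
# Uppercase digraphs become title case (Љ -> Lj), a simplification.
_TABLE = {}
for cyr, lat in _LOWER.items():
    _TABLE[ord(cyr)] = lat
    _TABLE[ord(cyr.upper())] = lat.capitalize()


def transliterate(text: str) -> str:
    """Replace Serbian Cyrillic letters; leave every other character as is."""
    return text.translate(_TABLE)


# Keywords, msgid text and punctuation are not Cyrillic, so only the
# translated string changes when the whole file is passed through.
entry = 'msgid "Search"\nmsgstr "Претрага"\n'
print(transliterate(entry))
```

Because gettext keywords, source strings, and placeholders contain no Cyrillic characters, running the whole file through such a mapping only touches the translations.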

The ideal solution would be to integrate the conversion into the update process (add new strings, then convert the files to Latin).

An alternative solution would be to give me access to the repositories so that I can merge PRs myself, which would solve at least step 3.
Wil, do you have any thoughts on this? I don't know exactly how the update process works (I know the script, but not who runs it or where it physically runs).
Flags: needinfo?(wclouser)
Severity: normal → enhancement
Priority: -- → P3
After digging a bit online I think I can do this:

>  msgfilter -i messages.po -o output.po recode-sr-latin

Any concerns with that solution?  Does icu4c/uconv do anything special?
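
If the command above were wrapped in an update script, it could look roughly like this Python sketch. The function names are hypothetical and nothing here reflects how the Marketplace update process is actually implemented; it only assumes that GNU gettext's `msgfilter` and `recode-sr-latin` are installed.

```python
import shutil
import subprocess


def build_msgfilter_command(source_po: str, target_po: str) -> list:
    """Return the argv for the msgfilter call quoted in this comment."""
    return ["msgfilter", "-i", source_po, "-o", target_po, "recode-sr-latin"]


def convert_to_latin(source_po: str, target_po: str) -> None:
    """Run msgfilter, failing loudly if the gettext tools are missing."""
    for tool in ("msgfilter", "recode-sr-latin"):
        if shutil.which(tool) is None:
            raise RuntimeError(f"{tool} not found on PATH (GNU gettext needed)")
    # check=True raises CalledProcessError if msgfilter exits non-zero.
    subprocess.run(build_msgfilter_command(source_po, target_po), check=True)
```

A caller would use something like `convert_to_latin("messages.po", "output.po")`, with the file names taken from the command above.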
Flags: needinfo?(wclouser)
(In reply to Wil Clouser [:clouserw] from comment #2)
> Any concerns with that solution?  Does icu4c/uconv do anything special?

I'm trying to compare the results of ICU and msgfilter (with Olympia) but it's kind of hard, since msgfilter reformats the world.

From a quick check I can't spot any difference in translations, so I think it's OK (and easier) to use msgfilter.
(In reply to Francesco Lodolo [:flod] from comment #3)
> I'm trying to compare the results of ICU and msgfilter (with Olympia) but
> it's kind of hard, since msgfilter reformats the world.

Applied a fake sed filter, and the diff looks good.
(In reply to Wil Clouser [:clouserw] from comment #5)
> (a) convert last week's strings, so it will always be a week behind
> (b) not kick in until after the next extraction
> 
> but it will be automatic in the future.  Thanks!

Great, thanks. That's a lot better than the current situation (sporadic manual updates) :-)
I would also like to double-check that things are working as expected: for example, I'd expect Fireplace to have the same number of missing strings in sr and sr-Latn, but that's not the case (32 vs 49).
I think that's a symptom of the 1 week of lag -- it hasn't run yet.
(In reply to Wil Clouser [:clouserw] from comment #9)
> I think that's a symptom of the 1 week of lag -- it hasn't run yet.

Strange, but let's see after next update.

Conversion happens after merging new strings into the localized files. My understanding is that, if sr has 49 strings missing after merge, the resulting sr-Latn file should have 49 strings missing as well after the commit.
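
One way to run that check is to count the untranslated entries in both files and compare the numbers. The Python sketch below is an assumption-laden illustration: `count_untranslated` is a hypothetical helper, it ignores plural forms and message contexts, and it does not count fuzzy entries as missing, which may differ from how the numbers above were produced.

```python
import re

_STRING = re.compile(r'"(.*)"\s*$')


def count_untranslated(po_text: str) -> int:
    """Count entries with a non-empty msgid and an empty msgstr."""
    missing = 0
    # Entries in a PO file are separated by blank lines.
    for block in re.split(r"\n\s*\n", po_text.strip()):
        fields = {"msgid": "", "msgstr": ""}
        current = None
        for line in block.splitlines():
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # comments, flags, obsolete entries
            keyword, _, rest = line.partition(" ")
            if keyword in fields:
                current, line = keyword, rest
            elif not line.startswith('"'):
                current = None  # msgctxt, plurals: ignored in this sketch
                continue
            if current is not None:
                match = _STRING.match(line)
                if match:
                    fields[current] += match.group(1)
        # The header entry has an empty msgid, so it is never counted.
        if fields["msgid"] and not fields["msgstr"]:
            missing += 1
    return missing
```

Reading the sr and sr-Latn files for the same project and passing their contents to this function should give equal counts if the conversion ran after the merge.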