Closed Bug 430491 Opened 16 years ago Closed 16 years ago

AMO should have en-US fallback instead of showing msgids

Categories

(addons.mozilla.org Graveyard :: Localization, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: morgamic, Assigned: cpollett)

Details

Attachments

(1 file, 2 obsolete files)

When we add a new string, until it is localized in other languages it shows up as a msgid which sometimes completely ruins a page/element/widget/whatever.

A more graceful approach would be to provide en-US fallback in place of msgids, which still accomplishes the goal of saying, "hey this isn't localized!" but doesn't break the page completely.  At least the en-US alternative has meaning.

The mixing of en-US and other locales already occurs since a lot of user-submitted content is not localized, so while I appreciate the argument that English inside of another locale is bad, I don't think it's as bad as showing msgids...

An idea was to wrap _() with something else -- __()?  But we were worried about how strings are extracted.

From here we need to figure out what to do because the current system isn't working -- especially for quicker dev cycles.
Is moving to __() and manually doing fallback the right way to do this, or should we convert to using .po files the way they are supposed to be used, with English strings as the msgids?  How often have we changed the English strings?  Fewer than 10?  Fewer than 5?
I am unsure if changing English strings is the major problem: seemingly identical strings will also suffer. One example would be "sort by: name" where "Name" becomes an l10n string. In some (actually, many) languages, "sort by" will require a different case for the word that follows than the nominative case used when "name" stands alone. Using plain English message IDs will make it impossible in such cases for localizers to apply correct grammar in their languages.
(In reply to comment #2)
> I am unsure if changing English strings are the major problem: Seemingly
> identical strings will also suffer. One example would be "sort by: name" where
> "Name" becomes an l10n string. In some (actually, many) languages, "sort by"
> will ask for a different case following it than when "name" is used in
> nominative case. Using plain English message IDs will make it impossible in
> such cases for localizers to apply correct grammar in their languages.
> 

You did a great job in bug 412597 explaining why we're using what we are.  I'm just saying - I'm having second thoughts.  This has actually been brewing in my mind for a while (I was going to write a post about it), but this bug got a jumpstart on me.

One of the main reasons, changing English strings, happens so rarely I don't think it's a big deal.  The other reason, the one you mention above, is a problem, but I'm wondering how often it would occur.  Compromises on both sides (creative English wording and generic forms in other locales) would address it I think.
Note, the common way to use POs still offers context markers to disambiguate ambiguous English strings.
http://www.gnu.org/software/gettext/manual/html_node/gettext_150.html#Contexts

Not that I found a way to use those from PHP.
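A workaround sometimes used for this, offered here only as a sketch and not as something proposed in this bug: GNU gettext stores a context entry in the compiled catalog under the key msgctxt + "\x04" + msgid, so a context lookup can be emulated with a plain lookup on that combined key. The function name pgettext_sketch, the injectable $lookup parameter, and the sample catalog are all hypothetical.

```php
<?php
// Sketch only: emulate a context lookup on top of a plain lookup function.
// Assumption: the catalog key for a context entry is "msgctxt\x04msgid".
function pgettext_sketch(string $context, string $msgid, ?callable $lookup = null): string
{
    // Default to PHP's gettext(); a custom lookup can be injected for testing.
    $lookup = $lookup ?? 'gettext';
    $key = $context . "\x04" . $msgid;
    $translated = $lookup($key);

    // An untranslated key comes back unchanged; fall back to the bare msgid.
    return $translated === $key ? $msgid : $translated;
}

// Hypothetical in-memory catalog standing in for a compiled .mo file.
$catalog = ["sort\x04Name" => 'Namen'];
$lookup = function (string $key) use ($catalog): string {
    return $catalog[$key] ?? $key;
};

echo pgettext_sketch('sort', 'Name', $lookup), "\n";   // entry exists for this context
echo pgettext_sketch('column', 'Name', $lookup), "\n"; // no entry, falls back to the msgid
```

The same English word can then carry a different translation per context, which is the grammatical-case problem raised earlier in this bug.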
An idea I just had:

We could have two sets of .po files - one AMO style and one normal.  The localizers could use the normal style and check them in and then an SVN hook or something could just convert from normal -> AMO using en-US as a reference.

So:  Using the normal+translated .po, we grab the English string (msgid) and look it up in the English AMO .po as the msgstr.  Then we grab the associated AMO .po msgid (the one with underscores) and combine that with the normal+translated msgstr in a new file.  Eventually we create an AMO+translated file.

...or we could just use the regular files since this doesn't fix the namespace problems...
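The conversion described in this comment could be sketched roughly as follows. This is only an illustration under simplifying assumptions: catalogs are plain PHP arrays of msgid => msgstr (parsing real .po files is left out), and the function name and sample ids are hypothetical.

```php
<?php
// Sketch only: rebuild an AMO-style catalog (symbolic msgids) from a normal
// translated catalog (English msgids), using the en-US AMO catalog as the map.
function normal_to_amo(array $amoEnglish, array $normalTranslated): array
{
    $amoTranslated = [];
    foreach ($amoEnglish as $symbolicId => $english) {
        // Find the English text in the normal catalog; skip untranslated entries.
        if (isset($normalTranslated[$english]) && $normalTranslated[$english] !== '') {
            $amoTranslated[$symbolicId] = $normalTranslated[$english];
        }
    }
    return $amoTranslated;
}

// Hypothetical sample data.
$amoEnglish = [
    'header_search_button' => 'Search',
    'sidebar_search_label' => 'Search',
    'footer_link_privacy'  => 'Privacy Policy',
];
$normalGerman = [
    'Search'         => 'Suchen',
    'Privacy Policy' => '', // not translated yet
];

print_r(normal_to_amo($amoEnglish, $normalGerman));
```

Two symbolic ids that share the same English text necessarily receive the same translation here, which is the namespace problem this approach does not fix.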
(In reply to comment #5)
> An idea I just had:
> 
> We could have two sets of .po files - one AMO style and one normal.  The
> localizers could use the normal style and check them in and then an SVN hook or
> something could just convert from normal -> AMO using en-US as a reference.
> 
> So:  Using the normal+translated .po, we grab the English string (msgid) and
> look it up in the English AMO .po as the msgstr.  Then we grab the associated
> AMO .po msgid (the one with underscores) and combine that with the
> normal+translated msgstr in a new file.  Eventually we create an AMO+translated
> file.
> 
> ...or we could just use the regular files since this doesn't fix the namespace
> problems...
> 

Sounds like it would be easier to just drop gettext and use something simple like an ini file and a custom php function.
(In reply to comment #6)
> Sounds like it would be easier to just drop gettext and use something simple
> like an ini file and a custom php function.

Anything that needs to generate a gigantic array on every request and search it repeatedly for each page view we receive sounds like a performance deathtrap to me. (Not saying it will actually turn out unusable, I just want us to think of performance aspects when discussing solutions for this.)
Btw. PHP's PEAR has two "Translation" packages.
A fairly decent hook in translate toolkit should be able to just do this.

Terminology I use:

"monolingual" denotes the po files as they're currently used, with symbolic lookup names.

"bilingual" denotes po files as commonly used.

Now, a converter could use the symbolic names to feed them into a msgctxt in the bilingual files. Converting back, it would use en-US as a template to fill in missing strings, where missing means symbolic names in the monolingual en-US files that don't have a matching msgctxt in the corresponding bilingual localized file.

That way, we would have symbolic lookups with English fallback, without having to expose msgctxt to PHP, while translators get bilingual files with msgctxt.
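The "converting back" step might look something like the sketch below. It is not Translate Toolkit code; the array shapes and the function name are hypothetical, and treating an empty msgstr as missing is an extra assumption on top of what the comment describes.

```php
<?php
// Sketch only: convert a bilingual localized catalog back to monolingual form,
// filling gaps from en-US.
//   $monoEnglish: symbolic name => English string
//   $bilingual:   list of ['msgctxt' => symbolic name, 'msgid' => English, 'msgstr' => translation]
function bilingual_to_monolingual(array $monoEnglish, array $bilingual): array
{
    // Index the translations by msgctxt, which carries the symbolic name.
    $byContext = [];
    foreach ($bilingual as $entry) {
        if ($entry['msgstr'] !== '') { // assumption: empty msgstr counts as missing
            $byContext[$entry['msgctxt']] = $entry['msgstr'];
        }
    }

    // en-US acts as the template, so every symbolic name gets a value.
    $result = [];
    foreach ($monoEnglish as $symbolicName => $english) {
        $result[$symbolicName] = $byContext[$symbolicName] ?? $english;
    }
    return $result;
}

// Hypothetical sample data.
$monoEnglish = [
    'sort_by_name'  => 'Name',
    'sort_by_date'  => 'Date',
    'sort_by_label' => 'Sort by:',
];
$bilingualGerman = [
    ['msgctxt' => 'sort_by_name',  'msgid' => 'Name',     'msgstr' => 'Namen'],
    ['msgctxt' => 'sort_by_label', 'msgid' => 'Sort by:', 'msgstr' => ''],
];

print_r(bilingual_to_monolingual($monoEnglish, $bilingualGerman));
```

Because the fallback is baked into the generated monolingual file, the PHP side keeps doing plain symbolic lookups and never sees a bare msgid.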
Attached patch: proposed new extract-po.sh (obsolete)
Hey,

I noticed something which might solve this problem:
If we add the following line to the extract-po.sh script: --keyword=__ \
(see the attached new extract-po.sh), it extracts both __('translate_me') and
_('translate_me'). Even better, it also extracts:
__('translate_me', arg2)

Given this, we can now code __(arg1, arg2) to do something like:
- call _(arg1) and check whether the result equals arg1; if not, output _(arg1)
- if they are equal and arg2 is not null, output arg2
- otherwise, output the en_US version of arg1
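The lookup order above can be sketched as follows. This is not the attached patch: both catalogs are injected as callables so the example runs without gettext, and the function name, parameters, and sample strings are hypothetical.

```php
<?php
// Sketch only: localized string first, then the explicit fallback argument,
// then the en_US string.
function translate_with_fallback(string $msgid, ?string $default, callable $localized, callable $english): string
{
    $translated = $localized($msgid);
    if ($translated !== $msgid) {
        return $translated;      // a localized string exists
    }
    if ($default !== null) {
        return $default;         // caller supplied an explicit fallback
    }
    return $english($msgid);     // last resort: the en_US string
}

// Hypothetical catalogs; an unknown msgid is returned unchanged, as _() does.
$de = ['main_title' => 'Erweiterungen'];
$en = ['main_title' => 'Add-ons', 'new_string' => 'Brand new string'];

$localized = function (string $id) use ($de): string { return $de[$id] ?? $id; };
$english   = function (string $id) use ($en): string { return $en[$id] ?? $id; };

echo translate_with_fallback('main_title', null, $localized, $english), "\n";
echo translate_with_fallback('new_string', null, $localized, $english), "\n";
echo translate_with_fallback('new_string', 'Inline default', $localized, $english), "\n";
```

Comparing the result against the msgid works only because the msgids are symbolic names; with English msgids, a translation identical to the source text would be mistaken for a missing one.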
Attached patch: patch that implement ___() (obsolete)
This does what I said in my last post. However, Cake already uses two underscores, so I switched to three.
Attachment #322122 - Attachment is obsolete: true
Attachment #322162 - Flags: review?(fwenzel)
Comment on attachment 322162
patch that implement ___()

It looks and works fine, therefore r+.

I'd like you to change two things though before checkin:
- I don't think bootstrap is a good place for this: it contains initial request handling code, not global functions. How about config/core.php, at the bottom?
- not sure where global $lang comes from, but I think everywhere else in the code we use the constant LANG for this, so maybe you should go like this here too: putenv('LANG='.LANG);

Either way, great work, Chris! That'll make our code much more readable despite translation fallback.
Attachment #322162 - Flags: review?(fwenzel) → review+
hey wenzel,

This does essentially what you asked. The $lang thing in the last patch actually caused problems. The new way I am doing it uses getenv and does not rely on anything globally defined.
Assignee: nobody → cpollett
Attachment #322162 - Attachment is obsolete: true
Status: NEW → ASSIGNED
Attachment #322172 - Flags: review?(fwenzel)
Comment on attachment 322172
moves to core.php also avoid $lang global

ah yes, that looks good. Resetting the locale to what it was before seems like a good idea too.
Attachment #322172 - Flags: review?(fwenzel) → review+
Cool. I checked it into r13512. I am closing this bug, but people can feel free to reopen it if they want a different solution.
Status: ASSIGNED → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
>+    $my_lang = getenv('LANG');
>+    $my_lc_all = getenv('LC_ALL');
>+    putenv('LANG=en_US');
>+    putenv('LC_ALL=en_US');	
>+    setlocale(LC_ALL, "en_US");
>+    $output = _($to_translate);
>+
>+    putenv("LANG=".$my_lang);
>+    putenv("LC_ALL=".$my_lc_all);	
>+    setlocale(LC_ALL, $my_lc_all);

This looks much simpler than config/language.php::setCurrentLanguage.

Don't comments like these still hold?

        // Set the language.  We can't use LC_ALL here because it includes LC_CTYPE
        // and some languages (I'm looking at you Turkish!) will break php when
        // string functions are used
(In reply to comment #16)
> >+    $my_lang = getenv('LANG');
> >+    $my_lc_all = getenv('LC_ALL');
> >+    putenv('LANG=en_US');
> >+    putenv('LC_ALL=en_US');	
> >+    setlocale(LC_ALL, "en_US");
> >+    $output = _($to_translate);
> >+
> >+    putenv("LANG=".$my_lang);
> >+    putenv("LC_ALL=".$my_lc_all);	
> >+    setlocale(LC_ALL, $my_lc_all);
> 
> This looks much simpler than config/language.php::setCurrentLanguage.
> 
> Doesen't comments like thes still hold?:
> 
>         // Set the language.  We can't use LC_ALL here because it includes
> LC_CTYPE
>         // and some languages (I'm looking at you Turkish!) will break php when
>         // string functions are used
> 

As far as I know this is still a problem with PHP and this patch should be fixed to not use LC_ALL.  The code we're using right now is in /config/language.php.  The original discovery of the problem is bug 366316.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
LC_ALL here is only being used to translate English. Then the previous value is restored. So it shouldn't be a problem?
Keywords: push-needed
As long as setting LC_ALL doesn't change the other values it should be fine.
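If touching LC_ALL ever does turn out to be a problem, one alternative would be to switch a single locale category and restore it afterwards. The helper below is a hedged sketch, not the checked-in code: its name is hypothetical, and it assumes that switching only the messages category is enough for the gettext lookup, which can depend on the gettext build and on environment variables such as LC_ALL or LANGUAGE. For the fallback itself the category would be LC_MESSAGES, which PHP defines only when built with libintl; the demonstration uses LC_TIME and the "C" locale so that it runs anywhere.

```php
<?php
// Sketch only: run a callback with one locale category switched, then restore it.
function with_locale_category(int $category, string $locale, callable $fn)
{
    // Passing "0" queries the current setting without changing it.
    $previous = setlocale($category, '0');
    setlocale($category, $locale);
    try {
        return $fn();
    } finally {
        // Restore the saved value even if the callback throws.
        setlocale($category, $previous);
    }
}

// Demonstration: the category is "C" inside the callback and restored afterwards.
$before = setlocale(LC_TIME, '0');
$during = with_locale_category(LC_TIME, 'C', function () {
    return setlocale(LC_TIME, '0');
});
$after = setlocale(LC_TIME, '0');

var_dump($during === 'C', $after === $before);
```

Leaving LC_CTYPE alone is the point of the comment quoted from config/language.php, since that category is the one that affects PHP string functions under locales such as Turkish.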
Status: REOPENED → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Keywords: push-needed
Product: addons.mozilla.org → addons.mozilla.org Graveyard