Closed Bug 900182 Opened 11 years ago Closed 10 years ago

pseudo localizations for developer builds

Categories

(Firefox OS Graveyard :: Gaia::Build, defect)

defect
Not set
normal

Tracking

(tracking-b2g:backlog, b2g-v2.1 verified, b2g-v2.2 verified)

VERIFIED FIXED
1.3 C2/1.4 S2(17jan)
tracking-b2g backlog
Tracking Status
b2g-v2.1 --- verified
b2g-v2.2 --- verified

People

(Reporter: Pike, Assigned: stas)

References

Details

(Whiteboard: [rtl-meta][ucid:SystemPlatform39, ft:system-platform])

Attachments

(8 files, 1 obsolete file)

Pseudo localizations are fake localizations that can be generated from the en-US strings programmatically. They're great in that they're readable with English knowledge, but expose standard challenges in localizability (l12y).

We should include a few of those into the developer workflow in gaia.

I'd start with a simple unicode transliteration plus a tad of extension of string length, and a RTL one.

We can grab ideas from https://github.com/translate/translate/blob/master/translate/tools/podebug.py#L140.
Vivien, is functionality like this OK to be in python, or should we strive for a js-only implementation?
Flags: needinfo?(21)
(In reply to Axel Hecht [:Pike] from comment #1)
> Vivien, is functionality like this OK to be in python, or should we strive
> for a js-only implementation?

We're moving the build system to run on top of Firefox or on top of Node.js without python dependency so js sounds better.

I'm curious to learn about |standard challenges| as and definitively will be happy to have anything that can help inside the build system.
Flags: needinfo?(21)
Assignee: nobody → yurenju.mozilla
Whiteboard: [ucid:SystemPlatform39, 1.4:p2, ft:system-platform]
Blocks: gaia-rtl
Whiteboard: [ucid:SystemPlatform39, 1.4:p2, ft:system-platform]
Whiteboard: [ucid:SystemPlatform39, 1.4:p2, ft:system-platform]
Status: NEW → ASSIGNED
Target Milestone: --- → 1.3 C2/1.4 S2(17jan)
add integration tset cases for multilocale first.
Attachment #8357680 - Flags: review?(timdream)
Comment on attachment 8357680 [details] [review]
github PR: https://github.com/mozilla-b2g/gaia/pull/15149

Given that travis fails because helper.js is missing, re-requesting review from Tim. I'm pretty sure this should be an r-, though.

Yuren, can you check if you forgot to git add build/helper.js?
Attachment #8357680 - Flags: review+ → review?(timdream)
Sorry guys, I probably was sleepy last day...
Attachment #8357680 - Flags: review?(timdream) → review+
I would add a function |enablePseudo()| to L10nManager.
workable WIP for running gaia on browser :D
https://github.com/yurenju/gaia/commit/8cb226ceb879b1de8f6c8703c2a88f12d3f87d9c

I migrate some code from Pike's pull request:
https://github.com/Pike/gaia/compare/master...l10n-tools


I also expose an API |parseProperties()| from l10n.js or we need implement it in build system again. since Kaze is major author of l10n.js, I would like to know your opinion on this change.

Kaze, does it make sense?
Flags: needinfo?(kaze)
Kaze looks very busy so I'll keep my modification in l10n.js and will request a review to pike.
Flags: needinfo?(kaze)
Since we are refactoring gaia build system to use build_stage to assemble apps there, we should wait refactoring process finish then start to fix this bug to fix this bug easier.
Depends on: 968666
Move this one to v1.5 based on comment 17
Whiteboard: [ucid:SystemPlatform39, 1.4:p2, ft:system-platform] → [ucid:SystemPlatform39, 1.5, ft:system-platform]
No longer blocks: 1.4-system-platform
Move it to backlog since build script refactoring is the prerequisite for working on this one.
No longer blocks: 2.0-system-platform
blocking-b2g: --- → backlog
Whiteboard: [ucid:SystemPlatform39, 1.5, ft:system-platform] → [ucid:SystemPlatform39, ft:system-platform]
Status: ASSIGNED → NEW
Blocks: 995169
Component: Gaia → Gaia::Build
*Commenting my concerns (again) on the bug related to it*

> Noted that after bug 968666 and bug 900182, I intend to remove these locale
> strings from Gaia tree; we will programmatically generate fake locales (e.g.
> Accented English zz-ZZ) when build Gaia instead.

Let's not forget that Arabic (among others) use a different Font (Droid Arabic Naskh, Bug #893530) which means detecting style issues for Arabic will be harder (I'm filing one right now btw) for localizers if they don't have Arabic itself in their daily FOTAs, not to mention that not all localizers are technical enough to understand how to get a locale, merge it into gaia's and build it into their devices.

Axel, thoughts?
Flags: needinfo?(l10n)
The localizations in the gaia repo are not intended for l10n testing, but for l12y testing. They're aiming to help developers understand the impact on their work on localization, by a few examples.

That was nice, but it's not helping developers a lot if the strings aren't there that they're hacking on, and if they can't read the UI.

Thus pseudo l10n is the way to go for developer feedback.

For localizers, we should make things simpler. Juren (IIRC) has the plan to make gaia buildable from inside Firefox via an addon. That should help.
But moreover, the reference device will just have all locales and frequent updates on at least one channel, and that'll be that. Little need for everyone to rely on the localizations currently in the gaia repo.
Flags: needinfo?(l10n)
Attached patch WIP 1 (obsolete) — Splinter Review
I felt like this bug was an important part of our recent RTL-focused efforts, so I decided to take a stab at it.  Here's a WIP that I put together.  You can also check out the branch at https://github.com/stasm/gaia/tree/pseudolocales.

I'd like to take this bug and assign to myself.  Yuren, are you okay with this?

The patch adds two pseudo-locales defined in shared/resources/languages.json: en-US-accented and en-US-rtl.  They will only be available in developer builds.

en-US-accented has its English letters changed to their pretty accented counterparts (e.g. ȧƀƈḓḗƒɠħīĵķŀḿƞǿƥɋřşŧŭṽẇẋẏẑ for the lowercase letters).  In addition, every vowel is repeated twice, which increases the length of strings by around a third. (Vowel letters (aeiou) constitute ~38% of the total usage of letters in English.)

en-US-rtl is defined as an RTL language in shared/js/l10n.js and surrounds all its words with RLO-PDF Unicode codes.  RLO stands for right-to-left override and forces the English text to be displayed right-to-left;  PDF resets the override.  This approach is preferred over, for instance, trying to manually reverse the order of letters, because it handles punctuation better.  The result is a bit weird, but can be read by English-speaking users.  See http://www.w3.org/International/questions/qa-bidi-controls for more info on those Unicode codes.

I based my work on ideas implemented in Translate Toolkit: 

  https://github.com/translate/translate/blob/master/translate/tools/podebug.py#L140

and Chromium:

  https://code.google.com/p/chromium/issues/detail?id=73052
  https://code.google.com/p/chromium/codesearch#chromium/src/tools/grit/grit/pseudo_rtl.py&sq=package:chromium
  https://code.google.com/p/chromium/codesearch#chromium/src/tools/grit/grit/pseudo.py&sq=package:chromium

I kind of wished I'd added the Swedish Chef speak, but ultimately, en-US-accented seemed like a more useful and readable solution.

Since this is a WIP, a few caveats:

 - I didn't remove ar, fr and zh-TW for now;  it can be done in another bug or here,
 - there are currently lots of error message about en-US-{accented,rtl) files not being found;  I'll see what I can do to silence them,
 - there are no tests in the WIP;  I'll work on tests next, after I get some f+'s.

Some questions to feedbackees:

 - Yuren: the patch is orthogonal to any work related to multilocale.js, and that's on purpose.  It benefits from the modularity of the refactored l10n.js (bug 914414) and mostly adds logic to build/l10n.js.  Do you have any objections?
 - Axel: what do you think about the specific pseudolocales that I implemented?
 - Gandalf: same question as Axel + general feedback on changes to build/l10n.js and shared/js/l10n.js, as the module peer, please.
 - Ahmed:  I would appreciate it if you flashed my branch and change the language to '[Dev] English RTL.'  Does it make sense?  Do you think this can be useful to English-speaking developers when testing for RTL layout bugs?

Thanks for your help, guys.
Attachment #8418394 - Flags: feedback?(yurenju.mozilla)
Attachment #8418394 - Flags: feedback?(nefzaoui.ahmed)
Attachment #8418394 - Flags: feedback?(l10n)
Attachment #8418394 - Flags: feedback?(gandalf)
Attached image Screenshot of en-US-rtl
Comment on attachment 8418394 [details] [diff] [review]
WIP 1

Review of attachment 8418394 [details] [diff] [review]:
-----------------------------------------------------------------

A few comments:

We should have a gecko pref intl.uidirection.something=rtl to make CSS work, I think.

I found the screenshot for RTL hard to decipher, I wonder if it'd be easier with the flipped map from https://github.com/translate/translate/blob/master/translate/tools/podebug.py#L158. Not sure what their natural bidi direction is, maybe we can drop the RTL markers.
Anyway, could you give that a spin and attach a screenshot for that?

I haven't done an in-depth look at the actual code, though.

The pseudo-latin one does some string length expanding by doubling every vowel, right? Did you check how much that expands in practice? It'd be great to have a number or set of numbers to compare with flod's stats. But that's also cool for a follow-up, IMHO.
Attachment #8418394 - Flags: feedback?(l10n) → feedback+
Comment on attachment 8418394 [details] [diff] [review]
WIP 1

One more, I don't think that en-US-rtl validates as locale code, we should use one that does. en-x-rtl would probably, but I wonder if there's actually a commonly used on in the industry. I did some googling a while back, and didn't see anything, though.
(In reply to Axel Hecht [:Pike] from comment #25)

> I found the screenshot for RTL hard to decipher, I wonder if it'd be easier
> with the flipped map from
> https://github.com/translate/translate/blob/master/translate/tools/podebug.
> py#L158. Not sure what their natural bidi direction is, maybe we can drop
> the RTL markers.
> Anyway, could you give that a spin and attach a screenshot for that?

Sure, here it is.  Some glyphs are missing, I'd need to look for alternatives or fix the font.  I think it does look better, thanks for the suggestion.  What's your take?
Flags: needinfo?(l10n)
(In reply to Axel Hecht [:Pike] from comment #25)
> Comment on attachment 8418394 [details] [diff] [review]
> WIP 1
> 
> Review of attachment 8418394 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> A few comments:
> 
> We should have a gecko pref intl.uidirection.something=rtl to make CSS work,
> I think.

I found bug 478416 which introduced it, but I'll need to dig into it to understand how to use it.

BTW, Android has a similar approach:  instead of having an extra pseudolocale for RTL, it has a pref that forces RTL layout on all locales.  This doesn't change the actual order of letters, though.

> The pseudo-latin one does some string length expanding by doubling every
> vowel, right? Did you check how much that expands in practice? It'd be great
> to have a number or set of numbers to compare with flod's stats. But that's
> also cool for a follow-up, IMHO.

Yes, that's correct.  I estimate that it expands strings by 10-30% on average.  I'd love to get flod's help to make exact measurements.  The main problem is that it mostly affects strings which are already quite long.  Words which usually cause the most problems in localizations, like OK, No, Next, Back, Month, Week, Day etc. are unfortunately minimally affected.  I'd like to try adding some heuristics that would extend short strings more.  This might be a good follow-up bug.

(In reply to Axel Hecht [:Pike] from comment #26)
> Comment on attachment 8418394 [details] [diff] [review]
> WIP 1
> 
> One more, I don't think that en-US-rtl validates as locale code, we should
> use one that does. en-x-rtl would probably, but I wonder if there's actually
> a commonly used on in the industry. I did some googling a while back, and
> didn't see anything, though.

Right, I haven't given it a thought yet.  We could try using 'x' for private tags: x-rtl, x-accented, which gives us the most flexibility, but wouldn't work well with websites using navigator.language for client-sde language negotiation.  Or we could maybe add a 4-letter-long script tag to en-US: en-bidi-US, en-accn-US?  Do you have other suggestions?  I'll dive into the RFC (https://tools.ietf.org/html/rfc5646) and see if there are other options.
Comment on attachment 8418394 [details] [diff] [review]
WIP 1

that's great and we can solve those bugs in parallel!
Attachment #8418394 - Flags: feedback?(yurenju.mozilla) → feedback+
Assignee: yurenju.mozilla → nobody
(In reply to Staś Małolepszy :stas from comment #28)
> Yes, that's correct.  I estimate that it expands strings by 10-30% on
> average.  I'd love to get flod's help to make exact measurements.  

You can find the script I used here. In the same folder there's a view in PHP to display the json file as a table.
https://github.com/flodolo/scripts/blob/master/mozilla_l10n/compare_string_length_gaia.py

This was the result against mozilla-beta
http://l10n.mozilla-community.org/~flod/compare_length/
I found http://msdn.microsoft.com/en-us/library/windows/desktop/dd319106%28v=vs.85%29.aspx using

qps-ploc for pseudo localization, and qps-plocm for mirrored (RTL).

Looking at some code-points, we'll need to keep the bidi markers, I guess.
Flags: needinfo?(l10n)
(In reply to Yuren [:yurenju] from comment #29)
> that's great and we can solve those bugs in parallel!

Thanks, Yuren. I'm assigning the bug to myself.
Assignee: nobody → stas
Comment on attachment 8418394 [details] [diff] [review]
WIP 1

>  - Ahmed:  I would appreciate it if you flashed my branch and change the language to '[Dev] English RTL.'  Does it make sense?  Do you think this can be useful to English-speaking developers when testing for RTL layout bugs?

Yes, definitely super useful.
Thank you very much for this one, I can see it boosting developers testing feedback about RTL. :)
Attachment #8418394 - Flags: feedback?(nefzaoui.ahmed) → feedback+
Attached patch WIP 2Splinter Review
Guys, thanks a lot for your positive feedback! Here's a new WIP with comments addressed.

- RTL now uses mirrored text (English → ɥsıʅƃuƎ); this is much easier to read than hsilgnE.
- Changed the locale codes to qps-ploc and qps-plocm, just like Microsoft does.  qps is reserved for private and local use as per the RFC.
- Added the intl.uidirection.qps-plocm=rtl pref to better emulate the conditions in which regular RTL locales are displayed.

I'll start adding tests tomorrow.  Gandalf, any thoughts on the general structure?
Attachment #8418394 - Attachment is obsolete: true
Attachment #8418394 - Flags: feedback?(gandalf)
Attachment #8419044 - Flags: feedback?(gandalf)
Comment on attachment 8419044 [details] [diff] [review]
WIP 2

Review of attachment 8419044 [details] [diff] [review]:
-----------------------------------------------------------------

::: build/l10n.js
@@ +72,4 @@
>    }
>  
>  
> +  /* Helpers */

it would be nice to have a short description of what this code does at the top of this block (and separate file in l20n.js repo).

@@ +128,5 @@
> +  function makeRTL(val) {
> +    return val.replace(reWords, function(match) {
> +      return '\u202e' + match + '\u202c';
> +    });
> +  }

please, document this function

@@ +228,4 @@
>        }
>  
>        if (locale.ast && locale.ast.hasOwnProperty(id)) {
> +        return walkContent(locale.ast[id], pseudolocalize.bind(this));

walkContent should be used only on pseudolocalize languages.

@@ +265,4 @@
>        var attrs = L10n.getL10nAttributes(elements[i]);
>        var val = this.ctx.getEntitySource(attrs.id);
>        ast[attrs.id] = val;
> +      walkContent(val, getPlaceables.bind(this, ast));

same here
Attachment #8419044 - Flags: feedback?(gandalf) → feedback+
Attachment #8423871 - Flags: review?(gandalf)
Making the unit tests run both in our repo and in Gaia is sometimes tricky, to say the least.  I'm quite happy with the solution I found for this bug.  The challenge was that the pseudolocales logic lives in build/l10n.js, not in shared/js/l10n.js.  As soon as I learned that I can require('/build/l10n.js') from the unit tests in Gaia (thanks Pike) things went pretty smooth.

This patch creates pseudolocalization on buildtime and saves them into locales-obj/qps-*.json files.  This has the advantage of keeping things lean on the runtime.  The downside is that you need to set GAIA_CONCAT_LOCALES=1 for pseudolocales to work (with =0 you'll still see the pseudolocales listed in the language panel, but upon choosing, all text will be identical to English).

try: https://tbpl.mozilla.org/?tree=Gaia-Try&rev=f82189047b08
travis: https://travis-ci.org/mozilla-b2g/gaia/builds/25334481

Gandalf & Vivien, can you review this approach and the patch, please?
Attachment #8423878 - Flags: review?(gandalf)
Attachment #8423878 - Flags: review?(21)
Blocks: 1011519
Blocks: 1011520
Blocks: 1011528
Follow-ups:

  Bug 1011519 - Remove ar, fr and zh-TW from Gaia source repo
  Bug 1011520 - Add a fake East-Asian pseudolocale
  Bug 1011528 - Improve the heuristics used to make Accented English strings longer
Comment on attachment 8423871 [details] [diff] [review]
Patch against l20n.js repo

Review of attachment 8423871 [details] [diff] [review]:
-----------------------------------------------------------------

that looks good.

I'm a little bit worried that this code continues to bind is to the notion of entity=string (by your Entity.toString), but I guess that we can change it fairly easily once we get to the point where we will be pushing multi-variant entities.
Attachment #8423871 - Flags: review?(gandalf) → review+
Comment on attachment 8423878 [details] [diff] [review]
Patch against Gaia

Review of attachment 8423878 [details] [diff] [review]:
-----------------------------------------------------------------

::: build/test/integration/build.test.js
@@ +695,5 @@
> +    test('build with GAIA_CONCAT_LOCALES=0 doesn\'t include pseudolocales', function(done) {
> +      helper.exec('GAIA_CONCAT_LOCALES=0 make', function(error, stdout, stderr) {
> +        helper.checkError(error, stdout, stderr);
> +        var zipPath = path.join(process.cwd(), 'profile', 'webapps',
> +          'settings.gaiamobile.org', 'application.zip');

Maybe system instead of settings? It's a small difference but it sounds better to me to bank on that System app exists than Settings.

@@ +712,5 @@
> +        helper.checkError(error, stdout, stderr);
> +        var zipPath = path.join(process.cwd(), 'profile', 'webapps',
> +          'settings.gaiamobile.org', 'application.zip');
> +        var zip = new AdmZip(zipPath);
> +        var enUSFileInZip = zip.getEntry('locales-obj/en-US.json');

In the future, is there an option we will want to build without en-US at all? And in result we will want to accent another language that will be a base then?
Attachment #8423878 - Flags: review?(gandalf) → review+
Comment on attachment 8423878 [details] [diff] [review]
Patch against Gaia

Review of attachment 8423878 [details] [diff] [review]:
-----------------------------------------------------------------

::: apps/sharedtest/test/unit/l10n/lib/util_test.js
@@ +92,5 @@
> +    });
> +
> +  });
> +
> +  describe.skip('two-level-deep dict', function(){

This is skipped b/c l10n.js currently doesn't support nested dicts (there's no need for it to support them atm).  Gandalf, do you want me to remove this, or leave skipped?

@@ +123,5 @@
> +    });
> +
> +  });
> +
> +  describe.skip('attribute that is a one-level-deep dict', function(){

Oops, this should not be skipped.  These tests pass.  I'll change this locally before I land.
(In reply to Staś Małolepszy :stas from comment #41)
> Gandalf, do you want me to remove this, or leave skipped?

Leave it skipped.

I see us getting back to it when we'll be working on AST cleanups (and L20n format).
(In reply to Zibi Braniecki [:gandalf] from comment #39)
> I'm a little bit worried that this code continues to bind is to the notion
> of entity=string (by your Entity.toString), but I guess that we can change
> it fairly easily once we get to the point where we will be pushing
> multi-variant entities.

Multi-variant translations are supported via walkContent.  Currently this happens in getDictionary, but could be changed in the future.  In this step, we walk the AST and change all variants as needed.

The modification to Entity.prototypy.toString is needed to localize the visible DOM if one of the pseudolocales happens to be the default locale.  Note that in that case we don't care about all of the AST, just the value returned by the compiler, to which an entity was resolved.  This aligns well with the contract we have with the developer to return strings.

(In reply to Zibi Braniecki [:gandalf] from comment #40)
> Maybe system instead of settings? It's a small difference but it sounds
> better to me to bank on that System app exists than Settings.

Sure, I'll change it.

> In the future, is there an option we will want to build without en-US at
> all? And in result we will want to accent another language that will be a
> base then?

I doubt this will ever become a thing.  en-US is the de facto language of the development.  We might have builds without it, but that would require special make flags, and the test would not be affected.
Blocks: 1012702
Comment on attachment 8423878 [details] [diff] [review]
Patch against Gaia

As per an IRC chat with Vivien, moving the review request off of him and trying Yuren.  Can you take a look, please?  The changes mostly affect the buildtime version of l10n.js.  Thanks!
Attachment #8423878 - Flags: review?(21) → review?(yurenju.mozilla)
Stas, could you also send a pull request to ensure all tests are passed?
Flags: needinfo?(stas)
I saw the result of CI on comment 37.
Flags: needinfo?(stas)
Comment on attachment 8423878 [details] [diff] [review]
Patch against Gaia

Stas,

after reading the patch, code looks good but I don't think l10n should be a part of optimization process (webapp-optimize.js) in gaia build system and I apologize because I didn't realize this issue on last feedback request.

but, because we are blocking this feature by build system refactoring for a long time, I sugguest landing this patch to get pseudo localization first and file a follow-up bug to mirgrate those code from l10n.js to multilocale.js.

then we can generate *.properties files for pseudo localization and concat them into locale-obj/*.json in webapp-optimize.js if needed.
Attachment #8423878 - Flags: review?(yurenju.mozilla) → review+
Thanks, Yuren.  That sounds like a good way forward.

Commit: https://github.com/mozilla-b2g/gaia/commit/85fa5edba944ee4c7672166a15d3f5a14d2bd3e7
Merged: https://github.com/mozilla-b2g/gaia/commit/e58fd8cb68010988233c87d71cd783981b5b48bd

L20n.js: https://github.com/l20n/l20n.js/commit/f3e76d8e0da1ec1ee2839ffc408e6d33192f25dc

Thanks to everyone involved for your feedback!
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Blocks: 1028127
Blocks: 1036380
No longer blocks: 995169
Verified the issue is fixed on 2.2 and 2.1 Flame

Pseudo-localization is implemented in the "Developer" menu

"Flame 2.2

Device: Flame 2.2 Master (319mb)(Kitkat Base)(Full Flash)
BuildID: 20141203040207
Gaia: 725685831f5336cf007e36d9a812aad689604695
Gecko: 2c9781c3e9b5
Gonk: 48835395daa6a49b281db62c50805bd6ca24077e
Version: 37.0a1 (2.2 Master)
Firmware: V188-1
User Agent: Mozilla/5.0 (Mobile; rv:37.0) Gecko/37.0 Firefox/37.0"

"Flame 2.1 

Device: Flame 2.1 (319mb)(KitKat)(Full Flash)
Build ID: 20141203001205
Gaia: dbaf3e31c9ba9c3436e074381744f2971e15c7bf
Gecko: ebce587d2194
Gonk: 48835395daa6a49b281db62c50805bd6ca24077e
Version: 34.0 (2.1)
Firmware Version: v188-1
User Agent: Mozilla/5.0 (Mobile; rv:34.0) Gecko/34.0 Firefox/34.0"
==================================================================
Cannot verify the fix for 2.0

Cannot locate the "pseudo-menu"
Status: RESOLVED → VERIFIED
QA Whiteboard: [QAnalyst-Triage?]
Flags: needinfo?(ktucker)
QA Whiteboard: [QAnalyst-Triage?] → [QAnalyst-Triage+]
Flags: needinfo?(ktucker)
Mass Edit: adding the [rtl-meta]
Whiteboard: [ucid:SystemPlatform39, ft:system-platform] → [rtl-meta]
Whiteboard: [rtl-meta] → [rtl-meta][ucid:SystemPlatform39, ft:system-platform]
blocking-b2g: backlog → ---
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: