Closed Bug 1449505 Opened 6 years ago Closed 6 years ago

Add mozIntl.getLocaleDisplayNames

Categories

(Core :: Internationalization, enhancement, P3)

enhancement

Tracking

()

RESOLVED FIXED
mozilla61
Tracking Status
firefox61 --- fixed

People

(Reporter: zbraniecki, Assigned: zbraniecki)

References

Details

Attachments

(7 files, 1 obsolete file)

59 bytes, text/x-review-board-request
jfkthame
: review+
Details
59 bytes, text/x-review-board-request
jfkthame
: review+
Details
59 bytes, text/x-review-board-request
jfkthame
: review+
Details
59 bytes, text/x-review-board-request
jaws
: review+
Details
59 bytes, text/x-review-board-request
jfkthame
: review+
Details
59 bytes, text/x-review-board-request
jaws
: review+
Details
59 bytes, text/x-review-board-request
MattN
: review+
Details
We would like to get a mozIntl API for retrieving display names for a list of locales.

The initial simple API could work like this:

```
let values = mozIntl.getLocaleDisplayNames([
  "en-US",
  "de-DE",
  "fr-CA"
]);

values === [
  "English (American)",
  "German (Germany)",
  "French (Canada)"
];
```

I'd prefer not to use existing `mozIntl.GetDisplayNames` because we will tinker more with this API and compose the data from some custom source (.FTL files?) for some locales while using CLDR for others.

So it is possible that one day we will fold this into `GetDisplayNames`, but let's start with a separate API.
This approach will not work now. We do not package language/region/script names with our ICU.

I think it's ok, since we don't need them in JS Intl API, it doesn't make much sense to ship Firefox with 100 language names in 100 locales, if we only need 100 language names in one locale.

So instead, we may want to get the same API but store data for it in l10n resources. I'll try that.
Andre - can you teach me how to check the size impact if I wanted to enable this data table in our ICU? (list of language/region names)
Flags: needinfo?(andrebargull)
Removing these four lines in update-icu.sh should probably be sufficient to enable language, region, and script names: https://searchfox.org/mozilla-central/rev/028cd8342899357d80fafba2d528a0fd5819e316/intl/update-icu.sh#38-39,46-47

After modifying update-icu.sh, the ICU data file needs to be rebuild with:
--
export PYTHONPATH=$MOZ_INBOUND/python/mozbuild/
cd $MOZ_INBOUND/intl
./update-icu.sh https://ssl.icu-project.org/repos/icu/tags/release-60-2/icu4c
---


Rebuilding ICU will downgrade the embedded tzdata to 2017c, upgrading it to 2018c requires these steps:
---
cd $MOZ_INBOUND/intl
./update-tzdata.sh -e $ICU60/bin/icupkg 2018c
---


This will increase config/external/icu/data/icudt60l.dat from 11,585,936 bytes to 14,464,288 bytes.

Try-build with both changes performed: https://treeherder.mozilla.org/#/jobs?repo=try&revision=080ca7cd96e386f54aa39557bfd486da142510f8
Flags: needinfo?(andrebargull)
seems like 700-800k! That's pretty major. Thanks Andre! I'll continue pursuing the FTL path rather than CLDR.
Ok, we'll start with just unifying the code we have today (using the .properties files) into mozIntl, and then we can start improving.

I'm also going to switch us to use CLDR-like logic for patterns and separators.
Assignee: nobody → gandalf
Status: NEW → ASSIGNED
Priority: -- → P3
Attachment #8963117 - Attachment is obsolete: true
Comment on attachment 8965154 [details]
Bug 1449505 - Migrate InlineSpellChecker.jsm to use mozIntl.getLocaleDisplayNames.

https://reviewboard.mozilla.org/r/233832/#review239502


Code analysis found 2 defects in this patch:
 - 2 defects found by mozlint

You can run this analysis locally with:
 - `./mach lint path/to/file` (JS/Python)


If you see a problem in this automated review, please report it here: http://bit.ly/2y9N9Vx


::: toolkit/modules/InlineSpellChecker.jsm:7
(Diff revision 1)
>   * License, v. 2.0. If a copy of the MPL was not distributed with this
>   * file, You can obtain one at http://mozilla.org/MPL/2.0/. */
>  
>  var EXPORTED_SYMBOLS = [ "InlineSpellChecker",
>                           "SpellCheckHelper" ];
>  var gLanguageBundle;

Error: 'glanguagebundle' is defined but never used. [eslint: no-unused-vars]

::: toolkit/modules/InlineSpellChecker.jsm:8
(Diff revision 1)
>   * file, You can obtain one at http://mozilla.org/MPL/2.0/. */
>  
>  var EXPORTED_SYMBOLS = [ "InlineSpellChecker",
>                           "SpellCheckHelper" ];
>  var gLanguageBundle;
>  var gRegionBundle;

Error: 'gregionbundle' is defined but never used. [eslint: no-unused-vars]
Comment on attachment 8965158 [details]
Bug 1449505 - Migrate Preferences languages.js to use mozIntl.getLocaleDisplayNames.

https://reviewboard.mozilla.org/r/233840/#review239504


Code analysis found 3 defects in this patch:
 - 3 defects found by mozlint

You can run this analysis locally with:
 - `./mach lint path/to/file` (JS/Python)


If you see a problem in this automated review, please report it here: http://bit.ly/2y9N9Vx


::: browser/components/preferences/languages.js:80
(Diff revision 1)
> -            this._availableLanguagesList.push(li);
> -          }
> -        }
> +      }
> -      }
> +    }
> +
> +    let localeNames = Services.intl.getLocaleNames(undefined, localeCodes);

Error: 'localenames' is assigned a value but never used. [eslint: no-unused-vars]

::: browser/components/preferences/languages.js:83
(Diff revision 1)
> -      }
> +    }
> +
> +    let localeNames = Services.intl.getLocaleNames(undefined, localeCodes);
> +
> +    for (let i in localeCodes) {
> +      let isVisible = localeValues[i] == "true" && 

Error: Trailing spaces not allowed. [eslint: no-trailing-spaces]

::: browser/components/preferences/languages.js:87
(Diff revision 1)
> +    for (let i in localeCodes) {
> +      let isVisible = localeValues[i] == "true" && 
> +        (!(localeCodes[i] in this._acceptLanguages) || !this._acceptLanguages[localeCodes[i]]);
> +
> +      let name = bundlePreferences.getFormattedString("languageCodeFormat",
> +        [localesNames[i], localeCodes[i]]);

Error: 'localesnames' is not defined. [eslint: no-undef]
Comment on attachment 8965159 [details]
Bug 1449505 - Migrate FormAutofill to use mozIntl.getLocaleDisplayNames.

https://reviewboard.mozilla.org/r/233842/#review239506


Code analysis found 1 defect in this patch:
 - 1 defect found by mozlint

You can run this analysis locally with:
 - `./mach lint path/to/file` (JS/Python)


If you see a problem in this automated review, please report it here: http://bit.ly/2y9N9Vx


::: browser/extensions/formautofill/FormAutofillStorage.jsm:1268
(Diff revision 1)
>      // Compute country name
>      if (!("country-name" in address)) {
> -      if (address.country && REGION_NAMES[address.country]) {
> -        address["country-name"] = REGION_NAMES[address.country];
> +      if (address.country) {
> +        try {
> +          address["country-name"] = Services.intl.getRegionDisplayNames(undefined, [address.country]);
> +        } catch() {

Error: Parsing error: unexpected token ) [eslint]
Thanks Code Review Bot! This is WIP, but I'll take your linting anyway :)
Comment on attachment 8965158 [details]
Bug 1449505 - Migrate Preferences languages.js to use mozIntl.getLocaleDisplayNames.

https://reviewboard.mozilla.org/r/233840/#review240416


Code analysis found 1 defect in this patch:
 - 1 defect found by mozlint

You can run this analysis locally with:
 - `./mach lint path/to/file` (JS/Python)


If you see a problem in this automated review, please report it here: http://bit.ly/2y9N9Vx


::: browser/components/preferences/languages.js:83
(Diff revision 2)
> -      }
> +    }
> +
> +    let localeNames = Services.intl.getLocaleNames(undefined, localeCodes);
> +
> +    for (let i in localeCodes) {
> +      let isVisible = localeValues[i] == "true" && 

Error: Trailing spaces not allowed. [eslint: no-trailing-spaces]
This patchset unifies all uses of language/region/locale display name formatting into mozIntl. It's still using the same .properties files for now.

In the follow-ups I'll want to migrate us to use .ftl files generated out of CLDR data.

The only functional change is that we use English patterns from CLDR which means that locales are formatted to "$lang ($region)" rather than "$lang/$region". I think it's a better choice and it'll make it easier to switch to CLDR/ICU later.
Comment on attachment 8965153 [details]
Bug 1449505 - Add mozIntl.getLocaleDisplayNames.

https://reviewboard.mozilla.org/r/233830/#review240622

::: toolkit/components/mozintl/mozIntl.js:16
(Diff revision 3)
>    Cc["@mozilla.org/intl/ospreferences;1"].getService(Ci.mozIOSPreferences);
>  
>  /**
> + * RegExp used to parse a BCP47 language tag (ex: en-US, sr-Cyrl-RU etc.)
> + */
> +const languageTagMatch = /^([a-z]{2,3}|[a-z]{4}|[a-z]{5,8})(?:[-_]([a-z]{4}))?(?:[-_]([A-Z]{2}|[0-9]{3}))?((?:[-_](?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*)(?:[-_][a-wy-z0-9](?:[-_][a-z0-9]{2,8})+)*(?:[-_]x(?:[-_][a-z0-9]{1,8})+)?$/i;

I don't understand this regex... it starts out with

    ([a-z]{2,3}|[a-z]{4}|[a-z]{5,8})

which seems like it would be equivalent to just

    ([a-z]{2,8})

and doesn't make sense as the beginning of a bcp47 tag. Shouldn't the primary language subtag always be 2-3 chars? (Or what am I missing?)

Anyhow, why aren't we using our Locale object to implement this? That already supports parsing a bcp47 tag into its subtags, right?

::: toolkit/components/mozintl/mozIntl.js:227
(Diff revision 3)
> +    const localePattern = "{0} ({1})";
> +    const localeSeparator = ", ";

This seems like something we need to fix before we can actually use this. In particular, inserting comma as a delimiter would look pretty bad in several non-Latin-script locales.
> Anyhow, why aren't we using our Locale object to implement this? That already supports parsing a bcp47 tag into its subtags, right?

Yeah, except that it is in C++ and we don't expose it via XPIDL. We will have a parser in Intl.Locale which is now in Stage 3 (bug 1433303), but we're waiting for some dependencies there.

I thought that we could move the code that we already have (the regexp is from here [0]). If you prefer not to, I'm ok blocking on Intl.Locale here.

> This seems like something we need to fix before we can actually use this. In particular, inserting comma as a delimiter would look pretty bad in several non-Latin-script locales.

I agree, but unfortunately that's not a trivial fix. We don't seem to carry this data in our ICU and it's not easy to only add those two cells.

I'm working on a plan to add a systematic way to pull CLDR data into our l10n resources (FTL files) to have those cells only for locales we localize to, but that's going to take a few months.

My understanding is that even if it would look bad:

1) It's going to be very rare (we don't have many variants/scripts)
2) It's an improvement over the current situation (we currently format using hardcoded patterns with latin chars as well, just worse ones).

If you prefer to block on this project, I'll try to bump the priority but it is going to delay this bug by a few months at least.


[0] https://searchfox.org/mozilla-central/source/toolkit/modules/InlineSpellChecker.jsm#250
Depends on: 1433303
(In reply to Zibi Braniecki [:gandalf][:zibi] from comment #35)
> > Anyhow, why aren't we using our Locale object to implement this? That already supports parsing a bcp47 tag into its subtags, right?
> 
> Yeah, except that it is in C++ and we don't expose it via XPIDL. We will
> have a parser in Intl.Locale which is now in Stage 3 (bug 1433303), but
> we're waiting for some dependencies there.
> 
> I thought that we could move the code that we already have (the regexp is
> from here [0]). If you prefer not to, I'm ok blocking on Intl.Locale here.

I don't mind doing this, I guess, but the regexp still seems wrong to me:

» "garbage".match(languageTagMatch)
← Array(5) [ "garbage", "garbage", undefined, undefined, "" ]

So it reckons "garbage" is a valid langtag?! Surely that's a bug.

> > This seems like something we need to fix before we can actually use this. In particular, inserting comma as a delimiter would look pretty bad in several non-Latin-script locales.
> 
> I agree, but unfortunately that's not a trivial fix. We don't seem to carry
> this data in our ICU and it's not easy to only add those two cells.
> 
> I'm working on a plan to add a systematic way to pull CLDR data into our
> l10n resources (FTL files) to have those cells only for locales we localize
> to, but that's going to take a few months.
> 
> My understanding is that even if it would look bad:
> 
> 1) It's going to be very rare (we don't have many variants/scripts)
> 2) It's an improvement over the current situation (we currently format using
> hardcoded patterns with latin chars as well, just worse ones).

Fair enough, if we're already broken in this regard I guess it doesn't hurt to do an incomplete version here too.
> I don't mind doing this, I guess, but the regexp still seems wrong to me:

Good catch. Will update the regexp!
Comment on attachment 8965159 [details]
Bug 1449505 - Migrate FormAutofill to use mozIntl.getLocaleDisplayNames.

https://reviewboard.mozilla.org/r/233842/#review240750

::: browser/extensions/formautofill/FormAutofillUtils.jsm:852
(Diff revision 3)
>      element.textContent = bundle.GetStringFromName(element.getAttribute(attributeName));
>      element.removeAttribute(attributeName);

Nit: It would be clearer if you moved these two lines to the data-localization case to make the two cases consistent. Then you can get rid of the function-scoped `bundle` var.
Attachment #8965159 - Flags: review?(MattN+bmo) → review+
(In reply to Jonathan Kew (:jfkthame) from comment #36)
> I don't mind doing this, I guess, but the regexp still seems wrong to me:
> 
> » "garbage".match(languageTagMatch)
> ← Array(5) [ "garbage", "garbage", undefined, undefined, "" ]
> 
> So it reckons "garbage" is a valid langtag?! Surely that's a bug.

Heh, it seems that it is not.

See http://schneegans.de/lv/

Basically, https://tools.ietf.org/html/bcp47#section-2.1 :

```
 language      = 2*3ALPHA            ; shortest ISO 639 code
                 ["-" extlang]       ; sometimes followed by
                                     ; extended language subtags
               / 4ALPHA              ; or reserved for future use
               / 5*8ALPHA            ; or registered language subtag
```

which makes `garbage` a well formed language in a language tag.

The regexp as is is a customized minimized attempt to reproduce the whole scope. A bit more complete version like this - https://github.com/sebinsua/ietf-language-tag-regex/blob/master/index.js - could be used as well probably.

Or we could expose a local function Anba wrote in SpiderMonkey - https://searchfox.org/mozilla-central/source/js/src/builtin/intl/CommonFunctions.js#160 but I'm not sure how easy that would be since it's not available in any XPCOM.

I'm open to suggestions, but, similarly to the patterns - this is a copy of a regexp used to parse BCP47 in the existing code, I just reorganized it to prepare for a move to Intl.Locale once that's available.

I don't like it, and I'd prefer a better solution, but neither getting a full BCP47 parser, nor trying to expose SpiderMonkey guts here seems optimal. Can you advise on what would you like to see here?
(In reply to Zibi Braniecki [:gandalf][:zibi] from comment #39)
> (In reply to Jonathan Kew (:jfkthame) from comment #36)
> > I don't mind doing this, I guess, but the regexp still seems wrong to me:
> > 
> > » "garbage".match(languageTagMatch)
> > ← Array(5) [ "garbage", "garbage", undefined, undefined, "" ]
> > 
> > So it reckons "garbage" is a valid langtag?! Surely that's a bug.
> 
> Heh, it seems that it is not.
> 
> See http://schneegans.de/lv/
> 
> Basically, https://tools.ietf.org/html/bcp47#section-2.1 :
> 
> ```
>  language      = 2*3ALPHA            ; shortest ISO 639 code
>                  ["-" extlang]       ; sometimes followed by
>                                      ; extended language subtags
>                / 4ALPHA              ; or reserved for future use
>                / 5*8ALPHA            ; or registered language subtag
> ```
> 
> which makes `garbage` a well formed language in a language tag.

Ugh... yeah, so it is indeed well-formed (although not valid).

Reading BCP47 (https://tools.ietf.org/html/bcp47#section-2.2.1) a bit more:

   4.  Four-character language subtags are reserved for possible future
       standardization.

   5.  Any language subtags of five to eight characters in length in the
       IANA registry were defined via the registration process in
       Section 3.5 and MAY be used to form the primary language subtag.
       An example of what such a registration might include is the
       grandfathered IANA registration "i-enochian".  The subtag
       'enochian' could be registered in the IANA registry as a primary
       language subtag (assuming that ISO 639 does not register this
       language first), making tags such as "enochian-AQ" and "enochian-
       Latn" valid.

       At the time this document was created, there were no examples of
       this kind of subtag.  Future registrations of this type are
       discouraged: an attempt to register any new proposed primary
       language MUST be made to the ISO 639 registration authority.
       Proposals rejected by the ISO 639 registration authority are
       unlikely to meet the criteria for primary language subtags and
       are thus unlikely to be registered.

So in practice, the {4} and {5,8} branches of the regex should never be used. 4-char subtags are "reserved for possible future standardization", which AFAIK shows no sign of happening; and as for 5-8 char subtags, there aren't any, and any future attempt to register one is "unlikely" to succeed.

Sigh... OK, I suppose we can use this, though I wish I had a better suggestion!
Comment on attachment 8965153 [details]
Bug 1449505 - Add mozIntl.getLocaleDisplayNames.

https://reviewboard.mozilla.org/r/233830/#review240884
Attachment #8965153 - Flags: review?(jfkthame) → review+
Comment on attachment 8965154 [details]
Bug 1449505 - Migrate InlineSpellChecker.jsm to use mozIntl.getLocaleDisplayNames.

https://reviewboard.mozilla.org/r/233832/#review240886

Seems like the collection of tests that used to be here is rather more extensive than the set you've included in the getLocaleDisplayNames patch; is it worth migrating more of these? (E.g. things like "es-419".)
Attachment #8965154 - Flags: review?(jfkthame) → review+
Comment on attachment 8965155 [details]
Bug 1449505 - Migrate NarrateControls.jsm to use mozIntl.getLocaleDisplayNames.

https://reviewboard.mozilla.org/r/233834/#review240888
Attachment #8965155 - Flags: review?(jfkthame) → review+
Comment on attachment 8965157 [details]
Bug 1449505 - Migrate translation-infobar to use mozIntl.getLocaleDisplayNames.

https://reviewboard.mozilla.org/r/233838/#review240892
Attachment #8965157 - Flags: review?(jfkthame) → review+
Comment on attachment 8965156 [details]
Bug 1449505 - Migrate Preferences translation.js to use mozIntl.getLocaleDisplayNames.

https://reviewboard.mozilla.org/r/233836/#review241114
Attachment #8965156 - Flags: review?(jaws) → review+
Comment on attachment 8965158 [details]
Bug 1449505 - Migrate Preferences languages.js to use mozIntl.getLocaleDisplayNames.

https://reviewboard.mozilla.org/r/233840/#review241116

::: browser/components/preferences/languages.js:87
(Diff revision 3)
> +      let isVisible = localeValues[i] == "true" &&
> +        (!(localeCodes[i] in this._acceptLanguages) || !this._acceptLanguages[localeCodes[i]]);
> +
> +      let name = bundlePreferences.getFormattedString("languageCodeFormat",
> +        [localeNames[i], localeCodes[i]]);
> +      var li = new LanguageInfo(name, localeCodes[i], isVisible);

let
Attachment #8965158 - Flags: review?(jaws) → review+
> is it worth migrating more of these? (E.g. things like "es-419".)

The tests are skipped because they're unsupported. And they're unsupported because we don't carry data for regions such as "419". I could migrate the tests and keep them skipped, but I'm not sure if it's worth it.

The additional values will get supported once we start importing data from CLDR into FTL files that will be used here instead of the current hand-curated .properties.
Pushed by zbraniecki@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/398d8addfbd8
Add mozIntl.getLocaleDisplayNames. r=jfkthame
https://hg.mozilla.org/integration/autoland/rev/4bf3befa991d
Migrate InlineSpellChecker.jsm to use mozIntl.getLocaleDisplayNames. r=jfkthame
https://hg.mozilla.org/integration/autoland/rev/f0b4429eedff
Migrate NarrateControls.jsm to use mozIntl.getLocaleDisplayNames. r=jfkthame
https://hg.mozilla.org/integration/autoland/rev/a58126ed4b3f
Migrate Preferences translation.js to use mozIntl.getLocaleDisplayNames. r=jaws
https://hg.mozilla.org/integration/autoland/rev/75a7dee8e060
Migrate translation-infobar to use mozIntl.getLocaleDisplayNames. r=jfkthame
https://hg.mozilla.org/integration/autoland/rev/07143ec143ae
Migrate Preferences languages.js to use mozIntl.getLocaleDisplayNames. r=jaws
https://hg.mozilla.org/integration/autoland/rev/e82c5152cc17
Migrate FormAutofill to use mozIntl.getLocaleDisplayNames. r=MattN
Comment on attachment 8965159 [details]
Bug 1449505 - Migrate FormAutofill to use mozIntl.getLocaleDisplayNames.

https://reviewboard.mozilla.org/r/233842/#review241246


Code analysis found 1 defect in this patch:
 - 1 defect found by mozlint

You can run this analysis locally with:
 - `./mach lint path/to/file` (JS/Python)


If you see a problem in this automated review, please report it here: http://bit.ly/2y9N9Vx


::: browser/extensions/formautofill/FormAutofillUtils.jsm:853
(Diff revision 4)
>          throw new Error("Unexpected attributeName");
>      }
>  
> -    element.textContent = bundle.GetStringFromName(element.getAttribute(attributeName));
> -    element.removeAttribute(attributeName);
>    },

Error: Block must not be padded by blank lines. [eslint: padded-blocks]
Pushed by philringnalda@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/760e4d29af7e
followup, placate eslint on a CLOSED TREE
Pushed by philringnalda@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/e96c92f80c09
followup, fix forthcoming xpcshell bustage in test_mozintl.js
Pushed by zbraniecki@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/d909fdf48f6c
Add mozIntl.getLocaleDisplayNames. r=jfkthame
https://hg.mozilla.org/integration/autoland/rev/d24c13f4f873
Migrate InlineSpellChecker.jsm to use mozIntl.getLocaleDisplayNames. r=jfkthame
https://hg.mozilla.org/integration/autoland/rev/fcaf5e700dd9
Migrate NarrateControls.jsm to use mozIntl.getLocaleDisplayNames. r=jfkthame
https://hg.mozilla.org/integration/autoland/rev/cb8e3bb17721
Migrate Preferences translation.js to use mozIntl.getLocaleDisplayNames. r=jaws
https://hg.mozilla.org/integration/autoland/rev/0576fee62637
Migrate translation-infobar to use mozIntl.getLocaleDisplayNames. r=jfkthame
https://hg.mozilla.org/integration/autoland/rev/c9e2b7bb3b4f
Migrate Preferences languages.js to use mozIntl.getLocaleDisplayNames. r=jaws
https://hg.mozilla.org/integration/autoland/rev/5eed2707eb00
Migrate FormAutofill to use mozIntl.getLocaleDisplayNames. r=MattN
Flags: needinfo?(gandalf)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: