Closed Bug 1337694 Opened 7 years ago Closed 7 years ago

Add language negotiation heuristics to LocaleService

Categories

(Core :: Internationalization, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla55
Tracking Status
firefox55 --- fixed

People

(Reporter: zbraniecki, Assigned: zbraniecki)

References

Details

Attachments

(1 file, 5 obsolete files)

In several places, including LocaleService itself, we'll want to perform language negotiation to retrieve the best available fallback chain for a given language list and user locale preferences.

I'll shape the API after:
 - https://tools.ietf.org/html/rfc4647
 - http://searchfox.org/mozilla-central/rev/848c29538ab007fb95dc6cff194f0e3d3809613d/js/src/builtin/Intl.js#859
 - https://github.com/l20n/l20n.js/blob/v1.0.x/lib/l20n/intl.js#L434

but if possible, I'd like to allow multiple availableLocales lists to be taken into consideration to make the algorithm useful for negotiating scenarios where we're looking for best available language fallback chain that provides match for more than one dataset (say, l10n data and plural rules).

This should replace things like:
 - http://searchfox.org/mozilla-central/search?q=findClosestLocale&path=
Olly, in bug 1336281 comment 28 you suggested switching `jsval` as a type for array of strings to 

`[optional] unsigned long aCount, [array, size_is(aCount)] out string aArray`

Here, I'm working on an API that will have a bit more complex signature.
It has to accept 2 arrays of strings (requestedLocales and acceptedLocales), and a string (defaultLocale), and returns an array of strings (supportedLocales).

My question is - is it better to use the IDL syntax you proposed or would it be more convenient for future users to write this pure C++ function and then a wrapper that just converts the nsTArray<nsCString> into jsval, or unify that in one IDL function with the params you suggested?

So, is it better to do:

  void NegotiateLanguages(nsTArray<nsCString>& requestedLocales,
                          nsTArray<nsCString>& availableLocales,
                          nsACString& defaultLocale,
                          nsTArray<nsCString>& aRetVal);

and

NS_IMETHODIMP
LocaleService::NegotiateLanguages(JS::HandleValue requestedLocales, JS::HandleValue availableLocales, nsACString& defaultLocale
                                  JS::MutableHandleValue& aRetVal);

or a single:

NS_IMETHODIMP
LocaleService::NegotiateLanguages(uint32_t aRequestedCount, const char** requestedLocales,
                                  uint32_t aAvailableCount, const char** availableLocales,
                                  nsACString& defaultLocale,
                                  uint32_t* aSupportedCount, char*** aRetVal);

It seems to me that for both, C++ and JS convenience it would be easier to have two methods - C++ would be able to operate on nsTArray and nsCStrings, and JS wouldn't have to provide the count attrs for each array.
Does it sound like a good solution or is the combined one preferred?
Flags: needinfo?(bugs)
Assignee: nobody → gandalf
Status: NEW → ASSIGNED
Same q for :jfkthame
Flags: needinfo?(jfkthame)
I'm no expert on IDL and JS APIs and all that, but I believe you can do something like:

    void negotiateLocales([array, size_is(aRequestedCount)] in string aRequested,
                          [array, size_is(aAvailableCount)] in string aAvailable,
                          in string aDefaultLocale,
                          [optional] in unsigned long aRequestedCount,
                          [optional] in unsigned long aAvailableCount,
                          [optional] out unsigned long aCount,
                          [retval, array, size_is(aCount)] out string aLocales);

and this will be conveniently callable from JS without needing to provide any extra 'count' parameters:

    locales = locSvc.negotiateLocales(['de-AT', 'de', 'fr'], ['es', 'fr-FR', 'fr-CA', 'de-DE'], 'en-US');

    -> ['de-DE', 'fr-FR', 'en-US']

or whatever....

Calling this from C++ would be a little less convenient (and more error-prone), so if we're expecting to have a number of C++ users we might still want to have a non-XPCOM API that takes nsTArray<nsACString>& parameters, etc., to simplify things for those callers.
Flags: needinfo?(jfkthame)
Yeah, something what jfkthame said should work.


I wonder, if there will be lots of rather complicated methods in the service, would it be bad idea to 
have ChromeOnly webidl interface for it and then some static methods. bz or peterv might have an opinion.
(WebIDL bindings just make dealing with arrays and such so much easier.)
Flags: needinfo?(bugs)
Thanks!

I do not expect any other complex methods and I do see only JS callers for now.

I'll write it the way you suggested and if we'll get C++ callers, I may add a non-XPCOM API.
Comment on attachment 8839011 [details]
Bug 1337694 - Add Language negotiation to LocaleService API

I created a stub for the API I'll have to write.

I added the C++ API as you suggested because I think it'll be easier to test and it was not that hard*.

I'm ironing out the algo with :pike in [0] because it's a bit easier to prototype in JS, but I'll write it in C++ here once we're done.

*) In the meantime, jfkthame, can you help me make this stub not completely insult the memory management in C++? I did my best, and got it to compile but it crashes like there's no tomorrow on "./mach gtest Intl*".
I'm pretty sure I committed some serious crime against humanity with how I squeezed those square nsACStrings and nsTArrays into the round char***'s. :(

[0] https://github.com/projectfluent/fluent.js/pull/5
Attachment #8839011 - Flags: feedback?(jfkthame)
Comment on attachment 8839011 [details]
Bug 1337694 - Add Language negotiation to LocaleService API

https://reviewboard.mozilla.org/r/113766/#review115356

::: intl/locale/LocaleService.h:75
(Diff revision 1)
> -  void Negotiate(nsTArray<nsCString>& requestedLocales,
> -      nsTArray<nsCString>& availableLocales, nsACString& defaultLocale, nsTArray<nsCString>& aRetVal);
> +  void NegotiateLanguages(nsTArray<nsACString>& aRequested,
> +                          nsTArray<nsACString>& aAvailable,
> +                          nsACString& aDefaultLocale,
> +                          nsTArray<nsACString>& aRetVal);

If we're going to have a C++-friendly method like this (I realize it may be just for testing at this point), the three in-params should all be declared `const`.

::: intl/locale/LocaleService.cpp:79
(Diff revision 1)
> -void LocaleService::Negotiate(nsTArray<nsCString>& requestedLocales,
> -    nsTArray<nsCString>& availableLocales, nsACString& defaultLocale, nsTArray<nsCString>& aRetVal)
> +void LocaleService::NegotiateLanguages(nsTArray<nsACString>& aRequested,
> +    nsTArray<nsACString>& aAvailable, nsACString& aDefaultLocale, nsTArray<nsACString>& aRetVal)

Writing the C++-style method as a wrapper around the XPCOM one is really painful: if we want both, I'd do it the other way around.

However, it's probably easier all around - if we don't think we're really going to be using this from C++ - to skip this altogether and write tests for the API using JS instead of gtest.

::: intl/locale/LocaleService.cpp:83
(Diff revision 1)
> +  const char** requested = new const char *[aRequested.Length()];
> +  const char** available = new const char *[aAvailable.Length()];

These two arrays are allocated (`new`) here, but never released (`delete[]`), so they'd leak.

::: intl/locale/LocaleService.cpp:85
(Diff revision 1)
> -    nsTArray<nsCString>& availableLocales, nsACString& defaultLocale, nsTArray<nsCString>& aRetVal)
> +    nsTArray<nsACString>& aAvailable, nsACString& aDefaultLocale, nsTArray<nsACString>& aRetVal)
>  {
> +
> +  const char** requested = new const char *[aRequested.Length()];
> +  const char** available = new const char *[aAvailable.Length()];
> +  const char* defaultLocale = PromiseFlatCString(aDefaultLocale).get();

Not safe, see searchfox.org/mozilla-central/source/xpcom/string/nsTPromiseFlatString.h.

::: intl/locale/LocaleService.cpp:90
(Diff revision 1)
> +  const char* defaultLocale = PromiseFlatCString(aDefaultLocale).get();
> +  uint32_t requestedCount = aRequested.Length();
> +  uint32_t availableCount = aAvailable.Length();
> +
> +  for (uint32_t i = 0; i < requestedCount; i++) {
> +    requested[i] = PromiseFlatCString(aRequested[i]).get();

Again, it's not valid to use `PromiseFlatCString` like this. The pointers that end up in `requested[]` will be invalid by the time they're subsequently used.

To actually create C-style arrays of C-style strings here, like the XPCOM version of the method wants, I think you'd need to `moz_xstrdup()` the strings (and then free them afterwards)... but I really, really don't think we want to do this at all.

::: intl/locale/LocaleService.cpp:161
(Diff revision 1)
> +  printf("%s\n", pattern);
> +
> +  uint32_t count = 1;
> +  char** retVal = (char**)moz_xmalloc(count * sizeof(char*));
> +
> +  retVal[0] = (char*)pattern;

This assigns a local (stack-based) variable to `retVal[0]`, which is a no-no because the `pattern` buffer will evaporate as soon as this method returns.

You need to heap-allocate a new buffer for the string you're returning; you could `moz_xmalloc` it and then `strcpy` the string into it, but there's a shorthand that does this for you:

    retVal[0] = moz_xstrdup(pattern);

That gives you a newly-allocated duplicate of the string (which it will be up to the caller to free).
Attachment #8839011 - Attachment is obsolete: true
Attachment #8839011 - Flags: feedback?(jfkthame)
Thanks :jfkthame!

I'll be using your guidance as I go through the service. Currently, we're still working out the exact algorithm we'll want to use with :pike in https://github.com/projectfluent/fluent.js/pull/5 but I started implementing the data structure Locale that we'll be using.

And instantly I encountered an interesting problem, that I'd like to discuss before I start coding:

The first thing we need to do is parse a locale ID string like "en-Latn-US-mac" into a structure with 4 parts (language, script, region and variant).

We can do that in two ways:
1) Parse the string "by hand" either using C++ regexp, or just looping over chars.
2) Use ICU uloc API

The uloc API simplifies our code, but it brings a certain cost:

1) It does more than just chunk the string. It seems to canonicalize it (turning lower case region part into upper case, and caps-first script part etc.). We don't need it since we'll be handling all matching as case insensitive.
2) It does not handle ranges.

One of the important concepts we'd like to use is a locale range - any part of the locale string can be replaced with an asterisk to represent that the locale captures all possible values for the given part.

For example, "en-Latn" means "language en, script latn, no region specified", but "en-Latn-*" means "language en, script latn, any region".

We could workaround this in ICU by doing something half-smart and for example pre-parsing the locale id and replace any '*' with "zz", "zzzz" or "zzz" depending on the part in which '*' is.

But I feel like if we have to pre-parse and analyze parts, we're 90% of the way.

In other words, :jfkthame - how do you think we should handle in C++ something like this:

```
const languageCodeRe = '([a-z]{2,3}|\\*)';
const scriptCodeRe = '(?:-([a-z]{4}|\\*))';
const regionCodeRe = '(?:-([a-z]{2}|\\*))';
const variantCodeRe = '(?:-([a-z]+|\\*))';
const localeRe = new RegExp(
`^${languageCodeRe}${scriptCodeRe}?${regionCodeRe}?${variantCodeRe}?$`, 'i');
```
Flags: needinfo?(jfkthame)
(In reply to Zibi Braniecki [:gandalf][:zibi] from comment #10)

> One of the important concepts we'd like to use is a locale range - any part
> of the locale string can be replaced with an asterisk to represent that the
> locale captures all possible values for the given part.
> 
> For example, "en-Latn" means "language en, script latn, no region
> specified", but "en-Latn-*" means "language en, script latn, any region".
> 
> We could workaround this in ICU by doing something half-smart and for
> example pre-parsing the locale id and replace any '*' with "zz", "zzzz" or
> "zzz" depending on the part in which '*' is.
> 
> But I feel like if we have to pre-parse and analyze parts, we're 90% of the
> way.

Yeah, that seems like an ugly mixture of approaches.

> In other words, :jfkthame - how do you think we should handle in C++
> something like this:
> 
> ```
> const languageCodeRe = '([a-z]{2,3}|\\*)';
> const scriptCodeRe = '(?:-([a-z]{4}|\\*))';
> const regionCodeRe = '(?:-([a-z]{2}|\\*))';
> const variantCodeRe = '(?:-([a-z]+|\\*))';
> const localeRe = new RegExp(
> `^${languageCodeRe}${scriptCodeRe}?${regionCodeRe}?${variantCodeRe}?$`, 'i');
> ```

Do we currently include the ICU regex support in the build? If so, I'd be inclined to look at using that, for a solution that would look very much like your JS above. But if we don't include it, then just scanning the chars looking for hyphens and checking each substring found would probably be fairly straightforward; it's not worth adding a substantial chunk more ICU code just for that.

Maybe it's useful to look into nsCharSeparatedTokenizer rather than entirely manual scanning?
Flags: needinfo?(jfkthame)
Thanks :jfkthame!

Making slow and steady progress! I think I have the base building blocks now and I think it starts looking ok - I'd like to get it to the point where the first matches work and get a big-picture feedback on the code from you then.

But for now, I got stuck in C++ array+string code and need a bit of help. If you try to compile the current WIP patch, it errors with:

0:02.42 In file included from /home/zbraniecki/projects/mozilla/mozilla-unified/intl/locale/LocaleService.h:11:0,
 0:02.42                  from /home/zbraniecki/projects/mozilla/mozilla-unified/intl/locale/LocaleService.cpp:6,
 0:02.42                  from /home/zbraniecki/projects/mozilla/mozilla-unified/obj-x86_64-pc-linux-gnu/intl/locale/Unified_cpp_intl_locale0.cpp:20:
 0:02.42 /home/zbraniecki/projects/mozilla/mozilla-unified/obj-x86_64-pc-linux-gnu/dist/include/nsTArray.h: In instantiation of ‘static void nsTArrayElementTraits<E>::Construct(E*, A&&) [with A = nsAutoCString&; E = nsACString_internal]’:
 0:02.42 /home/zbraniecki/projects/mozilla/mozilla-unified/obj-x86_64-pc-linux-gnu/dist/include/nsTArray.h:2239:25:   required from ‘nsTArray_Impl<E, Alloc>::elem_type* nsTArray_Impl<E, Alloc>::AppendElement(Item&&) [with Item = nsAutoCString&; ActualAlloc = nsTArrayInfallibleAllocator; E = nsACString_internal; Alloc = nsTArrayInfallibleAllocator; nsTArray_Impl<E, Alloc>::elem_type = nsACString_internal]’
 0:02.42 /home/zbraniecki/projects/mozilla/mozilla-unified/intl/locale/LocaleService.cpp:89:46:   required from here
 0:02.42 /home/zbraniecki/projects/mozilla/mozilla-unified/obj-x86_64-pc-linux-gnu/dist/include/nsTArray.h:674:5: error: ‘nsACString_internal::nsACString_internal(const self_type&)’ is protected within this context
 0:02.42      new (static_cast<void*>(aE)) E(mozilla::Forward<A>(aArg));

It seems that this line in intl/locale/LocaleService.cpp:

aRetVal.AppendElement(availableLocale);

does not work. How should I add a string to an array?
Flags: needinfo?(jfkthame)
Comment on attachment 8841428 [details]
Bug 1337694 - Add language negotiation heuristics to LocaleService.

https://reviewboard.mozilla.org/r/115658/#review117326

::: intl/locale/Locale.h:8
(Diff revision 2)
> + * License, v. 2.0. If a copy of the MPL was not distributed with this
> + * file, You can obtain one at http://mozilla.org/MPL/2.0/. */
> +
> +#ifndef mozilla_intl_Locale_h__
> +#define mozilla_intl_Locale_h__
> +

This needs an `#include "nsString.h"` to make sure nsCString is defined.

::: intl/locale/Locale.cpp:29
(Diff revision 2)
> +          mScript.Assign(part);
> +          break;
> +        } else {
> +          partNum++;
> +          // fallover to region case
> +        }

You'll want to annotate this with MOZ_FALLTHROUGH to avoid a warning (or error?)

::: intl/locale/LocaleService.cpp:82
(Diff revision 2)
>  }
>  
> +void LocaleService::FilterMatches(const nsTArray<nsACString>& aRequested,
> +  const nsTArray<nsACString>& aAvailable, nsTArray<nsACString>& aRetVal)
> +{
> +  for (uint32_t i; i < aRequested.Length(); i++) {

Don't forget to initialize `i`!

::: intl/locale/LocaleService.cpp:85
(Diff revision 2)
> +  const nsTArray<nsACString>& aAvailable, nsTArray<nsACString>& aRetVal)
> +{
> +  for (uint32_t i; i < aRequested.Length(); i++) {
> +    nsAutoCString requestedLocale(aRequested[i]);
> +
> +    for (uint32_t j; j < aAvailable.Length(); j++) {

And `j`.

Though depending what you need to do with the strings eventually, copying them to local `nsAutoCString`s may be redundant. Can you just range-loop over the arrays?

    for (const auto& requested : aRequested) {
      for (const auto& available: aAvailable) {
        if (requested.Equals(available)) {
          ...
        }
      }
    }

::: intl/locale/LocaleService.cpp:89
(Diff revision 2)
> +
> +    for (uint32_t j; j < aAvailable.Length(); j++) {
> +      nsAutoCString availableLocale(aAvailable[j]);
> +
> +      if (requestedLocale.Equals(availableLocale)) {
> +        aRetVal.AppendElement(availableLocale);

I think this fails because aRetVal is declared as an array of nsACString, which is an "abstract" string interface that you can't directly construct -- which the array would need to be able to do.

If you use `nsTArray<nsCString>&` (i.e. an array of concrete `nsCString` objects) as the type, I think you'll have better success.

::: intl/locale/moz.build:36
(Diff revision 2)
>  
>  XPIDL_MODULE = 'locale'
>  
>  EXPORTS += [
>      'DateTimeFormat.h',
> +    'Locale.h',

Does this really need to be in the public EXPORTS? I think that may be a problem, because it'll prevent the use of the internal string APIs/classes in this header, and you want nsCString... Can you move it to EXPORTS.mozilla.intl? Seems like that's more where it belongs, anyhow.

(Or is it purely internal, in which case it may not need to be exported at all?)
Thank you! Very helpful feedback.

Yeah, Locale is internal, and I'll use the iterator and nsCString array :)
Flags: needinfo?(jfkthame)
Actually, unless the Locale class is going to get a *lot* bigger and more complex, I'd be inclined to just add it directly to the LocaleService files, rather than create a new header and implementation file just for it.
Comment on attachment 8841428 [details]
Bug 1337694 - Add language negotiation heuristics to LocaleService.

I think this is the right moment to ask for a general feedback. The API starts doing what it's supposed to be doing and it seems to me that it is properly spaced within LocaleService.

It's basically a C++ port of the JS code[0] at the moment, but instead of our min-subtags, I'll use ICU getLikelySubtags.

I left a few XXX comments for things that I hope have API, but I couldn't find how to do them.

I also did end up with C++ and XPCOM APIs because I see a couple cases where we will want to negotiate languages in C++ (addons manager) and a few for js (l10nregistry, mozIntl etc.)
I think it doesn't add too much burden and makes it easy to call. :)

[0] https://github.com/projectfluent/fluent.js/tree/master/fluent-langneg/src
Attachment #8841428 - Flags: feedback?(jfkthame)
Comment on attachment 8841428 [details]
Bug 1337694 - Add language negotiation heuristics to LocaleService.

https://reviewboard.mozilla.org/r/115658/#review118250

Various comments below that I hope may be useful. But I'll also post an updated version of the patch, incorporating these notes and with an alternative implementation of FilterMatches() for consideration.

::: intl/locale/LocaleService.h:98
(Diff revision 5)
> +  nsCString mLanguage;
> +  nsCString mScript;
> +  nsCString mRegion;
> +  nsCString mVariant;

These data members can probably be made 'private'.

::: intl/locale/LocaleService.h:104
(Diff revision 5)
> +  Locale(nsACString &locale, bool range);
> +
> +  bool matches(Locale &locale);

Move the `&` chars so they're attached to the type rather than the variable name.

Oh, and gecko style calls for an `a` prefix on args, so `aLocale` and `aRange`, please.

And let's constify this, so:

    bool Matches(const Locale& aLocale) const;

::: intl/locale/LocaleService.cpp:117
(Diff revision 5)
> +{
> +  for (auto& requested : aRequested) {
> +    bool foundMatch = false;
> +
> +    // 1) Try to find an exact match
> +    //XXX: Should be case insensitive

Just add `nsCaseInsensitiveCStringComparator()` as a second argument to the `Equals()` call.

::: intl/locale/LocaleService.cpp:118
(Diff revision 5)
> +  for (auto& requested : aRequested) {
> +    bool foundMatch = false;
> +
> +    // 1) Try to find an exact match
> +    //XXX: Should be case insensitive
> +    for (auto& available: aAvailable) {

Make it `const auto&`, to be clear that we don't need to touch it.

::: intl/locale/LocaleService.cpp:126
(Diff revision 5)
> +    if (foundMatch) {
> +      break;
> +    }

I think the loops here can be replaced by use of std::find_if() with an appropriate predicate. I'll post an updated version of the patch, that's easier than trying to explain in line-by-line comments.

::: intl/locale/LocaleService.cpp:134
(Diff revision 5)
> +
> +    Locale requestedLocale = Locale(requested, false);
> +
> +    // 2) Try to match against available as a range
> +    for (auto& available: aAvailable) {
> +      //XXX: Would be nice to cache the available here, for future loops.

Stash them in a local `AutoTArray<Locale,N>` that parallels the `nsTArray<nsCString>` argument.

::: intl/locale/LocaleService.cpp:138
(Diff revision 5)
> +    for (auto& available: aAvailable) {
> +      //XXX: Would be nice to cache the available here, for future loops.
> +      Locale availableLocale = Locale(available, true);
> +
> +      if (requestedLocale.matches(availableLocale)) {
> +        aRetVal.AppendElement(available);

We should avoid appending a value that's already in the array (once we start returning more than one, see below). So we need a !Contains() check here.

(Or an alternative implementation might be to create a local copy of the `aAvailable` list, and whenever we find a match and put it into `aRetVal`, remove it from the available list so that it won't be found again.)

::: intl/locale/LocaleService.cpp:144
(Diff revision 5)
> +    if (foundMatch) {
> +      break;
> +    }

I don't think you want to `break` here, as we still want to look for matches for the remaining requests. The `break` would be appropriate if you're wanting to return only the first/best match, but AIUI that's not the case here; we want all available locales that satisfy the list of requests, in order from best to worst.

::: intl/locale/LocaleService.cpp:200
(Diff revision 5)
> +                                  const char*  aDefaultLocale,
> +                                  uint32_t aRequestedCount,
> +                                  uint32_t aAvailableCount,
> +                                  uint32_t* aCount, char*** aRetVal)
> +{
> +  nsTArray<nsCString> requestedLocales;

For all 3 of these local arrays, use `AutoTArray<nsCString,N>` with an `N` that is likely large enough for all usual cases, to avoid a heap allocation except in extreme circumstances.

::: intl/locale/LocaleService.cpp:225
(Diff revision 5)
> +  char** retVal =
> +    static_cast<char**>(moz_xmalloc(sizeof(char*) * supportedLocales.Length()));
> +
> +  uint32_t i = 0;
> +  for (const auto& supported : supportedLocales) {
> +    retVal[i++] = strdup(supported.get());

Use `moz_xstrdup` rather than plain `strdup`.

::: intl/locale/LocaleService.cpp:284
(Diff revision 5)
> +  }
> +}
> +
> +bool Locale::matches(Locale &locale)
> +{
> +  //XXX: Need to do this for all 4 parts

Looks worth defining a handy lambda here.

::: intl/locale/LocaleService.cpp:286
(Diff revision 5)
> +    (mLanguage == nullptr && locale.mLanguage == nullptr) ||
> +    (mLanguage.EqualsLiteral("*") || locale.mLanguage.EqualsLiteral("*")) ||
> +    (mLanguage != nullptr && locale.mLanguage != nullptr &&

These fields are `nsCString`s, not `char*` pointers, so you don't need all those `nullptr` checks.

::: intl/locale/LocaleService.cpp:290
(Diff revision 5)
> +  return
> +    (mLanguage == nullptr && locale.mLanguage == nullptr) ||
> +    (mLanguage.EqualsLiteral("*") || locale.mLanguage.EqualsLiteral("*")) ||
> +    (mLanguage != nullptr && locale.mLanguage != nullptr &&
> +     //XXX: Should be case insensitive
> +     mLanguage.Equals(locale.mLanguage));

Add `nsCaseInsensitiveCStringComparator()` as 2nd parameter.

::: intl/locale/mozILocaleService.idl:35
(Diff revision 5)
> +                          [array, size_is(aRequestedCount)] in string aRequested, 
> +                          [array, size_is(aAvailableCount)] in string aAvailable, 

trailing spaces
See FilterMatches, in particular; also tidied up various other points, mostly as per comments above.
Sorry, I posted the wrong patch; here's the real updated one I intended.
Attachment #8842877 - Attachment is obsolete: true
Thanks! This looks great!

I incorporated your changes to the main patch and completed the algorithm.

I'll now start working on tests and docs for the code - I narrowed it down for now to 'matching' strategy, but we may eventually want to add 'lookup' or 'filtering'. For now matching should be the right call.
Comment on attachment 8841428 [details]
Bug 1337694 - Add language negotiation heuristics to LocaleService.

https://reviewboard.mozilla.org/r/115658/#review118636

I know you didn't request review yet, but I had a few comments anyhow...

::: intl/locale/LocaleService.h:108
(Diff revision 6)
>  
> +  void FilterMatches(nsTArray<nsCString>& aRequested,
> +                     nsTArray<nsCString>& aAvailable,
> +                     nsTArray<nsCString>& aRetVal);
> +
> +  Locale GetLikelySubtagsLocale(nsACString& locale, bool found);

If this is going to use `found` to pass back a result, it needs to be passed as a `bool*`. Currently this passes by value, and the caller will never see it change.

Oh, and use `const nsCString&` for the locale parameter, so you won't need PromiseFlat... internally. This isn't exposed to XPCOM and doesn't need to use the A-version of the string interface.

And a-prefix the arguments. :)

::: intl/locale/LocaleService.cpp:164
(Diff revision 6)
> +    // First time, create the Locale representations of aAvailable.
> +    if (availLocales.IsEmpty()) {
> +      for (auto& avail : aAvailable) {
> +        availLocales.AppendElement(Locale(avail, true));
> +      }
> +    }

Might as well do this unconditionally right at the start, where `availLocales` is declared, seeing as it's always done. The only reason to defer it to here would be if we had a "return first match only" option that might return an exact match without ever reaching this point.

::: intl/locale/LocaleService.cpp:190
(Diff revision 6)
> +      auto matchesRange = [&](const Locale& aLoc) {
> +        return maxRequestedLocale.Matches(aLoc);
> +      };

I don't like re-using the name `matchesRange` here; while scoping will make it do what you intend, it's confusing for the reader. Maybe `maxMatchesRange`, to indicate what this version is matching against?

::: intl/locale/LocaleService.cpp:204
(Diff revision 6)
> +        }
> +      }
> +    }
> +
> +    // 4) Try to match against a variant as a range
> +    requestedLocale.SetVariantRange();

Would it work (equally well or better?) to use the "maximized" version `maxRequestedLocale` as the basis for steps 4 and 5, successively replacing its variant and region subtags with `*`? I haven't thought this through carefully, though, just wondering...
Attachment #8842930 - Attachment is obsolete: true
Comment on attachment 8841428 [details]
Bug 1337694 - Add language negotiation heuristics to LocaleService.

Thanks Jonathan!

I applied your feedback and started working on tests.

Unfortunately, it seems that there's some weird bug that makes the browser (and tests) hang when I try to launch it, unless I comment out the aSubtag1.Equals(aSubtag2).

Steps to reproduce:
1) Compile it
2) Launch the browser with "Enable browser chrome and add-on debugging" option from Tools (ctrl+shift+i, settings icon)
3) Open Browser Console (ctrl+shift+j)
4) `let lsvc = Components.classes["@mozilla.org/intl/localeservice;1"].getService(Components.interfaces.mozILocaleService);`
5) `lsvc.negotiateLanguages(["en"], ["en"]);`

If I uncomment the `aSubtag1.Equals(aSubtag2)` either with the case insensitive or not, it hands (no crash, just hang).

Any idea why? Also, if you see anything else in the code, I think it's a good time for general feedback on the code.
Attachment #8841428 - Flags: feedback?(jfkthame)
:bz helped me analyze this and pointed out that I did nothing to advance the iterator after the match is found which caused the freeze.

So, this is now fixed and I'm working on tests, but keeping the f? for general feedback.
Comment on attachment 8841428 [details]
Bug 1337694 - Add language negotiation heuristics to LocaleService.

https://reviewboard.mozilla.org/r/115658/#review118934

::: intl/locale/LocaleService.h:19
(Diff revision 9)
> +class Locale
> +{
> +public:
> +  // Should be `const` once we make nsACString.Split const
> +  Locale(nsACString& aLocale, bool aRange);
> +
> +  bool Matches(const Locale &aLocale) const;
> +
> +  void SetVariantRange();
> +  void SetRegionRange();
> +
> +private:
> +  nsCString mLanguage;
> +  nsCString mScript;
> +  nsCString mRegion;
> +  nsCString mVariant;
> +};

AFAICS the Locale class doesn't really need to be exposed publicly at all; our public APIs are all using strings to pass locale codes around. This Locale is just something used internally by the LocaleService during its matching operations.

As such, I think we should make it a private internal class within LocaleService, so nobody is misled into thinking they should be using it elsewhere.

::: intl/locale/LocaleService.cpp:83
(Diff revision 9)
>      }
>    }
>  }
>  
> +Locale
> +LocaleService::GetLikelySubtagsLocale(nsCString& aLocID, bool* aFound)

I'm going to suggest replacing this with a method on Locale, something like `Locale::AddLikelySubtags`, that modifies the object in-place and returns true if anything was changed; I think that will end up being a bit more convenient.

::: intl/locale/LocaleService.cpp:150
(Diff revision 9)
> +  AutoTArray<Locale, 100> availLocales;
> +  for (auto& avail : aAvailable) {
> +    availLocales.AppendElement(Locale(avail, true));
> +  }
> +
> +  for (auto& requested : aRequested) {

General thought about this loop: given that we're aiming to find matches of all types (1) - (5), in decreasing order of "goodness", I think it's guaranteed that the later searches will re-find things that already matched earlier. Which is why all the `!aRetVal.Contains(...)` checks are needed.

But that gets a bit expensive, and it feels ugly to repeatedly find and consider adding the same locale at each step.

So I suggest removing entries from the `availLocales` array as they're used, so that they don't get considered for addition later. We can't really do that for the original `aAvailable` array (which belongs to the caller; ideally it'd be a `const` param), but `availLocales` is a local that we can use however we wish.

If you do that, the expression `aAvailable[rangeMatch - availLocales.begin()]` will no longer be correct to look up the original locale string, but you can make `Locale` store a reference to the original string, so it's readily available there.

I'll throw together an updated patch that uses this approach, to see how it looks...

::: intl/locale/LocaleService.cpp:180
(Diff revision 9)
> +    Locale maxRequestedLocale = GetLikelySubtagsLocale(requested, &found);
> +
> +    auto maxMatchesRange = [&](const Locale& aLoc) {
> +      return maxRequestedLocale.Matches(aLoc);
> +    };

If we're going to work with the maximized version from here on, then I'd suggest just assigning it to the `requestedLocale` variable (from above), and continuing to use the already-defined `matchesRange`; no need for a second lambda just to bind to a different local variable.

::: intl/locale/LocaleService.cpp:201
(Diff revision 9)
> +    }
> +
> +    // 4) Try to match against a variant as a range
> +    maxRequestedLocale.SetVariantRange();
> +
> +    while ((rangeMatch = std::find_if(rangeMatch, availLocales.end(),

Unless I'm missing something, you need to reset the `rangeMatch` iterator to `availLocales.begin()` before this loop, otherwise it's not going to find anything.

::: intl/locale/LocaleService.cpp:210
(Diff revision 9)
> +        aRetVal.AppendElement(avail);
> +      }
> +      rangeMatch++;
> +    }
> +
> +    // 5) Try to match against a region as a range

These steps of the algorithm are really repetitive; I think we should define a lambda that encapsulates the matching operation, and then simply call that each time.

::: intl/locale/LocaleService.cpp:213
(Diff revision 9)
> +    }
> +
> +    // 5) Try to match against a region as a range
> +    maxRequestedLocale.SetRegionRange();
> +
> +    while ((rangeMatch = std::find_if(rangeMatch, availLocales.end(),

Same here.
Updated patch that incorporates a bunch of suggestions from above.
That looks great! Incorporating into the main patch and will try to finish tests this weekend. Thank you!
Attachment #8843596 - Attachment is obsolete: true
Comment on attachment 8841428 [details]
Bug 1337694 - Add language negotiation heuristics to LocaleService.

On top of your changes, I finished:

 * tests
 * strategies
 * "_"->"-" normalization

I believe it's a good time for the last round of feedback :)

What's left for me is:

 * document the code
 * improve the C++ tests in gtest
Attachment #8841428 - Flags: feedback?(jfkthame)
Depends on: 1344555
Comment on attachment 8841428 [details]
Bug 1337694 - Add language negotiation heuristics to LocaleService.

https://reviewboard.mozilla.org/r/115658/#review118950

::: intl/locale/LocaleService.h:81
(Diff revision 11)
> -protected:
> -  nsTArray<nsCString> mAppLocales;
> +  // aRequest and aAvailable should be `const` once
> +  // we make nsACString.Split const

That would be nice; I didn't see an existing bug about it, so just filed bug 1344555 with a patch.

::: intl/locale/LocaleService.cpp:152
(Diff revision 11)
> +      switch (aStrategy) {
> +        case LangNegStrategy::Lookup:
> +          return;
> +        case LangNegStrategy::Matching:
> +          continue;
> +        case LangNegStrategy::Filtering:
> +          break;
> +      }

With 5 occurrences of exactly this pattern, I'd be inclined to define a macro HANDLE_STRATEGY, so that this becomes just

    if (!aRetVal.IsEmpty()) {
      HANDLE_STRATEGY;
    }

and similarly in each of the `if (findRangeMatches(...))` instances below.

::: intl/locale/LocaleService.cpp:174
(Diff revision 11)
> +        switch (aStrategy) {
> +          case LangNegStrategy::Filtering:
> +            break;
> +          case LangNegStrategy::Matching:
> +            return true;
> +          case LangNegStrategy::Lookup:
> +            return true;
> +        }

This would be more concise as simply

    if (aStrategy != LangNegStrategy::Filtering) {
      return true; // we only want the first match
    }

::: intl/locale/LocaleService.cpp:183
(Diff revision 11)
> +            return true;
> +          case LangNegStrategy::Lookup:
> +            return true;
> +        }
> +      }
> +      return foundMatch; 

trailing space

::: intl/locale/LocaleService.cpp:323
(Diff revision 11)
> +  if (aStrategy < 0 || aStrategy > 2) {
> +    return NS_ERROR_INVALID_ARG;
> +  }

Do this validity check first, before setting up the local arrays.

::: intl/locale/LocaleService.cpp:357
(Diff revision 11)
> +  nsAutoCString normLocale(aLocale);
> +  normLocale.ReplaceChar('_', '-');

Hmm, interesting. If we need to worry about this (where will we get locale codes using _ from?), I wonder if we should be normalizing the result of NegotiateLanguages? If so, maybe we should make `mLocaleStr` an actual string (rather than a const reference) and apply `ReplaceChar` to that, so that we'll be putting normalized strings into the result array?
Comment on attachment 8841428 [details]
Bug 1337694 - Add language negotiation heuristics to LocaleService.

https://reviewboard.mozilla.org/r/115658/#review118950

> Hmm, interesting. If we need to worry about this (where will we get locale codes using _ from?), I wonder if we should be normalizing the result of NegotiateLanguages? If so, maybe we should make `mLocaleStr` an actual string (rather than a const reference) and apply `ReplaceChar` to that, so that we'll be putting normalized strings into the result array?

> where will we get locale codes using _ from?

Non-trusted places. 

requestedLocales:
will come from the user. It means that they can come from prefs, from user query, from accepted-headers, from some other form of list created and curated by the user.

That means that we have to be ready for upper, lower-case, for underscores and dashes. My thinking is that I want to canonicalize it as lower case with dashes and compare.
It also means that in theory we may get wrong tag ("abcd-FEGHIJKL-FrFa21-oomomo") or non-ascii chars in it and we should invalidate a tag like this.
The only question is we want to be salvaging pieces (if the first subtag is good, take it etc.) or just reject tags that have errors. I think I'd like to go for the permissive approach of salvaging what we can.

availableLocales:
will come from the devs who will most likely generate it from some index or from listing files/packages/directories.
For that reason I also believe that we should be ready for lower/upper cases, dashes and underscores.
I'd feel more safe as to what kind of characters may be inside, but since we need a sanitization for requested, we should apply it for available as well.

defaultlocale:
this will be most safe as it'll probably come from the platform, but as above, would be good to keep it sanitized as well.

> I wonder if we should be normalizing the result of NegotiateLanguages

I don't think we should.

My logic is that the availableLocales list will be generated out of paths, or packages that the caller want to select from.
If the package is "fr_fr.xpi" or "./locales/main.DE_de.ftl", I believe the user will expect to be able to just paste the result string and retrieve the right file/package/directory.

So as a user of this API I expect the result to match exactly the tags from available locales.

Does it sound reasonable to you?
I applied rest of the feedback.
Attachment #8841428 - Flags: feedback?(jfkthame)
Comment on attachment 8841428 [details]
Bug 1337694 - Add language negotiation heuristics to LocaleService.

Sorry for the noise, MR is frustrating with the cancelling of the f?

I added a couple tests for weird and incorrect input and two of them fail now. One because it contains non-ASCII chars, and one because it somehow serializes [2] into "2" and [] into an empty Locale that matches everything?

I think we should sanitize the input by just testing if each string is a string in the XPCOM method (and throw if not), and in the C++ method verify that each string is a plain ASCII and skip the tag if it's not.
We also may want to test that "" is an incorrect tag as well.
Attachment #8841428 - Flags: feedback?(jfkthame)
Attachment #8844562 - Attachment is obsolete: true
Thank you!

I added your patch (plus '*' for range), updated tests and docs.

I think this is ready for review!
Comment on attachment 8841428 [details]
Bug 1337694 - Add language negotiation heuristics to LocaleService.

https://reviewboard.mozilla.org/r/115658/#review119740

A few cosmetic fixes (comment typos etc), otherwise this looks good. The one substantive question I have relates to the Lookup strategy, and whether it should require a defaultLocale to be passed (as the current comment claims, but code does not enforce).

::: intl/locale/LocaleService.h:25
(Diff revision 14)
>   *
>   * It's intended to be the core place for collecting available and
>   * requested languages and negotiating them to produce a fallback
>   * chain of locales for the application.
> + *
> + * The terms `Locale ID` and `Language ID` are used slightly differnetely

typo, "differently"

::: intl/locale/LocaleService.h:29
(Diff revision 14)
> + *
> + * The terms `Locale ID` and `Language ID` are used slightly differnetely
> + * by different organizations. Mozilla uses the term `Language ID` to describe
> + * a string that contains information about the language itself, script,
> + * region and variant. For example "en-Latn-US-mac" is a correct Language ID.
> + * 

lose the trailing space

::: intl/locale/LocaleService.h:109
(Diff revision 14)
> +   * the list of available locales.
> +   *
> +   * Internally it uses the following naming scheme:
> +   *
> +   *  Requested - locales requested by the user
> +   *  Availalbe - locales for which the data is available

typo, "Available"

::: intl/locale/LocaleService.cpp:85
(Diff revision 14)
>        obs->NotifyObservers(nullptr, "intl:app-locales-changed", nullptr);
>      }
>    }
>  }
>  
> +#define HANDLE_STRATEGY \

Let's have a comment for this, something like

```
// After trying each step of the negotiation algorithm for each requested locale,
// if a match was found we use this macro to decide whether to return immediately,
// skip to the next requested locale, or continue searching for additional matches,
// according to the desired negotiation strategy.
```

::: intl/locale/LocaleService.cpp:86
(Diff revision 14)
> +switch (aStrategy) { \
> +  case LangNegStrategy::Lookup: \
> +    return; \
> +  case LangNegStrategy::Matching: \
> +    continue; \
> +  case LangNegStrategy::Filtering: \
> +    break; \
> +}

Indent the continuation lines, please (I'd probably go for indenting them to match the macro name, plus an extra 2-char step).

::: intl/locale/mozILocaleService.idl:32
(Diff revision 14)
> +   *   Result:
> +   *     Supported: ['es-MX', 'es', 'fr', 'fr-CA']

The result will also have the default locale appended (unless it was already present in the matches).

::: intl/locale/mozILocaleService.idl:40
(Diff revision 14)
> +   * matching -
> +   *   Matches the best match from the available locales for every requested
> +   *   locale.
> +   *
> +   *   Result:
> +   *     Supported: ['es-MX', 'fr']

ditto

::: intl/locale/mozILocaleService.idl:44
(Diff revision 14)
> +   *   Result:
> +   *     Supported: ['es-MX', 'fr']
> +   *
> +   * lookup -
> +   *   Matches a single best locale. This strategy always returns a list
> +   *   of the length 1 and requires a defaultLocale to be set.

"and requires a defaultLocale" ... this is not true AFAICS in the code, it'll just return an empty list if no matches are found. Do we want to change that (e.g. throw an error if we're called with strategy=Lookup and no default, as it would indicate a programming error and will probably confuse subsequent code that assumes a locale will always be returned)?

::: intl/locale/mozILocaleService.idl:76
(Diff revision 14)
> +   * a list of available locales.
> +   *
> +   * Internally it uses the following naming scheme:
> +   *
> +   *  Requested - locales requested by the user
> +   *  Availalbe - locales for which the data is available

typo, "Available"
Attachment #8841428 - Flags: review?(jfkthame)
Comment on attachment 8841428 [details]
Bug 1337694 - Add language negotiation heuristics to LocaleService.

https://reviewboard.mozilla.org/r/115658/#review119740

> Indent the continuation lines, please (I'd probably go for indenting them to match the macro name, plus an extra 2-char step).

I don't understand, the switch lines are longer than the macro name line.
I updated the patch to use the wording suggested by :jfkthame - "ordered list" instead of "sorted list".
I only updated files that I'm touching in this patch. I'll update the OSPrefs comments when I'll be touching that files.
Comment on attachment 8841428 [details]
Bug 1337694 - Add language negotiation heuristics to LocaleService.

https://reviewboard.mozilla.org/r/115658/#review120018

A couple cosmetic things that would be nice to clean up, otherwise looks great!

::: intl/locale/LocaleService.cpp:89
(Diff revision 17)
> +#define HANDLE_STRATEGY \
> +switch (aStrategy) { \
> +  case LangNegStrategy::Lookup: \
> +    return; \
> +  case LangNegStrategy::Matching: \
> +    continue; \
> +  case LangNegStrategy::Filtering: \
> +    break; \
> +}

Sorry, I guess I wasn't clear... I meant doing something like

```
#define HANDLE_STRATEGY \
          switch (aStrategy) { \
            case LangNegStrategy::Lookup: \
              return; \
            case LangNegStrategy::Matching: \
              continue; \
            case LangNegStrategy::Filtering: \
              break; \
          }
```
i.e. indent the body of the macro under its name.

::: intl/locale/LocaleService.cpp:133
(Diff revision 17)
> +void LocaleService::FilterMatches(const nsTArray<nsCString>& aRequested,
> +                                  const nsTArray<nsCString>& aAvailable,
> +                                  LangNegStrategy aStrategy,
> +                                  nsTArray<nsCString>& aRetVal)

Sorry, just noticed this... please put the return type (even if it's just `void`) on its own line (and then adjust the indented params accordingly).

::: intl/locale/LocaleService.cpp:219
(Diff revision 17)
> +bool LocaleService::NegotiateLanguages(const nsTArray<nsCString>& aRequested,
> +                                       const nsTArray<nsCString>& aAvailable,
> +                                       const nsACString& aDefaultLocale,
> +                                       LangNegStrategy aStrategy,
> +                                       nsTArray<nsCString>& aRetVal)

Return type on its own line.

::: intl/locale/LocaleService.cpp:340
(Diff revision 17)
> +  bool result = NegotiateLanguages(
> +    requestedLocales, availableLocales, defaultLocale, strategy, supportedLocales
> +  );

This isn't a style of function-call wrapping we generally have in Gecko C++, at least not often; could we do
```
  bool result = NegotiateLanguages(requestedLocales, availableLocales,
                                   defaultLocale, strategy, supportedLocales);
```
instead, please.

::: intl/locale/LocaleService.cpp:348
(Diff revision 17)
> +  *aRetVal =
> +    static_cast<char**>(moz_xmalloc(sizeof(char*) * supportedLocales.Length()));
> +
> +  *aCount = 0;
> +  for (const auto& supported : supportedLocales) {
> +    (*aRetVal)[(*aCount)++] = moz_xstrdup(supported.get());
> +  }

This looks like it can use the CreateOutArray helper as proposed in bug 1344445:

```
  *aCount = supportedLocales.Length();
  *aRetVal = CreateOutArray(supportedLocales);
```

(If you want to leave it as-is here and then convert it when 1344445 lands, that's also fine; whatever's the easiest to manage the patches.)
Attachment #8841428 - Flags: review?(jfkthame) → review+
Comment on attachment 8841428 [details]
Bug 1337694 - Add language negotiation heuristics to LocaleService.

https://reviewboard.mozilla.org/r/115658/#review120018

> This looks like it can use the CreateOutArray helper as proposed in bug 1344445:
> 
> ```
>   *aCount = supportedLocales.Length();
>   *aRetVal = CreateOutArray(supportedLocales);
> ```
> 
> (If you want to leave it as-is here and then convert it when 1344445 lands, that's also fine; whatever's the easiest to manage the patches.)

Yup, I'll factor it out in bug 1344445.
Pushed by zbraniecki@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/120c713a857f
Add language negotiation heuristics to LocaleService. r=jfkthame
https://hg.mozilla.org/mozilla-central/rev/120c713a857f
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
Filed bug 1345957 to adjust the relationship between defaultLocale and availableLocales.
Blocks: 1345957
Depends on: 1348797
See Also: → 1446052
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: