Closed Bug 1428698 Opened 6 years ago Closed 6 years ago

Align intl::locale::Locale with BCP47

Categories

(Core :: Internationalization, enhancement, P3)

enhancement

Tracking

()

RESOLVED FIXED
mozilla60
Tracking Status
firefox60 --- fixed

People

(Reporter: zbraniecki, Assigned: zbraniecki)

References

(Blocks 1 open bug)

Details

Attachments

(1 file)

Once bug 1428530 lands, I'd like to take the new Locale class and align it more with BCP47.

We currently use '*' for ranges, but that makes our implementation quirky, and I got a comparable negotiation heuristic without the quirkiness in https://github.com/projectfluent/fluent-locale-rs

I'd like to get this step before we expose it to JS code (bug 1349377).
Depends on: 1428530
Comment on attachment 8940600 [details]
Bug 1428698 - Align intl::locale::Locale with BCP47.

It's still fairly early, but the core Locale class works, so it may be a good time to ask :jfkthame for feedback on the C++ use!
Attachment #8940600 - Flags: feedback?(jfkthame)
Comment on attachment 8940600 [details]
Bug 1428698 - Align intl::locale::Locale with BCP47.

The patch is now complete, and passes all gtests, but it has a bug when running xpcshell-tests.

I'm not sure why, but it seems that ereaseFromAvail somehow breaks the availLocales and the consecutive loops through them repeat the last locale, twice in many cases.

In other words if the available is ["de", "fr", "it"] and I match "de" on the first step in FilterMatches, then on the second it'll loop through ["it", "it"] instead of ["fr", "it"].

Jonathan - if you notice anything suspicious during the feedback round, it may be it! :)

I'd also like to consider filtering out duplicates in requested, available and supported, but not sure how.

Except of those two, I think the patch is in a good shape for the first round of feedback, so leaving it for now until Jonathan has a chance to take a look :)
Attachment #8940600 - Flags: feedback?(jfkthame)
Assignee: nobody → gandalf
Status: NEW → ASSIGNED
Comment on attachment 8940600 [details]
Bug 1428698 - Align intl::locale::Locale with BCP47.

https://reviewboard.mozilla.org/r/210812/#review217054

A number of suggested tweaks here... note that I haven't actually compiled this, though, so there could be errors in the suggestions!

Offhand I don't see why you're getting the xpcshell test problem in comment 5; there must be something I'm overlooking. I'll take a fresh look later and see if it becomes clearer.

::: intl/locale/Locale.h:13
(Diff revision 3)
> - * set to "*".
> - */
>  class Locale {
>    public:
> -    Locale(const nsCString& aLocale, bool aRange);
> +    Locale(const nsACString& aLocale);
> +    Locale(const char* aLocale): Locale(nsAutoCString(aLocale)) {};

This should be

    Locale(const char* aLocale)
      : Locale(nsDependentCString(aLocale))
      { }

to avoid actually copying the string.

::: intl/locale/Locale.h:15
(Diff revision 3)
> -    bool Matches(const Locale& aLocale) const;
> -    bool LanguageMatches(const Locale& aLocale) const;
> +    nsACString& GetLanguage();
> +    nsACString& GetScript();
> +    nsACString& GetRegion();
> +    nsTArray<nsCString>& GetVariants();

These should probably all be declared `const` and return const references, as:

    const nsACString& GetLanguage() const;

etc. And might as well inline them, too.

::: intl/locale/Locale.h:20
(Diff revision 3)
> +    bool IsBogus();
> +    void AsString(nsACString& aRetVal);
>  
> -    void SetVariantRange();
> -    void SetRegionRange();
> +    bool Matches(Locale& aOther);
> +    bool MatchesRange(Locale& aOther, bool aAvailableRange, bool aRequestedRange);

I think all these can also be `const` methods.

::: intl/locale/Locale.h:41
(Diff revision 3)
>    private:
> -    const nsCString& mLocaleStr;
> -    nsCString mLanguage;
> -    nsCString mScript;
> -    nsCString mRegion;
> -    nsCString mVariant;
> +    nsAutoCStringN<3> mLanguage;
> +    nsAutoCStringN<4> mScript;
> +    nsAutoCStringN<2> mRegion;
> +    nsTArray<nsCString> mVariants;
> +    bool mBogus = false;

As a matter of style, I'd prefer to initialize `mBogus` to `true`, and then have the constructor set it to `false` at the point where it puts a valid state into the member variables.

(Actually, I'd lean towards reversing the entire sense of the flag, making it `bool mIsValid = false` here, setting `mIsValid = true` when a valid language is found, exposing `Locale::IsValid()` rather than `IsBogus()`, etc.)

::: intl/locale/Locale.cpp:33
(Diff revision 3)
> -        break;
> -      case 1:
> -        if (part.EqualsLiteral("*") || part.Length() == 4) {
> +      nsAutoCStringN<3> normSubTag;
> +      ToLowerCase(subTag, normSubTag);
> +      mLanguage.Assign(normSubTag);

I don't think there's any need for the intermediate `normSubTag` here; you can just do

    mLanguage = subTag;
    ToLowerCase(mLanguage);

::: intl/locale/Locale.cpp:36
(Diff revision 3)
> -          mLanguage.Assign(part);
> -        }
> -        break;
> -      case 1:
> -        if (part.EqualsLiteral("*") || part.Length() == 4) {
> -          mScript.Assign(part);
> +        return;
> +      }
> +      nsAutoCStringN<3> normSubTag;
> +      ToLowerCase(subTag, normSubTag);
> +      mLanguage.Assign(normSubTag);
> +      position = 2;

It's unclear to me what `position` means here, exactly; if it's an index into the `aLocale` string, then shouldn't this be `position = slen;` rather than `2`? Or if it's the index the next component of the locale to look for (like the old `partNum`), why isn't it `1` here?

::: intl/locale/Locale.cpp:40
(Diff revision 3)
> -        // fallover to region case
> -        partNum++;
> -        MOZ_FALLTHROUGH;
> -      case 2:
> -        if (part.EqualsLiteral("*") || part.Length() == 2) {
> +			const nsACString& firstLetter = Substring(normSubTag, 0, 1);
> +      nsAutoCStringN<1> upperFirstLetter;
> +      ToUpperCase(firstLetter, upperFirstLetter);
> +      normSubTag.Replace(0, 1, upperFirstLetter);
> +      mScript.Assign(normSubTag);

Again, this can be much simpler, given that we don't need all the complexity of full Unicode casing:

    mScript = subtag;
    ToLowerCase(mScript);
    mScript[0] = GetUppercase(mScript[0]);

::: intl/locale/Locale.cpp:47
(Diff revision 3)
> -        break;
> -      case 3:
> -        if (part.EqualsLiteral("*") || (part.Length() >= 3 && part.Length() <= 8)) {
> +      nsAutoCStringN<2> normSubTag;
> +      ToUpperCase(subTag, normSubTag);
> +      mRegion.Assign(normSubTag);

As above, skip the intermediate `normSubTag` variable.

::: intl/locale/Locale.cpp:56
(Diff revision 3)
> +      nsAutoCStringN<8> normSubTag;
> +      ToLowerCase(subTag, normSubTag);
> +      mVariants.AppendElement(normSubTag);

And again.

::: intl/locale/Locale.cpp:71
(Diff revision 3)
> +Locale::AsString(nsACString& aRetVal)
> +{

As a sanity-check, let's add

    MOZ_ASSERT(aRetVal.IsEmpty());

at the beginning of this.

::: intl/locale/Locale.cpp:80
(Diff revision 3)
> +    return;
> -    }
> +  }
> -    if (mRegion.IsEmpty()) {
> -      mRegion.AssignLiteral("*");
> +
> +  aRetVal.Append(mLanguage);
> +
> +  if (mScript.Length()) {

I think we prefer to write `!mScript.IsEmpty()` for this check.

::: intl/locale/Locale.cpp:90
(Diff revision 3)
> -      mVariant.AssignLiteral("*");
> +  if (mRegion.Length()) {
> +    aRetVal.AppendLiteral("-");
> +    aRetVal.Append(mRegion);
> -    }
> +  }
> +
> +  if (mVariants.Length()) {

No need for the `if`-check here, the `for`-loop will just be skipped if the array is empty.

::: intl/locale/Locale.cpp:91
(Diff revision 3)
> +    aRetVal.AppendLiteral("-");
> +    aRetVal.Append(mRegion);
> -    }
> +  }
> +
> +  if (mVariants.Length()) {
> +    for (auto& variant : mVariants) {

`const auto&`

::: intl/locale/Locale.cpp:192
(Diff revision 3)
> +}
> +
> +void
> +Locale::ClearRegion()
> +{
> +  mRegion.Assign("");

To clear the string, prefer

    mRegion.Truncate();

rather than an assignment.

::: intl/locale/LocaleService.cpp:603
(Diff revision 3)
>  LocaleService::LanguagesMatch(const nsCString& aRequested,
>                                const nsCString& aAvailable)

The parameters here can now be `const nsACString&`, I think.
Comment on attachment 8940600 [details]
Bug 1428698 - Align intl::locale::Locale with BCP47.

https://reviewboard.mozilla.org/r/210812/#review217054

> Again, this can be much simpler, given that we don't need all the complexity of full Unicode casing:
> 
>     mScript = subtag;
>     ToLowerCase(mScript);
>     mScript[0] = GetUppercase(mScript[0]);

I couldn't get it to compile. Had to go for `mScript.Replace(0, 1, GetUpperCase(mScript[0]));`.
Comment on attachment 8940600 [details]
Bug 1428698 - Align intl::locale::Locale with BCP47.

Switched AsString to return a new string. Ready for the next round of feedback (and maybe you'll be able to help me find the culprit for the test failure?)
Attachment #8940600 - Flags: feedback?(jfkthame)
(In reply to Zibi Braniecki [:gandalf][:zibi] from comment #10)
> Comment on attachment 8940600 [details]
> Bug 1428698 - Align intl::locale::Locale with BCP47.
> 
> Switched AsString to return a new string. Ready for the next round of
> feedback (and maybe you'll be able to help me find the culprit for the test
> failure?)

I think the problem is that the Locale object can't just be copied around "blindly" by moving chunks of memory, which is what nsTArray will do by default; we need to ensure it gets properly constructed/destructed. To do this, add

    DECLARE_USE_COPY_CONSTRUCTORS(mozilla::intl::Locale)

at the end of Locale.h (after closing the namespaces; this needs to be at global scope).

With that change, I no longer get the strange error as described in comment 5, with duplicated "de"; eraseFromAvail seems to behave properly. However, some of the following testcases report failures, so you may want to re-examine the matching behavior and see if the expectations are correct.
Seems like that was it! The patch is now ready for review :)
Comment on attachment 8940600 [details]
Bug 1428698 - Align intl::locale::Locale with BCP47.

https://reviewboard.mozilla.org/r/210812/#review219228

Looking good! A couple minor comments, as I think you can simplify/clarify a bit (oh, and what about that XXX comment?), but basically I think this is just about ready.

::: intl/locale/Locale.cpp:147
(Diff revision 6)
>  {
> -  return AddLikelySubtagsForLocale(mLocaleStr);
> +  return MatchesRange(aOther, false, false);
>  }
>  
>  bool
> -Locale::AddLikelySubtagsWithoutRegion()
> +Locale::MatchesRange(const Locale& aOther, bool aAvailableRange, bool aRequestedRange) const

I think the parameter names `aAvailableRange` and `aRequestedRange` are confusing here; they assume the caller has a specific use-case, but this method doesn't know anything about "available" vs "requested", it is just asking whether "this" Locale matches an "other" one.

So IMO, `aThisRange` and `aOtherRange` would be clearer here.

Although, is this ever called with `aAvailableRange` (or `aThisRange`) set to `false`? Only by the `Matches` method immediately above, AFAICS -- but can't we just use `==` for that? Is it needed at all?

So with that, I suspect you can drop the first of the `bool` parameters here altogether, and just have

    `Locale::Matches(const Locale& aOther, bool aRange) const`

::: intl/locale/Locale.cpp:165
(Diff revision 6)
> +  if ((!aAvailableRange || !mRegion.IsEmpty()) &&
> +      (!aRequestedRange || !aOther.mRegion.IsEmpty()) &&
> +      !mRegion.Equals(aOther.mRegion)) {
> +    return false;
> +  }
> +  //XXX: Match variants

So...what about this? Don't we need to fill in the missing piece?

::: intl/locale/LocaleService.cpp:484
(Diff revision 6)
>      if (!aRetVal.IsEmpty()) {
>        HANDLE_STRATEGY;
>      }
>  
>      // 2) Try to match against the available locales treated as ranges.
> -    auto findRangeMatches = [&](const Locale& aReq) {
> +    auto findRangeMatches = [&](Locale& aReq, bool aAvailRange, bool aReqRange) {

AFAICS, the callers always pass `true` for `aAvailRange` (and the comment above says "the available locales [are] treated as ranges" anyhow), so I think you can just drop that argument. Only `aReqRange` varies and needs to be passed by the caller.

::: intl/locale/tests/gtest/TestLocale.cpp:35
(Diff revision 6)
> +TEST(Intl_Locale_Locale, Matches) {
> +  Locale loc = Locale("en-US");
> +
> +  Locale loc2 = Locale("en-GB");
> +  ASSERT_FALSE(loc.Matches(loc2));
> +
> +  Locale loc3 = Locale("en-US");
> +  ASSERT_TRUE(loc.Matches(loc3));
> +}

This could simply test using the equality operator, no? (Leaving the `Matches` name to be used for the version that handles ranges, as suggested above.)

How about testing that we handle casing and underscore normalization? So IIUC,

    Locale loc4 = Locale("En_us");
    ASSERT_TRUE(loc == loc4);

I suppose it'd make sense to add a set of range-match tests here, too; I realize they're tested indirectly by the negotiateLanguages tests, but as we have low-level C++ tests, we might as well cover that as well.
Attachment #8940600 - Flags: review?(jfkthame)
Thanks Jonathan!

Updated the patch. Per our conversation on IRC I kept the two flags for this/other ranges but renamed them and shifted function names as well.

Ready for re-review
Comment on attachment 8940600 [details]
Bug 1428698 - Align intl::locale::Locale with BCP47.

https://reviewboard.mozilla.org/r/210812/#review220450

Looks nice, let's go for it.

::: intl/locale/Locale.cpp:73
(Diff revisions 6 - 7)
>        position = 4;
>      } else if (position <= 4 && slen >= 3 && slen <= 8) {
>        // we're quirky here because we allow for variant to be 3 char long.
>        // BCP47 requires variants to be 5-8 char long at lest.
>        //
>        // We do this to support the `ja-JP-mac` quick that we have.

comment typo: s/quick/quirk/

::: intl/locale/tests/gtest/TestLocale.cpp:59
(Diff revisions 6 - 7)
> +  ASSERT_FALSE(loc.Matches(loc2, false, true));
> +  ASSERT_FALSE(loc.Matches(loc2, false, false));
> +  ASSERT_TRUE(loc.Matches(loc2, true, true));
> +
> +  Locale loc3 = Locale("en");
> +  ASSERT_FALSE(loc == loc2);

I think you meant to test loc3 here. :)
Attachment #8940600 - Flags: review?(jfkthame) → review+
Comment on attachment 8940600 [details]
Bug 1428698 - Align intl::locale::Locale with BCP47.

https://reviewboard.mozilla.org/r/210812/#review220602


Static analysis found 2 defects in this patch.
 - 2 defects found by clang-tidy

You can run this analysis locally with:
 - `./mach static-analysis check path/to/file.cpp` (C/C++)


If you see a problem in this automated review, please report it here: http://bit.ly/2y9N9Vx


::: intl/locale/Locale.cpp:15
(Diff revision 8)
> +
>  #include "unicode/uloc.h"
>  
>  using namespace mozilla::intl;
>  
> -Locale::Locale(const nsCString& aLocale, bool aRange)
> +Locale::Locale(const nsACString& aLocale)

Error: Bad implicit conversion constructor for 'locale' [clang-tidy: mozilla-implicit-constructor]

::: intl/locale/Locale.cpp:15
(Diff revision 8)
> +
>  #include "unicode/uloc.h"
>  
>  using namespace mozilla::intl;
>  
> -Locale::Locale(const nsCString& aLocale, bool aRange)
> +Locale::Locale(const nsACString& aLocale)

Error: Bad implicit conversion constructor for 'locale' [clang-tidy: mozilla-implicit-constructor]
Priority: -- → P3
Pushed by zbraniecki@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/804f26b2c6b8
Align intl::locale::Locale with BCP47. r=jfkthame
Ha. Interesting. The reason the test fails is because its entry is "ES-es", and we normalize it to "es-ES".

Basically, before the patch:

```
["ES-es"] === Services.locale.negotiateLanguages(["es-AR"], ["ES-es"]);
```

After this patch:

```
["es-ES"] === Services.locale.negotiateLanguages(["es-AR"], ["ES-es"]);
```


I see two options:

1) We can fix it in the test (as far as I see that's the only test that fails!) and then we would finally be able to enforce language tag normalization
2) We can recreate the behavior of negotiateLanguages returning non-normalized language tags. This will be a bit tricky because we'll have to keep track of what is the source string for each Locale object out of available to put it in the result array.



Jonathan - do you have a preference? If you prefer the latter - can you advice on how to refactor the FilterMatches code to keep track of the input string for each locale? I'd prefer not to keep it on the Locale object anymore because it seems like an artifact of the negotiation algorithm, not Locale class.
Flags: needinfo?(gandalf) → needinfo?(jfkthame)
I think we should just say that all APIs returning a locale code should return it in normalized form, and fix the test accordingly.
Flags: needinfo?(jfkthame)
Ok. The try looks green - https://treeherder.mozilla.org/#/jobs?repo=try&revision=5b74e91e86d10af643a9b67c69d9b6a56abe607f

I also added a comment about canonicalization of the output of negotiateLanguages. Relanding.
Pushed by zbraniecki@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/c1ced18856e8
Align intl::locale::Locale with BCP47. r=jfkthame
https://hg.mozilla.org/mozilla-central/rev/c1ced18856e8
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla60
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: