Open Bug 1596544 Opened 21 days ago Updated 17 days ago

Update Intl.Locale for the removal of canonical-form requirements on U Extensions between TR35-53 and TR35-57

Categories

(Core :: JavaScript: Internationalization API, enhancement)

enhancement
Not set

ASSIGNED

People

(Reporter: Waldo, Assigned: Waldo)

Details

(Keywords: leave-open)

Attachments

(3 files)

In bug 1433303 comment 45 I noted concerns I had about Intl.Locale, and the canonicalizing it performs, being different from that performed by CanonicalizeLanguageTag.

Those concerns were valid under TR35 at the time I reviewed it.

But TR35 as it exists now has removed these requirements for U Extension canonical form:

  • All attributes are sorted in alphabetical order.
  • All keywords are sorted by alphabetical order of keys.
  • All keywords are in lowercase.
  • All keys and types use the canonical form (from the name attribute; see Section 3.6.4 U Extension Data Files).
  • Type value "true" is removed.

With those requirements gone -- particularly the fourth bullet, which required replacing keys and types with preferred forms in certain cases, and maybe the fifth bullet too -- both APIs should now canonicalize identically, and my original concern no longer applies.
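To make the bullets above concrete, here is a minimal sketch of the first, second, third, and fifth of them applied to a standalone U extension. The function names are hypothetical and this is not SpiderMonkey's implementation; it assumes well-formed input (attributes and type subtags are 3-8 characters, keys are 2 characters) and deliberately skips the fourth bullet, which needs the CLDR alias data.

```javascript
// Hypothetical helper: split a well-formed U extension such as
// "u-attr-ca-gregory" into its attributes and keywords.
function parseUExtensionSketch(ext) {
  // Lowercasing up front covers the "all keywords are in lowercase" bullet.
  const subtags = ext.toLowerCase().split("-");
  if (subtags[0] !== "u") {
    throw new RangeError("not a U extension: " + ext);
  }
  const attributes = [];
  const keywords = [];
  let i = 1;
  // Attributes are the 3-8 character subtags before the first 2-character key.
  while (i < subtags.length && subtags[i].length > 2) {
    attributes.push(subtags[i++]);
  }
  // Each keyword is a 2-character key followed by zero or more type subtags.
  while (i < subtags.length) {
    const key = subtags[i++];
    const type = [];
    while (i < subtags.length && subtags[i].length > 2) {
      type.push(subtags[i++]);
    }
    keywords.push({ key, type });
  }
  return { attributes, keywords };
}

// Hypothetical sketch of the old canonical-form rules, minus alias replacement.
function canonicalizeUExtensionSketch(ext) {
  const { attributes, keywords } = parseUExtensionSketch(ext);
  attributes.sort(); // attributes in alphabetical order
  // Keywords in alphabetical order of their keys.
  keywords.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
  const parts = ["u", ...attributes];
  for (const { key, type } of keywords) {
    parts.push(key);
    // A type of exactly "true" is dropped, leaving the bare key.
    if (type.join("-") !== "true") {
      parts.push(...type);
    }
  }
  return parts.join("-");
}

// "u-NU-latn-kn-true-CA-gregory" becomes "u-ca-gregory-kn-nu-latn".
console.log(canonicalizeUExtensionSketch("u-NU-latn-kn-true-CA-gregory"));
```

Note that this sketch keeps duplicate keywords; it only reorders and normalizes them.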

We should update our Intl.Locale canonicalizing to be identical to CanonicalizeLanguageTag.

And once that change is made, I think we can ship this. \o/

So, reading things more carefully again, I realize I -- still -- didn't understand things well enough when I wrote comment 0, or when earlier I said Intl.Locale was ready to advance. (Or at least I had forgotten some things I had previously understood.)

While the TR35 changes did remove the replacement distinction, they don't deal with duplicates. And really, when I stare at Intl.Locale more closely, I pretty much conclude it can't.

It is extremely sensible for UnicodeExtensionComponents to pick the first value of each keyword and preserve only that. And even if it weren't...if a keyword is specified via the out-of-band option, it doesn't make sense for that option not to replace every existing instance of the keyword. (The alternative behavior of as-if-by-extension-prepending is just dumb, because it exposes something that looks more like implementation detail than reasoned choice.) And the presence or absence of a keyword in an option shouldn't really affect whether that keyword appears multiple times in the final tag, or just once.
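A rough sketch of that reasoning, assuming keywords have already been parsed into a list of { key, type } records in source order. The function name and data shape are hypothetical, not the spec's or SpiderMonkey's: the first value of each keyword wins, and an option replaces whatever value was kept.

```javascript
// Hypothetical sketch: keep the first value of each keyword, then let
// out-of-band options override the kept value for their key.
function resolveKeywordsSketch(keywords, options = {}) {
  const resolved = new Map();
  for (const { key, type } of keywords) {
    // Later duplicates of a key are ignored.
    if (!resolved.has(key)) {
      resolved.set(key, type);
    }
  }
  for (const [key, type] of Object.entries(options)) {
    // An explicitly provided option replaces the value from the tag.
    if (type !== undefined) {
      resolved.set(key, type);
    }
  }
  return [...resolved].map(([key, type]) => ({ key, type }));
}

const parsed = [
  { key: "ca", type: "gregory" },
  { key: "nu", type: "latn" },
  { key: "ca", type: "japanese" },
];

// Without an option, the first "ca" value is kept.
console.log(resolveKeywordsSketch(parsed));
// With an option, "ca" still appears exactly once, with the option's value.
console.log(resolveKeywordsSketch(parsed, { ca: "buddhist" }));
```

Either way each keyword ends up in the result exactly once, which is the property argued for above.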

Ultimately, I think Intl.Locale is specified in a sane fashion as it exists now. But if so, there still remains a different-canonicalization behavior here that is unfortunate. Not as bad as when replacements happened -- subtags actively changing (and then having to explain to users how aliases and preferred values work, where you find the list of them, and so on in all the grody detail of TR35 and the XML files whose meaning it documents) is more confusing and worse than just duplicates/trailing useless subtags being removed. But not ideal.

I'm going to file an issue against the main Intl spec to make the canonicalization algorithm remove duplicate attributes and trim out later duplicate keywords. Ideally that will move in concert with the Intl.Locale change so everything moves consistently. But if not, explaining that duplicates aren't removed by the main canonicalization algorithm is simple enough I can let it slide.
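For illustration, the change being proposed might look roughly like the following. This is only a sketch under the assumption of a well-formed, standalone U extension; the function name is hypothetical and nothing here is the eventual spec text.

```javascript
// Hypothetical sketch: drop repeated attributes and any later duplicate
// keyword, leaving the order of the surviving subtags untouched.
function removeUExtensionDuplicatesSketch(ext) {
  const subtags = ext.toLowerCase().split("-");
  const seenAttributes = new Set();
  const seenKeys = new Set();
  const out = [subtags[0]]; // the "u" singleton
  let i = 1;
  // Attributes: 3-8 character subtags before the first 2-character key.
  while (i < subtags.length && subtags[i].length > 2) {
    const attribute = subtags[i++];
    if (!seenAttributes.has(attribute)) {
      seenAttributes.add(attribute);
      out.push(attribute);
    }
  }
  // Keywords: a 2-character key plus its type subtags; first occurrence wins.
  while (i < subtags.length) {
    const key = subtags[i++];
    const type = [];
    while (i < subtags.length && subtags[i].length > 2) {
      type.push(subtags[i++]);
    }
    if (!seenKeys.has(key)) {
      seenKeys.add(key);
      out.push(key, ...type);
    }
  }
  return out.join("-");
}

// "u-attr-attr-ca-gregory-nu-latn-ca-japanese" becomes "u-attr-ca-gregory-nu-latn".
console.log(removeUExtensionDuplicatesSketch("u-attr-attr-ca-gregory-nu-latn-ca-japanese"));
```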

This issue is blocked on https://github.com/tc39/proposal-intl-locale/issues/77 and probably also https://github.com/tc39/ecma402/issues/330.

Using complete UTS 35 canonicalisation in Intl.Locale was intentional (https://github.com/tc39/proposal-intl-locale/issues/43, https://github.com/tc39/proposal-intl-locale/issues/14, [1]), so removing it in our implementation now, just because the current proposal uses outdated references, seems premature.


[1] IIRC more comments/remarks about how and when to canonicalise are spread throughout other issues/PRs in the proposal repo. For example, https://github.com/tc39/proposal-intl-locale/issues/63, https://github.com/tc39/proposal-intl-locale/pull/11, and https://github.com/tc39/proposal-intl-locale/pull/21 are likely candidates which could contain more hints about canonicalisation.

Keywords: leave-open
Pushed by jwalden@mit.edu:
https://hg.mozilla.org/integration/autoland/rev/b5c5ba07d3db
intl_ValidateAndCanonicalizeUnicodeExtensionType should ignore the second |option| argument until it's needed to report an error.  r=anba