Closed Bug 1457571 Opened 6 years ago Closed 5 years ago

Language tag canonicalization should probably remove all extlang subtags

Status:

RESOLVED DUPLICATE of bug 1522070

Tracking Flags:

firefox61: tracking ---, status affected

People

(Reporter: anba, Unassigned)

André Bargull [:anba]

Reporter

Description

•

6 years ago

https://tools.ietf.org/html/rfc5646#section-4.5

> The canonical form contains no 'extlang' subtags.  There is an
> alternate 'extlang form' that maintains or reinstates extlang
> subtags.  This form can be useful in environments where the presence
> of the 'Prefix' subtag is considered beneficial in matching or
> selection (see Section 4.1.2).

The question now is how to process extraneous extlang subtags: simply drop all extlang subtags, or treat the first extlang as the primary language subtag and remove the remaining extlang subtags (this is how ICU canonicalizes extlang subtags)?

Example:
The first option will return "en" for |Intl.getCanonicalLocales("en-abc-def-ghi")| whereas the second option will return "abc".
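
To make the two options concrete, here is a minimal sketch in plain JavaScript. The helper names (`splitExtlangs`, `dropAllExtlangs`, `promoteFirstExtlang`) are hypothetical and are not engine or ICU APIs. The sketch assumes a well-formed RFC 5646 langtag and ignores grandfathered tags and the extlang registry (for instance, it does not check the registered 'Prefix').

```javascript
// Hypothetical helpers for illustration only; not part of SpiderMonkey or ICU.
// Assumes `tag` is a syntactically well-formed RFC 5646 langtag.
function splitExtlangs(tag) {
  const subtags = tag.split("-");
  const language = subtags[0];
  const extlangs = [];
  let i = 1;
  // RFC 5646 syntax: extlang = 3ALPHA *2("-" 3ALPHA), and it may only
  // follow a 2-3 letter primary language subtag.
  if (/^[a-z]{2,3}$/i.test(language)) {
    while (
      i < subtags.length &&
      extlangs.length < 3 &&
      /^[a-z]{3}$/i.test(subtags[i])
    ) {
      extlangs.push(subtags[i]);
      i++;
    }
  }
  return { language, extlangs, rest: subtags.slice(i) };
}

// Option 1: drop every extlang subtag and keep the primary language.
function dropAllExtlangs(tag) {
  const { language, rest } = splitExtlangs(tag);
  return [language, ...rest].join("-");
}

// Option 2: treat the first extlang as the primary language subtag
// and drop the remaining extlang subtags.
function promoteFirstExtlang(tag) {
  const { language, extlangs, rest } = splitExtlangs(tag);
  const primary = extlangs.length > 0 ? extlangs[0] : language;
  return [primary, ...rest].join("-");
}
```

With the tag from the example, `dropAllExtlangs("en-abc-def-ghi")` gives "en" and `promoteFirstExtlang("en-abc-def-ghi")` gives "abc". Script, region, and later subtags are passed through unchanged in both cases.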

Jason Orendorff [:jorendorff]

Updated

•

6 years ago

Priority: -- → P3

André Bargull [:anba]

Reporter

Comment 1

•

6 years ago

Hmm, this kind of blocks bug 1433303, because the Intl.Locale proposal now contains methods to add and remove likely subtags <https://github.com/tc39/proposal-intl-locale/pull/30>.

The obvious choice to implement this new feature is to call ICU's uloc_addLikelySubtags/uloc_minimizeSubtags functions. But since both functions call uloc_canonicalize internally, we can't call them with the full language tag stored in the Intl.Locale object, because then we may trigger ICU-specific canonicalization steps. Examples: uloc_canonicalize returns "es-ES-u-cu-esp" when called with "es-ES-preeuro" (neither IANA nor CLDR compatible canonicalization), or it returns "it" when called with "und-ita" (IANA incompatible canonicalization; possibly CLDR compatible).

As a workaround I'd propose calling uloc_addLikelySubtags/uloc_minimizeSubtags with a `language-script-region` BCP 47 language tag (so without variant, extension, and privateuse subtags). But for that to work we'd need to remove all extlang subtags...
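
A rough sketch of the reduction step behind this workaround, written in plain JavaScript for illustration. The function name `toLanguageScriptRegion` is hypothetical, and the sketch assumes the "drop all extlang subtags" option plus a well-formed, non-grandfathered langtag. It only shows which subtags would be kept before the tag is handed to ICU; it does not call ICU itself.

```javascript
// Hypothetical helper for illustration only; not part of SpiderMonkey or ICU.
// Assumes `tag` is a syntactically well-formed, non-grandfathered langtag.
function toLanguageScriptRegion(tag) {
  const subtags = tag.split("-");
  let i = 0;
  const result = [subtags[i++]]; // primary language subtag

  // Skip up to three extlang subtags (three letters each); they can
  // only follow a 2-3 letter primary language subtag.
  if (/^[a-z]{2,3}$/i.test(result[0])) {
    let skipped = 0;
    while (
      i < subtags.length &&
      skipped < 3 &&
      /^[a-z]{3}$/i.test(subtags[i])
    ) {
      i++;
      skipped++;
    }
  }

  // Optional script subtag: four letters.
  if (i < subtags.length && /^[a-z]{4}$/i.test(subtags[i])) {
    result.push(subtags[i++]);
  }

  // Optional region subtag: two letters or three digits.
  if (i < subtags.length && /^([a-z]{2}|[0-9]{3})$/i.test(subtags[i])) {
    result.push(subtags[i++]);
  }

  // Variant, extension, and privateuse subtags are intentionally dropped.
  return result.join("-");
}
```

Under these assumptions the two problematic inputs from above reduce to "es-ES" (for "es-ES-preeuro") and "und" (for "und-ita"), so neither the "preeuro" variant nor the "ita" extlang would be passed on to uloc_canonicalize. The caller would still have to reattach the variant, extension, and privateuse subtags afterwards.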

Blocks: 1433303

André Bargull [:anba]

Reporter

Comment 2

•

5 years ago

Will be fixed as part of bug 1522070.

Status: NEW → RESOLVED

Closed: 5 years ago

Resolution: --- → DUPLICATE

Categories

(Core :: JavaScript: Internationalization API, enhancement, P3)
