Closed Bug 1457571 Opened 2 years ago Closed 1 year ago
Language tag canonicalization should probably remove all extlang subtags
https://tools.ietf.org/html/rfc5646#section-4.5

> The canonical form contains no 'extlang' subtags. There is an
> alternate 'extlang form' that maintains or reinstates extlang
> subtags. This form can be useful in environments where the presence
> of the 'Prefix' subtag is considered beneficial in matching or
> selection (see Section 4.1.2).

The question now is how to process extraneous extlang subtags: simply drop all extlang subtags, or treat the first extlang as the primary language subtag and remove the remaining extlang subtags (this is how ICU canonicalizes extlang subtags)?

Example: The first option will return "en" for |Intl.getCanonicalLocales("en-abc-def-ghi")|, whereas the second option will return "abc".
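To make the two options concrete, here is a minimal string-level sketch. The helper names (splitExtlangs, dropAllExtlangs, promoteFirstExtlang) are hypothetical and not part of any engine or of ICU. The sketch assumes a well-formed, non-grandfathered tag and does no other canonicalization.

```javascript
// Hypothetical helper: split a tag into primary language, extlang subtags,
// and everything after them. Assumes a well-formed BCP 47 tag.
function splitExtlangs(tag) {
  const subtags = tag.split("-");
  const language = subtags[0].toLowerCase();
  const extlangs = [];
  let i = 1;
  // Per the RFC 5646 grammar, up to three 3-letter extlang subtags may
  // follow a 2-3 letter primary language subtag.
  if (/^[a-z]{2,3}$/.test(language)) {
    while (
      i < subtags.length &&
      extlangs.length < 3 &&
      /^[a-z]{3}$/i.test(subtags[i])
    ) {
      extlangs.push(subtags[i].toLowerCase());
      i++;
    }
  }
  return { language, extlangs, rest: subtags.slice(i) };
}

// Option 1: drop every extlang subtag and keep the primary language.
function dropAllExtlangs(tag) {
  const { language, rest } = splitExtlangs(tag);
  return [language, ...rest].join("-");
}

// Option 2: promote the first extlang to primary language and drop the
// remaining extlang subtags.
function promoteFirstExtlang(tag) {
  const { language, extlangs, rest } = splitExtlangs(tag);
  const primary = extlangs.length > 0 ? extlangs[0] : language;
  return [primary, ...rest].join("-");
}

console.log(dropAllExtlangs("en-abc-def-ghi"));     // "en"
console.log(promoteFirstExtlang("en-abc-def-ghi")); // "abc"
```

Both helpers leave the script, region, and later subtags untouched, so the only difference between the options is which subtag ends up in the language position.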
Hmm, this kind of blocks bug 1433303, because the Intl.Locale proposal now contains methods to add and remove likely subtags <https://github.com/tc39/proposal-intl-locale/pull/30>.

The obvious choice to implement this new feature is to call ICU's uloc_addLikelySubtags/uloc_minimizeSubtags functions. But since both functions call uloc_canonicalize internally, we can't call them with the full language tag stored in the Intl.Locale object, because then we may trigger ICU-specific canonicalization steps.

Examples: uloc_canonicalize returns "es-ES-u-cu-esp" when called with "es-ES-preeuro" (neither IANA nor CLDR compatible canonicalization), or it returns "it" when called with "und-ita" (IANA incompatible canonicalization; possibly CLDR compatible).

As a workaround I'd propose to call uloc_addLikelySubtags/uloc_minimizeSubtags with a `language-script-region` BCP 47 language tag (so without variant, extension, and privateuse subtags). But for that to work we'd need to remove all extlang subtags...
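A rough sketch of the proposed workaround, at the string level only: reduce the tag to its `language-script-region` core before handing it to the likely-subtags routines, and keep the remaining subtags aside. The function name toLanguageScriptRegion is hypothetical. The sketch assumes a well-formed, non-grandfathered tag and takes the "drop all extlang subtags" option.

```javascript
// Hypothetical helper: reduce a well-formed BCP 47 tag to its
// language-script-region core. Extlang subtags are dropped; variant,
// extension, and privateuse subtags are returned separately in `rest`.
function toLanguageScriptRegion(tag) {
  const subtags = tag.split("-");
  let i = 0;
  const language = subtags[i++].toLowerCase();

  // Skip up to three 3-letter extlang subtags after a 2-3 letter language.
  if (/^[a-z]{2,3}$/.test(language)) {
    let skipped = 0;
    while (i < subtags.length && skipped < 3 && /^[a-z]{3}$/i.test(subtags[i])) {
      i++;
      skipped++;
    }
  }

  let script;
  let region;
  // Script: four letters, conventionally title-cased.
  if (i < subtags.length && /^[a-z]{4}$/i.test(subtags[i])) {
    const s = subtags[i++];
    script = s[0].toUpperCase() + s.slice(1).toLowerCase();
  }
  // Region: two letters or three digits, conventionally upper-cased.
  if (i < subtags.length && /^([a-z]{2}|[0-9]{3})$/i.test(subtags[i])) {
    region = subtags[i++].toUpperCase();
  }

  const core = [language, script, region].filter(Boolean).join("-");
  return { core, rest: subtags.slice(i) };
}

console.log(toLanguageScriptRegion("es-ES-preeuro").core); // "es-ES"
console.log(toLanguageScriptRegion("und-ita").core);       // "und"
```

With this split, only `core` would be passed to uloc_addLikelySubtags/uloc_minimizeSubtags, so the "preeuro" variant and the "ita" extlang never reach uloc_canonicalize; the subtags in `rest` would be reattached to the result afterwards.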
Status: NEW → RESOLVED
Closed: 1 year ago
Resolution: --- → DUPLICATE
Duplicate of bug: 1522070