Use UTS 35 Unicode BCP 47 Locale Identifiers instead of RFC-5646/6067 BCP 47 language tags




5 months ago
5 days ago


(Reporter: anba, Assigned: anba, NeedInfo)


(Blocks 1 bug, {leave-open})

Firefox Tracking Flags

(Not tracked)



(6 attachments)



5 months ago

Intl.Locale (bug 1433303) depends on a PR that lists multiple open issues, but the PR got merged nonetheless, so it's not entirely clear what to implement in some edge cases.


Comment 1

3 months ago

I propose we first switch the language tag parser over to Unicode BCP 47 locale identifiers and if that works out without causing any web-compat issues, we can proceed to switch the canonicalisation to whatever UTS 35 specifies. (But also see


Comment 2

3 months ago

ECMA-402 was changed to use Unicode BCP 47 locale identifiers instead of
BCP 47 language tags. That means extlang subtags are no longer supported in
language tags.
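
To make the extlang change concrete, here is a minimal Python sketch, not the SpiderMonkey implementation. It assumes the simplified RFC 5646 shape of a 2-3 letter language subtag followed by one to three 3-letter extlang subtags; `extlang_subtags` is a hypothetical helper name.

```python
import re

# Simplified RFC 5646 shape: a 2-3 letter language subtag followed by one to
# three 3-letter extlang subtags. Matching is case-insensitive.
_EXTLANG = re.compile(r"^[a-z]{2,3}((?:-[a-z]{3}){1,3})(?=-|$)", re.IGNORECASE)


def extlang_subtags(tag: str) -> list[str]:
    """Return the extlang subtags found in `tag`, or [] if there are none."""
    match = _EXTLANG.match(tag)
    return match.group(1)[1:].split("-") if match else []


for tag in ("zh-yue-HK", "zh-Hant-HK", "zh-HK"):
    # Only the first tag carries an extlang ("yue"); a parser for Unicode
    # BCP 47 locale identifiers would have to reject it.
    print(tag, extlang_subtags(tag))
```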


Comment 3

3 months ago

Irregular grandfathered language tags and regular grandfathered tags with
extlang-like subtags can't be parsed as Unicode BCP 47 locale identifiers, so
they now need to be rejected by the language tag parser.
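
The split between parseable and unparseable grandfathered tags can be checked with a short Python sketch. It assumes a simplified form of the UTS 35 `unicode_language_id` grammar (no "root", no script-only form, no extensions, no duplicate-variant check) and the grandfathered lists from RFC 5646; the names `LANGUAGE_ID` and `parses` are illustrative only.

```python
import re

# Simplified unicode_language_id: language, optional script, optional region,
# any number of variants.
LANGUAGE_ID = re.compile(
    r"(?:[a-z]{2,3}|[a-z]{5,8})"                   # language subtag
    r"(?:-[a-z]{4})?"                              # script subtag
    r"(?:-(?:[a-z]{2}|[0-9]{3}))?"                 # region subtag
    r"(?:-(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*",   # variant subtags
    re.IGNORECASE,
)

# Grandfathered tags as listed in RFC 5646.
IRREGULAR = [
    "en-GB-oed", "i-ami", "i-bnn", "i-default", "i-enochian", "i-hak",
    "i-klingon", "i-lux", "i-mingo", "i-navajo", "i-pwn", "i-tao", "i-tay",
    "i-tsu", "sgn-BE-FR", "sgn-BE-NL", "sgn-CH-DE",
]
REGULAR = [
    "art-lojban", "cel-gaulish", "no-bok", "no-nyn", "zh-guoyu", "zh-hakka",
    "zh-min", "zh-min-nan", "zh-xiang",
]


def parses(tag: str) -> bool:
    """True if `tag` fits the simplified unicode_language_id grammar."""
    return LANGUAGE_ID.fullmatch(tag) is not None


accepted = [tag for tag in IRREGULAR + REGULAR if parses(tag)]
rejected_regular = [tag for tag in REGULAR if not parses(tag)]
print("still parseable:", accepted)
print("regular but rejected:", rejected_regular)
```

Under these assumptions, the regular tags that survive are the ones whose second subtag happens to look like a variant (5-8 characters).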

Depends on D23536


Comment 4

3 months ago

Language tags consisting only of private-use subtags are not allowed in Unicode
BCP 47 locale identifiers.
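
A rough Python sketch of how such tags could be recognised, assuming the RFC 5646 `privateuse` production ("x" followed by one or more 1-8 character alphanumeric subtags); `is_privateuse_only` is a hypothetical name.

```python
import re

# RFC 5646 privateuse production: "x" followed by one or more subtags of
# 1-8 alphanumeric characters.
_PRIVATEUSE_ONLY = re.compile(r"x(?:-[a-z0-9]{1,8})+", re.IGNORECASE)


def is_privateuse_only(tag: str) -> bool:
    """True if the whole tag is a private-use sequence with no language."""
    return _PRIVATEUSE_ONLY.fullmatch(tag) is not None


# "x-private" was a valid RFC 5646 tag on its own; as a Unicode BCP 47 locale
# identifier the private-use part must follow a language identifier.
print(is_privateuse_only("x-private"), is_privateuse_only("en-x-private"))
```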

Depends on D23537


Comment 5

3 months ago

Unicode BCP 47 locale identifiers don't support four-letter language subtags.
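
For illustration, the length rule can be written as a one-line check in Python. This assumes the UTS 35 `unicode_language_subtag` production (2-3 or 5-8 letters); RFC 5646 additionally reserved 4-letter language subtags for future use. The function name is made up for this sketch.

```python
import re

# unicode_language_subtag: 2-3 letters or 5-8 letters, never exactly 4.
_UNICODE_LANGUAGE_SUBTAG = re.compile(r"[a-z]{2,3}|[a-z]{5,8}", re.IGNORECASE)


def is_unicode_language_subtag(subtag: str) -> bool:
    """True if `subtag` has a valid shape for a Unicode language subtag."""
    return _UNICODE_LANGUAGE_SUBTAG.fullmatch(subtag) is not None


for candidate in ("en", "abcd", "abcde"):
    print(candidate, is_unicode_language_subtag(candidate))
```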

Depends on D23538


Comment 6

3 months ago

  • Strict parsing for "u" and "t" extensions is not yet implemented.
  • Canonicalisation per UTS 35 is also not yet implemented, so it still refers to BCP 47 tags.

Depends on D23539


Comment 7

3 months ago

Unicode BCP 47 locale identifiers have stricter requirements for the Unicode ("-u-") and
transformed content ("-t-") extension sequences.

  • Keys in Unicode extensions must be of the form "alphanum alpha".
  • Transformed content extensions need to be parsed following the transformed_extensions
    syntax from UTS 35.
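
The key shapes can be sketched in Python as follows. This assumes the UTS 35 productions `key = alphanum alpha` for "-u-" and `tkey = alpha digit` for "-t-"; it checks individual keys only and does not parse a full extension sequence. Both function names are hypothetical.

```python
import re

# Unicode extension key: one alphanumeric character followed by one letter.
_UNICODE_KEY = re.compile(r"[a-z0-9][a-z]", re.IGNORECASE)
# Transformed content field key: one letter followed by one digit.
_TRANSFORM_KEY = re.compile(r"[a-z][0-9]", re.IGNORECASE)


def is_unicode_extension_key(subtag: str) -> bool:
    """True if `subtag` has the "alphanum alpha" shape."""
    return _UNICODE_KEY.fullmatch(subtag) is not None


def is_transform_field_key(subtag: str) -> bool:
    """True if `subtag` has the "alpha digit" shape."""
    return _TRANSFORM_KEY.fullmatch(subtag) is not None


# "a1" is two alphanumerics, which RFC 6067 accepted as a key, but it does not
# end in a letter, so strict parsing rejects it as a "-u-" key.
print(is_unicode_extension_key("ca"), is_unicode_extension_key("a1"))
print(is_transform_field_key("m0"), is_transform_field_key("ca"))
```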

Depends on D23540



3 months ago
Duplicate of this bug: 1457571


3 months ago
Assignee: nobody → andrebargull

Comment 10

2 months ago

Pushed by
Part 1: Remove support for extlang subtags. r=jwalden
Part 2: Remove support for irregular grandfathered tags and regular grandfathered tags with extlang-like subtags. r=jwalden
Part 3: Remove support for privateuse-only language tags. r=jwalden
Part 4: Remove support for four letter language subtags. r=jwalden
Part 5: Update comments to refer to Unicode BCP 47 locale identifiers. r=jwalden
Part 6: Add strict parsing of Unicode and transform extension sequences. r=jwalden

Keywords: checkin-needed

== Change summary for alert #20419 (as of Fri, 12 Apr 2019 06:30:28 GMT) ==


1% Base Content JS linux64-shippable opt 4,023,191.33 -> 4,002,330.67
1% Base Content JS linux64-shippable-qr opt 4,023,148.00 -> 4,002,240.67
1% Base Content JS osx-10-10-shippable opt 4,020,194.67 -> 3,999,178.67
1% Base Content JS windows10-64-shippable opt 4,083,708.00 -> 4,062,744.67
1% Base Content JS windows10-64-shippable-qr opt 4,083,694.00 -> 4,062,758.00
0% Base Content JS linux64-shippable-qr opt 4,020,210.33 -> 4,002,276.67

For up to date results, see:

I...think this is still open to implement UTS 35 canonicalization from comment 1? Not sure any more from reading the comment history here. I guess that's all the canonical-form things, like en-u-ms-imperial → en-u-ms-uksystem.

If that's all that's left, I guess we need to do some more hacking to read through all the transform extension data and generate the necessary code to handle that.
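
As a rough idea of what generated alias handling might look like, here is a Python sketch for the `ms-imperial` case mentioned above. The one-entry `TYPE_ALIASES` table is a stand-in assumption; real code would be generated from the CLDR data. The sketch only handles single-subtag types in the "-u-" extension and leaves everything else untouched.

```python
# Hypothetical alias table keyed by (key, type); only the example from this
# bug is included.
TYPE_ALIASES = {("ms", "imperial"): "uksystem"}


def replace_type_aliases(tag: str) -> str:
    """Replace aliased "-u-" extension types with their preferred form."""
    subtags = tag.split("-")
    out = []
    in_unicode_ext = False
    key = None
    for index, subtag in enumerate(subtags):
        lower = subtag.lower()
        if lower == "x":
            # Everything from the private-use singleton onward is kept as-is.
            out.extend(subtags[index:])
            break
        if len(subtag) == 1:
            # A singleton starts a new extension; only "u" is of interest.
            in_unicode_ext = lower == "u"
            key = None
        elif in_unicode_ext and len(subtag) == 2:
            # Two-character subtags inside "-u-" are keys.
            key = lower
        elif in_unicode_ext and key is not None:
            subtag = TYPE_ALIASES.get((key, lower), subtag)
        out.append(subtag)
    return "-".join(out)


print(replace_type_aliases("en-u-ms-imperial"))
```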

Flags: needinfo?(andrebargull)
Priority: -- → P2