Closed Bug 1857742 Opened 9 months ago Closed 7 months ago

:lang() fails to understand some valid BCP 47 language tags since FF 114

Tracking

Status: RESOLVED FIXED

Milestone: 121 Branch

Tracking Flags:

                Tracking  Status
firefox-esr115  ---       fixed
firefox119      ---       wontfix
firefox120      ---       wontfix
firefox121      ---       fixed

People

(Reporter: 747.neutron, Assigned: jfkthame)

References

(Regression)

Details

(Keywords: regression)

Attachments

(10 files)

lang-pseudo-class-bug.html 9 months ago 747.neutron 1017 bytes, text/html
:lang() tester rendering on GC 117 9 months ago 747.neutron 33.47 KB, image/png
browserling-screenshot.png 9 months ago 747.neutron 120.78 KB, image/png
browserling-screenshot(1).png 9 months ago 747.neutron 113.79 KB, image/png
Bug 1857742 - patch 1 - Vendor the oxilangtag crate into third_party/rust. r=#layout,#supply-chain-reviewers 7 months ago Jonathan Kew [:jfkthame] 48 bytes, text/x-phabricator-request
Bug 1857742 - patch 2 - Use oxilangtag rather than unic_langid to parse lang tags for nsStyleUtil::LangTagCompare. r=#layout 7 months ago Jonathan Kew [:jfkthame] 48 bytes, text/x-phabricator-request
Bug 1857742 - patch 3 - Add some more :lang()-matching reftests. r=#layout 7 months ago Jonathan Kew [:jfkthame] 48 bytes, text/x-phabricator-request
Bug 1857742 - patch 1 [esr115] - Vendor the oxilangtag crate into third_party/rust. 7 months ago Jonathan Kew [:jfkthame] 48 bytes, text/x-phabricator-request (RyanVM: approval-mozilla-esr115+)
Bug 1857742 - patch 2 [esr115] - Use oxilangtag rather than unic_langid to parse lang tags for nsStyleUtil::LangTagCompare. 7 months ago Jonathan Kew [:jfkthame] 48 bytes, text/x-phabricator-request (RyanVM: approval-mozilla-esr115+)
Bug 1857742 - patch 3 [esr115] - Add some more :lang()-matching reftests. 7 months ago Jonathan Kew [:jfkthame] 48 bytes, text/x-phabricator-request (RyanVM: approval-mozilla-esr115+)

747.neutron

Reporter

Description

9 months ago

Attached file lang-pseudo-class-bug.html

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/118.0

Steps to reproduce:

I'm not sure whether the HTML parser or the CSS parser is to blame, but the :lang() pseudo-class apparently refuses to parse some valid BCP 47 tags, while accepting some invalid ones.

As shown in my attached file, the behavior leads me to believe that Firefox actually recognizes Unicode language identifiers as defined in UTS #35, rather than BCP 47 language tags.

Unicode language and locale identifiers inherit the design and the repertoire of subtags from [BCP47] Language Tags. There are some extensions and restrictions made for the use of the Unicode locale identifier in CLDR:

It does not allow for the full syntax of [BCP47]:

No extlang subtags are allowed (as in the BCP 47 canonical form, see BCP 47 Section 4.5 and Section 3.1.7)

No irregular BCP 47 legacy language tags (marked as “Type: grandfathered” in BCP 47) are allowed (these are all deprecated in BCP 47)

A tag must not start with the subtag "x": thus a privateuse (eg x-abc) can only be after a language subtag, like "und"

It allows for certain semantic additions and constraints:

Certain codes that are private-use in BCP 47 and ISO are given semantics by LDML

Each macrolanguage has an identified primary encompassed language, which is treated as an alias for the macrolanguage, and thus is replaced when canonicalizing (as allowed by BCP 47, see Section 4.1.2)

It allows certain syntax for backwards compatibility (not BCP 47-compatible):

The "_" character for field separator characters, as well as the "-" used in [BCP47] (however, the canonical form is with "-")

The subtag "root" to indicate the generic locale used as the parent of all languages in the CLDR data model ("und" can be used instead)

The language tag may begin with a script subtag rather than a language subtag. This is specialized use only, and not required for CLDR conformance.

I think this is a bug, because according to the HTML and CSS standards, these values should be handled as BCP 47 language tags.

The lang attribute (in no namespace) specifies the primary language for the element's contents and for any of the element's attributes that contain text. Its value must be a valid BCP 47 language tag, or the empty string.

The :lang() pseudo-class, which accepts a comma-separated list of one or more language ranges [...] An element’s content language matches a language range if, when represented in BCP 47 syntax [BCP47], it matches that language range in an extended filtering operation per [RFC4647] Matching of Language Tags (section 3.3.2).
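
For reference, the extended filtering operation cited above can be sketched in a few lines of Python. This is only an illustration of the steps in RFC 4647 section 3.3.2, not Firefox's implementation; the function name `extended_filter_match` is made up for this sketch.

```python
def extended_filter_match(language_range: str, tag: str) -> bool:
    """Return True if `tag` matches `language_range` under
    RFC 4647 extended filtering (section 3.3.2)."""
    # Step 1: split on hyphen only; comparison is case-insensitive.
    range_subtags = language_range.lower().split("-")
    tag_subtags = tag.lower().split("-")

    # Step 2: the first subtags must match (or the range starts with "*").
    if range_subtags[0] != "*" and range_subtags[0] != tag_subtags[0]:
        return False

    r, t = 1, 1
    # Step 3: walk the remaining range subtags.
    while r < len(range_subtags):
        if range_subtags[r] == "*":
            # 3A: a wildcard matches any run of subtags; skip it.
            r += 1
        elif t >= len(tag_subtags):
            # 3B: the tag ran out before the range did.
            return False
        elif range_subtags[r] == tag_subtags[t]:
            # 3C: subtags match; advance both.
            r += 1
            t += 1
        elif len(tag_subtags[t]) == 1:
            # 3D: never skip past a singleton (including "x").
            return False
        else:
            # 3E: skip a non-matching tag subtag.
            t += 1

    # Step 4: every range subtag was consumed.
    return True
```

Under this sketch, the range `iw` matches the tag `iw-ase-jpan-basiceng`, while the range `zh` does not match `zh_gb_oxendict`, because the underscore is not a subtag separator in BCP 47 and the whole string is compared as a single subtag.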

According to my tests on Browserling, the behavior was introduced in Firefox 114. Chrome 117 is not affected by this issue.

Actual results:

Open the attachment with the browser.

The browser does not render the iw-ase-jpan-basiceng paragraph in boldface, even though it is a valid BCP 47 tag that should match iw.
The browser does render the zh_gb_oxendict paragraph in boldface, even though it is not a valid BCP 47 tag (and thus should not match zh).
The browser does not render the en-gb-oed and i-navajo paragraphs in boldface, even though they are valid (grandfathered) BCP 47 tags.
The browser does not render the x-lojban paragraph in boldface, even though it is a valid BCP 47 private tag.

Expected results:

Each paragraph should be rendered as its text says.
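
The syntax of the tags listed under "Actual results" can be checked with a small Python sketch. It assumes the `langtag`, `privateuse` and irregular grandfathered productions from the ABNF in RFC 5646 section 2.1, and it tests well-formedness (syntax) only, not validity against the IANA subtag registry. The helper name `is_well_formed` is hypothetical.

```python
import re

# Irregular grandfathered tags from RFC 5646, lowercased for comparison.
IRREGULAR = {
    "en-gb-oed", "i-ami", "i-bnn", "i-default", "i-enochian", "i-hak",
    "i-klingon", "i-lux", "i-mingo", "i-navajo", "i-pwn", "i-tao",
    "i-tay", "i-tsu", "sgn-be-fr", "sgn-be-nl", "sgn-ch-de",
}

# langtag / privateuse productions, simplified into one pattern.
LANGTAG = re.compile(
    r"""
    (?:
        (?: [a-z]{2,3} (?:-[a-z]{3}){0,3}           # language plus extlang subtags
          | [a-z]{4}                                # reserved
          | [a-z]{5,8} )                            # registered language subtag
        (?:-[a-z]{4})?                              # script
        (?:-(?:[a-z]{2}|[0-9]{3}))?                 # region
        (?:-(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*    # variants
        (?:-[0-9a-wy-z](?:-[a-z0-9]{2,8})+)*        # extensions
        (?:-x(?:-[a-z0-9]{1,8})+)?                  # trailing private use
      | x(?:-[a-z0-9]{1,8})+                        # private-use-only tag
    )
    """,
    re.VERBOSE | re.IGNORECASE,
)


def is_well_formed(tag: str) -> bool:
    """Return True if `tag` is syntactically a BCP 47 language tag."""
    return tag.lower() in IRREGULAR or LANGTAG.fullmatch(tag) is not None
```

With this sketch, `iw-ase-jpan-basiceng`, `en-gb-oed`, `i-navajo` and `x-lojban` are all accepted, and `zh_gb_oxendict` is rejected because of its underscore separators.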

747.neutron

Reporter

Comment 1

9 months ago

Attached image :lang() tester rendering on GC 117

Rendering on Google Chrome 117 (which I believe is correct).

747.neutron

Reporter

Comment 2

9 months ago

Attached image browserling-screenshot.png

Rendering on FF 113, Windows 10, according to Browserling

747.neutron

Reporter

Comment 3

9 months ago

Attached image browserling-screenshot(1).png

Rendering on FF 114, Windows 10, according to Browserling

BugBot [:suhaib / :marco/ :calixte]

Comment 4

9 months ago

The Bugbug bot thinks this bug should belong to the 'Core::Internationalization' component, and is moving the bug to that component. Please correct in case you think the bot is wrong.

Component: Untriaged → Internationalization

Product: Firefox → Core

Tom S [:evilpie]

Updated

9 months ago

Component: Internationalization → CSS Parsing and Computation

Keywords: regression

Regressed by: 1121792

Version: Firefox 118 → Firefox 114

BugBot [:suhaib / :marco/ :calixte]

Comment 5

9 months ago

:jfkthame, since you are the author of the regressor, bug 1121792, could you take a look? Also, could you set the severity field?

For more information, please visit BugBot documentation.

Flags: needinfo?(jfkthame)

ESR Uplift Approval Request