Closed Bug 1685075 Opened 5 years ago Closed 4 years ago

localeCompare behaves differently in C.utf8 locale

Tracking

Status: RESOLVED FIXED

Milestone: 91 Branch

Tracking Flags:

Release          Tracking   Status
firefox-esr78    ---        unaffected
firefox84        ---        unaffected
firefox85        ---        wontfix
firefox86        ---        wontfix
firefox87        ---        wontfix
firefox88        ---        wontfix
firefox89        ---        wontfix
firefox90        ---        wontfix
firefox91        ---        fixed

People

(Reporter: marusak.matej, Assigned: anba)

References

(Regression)

Details

(Keywords: regression)

Attachments

(3 files)

Bug 1685075 - Part 1: Replace black/white-list in ICU data filter file. r=zbraniecki! (André Bargull [:anba], 4 years ago; 48 bytes, text/x-phabricator-request)
Bug 1685075 - Part 2: Remove "en-US-posix" locale from ICU data file. r=zbraniecki! (André Bargull [:anba], 4 years ago; 48 bytes, text/x-phabricator-request)
Bug 1685075 - Part 3: Use the actual supported locale when computing the default locale. r=zbraniecki! (André Bargull [:anba], 4 years ago; 48 bytes, text/x-phabricator-request)

marusak.matej

Reporter

Description

•

5 years ago

User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:83.0) Gecko/20100101 Firefox/83.0

Steps to reproduce:

I am using the current firefox-nightly (I have it symlinked to 'firefox' in the following examples). This started to happen a few weeks ago. Unfortunately, I cannot point to the exact version.

$ LC_ALL=C.utf8 firefox
"Virtio block device".localeCompare("Virtio SCSI")
1

$ LC_ALL=en_US.UTF-8 firefox
"Virtio block device".localeCompare("Virtio SCSI")
-1

In Chrome it is always -1.

Passing { sensitivity: 'base' } to localeCompare does not change the behavior.

I am on Fedora 33.

Actual results:

The result differs depending on the locale, and it differs from what other browsers return.

Expected results:

It should always be -1, regardless of the locale.
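A minimal sketch of how a caller could sidestep the environment's default locale, assuming the page is free to name a locale explicitly: passing "en-US" to localeCompare (or to Intl.Collator) avoids consulting LC_ALL altogether. The variable names are illustrative only; comment 2 in this bug reports -1 for an "en-US" collator in both Firefox and Chrome.

```javascript
// Pin the collation locale instead of relying on the process environment.
const left = "Virtio block device";
const right = "Virtio SCSI";

// Explicit locale argument: the result no longer depends on LC_ALL / LANG.
const pinned = left.localeCompare(right, "en-US");
console.log(pinned < 0); // true: "b" sorts before "S" under en-US rules

// The same comparison through a reusable collator object.
const collator = new Intl.Collator("en-US", { sensitivity: "base" });
console.log(collator.compare(left, right) < 0); // true
```

Reusing one Intl.Collator instance is usually preferable when sorting many strings, since the locale is resolved only once.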

Tom S. (please needinfo tschuster)

Updated

•

5 years ago

Component: Untriaged → JavaScript: Standard Library

Product: Firefox → Core

André Bargull [:anba]

Assignee

Comment 1

•

5 years ago

This is a regression from bug 1635561.

STR:

1. Run LC_ALL=C.UTF-8 mozregression
2. Open dev-console and evaluate Intl.Collator().resolvedOptions().locale

Before bug 1635561, this returned "en-US", but now it is returning "en-US-posix".

mozregression output:

12:29.00 INFO: No more integration revisions, bisection finished.
12:29.00 INFO: Last good revision: 1ce1ac399abc56ace9d4dd63190dcd3cf897e59a
12:29.00 INFO: First bad revision: 47eb8c778c414b89da6f59c092531252412e7fcb
12:29.00 INFO: Pushlog:
https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=1ce1ac399abc56ace9d4dd63190dcd3cf897e59a&tochange=47eb8c778c414b89da6f59c092531252412e7fcb

Status: UNCONFIRMED → NEW

Component: JavaScript: Standard Library → Internationalization

Ever confirmed: true

Regressed by: 1635561

BMO Automation

Updated

•

5 years ago

Has Regression Range: --- → yes

Dan Minor [:dminor]

Comment 2

•

5 years ago

I ran a quick comparison between Firefox and Chrome:

let a = Intl.Collator("en-US");
let b = Intl.Collator("en-US-posix");
a.compare("Virtio block device", "Virtio SCSI") // -1 in both Chrome and Firefox
b.compare("Virtio block device", "Virtio SCSI") // -1 in Chrome, 1 in Firefox
a.resolvedOptions().locale // "en-US" in both Chrome and Firefox
b.resolvedOptions().locale // "en-US" in Chrome, "en-US-posix" in Firefox

Assignee: nobody → dminor

Severity: -- → S3

Zibi Braniecki [:zbraniecki][:gandalf]

Comment 3

•

5 years ago

Is it V8, us, or implementation-specific behavior?

Flags: needinfo?(andrebargull)

André Bargull [:anba]

Assignee

Comment 4

•

5 years ago

V8/Chrome doesn't ship "en-US-posix", so any request for it will always return the "en-US" fallback. I guess https://phabricator.services.mozilla.com/D98390 changed our behaviour, but I can't tell if the old or the new behaviour is more correct.

ICU changes "C" to "en-US-posix" in uprv_getDefaultLocaleID(), so at least from ICU's side using "en-US-posix" is correct. The collation difference happens because "en-US-posix" uses different rules, cf. https://searchfox.org/mozilla-central/source/intl/icu/source/data/coll/en_US_POSIX.txt.
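To illustrate why the two collations disagree, here is a small sketch. It assumes the en_US_POSIX tailoring orders ASCII characters roughly by code point (uppercase before lowercase); that is an assumption about the linked rules file, not something stated in the comment. The helper name compareByCodeUnit is hypothetical.

```javascript
// Compare two strings by UTF-16 code unit, which is what the "<" and ">"
// operators do for strings. Used here as a stand-in for POSIX-style ordering.
function compareByCodeUnit(a, b) {
  if (a < b) return -1;
  if (a > b) return 1;
  return 0;
}

const posixLeft = "Virtio block device";
const posixRight = "Virtio SCSI";

// The strings first differ at index 7: "b" (U+0062) versus "S" (U+0053).
console.log(compareByCodeUnit(posixLeft, posixRight)); // 1

// A language-aware collator compares letters before case, so "b" < "S".
console.log(posixLeft.localeCompare(posixRight, "en-US") < 0); // true
```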

Flags: needinfo?(andrebargull)

Zibi Braniecki [:zbraniecki][:gandalf]

Comment 5

•

5 years ago

Ok, so it seems to me like our behavior is correct. The only wiggle room I see is:

- Do we want to ship en-US-posix CLDR data?
- Should we read C as en-US-posix just because ICU does?

BugBot [:suhaib / :marco/ :calixte]

Updated

•

5 years ago

Keywords: regression

BugBot [:suhaib / :marco/ :calixte]

Comment 6

•

5 years ago

Set release status flags based on info from the regressing bug 1635561

status-firefox84: --- → unaffected

status-firefox85: --- → affected

status-firefox86: --- → affected

status-firefox-esr78: --- → unaffected

Julien Cristau [:jcristau]

Updated

•

5 years ago

status-firefox85: affected → fix-optional

André Bargull [:anba]

Assignee

Comment 7

•

5 years ago

Hmm, ICU canonicalises "en-US-posix" to "en-US-u-va-posix" (cf. Intl.getCanonicalLocales("en-us-posix") in V8/JSC), even though there's no variant mapping for "posix" in https://github.com/unicode-org/cldr/blob/master/common/supplemental/supplementalMetadata.xml.

So when Intl.Collator("en-US-posix") is called, "en-US-posix" is first canonicalised through CanonicalizeLocaleList (which results in "en-US-u-va-posix" in V8/JSC). Then, when LookupMatcher searches for an available locale, any Unicode extension sequences are removed (which means "en-US-u-va-posix" is changed to "en-US" in V8/JSC). So Intl.Collator("en-US-posix") doesn't use the "en-US-posix" locale in V8, because V8 doesn't ship "en-US-posix". And it also doesn't work in JSC, because JSC calls ICU canonicalisation functions which make it impossible to select "en-US-posix".
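The two steps described above (canonicalise, then drop the Unicode extension sequence) can be sketched as follows. stripUnicodeExtension is a hypothetical helper using a simplified regular expression that ignores private-use edge cases; it is not the engine's actual implementation. The output of Intl.getCanonicalLocales differs between engines, so no expected value is given for it.

```javascript
// Remove the first Unicode extension sequence ("-u-" plus its 2-8 character
// subtags) from a language tag. Simplified: does not special-case "-x-" tags.
function stripUnicodeExtension(tag) {
  return tag.replace(/-u(?:-[a-z0-9]{2,8})+/i, "");
}

// Step 1: canonicalisation. The result is engine-dependent, per this comment.
const canonicalTag = Intl.getCanonicalLocales("en-us-posix")[0];
console.log(canonicalTag);

// Step 2: extension removal, as done before matching available locales.
console.log(stripUnicodeExtension("en-US-u-va-posix")); // "en-US"
console.log(stripUnicodeExtension("en-US-posix"));      // "en-US-posix"
```

If canonicalisation produces "en-US-u-va-posix", step 2 leaves only "en-US", which is why the POSIX data cannot be reached in that case.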

Maybe the Intl.getCanonicalLocales("en-us-posix") case should go into test262. This will cause test errors in V8/JSC, which may encourage someone to fix this case in ICU... :-)

Zibi Braniecki [:zbraniecki][:gandalf]

Comment 8

•

5 years ago

Thank you for the analysis!

I reported it in https://github.com/tc39/test262/issues/2928 and pending their resolution we'll likely close this bug.

Zibi Braniecki [:zbraniecki][:gandalf]

Comment 9

•

5 years ago

Andre - from the upstream ticket it seems that en-US-posix canonicalization should lead to en-US-u-va-posix according to LDML, and that this is not just an ICU4C implementation detail.

Would you agree that it means that our implementation is not performing full canonicalization?

Flags: needinfo?(andrebargull)

André Bargull [:anba]

Assignee

Comment 10

•

5 years ago

I don't think https://unicode.org/reports/tr35/#Legacy_Variants applies for "Unicode BCP 47 locale identifiers", but instead only for older locale identifier syntaxes. In the test262 ticket, you mentioned:

[...] https://unicode.org/reports/tr35/#Canonical_Unicode_Locale_Identifiers which calls https://unicode.org/reports/tr35/#Legacy_Variants .

But I don't see any reference to "3.8.2 Legacy Variants" in "3.2.1 Canonical Unicode Locale Identifiers". And I also don't see it mentioned in Annex C. LocaleId Canonicalization.

Therefore I still think the correct canonicalisation (in an ECMA-402 context) for en-US-posix is en-US-posix.

Flags: needinfo?(andrebargull)

Zibi Braniecki [:zbraniecki][:gandalf]

Comment 11

•

5 years ago

My mistake. I also cannot find a reference to 3.8.2 from 3.2.1. Reported upstream.

Zibi Braniecki [:zbraniecki][:gandalf]

Updated

•

5 years ago

Component: Internationalization → JavaScript: Internationalization API

Jason Orendorff [:jorendorff]

Updated

•

5 years ago

Priority: -- → P2

Pascal Chevrel:pascalc

Updated

•

5 years ago

status-firefox85: fix-optional → wontfix

status-firefox86: affected → wontfix

status-firefox87: --- → affected

Julien Cristau [:jcristau]

Updated

•

5 years ago

status-firefox87: affected → wontfix

status-firefox88: --- → affected

Tim Spurway [:tspurway]

Updated

•

5 years ago

status-firefox88: affected → wontfix

status-firefox89: --- → fix-optional

Zibi Braniecki [:zbraniecki][:gandalf]

Comment 12

•

4 years ago

We now have CLDR consensus - https://unicode-org.atlassian.net/browse/CLDR-14487 - LDML will get updated to apply legacy variants during canonicalization.

André Bargull [:anba]

Assignee

Comment 13

•

4 years ago

Okay, if the resolution is to canonicalise "en-US-posix" to "en-US-u-va-posix", we should simply strip "en-US-posix" from the ICU data file, because "en-US-u-va-posix" can never be selected from Intl service constructors.

From https://tc39.es/ecma402/#sec-internal-slots:

[[AvailableLocales]] is a List that contains structurally valid (6.2.2) and canonicalized (6.2.3) Unicode BCP 47 locale identifiers [...]. Language tags on the list must not have a Unicode locale extension sequence. [...]

Because elements in [[AvailableLocales]] mustn't have Unicode locale extension sequences, like for example "u-va-posix", the input "en-US-u-va-posix" can never be resolved from LookupMatcher and BestFitMatcher and therefore it doesn't make sense to ship the data for it.
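A rough sketch of the lookup described above, under stated assumptions: lookupMatcherSketch is a simplified, hypothetical rendering of the ECMA-402 LookupMatcher idea (strip the extension, then truncate subtags until a match is found), and the set of shipped locales is invented for illustration.

```javascript
// Simplified removal of a Unicode extension sequence; hypothetical helper.
function removeUnicodeExtension(tag) {
  return tag.replace(/-u(?:-[a-z0-9]{2,8})+/i, "");
}

// Return the first available locale reachable from a requested tag by
// dropping trailing subtags, or the default when nothing matches.
function lookupMatcherSketch(availableLocales, requestedLocales, defaultLocale) {
  for (const requested of requestedLocales) {
    let candidate = removeUnicodeExtension(requested);
    while (candidate !== "") {
      if (availableLocales.has(candidate)) return candidate;
      let pos = candidate.lastIndexOf("-");
      if (pos === -1) break;
      // Also drop a singleton subtag (such as "-x") left at the end.
      if (pos >= 2 && candidate[pos - 2] === "-") pos -= 2;
      candidate = candidate.slice(0, pos);
    }
  }
  return defaultLocale;
}

// Hypothetical data set that still ships the POSIX variant.
const shipped = new Set(["en", "en-US", "en-US-posix"]);

console.log(lookupMatcherSketch(shipped, ["en-US-u-va-posix"], "en")); // "en-US"
console.log(lookupMatcherSketch(shipped, ["en-US-posix"], "en"));      // "en-US-posix"
```

Once the canonical form is "en-US-u-va-posix", the first call is the only path a request can take, so the "en-US-posix" entry in the set is never selected.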

André Bargull [:anba]

Assignee

Comment 14

•

4 years ago

Zibi, do you agree with the plan to remove "en-US-posix" from the ICU data file?

Flags: needinfo?(zbraniecki)

Zibi Braniecki [:zbraniecki][:gandalf]

Comment 15

•

4 years ago

> Zibi, do you agree with the plan to remove "en-US-posix" from the ICU data file?

Yes. I'm comfortable with it. The data seems to be mostly confusing users and causing web compat issues since other browsers do not ship it.

Flags: needinfo?(zbraniecki)

André Bargull [:anba]

Assignee

Comment 16

•

4 years ago

Attached file Bug 1685075 - Part 1: Replace black/white-list in ICU data filter file. r=zbraniecki!

Replace "whitelist" and "blacklist" with "includelist" and "excludelist",
respectively, because the latter are now the preferred names in ICU and the
ICU docs/examples all use the new names.

André Bargull [:anba]

Assignee

Comment 17

•

4 years ago

Attached file Bug 1685075 - Part 2: Remove "en-US-posix" locale from ICU data file. r=zbraniecki!

The filter file doesn't support exclusion lists for the "locales" filter type
(https://github.com/unicode-org/icu/blob/main/docs/userguide/icu_data/buildtool.md#filtering-by-locale),
therefore we have to manually exclude "en-US-posix" from the relevant resource
types: "en-US-posix" data is only present for collation, locales, and break
iteration. Break iteration is already completely stripped from the data file,
so we don't need to change anything on that front.

The string must be "en_US_POSIX" to match the resource file name, also see
https://unicode-org.atlassian.net/browse/ICU-21400.

Depends on D117975

André Bargull [:anba]

Assignee

Comment 18

•

4 years ago

Attached file Bug 1685075 - Part 3: Use the actual supported locale when computing the default locale. r=zbraniecki!

This change ensures we don't report "en-US-posix" as the default locale when
LANG=C is set by the user, because that could be confusing after part 2.

The current rules about selecting the appropriate default locale were last
discussed in https://bugzilla.mozilla.org/show_bug.cgi?id=1175347. The
preference in that bug was to accept every part of the default locale as long
as there's a possible fallback locale. For example when the user locale is
"de-ZA", which can be supported through the fallback to "de", "de-ZA" as a whole
is accepted. But "de-ZA" is not accepted when the default locale is for example
just "de".

The test cases were adapted to use a locale which has multiple subtags and which
has only partial support in Intl.Collator: Intl.Collator only natively
supports "az", but not "az-Cyrl-AZ". "az-Cyrl-AZ" is completely supported by all
other Intl service constructors.
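The selection rule described above can be sketched as follows. This is a hypothetical model and not the SpiderMonkey code: resolveDefaultLocale, the reportMatched flag, the locale sets, and the "en-GB" last-ditch value are all assumptions made for illustration. With reportMatched set to false it models accepting the whole locale when a fallback exists; with true it models reporting the locale that is actually supported.

```javascript
// Walk from the full tag towards shorter prefixes until one is available.
// reportMatched = false: keep the whole candidate (rule from bug 1175347).
// reportMatched = true: report the prefix that was actually found.
function resolveDefaultLocale(candidate, availableLocales, lastDitch, reportMatched) {
  let prefix = candidate;
  while (prefix !== "") {
    if (availableLocales.has(prefix)) {
      return reportMatched ? prefix : candidate;
    }
    const pos = prefix.lastIndexOf("-");
    prefix = pos === -1 ? "" : prefix.slice(0, pos);
  }
  return lastDitch;
}

// Hypothetical list of supported locales after "en-US-posix" is removed.
const supported = new Set(["az", "de", "en", "en-US"]);

console.log(resolveDefaultLocale("de-ZA", supported, "en-GB", false));       // "de-ZA"
console.log(resolveDefaultLocale("en-US-posix", supported, "en-GB", false)); // "en-US-posix"
console.log(resolveDefaultLocale("en-US-posix", supported, "en-GB", true));  // "en-US"
console.log(resolveDefaultLocale("tlh", supported, "en-GB", true));          // "en-GB"
```

The second and third calls show the motivation given in this comment: accepting the whole tag would keep reporting "en-US-posix" even though no such data is shipped any more.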

Depends on D117976

Dan Minor [:dminor]

Comment 19

•

4 years ago

:anba, thanks for the patches!

Assignee: dminor → andrebargull

Pulsebot

Comment 20

•

4 years ago

Pushed by andre.bargull@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/38fcef1d6c87
Part 1: Replace black/white-list in ICU data filter file. r=zbraniecki
https://hg.mozilla.org/integration/autoland/rev/44cf438c40fd
Part 2: Remove "en-US-posix" locale from ICU data file. r=zbraniecki
https://hg.mozilla.org/integration/autoland/rev/8e44d65bbe08
Part 3: Use the actual supported locale when computing the default locale. r=zbraniecki

Dorel Luca [:dluca]

Comment 21

•

4 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/38fcef1d6c87
https://hg.mozilla.org/mozilla-central/rev/44cf438c40fd
https://hg.mozilla.org/mozilla-central/rev/8e44d65bbe08

Status: NEW → RESOLVED

Closed: 4 years ago

status-firefox91: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → 91 Branch

BugBot [:suhaib / :marco/ :calixte]

Comment 22

•

4 years ago

Since the statuses are different for nightly and release, what's the status for beta?
For more information, please visit auto_nag documentation.

status-firefox90: --- → ?

André Bargull [:anba]

Assignee

Updated

•

4 years ago

status-firefox89: fix-optional → wontfix

status-firefox90: ? → wontfix
