1614941 - Upper Sorbian Month/Year format shows year first instead of last for DateTimeFormat.format()

Assignee

Description

5 years ago

Michael reported this on the .l10n newsgroup, and I'm not sure if this is an Intl API bug or a CLDR bug.

new Intl.DateTimeFormat(['hsb'], {month:'long', year:'numeric'}).format(new Date())
"2020 februara"

but the expected result would be "februara 2020".

I'm not sure if this is a problem of picking an unfortunate pattern for "just month and year", or of missing/bad data in CLDR. I also don't know what ICU does if a pattern isn't there.

Starting with a bug here. What pattern do we use, and which CLDR data point do we end up using in this situation?
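
One way to see which field order the engine resolved, without guessing at the underlying pattern, is to look at the parts it emits. A minimal sketch: `fieldOrder` is a hypothetical helper name, and the result for `hsb` depends on the ICU/CLDR data of the build it runs in.

```javascript
// Hypothetical helper: list the non-literal fields in the order the
// formatter emits them for the given locale and options.
function fieldOrder(locale, options, date) {
  const formatter = new Intl.DateTimeFormat(locale, { ...options, timeZone: "UTC" });
  return formatter
    .formatToParts(date)
    .filter((part) => part.type !== "literal")
    .map((part) => part.type);
}

const sample = new Date(Date.UTC(2020, 1, 15));
// On an affected build the year is listed before the month for "hsb".
console.log(fieldOrder("hsb", { month: "long", year: "numeric" }, sample));
```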

Michael Wolf

Comment 1

5 years ago

Attached image: hamburger_whatsnew.png

Zibi Braniecki [:zbraniecki][:gandalf]

Comment 2

5 years ago

CLDR: https://github.com/unicode-cldr/cldr-dates-full/blob/master/main/hsb/ca-gregorian.json#L333

As you can see, for narrow and short, the order is correct, but for long it is reversed:

"yM": "M.y",
"yMMM": "MMM y",
"yMMMM": "y MMMM",

CLDR issues can be reported at https://unicode-org.atlassian.net/projects/CLDR/issues
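
For orientation, the keys in that data (`yM`, `yMMM`, `yMMMM`) are skeletons: the requested fields and their widths, without order or punctuation. A rough sketch of how month/year options could correspond to such a key follows. `toSkeleton` is a hypothetical helper covering only year, month and day; it is an assumption for illustration, not the mapping any engine actually implements.

```javascript
// Simplified, assumed mapping from Intl-style options to skeleton letters.
const MONTH_WIDTHS = { numeric: "M", "2-digit": "MM", short: "MMM", long: "MMMM", narrow: "MMMMM" };

function toSkeleton(options) {
  let skeleton = "";
  if (options.year) skeleton += options.year === "2-digit" ? "yy" : "y";
  if (options.month) skeleton += MONTH_WIDTHS[options.month];
  if (options.day) skeleton += options.day === "2-digit" ? "dd" : "d";
  return skeleton;
}

console.log(toSkeleton({ month: "long", year: "numeric" }));  // yMMMM
console.log(toSkeleton({ month: "short", year: "numeric" })); // yMMM
```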

Axel Hecht [:Pike]

Assignee

Comment 3

5 years ago

Well, the json is not data we're using to ship. I have a hard time finding any data definitions in the xml or txt files for yMMMM that's not an interval format.

Which is why I asked about our data, and which skeleton we build, and which of our data points we actually end up using.

Zibi Braniecki [:zbraniecki][:gandalf]

Comment 4

5 years ago

> Well, the json is not data we're using to ship.

We ship compressed data from ICU storage in https://searchfox.org/mozilla-central/source/intl/icu/source/data/locales

> I have a hard time finding any data definitions in the xml or txt files for yMMMM that's not an interval format.

Yep, same here. My guess is that since CLDR's yMMMM skeleton matches root's skeleton's pattern, the ICU just cut it out to save space: https://searchfox.org/mozilla-central/source/intl/icu/source/data/locales/root.txt#1080

If hsb's yMMMM skeleton would differ, it'd be stored in https://searchfox.org/mozilla-central/source/intl/icu/source/data/locales/hsb.txt#336

Jeff Beatty [:gueroJeff]

Comment 5

5 years ago

Can we overwrite those files locally to ship the correct format while we wait for Unicode to respond? What is the typical response time for resolving CLDR issues like this? If we can't use local changes, what do we do about this bug while we wait for Unicode to act? I understand needing to submit these changes upstream, but this affects the quality of our localized user experiences, so we should have a vested interest in a workaround while we wait for upstream changes to be made.

Zibi Braniecki [:zbraniecki][:gandalf]

Comment 6

5 years ago

ICU allows for local overrides. We have never done this and don't really have a procedure for it, so I'm not sure how much work it is (do we need build system changes?). I've been advocating for an investment in alignment between CLDR and Gecko, and I suspect that this will become even more pronounced once we land Intl.DisplayNames and try to use them for language selectors.

Zibi Braniecki [:zbraniecki][:gandalf]

Comment 7

5 years ago

As for response time: I'd expect it to be very quick since the fix is trivial, but the release cycle is 6 months and, as far as I'm aware, we have never actually contributed data to CLDR. Flod was looking into it once through their survey tool, which I think he has access to and can use to contribute the data, but I'm not sure if he got to the point where he's familiar with the process, and I don't think anyone else is.

Jonathan Kew [:jfkthame]

Comment 8

5 years ago

(In reply to Zibi Braniecki [:zbraniecki][:gandalf] from comment #4)

> > Well, the json is not data we're using to ship.
>
> We ship compressed data from ICU storage in https://searchfox.org/mozilla-central/source/intl/icu/source/data/locales
>
> > I have a hard time finding any data definitions in the xml or txt files for yMMMM that's not an interval format.
>
> Yep, same here. My guess is that since CLDR's yMMMM skeleton matches root's skeleton's pattern, the ICU just cut it out to save space: https://searchfox.org/mozilla-central/source/intl/icu/source/data/locales/root.txt#1080
>
> If hsb's yMMMM skeleton would differ, it'd be stored in https://searchfox.org/mozilla-central/source/intl/icu/source/data/locales/hsb.txt#336

Right, it'll inherit from the root locale if there's no more specific override.

The charts at https://www.unicode.org/cldr/charts/36/by_type/date_&_time.gregorian.html#21dded0fd50ba37e confirm that the expected pattern for "yMMMM" would be "y MMMM", which is the default for "all others" in the chart here; there's no exception for "hsb".
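
The inheritance described here can be sketched as a lookup that walks from the locale to its parent until a pattern is found. This is a toy model using only the patterns quoted in this bug; `AVAILABLE_FORMATS`, `PARENT` and `resolvePattern` are hypothetical names, and real ICU lookup also does skeleton matching that is not modelled here.

```javascript
// Toy data using only the patterns quoted in this bug; real ICU data is
// much larger and stored in resource bundles, not plain objects.
const AVAILABLE_FORMATS = {
  root: { yMMMM: "y MMMM" },
  hsb: { yM: "M.y", yMMM: "MMM y" },
};
// Assumed parent chain: hsb falls back directly to root.
const PARENT = { hsb: "root" };

function resolvePattern(locale, skeleton) {
  for (let current = locale; current; current = PARENT[current]) {
    const pattern = AVAILABLE_FORMATS[current]?.[skeleton];
    if (pattern !== undefined) return { pattern, source: current };
  }
  return undefined;
}

console.log(resolvePattern("hsb", "yMMM"));  // found in hsb itself
console.log(resolvePattern("hsb", "yMMMM")); // inherited from root
```

With this data, `yMMMM` for `hsb` resolves to root's "y MMMM", which matches the year-first output reported in the description.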

Michael Wolf

Comment 9

5 years ago

Hm, yes, it should be "MMMM y" and not "y MMMM". dsb is not mentioned under "MMMM y", either. I cannot check it because I don't use the Lower Sorbian Firefox regularly, so no fingerprinters have been collected yet.

Jonathan Kew [:jfkthame]

Comment 10

5 years ago

According to https://st.unicode.org/cldr-apps/v#/hsb/Gregorian/21dded0fd50ba37e, this data item shows as "missing" for Upper Sorbian, which I assume means it just hasn't been submitted to CLDR (as opposed to a specific value being present, but not matching what's requested here).

So submitting the proper format upstream is the primary thing that the localization community should do here. Whether we can implement a local override in the meantime is a further question to consider.

As a crude but immediate fix, can the hsb team work around this by replacing the cfr-whatsnew-tracking-blocked-subtitle string in the .ftl file with something like

Since { DATETIME($earliestDate, month: "long") } { DATETIME($earliestDate, year: "numeric") }

to force the desired order by generating the two parts separately?
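
For comparison, the same idea expressed against the JavaScript API rather than Fluent might look like the sketch below. `formatMonthThenYear` is a hypothetical helper; note that a month formatted on its own may come out in a different grammatical form than it would inside a combined pattern, so the output would need checking by a native speaker.

```javascript
// Hypothetical helper mirroring the two-placeable Fluent workaround:
// format month and year independently, then fix the order by hand.
function formatMonthThenYear(locale, date) {
  const month = new Intl.DateTimeFormat(locale, { month: "long", timeZone: "UTC" }).format(date);
  const year = new Intl.DateTimeFormat(locale, { year: "numeric", timeZone: "UTC" }).format(date);
  return `${month} ${year}`;
}

console.log(formatMonthThenYear("hsb", new Date(Date.UTC(2020, 1, 15))));
```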

Axel Hecht [:Pike]

Assignee

Comment 11

5 years ago

We already apply patches to ICU, and IMHO we can also patch the data for hsb and dsb.

We should do that instead of trying to work around the CLDR data in ways that are hard to maintain down the road, like manually hacking date formats.

Michael Wolf

Comment 12

5 years ago

I've created an Atlassian account, so I could file an issue on the CLDR issues page mentioned above. While I was at it, I searched the issues and found the name of the Sorbian user who submitted the data to CLDR. I can contact him.

Jeff Walden [:Waldo]

Comment 13

5 years ago

The way we patch ICU is with separate patch files. But we store all the CLDR locale data in the tree in pre-assembled format, in an ICU .dat file in config/external/icu/data. If we were to start patching CLDR inputs, we would need a local-patch process separate from the one we use for ICU, and we'd need to do some work to integrate it into our update-ICU process (which currently consists of some roughly manual commands run periodically) to avoid these things being overwritten.

ICU and CLDR may be bits in the same area, but the extra .dat file generation step means patching ICU is pretty simple to do, while patching CLDR is much less simple. I already push back on us taking ICU patches of any meaningful complexity, and frankly patching CLDR is definitely more complexity than any simple C++ patch-file. That could be changed. But it would require work that is not at all the same difficulty as patching ICU right now.

Jonathan Kew [:jfkthame]

Comment 14

5 years ago

(In reply to Axel Hecht [:Pike] from comment #11)

> We already apply patches to ICU, and IMHO we can also patch the data for hsb and dsb.
>
> We should do that instead of trying to work around the CLDR data in ways that are hard to maintain down the road, like manually hacking date formats.

Given Jeff's comment, I think we should give real consideration to the option of "manually hacking [the] date format" to work around this, assuming it can be done as suggested within the hsb localization. Yes, it's not a great solution from the point of view of scalability or maintainability, but it has the merit of being simple and highly localized, and carries minimal risk (either now or in the future). And it would be entirely within the control of the localization team; it doesn't introduce any other dependency or bottleneck in the process.

Axel Hecht [:Pike]

Assignee

Comment 15

5 years ago

I've given my idea a try, gonna push that and let us check out the build.

Michael, thanks for starting the work to get changes into CLDR, would you mind opening an issue there so that we can link to it here? The patch I'm about to upload might also have hints on which data we'd like to add.

Jonathan, my take on what's simple and highly localized differs from yours. We're already talking about two localizations, and we're talking about inspecting each variable that's not explicitly formatted. As there might be a preformatted fluent value passed in, this can be anything. So this is a lot of locations, in two localizations, and no decent support from Pontoon to do that.

Axel Hecht [:Pike]

Assignee

Comment 16

5 years ago

Attached file: Bug 1614941, add more date formats to dsb and hsb, r=jwalden

Axel Hecht [:Pike]

Assignee

Comment 17

5 years ago

https://phabricator.services.mozilla.com/D62732 is what I have. I also managed to create the dat file from it, but sadly phabricator doesn't want me to store that file.

I've taken inspiration on which entries to add from de.txt, which is at least geographically close, but took the values from neighboring values in dsb/hsb. Michael, if you could take a look at those?

PS: intl/icu_sources_data.py doesn't work on Macs, because --output-sync is not supported by BSD make.

Axel Hecht [:Pike]

Assignee

Comment 18

5 years ago

Did a local build in the meantime, and that shows the expected result for new Intl.DateTimeFormat(['hsb'], {month:'long', year:'numeric'}).format(new Date())

Michael Wolf

Comment 19

5 years ago

I filed issue https://unicode-org.atlassian.net/projects/CLDR/issues/?filter=reportedbyme on CLDR and wrote an e-mail to the Sorbian user asking him to complete the missing date/time formats.

Axel Hecht [:Pike]

Assignee

Comment 20

5 years ago

The issue URL is https://unicode-org.atlassian.net/browse/CLDR-13580.

See Also: → https://unicode-org.atlassian.net/browse/CLDR-13580

Michael Wolf

Comment 21

5 years ago

@Axel It seems that all formats on Phabricator are OK. But what does Q stand for, quarter? And E is for weekday, isn't it? Most formats are the same as in German. The most important difference is that the full month name is in the genitive.

Jonathan Kew [:jfkthame]

Comment 22

5 years ago

I guess patching files under intl/icu/source/data/locales/ in our tree won't help on distros that compile --with-system-icu, which I think was a supported option last time I looked.

Michael Wolf

Comment 23

5 years ago

Thanks for the corrected issue URL, Axel.

Axel Hecht [:Pike]

Assignee

Comment 24

5 years ago

(In reply to Jonathan Kew (:jfkthame) from comment #22)

> I guess patching files under intl/icu/source/data/locales/ in our tree won't help on distros that compile --with-system-icu, which I think was a supported option last time I looked.

Yeah, system-icu won't have those fixes. OTOH, we don't know what system ICU has data-wise at all, I guess?

With more CLDR data being hooked into Rust impls, and also with Flatpak/snaps for distros, maybe it's OK to let that option go away?
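
From script, about the only thing that can be probed is whether the running engine reports any date-format data for a locale at all. A small sketch with a hypothetical `hasDateTimeLocale` helper; it cannot distinguish a patched build from an unpatched one, or tell which CLDR version is in use.

```javascript
// Returns true if the runtime reports date-format support for the locale.
// This only signals that some data exists, not which patterns are present.
function hasDateTimeLocale(locale) {
  return Intl.DateTimeFormat.supportedLocalesOf([locale]).length > 0;
}

console.log(hasDateTimeLocale("hsb"));
```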

Zibi Braniecki [:zbraniecki][:gandalf]

Comment 25

5 years ago

> With more CLDR data being hooked into rust impls,

We are discussing ways to coordinate CLDR data in Gecko for Rust impls in bug 1613271.

As for the patch itself - it may be worth investigating using file substitution instead of patches here: https://github.com/unicode-org/icu/blob/master/docs/userguide/icu_data/buildtool.md#file-substitution

This would open up a way for us to locally add new locales, and manage our overrides as full resources.

Axel Hecht [:Pike]

Assignee

Comment 26

5 years ago

Is there a way to do partial overrides at build time? The nice thing about just a patch is that we'll get updates from upstream easily. At least many of them.

Zibi Braniecki [:zbraniecki][:gandalf]

Comment 27

5 years ago

> Is there a way to do partial overrides at build time?

Not sure!

Steven - can you advise? Is there a way to just add a couple patterns instead of switching to maintain our own fork of the hsb gregorian.txt file?

Flags: needinfo?(srl)

Michael Wolf

Comment 28

5 years ago

The Sorbian user replied. He wrote that new data won't be included before CLDR 38, which will probably be released in October. Data submission will start in April. He will get back to me when it's nearer the time.

Phabricator Automation

Updated

5 years ago

Assignee: nobody → l10n

Status: NEW → ASSIGNED

Axel Hecht [:Pike]

Assignee

Comment 29

5 years ago

It seems that ICU updated w/out looking at this. I see no reaction at all on the ticket there, sadly.

On a positive note, my original patch was for 65, and we're now at 67.1, and the patch rebased w/out any issues at all.

Pulsebot

Comment 30

5 years ago

Pushed by axel@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/aea74a39c13a add more date formats to dsb and hsb, r=jwalden

Atila Butkovits

Comment 31

5 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/aea74a39c13a

Status: ASSIGNED → RESOLVED

Closed: 5 years ago

status-firefox80: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla80

André Bargull [:anba]

Comment 32

5 years ago

Hi Michael,

CLDR 38 will add its own date-time patterns for the skeletons "MMMMd" and "yMMMM":

| Skeleton | CLDR 37 (ICU 67) | CLDR 38 (ICU 68) | Firefox |
|---|---|---|---|
| MMMMd | MMMM d | d MMMM | d. MMMM |
| yMMMM | y MMMM | LLLL y | MMMM y |

(The pattern symbols are from https://unicode.org/reports/tr35/tr35-dates.html#Date_Field_Symbol_Table, for example "MMMM" means "month in wide format", whereas "LLLL" means "stand-alone month in wide format".)

For example when formatting January 1, 2020, we'll get the following strings:

| Skeleton | CLDR 37 (ICU 67) | CLDR 38 (ICU 68) | Firefox |
|---|---|---|---|
| MMMMd | januara 1 | 1 januara | 1. januara |
| yMMMM | 2020 januara | januar 2020 | januara 2020 |
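
To make the relationship between the two tables concrete, here is a toy renderer that applies each pattern to January 1, 2020. It is only a sketch: `renderPattern` and `HSB_JANUARY` are hypothetical names, only the symbols `y`, `d`, `MMMM` and `LLLL` are handled, and the two month forms are the ones shown in the table above.

```javascript
// Month names for January only, taken from the table above; other months
// are deliberately left out rather than guessed.
const HSB_JANUARY = { MMMM: "januara", LLLL: "januar" };

// Toy renderer: supports just the symbols y, d, MMMM and LLLL.
function renderPattern(pattern, { year, day }) {
  return pattern.replace(/MMMM|LLLL|y|d/g, (symbol) => {
    if (symbol === "y") return String(year);
    if (symbol === "d") return String(day);
    return HSB_JANUARY[symbol];
  });
}

const newYearsDay = { year: 2020, day: 1 };
for (const pattern of ["y MMMM", "LLLL y", "MMMM y", "MMMM d", "d MMMM", "d. MMMM"]) {
  console.log(`${pattern} -> ${renderPattern(pattern, newYearsDay)}`);
}
```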

Do we want to keep our current, customised output format or should we switch to use the standard patterns used in CLDR 38?

Thanks,
André

Flags: needinfo?(milupo)

Michael Wolf

Comment 33

5 years ago

Hi André,

thank you for your reply and help.

Still a question: In which context is the pattern "januara 2020" used? I ask because "januara" is the genitive form of the month name, and the genitive form is only used when a day number precedes it. So "1. januara" is correct, but "januara 2020" should be "januar 2020" (month name in the nominative) if there is no day number.

BTW, I couldn't check the use case from comment 1 again because no dates have appeared in that place since.

Flags: needinfo?(milupo)

André Bargull [:anba]

Comment 34

5 years ago

The JavaScript API only allows specifying the width of the individual date-time components; no other context information can be supplied. It's up to the web page developer to choose the correct options when creating an Intl.DateTimeFormat object (https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Intl/DateTimeFormat).

For example, only this is currently possible:

let date = new Date(2020, 0, 1);
let formatter = new Intl.DateTimeFormat("hsb", {month: "long", year: "numeric"});
console.log(formatter.format(date)); // Firefox output: "januara 2020"

or:

let date = new Date(2020, 0, 1);
let formatter = new Intl.DateTimeFormat("hsb", {month: "long", day: "numeric"});
console.log(formatter.format(date)); // Firefox output: "1. januara"

Michael Wolf

Comment 35

5 years ago

Hm, CLDR 38 would be better here. It distinguishes between format (month name in the genitive) and stand-alone (month name in the nominative) forms. But, to complicate the issue :-), sometimes a preposition is used before the date, as in the use case of comment 1. There the preposition "wot" is used, which requires the genitive. But the issue in comment 1 was only the order within the date, y MMMM instead of MMMM y.

A question: Which rule does e.g. Czech apply? It's similar to Upper Sorbian and Lower Sorbian.

André Bargull [:anba]

Comment 36

5 years ago

Czech uses:

| Skeleton | Pattern | URL |
|---|---|---|
| MMMMd | d. MMMM | https://github.com/unicode-org/cldr/blob/387d0301f0ade6ffa632a18755590671065d9f53/common/main/cs.xml#L4254 |
| yMMMM | LLLL y | https://github.com/unicode-org/cldr/blob/387d0301f0ade6ffa632a18755590671065d9f53/common/main/cs.xml#L4268 |

So Czech uses "M" when "d" is present, otherwise "L" is used.
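
The effect of that M/L split can be observed from script by pulling out just the month text with and without a day in the options. A hedged sketch: `monthText` is a hypothetical helper, and what it prints for `cs` depends on the CLDR data of the engine it runs in.

```javascript
// Hypothetical helper: return the month text the formatter produces for the
// given options, so the form with and without a day can be compared.
function monthText(locale, options, date) {
  const parts = new Intl.DateTimeFormat(locale, { ...options, timeZone: "UTC" }).formatToParts(date);
  const month = parts.find((part) => part.type === "month");
  return month ? month.value : undefined;
}

const jan1 = new Date(Date.UTC(2020, 0, 1));
console.log(monthText("cs", { month: "long", day: "numeric" }, jan1));
console.log(monthText("cs", { month: "long", year: "numeric" }, jan1));
```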

What about the other difference in "MMMMd", where CLDR 38 uses "d MMMM" whereas our custom format currently uses "d. MMMM"? I guess we want to stick with the full stop after "d"? (Is that a bug in CLDR 38? Every other pattern for Sorbian uses full stops as a delimiter: https://github.com/unicode-org/cldr/blob/387d0301f0ade6ffa632a18755590671065d9f53/common/main/hsb.xml#L1154-L1188 and https://github.com/unicode-org/cldr/blob/387d0301f0ade6ffa632a18755590671065d9f53/common/main/dsb.xml#L1163-L1197.)

Steven R. Loomis

Comment 37

2 years ago

(In reply to Zibi Braniecki [:zbraniecki][:gandalf] from comment #27)

> > Is there a way to do partial overrides at build time?
>
> Not sure!
>
> Steven - can you advise? Is there a way to just add a couple patterns instead of switching to maintain our own fork of the hsb gregorian.txt file?

Years later, sorry I missed this.

Flags: needinfo?(srl)
