Closed Bug 1614941 Opened 5 years ago Closed 4 years ago

Upper Sorbian Month/Year format shows year first instead of last for DateTimeFormat.format()

Categories

(Core :: JavaScript: Internationalization API, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla80
Tracking Status
firefox80 --- fixed

People

(Reporter: Pike, Assigned: Pike)

Attachments

(2 files)

Michael reported this on the .l10n newsgroup, and I'm not sure if this is an Intl API bug or a CLDR bug.

new Intl.DateTimeFormat(['hsb'], {month:'long', year:'numeric'}).format(new Date())
"2020 februara"

but the expected result would be "februara 2020".

I'm not sure if this is a problem about picking an unfortunate pattern for "just date and year", or missing/bad data in cldr. I also don't know what ICU does if a pattern isn't there.

Starting with a bug here. What pattern do we use, which cldr data point do we end up using in this situation?
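
One way to check the field order without comparing localized strings is formatToParts(). The following is only a sketch for reproducing the report: the helper name fieldOrder and the fixed sample date are assumptions, not part of the original report, and the result for "hsb" depends on the ICU/CLDR data the engine ships.

```javascript
// Return the order in which the non-literal fields appear in the output.
// `fieldOrder` is a hypothetical helper name used only for this sketch.
function fieldOrder(locale, options, date) {
  const formatter = new Intl.DateTimeFormat(locale, { timeZone: "UTC", ...options });
  return formatter
    .formatToParts(date)
    .filter((part) => part.type !== "literal")
    .map((part) => part.type);
}

// Fixed date so the result does not depend on when the code runs.
const sampleDate = new Date(Date.UTC(2020, 1, 12));

// The reported behaviour corresponds to ["year", "month"];
// the expected order for Upper Sorbian is ["month", "year"].
console.log(fieldOrder("hsb", { month: "long", year: "numeric" }, sampleDate));
```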

Attached image hamburger_whatsnew.png

CLDR: https://github.com/unicode-cldr/cldr-dates-full/blob/master/main/hsb/ca-gregorian.json#L333

As you can see, for narrow and short, the order is correct, but for long it is reversed:

"yM": "M.y",
"yMMM": "MMM y",
"yMMMM": "y MMMM",

CLDR issues can be reported at https://unicode-org.atlassian.net/projects/CLDR/issues

Well, the json is not data we're using to ship. I have a hard time finding any data definitions in the xml or txt files for yMMMM that's not an interval format.

Which is why I asked about our data, and which skeleton we build, and which of our data points we actually end up using.

Well, the json is not data we're using to ship.

We ship compressed data from ICU storage in https://searchfox.org/mozilla-central/source/intl/icu/source/data/locales

I have a hard time finding any data definitions in the xml or txt files for yMMMM that's not an interval format.

Yep, same here. My guess is that since CLDR's yMMMM skeleton matches root's skeleton's pattern, the ICU just cut it out to save space: https://searchfox.org/mozilla-central/source/intl/icu/source/data/locales/root.txt#1080

If hsb's yMMMM skeleton would differ, it'd be stored in https://searchfox.org/mozilla-central/source/intl/icu/source/data/locales/hsb.txt#336

Can we overwrite those files locally to ship the correct format while we wait for Unicode to respond? What is the typical response time for resolving CLDR issues like this? If we can't use local changes, what do we do about this bug while we wait for Unicode to act? I understand needing to submit these changes upstream, but this affects the quality of our localized user experiences, so we should have a vested interest in a workaround while we wait for upstream changes to be made.

ICU allows for local overrides. We never did this and we don't really have a procedure around it, so I'm not sure how much work it is (do we need build system changes?). I've been advocating for an investment in alignment between CLDR and Gecko, and I suspect that this will become even more pronounced once we land Intl.DisplayNames and try to use them for language selectors.

As for response time: I'd expect it to be very quick since the fix is trivial, but the release cycle is 6 months, and as far as I'm aware we have never actually contributed data to CLDR. Flod once looked into it through their survey tool, which I think he has access to and could use to contribute the data, but I'm not sure he got to the point of being familiar with the process, and I don't think anyone else is.

(In reply to Zibi Braniecki [:zbraniecki][:gandalf] from comment #4)

Well, the json is not data we're using to ship.

We ship compressed data from ICU storage in https://searchfox.org/mozilla-central/source/intl/icu/source/data/locales

I have a hard time finding any data definitions in the xml or txt files for yMMMM that's not an interval format.

Yep, same here. My guess is that since CLDR's yMMMM skeleton matches root's skeleton's pattern, the ICU just cut it out to save space: https://searchfox.org/mozilla-central/source/intl/icu/source/data/locales/root.txt#1080

If hsb's yMMMM skeleton would differ, it'd be stored in https://searchfox.org/mozilla-central/source/intl/icu/source/data/locales/hsb.txt#336

Right, it'll inherit from the root locale if there's no more specific override.
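
The inheritance described here can be sketched as a lookup along a fallback chain. This is a simplified illustration, not how ICU is implemented: the tables contain only the patterns quoted in this bug, and the names rootPatterns, hsbPatterns and resolvePattern are made up for the example. Real ICU additionally matches skeletons that have no exact entry.

```javascript
// Hypothetical, heavily reduced data: only the patterns quoted in this bug.
const rootPatterns = { yMMMM: "y MMMM" };
const hsbPatterns = { yM: "M.y", yMMM: "MMM y" }; // no yMMMM entry of its own

// Walk the chain from most specific to least specific locale and
// return the first pattern stored for the skeleton.
function resolvePattern(skeleton, chain) {
  for (const table of chain) {
    if (Object.prototype.hasOwnProperty.call(table, skeleton)) {
      return table[skeleton];
    }
  }
  return undefined;
}

console.log(resolvePattern("yMMM", [hsbPatterns, rootPatterns]));  // "MMM y" (hsb's own entry)
console.log(resolvePattern("yMMMM", [hsbPatterns, rootPatterns])); // "y MMMM" (inherited from root)
```

Under this model, adding a "yMMMM" entry to the hsb table is enough to stop the root pattern from being used.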

The charts at https://www.unicode.org/cldr/charts/36/by_type/date_&_time.gregorian.html#21dded0fd50ba37e confirm that the expected pattern for "yMMMM" would be "y MMMM", which is the default for "all others" in the chart here; there's no exception for "hsb".

Hm, yes, it should be "MMMM y" and not "y MMMM". Dsb is not mentioned under "MMMM y", either. I cannot check it because I don't use Lower Sorbian Firefox productively so there are no fingerprinters collected yet.

According to https://st.unicode.org/cldr-apps/v#/hsb/Gregorian/21dded0fd50ba37e, this data item shows as "missing" for Upper Sorbian, which I assume means it just hasn't been submitted to CLDR (as opposed to a specific value being present, but not matching what's requested here).

So submitting the proper format upstream is the primary thing that the localization community should do here. Whether we can implement a local override in the meantime is a further question to consider.

As a crude but immediate fix, can the hsb team work around this by replacing the cfr-whatsnew-tracking-blocked-subtitle string in the .ftl file with something like

Since { DATETIME($earliestDate, month: "long") } { DATETIME($earliestDate, year: "numeric") }

to force the desired order by generating the two parts separately?
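
For comparison, the same idea expressed against the JavaScript API would look roughly like the sketch below. The function name monthThenYear is hypothetical. Note the assumption baked in here: a month formatted on its own may come out in a different grammatical form than a month formatted together with a year, so this only pins down the order.

```javascript
// Format month and year separately and join them in a fixed order.
// `monthThenYear` is a hypothetical helper, not an existing API.
function monthThenYear(locale, date) {
  const month = new Intl.DateTimeFormat(locale, { month: "long", timeZone: "UTC" }).format(date);
  const year = new Intl.DateTimeFormat(locale, { year: "numeric", timeZone: "UTC" }).format(date);
  return `${month} ${year}`;
}

console.log(monthThenYear("hsb", new Date(Date.UTC(2020, 1, 12))));
```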

We already apply patches to ICU, and IMHO we can also patch the data for hsb and dsb.

We should do that instead of trying to work around the CLDR data in ways that are hard to maintain down the road, like manually hacking date formats.

I've created an Atlassian account, so I can file an issue on the CLDR issues page mentioned above. While doing so, I searched the issues and found the name of the Sorbian user who submitted the data to CLDR. I can contact him.

The way we patch ICU is with separate patch files. But all the CLDR locale data, we store in the tree in pre-assembled format in an ICU .dat file in config/external/icu/data. If we were to start patching CLDR inputs, we would need a separate local-patch process from the one we use for ICU, and we'd need to do some work to integrate it into our update-ICU process -- which currently is some roughly-manual commands periodically run -- to avoid these things being overwritten.

ICU and CLDR may be bits in the same area, but the extra .dat file generation step means patching ICU is pretty simple to do, while patching CLDR is much less simple. I already push back on us taking ICU patches of any meaningful complexity, and frankly patching CLDR is definitely more complexity than any simple C++ patch-file. That could be changed. But it would require work that is not at all the same difficulty as patching ICU right now.

(In reply to Axel Hecht [:Pike] from comment #11)

We already apply patches to ICU, and IMHO we can also patch the data for hsb and dsb.

We should do that instead of trying to work around the CLDR data in ways that are hard to maintain down the road, like manually hacking date formats.

Given Jeff's comment, I think we should give real consideration to the option of "manually hacking [the] date format" to work around this, assuming it can be done as suggested within the hsb localization. Yes, it's not a great solution from the point of view of scalability or maintainability, but it has the merit of being simple and highly localized, and carries minimal risk (either now or in the future). And it would be entirely within the control of the localization team; it doesn't introduce any other dependency or bottleneck in the process.

I've given my idea a try, gonna push that and let us check out the build.

Michael, thanks for starting the work to get changes into CLDR, would you mind opening an issue there so that we can link to it here? The patch I'm about to upload might also have hints on which data we'd like to add.

Jonathan, my take on what's simple and highly localized differs from yours. We're already talking about two localizations, and we're talking about inspecting each variable that's not explicitly formatted. As there might be a preformatted fluent value passed in, this can be anything. So this is a lot of locations, in two localizations, and no decent support from Pontoon to do that.

https://phabricator.services.mozilla.com/D62732 is what I have. I also managed to create the dat file from it, but sadly phabricator doesn't want me to store that file.

I've taken inspiration on which entries to add from de.txt, which is at least geographically close, but took the values from neighboring values in dsb/hsb. Michael, if you could take a look at those?

PS: intl/icu_sources_data.py doesn't work on macs, due to --output-sync not supported by bsd makes.

Did a local build in the meantime, and that shows the expected result for new Intl.DateTimeFormat(['hsb'], {month:'long', year:'numeric'}).format(new Date())

I filed issue https://unicode-org.atlassian.net/projects/CLDR/issues/?filter=reportedbyme on CLDR and wrote an e-mail to the Sorbian user asking him to complete the missing date/time formats.

@Axel It seems that all formats on Phabricator are OK. But what does Q stand for, quarter? And E is for weekday, isn't it? Most formats are as in German. The most important difference is that the full month name is in the genitive.

I guess patching files under intl/icu/source/data/locales/ in our tree won't help on distros that compile --with-system-icu, which I think was a supported option last time I looked.

Thanks for the corrected issue URL, Axel.

(In reply to Jonathan Kew (:jfkthame) from comment #22)

I guess patching files under intl/icu/source/data/locales/ in our tree won't help on distros that compile --with-system-icu, which I think was a supported option last time I looked.

Yeah, system-icu won't have those fixes. OTOH, we don't know what system ICU has data-wise at all, I guess?

With more CLDR data being hooked into rust impls, and also with Flatpak/snaps for distros, maybe it's OK to let that option go away?

With more CLDR data being hooked into rust impls,

We are discussing ways to coordinate CLDR data in Gecko for Rust impls in bug 1613271.

As for the patch itself - it may be worth investigating using file substitution instead of patches here: https://github.com/unicode-org/icu/blob/master/docs/userguide/icu_data/buildtool.md#file-substitution

This would open up a way for us to locally add new locales, and manage our overrides as full resources.

Is there a way to do partial overrides at build time? The nice thing about just a patch is that we'll get updates from upstream easily. At least many of them.

Is there a way to do partial overrides at build time?

Not sure!

Steven - can you advise? Is there a way to just add a couple patterns instead of switching to maintain our own fork of the hsb gregorian.txt file?

Flags: needinfo?(srl)

The Sorbian user replied. He wrote that new data won't be included before CLDR 38, which will probably be released in October. Data submission will start in April. He will get back to me when it's nearer the time.

Assignee: nobody → l10n
Status: NEW → ASSIGNED

It seems that ICU updated w/out looking at this. I see no reaction at all on the ticket there, sadly.

On a positive note, my original patch was for 65, and we're now at 67.1, and the patch rebased w/out any issues at all.

Pushed by axel@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/aea74a39c13a add more date formats to dsb and hsb, r=jwalden
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla80

Hi Michael,

CLDR 38 will add its own date-time patterns for the skeletons "MMMMd" and "yMMMM":

Skeleton | CLDR 37 (ICU 67) | CLDR 38 (ICU 68) | Firefox
MMMMd    | MMMM d           | d MMMM           | d. MMMM
yMMMM    | y MMMM           | LLLL y           | MMMM y

(The pattern symbols are from https://unicode.org/reports/tr35/tr35-dates.html#Date_Field_Symbol_Table, for example "MMMM" means "month in wide format", whereas "LLLL" means "stand-alone month in wide format".)

For example when formatting January 1, 2020, we'll get the following strings:

Skeleton | CLDR 37 (ICU 67) | CLDR 38 (ICU 68) | Firefox
MMMMd    | januara 1        | 1 januara        | 1. januara
yMMMM    | 2020 januara     | januar 2020      | januara 2020
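
The difference between "MMMM" and "LLLL" in the strings above can be reproduced with a toy pattern renderer. This is only a sketch: it handles just the four symbols used in this bug, ignores quoting and all other rules of real date patterns, and the names MONTH_NAMES and renderPattern are invented for the example. The month names are the ones shown in the table.

```javascript
// Month names for January only, taken from the example strings above.
const MONTH_NAMES = {
  format: ["januara"],    // "MMMM": formatting form (genitive)
  standalone: ["januar"], // "LLLL": stand-alone form (nominative)
};

// Expand the symbols MMMM, LLLL, y and d; everything else is copied verbatim.
function renderPattern(pattern, { year, monthIndex, day }) {
  return pattern.replace(/MMMM|LLLL|y|d/g, (symbol) => {
    switch (symbol) {
      case "MMMM": return MONTH_NAMES.format[monthIndex];
      case "LLLL": return MONTH_NAMES.standalone[monthIndex];
      case "y": return String(year);
      default: return String(day); // "d"
    }
  });
}

const jan1 = { year: 2020, monthIndex: 0, day: 1 };
console.log(renderPattern("d. MMMM", jan1)); // "1. januara"   (Firefox, MMMMd)
console.log(renderPattern("LLLL y", jan1));  // "januar 2020"  (CLDR 38, yMMMM)
console.log(renderPattern("MMMM y", jan1));  // "januara 2020" (Firefox, yMMMM)
```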

Do we want to keep our current, customised output format or should we switch to use the standard patterns used in CLDR 38?

Thanks,
André

Flags: needinfo?(milupo)

Hi André,

thank you for your reply and help.

One more question: in which context is the pattern "januara 2020" used? I ask because "januara" is the genitive form of the month name, and the genitive is only used when a day number precedes it. So "1. januara" is correct, but "januara 2020" should be "januar 2020" (month name in the nominative) when there is no day number.

BTW, the use case from comment 1 I couldn't check again because there were no dates in that place again until now.

Flags: needinfo?(milupo)

The JavaScript API only allows specifying the width of the individual date-time components; no other context information can be supplied. It's up to the web page developer to choose the correct options when creating an Intl.DateTimeFormat object (https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Intl/DateTimeFormat).

For example, only the following is currently possible:

let date = new Date(2020, 0, 1);
let formatter = new Intl.DateTimeFormat("hsb", {month: "long", year: "numeric"});
console.log(formatter.format(date)); // Firefox output: "januara 2020"

or:

let date = new Date(2020, 0, 1);
let formatter = new Intl.DateTimeFormat("hsb", {month: "long", day: "numeric"});
console.log(formatter.format(date)); // Firefox output: "1. januara"

Hm, CLDR 38 would be better here. It distinguishes between Formatting (month name in genitive) and standalone (month name in nominative). But, to complicate the issue :-) sometimes a preposition is used before the date like in the use case of comment 1. There the preposition "wot" is used which requires the genitive. But the issue in comment 1 was the order in the date only, y MMMM instead of MMMM y.

A question: Which rule does e.g. Czech apply? It's similar to Upper Sorbian and Lower Sorbian.

Czech uses:

Skeleton | Pattern | URL
MMMMd    | d. MMMM | https://github.com/unicode-org/cldr/blob/387d0301f0ade6ffa632a18755590671065d9f53/common/main/cs.xml#L4254
yMMMM    | LLLL y  | https://github.com/unicode-org/cldr/blob/387d0301f0ade6ffa632a18755590671065d9f53/common/main/cs.xml#L4268

So Czech uses "M" when "d" is present, otherwise "L" is used.
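
Stated as code, the rule amounts to a one-line decision on the skeleton. The helper name monthSymbolFor is hypothetical and the function only covers the wide month case discussed here.

```javascript
// Pick the wide month symbol for a skeleton, following the Czech rule above:
// formatting form ("MMMM") next to a day, stand-alone form ("LLLL") otherwise.
function monthSymbolFor(skeleton) {
  return skeleton.includes("d") ? "MMMM" : "LLLL";
}

console.log(monthSymbolFor("MMMMd")); // "MMMM", as in "d. MMMM"
console.log(monthSymbolFor("yMMMM")); // "LLLL", as in "LLLL y"
```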


What about the other difference in "MMMMd", where CLDR 38 uses "d MMMM", whereas our custom format currently uses "d. MMMM"? I guess we want to stick with the full stop after "d"? (Is that a bug in CLDR 38? Every other pattern for Sorbian uses full stops as a delimiter: https://github.com/unicode-org/cldr/blob/387d0301f0ade6ffa632a18755590671065d9f53/common/main/hsb.xml#L1154-L1188 and https://github.com/unicode-org/cldr/blob/387d0301f0ade6ffa632a18755590671065d9f53/common/main/dsb.xml#L1163-L1197.)

(In reply to Zibi Braniecki [:zbraniecki][:gandalf] from comment #27)

Is there a way to do partial overrides at build time?

Not sure!

Steven - can you advise? Is there a way to just add a couple patterns instead of switching to maintain our own fork of the hsb gregorian.txt file?

Years later, sorry I missed this.

Flags: needinfo?(srl)