Upper Sorbian Month/Year format shows year first instead of last for DateTimeFormat.format()
Categories: Core :: JavaScript: Internationalization API, defect
Tracking:
Version | Tracking | Status |
---|---|---|
firefox80 | --- | fixed |
People: Reporter: Pike, Assigned: Pike
Attachments: 2 files
Michael reported this on the .l10n newsgroup, and I'm not sure whether this is an Intl API bug or a CLDR bug.
```javascript
new Intl.DateTimeFormat(['hsb'], {month:'long', year:'numeric'}).format(new Date())
// "2020 februara"
```
But the expected result would be "februara 2020".
I'm not sure whether the problem is that we pick an unfortunate pattern for "just month and year", or that the CLDR data is missing or wrong. I also don't know what ICU does if a pattern isn't there.
I'm starting with a bug here. Which pattern do we use, and which CLDR data point do we end up using in this situation?
Comment 1•5 years ago
Comment 2•5 years ago
CLDR: https://github.com/unicode-cldr/cldr-dates-full/blob/master/main/hsb/ca-gregorian.json#L333
As you can see, for the numeric and abbreviated month the order is correct, but for the wide (long) month it is reversed:
```json
"yM": "M.y",
"yMMM": "MMM y",
"yMMMM": "y MMMM",
```
CLDR issues can be reported at https://unicode-org.atlassian.net/projects/CLDR/issues
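The same ordering can also be observed from script, without reading the CLDR JSON, by asking Intl.DateTimeFormat for its parts. This is an illustrative sketch only: `fieldOrder` is a hypothetical helper name, and its result depends on whatever ICU/CLDR data the engine running it ships with.

```javascript
// Hypothetical helper: list the non-literal fields that Intl.DateTimeFormat
// emits, in order, for a locale and a set of options. The result reflects
// whatever ICU/CLDR data the running engine ships with.
function fieldOrder(locale, options, date = new Date(Date.UTC(2020, 1, 15))) {
  const formatter = new Intl.DateTimeFormat(locale, { ...options, timeZone: "UTC" });
  return formatter
    .formatToParts(date)
    .filter((part) => part.type !== "literal")
    .map((part) => part.type);
}

// Compare numeric, abbreviated and wide months for hsb.
for (const month of ["numeric", "short", "long"]) {
  console.log(month, fieldOrder("hsb", { month, year: "numeric" }));
}
```

With data like the reporter's, the "long" case would list the year before the month, while the other two list the month first.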
Comment 3•5 years ago (Assignee)
Well, the JSON is not the data we ship. I have a hard time finding any definition of yMMMM in the XML or TXT files that isn't an interval format.
That's why I asked about our data: which skeleton we build, and which of our data points we actually end up using.
Comment 4•5 years ago
> Well, the json is not data we're using to ship.
We ship compressed data from ICU storage in https://searchfox.org/mozilla-central/source/intl/icu/source/data/locales
> I have a hard time finding any data definitions in the xml or txt files for yMMMM that's not an interval format.
Yep, same here. My guess is that since CLDR's yMMMM skeleton matches root's skeleton's pattern, the ICU just cut it out to save space: https://searchfox.org/mozilla-central/source/intl/icu/source/data/locales/root.txt#1080
If hsb's yMMMM skeleton would differ, it'd be stored in https://searchfox.org/mozilla-central/source/intl/icu/source/data/locales/hsb.txt#336
Comment 5•5 years ago
Can we overwrite those files locally to ship the correct format while we wait for Unicode to respond? What is the typical response time for resolving CLDR issues like this? If we can't use local changes, what do we do about this bug while we wait for Unicode to act? I understand needing to submit these changes upstream, but this affects the quality of our localized user experiences, so we should have a vested interest in a workaround while we wait for upstream changes to be made.
Comment 6•5 years ago
ICU allows for local overrides. We've never done this, and we don't really have a procedure for it, so I'm not sure how much work it is (do we need build system changes?). I've been advocating for an investment in alignment between CLDR and Gecko, and I suspect this will become even more pronounced once we land Intl.DisplayNames and try to use it for language selectors.
Comment 7•5 years ago
As for response time: I'd expect it to be very quick since the fix is trivial, but the CLDR release cycle is six months, and as far as I'm aware we have never actually contributed data to CLDR yet. Flod once looked into doing so through CLDR's Survey Tool, which I think he has access to, so he could just contribute the data, but I'm not sure he got to the point of being familiar with the process, and I don't think anyone else is.
Comment 8•5 years ago
(In reply to Zibi Braniecki [:zbraniecki][:gandalf] from comment #4)
>> Well, the json is not data we're using to ship.
> We ship compressed data from ICU storage in https://searchfox.org/mozilla-central/source/intl/icu/source/data/locales
>> I have a hard time finding any data definitions in the xml or txt files for yMMMM that's not an interval format.
> Yep, same here. My guess is that since CLDR's yMMMM skeleton matches root's skeleton's pattern, the ICU just cut it out to save space: https://searchfox.org/mozilla-central/source/intl/icu/source/data/locales/root.txt#1080
> If hsb's yMMMM skeleton would differ, it'd be stored in https://searchfox.org/mozilla-central/source/intl/icu/source/data/locales/hsb.txt#336
Right, it'll inherit from the root locale if there's no more specific override.
The charts at https://www.unicode.org/cldr/charts/36/by_type/date_&_time.gregorian.html#21dded0fd50ba37e confirm that the expected pattern for "yMMMM" would be "y MMMM", which is the default for "all others" in the chart here; there's no exception for "hsb".
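The inheritance described here can be pictured as a walk up the locale's parent chain. This is a heavily simplified, hypothetical model, not ICU's resource-bundle code: `PARENT`, `AVAILABLE_FORMATS`, `parentOf` and `lookupPattern` are made-up names, and the data holds only the few patterns quoted in this bug. It shows why hsb ends up with root's "y MMMM", and why a single hsb entry would be enough to shadow it.

```javascript
// Hypothetical, heavily simplified model of locale fallback for date patterns:
// each locale stores only what differs from its parent, and a lookup walks up
// the chain until the skeleton is found.
const PARENT = { hsb: "root", dsb: "root" };

function parentOf(locale) {
  if (locale === "root") return null;
  return PARENT[locale] ?? "root"; // unknown locales fall back to root
}

// Only the patterns quoted in this bug; real ICU data has many more.
const AVAILABLE_FORMATS = {
  root: { yMMMM: "y MMMM" },
  hsb: { yM: "M.y", yMMM: "MMM y" },
};

function lookupPattern(locale, skeleton, data = AVAILABLE_FORMATS) {
  for (let loc = locale; loc !== null; loc = parentOf(loc)) {
    const formats = data[loc] ?? {};
    if (Object.prototype.hasOwnProperty.call(formats, skeleton)) {
      return { pattern: formats[skeleton], foundIn: loc };
    }
  }
  return null; // skeleton not defined anywhere in the chain
}

// Adding one hsb entry is enough to shadow the inherited root pattern.
const withHsbOverride = {
  ...AVAILABLE_FORMATS,
  hsb: { ...AVAILABLE_FORMATS.hsb, yMMMM: "MMMM y" },
};

console.log(lookupPattern("hsb", "yMMMM")); // pattern "y MMMM", found in root
console.log(lookupPattern("hsb", "yMMMM", withHsbOverride)); // pattern "MMMM y", found in hsb
```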
Comment 9•5 years ago
Hm, yes, it should be "MMMM y" and not "y MMMM". Dsb isn't listed under "MMMM y" either. I can't check it because I don't use the Lower Sorbian Firefox for everyday browsing, so no fingerprinters have been collected yet.
Comment 10•5 years ago
According to https://st.unicode.org/cldr-apps/v#/hsb/Gregorian/21dded0fd50ba37e, this data item shows as "missing" for Upper Sorbian, which I assume means it just hasn't been submitted to CLDR (as opposed to a specific value being present, but not matching what's requested here).
So submitting the proper format upstream is the primary thing that the localization community should do here. Whether we can implement a local override in the meantime is a further question to consider.
As a crude but immediate fix, could the hsb team work around this by replacing the cfr-whatsnew-tracking-blocked-subtitle string in the .ftl file with something like
```fluent
Since { DATETIME($earliestDate, month: "long") } { DATETIME($earliestDate, year: "numeric") }
```
to force the desired order by generating the two parts separately?
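In JavaScript terms, this workaround amounts to formatting the month and the year with two separate Intl.DateTimeFormat instances and joining them in a fixed order. A minimal sketch, assuming Fluent's DATETIME() passes these options through to Intl.DateTimeFormat; `monthThenYear` is a hypothetical name.

```javascript
// Hypothetical helper: the JavaScript analogue of the Fluent workaround above.
// Month and year are formatted by two independent formatters and joined in a
// fixed "month year" order, so the locale's yMMMM pattern is never consulted.
function monthThenYear(locale, date, timeZone = "UTC") {
  const month = new Intl.DateTimeFormat(locale, { month: "long", timeZone }).format(date);
  const year = new Intl.DateTimeFormat(locale, { year: "numeric", timeZone }).format(date);
  return `${month} ${year}`;
}

console.log(monthThenYear("hsb", new Date(Date.UTC(2020, 1, 1))));
```

Note that a month formatted on its own may come out in a different grammatical form than the month inside a combined pattern, depending on the locale data; for Sorbian that is the genitive/nominative distinction.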
Comment 11•5 years ago (Assignee)
We already apply patches to ICU, and IMHO we can also patch the data for hsb and dsb.
We should do that instead of trying to work around the CLDR data in ways that are hard to maintain down the road, like manually hacking date formats.
Comment 12•5 years ago
I've created an Atlassian account, so I can file an issue on the CLDR issues page mentioned above. While searching the issues, I found the name of the Sorbian user who submitted the data to CLDR. I can contact him.
Comment 13•5 years ago
The way we patch ICU is with separate patch files. But all the CLDR locale data is stored in the tree in pre-assembled form, as an ICU .dat file in config/external/icu/data. If we were to start patching CLDR inputs, we would need a local-patch process separate from the one we use for ICU, and we'd need to do some work to integrate it into our update-ICU process (which currently consists of some roughly manual commands run periodically) so that these patches don't get overwritten.
ICU and CLDR may be bits in the same area, but the extra .dat file generation step means patching ICU is pretty simple, while patching CLDR is much less so. I already push back on taking ICU patches of any meaningful complexity, and frankly, patching CLDR is definitely more complex than any simple C++ patch file. That could be changed, but it would require work of a very different difficulty from patching ICU today.
Comment 14•5 years ago
(In reply to Axel Hecht [:Pike] from comment #11)
> We already apply patches to ICU, and IMHO we can also patch the data for hsb and dsb.
> We should do that instead of trying to work around the CLDR data in ways that are hard to maintain down the road, like manually hacking date formats.
Given Jeff's comment, I think we should seriously consider "manually hacking [the] date format" to work around this, assuming it can be done as suggested within the hsb localization. Yes, it's not a great solution from the point of view of scalability or maintainability, but it has the merit of being simple and highly localized, and it carries minimal risk (now or in the future). And it would be entirely within the control of the localization team; it doesn't introduce any other dependency or bottleneck in the process.
Comment 15•5 years ago (Assignee)
I've given my idea a try; I'm going to push it so we can check out the build.
Michael, thanks for starting the work to get changes into CLDR, would you mind opening an issue there so that we can link to it here? The patch I'm about to upload might also have hints on which data we'd like to add.
Jonathan, my take on what's simple and highly localized differs from yours. We're already talking about two localizations, and we're talking about inspecting each variable that's not explicitly formatted. As there might be a preformatted fluent value passed in, this can be anything. So this is a lot of locations, in two localizations, and no decent support from Pontoon to do that.
Comment 16•5 years ago (Assignee)
Comment 17•5 years ago (Assignee)
https://phabricator.services.mozilla.com/D62732 is what I have. I also managed to create the dat file from it, but sadly phabricator doesn't want me to store that file.
I took inspiration for which entries to add from de.txt, which is at least geographically close, but took the values from neighboring entries in dsb/hsb. Michael, could you take a look at those?
PS: intl/icu_sources_data.py doesn't work on Macs, because --output-sync isn't supported by BSD makes.
Comment 18•5 years ago (Assignee)
Did a local build in the meantime, and that shows the expected result for new Intl.DateTimeFormat(['hsb'], {month:'long', year:'numeric'}).format(new Date())
Comment 19•5 years ago
I filed issue https://unicode-org.atlassian.net/projects/CLDR/issues/?filter=reportedbyme on CLDR and wrote an e-mail to the Sorbian user asking him to complete the missing date/time formats.
Comment 20•5 years ago (Assignee)
The issue URL is https://unicode-org.atlassian.net/browse/CLDR-13580.
Comment 21•5 years ago
@Axel All the formats on Phabricator seem OK. But what does Q stand for, quarter? And E is the weekday, isn't it? Most formats are as in German. The most important difference is that the full month name is in the genitive.
Comment 22•5 years ago
I guess patching files under intl/icu/source/data/locales/ in our tree won't help on distros that compile --with-system-icu, which I think was a supported option last time I looked.
Comment 23•5 years ago
Thanks for the corrected issue URL, Axel.
Comment 24•5 years ago (Assignee)
(In reply to Jonathan Kew (:jfkthame) from comment #22)
> I guess patching files under intl/icu/source/data/locales/ in our tree won't help on distros that compile --with-system-icu, which I think was a supported option last time I looked.
Yeah, system ICU won't have those fixes. OTOH, we don't know what data system ICU has at all, I guess?
With more CLDR data being hooked into rust impls, and also with Flatpak/Snap packages for distros, maybe it's OK to let that option go away?
Comment 25•5 years ago
> With more CLDR data being hooked into rust impls,
We are discussing ways to coordinate CLDR data in Gecko for Rust impls in bug 1613271.
As for the patch itself - it may be worth investigating using file substitution instead of patches here: https://github.com/unicode-org/icu/blob/master/docs/userguide/icu_data/buildtool.md#file-substitution
This would open up a way for us to locally add new locales, and manage our overrides as full resources.
Comment 26•5 years ago (Assignee)
Is there a way to do partial overrides at build time? The nice thing about just a patch is that we'll get updates from upstream easily. At least many of them.
Comment 27•5 years ago
> Is there a way to do partial overrides at build time?
Not sure!
Steven - can you advise? Is there a way to just add a couple patterns instead of switching to maintain our own fork of the hsb gregorian.txt file?
Comment 28•5 years ago
The Sorbian user replied. He wrote that new data won't be included before CLDR 38, which will probably be released in October. Data submission will start in April, and he will get back to me when it's nearer the time.
Updated•4 years ago
Comment 29•4 years ago (Assignee)
It seems that ICU shipped an update without anyone looking at this. Sadly, I see no reaction at all on the ticket there.
On a positive note, my original patch was for ICU 65, we're now at 67.1, and the patch rebased without any issues at all.
Comment 30•4 years ago
Comment 31•4 years ago (bugherder)
Comment 32•4 years ago
Hi Michael,
CLDR 38 will add its own date-time patterns for the skeletons "MMMMd" and "yMMMM":
Skeleton | CLDR 37 (ICU 67) | CLDR 38 (ICU 68) | Firefox |
---|---|---|---|
MMMMd | MMMM d | d MMMM | d. MMMM |
yMMMM | y MMMM | LLLL y | MMMM y |
(The pattern symbols are from https://unicode.org/reports/tr35/tr35-dates.html#Date_Field_Symbol_Table, for example "MMMM" means "month in wide format", whereas "LLLL" means "stand-alone month in wide format".)
For example when formatting January 1, 2020, we'll get the following strings:
Skeleton | CLDR 37 (ICU 67) | CLDR 38 (ICU 68) | Firefox |
---|---|---|---|
MMMMd | januara 1 | 1 januara | 1. januara |
yMMMM | 2020 januara | januar 2020 | januara 2020 |
Do we want to keep our current, customised output format or should we switch to use the standard patterns used in CLDR 38?
Thanks,
André
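To make the pattern symbols in these tables concrete, here is a toy formatter that understands only y, d, MMMM (format form) and LLLL (stand-alone form). It is a hypothetical sketch, not ICU's implementation: `applyPattern` and `HSB_JANUARY_ONLY` are made-up names, quoted literals and all other fields are ignored, and the month names are just the two January forms that appear in the example strings above.

```javascript
// Hypothetical toy formatter for a tiny subset of CLDR pattern syntax:
//   y    -> numeric year
//   d    -> numeric day
//   MMMM -> wide month, format form (genitive in hsb)
//   LLLL -> wide month, stand-alone form (nominative in hsb)
// Quoted literals and all other fields are not supported.
const HSB_JANUARY_ONLY = {
  format: ["januara"],
  standalone: ["januar"],
};

function applyPattern(pattern, { year, month, day }, names = HSB_JANUARY_ONLY) {
  return pattern.replace(/MMMM|LLLL|y|d/g, (token) => {
    switch (token) {
      case "MMMM":
        return names.format[month - 1];
      case "LLLL":
        return names.standalone[month - 1];
      case "y":
        return String(year);
      default: // "d"
        return String(day);
    }
  });
}

// January 1, 2020, run through the CLDR 37, CLDR 38 and Firefox patterns.
const jan1 = { year: 2020, month: 1, day: 1 };
for (const pattern of ["MMMM d", "d MMMM", "d. MMMM", "y MMMM", "LLLL y", "MMMM y"]) {
  console.log(`${pattern} -> ${applyPattern(pattern, jan1)}`);
}
```

Running it reproduces the strings in the second table, e.g. "LLLL y" gives "januar 2020" while "MMMM y" gives "januara 2020".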
Comment 33•4 years ago
Hi André,
thank you for your reply and help.
One more question: in which context is the pattern "januara 2020" used? I ask because "januara" is the genitive form of the month name, and the genitive is only used when a day number precedes it. So "1. januara" is correct, but without a day number "januara 2020" should be "januar 2020" (month name in the nominative).
BTW, I haven't been able to re-check the use case from comment 1, because no dates have appeared in that place since then.
Comment 34•4 years ago
The JavaScript API only lets you specify the width of the individual date-time components; no other context information can be applied. It's up to the web page developer to choose the correct options when creating an Intl.DateTimeFormat object (https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Intl/DateTimeFormat).
For example, only the following is currently possible:
```javascript
let date = new Date(2020, 0, 1);
let formatter = new Intl.DateTimeFormat("hsb", {month: "long", year: "numeric"});
console.log(formatter.format(date)); // Firefox output: "januara 2020"
```
or:
```javascript
let date = new Date(2020, 0, 1);
let formatter = new Intl.DateTimeFormat("hsb", {month: "long", day: "numeric"});
console.log(formatter.format(date)); // Firefox output: "1. januara"
```
Comment 35•4 years ago
Hm, CLDR 38 would be better here. It distinguishes between the formatting form (month name in the genitive) and the stand-alone form (month name in the nominative). But, to complicate the issue :-) sometimes a preposition is used before the date, as in the use case of comment 1. There the preposition "wot" is used, which requires the genitive. But the issue in comment 1 was only the order within the date: y MMMM instead of MMMM y.
A question: Which rule does e.g. Czech apply? It's similar to Upper Sorbian and Lower Sorbian.
Comment 36•4 years ago
Czech uses:
So Czech uses "M" when "d" is present, otherwise "L" is used.
What about the other difference in "MMMMd", where CLDR 38 uses "d MMMM", whereas our custom format is currently using "d. MMMM". I guess we want to stick with the full stop after "d"? (Is that a bug in CLDR 38? Every other pattern for Sorbian uses full stops as a delimiter: https://github.com/unicode-org/cldr/blob/387d0301f0ade6ffa632a18755590671065d9f53/common/main/hsb.xml#L1154-L1188 and https://github.com/unicode-org/cldr/blob/387d0301f0ade6ffa632a18755590671065d9f53/common/main/dsb.xml#L1163-L1197.)
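The Czech rule above, combined with the full stop after "d" that Firefox's current patterns use, could be written as a small skeleton-to-pattern function. This is a hypothetical sketch covering only the two skeletons from comment 32; `wideMonthField` and `hsbPatternFor` are made-up names, and the outputs are the Firefox "d. MMMM" pattern and the CLDR 38 "LLLL y" pattern from that table.

```javascript
// Hypothetical helper: Czech-style rule from this comment. Use the format
// (genitive in hsb) month "MMMM" when the skeleton has a day field,
// otherwise the stand-alone (nominative) month "LLLL".
function wideMonthField(skeleton) {
  return skeleton.includes("d") ? "MMMM" : "LLLL";
}

// Hypothetical hsb patterns for the two skeletons discussed in comment 32,
// keeping the full stop after "d" as in Firefox's current custom format.
function hsbPatternFor(skeleton) {
  const month = wideMonthField(skeleton);
  switch (skeleton) {
    case "MMMMd":
      return `d. ${month}`;
    case "yMMMM":
      return `${month} y`;
    default:
      return undefined; // other skeletons are out of scope for this sketch
  }
}

console.log(hsbPatternFor("MMMMd")); // d. MMMM
console.log(hsbPatternFor("yMMMM")); // LLLL y
```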
Comment 37•10 months ago
(In reply to Zibi Braniecki [:zbraniecki][:gandalf] from comment #27)
>> Is there a way to do partial overrides at build time?
> Not sure!
> Steven - can you advise? Is there a way to just add a couple patterns instead of switching to maintain our own fork of the hsb gregorian.txt file?
Years later: sorry, I missed this.