Open Bug 1280654 Opened 8 years ago Updated 2 years ago

[meta] Expose a set of CLDR data via custom APIs for date/time format UI

Categories

(Core :: JavaScript: Internationalization API, defect, P3)

People

(Reporter: zbraniecki, Unassigned)

References

(Depends on 1 open bug, Blocks 1 open bug)

Details

(Keywords: meta, Whiteboard: [gecko-l20n][milestone5])

There are many cases where we need CLDR data for our internal code.

For example, lots of code related to Date and Date/Time pickers could use bits like firstDayOfTheWeek, weekendStarts, weekendEnds, date order (YMD, MDY, DMY), is the locale LTR/RTL etc.

Currently we don't have APIs for that data, and the progress to design those APIs in ECMA402 is moving slowly, because it's hard to find the right balance between generic and too strict.

But internally, we could expose something very generic and make use of CLDR data internally without the need to include ICU from our toolkit code.

An example of the API could be something like:

var value = mozIntl.getCLDRData('en-US', 'datetime/gregorian/months/wide/standalone'); // January

var value = mozIntl.getCLDRData('pl', 'datetime/gregorian/months/wide/standalone'); // styczen

CC'ing Scott Wu, who'll be working on the Date/Time pickers and will need this.
:waldo, in principle, are you ok with us exposing this in the same way we'll do with PluralRules and RelativeTimeFormat but without intent to standardize it and expose into non-chrome content?

(we may use the experience with this API to inform us on what APIs we will create for ECMA402)
Flags: needinfo?(jwalden+bmo)
I expect some people might grouch about the greater dependence on ICU/CLDR, but really it's only the latter, and as I've said elsewhere, that's just data about languages/locales and there's no reason that work should be repeated.  I'd prefer if these APIs were considered partial stopgaps as Intl functionality is improved over time.  But that aside, this seems reasonable to me.
Flags: needinfo?(jwalden+bmo)
Reasonable in the abstract, at least -- no comments on comment 0's strawman API, which honestly I probably lack the experience in the area of using these APIs to evaluate.
thanks! Yeah, I consider it increased dependency on CLDR. We can use ICU to load CLDR data, because we have it, but really, it doesn't matter for me how we load those bits - and if someone prefers to write custom CLDR data loader so that we don't rely on ICU, it can be done and our API from this bug will keep working.

And dependency on CLDR makes a lot of sense to me and I didn't hear anyone arguing that we should build our custom equivalent of CLDR.
Hi Zibi, I put together a list of CLDR data the date time pickers would need, both in the picker UI and in the input field:

1. First day of the week
This will be used to render the calendar with the appropriate layout, as illustrated in the UX spec [1].
I couldn't find the corresponding field in the CLDR chart [2], though I did find it in the supplemental data section [3] under "firstDay". I wonder if we have access to this data?

2. Weekends
We'd like to highlight the weekends, and this info is also available in the supplemental data, under "weekendStart" and "weekendEnd".

3. Input field skeletons
I don't think there's a way to output the placeholder format strings for the input fields. As far as I know, Intl.DateTimeFormat can only return formatted output, not its skeleton. We'll need 4 formats: day/month/year [1], month/year [4], time [5], week [6]. For datetime-local, we can get the format by combining the date and time formats.

The UX team does not have a strong opinion as to what exactly the patterns should be (ex. yMd / yMMMd), as you can see from the UX spec that they've only listed the options.

4. Words?
I'm not sure if this belongs to API or dtd. In the calendar for example, I could use Intl.DateTimeFormat to get most of the localized strings I need (ex. getting the month, date, and day using made up dates), but I don't know of a way to get localized AM/PM strings reliably. Would you say we'll need an API for this? or just translate them in dtd? or maybe there's an existing API that can do this (formatToParts?)

Thank you!

Links:
[1] https://mozilla.invisionapp.com/share/237UTNHS8#/screens/171579820
[2] http://www.unicode.org/cldr/charts/29/summary/en.html
[3] http://www.unicode.org/repos/cldr/trunk/common/supplemental/supplementalData.xml
[4] https://mozilla.invisionapp.com/share/237UTNHS8#/screens/171579381
[5] https://mozilla.invisionapp.com/share/237UTNHS8#/screens/171579902
[6] https://mozilla.invisionapp.com/share/237UTNHS8#/screens/171579822
Flags: needinfo?(gandalf)
Ok, matching it to ICU APIs:

(In reply to Scott Wu [:scottwu] from comment #6)
> 1. First day of the week

icu::Calendar::getFirstDayOfWeek

http://icu-project.org/apiref/icu4c/classicu_1_1Calendar.html#a55d7c70691dd47644ae1720b1868a075

> 2. Weekends

Probably icu::Calendar::getDayOfWeekType, but I still need to check with ICU.
http://icu-project.org/apiref/icu4c/classicu_1_1Calendar.html#a0bc1e5fdb589dfbaa8e8f0e24f7e9179
 
> 3. Input field skeletons

> I don't think there's a way to output the placeholder format strings for the
> input fields. As far as I know, Intl.DateTimeFormat can only return
> formatted output, not its skeleton. We'll need 4 formats:
> day/month/year[1], month/year[4], time[5], week[6]. As for datetime-local we
> can get it by combining the date and time.
> 
> The UX team does not have a strong opinion as to what exactly the patterns
> should be (ex. yMd / yMMMd), as you can see from the UX spec that they've
> only listed the options.

That's probably this: http://icu-project.org/apiref/icu4c/dtptngen_8h.html

We're also discussing it at ECMA402 - https://github.com/tc39/ecma402/issues/21

 
> 4. Words?
> I'm not sure if this belongs to API or dtd. In the calendar for example, I
> could use Intl.DateTimeFormat to get most of the localized strings I need
> (ex. getting the month, date, and day using made up dates), but I don't know
> of a way to get localized AM/PM strings reliably. Would you say we'll need
> an API for this? or just translate them in dtd? or maybe there's an existing
> API that can do this (formatToParts?)

It would be better if we didn't have to.

We probably will want to expose it via http://icu-project.org/apiref/icu4c/udat_8h.html#aed73d44c01906572e8349d0307dafb27

Is that all you think you need out of CLDR?
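
On the AM/PM question in point 4, formatToParts may already be enough. The following is only a minimal sketch, assuming an engine that ships Intl.DateTimeFormat.prototype.formatToParts; the helper name getDayPeriodLabels is hypothetical and is not one of the APIs proposed in this bug.

```javascript
// Hypothetical helper (not a mozIntl API): reads the localized AM/PM
// strings from the "dayPeriod" part that formatToParts produces.
function getDayPeriodLabels(locale) {
  const fmt = new Intl.DateTimeFormat(locale, {
    hour: "numeric",
    hour12: true,      // force a 12-hour clock so a dayPeriod part is emitted
    timeZone: "UTC",   // keep the result independent of the local time zone
  });
  const labelAt = (utcHour) => {
    const date = new Date(Date.UTC(2016, 0, 1, utcHour));
    const part = fmt.formatToParts(date).find((p) => p.type === "dayPeriod");
    // Some locales may not emit a dayPeriod part at all.
    return part ? part.value : null;
  };
  return { am: labelAt(9), pm: labelAt(15) };
}
```

For "en-US" this should yield the strings "AM" and "PM". Whether every locale returns something usable as a standalone label is not verified here.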
Flags: needinfo?(gandalf)
:waldo, sorry for bothering you again. I talked to a CLDR person (Steven Loomis), and he says that ICU does not expose an easy way to access raw CLDR data because it doesn't carry the raw data.

He suggests that we either carry a cldr-json data set or we use ICU APIs to retrieve the data.
The former seems like it would make us carry the CLDR data twice - once hidden inside ICU, and second set accessible directly.

My current proposal is to create a custom set of APIs that will use ICU APIs to retrieve the bits. If we'll ever decide to move away from ICU, we'll have to transition those to some other method of retrieving the CLDR bits, which I think is fair.

Based on Scott's needs, it could be something like:

mozIntl.CLDR.getFirstDayOfTheWeek -> Int (eg. 1)
mozIntl.CLDR.getWeekendRange -> [Int, Int] (eg. [6, 0])
mozIntl.CLDR.getDateTimePattern -> String ("d.MM.y")
mozIntl.CLDR.getDateTimeSymbol -> String (eg. "Monday", "January", "AM")


Does it sound good to you?
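
A consumer of a [start, end] weekend range has to handle ranges that wrap around the end of the week, such as [6, 0]. A minimal sketch of that step follows. It assumes days are numbered 0 (Sunday) through 6 (Saturday), which the strawman does not state, and expandWeekendRange is a hypothetical helper, not part of the proposal.

```javascript
// Hypothetical helper: turns a [start, end] weekend range into the list of
// day indexes it covers. Assumes 0 = Sunday ... 6 = Saturday.
function expandWeekendRange([start, end]) {
  const isDay = (d) => Number.isInteger(d) && d >= 0 && d <= 6;
  if (!isDay(start) || !isDay(end)) {
    throw new RangeError("weekend range must use day indexes 0..6");
  }
  const days = [];
  // Walk forward from start, wrapping from 6 back to 0, until end is reached.
  for (let day = start; ; day = (day + 1) % 7) {
    days.push(day);
    if (day === end) break;
  }
  return days;
}
```

With this, a picker could highlight a column when its day index is included in the expanded list.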
Flags: needinfo?(jwalden+bmo)
(In reply to Zibi Braniecki [:gandalf][:zibi] from comment #7)
> Is that all you think you need out of CLDR?

Yes that's all we can think of now.

Looking at the APIs, I realized that they are very specific to our use case of building date/time pickers, rather than generic ones that could be used elsewhere in Firefox. So I wonder whether creating APIs is the best way, rather than having the data in a jsm?

Of course having a copy of the data would cause maintenance problem as you mentioned, but I wonder how it weighs against the effort of creating these APIs?

Thanks a lot Zibi!
(In reply to Scott Wu [:scottwu] from comment #9)

> Looking at the APIs, I realized that they are very specific to our use case
> for building date time pickers, rather than generic ones that could be used
> elsewhere in Firefox. So I wonder if creating APIs is the best way? rather
> than having the data in a jsm?
>
> Of course having a copy of the data would cause maintenance problem as you
> mentioned, but I wonder how it weighs against the effort of creating these
> APIs?

I would argue that yes, it's worth it, for three reasons:

1) It keeps our code one level of abstraction away from relying directly on ICU, which means that if we ever decide to replace ICU with something else, all we have to do is update those APIs' internal code, not Fx code.
2) It keeps us from relying on internal structure of CLDR data. CLDR structure is not frozen so it may change over time. If we attempt to build a generic API, we'll end up with code in Fx like `mozIntl.getCLDRData('/path/to/the/given/node');` - and that path may change between CLDR 30 and 31 which will mean that every CLDR update will require us to hunt down all uses of the API and update the paths. That's very daunting.

In the new approach, we know exactly what code we expect and where it's used, and we will have tests for each API. If someone updates CLDR 30 to 31 and the tests for mozIntl.getFirstDayOfTheWeek break, you know the one place you have to fix.

3) It prepares us better for building ECMA 402 APIs. We will never expose raw data there, but we may end up exposing things like the bits you need because people keep asking for them. Having some version of those APIs privately in Firefox gives us good testing grounds.

Also, date/time pickers are not the only code that needs this :) Firefox OS was asking for the same, I'm pretty sure that there will be more places where we will want some of those bits - Notifications, alarms, History/Context Graph could use day names and eventually maybe which days are weekend.
Summary: Expose a generic API for polling data out of CLDR → Expose a set of CLDR data via custom APIs for date/time format UI
That's fantastic! Having APIs would definitely make implementing UIs easier. Just wanted to make sure it's the best choice going forward.

I had a discussion with Morpheus offline regarding date time patterns, specifically about the date placeholder format. The UX team prefers having the month spelled out like Apr/15/2016 to avoid confusion (mm/dd/yyyy or dd/mm/yyyy), but we are less certain what an empty field should look like [1].

My understanding is that mozIntl.CLDR.getDateTimePattern('yMMMd') would give me "MMM d,y" for US or "d. MMM y" for Europeans. But we are concerned that MMM would be confusing for users. Or maybe we could replace the MMM with the localized "Month" string using getDateTimeSymbol API? Wonder what's your take on this?

On a related note, I wonder if I could get the order of the date fields (month/day/year or day/month/year) reliably by parsing the patterns? The reason I'm asking is that the picker UI would depend on the date format. For example, the month picker [2] would have a spinner like this [ month | year ], but for Chinese locales it should be [ year | month ]. I don't know if there's a more straightforward way, so please let me know if you do :)

Thank you!

[1] https://mozilla.invisionapp.com/share/237UTNHS8#/screens/171579820
[2] https://mozilla.invisionapp.com/share/237UTNHS8#/screens/171579381
Flags: needinfo?(gandalf)
One thing I remember from the fx os days is that not all languages have month abbreviations.

I don't recall the exact list, but there's at least one that just numbers their months, for example.
Depends on: 1287503
Depends on: 1287677
(In reply to Scott Wu [:scottwu] from comment #11)
> I had a discussion with Morpheus offline regarding date time patterns,
> specifically about the date placeholder format. The UX team prefers having
> the month spelled out like Apr/15/2016 to avoid confusion (mm/dd/yyyy or
> dd/mm/yyyy), but we are less certain what an empty field should look like [1].

Spelled out month is easy with DateTimeFormat API. For an empty field, I'm also concerned :(
 
> My understanding is that mozIntl.CLDR.getDateTimePattern('yMMMd') would give
> me "MMM d,y" for US or "d. MMM y" for Europeans. But we are concerned that
> MMM would be confusing for users. Or maybe we could replace the MMM with the
> localized "Month" string using getDateTimeSymbol API? Wonder what's your
> take on this?

No strong opinion here. I just know it's confusing and I don't know what's the optimal solution.

I didn't touch the skeleton/pattern API yet because I think it's the most questionable one so I want to focus on what's clear and give us all more time to settle on what we want to do with placeholder.

The current proposal of getDisplayNames can return you localized word "Month", "Year", "Day", so if you decide to go that route, we'll have it.

> On a related note, I wonder if I could get the order of date (month/day/year
> or day/month/year) reliably by parsing the patterns? The reason I'm asking
> is that the picker UI would depend on the date format. For example, the
> month picker [2] would have spinner like this [ month | year ], but for
> Chinese locales it should be [ year | month ]. I don't know if there's a
> more straight forward way, so please let me know if you do :)

That's the basic idea. The skeleton/pattern API will be designed to give you the correct pattern for the given skeleton (so if you want "ymd", we'll return the right one for the locale) so then you can just check if in the returned pattern "y" is before "m" or after.
There will be other uses of that API, but getting the date/time picker order will certainly be one of them.
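
The "is y before m" check could look roughly like the sketch below. It assumes the pattern is a string in CLDR date pattern syntax (as in the "d.MM.y" example earlier), where text in single quotes is a literal. The skeleton/pattern API itself does not exist yet, so isYearBeforeMonth is a hypothetical helper that only takes the pattern string.

```javascript
// Hypothetical helper: given a CLDR-style date pattern, reports whether
// the year field comes before the month field.
function isYearBeforeMonth(pattern) {
  // Drop quoted literals such as 'de' so their letters are not
  // mistaken for pattern fields.
  const fields = pattern.replace(/'[^']*'/g, "");
  const year = fields.indexOf("y");
  const month = fields.search(/[ML]/); // M = format month, L = standalone month
  if (year === -1 || month === -1) {
    return null; // the pattern lacks one of the two fields
  }
  return year < month;
}
```

A month picker could then choose [ year | month ] when this returns true and [ month | year ] otherwise.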
Flags: needinfo?(gandalf)
Depends on: 1289951
Mass change dependency tree for bug 1279002 into a whiteboard keyword.
No longer blocks: gecko-l20n
Whiteboard: [gecko-l20n]
Definitely this should be APIs.  Definitely we should not duplicate the data internally, or repackage a separate copy of the CLDR subset needed for this case.

(In reply to Zibi Braniecki [:gandalf][:zibi] from comment #8)
> :waldo, sorry for bothering you again. I talked to a CLDR person (Steven
> Loomis) and he's saying that ICU does not expose an easy way to access raw
> CLDR data because it doesn't carry raw data.

It's all compressed and stored in a very-custom way, but yeah, the literal original information isn't there.  Or at least I think that's what he meant.  ICU still has the information at hand.

> He suggests that we either carry a cldr-json data set or we use ICU APIs to
> retrieve the data.
> The former seems like it would make us carry the CLDR data twice - once
> hidden inside ICU, and second set accessible directly.

Using ICU APIs to retrieve the data is fine.

> Based on Scott's needs, it could be something like:
> 
> mozIntl.CLDR.getFirstDayOfTheWeek -> Int (eg. 1)
> mozIntl.CLDR.getWeekendRange -> [Int, Int] (eg. [6, 0])
> mozIntl.CLDR.getDateTimePattern -> String ("d.MM.y")
> mozIntl.CLDR.getDateTimeSymbol -> String (eg. "Monday", "January", "AM")

The "CLDR" bit is a pure implementation detail.  No web developer wanting this information will care that CLDR's providing it, nor likely know that CLDR's providing it.  Nor should APIs promise CLDR (implicitly, or explicitly by name), in case CLDR itself ever goes away (which seems super-unlikely).  These should either be directly on mozIntl (and then ultimately on Intl, in some ultimately-standardized form), or they should be on an object on Intl that has some other name: "Calendar", "DateTime", or something -- or maybe multiple of them, if you're querying enough different things.

Given the getCanonicalLocales precedent as a generic utility function, unrelated to any specific area of i18n functionality, I would lean toward Intl.Calendar or similar rather than Intl.* directly.

(In reply to Zibi Braniecki [:gandalf][:zibi] from comment #13)
> I didn't touch the skeleton/pattern API yet because I think it's the most
> questionable one
> 
> That's the basic idea. The skeleton/pattern API will be designed to give you
> the correct pattern for the given skeleton (so if you want "ymd", we'll
> return the right one for the locale) so then you can just check if in the
> returned pattern "y" is before "m" or after.
> There will be other uses of that API, but getting the date/time picker order
> will certainly be one of them.

Exposing raw patterns is IMO pretty problematic.  There's no reason to assume that the "pattern" for a particular set of components will always have a consistent order and structure -- it could easily depend on the particular date/time chosen.

I *think* there are some locales, for example, that have different calendars covering different portions of history, so the pattern/format for one instant will not apply universally.  Even if there aren't, now, there *could* be in the future, and that action is out of our control (and perhaps in the hands of some tinpot dictator somewhere).
Flags: needinfo?(jwalden+bmo)
Depends on: 1303579
:scottwu:

For date/time patterns, would it be enough for you to use formatToParts:


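var x = new Intl.DateTimeFormat('pl', {
  hour: '2-digit',
  minute: '2-digit',
});
// Any date works here; only the part types and literals matter.
var now = new Date();
var parts = x.formatToParts(now);

// Replace each hour/minute value with a placeholder and keep the
// locale's literals (separators) as they are.
parts.map(part => {
  switch (part.type) {
    case 'hour':
    case 'minute':
      return '--';
    default:
      return part.value;
  }
}).join(''); // "--:--"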

?
Flags: needinfo?(scwwu)
(In reply to Zibi Braniecki [:gandalf][:zibi] from comment #16)
> For date/time patterns, would it be enough for you to use formatToParts:

I think this would actually be enough for our use. It gives us the order of date/time and their separators. Just need to write a utility function to make it easier to work with.

What do you think, Jessica? Have I missed anything?
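
The utility function mentioned above could be sketched roughly as follows. This is an illustration only, assuming formatToParts is available; describeDateFormat is a hypothetical name, and the "--" placeholder is borrowed from the earlier example rather than from the UX spec.

```javascript
// Hypothetical helper: derives the field order and a placeholder string
// for a locale from formatToParts, without exposing a raw pattern.
function describeDateFormat(locale, options, placeholder = "--") {
  // Pin the time zone so the sample date formats the same everywhere.
  const fmt = new Intl.DateTimeFormat(locale, { ...options, timeZone: "UTC" });
  const sample = new Date(Date.UTC(2016, 3, 15, 9, 30));
  const parts = fmt.formatToParts(sample);
  return {
    // Field types in display order, e.g. ["month", "year"].
    order: parts.filter((p) => p.type !== "literal").map((p) => p.type),
    // Literals (separators) are kept; every field becomes the placeholder.
    placeholder: parts
      .map((p) => (p.type === "literal" ? p.value : placeholder))
      .join(""),
  };
}
```

For example, describeDateFormat("zh-CN", { year: "numeric", month: "numeric" }).order should start with "year", which would tell the month picker to render [ year | month ].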
Flags: needinfo?(scwwu) → needinfo?(jjong)
> I think this would actually be enough for our use. It gives us the order of date/time and their separators. Just need to write a utility function to make it easier to work with.

Make sure to look at http://searchfox.org/mozilla-central/rev/76609a05d6ef7ba4223ed79e479c73fb2543a107/js/src/builtin/Intl.cpp#2118 for the list of tokens that can come up.

You should be prepared for most of them. Intl engine works really well for selecting the best applicable format for the locale and it leads to some interesting outliers.

For example:

var x = new Intl.DateTimeFormat('fa', {year: 'numeric'});
x.formatToParts(0);

[
  {type:"year", value:"\u06F1\u06F3\u06F4\u06F8"},
  {type:"literal", value:" "},
  {type:"era", value:"\u0647\u200D.\u0634."}
]

This is currently reported as http://unicode.org/cldr/trac/ticket/9838 and will get fixed in CLDR 31, but it's worth knowing that there may be scenarios where fields beyond ones you asked for show up in the result. (same would be true if we exposed a pattern)
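
One way to notice such extra fields is to compare the part types against the options that were passed in. The sketch below is only an illustration: findUnrequestedParts is a hypothetical helper, and it relies on the simplifying assumption that each requested option key (year, month, day, ...) has the same name as the part type it produces.

```javascript
// Hypothetical helper: lists non-literal part types that show up in the
// output even though no option with that name was requested.
function findUnrequestedParts(locale, options) {
  const requested = new Set(Object.keys(options));
  const parts = new Intl.DateTimeFormat(locale, options).formatToParts(new Date(0));
  return parts
    .filter((p) => p.type !== "literal" && !requested.has(p.type))
    .map((p) => p.type);
}
```

Calling findUnrequestedParts('fa', { year: 'numeric' }) on a build with the data described above would be expected to report "era". Note that a 12-hour locale formatting an hour will also report "dayPeriod", so callers would need to decide which extra parts they tolerate.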
(In reply to Scott Wu [:scottwu] from comment #17)
> (In reply to Zibi Braniecki [:gandalf][:zibi] from comment #16)
> > For date/time patterns, would it be enough for you to use formatToParts:
> 
> I think this would actually be enough for our use. It gives us the order of
> date/time and their separators. Just need to write a utility function to
> make it easier to work with.
> 
> What do you think, Jessica? Have I missed anything?

Yes, I think this is enough for us, what we need are the order and separators. And for time, if we can have the user's preference for 12/24hr format (I think this is discussed somewhere else), this would do. Thanks.
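
For the 12/24hr part, the locale's default can be read from resolvedOptions(). This is a minimal sketch with a hypothetical helper name, and it only reflects the locale's default clock, not a preference the user has set in the OS, which is the piece discussed elsewhere.

```javascript
// Hypothetical helper: reports whether the locale formats hours on a
// 12-hour clock by default. It does not consult any OS-level user setting.
function localeUses12HourClock(locale) {
  const resolved = new Intl.DateTimeFormat(locale, { hour: "numeric" }).resolvedOptions();
  // hour12 is present in the resolved options whenever an hour field is requested.
  return resolved.hour12 === true;
}
```

A time picker could use the result to decide whether to render an AM/PM spinner next to the hour and minute fields.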
Flags: needinfo?(jjong)
Assignee: nobody → gandalf
Whiteboard: [gecko-l20n] → [gecko-l20n][milestone5]
Priority: -- → P2
Depends on: 1312053
What's the status / plan forward here?
Flags: needinfo?(gandalf)
We're making progress with things like mozIntl.getLocaleInfo, mozIntl.getCalendarInfo and so on.
The next steps are bug 1303579 and bug 1376616. Both are up for grabs if you're interested.
Depends on: 1376616
Flags: needinfo?(gandalf)
Assignee: gandalf → nobody
Summary: Expose a set of CLDR data via custom APIs for date/time format UI → [meta] Expose a set of CLDR data via custom APIs for date/time format UI
Moving to p3 because no activity for at least 1 year(s).
See https://github.com/mozilla/bug-handling/blob/master/policy/triage-bugzilla.md#how-do-you-triage for more information
Priority: P2 → P3
Severity: normal → S3