Closed Bug 1216150 Opened 4 years ago Closed 4 years ago

Implement ECMA 402 DateTimeFormat formatToParts behind a compartment option (enabled only for b2g certified apps)

Categories

(Core :: JavaScript: Internationalization API, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla46
blocking-b2g 2.5+
tracking-b2g backlog
Tracking Status
firefox46 --- fixed
firefox47 --- fixed

People

(Reporter: zbraniecki, Assigned: Waldo)

References

Details

(Keywords: dev-doc-complete, Whiteboard: [DocArea=JS])

Attachments

(5 files, 11 obsolete files)

29.27 KB, patch
Details | Diff | Splinter Review
35.11 KB, patch
zbraniecki
: review+
Details | Diff | Splinter Review
10.07 KB, patch
Waldo
: review-
Details | Diff | Splinter Review
6.94 KB, patch
bzbarsky
: review+
Details | Diff | Splinter Review
3.01 KB, patch
fabrice
: review+
Details | Diff | Splinter Review
A lot of Firefox OS UX requires some basic formatting of time patterns returned by Intl.DateTimeFormat.

Java supports that via fieldPosition parameter[0] and we're working on including that in Emca 402 rev. 3 [1].

But we really need this now because otherwise we are using dirty hacks that provide very bad localization results for some edge case locales like zh-CN, ar or de.

I'm working with Ecma group to get a consensus on the exact API, and I'd like to have a patch here ready to land in case we get an agreement in time for FxOS 2.5 so that we can fix those very visible use cases.

I will need some help with C++.

[0] http://www.icu-project.org/apiref/icu4c/udat_8h.html#a6b7a2e8ae0b620aa677ae6c9d469a0b9
[1] https://github.com/tc39/ecma402/issues/30
Attached patch 1216150.patch (obsolete) — Splinter Review
Jeff, I'm sorry for my horrible C++. Can you advise me on how to write this C++ properly?

I won't ask for review yet, until I understand better what are the chances that we will add it to ECMA 402, but I'd like to know if that patch is all it takes for us to expose this ICU function.
Assignee: nobody → gandalf
Attachment #8675690 - Flags: feedback?(jwalden+bmo)
Comment on attachment 8675690 [details] [diff] [review]
1216150.patch

Review of attachment 8675690 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/builtin/Intl.cpp
@@ +1978,5 @@
>      if (!str)
>          return false;
>  
> +    if (fp) {
> +      RootedObject res(cx, NewObjectWithGivenProto<PlainObject>(cx, nullptr));

This'll need a null-check.

@@ +1982,5 @@
> +      RootedObject res(cx, NewObjectWithGivenProto<PlainObject>(cx, nullptr));
> +
> +      // this should be res.string = str;
> +      RootedValue stringValue(cx, ObjectValue(*res));
> +      if (!DefineProperty(cx, res, str, stringValue, nullptr, nullptr, 0))

s/str,/cx->names().string/ I think.

And, it's probably worth renaming |str|, if it's not always a string.

@@ +1988,5 @@
> +
> +
> +      // this should be res.fieldPosition = fieldPosition;
> +      RootedValue fposValue(cx, ObjectValue(*res));
> +      if (!DefineProperty(cx, Intl, fieldPosition, fposValue, nullptr, nullptr, 0))

s/fieldPosition/cx->names().fieldPosition, which you'll need to add to vm/CommonPropertyNames.h.

I have no idea where this Intl thing comes from.  You should write your patch testing against a mozilla-central build, not (I guess?) trying to write uncompiled code.
Attachment #8675690 - Flags: feedback?(jwalden+bmo)
Attached patch 1216150.patch (obsolete) — Splinter Review
Hya!

Got the POC to work. It's still just a POC, but it seems to be fully functional.

I'm trying to get a consensus within ECMA 402 on this design, and would like to try to get it in Fx OS 2.5 to fix time formatting that is currently in a permanent dirtyhack state.

Jeff - does it look like something you may be OK accepting into our codebase? I will want to put the field2DateTimeComponent outside of the formatWithPosition function because it will be used by other Intl formatters in the future, but I wasn't sure how to handle that properly.

I'd also like to consider limiting the exposure of this to B2G only until we're closer to ECMA 402 edition 3 release.

My main concern is to get this in Fx 2.5 and be able to stop relying on toLocaleFormat (which doesn't work on non-latin languages anyway).
Attachment #8675690 - Attachment is obsolete: true
Attachment #8677795 - Flags: feedback?(jwalden+bmo)
[Blocking Requested - why for this release]: We currently rely on date time token formatting in multiple very visible places like:

 - status bar clock
 - lockscreen clock
 - Clock app
 - SMS status info
 - Contacts

But we do not have *any* API to handle that. The current workaround that I implemented as part of mozIntl.DateTimeFormat is limited and leads to many scenarios in which an unexpected result is visible when gecko/gaia language combinations go wrong. Examples of such behavior has been recorded in bug 1209866.

The only proper solution is to expose field positions from ICU to Intl API and use it.

I'm working with ECMA to standardize the API proposal and so far got positive feedback from the ECMA 402 leader.

I need this patch to replace our dirty hack and would love to get this in 2.5.

This will fortunately be a very limited change as all it will require is this patch and a single change in mozIntl API in Gaia.

Getting this in B2G will allow us to stop relying on properietary and limited `Date.prototype.toLocaleFormat`, fix edge cases that we will discover otherwise only very late in the cycyle when 2.5 will get in hands of testers in other locales, and will overall improve our Intl story allowing us to further customize the time/date tokens.

The only possible cost of this patch is that between now and June 2016 the exact API in ECMA may change and I'll have to update our implementation which I'm totally fine with.
blocking-b2g: --- → 2.5?
Btw. I don't need the decision to be made anytime soon, I set b2g-2.5? so put it on release managers radar. I believe we will be able to make some decisions in about two weeks after the next TC39 meeting.
Comment on attachment 8677795 [details] [diff] [review]
1216150.patch

Review of attachment 8677795 [details] [diff] [review]:
-----------------------------------------------------------------

More or less on track, tho a substantial number of things needing changing still.  And of course this says nothing about whether the API will be sufficiently set-in-stone to ship.  Racing ahead of standards groups without rough consensus is an excellent way to shoot ourselves in the foot, and we have far more experience doing that than I wish we did.  :-(

::: js/src/builtin/Intl.cpp
@@ +1693,5 @@
> +    {
> +        return nullptr;
> +    }
> +
> +    RootedValue formatWithPositionGetter(cx);

Add a comment something like

  // Then do likewise for DateTimeFormat.prototype.formatWithPosition.

@@ +1983,5 @@
> +
> +    int size;
> +    size = udat_format(df, x, Char16ToUChar(chars.begin()), INITIAL_CHAR_BUFFER_SIZE,
> +                        nullptr, &status);
> +

Not sure why this change...

@@ +2014,5 @@
> +    }
> +
> +    Vector<char16_t, INITIAL_CHAR_BUFFER_SIZE> chars(cx);
> +    if (!chars.resize(INITIAL_CHAR_BUFFER_SIZE))
> +        return false;

Rather than a Vector, use a StringBuffer.  That provides a nice API for extracting a string afterward, too, which you want later on.

@@ +2019,5 @@
> +    UErrorCode status = U_ZERO_ERROR;
> +    UFieldPositionIterator* fpositer = ufieldpositer_open(&status);
> +    if (U_FAILURE(status)) {
> +        return false;
> +    }

SpiderMonkey style doesn't brace single-line things, unless something else (loop head, condition, etc.) crosses two lines.

@@ +2022,5 @@
> +        return false;
> +    }
> +
> +    int size;
> +    size = udat_formatForFields(df, x, Char16ToUChar(chars.begin()), INITIAL_CHAR_BUFFER_SIZE, fpositer, &status);

Again, do it the same style it was done before.

@@ +2034,5 @@
> +        JS_ReportErrorNumber(cx, GetErrorMessage, nullptr, JSMSG_INTERNAL_INTL_ERROR);
> +        return false;
> +    }
> +
> +    JSString* str = NewStringCopyN<CanGC>(cx, chars.begin(), size);

This would want to be a RootedString -- as it is, this string won't be tracked by the GC when you create the object just next, and this string could be invalid at that point.

@@ +2038,5 @@
> +    JSString* str = NewStringCopyN<CanGC>(cx, chars.begin(), size);
> +    if (!str)
> +        return false;
> +
> +    RootedObject res(cx, JS_NewObject(cx, nullptr));

Use NewBuiltinClassInstance<PlainObject>(cx) instead of this.

@@ +2046,5 @@
> +    RootedValue stringValue(cx, StringValue(str));
> +    if (!DefineProperty(cx, res, cx->names().string, stringValue, nullptr, nullptr, JSPROP_ENUMERATE))
> +        return false;
> +
> +    RootedObject fPosObject(cx, JS_NewObject(cx, nullptr));

Same again, and the other JS_NewObjects in here.

@@ +2052,5 @@
> +        return false;
> +
> +    int32_t fieldName, beginIndex, endIndex;
> +
> +    while((fieldName = ufieldpositer_next(fpositer, &beginIndex, &endIndex)) != -1) {

Space after while.

The docs said "a negative value if there are no more fields for which to provide information".  So checking -1 is wrong -- check < 0.

@@ +2053,5 @@
> +
> +    int32_t fieldName, beginIndex, endIndex;
> +
> +    while((fieldName = ufieldpositer_next(fpositer, &beginIndex, &endIndex)) != -1) {
> +        RootedObject fPosField(cx, JS_NewObject(cx, nullptr));

Simply doing

  RootedT t(cx);

involves several memory accesses, so we want to move all Rooted in the loop, outside the loop.  Inside the loop we can assign to each anew, each loop iteration.

@@ +2067,5 @@
> +            return false;
> +
> +        char* name;
> +
> +        // See i18n/unicode/udat.h for detailed list of fields

Just unicode/udat.h -- that's the canonical path used to access this stuff, not the path internal to Mozilla's copy.

@@ +2069,5 @@
> +        char* name;
> +
> +        // See i18n/unicode/udat.h for detailed list of fields
> +        switch (fieldName) {
> +            case 0: name = "era"; break;

switch (static_cast<UDateFormatField>(fieldName))

and then use the named constants in the header.  Definitely you should never use raw numbers like this!

@@ +2086,5 @@
> +            case 17: name = "timeZoneName"; break;
> +            case 35: name = "timeSeparator"; break;
> +        }
> +
> +        RootedAtom a(cx, Atomize(cx, name, strlen(name)));

These names should all be added to CommonPropertyNames.h, then just assign

  RootedValue fieldValue(cx);
  fieldValue.setString(cx->names().year);

or so.

@@ +2098,5 @@
> +        return false;
> +
> +    result.setObject(*res);
> +
> +    ufieldpositer_close(fpositer);

Every return-false earlier in this file, that causes this not to be reached, represents a leak.  You want the much-less-fragile

  ScopedICUObject<UFieldPositionIterator> toClose(fpositer, ufieldpositer_close);

or so, just after iterator creation.

@@ +2110,5 @@
>      CallArgs args = CallArgsFromVp(argc, vp);
>      MOZ_ASSERT(args.length() == 2);
>      MOZ_ASSERT(args[0].isObject());
>      MOZ_ASSERT(args[1].isNumber());
> +    MOZ_ASSERT(args[1].isBool());

s/1/2/

That said, you're asserting the length is 2 earlier, so this is definitely wrong.  You'll have to change s/2/3/ and ensure every caller of FormatDateTime passes that third argument.

@@ +2144,5 @@
> +    if (args[2].toBoolean()) {
> +      success = intl_FormatWithPositionDateTime(cx, df, args[1].toNumber(), &result);
> +    } else {
> +      success = intl_FormatDateTime(cx, df, args[1].toNumber(), &result);
> +    }

bool success = args[2].toBoolean()
               ? intl_Format(...)
               : intl_Format(...);

for the style.

::: js/src/vm/CommonPropertyNames.h
@@ +85,1 @@
>      macro(fieldOffsets, fieldOffsets, "fieldOffsets") \

Flip these two to remain alphabetical.
Attachment #8677795 - Flags: feedback?(jwalden+bmo) → feedback+
Attached patch 1216150.patch (obsolete) — Splinter Review
Thanks a lot for that!

I updated the patch to address your feedback and added tests.

> And of course this says nothing about whether the API will be sufficiently set-in-stone to ship.  Racing ahead of standards groups without rough consensus is an excellent way to shoot ourselves in the foot, and we have far more experience doing that than I wish we did.  :-(

Agree. It's a very unfortunate situation. I believe that we may be able to avoid having to get this patch in 2.5 if we can get bug 1208808 landed in 2.5, but I'm not sure yet.

My goal right now is to get the patches in shape and ready to be landed *if* you decide that it's the right thing to do.

I believe that it will be your call once we have bug 1208808 landed and we see how it affects the exposition of the bugs related to trying to remove dayperiod portion of Intl string using toLocaleFormat('%p') token.

With this patch being ready, we can at least minimize the time in case we decide to land it for 2.5.
Attachment #8677795 - Attachment is obsolete: true
Attachment #8680290 - Flags: review?(jwalden+bmo)
btw. Here's an example of a bug where a reviewer exposed the problem: bug 1209866.


He is on a machine with zh-CN locale, so his toLocaleFormat('%p') returns the dayperiod token in zh-CN irrelevant of navigator.language.
Then, he's launching Firefox OS with navigator.language=en-US.

We try to take a time string ("03:24 pm") and format it: "03:24 <small>pm</small>"), but because we only have the dayperiod token in zh-CN, this fails.

Without this feature, we will always fail, because there is no way to format a token out of a time string without knowing the token position.

I believe that this is just one of many examples of environments in which the interaction between Intl API and toLocaleFormat fails.
[Tracking Requested - why for this release]:

Thanks for nominating it for 2.5. We will track this for the next release/Quality program.
blocking-b2g: 2.5? → ---
Summary: Implement ECMA 402 extension for DateTimeFormat fieldPosition argument → Implement ECMA 402 DateTimeFormat formatToParts
Attached patch 1216150.patch (obsolete) — Splinter Review
We got a consensus proposal that I'm basing my patches and spec changes on.

This patch is a bit simpler than the previous one. 

There are two things that I'm not sure about:

1) I'm once again using char*  because those are not property names anymore, just string values. Not sure if I should keep them in a `switch` or some map.

2) In the main `while` loop I have three times similar code for adding a new part to the return array. I was thinking about abstracting it into a small function but that function would have to take 8 arguments as references so I gave up. I'm not sure if I should have.

Can you look at this and tell me what needs updates?

Thanks!
Attachment #8680290 - Attachment is obsolete: true
Attachment #8680290 - Flags: review?(jwalden+bmo)
Attachment #8682343 - Flags: feedback?(jwalden+bmo)
Comment on attachment 8682343 [details] [diff] [review]
1216150.patch

Review of attachment 8682343 [details] [diff] [review]:
-----------------------------------------------------------------

Could you explain exactly what the algorithm performed by the Intl.cpp intl_FormatToPartsDateTime code is?  I can critique the JSAPI-using, SpiderMonkey-internal-method-using parts of this pretty easily.  But I'm not sure how fully useful that is, if I don't understand the algorithm itself.

::: js/src/builtin/Intl.cpp
@@ +1678,5 @@
>  
>      /*
>       * Install the getter for DateTimeFormat.prototype.format, which returns a
>       * bound formatting function for the specified DateTimeFormat object
>       * (suitable for passing to methods like Array.prototype.map).

If you do the reuse thing, you can do

"""
Install getters forDateTimeFormat.prototype.format{,ToParts}, each of which returns a formatting function bound to the specified....
"""

@@ +1680,5 @@
>       * Install the getter for DateTimeFormat.prototype.format, which returns a
>       * bound formatting function for the specified DateTimeFormat object
>       * (suitable for passing to methods like Array.prototype.map).
>       */
> +    RootedValue formatGetter(cx);

You can reuse the one |getter| RootedValue, so no need to rename this.

@@ +1701,5 @@
> +    {
> +        return nullptr;
> +    }
> +    if (!DefineProperty(cx, proto, cx->names().formatToParts, UndefinedHandleValue,
> +                        JS_DATA_TO_FUNC_PTR(JSGetterOp, &formatToPartsGetter.toObject()),

i.e.

    if (!GlobalObject::getIntrinsicValue(cx, cx->global(), cx->names().DateTimeFormatFormatToPartsGet,
                                         &getter))
    {
        return nullptr;
    }

    if (!DefineProperty(cx, proto, cx->names().formatToParts, UndefinedHandleValue,
                        JS_DATA_TO_FUNC_PTR(JSGetterOp, &getter.toObject()),
...

@@ +2002,5 @@
> +
> +    return true;
> +}
> +
> +const char* getPartNameForFormatField(UDateFormatField fieldName) {

static PropertyName*
getPartNameForFormatField(UDateFormatField fieldName, JSContext* cx)
{

@@ +2003,5 @@
> +    return true;
> +}
> +
> +const char* getPartNameForFormatField(UDateFormatField fieldName) {
> +    // See unicode/udat.h for detailed list of fields

Refer to http://www.icu-project.org/apiref/icu4c/udat_8h.html instead.  Also note that complete-sentence comments in SpiderMonkey have full punctuation, i.e. trailing period, initial capital, etc.

@@ +2006,5 @@
> +const char* getPartNameForFormatField(UDateFormatField fieldName) {
> +    // See unicode/udat.h for detailed list of fields
> +    switch (fieldName) {
> +        case UDAT_ERA_FIELD:
> +            return "era";

SpiderMonkey style is

    switch (fieldName) {
      case ...:
        ...;
      case ...:
        ...;

@@ +2027,5 @@
> +            return "weekday";
> +        case UDAT_AM_PM_FIELD:
> +            return "dayperiod";
> +        case UDAT_TIMEZONE_FIELD:
> +            return "timeZoneName";

Make this function return PropertyName* instead of const char*, and then have each of these be cx->names().era, cx->names().year, and so on.  Add all those names to CommonPropertyNames.h as needed.

@@ +2029,5 @@
> +            return "dayperiod";
> +        case UDAT_TIMEZONE_FIELD:
> +            return "timeZoneName";
> +        // for now, we're going to skip other fields, until we add them
> +        // to ECMA 402

Complete sentence, period, etc.

@@ +2053,5 @@
> +        return false;
> +    UErrorCode status = U_ZERO_ERROR;
> +    UFieldPositionIterator* fpositer = ufieldpositer_open(&status);
> +    if (U_FAILURE(status))
> +        return false;

In JSAPI-using code, there are two sorts of failure.

If you do something JSAPI or SpiderMonkey that'll throw an exception on failure, you can just return false on error -- sb.ensureTwoByteChars() and sb.resize() are examples.  (You'd know they did this somewhat by the |cx| passed to StringBuffer, somewhat from gradual learning over time.)

But if you're doing something that can fail, that's *not* going to throw, you have to throw an exception yourself on failure -- else a failure will just silently terminate script execution.  ICU functions are such functions (note how U_FAILUREs elsewhere in this file are usually followed by JS_ReportErrorNumber(cx, GetErrorMessage, nullptr, JSMSG_INTERNAL_INTL_ERROR) or so.  Please add a similar thing to this bit.

    UFieldPositionIterator* fpositer = ufieldpositer_open(&status);
    if (U_FAILURE(status)) {
        JS_ReportErrorNumber(cx, GetErrorMessage, nullptr, JSMSG_INTERNAL_INTL_ERROR)
        return false;
    }

@@ +2068,5 @@
> +        JS_ReportErrorNumber(cx, GetErrorMessage, nullptr, JSMSG_INTERNAL_INTL_ERROR);
> +        return false;
> +    }
> +
> +    RootedString str(cx, JS_NewUCStringCopyN(cx, sb.rawTwoByteBegin(), size));

Hmm.  Okay, so I'm dumb, and when I said to use StringBuffer earlier, I was wrong -- because we're copying the data here, necessarily to ensure we have the right size string.  Please revert |StringBuffer sb(cx);| and the next four lines to

    Vector<char16_t, INITIAL_CHAR_BUFFER_SIZE> chars(cx);
    if (!chars.resize(INITIAL_CHAR_BUFFER_SIZE))
        return false;

Also, I think this wants to be

NewStringCopyN<CanGC>

like the other code.


as I think you had this earlier.

(We should have some code to create a string from a Vector of stuff, efficiently, in a way that doesn't require a copy -- but that's not on you, that's on us.  :-) )

@@ +2085,5 @@
> +    int arrayIdx = 0;
> +
> +    while ((fieldName = ufieldpositer_next(fpositer, &beginIndex, &endIndex)) >= 0) {
> +        if (beginIndex > currentIndex) {
> +            partObject = NewBuiltinClassInstance<PlainObject>(cx);

This is fallible, needs a null-check.

@@ +2086,5 @@
> +
> +    while ((fieldName = ufieldpositer_next(fpositer, &beginIndex, &endIndex)) >= 0) {
> +        if (beginIndex > currentIndex) {
> +            partObject = NewBuiltinClassInstance<PlainObject>(cx);
> +            partType = StringValue(JS_NewStringCopyZ(cx, "separator"));

Add "separator" to common names, use cx->names().separator.  (But if you didn't, JS_NewStringCopyZ would be fallible and would require a null-check.)

@@ +2089,5 @@
> +            partObject = NewBuiltinClassInstance<PlainObject>(cx);
> +            partType = StringValue(JS_NewStringCopyZ(cx, "separator"));
> +            if (!DefineProperty(cx, partObject, cx->names().type, partType, nullptr, nullptr, JSPROP_ENUMERATE))
> +                return false;
> +            substr = SubstringKernel(cx, str, currentIndex, beginIndex - currentIndex);

This is fallible, needs a null-check.

Really, the general rule is anything that returns a pointer is fallible and should be null-checked.

@@ +2184,5 @@
>      // Use the UDateFormat to actually format the time stamp.
>      RootedValue result(cx);
> +    bool success = args[2].toBoolean()
> +        ? intl_FormatToPartsDateTime(cx, df, args[1].toNumber(), &result)
> +        : intl_FormatDateTime(cx, df, args[1].toNumber(), &result);

Formatting should be

    bool success = args[2].toBoolean()
                   ? intl_FormatToPartsDateTime(cx, df, args[1].toNumber(), &result)
                   : intl_FormatDateTime(cx, df, args[1].toNumber(), &result);

::: js/src/builtin/Intl.js
@@ +2710,3 @@
>  }
>  
> +function dateTimeFormatFormatToPartsToBind() {

Please move this hunk down to the start of the Intl_DateTimeFormat_formatToParts_get hunk.  I understand this function's very similar to dateTimeFormatFormatToBind, but it makes more sense grouping bound-function-behavior with the getter that exposes it, than to group functions that happen to look similar.
Attachment #8682343 - Flags: feedback?(jwalden+bmo)
Keywords: dev-doc-needed
Whiteboard: [DocArea=JS]
Attached patch 1216150.patchSplinter Review
Thanks a lot! Updated the patch to your feedback and re-added tests.

> Could you explain exactly what the algorithm performed by the Intl.cpp intl_FormatToPartsDateTime code is?

Oh! Yes! It's so easy to forget that code like this is not self-explanatory, sorry!

So, basically, this is an extension to ECMA 402 DateTimeFormat (and NumberFormat will come as well) that allows users to format the date string.

The `.format()` function returns an opaque string that does not allow for any sort of rich formatting (put month in <b></b> or dayperiod in <sup></sup>).

The way it works, in example, is that:

var formatter = Intl.DateTimeFormat(navigator.languages, {
  year: 'numeric',
  month: 'long',
  day: 'numeric'
});
var now = new Date();

var str = formatter.formatToParts(now).map(({type, value}) => {
  switch (type) {
    case 'month': return `<b>${value}</b>`;
    default     : return value;
  }
}).reduce((string, part) => string + part);

which will give you sth like "25 <b>November</b> 2015" with proper formatting for any locale.

As you can see in tests, if you omit the special case for `month`, formatToParts concatenates to the same string as returned by `format`.

That also allows us to remove tokens if UX needs it (there are cases in FxOS Calendar where we need to show hour without am/pm dayperiod token) and for LockScreen we want to show dayperiod token as <sup>.

I presented the feature together with two other editors of ECMA 402 rev.3 at last week TC39 meeting and we got it accepted in stage 1.

The plan is to get it to stage 2 for the next TC39 meeting in January, and to stage 3 in March. That's well in time for June ECMA 402 rev 3 release.

The initial reaction from Microsoft, Google and Apple is positive. Microsoft may have to circumvent the fact that they don't use CLDR, but they believe that they can match the algo we're proposing 1-1 and they want/need something exactly like that for Windows.
Google and Apple are happy to follow and they also expressed interest in this feature.

Imho, the only thing that is specific for us, is that we need it *now*. As in, if we can get the early implementation only for B2G in for 2.5, we can solve a cluster of bugs that is accruing related to mis-formatting of dates/times.
Attachment #8682343 - Attachment is obsolete: true
Attachment #8690566 - Flags: review?(jwalden+bmo)
blocking-b2g: --- → 2.5+
Blocks: 1226612
Feh.  For better or worse, I ended up taking this patch and fixing it to do what I wanted, locally, rather than deal with extended Telephone Review again.

However, in the process I ran into a few issues, beyond style issues, JSAPI misuses, &c.  I'll list 'em here so at least they're on record -- no point posting a partial patch given their existence will mean the final patch won't be what I'd post.

1) Does udat_formatForFields's iterator return parts in start-to-end order?  Docs are unclear.  I filed http://bugs.icu-project.org/trac/ticket/12024 on this so upstream can clarify.  (Hopefully it's that ordering, but the current patch would just just ass-u-me this, which I'm not horribly comfortable doing.)

2) With the patch as posted so far (and as locally), my utterly trivial thirty seconds of fiddling exposed an unhandled case:

js> new Intl.DateTimeFormat("de", { weekday: "narrow" }).formatToParts(Date.now())
[{type:"separator", value:"D"}]

Surely that should be...something, right?

It's pretty disturbing it was that easy to find an unhandled part-field.  The proposal is a decent handwavy start, but I *really* want a full list of every single possible ICU field that we're going to recognize.  Semi-haphazard recognition as happens now is not acceptable.  Quite likely I want us listing every single possible field in the switch-statement, exhaustively, with each returning a name-indicator or a no-name-indicator.  Then there can be fallthrough *outside* the switch returning nullptr, as extra backstop.  (And as we compile with other ICUs we can add #ifdefs to the switch to deal with new fields as they crop up.)

3) I really hope it's okay to use lambdas a bit, because my current WIP local patch uses a few of them, semi-non-trivially.  Turns out they're really nice to deal with doing a possible separator-action before all parts, then each part followed by a possible separator-action -- you don't have to duplicate the separator-action code.  :-)  If it happens such lambdas are off-limits, oh well, I can rewrite a little bit.

4) Neither the current patch, nor my changes to it, expose this method *only* if explicitly requested -- necessary if we're going to only ship this to certified apps until it's further along in standardization.  That work's going to have to happen to do this in a temporary way.
Thanks so much Jeff!

I can only try to help with one of your concerns:

> 2) With the patch as posted so far (and as locally), my utterly trivial thirty seconds of fiddling exposed an unhandled case:

We do have a list of all possible fields [0] and I just did not add them all to my switch because my main concern was to get the codebase to work first. There are 36 fields and in my original patch I mapped 13. I'll provide you the final list today.
Substantively, I treated UDAT_STANDALONE_MONTH_FIELD as "month" and UDAT_STANDALONE_DAY_FIELD as "weekday", to fix { month: "narrow" } and { weekday: "narrow" } results.  I also listed out all the other UDateFormatField values in the switch, exhaustively, so that omissions will be detected as compiler warnings (possibly even errors, depending how we compiled).  It would still be good to have that full list of field/type correspondences, tho, because I didn't carefully verify that the only two changes I made -- adding the two *STANDALONE* cases as handled cases -- are the only ones we should be making.

I also removed abortive, unused changes to Intl.NumberFormat and made field-type be converted directly to a JSAtomState pointer-to-member to the corresponding name (or nullptr if no such).

In other news, I'm not sure what I think about this ad-hoc-tested behavior I stumbled across:

js> new Intl.DateTimeFormat("de", { hour: "2-digit" }).formatToParts(Date.now())
[{type:"hour", value:"16"}, {type:"separator", value:" Uhr"}]

The " Uhr" there is intrinsically tied to the hour-number, and I think there's a pretty decent argument that the whole thing should be one "hour" part, not one "hour" part and a separator.  But maybe not 100% certain about that.

Note that for "en-US", we have [{type:"hour", value:"4"}, {type:"separator", value:" "}, {type:"dayperiod", value: "PM"}], where there's trailing text but it's affiliated with a bona fide type -- much more sensible than for the "de" case.
Assignee: gandalf → jwalden+bmo
Status: NEW → ASSIGNED
Assignee: jwalden+bmo → gandalf
Attachment #8690566 - Flags: review?(jwalden+bmo)
I added an at-global-object-creation-time mechanism to make the DateTimeFormat formatToParts method opt-in.  Whenever this might be ready to land, it'll be up to the b2g side to ensure this only happens for globals created on behalf of certified apps.  (And I'll want the patch to use it to land simultaneous with this patch, so I can be confident this method's properly exposed to certified apps *only*.)
Attachment #8695034 - Attachment is obsolete: true
I'm testing the patch at the moment. I should have the final version for your review tonight.

> it'll be up to the b2g side to ensure this only happens for globals created on behalf of certified apps.  (And I'll want the patch to use it to land simultaneous with this patch, so I can be confident this method's properly exposed to certified apps *only*.)

Fabrice, can you help with this patch?
Flags: needinfo?(fabrice)
(In reply to Zibi Braniecki [:gandalf][:zibi] from comment #18)
> I'm testing the patch at the moment. I should have the final version for
> your review tonight.
> 
> > it'll be up to the b2g side to ensure this only happens for globals created on behalf of certified apps.  (And I'll want the patch to use it to land simultaneous with this patch, so I can be confident this method's properly exposed to certified apps *only*.)
> 
> Fabrice, can you help with this patch?

Yes I can help with the "certified only" part.
Flags: needinfo?(fabrice)
Attached patch formatToParts patch (obsolete) — Splinter Review
Updated your patch with two changes:

1) I went through all the UDAT fields and matched them to what Intl API returns. It means that several more fields potentially returned by ICU should be properly matched to tokens.

2) I wrapped the loop for ufieldpositer_next in an if that tests if `resultSize` is not 0.

The reason is that there are some cases when the combination of options produces an empty string and then spidermonkey hanged.

The example of the condition is:
> new Intl.DateTimeFormat("en-US", {era: 'short', day: '2-digit'}).format(Date.now())

I also hacked a tool that produces *all* possible combinations of options for DateTimeFormat - almost 2100000 - and runs them all checking if any of the requested options is missing in the formatted parts array.

That means that first of all - the patch does not crash/hang on *any* combination of options. Second of all, it returns all matching tokens for around 180k of them, it returns an empty string for around 3000 and it fails to match all tokens for around 35k options.

I could not manually go through all 35k failures, but from scanning it seems that they all make sense - as in - ICU doesn't return a matching timeZone for second/minute/timeZone request or for second/era doesn't return the era token.

So, I think the patch is ready for review and experimental landing to fix bug 1215068.
Attachment #8695763 - Attachment is obsolete: true
Attachment #8696133 - Flags: review?(jwalden+bmo)
Attached patch l10n-certified-only.patch (obsolete) — Splinter Review
Applies on top of the main patch. This enables the experimental feature only on certified apps, ie. gaia preloaded ones.
Attachment #8697268 - Flags: review?(jwalden+bmo)
Comment on attachment 8697268 [details] [diff] [review]
l10n-certified-only.patch

Review of attachment 8697268 [details] [diff] [review]:
-----------------------------------------------------------------

So, hm.  This is a little awkward.  I was thinking that by putting an option into CompartmentOptions, this would be something you would specify when you created the global object -- only.  But I think your patch points out that this isn't how CompartmentOptions is used: you can get a CompartmentOptions for an existing global, and you can modify it *even after it's been placed in the global*.  I think you do that in this patch, actually.  (So at a minimum, I think I'll want this option-setting to happen in slightly different locations in these files.  Still pondering to determine where.)

So then we could have:

   CompartmentOptions opts;
   JSObject* global = JS_NewGlobal(..., opts);
   ...run some script whose behavior depends on those option values, e.g. checking formatToParts's existence...
   auto storedOpts = CompartmentOptionsRef(global);
   storedOpts.setExperimentalDateTimeFormatFormatToPartsEnabled(true);
   ...run more script that depends on formatToParts not existing -- but it might now be faulted in!...

Not immediately sure what to do about this.  Pondering still.  Will have something later today on this, I hope.
Waldo, do you have any progress on that?
Flags: needinfo?(jwalden+bmo)
Depends on: 1235615
Comment on attachment 8697268 [details] [diff] [review]
l10n-certified-only.patch

Review of attachment 8697268 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/workers/WorkerScope.cpp
@@ +432,5 @@
>    const bool extraWarnings = usesSystemPrincipal &&
>                               xpc::ExtraWarningsForSystemJS();
>  
>    options.setDiscardSource(discardSource)
> +         .setExperimentalDateTimeFormatFormatToPartsEnabled(mWorkerPrivate->IsInCertifiedApp())

Same comment as in nsXPConnect.cpp.

::: js/xpconnect/src/nsXPConnect.cpp
@@ +417,5 @@
>  
> +    // Enable ECMA-402 experimental formatToParts in certified apps.
> +    if (status == nsIPrincipal::APP_STATUS_CERTIFIED) {
> +        JS::CompartmentOptionsRef(aGlobal)
> +          .setExperimentalDateTimeFormatFormatToPartsEnabled(status == nsIPrincipal::APP_STATUS_CERTIFIED);

This looks like about the right thing.  Although bug 1235615 will want very slightly different code here, and the addition of setExperimentalDateTimeFormatFormatToPartsEnabled would be to what's named JS::CompartmentCreationOptions in the current patch.  Should be simple alterations to make, once that bug's changes land.
Attachment #8697268 - Flags: review?(jwalden+bmo) → review+
Comment on attachment 8696133 [details] [diff] [review]
formatToParts patch

Review of attachment 8696133 [details] [diff] [review]:
-----------------------------------------------------------------

(In reply to Zibi Braniecki [:gandalf][:zibi] from comment #20)
> 1) I went through all the UDAT fields and matched them to what Intl API
> returns. It means that several more fields potentially returned by ICU
> should be properly matched to tokens.

Do you by chance have tests invoking each of them, that you could include here?  It would be nice to demonstrate that failure to characterize any particular initializer would immediately cause *some* test to fail, rather than simply taking it on faith (and taking the characterization of the part on faith, too).

Let's make those test-additions into a followup patch, so as not to delay this longer.

> 2) I wrapped the loop for ufieldpositer_next in an if that tests if
> `resultSize` is not 0.
> 
> The reason is that there are some cases when the combination of options
> produces an empty string and then spidermonkey hanged.
> 
> The example of the condition is:
> > new Intl.DateTimeFormat("en-US", {era: 'short', day: '2-digit'}).format(Date.now())

Eeeugh.  I guess this is technically permitted to return an empty string here, but still.  Eeeugh.

> I also hacked a tool that produces *all* possible combinations of options
> for DateTimeFormat - almost 2100000 - and runs them all checking if any of
> the requested options is missing in the formatted parts array.
> 
> That means that first of all - the patch does not crash/hang on *any*
> combination of options. Second of all, it returns all matching tokens for
> around 180k of them, it returns an empty string for around 3000 and it fails
> to match all tokens for around 35k options.

Cool beans.  It might be worth doing a run of this testing against the full set of locales in Intl/DateTimeFormat/supportedLocalesOf.js, although I think we have enough data now to defer that to after this lands.

::: js/src/builtin/Intl.cpp
@@ +386,5 @@
>  }
>  
> +static int32_t
> +udat_formatForFields(const UDateFormat* format, UDate dateToFormat, UChar* result,
> +            int32_t resultLength, UFieldPositionIterator* fpositer, UErrorCode* status)

Align everything under the first parameter, and let's keep result/resultLength on the same line since they're a pair:

static int32_t
udat_formatForFields(const UDateFormat* format, UDate dateToFormat,
                     UChar* result, int32_t resultLength, UFieldPositionIterator* fpositer,
                     UErrorCode* status)

@@ +1695,5 @@
>      }
>  
> +    // If the still-experimental DateTimeFormat.prototype.formatToParts method
> +    // is enabled, also add its getter.
> +    if (cx->compartment()->options().experimentalDateTimeFormatFormatToPartsEnabled()) {

This will be creationOptions() if bug 1235615 lands first, and the necessary changes noted elsewhere in this patch are made.

@@ +2138,5 @@
> +
> +    uint32_t partIndex = 0;
> +    RootedArrayObject partsArray(cx, NewDenseEmptyArray(cx));
> +    if (!partsArray)
> +        return false;

Rather than guarding a large while-loop below and having to indent a large code block without good reason, let's handle the early-exit case here -- rearrange things so that a zero-sized result is just an early exit:

    RootedArrayObject partsArray(cx, NewDenseEmptyArray(cx));
    if (!partsArray)
        return false;
    if (resultSize == 0) {
        // An empty string contains no parts, so avoid extra work below.
        args.rval().setObject(*partsArray);
        return true;
    }

    RootedString overallResult(cx, NewStringCopyN<CanGC>(cx, chars.begin(), resultSize));
    if (!overallResult)
        return false;

    size_t lastEndIndex = 0;

    uint32_t partIndex = 0;
    RootedObject singlePart(cx);
    ...

Then undo the if-resultSize-not-zero change.

@@ +2173,5 @@
> +        partIndex++;
> +        return true;
> +    };
> +
> +    if (resultSize != 0) {

Set the early-return adjustment aside.

Why wouldn't the while-loop have ufieldpositer_next return a value less than zero and iterate no times?  Is this an ICU bug that we should report upstream?  Please explain to me why this isn't an ICU bug, because I'd expect the iterator to iterate no times (ufieldpositer_next to return < 0) and not iloop.  It's really not clear to me why there would/should be any ilooping in the code here, as I read it now.  If indeed this is a bug, we should report it to ICU upstream.

::: js/src/jsapi.h
@@ +2193,5 @@
>        , mergeable_(false)
>        , discardSource_(false)
>        , disableLazyParsing_(false)
>        , cloneSingletons_(false)
> +      , experimentalDateTimeFormatFormatToPartsEnabled_(false)

If this patch doesn't land before bug 1235615 does, this should move into JS::CompartmentCreationOptions.

@@ +2260,5 @@
> +    // mission-critical code that can't be changed if ECMA-402 decides not to
> +    // accept the method in its current form.
> +    bool experimentalDateTimeFormatFormatToPartsEnabled() const {
> +        return experimentalDateTimeFormatFormatToPartsEnabled_;
> +    }

And all this hunk of diff, too.

@@ +2314,5 @@
>      bool mergeable_;
>      bool discardSource_;
>      bool disableLazyParsing_;
>      bool cloneSingletons_;
> +    bool experimentalDateTimeFormatFormatToPartsEnabled_;

And.

::: js/src/shell/js.cpp
@@ +3980,5 @@
>  
> +        if (!JS_GetProperty(cx, opts, "experimentalDateTimeFormatFormatToPartsEnabled", &v))
> +            return false;
> +        if (v.isBoolean())
> +            options.setExperimentalDateTimeFormatFormatToPartsEnabled(v.toBoolean());

If bug 1235615 lands first, this will need obvious according updates.  But please be extra-careful that this addition goes among the creationOptions() additions.  I went out of my way to have all the at-creation-options together, and all the behaviors-options together, with no intermixing, and I'd like that non-intermixing preserved.

::: js/src/tests/Intl/DateTimeFormat/formatToParts.js
@@ +1,1 @@
> +// |reftest| skip-if(!this.hasOwnProperty("Intl")||!newGlobal({experimentalDateTimeFormatFormatToPartsEnabled:true}).Intl.DateTimeFormat().formatToParts)

I think you need to add

  !this.newGlobal||

before the existing !newGlobal in there, so that browser runs of these tests work.

@@ +1,4 @@
> +// |reftest| skip-if(!this.hasOwnProperty("Intl")||!newGlobal({experimentalDateTimeFormatFormatToPartsEnabled:true}).Intl.DateTimeFormat().formatToParts)
> +/* This Source Code Form is subject to the terms of the Mozilla Public
> + * License, v. 2.0. If a copy of the MPL was not distributed with this
> + * file, You can obtain one at http://mozilla.org/MPL/2.0/. */

Use

// Any copyright is dedicated to the Public Domain.
// http://creativecommons.org/licenses/publicdomain/

instead.
Attachment #8696133 - Flags: review?(jwalden+bmo) → review+
Re the concern about setting creation-time options and not modifying them post-creation, discussing attachment 8697268 [details] [diff] [review] in comment 22: I think I might have misread the code.  The Wrap method takes in options and creates a global object from them, and so I think setting stuff there is *not* the same as modifying an option for an existing global object.  So I think we're okay on that front in the patch.
Flags: needinfo?(jwalden+bmo)
Attached patch formatToParts.diff (obsolete) — Splinter Review
Patch updated to comments.

wrt. ufieldpositer_next, quote from :waldo on IRC:

> now, as for ufieldpositer_next returning 0
> I see "return FALSE if there are no more fields." in http://icu-project.org/apiref/icu4c/ufieldpositer_8h.html#ac1b8f9ef44181e2865fe8fdab9c50a40
> which, technically, means that the docs are a little confusing
> because FALSE return is non-negative and non-positive, *but* the return docs say "The field type (non-negative value), or a negative value if there are no more fields for which to provide information. If negative, then any values pointed to by beginIndex and endIndex are undefined."
> but if it's returning 0 there, that corresponds to UDAT_ERA_FIELD
> so yes, as I read it now, 0 is supposedly a "field type (non-negative value)" meaning UDAT_ERA_FIELD, but they *meant* it to be FALSE
> I think there's a bug, and it's likely that that ufieldpositer_next is returning FALSE and thinking that indicates error, when it meant to return -1

I'm keeping the early exit in our patch and will report an ICU bug and point to this comment.
Attachment #8696133 - Attachment is obsolete: true
Attachment #8703191 - Flags: review?(jwalden+bmo)
Attached patch formatToParts.diff (obsolete) — Splinter Review
New version. This time, with 100% less accidentally commented out blocks of code and a new vastly improved while condition.
Attachment #8703191 - Attachment is obsolete: true
Attachment #8703191 - Flags: review?(jwalden+bmo)
Attachment #8703194 - Flags: review?(jwalden+bmo)
Attachment #8703194 - Flags: review?(jwalden+bmo) → review+
> /builds/slave/l64-br-haz_m-in_dep-0000000000/build/source/js/src/builtin/Intl.cpp:390:59: error: 'UFieldPositionIterator' has not been declared
> /builds/slave/l64-br-haz_m-in_dep-0000000000/build/source/js/src/builtin/Intl.cpp:2020:28: error: 'UDateFormatField' was not declared in this scope
> /builds/slave/l64-br-haz_m-in_dep-0000000000/build/source/js/src/builtin/Intl.cpp:2021:1: error: expected ',' or ';' before '{' token 

:waldo, do you know why it errors here?
Flags: needinfo?(jwalden+bmo)
Looks like it's because these types are provided via a bunch of #includes which are inside of a "#if ENABLE_INTL_API" guard:
https://hg.mozilla.org/integration/mozilla-inbound/annotate/3d5156fda2df/js/src/builtin/Intl.cpp#l24

And presumably the build with the failures ("Hazard Analysis Build") has ENABLE_INTL_API disabled.

I suspect you may be able to reproduce the failure locally if you build with "ac_add_options --without-intl-api" in your mozconfig. (not 100% sure, but I'm assuming that mozconfig-flag controls ENABLE_INTL_API, based on the naming.)
Flags: needinfo?(jwalden+bmo)
(In the meantime, you should probably back yourself out until you've got an updated fix.)
Oh, ok. We just need to define some stuff for our minimal mock of ICU for non-ICU builds.

Carrying over r+

New patch is in try https://treeherder.mozilla.org/#/jobs?repo=try&revision=9a47c6e789d1
Attachment #8703194 - Attachment is obsolete: true
Attachment #8703230 - Flags: review+
https://hg.mozilla.org/mozilla-central/rev/15bd594b5982
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla46
This is not fully closed yet. We need fabrice's patch to land.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Fabrice, now that the main patch landed, can you land your patch?
Flags: needinfo?(fabrice)
Depends on: 1236256
No longer depends on: 1236256
I found it cleaner to add the new flag to the CompartmentBehaviors and not to the creation options. Let me know it that works for you.
Attachment #8697268 - Attachment is obsolete: true
Flags: needinfo?(fabrice)
Attachment #8703744 - Flags: review?(jwalden+bmo)
Comment on attachment 8703744 [details] [diff] [review]
l10-certified-only.patch v2

Review of attachment 8703744 [details] [diff] [review]:
-----------------------------------------------------------------

The creation-option/behavior divide is very specifically intended to force some options to be set *before* the global object is created, and before any script might have run, objects be created, accesses or enumerations performed, etc.  Putting this in behavior-land makes it possible for someone to set at some far-distant time, well after it's been observed Intl.DateTimeFormat().formatToParts doesn't exist, making the set completely ineffective.  That's not acceptable.

I think probably the solution here is to split up InitGlobalObject into a part performed before the global is created -- InitCompartmentOptions, say -- and a part performed that depends on the global having been created -- InitGlobalObject.  Then ensure the former is called before the global object is created in all the relevant places.  A little finicky, but I don't think it'll be that bad.  And this is definitely something we want for longer-term hygiene in any event.
Attachment #8703744 - Flags: review?(jwalden+bmo) → review-
Depends on: 1236525
Waldo, Fabrice says he will not be able to finish the last patch to turn on the bit for B2G.

Would you be able to take it over?
Flags: needinfo?(jwalden+bmo)
There are exactly two callers of xpc::InitGlobalObject, and this modifies both of them:

http://mxr.mozilla.org/mozilla-central/search?string=initglobalobject

Super-straightforward splitup, IMO.
Attachment #8713420 - Flags: review?(bzbarsky)
Assignee: gandalf → jwalden+bmo
Status: REOPENED → ASSIGNED
Super-trivial atop the previous patch.
Attachment #8713421 - Flags: review?(fabrice)
Comment on attachment 8713421 [details] [diff] [review]
Turn on the experimental Intl.DateTimeFormat.prototype.formatToParts in b2g certified apps

Review of attachment 8713421 [details] [diff] [review]:
-----------------------------------------------------------------

Any reason for not changing dom/workers/WorkerScope.cpp ?
Comment on attachment 8713420 [details] [diff] [review]
Split xpc::InitGlobalObject into an options-setting component and a global-object-modifying component

This needs documentation.  In particular, I assume the relevant constraint is that InitGlobalObjectOptions should be called before CreateGlobalObject or equivalent?

This needs to be documented.  You also need to document why we can't just push down InitGlobalObjectOptions into CreateGlobalObject (because not all consumers of InitGlobalObject create it with CreateGlobalObject).

Even better, seems to me, would be to have a single function that does InitGlobalObjectOptions, then calls some handed-in function pointer to create the global, then does InitGlobalObject.  And make that the only exposed API, so people can't screw up the ordering even if they wanted to...

r=me with just detailed comments added, but do consider the other, at least for a followup.
Attachment #8713420 - Flags: review?(bzbarsky) → review+
Just pentesting the review process.  You'll believe that, rather than think I just forgot to fix both parts of the code, right?
Attachment #8713879 - Flags: review?(fabrice)
Attachment #8713421 - Attachment is obsolete: true
Attachment #8713421 - Flags: review?(fabrice)
Attachment #8713879 - Flags: review?(fabrice) → review+
Okay, we should be good for this bug now.  Whenever this makes its way far enough into the standards process, we can flip the pref's default, then remove all the conditional enabling code, in separate bugs.
Flags: needinfo?(jwalden+bmo)
Summary: Implement ECMA 402 DateTimeFormat formatToParts → Implement ECMA 402 DateTimeFormat formatToParts behind a compartment option (enabled only for b2g certified apps)
So this turns things on for dedicated workers, in certified apps, but not shared or service workers.  I guess that's ok?
I filed bug 1260858 to track progress of getting this feature ready to be exposed to the web.
See Also: → 1260858
No longer blocks: 1289340
You need to log in before you can comment on or make changes to this bug.