Implement NumberFormat.prototype.formatToParts

RESOLVED FIXED in Firefox 53

Status

()

Core
JavaScript: Internationalization API
RESOLVED FIXED
a year ago
11 months ago

People

(Reporter: gandalf, Assigned: Waldo)

Tracking

(Blocks: 1 bug, {dev-doc-needed})

unspecified
mozilla53
dev-doc-needed
Points:
---

Firefox Tracking Flags

(firefox53 fixed)

Details

Attachments

(2 attachments, 4 obsolete attachments)

(Reporter)

Description

a year ago
As a counterpart to already implemented DateTimeFormat.prototype.formatToParts we need NumberFormat.prototype.formatToParts.

The spec patch is here: https://github.com/tc39/ecma402/pull/79

The output is here: https://rawgit.com/zbraniecki/ecma402/format-to-parts-mix/out/index.html
(Reporter)

Comment 1

a year ago
:waldo offered to take a stab at this. It'll probably mean writing an ICU counterpart to udat_formatForFields used in patch in bug 1216150 to implement the DateTimeFormat counterpart.

Having this would help us with adoption in other engines.
Flags: needinfo?(jwalden+bmo)
Keywords: dev-doc-needed
Created attachment 8784083 [details] [diff] [review]
WIPish WIP

Comment 3

a year ago
(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #2)
> Created attachment 8784083 [details] [diff] [review]
> WIPish WIP

It looks like the ICU portion of this patch is to add a C wrapper for the existing C++ API. Anyway, it'd make it easier to upstream (when it's done) to file a bug at the upstream bug tracker for this feature request. 

I've just done that. See http://bugs.icu-project.org/trac/ticket/12684
(In reply to Jungshik Shin from comment #3)
> it'd make it easier to upstream (when it's done)
> to file a bug at the upstream bug tracker for this feature request. 
> 
> I've just done that. See http://bugs.icu-project.org/trac/ticket/12684

Yeah, this was totally in the cards once I had a functional patch.  And the policy is not to patch ICU locally without an upstream patch landing, or in simple-enough cases occurring fairly simultaneously.  Just, until I have a patch complete and I know the API does what I want, there's not too much point in the upstream bug yet.  If the API to expose were subject to a bunch of discussion, perhaps -- but this had an obvious parallel, so I was willing to take the risk of having to redo some things once ICU decided on an API.  :-)
Flags: needinfo?(jwalden+bmo)
Created attachment 8797334 [details] [diff] [review]
Seemingly-functional prototype patch

Bit later than I was aiming for, but this seems to work in very manual testing:

js> function test(num, locale, opts) { var nf = newGlobal({ experimentalNumberFormatFormatToPartsEnabled: true }).Intl.NumberFormat(locale, opts); print(uneval(nf.formatToParts(num))); assertEq(nf.format(num), nf.formatToParts(num).map(obj => obj.value).join("")); }
js> test(34.3)
[{type:"integer", value:"34"}, {type:"decimal", value:"."}, {type:"fraction", value:"3"}]
js> test(-34.3)
[{type:"minusSign", value:"-"}, {type:"integer", value:"34"}, {type:"decimal", value:"."}, {type:"fraction", value:"3"}]
js> test(-Infinity)
[{type:"minusSign", value:"-"}, {type:"infinity", value:"\u221E"}]
js> test(Infinity)
[{type:"infinity", value:"\u221E"}]
js> test(NaN)
[{type:"nan", value:"NaN"}]
js> test(NaN, "en-US", { style: "percent" })
[{type:"nan", value:"NaN"}]
js> test(0.33, "en-US", { style: "percent" })
[{type:"integer", value:"33"}, {type:"percentSign", value:"%"}]
js> test(0.725, "en-US", { style: "percent" })
[{type:"integer", value:"73"}, {type:"percentSign", value:"%"}]
js> test(0.725, "en-US", { style: "percent", minimumSignificantDigits: 3 })
[{type:"integer", value:"72"}, {type:"decimal", value:"."}, {type:"fraction", value:"5"}, {type:"percentSign", value:"%"}]
js> test(7210.375, "en-US", { style: "percent", minimumSignificantDigits: 3 })
[{type:"integer", value:"721"}, {type:"group", value:","}, {type:"integer", value:"037"}, {type:"decimal", value:"."}, {type:"fraction", value:"5"}, {type:"percentSign", value:"%"}]
js> test(7210.375, "ar-IQ", { style: "percent", minimumSignificantDigits: 3 })
[{type:"integer", value:"\u0667\u0662\u0661"}, {type:"group", value:"\u066C"}, {type:"integer", value:"\u0660\u0663\u0667"}, {type:"decimal", value:"\u066B"}, {type:"fraction", value:"\u0665"}, {type:"percentSign", value:"\u066A"}]
js> test(0)
[{type:"integer", value:"0"}]
js> test(-0)
[{type:"integer", value:"0"}]

Still need to write tests, stare at the algorithm to convert ICU's hierarchical fields-information into the disjoint-parts information formatToParts requires to see if it can be simplified or made more readable (or, for that matter, proven correct with spelled-out invariants), extricate an ICU-upstreamable patch for the ICU bits, &c.  But this actually seems to work (with the caveat that it doesn't actually *expose* formatToParts unless you opt into it on the relevant global.)

People can feel free to look at this if they want, but it's possibly still a bit rough.  Caveat lector.
Attachment #8784083 - Attachment is obsolete: true
Assignee: nobody → jwalden+bmo
Status: NEW → ASSIGNED
(Reporter)

Comment 6

a year ago
Looks awesome!

I was able to compile it and test it and it works as expected.

It passes test262 except of 

https://github.com/tc39/test262/blob/master/test/intl402/NumberFormat/prototype/formatToParts/length.js#L9

where it returns "1".

Do you think we should change it in the test or fix in the code?
Flags: needinfo?(jwalden+bmo)
(In reply to Zibi Braniecki [:gandalf][:zibi] from comment #6)
> It passes test262 except of 
> 
> https://github.com/tc39/test262/blob/master/test/intl402/NumberFormat/
> prototype/formatToParts/length.js#L9
> 
> where it returns "1".
> 
> Do you think we should change it in the test or fix in the code?

As I said in the Github issue, I don't think the argument to NumberFormat.prototype.formatToParts should be considered as an optional argument.  The behavior is to return formatted-to-parts NaN information, but this makes no sense as a behavior-by-default that anyone would actually want -- unlike the default for DateTimeFormat, where formatting the current time makes plenty of sense (just as |new Date()| produces a Date representing the current time).  So I would argue spec (proposal) and test should change.  I'm a bit behind on Github notifications; let me catch up on those (including any in that issue) and we can see whether changing length this way is or is not feasible upstream.

That said, changing the patch to have it return 0 is trivial, if it's necessary:

diff --git a/js/src/builtin/Intl.cpp b/js/src/builtin/Intl.cpp
--- a/js/src/builtin/Intl.cpp
+++ b/js/src/builtin/Intl.cpp
@@ -1370,5 +1370,5 @@ CreateNumberFormatPrototype(JSContext* c
         if (!GlobalObject::getSelfHostedFunction(cx, cx->global(),
                     cx->names().NumberFormatFormatToParts,
-                    name, 1, &ftp))
+                    name, 0, &ftp))
         {
             return nullptr;
diff --git a/js/src/builtin/Intl.js b/js/src/builtin/Intl.js
--- a/js/src/builtin/Intl.js
+++ b/js/src/builtin/Intl.js
@@ -2040,5 +2040,5 @@ function Intl_NumberFormat_format_get() 
 
 
-function Intl_NumberFormat_formatToParts(value) {
+function Intl_NumberFormat_formatToParts() {
     // Step 1.
     var nf = this;
Flags: needinfo?(jwalden+bmo)
I posted an ICU patch against trunk in the ICU ticket.  Not sure whether it also needs raising it on icu-design or not; I'll probably give it a few days, then do so if no one's commented on the ticket.

Still need to do various cleanups to the SpiderMonkey-specific parts of this patch.
(Reporter)

Updated

a year ago
Created attachment 8803561 [details] [diff] [review]
Patch largely ready for review, but upstream will have to act on the ICU bits first

Feel free to look at the ICU changes, but they're not what'll land (obviously, that #if 1 is skeevy).  What lands for that, depends on what ICU runs with upstream.  (Plus we'd need to import a patch to apply against our checkout, in addition to changing intl/icu/source directly.)

The rest of it is in, I think, landable shape.

The test is somewhat smallish, but I think it's adequate to cover the basics of things.  It seems to pass all the formatToParts tests upstream, except for formatToParts.length, which my reading of https://github.com/tc39/ecma402/pull/79/commits/a131147b4ed92fed9ecf93b2dc4f81833a464d40 discussion ending August 15 says we can make be 1, not 0 as the test wants.

Most of the complexity in this is converting ICU's nested-fields representation to intl402's sequence-of-fields representation.  I believe the logic of it is sound, but it's pretty finicky, so there could be a lurking problem in it.  I tried to add enough comments to describe what it's all doing; if more seem valuable, feel free to request them.
Attachment #8803561 - Flags: review?(andrebargull)
Attachment #8797334 - Attachment is obsolete: true
Created attachment 8803581 [details] [diff] [review]
With more tests added, a couple comment-fixes
Attachment #8803581 - Flags: review?(andrebargull)
Attachment #8803561 - Attachment is obsolete: true
Attachment #8803561 - Flags: review?(andrebargull)
Comment on attachment 8803581 [details] [diff] [review]
With more tests added, a couple comment-fixes

Review of attachment 8803581 [details] [diff] [review]:
-----------------------------------------------------------------

Only r-'ed for now because I'd like to re-review when the suggested changes are applied. :-)


And intl_FormatNumber() in js/src/builtin/Number.js needs to be updated, too.

::: js/src/builtin/Intl.cpp
@@ +11,5 @@
>  
>  #include "builtin/Intl.h"
>  
>  #include "mozilla/Range.h"
> +#include "mozilla/SizePrintfMacros.h"

Leftover include?

@@ +1610,5 @@
> +            JS_ReportErrorNumberASCII(cx, GetErrorMessage, nullptr, JSMSG_INTERNAL_INTL_ERROR);
> +            return false;
> +        }
> +    }
> +    ScopedICUObject<UFieldPositionIterator, ufieldpositer_close> toClose(fieldIterator);

It's easier to understand and less code if UFieldPositionIterator is created in the caller (i.e. intl_FormatNumberToParts) and then passed to PartitionNumberPattern.

@@ +1691,5 @@
> +
> +        // Manual trawling through the ICU call graph appears to indicate that
> +        // the basic formatting we request will never include a positive sign.
> +        // But this analysis may be mistaken, so don't absolutely trust it.
> +        // but it's possible this analysis is mistaken.

Duplicate comment line.

@@ +1737,5 @@
> +
> +    RootedArrayObject partsArray(cx, NewDenseEmptyArray(cx));
> +    if (!partsArray)
> +        return false;
> +    if (chars.length() == 0) {

This implies there exists a number which can be formatted to the empty string. I don't think this is actually possible, is it?

@@ +1770,5 @@
> +    int32_t fieldInt, beginIndexInt, endIndexInt;
> +    while ((fieldInt = ufieldpositer_next(fpositer, &beginIndexInt, &endIndexInt)) >= 0) {
> +        MOZ_ASSERT(beginIndexInt >= 0);
> +        MOZ_ASSERT(endIndexInt >= 0);
> +        MOZ_ASSERT(beginIndexInt <= endIndexInt,

Fields are never empty, so we can strengthen this assertion to |beginIndexInt < endIndexInt|.

https://dxr.mozilla.org/mozilla-central/rev/215f9686117673a2c914ed207bc7da9bb8d741ad/intl/icu/source/i18n/fpositer.cpp#67

@@ +1774,5 @@
> +        MOZ_ASSERT(beginIndexInt <= endIndexInt,
> +                   "field iterator returning invalid range");
> +
> +        uint32_t beginIndex(beginIndexInt);
> +        uint32_t endIndex(endIndexInt);

No need for extra locals, just call emplaceBack with `uint32_t(beginIndexInt)` and `uint32_t(endIndexInt)`. Maybe also rename {begin,end}IndexInt to {begin,end}Index in that case.

@@ +1783,5 @@
> +    }
> +
> +    // Second, sort the fields vector.
> +    size_t fieldsLen = fields.length();
> +    if (!fields.resizeUninitialized(fieldsLen * 2))

Maybe comment that |fields| is temporarily enlarged to hold the scratch space for MergeSort. (It only became clear to me why this is needed when I read the doc-comments for MergeSort.) But up to you to decide if this is useful.

@@ +1794,5 @@
> +                                  // Sort first by begin index, then to place
> +                                  // enclosing fields before nested fields.
> +                                  *lessOrEqual = left.begin < right.begin ||
> +                                                 (left.begin == right.begin &&
> +                                                  left.end > right.end);

Two fields cannot start at the same position (*), so we can change this to |*lessOrEqual = left.begin < right.begin| and also add |MOZ_ASSERT(left.begin != right.begin);|.

(*) Not necessarily in general, but at least when formatting numbers this shouldn't be possible. Or did I miss something here?

@@ +1835,5 @@
> +    //                       fraction
> +    //         integer        |
> +    //       _____|________   |   currency
> +    //      /  /| |\  |\  |\  |   ____|___
> +    //     /| / | | \ | \ | \ |\ /        \

Trailing backslash produces GCC warning: multi-line comment [-Wcomment]

@@ +1864,5 @@
> +
> +        // Index of the current field, in |fields|, being considered to
> +        // determine part boundaries.  |lastEnd <= fields[i].begin| is an
> +        // invariant.
> +        size_t i;

I prefer to spell out loop variables when they appear as class members, so |i| -> |index| here. (Not sure if my preferences match the SpiderMonkey code conventions!)

@@ +1867,5 @@
> +        // invariant.
> +        size_t i;
> +
> +        // The end index of the last part produced, always less than or equal
> +        // to |limit|, monotonically increasing.

"strictly monotonically increasing", because fields aren't empty.

@@ +1875,5 @@
> +        const uint32_t limit;
> +
> +        Vector<size_t, 4> enclosingFields;
> +
> +        void popEnclosingFieldsEndingAt(uint32_t end) {

Please add |MOZ_ASSERT_IF(!enclosingFields.empty(), fields[enclosingFields.back()].end >= end);| to ensure we didn't skip any fields.

@@ +1876,5 @@
> +
> +        Vector<size_t, 4> enclosingFields;
> +
> +        void popEnclosingFieldsEndingAt(uint32_t end) {
> +            while (enclosingFields.length() > 0 && fields[enclosingFields.back()].end == end)

|!enclosingFields.empty()| seems to be preferred over |enclosingFields.length() > 0| when using dxr to search for all callers of .length() and .empty().

@@ +1888,5 @@
> +            // If we're out of fields, all that remains are part(s) consisting
> +            // of trailing portions of enclosing fields, and maybe a final
> +            // literal part.
> +            if (i == len) {
> +                if (enclosingFields.length() > 0) {

Also: !enclosingFields.empty()

@@ +1906,5 @@
> +            }
> +
> +            // Otherwise we still have a field to process.
> +            const Field* current = &fields[i];
> +            MOZ_ASSERT(current->begin <= current->end);

current->begin < current->end

@@ +1907,5 @@
> +
> +            // Otherwise we still have a field to process.
> +            const Field* current = &fields[i];
> +            MOZ_ASSERT(current->begin <= current->end);
> +            MOZ_ASSERT(current->begin >= lastEnd);

Can we change either this condition or the invariant condition for |size_t i;|, so that both conditions use the same order of operands?

@@ +1913,5 @@
> +            // But first, deal with inter-field space.
> +            if (current->begin > lastEnd) {
> +                part->end = current->begin;
> +
> +                if (enclosingFields.length() > 0) {

Also: !enclosingFields.empty()

@@ +1918,5 @@
> +                    // Space between fields, within an enclosing field, is part
> +                    // of that enclosing field, until the start of the current
> +                    // field.
> +                    part->type = fields[enclosingFields.back()].type;
> +                    popEnclosingFieldsEndingAt(part->end);

|part->end| is |current->begin| (l.1915), but we actually need |fields[enclosingFields.back()].end|. The proposed assertion for popEnclosingFieldsEndingAt would have caught this error.

@@ +1931,5 @@
> +
> +            // Otherwise, the part spans a prefix of the current field.  Find
> +            // the most-nested field containing that prefix.
> +            const Field* next;
> +            do {

Two fields cannot start at the same position, so we don't need to loop here.

@@ +1942,5 @@
> +                    return true;
> +                }
> +
> +                next = &fields[i];
> +                MOZ_ASSERT(current->begin <= next->begin);

|current->begin < next->begin|, because fields aren't empty and don't start at the same position.

@@ +1943,5 @@
> +                }
> +
> +                next = &fields[i];
> +                MOZ_ASSERT(current->begin <= next->begin);
> +                MOZ_ASSERT(current->begin <= next->end);

|current->begin < next->end|.

@@ +1958,5 @@
> +            } while (current->begin == next->begin);
> +
> +            part->type = current->type;
> +
> +            if (current->end < next->begin) {

|end| is exclusive, so we need |current->end <= next->begin| here.

@@ +1979,5 @@
> +        {}
> +
> +        bool nextPart(bool* hasPart, Part* part) {
> +            // There are no parts left if we've partitioned the entire string.
> +            if (lastEnd == limit) {

Please add |MOZ_ASSERT(enclosingFields.empty());| to ensure we've popped all enclosing fields.

@@ +1984,5 @@
> +                *hasPart = false;
> +                return true;
> +            }
> +
> +            *hasPart = true;

I'd probably only set |hasPart| if nextPartInternal returned normally, but also up to you.

@@ +2001,5 @@
> +    RootedObject singlePart(cx);
> +    RootedValue propVal(cx);
> +
> +    PartGenerator gen(cx, fields, chars.length());
> +    do {

|while (true) ...| seems to be preferred over |do ... while (true)| in SpiderMonkey sources.

@@ +2024,5 @@
> +        if (!DefineProperty(cx, singlePart, cx->names().type, propVal))
> +            return false;
> +
> +        JSString* partSubstr =
> +            SubstringKernel(cx, overallResult, lastEndIndex, endIndex - lastEndIndex);

Directly call NewDependentString instead of SubstringKernel. I don't consider SubstringKernel as a part of the SpiderMonkey API and therefore it should only be called internally in String.prototype.substring.

::: js/src/builtin/Intl.js
@@ +2012,5 @@
>      // ES5.1 10.5, step 4.d.ii.
>  
>      // Step 1.a.ii-iii.
>      var x = ToNumber(value);
> +    return intl_FormatNumber(this, x, false);

Maybe use |/* formatToParts = */ false| so it's easier for readers to know why |false| is passed here.

@@ +2052,5 @@
> +    // Step 4.
> +    var x = ToNumber(value);
> +
> +    // Step 5.
> +    return intl_FormatNumber(nf, x, true);

And here |/* formatToParts = */ true|.

::: js/src/tests/Intl/NumberFormat/formatToParts.js
@@ +49,5 @@
> +
> +  var len = parts.length;
> +  assertEq(len, expected.length, "parts count mismatch");
> +  for (var i = 0; i < len; i++)
> +  {

Switched to Allman style? :-)

@@ +193,5 @@
> +var arPercentFormatter =
> +  new Intl.NumberFormat("ar-IQ", arPercentOptions);
> +
> +assertParts(arPercentFormatter, -135.32,
> +            [MinusSign("‏-"),

Can you change to Unicode escapes for the RTL marker here?
Attachment #8803581 - Flags: review?(andrebargull) → review-
I have more comments/responses coming, but let's get the most substantive one out early for longest perusal:

> @@ +1918,5 @@
> > +                    // Space between fields, within an enclosing field, is part
> > +                    // of that enclosing field, until the start of the current
> > +                    // field.
> > +                    part->type = fields[enclosingFields.back()].type;
> > +                    popEnclosingFieldsEndingAt(part->end);
> 
> |part->end| is |current->begin| (l.1915), but we actually need
> |fields[enclosingFields.back()].end|. The proposed assertion for
> popEnclosingFieldsEndingAt would have caught this error.

You have correctly noted a lapse, fixing the case where the inter-field space should end where the enclosing field ends, because the next field begins *after* the enclosing field ends.  With what I had, I would get this behavior, incorrectly putting the " " literal into the prior integer:

js> new Intl.NumberFormat("en-US", { style: "currency", currency: "USD", currencyDisplay: "name", minimumFractionDigits: 0 }).formatToParts(12345678)
[{type:"integer", value:"12"}, {type:"group", value:","}, {type:"integer", value:"345"}, {type:"group", value:","}, {type:"integer", value:"678 "}, {type:"currency", value:"US dollars"}]

But while your solution fixes one problem, it breaks another.  :-)  Consider what happens with your idea implemented, when the inter-field space ends with another field nested within the enclosing field:

js> new Intl.NumberFormat(undefined).formatToParts(12345678)
[{type:"integer", value:"12"}, {type:"group", value:","}, {type:"integer", value:"345,678"}]

This problem also manifests with the national-debt test in the new test in the patch.

So what we need is a std::min of both of 'em to handle both cases.  Or so it seems to me -- I'm not missing some other edge case, am I?

    // But first, deal with inter-field space.
    if (lastEnd < current->begin) {
        if (enclosingFields.length() > 0) {
            // Space between fields, within an enclosing field, is part
            // of that enclosing field, until the start of the current
            // field.
            auto enclosing = enclosingFields.back();
            part->end = std::min(fields[enclosing].end, current->begin);
            part->type = fields[enclosing].type;
            popEnclosingFieldsEndingAt(part->end);
        } else {
            // If there's no enclosing field, the space is a literal.
            part->end = current->begin;
            part->type = &JSAtomState::literal;
        }

        return true;
    }

Woo!  This stuff is so gnarl.
(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #12)
> So what we need is a std::min of both of 'em to handle both cases.  Or so it
> seems to me -- I'm not missing some other edge case, am I?

Ups. No, you're absolutely correct, we need a std::min here.
And we probably also need a plan what to do with the "--with-system-icu" configure flag. I guess the two obvious choices are:
1. Use the C++ API until upstream ICU provides the C API.
2. Disable NumberFormat.prototype.formatToParts when --with-system-icu is used.
I'll probably do #2.  I'm disinclined to have separate code paths for separate ICUs, that we're not going to do a proper job of testing.  Compile-time decisions are more likely to blow up completely if they're wrong, making them safer.
Blocks: 652780
Created attachment 8817325 [details] [diff] [review]
Updated

(In reply to André Bargull from comment #11)
> It's easier to understand and less code if UFieldPositionIterator is created
> in the caller (i.e. intl_FormatNumberToParts) and then passed to
> PartitionNumberPattern.

Good point.

> This implies there exists a number which can be formatted to the empty
> string. I don't think this is actually possible, is it?

Probably?  But we don't/can't really *know* unless ICU documents it so, and I don't think it has, and I don't think it really would because it depends on the behavior of locales in the real world, ergo can't be constrained.

That said, I think all the rest of the code will just create an empty array if this happens anyway, so I guess I can remove it.

> Fields are never empty, so we can strengthen this assertion to
> |beginIndexInt < endIndexInt|.

Another sort of guarantee that ICU's docs never really make clear.  But, I guess so.  It's hard to think of how one of ICU's fields could ever be empty, seeing as no integer, decimal, sign, fraction, &c. can be encoded in zero bits of information.  (Or only one could be, I guess.)  And it simplifies enough other code that I suppose we might as well.

> No need for extra locals, just call emplaceBack with
> `uint32_t(beginIndexInt)` and `uint32_t(endIndexInt)`. Maybe also rename
> {begin,end}IndexInt to {begin,end}Index in that case.

Not renaming: I want the signedness of the types to be clear to the reader, so they're aware they should handle (even if by asserting) the negative case.

> Maybe comment that |fields| is temporarily enlarged to hold the scratch
> space for MergeSort.

I was fence-sitting about doing it, given the meaning is spread across about ten lines or so.  Added since you think it's valuable.

> Two fields cannot start at the same position (*), so we can change this to
> |*lessOrEqual = left.begin < right.begin| and also add
> |MOZ_ASSERT(left.begin != right.begin);|.
> 
> (*) Not necessarily in general, but at least when formatting numbers this
> shouldn't be possible. Or did I miss something here?

This is something ICU's docs, again, never state, and there isn't obvious logic compelling it.  Better safe than sorry with hairy indexing logic.

> Trailing backslash produces GCC warning: multi-line comment [-Wcomment]

Ugh.  Futzed with the line art.

> I prefer to spell out loop variables when they appear as class members

Fair enough.

> Please add |MOZ_ASSERT_IF(!enclosingFields.empty(),
> fields[enclosingFields.back()].end >= end);| to ensure we didn't skip any
> fields.

Great idea.

> |!enclosingFields.empty()| seems to be preferred over
> |enclosingFields.length() > 0| when using dxr to search for all callers of
> .length() and .empty().

Eh.  I prefer length comparison because "empty" is ambiguous between verb and adjective.

> > +            MOZ_ASSERT(current->begin <= current->end);
> > +            MOZ_ASSERT(current->begin >= lastEnd);
> 
> Can we change either this condition or the invariant condition for |size_t
> i;|, so that both conditions use the same order of operands?

Makes sense.  I went with |lastEnd| before |begin|, and I swapped the two assertions so that we assert about ranges from beginning of string to end.

> Two fields cannot start at the same position, so we don't need to loop here.
>
> |current->begin < next->begin|, because fields aren't empty and don't start
> at the same position.

ICU's docs never state fields can't start at the same position, and there are conceivable field-setups that could do otherwise, so I'm unwilling to relax this.

> @@ +1958,5 @@
> > +            } while (current->begin == next->begin);
> > +
> > +            part->type = current->type;
> > +
> > +            if (current->end < next->begin) {
> 
> |end| is exclusive, so we need |current->end <= next->begin| here.

I...think this might only be a theoretical concern with certain imaginable arrangements of parts?  I tried to write a test that would break and came up mostly empty.  Still, changed.

> Switched to Allman style? :-)

We've never cared about test style/formatting/naming (except that actively-misleading test code may need changing), because test existence is way more important than whether it looks nice.  (Plus this style is the best style.  :-P )
Attachment #8817325 - Flags: review?(andrebargull)
Attachment #8803581 - Attachment is obsolete: true

Comment 17

11 months ago
Comment on attachment 8817325 [details] [diff] [review]
Updated

Review of attachment 8817325 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good!

I guess you plan to move the ICU changes into patch files for intl/icu-patches? Oh, and the MOZ_SYSTEM_ICU guards are still missing.

::: js/src/builtin/Intl.cpp
@@ +1759,4 @@
>      UErrorCode status = U_ZERO_ERROR;
> +
> +    // Passing null for |fpositer| will just not compute partition information,
> +    // letting us common up all ICU number-formatting code.

This comment should be moved to the calling code in intl_FormatNumber(), because that's where nullptr is used.

@@ +1767,2 @@
>      if (status == U_BUFFER_OVERFLOW_ERROR) {
> +        MOZ_ASSERT(resultSize >= 0);

Maybe also nice to have this check when the first call to unum_formatDoubleForFields returned successfully.

@@ +1775,5 @@
> +            unum_formatDoubleForFields(nf, *x, Char16ToUChar(formattedChars.begin()), resultSize,
> +                                       fpositer, &status);
> +        MOZ_ASSERT(size == resultSize);
> +    } else {
> +        if (!formattedChars.resize(resultSize))

We don't want to resize the vector when unum_formatDoubleForFields returns with a failure, because size_t(-1) is a huge number (where -1 seems to be the default return value on the failure case from ICU).

@@ +2005,5 @@
> +        // The fields in order from start to end, then least to most nested.
> +        const FieldsVector& fields;
> +
> +        // Index of the current field, in |fields|, being considered to
> +        // determine part boundaries.  |lastEnd <= fields[i].begin| is an

i -> index

@@ +2019,5 @@
> +
> +        Vector<size_t, 4> enclosingFields;
> +
> +        void popEnclosingFieldsEndingAt(uint32_t end) {
> +            MOZ_ASSERT_IF(!enclosingFields.empty(),

|enclosingFields.length() > 0| because you said you preferred "length()" over "empty()". :-)

@@ +2062,5 @@
> +                    // Space between fields, within an enclosing field, is part
> +                    // of that enclosing field, until the start of the current
> +                    // field or the end of the enclosing field, whichever is
> +                    // earlier.
> +                    auto enclosing = enclosingFields.back();

Maybe enclosing -> enclosingIndex to be consistent with the variable name a few lines upwards.
Or "const auto& enclosing = fields[enclosingFields.back()];" to directly access the field.

@@ +2090,5 @@
> +                }
> +
> +                next = &fields[index];
> +                MOZ_ASSERT(current->begin <= next->begin);
> +                MOZ_ASSERT(current->begin <= next->end);

|current->begin < next->end| as long as fields aren't empty.

@@ +3275,5 @@
>          if (!DefineProperty(cx, singlePart, cx->names().type, partType))
>              return false;
>  
> +        JSLinearString* partSubstr =
> +            NewDependentString(cx, overallResult, beginIndex, endIndex - beginIndex);

Thanks!
Attachment #8817325 - Flags: review?(andrebargull) → review+
Created attachment 8819574 [details] [diff] [review]
Respond to final comments, as a patch-atop-the-existing-patch (will merge on landing)

Okay, final time, I hope!  This addresses all the remaining comments.

Of particular note: I don't use MOZ_SYSTEM_ICU but rather a fresh #define added in unum.h.  This is consistent with what we do for icu::TimeZone::recreateDefault(), making it the apparent preferred approach.  (Plus we don't AC_DEFINE MOZ_SYSTEM_ICU, so I'd have to start touching build/autoconf/icu.m4 if I wanted that.  Seems excessive when we have an existing pattern to follow.)

This builds locally for me with/without --with-system-icu.  Unfortunately, I'm having trouble verifying that the update-icu.sh change is definitely precisely correct.  :-\  If I run it, I get this behavior:

[jwalden@find-waldo-now intl]$ ./update-icu.sh https://ssl.icu-project.org/repos/icu/icu/tags/release-58-1/
...
Exported revision 39543.
Applying local patch bug-915735
patching file intl/icu/source/config/mh-darwin
Hunk #1 succeeded at 25 (offset 2 lines).
Applying local patch suppress-warnings.diff
patching file intl/icu/source/acinclude.m4
Hunk #1 succeeded at 469 (offset -4 lines).
patching file intl/icu/source/configure
Hunk #1 succeeded at 4347 (offset 28 lines).
Applying local patch bug-1172609-timezone-recreateDefault.diff
patching file intl/icu/source/i18n/timezone.cpp
Hunk #1 succeeded at 110 (offset 2 lines).
Hunk #2 succeeded at 561 (offset 4 lines).
patching file intl/icu/source/i18n/unicode/timezone.h
Applying local patch bug-1198952-workaround-make-3.82-bug.diff
patching file intl/icu/source/Makefile.in
Applying local patch bug-1228227-bug-1263325-libc++-gcc_hidden.diff
patching file intl/icu/source/common/unicode/std_string.h
patching file intl/icu/source/common/utypeinfo.h
patching file intl/icu/source/io/unicode/ustream.h
Applying local patch ucol_getKeywordValuesForLocale-ulist_resetList.diff
patching file intl/icu/source/common/ulist.c
patching file intl/icu/source/i18n/ucol_res.cpp
Applying local patch unum_formatDoubleForFields.diff
patching file intl/icu/source/i18n/unicode/unum.h
patching file intl/icu/source/i18n/unum.cpp
Traceback (most recent call last):
  File "./icu_sources_data.py", line 22, in <module>
    from mozpack import path as mozpath
ImportError: No module named mozpack

and I haven't yet figured out why I'm getting that ImportError.  (Maybe you have some idea why?)  That the fresh patch applies without complaint, however, suggests this change is *probably* correct on a system that doesn't hit the mozpack issue.
Attachment #8819574 - Flags: review?(andrebargull)

Comment 19

11 months ago
(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #18)
> Traceback (most recent call last):
>   File "./icu_sources_data.py", line 22, in <module>
>     from mozpack import path as mozpath
> ImportError: No module named mozpack
> 
> and I haven't yet figured out why I'm getting that ImportError.  (Maybe you
> have some idea why?)  That the fresh patch applies without complaint,
> however, suggests this change is *probably* correct on a system that doesn't
> hit the mozpack issue.

Adding mozbuild to PYTHONPATH solved that issue for me: `export PYTHONPATH=$MOZILLA_SRC/python/mozbuild/`
(In reply to André Bargull from comment #19)
> Adding mozbuild to PYTHONPATH solved that issue for me: `export
> PYTHONPATH=$MOZILLA_SRC/python/mozbuild/`

Mm.  That works, and it seems to only result in changes to config/external/icu/data/icudt58l.dat (probably because tzdata needs subsequent bump back to 2016j).  But if it's really necessary, perhaps we should be setting PYTHONPATH to this value ourselves inside update-icu.sh?

As to the subsequent tzdata step, interestingly, that fails for me, complaining about no |icupkg| in PATH.  `sudo dnf install icu` picks that up for me (and fixes the config/external difference) -- maybe in the past I had that installed? -- so maybe we should be informing in some sort of printing spew that installing ICU is required to run that script?

(Interestingly, at the end of those two steps, plus the final |../js/src/builtin/make_intl_data.py tzdata|, there *is* a change to my tree:

[jwalden@find-waldo-now intl]$ hg diff
diff --git a/intl/tzdata/SVN-INFO b/intl/tzdata/SVN-INFO
--- a/intl/tzdata/SVN-INFO
+++ b/intl/tzdata/SVN-INFO
@@ -1,10 +1,10 @@
 Path: 44
 URL: https://ssl.icu-project.org/repos/icu/data/trunk/tzdata/icunew/2016j/44
 Relative URL: ^/data/trunk/tzdata/icunew/2016j/44
 Repository Root: https://ssl.icu-project.org/repos/icu
 Repository UUID: 251d0590-4201-4cf1-90de-194747b24ca1
 Node Kind: directory
 Last Changed Author: yoshito
 Last Changed Rev: 39512
-Last Changed Date: 2016-11-29 08:30:43 +0000 (Di, 29 Nov 2016)
+Last Changed Date: 2016-11-29 08:30:43 +0000 (Tue, 29 Nov 2016)
 

I guess maybe update-icu.sh needs to set LANG as well as TZ?  Although setting LANG doesn't do the trick for me, so I'm not sure what envvar we should be setting to fix this.)
Or actually, no, LANG might still do the trick.  From update-icu.sh:

# Also ensure SVN-INFO isn't localized.
export LANG=C

Maybe LANG=C isn't so en-USish as might have been assumed in theory?  Guess I'll file a bug to change this, and if it works for you we can call that fixed.

Comment 22

11 months ago
(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #20)
> -Last Changed Date: 2016-11-29 08:30:43 +0000 (Di, 29 Nov 2016)
> +Last Changed Date: 2016-11-29 08:30:43 +0000 (Tue, 29 Nov 2016)
>  
> 
> I guess maybe update-icu.sh needs to set LANG as well as TZ?  Although
> setting LANG doesn't do the trick for me, so I'm not sure what envvar we
> should be setting to fix this.)

I noticed this as well when I worked on the patch for bug 1323254. What I ended up with was setting `export LANG=C`, `export LANGUAGE=C`, and `export LC_ALL=C` to make sure svn uses English instead of German.

Updated

11 months ago
Attachment #8819574 - Flags: review?(andrebargull) → review+

Comment 23

11 months ago
Pushed by jwalden@mit.edu:
https://hg.mozilla.org/integration/mozilla-inbound/rev/f849271896d3
Implement an experimental, off-by-default Intl.NumberFormat.prototype.formatToParts function.  r=anba

Comment 24

11 months ago
Pushed by jwalden@mit.edu:
https://hg.mozilla.org/integration/mozilla-inbound/rev/81f871e19f1f
Fix bustage.  r=bustage in a CLOSED TREE

Comment 25

11 months ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/f849271896d3
https://hg.mozilla.org/mozilla-central/rev/81f871e19f1f
Status: ASSIGNED → RESOLVED
Last Resolved: 11 months ago
status-firefox53: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla53
You need to log in before you can comment on or make changes to this bug.