Expose an API to provide display names for calendar terms

RESOLVED FIXED in Firefox 53

Status

()

defect
RESOLVED FIXED
3 years ago
2 years ago

People

(Reporter: gandalf, Assigned: gandalf)

Tracking

(Blocks 2 bugs, {dev-doc-needed})

unspecified
mozilla53
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox53 fixed)

Details

(Whiteboard: [milestone5])

Attachments

(1 attachment, 3 obsolete attachments)

As part of bug 1280654, we need an API to get localized names of calendar terms.

Things like:

 - weekdays - Monday Tuesday Wednesday etc.
 - months - January February March
 - dayperiods - AM PM

in the future we may want to add timezones, territories, regions etc.

My first idea is to go for sth like:

Intl.getDisplayNames(locales, {
  group: 'datetime',
  type: 'weekday',
}); ->

{
  names: [
    'poniedzialek',
    'wtorek',
    'sroda',
    ...
  ],
  locale: 'pl',
  calendar: 'gregory'
}

=============

Intl.getDisplayNames(locales, {
  group: 'datetime',
  type: 'month',
}); ->

{
  names: [
    'styczen',
    'luty',
    'marzec'
    ...
  ],
  locale: 'pl',
  calendar: 'gregory'
}

=============

Intl.getDisplayNames(locales, {
  group: 'datetime',
  type: 'dayperiod',
}); ->

{
  names: [
    'AM',
    'PM',
  ],
  locale: 'pl',
  calendar: 'gregory'
}
Assignee

Updated

3 years ago
Assignee: nobody → gandalf
Blocks: 1280654
Posted patch WIP v1 (obsolete) — Splinter Review
This is a very early WIP, but it works.

The API ended up looking like this:

Intl.getDisplayNames(locales, {
  group: 'datetime', // unit | location
  type: 'weekday', // month | dayperiod
  style: 'long' // short | narrow 
});

{
  "names": [
    "niedziela",
    "poniedziałek",
    "wtorek",
    "środa",
    "czwartek",
    "piątek",
    "sobota"
  ],
  "calendar":"gregory",
  "locale":"pl"
}
:srl - does it look sane from ICU perspective?
Flags: needinfo?(srl)
Posted patch WIP v2 (obsolete) — Splinter Review
This is slightly extended version of the WIP.

I'd like to get first feedback on this as it does everything it's supposed to I think.

Waldo, can you take a look in particular at how I create UDateFormat? Is that the right way?

I'll be working on the code more - I know it needs more checkguards etc., but would like to know if it overall looks like I'm going in the right direction.
Attachment #8772308 - Attachment is obsolete: true
Attachment #8772587 - Flags: feedback?(jwalden+bmo)
Scott, can you take a look at this - esp. the test file - and confirm that it does what you need? I give you months, weekdays and dayperiods, and I give you styles: long|short|shorter|narrow.
Flags: needinfo?(scwwu)
Assignee

Updated

3 years ago
Status: NEW → ASSIGNED
Hi Zibi,

Thanks for working on the patch! This looks great, and I think it covers the need for most locales. I did notice that while playing with Chrome that in Arabic, the numbers are also displayed in localized string. If that's the expectation for Arabic (and maybe others I'm not aware of), we'll also need localized number strings to show calendar days, years, hours, and etc. Maybe just 0~9 unless there are special rules (hopefully not).

Another thing is the strings for "Year", "Month", "Day", and "Week". Would they be included as well? "Week" will be displayed on the week input field and UI. We'll need "Year", "Month", and "Day" on input boxes as placeholders, and maybe provide localizers the option to override them as needed. What do you think?
Flags: needinfo?(scwwu)
(In reply to Scott Wu [:scottwu] from comment #5)
> Hi Zibi,
> 
> Thanks for working on the patch! This looks great, and I think it covers the
> need for most locales. I did notice that while playing with Chrome that in
> Arabic, the numbers are also displayed in localized string. If that's the
> expectation for Arabic (and maybe others I'm not aware of), we'll also need
> localized number strings to show calendar days, years, hours, and etc. Maybe
> just 0~9 unless there are special rules (hopefully not).

JS has a good API for number formatting: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/NumberFormat

> Another thing is the strings for "Year", "Month", "Day", and "Week". Would
> they be included as well? "Week" will be displayed on the week input field
> and UI. We'll need "Year", "Month", and "Day" on input boxes as
> placeholders, and maybe provide localizers the option to override them as
> needed. What do you think?

Great, I'll look into how to map those words onto this API.
(In reply to Zibi Braniecki [:gandalf][:zibi] from comment #6)
> JS has a good API for number formatting:
> https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/
> Global_Objects/NumberFormat

This is exactly what I need. Thanks!
Assignee

Updated

3 years ago
Depends on: 1289951
Comment on attachment 8772587 [details] [diff] [review]
WIP v2

I'm updating the API to the latest proposal and building on top of mozIntl that already landed.
Attachment #8772587 - Flags: feedback?(jwalden+bmo)
> Another thing is the strings for "Year", "Month", "Day", and "Week".

:scottwu - So, seems like ICU will land this piece only in ICU 58 which will be released around January and we can hope to have it in Gecko around Jan/Feb.

If you plan to release your work earlier, I'm afraid you'll have to use regular .properties files to localize those strings. I'll design the API in this bug so that we will be easily able to extend it once we have it in ICU and then you'll be able to switch to use it.

The API for localization of month names, dayperiod names, and weekday names can land now.
Flags: needinfo?(scwwu)
Ok thanks for the heads up! The strings year, month, day, and week are intended for input box placeholder texts. Based on our discussion on bug 1069609, I think we've settled that we will provide the option for localizers to override the string? If that's the case then we'll just make that the only way to do it.
Flags: needinfo?(scwwu)

Updated

3 years ago
Blocks: 1301302
Ok, so in the first iteration it will provide:

 - names of months (January, February etc.)
 - names of weekdays (Monday, Tuesday etc.)
 - names of dayperiods (AM, PM)

All of them available in long, short and narrow forms.

The names of units ("Year", "Month", "Day", and "Week") will be provided by localizers now and once we extend this API after landing ICU 58 we will switch to it.
Posted patch WIP v3 (obsolete) — Splinter Review
This is an updated version of the proposal.

I changed the original approach since it wasn't applicable to all areas we may want to use this API for.
I scanned CLDR list for display names and came up with this list:

 - languages
 - locales
 - scripts
 - territories
 - date fields
 - unit names
 - symbols

We don't have to support all of them now, but I'd like to design the API to handle those data points.

After discussing it with Unicode/ICU people and people on my team, I've settled on a key-based API, where each key is a path.
The path chunks will be based on the names used by CLDR, but will be independent of CLDR, so if CLDR changes the organization in the future, we will not have to update the code that uses the API.

The initial proposal is to start with the following key-paths:
 - dates/gregorian/months/*
 - dates/gregorian/weekdays/*
 - dates/gregorian/dayperiods/*
 - dates/fields/year
 - dates/fields/month
 - dates/fields/week
 - dates/fields/day

So, an example use of the API will look like this:

let data = Intl.getDisplayNames('pl', {
  keys: [
    'dates/fields/year',
    'dates/fields/month',
    'dates/fields/week',
    'dates/fields/day',
    'dates/gregorian/months/january',
    'dates/gregorian/months/february',
    'dates/gregorian/months/march',
    'dates/gregorian/weekdays/tuesday',
    'dates/gregorian/dayperiods/am',
  ],
  style: 'long' // short, narrow
});

and the return value will be:
{
  "locale": "pl",
  "style": "long",
  "values": {
    "dates/fields/year": "rok",
    "dates/fields/month": "miesiąc",
    "dates/fields/week": "tydzień",
    "dates/fields/day": "dzień",
    "dates/gregorian/months/january": "styczeń",
    "dates/gregorian/months/february": "luty",
    "dates/gregorian/months/march": "marzec",
    "dates/gregorian/weekdays/tuesday": "wtorek",
    "dates/gregorian/dayperiods/am": "AM"
  }
}

:scottwu - that means that the API will be able to provide you with field names and you don't have to put them in localization.
Attachment #8772587 - Attachment is obsolete: true
Comment hidden (mozreview-request)
Assignee

Updated

3 years ago
Attachment #8789939 - Attachment is obsolete: true
Hi Waldo, 
Haven't seen updates on this recently. 
Do you have projection or feedback on the review?
Flags: needinfo?(jwalden+bmo)

Comment 15

3 years ago
mozreview-review
Comment on attachment 8790481 [details]
Bug 1287677 - Add mozIntl.getDisplayNames API.

https://reviewboard.mozilla.org/r/78266/#review88570

::: js/src/builtin/Intl.h:215
(Diff revision 1)
>  #if ENABLE_INTL_API
>  /**
> + * Returns an Object with CLDR-based fields display names.
> + *
> + * Usage: info = intl_GetDisplayNames(locale, {
> + *  style: 'long', // short | narrow

Write out

  "narrow" or "short" or "long"

rather than have this odd mix of code and pseudocode in comments.  See the format used for the comments on the structure of internals in Intl.js, for an idea of how to do it.  (And for the others.)

Actually, wait.  You're trying to mix an *example* (a perfectly fine thing), with documenting what this does generally.  Don't do that.  Describe what an API does generally as one thing.  Then, in a separate paragraph following up on the general docs, provide a specific example.  Mixing the two is more confusing than informative.

::: js/src/builtin/Intl.h:220
(Diff revision 1)
> + *  style: 'long', // short | narrow
> + *  keys: [
> + *    'dates/fields/year', // year | month | week | day
> + *    'dates/gregorian/months/january', // january | february | ...
> + *    'dates/gregorian/weekdays/monday', // monday | tuesday | ...
> + *    'dates/gregorian/dayperiods/am', // am | pm

Should keys really be an array?  Then you have to deal with duplicates, holes, and all sorts of other garbage (this is,

::: js/src/builtin/Intl.h:225
(Diff revision 1)
> + *    'dates/gregorian/dayperiods/am', // am | pm
> + *  ]
> + * });
> + *
> + * Returns: {
> + *   locale: 'en-US',

If your example is going to have "en-US" as locale, you should be passing it as locale in the example above that's supposed to produce this return value.

::: js/src/builtin/Intl.cpp:2495
(Diff revision 1)
> +    RootedArrayObject keys(cx, &value.toObject().as<ArrayObject>());
> +    if (!keys)
> +        return false;
> +
> +    if (!GetProperty(cx, options, options, cx->names().style, &value))
> +        return false;

The way you wrote the self-hosted bit of algorithm, you may have directly passed in |options.style| here.  So this value could be an arbitrary value, and then |value.toString()| below would misinterpret the value as a string, and this would be a type-safety violation.  This goes back to the self-hosted code requiring *much* more care about exactly how it translates user-provided values into narrowly-defined internal data structures to select particular behavior.

::: js/src/builtin/Intl.cpp:2496
(Diff revision 1)
> +    if (!keys)
> +        return false;
> +
> +    if (!GetProperty(cx, options, options, cx->names().style, &value))
> +        return false;
> +    JSAutoByteString style(cx, value.toString());

This is bad.  You've passed in a user-provided style string (let's suppose you'd properly coerced this to string, for the moment), but there' no guarantee at this juncture that the string is one of a fixed set of values.  Therefore when you do this, you're Latin-1 encoding a string into bytes -- and that could just strip off everything above the lower 8 bits.  So "lon\u0067", i.e. "long", could appear here, and that would be fine -- everything's Latin-1.  But if instead you got "lon\u1067", that would *still* truncate to "long" even tho it's *not* "long".  That's wrong.  Your self-hosted code should probably be using GetOption to get options.style and then massage it into one of the minimal set of values you want to be allowed here.

Once this really truly is "long" or "short" or "narrow" only, *then* it's okay to JSAutoByteString here.  But not before.

::: js/src/builtin/Intl.cpp:2504
(Diff revision 1)
> +
> +    RootedArrayObject result(cx, NewDenseFullyAllocatedArray(cx, keys->length()));
> +    if (!result)
> +        return false;
> +
> +    result->ensureDenseInitializedLength(cx, 0, keys->length());

This is almost certainly either premature optimization, or something that's just fairly dumb.  There's no good reason for this code to be doing cutesy things with how it creates the result array, and it should be good enough for it just to create an array -- dense and allocated is fine only because it doesn't have any broader implications to consider, but manually mucking with initialized lengths is a bridge too far -- and defining elements on it begin to end (which should update the dense initialized length as it goes).

::: js/src/builtin/Intl.cpp:2509
(Diff revision 1)
> +    result->ensureDenseInitializedLength(cx, 0, keys->length());
> +
> +    UErrorCode status = U_ZERO_ERROR;
> +
> +    UDateFormat* fmt = udat_open(
> +        UDAT_DEFAULT,

This formatting isn't consistent with how we format function calls that span lines.  I'd do

    UDateFormat* fmt =
        udat_open(..., ..., ...,
                  ...);

or so.

::: js/src/builtin/Intl.cpp:2518
(Diff revision 1)
> +        -1, NULL, -1, &status);
> +    if (U_FAILURE(status)) {
> +        JS_ReportErrorNumber(cx, GetErrorMessage, nullptr, JSMSG_INTERNAL_INTL_ERROR);
> +        return false;
> +    }
> +    ScopedICUObject<UDateFormat, udat_close> UDattoClose(fmt);

Capitalizing variable names is definite no-no.  Name this |datToClose| or something.  Likewise for the udatpg below.

::: js/src/builtin/Intl.cpp:2529
(Diff revision 1)
> +    }
> +    ScopedICUObject<UDateTimePatternGenerator, udatpg_close> UDatPGtoClose(dtpg);
> +
> +    for (uint32_t i = 0; i < keys->length(); i++) {
> +        RootedValue keyValue(cx, keys->getDenseElement(i));
> +        RootedArrayObject pattern(cx, &keyValue.toObject().as<ArrayObject>());

Declare |keyValue| and |pattern| outside the loop.  Then assign to them here.

(Creating Rooted inside a loop will do a few linked-list operations every time around the loop, when really they only need be done once, then reused for all iterations.  It's good style to avoid that sort of needless work when you can, when it clearly doesn't impact correctness because the variables are assigned at the loop start.)

::: js/src/builtin/Intl.cpp:2539
(Diff revision 1)
> +        RootedString partString(cx);
> +        for (uint32_t j = 0; j < 4; j++) {
> +            if (j < pattern->length()) {
> +                part.set(pattern->getDenseElement(j));
> +            } else {
> +                part.set(JS_GetEmptyStringValue(cx));;

The empty string is best accessed internally as cx->names().empty.

::: js/src/builtin/Intl.cpp:2541
(Diff revision 1)
> +            if (j < pattern->length()) {
> +                part.set(pattern->getDenseElement(j));
> +            } else {
> +                part.set(JS_GetEmptyStringValue(cx));;
> +            }
> +            partString.set(ToString(cx, part));

ToString is a fallible operation (as is usually indicated by a |cx| leading argument), so you need to check for failure and return false if so.

::: js/src/builtin/Intl.cpp:2542
(Diff revision 1)
> +                part.set(pattern->getDenseElement(j));
> +            } else {
> +                part.set(JS_GetEmptyStringValue(cx));;
> +            }
> +            partString.set(ToString(cx, part));
> +            parts[j].encodeUtf8(cx, partString);

This too is a fallible operation, so it also requires a check.

::: js/src/builtin/Intl.cpp:2551
(Diff revision 1)
> +
> +        if (equal(parts[0], "dates")) {
> +	    if (equal(parts[1], "fields")) {
> +                UDateTimePatternField fieldType;
> +
> +		if (equal(parts[2], "year")) {

This code handling dates/fields/year is problematic in that it would, I believe, equally apply if someone passed in a key "dates/fields/year/BOGUS", or "dates/fields/year/" with empty final component.  You need to structure this code to reflect the *actual* number of components in the overall string (array at this point, I guess).  Note that because the 0-to-4 loop is a fixed length, testing for last component non-empty wouldn't handle the second case I mention, I think.

::: js/src/builtin/Intl.cpp:2560
(Diff revision 1)
> +                } else if (equal(parts[2], "week")) {
> +                    fieldType = UDATPG_WEEK_OF_YEAR_FIELD;
> +		} else if (equal(parts[2], "day")) {
> +                    fieldType = UDATPG_DAY_FIELD;
> +                } else {
> +                    JS_ReportErrorNumber(

Bad formatting again.

::: js/src/builtin/Intl.cpp:2571
(Diff revision 1)
> +                }
> +
> +                int32_t resultSize;
> +
> +                const UChar* value = udatpg_getAppendItemName(
> +                    dtpg, fieldType, &resultSize);

Who owns the return value here?  Does ICU own it, or do we need to deallocate it when we're done with it?

::: js/src/builtin/Intl.cpp:2573
(Diff revision 1)
> +                int32_t resultSize;
> +
> +                const UChar* value = udatpg_getAppendItemName(
> +                    dtpg, fieldType, &resultSize);
> +                if (U_FAILURE(status)) {
> +                    JS_ReportErrorNumber(

Bad formatting on the JS_ReportErrorNumber call, and this'll want to be the ASCII variant when you adjust this, now.

::: js/src/builtin/Intl.cpp:2583
(Diff revision 1)
> +                    return false;
> +                }
> +
> +                RootedString word(
> +                    cx,
> +                    NewStringCopyN<CanGC>(cx, UCharToChar16(value), resultSize));

NewStringCopyN is fallible.

::: js/src/builtin/Intl.cpp:2584
(Diff revision 1)
> +                }
> +
> +                RootedString word(
> +                    cx,
> +                    NewStringCopyN<CanGC>(cx, UCharToChar16(value), resultSize));
> +                wordVal.set(StringValue(word));

setString(value) is slightly more readable.

::: js/src/builtin/Intl.cpp:2595
(Diff revision 1)
> +                    if (equal(style, "narrow")) {
> +                        symbolType = UDAT_STANDALONE_NARROW_MONTHS;
> +                    } else if (equal(style, "short")) {
> +                        symbolType = UDAT_STANDALONE_SHORT_MONTHS;
> +                    } else {
> +                        symbolType = UDAT_STANDALONE_MONTHS;

MOZ_ASSERT that the value must be "long" here, right?

::: js/src/builtin/Intl.cpp:2665
(Diff revision 1)
> +                    continue;
> +                }
> +
> +                Vector<char16_t, INITIAL_CHAR_BUFFER_SIZE> chars(cx);
> +                if (!chars.resize(INITIAL_CHAR_BUFFER_SIZE))
> +                    return false;

This vector should be declared outside the loop, and inside the loop it should be resized to this length anew each time.  No need to recreate it every time around.

::: js/src/builtin/Intl.cpp:2681
(Diff revision 1)
> +                    return false;
> +                }
> +
> +                RootedString word(
> +                        cx,
> +                        NewStringCopyN<CanGC>(cx, chars.begin(), resultSize));

Another fallible operation, and setString beneath it.

::: js/src/builtin/Intl.cpp:2686
(Diff revision 1)
> +                        NewStringCopyN<CanGC>(cx, chars.begin(), resultSize));
> +                wordVal.set(StringValue(word));
> +            }
> +        }
> +
> +        result->setDenseElement(i, wordVal);

Use normal element-setting functions for this.  When you go behind the VM's back like this, you're going to start losing support for things like TI knowing that all the fields are going to be strings, for example, and then you start paying prices down the line.  Not to mention this is definitely more obscure and harder to understand.

::: js/src/builtin/Intl.js:2909
(Diff revision 1)
>  
>    return result;
>  }
> +
> +function expandDisplayNamesKeys(keys) {
> +  const result = [];

JS style in this file is not to brace single-line loop bodies and to indent four spaces.

::: js/src/builtin/Intl.js:2911
(Diff revision 1)
>  }
> +
> +function expandDisplayNamesKeys(keys) {
> +  const result = [];
> +  for (let i = 0; i < keys.length; i++) {
> +    result[i] = callFunction(String_split, keys[i], '/');

String_split behaves like the original String.prototype.split.  That function's behavior depends on the value of the |Symbol.split| property of its |separator| argument:

js> String.prototype\[Symbol.split\] = () => \{ throw 42; \}
() => \{ throw 42; \}
js> "hi".split("bye")
uncaught exception: 42
(Unable to print stack trace)

So this is wrong.  I \*think\* you can use |StringSplitString(keys\[i\], "/")| instead here, but someone should double-check me on that.

::: js/src/builtin/Intl.js:2912
(Diff revision 1)
> +
> +function expandDisplayNamesKeys(keys) {
> +  const result = [];
> +  for (let i = 0; i < keys.length; i++) {
> +    result[i] = callFunction(String_split, keys[i], '/');
> +  }

You can't create an array via [], then initialize its elements by direct assignment.  That behavior is modifiable if someone's defined a setter on Array.prototype.  Instead you should individually *define* elements.  I expect there are examples of doing this in Array.js, or ask on IRC.

::: js/src/builtin/Intl.js:2927
(Diff revision 1)
> +  }
> +  return result;
> +}
> +
> +function Intl_getDisplayNames(locales, options = {}) {
> +  const requestedLocales = CanonicalizeLocaleList(locales);

This function requires stepwise comments, like all the other Intl_* functions have, correlating each bit of the algorithm with a spec step so that exactness of algorithm can be verified.

::: js/src/builtin/Intl.js:2943
(Diff revision 1)
> +                        DateTimeFormat.relevantExtensionKeys,
> +                        localeData);
> +
> +  const resolvedOptions = {
> +    style: options.style || 'long',
> +    keys: expandDisplayNamesKeys(options.keys || []),

I know || is sort of idiomatic-ish JS for providing default values, but it's not really a spec idiom at all.  Is the goal to accept any \\\*object\\\*-valued keys property?  This code would be wrong for that if the value were document.all.  Is the goal to accept something that would coerce to object?  This would be wrong for that, too.  Note how existing Intl.js code uses || only in places where the operands are exact value comparisons, or where the values are constrained to particular types, or other such things -- never directly on user-provided values.

As for what options.keys is.  One, it's not \*necessarily\* an array, so when eDNK iterates up to .length, that .length could invoke getters and arbitrary behavior.  In specs, usually a length is gotten, then iteration proceeds up to it \*without\* recomputing length.  You need to have a much clearer, more precise algorithm for exactly what the input value is, then how you process it into spec-internal data structures.

Alternatively, having an array means you have the possibility of duplicates, etc.  It might be better to have this argument treated as a dictionary, and enumerating its properties (maybe including inherited ones?) gets the particular keys to use.  I'm not certain of this, tho -- some discussion of exactly how this should look might not be bad.

::: js/src/builtin/Intl.js:2946
(Diff revision 1)
> +  const resolvedOptions = {
> +    style: options.style || 'long',
> +    keys: expandDisplayNamesKeys(options.keys || []),
> +  }
> +
> +  const values = intl_GetDisplayNames(r.locale, resolvedOptions);

Having both Intl_getDisplayNames and intl_GetDisplayNames is awfully finicky for reading.  Please rename the latter to something else -- maybe intl_ComputeDisplayNames or something.

::: js/src/builtin/Intl.js:2952
(Diff revision 1)
> +
> +  const result = {};
> +
> +  result.locale = r.locale;
> +  result.style = resolvedOptions.style;
> +  result.values = buildDisplayNamesObject(resolvedOptions.keys, values);

All these assignments will trigger user-modifiable behavior if someone defines setters for the relevant properties.  You should directly return an object literal with these properties embedded in it (object literals without computed property names, can't trigger these side effects).

::: js/src/tests/Intl/getDisplayNames.js:1
(Diff revision 1)
> +// |reftest| skip-if(!this.hasOwnProperty("Intl"))

Given this uses addIntlExtras, there's a fair chance this test will work when run in the shell, yet fail when run in the browser -- try should say whether this is a problem or not.

::: js/src/tests/Intl/getDisplayNames.js:9
(Diff revision 1)
> + * file, You can obtain one at http://mozilla.org/MPL/2.0/. */
> +
> +// Tests the getCalendarInfo function with a diverse set of arguments.
> +
> +/*
> + * Return true if A is equal to B, where equality on arrays and objects

This comment looks entirely bogus now.

::: js/src/tests/Intl/getDisplayNames.js:21
(Diff revision 1)
> +{
> +  assertEq(Object.getPrototypeOf(names), Object.prototype);
> +
> +  assertEq(names.locale, expected.locale);
> +  assertEq(names.style, expected.style);
> +  

Trailing WS

::: js/src/tests/Intl/getDisplayNames.js:83
(Diff revision 1)
> +    "dates/gregorian/dayperiods/pm": "PM"
> +  }
> +});
> +
> +checkDisplayNames(gDN('it', {
> +  style: 'short',

You don't appear to have any tests with options.style === "narrow".  You don't have any tests with options.style not any of these strings (including when it's not a string at all).

::: toolkit/components/mozintl/test/test_mozintl.js:44
(Diff revision 1)
> +  let x = {};
> +
> +  mozIntl.addGetCalendarInfo(x);
> +  equal(x.getCalendarInfo instanceof Function, true);
> +
> +  mozIntl.addDisplayNames(x);

This should have been addGetDisplayNames -- did you try running this test?
Attachment #8790481 - Flags: review?(jwalden+bmo) → review-
Comment hidden (mozreview-request)
Thanks a lot for the review :waldo!

I updated the patch (for some reason mozreview shows correctly the "orig-2" patch, but "1-2" is broken and contains different pieces. If you want me to post the interdiff separately, I'll do that) to your review.

I decided to move the string splitting to C++ so that we don't have to go back and forth and also made it possible to throw properly from inside of C++ with a correct error message.

Can you look at this change? (It's the `expandDisplayNamesKey` function) I had to write some raw C and I'm not comfortable with it *^^*.

> Should keys really be an array?  Then you have to deal with duplicates, holes, and all sorts of other garbage (this is,

I don't have an answer to that. My idea was to stick to Array in, Object out because that's the closes thing to what we'll be able to standardize (I don't see a way right now to get Sets into ECMA402), and it gives an OK-ish API.

The trouble with this API is that it's pretty borderline Intl. It's really almost more like L10n pulling strings out of CLDR and a bit of style matching aligned with Intl API sprinkled on top.

But I think it does its job, can be used by the Firefox UI via mozIntl and can be a foundation for a conversation about ECMA402 API on top of it.

> Who owns the return value here?  Does ICU own it, or do we need to deallocate it when we're done with it?

How can I check that?

> This function requires stepwise comments, like all the other Intl_* functions have, correlating each bit of the algorithm with a spec step so that exactness of algorithm can be verified.

I'll document the code, although can't align with spec steps because there's no spec yet.
Comment hidden (mozreview-request)
> I'll document the code, although can't align with spec steps because there's no spec yet.

Updated the patch with documentation. The intra-diff still doesn't work and the #ateam says that it's a bug in mozreview. Hope the full diff will be sufficient for the review.

Comment 20

3 years ago
mozreview-review
Comment on attachment 8790481 [details]
Bug 1287677 - Add mozIntl.getDisplayNames API.

https://reviewboard.mozilla.org/r/78266/#review90552

::: js/src/builtin/Intl.h:269
(Diff revision 2)
> + *         An array or path-like strings that identify keys to be returned
> + *         At the moment the following types of keys are supported:
> + *
> + *           'dates/fields/{year|month|week|day}'
> + *           'dates/gregorian/months/{january|february|...}'
> + *           'dates/gregorian/weekdays/{monday|tuesday|...}'

{sunday|...|saturday} to be clear that the full range of days is supported.

::: js/src/builtin/Intl.h:274
(Diff revision 2)
> + *           'dates/gregorian/weekdays/{monday|tuesday|...}'
> + *           'dates/gregorian/dayperiods/{am|pm}'
> + *
> + * Example:
> + *
> + * let info = intl_GetDisplayNames('en-US', {

This should be intl_ComputeDisplayNames.

::: js/src/builtin/Intl.cpp:2679
(Diff revision 2)
> +    int i = 0;
> +
> +    while (i < 4) {
> +        if (part) {
> +            parts[i] = part;
> +            part = strtok(nullptr, "/");

It is always and everywhere wrong to use |strtok|, because it's thread-unsafe and is just a fugly API.  Not just in this patch, or for this task -- for any task.  See further on for how to avoid it.

::: js/src/builtin/Intl.cpp:2712
(Diff revision 2)
> +
> +    if (!GetProperty(cx, options, options, cx->names().style, &value))
> +        return false;
> +    JSAutoByteString style(cx, value.toString());
> +    if (!style)
> +        return false;

Please move the style bits up above the keys bits, so that use of `keys->length()` is as close to the code that computes it as possible.

::: js/src/builtin/Intl.cpp:2714
(Diff revision 2)
> +        return false;
> +    JSAutoByteString style(cx, value.toString());
> +    if (!style)
> +        return false;
> +
> +    RootedArrayObject result(cx, NewDenseFullyAllocatedArray(cx, keys->length()));

Your length is user-controllable, so it could be large enough that dense storage can't be used for the entire array.  If that's the case, subsequent code in here that presumes density will go way off the rails in I don't know what way -- no good.  Use `NewDenseUnallocatedArray` instead, because "This is useful, e.g., when accepting length from the user."  The perf of allocating elements just once is not important for this function.

::: js/src/builtin/Intl.cpp:2722
(Diff revision 2)
> +
> +    UErrorCode status = U_ZERO_ERROR;
> +
> +    UDateFormat* fmt =
> +        udat_open(UDAT_DEFAULT, UDAT_DEFAULT, icuLocale(locale.ptr()),
> +        NULL, -1, NULL, -1, &status);

nullptrs, please.  And pass 0 instead of -1 for these -- as the docs state it, -1 means null-terminated, and you're just getting lucky that the implementation null-checks before interpreting these lengths.

::: js/src/builtin/Intl.cpp:2744
(Diff revision 2)
> +    Vector<char16_t, INITIAL_CHAR_BUFFER_SIZE> chars(cx);
> +    if (!chars.resize(INITIAL_CHAR_BUFFER_SIZE))
> +        return false;
> +
> +    for (uint32_t i = 0; i < keys->length(); i++) {
> +        keyValue.set(keys->getDenseElement(i));

Use `GetElement` instead, because you can't know for certain that `keys` uses dense storage, because its length is user-controllable.

::: js/src/builtin/Intl.cpp:2746
(Diff revision 2)
> +        return false;
> +
> +    for (uint32_t i = 0; i < keys->length(); i++) {
> +        keyValue.set(keys->getDenseElement(i));
> +
> +        JSAutoByteString pattern(cx, keyValue.toString());

This string is user-provided, so you have no guarantee of it being Latin-1, which is what this currently assumes.  And, I think encoding as Latin-1 will just truncate -- so "\u9963" would be interpreted as "c" which is certainly undesirable.

Rather, encode the string as UTF-8:

    JSAutoByteString pattern;
    if (!pattern.encodeUtf8(cx, keyValue.toString()))
        return false;

And of course, when you do this, you'll want to change the various error reporting to use UTF-8 functions, not the current ASCII functions.

::: js/src/builtin/Intl.cpp:2749
(Diff revision 2)
> +        keyValue.set(keys->getDenseElement(i));
> +
> +        JSAutoByteString pattern(cx, keyValue.toString());
> +
> +        char const* parts [4];
> +        expandDisplayNamesKey(parts, &pattern);

This approach -- copying the string, iterating through with strtok, and doing other stuff -- is a bad one.  You should be able to parse the string through in one go, without having to modify it.  Something like this:

    template<size_t N>
    inline bool
    MatchPart(const char** pattern, const char (&part)[N])
    {
        if (strncmp(*pattern, part, N - 1))
            return false;
        
        const char after = (*pattern)[N - 1];
        if (after != '/' && after != '\0')
            return false;
        
        // Adjust |pattern| to point either at the next part, or
        // at the final null terminator.  Code verifying a last
        // part depends on this.
        size_t skipBy = after == '/'
                        ? N
                        : N - 1;
        *pattern += skipBy;
        return true;
    }

    const char* pat = pattern.ptr();

    if (!MatchPart(&pat, "dates")) {
        complain bad key;
        return false;
    }
    
    if (MatchPart(&pat, "fields")) {
        UDateTimePatternField fieldType;
        if (MatchPart(&pat, "year")) {
            fieldType = UDATPG_YEAR_FIELD;
        } else if (MatchPart(&pat, "month")) {
            fieldType = UDATPG_MONTH_FIELD;
        } else if (MatchPart(&pat, "week")) {
            fieldType = UDATPG_WEEK_OF_YEAR_FIELD;
        } else if (MatchPart(&pat, "day")) {
            fieldType = UDATPG_DAY_FIELD;
        } else {
            complain bad key;
            return false;
        }
        
        // This part must be the final part with no trailing data.
        if (*pat != '\0') {
            complain bad key;
            return false;
        }
        
        ...rest of things in this code...
    } else if (MatchPart(&pat, "gregorian")) {
        UDateFormatSymbolType symbolType;
        int32_t index = -1;
        
        ...similar idea...
    } else {
        complain bad key;
        return false;
    }

::: js/src/builtin/Intl.cpp:2751
(Diff revision 2)
> +        JSAutoByteString pattern(cx, keyValue.toString());
> +
> +        char const* parts [4];
> +        expandDisplayNamesKey(parts, &pattern);
> +
> +        wordVal.set(NullValue());

setNull()

::: js/src/builtin/Intl.cpp:2781
(Diff revision 2)
> +                    return false;
> +                }
> +
> +                RootedString word(
> +                    cx,
> +                    NewStringCopyN<CanGC>(cx, UCharToChar16(value), resultSize));

Just do

    JSString* word = NewStringCopyN<CanGC>(cx, UCharToChar16(value), resultSize);

because the pointer is used so quickly, without any subsequent code possibly able to GC and invalidate the pointer.

::: js/src/builtin/Intl.cpp:2824
(Diff revision 2)
> +                    else if (equal(parts[3], "october"))
> +                        index = UCAL_OCTOBER;
> +                    else if (equal(parts[3], "november"))
> +                        index = UCAL_NOVEMBER;
> +                    else if (equal(parts[3], "december"))
> +                        index = UCAL_DECEMBER;

Surely this needs to RangeError if no month is found, right?

::: js/src/builtin/Intl.cpp:2849
(Diff revision 2)
> +                    else if (equal(parts[3], "friday"))
> +                        index = UCAL_FRIDAY;
> +                    else if (equal(parts[3], "saturday"))
> +                        index = UCAL_SATURDAY;
> +                    else if (equal(parts[3], "sunday"))
> +                        index = UCAL_SUNDAY;

Same RangeError thing?

::: js/src/builtin/Intl.cpp:2858
(Diff revision 2)
> +
> +                    if (equal(parts[3], "am")) {
> +                        index = UCAL_AM;
> +                    } else if (equal(parts[3], "pm")) {
> +                        index = UCAL_PM;
> +                    }

RangeError?

::: js/src/builtin/Intl.cpp:2867
(Diff revision 2)
> +                if (index == -1 || !symbolType) {
> +                    continue;
> +                }
> +
> +                uint32_t resultSize = udat_getSymbols(
> +                        fmt,

This formatting isn't consistent with JS style or the rest of the file.  You want something more like

    int32_t resultSize =
        udat_getSymbols(fmt, symbolType, index, Char16ToUChar(chars.begin()),
                        INITIAL_CHAR_BUFFER_SIZE, &status);

(rewrapped to break at 99ch, I'm just giving an example of what it might be at some offset).

There's no guarantee that this function will produce a result that fits in the provided buffer, so you have to do the usual song-and-dance of trying once, then resizing to redo if there wasn't enough space the first time around.  Search for `U_BUFFER_OVERFLOW_ERROR` to see the proper way to do this.

::: js/src/builtin/Intl.cpp:2880
(Diff revision 2)
> +                    return false;
> +                }
> +
> +                RootedString word(
> +                        cx,
> +                        NewStringCopyN<CanGC>(cx, chars.begin(), resultSize));

You won't need a RootedString for this, either, just a local `JSString*`.

::: js/src/builtin/Intl.js:3048
(Diff revision 2)
>    result.locale = r.locale;
>  
>    return result;
>  }
> +
> +function buildDisplayNamesObject(keys, values) {

Could you inline this function into Intl_getDisplayNames?  It's hard to read this and know what keys/values are -- which is a user-provided object, which has structure determined solely by the behavior of a self-hosting intrinsic function, &c.  And that matters a lot for semantics ultimately implemented.

::: js/src/builtin/Intl.js:3054
(Diff revision 2)
> +    const result = {};
> +    for (let i = 0; i < keys.length; i++) {
> +        const key = keys[i];
> +        const val = values[i];
> +        if (val) {
> +            result[key] = val || "";

As I said last time around, this needs to define the elements:

    _DefineDataProperty(result, key, val);

Note that `val` comes directly from `intl_ComputeDisplayNames`, which defines the elements of `values` as strings.  So there's no reason to test this -- although some assertions seem worthwhile:

    for (let i = 0; i < keys.length; i++) {
        const key = keys[i];
        const val = values[i];
        assert(typeof val === "string", "unexpected non-string value");
        assert(val.length > 0, "empty string value");
        _DefineDataProperty(result, key, val);
    }

Also, single-line ifs shouldn't be braced.

::: js/src/builtin/Intl.js:3081
(Diff revision 2)
> +                          localeOpt,
> +                          DateTimeFormat.relevantExtensionKeys,
> +                          localeData);
> +
> +    const style = GetOption(options, "style", "string", ["long", "short", "narrow"], "long");
> +    let keys = options["keys"];

options.keys is better style.

::: js/src/builtin/Intl.js:3085
(Diff revision 2)
> +    const style = GetOption(options, "style", "string", ["long", "short", "narrow"], "long");
> +    let keys = options["keys"];
> +
> +    if (keys === undefined)
> +        keys = [];
> +    else if (!IsArray(keys))

There's probably not a good reason to require that this argument be an array, versus simply duck-typing it as an arraylike.  Instead of doing

    for (let i = 0; i < keys.length; i++)
        ...;

do this:

    let len = ToLength(keys.length);
    for (let i = 0; i < len; i++)
        ...;

and don't bother requiring that it be an array.  But, some type-validation probably isn't bad -- just make it like so:

    if (keys === undefined)
        keys = [];
    else if (!IsObject(keys))
        ThrowTypeError(...);

Actually, looking closer -- you're operating upon `keys` the user-provided object multiple times.  That's not cool.  You should process `keys` once into an internal data structure for sensible behavior.  Something like:

    var processedKeys = new List();
    var len = ToLength(keys.length);
    for (var i = 0; i < len; i++)
        callFunction(std_Array_push, processedKeys, ToString(keys[i]));

and then use `{ style, keys: processedKeys }` for the value of `resolvedOptions`, and `processedKeys` instead of `keys` inside the algorithm that's currently in `buildDisplayNamesObject`.

::: js/src/builtin/Intl.js:3091
(Diff revision 2)
> +        ThrowTypeError(JSMSG_INVALID_KEYS_TYPE, typeof keys);
> +
> +    let len = keys.length;
> +
> +    for (let i = 0; i < len; i++) {
> +        if (typeof keys[i] !== 'string') {

This would actually be problematic if you used `IsArray` as a gatekeeper.  That function would allow

    new Proxy([], { ... });

to pass through, and at that point you can't guarantee `keys[i]` is a string -- even if you test it here, because the proxy handler could return different things for separate `[[Get]]`s on the object.

Just pass each element value through `ToString` as I suggested earlier, and let non-string element values just get handled however their stringified forms would work.

::: js/src/builtin/Intl.js:3099
(Diff revision 2)
> +    }
> +
> +    const resolvedOptions = {
> +        style,
> +        keys,
> +    }

Just put this all on a single line, and add a semicolon:

    const resolvedOptions = { style, keys };

::: js/src/builtin/Intl.js:3101
(Diff revision 2)
> +    const resolvedOptions = {
> +        style,
> +        keys,
> +    }
> +
> +    const values = intl_ComputeDisplayNames(r.locale, resolvedOptions);

Probably worth a comment over this that `values` will be an array whose elements are all strings.  Then someone reading subsequent processing of `values` knows that its contents are "safe".

::: js/src/tests/Intl/getDisplayNames.js:20
(Diff revision 2)
> +  assertEq(names.locale, expected.locale);
> +  assertEq(names.style, expected.style);
> +
> +  assertEq(
> +    Object.keys(names.values).length,
> +    Object.keys(expected.values).length);

Do

    var nameValues = names.values;
    var expectedValues = expected.values;
    
    var nameValuesKeys = Object.getOwnPropertyNames(nameValues).sort();
    var expectedValuesKeys = Object.getOwnPropertyNames(expectedValues).sort();
    
    assertEqArray(nameValuesKeys, expectedValuesKeys);
    
    for (var key of expectedValuesKeys)
      assertEq(nameValues[key], expectedValues[key]);

instead -- check all own properties, check for their exact presence slightly less fuzzily.

::: js/src/tests/Intl/getDisplayNames.js:208
(Diff revision 2)
> +  });
> +}, RangeError);
> +
> +assertThrowsInstanceOf(() => {
> +  gDN('en-US', {
> +    keys: ['dates/foo/faa/bar/baz']

Add some tests passing in non-ASCII codepoints, too.

::: js/src/vm/SelfHosting.cpp:2444
(Diff revision 2)
>      JS_FN("intl_defaultTimeZone", intl_defaultTimeZone, 0,0),
>      JS_FN("intl_defaultTimeZoneOffset", intl_defaultTimeZoneOffset, 0,0),
>      JS_FN("intl_FormatDateTime", intl_FormatDateTime, 2,0),
>      JS_FN("intl_FormatNumber", intl_FormatNumber, 2,0),
>      JS_FN("intl_GetCalendarInfo", intl_GetCalendarInfo, 1,0),
> +    JS_FN("intl_ComputeDisplayNames", intl_ComputeDisplayNames, 1,0),

Should probably make the length 2, if this function's going to assert it's always called with two arguments.
Attachment #8790481 - Flags: review?(jwalden+bmo) → review-

Comment 21

3 years ago
mozreview-review
Comment on attachment 8790481 [details]
Bug 1287677 - Add mozIntl.getDisplayNames API.

https://reviewboard.mozilla.org/r/78266/#review94070

::: js/src/builtin/Intl.cpp:2663
(Diff revisions 2 - 3)
>  void
>  expandDisplayNamesKey(char const* parts[], JSAutoByteString* key)
>  {
> +    // Take a slash delimited string key and fill 4 elements
> +    // parts array with path chunks from slicing the string.
> +    // Missing chunks are replaced with an empty string.

Setting aside for the moment that my previous comments (I don't understand why mozreview is making me comment on this stuff twice) will eliminate this function entirely, documentation comments inside these C++ functions aren't the same as the spec-step functions I illustrated in the earlier comment.  I want those sorts of comments, not this sort of comment.

Note also that spec-step comments don't necessarily correspond 100% with implementation tactics.  It's fine for spec language to be something like,

""""
1. Let keyList be a List formed by splitting processedKey into separate strings at each U+002F SOLIDUS (excluding it from each substring).
2. If the first element of keyList is not "dates", throw a RangeError.
3. If the second element of keyList is "gregorian", then:
   a. If the third element of keyList is "months", then
      i. If keyList's length is not 4, throw a RangeError.
      ii. If the fourth element of keyList is "january", ...
      iii. Else if the fourth element is "february"...
      iv. Else throw a RangeError.
4. Else, if the second element of keyList is "fields", then:
   a. If the third element of keyList is
5. Else throw a RangeError.
""""

and if it turns into algorithm comments in the code that don't exactly, precisely, 100% match up line-by-line, that's perfectly acceptable.

::: js/src/builtin/Intl.cpp:2739
(Diff revisions 2 - 3)
>          return false;
>      }
>      ScopedICUObject<UDateFormat, udat_close> datToClose(fmt);
>  
> +    // UDateTimePatternGenerator will be needed for translations of date and
> +    // time fields like "month", "week", "day" etc.

"will be needed" comments, aren't.  :-)  It's perfectly obvious where this is used, and why, just by searching for the variable name.

::: js/src/builtin/Intl.js:3072
(Diff revision 3)
> + * order to ensure they're aligned with what Intl API returns.
> + *
> + * This API may one day be a foundation for an ECMA402 API spec proposal.
> + *
> + * The function takes two arguments - locales which is a list of locale strings
> + * and options which is an object with two optional properties:

This isn't what I meant about spec steps.  See later on for more detail.

::: js/src/builtin/Intl.js:3094
(Diff revision 3)
> + *     An key-value pair list of requested keys and corresponding
> + *     translated values
> + *
> + */
> +function Intl_getDisplayNames(locales, options) {
> +    const requestedLocales = CanonicalizeLocaleList(locales);

For spec steps, I meant something like:

    // 1. Let requestedLocales be CanonicalizeLocaleList(locales).
    const requestedLocales = ...;
    
    // 2. If options is undefined, let options be ObjectCreate(%ObjectPrototype%); otherwise let options be ToObject(options).
    if (...)
        options = {};
    else
        options = ToObject(options);

    // 3. Let localeOpt be a new Record.
    // 4. Set localeOpt.[[localeMatcher]] to "best fit".
    const localeOpt = new Record();
    localeOpt.localeMatcher = "best fit";

and so on through the entire algorithm.

It is inordinately easy, if you don't implement an algorithm by doing this (or working from such steps previously written), to implement unintentionally-visible side effects.  For example, how you reused `keys` multiple times despite its being a user-provided object, when such has the potential to invoke user-defined behavior (and because of how `IsArray` is defined, could have), versus using `List` (interactions with which are invisible to user code, by design).  Writing a precise algorithm that the code matches will ensure these issues aren't in the final patch.
Flags: needinfo?(jwalden+bmo)
Comment hidden (mozreview-request)
Comment on attachment 8790481 [details]
Bug 1287677 - Add mozIntl.getDisplayNames API.

Thanks for the review Jeff!

I updated the patch to your feedback and below are comments:

> It is always and everywhere wrong to use |strtok|

Fun. Now, try to search for "Split string with delimiters in C".

> You should be able to parse the string through in one go, without having to modify it.  Something like this:

Wow, and here I thought that I'm aiming at writing an easy ECMA402-direction style wrapper for CLDR data.
Write in C they said, see the world they said.

Thanks a lot. This piece works great!

> Surely this needs to RangeError if no month is found, right?

So, I aimed at gathering all missed cases below under "if (index == -1)", but I guess it's less readable than just adding a case and erroring in each if..else.

> As I said last time around, this needs to define the elements:
>    _DefineDataProperty(result, key, val);

I don't get MozReview. Sorry for that. This has been fixed in the last version of my patch. If you keep seeing it, MozReview is playing tricks.


> var processedKeys = new List();

When tried to use List here, it crashed from within C code. Not sure why, but switching to Array works.

> Probably worth a comment over this that `values` will be an array whose elements are all strings.

Not sure how should I mix spec comments with inline comments

I pseudo-specced the whole JS code the way I'd do if I wrote a spec for it.
Unless you want me to write a whole spec, that's how far I feel like we can get.

I didn't comment the C code the same way, mostly because I don't think we would spec this behavior in ECMA402 at all, much like we don't spec the 1.1.3 PluralRuleSelection in Intl.PluralRules API, but instead say "it performs implementation dependent algorithm".

Is that ok?

Comment 24

3 years ago
mozreview-review
Comment on attachment 8790481 [details]
Bug 1287677 - Add mozIntl.getDisplayNames API.

https://reviewboard.mozilla.org/r/78266/#review94714

::: js/src/builtin/Intl.cpp:2751
(Diff revision 4)
> +    // branches for.
> +    //
> +    // For any unknown path branch, the wordVal will keep NullValue and
> +    // we'll throw at the end.
> +    for (uint32_t i = 0; i < keys->length(); i++) {
> +        GetElement(cx, keys, keys, i, &keyValue);

GetElement is fallible.  You need to check the result and return false on failure.

::: js/src/builtin/Intl.cpp:2755
(Diff revision 4)
> +    for (uint32_t i = 0; i < keys->length(); i++) {
> +        GetElement(cx, keys, keys, i, &keyValue);
> +
> +        JSAutoByteString pattern;
> +        keyValStr = keyValue.toString();
> +        if (!pattern.encodeUtf8(cx, keyValStr))

Oh, ugh.  I just realized, this quite possibly/probably will result (since we're interpeting these strings as null-terminated) that this probably messes up handling of "dates/fields/year\0EXTRA".  Sigh.  Run with this for now, but file a followup (assign to me) to figure out how to address this.  I have an idea for fixing this, but it's probably not worth me talking you through executing it.

::: js/src/builtin/Intl.cpp:2782
(Diff revision 4)
> +                fieldType = UDATPG_DAY_FIELD;
> +            } else {
> +                JS_ReportErrorNumberUTF8(cx, GetErrorMessage, nullptr, JSMSG_INVALID_KEY, pattern.ptr());
> +                return false;
> +            }
> +

This needs this sort of addition to verify there's no trailing garbage.

    // This part must be the final part with no trailing data.
    if (*pat != '\0') {
        complain bad key;
        return false;
    }

I think the test I suggest later will catch the lack of this -- please make the test change, verify failure without this hunk, then verify success with the nul-testing added.

::: js/src/builtin/Intl.cpp:2792
(Diff revision 4)
> +                JS_ReportErrorNumberASCII(cx, GetErrorMessage, nullptr, JSMSG_INTERNAL_INTL_ERROR);
> +                return false;
> +            }
> +
> +            JSString* word = NewStringCopyN<CanGC>(cx, UCharToChar16(value), resultSize);
> +            if (!word) {

Oops, I didn't notice this last times around -- functions that take `cx` as first argument are 99.99% of the time fallible and, on failure, will report errors themselves.  (The major exception are functions that only take a single argument, where it's unclear whether leading-argument means fallible, or trailing-argument means infallible.  Generally it means the former.)  No need for the caller to do so.

So, then, this should just be

    JSString* word = NewStringCopyN<CanGC>(...);
    if (!word)
        return false;

::: js/src/builtin/Intl.cpp:2799
(Diff revision 4)
> +                return false;
> +            }
> +            wordVal.setString(word);
> +        } else if (MatchPart(&pat, "gregorian")) {
> +            UDateFormatSymbolType symbolType;
> +            int32_t index = -1;

Don't initialize this -- it's not used unless it's set.

::: js/src/builtin/Intl.cpp:2845
(Diff revision 4)
> +                if (equal(style, "narrow")) {
> +                    symbolType = UDAT_STANDALONE_NARROW_WEEKDAYS;
> +                } else if (equal(style, "short")) {
> +                    symbolType = UDAT_STANDALONE_SHORT_WEEKDAYS;
> +                } else {
> +                    symbolType = UDAT_STANDALONE_WEEKDAYS;

Add

    MOZ_ASSERT(equal(style, "long"));

::: js/src/builtin/Intl.cpp:2882
(Diff revision 4)
> +            } else {
> +                JS_ReportErrorNumberUTF8(cx, GetErrorMessage, nullptr, JSMSG_INVALID_KEY, pattern.ptr());
> +                return false;
> +            }
> +
> +

Needs the same

    if (*pat != '\0') {
        complain;
        return false;
    }

sort of thing mentioned earlier.  And I think the various "dates/fields/dayperiods/amEXTRA" sort of tests should catch this (again, please verify).

::: js/src/builtin/Intl.cpp:2891
(Diff revision 4)
> +            if (status == U_BUFFER_OVERFLOW_ERROR) {
> +                if (!chars.resize(resultSize))
> +                    return false;
> +                status = U_ZERO_ERROR;
> +                udat_getSymbols(fmt, symbolType, index, Char16ToUChar(chars.begin()),
> +                        nullptr, &status);

You should be passing `resultSize` here, not `nullptr`.  This doesn't even compile for me on my system.  Also, that second line should be aligned with `fmt`.

::: js/src/builtin/Intl.cpp:2900
(Diff revision 4)
> +                return false;
> +            }
> +
> +            JSString* word = NewStringCopyN<CanGC>(cx, chars.begin(), resultSize);
> +            if (!word) {
> +                JS_ReportErrorNumberASCII(cx, GetErrorMessage, nullptr, JSMSG_INTERNAL_INTL_ERROR);

Just return false, again.

::: js/src/builtin/Intl.cpp:2910
(Diff revision 4)
> +            JS_ReportErrorNumberUTF8(cx, GetErrorMessage, nullptr, JSMSG_INVALID_KEY, pattern.ptr());
> +            return false;
> +        }
> +
> +        // If the wordVal is Null, report an invalid key RangeError
> +        if (wordVal.isNull()) {

I think you can just

    MOZ_ASSERT(wordVal.isString());

here.

::: js/src/builtin/Intl.cpp:2915
(Diff revision 4)
> +        if (wordVal.isNull()) {
> +            JS_ReportErrorNumberUTF8(cx, GetErrorMessage, nullptr, JSMSG_INVALID_KEY, pattern.ptr());
> +            return false;
> +        }
> +
> +        if (!DefineElement(cx, result, i, wordVal)) {

This function reports its own errors, no need to report again.

::: js/src/builtin/Intl.js:3053
(Diff revision 4)
> +
> +/**
> + * This function zips keys and values returned from intl_computeDisplayNames
> + * into a single object.
> + */
> +function buildDisplayNamesObject(keys, values) {

This function should be removed now.

::: js/src/builtin/Intl.js:3089
(Diff revision 4)
> + *
> + *   style:
> + *     negotiated style
> + *
> + *   values:
> + *     An key-value pair list of requested keys and corresponding

A key-value pair...

::: js/src/builtin/Intl.js:3116
(Diff revision 4)
> +    // 5. Let opt be a new Record.
> +    const localeOpt = new Record();
> +    // 6. Set localeOpt.[[localeMatcher]] to "best fit".
> +    localeOpt.localeMatcher = "best fit";
> +
> +    // 7. Let r be RedolveLocale(%DateTimeFormat%.[[availableLocales]], requestedLocales, localeOpt,

s/Redolve/Resolve/

::: js/src/builtin/Intl.js:3130
(Diff revision 4)
> +    const style = GetOption(options, "style", "string", ["long", "short", "narrow"], "long");
> +    // 9. Let keys be ? Get(options, "keys").
> +    let keys = options.keys;
> +
> +    // 10. If keys is undefined,
> +    if (keys === undefined)

The statements in the if-else arms here need to be braced.

::: js/src/builtin/Intl.js:3131
(Diff revision 4)
> +    // 9. Let keys be ? Get(options, "keys").
> +    let keys = options.keys;
> +
> +    // 10. If keys is undefined,
> +    if (keys === undefined)
> +        // a. Let keys be a new Array.

s/a new Array/ArrayCreate(0)/ is the spec way to say this.  And for all the other "a new Array" bits of language.

::: js/src/builtin/Intl.js:3138
(Diff revision 4)
> +    // 11. Else,
> +    //   a. If Type(keys) is not Object, throw a TypeError exception.
> +    else if (!IsObject(keys))
> +        ThrowTypeError(JSMSG_INVALID_KEYS_TYPE, typeof keys);
> +
> +    // 12. Let processedKeys be a new Array.

Add an extra comment here that,

"""
(This really should be a List, but we use an Array here in order that |intl_ComputeDisplayNames| may infallibly access the list's length via |ArrayObject::length|.)
"""

::: js/src/builtin/Intl.js:3144
(Diff revision 4)
> +    let processedKeys = [];
> +    // 13. Let len be ? ToLength(? Get(keys, "length")).
> +    let len = ToLength(keys.length);
> +    // 14. Let i be 0.
> +    // 15. Repeat, while i < len
> +    for (let i = 0; i < len; i++)

This loop body needs braces around it, seeing as it covers multiple lines (including the comments).

::: js/src/builtin/Intl.js:3145
(Diff revision 4)
> +    // 13. Let len be ? ToLength(? Get(keys, "length")).
> +    let len = ToLength(keys.length);
> +    // 14. Let i be 0.
> +    // 15. Repeat, while i < len
> +    for (let i = 0; i < len; i++)
> +        // a. Let processedKey be ToString(? Get(keys, i)).

Needs a ? before ToString.

::: js/src/builtin/Intl.js:3154
(Diff revision 4)
> +    // 16. Let resolvedOptions be a new Record.
> +    // 17. Set resolvedOptions.[[style]] to style.
> +    // 18. Set resolvedOptions.[[keys]] to processedKeys.
> +    const resolvedOptions = { style, keys: processedKeys };
> +
> +    // 19. Let names be ? ComputeDisplayNames(r.locale, resolvedOptions).

r.[[locale]] (and search this entire function and replace all such uses)

It's fine to have this step just call a sub-function, but then that sub-function should have stepwise comments in it, too.  Steps that say "For each element of `keys`, use an implementation dependent algorithm to map that key to a string that..." would be perfectly fine.

::: js/src/builtin/Intl.js:3155
(Diff revision 4)
> +    // 17. Set resolvedOptions.[[style]] to style.
> +    // 18. Set resolvedOptions.[[keys]] to processedKeys.
> +    const resolvedOptions = { style, keys: processedKeys };
> +
> +    // 19. Let names be ? ComputeDisplayNames(r.locale, resolvedOptions).
> +    const names = intl_ComputeDisplayNames(r.locale, resolvedOptions);

Actually, it occurs to me -- there's no reason to have an "options" argument here, versus just passing `style` and `processedKeys` as separate arguments.  Please expand the one argument into two here and in the implementation.  That should also eliminate a fallible `GetProperty`.

::: js/src/builtin/Intl.js:3169
(Diff revision 4)
> +        const key = processedKeys[i];
> +        // b. Let name be ? Get(names, i).
> +        const name = names[i];
> +        // c. Assert: Type(name) is string.
> +        assert(typeof name === "string", "unexpected non-string value");
> +        // d. Assert: ToLength(? Get(name, "length")) is greater than 0.

Asserts that perform side-effectful or throwing operations are bad.  In this case there's no side effects because `names` is a string, but then we shouldn't be using the side-effectful description.  Make this instead,

"""
d. Assert: the length of name is greater than zero.
"""

::: js/src/js.msg:477
(Diff revision 4)
>  MSG_DEF(JSMSG_INTERNAL_INTL_ERROR,     0, JSEXN_ERR, "internal error while computing Intl data")
>  MSG_DEF(JSMSG_INTL_OBJECT_NOT_INITED,  3, JSEXN_TYPEERR, "Intl.{0}.prototype.{1} called on value that's not an object initialized as a {2}")
>  MSG_DEF(JSMSG_INTL_OBJECT_REINITED,    0, JSEXN_TYPEERR, "can't initialize object twice as an object of an Intl constructor")
>  MSG_DEF(JSMSG_INVALID_CURRENCY_CODE,   1, JSEXN_RANGEERR, "invalid currency code in NumberFormat(): {0}")
>  MSG_DEF(JSMSG_INVALID_DIGITS_VALUE,    1, JSEXN_RANGEERR, "invalid digits value: {0}")
> +MSG_DEF(JSMSG_INVALID_KEYS_TYPE,       1, JSEXN_TYPEERR, "invalid keys type: {0}")

Use zero for arity and make the message "calendar info keys must be an object or undefined".  Passing in `typeof keys` will pass "object" if null were passed, which would be confusing.

::: js/src/tests/Intl/getDisplayNames.js:170
(Diff revision 4)
> +  });
> +}, RangeError);
> +
> +const typeErrorKeys = [
> +  null,
> +  'string',

Probably should add Symbol.iterator, 15, 1, 3.7, NaN, Infinity as other values here.

::: js/src/tests/Intl/getDisplayNames.js:188
(Diff revision 4)
> +  ['foo'],
> +  ['dates/foo'],
> +  ['/dates/foo'],
> +  ['dates/foo/foo'],
> +  ['dates/fields'],
> +  ['dates/fields/'],

Add ["dates/fieldsEXTRA"] and ["dates/gregorianEXTRA"] and ["datesEXTRA"] and ["dates/gregorian/monthsEXTRA"] and ["dates/gregorian/weekdaysEXTRA"] and ["dates/gregori\u1161n/months/january"] to this list -- want to make sure various trailing gunk, and the Unicode truncation thing, are tested.
Attachment #8790481 - Flags: review?(jwalden+bmo) → review-
Comment hidden (mozreview-request)
> This needs this sort of addition to verify there's no trailing garbage.

I added the bits, but could not get the tests to fail without them.

All the ['dates/fields/yearEXTRA'] just work.

I got the ['dates/fields/year\0EXTRA'] to fail but the `if` didn't fix it.

I filled bug 1319740 to handle that.
Assignee

Updated

3 years ago
Blocks: 1319740

Comment 27

3 years ago
mozreview-review
Comment on attachment 8790481 [details]
Bug 1287677 - Add mozIntl.getDisplayNames API.

https://reviewboard.mozilla.org/r/78266/#review95392

::: js/src/builtin/Intl.cpp:2683
(Diff revision 5)
> +bool
> +js::intl_ComputeDisplayNames(JSContext* cx, unsigned argc, Value* vp)
> +{
> +    // 1. Assert: locale is a string.
> +    // 2. Assert: style is a string.
> +    // 3. Assert: keys is an Array.

Put these three comments adjacent to the corresponding asserts.

::: js/src/builtin/Intl.cpp:2684
(Diff revision 5)
> +js::intl_ComputeDisplayNames(JSContext* cx, unsigned argc, Value* vp)
> +{
> +    // 1. Assert: locale is a string.
> +    // 2. Assert: style is a string.
> +    // 3. Assert: keys is an Array.
> +    // 4. Let result be ArrayCreate(0).

Move this next to the declaration/init of `result` further down.  (General idea is to have these step-comments actually next to the steps.)

::: js/src/builtin/Intl.cpp:2685
(Diff revision 5)
> +{
> +    // 1. Assert: locale is a string.
> +    // 2. Assert: style is a string.
> +    // 3. Assert: keys is an Array.
> +    // 4. Let result be ArrayCreate(0).
> +    // 4. For each element of keys,

Move this line down next to the corresponding for-loop.

::: js/src/builtin/Intl.cpp:2687
(Diff revision 5)
> +    // 2. Assert: style is a string.
> +    // 3. Assert: keys is an Array.
> +    // 4. Let result be ArrayCreate(0).
> +    // 4. For each element of keys,
> +    //   a. Perform an implementation dependent algorithm to map a key to a
> +    //      corresponding display name.

Move this comment next to the declaration `const char* pat`.

::: js/src/builtin/Intl.cpp:2688
(Diff revision 5)
> +    // 3. Assert: keys is an Array.
> +    // 4. Let result be ArrayCreate(0).
> +    // 4. For each element of keys,
> +    //   a. Perform an implementation dependent algorithm to map a key to a
> +    //      corresponding display name.
> +    //   b. Append the result string to result.

Move this all the way down next to the `DefineElement` call.

::: js/src/builtin/Intl.cpp:2689
(Diff revision 5)
> +    // 4. Let result be ArrayCreate(0).
> +    // 4. For each element of keys,
> +    //   a. Perform an implementation dependent algorithm to map a key to a
> +    //      corresponding display name.
> +    //   b. Append the result string to result.
> +    // 5. Return result.

Move this next to `args.rval().setObject(*result);`.

::: js/src/builtin/Intl.cpp:2893
(Diff revision 5)
> +                return false;
> +            }
> +
> +            int32_t resultSize =
> +                udat_getSymbols(fmt, symbolType, index, Char16ToUChar(chars.begin()),
> +                                INITIAL_CHAR_BUFFER_SIZE, &status);

The interdiff shows something funky here -- if it's a tab, remove it.

::: js/src/builtin/Intl.cpp:2916
(Diff revision 5)
> +        } else {
> +            JS_ReportErrorNumberUTF8(cx, GetErrorMessage, nullptr, JSMSG_INVALID_KEY, pattern.ptr());
> +            return false;
> +        }
> +
> +        // If the wordVal is Null, report an invalid key RangeError

Remove this comment.

::: js/src/builtin/Intl.cpp:2924
(Diff revision 5)
> +        if (!DefineElement(cx, result, i, wordVal))
> +            return false;
> +    }
> +
> +    args.rval().setObject(*result);
> +

Remove this blank line -- it's pretty traditional for rval-setting and returning true to be grouped together, because they're read as the idiom of "return X".

::: js/src/builtin/Intl.js:3120
(Diff revision 5)
> +    // 10. If keys is undefined,
> +    if (keys === undefined) {
> +        // a. Let keys be ArrayCreate(0).
> +        keys = [];
> +    // 11. Else,
> +    //   a. If Type(keys) is not Object, throw a TypeError exception.

There's not a good way to put step comments in with the ECMAScript style of having two steps for if/else, to be sure.  But I think we usually would put this step-11 comment inside the else, indented consistent with the rest of the else-block.

::: js/src/tests/Intl/getDisplayNames.js:212
(Diff revision 5)
> +  ['dates/gregorian/dayperiods/foo'],
> +  ['dates/gregorian/months/الشهر'],
> +  [3],
> +  [null],
> +  ['d', 'a', 't', 'e', 's'],
> +  ['datesEXTRA'],

Oh, I see why my suggestions for tests didn't pick up the missing-'\0' thing.  Doesn't hurt to keep the suggestions I had -- but what's needed to hit the null-terminator case are these:

    ["dates/fields/year/"],
    ["dates/fields/month/"],
    ["dates/fields/week/"],
    ["dates/fields/day/"],
    ["dates/gregorian/months/january/"],
    ["dates/gregorian/weekdays/saturday/"],
    ["dates/gregorian/dayperiods/am/"],
    ["dates/fields/months/january/"],

::: js/src/tests/Intl/getDisplayNames.js:214
(Diff revision 5)
> +  [3],
> +  [null],
> +  ['d', 'a', 't', 'e', 's'],
> +  ['datesEXTRA'],
> +  ['dates/fieldsEXTRA'],
> +  ['daes/gregorianEXTRA'],

s/daes/dates/

::: js/src/tests/Intl/getDisplayNames.js:217
(Diff revision 5)
> +  ['datesEXTRA'],
> +  ['dates/fieldsEXTRA'],
> +  ['daes/gregorianEXTRA'],
> +  ['dates/gregorian/monthsEXTRA'],
> +  ['dates/gregorian/weekdaysEXTRA'],
> +  ['dates/fields/dayperiod/amEXTRA'],

Just noticed this -- I should have said "dayperiods".
Attachment #8790481 - Flags: review?(jwalden+bmo) → review+
Comment hidden (mozreview-request)
Assignee

Comment 29

3 years ago
mozreview-review-reply
Comment on attachment 8790481 [details]
Bug 1287677 - Add mozIntl.getDisplayNames API.

https://reviewboard.mozilla.org/r/78266/#review88570

> Who owns the return value here?  Does ICU own it, or do we need to deallocate it when we're done with it?

I'm still not sure if we need to deallocate it.
Comment on attachment 8790481 [details]
Bug 1287677 - Add mozIntl.getDisplayNames API.

I updated the patch to not fail on missing ICU and fixed the comments.

There are two items left:
 - I'm not sure why the termination check for \0 doesn't catch the tests you suggested
 - I'm not sure about the deallocation

Can you take a last look at the code and help me with those two things?
Attachment #8790481 - Flags: review+ → review?(jwalden+bmo)
Assignee

Comment 31

3 years ago
mozreview-review-reply
Comment on attachment 8790481 [details]
Bug 1287677 - Add mozIntl.getDisplayNames API.

https://reviewboard.mozilla.org/r/78266/#review95392

> Oh, I see why my suggestions for tests didn't pick up the missing-'\0' thing.  Doesn't hurt to keep the suggestions I had -- but what's needed to hit the null-terminator case are these:
> 
>     ["dates/fields/year/"],
>     ["dates/fields/month/"],
>     ["dates/fields/week/"],
>     ["dates/fields/day/"],
>     ["dates/gregorian/months/january/"],
>     ["dates/gregorian/weekdays/saturday/"],
>     ["dates/gregorian/dayperiods/am/"],
>     ["dates/fields/months/january/"],

I tried adding those tests but it seems that the safeguard in the code is not picking it and the tests don't fail in result. Not sure why :/
Whiteboard: [milestone5]
Comment hidden (mozreview-request)
> I tried adding those tests but it seems that the safeguard in the code is not picking it and the tests don't fail in result. Not sure why :/

I was able to identify the reason. The code in MatchPart was skipping the '/' after each part, including after last part making any pattern like "dates/fields/year/" work.

I added an `if` to MatchPart that returns false if the character after '/' is '\0'.
Let me know if you'd prefer the test to be done in the for loop in ComputeDisplayNames instead.
Also, :bz pointed out that we should move away from using JSAutoByteString (bug 1320706). Not sure if it's material for fixing here or a separate bug to refactor the whole Intl.cpp.
Comment hidden (mozreview-request)

Comment 36

3 years ago
mozreview-review
Comment on attachment 8790481 [details]
Bug 1287677 - Add mozIntl.getDisplayNames API.

https://reviewboard.mozilla.org/r/78266/#review96306

::: js/src/builtin/Intl.cpp:3031
(Diff revision 8)
> +
> +    // If the next character is the delimiter, return false if
> +    // it's the last character in the pattern to prevent matching
> +    // against patterns that end with the delimiter.
> +    if (after == '/') {
> +        const char after2 = (*pattern)[N];

Hmm, okay.  I think this is pointing out that it was a mistake to try to parse a "part" as including the trailing slash.

I think we should trim this function down a little.  Cut out everything from the `const char after` line down through `size_t skipBy`, and replace the final `skipBy` with `N - 1`.  Then inside the part-matching code, add a bunch of

    if (!MatchPart(&pat, "/")) {
        complain bad patern;
        return false;
    }

to check for slashes explicitly.  That should address this concern, and without piling on special case after special case as it seems apparent to me this ultimate patch does.

::: js/src/builtin/Intl.cpp:3154
(Diff revision 8)
> +                return false;
> +            }
> +
> +            int32_t resultSize;
> +
> +            const UChar* value = udatpg_getAppendItemName(dtpg, fieldType, &resultSize);

(In reply to Zibi Braniecki [:gandalf][:zibi] from comment #29)
> > Who owns the return value here?  Does ICU own it, or do we need to deallocate
> > it when we're done with it?
> 
> I'm still not sure if we need to deallocate it.

I read through the implementation.  :-\  Looks like the buffer is owned by some UnicodeString inside of ICU, so it appears no need to deallocate somehow.
Attachment #8790481 - Flags: review?(jwalden+bmo) → review-
Comment hidden (mozreview-request)
Ok, I was hoping that maybe just checking for the trailing delimiter at the end of the string will be enough, but I went for the solution you proposed.

We can work more on that algo when we'll be pushing it to ECMA402, and unblock the HTML5 forms team now.

Comment 39

3 years ago
mozreview-review
Comment on attachment 8790481 [details]
Bug 1287677 - Add mozIntl.getDisplayNames API.

https://reviewboard.mozilla.org/r/78266/#review96508

There are a few tweaks I'll probably make to this after it lands, but this should land and stop the ping-pong.  :-)
Attachment #8790481 - Flags: review?(jwalden+bmo) → review+

Comment 40

3 years ago
Pushed by zbraniecki@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/12718a74c79f
Add mozIntl.getDisplayNames API. r=Waldo

Comment 41

3 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/12718a74c79f
Status: ASSIGNED → RESOLVED
Last Resolved: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla53
Hi Zibi, thanks for working on these APIs. I talked to Scott, and he told me the way to use mozIntl is:
> const mozIntl = Components.classes["@mozilla.org/mozintl;1"].getService(Components.interfaces.mozIMozIntl);

However, Components.classes is for chrome code only, and the input box part for date/time inputs are in content side, implemented using XBL. Do you know if there is any other way I can access these utility functions? (e.g., we usually expose it in webidl and restrict it using [Func="IsChromeOrXBL"]).

Thanks.
Flags: needinfo?(gandalf)
I never encountered such a case.

:mossop - do you think it would be ok to expose mozIntl to XBL using sth like what :jessica asked for in comment 42?
Flags: needinfo?(gandalf) → needinfo?(dtownsend)
(In reply to Zibi Braniecki [:gandalf][:zibi] from comment #43)
> I never encountered such a case.
> 
> :mossop - do you think it would be ok to expose mozIntl to XBL using sth
> like what :jessica asked for in comment 42?

I can't think of any reason why not. Adding a webidl will need a DOM peer to sign off though so best to check in with one of those: https://wiki.mozilla.org/Modules/Core#Document_Object_Model
Flags: needinfo?(dtownsend)
ni myself for comment 44 to track for it's blocking RTL functions on time/date input types
Flags: needinfo?(whuang)
(In reply to Dave Townsend [:mossop] from comment #44)
> (In reply to Zibi Braniecki [:gandalf][:zibi] from comment #43)
> > I never encountered such a case.
> > 
> > :mossop - do you think it would be ok to expose mozIntl to XBL using sth
> > like what :jessica asked for in comment 42?
> 
> I can't think of any reason why not. Adding a webidl will need a DOM peer to
> sign off though so best to check in with one of those:
> https://wiki.mozilla.org/Modules/Core#Document_Object_Model
I suppose we'll need a separate bug for that, right?
And more importantly, who's going to take the task. :)
:wesley_huang - I assume that yes, we should open a new bug.

I don't know who would be the best person to write the patch. I have never written webidl, so I'm not sure how it would look like. If you can open a new bug, we can discuss it there and fine people to NI.
(In reply to Zibi Braniecki [:gandalf][:zibi] from comment #47)
> :wesley_huang - I assume that yes, we should open a new bug.
> 
> I don't know who would be the best person to write the patch. I have never
> written webidl, so I'm not sure how it would look like. If you can open a
> new bug, we can discuss it there and fine people to NI.

Thanks Zibi.
I had created Bug 1337234 so let's move on the discussion there.
Flags: needinfo?(whuang)
Assignee

Updated

2 years ago
Flags: needinfo?(srl)
You need to log in before you can comment on or make changes to this bug.