Closed Bug 769872 Opened 12 years ago Closed 11 years ago

Implement core of Intl, Intl.Collator, Intl.NumberFormat, Intl.DateTimeFormat

Categories

(Core :: JavaScript: Internationalization API, enhancement)

enhancement
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla22
blocking-basecamp -

People

(Reporter: mozillabugs, Assigned: mozillabugs)

References

(Depends on 1 open bug, Blocks 1 open bug, )

Details

(Whiteboard: [js:t] [bcp47])

Attachments

(20 files, 52 obsolete files)

6.80 KB, patch
mozillabugs
: review+
till
: checkin+
Details | Diff | Splinter Review
30.42 KB, patch
mozillabugs
: review+
Details | Diff | Splinter Review
2.70 KB, patch
Waldo
: review+
Details | Diff | Splinter Review
6.19 KB, patch
till
: review+
Details | Diff | Splinter Review
11.52 KB, patch
mozillabugs
: review+
Details | Diff | Splinter Review
9.90 KB, patch
Waldo
: review+
Details | Diff | Splinter Review
9.06 KB, patch
mozillabugs
: review+
Details | Diff | Splinter Review
12.23 KB, patch
mozillabugs
: review+
Details | Diff | Splinter Review
6.69 KB, patch
mozillabugs
: review+
Details | Diff | Splinter Review
6.64 KB, patch
mozillabugs
: review+
Details | Diff | Splinter Review
23.84 KB, patch
Waldo
: review+
Details | Diff | Splinter Review
9.12 KB, patch
mozillabugs
: review+
Details | Diff | Splinter Review
5.56 KB, patch
mozillabugs
: review+
Details | Diff | Splinter Review
8.18 KB, patch
mozillabugs
: review+
Details | Diff | Splinter Review
13.71 KB, patch
mozillabugs
: review+
Details | Diff | Splinter Review
4.56 KB, patch
mozillabugs
: review+
Details | Diff | Splinter Review
4.97 KB, patch
mozillabugs
: review+
Details | Diff | Splinter Review
5.74 KB, patch
mozillabugs
: review+
Details | Diff | Splinter Review
5.65 KB, patch
Waldo
: review+
Details | Diff | Splinter Review
5.82 KB, patch
Waldo
: review+
Details | Diff | Splinter Review
This is the bulk of implementing the ECMAScript Internationalization API: Integrate the existing JavaScript prototype implementation as self-hosted JavaScript code into the JavaScript engine, and use ICU to implement the core pieces of functionality: the actual locale-sensitive string comparison, number formatting, and date and time formatting.
One detail: The spec requires that the supportedLocalesOf methods return an array that's extensible but has its length and indexed properties non-writable/non-configurable. Bug 591059 and bug 598996 report that the length property currently cannot be defined that way.
Depends on: 591059, 598996
Whiteboard: [js:t]
This is something we’d need for B2G/Gaia, especially for date/time formatting.

In the short term I think we’d be fine if we could have something like this:
  var result = (new Date​(t)).toLocaleDateString​(["fr"]);

… as suggested on your weblog: ;-)
http://norbertlindenberg.com/2012/06/ecmascript-internationalization-api/index.html#DateTimeFormat

Could this be a first step? Or would you have other suggestions regarding an implementation of a localized date/time format for B2G?
blocking-basecamp: --- → ?
Where's the spec for this? (I'm not seeing it on the harmony proposals page.)
You found the right spec, and the changes to Date.prototype.toLocaleDateString and similar existing methods are specified in chapter 13; 13.3.2 for this particular method.

The specification respecifies these existing methods as thin wrappers around Intl.DateTimeFormat and the other new objects, and I'm planning to implement them this way, of course with optimizations such as not constructing a new DateTimeFormat for every call. The locale and parameter negotiation would depend on the same facilities as DateTimeFormat and friends.

As an interim step, I'm planning to reimplement these functions, without any negotiation, just using default locale and options, based on ICU - see bug 769871.

The complete implementation of the ECMAScript Internationalization API then will be the combination of the work in bug 769871 and in this bug, as ICU will be used to provide input into the negotiation (the "what's available" part) and to implement the CompareStrings, FormatNumber, and FormatDateTime functions.
The specs are not finalized and Basecamp timeline does not allow implementation of these. Blocking-.
blocking-basecamp: ? → -
Depends on: 784288
Depends on: 784400
Depends on: 784300, 784293
Depends on: 785805
Depends on: 785294, 785645
This is a version of my Intl reference implementation adapted to run within Till Schneidereit's self-hosted JavaScript environment. There are lots of workarounds for the current limitations of that environment.

There's no integration with ICU yet; instead Collator.prototype.compare, NumberFormat.prototype.format, and DateTimeFormat.prototype.format use the existing locale sensitive methods on String, Number, and Date.

Tested with the js shell on Mac against the ECMAScript Internationalization API conformance test suite (hg.ecmascript.org/tests/test262, intl402 subset); it currently passes 110 of the 137 tests. Browser has very short MTTF - my code likely has bugs around garbage collection.

Based on mozilla-central -r 706174d31a02.
Comment on attachment 657989 [details] [diff] [review]
Add Intl with constructors Collator, NumberFormat, DateTimeFormat (part 1, WIP)

Review of attachment 657989 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/builtin/intl.js
@@ +904,5 @@
> +    searchLocaleData[locale].sensitivity = "accent";
> +
> +    __defineInternalProperty(result, "__sortLocaleData", sortLocaleData);
> +    __defineInternalProperty(result, "__searchLocaleData", searchLocaleData);
> +    __defineInternalProperty(result, "__availableLocales", [locale]);

I think you want to use what Accept-Languages uses ( https://mxr.mozilla.org/mozilla-central/source/dom/base/Navigator.cpp#263 ) for availableLocales instead of using DefaultLocale

::: js/src/vm/GlobalObject.cpp
@@ +279,5 @@
> +#ifdef HAVE_SETLOCALE
> +    locale = setlocale(LC_ALL, NULL);
> +#else
> +    locale = getenv("LANG");
> +#endif

also for DefaultLocale I believe it would be better to use Navigator::GetLanguage or navigator.language instead of LANG env
(In reply to Zbigniew Braniecki [:gandalf] from comment #9)

> > +    __defineInternalProperty(result, "__availableLocales", [locale]);
> 
> I think you want to use what Accept-Languages uses (
> https://mxr.mozilla.org/mozilla-central/source/dom/base/Navigator.cpp#263 )
> for availableLocales instead of using DefaultLocale

No, availableLocales needs to reflect what the implementation actually supports, not what the user desires. Once I hook up the Intl module to ICU, availableLocales will be based on the locales for which ICU has data. We still have to determine whether we take the full set or trim it to a subset to reduce the size of the library.

> > +#ifdef HAVE_SETLOCALE
> > +    locale = setlocale(LC_ALL, NULL);
> > +#else
> > +    locale = getenv("LANG");
> > +#endif
> 
> also for DefaultLocale I believe it would be better to use
> Navigator::GetLanguage or navigator.language instead of LANG env

The default locale primarily has to be one that the implementation supports, and one that SpiderMonkey has access to. The above is only a starting point; there also needs to be an API that lets containers override the default locale.
Blocks: 525494
Whiteboard: [js:t] → [js:t] [bcp47]
(In reply to Norbert Lindenberg from comment #10)
> (In reply to Zbigniew Braniecki [:gandalf] from comment #9)
> 
> > > +    __defineInternalProperty(result, "__availableLocales", [locale]);
> > 
> > I think you want to use what Accept-Languages uses (
> > https://mxr.mozilla.org/mozilla-central/source/dom/base/Navigator.cpp#263 )
> > for availableLocales instead of using DefaultLocale
> 
> No, availableLocales needs to reflect what the implementation actually
> supports, not what the user desires. Once I hook up the Intl module to ICU,
> availableLocales will be based on the locales for which ICU has data. We
> still have to determine whether we take the full set or trim it to a subset
> to reduce the size of the library.

But you do want sort what implementation actually supports by user preference, am I right here?
I'm asking cause I'm working on error recovery scenarios for l20n lib and it seems that with this implementation if the selected locale fails, we have no way to find out what's user desired fallback option.
(In reply to Zbigniew Braniecki [:gandalf] from comment #11)
> 
> But you do want sort what implementation actually supports by user
> preference, am I right here?
> I'm asking cause I'm working on error recovery scenarios for l20n lib and it
> seems that with this implementation if the selected locale fails, we have no
> way to find out what's user desired fallback option.

There can be a number of preferences involved: The user's preferences as set in the browser or operating system; the user's preferences as set in a web application; the preferences of the application itself (many applications are monolingual).

The internationalization API expects the application to specify the preferences to the API. The application may use the user's preferences set in the browser or other information, or simply specify the one language it supports. The default language only comes in if the application doesn't say anything.
I broke out the C++ part because I think it's ready to be reviewed and then committed, while the JavaScript part still has to wait for enhancements in the self-hosting infrastructure.

Try run: https://tbpl.mozilla.org/?tree=Try&rev=ae64ac57b593
Attachment #657989 - Attachment is obsolete: true
Attachment #666403 - Flags: review?(luke)
Depends on: 769871
Depends on: 655499
Comment on attachment 666403 [details] [diff] [review]
Add C++ core of Intl object and constructors Collator, NumberFormat, DateTimeFormat.

For class initialization, I'd defer to Waldo since he wrote/rewrote it all recently.
Attachment #666403 - Flags: review?(luke) → review?(jwalden+bmo)
Comment on attachment 666403 [details] [diff] [review]
Add C++ core of Intl object and constructors Collator, NumberFormat, DateTimeFormat.

Review of attachment 666403 [details] [diff] [review]:
-----------------------------------------------------------------

Okay, I think this covers first-pass comments.  A bunch of style nits for starters, some do-this-with-this-other-API things, then the big issue is not trying to share code amongst the different class initialization methods (and initializing with the same pattern all the other builtin classes use).

Sorry for the delay here, partly it's just that it's kind of draining to write out review comments that might be interpreted as excessive pedantry.  :-\

::: js/src/jsintl.cpp
@@ +12,5 @@
> +#include "jstypes.h"
> +#include "jsapi.h"
> +#include "jsatom.h"
> +#include "jscntxt.h"
> +#include "jsintl.h"

We alphabetize headers, so jstypes should go at the end.

@@ +16,5 @@
> +#include "jsintl.h"
> +
> +#include "jsobjinlines.h"
> +
> +#include "vm/GlobalObject.h"

Non-inlines headers get included before inlines headers, so put this above the jsobjinlines.h #include.  (We group "old-school" js*.h headers distinct from new-style subfolder/*.h headers.)

@@ +22,5 @@
> +using namespace js;
> +
> +/******************** Common to Intl constructors ********************/
> +
> +#define CONSTRUCTOR_CLASS_NAME(cx, name) (Atomize(cx, js_##name##_str, strlen(js_##name##_str), InternAtom))

Use cx->names().name.

@@ +26,5 @@
> +#define CONSTRUCTOR_CLASS_NAME(cx, name) (Atomize(cx, js_##name##_str, strlen(js_##name##_str), InternAtom))
> +
> +bool
> +js_IntlInitialize(JSContext *cx, HandleObject obj, const char *initializerName,
> +    HandleValue locales, HandleValue options)

The js_ prefix was used historically for extern'd methods, called across files.  This method's only called inside this function, so it should be a static, and it shouldn't have a js_ prefix.  It looks like there are a bunch of methods like this in this file.

Code style has overflowing arguments being indented so that they're aligned with the first argument, like so:

MethodName(Arg1 arg1,
           Arg2 arg2);

@@ +35,5 @@
> +
> +bool
> +js_IntlConstructor(JSContext *cx, unsigned argc, Value *vp,
> +    const char *constructorName, Class *constructorClass, HandleObject proto,
> +    const char *initializerName)

As a general rule, inside the engine you shouldn't pass around const char* parameters unless your API requires it.  (For example, throwing an exception whose message has {} parameters in it requires a const char*.)  Here, nothing does -- and in fact, the use of initializerName needs a Handle<PropertyName*>, and there's no reason you can't have callers pass that, so you should use that type.  (constructorName looks unused, so it should be removed.)

@@ +37,5 @@
> +js_IntlConstructor(JSContext *cx, unsigned argc, Value *vp,
> +    const char *constructorName, Class *constructorClass, HandleObject proto,
> +    const char *initializerName)
> +{
> +    CallArgs args = CallArgsFromVp(argc, vp);

...okay, something's screwy here.  In the mid-range we want to get rid of having argc/vp anywhere, and replacing all such places with CallArgs references, pointers, or whatever.  If you have argc/vp in new code, it should pretty much always be because you're implementing some existing API that fits that signature -- JSNative, for example.

@@ +43,5 @@
> +    RootedObject obj(cx);
> +
> +    bool construct = IsConstructing(args);
> +    if (!construct) {
> +        JSObject *global = JS_GetGlobalForObject(cx, &args.callee());

JS_GGFO is the external method for getting the global, and using it has extra overhead as a result.  Internally, you want to use cx->global() instead.  (Because of how the VM works, you're guaranteed that the global of the callee function is the current global of the context, which cx->global() exposes.)

@@ +47,5 @@
> +        JSObject *global = JS_GetGlobalForObject(cx, &args.callee());
> +        JSObject *intl = global->asGlobal().getOrCreateIntlObject(cx);
> +        if (!intl) {
> +            return false;
> +        }

Don't brace single-line ifs.  (This applies throughout the patch.)

@@ +48,5 @@
> +        JSObject *intl = global->asGlobal().getOrCreateIntlObject(cx);
> +        if (!intl) {
> +            return false;
> +        }
> +        Value self = JS_THIS(cx, vp);

JS_THIS is a public API macro, and it's on its way to deprecation.  The replacement is CallArgs::thisv().

@@ +51,5 @@
> +        }
> +        Value self = JS_THIS(cx, vp);
> +        if (!self.isUndefined() && &self.toObject() != intl) {
> +            obj = &self.toObject();
> +            if (!JS_IsExtensible(obj)) {

Again, this is a public API method (anything JS_* is), and internally there are usually faster methods to call.  Here, I think you want obj->isExtensible() directly.

@@ +59,5 @@
> +            construct = true;
> +        }
> +    }
> +    if (construct) {
> +        obj = NewObjectWithClassProto(cx, constructorClass, proto, NULL);

I dislike the API, and in the long run I'd like to change it, but I *think* you could just use NewObjectWithClass(cx, constructorClass) and the prototype will be properly hooked up for you without having to do a big dance of passing an explicit proto around.

@@ +77,5 @@
> +}
> +
> +JSObject *
> +js_InitIntlConstructorClass(JSContext *cx, HandleObject Intl, Handle<GlobalObject *> global,
> +    JSNative constructorFunc, const char *constructorName, HandleObject proto,

Pass Handle<PropertyName*> with cx->names().name instead of const char* for constructorName.  (Add any missing strings to vm/CommonPropertyNames.h.)

@@ +98,5 @@
> +
> +    if (!JS_DefineFunctions(cx, proto, methods)) {
> +        return NULL;
> +    }
> +    

Don't leave trailing whitespace.  (And elsewhere in this patch.)

@@ +106,5 @@
> +        return NULL;
> +    }
> +
> +    if (!JS_DefineProperty(cx, Intl, constructorName, ObjectValue(*ctor),
> +                           JS_PropertyStub, JS_StrictPropertyStub, 0)) {

JS_DefineProperty is public API, better to use JSObject::defineProperty internally.

@@ +158,5 @@
> +    
> +    JSObject *global = JS_GetGlobalForObject(cx, &args.callee());
> +    RootedObject proto(cx, global->asGlobal().getOrCreateCollatorPrototype(cx));
> +    if (!proto) {
> +        return JS_FALSE;

We use true/false instead of JS_TRUE/JS_FALSE these days.

@@ +161,5 @@
> +    if (!proto) {
> +        return JS_FALSE;
> +    }
> +    return js_IntlConstructor(cx, argc, vp, js_Collator_str,
> +        &CollatorClass, proto, js_InitializeCollator_str);

Erm.  It's really not a good idea to funnel code through choke-point helper methods like this.  It makes it really hard in the long run to read them, because you have to keep N different methods in mind at once, rather than the single method being implemented.  If there really *should* be funneling, it should be solved at the spec level (although even there I might discourage it, depending on the level of dissimilarity).  Please do the steps of the one common constructor method separately for each instance of this.

@@ +165,5 @@
> +        &CollatorClass, proto, js_InitializeCollator_str);
> +}
> +
> +JSObject *
> +js_InitCollatorClass(JSContext *cx, HandleObject Intl, Handle<GlobalObject *> global)

Last I remember the consensus was on Handle<T*> with the * next to the type name in this case.

@@ +178,5 @@
> +        js_InitializeCollator_str);
> +}
> +
> +bool
> +GlobalObject::initCollatorProto(JSContext *cx, Handle<GlobalObject *> global)

Okay, this is a bit confusing, having both InitCollatorClass and initCollatorProto.  If you regularize things the way the GlobalObject.h change I ask for requests, I think this oddity will go away.

@@ +377,5 @@
> +
> +    // The constructors above need to be able to determine whether they've been
> +    // called with this being "the standard built-in Intl object". The global
> +    // object reserves slots to track standard built-in objects, but doesn't
> +    // normally keep reference to non-constructors. This makes sure there is one.

Intl's initialization code shouldn't use a JSProto_Intl triplet of slots for a constructor/prototype/property, then -- it should have a one-off reserved slot and a one-off slot for the Intl property itself, then.  I believe StopIteration sets a fair example for how to do this.  (We need to apply this treatment to JSON and Math at some point, as well, but likely that work shouldn't happen here.)

@@ +411,5 @@
> +bool
> +GlobalObject::initIntlObject(JSContext *cx, Handle<GlobalObject *> global)
> +{
> +    RootedObject Intl(cx, NewObjectWithClassProto(cx, &IntlClass, NULL, global));
> +    if (!Intl || !JSObject::setSingletonType(cx, Intl)) {

This setSingletonType call really should be in a js::InitIntlClass method along the lines of, say, the js_InitStringClass method.  (This is, again, the same change I ask for in GlobalObject.h, really.)

::: js/src/jsintl.h
@@ +1,1 @@
> +/* -*- Mode: C; tab-width: 8; indent-tabs-mode: nil; c-basic-offset: 4 -*-

Mode: C++

We're -- haltingly -- attempting to better-organize our code than just dumping everything into js/src, more often putting things into subdirectories.  VM stuff goes into vm/; parsing/scanning/etc. goes in frontend/; built-in methods and so on go in builtin/; and so on.  This stuff, then, wants to go in builtin/, as Intl.cpp/h.

@@ +4,5 @@
> + * License, v. 2.0. If a copy of the MPL was not distributed with this
> + * file, You can obtain one at http://mozilla.org/MPL/2.0/. */
> +
> +#ifndef jsintl_h___
> +#define jsintl_h___

Headers should #include or forward-declare their dependencies.  JSObject and JSContext you can just forward-declare these structs.  HandleObject, however, you can't forward-declare, because it's actually a typedef, so you want this inside the include guards:

  #include "gc/Root.h"

  struct JSObject;
  struct JSContext;

@@ +12,5 @@
> + * ECMAScript Internationalization API Specification.
> + */
> +
> +extern JSObject *
> +js_InitIntlClass(JSContext *cx, js::HandleObject obj);

We're trying to get away from having anything named js_*.  The new style is to place the method in the "js" namespace (obviously no prefix needed then).

::: js/src/vm/GlobalObject.h
@@ +314,5 @@
>      }
>  
> +    JSObject *getOrCreateIntlObject(JSContext *cx) {
> +        return getOrCreateObject(cx, JSProto_Intl, initIntlObject);
> +    }

These should be using the pattern followed by all the other getOrCreate* methods -- checking the slot, then calling an InitIntlClass(cx, self) method, then returning the initialized prototype.  (The repetition here is something we'll be getting rid of when we make the bootstrapping code more unified and less crazy.  In the short run nobody will be touching this stuff, so don't worry about the repetition.)

@@ +332,5 @@
>      JSObject *getIteratorPrototype() {
>          return &getPrototype(JSProto_Iterator).toObject();
>      }
>  
> +JSObject *getIntrinsicsHolder() {

Errant change.
Attachment #666403 - Flags: review?(jwalden+bmo)
(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #17)
> Comment on attachment 666403 [details] [diff] [review]
> Add C++ core of Intl object and constructors Collator, NumberFormat,
> DateTimeFormat.
> 
> Review of attachment 666403 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Okay, I think this covers first-pass comments.  A bunch of style nits for
> starters, some do-this-with-this-other-API things, then the big issue is not
> trying to share code amongst the different class initialization methods (and
> initializing with the same pattern all the other builtin classes use).
> 
> Sorry for the delay here, partly it's just that it's kind of draining to
> write out review comments that might be interpreted as excessive pedantry. 
> :-\

Thanks for the thorough comments! Pedantic is good in code reviews. In particular thanks for the pointers to better API to use - I sometimes had a hard time finding the right function, and it happens that the JSAPI is somewhat better documented than the internals.

I've made most of the requested changes, but have questions or comments on a few below.

> ::: js/src/jsintl.cpp
> @@ +35,5 @@
> > +
> > +bool
> > +js_IntlConstructor(JSContext *cx, unsigned argc, Value *vp,
> > +    const char *constructorName, Class *constructorClass, HandleObject proto,
> > +    const char *initializerName)
> 
> As a general rule, inside the engine you shouldn't pass around const char*
> parameters unless your API requires it.  (For example, throwing an exception
> whose message has {} parameters in it requires a const char*.)  Here,
> nothing does -- and in fact, the use of initializerName needs a
> Handle<PropertyName*>, and there's no reason you can't have callers pass
> that, so you should use that type.

initializerName isn't used yet in this patch, but the next patch uses JSRuntime::getSelfHostedFunction, which does take a const char *. Should that also change?

> @@ +47,5 @@
> > +        JSObject *global = JS_GetGlobalForObject(cx, &args.callee());
> > +        JSObject *intl = global->asGlobal().getOrCreateIntlObject(cx);
> > +        if (!intl) {
> > +            return false;
> > +        }
> 
> Don't brace single-line ifs.  (This applies throughout the patch.)

After 15 years of following style guides that require the braces, this looks totally wrong to me, but if you insist...

> @@ +59,5 @@
> > +            construct = true;
> > +        }
> > +    }
> > +    if (construct) {
> > +        obj = NewObjectWithClassProto(cx, constructorClass, proto, NULL);
> 
> I dislike the API, and in the long run I'd like to change it, but I *think*
> you could just use NewObjectWithClass(cx, constructorClass) and the
> prototype will be properly hooked up for you without having to do a big
> dance of passing an explicit proto around.

I can't find NewObjectWithClass - where is it? And with which proto would it hook up my classes? Note that the three constructors using this path are not properties of the global object, so the standard path of having a reserved slot triplet doesn't work for them (I tried in a previous patch (attachment 656994 [details] [diff] [review]), but the cycle garbage collector didn't like it at all).

> @@ +161,5 @@
> > +    if (!proto) {
> > +        return JS_FALSE;
> > +    }
> > +    return js_IntlConstructor(cx, argc, vp, js_Collator_str,
> > +        &CollatorClass, proto, js_InitializeCollator_str);
> 
> Erm.  It's really not a good idea to funnel code through choke-point helper
> methods like this.  It makes it really hard in the long run to read them,
> because you have to keep N different methods in mind at once, rather than
> the single method being implemented.  If there really *should* be funneling,
> it should be solved at the spec level (although even there I might
> discourage it, depending on the level of dissimilarity).  Please do the
> steps of the one common constructor method separately for each instance of
> this.

I'd think cutting these 50 lines and pasting them in three times would only make maintenance more difficult - now every change has to be made three times. Is that really better? The spec is quite a bit shorter, and the limitations of the language we use in the spec makes it more difficult to break out code sections, otherwise I'd have done it there too. You may have noticed big sections of shared "abstract operations"...

> @@ +165,5 @@
> > +        &CollatorClass, proto, js_InitializeCollator_str);
> > +}
> > +
> > +JSObject *
> > +js_InitCollatorClass(JSContext *cx, HandleObject Intl, Handle<GlobalObject *> global)
> 
> Last I remember the consensus was on Handle<T*> with the * next to the type
> name in this case.

Might be good to document this in the style guidelines - grepping through the sources showed no consensus.

> @@ +178,5 @@
> > +        js_InitializeCollator_str);
> > +}
> > +
> > +bool
> > +GlobalObject::initCollatorProto(JSContext *cx, Handle<GlobalObject *> global)
> 
> Okay, this is a bit confusing, having both InitCollatorClass and
> initCollatorProto.  If you regularize things the way the GlobalObject.h
> change I ask for requests, I think this oddity will go away.

I'm afraid I need both. InitCollatorClass is called by js_InitIntlClass, which needs to set up Collator & Co. as its properties, but doesn't care about their prototypes. Later on, prototypes are needed when constructing new objects, so they (not the constructors) need to be available via slots. I don't see how I can merge the two.

> @@ +377,5 @@
> > +
> > +    // The constructors above need to be able to determine whether they've been
> > +    // called with this being "the standard built-in Intl object". The global
> > +    // object reserves slots to track standard built-in objects, but doesn't
> > +    // normally keep reference to non-constructors. This makes sure there is one.
> 
> Intl's initialization code shouldn't use a JSProto_Intl triplet of slots for
> a constructor/prototype/property, then -- it should have a one-off reserved
> slot and a one-off slot for the Intl property itself, then.  I believe
> StopIteration sets a fair example for how to do this.  (We need to apply
> this treatment to JSON and Math at some point, as well, but likely that work
> shouldn't happen here.)

StopIteration is listed in jsprototypes.h, which I think will result in giving it a slot triplet. Did you mean a different class that I can use as a model? (And applying the treatment to Math would be awesome, because that's what I used as my model.)

> @@ +411,5 @@
> > +bool
> > +GlobalObject::initIntlObject(JSContext *cx, Handle<GlobalObject *> global)
> > +{
> > +    RootedObject Intl(cx, NewObjectWithClassProto(cx, &IntlClass, NULL, global));
> > +    if (!Intl || !JSObject::setSingletonType(cx, Intl)) {
> 
> This setSingletonType call really should be in a js::InitIntlClass method
> along the lines of, say, the js_InitStringClass method.  (This is, again,
> the same change I ask for in GlobalObject.h, really.)

js_InitStringClass & Co register their constructor and prototype via DefineConstructorAndPrototype, which assume the reserved triplet. I haven't found a good way to set them other than through planting a function in GlobalObject. Suggestions?

> ::: js/src/jsintl.h
> @@ +12,5 @@
> > + * ECMAScript Internationalization API Specification.
> > + */
> > +
> > +extern JSObject *
> > +js_InitIntlClass(JSContext *cx, js::HandleObject obj);
> 
> We're trying to get away from having anything named js_*.  The new style is
> to place the method in the "js" namespace (obviously no prefix needed then).

I tried that, but together with other changes you suggested it caused some trouble: With js::InitIntlClass in jsprototypes.h, the compiler complained that it didn't know anything about that function. When I then added an include for Intl.h, I got flooded with errors because Root.h doesn't declare its dependencies in the way you recommend. If I can remove js_InitIntlClass from jsprototypes.h, then that problem might go away; otherwise I'd prefer to let some other object be the pioneer.

> ::: js/src/vm/GlobalObject.h
> @@ +314,5 @@
> >      }
> >  
> > +    JSObject *getOrCreateIntlObject(JSContext *cx) {
> > +        return getOrCreateObject(cx, JSProto_Intl, initIntlObject);
> > +    }
> 
> These should be using the pattern followed by all the other getOrCreate*
> methods -- checking the slot, then calling an InitIntlClass(cx, self)
> method, then returning the initialized prototype.  (The repetition here is
> something we'll be getting rid of when we make the bootstrapping code more
> unified and less crazy.  In the short run nobody will be touching this
> stuff, so don't worry about the repetition.)

I was following the model of getOrCreateIteratorPrototype & Co. - are you saying, they got it wrong?
(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #17)
> Comment on attachment 666403 [details] [diff] [review]
> Add C++ core of Intl object and constructors Collator, NumberFormat,
> DateTimeFormat.
> 
> Review of attachment 666403 [details] [diff] [review]:
> -----------------------------------------------------------------
> ::: js/src/jsintl.cpp
> @@ +377,5 @@
> > +
> > +    // The constructors above need to be able to determine whether they've been
> > +    // called with this being "the standard built-in Intl object". The global
> > +    // object reserves slots to track standard built-in objects, but doesn't
> > +    // normally keep reference to non-constructors. This makes sure there is one.
> 
> Intl's initialization code shouldn't use a JSProto_Intl triplet of slots for
> a constructor/prototype/property, then -- it should have a one-off reserved
> slot and a one-off slot for the Intl property itself, then.  I believe
> StopIteration sets a fair example for how to do this.  (We need to apply
> this treatment to JSON and Math at some point, as well, but likely that work
> shouldn't happen here.)

I tried removing Intl from jsprototypes.h, and got things to work in the JS Shell without major issues. In Firefox, however, Intl can't be found anymore. I guess that's because it's supposed to be initialized lazily, but without the entry in jsprototypes.h js_GetClassObject doesn't know how to initialize it. js_GetClassObject can't be hacked up because it's based on JSProtoKey, and Intl doesn't have one anymore, so I guess I'd have to insert knowledge about Intl into a JS_ResolveOp implementation for GlobalObject?
(In reply to Norbert Lindenberg from comment #19)
Ugh, yes, we have to live with JSProto_Intl for now.  Definitely don't be going the resolve-hook route!

(In reply to Norbert Lindenberg from comment #18)
> initializerName isn't used yet in this patch, but the next patch uses
> JSRuntime::getSelfHostedFunction, which does take a const char *. Should
> that also change?

Yes, ideally as a patch underneath the changes here.  Conceptually separate changes are best reviewed separately, to minimize distinct concerns to be juggled when reviewing.  Plus they're reviewed quicker, too -- win-win for all!

> I can't find NewObjectWithClass - where is it? And with which proto would it
> hook up my classes? Note that[...]

Sigh.  Our object creation APIs internally have changed enough over the last few years that I'm probably thinking of something since changed/removed.  But "Note that" tells me that whatever I was thinking of wouldn't work anyway because of the triplet dependency.  Maybe this *is* the right thing.  Sigh**2.

> I'd think cutting these 50 lines and pasting them in three times would only
> make maintenance more difficult - now every change has to be made three
> times. Is that really better?

Once the spec's finalized (looks like maybe it is if I found the right webpage?), there won't be substantive changes here.  There might be broader engine-wide refactoring/changes that would touch this, but a few more or less such places doesn't meaningfully change the difficulty of making them.  But many people will end up reading this over time to learn what each constructor does.  Put simply, maintenance isn't the concern: it's readability.  And right now there's enough entanglement that it's really hard to think about how just one of the constructors works.  Nobody cares how all three work *at once*, after all.

> The spec is quite a bit shorter, and the
> limitations of the language we use in the spec makes it more difficult to
> break out code sections, otherwise I'd have done it there too. You may have
> noticed big sections of shared "abstract operations"...

Abstract operations are fine, the spec should have more of them.  But we shouldn't invent our own ones when the spec doesn't provide them.  Here, taking a look at how all these functions are supposed to work, it'll be much clearer, and easier to see correspondence to spec steps, if each constructor function is implemented separately, referring to the appropriate steps in {10,11,12}.1.{2,3}.

> > @@ +178,5 @@
> > > +        js_InitializeCollator_str);
> > > +}
> > > +
> > > +bool
> > > +GlobalObject::initCollatorProto(JSContext *cx, Handle<GlobalObject *> global)
> > 
> > Okay, this is a bit confusing, having both InitCollatorClass and
> > initCollatorProto.  If you regularize things the way the GlobalObject.h
> > change I ask for requests, I think this oddity will go away.
> 
> I'm afraid I need both. InitCollatorClass is called by js_InitIntlClass,
> which needs to set up Collator & Co. as its properties, but doesn't care
> about their prototypes. Later on, prototypes are needed when constructing
> new objects, so they (not the constructors) need to be available via slots.
> I don't see how I can merge the two.

Hmm.  So you're basically trying to take laziness to the nth degree, by having Intl be lazy separately from the nested classes, even.  Given that we have a mid-term-ish goal of eliminating laziness entirely from global-object bootstrapping, and given that it really doesn't take much time or space to initialize Intl fully, I don't think we need to go to this much effort.  Just have js_InitIntlClass that creates the Intl object, then have js_InitCollatorClass that creates and defines the Collator constructor upon it (and stashes away Intl.Collator.prototype), the same for NumberFormat, and the same for DateTimeFormat.  Have js_InitIntlClass call each of these in succession, then define the global.Intl property, then do the MarkStandardClassInitializedNoProto bit.  All the Intl stuff is really one cohesive unit.

> StopIteration is listed in jsprototypes.h, which I think will result in
> giving it a slot triplet. Did you mean a different class that I can use as a
> model? (And applying the treatment to Math would be awesome, because that's
> what I used as my model.)

Oops, I forgot that you of course need to have global.Intl.  Given that, I guess a triplet and MarkStandardClassInitializedNoProto is the only thing, for now.

> > @@ +411,5 @@
> > > +bool
> > > +GlobalObject::initIntlObject(JSContext *cx, Handle<GlobalObject *> global)
> > > +{
> > > +    RootedObject Intl(cx, NewObjectWithClassProto(cx, &IntlClass, NULL, global));
> > > +    if (!Intl || !JSObject::setSingletonType(cx, Intl)) {
> > 
> > This setSingletonType call really should be in a js::InitIntlClass method
> > along the lines of, say, the js_InitStringClass method.  (This is, again,
> > the same change I ask for in GlobalObject.h, really.)
> 
> js_InitStringClass & Co register their constructor and prototype via
> DefineConstructorAndPrototype, which assume the reserved triplet. I haven't
> found a good way to set them other than through planting a function in
> GlobalObject. Suggestions?

What I meant is that js_InitIntlClass should produce and return the Intl object, it should define the Intl property, and it should do the MarkBlahBlahBlah bit.  js_InitMathClass, not js_InitStringClass, is the best example for all this.  Sorry for the confusion!

> > We're trying to get away from having anything named js_*.  The new style is
> > to place the method in the "js" namespace (obviously no prefix needed then).
> 
> I tried that, but together with other changes you suggested it caused some
> trouble: With js::InitIntlClass in jsprototypes.h[...]

Hmm, right, the prototype-init functions have to be this way because of the standard class initialization list in jsapi.cpp, for standard_class_atoms.  (Which itself is an abomination that should be removed at some point...)  For most other functions, tho, I can't think of constraints on using js:: over js_.  Leave this one as it was, yes.

> > ::: js/src/vm/GlobalObject.h
> > @@ +314,5 @@
> > >      }
> > >  
> > > +    JSObject *getOrCreateIntlObject(JSContext *cx) {
> > > +        return getOrCreateObject(cx, JSProto_Intl, initIntlObject);
> > > +    }
> > 
> > These should be using the pattern followed by all the other getOrCreate*
> > methods -- checking the slot, then calling an InitIntlClass(cx, self)
> > method, then returning the initialized prototype.  (The repetition here is
> > something we'll be getting rid of when we make the bootstrapping code more
> > unified and less crazy.  In the short run nobody will be touching this
> > stuff, so don't worry about the repetition.)
> 
> I was following the model of getOrCreateIteratorPrototype & Co. - are you
> saying, they got it wrong?

The way they do it is different, definitely.  It shares a little code, but it's verbose and therefore confusing in other ways.  :-(  Please do this similar to how getOrCreateRegExpPrototype does it.  Since one init method will init all these, tho (after you make js_InitIntlClass also create Intl.NumberFormatter.prototype and all those), I'm fine with this minor code-sharing:

    JSObject *getOrCreateInternationalizationPrototype(JSContext *cx, unsigned slot) {
        JS_ASSERT(slot == COLLATOR_PROTO || slot == NUMBER_FORMAT_PROTO || slot == DATE_TIME_FORMAT_PROTO);
        if (intlClassesInitialized())
            return &getReservedSlot(slot).toObject();
        Rooted<GlobalObject*> self(cx, this);
        if (!js_InitIntlClasses(cx, self))
            return NULL;
        return &self->getReservedSlot(slot).toObject();
    }

    JSObject *getOrCreateCollatorPrototype(JSContext *cx) {
        return getOrCreateInternationalizationPrototype(cx, COLLATOR_PROTO);
    }
    ...

Sorry for the delay responding to earlier comments, things kept coming up.  :-(
Updated based on Jeff's comments. Not included here are the requested changes around the use of global object slots - see the following patch for that.

Try run: https://tbpl.mozilla.org/?tree=Try&rev=2bdfb8ffef2b
Attachment #666403 - Attachment is obsolete: true
Attachment #673758 - Flags: review?(jwalden+bmo)
This patch changes the access to the global object slots for Intl module objects as proposed by Jeff. It includes setters for the global object slots because the slot IDs used are not public and probably shouldn't be, and because for the Intl object two slots have to be updated in sync in order to pass an assertion.

Neither this nor the previous solution is particularly elegant, but it makes me uncomfortable that this patch introduces a different solution for the same problem. Jason actually tried this new solution in a draft for bug 725909, but then went the other way for that bug and for bug 725907. I'd suggest that Jeff and Jason discuss this and come up with a common solution.
Attachment #673759 - Flags: review?(jwalden+bmo)
Attachment #673759 - Flags: review?(jorendorff)
As requested by Jeff above. I went with HandleAtom rather than Handle<PropertyName*> because Atomize and AtomToId both use JSAtom*.

Try run: https://tbpl.mozilla.org/?tree=Try&rev=34d70c4729ca
Attachment #673760 - Flags: review?(tschneidereit)
Attachment #673760 - Flags: review?(jwalden+bmo)
Comment on attachment 673760 [details] [diff] [review]
Use JSAtom* instead of char* for getSelfHostedFunction argument.

Review of attachment 673760 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good to me, should have done that in the first place.
Attachment #673760 - Flags: review?(tschneidereit) → review+
Review ping.

Waldo, could you at least take a quick look at attachment 673760 [details] [diff] [review]? It's a fairly simple change that I'd like to have for other changes, too.
Comment on attachment 673758 [details] [diff] [review]
Add C++ core of Intl object and constructors Collator, NumberFormat, DateTimeFormat.

Review of attachment 673758 [details] [diff] [review]:
-----------------------------------------------------------------

There's just faintly too much remaining here for me to + it without seeing the diff between this patch and what would land, alas.  :-\  Any chance you could attach an interdiff of the changes for review the next time around?  Would probably save some time to just review that, rather than skim (however quickly) the entire thing.

::: js/src/builtin/Intl.cpp
@@ +56,5 @@
> +#endif
> +
> +static JSFunctionSpec collator_static_methods[] = {
> +    JS_FS_END
> +};

You don't need this -- you can just pass NULL for the relevant argument.  Same for any JSPropertySpec where you might be doing the same thing.

I guess, since we have three relatively-similar algorithms here, all the comments on this file apply three times, roughly.

@@ +82,5 @@
> +        // 10.1.2.1 step 3
> +        JSObject *intl = cx->global()->getOrCreateIntlObject(cx);
> +        if (!intl) {
> +            return false;
> +        }

No bracing.

@@ +83,5 @@
> +        JSObject *intl = cx->global()->getOrCreateIntlObject(cx);
> +        if (!intl) {
> +            return false;
> +        }
> +        Value self = args.thisv();

You can pass arbitrary |this| to any method:

  Intl.Collator.call(17);

So you need to check that |self.isObject()| before trying to call |self.toObject()| below.  Please add a test somewhere (tc39ish test suite?) that checks behavior on the above expression, and similar ones passing a string primitive, boolean primitive, and so on.

@@ +84,5 @@
> +        if (!intl) {
> +            return false;
> +        }
> +        Value self = args.thisv();
> +        if (!self.isUndefined() && &self.toObject() != intl) {

It might be faintly prettier to have |self.toObject() != *intl|, to avoid |&& &| reading strangely.  (There's an operator==(const JSObject&, const JSObject&) that checks pointer equality.)

What should happen for this case:

  Intl.Collator.call(otherGlobal.Intl);

|this| isn't the standard built-in Intl object for the global.  But those ECMAScript operations that distinguish based on [[Class]] will treat String objects from whatever global identically (for example, in the JSON stringifying algorithms).  Should this then treat any Intl object identically?

We can go with what you have for now, at least, but that underlying issue needs to be addressed in specs at some point.  Maybe when multiple globals are actually addressed somewhere; it's been pretty obvious thus far what's sensible, to accommodate them, but the rules have never been fully written down.

@@ +86,5 @@
> +        }
> +        Value self = args.thisv();
> +        if (!self.isUndefined() && &self.toObject() != intl) {
> +            // 10.1.2.1 step 4
> +            obj = &self.toObject();

|self.toObject()| isn't the same as ToObject.  The former extracts the JSObject* from a value.  The latter is the spec method that converts a value into a suitable object, or throws.  In SpiderMonkey the latter is spelled like so, if memory serves:

    Rooted<JSObject*> obj(cx, ToObject(cx, self));
    if (!obj)
        return false;
    ...

@@ +100,5 @@
> +        // 10.1.3.1 paragraph 2
> +        RootedObject proto(cx, cx->global()->getOrCreateCollatorPrototype(cx));
> +        if (!proto)
> +            return false;
> +        obj = NewObjectWithClassProto(cx, &CollatorClass, proto, NULL);

This should be NewObjectWithGivenProto(cx, &CollatorClass, proto, cx->global()).

@@ +110,5 @@
> +    RootedValue locales(cx, args.length() > 0 ? args[0] : UndefinedValue());
> +    RootedValue options(cx, args.length() > 1 ? args[1] : UndefinedValue());
> +    // 10.1.2.1 step 6; 10.1.3.1 step 3
> +    RootedAtom initializer(cx, Atomize(cx, js_InitializeCollator_str,
> +                                       strlen(js_InitializeCollator_str)));

Add this to vm/CommonPropertyNames.h and use cx->names().InitializeCollator, if you need this.  Generally it's a mistake to be using raw C strings inside SpiderMonkey; existing APIs that take C strings, we're slowly changing.

@@ +111,5 @@
> +    RootedValue options(cx, args.length() > 1 ? args[1] : UndefinedValue());
> +    // 10.1.2.1 step 6; 10.1.3.1 step 3
> +    RootedAtom initializer(cx, Atomize(cx, js_InitializeCollator_str,
> +                                       strlen(js_InitializeCollator_str)));
> +    if (!IntlInitialize(cx, obj, initializer, locales, options))

Hmm.  If this matches step 6, shouldn't it be named InitializeCollator?  Why does it need to take a string argument, too?  The spec method doesn't.

@@ +152,5 @@
> +    // 8.1
> +    RootedValue ctorValue(cx, ObjectValue(*ctor));
> +    if (!JSObject::defineProperty(cx, Intl, cx->names().Collator, ctorValue,
> +                           JS_PropertyStub, JS_StrictPropertyStub, 0))
> +        return NULL;

JS_PropertyStub should be aligned with |cx|, and this should be braced, because the condition-expression covers more than one line.  (A bit verbose, sure; I'm working on slimming down these method names a little, which should help with this somewhat.)

@@ +502,5 @@
> +
> +    if (!JS_DefineFunctions(cx, Intl, intl_static_methods))
> +        return NULL;
> +
> +    MarkStandardClassInitializedNoProto(global, &IntlClass);

I'm pretty sure a lot of our bootstrapping code isn't very error-resilient, so perhaps it doesn't matter, but could you put this at the end of the method, to preserve some semblance of resiliency here?

::: js/src/builtin/Intl.h
@@ +9,5 @@
> +
> +#include "gc/Root.h"
> +
> +struct JSObject;
> +struct JSContext;

Nitpicky, but let's alphabetize the forward-declarations.

::: js/src/jsobj.cpp
@@ +44,5 @@
>  #include "jswatchpoint.h"
>  #include "jswrapper.h"
>  #include "jsxml.h"
>  
> +#include "builtin/Intl.h"

Hmm.  This wouldn't seem to be necessary, due to the DECLARE_PROTOTYPE_CLASS_INIT macro use in this file.  I guess this is necessary because that macro declares methods that take (JSContext*, JSObject*), not (JSContext*, Handle<JSObject*>).  If you change that macro to declare with the latter signature, is this #include necessary?  Would be nice to remove this, if there's only semi-implicit dependencies on it.
Attachment #673758 - Flags: review?(jwalden+bmo) → review-
Comment on attachment 673759 [details] [diff] [review]
Change access to global object slots for Intl module objects.

Review of attachment 673759 [details] [diff] [review]:
-----------------------------------------------------------------

This still smells a bit funny, but really it's the whole lazy bootstrapping thing smelling funny, so this is reasonable for now.

::: js/src/builtin/Intl.cpp
@@ +131,3 @@
>      if (!proto)
>          return NULL;
> +    global->setCollatorPrototype(proto);

Hmm.  So each one of the sub-initializing methods stores to the relevant global prototype slot immediately.  That seems a little bit funky, for purposes of atomicity and all that.  What about if you made these methods like so:

static bool
InitFooClass(JSContext *cx, HandleObject Intl, Handle<GlobalObject*> global, MutableHandle<JSFunction*> ctor, MutableHandle<JSObject*> proto)

and then had them pass out the ctor/proto they created?  Then, at the place where MarkStandardClassInitializedNoProto is called, all the global slots could be set, nicely and atomically.
Attachment #673759 - Flags: review?(jwalden+bmo) → review+
Comment on attachment 673760 [details] [diff] [review]
Use JSAtom* instead of char* for getSelfHostedFunction argument.

Review of attachment 673760 [details] [diff] [review]:
-----------------------------------------------------------------

I see why you'd use atom rather than propertyname, for the Atomize bits here.  But really, we should be using PropertyName for this, and not JSAtom.  See below.

::: js/src/jsapi.cpp
@@ +5019,5 @@
> +        RootedAtom selfHostedAtom(cx);
> +        if (fs->selfHostedName) {
> +            selfHostedAtom = Atomize(cx, fs->selfHostedName, strlen(fs->selfHostedName));
> +            if (!selfHostedAtom)
> +                return JS_FALSE;

Hmm.  Could you use PropertyName rather than Atom, and address this particular case using asPropertyName()?  Given that it's syntactically impossible to call a self-hosted function whose name is not a PropertyName -- for that to be the case, it'd have to be named like a number in the [0, 2**32 - 1] range -- we really should be using PropertyName here.
Attachment #673760 - Flags: review?(jwalden+bmo) → review-
Changed to use Handle<PropertyName*> as requested by Jeff.

Try run: https://tbpl.mozilla.org/?tree=Try&rev=39f56c378fb2
Attachment #673760 - Attachment is obsolete: true
Attachment #678176 - Flags: review?(tschneidereit)
Attachment #678176 - Flags: review?(jwalden+bmo)
Comment on attachment 678176 [details] [diff] [review]
Use Handle<PropertyName*> instead of char* for getSelfHostedFunction argument

Review of attachment 678176 [details] [diff] [review]:
-----------------------------------------------------------------

Makes sense, yes.

::: js/src/jsapi.cpp
@@ +5011,5 @@
>           */
>          if (fs->selfHostedName && cx->runtime->isSelfHostedGlobal(cx->global()))
>              return JS_TRUE;
>  
> +        Rooted<PropertyName*> selfHostedPropertyName(cx);

Nit: you can also use `RootedPropertyName`

::: js/src/jscntxt.cpp
@@ +377,5 @@
>      MarkObjectRoot(trc, &selfHostedGlobal_, "self-hosting global");
>  }
>  
>  JSFunction *
> +JSRuntime::getSelfHostedFunction(JSContext *cx, Handle<PropertyName*> name)

... and HandlePropertyName here. I don't think it matters too much, though.
Attachment #678176 - Flags: review?(tschneidereit) → review+
(In reply to Till Schneidereit [:till] from comment #32)

> Nit: you can also use `RootedPropertyName`

> ... and HandlePropertyName here. I don't think it matters too much, though.

I didn't know these existed - I thought the ones in Root.h are all we get.

Jeff, OK to use them?
Comment on attachment 678176 [details] [diff] [review]
Use Handle<PropertyName*> instead of char* for getSelfHostedFunction argument

Review of attachment 678176 [details] [diff] [review]:
-----------------------------------------------------------------

We're getting rid of RootedFoo and HandleFoo in favor of Rooted<> and Handle<>...soonish.  I'm not sure I can bring myself to care much what you use in the interim; use the typedefs, then, if Till is suggesting them.

::: js/src/jscntxt.cpp
@@ +382,3 @@
>  {
>      RootedObject holder(cx, cx->global()->getIntrinsicsHolder());
> +    RootedId id(cx, AtomToId(name));

NameToId is, I believe, slightly more efficient.

We should propagate PropertyName further down the stack into cloneSelfHostedValue, but this is a start, at least.  Followup bug for this?
Attachment #678176 - Flags: review?(jwalden+bmo) → review+
(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #34)
> Comment on attachment 678176 [details] [diff] [review]
> Use Handle<PropertyName*> instead of char* for getSelfHostedFunction argument
> 
> Review of attachment 678176 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> We're getting rid of RootedFoo and HandleFoo in favor of Rooted<> and
> Handle<>...soonish.  I'm not sure I can bring myself to care much what you
> use in the interim; use the typedefs, then, if Till is suggesting them.

Oh, I didn't know that. In light of this, by all means, don't use them.

> 
> ::: js/src/jscntxt.cpp
> @@ +382,3 @@
> >  {
> >      RootedObject holder(cx, cx->global()->getIntrinsicsHolder());
> > +    RootedId id(cx, AtomToId(name));
> 
> NameToId is, I believe, slightly more efficient.
> 
> We should propagate PropertyName further down the stack into
> cloneSelfHostedValue, but this is a start, at least.  Followup bug for this?

I thought about that, too, but thought that keeping the symmetry with the other "ById" functions is worthwhile. But again, I probably don't know enough about the situation regarding jsids.
Property splitting will mean property entry points will be either element-based, or propertyname-based, with jsid versions going away.  So it's not good to keep symmetry with the ById versions, because I'm working on removing them.  :-)
Updated to use NameToId instead of AtomToId. Carrying r+till,waldo.
Attachment #678176 - Attachment is obsolete: true
Attachment #678450 - Flags: review+
Attachment #678450 - Flags: checkin?(tschneidereit)
This patch contains changes to address the review in comment #28. I couldn't find interdiff for the Mac, so I created this separate patch which will be folded into the patch in attachment 673758 [details] [diff] [review] for checkin. In reviewing, please consider whether the combination of attachment 673758 [details] [diff] [review] and this patch is ready for checkin.

(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #28)

Comments without a reply here have been addressed by code changes.

> ::: js/src/builtin/Intl.cpp
> @@ +56,5 @@
> > +#endif
> > +
> > +static JSFunctionSpec collator_static_methods[] = {
> > +    JS_FS_END
> > +};
> 
> You don't need this -- you can just pass NULL for the relevant argument. 
> Same for any JSPropertySpec where you might be doing the same thing.

It currently looks useless, but the next big patch adds the supportedLocalesOf method here.

> I guess, since we have three relatively-similar algorithms here, all the
> comments on this file apply three times, roughly.

Yep, especially since you made me make three copies :-;

> @@ +83,5 @@
> > +        JSObject *intl = cx->global()->getOrCreateIntlObject(cx);
> > +        if (!intl) {
> > +            return false;
> > +        }
> > +        Value self = args.thisv();
> 
> You can pass arbitrary |this| to any method:
> 
>   Intl.Collator.call(17);
> 
> So you need to check that |self.isObject()| before trying to call
> |self.toObject()| below.  Please add a test somewhere (tc39ish test suite?)
> that checks behavior on the above expression, and similar ones passing a
> string primitive, boolean primitive, and so on.

Good catch. I added tests to the test402 test suite:
http://hg.ecmascript.org/tests/test262/rev/53c4ade82d14

> @@ +84,5 @@
> > +        if (!intl) {
> > +            return false;
> > +        }
> > +        Value self = args.thisv();
> > +        if (!self.isUndefined() && &self.toObject() != intl) {
> 
> What should happen for this case:
> 
>   Intl.Collator.call(otherGlobal.Intl);
> 
> |this| isn't the standard built-in Intl object for the global.  But those
> ECMAScript operations that distinguish based on [[Class]] will treat String
> objects from whatever global identically (for example, in the JSON
> stringifying algorithms).  Should this then treat any Intl object
> identically?

The check for Intl was added to the spec because applications might call "Intl.Collator(…)" without new, in which case Collator will receive the Intl object as the this value. Initializing the Intl object as a collator is unlikely to be what the developer meant, so we treat it as if Collator had been called with an undefined this value. If someone goes to the trouble of getting a different global object and its Intl object and writing the code you have above, I guess they really mean it...

> @@ +111,5 @@
> > +    RootedValue options(cx, args.length() > 1 ? args[1] : UndefinedValue());
> > +    // 10.1.2.1 step 6; 10.1.3.1 step 3
> > +    RootedAtom initializer(cx, Atomize(cx, js_InitializeCollator_str,
> > +                                       strlen(js_InitializeCollator_str)));
> > +    if (!IntlInitialize(cx, obj, initializer, locales, options))
> 
> Hmm.  If this matches step 6, shouldn't it be named InitializeCollator?  Why
> does it need to take a string argument, too?  The spec method doesn't.

The real InitializeCollator will be implemented in JavaScript; IntlInitialize is a helper function to find and call the appropriate JavaScript function.
Attachment #678549 - Flags: review?(jwalden+bmo)
Comment on attachment 678549 [details] [diff] [review]
Add C++ core of Intl object and constructors Collator, NumberFormat, DateTimeFormat - review

Review of attachment 678549 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good.

As for interdiff, I kind of did mean the incremental patch, so this works great.  :-)  Bugzilla has the ability to create interdiffs between two patches in the same bug, but it's kind of buggy and doesn't always work, particularly if too much has changed to the affected files in the meantime (which wouldn't be too much of an issue here, but still).

::: js/src/vm/CommonPropertyNames.h
@@ +79,5 @@
>      macro(ignoreCase, ignoreCase, "ignoreCase") \
>      macro(index, index, "index") \
> +    macro(InitializeCollator, InitializeCollator, "intl_InitializeCollator") \
> +    macro(InitializeNumberFormat, InitializeNumberFormat, "intl_InitializeNumberFormat") \
> +    macro(InitializeDateTimeFormat, InitializeDateTimeFormat, "intl_InitializeDateTimeFormat") \

The DateTimeFormat line should be second, not third.
Attachment #678549 - Flags: review?(jwalden+bmo) → review+
Comment on attachment 678450 [details] [diff] [review]
Use Handle<PropertyName*> instead of char* for getSelfHostedFunction argument

https://hg.mozilla.org/integration/mozilla-inbound/rev/34cd9abab64f
Attachment #678450 - Flags: checkin?(tschneidereit) → checkin+
This is the combination of attachment 673758 [details] [diff] [review] and attachment 678549 [details] [diff] [review], plus the one change requested in comment 39. Carrying r+waldo.
Attachment #673758 - Attachment is obsolete: true
Attachment #678549 - Attachment is obsolete: true
Attachment #678978 - Flags: review+
Attachment #678978 - Flags: checkin?(jwalden+bmo)
Uses ENABLE_INTL_API to control whether js_InitIntlClass is called.

One potential issue: There's one other reference to that function, in jsprototypes.h. I found one macro in jsobj.cpp that enters the functions from jsprototypes.h into a table lazy_prototype_init, which is then used in js_GetClassObject. I'm not sure under what conditions this function is called, but none of the other entries in jsprototypes.h is conditional (not even on JS_HAS_XML_SUPPORT and JS_HAS_GENERATORS), so it's probably harmless.
Attachment #678982 - Flags: review?(jwalden+bmo)
Comment on attachment 678982 [details] [diff] [review]
Disable initialization of Intl object and its constructors while functionality is incomplete.

Review of attachment 678982 [details] [diff] [review]:
-----------------------------------------------------------------

Just double-checked this against attachment 678978 [details] [diff] [review], looks like I/we did remember all the relevant stuff-addition places.  I assume you want me pushing this patch too, right?  (I'll probably unify the two so there's not a weird intermediate bisection point.)
Attachment #678982 - Flags: review?(jwalden+bmo) → review+
Regarding js_GetClassObject, I don't think we need to worry about it -- only if someone did NewObject(..., &IntlClass, ...) or similar, which won't happen for a bit, certainly.
Erm, no, you said js_GetClassObject -- that's only for JS_GetClassObject(..., &IntlClass, ...), which won't happen unless someone's being deliberately obnoxious.
https://hg.mozilla.org/mozilla-central/rev/34cd9abab64f
Status: NEW → RESOLVED
Closed: 12 years ago
Flags: in-testsuite?
Resolution: --- → FIXED
Target Milestone: --- → mozilla19
Please put [leave open] on the whiteboard when you've got things that still need to land, otherwise our merge scripts will resolve the bug.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Whiteboard: [js:t] [bcp47] → [js:t] [bcp47] [leave open]
Comment on attachment 678982 [details] [diff] [review]
Disable initialization of Intl object and its constructors while functionality is incomplete.

(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #43)
> I assume you want me pushing this patch too, right?  (I'll probably
> unify the two so there's not a weird intermediate bisection point.)

Yes, please. I thought I'd keep the two patches separate because the fate of this one is to be backed out eventually, but if you think merging them works better for Mozilla processes, feel free to do so.
Attachment #678982 - Flags: checkin?(jwalden+bmo)
Status: REOPENED → ASSIGNED
Pushed both patches, as one, with a little bit of whitespace tweaking I noticed when doing a last once-over before pushing:

https://hg.mozilla.org/integration/mozilla-inbound/rev/769777308b19

I think the macro should make it pretty clear what needs to be backed out.  But more than that, we try not to have intermediate states in the tree that have brokenness in them -- best to never expose it even in an intermediate revision.
Attachment #678978 - Flags: checkin?(jwalden+bmo)
Attachment #678982 - Flags: checkin?(jwalden+bmo)
Depends on: 811853
Depends on: 811911
Depends on: harmony:symbols
Breaking out some utilities that should be useful for self-hosted code beyond the Internationalization API. This patch uses a few workarounds for bug 784400 and bug 784293 that can be removed when these bugs get fixed.
Attachment #684882 - Flags: review?(tschneidereit)
Attachment #684882 - Flags: review?(luke)
I'd like to kick off the code review process for this part even though I'm already aware of some necessary improvements:

- The use of %GetBuiltIn should be replaced with direct access to built-in objects once bug 784400 is fixed.

- References to built-in objects that applications might mess with should be cached at startup time once bug 784293 is fixed.

- Functions that reconstruct essentially static data can be replaced or enhanced with top-level vars once bug 784293 is fixed.

- The use of slightly obscure property names to simulate ECMAScript internal properties can be replaced with either symbols once bug 645416 is fixed, or with Weakmaps once bug 784293 is fixed.

- The functions labeled "stub" will be updated or replaced by a follow-on patch based on ICU.

This patch is based on mozilla-central rev 0d373cf880fd and requires attachment 684882 [details] [diff] [review]. Note also that the internationalization API is currently disabled in jsversion.h.
Attachment #673762 - Attachment is obsolete: true
Attachment #685044 - Flags: review?(luke)
Following up on comment 9 through comment 12, this patch adds a mechanism that lets Firefox and other SpiderMonkey embedders set a default locale, which the Internationalization API will use under conditions described in jsapi.h if the JavaScript application using the API doesn't specify a supported locale.
Attachment #685046 - Flags: review?(luke)
Attachment #685046 - Flags: review?(gandalf)
Comment on attachment 684882 [details] [diff] [review]
Add utilities for self-hosted JavaScript code

Review of attachment 684882 [details] [diff] [review]:
-----------------------------------------------------------------

It seems to me like many of these utilities are really workarounds for bug 784293, right?

Given that %GetBuiltin is unsafe because it uses the current global's builtins, we should really try to get at least the required parts of bug 784293 into the tree quickly, instead.

::: js/src/Makefile.in
@@ +894,5 @@
>  selfhosting:: selfhosted.out.h
>  
>  selfhosting_srcs := \
>    $(srcdir)/builtin/array.js \
> +  $(srcdir)/builtin/Utilities.js \

We should probably be consistent in the file naming. I don't know that a lower-case first letter is better than an upper-case one, so feel free to change in either direction.

::: js/src/builtin/Utilities.js
@@ +13,5 @@
> +// ??? need to cache Array operations earlier - depends on bugs 784400 and 784293
> +function List() {
> +    // ??? workaround for 784400
> +    var Object = %GetBuiltIn("Object");
> +    var Array = %GetBuiltIn("Array");

The %GetBuiltIn shouldn't be necessary anymore with the patch from bug 784293. As it uses the current global's builtins, your workaround can't be used in production: client code can change it and thus change the builtins' behavior.

@@ +17,5 @@
> +    var Array = %GetBuiltIn("Array");
> +
> +    if (IS_UNDEFINED(List.prototype)) {
> +        var proto = Object.create(null);
> +        proto.indexOf = Array.prototype.indexOf;

I hope to be able to change this, but at least for now, you need to refer to other self-hosted function using their original name. In this case: ArrayIndexOf.

As that requires knowing which functions are self-hosted and which aren't, I guess we should fix this. One way that's pretty much guaranteed to work is to assign the self-hosted functions to their final place during creation:

function ArrayIndexOf(...) {...}
Array.prototype.indexOf = ArrayIndexOf;

I'll try doing that.

@@ +25,5 @@
> +        proto.sort = Array.prototype.sort;
> +        List.prototype = proto;
> +    }
> +}
> +%_MakeConstructible(List);

Given that builtins work with bug 784293 fixed, do we need the List class?

@@ +38,5 @@
> +    var Object = %GetBuiltIn("Object");
> +
> +    return Object.create(null);
> +}
> +%_MakeConstructible(Record);

As above for the List class.

@@ +45,5 @@
> +/********** Abstract operations defined in ECMAScript Language Specification **********/
> +
> +
> +/* Spec: ECMAScript Language Specification, 5.1 edition, 8.12.6 and 11.8.7 */
> +function __HasProperty(o, p) {

I'm not too fond of the naming of these functions, but I see how they stand out as spec utilities this way. And they certainly are better than the ALL CAPS names from macros.py.

::: js/src/jscntxt.cpp
@@ +335,5 @@
> +            JS_ASSERT(false);
> +            return false;
> +        }
> +    }
> +    JS_ASSERT(false);

Is an ASSERT better here than throwing a JS exception?

@@ +344,5 @@
> + * ??? workaround for 784400
> + * Returns the property of the global object named by args[0].
> + */
> +static JSBool
> +intrinsic_GetBuiltIn(JSContext *cx, unsigned argc, Value *vp)

Shouldn't be required with bug 784293 fixed.
Attachment #684882 - Flags: review?(tschneidereit)
(In reply to Till Schneidereit [:till] from comment #55)
> Review of attachment 684882 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> It seems to me like many of these utilities are really workarounds for bug
> 784293, right?

Only %GetBuiltIn is a workaround; the rest is functionality that I needed in the implementation of the internationalization API and that I think may be useful for others too.

> Given that %GetBuiltin is unsafe because it uses the current global's
> builtins, we should really try to get at least the required parts of bug
> 784293 into the tree quickly, instead.

I agree.

> >  selfhosting_srcs := \
> >    $(srcdir)/builtin/array.js \
> > +  $(srcdir)/builtin/Utilities.js \
> 
> We should probably be consistent in the file naming. I don't know that a
> lower-case first letter is better than an upper-case one, so feel free to
> change in either direction.

Jeff made me use Intl.(cpp|h), and so I'm also adding Intl.js. It seems CamelCase is preferred in this directory.

> @@ +25,5 @@
> > +        proto.sort = Array.prototype.sort;
> > +        List.prototype = proto;
> > +    }
> > +}
> > +%_MakeConstructible(List);
> 
> Given that builtins work with bug 784293 fixed, do we need the List class?

I should have explained the purpose of List and Record: Both are used in the ECMAScript specifications; List is similar to Array, Record to Object, but both are internal to the specs and therefore not subject to modifications by applications.

Actually, the idea for the current implementation of the two came from you: bug 462300 comment 23 :-)

> > +function __HasProperty(o, p) {
> 
> I'm not too fond of the naming of these functions, but I see how they stand
> out as spec utilities this way. And they certainly are better than the ALL
> CAPS names from macros.py.

I was considering es5_HasProperty etc - would that be better?

> ::: js/src/jscntxt.cpp
> @@ +335,5 @@
> > +            JS_ASSERT(false);
> > +            return false;
> > +        }
> > +    }
> > +    JS_ASSERT(false);
> 
> Is an ASSERT better here than throwing a JS exception?

I'm using assertions to catch incorrect use of internal functions - at the API level, there would be exceptions of course. For incorrect use of internal functions, I'd like to be stopped immediately when it happens, and JS_ASSERT does that in debug builds.
Comment on attachment 684882 [details] [diff] [review]
Add utilities for self-hosted JavaScript code

I don't think I'm the right reviewer for these patches.
Attachment #684882 - Flags: review?(luke)
Attachment #685044 - Flags: review?(luke)
Attachment #685046 - Flags: review?(luke)
Depends on: 816394
Comment on attachment 685044 [details] [diff] [review]
Add self-hosted JavaScript core of Intl constructors Collator, NumberFormat, DateTimeFormat

Jeff, please see comment 52 for some caveats when reviewing this patch.
Attachment #685044 - Flags: review?(jwalden+bmo)
This version of the patch requires a fix for bug 784293 as provided by attachment 687524 [details] [diff] [review].
Attachment #684882 - Attachment is obsolete: true
Attachment #687536 - Flags: review?(tschneidereit)
Depends on: 811566
No longer depends on: 655499
Updated to take advantage of Till's partial fix for bug 784293. Requires attachment 687524 [details] [diff] [review] and attachment 687536 [details] [diff] [review].

I'm already aware of some necessary improvements:

- The remaining functions that reconstruct essentially static data can be replaced or enhanced with top-level vars once bug 784293 is fully fixed.

- The use of slightly obscure property names to simulate ECMAScript internal properties can be replaced with either symbols once bug 645416 is fixed, or with Weakmaps once bug 784293 is fixed.

- The functions labeled "stub" will be updated or replaced by a follow-on patch based on ICU.

Note that the internationalization API is currently disabled in jsversion.h.
Attachment #685044 - Attachment is obsolete: true
Attachment #685044 - Flags: review?(jwalden+bmo)
Attachment #688514 - Flags: review?(jwalden+bmo)
Comment on attachment 687536 [details] [diff] [review]
Add utilities for self-hosted JavaScript code

Review of attachment 687536 [details] [diff] [review]:
-----------------------------------------------------------------

Sorry for the long reviewing delay.

I wonder if we shouldn't make all or most of the __To* functions be macros in macros.py instead. That way, they'd just compile away. As for __ToObject, we shouldn't have that wrapper. Instead, the intrinsics naming scheme should be sane enough to make them usable in spec implementations as-is.

I still have no really strong opinion on the naming scheme. As a data point, SpiderMonkey uses first-letter capitalization for non-method functions, so I wonder if being consistent with that is more important than consistency with widely-used style guides in the JS world.

In general, I'm cool with checking this in with the feedback addressed. We can always change naming issues once we agree on guide lines for them.

::: js/src/Makefile.in
@@ +897,5 @@
>  selfhosting:: selfhosted.out.h
>  
>  selfhosting_srcs := \
>    $(srcdir)/builtin/array.js \
> +  $(srcdir)/builtin/Utilities.js \

Can you move this up so that the utilities are available in all self-hosted code?

::: js/src/builtin/Utilities.js
@@ +30,5 @@
> +        proto.sort = std_ArraySort;
> +        List.prototype = proto;
> +    }
> +}
> +%_MakeConstructible(List);

Now that the patch in bug 784293 supports Array cloning, this class shouldn't be needed anymore. If you want to use the name "List" for clarity, can you just alias Array to it?

@@ +83,5 @@
> +
> +function __assert(b, info) {
> +    if (!b) {
> +        %AssertionFailed(info);
> +    }

No need for the curly braces

::: js/src/jscntxt.cpp
@@ +330,5 @@
> +    if (argc > 0) {
> +        // try to make the informative string accessible in a debugger
> +        JSString *str = ToString(cx, args[0]);
> +        if (str) {
> +            JSAutoByteString info(cx, str);

Instead of just making the string available for inspection in a debugger, can you dump it in debug builds?

@@ +332,5 @@
> +        JSString *str = ToString(cx, args[0]);
> +        if (str) {
> +            JSAutoByteString info(cx, str);
> +            JS_ASSERT(false);
> +            return false;

These shouldn't be required
Attachment #687536 - Flags: review?(tschneidereit) → review+
Comment on attachment 673759 [details] [diff] [review]
Change access to global object slots for Intl module objects.

Review of attachment 673759 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/vm/GlobalObject.h
@@ +357,5 @@
> +    }
> +
> +    void setNumberFormatPrototype(JSObject *obj) {
> +        setSlot(NUMBER_FORMAT_PROTO, ObjectValue(*obj));
> +    }

These are not supposed to happen more than once per global, right?

If that's correct, then before calling setSlot:
    JS_ASSERT(getSlot(WHATEVER).isUndefined());
Attachment #673759 - Flags: review?(jorendorff) → review+
(In reply to Till Schneidereit [:till] from comment #61)
> Comment on attachment 687536 [details] [diff] [review]
> Add utilities for self-hosted JavaScript code
> 
> I wonder if we shouldn't make all or most of the __To* functions be macros
> in macros.py instead. That way, they'd just compile away. As for __ToObject,
> we shouldn't have that wrapper. Instead, the intrinsics naming scheme should
> be sane enough to make them usable in spec implementations as-is.

Turning them into macros is a good idea; probably should do this after or as part of bug 814795. I'll get rid of __ToObject.

> I still have no really strong opinion on the naming scheme. As a data point,
> SpiderMonkey uses first-letter capitalization for non-method functions, so I
> wonder if being consistent with that is more important than consistency with
> widely-used style guides in the JS world.

I sent a survey to js-internals; maybe that'll result in clarity.

> ::: js/src/Makefile.in
> @@ +897,5 @@
> >  selfhosting:: selfhosted.out.h
> >  
> >  selfhosting_srcs := \
> >    $(srcdir)/builtin/array.js \
> > +  $(srcdir)/builtin/Utilities.js \
> 
> Can you move this up so that the utilities are available in all self-hosted
> code?

Function and variable declarations are hoisted to the top of the combined script, but yes, this may make it clearer to humans.

> ::: js/src/builtin/Utilities.js
> @@ +30,5 @@
> > +        proto.sort = std_ArraySort;
> > +        List.prototype = proto;
> > +    }
> > +}
> > +%_MakeConstructible(List);
> 
> Now that the patch in bug 784293 supports Array cloning, this class
> shouldn't be needed anymore. If you want to use the name "List" for clarity,
> can you just alias Array to it?

No, it's still different from Array in that it's safe to call methods on its objects without going through callFunction(std_Array_whatever).

> @@ +332,5 @@
> > +        JSString *str = ToString(cx, args[0]);
> > +        if (str) {
> > +            JSAutoByteString info(cx, str);
> > +            JS_ASSERT(false);
> > +            return false;
> 
> These shouldn't be required

The "return false"? In a no-debug build, JS_ASSERT doesn't do anything, so I still have to find some way out for that case.
Comment on attachment 685046 [details] [diff] [review]
Add ways to request default locale to be used by Internationalization API

that looks good
Attachment #685046 - Flags: review?(gandalf) → review+
Comment on attachment 685048 [details] [diff] [review]
Implement Collator.compare, NumberFormat.format, DateTimeFormat.format using ICU; make old locale sensitive functions use them (WIP)

Review of attachment 685048 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/builtin/Intl.js
@@ +644,5 @@
> +            %intl_numberFormatAvailableLocales().indexOf(locale) === -1 ||
> +            %intl_dateTimeFormatAvailableLocales().indexOf(locale) === -1) {
> +        locale = "en-GB";
> +    }
> +    return locale;

Why do you use en-GB as a default fallback locale here? I believe Chrome uses en-US in that case.
(In reply to Norbert Lindenberg from comment #63)
> > @@ +332,5 @@
> > > +        JSString *str = ToString(cx, args[0]);
> > > +        if (str) {
> > > +            JSAutoByteString info(cx, str);
> > > +            JS_ASSERT(false);
> > > +            return false;
> > 
> > These shouldn't be required
> 
> The "return false"? In a no-debug build, JS_ASSERT doesn't do anything, so I
> still have to find some way out for that case.

What I meant is that this case is the same as the general, outer case, so removing these two lines won't change behavior at all.
(In reply to Jason Orendorff [:jorendorff] from comment #62)
> These are not supposed to happen more than once per global, right?
> 
> If that's correct, then before calling setSlot:
>     JS_ASSERT(getSlot(WHATEVER).isUndefined());

I think in that case, rather than setSlot, initSlot is preferable, because it doesn't invoke write barriers.  I think.  Confirm with IRC before listening to me.  :-)
Updated patch based on comment 61, comment 63, comment 66, as well as the fix for bug 819700. Also added jshint settings, and renamed array.js to Array.js based on comment 55.
Attachment #687536 - Attachment is obsolete: true
Attachment #691183 - Flags: review?(tschneidereit)
Attachment #691183 - Flags: checkin?(tschneidereit)
Comment on attachment 691183 [details] [diff] [review]
Add utilities for self-hosted JavaScript code

Review of attachment 691183 [details] [diff] [review]:
-----------------------------------------------------------------

lgtm, will push after bug 784293 is fixed

::: js/src/builtin/Utilities.js
@@ +14,5 @@
> +         IS_UNDEFINED: false, TO_UINT32: false,
> +         JSMSG_NOT_FUNCTION: false, JSMSG_MISSING_FUN_ARG: false,
> +         JSMSG_EMPTY_ARRAY_REDUCE: false,
> +*/
> +  

I'll remove these spaces before pushing
Attachment #691183 - Flags: review?(tschneidereit) → review+
Attachment #691183 - Flags: checkin?(tschneidereit) → checkin?(jwalden+bmo)
Attachment #685046 - Flags: review?(jwalden+bmo)
Depends on: 822080
(In reply to Zbigniew Braniecki [:gandalf] from comment #65)

> Why do you use en-GB as a default fallback locale here? I believe Chrome
> uses en-US in that case.

I think more English-speaking countries follow British conventions than American ones. Note that this only comes into play when the requested default locale is not supported by ICU.
Comment on attachment 688514 [details] [diff] [review]
Add self-hosted JavaScript core of Intl constructors Collator, NumberFormat, DateTimeFormat

Review of attachment 688514 [details] [diff] [review]:
-----------------------------------------------------------------

I give up.  This is not happening in any sort of reasonable time, if ever, if I have to review 1900 lines of JS all at once to finish it.  It looks like everything outside Intl.js is landable (with whatever tweaks I noted) separate from the addition of Intl.js.  Split that out into its own patch and put it up for review.  This is pretty much an r+ on the non-Intl.js parts, but it's probably worth me looking a second time just to be safe.

As for Intl.js.  I've heard people say before that review time is quadratic in the length of the patch.  And 70K of patch is even worse when it's 70K of JS (C++ at least the way we write it just isn't as dense).  Nothing in Intl.js gets used yet with this patch, so there should be no problems splitting out Intl.js, splitting it into some number of separate changes, and getting reviews on each of those, separately.  I can review, say, IsStructurallyValidLanguageTag(locale), *much* quicker than I can review it and every other spec method all at once.

I'm trying to find some sort of split-up algorithm I can suggest to make this nice, but nothing is likely to spring to mind while I don't have a full grasp of the entire spec and all its components and all its references to RFC labels and maybe algorithms (if there are any).  So let's just do this the stupid way.  Divide Intl.js into 100-line increments, and create patches for each one.  Just do them linearly -- first hundred lines, then next, and so on.  Put the patch number in the patch summary/title so I can review them in order.  If you could, please double-check that each addition doesn't have any forward references -- if you need them, stub out the function with a ThrowError or something.  The exact form of all these really doesn't matter.  Just keep them short and separately landable without regressing non-Intl-enabled behavior.

I apologize for sitting on all the non-Intl.js comments as long as I have.  I probably should have written this comment a week or so ago, at least.  Oh well.  If I'm too dumb to review this whole patch at once (an exaggeration sort of, but not much of one), there's not much to do about it.  :-\

::: js/src/builtin/Intl.cpp
@@ +29,5 @@
>  {
> +    RootedValue initializerValue(cx);
> +    if (!cx->global()->getIntrinsicValue(cx, initializer, &initializerValue))
> +        return false;
> +    JS_ASSERT(initializerValue.isObject() && initializerValue.toObject().isFunction());

Separate this into two assertions, please.  (That way you know which one failed if they're hit.)

@@ +34,5 @@
> +    JSFunction *initializerFunc = initializerValue.toObject().toFunction();
> +
> +    Value funargs[] = {ObjectValue(*obj), locales, options};
> +    Value r;
> +    return JS_CallFunction(cx, NULL, initializerFunc, 3, funargs, &r);

Please use internal APIs to do this.  json.cpp:PreprocessValue has an example you can crib.

@@ +555,5 @@
>  
>  bool
>  GlobalObject::initIntlObject(JSContext *cx, Handle<GlobalObject*> global)
>  {
> +    RootedObject Intl(cx, NewObjectWithGivenProto(cx, &IntlClass, NULL, global));

Erm, isn't Intl supposed to have Object.prototype as its [[Prototype]]?  Switching to given-proto is great, but you need to pass in global->getOrCreateObjectPrototype(cx) here.

::: js/src/builtin/Intl.js
@@ +4,5 @@
> +
> +/* Portions Copyright Norbert Lindenberg 2011-2012. */
> +
> +/*jshint bitwise: true, curly: true, eqeqeq: true, forin: true, immed: true, latedef: false,
> +  newcap: true, noarg: true, nonew: true, regexp: true, undef: true, strict: false, trailing: false */

We don't consider jshint to be a gold standard for JS style.  In fact I'm not sure we even have a JS style.  Nothing wrong with looking consistent, but I'm not sure I want to enshrine jshint's arbitrary good-style choices as our own.  I have no idea what most of these mean, but if the rest of the file is any guide, at the very least I'm *definitely* sure we don't want to have to brace everything single-statement.  SpiderMonkey's never done that as a rule, and I think we are unanimously against doing that.

@@ +16,5 @@
> + * avoid interference from setters on Object.prototype.
> + */
> +function intl_defineProperty(o, p, v) {
> +    std_ObjectDefineProperty(o, p, {value: v, writable: true, enumerable: true, configurable: true});
> +}

I don't know what other engines do, but I think probably (perhaps not now) we want a AddNormalProperty builtin that takes an object, property name, and value and defines it with these attributes.  Object.defineProperty is a good API for JS in general, but internally where we know stuff like the property will never overwrite or conflict with an existing one, we can do better.

@@ +49,5 @@
> +
> +/**
> + * Regular expression for characters other than ASCII upper case.
> + */
> +var intl_nonASCIIUpperCaseRE = new RegExp("[^A-Z]");

I'd prefer seeing this regex inlined, as a regexp literal, in the single case where it's used, for greater clarity.

@@ +68,5 @@
> +    // so go character by character (actually code unit by code unit, but
> +    // since we only care about ASCII characters here, that's OK).
> +    var result = "";
> +    var i;
> +    for (i = 0; i < s.length; i++) {

for (var i = ...

@@ +86,5 @@
> + * Spec: ECMAScript Internationalization API Specification, draft, 6.2.1.
> + */
> +var intl_unicodeLocaleExtensionSequence = "-u(-[a-z0-9]{2,8})+";
> +var intl_unicodeLocaleExtensionSequenceRE = new RegExp(intl_unicodeLocaleExtensionSequence);
> +var intl_unicodeLocaleExtensionSequenceGlobalRE = new RegExp(intl_unicodeLocaleExtensionSequence, "g");

This global regex worries me, just on first read -- perhaps for no reason, but I don't know.  The issue is that global regular expressions include .lastIndex statefulness, which worries me as a potential correctness hazard.  Dunno if there's actually an issue with this -- but this raises a flag, definitely.

@@ +111,5 @@
> +        extlang = "(" + alpha + "{3}(-" + alpha + "{3}){0,2})",
> +        language = "(" + alpha + "{2,3}(-" + extlang + ")?|" + alpha + "{4}|" + alpha + "{5,8})",
> +        langtag = language + "(-" + script + ")?(-" + region + ")?(-" + variant + ")*(-" + extension + ")*(-" + privateuse + ")?",
> +        languageTag = "^(" + langtag + "|" + privateuse + "|" + grandfathered + ")$",
> +        languageTagRE = new RegExp(languageTag, "i");

A comma-separated var list might have all the requisite sequence points in it...but this just isn't readable like breaking down the entire thing into separate sections would be.  For that matter, it'd be great to see the ABNF entries copied directly into here as comments.

::: js/src/jscntxt.cpp
@@ +393,5 @@
> +    CallArgs args = CallArgsFromVp(argc, vp);
> +
> +    const char *locale = cx->getDefaultLocale();
> +    if (!locale)
> +        return false;

Hmm, this is an impedance mismatch.  We should keep the default locale around as a JSString*, not as a const char*, because it's what we'll actually use (and incidentally it'll simplify lifetime management).

But, hmm, it's not so easy storing a GC thing in JSRuntime, and keeping it properly rooted, is it.  Sigh.

Let's just make cx->runtime->getDefaultLocale() return a JSString* for now, constructed anew each time from the cached const char*, and then at some point someone can go back and make the actual value get stored in a JSString*.

::: js/src/jscntxt.h
@@ +1354,5 @@
>  
>      /* Per-context run options. */
>      unsigned            runOptions;          /* see jsapi.h for JSOPTION_* */
>  
> +    char                *defaultLocale;

Please store this in JSRuntime, as that's going to be the right place for it in the long run.

@@ +1360,4 @@
>    public:
>      int32_t             reportGranularity;  /* see jsprobes.h */
>  
> +    const char *getDefaultLocale();

This should move there as well.

@@ +2266,5 @@
>  #pragma warning(pop)
>  #endif
>  
>  #endif /* jscntxt_h___ */
> +

No need for a blank line here, just a newline at the end of the file -- here, and in other files.  (Is this something your editor does?)
Attachment #688514 - Flags: review?(jwalden+bmo)
Comment on attachment 685046 [details] [diff] [review]
Add ways to request default locale to be used by Internationalization API

Review of attachment 685046 [details] [diff] [review]:
-----------------------------------------------------------------

Nothing wrong with the mechanics of this, really, just API considerations, and where things should live.  Should be easy enough to fix, I expect.

::: js/src/jsapi.cpp
@@ +6821,5 @@
>  JS_PUBLIC_API(void)
> +JS_SetDefaultLocale(JSContext *cx, const char *locale)
> +{
> +    AssertHeapIsIdle(cx);
> +    cx->setDefaultLocale(locale);

This should be propagating failure through the JSBool return value I noted.

@@ +6827,5 @@
> +
> +JS_PUBLIC_API(const char *)
> +JS_GetDefaultLocale(JSContext *cx)
> +{
> +    return cx->getDefaultLocale();

And this should be a JS_strdup per earlier (maybe later, given Splinter UI) comments.

::: js/src/jsapi.h
@@ +5600,5 @@
> + * (Intl.Collator, Intl.NumberFormat, Intl.DateTimeFormat).
> + * Will only be respected if (a) a structurally well-formed BCP 47
> + * language tag and (b) supported by the implementation of the API.
> + * Note that the API encourages clients to specify their own locales.
> + * Use NULL to revert to OS defaults.

Hmm.  API methods are cheap.  Could we split out the revert-to-defaults part of this into a separate JS_ClearDefaultLocale method?  And make JS_SetDefaultLocale require the locale be non-null?  And have it return JSBool to indicate success or failure?

@@ +5606,5 @@
> +extern JS_PUBLIC_API(void)
> +JS_SetDefaultLocale(JSContext *cx, const char *locale);
> +
> +extern JS_PUBLIC_API(const char *)
> +JS_GetDefaultLocale(JSContext *cx);

This needs a comment next to it clarifying ownership of the returned pointer.  For simplicity, I think we should make this return a copy with JS_strdup.  Longer-term we'll want to make this return a JSString*, in line with my comments on the other patch, but for now let's at least foist ownership onto the caller, and not have to deal with interactions between JS_SetDefaultLocale and the string returned here.

::: js/src/jscntxt.cpp
@@ +1478,5 @@
>      JS_ASSERT(!resolvingList);
>  }
>  
> +void
> +JSContext::setDefaultLocale(const char *locale)

Per comments on the other patch, this should be in JSRuntime.

Or, actually.  This seems to me, now, considering the XPConnect parts, to be something that makes more sense living in JSCompartment.  It's something that probably wants to be per-global object, not per-thread, so the compartment is the natural place for it to live.  And there's a fair chance that XPConnect is calling it in a global-sensitive way.  Could you move it there, please?  (Same for JSContext::defaultLocale itself -- I am now contradicting my previous comments on the previous patch.)

@@ +1482,5 @@
> +JSContext::setDefaultLocale(const char *locale)
> +{
> +    if (defaultLocale) {
> +        js_free(defaultLocale);
> +        defaultLocale = NULL;

js_free is null-safe, so you can get rid of the if-check here.

@@ +1487,5 @@
> +    }
> +    if (locale) {
> +        char *loc = JS_strdup(this, locale);
> +        if (loc)
> +            defaultLocale = loc;

If JS_strdup returns NULL, that's a failure condition, and it should show up in the method signature, so if !loc you should return false, and otherwise return true.

::: js/src/jscntxt.h
@@ +2267,5 @@
>  #endif
>  
>  #endif /* jscntxt_h___ */
>  
> +

Something's weird here.  There shouldn't be any blank lines, yet somehow there are two?  Files should have a newline after the last empty line, and that should be it.

::: js/xpconnect/src/XPCLocale.cpp
@@ -318,1 @@
>      // else we might leak the intl stuff hooked onto |cx|

I have no idea whether these are all the changes needed in XPConnect to hook this stuff up.  I'm going to pretend they are, and say good enough.
Attachment #685046 - Flags: review?(jwalden+bmo) → review-
Comment on attachment 691183 [details] [diff] [review]
Add utilities for self-hosted JavaScript code

https://hg.mozilla.org/integration/mozilla-inbound/rev/256ff2b72064
Attachment #691183 - Flags: checkin?(jwalden+bmo)
Attachment #688514 - Attachment is obsolete: true
Attachment #693751 - Flags: review?(jwalden+bmo)
(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #71)
> > +var intl_unicodeLocaleExtensionSequenceGlobalRE = new RegExp(intl_unicodeLocaleExtensionSequence, "g");
> 
> This global regex worries me, just on first read -- perhaps for no reason,
> but I don't know.  The issue is that global regular expressions include
> .lastIndex statefulness, which worries me as a potential correctness hazard.
> Dunno if there's actually an issue with this -- but this raises a flag,
> definitely.

It's only used in conjunction with String.prototype.replace, which initializes lastIndex to 0 for global REs before doing anything else.
Attachment #693753 - Flags: review?(jwalden+bmo)
(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #71)
> > +    RootedObject Intl(cx, NewObjectWithGivenProto(cx, &IntlClass, NULL, global));
> 
> Erm, isn't Intl supposed to have Object.prototype as its [[Prototype]]? 

Good catch! I also added a test for this to the conformance test suite.
Attachment #693790 - Flags: review?(jwalden+bmo)
This patch combines the default locale handling of attachment 688514 [details] [diff] [review] and attachment 685046 [details] [diff] [review].

(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #71)
> ::: js/src/jscntxt.cpp
> @@ +393,5 @@
> > +    const char *locale = cx->getDefaultLocale();
> > +    if (!locale)
> > +        return false;
> 
> Hmm, this is an impedance mismatch.  We should keep the default locale
> around as a JSString*, not as a const char*, because it's what we'll
> actually use (and incidentally it'll simplify lifetime management).

It's not visible in this patch yet, but most uses of the default locale will actually use char* because that's what ICU uses. I therefore prefer to keep this as char*.

(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #72)

> @@ +5606,5 @@
> > +extern JS_PUBLIC_API(const char *)
> > +JS_GetDefaultLocale(JSContext *cx);
> 
> This needs a comment next to it clarifying ownership of the returned
> pointer.  For simplicity, I think we should make this return a copy with
> JS_strdup.

Even better: I removed the function entirely because it's not needed.

> ::: js/src/jscntxt.cpp
> > +void
> > +JSContext::setDefaultLocale(const char *locale)
> 
> Per comments on the other patch, this should be in JSRuntime.

I agree in theory; in practice it didn't work out. I tried to move all default locale handling from JSContext to JSRuntime, but the code needs to allocate memory (JS_strdup), and memory management (especially OOM error reporting) still lives in JSContext. This move will have to wait until memory management has moved.

> Or, actually.  This seems to me, now, considering the XPConnect parts, to be
> something that makes more sense living in JSCompartment.  It's something
> that probably wants to be per-global object, not per-thread, so the
> compartment is the natural place for it to live.  And there's a fair chance
> that XPConnect is calling it in a global-sensitive way.  Could you move it
> there, please?

XPConnect sets locale information for each JSContext, not for each JSCompartment, so JSContext (or JSRuntime) seems the right place.
Attachment #685046 - Attachment is obsolete: true
Attachment #694209 - Flags: review?(jwalden+bmo)
Depends on: 823390
(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #71)

> > +var intl_nonASCIIUpperCaseRE = new RegExp("[^A-Z]");
> 
> I'd prefer seeing this regex inlined, as a regexp literal, in the single
> case where it's used, for greater clarity.

I tried, but that led to assertion failures while running two of my test cases that exercise IsWellFormedCurrencyCode. I filed bug 823390.
Attachment #694227 - Flags: review?(jwalden+bmo)
Now with regex literal, since bug 823390 has been fixed.
Attachment #694227 - Attachment is obsolete: true
Attachment #694227 - Flags: review?(jwalden+bmo)
Attachment #695803 - Flags: review?(jwalden+bmo)
Comment on attachment 694209 [details] [diff] [review]
Add support for default locale for Internationalization API

Review of attachment 694209 [details] [diff] [review]:
-----------------------------------------------------------------

This is good other than the "will only be respected" bit -- clear that up in code-fixes or documentation-fixes and it's ready.

::: js/src/jscntxt.cpp
@@ +1167,5 @@
> +JSContext::resetDefaultLocale()
> +{
> +    js_free(defaultLocale);
> +    defaultLocale = NULL;
> +}    

Remove the trailing whitespace.

::: js/src/jscntxt.h
@@ +1371,5 @@
> +    /*
> +     * Set the default locale for the ECMAScript Internationalization API
> +     * (Intl.Collator, Intl.NumberFormat, Intl.DateTimeFormat).
> +     * Will only be respected if (a) a structurally well-formed BCP 47
> +     * language tag and (b) supported by the implementation of the API.

This "will only be respected" language confuses me.  It looks like setDefaultLocale doesn't do a structurally-well-formed BCP 47 check.  What guarantees that'll have happened, as far as getDefaultLocale() is concerned?  (That function returns defaultLocale without alteration if it's non-null.)

::: js/xpconnect/src/XPCLocale.cpp
@@ +322,5 @@
>      if (XPCLocaleCallbacks* lc = XPCLocaleCallbacks::MaybeThis(cx)) {
>        // This is a JSContext for which xpc_LocalizeContext() was called.
>        JS_SetLocaleCallbacks(cx, nullptr);
>        delete lc;
> +      JS_ResetDefaultLocale(cx);

This doesn't appear necessary to add.  The lines before are needed to not leak heap-allocated callbacks, but that's not an issue for the default locale stuff.

@@ +371,5 @@
> +      rv = appLocale->
> +           GetCategory(NS_LITERAL_STRING(NSILOCALE_TIME), localeStr);
> +      NS_ASSERTION(NS_SUCCEEDED(rv), "failed to get app locale info");
> +      NS_LossyConvertUTF16toASCII locale(localeStr);
> +      JS_SetDefaultLocale(cx, locale.get());

Since this is fallible it really doesn't belong in a void-returning (never-fails-implying) method.  But, this is really the natural place for it to live, adjacent to the other locale-tweaking API calls.  So I guess this is good for now.
Attachment #694209 - Flags: review?(jwalden+bmo)
Comment on attachment 693751 [details] [diff] [review]
Add self-hosted JavaScript core of Intl constructors Collator, NumberFormat, DateTimeFormat (part 1)

Review of attachment 693751 [details] [diff] [review]:
-----------------------------------------------------------------

...this might be a bit over-short for a hunk to read, but I'll take it.  :-)
Attachment #693751 - Flags: review?(jwalden+bmo) → review+
Updated to excise bit rot. Carrying r+jwalden.
Attachment #693751 - Attachment is obsolete: true
Attachment #696778 - Flags: review+
Attachment #696778 - Flags: checkin?(jwalden+bmo)
(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #83)

> ::: js/src/jscntxt.h
> @@ +1371,5 @@
> > +    /*
> > +     * Set the default locale for the ECMAScript Internationalization API
> > +     * (Intl.Collator, Intl.NumberFormat, Intl.DateTimeFormat).
> > +     * Will only be respected if (a) a structurally well-formed BCP 47
> > +     * language tag and (b) supported by the implementation of the API.
> 
> This "will only be respected" language confuses me.  It looks like
> setDefaultLocale doesn't do a structurally-well-formed BCP 47 check.  What
> guarantees that'll have happened, as far as getDefaultLocale() is concerned?
> (That function returns defaultLocale without alteration if it's non-null.)

The code for checking language tags and determining available locales comes in later patches, so I've removed the comments for now.

> ::: js/xpconnect/src/XPCLocale.cpp
> @@ +322,5 @@
> > +      JS_ResetDefaultLocale(cx);
> 
> This doesn't appear necessary to add.  The lines before are needed to not
> leak heap-allocated callbacks, but that's not an issue for the default
> locale stuff.

Removed.
Attachment #694209 - Attachment is obsolete: true
Attachment #696788 - Flags: review?(jwalden+bmo)
Attachment #696788 - Flags: checkin?(jwalden+bmo)
Attachment #696778 - Flags: checkin?(jwalden+bmo)
Attachment #696788 - Flags: checkin?(jwalden+bmo)
Attachment #696788 - Flags: review?(jwalden+bmo) → review+
Comment on attachment 693753 [details] [diff] [review]
Add self-hosted JavaScript core of Intl constructors Collator, NumberFormat, DateTimeFormat (part 2)

Review of attachment 693753 [details] [diff] [review]:
-----------------------------------------------------------------

My kingdom for named capture groups in JS.  :-(

::: js/src/builtin/Intl.js
@@ +20,5 @@
> + * Convert s to upper case, but limited to characters a-z.
> + *
> + * Spec: ECMAScript Internationalization API Specification, draft, 6.1.
> + */
> +function toASCIIUpperCase(s) {

This function seems like it would be better and clearer as an intrinsic, given it's probably going to be doing a function call per character in most common cases.  (I'd assume people will pass all-ASCII-lowercase strings through here most of the time.)  Could you change this into one, please?

@@ +40,5 @@
> +
> +/**
> + * Regular expression pattern defining Unicode locale extension sequences
> + *
> + * Spec: ECMAScript Internationalization API Specification, draft, 6.2.1.

Could you add a note here that resolves the "subtag" term used in the spec?  I can't find a production explicitly for subtag in BCP47, but I'm semi-guessing this refers to the "extension" production with singleton="u".

@@ +55,5 @@
> + */
> +var languageTagRE = (function () {
> +    // RFC 5234 section B.1
> +    // ALPHA          =  %x41-5A / %x61-7A   ; A-Z / a-z
> +    var alpha = "[a-zA-Z]";

This is a little more readable if named ALPHA, I think.  Makes uses of it stand out more, and be more similar to the side-by-side productions/grammar, than a lowercased thing would be.

@@ +62,5 @@
> +    var digit = "[0-9]";
> +
> +    // RFC 5646 section 2.1
> +    // alphanum      = (ALPHA / DIGIT)     ; letters and numbers
> +    var alphanum = "(" + alpha + "|" + digit + ")";

"[0-9A-Za-z]" is nicer to read, I think, than an extra alternation/breakdown here.  The comment should make clear that this simplification is correct.

@@ +101,5 @@
> +    //               / %x41-57             ; A - W
> +    //               / %x59-5A             ; Y - Z
> +    //               / %x61-77             ; a - w
> +    //               / %x79-7A             ; y - z
> +    var singleton = "(" + digit + "|[A-WY-Za-wy-z])";

I think this is slightly more readable as "([0-9A-WY-Za-wy-z])" -- the copied productions make clear that it's not necessary to use DIGIT.

Although, come to think of it, you don't need grouping parentheses around that when it's written that way, so make it "[0-9A-WY-Za-wy-z]".

@@ +133,5 @@
> +    //               / privateuse          ; private use tag
> +    //               / grandfathered       ; grandfathered tags
> +    var languageTag = "^(" + langtag + "|" + privateuse + "|" + grandfathered + ")$";
> +
> +    return new RegExp(languageTag, "i");

Er, is "i" really right here?  6.2.2 doesn't say anything that makes me think it is.

@@ +137,5 @@
> +    return new RegExp(languageTag, "i");
> +}());
> +
> +
> +var duplicateSingletonRE = (function () {

Pedantry, but please put duplicateVariantRE before duplicateSingletonRE, to match the bullet/prose ordering in 6.2.2.

@@ +153,5 @@
> +    //               / %x41-57             ; A - W
> +    //               / %x59-5A             ; Y - Z
> +    //               / %x61-77             ; a - w
> +    //               / %x79-7A             ; y - z
> +    var singleton = "(" + digit + "|[A-WY-Za-wy-z])";

"([0-9A-WY-Za-wy-z])", as before (but with parens because they're needed below, thanks for having the comment point that out).

@@ +158,5 @@
> +
> +    // \1 refers back to the singleton (which is parenthesized in the RE).
> +    // If it shows up after "-" and without being followed by alphanum characters
> +    // that would turn it into a different subtag, it's a duplicate singleton.
> +    var duplicateSingleton = "-" + singleton + "-(.*-)?\\1(?!" + alphanum + ")";

Is this actually correct?  .* is going to eat up a lot, possibly eating up out of the *("-" extension) bit of a langtag and into the ["-" privateuse] part.  What would this do for a language tag like this:

aa            ; language
-a-foo        ; *("-" extension)
-x-a-foo-bar  ; ["-" privateuse]

That is, for the total input string "aa-a-foo-x-a-foo-bar".  Indeed, if I copy the relevant bits out of this file, then evaluate this:

  print(duplicateSingletonRE.test("aa-a-foo-x-a-foo-bar") ? "duplicate!" : "not a dup");

I get "duplicate!" printed.

@@ +159,5 @@
> +    // \1 refers back to the singleton (which is parenthesized in the RE).
> +    // If it shows up after "-" and without being followed by alphanum characters
> +    // that would turn it into a different subtag, it's a duplicate singleton.
> +    var duplicateSingleton = "-" + singleton + "-(.*-)?\\1(?!" + alphanum + ")";
> +

Less important than the correctness issue...

My spider sense is telling me that testing this regex will be non-linear for some input strings.  As a first pass at implementation this would seem good enough, except for the correctness issue above.  (If it is one -- it's entirely possible I'm not understanding the patch/spec, I'm sure you'll tell me if so.)

The regex approach might be adjustable to fix the correctness issue.  I'm not sure.  But my gut (always reliable, of course ;-) ) says it's going to be very hard to address the performance issue if it exists.  So I think you're going to need to check for duplicate singletons a different way.  I think probably you'll want to use the languageTagRE to extract out all the variant extension subtags, then use a Set to check whether each extension's singleton is a duplicate.

If indeed you use the languageTagRE to extract out the singleton components, and the variant components as noted below, that's going to raise a new issue.  In JS (foo) is a *capturing* pattern, which means the JS engine goes to extra work to track substring matches for them.  (But not for .test(), which is why there's no issue with the patch as it stands.)  They're exposed possibly through regex statics, also through indexing on the result object returned from a successful regex.exec().

(...which, um, is Worrisome, isn't it?  Because it'd mean calling some Intl thing might frob regex statics on someone depending on Intl calls not changing them.  But then again, Intl stuff is new, and regex statics usage is highly disfavored.  So if you end up going the route I suggest below, let's ignore this issue for now and file a bug to determine if there's a problem and fix it if so.  But this seems like an ignorable issue at the very least for the first pass.)

Anyway.  What I was going to say was that using () for everything you don't care about, when you're generating match result data (not merely using .test()), is a bad idea.  So you should use (?:) instead to not capture the components you don't need, please.

Speaking of which, a part of me finds string-interpolation of "(|)" very noisy, and thinks maybe a helper function like function or(...args) { return "(?:" args.join("|") + ")"; } might be useful.  Go ahead with something like that (using std_Array_join, of course) if you agree it's a reasonable idea.

@@ +160,5 @@
> +    // If it shows up after "-" and without being followed by alphanum characters
> +    // that would turn it into a different subtag, it's a duplicate singleton.
> +    var duplicateSingleton = "-" + singleton + "-(.*-)?\\1(?!" + alphanum + ")";
> +
> +    return new RegExp(duplicateSingleton, "i");

Why the "i" again?

@@ +183,5 @@
> +    // \3 refers back to the variant (which starts with the third opening parenthesis in the RE).
> +    // If it shows up after "-", without following a singleton or "x",
> +    // and without being followed by alphanum characters
> +    // that would turn it into a different subtag, it's a duplicate variant.
> +    var duplicateVariant = "(" + alphanum + "{2,8}-)+" + variant + "-(" + alphanum + "{2,8}-)*\\3(?!" + alphanum + ")";

I haven't looked at this as closely as I looked at the singleton bit, but I think this has similar sorts of issues to the ones found there.  So this approach needs to be changed to use a Set as well.
Attachment #693753 - Flags: review?(jwalden+bmo) → review-
Comment on attachment 693790 [details] [diff] [review]
Add self-hosted JavaScript core of Intl constructors Collator, NumberFormat, DateTimeFormat (part 3)

Review of attachment 693790 [details] [diff] [review]:
-----------------------------------------------------------------

This looks good, with the few little nitpicks noted.  It looks like it depends on symbols in Intl.js added in later patches, tho, so I assume it's not something really landable on its own.

::: js/src/builtin/Intl.cpp
@@ +157,5 @@
>      // 10.3.2 and 10.3.3
>      if (!JS_DefineFunctions(cx, proto, collator_methods))
>          return NULL;
>  
> +    // install the getter for the main method (it'll return a bound function)

This should be capitalized/ended with a period, since it's a complete sentence.  But before that, this reads just a bit abrupt to me -- I know what it's saying because I thought about it, but it wouldn't be immediately obvious if you didn't know this bit of the Intl interface.  How about something like this:

Install the getter for Collator.prototype.compare, which returns a bound comparison function for the specified Collator object (suitable for passing to methods like Array.prototype.sort).

@@ +163,5 @@
> +    if (!cx->global()->getIntrinsicValue(cx, cx->names().CollatorCompare, &getter))
> +        return NULL;
> +    RootedValue undefinedValue(cx, UndefinedValue());
> +    if (!JSObject::defineProperty(cx, proto, cx->names().compare, undefinedValue,
> +                                  (JSPropertyOp) &getter.toObject(), NULL, JSPROP_GETTER)) {

See comment later about casting.

@@ +457,5 @@
> +    if (!cx->global()->getIntrinsicValue(cx, cx->names().DateTimeFormatFormat, &getter))
> +        return NULL;
> +    RootedValue undefinedValue(cx, UndefinedValue());
> +    if (!JSObject::defineProperty(cx, proto, cx->names().format, undefinedValue,
> +                                  (JSPropertyOp) &getter.toObject(), NULL, JSPROP_GETTER)) {

Object-to-function-pointer casts trigger warnings with some compilers.  I believe the macro you want for this is JS_DATA_TO_FUNC_PTR(JSPropertyOp, ...).

@@ +547,5 @@
>          return NULL;
>  
> +    // Skip initialization of the Intl constructors during initialization of the
> +    // self-hosting global as we may get here before self-hosted code is compiled,
> +    // and no core code refers to the Intl classes.

Hmm; hackish.  But I guess this'll work well enough for now.

@@ +566,5 @@
>  bool
>  GlobalObject::initIntlObject(JSContext *cx, Handle<GlobalObject*> global)
>  {
> +    RootedObject Intl(cx, NewObjectWithGivenProto(cx, &IntlClass,
> +                      global->getOrCreateObjectPrototype(cx), global));

We'd either align the second line with |cx| or do this instead:

    RootedObject Intl(cx);
    Intl = ...

Since aligning with |cx| results in really-indented code, pushing the assignment to its own line seems preferable.
Attachment #693790 - Flags: review?(jwalden+bmo) → review+
Comment on attachment 694210 [details] [diff] [review]
Add self-hosted JavaScript core of Intl constructors Collator, NumberFormat, DateTimeFormat (part 4)

Review of attachment 694210 [details] [diff] [review]:
-----------------------------------------------------------------

Precedent here would, I think, have us write a Python script that would download http://www.iana.org/assignments/language-subtag-registry into our tree, and would as a separable step parse it and write out a generated file with the appropriate data.  (See, for example, vm/make_unicode.py and vm/Unicode*.)  This would make it trivial to update this initializer when a new registry file is released.  (Set aside for the moment that some of these particular bits of data might not ever change.  Presumably there'll be other dependencies on that file that will, in the longer run.  Plus a large generated list, generated by code I could review for correctness, I'd trust more than one generated by hand, all else equal.)

On the other hand, these are somewhat short lists, and manual verification seemed to confirm them to be correct/complete, so this'll do for now.

::: js/src/builtin/Intl.js
@@ +210,5 @@
> +
> +/**
> + * Mappings from complete tags to preferred values.
> + *
> + * Spec: IANA Language Subtag Registry.

Include http://www.iana.org/assignments/language-subtag-registry here too, please.

@@ +276,5 @@
> +
> +/**
> + * Mappings from non-extlang subtags to preferred values.
> + *
> + * Spec: IANA Language Subtag Registry.

Mention the URL here as well.  Or put a comment above both of these mappings that by implication applies to both, perhaps.

@@ +304,5 @@
> +    "myt": "mry",
> +    "sca": "hle",
> +    "tie": "ras",
> +    "tlw": "weo",
> +    "tkk": "twm",

tlw/tkk are not in alphabetical order.  (Yes, I'm comparing against the registry list as best as I can.  :-) )
Attachment #694210 - Flags: review?(jwalden+bmo) → review+
(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #93)
> Precedent here would, I think, have us write a Python script that would
> download http://www.iana.org/assignments/language-subtag-registry into our
> tree, and would as a separable step parse it and write out a generated file
> with the appropriate data.  (See, for example, vm/make_unicode.py and
> vm/Unicode*.)  This would make it trivial to update this initializer when a
> new registry file is released.  (Set aside for the moment that some of these
> particular bits of data might not ever change.  Presumably there'll be other
> dependencies on that file that will, in the longer run.  Plus a large
> generated list, generated by code I could review for correctness, I'd trust
> more than one generated by hand, all else equal.)

Psst... https://github.com/GPHemsley/BCP47
Comment on attachment 694211 [details] [diff] [review]
Add self-hosted JavaScript core of Intl constructors Collator, NumberFormat, DateTimeFormat (part 5)

Review of attachment 694211 [details] [diff] [review]:
-----------------------------------------------------------------

Fifty lines was iffy to take as non-generated bits, but I could semi-stomach it for stuff that wasn't going to change.  As far as I can tell, these bits actually can change in subsequent registry updates, and there's a lot more of them.  So I think for this bit, I'm going to have to put my foot down and ask for it to be generated with a Python script.

The script should live in vm/.  It should take various arguments along the lines of the Unicode-generating script: one set to download a new copy of the registry into vm/, and then other sets to extract and generate the relevant data.  The Python script, the downloaded registry, and the files generated from them should all be in source control, so it's clear and obvious at any point exactly what was the source for the data used in a build.  The Python script should probably have something to slurp the whole registry file and create some sort of tree-ish structure to represent everything, then have various methods that walk/search the tree and produce JS object literals and such for the various structured representations you're using in Intl.js, then some method to concatenate all the individual parts into a vm/GeneratedIntlData.js that Intl.js can depend on.

It's not that I really want to get in the way of the Intl stuff landing.  :-\  But the manually-list-it approach here just isn't maintainable, nor is it especially reviewable.
Attachment #694211 - Flags: review?(jwalden+bmo) → review-
(In reply to Norbert Lindenberg from comment #7)

> Tested with the js shell on Mac against the ECMAScript Internationalization
> API conformance test suite (hg.ecmascript.org/tests/test262, intl402
> subset); it currently passes 110 of the 137 tests.

Can we import the necessary tests if we don't have our own? Especially for mutation-based fuzzing, tests that use this new API are essential.
Comment on attachment 695803 [details] [diff] [review]
Add self-hosted JavaScript core of Intl constructors Collator, NumberFormat, DateTimeFormat (part 6)

Review of attachment 695803 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/builtin/Intl.js
@@ +557,5 @@
>  };
> +
> +
> +/**
> + * Canonicalizes the given well-formed BCP 47 language tag, including regularized case of subtags.

This is adequate as a brief summary, but it doesn't give any detail of what's done as an example, nor does it describe the extra canonicalization niceties done here (like capitalizing select bits and so on).  Something like demonstrating the result of canonicalizing the following tag would be quite helpful here, noting the transformations done on the broken-down parts:

aAa            ; language
-SCRI          ; ["-" script]
-uS            ; ["-" region]
-variant2      ; *("-" variant)
-Variant1
-oneMore
-b-ROLL        ; *("-" extension)
-a-team
-x-Fnord-ZOOP  ; ["-" privateuse]

@@ +565,5 @@
> + */
> +function CanonicalizeLanguageTag(locale) {
> +    assert(IsStructurallyValidLanguageTag(locale), "CanonicalizeLanguageTag");
> +
> +    // start with lower case for easier processing, and because most subtags will need to be lower case anyway

I'd prefix this with something like "Language tags are compared and processed case-insensitively, so technically it's not necessary to adjust case.  But" to be slightly more precise.

Also, comments should linebreak at 79 characters, and capitalize/add a period since it's a complete sentence.

@@ +576,5 @@
> +    var subtags = callFunction(std_String_split, locale, "-");
> +    var i = 0;
> +
> +    // handle standard part: all subtags before first singleton or "x"
> +    while (i < subtags.length) {

This seems better-suited to writing it like so:

var subtag;
for (var i = 0; i < subtags.length; subtags[i] = subtag, i++) {

@@ +578,5 @@
> +
> +    // handle standard part: all subtags before first singleton or "x"
> +    while (i < subtags.length) {
> +        var subtag = subtags[i];
> +        if (subtag.length === 1 && (i > 0 || subtag === "x")) {

How can i > 0 ever not be true here, if we have a structurally valid language tag?  Seems like we should just have |if (subtag === "x")|.

Although, looking further down, I think the intent here might have been to break off once the extensions/privateuse bits are hit, i.e. when a single-character component is hit.  So should this just be |if (subtag.length === 1)|?

@@ +580,5 @@
> +    while (i < subtags.length) {
> +        var subtag = subtags[i];
> +        if (subtag.length === 1 && (i > 0 || subtag === "x")) {
> +            break;
> +        } else if (i !== 0 && subtag.length === 2) {

Let's do this like so:

// We're done when we've hit the privateuse subtags.
if (subtag === "x")
    break;

if (i !== 0 && subtag.length === 2) {
...

Hitting the private-use stuff is logically separate, so it should be separated.  And else-if doesn't make any more sense after a break than it does after a return.

@@ +581,5 @@
> +        var subtag = subtags[i];
> +        if (subtag.length === 1 && (i > 0 || subtag === "x")) {
> +            break;
> +        } else if (i !== 0 && subtag.length === 2) {
> +            subtag = callFunction(std_String_toUpperCase, subtag);

The |if (i !== 0 && subtag.length === 2)| bit is to uppercase a two-letter region subtag.  Is that right?  Please add a comment like "// Uppercase a two-letter region tag.".

@@ +584,5 @@
> +        } else if (i !== 0 && subtag.length === 2) {
> +            subtag = callFunction(std_String_toUpperCase, subtag);
> +        } else if (subtag.length === 4) {
> +            subtag = callFunction(std_String_toUpperCase, subtag[0]) +
> +                     callFunction(std_String_toLowerCase, callFunction(std_String_substring, subtag, 1));

This is to convert the script subtag from abcd to Abcd, i.e. to capitalize the first letter of the script?  Add a comment to that effect too.

Since the script component goes before the region component in the ABNF, please put this if-component before the region-component if.

@@ +587,5 @@
> +            subtag = callFunction(std_String_toUpperCase, subtag[0]) +
> +                     callFunction(std_String_toLowerCase, callFunction(std_String_substring, subtag, 1));
> +        }
> +        if (callFunction(std_Object_hasOwnProperty, langSubtagMappings, subtag)) {
> +            subtag = langSubtagMappings[subtag];

Should this conversion really be applied to both the 3ALPHA component of a language, and to the 3ALPHA component (first, or the permanently-reserved second) of an extlang?  Or to the 3alphanum component of an extension?  Maybe I just don't know enough about this whole process, but I suspect this is something we're only supposed to do to *some* subtags -- it looks like just the 3ALPHA of a language to me -- and not all (excluding privateuse) of them.

I think we should probably (again) use a regex with capturing patterns to extract out the components we care about, then process each one individually, then recombine them at the end.  Which, er, probably makes some of these comments on the existing structure/code not super-meaningful, except as testcase fodder.  :-\

I had thought you also had a problem here regarding use of case, due to langSubtagMappings having uppercase stuff in it, but I guess that's adjusting to the uppercasing earlier here.  It might be worth reducing the intertwinedness of the transformations here with the data-table structures by making them all lowercase, then doing uppercasing *after* any substitutions/replacements are performed.

However this method is used spec-visibly, we probably should have tests for these fuzzy parts.

@@ +590,5 @@
> +        if (callFunction(std_Object_hasOwnProperty, langSubtagMappings, subtag)) {
> +            subtag = langSubtagMappings[subtag];
> +        } else if (callFunction(std_Object_hasOwnProperty, extlangMappings, subtag)) {
> +            subtag = extlangMappings[subtag][0];
> +            if (i === 1 && extlangMappings[subtag][1] === subtags[0]) {

The patch with extlangMappings in it had a comment by the structure saying [1] was "(if present)".  I assume that comment was wrong if we're assuming existence here?

I think I'd prefer if, instead of using array indexes, we used named properties for [0] and [1] to indicate their meanings/purpose.  The indexes alone are kind of inscrutable.  :-)

@@ +592,5 @@
> +        } else if (callFunction(std_Object_hasOwnProperty, extlangMappings, subtag)) {
> +            subtag = extlangMappings[subtag][0];
> +            if (i === 1 && extlangMappings[subtag][1] === subtags[0]) {
> +                callFunction(std_Array_shift, subtags);
> +                i--;

If |i === 1| here, then subtags should be an array of length 1, right?  It looks to me like you could just not do the shift at all, then let the assignment below go to element 0 in the array.

@@ +600,5 @@
> +        i++;
> +    }
> +    var normal = callFunction(std_Array_join, callFunction(std_Array_slice, subtags, 0, i), "-");
> +
> +    // handle extensions

Er, we were looping til we hit an "x" subtag above, right?  Or til we ran out of subtags.  So we should be at the ["-" privateuse] component, or at the end.  So does this loop ever execute?

@@ +615,5 @@
> +
> +    // handle private use
> +    var privateUse;
> +    if (i < subtags.length)
> +        privateUse = callFunction(std_Array_join, callFunction(std_Array_slice, subtags, i), "-");

Same does-this-ever-execute concern.

@@ +623,5 @@
> +    if (extensions.length > 0)
> +        canonical += "-" + extensions.join("-");
> +    if (privateUse !== undefined) {
> +        if (canonical.length > 0)
> +            canonical += "-" + privateUse;

Is it ever possible for canonical to be empty, given our input was asserted structurally-valid?  Seems like it shouldn't be, to me, because that would be as if the first subtag were one character, but that doesn't match the 2*3ALPHA in language.

I'd probably have this, instead of mixing undefined/string types:

var privateUse = "";
...
...
...
if (privateUse.length > 0)
    canonical += "-" + privateUse;

@@ +643,5 @@
> +    var c = ToString(currency);
> +    var normalized = toASCIIUpperCase(c);
> +    if (normalized.length !== 3)
> +        return false;
> +    return callFunction(std_String_match, normalized, /[^A-Z]/) === null;

String.prototype.match has to go to the effort of creating a result array/object and all that junk, just like RegExp.prototype.match does.  Either String.prototype.search or RegExp.prototype.test is better if you just want to know if it matches -- test probably being better because it doesn't require the user to think in terms of indexes.  So do this instead, I think:

    return !callFunction(std_RegExp_test, /[^A-Z]/, normalized);
Attachment #695803 - Flags: review?(jwalden+bmo) → review-
(In reply to Christian Holler (:decoder) from comment #96)
> (In reply to Norbert Lindenberg from comment #7)
> 
> > Tested with the js shell on Mac against the ECMAScript Internationalization
> > API conformance test suite (hg.ecmascript.org/tests/test262, intl402
> > subset); it currently passes 110 of the 137 tests.
> 
> Can we import the necessary tests if we don't have our own? Especially for
> mutation-based fuzzing, tests that use this new API are essential.

Yes, I'm planning to integrate the conformance test suite into one of the SpiderMonkey test suites. It's a bit early for that unfortunately - so far little of the API's implementation has landed, and the whole thing is still disabled.
Updated as requested in comment 92. Carrying r+jwalden.

(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #92)

> This looks good, with the few little nitpicks noted.  It looks like it
> depends on symbols in Intl.js added in later patches, tho, so I assume it's
> not something really landable on its own.

It does depend on later additions to Intl.js, but since all of Intl is currently disabled, that's a problem only for those who specifically enable it, which is probably just me.
Attachment #693790 - Attachment is obsolete: true
Attachment #706585 - Flags: review+
Attachment #706585 - Flags: checkin?(jwalden+bmo)
Depends on: 834989
(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #91)

> > +function toASCIIUpperCase(s) {
> 
> This function seems like it would be better and clearer as an intrinsic,
> given it's probably going to be doing a function call per character in most
> common cases.  (I'd assume people will pass all-ASCII-lowercase strings
> through here most of the time.)  Could you change this into one, please?

I checked with the profiler against the ECMA-402 conformance test suite, and this function is in the general noise, not worth optimizing.

> > +    // alphanum      = (ALPHA / DIGIT)     ; letters and numbers
> > +    var alphanum = "(" + alpha + "|" + digit + ")";
> 
> "[0-9A-Za-z]" is nicer to read, I think, than an extra alternation/breakdown
> here.  The comment should make clear that this simplification is correct.

Yes, but once you do that for duplicateVariantRE, the ALPHA variable becomes unnecessary, but you still need its documentation... Same for the proposed simplification of singleton, which makes DIGIT unnecessary.

> @@ +158,5 @@
> > +
> > +    // \1 refers back to the singleton (which is parenthesized in the RE).
> > +    // If it shows up after "-" and without being followed by alphanum characters
> > +    // that would turn it into a different subtag, it's a duplicate singleton.
> > +    var duplicateSingleton = "-" + singleton + "-(.*-)?\\1(?!" + alphanum + ")";
> 
> Is this actually correct?  .* is going to eat up a lot, possibly eating up
> out of the *("-" extension) bit of a langtag and into the ["-" privateuse]
> part.  What would this do for a language tag like this:
> 
> aa            ; language
> -a-foo        ; *("-" extension)
> -x-a-foo-bar  ; ["-" privateuse]
> 
> That is, for the total input string "aa-a-foo-x-a-foo-bar".  Indeed, if I
> copy the relevant bits out of this file, then evaluate this:
> 
>   print(duplicateSingletonRE.test("aa-a-foo-x-a-foo-bar") ? "duplicate!" :
> "not a dup");
> 
> I get "duplicate!" printed.

That's why IsStructurallyValidLanguageTag removes any private use part before using duplicateVariantRE and duplicateSingletonRE. I added your input string to the conformance test for valid inputs for 6.2.2, and it still passes.

> @@ +159,5 @@
> > +    // \1 refers back to the singleton (which is parenthesized in the RE).
> > +    // If it shows up after "-" and without being followed by alphanum characters
> > +    // that would turn it into a different subtag, it's a duplicate singleton.
> > +    var duplicateSingleton = "-" + singleton + "-(.*-)?\\1(?!" + alphanum + ")";
> > +
> 
> My spider sense is telling me that testing this regex will be non-linear for
> some input strings.  As a first pass at implementation this would seem good
> enough, except for the correctness issue above.  (If it is one -- it's
> entirely possible I'm not understanding the patch/spec, I'm sure you'll tell
> me if so.)
> 
> [...] my gut (always reliable, of course ;-) ) says it's going to
> be very hard to address the performance issue if it exists.  So I think
> you're going to need to check for duplicate singletons a different way.  I
> think probably you'll want to use the languageTagRE to extract out all the
> variant extension subtags, then use a Set to check whether each extension's
> singleton is a duplicate.

It's quite possible that a regular expression engine implements these two patterns using a backtracking algorithm, and so has to reexamine the remainder of the language tag for each singleton or variant in it. But then, in real life most language tags have only 1-3 subtags (language-script-country), and language tags with multiple singleton or variant subtags are very rare.

> If indeed you use the languageTagRE to extract out the singleton components,
> and the variant components as noted below, that's going to raise a new
> issue.  In JS (foo) is a *capturing* pattern, which means the JS engine goes
> to extra work to track substring matches for them.  (But not for .test(),
> which is why there's no issue with the patch as it stands.)  They're exposed
> possibly through regex statics, also through indexing on the result object
> returned from a successful regex.exec().
> 
> (...which, um, is Worrisome, isn't it?  Because it'd mean calling some Intl
> thing might frob regex statics on someone depending on Intl calls not
> changing them.  But then again, Intl stuff is new, and regex statics usage
> is highly disfavored.  So if you end up going the route I suggest below,
> let's ignore this issue for now and file a bug to determine if there's a
> problem and fix it if so.  But this seems like an ignorable issue at the
> very least for the first pass.)

I'd never heard of RegExp statics, and now that I know about them, I'm surprised that they're not listed as one of the most awful parts of JavaScript in the books of the masters...

I looked into this a bit more and found that the statics expose a lot of information about the use of RegExp in self-hosted code to client code. I could reduce the amount of exposed information by using (?:), but not eliminate it entirely. I've filed bug 834989.

> Speaking of which, a part of me finds string-interpolation of "(|)" very
> noisy, and thinks maybe a helper function like function or(...args) { return
> "(?:" args.join("|") + ")"; } might be useful.  Go ahead with something like
> that (using std_Array_join, of course) if you agree it's a reasonable idea.

Seeing that most of the string concatenations building the regular expressions here are more than just "(|)", the or() function wouldn't get us very far. Let's leave it as is.
Attachment #693753 - Attachment is obsolete: true
Attachment #706722 - Flags: review?(jwalden+bmo)
This patch includes the script requested in comment 95 to automatically generate mapping tables from the IANA Language Subtag Registry, as well as the generated data file. The downloaded IANA Language Subtag Registry file itself will follow as part 5 when I get legal clearance.

Unlike requested in comment 5, the script and data live in builtin/ rather than vm/ because neither seem very virtual machine-ish and I'd like to keep the Intl stuff in one place. Also, I didn't see the benefits of requiring arguments for the script, so there aren't any.

The generated data addresses comment 93 and parts of comment 97 - in particular, the extlang mappings now use objects with named properties rather than arrays. The script addresses additional parts of comment 97 by checking some preconditions for CanonicalizeLanguageTag.
Attachment #694210 - Attachment is obsolete: true
Attachment #694211 - Attachment is obsolete: true
Attachment #706895 - Flags: review?(jwalden+bmo)
(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #97)

> @@ +576,5 @@
> > +    var i = 0;
> > +
> > +    // handle standard part: all subtags before first singleton or "x"
> > +    while (i < subtags.length) {
> 
> This seems better-suited to writing it like so:
> 
> var subtag;
> for (var i = 0; i < subtags.length; subtags[i] = subtag, i++) {

No, to me this would indicate that i is only used within the loop, and that it's incremented by 1 for each iteration. Neither is true here.

> @@ +587,5 @@
> > +            subtag = callFunction(std_String_toUpperCase, subtag[0]) +
> > +                     callFunction(std_String_toLowerCase, callFunction(std_String_substring, subtag, 1));
> > +        }
> > +        if (callFunction(std_Object_hasOwnProperty, langSubtagMappings, subtag)) {
> > +            subtag = langSubtagMappings[subtag];
> 
> Should this conversion really be applied to both the 3ALPHA component of a
> language, and to the 3ALPHA component (first, or the permanently-reserved
> second) of an extlang?  Or to the 3alphanum component of an extension? 
> Maybe I just don't know enough about this whole process, but I suspect this
> is something we're only supposed to do to *some* subtags -- it looks like
> just the 3ALPHA of a language to me -- and not all (excluding privateuse) of
> them.

We break out of this loop when we reach an extension sequence or the private use part, so this is while processing the main part of the tag. I added checks to the script generating the mapping data to make sure that mappings for language and extlang subtags don't interfere with subtags of the other kind.

> However this method is used spec-visibly, we probably should have tests for
> these fuzzy parts.

The current test case for this function is here:
http://hg.ecmascript.org/tests/test262/file/927ea8563f7f/test/suite/intl402/ch06/6.2/6.2.3.js
Unfortunately, CanonicalizeLanguageTag is not public API and therefore can't be tested directly. The test case can pass in all kinds of strings, but if an implementation doesn't support the requested locale, it will simply return the default locale, and we have no idea what the function did with the input. During the original development of this function, I added some fake locales for testing purposes, but now I'm testing with just the 400+ ICU locales.

> @@ +590,5 @@
> > +            if (i === 1 && extlangMappings[subtag][1] === subtags[0]) {
> 
> The patch with extlangMappings in it had a comment by the structure saying
> [1] was "(if present)".  I assume that comment was wrong if we're assuming
> existence here?

That comment was indeed wrong. RFC 5646 section 3.1.8 guarantees that extlang records have exactly one prefix.

> @@ +592,5 @@
> > +        } else if (callFunction(std_Object_hasOwnProperty, extlangMappings, subtag)) {
> > +            subtag = extlangMappings[subtag][0];
> > +            if (i === 1 && extlangMappings[subtag][1] === subtags[0]) {
> > +                callFunction(std_Array_shift, subtags);
> > +                i--;
> 
> If |i === 1| here, then subtags should be an array of length 1, right?  It
> looks to me like you could just not do the shift at all, then let the
> assignment below go to element 0 in the array.

No, subtags can still contain script codes, region codes, or variants.

> > +    if (privateUse !== undefined) {
> > +        if (canonical.length > 0)
> > +            canonical += "-" + privateUse;
> 
> Is it ever possible for canonical to be empty, given our input was asserted
> structurally-valid?  Seems like it shouldn't be, to me, because that would
> be as if the first subtag were one character, but that doesn't match the
> 2*3ALPHA in language.

The Language-Tag production allows the entire language tag to be private use.

The other comments are addressed by changes to code or comments.
Attachment #695803 - Attachment is obsolete: true
Attachment #706896 - Flags: review?(jwalden+bmo)
Comment on attachment 706585 [details] [diff] [review]
Add self-hosted JavaScript core of Intl constructors Collator, NumberFormat, DateTimeFormat (part 3)

https://hg.mozilla.org/integration/mozilla-inbound/rev/4baea9b16379
Attachment #706585 - Flags: checkin?(jwalden+bmo)
Blocks: 837957
Blocks: es-intl
The comment trail of this bug is getting quite long, so I'm reducing the scope of this bug:

- Separated out the part of the implementation that depends on ICU as the underlying internationalization library into bug 837957.

- Created new bug 837963 to track the implementation of the ECMAScript Internationalization API as a whole.

- Moved the blockers that only affect the spec conformance of the overall implementation, but don't block progress on implementation work, to bug 837963.
No longer blocks: 724529
Summary: Implement Intl, Intl.Collator, Intl.NumberFormat, Intl.DateTimeFormat → Implement core of Intl, Intl.Collator, Intl.NumberFormat, Intl.DateTimeFormat
Attachment #699996 - Attachment is obsolete: true
Comment on attachment 706722 [details] [diff] [review]
Add self-hosted JavaScript core of Intl constructors Collator, NumberFormat, DateTimeFormat (part 2)

Review of attachment 706722 [details] [diff] [review]:
-----------------------------------------------------------------

Getting there.  Unfortunately, I found issues I missed the first time around, so this still isn't good yet.  :-(  And it's well past end of day for me, so I'll have to continue on with more stuff here tomorrow.  (I had to get at least one of these done today since I said I thought I would earlier.  :-) )

(In reply to Norbert Lindenberg from comment #103)
> > > +function toASCIIUpperCase(s) {
> > 
> > This function seems like it would be better and clearer as an intrinsic,
> > given it's probably going to be doing a function call per character in most
> > common cases.  (I'd assume people will pass all-ASCII-lowercase strings
> > through here most of the time.)  Could you change this into one, please?
> 
> I checked with the profiler against the ECMA-402 conformance test suite, and
> this function is in the general noise, not worth optimizing.

Eh, fair enough, I guess.  Although I wasn't suggesting the change because of it being an optimization, more because it'd read much nicer in C++, I think.  Still, never mind.

> > > +    // alphanum      = (ALPHA / DIGIT)     ; letters and numbers
> > > +    var alphanum = "(" + alpha + "|" + digit + ")";
> > 
> > "[0-9A-Za-z]" is nicer to read, I think, than an extra alternation/breakdown
> > here.  The comment should make clear that this simplification is correct.
> 
> Yes, but once you do that for duplicateVariantRE, the ALPHA variable becomes
> unnecessary, but you still need its documentation... Same for the proposed
> simplification of singleton, which makes DIGIT unnecessary.

Eh, sure.

> > Is this actually correct?  .* is going to eat up a lot, possibly eating up
> > out of the *("-" extension) bit of a langtag and into the ["-" privateuse]
> > part.  What would this do for a language tag like this:
> 
> That's why IsStructurallyValidLanguageTag removes any private use part
> before using duplicateVariantRE and duplicateSingletonRE. I added your input
> string to the conformance test for valid inputs for 6.2.2, and it still
> passes.

Aha, I missed this!  Please add a comment before the |var pos| line along these lines:

"Remove any privateuse component to simplify inputs the regular expressions used here must address."

Also add a blank line before this comment.  More blank lines between conceptually separate things is pretty much always a good thing here.

But see the comments on that actually chunk of code below, too.

> It's quite possible that a regular expression engine implements these two
> patterns using a backtracking algorithm, and so has to reexamine the
> remainder of the language tag for each singleton or variant in it. But then,
> in real life most language tags have only 1-3 subtags
> (language-script-country), and language tags with multiple singleton or
> variant subtags are very rare.

Yeah, if there's no correctness issue we should defer.

> I'd never heard of RegExp statics, and now that I know about them, I'm
> surprised that they're not listed as one of the most awful parts of
> JavaScript in the books of the masters...

They're not listed as such because they're moderately undocumented these days.  I guess our JS reference might talk about them, but we certainly don't talk about them very loudly, so people mostly don't use them.  I don't know exactly how people stumble onto information about them, really.

> I looked into this a bit more and found that the statics expose a lot of
> information about the use of RegExp in self-hosted code to client code. I
> could reduce the amount of exposed information by using (?:), but not
> eliminate it entirely. I've filed bug 834989.

That's fine, all we can really do for now.

::: js/src/builtin/Intl.js
@@ +44,5 @@
> + * Spec: ECMAScript Internationalization API Specification, 6.2.1.
> + */
> +// Subtags, as specified in RFC 5646 section 2.1, are sequences of alphanumeric
> +// characters (letters and digits) with a maximum length of eight characters.
> +// Singletons are single-character subtags.

So as I reread my comment here, I have *no idea* why I was asking for clarification of the term "subtag" here.

...or no, "subtag" was in 6.2.1 referred to here, and I hadn't asked for the definition to be included here.  On second thought, that definition is really what I should have asked for.  Sigh.  How about this (the bits between ====, that is) as the final comment:

====
Regular expression matching a "Unicode locale extension sequence", which the Internationalization API defines as: "any substring of a language tag that starts with a separator '-' and the singleton 'u' and includes the maximum sequence of following non-singleton subtags and their preceding '-' separators."

Alternatively, this may defined as: the components of a language tag that match the extension production in RFC 5646, where the singleton component is "u".

Spec: ECMAScript Internationalization API Specification, 6.2.1.
====

@@ +162,5 @@
> +    // in the RE). If it shows up after "-", without following a singleton or
> +    // "x", and without being followed by alphanum characters that would turn it
> +    // into a different subtag, it's a duplicate variant.
> +    var duplicateVariant = "(?:" + alphanum + "{2,8}-)+" + variant + "-(?:" +
> +                           alphanum + "{2,8}-)*\\1(?!" + alphanum + ")";

To double-check:

(?: + alphanum + {2,8}-)+

is supposed to match |language ["-" script] ["-" region]|.  Right?  (And maybe a few variants, but that doesn't matter -- it's just extra effort, no duplicate will be missed.)

Then you have a variant and a "-".

Then you have zero or more subtags, for (?: alphanum {2,8}-)*, stopping before extension and privateuse get hit.

Then you have the same variant again.

Then you have a negative lookahead assertion that the repeated variant isn't followed by an alphanumeric character, to verify that the second variant isn't just the leading substring of another.

Right?

Let's cons all this together like so (wrapping and such to fit in 79 characters, which I can't judge in this comment):

  // Match a langtag that contains a duplicate variant.
  var duplicateVariant =
      // Match everything in a langtag prior to any variants, and
      // maybe some of the variants as well (which makes this
      // pattern inefficient but not wrong, for our purposes);
      "(?:" + alphanum + "{2,8}-)+" +
      // a variant;
      variant + "-" +
      // zero or more subtags at least two characters long (thus
      // stopping before extension and privateuse components);
      "(?:" + alphanum + "{2,8}-)*" +
      // and the same variant, again
      "\\1" +
      // ...but not followed by any characters that would permit
      // this as a prefix of the other (which would make it a
      // non-duplicate).
      "(?!" + alphanum + ")";

@@ +193,5 @@
> +    // \1 refers back to the singleton (which is in the only capturing
> +    // parenthesis in the RE). If it shows up after "-" and without being
> +    // followed by alphanum characters that would turn it into a different
> +    // subtag, it's a duplicate singleton.
> +    var duplicateSingleton = "-" + singleton + "-(?:.*-)?\\1(?!" + alphanum + ")";

Let's do the same treatment as above here:

var duplicateSingleton =
    // Match a singleton subtag;
    "-" + singleton + "-" +
    // then zero or more subtags;
    "(?:" + alphanum + "+-)*" +
    // and the same singleton subtag again
    "\\1" +
    // ...but not followed by a character that would make
    // this second instance not actually a singleton.
    "(?!" + alphanum + ")";

I changed the zero-or-more bit to avoid quite so much having to backtrack, by limiting to alphanums.  This also, I think, makes more clear that it's matching zero or more subtags.

@@ +212,5 @@
> +function IsStructurallyValidLanguageTag(locale) {
> +    assert(typeof locale === "string", "IsStructurallyValidLanguageTag");
> +    if (!callFunction(std_RegExp_test, languageTagRE, locale))
> +        return false;
> +    var pos = callFunction(std_String_indexOf, locale, "-x-");

A language tag is structurally valid if it consists solely of a privateuse component, right?  That seems to mean that, say, "x-u-foo-u-foo" and "x-12345-12345" get past the strip-privateuse part.  As your regexps earlier aren't anchored at the start, I think this means a structurally invalid language tag that contains a duplicate singleton, or a duplicate variant, prefixed with "x-", will start claiming the overall thing isn't structurally valid even when it is.  Examples:

js> IsStructurallyValidLanguageTag("x-en-US-12345") // correct
true
js> IsStructurallyValidLanguageTag("x-en-US-12345-12345") // !!!
false
js> IsStructurallyValidLanguageTag("en-u-foo") // correct
true
js> IsStructurallyValidLanguageTag("x-en-u-foo") // correct
true
js> IsStructurallyValidLanguageTag("x-en-u-foo-u-bar") // !!!
false

Can you add a std_String_startsWith and pass it "x-", then return true if the startsWith test passes?  I think that fixes all these issues.  And of course add tests wherever that'd be.
Attachment #706722 - Flags: review?(jwalden+bmo) → review-
Comment on attachment 706895 [details] [diff] [review]
Add self-hosted JavaScript core of Intl constructors Collator, NumberFormat, DateTimeFormat (part 4)

Review of attachment 706895 [details] [diff] [review]:
-----------------------------------------------------------------

This patch should include the language-subtag-registry.txt used to generate this stuff, too.  That way, if we ever need to regenerate IntlData.js (suppose a bug is found), we can do it without also being required to update the language registry data at the same time.  This is important if we discover the bug close to release and can't take on whatever quantum of risk is involved in updating language tag data.

I guess our current precedent would be for naming the Python file with underscores, not dashes, so make_intl_data.py, please.

This all looks pretty good -- definitely much better than hand-generated data! -- but there's enough little things with room for dispute over what's best that I might as well make it a formal r- until we've converged a little more.

::: js/src/builtin/make-intl-data.py
@@ +1,1 @@
> +#!/usr/bin/env python3.2

We currently have to build with python 2.7, so make this just |python|.  No problem keeping to 3.x compatibility, just can't require it to build as happens here.

I see a few code lines with trailing ';' on them -- please remove those.

@@ +11,5 @@
> +#
> +# The IANA Language Subtag Registry is imported from
> +# http://www.iana.org/assignments/language-subtag-registry
> +# and uses the syntax specified in
> +# http://tools.ietf.org/html/rfc5646#section-3

Possibly this whole thing should be a triple-quoted docstring comment?

@@ +13,5 @@
> +# http://www.iana.org/assignments/language-subtag-registry
> +# and uses the syntax specified in
> +# http://tools.ietf.org/html/rfc5646#section-3
> +
> +

Only one blank line, please.

@@ +16,5 @@
> +
> +
> +import urllib.request
> +
> +

And here.

@@ +61,5 @@
> +    """
> +    langTagMappings = {}
> +    langSubtagMappings = {}
> +    extlangMappings = {}
> +    languageCodes = []

Should languageCodes possibly be a set()?  That or we should assert that there are no duplicates at the end of all this, maybe.

@@ +78,5 @@
> +            tag = record["Tag"]
> +            if "Preferred-Value" in record:
> +                langTagMappings[tag.lower()] = record["Preferred-Value"]
> +            else:
> +                langTagMappings[tag.lower()] = tag

For my own edification, what's the reason for no-op mappings like "zh-min" => "zh-min"?

@@ +79,5 @@
> +            if "Preferred-Value" in record:
> +                langTagMappings[tag.lower()] = record["Preferred-Value"]
> +            else:
> +                langTagMappings[tag.lower()] = tag
> +        if record["Type"] == "redundant":

If we're handling records casewise by "Type", which I see we are, let's use elif for all these.

@@ +92,5 @@
> +                languageCodes.append(subtag)
> +            if "Preferred-Value" in record:
> +                if subtag == "heploc":
> +                    # The entry for heploc is unique in its complexity; handle
> +                    # it as special case below.

Good times.

@@ +110,5 @@
> +            if "Preferred-Value" in record:
> +                preferred = record["Preferred-Value"]
> +                prefix = record["Prefix"]
> +                if subtag in languageCodes and subtag != preferred:
> +                    raise Exception("Conflict: lang with extlang mapping: " + subtag);

This depends on having seen the language record before the extlang record, correct?  I don't seem to remember seeing that as a requirement in the RFC, although I could have just missed it.  If so, for robustness we should make this into a second pass after the main loop:

for lang in languageCodes:
    if lang in extlangMappings and extlangMappings[lang]["preferred"] != lang:
        raise Exception("Conflict: lang with extlang mapping: " + lang)

@@ +111,5 @@
> +                preferred = record["Preferred-Value"]
> +                prefix = record["Prefix"]
> +                if subtag in languageCodes and subtag != preferred:
> +                    raise Exception("Conflict: lang with extlang mapping: " + subtag);
> +                extlangMappings[subtag] = {"preferred": preferred, "prefix": prefix}

{ "preferred": preferrred, "prefix": prefix } with extra whitespace.

Also add an else case, then |assert False, "Unrecognized Type: %s" record["Type"]|, prefixed with a comment something like this:

# Only the previously-tested types are valid, per http://tools.ietf.org/html/rfc5646#section-3.1.2

@@ +116,5 @@
> +
> +    # Special case for heploc.
> +    langTagMappings["ja-latn-hepburn-heploc"] = "ja-Latn-alalc97"
> +
> +    return fileDate, langTagMappings, langSubtagMappings, extlangMappings

Please return a record here, too, so the tuple ordering's not important.

@@ +132,5 @@
> +    intlData.write("var " + name + " = {\n")
> +    keys = sorted(dict)
> +    lastKey = keys[len(keys) - 1]
> +    for key in keys:
> +        intlData.write("    \"" + key + "\": ");

I'd prefer if you used "" to quote the strings, and ' as the internal quote character.  Escaped quotes get unreadable pretty quickly.

Any chance you could use % instead of string concatenation?  (Or if that's removed in Python 3 -- I can't remember -- string.format or whatever the replacement is?)  "   '%s': " reads much better than addition for this case.

@@ +138,5 @@
> +            intlData.write("\"" + dict[key] + "\"")
> +        else:
> +            preferred = dict[key]["preferred"]
> +            prefix = dict[key]["prefix"]
> +            intlData.write("{preferred: \"" + preferred + "\", prefix: \"" + prefix + "\"}")

Could you form up the value here into a single string, then write the entire key's entry to intlData at once?

    if isinstance(dict[key], str):
        value = "'%s'" % dict[key]
    else:
        value = "{ preferred: '%s', prefix: '%s' }" % dict[key]["preferred"], dict[key]["prefix"]
    intlData.write("    '%': %s,\n" % key, value)

@@ +141,5 @@
> +            prefix = dict[key]["prefix"]
> +            intlData.write("{preferred: \"" + preferred + "\", prefix: \"" + prefix + "\"}")
> +        if key != lastKey:
> +            intlData.write(",")
> +        intlData.write("\n")

Object literals in JS (ES5 or later) can have trailing commas, so you don't need to handle commas specially -- just splat a comma and linebreak after every one.

@@ +150,5 @@
> +    """
> +        Writes the language tag data to the Intl data file.
> +    """
> +    writeMappingsVar(intlData, langTagMappings, "langTagMappings",
> +                     "Mappings from complete tags to preferred values", fileDate, url)

I'd use a file-level comment to note fileDate/url, particularly as they're invariant per section, but it doesn't really matter.

Yes, I said just the opposite last time.  This either indicates I'm slightly mad or that, now that this is all generated, I don't especially expect people to really look anywhere past the start of the file for derived-from information.  Your pick.  ;-)

@@ +167,5 @@
> +registry.write(text)
> +registry.seek(0)
> +
> +print("Processing IANA Language Subtag Registry...")
> +fileDate, langTagMappings, langSubtagMappings, extlangMappings = readRegistry(registry)

data = readRegistry(registry)
fileDate = data["fileDate"]
etc.

@@ +174,5 @@
> +print("Writing Intl data...")
> +intlData = open("IntlData.js", "w", encoding="utf-8")
> +intlData.write("// Generated by make-intl-data.py. DO NOT EDIT.\n")
> +writeLanguageTagData(intlData, fileDate, url, langTagMappings, langSubtagMappings, extlangMappings)
> +intlData.close()

Let's do all this top-level stuff the way make_unicode.py does it, with the same interface:

http://mxr.mozilla.org/mozilla-central/source/js/src/vm/make_unicode.py#365

That is, check for __main__, import urllib inside that (or maybe urllib2?  probably should use that unless there's specific reason not to, since precedent is for using that), use sys.argv[1] if present, otherwise download to  "language-subtag-registry.txt" then seek to start.
Attachment #706895 - Flags: review?(jwalden+bmo) → review-
(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #109)
> Comment on attachment 706722 [details] [diff] [review]

> Let's cons all this together like so (wrapping and such to fit in 79
> characters, which I can't judge in this comment):

Done for both regular expressions. I also moved the parentheses for variant and singleton to make the back references clearer.

> A language tag is structurally valid if it consists solely of a privateuse
> component, right?  That seems to mean that, say, "x-u-foo-u-foo" and
> "x-12345-12345" get past the strip-privateuse part.  As your regexps earlier
> aren't anchored at the start, I think this means a structurally invalid
> language tag that contains a duplicate singleton, or a duplicate variant,
> prefixed with "x-", will start claiming the overall thing isn't structurally
> valid even when it is.

Yep, you found a real bug here - thank you! Fixed code, added tests to conformance test suite.

Looking at this further, I realized that there's also a related bug in the specification: The definition of “Unicode locale extension sequence” in 6.2.1 doesn't take private use subtag sequences into consideration. I've filed bug
https://bugs.ecmascript.org/show_bug.cgi?id=1244
but need to take this to es-discuss before making changes to spec, code, or tests.
Attachment #706722 - Attachment is obsolete: true
Attachment #710853 - Flags: review?(jwalden+bmo)
Comment on attachment 706896 [details] [diff] [review]
Add self-hosted JavaScript core of Intl constructors Collator, NumberFormat, DateTimeFormat (part 6)

Review of attachment 706896 [details] [diff] [review]:
-----------------------------------------------------------------

Looks basically good with a bunch of little comment fixes and the like, addressing the typeof noted at the end, and the followup request happening in a separate patch here eventually.

(In reply to Norbert Lindenberg from comment #105)
> No, to me this would indicate that i is only used within the loop, and that
> it's incremented by 1 for each iteration. Neither is true here.

Oh, absolutely right, ignore me.  :-)

> We break out of this loop when we reach an extension sequence or the private
> use part, so this is while processing the main part of the tag. I added
> checks to the script generating the mapping data to make sure that mappings
> for language and extlang subtags don't interfere with subtags of the other
> kind.

Ah, okay.  I kind of wish we weren't treating all subtags all at once, rather processing each component in a separate step for greater readability, but this works for now.

> The current test case for this function is here:
> http://hg.ecmascript.org/tests/test262/file/927ea8563f7f/test/suite/intl402/
> ch06/6.2/6.2.3.js
> Unfortunately, CanonicalizeLanguageTag is not public API and therefore can't
> be tested directly. The test case can pass in all kinds of strings, but if
> an implementation doesn't support the requested locale, it will simply
> return the default locale, and we have no idea what the function did with
> the input. During the original development of this function, I added some
> fake locales for testing purposes, but now I'm testing with just the 400+
> ICU locales.

Hmm.  I think the way we hacked around this for Unicode character data was to generate test files in vm/make_unicode.py into js/src/tests that would exercise these things more closely.  Maybe (separate patch) you could augment make_intl_data.py (?) to spit out the concatenation of IntlData.js and Intl.js into a test file, then do some tests of inputs against our function implementations directly.  Not necessary as part of this patch, but I'd like to see us unit-testing these methods directly if there's no good spec way to do so.

Another possibility might be adding a shell builtin method that somehow extracts out the relevant self-hosted function and tests its output.  Till might be able to help you with this, or talk about how easy or difficult it might be.  Me, I tend to think the first option is simpler, and I'd run with that if it were me.  But either way would work for me.

> > If |i === 1| here, then subtags should be an array of length 1, right?  It
> > looks to me like you could just not do the shift at all, then let the
> > assignment below go to element 0 in the array.
> 
> No, subtags can still contain script codes, region codes, or variants.

Hm, yes, my mistake.

> The Language-Tag production allows the entire language tag to be private use.

Ah, right.  That'll be clearer with the comment I mention inline.

::: js/src/builtin/Intl.js
@@ +221,5 @@
>  }
> +
> +
> +/**
> + * Canonicalizes the given well-formed BCP 47 language tag, including

Might be worth saying "structurally-valid" rather than well-formed, since that's the description used elsewhere.

@@ +232,5 @@
> + * -variant2      ; *("-" variant)
> + * -Variant1
> + * -u-ca-chinese  ; *("-" extension)
> + * -t-Zh-laTN
> + * -x-PRIVATE     ; ["-" privateuse]

Blank lines before and after the whole breakdown, for readability -- also indent the whole breakdown a couple spaces for the same reason.

@@ +246,5 @@
> +    // technically it's not necessary to adjust case. But for easier processing,
> +    // and because the canonical form for most subtags is lower case, we start
> +    // with lower case for all.
> +    // "Zh-NAN-haNS-bu-variant2-Variant1-u-ca-chinese-t-Zh-laTN-x-PRIVATE" ->
> +    // "zh-nan-hans-bu-variant2-variant1-u-ca-chinese-t-zh-latn-x-private"

The example here seems unnecessary.  :-)

Er, actually, later I see what you're doing -- you're introducing an example input and showing how it gets transformed through the method.  That's a good idea.  But let's introduce that in a separate comment, after the first assert (assert, blank line, comment, blank line, "Language tags are...").

// The input "Zh-NAN-haNS-bu-variant2-Variant1-u-ca-chinese-t-Zh-laTN-x-PRIVATE"
// will be used throughout this method to illustrate how it works.

@@ +329,5 @@
> +    var canonical = normal;
> +    if (extensions.length > 0)
> +        canonical += "-" + extensions.join("-");
> +    if (privateUse.length > 0) {
> +        if (canonical.length > 0)

Add a // Be careful of a Language-Tag that is entirely privateuse. here.

@@ +346,5 @@
> + * Spec: ECMAScript Internationalization API Specification, 6.3.1.
> + */
> +function IsWellFormedCurrencyCode(currency) {
> +    assert(typeof currency === "string", "IsWellFormedCurrencyCode");
> +    var c = ToString(currency);

Er.  Is |currency| a string, or is it not?  If it is, the ToString here shouldn't be necessary.  If it isn't, the assert shouldn't be here.  Looking purely at the spec, it looks to me like this typeof won't work if an options object with a "currency" property of |undefined| is used.  Does the test suite have a test for that?
Attachment #706896 - Flags: review?(jwalden+bmo) → review+
Comment on attachment 710853 [details] [diff] [review]
Add self-hosted JavaScript core of Intl constructors Collator, NumberFormat, DateTimeFormat (part 2)

Review of attachment 710853 [details] [diff] [review]:
-----------------------------------------------------------------

Nice catch on the spec bug.  I'd like to think I should have seen that looking this closely at the code, although I certainly have excuses for not doing so.  :-)

::: js/src/builtin/Intl.js
@@ +43,5 @@
> + * specification defines as: "any substring of a language tag that starts with
> + * a separator '-' and the singleton 'u' and includes the maximum sequence of
> + * following non-singleton subtags and their preceding '-' separators."
> + *
> + * Alternatively, this may defined as: the components of a language tag that

Oop, "may be defined".
Attachment #710853 - Flags: review?(jwalden+bmo) → review+
(In reply to Norbert Lindenberg from comment #104)
> Also, I didn't see the benefits of requiring arguments for the script, so
> there aren't any.

Just in case I didn't justify it well enough in comment 110, the reason to take arguments is to split updating the data-generation script (and its output) and updating its source data into two parts.  The former is likely to happen periodically as more people read this code and suggest comment fixes, or start emitting extra mappings or Intl data derived from the database.  The latter will happen when our internationalization people observe new updates to the registry.  We want these as separate steps to make it simpler to figure out when an update introduces a behavioral change reported as a bug -- whether the change is due to the registry being updated, or due to our change to the data generated, or the scripts generating it.  Updating the registry used should be a painless process up til they change the format on us.  Changing the script that parses it is a different matter.  But it's hopefully going to be less frequent than updating the registry, so it's good to have two different actions to keep the risk levels of each separate.
Updated per comment 113. Carrying r+jwalden.
Attachment #710853 - Attachment is obsolete: true
Attachment #711152 - Flags: review+
Attachment #711152 - Flags: checkin?(jwalden+bmo)
Comment on attachment 711152 [details] [diff] [review]
Add self-hosted JavaScript core of Intl constructors Collator, NumberFormat, DateTimeFormat (part 2)

https://hg.mozilla.org/integration/mozilla-inbound/rev/dbc54456b832
Attachment #711152 - Flags: checkin?(jwalden+bmo)
(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #110)
> Review of attachment 706895 [details] [diff] [review]:

> This patch should include the language-subtag-registry.txt used to generate
> this stuff, too.  That way, if we ever need to regenerate IntlData.js
> (suppose a bug is found), we can do it without also being required to update
> the language registry data at the same time.  This is important if we
> discover the bug close to release and can't take on whatever quantum of risk
> is involved in updating language tag data.

Understood. As I indicated in comment 104, I'm still waiting for legal clearance to add a copy of the downloaded data into the repository.

> I guess our current precedent would be for naming the Python file with
> underscores, not dashes, so make_intl_data.py, please.

Actually, there are precedents for both in roughly equal measure in both js/ and m-c as a whole. I'll refrain from using one of each though :-)

> > +#!/usr/bin/env python3.2
> 
> We currently have to build with python 2.7, so make this just |python|.  No
> problem keeping to 3.x compatibility, just can't require it to build as
> happens here.

This script isn't used as part of the build; only the engineers updating the data or the script itself have to be able to run it. And 3.x and 2.7 aren't compatible; on first attempt to run the script with 2.7 I got reminded that urllib is organized differently. I can move it back to 2.7 if that's really necessary, but I thought Mozilla is trying to move to 3.x in general.

> For my own edification, what's the reason for no-op mappings like "zh-min"
> => "zh-min"?

As the comment says, grandfathered tags don't use standard syntax, so CanonicalizeLanguageTag expects the mapping table to provide the final form for all.

> @@ +111,5 @@
> > +                preferred = record["Preferred-Value"]
> > +                prefix = record["Prefix"]
> > +                if subtag in languageCodes and subtag != preferred:
> > +                    raise Exception("Conflict: lang with extlang mapping: " + subtag);
> > +                extlangMappings[subtag] = {"preferred": preferred, "prefix": prefix}
> 
> { "preferred": preferrred, "prefix": prefix } with extra whitespace.

No: http://www.python.org/dev/peps/pep-0008/#whitespace-in-expressions-and-statements

> > +        if key != lastKey:
> > +            intlData.write(",")
> 
> Object literals in JS (ES5 or later) can have trailing commas, so you don't
> need to handle commas specially -- just splat a comma and linebreak after
> every one.

I know. Unfortunately, jshint doesn't.

> I'd use a file-level comment to note fileDate/url, particularly as they're
> invariant per section, but it doesn't really matter.

There will likely be other data for Intl.js that will be derived from other sources (e.g., the IANA time zone database), in which case tagging each variable separately works better.

(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #114)
> (In reply to Norbert Lindenberg from comment #104)
> > Also, I didn't see the benefits of requiring arguments for the script, so
> > there aren't any.
> 
> Just in case I didn't justify it well enough in comment 110, the reason to
> take arguments is to split updating the data-generation script (and its
> output) and updating its source data into two parts.  The former is likely
> to happen periodically as more people read this code and suggest comment
> fixes, or start emitting extra mappings or Intl data derived from the
> database.  The latter will happen when our internationalization people
> observe new updates to the registry.  We want these as separate steps to
> make it simpler to figure out when an update introduces a behavioral change
> reported as a bug -- whether the change is due to the registry being
> updated, or due to our change to the data generated, or the scripts
> generating it.  Updating the registry used should be a painless process up
> til they change the format on us.  Changing the script that parses it is a
> different matter.  But it's hopefully going to be less frequent than
> updating the registry, so it's good to have two different actions to keep
> the risk levels of each separate.

I see. OK, changed to follow the structure of make_unicode.py.

When we start adding data from other sources, asking for a file name won't work anymore; we'll probable just want to distinguish between "update IntlData from files already downloaded" and "update the downloaded files, then update IntlData". But for now this is OK.
Attachment #706895 - Attachment is obsolete: true
Attachment #711177 - Flags: review?(jwalden+bmo)
Updated as requested in comment 112. Carrying r+jwalden.

(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #112)
> Comment on attachment 706896 [details] [diff] [review]

> @@ +346,5 @@
> > + * Spec: ECMAScript Internationalization API Specification, 6.3.1.
> > + */
> > +function IsWellFormedCurrencyCode(currency) {
> > +    assert(typeof currency === "string", "IsWellFormedCurrencyCode");
> > +    var c = ToString(currency);
> 
> Er.  Is |currency| a string, or is it not?  If it is, the ToString here
> shouldn't be necessary.  If it isn't, the assert shouldn't be here.  Looking
> purely at the spec, it looks to me like this typeof won't work if an options
> object with a "currency" property of |undefined| is used.  Does the test
> suite have a test for that?

It depends on whether you read the "and" in 11.1.1.1 step 18 Pascal-style or C-style. The implementation of course is C-style, so things work out. I'll have to clarify that in the spec, but removing the assertion works for now.

Related test:
http://hg.ecmascript.org/tests/test262/file/927ea8563f7f/test/suite/intl402/ch11/11.1/11.1.1_19.js
Attachment #706896 - Attachment is obsolete: true
Attachment #711482 - Flags: review+
Attachment #711482 - Flags: checkin?(jwalden+bmo)
Comment on attachment 711482 [details] [diff] [review]
Add self-hosted JavaScript core of Intl constructors Collator, NumberFormat, DateTimeFormat (part 6)

https://hg.mozilla.org/integration/mozilla-inbound/rev/c9d72487aeae
Attachment #711482 - Flags: checkin?(jwalden+bmo)
Comment on attachment 701878 [details] [diff] [review]
Add self-hosted JavaScript core of Intl constructors Collator, NumberFormat, DateTimeFormat (part 7)

Review of attachment 701878 [details] [diff] [review]:
-----------------------------------------------------------------

Some of these spec methods are getting long enough that I'm starting to want explicit // Step 1. comments by them.  It's one thing when you have a three-step method with an obvious implementation, another matter when the algorithm is half a page.  I'm not quite concerned enough to require this just yet, but it's something you should probably keep in mind for future patches.

::: js/src/builtin/Intl.js
@@ +668,5 @@
> +    if (availableLocales["zh-Hant-HK"])
> +        availableLocales["zh-HK"] = true;
> +    if (availableLocales["zh-Hant-TW"])
> +        availableLocales["zh-TW"] = true;
> +    return availableLocales;

I would have this not return anything, rather than ostensibly produce a new value that's really an old one.

@@ +688,5 @@
> +    var len = TO_UINT32(O.length);
> +    var k = 0;
> +    while (k < len) {
> +        var Pk = ToString(k);
> +        var kPresent = HasProperty(O, Pk);

The spec has to do ToString here because it thinks of all properties in terms of strings.  But it's faster and simpler for us (because these are all uint32_t indexes) if you don't bother with ToString here.  So eliminate |Pk| and just use |k| everywhere here.

@@ +692,5 @@
> +        var kPresent = HasProperty(O, Pk);
> +        if (kPresent) {
> +            var kValue = O[Pk];
> +            if (!(typeof kValue === "string" ||
> +                  (typeof kValue === "object" && kValue !== null)))

This condition doesn't quite implement the spec test.  The reason is that SpiderMonkey (and every other browser-based implementation, actually, except maybe one) has an extension where there are objects which look like the value |undefined| or the value |null| in certain contexts.  Specifically, ToBoolean on them returns false, typeof on them returns "undefined", and they compare as if equal to |null| and |undefined| when the == and != operators are used.

The upshot is you need one more test here to fully implement the spec language, without introducing yet another way that such (hated) objects are special:

    typeof kValue === "undefined" && kValue !== undefined

Glorious, ain't it?

A test for this can't be written in pure JS because objects that emulate |undefined| are an extension.  You could sort of write it as a browser-based test, using |document.all| as such a falsy object, but even that has a little complexity.  Really we want a shell-only test for this, tho, for ease of running it.  I think this would be one:

  try
  {
    new Intl.Collator(objectEmulatingUndefined());
    assertEq(true, true, "shouldn't have thrown");
  }
  catch (e)
  {
    assertEq(true, false,
             "Unexpected error thrown: " + e);
  }

@@ +693,5 @@
> +        if (kPresent) {
> +            var kValue = O[Pk];
> +            if (!(typeof kValue === "string" ||
> +                  (typeof kValue === "object" && kValue !== null)))
> +                ThrowError(JSMSG_INVALID_LOCALES_ELEMENT);

Because the if-condition here spans multiple lines, this should be braced for readability:

if (...
    ...)
{
    ThrowError(...);
}

@@ +699,5 @@
> +            if (!IsStructurallyValidLanguageTag(tag))
> +                ThrowError(JSMSG_INVALID_LANGUAGE_TAG, tag);
> +            tag = CanonicalizeLanguageTag(tag);
> +            if (seen.indexOf(tag) === -1)
> +                seen.push(tag);

So CanonicalizeLocaleList is quadratic.  I will be very surprised if we don't reformulate this as a Set in relatively short order, when the web starts throwing relatively long locale lists at it.  I guess this being a semi-onetime cost helps somewhat, but even still, someone *will* be bitten by this.

@@ +754,5 @@
> +    var locale, noExtensionsLocale;
> +    while (i < len && availableLocale === undefined) {
> +        locale = requestedLocales[i];
> +        noExtensionsLocale = callFunction(std_String_replace, locale, unicodeLocaleExtensionSequenceGlobalRE, "");
> +        availableLocale = BestAvailableLocale(availableLocales, noExtensionsLocale);

The requested locales are canonicalized, if I read the spec right and track callers correctly.  But that doesn't remove the privateuse part of the tag.  So this munges that part as well as the Unicode parts that actually should be munged, right?  That's another component of the spec bug you mentioned earlier

Except, I'm *guessing* that this doesn't matter because our locale data is never going to contain privateuse components.  Maybe.  If this is the case, and we're leaving the code this way for some longer run, this should be called out in a comment.
Attachment #701878 - Flags: review?(jwalden+bmo) → review+
Updated per comment 124, including the gory details about document.all. Carrying r+jwalden.

(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #124)
> Comment on attachment 701878 [details] [diff] [review]

> > +    return availableLocales;
> 
> I would have this not return anything, rather than ostensibly produce a new
> value that's really an old one.

I added a comment saying that it returns the argument provided. Doing so simplifies its callers later on.

> > +            if (seen.indexOf(tag) === -1)
> > +                seen.push(tag);
> 
> So CanonicalizeLocaleList is quadratic.  I will be very surprised if we
> don't reformulate this as a Set in relatively short order, when the web
> starts throwing relatively long locale lists at it.  I guess this being a
> semi-onetime cost helps somewhat, but even still, someone *will* be bitten
> by this.

If we do that, it would have to be a pair List/Set. The order of requested locales lists is significant. Requested locales lists are likely to be short though - in typical use, they're derived from the list of languages that the user understands.

> > +        noExtensionsLocale = callFunction(std_String_replace, locale, unicodeLocaleExtensionSequenceGlobalRE, "");
> 
> The requested locales are canonicalized, if I read the spec right and track
> callers correctly.  But that doesn't remove the privateuse part of the tag. 
> So this munges that part as well as the Unicode parts that actually should
> be munged, right?  That's another component of the spec bug you mentioned
> earlier

Correct.

> Except, I'm *guessing* that this doesn't matter because our locale data is
> never going to contain privateuse components.  Maybe.  If this is the case,
> and we're leaving the code this way for some longer run, this should be
> called out in a comment.

At this point, the ICU locale list doesn't have privateuse components, but there's no guarantee that'll never change or Mozilla will never use a different library. I've filed bug 839372.
Attachment #701878 - Attachment is obsolete: true
Attachment #711670 - Flags: review+
Attachment #711670 - Flags: checkin?(jwalden+bmo)
Comment on attachment 711670 [details] [diff] [review]
Add self-hosted JavaScript core of Intl constructors Collator, NumberFormat, DateTimeFormat (part 7)

https://hg.mozilla.org/integration/mozilla-inbound/rev/594fa394a334

(In reply to Norbert Lindenberg from comment #125)
> If we do that, it would have to be a pair List/Set. The order of requested
> locales lists is significant. Requested locales lists are likely to be short
> though - in typical use, they're derived from the list of languages that the
> user understands.

Hmm, I think I may have mixed up lists a bit, because I was thinking of [[availableLocales]] that's a List but whose order is specifically stated to be irrelevant.  Maybe we will be good here, for a little bit, at least.

> At this point, the ICU locale list doesn't have privateuse components, but
> there's no guarantee that'll never change or Mozilla will never use a
> different library. I've filed bug 839372.

Sounds good.
Attachment #711670 - Flags: checkin?(jwalden+bmo)
(In reply to Norbert Lindenberg from comment #117)
> > > +#!/usr/bin/env python3.2
> > 
> > We currently have to build with python 2.7, so make this just |python|.  No
> > problem keeping to 3.x compatibility, just can't require it to build as
> > happens here.
> 
> This script isn't used as part of the build; only the engineers updating the
> data or the script itself have to be able to run it. And 3.x and 2.7 aren't
> compatible; on first attempt to run the script with 2.7 I got reminded that
> urllib is organized differently. I can move it back to 2.7 if that's really
> necessary, but I thought Mozilla is trying to move to 3.x in general.

I think it is.  The standard way you set up a build environment on Windows is to use a MozillaBuild package for it, and that ships only Python 2.7.  I suspect our Mac build instructions don't include a Python 3.2, either, although it might be easier to get there than on Windows.

As for us "moving to" 3.x, we're trying to be more 3.x-compatible in what we write, but the primary target remains Python 2.x for now.  Looking at docs, it seems urllib is one of those 2/3 incompatibility points.  :-(  Up to you whether you just revert to 2.x compatibility or do something with try/except and import hacks to be 2/3 compatible both, but really we want something that's 2.x-compatible by default.

> > { "preferred": preferrred, "prefix": prefix } with extra whitespace.
> 
> No:
> http://www.python.org/dev/peps/pep-0008/#whitespace-in-expressions-and-
> statements

Sigh, Word of God.  :-(

> > Object literals in JS (ES5 or later) can have trailing commas, so you don't
> > need to handle commas specially -- just splat a comma and linebreak after
> > every one.
> 
> I know. Unfortunately, jshint doesn't.

Even with es5:1 at the top of the file?

> There will likely be other data for Intl.js that will be derived from other
> sources (e.g., the IANA time zone database), in which case tagging each
> variable separately works better.

Makes sense.

> When we start adding data from other sources, asking for a file name won't
> work anymore; we'll probable just want to distinguish between "update
> IntlData from files already downloaded" and "update the downloaded files,
> then update IntlData". But for now this is OK.

Yeah.  Maybe --update and --generate as options, or something.  We'll figure it out then.
(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #128)
> > > { "preferred": preferrred, "prefix": prefix } with extra whitespace.
> > 
> > No:
> > http://www.python.org/dev/peps/pep-0008/#whitespace-in-expressions-and-
> > statements
> 
> Sigh, Word of God.  :-(

"A style guide is about consistency. Consistency with this style guide is important. Consistency within a project is more important."

http://www.python.org/dev/peps/pep-0008/#a-foolish-consistency-is-the-hobgoblin-of-little-minds
(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #128)

Python 2.7 it is.

> > > Object literals in JS (ES5 or later) can have trailing commas, [...]
> > 
> > I know. Unfortunately, jshint doesn't.
> 
> Even with es5:1 at the top of the file?

Thanks - I didn't know that that affects commas.
Attachment #711177 - Attachment is obsolete: true
Attachment #711177 - Flags: review?(jwalden+bmo)
Attachment #712018 - Flags: review?(jwalden+bmo)
Comment on attachment 701880 [details] [diff] [review]
Add self-hosted JavaScript core of Intl constructors Collator, NumberFormat, DateTimeFormat (part 8)

Review of attachment 701880 [details] [diff] [review]:
-----------------------------------------------------------------

I didn't really look at ResolveLocale because of the lack-of-steps issue, so you'll probably get more than a few comments on that bit next time around, as a heads-up.

::: js/src/builtin/Intl.js
@@ +643,5 @@
> +
> +    var locale = RuntimeDefaultLocale();
> +    if (!IsStructurallyValidLanguageTag(locale)) {
> +        return localeOfLastResort;
> +    }

No braces, please.

We should eventually warn here, or something, if this happens, so we can maybe fix it.  But this works for now.

@@ +645,5 @@
> +    if (!IsStructurallyValidLanguageTag(locale)) {
> +        return localeOfLastResort;
> +    }
> +    locale = CanonicalizeLanguageTag(locale);
> +    switch (locale) {

Hmm.  So this set of mappings is identical to the ones that happen in addOldStyleLanguageTags, just in reverse?  Could we just keep these in an array in here, then have each place iterate over the array doing things generically?  That's pretty much what's going to happen already, in terms of how the switch'll get implemented.

@@ +665,5 @@
> +        break;
> +    }
> +    if (!(collatorInternalProperties.__availableLocales[locale] &&
> +          numberFormatInternalProperties.__availableLocales[locale] &&
> +          dateTimeFormatInternalProperties.__availableLocales[locale])) {

This too could use a brace on a new line.

@@ +839,5 @@
> + * options into consideration.
> + *
> + * Spec: ECMAScript Internationalization API Specification, 9.2.5.
> + */
> +function ResolveLocale(availableLocales, requestedLocales, options, relevantExtensionKeys, localeData) {

Okay, yeah, this is too big for one huge splat of code to be readable, and moreover to be easily correlated with the spec algorithm.  Please add

// Step 1.
// Step 3-5.
// Step 11(c).

and so on throughout this.  Feel free to group steps that perform one logical action, so long as it doesn't get too huge grouping them.
Attachment #701880 - Flags: review?(jwalden+bmo) → review-
Updated per comment 131.
Attachment #701880 - Attachment is obsolete: true
Attachment #712351 - Flags: review?(jwalden+bmo)
Comment on attachment 701881 [details] [diff] [review]
Add self-hosted JavaScript core of Intl constructors Collator, NumberFormat, DateTimeFormat (part 9)

Review of attachment 701881 [details] [diff] [review]:
-----------------------------------------------------------------

Yeah, I think I want // Step N. comments in all this.

::: js/src/builtin/Intl.js
@@ +938,5 @@
> +        if (availableLocale !== undefined)
> +            subset.push(locale);
> +        k++;
> +    }
> +    return subset.slice(0);

I've been a bit confused for awhile as to why we have List at all, versus using arrays directly.  But it hasn't actually made a meaningful difference in the code I've seen up to now.  Here, where the List gets converted to an array eventually, we should definitely use an array from the start.  And I'm not sure why we shouldn't just use arrays from the start for all of this, particularly as our array methods can be faster on Arrays than on other kinds of objects.

@@ +977,5 @@
> +    var subset;
> +    if (matcher === undefined || matcher === "best fit")
> +        subset = BestFitSupportedLocales(availableLocales, requestedLocales);
> +    else
> +        subset = LookupSupportedLocales(availableLocales, requestedLocales);

We'd use a ternary here, i.e.

var subset = (matcher === undefined || matcher === "best fit")
             ? BestFitBlahBlahBlah
             : LookupSupportedBlahBlahBlah;

@@ +1005,5 @@
> +            value = ToString(value);
> +            break;
> +        default:
> +            assert(false, "GetOption");
> +        }

Please use an if-else if-else here -- matches the algorithm structure better.

@@ +1040,5 @@
> +// ??? stub
> +var runtimeAvailableLocales = (function () {
> +    var o = std_Object_create(null);
> +    o[RuntimeDefaultLocale()] = true;
> +    return addOldStyleLanguageTags(o);

Whee that stubbed-out code ugly!  :-)
Attachment #701881 - Flags: review?(jwalden+bmo)
Comment on attachment 712018 [details] [diff] [review]
Add self-hosted JavaScript core of Intl constructors Collator, NumberFormat, DateTimeFormat (part 4)

Review of attachment 712018 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good.  A last few things, some I'm sure I just didn't notice last time through, and we'll be good.

::: js/src/builtin/make_intl_data.py
@@ +85,5 @@
> +                langTagMappings[record["Tag"].lower()] = record["Preferred-Value"]
> +
> +        # For langSubtagMappings, keys must be in canonical case; values in
> +        # canonical form.
> +        elif record["Type"] in ("language", "script", "region", "variant"):

Could you put these comments inside the elifs, and get rid of the blank line between one elif and the next?  The blank line would ordinarily indicate the end of the if/elif/else chain, and it's confusing reading it this way.

@@ +99,5 @@
> +                    # This might indicate another heploc-like complex case.
> +                    raise Exception("Please evaluate: subtag mapping with prefix value.")
> +                langSubtagMappings[subtag] = record["Preferred-Value"]
> +
> +        # For extlangMappings, keys must be in canonical case;

Could you clarify what "canonical case" is here?  I'm sure I'm going to come back to this at some point and not remember what canonical means here.  :-\

There's a part of me that thinks we should do all processing in lowercase, and have methods to pretty-print language tags whenever they're needed for actual display in error messages or whatever.  But doing it with not-all-lowercase does make some things somewhat simpler, so I haven't been inclined to raise the point so far.

@@ +102,5 @@
> +
> +        # For extlangMappings, keys must be in canonical case;
> +        # values are arrays with [0] the replacement value, [1] the prefix to
> +        # be removed.
> +        elif record["Type"] in ("extlang"):

record["Type"] == "extlang"?
Attachment #712018 - Flags: review?(jwalden+bmo)
Comment on attachment 712351 [details] [diff] [review]
Add self-hosted JavaScript core of Intl constructors Collator, NumberFormat, DateTimeFormat (part 8)

Review of attachment 712351 [details] [diff] [review]:
-----------------------------------------------------------------

Definitely easier reading, mostly just style nits to go, and some commenting.

::: js/src/builtin/Intl.js
@@ +392,5 @@
> + *
> + * Spec: ECMAScript Internationalization API Specification, 6.2.4.
> + */
> +function DefaultLocale() {
> +    var localeOfLastResort = "und";

I'd be inclined to inline localeOfLastResort as being an obvious value with obvious semantics (at least after you look it up in BCP47, then it's part of your baseline knowledge), but this works fine too.

@@ +589,5 @@
> + *
> + * Spec: ECMAScript Internationalization API Specification, 9.2.5.
> + */
> +function ResolveLocale(availableLocales, requestedLocales, options, relevantExtensionKeys, localeData) {
> +    // step 1-3

// Steps 1-3. is the style (caps, period) we've already been using.

@@ +590,5 @@
> + * Spec: ECMAScript Internationalization API Specification, 9.2.5.
> + */
> +function ResolveLocale(availableLocales, requestedLocales, options, relevantExtensionKeys, localeData) {
> +    // step 1-3
> +    var matcher = options.__localeMatcher;

Under the assumption that we'll implement symbols, and use them for this, could you add a global |var bracketsLocaleMatcher = "__localeMatcher";| and do |options[bracketsLocaleMatcher]| here, and other places that use this property?  It'll read more visibly like the spec language, and it'll call this out as not being quite a "normal" property access.  And when symbols happen, we'll have just one place to adjust rather than many.  (Maybe plus a search-and-replace, but I can't predict the future that well when symbols haven't even been spec'd yet.  :-) )

@@ +595,5 @@
> +    var r;
> +    if (matcher === "lookup")
> +        r = LookupMatcher(availableLocales, requestedLocales);
> +    else
> +        r = BestFitMatcher(availableLocales, requestedLocales);

Ternary expression here.

@@ +598,5 @@
> +    else
> +        r = BestFitMatcher(availableLocales, requestedLocales);
> +
> +    // step 4
> +    var foundLocale = r.__locale;

bracketsLocale, and a blank line after this.

@@ +610,5 @@
> +        // step 5.c is implemented by Utilities.js
> +        // step 5.d
> +        extensionSubtags = callFunction(std_String_split, extension, "-");
> +        // step 5.e
> +        extensionSubtagsLength = extensionSubtags.length;

I'd be inclined to batch up 5a-5b under one commented section, then 5c-d, then 5e.

@@ +615,5 @@
> +    }
> +
> +    // step 6-7
> +    var result = new Record();
> +    result.__dataLocale = foundLocale;

bracketsDataLocale

@@ +621,5 @@
> +    var supportedExtension = "-u";
> +    // step 9-10
> +    var i = 0;
> +    var len = relevantExtensionKeys.length;
> +    // step 11

I might do // Steps 9-11. here, since they're all part and parcel of implementing an algorithmic loop.

@@ +625,5 @@
> +    // step 11
> +    while (i < len) {
> +        // step 11.a-c
> +        var key = relevantExtensionKeys[i];
> +        var foundLocaleData = localeData(foundLocale);

Should this be localeData[foundLocale]?  Although, quite possibly this is just stub for calling into ICU -- if so, a comment that localeData is represented by a function rather than a list might be worthwhile.

@@ +636,5 @@
> +        // locale tag may override
> +        // step 11.e
> +        var supportedExtensionAddition = "";
> +        
> +        // step 11.f is implemented by Utilities.js

No need to call out a step like this.  :-)

@@ +637,5 @@
> +        // step 11.e
> +        var supportedExtensionAddition = "";
> +        
> +        // step 11.f is implemented by Utilities.js
> +        

You've got some trailing whitespace here, and a couple lines earlier.

@@ +647,5 @@
> +            // step 11.g.ii
> +            if (keyPos !== -1) {
> +                // step 11.g.ii.1
> +                if (keyPos + 1 < extensionSubtagsLength &&
> +                        extensionSubtags[keyPos + 1].length > 2) {

Align the second line with the 'k' in "keyPos", please, and push the { to a new line.

@@ +657,5 @@
> +                    if (valuePos !== -1) {
> +                        value = requestedValue;
> +                        supportedExtensionAddition = "-" + key + "-" + value;
> +                    }
> +                // step 11.g.ii.2

This should go inside the else-block.

@@ +672,5 @@
> +        }
> +
> +        // options override all
> +        // step 11.h.i
> +        var optionsValue = options["__" + key];

Huh, generic bracketed properties are a thing now?  Joy.  Not sure how I feel about this...

@@ +682,5 @@
> +            if (optionsValue !== value) {
> +                // step 11.h.ii.1.a
> +                value = optionsValue;
> +                // step 11.h.ii.1.b
> +                supportedExtensionAddition = "";

Don't bother with five-deep step commenting here -- two assignments together make a small enough unit to be commentable once.
Attachment #712351 - Flags: review?(jwalden+bmo) → review-
Depends on: 840400
Comment on attachment 711582 [details] [diff] [review]
Add self-hosted JavaScript core of Intl constructors Collator, NumberFormat, DateTimeFormat (part 10)

Review of attachment 711582 [details] [diff] [review]:
-----------------------------------------------------------------

More // Step N. comments, please.  Although I like the extra whitespacing you have already!

::: js/src/builtin/Intl.js
@@ +815,5 @@
> + * Create an object holding the properties specified as "internal" for
> + * an Intl API object. This call is equivalent to setting the
> + * [[initializedIntlObject]] internal property of o to true.
> + */
> +function initializeIntlObject(o) {

I am kind of thinking |initializeIntlObjectOn| might read very slightly more nicely, so that it reads as a whole phrase -- "initialize Intl object stuff on obj".  Thoughts?

@@ +816,5 @@
> + * an Intl API object. This call is equivalent to setting the
> + * [[initializedIntlObject]] internal property of o to true.
> + */
> +function initializeIntlObject(o) {
> +    assert(typeof o === "object" && o !== null);

This should use the IsObject method I'm adding in bug 840400.  Note you're missing the awful case here right now.  :-)

@@ +855,5 @@
> +function checkIntlAPIObject(o, className, methodName) {
> +    var internals = getInternals(o);
> +    if (internals === undefined || internals["initialized" + className] !== true) {
> +        ThrowError(JSMSG_INTL_OBJECT_NOT_INITED, className, methodName, className);
> +    }

No braces around this.

@@ +856,5 @@
> +    var internals = getInternals(o);
> +    if (internals === undefined || internals["initialized" + className] !== true) {
> +        ThrowError(JSMSG_INTL_OBJECT_NOT_INITED, className, methodName, className);
> +    }
> +    assert(typeof o === "object" && o !== null);

IsObject

@@ +881,5 @@
> + * Initializes an object as a Collator.
> + *
> + * Spec: ECMAScript Internationalization API Specification, 10.1.1.
> + */
> +function InitializeCollator(collator, locales, options) {

I like the amounts of whitespace in this method -- definitely helps with readability.  Could I get // Step 1. comments too?  :-)

@@ +882,5 @@
> + *
> + * Spec: ECMAScript Internationalization API Specification, 10.1.1.
> + */
> +function InitializeCollator(collator, locales, options) {
> +    assert(typeof collator === "object", "InitializeCollator");

This should use IsObject too.  You're missing *both* awful cases here right now.  :-)
Attachment #711582 - Flags: review?(jwalden+bmo) → review-
Attachment #712769 - Flags: review?(jwalden+bmo) → review+
Comment on attachment 711583 [details] [diff] [review]
Add self-hosted JavaScript core of Intl constructors Collator, NumberFormat, DateTimeFormat (part 11)

Review of attachment 711583 [details] [diff] [review]:
-----------------------------------------------------------------

Needs // Step N. comments; other than that, looks pretty good.

::: js/src/builtin/Intl.js
@@ +970,5 @@
> + *
> + * Spec: ECMAScript Internationalization API Specification, 10.2.2.
> + */
> +function Intl_Collator_supportedLocalesOf(locales /*, options*/) {
> +    var options = arguments[1];

arguments.length >= 2 ? arguments[1] : undefined

The issue with doing it the way you're doing it is that if someone has |Object.prototype[1] = { get localeMatcher() { throw 17; } };| things won't work as they're specified to work.  Another test to add to your test suite, probably.  :-)

@@ +972,5 @@
> + */
> +function Intl_Collator_supportedLocalesOf(locales /*, options*/) {
> +    var options = arguments[1];
> +
> +    var availableLocales = collatorInternalProperties.__availableLocales;

[bracketsAvailableLocales], from some prior patch

@@ +987,5 @@
> +var collatorInternalProperties = {
> +    __sortLocaleData: collatorSortLocaleData,
> +    __searchLocaleData: collatorSearchLocaleData,
> +    __availableLocales: runtimeAvailableLocales, // stub
> +    __relevantExtensionKeys: ["co", "kn"]

I guess bracketsAvailableLocales means this should just be {} with defineProperty calls after that.  Sad, perhaps.

@@ +1050,5 @@
> +    assert(typeof x === "string", "CompareStrings");
> +    assert(typeof y === "string", "CompareStrings");
> +
> +    // ?? stub
> +    return x.localeCompare(y);

The ??? has me wondering if this is just stubby or not -- if it's not stubby, std_String_localeCompare.  If it is stubby for now, all good.

@@ +1077,5 @@
> +        if (key === "co") {
> +            property = "collation";
> +        } else {
> +            mapping = collatorKeyMappings[key];
> +            property = mapping.property;

I'd probably combine these two into one to eliminate the used-only-once |mapping| variable, for clarity and fewer names to think about and keep in mind.
Attachment #711583 - Flags: review?(jwalden+bmo)
Attachment #712769 - Flags: checkin?(jwalden+bmo)
(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #135)

> > +    var matcher = options.__localeMatcher;
> 
> Under the assumption that we'll implement symbols, and use them for this,
> could you add a global |var bracketsLocaleMatcher = "__localeMatcher";| and
> do |options[bracketsLocaleMatcher]| here, and other places that use this
> property?  It'll read more visibly like the spec language, and it'll call
> this out as not being quite a "normal" property access.  And when symbols
> happen, we'll have just one place to adjust rather than many.

After careful consideration, I think we should just drop the use of "__" and use simple property names instead.

I originally started using "__" for internal properties to indicate that they should be private, and then extended it to Record fields, which in the spec are written with brackets, just like internal properties.

For internal properties of Collator, NumberFormat, and DateTimeFormat objects, I dropped the underscores when moving the internal properties into separate objects held in a WeakMap (see attachment 711582 [details] [diff] [review]). The new scheme provides real privacy, so the fake privacy wasn't meaningful anymore. The properties, being on separate objects, are also easy to find and change once private symbols become available.

For the internal properties of the constructors themselves and for Record fields, privacy is provided by other means. The internal properties of the constructors are actually held on self-hosted global variables, which (once bug 822080 is fixed) are inaccessible to client code. Record fields are inaccessible to client code because Records are never made available through the API and have null prototypes. There's no need for fake privacy, and using symbols wouldn't improve anything. Using underscores or name prefixes just to reflect the brackets in the spec doesn't seem necessary, and adds unnecessary overhead when implementing Record fields whose names are given by variables.

If some indication of their special status is needed, we might suffix the names of records with "Record".

Jeff, any objections?
Comment on attachment 712769 [details] [diff] [review]
Add self-hosted JavaScript core of Intl constructors Collator, NumberFormat, DateTimeFormat (part 4)

https://hg.mozilla.org/integration/mozilla-inbound/rev/095782b51013
Attachment #712769 - Flags: checkin?(jwalden+bmo)
(In reply to Norbert Lindenberg from comment #139)
> After careful consideration, I think we should just drop the use of "__" and
> use simple property names instead.
> 
> Jeff, any objections?

Actually that sounds pretty good to me, if the objects in question aren't being exposed at all, which they aren't.  Carry on with this!
Updated per comment 135, comment 139, and comment 141.

(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #135)

> > +        var foundLocaleData = localeData(foundLocale);
> 
> Should this be localeData[foundLocale]?  Although, quite possibly this is
> just stub for calling into ICU [...].

Exactly.

> > +        // step 11.f is implemented by Utilities.js
> 
> No need to call out a step like this.  :-)

Well, this is one of the few steps where the correlation between the spec algorithm and the implementation didn't seem obvious.

> > +        
> 
> You've got some trailing whitespace here, and a couple lines earlier.

My editor software loves whitespace, my human editor doesn't...

> > +                // step 11.g.ii.2
> 
> This should go inside the else-block.

Actually, above the else, but outside the then-block.
Attachment #712351 - Attachment is obsolete: true
Attachment #713754 - Flags: review?(jwalden+bmo)
Updated per comment 133.

(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #133)

> > +        if (availableLocale !== undefined)
> > +            subset.push(locale);
> > +        k++;
> > +    }
> > +    return subset.slice(0);
> 
> I've been a bit confused for awhile as to why we have List at all, versus
> using arrays directly.  But it hasn't actually made a meaningful difference
> in the code I've seen up to now.  Here, where the List gets converted to an
> array eventually, we should definitely use an array from the start.  And I'm
> not sure why we shouldn't just use arrays from the start for all of this,
> particularly as our array methods can be faster on Arrays than on other
> kinds of objects.

The difference is that List objects have only List.prototype in their prototype chains, so they're not affected by getters, setters, or redefined methods on Array.prototype or Object.prototype. If we use an array from the start, we can't use push (which uses [[Get]] and [[Put]] internally), but have to use defineProperty. Is that better?

> var subset = (matcher === undefined || matcher === "best fit")
>              ? BestFitBlahBlahBlah
>              : LookupSupportedBlahBlahBlah;

JSHint doesn't like operators at the beginning of the line - or is there another option I can't find?
Attachment #701881 - Attachment is obsolete: true
Attachment #713768 - Flags: review?(jwalden+bmo)
Updated per comment 137.

(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #137)

> > +function initializeIntlObject(o) {
> 
> I am kind of thinking |initializeIntlObjectOn| might read very slightly more
> nicely, so that it reads as a whole phrase -- "initialize Intl object stuff
> on obj".  Thoughts?

Doesn't seem to make a difference to me.
Attachment #711582 - Attachment is obsolete: true
Attachment #713776 - Flags: review?(jwalden+bmo)
Updated per comment 138.

(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #138)

> > +function Intl_Collator_supportedLocalesOf(locales /*, options*/) {
> > +    var options = arguments[1];
> 
> arguments.length >= 2 ? arguments[1] : undefined
> 
> The issue with doing it the way you're doing it is that if someone has
> |Object.prototype[1] = { get localeMatcher() { throw 17; } };| things won't
> work as they're specified to work.  Another test to add to your test suite,
> probably.  :-)

Nice catch! I added the test. The test fails even with this fix, but I think that's because of bug 822080.

> > +    // ?? stub
> > +    return x.localeCompare(y);
> 
> The ??? has me wondering if this is just stubby or not -- if it's not
> stubby, std_String_localeCompare.  If it is stubby for now, all good.

It is.
Attachment #711583 - Attachment is obsolete: true
Attachment #713788 - Flags: review?(jwalden+bmo)
Comment on attachment 713754 [details] [diff] [review]
Add self-hosted JavaScript core of Intl constructors Collator, NumberFormat, DateTimeFormat (part 8)

Review of attachment 713754 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good with a couple micro-changes.

::: js/src/builtin/Intl.js
@@ +441,5 @@
> +    var oldStyleLocales = std_Object_getOwnPropertyNames(oldStyleLanguageTagMappings);
> +    for (var i = 0; i < oldStyleLocales.length; i++) {
> +        var oldStyleLocale = oldStyleLocales[i];
> +        if (availableLocales[oldStyleLanguageTagMappings[oldStyleLocale]])
> +            availableLocales[oldStyleLocale] = true;

Hmm.  It occurs to me that these []-lookups are probably vulnerable to poisoning through defining the same properties on Object.prototype.  Could you fix that through use of std_Object_hasOwnProperty in a separate patch, please?  And get tests for this in whatever testsuite you can, if possible.

@@ +590,5 @@
> +    // Step 5.
> +    if (extension !== undefined) {
> +        // Step 5.b.
> +        extensionIndex = r.extensionIndex;
> +        // Step 5.c is implemented by Utilities.js.

I expect any reader of this code to very quickly understand that std_*_* refers to the standard function *.* or *.prototype.*.  Given that, and given that we're immediately using the value computed here, explaining std_String_split in a comment isn't worth the trouble.

@@ +599,5 @@
> +
> +    // Steps 6-7.
> +    var result = new Record();
> +    result.dataLocale = foundLocale;
> +    // Step 8.

Add a blank line before each // Step comment that doesn't already have one, please.  I'm having difficulty thinking of any time in SpiderMonkey where a comment isn't preceded by a blank line, actually.  (When the line doesn't have any code, that is -- // comments on the same line as a bit of code are a different matter.)

@@ +642,5 @@
> +                        value = requestedValue;
> +                        supportedExtensionAddition = "-" + key + "-" + value;
> +                    }
> +                }
> +                // Step 11.g.ii.2.

No -- please put this directly inside the else-block.  In general, if I see a closing '}' (not followed by 'else {' on the same line) at the end of an if-block, else-if block, etc. I assume the end of the whole if-chain has been reached.

 if (foo) {
   ...
 } // no else, must be done!
 // la la la
 else { // er, wait, no I'm not!

Allowing the chain to continue a couple lines down, after an intervening comment, is just confusing, requiring me to mentally reset my to recognize "we're still testing a sequence of alternatives".  Putting the comment inside the block avoids this issue.

This same reason was why I asked for the '#' comments in the Python code to be moved inside their respective indented blocks, for what it's worth.
Attachment #713754 - Flags: review?(jwalden+bmo) → review+
Comment on attachment 713768 [details] [diff] [review]
Add self-hosted JavaScript core of Intl constructors Collator, NumberFormat, DateTimeFormat (part 9)

Review of attachment 713768 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/builtin/Intl.js
@@ +699,5 @@
> +function LookupSupportedLocales(availableLocales, requestedLocales) {
> +    // Steps 1-2.
> +    var len = requestedLocales.length;
> +    var subset = new List();
> +    // Steps 3-4.

Blank line before all the // Step lines, as in the previous patch.

@@ +753,5 @@
> +        }
> +    }
> +
> +    // Steps 2-3.
> +    var subset = (matcher === undefined || matcher === "best fit") ?

Does the laxbreak option fix this?

@@ +761,5 @@
> +    // Step 4.
> +    for (var i = 0; i < subset.length; i++)
> +        std_Object_defineProperty(subset, i, {value: subset[i], writable: false, enumerable: true, configurable: false});
> +//    ??? commented out because of SpiderMonkey bugs 591059 and 598996
> +//    std_Object_defineProperty(subset, "length", {value: subset.length, writable: false, enumerable: false, configurable: false});

We could almost do Object.freeze(subset) here, but this does very weird things to the "length" property, so I'm hesitant to use it yet.

@@ +783,5 @@
> +    if (value !== undefined) {
> +        // Step 2.b.
> +        if (type === "boolean")
> +            value = ToBoolean(value);
> +        // Step 2.c.

Please group 2a-2c in a single comment -- this commenting has the same if-appears-to-end problem as I've mentioned in previous patches.

@@ +795,5 @@
> +            ThrowError(JSMSG_INVALID_OPTION_VALUE, property, value);
> +        // Step 2.e.
> +        return value;
> +    }
> +    // Step 3.

Blank line before this to separate it from the if-block.

@@ +820,5 @@
> +        if (std_isNaN(value) || value < minimum || value > maximum)
> +            ThrowError(JSMSG_INVALID_DIGITS_VALUE, value);
> +        return std_Math_floor(value);
> +    }
> +    // Step 3.

Blank lines between // Step and the previous thing.
Attachment #713768 - Flags: review?(jwalden+bmo) → review+
Incidentally, [].push using [[Set]] in the spec is possibly, conceivably, a spec bug.  I can't find a JS engine that implements things like the spec says, suggesting that possibly [].push might want to be changed to use [[DefineOwnProperty]].  In the meantime, okay, I guess we use List.  :-\  Horrible language.  :-|
Updated per comment 152. Carrying r+jwalden.

(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #152)

> > +    var oldStyleLocales = std_Object_getOwnPropertyNames(oldStyleLanguageTagMappings);
> > +    for (var i = 0; i < oldStyleLocales.length; i++) {
> > +        var oldStyleLocale = oldStyleLocales[i];
> > +        if (availableLocales[oldStyleLanguageTagMappings[oldStyleLocale]])
> > +            availableLocales[oldStyleLocale] = true;
> 
> Hmm.  It occurs to me that these []-lookups are probably vulnerable to
> poisoning through defining the same properties on Object.prototype.

I don't see how: First, availableLocales is created without a prototype, both in the stub version in part 9 and in the later real version based on ICU. Second, this code only runs during initialization of the self-hosting environment, before any user code gets to mess up things.

> > +    result.dataLocale = foundLocale;
> > +    // Step 8.
> 
> Add a blank line before each // Step comment that doesn't already have one,
> please.  I'm having difficulty thinking of any time in SpiderMonkey where a
> comment isn't preceded by a blank line, actually.  (When the line doesn't
> have any code, that is -- // comments on the same line as a bit of code are
> a different matter.)

Looking at a few SpiderMonkey source files it seems that comments at a new indentation level are never preceded by a blank line. Otherwise, yes, stand-alone comments without preceding blank lines are very rare.
Attachment #713754 - Attachment is obsolete: true
Attachment #715036 - Flags: review+
Attachment #715036 - Flags: checkin?(jwalden+bmo)
Updated per comment 153. Carrying r+jwalden.

(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #153)

> @@ +753,5 @@
> > +        }
> > +    }
> > +
> > +    // Steps 2-3.
> > +    var subset = (matcher === undefined || matcher === "best fit") ?
> 
> Does the laxbreak option fix this?

It does. However, the documentation doesn't say which other "possibly unsafe line breakings" go unreported with this option. I'd rather not use it. Frankly, a multi-line ?: expression seems inappropriate to me no matter how it's formatted...

> > +//    ??? commented out because of SpiderMonkey bugs 591059 and 598996
> > +//    std_Object_defineProperty(subset, "length", {value: subset.length, writable: false, enumerable: false, configurable: false});
> 
> We could almost do Object.freeze(subset) here, but this does very weird
> things to the "length" property, so I'm hesitant to use it yet.

No, the spec requires that the object remains extensible.
Attachment #713768 - Attachment is obsolete: true
Attachment #715058 - Flags: review+
Attachment #715058 - Flags: checkin?(jwalden+bmo)
Comment on attachment 713776 [details] [diff] [review]
Add self-hosted JavaScript core of Intl constructors Collator, NumberFormat, DateTimeFormat (part 10)

Review of attachment 713776 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/builtin/Intl.js
@@ +854,5 @@
> + *
> + * Ideally we'd be using private symbols for internal properties, but
> + * SpiderMonkey doesn't have those yet.
> + */
> +var internalsMap = new WeakMap();

It occurs to me that there's one interesting way this differs from having a symbol property directly on the object.  We have one internalsMap per global, so therefore an object can be Intl-initialized multiple times, as long as it's for a distinct global each time.  I think we can live with this for now.  Hopefully symbols will happen soon enough that it won't matter.

@@ +863,5 @@
> + * an Intl API object. This call is equivalent to setting the
> + * [[initializedIntlObject]] internal property of o to true.
> + */
> +function initializeIntlObject(o) {
> +    assert(IsObject(o));

Do you want a message/reason/caller string here?

@@ +900,5 @@
> + * Spec: ECMAScript Internationalization API Specification, 12.3.
> + */
> +function checkIntlAPIObject(o, className, methodName) {
> +    var internals = getInternals(o);
> +    if (internals === undefined || internals["initialized" + className] !== true)

Maybe worth an assert(typeof className === "string") for documentation/clarity purposes?

@@ +962,5 @@
> +
> +    // Check all allowed options properties and convert them to extension keys.
> +    // Steps 13-13.a.
> +    var key, mapping, property, value;
> +    for (key in collatorKeyMappings) {

for-in loops will also list enumerable properties on Object.prototype, which I'm pretty sure can be bootstrapped into wrong code here.  Right?

...oh, you're doing a has-own check immediately afterward.  Could you work around this using |for (var key of collatorKeyMappings)|?  That should only check own enumerable properties.

@@ +966,5 @@
> +    for (key in collatorKeyMappings) {
> +        if (callFunction(std_Object_hasOwnProperty, collatorKeyMappings, key)) {
> +            mapping = collatorKeyMappings[key];
> +            // Step 13.b.
> +            value = GetOption(options, mapping.property, mapping.type, mapping.values);

Please add the explicit |undefined| last argument.

@@ +990,5 @@
> +    var i = 0, len = relevantExtensionKeys.length;
> +    while (i < len) {
> +        // Step 19.a.
> +        key = relevantExtensionKeys[i];
> +        // Step 19.b.

This should go inside the |key === "co"| block.

@@ +995,5 @@
> +        if (key === "co") {
> +            property = "collation";
> +            value = r.co === null ? "default" : r.co;
> +        }
> +        // Step 19.c.

And this should go inside the else-block.

@@ +1011,5 @@
> +    }
> +
> +    // Compute remaining collation options.
> +    // Steps 20-21.
> +    var s = GetOption(options, "sensitivity", "string", ["base", "accent", "case", "variant"]);

Add an explicit final |undefined| argument.

@@ +1013,5 @@
> +    // Compute remaining collation options.
> +    // Steps 20-21.
> +    var s = GetOption(options, "sensitivity", "string", ["base", "accent", "case", "variant"]);
> +    if (s === undefined) {
> +        // Step 21.a.

Move this inside the if-block.

@@ +1017,5 @@
> +        // Step 21.a.
> +        if (u === "sort") {
> +            s = "variant";
> +        }
> +        // Step 21.b.

And this inside the else-block.
Attachment #713776 - Flags: review?(jwalden+bmo) → review+
Comment on attachment 713788 [details] [diff] [review]
Add self-hosted JavaScript core of Intl constructors Collator, NumberFormat, DateTimeFormat (part 11)

Review of attachment 713788 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/builtin/Intl.js
@@ +1112,5 @@
> +    return internals.boundCompare;
> +}
> +
> +
> +function collatorCompareToBind(x, y) {

Could you move this function above Intl_Collator_compare, and then CompareStrings above this?  It's nicer to have functions defined before they're used, whenever possible, for reader-friendliness, even if JS does hoist them for you.

@@ +1159,5 @@
> +
> +    var relevantExtensionKeys = collatorInternalProperties.relevantExtensionKeys;
> +    for (var i = 0; i < relevantExtensionKeys.length; i++) {
> +        var key, property;
> +        key = relevantExtensionKeys[i];

var key = ...;

and then

var property = ... ? ... : ...;

with any necessary line breaks.
Attachment #713788 - Flags: review?(jwalden+bmo) → review+
Comment on attachment 714121 [details] [diff] [review]
Add self-hosted JavaScript core of Intl constructors Collator, NumberFormat, DateTimeFormat (part 12)

Review of attachment 714121 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/builtin/Intl.js
@@ +1224,5 @@
> +    var s = GetOption(options, "style", "string", ["decimal", "percent", "currency"], "decimal");
> +    internals.style = s;
> +
> +    // Steps 17-20.
> +    var c = GetOption(options, "currency", "string");

Please add two extra explicit |undefined| arguments.

There's some real screwiness to how the spec handles this particular option-get, in terms of not having a fallback, of the result not necessarily matching the input type, and other stuff.  I don't feel I have a good enough handle on exactly what it's doing to suggest improvements, or to explain exactly what all makes it screwy.  :-\

@@ +1225,5 @@
> +    internals.style = s;
> +
> +    // Steps 17-20.
> +    var c = GetOption(options, "currency", "string");
> +    if (c && !IsWellFormedCurrencyCode(c))

c !== undefined

I think this boils over into visible differences, if the |options| passed in includes a currency that's the empty string.  Is that correct?

@@ +1229,5 @@
> +    if (c && !IsWellFormedCurrencyCode(c))
> +        ThrowError(JSMSG_INVALID_CURRENCY_CODE, c);
> +    if (s === "currency" && c === undefined)
> +        ThrowError(JSMSG_UNDEFINED_CURRENCY);
> +    if (s === "currency") {

I think I'd combine these two into one if, like so:

if (s === "currency") {
    if (c === undefined)
        ThrowError(...);
    // Steps 20.a-b.
    c = toASCIIUpperCase(c);
    ...
}

@@ +1233,5 @@
> +    if (s === "currency") {
> +        // Steps 20.a-b.
> +        c = toASCIIUpperCase(c);
> +        internals.currency = c;
> +    }

Why aren't you doing step 20c here?  Seems like it makes sense here, as the spec reads, and it means you only need one call to CurrencyDigits rather than two.

@@ +1252,5 @@
> +
> +    // Steps 28-30.
> +    var mxfdDefault = s === "currency" ? std_Math_max(mnfd, CurrencyDigits(c))
> +                                       : s === "percent" ? std_Math_max(mnfd, 0)
> +                                                         : std_Math_max(mnfd, 3);

This sort of nested ternary is fairly hard to read.  Could you just do it the simple way, without going overboard trying to common up a left-hand side?

var mxfdDefault;
if (s === "currency")
    mxfdDefault = std_Math_max(mnfd, CurrencyDigits(c));
else if (s === "percent")
    mxfdDefault = std_Math_max(mnfd, 0);
else
    mxfdDefault = std_Math_max(mnfd, 3);

@@ +1285,5 @@
> + *
> + * Spec: ISO 4217 Currency and Funds Code List.
> + * http://www.currency-iso.org/en/home/tables/table-a1.html
> + */
> +var currencyDigits = {

This isn't the full list given there.  Is that because the majority of entries use 2, and here you're only listing the ones that don't have 2 as a value, letting them be handled by the fallback case?  I'm inclined to say we should have the same sort of currency-data update generation script as we have for the other databases, as this stuff is doubtless going to change as time goes on.  Could you do that in a separate patch, please?  Seems best not holding up this entire thing just for this little bit.

@@ +1321,5 @@
> + * Spec: ECMAScript Internationalization API Specification, 11.1.1.
> + */
> +function CurrencyDigits(currency) {
> +    assert(typeof currency === "string", "CurrencyDigits");
> +    assert(callFunction(std_String_match, currency, /^[A-Z]{3}$/), "CurrencyDigits");

Use std_RegExp_test instead here with the inverted arguments.  |match| builds up an array with match data, but if we're just testing for *whether* it matches or not, we don't want that, and indeed want to just get back true or false as the test method will give us.

@@ +1323,5 @@
> +function CurrencyDigits(currency) {
> +    assert(typeof currency === "string", "CurrencyDigits");
> +    assert(callFunction(std_String_match, currency, /^[A-Z]{3}$/), "CurrencyDigits");
> +
> +    var d = currencyDigits[currency];

Doesn't this have Object.prototype poisoning issues?
Attachment #714121 - Flags: review?(jwalden+bmo) → review-
Comment on attachment 714123 [details] [diff] [review]
Add self-hosted JavaScript core of Intl constructors Collator, NumberFormat, DateTimeFormat (part 13)

Review of attachment 714123 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/builtin/Intl.js
@@ +1390,5 @@
> +    return internals.boundFormat;
> +}
> +
> +
> +function numberFormatFormatToBind(value) {

Please move this above Intl_NumberFormat_format, and then move FormatNumber above this.
Attachment #714123 - Flags: review?(jwalden+bmo) → review+
Looking at the original mega-patch, it looks like the Intl.js changes were ~1900 lines.  So far the patches I've looked at take us up to around ~1400ish or so, but I've also asked for a bit more whitespace and commenting, so maybe we're at 60% or so done with the Intl.js patches?  Just want to get a vague feel for how far we've gotten, and how far remains to go, to see how we're doing as far as schedule goes.
Updated per comment 157. Carrying r+jwalden.

(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #157)

> > + * Ideally we'd be using private symbols for internal properties, but
> > + * SpiderMonkey doesn't have those yet.
> > + */
> > +var internalsMap = new WeakMap();
> 
> It occurs to me that there's one interesting way this differs from having a
> symbol property directly on the object.  We have one internalsMap per
> global, so therefore an object can be Intl-initialized multiple times, as
> long as it's for a distinct global each time.  I think we can live with this
> for now.  Hopefully symbols will happen soon enough that it won't matter.

Yes to all.

> @@ +962,5 @@
> > +
> > +    // Check all allowed options properties and convert them to extension keys.
> > +    // Steps 13-13.a.
> > +    var key, mapping, property, value;
> > +    for (key in collatorKeyMappings) {
> 
> for-in loops will also list enumerable properties on Object.prototype, which
> I'm pretty sure can be bootstrapped into wrong code here.  Right?
> 
> ...oh, you're doing a has-own check immediately afterward.  Could you work
> around this using |for (var key of collatorKeyMappings)|?  That should only
> check own enumerable properties.

Actually, if I parse the spec and proposal correctly, that would have to be |for (var key of keys(collatorKeyMappings))|. But then jshint stumbles badly over the "of", and it doesn't have an ECMAScript6 option yet. The has-own check is a pretty standard pattern...
Attachment #713776 - Attachment is obsolete: true
Attachment #715893 - Flags: review+
Attachment #715893 - Flags: checkin?(jwalden+bmo)
Comment on attachment 714126 [details] [diff] [review]
Add self-hosted JavaScript core of Intl constructors Collator, NumberFormat, DateTimeFormat (part 14)

Review of attachment 714126 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/builtin/Intl.js
@@ +1478,5 @@
> + *
> + * Spec: ECMAScript Internationalization API Specification, 12.1.1.
> + */
> +function InitializeDateTimeFormat(dateTimeFormat, locales, options) {
> +    assert(IsObject(dateTimeFormat));

Does this want a message string?

@@ +1531,5 @@
> +    opt = new Record();
> +    // Step 19.
> +    var i, prop;
> +    for (i = 0; i < dateTimeComponents.length; i++) {
> +        prop = dateTimeComponents[i];

Make this |var prop|.  JS scoping doesn't work that way, sure -- but it's a semantic hint that the value won't be used after this loop.

@@ +1532,5 @@
> +    // Step 19.
> +    var i, prop;
> +    for (i = 0; i < dateTimeComponents.length; i++) {
> +        prop = dateTimeComponents[i];
> +        var value = GetOption(options, prop, "string", dateTimeComponentValues[prop]);

Add a final explicit |undefined| argument, please.

@@ +1548,5 @@
> +                     BestFitFormatMatcher(opt, formats);
> +
> +    // Step 25.
> +    for (i = 0; i < dateTimeComponents.length; i++) {
> +        prop = dateTimeComponents[i];

This should be |var prop| as well, same reason.

@@ +1565,5 @@
> +        if (hr12 === undefined)
> +            hr12 = dataLocaleData.hour12;
> +        internals.hour12 = hr12;
> +        // Step 27.c.
> +        if (hr12) {

Add an assert(typeof hr12 === "boolean") before this, please.  And the // Step comment belongs inside the block, as usual.  Same for the else-block comment.

@@ +1575,5 @@
> +        else {
> +            pattern = bestFormat.pattern;
> +        }
> +    }
> +    // Step 28.

This comment goes inside the else-block.
Attachment #714126 - Flags: review?(jwalden+bmo) → review+
Comment on attachment 714127 [details] [diff] [review]
Add self-hosted JavaScript core of Intl constructors Collator, NumberFormat, DateTimeFormat (part 15)

Review of attachment 714127 [details] [diff] [review]:
-----------------------------------------------------------------

r- for the defineProperty bit primarily, but there's other stuff to be done too.

::: js/src/builtin/Intl.js
@@ +1607,5 @@
> +    if (options === undefined)
> +        options = null;
> +    else
> +        options = ToObject(options);
> +    options = callFunction(std_Object_create, this, options);

Use |undefined| rather than |this| here, please.  There's no reason to use |this| here, and it's just confusing this way.

@@ +1615,5 @@
> +    // Step 5.
> +    if ((required === "date" || required === "any") &&
> +        (options.weekday !== undefined || options.year !== undefined ||
> +         options.month !== undefined || options.day !== undefined)) {
> +        needDefaults = false;

if (... &&
    ...)
{
    needDefaults = false;
}

@@ +1621,5 @@
> +    // Step 6.
> +    if ((required === "time" || required === "any") &&
> +        (options.hour !== undefined || options.minute !== undefined ||
> +         options.second !== undefined)) {
> +        needDefaults = false;

Same style fix here.

@@ +1626,5 @@
> +    }
> +
> +    // Step 7.
> +    if (needDefaults && (defaults === "date" || defaults === "all")) {
> +        defineProperty(options, "year", "numeric");

defineProperty as previously defined used Object.defineProperty.  That method is equivalent to the [[DefineOwnProperty]] call here, but with |true| as the last argument.  So either defineProperty needs to change, to try-catch errors the Object.defineProperty call there might throw, or this needs a try-catch in it.  Given you have six different calls here, I think you want to add it to that method.  If some other place needs a [[DefineOwnProperty]] call that passes true, I think you should have a different method for that.

(Is this the first defineProperty call I've seen?  I want to say it is, but I'm not sure.  Were all the previous ones passing |true| as the throwing parameter?)

(Should the spec be changed to pass |true| as the last parameter?  Most new stuff in ECMAScript isn't swallowing errors like this is, so I think this may not be the Right Thing To Do.)

@@ +1632,5 @@
> +        defineProperty(options, "day", "numeric");
> +    }
> +    // Step 8.
> +    if (needDefaults && (defaults === "time" || defaults === "all")) {
> +        defineProperty(options, "hour", "numeric");

Same true/false thing as above needs resolving here.

@@ +1657,5 @@
> +        longMorePenalty = 6,
> +        shortLessPenalty = 6,
> +        shortMorePenalty = 3;
> +    // Table 3.
> +    var properties = ["era", "year", "month", "day", "weekday",

Make the ordering here the same as the ordering in the table -- that is, move "weekday" to the start.

The spec doesn't impose any ordering on how these should be considered, correct?  Under the assumption that |formats| is a densely-packed array, every access is going to be infallible and produce a sane value.  And because addition/subtraction is commutative the same final score will result.  Just want to understand a little of the logic.

@@ +1664,5 @@
> +    var values = ["2-digit", "numeric", "narrow", "short", "long"];
> +
> +    // Steps 7-8.
> +    var bestScore = -Infinity;
> +    var bestFormat;

= undefined, please, to be explicit.

@@ +1666,5 @@
> +    // Steps 7-8.
> +    var bestScore = -Infinity;
> +    var bestFormat;
> +
> +    // Step 9-11.

Steps

@@ +1676,5 @@
> +        var score = 0;
> +        // Step 11.c.
> +        var j;
> +        var formatProp;
> +        for (j = 0; j < properties.length; j++) {

for (var j = 0; ...

@@ +1686,5 @@
> +            formatProp = undefined;
> +            // Steps 11.c.ii-iii.
> +            if (callFunction(std_Object_hasOwnProperty, format, property))
> +                formatProp = format[property];
> +            // Steps 11.c.iv.

Step

@@ +1688,5 @@
> +            if (callFunction(std_Object_hasOwnProperty, format, property))
> +                formatProp = format[property];
> +            // Steps 11.c.iv.
> +            if (optionsProp === undefined && formatProp !== undefined) {
> +                score -= additionPenalty;

Step comment inside block, and same for the else-if/else cases.

@@ +1690,5 @@
> +            // Steps 11.c.iv.
> +            if (optionsProp === undefined && formatProp !== undefined) {
> +                score -= additionPenalty;
> +            }
> +            // Steps 11.c.v.

Step

@@ +1694,5 @@
> +            // Steps 11.c.v.
> +            else if (optionsProp !== undefined && formatProp === undefined) {
> +                score -= removalPenalty;
> +            }
> +            // Steps 11.c.vi.

Step

@@ +1697,5 @@
> +            }
> +            // Steps 11.c.vi.
> +            else {
> +                var optionsPropIndex = callFunction(std_Array_indexOf, values, optionsProp);
> +                var formatPropIndex = callFunction(std_Array_indexOf, values, formatProp);

Could you add |assert(... !== -1);| for each of these, just as a sanity check?

The connection in the spec between this implied assertion and the actual text is a bit understated.  :-\  I wish the list in the spec had mentioned something about how this string list was created, in a NOTE: afterward or something.

@@ +1711,5 @@
> +                else
> +                    assert(false, "BasicFormatMatcher");
> +            }
> +        }
> +        // Step 11.d.

Blank line before this.

@@ +1716,5 @@
> +        if (score > bestScore) {
> +            bestScore = score;
> +            bestFormat = format;
> +        }
> +        // Step 11.e.

Blank line before this.

@@ +1719,5 @@
> +        }
> +        // Step 11.e.
> +        i++;
> +    }
> +    // Step 12.

Blank line before this, please.
Attachment #714127 - Flags: review?(jwalden+bmo) → review-
Comment on attachment 714128 [details] [diff] [review]
Add self-hosted JavaScript core of Intl constructors Collator, NumberFormat, DateTimeFormat (part 16)

Review of attachment 714128 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/builtin/Intl.js
@@ +1758,5 @@
> + * DateTimeFormat internal properties.
> + *
> + * Spec: ECMAScript Internationalization API Specification, 9.1 and 12.2.3.
> + */
> +var dateTimeFormatInternalProperties = {

Please move this up above supportedLocales.  (And, hmm, come to think of it, I think I noticed this for the other classes as well but forgot to say anything.  Move them up above their first uses as well?)

@@ +1807,5 @@
> + * effective locale and the formatting options of this DateTimeFormat.
> + *
> + * Spec: ECMAScript Internationalization API Specification, 12.3.2.
> + */
> +function Intl_DateTimeFormat_format() {

It belatedly occurs to me, but this is used as a getter function -- not as a function-valued property.  Given that, I think this should be more clearly named "Intl_DateTimeFormat_format_get" (or maybe getter, but brevity seems slightly nicer given the already-long names that will result).  Could you change the existing ones as well, please?

@@ +1824,5 @@
> +    return internals.boundFormat;
> +}
> +
> +
> +function dateTimeFormatFormatToBind() {

Move this above Intl_DateTimeFormat_format_get, since that method uses it.

@@ +1832,5 @@
> +    if (date === undefined)
> +        x = std_Date_now();
> +    // Step 1.a.ii.
> +    else
> +        x = ToNumber(date);

Let's join this into

var x = (date === undefined)
        ? std_Date_now()
        : ToNumber(date);

Maybe all on one line if you have enough characters for it.

@@ +1845,5 @@
> + * DateTimeFormat.
> + *
> + * Spec: ECMAScript Internationalization API Specification, 12.3.2.
> + */
> +function FormatDateTime(dateTimeFormat, x) {

And this above dateTimeFormatFormatToBind
Attachment #714128 - Flags: review?(jwalden+bmo) → review+
(In reply to Norbert Lindenberg from comment #155)
> > Hmm.  It occurs to me that these []-lookups are probably vulnerable to
> > poisoning through defining the same properties on Object.prototype.
> 
> I don't see how:

Oh, you're right.  Although to be sure, I'd much rather not rely on running before any user code for correctness here, seeing as that only makes it harder to reason about correctness.

> Looking at a few SpiderMonkey source files it seems that comments at a new
> indentation level are never preceded by a blank line. Otherwise, yes,
> stand-alone comments without preceding blank lines are very rare.

Right, I did forget the at-the-start-of-a-block caveat.  :-)

(In reply to Norbert Lindenberg from comment #156)
> > Does the laxbreak option fix this?
> 
> It does. However, the documentation doesn't say which other "possibly unsafe
> line breakings" go unreported with this option. I'd rather not use it.

Given that we're all reasonably experienced developers, and given (moreover) that any changes here will go through review, and given -- perhaps above all -- that this code will have a fair number of tests for it -- I don't think there's significant reason to worry.

> Frankly, a multi-line ?: expression seems inappropriate to me no matter how
> it's formatted...

Be that as it may, doing it the way you're doing it will seem inappropriate to any SpiderMonkey developer looking at this code.  Our coding style trumps jshint.

Also, in the process of doing a last skim of part 8 I noticed another ?: with the jshint-approved but SpiderMonkey-disallowed formatting.  Please fix that as well.

> No, the spec requires that the object remains extensible.

Ah, right.  Sigh, JS arrays.  So much wrong with this language.  :-)

(In reply to Norbert Lindenberg from comment #162)
> Actually, if I parse the spec and proposal correctly, that would have to be
> |for (var key of keys(collatorKeyMappings))|. But then jshint stumbles badly
> over the "of", and it doesn't have an ECMAScript6 option yet. The has-own
> check is a pretty standard pattern...

Hmm, yeah, I guess that would be keys().  And yeah, hasOwnProperty is good enough, just figured we might as well use something nicer if we could.  No worries if we can't.

Looking at the final patch I noticed some trailing whitespace in the patch -- mind removing that, please?
Attachment #715036 - Flags: checkin?(jwalden+bmo)
Attachment #715058 - Flags: checkin?(jwalden+bmo)
Attachment #715893 - Flags: checkin?(jwalden+bmo)
Updated per comment 158, plus getter renaming per comment 165 and laxbreak and whitespace fixes per comment 166. Carrying r+jwalden.

(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #158)

> > +function collatorCompareToBind(x, y) {
> 
> Could you move this function above Intl_Collator_compare, and then
> CompareStrings above this?  It's nicer to have functions defined before
> they're used, whenever possible, for reader-friendliness, even if JS does
> hoist them for you.

For functions/methods/operations that are named in the spec I've generally used the order in which they're specified there, so CompareStrings comes after Intl_Collator_compare (CompareStrings and its buddies FormatNumber and FormatDateTime will actually be replaced with ICU-based C++ functions, so their locations are temporary in any case).

I did move the *ToBind functions. The most spec-parallel place for them would actually be inside the methods that use them, but that would give them closures, which they don't need.
Attachment #713788 - Attachment is obsolete: true
Attachment #716407 - Flags: review+
Attachment #716407 - Flags: checkin?(jwalden+bmo)
Updated per comment 159.

(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #159)

> > +    var c = GetOption(options, "currency", "string");

> There's some real screwiness to how the spec handles this particular
> option-get, in terms of not having a fallback, of the result not necessarily
> matching the input type, and other stuff.  I don't feel I have a good enough
> handle on exactly what it's doing to suggest improvements, or to explain
> exactly what all makes it screwy.  :-\

This option is different from others in that (a) the allowed values come from a large set (all 140608 possible strings consisting of three Basic Latin letters), (b) a value is only needed when the style is "currency", and (c) when it is needed, we want to force the caller to provide it because we don't want to guess whether euros or Indonesian rupiahs are meant.

> > +    var c = GetOption(options, "currency", "string");
> > +    if (c && !IsWellFormedCurrencyCode(c))
> 
> c !== undefined
> 
> I think this boils over into visible differences, if the |options| passed in
> includes a currency that's the empty string.  Is that correct?

You're right, it has to be !== undefined. Callers passing in an empty string are entitled to a RangeError exception. I added a test case.

> > + * Spec: ISO 4217 Currency and Funds Code List.
> > + * http://www.currency-iso.org/en/home/tables/table-a1.html
> > + */
> > +var currencyDigits = {
> 
> This isn't the full list given there.  Is that because the majority of
> entries use 2, and here you're only listing the ones that don't have 2 as a
> value, letting them be handled by the fallback case?

Correct. In particular, we use 2 for all strings that are well-formed currency codes but about which we don't have information, because they're new, obsolete, or never used.

> I'm inclined to say we
> should have the same sort of currency-data update generation script as we
> have for the other databases, as this stuff is doubtless going to change as
> time goes on.  Could you do that in a separate patch, please?  Seems best
> not holding up this entire thing just for this little bit.

OK, will do.

> > +    var d = currencyDigits[currency];
> 
> Doesn't this have Object.prototype poisoning issues?

It did. Thank you!
Attachment #714121 - Attachment is obsolete: true
Attachment #716408 - Flags: review?(jwalden+bmo)
Updated per comment 160. Carrying r+jwalden.

(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #160)

> > +function numberFormatFormatToBind(value) {
> 
> Please move this above Intl_NumberFormat_format, and then move FormatNumber
> above this.

See comment 167 above.
Attachment #714123 - Attachment is obsolete: true
Attachment #716409 - Flags: review+
Attachment #716409 - Flags: checkin?(jwalden+bmo)
Updated per comment 163. Carrying r+jwalden.

(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #163)

> > +    // Step 19.
> > +    var i, prop;
> > +    for (i = 0; i < dateTimeComponents.length; i++) {
> > +        prop = dateTimeComponents[i];
> 
> Make this |var prop|.  JS scoping doesn't work that way, sure -- but it's a
> semantic hint that the value won't be used after this loop.

> > +    // Step 25.
> > +    for (i = 0; i < dateTimeComponents.length; i++) {
> > +        prop = dateTimeComponents[i];
> 
> This should be |var prop| as well, same reason.

No, JS scoping really doesn't work that way, and declaring the same variable twice is generally a sign that the programmer doesn't understand JS scoping rules and is asking for trouble. In this case we can convince ourselves that there is no trouble, but I don't know how to tell jshint that it's OK.
Attachment #714126 - Attachment is obsolete: true
Attachment #716410 - Flags: review+
Attachment #716410 - Flags: checkin?(jwalden+bmo)
Updated per comment 164.

(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #164)

> > +    options = callFunction(std_Object_create, this, options);
> 
> Use |undefined| rather than |this| here, please.  There's no reason to use
> |this| here, and it's just confusing this way.

Actually, using callFunction here was unnecessary.

> > +        defineProperty(options, "year", "numeric");
> 
> defineProperty as previously defined used Object.defineProperty.  That
> method is equivalent to the [[DefineOwnProperty]] call here, but with |true|
> as the last argument.  So either defineProperty needs to change, to
> try-catch errors the Object.defineProperty call there might throw, or this
> needs a try-catch in it.  Given you have six different calls here, I think
> you want to add it to that method.  If some other place needs a
> [[DefineOwnProperty]] call that passes true, I think you should have a
> different method for that.
> 
> (Is this the first defineProperty call I've seen?  I want to say it is, but
> I'm not sure.  Were all the previous ones passing |true| as the throwing
> parameter?)
> 
> (Should the spec be changed to pass |true| as the last parameter?  Most new
> stuff in ECMAScript isn't swallowing errors like this is, so I think this
> may not be the Right Thing To Do.)

Earlier spec drafts had true as the last parameter, but then Allen made me change to false specifically because these calls add properties to new objects, so there's nothing to check. I added a comment.

> > +    var bestFormat;
> 
> = undefined, please, to be explicit.

That seems quite redundant, and I don't know how to tell jshint not to tell me so.
Attachment #714127 - Attachment is obsolete: true
Attachment #716411 - Flags: review?(jwalden+bmo)
Updated per comment 165. Carrying r+jwalden.

(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #165)

> > + * Spec: ECMAScript Internationalization API Specification, 9.1 and 12.2.3.
> > + */
> > +var dateTimeFormatInternalProperties = {
> 
> Please move this up above supportedLocales.  (And, hmm, come to think of it,
> I think I noticed this for the other classes as well but forgot to say
> anything.  Move them up above their first uses as well?)

Again, I followed the order in which these items occur in the spec.

> @@ +1845,5 @@
> > + * DateTimeFormat.
> > + *
> > + * Spec: ECMAScript Internationalization API Specification, 12.3.2.
> > + */
> > +function FormatDateTime(dateTimeFormat, x) {
> 
> And this above dateTimeFormatFormatToBind

See comment 167 above.
Attachment #714128 - Attachment is obsolete: true
Attachment #716413 - Flags: review+
Attachment #716413 - Flags: checkin?(jwalden+bmo)
(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #161)
> Looking at the original mega-patch, it looks like the Intl.js changes were
> ~1900 lines.  So far the patches I've looked at take us up to around
> ~1400ish or so, but I've also asked for a bit more whitespace and
> commenting, so maybe we're at 60% or so done with the Intl.js patches?  Just
> want to get a vague feel for how far we've gotten, and how far remains to
> go, to see how we're doing as far as schedule goes.

You've looked at a lot more since asking this question, so now the answer is easier: You've looked at all of the original intl-js.patch. When intl-js-16.patch is checked in, this bug in its reduced form (see comment 108) is done, and we can move on to bug 837957.

On Monday I estimated that at the speed of the previous two weeks we'd get the product code, but not the test suite, landed by mid-April. I'll redo my math coming Monday.
The checkin-needed? right now depend on previous patch reviews getting done (unless I want to work through patch-application conflicts, but I'm pretty sure you don't want this, and it doesn't really need to happen that soon anyway).  So I won't be getting to those landings til the prerequisite checkins and possibly reviews are made.  (Just as a status note.)
Specifically, I think I was referring to parts 8-10 as the parts that need landing before the current c-n? bunch can move forward.  I'll keep looking at newer stuff in the meantime.
Comment on attachment 716408 [details] [diff] [review]
Add self-hosted JavaScript core of Intl constructors Collator, NumberFormat, DateTimeFormat (part 12)

Review of attachment 716408 [details] [diff] [review]:
-----------------------------------------------------------------

(In reply to Norbert Lindenberg from comment #168)
> This option is different from others in that (a) the allowed values come
> from a large set (all 140608 possible strings consisting of three Basic
> Latin letters), (b) a value is only needed when the style is "currency", and
> (c) when it is needed, we want to force the caller to provide it because we
> don't want to guess whether euros or Indonesian rupiahs are meant.

Blech.  I think I'd have added a GetCurrencyOption spec method if it'd been up to me, rather than have this oddness, if that would have addressed matters.  (And probably GetStringOption and GetBooleanOption as well, or something, rather than having GetOption so generic.)  Or whatever set of extra spec methods would have clarified things.  Oh well, water under the bridge.

::: js/src/builtin/Intl.js
@@ +1312,5 @@
> +    var mxfd = GetNumberOption(options, "maximumFractionDigits", mnfd, 20, mxfdDefault);
> +    internals.maximumFractionDigits = mxfd;
> +
> +    // Steps 31-33.
> +    if (options.minimumSignificantDigits !== undefined || options.maximumSignificantDigits !== undefined) {

Hmm.  Actually, this !== || !== pattern is wrong, at least in this particular case.  The spec steps are [[Get]] for the first, then [[Get]] for the second, *then* do !== comparisons.  But doing it this way interleaves the first !== comparison between the two [[Get]]s.  This is observable with options = { ..., minimumSignificantDigits: undefined, get maximumSignificantDigits() { throw 17; } }.  Looks like v8-intl that I assume is shipping/will ship with v8 at some point gets this wrong, too, for exactly the same self-hosting reasons:

http://code.google.com/p/v8-i18n/source/browse/trunk/src/number-format.js?spec=svn169&r=167#109

Definitely this wants a test.  Or if you want to change the spec, that's cool with me too -- just as long as everyone gets on the same page, and stays there.

Could you do a check of the existing code, and of any other patches in flight, to see if any others make this short-circuiting mistake, while you're at it?  I feel like I've seen this before but never noticed it, in some other patch here.
Attachment #716408 - Flags: review?(jwalden+bmo) → review-
Updated per comment 176.

(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #176)

> > +    // Steps 31-33.
> > +    if (options.minimumSignificantDigits !== undefined || options.maximumSignificantDigits !== undefined) {
> 
> Hmm.  Actually, this !== || !== pattern is wrong, at least in this
> particular case.  The spec steps are [[Get]] for the first, then [[Get]] for
> the second, *then* do !== comparisons.  But doing it this way interleaves
> the first !== comparison between the two [[Get]]s.  This is observable with
> options = { ..., minimumSignificantDigits: undefined, get
> maximumSignificantDigits() { throw 17; } }.

Good catch - thanks!

> Definitely this wants a test.  Or if you want to change the spec, that's
> cool with me too -- just as long as everyone gets on the same page, and
> stays there.

Added a test. I don't think the behavior matters to users of the API, other than wanting consistency, so there's no point in changing the spec.

> Could you do a check of the existing code, and of any other patches in
> flight, to see if any others make this short-circuiting mistake, while
> you're at it?  I feel like I've seen this before but never noticed it, in
> some other patch here.

I checked Intl.js as of part 16, but didn't find any other situation like this. Some code in ToDateTimeOptions looks similar, but there the properties are always read and checked immediately per spec, so that's OK.
Attachment #716408 - Attachment is obsolete: true
Attachment #718217 - Flags: review?(jwalden+bmo)
Attachment #716407 - Flags: checkin?(jwalden+bmo)
(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #164)

> > +                var optionsPropIndex = callFunction(std_Array_indexOf, values, optionsProp);
> > +                var formatPropIndex = callFunction(std_Array_indexOf, values, formatProp);
> 
> Could you add |assert(... !== -1);| for each of these, just as a sanity
> check?
> 
> The connection in the spec between this implied assertion and the actual
> text is a bit understated.  :-\  I wish the list in the spec had mentioned
> something about how this string list was created, in a NOTE: afterward or
> something.

I added the assertions in attachment 716411 [details] [diff] [review], but that turned out to be wrong, and the spec is incomplete as well: It's possible here for both optionsProp and formatProp to be undefined. The spec language doesn't define what "the index of x within y" means when x doesn't occur in y, so this needs to be handled separately. I filed
https://bugs.ecmascript.org/show_bug.cgi?id=1258

In the code, std_Array_indexOf returns -1, and that works just fine until it runs into the assertion at the end of step 11.c.vi, which became incorrect when I converted from a switch statement to an if-else cascade and dropped the delta === 0 case.

This update removes all three assertions.
Attachment #716411 - Attachment is obsolete: true
Attachment #716411 - Flags: review?(jwalden+bmo)
Attachment #719130 - Flags: review?(jwalden+bmo)
Comment on attachment 718217 [details] [diff] [review]
Add self-hosted JavaScript core of Intl constructors Collator, NumberFormat, DateTimeFormat (part 12)

Review of attachment 718217 [details] [diff] [review]:
-----------------------------------------------------------------

Seeing as this needs to land before a couple of your other patches, I'm going to take the liberty of assuming this is ready to land, and will land it momentarily.  :-)
Attachment #718217 - Flags: review?(jwalden+bmo) → review+
Attachment #716409 - Flags: checkin?(jwalden+bmo)
Attachment #716410 - Flags: checkin?(jwalden+bmo)
Attachment #716413 - Flags: checkin?(jwalden+bmo)
Attachment #719130 - Flags: review?(jwalden+bmo) → review+
https://hg.mozilla.org/integration/mozilla-inbound/rev/a63ce4ed61c8 (part 16, accidentally forgotten in the last push)
Blocks: 853301
Comment on attachment 673759 [details] [diff] [review]
Change access to global object slots for Intl module objects.

For consistency, the proposed change to the pattern of access to global object slots should be accompanied by parallel changes to the iterator implementation. I've separated this out as bug 854049.
Attachment #673759 - Attachment is obsolete: true
All code for this part of the implementation of the ECMAScript Internationalization API has landed. Still missing is the IANA Language Subtag Registry file, aka part 5; since this can come later, I've separated it out as bug 854042.

The code is still disabled in current builds; those interested in actually seeing it work should watch bug 853301.
Severity: normal → enhancement
Status: ASSIGNED → RESOLVED
Closed: 12 years ago11 years ago
Resolution: --- → FIXED
Whiteboard: [js:t] [bcp47] [leave open] → [js:t] [bcp47]
Target Milestone: mozilla19 → mozilla22
Version: unspecified → Trunk
Blocks: 864843
Blocks: 866301
Blocks: 866305
Mass-moving existing Intl-related bugs to the new Core :: JavaScript: Internationalization API component.

If you think this bug has been moved in error, feel free to move it back to Core :: JavaScript Engine.

[Mass change filter: core-js-intl-api-move]
Component: JavaScript Engine → JavaScript: Internationalization API
Depends on: 1018244
No longer depends on: 1018244
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: