Closed Bug 200505 Opened 21 years ago Closed 15 years ago

Optimization of jsref array_join_sub() function

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: bmarsch, Assigned: luke)

References

(Blocks 1 open bug)

Details

(Keywords: perf)

Attachments

(6 files, 17 obsolete files)

10.44 KB, patch
shaver
: review-
Details | Diff | Splinter Review
3.78 KB, text/plain
Details
9.26 KB, patch
Details | Diff | Splinter Review
47.45 KB, patch
Waldo
: review+
Details | Diff | Splinter Review
1.67 KB, patch
Waldo
: review+
Details | Diff | Splinter Review
755 bytes, patch
graydon
: review+
Details | Diff | Splinter Review
User-Agent:       Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.1) Gecko/20021003
Build Identifier: JS 1.5 RC 5

I noticed that the implementation of the Array.join function in the JavaScript
engine (function array_join_sub in file js/src/jsarray.c) is rather poor:

It iterates over all elements in the array, converts each element to a string,
(re)mallocs the memory needed for the element and finally appends the string to
a buffer. That means for any element in the array remalloc is called which can
be rather expensive.

So I rewrote the code, so it converts all elements to strings in a first loop.
Knowing their total length, I'm then able to allocate the buffer once and in a
second loop I copy all strings into the buffer.

On Linux, a loop just calling Array.join() run more than 35% faster than the
original code.

I'd be really glad, if my patch makes it into the next release ...


Reproducible: Always

Steps to Reproduce:
1.
2.
3.




The rewritten function (based on js-1.5 RC 5) looks like this:

static JSBool
array_join_sub(JSContext *cx, JSObject *obj, JSString *sep, JSBool literalize,
	       jsval *rval, JSBool localeString)
{
    JSBool ok;
    jsval v;
    jsuint length, index;
    jschar *chars, *ochars;
    size_t nchars, growth, seplen;
    const jschar *sepstr;
    JSString *str;
    JSHashEntry *he;
    JSObject *obj2;
    JSString **strarr = NULL;
    int totallength;

    ok = js_GetLengthProperty(cx, obj, &length);
    if (!ok)
	return JS_FALSE;
    ok = JS_TRUE;

    he = js_EnterSharpObject(cx, obj, NULL, &chars);
    if (!he)
        return JS_FALSE;
    if (literalize) {
	if (IS_SHARP(he)) {
#if JS_HAS_SHARP_VARS
	    nchars = js_strlen(chars);
#else
	    chars[0] = '[';
	    chars[1] = ']';
	    chars[2] = 0;
	    nchars = 2;
#endif
	    goto make_string;
	}
        if (!chars)
            nchars = 0;
        else
            nchars = js_strlen(chars);
    } else {
        /*
         * Free any sharp variable definition in chars.  Normally, we would
         * MAKE_SHARP(he) so that only the first sharp variable annotation is
         * a definition, and all the rest are references, but in the current
         * case of (!literalize), we don't need chars at all.
         */
        if (chars)
            JS_free(cx, chars);
        chars = NULL;
        nchars = 0;

        /* Return the empty string on a cycle as well as on empty join. */
        if (IS_BUSY(he) || length == 0) {
            js_LeaveSharpObject(cx, NULL);
	    *rval = JS_GetEmptyStringValue(cx);
	    return ok;
	}

        /* Flag he as BUSY so we can distinguish a cycle from a join-point. */
        MAKE_BUSY(he);
    }

    /* First convert all elements to strings and store them in a temporary
       array. After conversion we know how many bytes to (re)allocate and in a
       second loop we build the actual string. */
    strarr = (JSString**)malloc(sizeof(JSString*)*length);
    totallength = 0;

    v = JSVAL_NULL;
    for (index = 0; index < length; index++) {
	ok = JS_GetElement(cx, obj, index, &v);
	if (!ok)
	    goto done;

	if (JSVAL_IS_VOID(v) || JSVAL_IS_NULL(v)) {
	    str = cx->runtime->emptyString;
	} else {
            if (localeString) {
                if (!js_ValueToObject(cx, v, &obj2) ||
                    !js_TryMethod(cx, obj2,
                                  cx->runtime->atomState.toLocaleStringAtom,
                                  0, NULL, &v)) {
                    str = NULL;
                } else {
                    str = js_ValueToString(cx, v);
                }
            } else {
                str = (literalize ? js_ValueToSource : js_ValueToString)(cx, v);
            }
	    if (!str) {
		ok = JS_FALSE;
		goto done;
	    }
	}

        strarr[index] = str;
        totallength += JSSTRING_LENGTH(str);
    }

    sepstr = NULL;
    seplen = JSSTRING_LENGTH(sep);

    /*
     * Allocate 1 + 3 + 1 for "[", the worst-case closing ", ]", and the
     * terminating 0.
     */
    growth = (literalize) ? (1 + 3 + 1) : 0;
    /* I'm rather sure that I need the additional +3+1 only once at the end of
       the string, but I'm not 100% sure ... */
    growth += (nchars + (JSSTRING_CHARS(sep) ? seplen : 0)) * length +
totallength + 3 + 1;
    growth *= sizeof(jschar);
    if (!chars) {
        chars = (jschar *) malloc(growth);
        if (!chars)
            goto done;
    } else {
        chars = (jschar *) realloc((ochars = chars), growth);
        if (!chars) {
            free(ochars);
            goto done;
        }
    }

    if (literalize) {
        chars[nchars++] = '[';
    }

    for (index = 0; index < length; index++) {
        if (sepstr) {
            js_strncpy(&chars[nchars], sepstr, seplen);
            nchars += seplen;
        }
        sepstr = JSSTRING_CHARS(sep);
        
        js_strncpy(&chars[nchars], JSSTRING_CHARS(strarr[index]),
JSSTRING_LENGTH(strarr[index]));
        nchars += JSSTRING_LENGTH(strarr[index]);
    }

  done:
    if (strarr)
        free(strarr);
    if (literalize) {
	if (chars) {
	    if (JSVAL_IS_VOID(v)) {
		chars[nchars++] = ',';
		chars[nchars++] = ' ';
	    }
	    chars[nchars++] = ']';
	}
    } else {
        CLEAR_BUSY(he);
    }
    js_LeaveSharpObject(cx, NULL);
    if (!ok) {
	if (chars)
	    free(chars);
	return ok;
    }

  make_string:
    if (!chars) {
	JS_ReportOutOfMemory(cx);
	return JS_FALSE;
    }
    chars[nchars] = 0;
    str = js_NewString(cx, chars, nchars, 0);
    if (!str) {
	free(chars);
	return JS_FALSE;
    }
    *rval = STRING_TO_JSVAL(str);
    return JS_TRUE;
}
Reassigning and cc'ing Brendan for consideration of this change.

Out of curisoity, I looked up other bugs on Array performance.
I remember that in bug 143354, similar questions were raised on 
string conversion. For example, see bug 143354 comment 9.

bug 64065   "Array.prototype.sort inefficiencies"
bug 143354  "IE6 is 10x faster than Mozilla on random Array.sort()
bug 171262  "Performance: improve speed of new Array creation"
Assignee: rogerl → khanson
Status: UNCONFIRMED → NEW
Ever confirmed: true
Keywords: perf
Is this worth the code size increase?

Please attach a proper diff -u patch.

/be
Attached patch jsarray.c patch (obsolete) — Splinter Review
Comment on attachment 119415 [details] [diff] [review]
jsarray.c patch

This from a real quick glance at the patch.  Not a real review...

It looks to me like the code bloat might not be that bad.  Bernhard, can you
give us code size numbers before/after your patch so that we can objectively
see the impact of this patch?

Also: switch to JS_malloc and family; you have a JSContext available.  Also
check for out-of-memory.

Also: the strings in your temporary array don't seem to be rooted.  (Unless I'm
missing something--if so, please comment in the code.)	The GC might run during
js_ValueToString, causing your previously created strings to vanish.

Also: address the "I'm not sure" part.	Become sure. ;)

--scole
Attached patch New patch for jsarray.c (obsolete) — Splinter Review
I only did a very simple test (joining an array of 100 elements 100000 times)
and observed no significant change in code size.

I now use JS_malloc/JS_free (I just copied that part from the original code and
it used malloc/free).

The objection about rooting the temporary JSStrings is absolutely right (don't
know how I missed that).

About the "I'm not sure part": I hoped the comment would provoke somebody how
really knows the code (the original author?) to look at it ;-). I see no reason
why 3+1 is added for every element but maybe I'm missing something.
Attachment #119415 - Attachment is obsolete: true
Something for 1.8a ?
Assignee: khanson → general
QA Contact: pschwartau → general
Sorry this bug got lost -- it needs an owner.

The patch is quite out of date with respect to the latest source.  Also, there is no need for a manually added global root -- a local root can be used without any locking or hash table add/remove overhead associated with a global root.

Someone should take assignment of this bug and help get an updated patch attached and ready for review.  I won't have time till June, but I'm cc'ing some folks who may have time.

/be
this seems to have gotten lost again
Blocks: 374740
Assignee: general → igor
Bug got lost *again*. Nominating for blocking, as it could be a perf win?
Flags: blocking1.9?
Do we have significant data to know this is indeed a perf win?
Flags: blocking1.9? → blocking1.9-
This definitely -would- be a significant perf win for code using Array.join ...  how widely that's used in the real world, I don't know.  I'll try to hit this after I finish bug 403977 (which interestingly uses a very similar strategy).
Array.join over a set of strings to perform a concat is fairly common and has been a general perf recommendation for some time.
+'ing and setting to P3.  Crowder, please adjust the priority as needed.  Thanks for the info, guys.
Flags: blocking1.9- → blocking1.9+
Priority: -- → P3
I'm making this bug dependent upon the completion of the work in bug 322889, which will make resolving this much easier (or at least different).
Assignee: igor → crowder
Depends on: native-arrays
Bug 322889 has landed, you are cleared for take-off.
Attached patch WIP, v.01 (obsolete) — Splinter Review
Attachment #121686 - Attachment is obsolete: true
Attached patch WIP, v.02 (obsolete) — Splinter Review
Attachment #306549 - Attachment is obsolete: true
Flags: tracking1.9+ → blocking1.9+
Priority: P3 → P2
Attachment #306619 - Attachment is obsolete: true
Attached patch js/tests passedSplinter Review
make check and mochitest coming up.
Attachment #306665 - Attachment is obsolete: true
Comment on attachment 306668 [details] [diff] [review]
js/tests passed

+                /* Can't fast-path this join, so growth = 0, and bail below. */
+                growth = 0;
+                break;

Tell me how you feel about using a goto slow_path; here, instead of a break followed by if (growth)...  I'm more and more inclined to go with the goto.
Attachment #306668 - Flags: review?(shaver)
make check and mochitests seem to be as clean with this patch on my machine as they are without (they still don't run clean for me all the time, though)
Comment on attachment 306668 [details] [diff] [review]
js/tests passed

More to do, but the approach seems generally solid.

- I think the logic would be clearer if you made a subroutine like array_join_try_fast that could either set *finished to JS_FALSE and bail by returning, or set it to JS_TRUE to indicate that it was able to go fast.

- This pattern looks like a win for sparse arrays as well; if you agree, make array_join_try_fast call GetArrayElement for sparse arrays (or all of them, since GAE specializes itself for denseness, and I suspect it's getting inlined anyway)

- Even if we can't fully predict the length, such as if there's a single object in an otherwise-primitive array, it seems like we win by fusing as much allocation as we can, and just growing if we exceed that amount.

>+    jsval val;

Canonical name is |v|.

>+        for (index = 0; index < length; index++) {

You don't seem to account for holes at the end of the array here, where denselen < length:

javascript:a = [1]; a.length = 5; a

>+            val = obj->dslots[index];
>+            if (JSVAL_IS_STRING(val) && op != TO_SOURCE) {
>+                growth += JSSTRING_LENGTH(JSVAL_TO_STRING(val)) + seplen;
>+            } else if (JSVAL_IS_INT(val)) {
>+                int32 value = JSVAL_TO_INT(val);
>+                /* positive only for log10(), and *10 to make room for '-' */
>+                value = value < 0 ? (0 - value) * 10 : value;
>+                growth += ceilf(log10(value + 2)) + seplen;

Canonical name would be |i| -- and what's the |+ 2| magic constant in the last line?

I know brendan likes the ?: assignment, but I think this case (where one branch just assigns over itself) would be clearer as

if (value < 0)
  value = 0 - value * 10;

>+            } else if (JSVAL_IS_BOOLEAN(val) ||
>+                       val == JSVAL_HOLE || val == JSVAL_VOID ||
>+                       val == JSVAL_NULL) {
>+                if (op == TO_SOURCE && val != JSVAL_HOLE) {
>+                    str = js_ValueToSource(cx, val);
>+                    growth += JSSTRING_LENGTH(str) + seplen;
>+                } else {
>+                    growth += seplen;
>+                }

Break the JSVAL_HOLE case into a preceding block to simplify?  The test in the actual copying path below looks different, probably not intentionally?

Creating a String here seems counter to the intended leanness of this patch -- do we have jschar expansions of 'false', 'true', 'undefined', 'null' anywhere that we can re-use?

>+        if (growth) {
>+            /* Make room for '[' and ']' required by TO_SOURCE. */
>+            growth += (op == TO_SOURCE) ? 2 : 0;
>+
>+            chars = JS_malloc(cx, sizeof(jschar) * growth);
>+            if (!chars)
>+                return JS_FALSE;
>+
>+            nchars = 0;
>+            if (op == TO_SOURCE)
>+                chars[nchars++] = '[';
>+            for (index = 0; index < length; index++) {
>+                val = obj->dslots[index];
>+                if (JSVAL_IS_STRING(val)) {
>+                    str = JSVAL_TO_STRING(val);
>+                    tmplen = JSSTRING_LENGTH(str);
>+                    JS_ASSERT(nchars + tmplen <= growth);
>+                    memcpy(&chars[nchars], JSSTRING_CHARS(str),
>+                           tmplen * sizeof(jschar));

js_strncpy here and elsewhere, as you do for the JSVAL_IS_INT case.

>+                    nchars += tmplen;
>+                } else if (JSVAL_IS_INT(val)) {
>+                    int32 value = JSVAL_TO_INT(val);
>+                    jschar numStr[12];
>+                    ochars = js_IntToUCString(value, numStr,
>+                                              JS_ARRAY_LENGTH(numStr));
>+                    tmplen = numStr + JS_ARRAY_LENGTH(numStr) - ochars - 1;
>+                    JS_ASSERT(nchars + tmplen <= growth);
>+                    js_strncpy(&chars[nchars], ochars, tmplen);
>+                    nchars += tmplen;
>+                } else if (op == TO_SOURCE &&
>+                           ((JSVAL_IS_BOOLEAN(val) && val != JSVAL_HOLE) ||
>+                            val == JSVAL_VOID || val == JSVAL_NULL)) {
>+                    str = js_ValueToSource(cx, val);
>+                    tmplen = JSSTRING_LENGTH(str);
>+                    JS_ASSERT(nchars + tmplen <= growth);
>+                    memcpy(&chars[nchars], JSSTRING_CHARS(str),
>+                           tmplen * sizeof(jschar));

Again js_strncpy.

>+                    nchars += tmplen;
>+                }
>+                if (seplen && index < length - 1) {
>+                    memcpy(&chars[nchars], sepstr, seplen * sizeof(jschar));
>+                    nchars += seplen;
>+                }
>+                JS_ASSERT(nchars <= growth);


>+            }
>+            if (op == TO_SOURCE) {
>+                /* TO_SOURCE must provide for arrays with a hole at the end. */
>+                if (obj->dslots[index - 1] == JSVAL_HOLE)
>+                    chars[nchars++] = ',';
>+                chars[nchars++] = ']';
>+            }
>+
>+            str = js_NewString(cx, chars, nchars);
>+            if (!str) {
>+                free(chars);

JS_free

>+                return JS_FALSE;
>+            }
>+            *rval = STRING_TO_JSVAL(str);
>+            return JS_TRUE;
>+        }
>+    }
>+
>     he = js_EnterSharpObject(cx, obj, NULL, &chars);
>     if (!he)
>         return JS_FALSE;
>+
> #ifdef DEBUG
>     growth = (size_t) -1;
> #endif
> 
>     if (op == TO_SOURCE) {
>         if (IS_SHARP(he)) {
> #if JS_HAS_SHARP_VARS
>             nchars = js_strlen(chars);
>@@ -1155,18 +1263,18 @@ array_join_sub(JSContext *cx, JSObject *
>             chars = (jschar *)realloc((ochars = chars), growth);
>             if (!chars) {
>                 free(ochars);
>                 goto done;
>             }
>         }
>         chars[nchars++] = '[';
>         JS_ASSERT(sep == NULL);
>-        sepstr = NULL;  /* indicates to use ", " as separator */
>-        seplen = 2;
>+        sepstr = to_src_sep;  /* indicates to use ", " as separator */
>+        seplen = JS_ARRAY_LENGTH(to_src_sep) - 1;
>     } else {
>         /*
>          * Free any sharp variable definition in chars.  Normally, we would
>          * MAKE_SHARP(he) so that only the first sharp variable annotation is
>          * a definition, and all the rest are references, but in the current
>          * case of (op != TO_SOURCE), we don't need chars at all.
>          */
>         if (chars)
>@@ -1183,18 +1291,18 @@ array_join_sub(JSContext *cx, JSObject *
>         }
> 
>         /* Flag he as BUSY so we can distinguish a cycle from a join-point. */
>         MAKE_BUSY(he);
> 
>         if (sep) {
>             JSSTRING_CHARS_AND_LENGTH(sep, sepstr, seplen);
>         } else {
>-            sepstr = NULL;      /* indicates to use "," as separator */
>-            seplen = 1;
>+            sepstr = to_str_sep;      /* indicates to use "," as separator */
>+            seplen = JS_ARRAY_LENGTH(to_str_sep) - 1;
>         }
>     }
> 
>     /* Use rval to locally root each element value as we loop and convert. */
>     for (index = 0; index < length; index++) {
>         if (OBJ_IS_DENSE_ARRAY(cx, obj) && index < ARRAY_DENSE_LENGTH(obj)) {
>             *rval = obj->dslots[index];
>             hole = (*rval == JSVAL_HOLE);
>@@ -1266,24 +1374,17 @@ array_join_sub(JSContext *cx, JSObject *
>                 goto done;
>             }
>         }
> 
>         js_strncpy(&chars[nchars], JSSTRING_CHARS(str), tmplen);
>         nchars += tmplen;
> 
>         if (seplen) {
>-            if (sepstr) {
>-                js_strncpy(&chars[nchars], sepstr, seplen);
>-            } else {
>-                JS_ASSERT(seplen == 1 || seplen == 2);
>-                chars[nchars] = ',';
>-                if (seplen == 2)
>-                    chars[nchars + 1] = ' ';
>-            }
>+            memcpy(&chars[nchars], sepstr, seplen * sizeof(jschar));
>             nchars += seplen;
>         }
>     }
> 
>   done:
>     if (op == TO_SOURCE) {
>         if (chars)
>             chars[nchars++] = ']';
>Index: jsnum.c
>===================================================================
>RCS file: /cvsroot/mozilla/js/src/jsnum.c,v
>retrieving revision 3.108
>diff -u -p -8 -r3.108 jsnum.c
>--- jsnum.c	26 Feb 2008 21:01:42 -0000	3.108
>+++ jsnum.c	1 Mar 2008 05:09:44 -0000
>@@ -237,16 +237,45 @@ js_IntToCString(jsint i, char *buf, size
> 
>     if (i < 0)
>         *--cp = '-';
> 
>     JS_ASSERT(cp >= buf);
>     return cp;
> }
> 
>+/* The buf must be big enough for MIN_INT to fit including '-' and '\0'. */
>+jschar *
>+js_IntToUCString(jsint i, jschar *buf, size_t bufSize)
>+{
>+    jschar *cp;
>+    jsuint u;
>+
>+    u = (i < 0) ? -i : i;
>+
>+    cp = buf + bufSize; /* one past last buffer cell */
>+    *--cp = '\0';       /* null terminate the string to be */
>+
>+    /*
>+     * Build the string from behind. We use multiply and subtraction
>+     * instead of modulus because that's much faster.
>+     */

"Build the string from the end, using multiplication and subtraction rather than modulus for speed."?
Attachment #306668 - Flags: review?(shaver) → review-
> - I think the logic would be clearer if you made a subroutine like
> array_join_try_fast that could either set *finished to JS_FALSE and bail by
> returning, or set it to JS_TRUE to indicate that it was able to go fast.
I toyed with this idea, and the function is already stretched thin; but I was hesitant to pay the cost of a function-call.  It seems unlikely the compiler will inline this, either.  I can force it, I suppose.


> - This pattern looks like a win for sparse arrays as well; if you agree, make
> array_join_try_fast call GetArrayElement for sparse arrays (or all of them,
> since GAE specializes itself for denseness, and I suspect it's getting inlined
> anyway)
I toyed with the idea of writing an array_for_each_helper using this pattern, and then building this (and probably a good number of other routines) on that, but again was wary of adding the function call(s)...  and this would be a wicked-evil macro.  The for_each_helper would be used to both calculate string-size and build the output string.  Would be so nice to have C++ functors here; could save function-calls.


> - Even if we can't fully predict the length, such as if there's a single object
> in an otherwise-primitive array, it seems like we win by fusing as much
> allocation as we can, and just growing if we exceed that amount.
Still seems like the less-hacky approach to this for slow-arrays is a smarter, safe-appending, doubling-growth string routine.

> >+        for (index = 0; index < length; index++) {
> 
> You don't seem to account for holes at the end of the array here, where
> denselen < length:
length <= ARRAY_DENSE_LENGTH(obj) occurs in the conditional enclosing all of this logic.  Arrays whose length is longer than denselen must consult their prototype for elements (so we "fail" to slow).  :(

> >+            val = obj->dslots[index];
> >+            if (JSVAL_IS_STRING(val) && op != TO_SOURCE) {
> >+                growth += JSSTRING_LENGTH(JSVAL_TO_STRING(val)) + seplen;
> >+            } else if (JSVAL_IS_INT(val)) {
> >+                int32 value = JSVAL_TO_INT(val);
> >+                /* positive only for log10(), and *10 to make room for '-' */
> >+                value = value < 0 ? (0 - value) * 10 : value;
> >+                growth += ceilf(log10(value + 2)) + seplen;
> 
> Canonical name would be |i| -- and what's the |+ 2| magic constant in the last
> line?
Canonical name of what?  I'm "reusing" growth here in just the same fashion as it is used in the "slow" section.  Will rename if I create a new function; but I am still not quite sure to which variable-name you refer.  The + 2 is important.  log10(0) and log10(1) yield 0.00000, so we need to "fudge" upwards; should always get at least 1 for growth.

> >+            } else if (JSVAL_IS_BOOLEAN(val) ||
> >+                       val == JSVAL_HOLE || val == JSVAL_VOID ||
> >+                       val == JSVAL_NULL) {
> >+                if (op == TO_SOURCE && val != JSVAL_HOLE) {
> >+                    str = js_ValueToSource(cx, val);
> >+                    growth += JSSTRING_LENGTH(str) + seplen;
> >+                } else {
> >+                    growth += seplen;
> >+                }
> 
> Break the JSVAL_HOLE case into a preceding block to simplify?  The test in the
> actual copying path below looks different, probably not intentionally?
If I recall rightly, this weirdness happens because holes are also booleans :(  I'll try to rephrase.  The difference below is intentional, though, since for holes all we need to do is append a separator, which we're already doing.

> Creating a String here seems counter to the intended leanness of this patch --
> do we have jschar expansions of 'false', 'true', 'undefined', 'null' anywhere
> that we can re-use?
Luckily, I'm not creating a string for any of these types of values.  Their strings already exist and are atomized earlier.  I might be able to steal jschar * constants of them from somewhere, but this code seemed a lot cleaner.


> >+                    memcpy(&chars[nchars], JSSTRING_CHARS(str),
> >+                           tmplen * sizeof(jschar));
> 
> js_strncpy here and elsewhere, as you do for the JSVAL_IS_INT case.
I'd actually prefer to use memcpy in all the places where the length of the string being copied from is already known.  It saves having to compare against null in the src.  Are there places I'm using strncpy which could instead be memcpy?  I'll look.


> >+                free(chars);
> 
> JS_free
Ugh, thanks.


> >+    /*
> >+     * Build the string from behind. We use multiply and subtraction
> >+     * instead of modulus because that's much faster.
> >+     */
> 
> "Build the string from the end, using multiplication and subtraction rather
> than modulus for speed."?
Sure.  I'll tidy up the non-UC version similarly.

Gimme some feedback on my thoughts here and I'll do another patch?
(In reply to comment #22)
> >+                value = value < 0 ? (0 - value) * 10 : value;
> >+                growth += ceilf(log10(value + 2)) + seplen;
> 
> Canonical name would be |i| -- and what's the |+ 2| magic constant in the last
> line?
> 
> I know brendan likes the ?: assignment, but I think this case (where one branch
> just assigns over itself) would be clearer as
> 
> if (value < 0)
>   value = 0 - value * 10;

I do not like any such thing as 'x = c ? y : x' -- don't take my name in vain here :-P. This is a clear case for if-then with conditional update to value, not unconditional in all paths.

The condition c above is (value < 0) and it should be parenthesized per house style (all conditions for ?: with precedence looser than member expressions).
> Creating a String here seems counter to the intended leanness of this patch --
> do we have jschar expansions of 'false', 'true', 'undefined', 'null' anywhere
> that we can re-use?

Sure do.

/be
I wrote this before, and then I lost it because I can't be trusted with complex tools like tabs and browser windows.  Sorry.

- Go for the simplifying function, it's a single call overhead for the whole of the join, it should be well below the noise floor.  Don't sweat inlining it.

- Use the jschar expansions that brendan references.

- Fear not js_strncpy's performance, for it does not check the zeroes: http://mxr.mozilla.org/mozilla/source/js/src/jsstr.h#540

- slow arrays don't differ in the allocation behaviour for join, just the cost and means of fetching properties; not sure why they'd want to be so different?

- sorry, canonical name for your |int32 value| would be |int32 i|

- comment the magic +2?
Looking like this should be back-burnered; the easy wins aren't there yet.  Some ideas, though:  is the sharp-processing code necessary in any but the TO_SOURCE case, here?  If this can be avoided, we can save at least one alloc and some hash-table work.

The next step(s) here should be to try measuring out the result-set in one pass and allocating for that, in both the fast *and* slow cases, using pre-calculated knowledge of whether the array is "simple/dense" or not (where simple means that the array contains ints, strings, or other "constant" data, like boolean "true"/"false", holes, etc.  Inserting an object or an array, in other words, would make it non-"simple".

Another path to try is simply making the output string grow by doubling, instead of conservatively, as it does now.
Given comment 26 I'm taking this off the blocker list
Flags: blocking1.9+ → blocking1.9-
Yeah, unblocking.  Even if we have a couple of objects, fusing the allocations for the rest and reducing copies is a win in my testing for JSOP_CONCATN at least.  Thanks for running the science here!
Assignee: crowder → general
Priority: P2 → --
this patch is obviously messy, but it does manage to bypass js_ValueToString for a 6% speedup. GNU's std::vector seems to be about the same speed as the manual chars/nchars stuff.

trunk: 9000 ops/sec
patch: 9550 ops/sec

The big, unaddressed cost for the attached test case is js_EnterSharpObjects, which is really expensive in join(). See bug 484512.
Blocks: 484614
Assignee: general → jim
Easy enough to memoize just array objects in a dedicated (per-cx) JSDHashTable. cx->resolvingTable exists for similar reason involving lazy standard class init.

The sharp{Depth,Array} members of JSStackFrame will become fixed slots (local variables -- different slot per function but not requiring dedicated stack frame members) shortly; see bug 471425.

/be
I found what I think is a bug (bug 496223) while investigating the discussions here about sharp objects.  I would like to prepare a patch for this bug, and I would appreciate someone more knowledgeable's opinion on the issues in bug 496223.

The first part of the above comment 32 makes me think that infinite recursion through user-defined toString is the user's fault, and caught by JS_CHECK_RECURSION, not js_EnterSharpObject et al.
Here is a rough first draft of a patch.  There are still several TODOs inline, but I would be happy to get early feedback.

In this patch:
- array_join_sub is split into two functions: array_toString_sub, called by  various toString methods, and array_toSource.  Although there is a bit of code duplication, the logic is a lot easier to follow and the separate path should make toString faster.
- For the toString case, a hash table is kept in the context (busyArrayTable) to detect infinite loops.  This means that [array -> user-defined function -> array] infinite loops are not caught.  But, as seen in bug 496223, it is not always clear when a toString/toSource call in user code is an infinite loop or not.  JS_CHECK_RECURSION will catch the error.
- For both functions, a vector-like buffer is used instead of realloc()ing on every iteration.  This vector was factored out into a template (JSVector) to allow reuse and RAII-style resource management.  JSVector doesn't throw and, at the momement, assumes JS_malloc will succeed or abort().

The result is a 33% on peacekeeper.arrayCombined with everything but Array.join commented out.

trunk: 4015 ops/sec
patch: 5347 ops/sec
Assignee: jim → lwagner
This version improves on the last by avoiding a lot of temporary object allocation.  By adding new functions to jsnum.cpp and jsstr.cp, numbers and strings write themselves directly into the memory buffer being built in array_toString_sub.  The JSVector was moved to its own header, jsvector.h following the lead of jsclist.h.

Big win for performance:
trunk: 5042 ops/sec
patch: 22624 ops/sec
Attachment #381879 - Attachment is obsolete: true
Attachment #382417 - Flags: review?(graydon)
Attached patch Tweaks (obsolete) — Splinter Review
Tweaks to JSVector to allocate less / be faster.
Moved busyArrayTable allocation to js_NewContext.
v2 patch forgets to free busyArrayTable in FreeContext.

Also, valgrind results for running the join portion of the peacekeeper benchmark:
trunk: 218K malloc/free calls, 94MB allocated
patch: 20K malloc/free calls, 4.4MB allocated
Attachment #382533 - Flags: review?(graydon)
Attachment #382417 - Attachment is obsolete: true
Attachment #382417 - Flags: review?(graydon)
Comment on attachment 382533 [details] [diff] [review]
Tweaks

>--- a/js/src/jsarray.cpp	Fri May 29 01:19:52 2009 +0200
>+++ b/js/src/jsarray.cpp	Wed Jun 10 10:50:08 2009 -0700

>+    for (jsuint index = 0; index < length; index++) {
>+        /* Use rval to locally root each element value. */
>+        JSBool hole;
>+        ok = (JS_CHECK_OPERATION_LIMIT(cx) &&
>+              GetArrayElement(cx, obj, index, &hole, vp));

You're using vp, not rval. Fix comment.

>+static JSBool
>+array_toString_sub(JSContext *cx, JSObject *obj, JSBool locale,
>+                   JSString *sepstr, jsval *rval)

I'm not really fond of splitting what used to be one function taking a flag into
two large functions sharing significant amounts of logic. Can you factor
out more shared pieces?

... Most of the material in your changes to jsarray are to code I've never
worked on and am only mildly familiar with the semantic constraints of. I
don't know most of those APIs. You might wish to ask a JSAPI person with
more background in that file for review. Hg log it and see who's been
editing it a lot :)

>--- a/js/src/jsstr.cpp	Fri May 29 01:19:52 2009 +0200
>+++ b/js/src/jsstr.cpp	Wed Jun 10 10:50:08 2009 -0700
>@@ -68,16 +68,17 @@
> #include "jsnum.h"
> #include "jsobj.h"
> #include "jsopcode.h"
> #include "jsregexp.h"
> #include "jsscope.h"
> #include "jsstaticcheck.h"
> #include "jsstr.h"
> #include "jsbit.h"
>+#include "jsvector.h"
> 
> #define JSSTRDEP_RECURSION_LIMIT        100
> 
> size_t
> js_MinimizeDependentStrings(JSString *str, int level, JSString **basep)
> {
>     JSString *base;
>     size_t start, length;
>@@ -2970,16 +2971,53 @@ js_ValueToString(JSContext *cx, jsval v)
>     } else if (JSVAL_IS_BOOLEAN(v)) {
>         str = js_BooleanToString(cx, JSVAL_TO_BOOLEAN(v));
>     } else {
>         str = ATOM_TO_STRING(cx->runtime->atomState.typeAtoms[JSTYPE_VOID]);
>     }
>     return str;
> }
> 
>+static inline void 
>+pushAtom(JSAtom *atom, JSVector<jschar> &buf) {
>+    JSString *str = ATOM_TO_STRING(atom);
>+    const jschar *chars = JS_GetStringChars(str);
>+    buf.pushBack(chars, chars + JS_GetStringLength(str));
>+}
>+
>+JS_FRIEND_API(JSBool)
>+js_ValueToStringBuffer(JSContext *cx, jsval v, JSVector<jschar> &buf)

Put a note here that this function is intended to implement E-262-3 section
9.8, toString. Also take the code and that document to a quiet place with a
cup of tea and a red pen and be absollutely certain you're implementing that
algorithm. There are already a zillion ways toString can wind up "surprising"
us, and you're adding a new path into it. It looks mostly-right to me but
tread very carefully here.

As with my comment on jsarray, I'm not an ideal candidate for review on
things that are inspecting the various forms of jsval. I've only wandered
up into the interpreter a few times so far, and usually get burned.

>--- a/js/src/jsstr.h	Fri May 29 01:19:52 2009 +0200
>+++ b/js/src/jsstr.h	Wed Jun 10 10:50:08 2009 -0700

> /*
>+ * Convert a value to a string of jschars appended to the given buffer.  On 
>+ * error, the passed buffer may have partial results appended.
>+ */
>+extern JS_FRIEND_API(JSBool)
>+js_ValueToStringBuffer(JSContext *, jsval, JSVector<jschar> &);

Comment here about section 9.8 toString algorithm as well.

>--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
>+++ b/js/src/jsvector.h	Wed Jun 10 10:50:08 2009 -0700

>+ * The Original Code is Mozilla Communicator client code, released
>+ * March 31, 1998.
>+ *
>+ * The Initial Developer of the Original Code is
>+ * Netscape Communications Corporation.
>+ * Portions created by the Initial Developer are Copyright (C) 1998
>+ * the Initial Developer. All Rights Reserved.

It's a ways past '98 and you don't work for Netscape Communications Corp. Try
a fresher vintage license block, and put your name on it :)

>+ * JSVector is written under the assumption that JS_malloc will abort on OOM, 
>+ * not return null.  JSVector will fail in the same way [TODO] in the case of 
>+ * an overflow in number of elements.

I don't mean to start the flame war here except, er, well, I kinda have to in
order to sign off on this. We're going further down the infallilble-malloc
route. We need to Decide This Issue before you commit this code. I'm game but
it has been Officially Undecided as part of the spidermonkey roadmap for some
time now.

>+{
>+  public:
>+    JSVector(JSContext *cx) : cx(cx), mBeg(0), mEnd(0), mCap(0) {}
>+    ~JSVector();

Nit, not really your fault: now that we casually slid into C++-land, we're
growing class member-variable naming conventions that are not specified
anywhere. Some _foo, some foo, now some mFoo. Can we pick one?

>+  private:
>+    static const int sGrowthFactor = 3;

Why 3? I generally think of artifacts like the JSVector as a
"doubling/halving" vector.

>+    void growTo(size_t, size_t delta);

I don't see delta being used as a delta in the implementation. Also the
implementation has a weird "overflow" failure mode. Why not just pass a
"desired size" to growTo() and have it work out how much it needs to expand
capacity to cover that size. The two-arg form here looks awkward.

>+template <class T>
>+inline void
>+JSVector<T>::pushBack(const T *beg, const T *end) {
>+    int space = mCap - mEnd, needed = end - beg;

size_t here please.

>+template <class T>
>+inline void
>+JSVector<T>::growTo(size_t newcap, size_t requiredFree) {
>+    /* check for overflow */
>+    int oldsize = size();

And here. Also redo this to "cover intended size, double as needed" as
suggested above.
Attachment #382533 - Flags: review?(graydon) → review-
> >+static JSBool
> >+array_toString_sub(JSContext *cx, JSObject *obj, JSBool locale,
> >+                   JSString *sepstr, jsval *rval)
> 
> I'm not really fond of splitting what used to be one function taking a flag
> into
> two large functions sharing significant amounts of logic. Can you factor
> out more shared pieces?

I initially developed this patch maintaining the single array_join_sub function.  The problem is that, while the toString and toSource share the same gross control structure (detect cycles, choose separator, loop over elems, finish string), at a fine level of detail they are pretty different.  If you put the two functions side-by-side, each one of those steps I mentioned is handled differently.  The exception is the finish-string step, which I factored into a new function in the new patch.  Case and point, in the new patch, the sum of the LOC of the bodies of the two functions is exactly equal to the LOC of the body of the original array_join_sub.

> You might wish to ask a JSAPI person with more background in that file for
> review.

> Also take the code and that document to a quiet place with a
> cup of tea and a red pen and be absollutely certain you're implementing that
> algorithm. There are already a zillion ways toString can wind up "surprising"
> us, and you're adding a new path into it. It looks mostly-right to me but
> tread very carefully here.

Will do x 2

> Nit, not really your fault: now that we casually slid into C++-land, we're
> growing class member-variable naming conventions that are not specified
> anywhere. Some _foo, some foo, now some mFoo. Can we pick one?

foo makes member var names conflict with member function names, so I'll use _foo.

> Why 3? I generally think of artifacts like the JSVector as a
> "doubling/halving" vector.

Oops, at some point I was playing around and measuring.  Changed back.

> >+    void growTo(size_t, size_t delta);
> 
> I don't see delta being used as a delta in the implementation. Also the
> implementation has a weird "overflow" failure mode. Why not just pass a
> "desired size" to growTo() and have it work out how much it needs to expand
> capacity to cover that size. The two-arg form here looks awkward.

Really, I was factoring overflow checks out the pushBacks and resize.  A cleaner design is to put the overflow check in a new inline function, so I did that.
Attached patch v.4 (obsolete) — Splinter Review
With fixes in response to graydon's review.
Attachment #382533 - Attachment is obsolete: true
Attachment #383043 - Flags: review?(igor)
Attachment #383043 - Flags: review?(igor) → review?(jwalden+bmo)
The class member naming convention, based on years-old jsapi.h precedent, is mFoo (jorendorff or I will doc it at

https://wiki.mozilla.org/JavaScript:SpiderMonkey:C%2B%2B_Coding_Style

pretty soon).

Not sure it would help code sharing here, but dmandelin has been making good use of templates taking traits structs with static methods; grep Traits jstracer.cpp.

/be
See bug 497618 for recent mFoo usage. Jim B. reminded us on IRC that leading _ is reserved to the (C|C++) implementation, in all scopes (due to macros IIRC). The mFoo convention is used in Gecko too; using it in SpiderMonkey is not thereby mandated, but still a win overall.

/be
Leading _ is not reserved in all scopes.  Leading _ followed by a capital letter is reserved in that manner, as are all names containing __, but _foo is a perfectly valid identifier -- with the caveat that the C++ implementation might use and define that name globally, in which case the name should be qualified somehow (this-> [although it's possible names resolve to member in preference to globally in class methods, don't actually know if this happens], obj., and so on) to be fully safe.  I lack IRC context, however, so maybe I misunderstand what was claimed to be reserved.

That said, either mFoo or _foo is fine with me, and from what I understand others are indifferent to preferring mFoo anyway, so this clarification doesn't seem to matter much regarding what we choose.
Comment on attachment 383043 [details] [diff] [review]
v.4

>+template <class T>
>+inline void
>+JSVector<T>::growBy(size_t amount) {

This cannot be void. SpiderMonkey does not use exceptions, so new can return null and the code must check for it and return false if so. In addition, the class should be named JSTempVector or something like that as it uses malloc/free via the default new/delete and not JS_malloc/JS_free. This means, in particular, that the vector cannot be stored inside structures that will be released only during the GC.
> In addition, the
> class should be named JSTempVector or something like that as it uses
> malloc/free via the default new/delete and not JS_malloc/JS_free. This means,
> in particular, that the vector cannot be stored inside structures that will be
> released only during the GC.

Ignore that part of the comments - the class does uses JS_malloc to allocate the buffer. Still, given its usage as a temporary buffer that is explicitly freed when toString finishes, not when the garbage collector run, it should use just the standard new/delete with an explicit null check.
(In reply to comment #40)
> Not sure it would help code sharing here, but dmandelin has been making good
> use of templates taking traits structs with static methods; grep Traits
> jstracer.cpp.

Sounds great.  Initially, I was worried that if I did that I would come off as "one of those new yuppie C++ kids who insist on making everything a template parameter" (a la Alexandrescu's 10 bajillion parameter smart_ptr).
(In reply to comment #44)
> Ignore that part of the comments - the class does uses JS_malloc to allocate
> the buffer. Still, given its usage as a temporary buffer that is explicitly
> freed when toString finishes, not when the garbage collector run, it should use
> just the standard new/delete with an explicit null check.

The decision to elide null-checks is based on the assumption of the 'infallible-malloc' graydon referred to in comment 37.  (This is also documented in a comment at the top of the JSVector definition.)
(In reply to comment #46)
> The decision to elide null-checks is based on the assumption of the
> 'infallible-malloc' graydon referred to in comment 37.

malloc cannot be infallible with a script-controlled code that runs in the same process as the rest of the browser. What happens with 32-bit js shell when it runs the following code with the patch applied?

var s = Array(1e6).join();
var a = [];
for (var i = 0; i != 2000; ++i)
    s.push(s);
s.toString();

?
(In reply to comment #47)
> var s = Array(1e6).join();
> var a = [];
> for (var i = 0; i != 2000; ++i)
>     s.push(s);
> s.toString();

The example should be:

var s = Array(1e6).join();
var a = [];
for (var i = 0; i != 2000; ++i)
    a.push(s);
a.toString();
We try to avoid mixed style code that resembles Rome after it has been sacked (cough, nanojit).

How much more important it is to mix substance. We should not start pushing "OOM is fatal" without a conversation with embedders. And we need peers to agree and have a plan that shares the work.

Graydon clearly called for this larger agreement and plan, so citing his comment does not support the patch as is!

/be
Er, obviously I meant "how much more important is it *not* to mix [incompatible] substance."

/be
(In reply to comment #49)
> We should not start pushing
> "OOM is fatal" without a conversation with embedders. And we need peers to
> agree and have a plan that shares the work.

There is no even working infallible malloc implementation for the browser so we do not yet know if such malloc can be implemented with, for example, untrusted scripts on the stack. I.e. what to do if a page contains "for (a = [];;) a.push(1)"? Just abort the browser?

I do not see that we are going to have an answer to the above question any time soon. If it is not desirable to postpone toString optimizations until we have the answer, then the patch must deal with malloc returning null properly.
(In reply to comment #49)
> 
> How much more important it is to mix substance. We should not start pushing
> "OOM is fatal" without a conversation with embedders. And we need peers to
> agree and have a plan that shares the work.

I thought there was agreement to address it. Why continue to talk about talking about it? :)
(In reply to comment #52)
> (In reply to comment #49)
> > 
> > How much more important it is to mix substance. We should not start pushing
> > "OOM is fatal" without a conversation with embedders. And we need peers to
> > agree and have a plan that shares the work.
> 
> I thought there was agreement to address it. Why continue to talk about talking
> about it? :)

The newsgroup is the place. Be sure to address comment 51. Just pushing changes into code without agreeement is bad business. The onus is on the pushers to first explain how we avoid aborting on trivial DOS or indistinguishable web developer mistake.

Any quota system like we have for interpreter stack and other resources amounts to unwinding from OOM, so there's really no "out" here. We have to unwind-protect and recover from such script quota violations.

OTOH if all desktop OSes fail malloc only after the OS has thrashed everything to hell and gunned down random processes (or left the user trying to force quit whatever seems guilty), then *maybe* -- but that is not for sure even on the fat desktops, and it's certainly not a given for mobile.

Some of the OOM -> abort thinking has been focused on certain C++ code where we control the workload, and the content can't directly induce allocations beyond some built-in limit or other (screen dimensions x color depth x some reasonable small factor). But there's no limit on JS.

/be
In particular, asking for way too much memory than could ever be allocated will never thrash to death -- you'll get a clean, fast null from malloc. That's the point Igor is making in comment 51. That needs to be recovered from. Trying to guess when malloc would so fail is a waste of space, compared to letting it fail. Therefore there is no gain to trying to abort only some malloc failures (the ones that would cause thrash-to-death).

The real alternative here is exceptions (+ RTTI, + RAII helpers religiously for unwind protection). A big static analysis Everest, whose foothills we are now climbing.

/be
> Graydon clearly called for this larger agreement and plan, so citing his
> comment does not support the patch as is!

I'm happy to add OOM checks to the patch; I was just pointing out that the omission wasn't accidental and the patch was contingent on the consensus mentioned in the comment.  This consensus appears not to be as eminent as I initially thought, or eminent at all, so to move things along for this bug, I'll add the checks.
(In reply to comment #53)
> 
> The newsgroup is the place. Be sure to address comment 51. 

So it's up to me have to have the allocation discussion? I'm fine with that, but I doubt I'm the best person.

> Just pushing changes into code without agreeement is bad business. 

It's a patch in bugzilla. Don't call it "just pushing changes into code".

- Rob
I should add that other javascript engines aren't built with exceptions, don't seem to check each and every allocation, but are nevertheless used in a variety of embedding situations. What makes us so different?
(In reply to comment #56)
> (In reply to comment #53)
> > 
> > The newsgroup is the place. Be sure to address comment 51. 
> 
> So it's up to me have to have the allocation discussion? I'm fine with that,
> but I doubt I'm the best person.

You or whoever else is advocating the change, whatever it is. If you don't know what you're advocating, then figure that out first.

> > Just pushing changes into code without agreeement is bad business. 
> 
> It's a patch in bugzilla. Don't call it "just pushing changes into code".

Graydon already wrote "I don't mean to start the flame war here except, er, well, I kinda have to in order to sign off on this. We're going further down the infallilble-malloc route. We need to Decide This Issue before you commit this code."

Now you're circling back to the "it's just a patch", ignoring what he wrote. It'll keep getting r- if you keep this up.

(In reply to comment #57)
> I should add that other javascript engines aren't built with exceptions, don't
> seem to check each and every allocation, but are nevertheless used in a variety
> of embedding situations.

They have less market share and don't test OOM corners?

WebKit/JavaScriptCore/wtf/FastMalloc.h defines a CRASH macro that forces an unexploitable (I hope) segv. Indeed, try entering this into Safari 4's address bar:

javascript:var s=Array(1e6).join();var a=[];for(var i=0;i!=2000;++i)a.push(s);a.toString();

and see what happens. I didn't try this on my iPhone. Perhaps CRASH (which is #define'ed only if not defined already) is redefined there to something more productive.

> What makes us so different?

To use your style of argument, the answer is nothing -- so let's switch to V8 or Nitro. File a bug, patch it. Have fun being on a competitor's treadmill.

To use my own style of argument, you bloody well know what makes us different: we have tested for OOM and unwound, for more than a decade (infrequent bugs, and nanojit and jstracer calculated lossage since Firefox 3 notwithstanding). Gavin Reaney at Picsel, among others, have tested this with malloc failure injection. We've discussed this many times. So are you asking another question, such as "why is it worth trying to unwind from OOM, instead of just crashing like some competing engines do?"

If so, ask that question straight up instead of playing dumb, but first please notice that I already addressed it in comment 53 and comment 54. We have other kinds of quota checks that require unwind code, and errors as exceptions also require that in general, so unless we're talking "leaf code" in the control flow graph that knows it is calling malloc (new, other equivalents), we can't dodge the obligation to propagate error status and unwind-protect correctly.

Since we can't dodge this general obligation, the win of crashing if malloc returns null is limited to the leaf code. And even there, it's stupid to die if the request can never be fulfilled, as I noted in comment 54 -- yet also dumb to try to guess or test-and-patch for that request-size vs. process-size limit vs. OS policy vs. user rlimits vs. etc.

Now, this is the third time the case for malloc null checking has been made, and I am done repeating it here (it has been made in other bugs too). Respond to the argument in detail, in the newsgroup (with a link here).

/be
(In reply to comment #58)
> 
> You or whoever else is advocating the change, whatever it is. If you don't know
> what you're advocating, then figure that out first.

I am not sure why sentences like this are called for.

> 
> > > Just pushing changes into code without agreeement is bad business. 
> > 
> > It's a patch in bugzilla. Don't call it "just pushing changes into code".
> 
> Graydon already wrote "I don't mean to start the flame war here except, er,
> well, I kinda have to in order to sign off on this. We're going further down
> the infallilble-malloc route. We need to Decide This Issue before you commit
> this code."
> 
> Now you're circling back to the "it's just a patch", ignoring what he wrote.
> It'll keep getting r- if you keep this up.

Luke asked "what should I do if malloc fails in JSVector", and I answered "I think we're going to change that, don't check for now". So, the bug would have a dependency if we were going to change things. If we're not, the patch needs work.

Of course, I must have been mistaken about which code should change its policy on checking for malloc failures.

> Now, this is the third time the case for malloc null checking has been made,
> and I am done repeating it here (it has been made in other bugs too). Respond
> to the argument in detail, in the newsgroup (with a link here).

No thanks. I thought we actually wanted to change this policy, since the complexity costs are pretty high (at least, that's what I thought). If there is no desire from you guys, I have no desire either.
(In reply to comment #59)
> (In reply to comment #58)
> > 
> > You or whoever else is advocating the change, whatever it is. If you don't know
> > what you're advocating, then figure that out first.
> 
> I am not sure why sentences like this are called for.

Because I think you are not arguing fair and square in this bug. Did I sound annoyed? Good!

> So, the bug would have
> a dependency if we were going to change things. If we're not, the patch needs
> work.

That's the best case. Worst case, it becomes a forgotten to-do item, or (I've seen it, no point pretending we're all angels) it gets neglected "kind of" on purpose until poor planning becomes someone else's crisis.

> Of course, I must have been mistaken about which code should change its policy
> on checking for malloc failures.

I don't know if there's a single answer. This is a technical issue. If there is a leaf-ish function (not a pure leaf, the function or method calls malloc, of course!) that can't otherwise fail, then there is a policy choice.

If the function might run into a JS error-as-exception, then there's not much win in failing to JS_ReportOutOfMemory and propagate -- a little win in code size and complexity, at an incommensurate set of costs to webdevs and users, maybe (see the post I link to below for an idea that might help).

> > Now, this is the third time the case for malloc null checking has been made,
> > and I am done repeating it here (it has been made in other bugs too). Respond
> > to the argument in detail, in the newsgroup (with a link here).
> 
> No thanks. I thought we actually wanted to change this policy, since the
> complexity costs are pretty high (at least, that's what I thought). If there is
> no desire from you guys, I have no desire either.

What is this, Oprah? It's not about "desire" or "feelings". Either we have a technical case for preferring to crash on OOM (breaking compatibility with our long-standing OOM policy is certainly possible), or we don't have a technical case.

I don't think you have one yet, certainly not by citing Safari or Chrome (note that I didn't test v8 yet).

The trade-offs could be hard to make, but the costs and benefits are not only subjective -- and they in any event do not depend only on "our" (developers of the core code) desires. Web devs and even users will care, due to unintended as well as intended DOSes, off-by-N bugs in our code that we ship, bad luck under a blue moon on a particular OS, etc.

I posted to the newsgroup about this; I hope bugzilla doesn't break this URL by wrapping it:

http://groups.google.com/group/mozilla.dev.tech.js-engine/browse_frm/thread/05261dea181f0561#

Argh, twitter makes me want to shorten for you all:

http://bit.ly/JUy5L

/be
(In reply to comment #59)
> Luke asked "what should I do if malloc fails in JSVector", and I answered "I
> think we're going to change that, don't check for now".

This is what I do not like. If we decide that crashing on OOM is OK, then it should be done in one big patch applied to the whole code. It is rather harmful to start doing it method-method. This way we immediately loose all the advantages of the promise that we handle OOM and gain no advantages in reducing the overall code complexity.
Attached patch v.5 (obsolete) — Splinter Review
In response to comments:
 - JSVector has an Allocator policy with a default to use JS_malloc
 - OOM checks

Also, found bugs while sipping tea in a quiet room.
Attachment #383043 - Attachment is obsolete: true
Attachment #383318 - Flags: review?(jwalden+bmo)
Attachment #383043 - Flags: review?(jwalden+bmo)
(In reply to comment #62)
> Created an attachment (id=383318) [details]
> v.5

It is important to get a confirmation that 

for (T *dst = newbeg, *src = mBeg; src != mEnd; ++dst, ++src)
        new(dst) T(*src);

will translates into nothing in an optimized shell. Can you verify this? Otherwise the patch should use realloc and state that T must have no default or copy constructor nor destructor with a static verifier for this.

Also use bool/true/false evrywhere, not JSBool true/false.
> It is important to get a confirmation that 
> 
> for (T *dst = newbeg, *src = mBeg; src != mEnd; ++dst, ++src)
>         new(dst) T(*src);
> 
> will translates into nothing in an optimized shell. Can you verify this?
> Otherwise the patch should use realloc and state that T must have no default or
> copy constructor nor destructor with a static verifier for this.

I've been thinking about this issue.  I made JSVector like an STL vector in that it does the right thing if T is a string or other type that depends on a constructor/destructor.  Of course with the simplistic implementation I've used, this is less than optimal for POD types.  In most cases, the compiler discards noop destructor calls (I verified the assembly produce by gcc -O3 for destroyAll).  However, as you have observed, the specification of realloc() prevents its use when growing the vector.  To not lose performance, there are three options I can see:

1. the one you suggest, which is taking away the "safe for non-PODs" property of JSVector and then using realloc, memcpy, etc.
2. extending realloc() slightly with a flag that says "if you can't realloc in-place, return null."  This would allow a safe and efficient implementation.  Talking about this on #jsapi the other day, cjones pointed out that this requires a single 'else' clause in one function in jemalloc.
3. make a traits class 'pod<T>' which can be used to statically specialize JSVector implementations to use all the normal POD techniques.  We specialize pod<T> for all the PODs we care about (like jschar).

For the sake of generality, I am disclined toward #1, but OTOH, its possible no non-PODs would ever grace a JSVector with their membership.  What do you think?
(In reply to comment #64)
> 2. extending realloc() slightly with a flag that says "if you can't realloc
> in-place, return null."  This would allow a safe and efficient implementation. 

We do not control realloc in many embeddings. Correct me if I wrong, but on Mac FF is build with the system malloc implementation, not with jemalloc.

> For the sake of generality, I am disclined toward #1, but OTOH, its possible no
> non-PODs would ever grace a JSVector with their membership.  What do you think?

I do no know. Right now this is not the case, but if, for example, we would consider reference-counting strings, then this could be reconsidered. In any case, as long as we can assert with a static analysis that a type is safe against using realloc/memcopy for its array, we should be safe.
Attachment #383318 - Flags: review?(jwalden+bmo) → review-
Comment on attachment 383318 [details] [diff] [review]
v.5

Have fun updating the JSString macros to the new method names, also fixing value types to size_t and such from int, as a general note.


>diff -r 6b5508397a29 js/src/jsarray.cpp

>+/* Transfer ownership of buffer to returned string. */
>+BufferToString(JSContext *cx, JSVector<jschar> &buf, jsval *rval)

Maybe it's the C++ way, but something sits oddly with me about the implicit assumption that the defined-default allocator is compatible with JS_free later in this method.  I don't immediately see a way around this, but maybe it's just me, or maybe someone else can point out a better way to do this.


>+    size_t charlen = buf.size() - 1;

Kind of like length better here.


> #if JS_HAS_TOSOURCE
> static JSBool
> array_toSource(JSContext *cx, uintN argc, jsval *vp)
> {
>+    JSBool ok = JS_TRUE;

We defer declarations and assignments as long as possible, usually, now that we can in C++.  I believe in this instance, however, that it will be generally unobjectionable that you should define before before the "After this point" comment rather than deferring to after it.


>+    JS_CHECK_RECURSION(cx, return JS_FALSE);
>+
>+    JSObject *obj = JS_THIS_OBJECT(cx, vp);

Existing code's wrong, should null-check obj here, cf. bug 498543.  General problem to be fixed there, of course, but your code can do it right from the start, to be sure.


>+    JSBool initiallySharp = IS_SHARP(he) ? JS_TRUE : JS_FALSE;
>+    /* After this point, all paths exit through the 'done' label. */

There are macros which will invoke static checking magic to verify this, search for MUST_FLOW_LABEL and MUST_FLOW_THROUGH and use them here.  Also, you need a blank line before a comment like this.


>+    JSVector<jschar> buf(cx);
>+    if (!buf.reserve(3))
>+        goto done;

Need to assign to ok here.


>+#else
>+    if (IS_SHARP(he)) {
>+        jschar arr[] = { '[', ']', 0 };

Should be static const.


>+        /* Get element's character string. */
>+        JSString *str;
>+        if (hole) {
>+            str = cx->runtime->emptyString;
>+        } else {
>+            str = js_ValueToSource(cx, *vp);
>+            if (!str) {
>+                ok = JS_FALSE;
>+                goto done;
>+            }

Please add a *vp = STRING_TO_JSVAL(str) here, just to be safe; relying on weak roots is asking for trouble.  What's here should technically be safe since pushBack shouldn't GC, but I'd rather not even have to think about that when reading the code.


>+        if (index + 1 != length) {
>+            if (!(ok = buf.pushBack(',')) || !(ok = buf.pushBack(' ')))
>+                goto done;
>+        }
>+        else if (hole)
>+            if (!(ok = buf.pushBack(',')))
>+                goto done;

Always brace either arm of an if-else if either arm is more than one line or if the condition being tested is more than one line, so brace the if (hole) arm here (the if statement inside doesn't need to be braced).


>+JSBool
>+js_InitContextBusyArrayTable(JSContext *cx) {
>+    cx->busyArrayTable = JS_NewHashTable(4, js_hash_array, JS_CompareValues,
>+                                         JS_CompareValues, NULL, NULL);
>+    return cx->busyArrayTable != 0;
>+}

Compare to NULL, not 0.


>+static JSBool
>+array_toString_sub(JSContext *cx, JSObject *obj, JSBool locale,
>+                   JSString *sepstr, jsval *rval)
>+{
>+    JSBool ok = JS_TRUE;

Defer lower as noted before.


>+    }
>+    else {

One line for all this, cuddle those elses!


>+        /* Cycle, so return empty string. */
>+        *rval = JS_GetEmptyStringValue(cx);

cx->runtime->atomState.emptyAtom or whatever the right spelling of that is.


>+        return JS_TRUE;
>+    }
>+    JSAutoTempValueRooter tvr(cx, obj);
>+    /* After this point, all paths exit through the 'done' label. */

Blank line as before.


>+
>+    /* Get characters to use for the separator. */
>+    jschar comma = ',';

static const


>+    for (jsuint index = 0; index < length; index++) {
>+        /* Use rval to locally root each element value. */
>+        JSBool hole;
>+        ok = (JS_CHECK_OPERATION_LIMIT(cx) &&
>+              GetArrayElement(cx, obj, index, &hole, rval));

Don't parenthesize this, also previously in the patch if I remember right.


>+    if (buf.empty()) {
>+        *rval = JS_GetEmptyStringValue(cx);   

emptyAtom


>+        goto done;
>+    }
>+    else {

No need for an else immediately after a goto, true more generally for return as well, so don't bother with an else at all here.


>+        if (!(ok = buf.pushBack(0)))
>+            goto done;
>+    }
>+
>+    if (!(ok = BufferToString(cx, buf, rval)))
>+        goto done;

As a general thing to fix, we generally combine sequential failing calls (assuming no special cleanups needed between them) like so:

  ok = mightFail()
       && mightFail2()
       && mightFail3();
  if (!ok)
      error;


> static JSBool
> array_toString(JSContext *cx, uintN argc, jsval *vp)
> {
>     JSObject *obj;
> 
>     obj = JS_THIS_OBJECT(cx, vp);
>     if (OBJ_GET_CLASS(cx, obj) != &js_SlowArrayClass &&
>         !JS_InstanceOf(cx, obj, &js_ArrayClass, vp + 2)) {
>         return JS_FALSE;
>     }
>-    return array_join_sub(cx, obj, TO_STRING, NULL, vp);
>+    return array_toString_sub(cx, obj, JS_FALSE, NULL, vp);
> }

Please add a null-check of obj in here for the this-case while you're fixing things here.


>diff -r 6b5508397a29 js/src/jsbool.cpp

>+/* This function is intended to implement E-262-3 section 9.8, toString. */

"This function implements"


>+JSBool
>+js_BooleanToStringBuffer(JSContext *cx, JSBool b, JSVector<jschar> &buf) 
>+{
>+    JSAtom *atom = cx->runtime->atomState.booleanAtoms[b ? 1 : 0];
>+    JSString *str = ATOM_TO_STRING(atom);
>+    const jschar *chars = JS_GetStringChars(str);
>+    return buf.pushBack(chars, chars + JS_GetStringLength(str));

Use static const jschar arrays and JS_ARRAY_LENGTH to do this, no need to drag in atoms or JSAPI calls to do this.


>diff -r 6b5508397a29 js/src/jsnum.cpp

>+JSBool JS_FASTCALL
>+js_IntDoubleToStringBuffer(JSContext *cx, jsval v, JSVector<jschar> &buf) {
>+    /* Convert to C-string. */
>+    static const size_t arrSize = DTOSTR_STANDARD_BUFFER_SIZE;
>+    char arr[arrSize];
>+    const char *cstr;
>+    if (JSVAL_IS_INT(v))
>+        cstr = IntToCString(JSVAL_TO_INT(v), 10, arr, arrSize);
>+    else {

Brace the if, because the else is multi-line.


>+        JS_ASSERT(JSVAL_IS_DOUBLE(v));
>+        cstr = JS_dtostr(arr, arrSize, DTOSTR_STANDARD, 0, *JSVAL_TO_DOUBLE(v));
>+    }

Need to check for failure to create |cstr|.


>diff -r 6b5508397a29 js/src/jsprvtd.h

>+/* Template forward declarations. */
>+struct JS_mallocator;
>+template <class, class = JS_mallocator> class JSVector;

I personally am not a fan of nameless arguments in parameter lists.  I'd rather see a canonical "class T" and "class Allocator" here than complete anonymity (even weirder with the default allocator class in the template).


>diff -r 6b5508397a29 js/src/jsstr.cpp

>+static inline JSBool 
>+pushAtom(JSAtom *atom, JSVector<jschar> &buf) {
>+    JSString *str = ATOM_TO_STRING(atom);
>+    const jschar *chars = JS_GetStringChars(str);
>+    return buf.pushBack(chars, chars + JS_GetStringLength(str));
>+}

Use JSString::getCharsAndLength here instead of JSAPI calls, which have extra overhead for any number of different reasons.


>+/* This function is intended to implement E-262-3 section 9.8, toString. */
>+JS_FRIEND_API(JSBool)
>+js_ValueToStringBuffer(JSContext *cx, jsval v, JSVector<jschar> &buf)
>+{
>+    if (JSVAL_IS_OBJECT(v)) {
>+        JSObject *obj = JSVAL_TO_OBJECT(v);
>+        if (!obj)
>+            return pushAtom(cx->runtime->atomState.nullAtom, buf);
>+        if (!OBJ_DEFAULT_VALUE(cx, obj, JSTYPE_STRING, &v))
>+            return JS_FALSE;
>+    }
>+    if (JSVAL_IS_STRING(v)) {
>+        JSString *str = JSVAL_TO_STRING(v);
>+        const jschar *chars = JS_GetStringChars(str);
>+        return buf.pushBack(chars, chars + JS_GetStringLength(str));

JSString::getCharsAndLength

>+    } else if (JSVAL_IS_INT(v) || JSVAL_IS_DOUBLE(v)) {
>+        return js_IntDoubleToStringBuffer(cx, v, buf);
>+    } else if (JSVAL_IS_BOOLEAN(v)) {
>+        return js_BooleanToStringBuffer(cx, JSVAL_TO_BOOLEAN(v), buf);
>+    } else {
>+        return pushAtom(cx->runtime->atomState.typeAtoms[JSTYPE_VOID], buf);
>+    }
>+}

Oh hey, pre-existing bug:

print({toString: function() { return null; } });

Please fix here and in the other places you mentioned seeing this pattern, and dump some testcase examples that would fail with the current code (for each case you fix) in a comment so they'll get swept up in bclary's in-testsuite watch.


>diff -r 6b5508397a29 js/src/jsstr.h

> /*
>+ * This function is intended to implement E-262-3 section 9.8, toString.  
>+ * Convert the given value to a string of jschars appended to the given buffer.  
>+ * On error, the passed buffer may have partial results appended.
>+ */

Just "implements", again.


>diff -r 6b5508397a29 js/src/jsvector.h

>+ * N.B: JSVector is not reentrent: T member functions called during JSVector
>+ *      member functions must not call back into the same JSVector.
>+ */

reentrant

On that note, add a bool inProgress member, and add a private class within JSVector whose constructor takes a bool&, asserts it to be false, then sets it to true and whose destructor sets it to false.  Then sprinkle instances of that class at the top of every method in JSVector, to tell us if reentry ever occurs.


>+template <class T, class Allocator>
>+class JSVector
>+{
>+  public:
>+    JSVector(JSContext *cx) : mCx(cx), mBeg(0), mEnd(0), mCap(0) {}

mStart, mEnd, mCapacity seem like better names to me.


>+    /*
>+     * Transfer ownership of an array of objects into the JSVector.
>+     * N.B. This call assumes that there are not uninitialized elements in the
>+     *      passed array.
>+     */

no uninitialized
One nit on Waldo's fine review:

  ok = mightFail()
       && mightFail2()
       && mightFail3();
  if (!ok)
      error;

House style puts && and || at the end, justifies all operands to start in the same column.

The old bug

js> print({toString: function() { return null; } });
undefined

comes from 1995! Easy to fix with !JSVAL_IS_PRIMITIVE test and OBJ_DEFAULT_VALUE first, then JSVAL_IS_NULL along with the other tests.

/be
Attached patch specialization for PODs (obsolete) — Splinter Review
Before getting into Waldo's review, I wanted to post the experiment I did on  optimizing JSVector for PODs.  I used the traits/specialization technique I mentioned above by making a trait IsPodType<T> and a specialized template JSVectorImpl which contains inlined functions called by JSVector.  From testing (on gcc -O3) there is no overhead for the specialization compared to writing the POD code in JSVector directly.  Templates + inlining make the two equivalent for PODs.

The interesting thing I found is that there is *no* performance gain.  You can see it by running this patch on the peacekeeper benchmark attached to the bug with/without the line:

template <> struct IsPodType<jschar> { static const bool result = true; };

I also made a standalone perf test that does a bunch of pushBacks in a loop (both versions) and got the same results.

Interestingly, using memcpy/memset for POD operations actually made the peacemaker benchmark run slower than the manual 'for' loop.  I believe this is because the copies tend to be short and memset/memcall requires a function call.

So, unless I'm doing something dim in my POD implementation (always a possibility), I don't see any benefit for dropping the "safe for non-POD" property or adding all the extra code for the specialization approach used in this patch.
Attached patch v.6 (obsolete) — Splinter Review
Thanks, good finds!  Hopefully this patch addresses all your comments; although I had a few points to make below.  For now, the POD-specialization code is in the patch until the relevant discussion concludes.

> >+/* Transfer ownership of buffer to returned string. */
> >+BufferToString(JSContext *cx, JSVector<jschar> &buf, jsval *rval)
> 
> Maybe it's the C++ way, but something sits oddly with me about the implicit
> assumption that the defined-default allocator is compatible with JS_free later
> in this method.  I don't immediately see a way around this, but maybe it's just
> me, or maybe someone else can point out a better way to do this.

It's not the C++ way; I can't recall anywhere where containers with allocators give up their internal buffers.  Perhaps for this reason.  To be explicit, the more verbose JSVector<jschar, JS_mallocator> template-id could be used.  What I'd rather do is define the default allocator in JSVector's comment so its not so implicit.

> cuddle those elses!

 else
o-V-<

> >+/* Template forward declarations. */
> >+struct JS_mallocator;
> >+template <class, class = JS_mallocator> class JSVector;
> 
> I personally am not a fan of nameless arguments in parameter lists.

I generally agree, I just thought this was a utility file no one read.  Will change.

> Use JSString::getCharsAndLength here instead of JSAPI calls, which have extra
> overhead for any number of different reasons.

I don't see JSString having any member functions; I'll use JSSTRING_CHARS_AND_LENGTH.

Neat! 10% faster on my Array.join benchmark!

> Please fix here and in the other places you mentioned seeing this pattern, and
> dump some testcase examples that would fail with the current code

Will do.

> >+template <class T, class Allocator>
> >+class JSVector
> >+{
> >+  public:
> >+    JSVector(JSContext *cx) : mCx(cx), mBeg(0), mEnd(0), mCap(0) {}
> 
> mStart, mEnd, mCapacity seem like better names to me.

I chose begin/end because they are established terminology in the STL world and this is an STL-like container.  I shortened mCapacity to mCap because its nice when they are all 4 letters :).  I'll leave it the same for now unless you are someone strongly objects.

Hopefully I'll internalize the JS style soon so subsequent reviews are shorter :)
Attachment #383318 - Attachment is obsolete: true
Attachment #383548 - Attachment is obsolete: true
Attachment #383588 - Flags: review?(jwalden+bmo)
Blocks: 498797
Since we're evolving new style, we have the chance to adopt more standard style. In this light, naming conventions from STL could be helpful.

But then hacking, e.g. begin to [b]eg (in mBeg), on a Procrustean bed two chars shorter than any I faced in the '80s due to 6bit chars in a 36 bit word (!), goes against the "follow STL" premise.

The mFoo convention is yet another influence. Eclecticism is good up to a point, past which it collapses into Modernist muddle or PoMo satire. In this case, we can try to follow STL in naming, and Mozilla C++ in member prefixing, but I do not see a coherent third influence that can mix nicely (_Tertium non datur_).

So to keep the STL parallel and follow the mFoo convention without adding a third, cryptically short style, use mBegin, mEnd, and mCapacity.

/be
It is certainly hard to argue with an argument so rich in color and history.

To wit, and confounding my previous argument, GCC's STL implementation actually uses 'start', 'finish', and 'end_of_storage' (appropriately prefixed with chaos) to name their private data.
Is there no STL standard? Or are begin, end, capacity standard method names and the GCC private (so non-standard) data members are start, finish, end_of_storage?

What's their prefix chaos look like, I'm curious?

/be
It's the public interface that is standardized.  Being private, member variable names are arbitrary (which is why I originally took the liberty of making mine terse).

"start" --> "this->_M_impl._M_start"
Wow, underscores (two) *and* capital-M -- an RSI inducer for sure. Stallman's was so bad he had grad-student slaves typing for him as he peppered them with Emacs commands and text input at high velocity. No thanks :-P.

/be
Attachment #383588 - Flags: review?(jwalden+bmo) → review-
Comment on attachment 383588 [details] [diff] [review]
v.6

Regarding getCharsAndLength, you need to rebase your changes against the latest tree, because that was added only a couple days ago, and you're going to find out you're bitrotted when you create the final version for someone to push.


>diff -r 6b5508397a29 js/src/jsarray.cpp

>+    /* Cycles/joins are indicated by sharp objects. */
>+#if JS_HAS_SHARP_VARS
>+    if (IS_SHARP(he)) {
>+        JS_ASSERT(sharpchars != 0);
>+        /* +1 to include the trailing '\0' */
>+        buf.replaceRawBuffer(sharpchars, js_strlen(sharpchars) + 1);
>+        goto make_string;
>+    }
>+    else if (sharpchars) {
>+        MAKE_SHARP(he);
>+        buf.replaceRawBuffer(sharpchars, js_strlen(sharpchars));
>+    }

} else if (sharpchars) {

Double-check the rest of the patch, if I missed this one the first time around it's quite possible I missed others.


>+        /* Append element to buffer. */
>+        if (!(ok = buf.pushBack(chars, chars + charlen)))
>+            goto done;
>+        if (index + 1 != length) {
>+            if (!(ok = buf.pushBack(',')) || !(ok = buf.pushBack(' ')))
>+                goto done;
>+        }
>+        else if (hole) {

} else if (hole) {


>+            if (locale) {
>+                JSObject *robj;
>+
>+                JSAtom *atom = cx->runtime->atomState.toLocaleStringAtom;
>+                ok = js_ValueToObject(cx, *rval, &robj);
>+                if (ok) {
>+                    /* Re-use *rval to protect robj temporarily. */
>+                    *rval = OBJECT_TO_JSVAL(robj);
>+                    ok = js_TryMethod(cx, robj, atom, 0, NULL, rval);
>+                }
>+                if (!ok)
>+                    goto done;
>+                ok = js_ValueToStringBuffer(cx, *rval, buf);
>+                if (!ok)
>+                    goto done;
>+            } else {
>+                ok = js_ValueToStringBuffer(cx, *rval, buf);
>+                if (!ok)
>+                    goto done;
>+            }

The last three lines of both if- and else-arms are the same, so they can be moved out of it entirely and the else-arm can be removed.  Add a blank line before the moved lines when you do this.


>+        /* Append the separator. */
>+        if (index + 1 != length)
>+            if (!(ok = buf.pushBack(sep, sep + seplen)))
>+                goto done;

Unbraced multi-line if!


>+    }
>+
>+    /* Finalize the buffer. */
>+    if (buf.empty()) {
>+        *rval = ATOM_KEY(cx->runtime->atomState.emptyAtom);
>+        goto done;
>+    } 
>+    ok = buf.pushBack(0) &&
>+         BufferToString(cx, buf, rval);

Generally put an empty line after blocks, so before the |ok = ...| line here.


> static JSBool
> array_toString(JSContext *cx, uintN argc, jsval *vp)
> {
>     JSObject *obj;
> 
>     obj = JS_THIS_OBJECT(cx, vp);
>-    if (OBJ_GET_CLASS(cx, obj) != &js_SlowArrayClass &&
>-        !JS_InstanceOf(cx, obj, &js_ArrayClass, vp + 2)) {
>+    if (!obj ||
>+        (OBJ_GET_CLASS(cx, obj) != &js_SlowArrayClass &&
>+         !JS_InstanceOf(cx, obj, &js_ArrayClass, vp + 2)))
>         return JS_FALSE;
>-    }
>-    return array_join_sub(cx, obj, TO_STRING, NULL, vp);
>+
>+    return array_toString_sub(cx, obj, JS_FALSE, NULL, vp);
> }

You killed the braces around the if!


>-    if (OBJ_GET_CLASS(cx, obj) != &js_SlowArrayClass &&
>-        !JS_InstanceOf(cx, obj, &js_ArrayClass, vp + 2)) {
>+    if (!obj ||
>+        (OBJ_GET_CLASS(cx, obj) != &js_SlowArrayClass &&
>+         !JS_InstanceOf(cx, obj, &js_ArrayClass, vp + 2)))
>         return JS_FALSE;
>-    }

Again!


>diff -r 6b5508397a29 js/src/jsbool.cpp

>+/* This function implements E-262-3 section 9.8, toString. */
>+JSBool
>+js_BooleanToStringBuffer(JSContext *cx, JSBool b, JSVector<jschar> &buf) 
>+{
>+    static const jschar trueChars[] = { 't', 'r', 'u', 'e' },
>+                        falseChars[] = { 'f', 'a', 'l', 's', 'e' };
>+    return b ? buf.pushBack(trueChars, trueChars + JS_ARRAY_LENGTH(trueChars))
>+             : buf.pushBack(falseChars, falseChars + JS_ARRAY_LENGTH(falseChars));
>+}

This would be more readable, I think, as:

    const jschar* chars = b ? trueChars : falseChars;
    return buf.pushBack(chars,
                        chars + (b ? JS_ARRAY_LENGTH(trueChars) : JS_ARRAY_LENGTH(falseChars)));


>diff -r 6b5508397a29 js/src/jsstr.cpp

>+    if (!JSVAL_IS_PRIMITIVE(*vp) &&
>+        !OBJ_DEFAULT_VALUE(cx, JSVAL_TO_OBJECT(*vp), JSTYPE_STRING, vp))
>+        return NULL;

Braces!


>+    if (JSVAL_IS_STRING(*vp)) {
>         return JSVAL_TO_STRING(*vp);
>-    if (JSVAL_IS_INT(*vp)) {
>+    } else if (JSVAL_IS_INT(*vp)) {

This is an else after a return, so the else is unnecessary.  That makes the previous if a single-liner which needs no bracing.


>@@ -2946,40 +2945,72 @@ js_ValueToPrintable(JSContext *cx, jsval

>+    if (!JSVAL_IS_PRIMITIVE(v) &&
>+        !OBJ_DEFAULT_VALUE(cx, JSVAL_TO_OBJECT(v), JSTYPE_STRING, &v))
>+        return NULL;

Braces!


>+/* This function implements E-262-3 section 9.8, toString. */
>+JS_FRIEND_API(JSBool)
>+js_ValueToStringBuffer(JSContext *cx, jsval v, JSVector<jschar> &buf)
>+{
>+    if (!JSVAL_IS_PRIMITIVE(v) &&
>+        !OBJ_DEFAULT_VALUE(cx, JSVAL_TO_OBJECT(v), JSTYPE_STRING, &v))
>+        return JS_FALSE;

Braces!


>diff -r 6b5508397a29 js/src/jsvector.h

>+ * The Initial Developer of the Original Code is
>+ *   Luke Wagner <lwagner@mozilla.org>
>+ *
>+ * Contributor(s):
>+ *

Initial Developer is Mozilla Corporation, put your name under the contributors section.  People will usually put a (original author) parenthetical after the name in the contributor list to distinguish authorship, too, with or without's fine.


>+/* Allocator policy for JSVector. */
>+struct JS_mallocator {
>+    static void *alloc(JSContext *cx, size_t bytes) {
>+        void *p = JS_malloc(cx, bytes);
>+        if (!p)
>+            JS_ReportOutOfMemory(cx);
>+        return p;
>+    }
>+    static void *realloc(JSContext *cx, void *old, size_t bytes) {
>+        void *p = JS_realloc(cx, old, bytes);
>+        if (!p)
>+            JS_ReportOutOfMemory(cx);
>+        return p;
>+    }
>+    static void dealloc(JSContext *cx, void *p) {

Seems it would be most consistent to rename to malloc, realloc, and free, particularly since that's about what they wrap.  Actually -- that *should* be what the wrap, check their implementations to see that they already handle OOM-reporting, so the first two should just be |return JS_(re|m)alloc(...)| -- double-reporting rears its ugly head again.  Also a last nit: I think the expected name here would be JSAllocator as with most other structs and classes, or perhaps just Allocator under the assumption that we're starting to drop prefixes as we have with TraceRecorder.


>+/*
>+ * Traits class for identifying POD types.  Until C++0x, there is no automatic 
>+ * way to detect PODs, so for the moment it is done manually.
>+ */

Add one for jsval, too, since I suspect that'll see use pretty quickly.

>+/*
>+ * This template class provides a default implementation for vector operations 
>+ * when the element type is not known to be a POD, as judged by IsPodType.
>+ */
>+template <class T, class Alloc, bool Is_pod>

IsPod would be better style, at least to not mix underscores and not-quite-CamelCaps.

>+struct JSVectorImpl 

>+    static inline void destroy(T *beg, T *end) {

>+    static inline void initialize(T *beg, T *end) {

>+    static inline void initialize(T *beg, T *end) {

We can spare an extra fourteen characters for "begin".  :-P


>+/*
>+ * JS-friendly STL-like container.  JSVector calls the constructors/destructors 
>+ * of all elements stored in its internal buffer, so non-PODs may be safely 
>+ * used.  The Alloc supplies static members for acquiring and releasing memory.  
>+ * The default allocator is JS_mallocator.
>+ *
>+ * T requirements:
>+ *  - has member functions:
>+ *      default constructor, copy constructor, destructor
>+ *  - none of these operations throw
>+ *
>+ * Alloc requirements:
>+ *  - has static functions:
>+ *      void *alloc(JSContext *, size_t);
>+ *      void *realloc(JSContext *, void *, size_t);
>+ *      void dealloc(JSContext *, void *);
>+ *  - none of these operations throw
>+ *
>+ * N.B: JSVector is not reentrent: T member functions called during JSVector
>+ *      member functions must not call back into the same JSVector.
>+ */
>+template <class T, class Alloc>
>+class JSVector
>+{
>+#ifdef DEBUG
>+    bool mInProgress;
>+
>+    struct ReentrancyGuard {
>+        bool &mInProgress;
>+        ReentrancyGuard(bool &b)
>+          : mInProgress(b)
>+        {
>+            JS_ASSERT(!mInProgress);
>+            mInProgress = true; 
>+        }
>+        ~ReentrancyGuard() { mInProgress = false; }
>+    };
>+
>+#  define DEBUG_IN_PROGRESS() ReentrancyGuard reentrancyGuard(mInProgress)
>+#  define ASSERT_NOT_IN_PROGRESS() JS_ASSERT(!mInProgress);
>+#else
>+#  define DEBUG_IN_PROGRESS() ((void)0)
>+#  define ASSERT_NOT_IN_PROGRESS() ((void)0)
>+#endif

I would rather see this explicit than hidden behind macros; #ifdefs for the bodies of the methods should be sufficient to eliminate overhead.  JS_ASSERT(!mInProgress) is sufficient for the not-in-progress assert since the macro expands to nothing if !DEBUG.  Also, I'd pass in |this| rather than the bool member, sorry for not saying what I really meant.


>+  public:
>+    JSVector(JSContext *cx)
>+      : mCx(cx), mBeg(0), mEnd(0), mCap(0)

I still want full names here without omitting a handful of extra characters.


>+    T &operator[](int i)             { ASSERT_NOT_IN_PROGRESS(); JS_ASSERT(i < size()); return mBeg[i]; }
>+    const T &operator[](int i) const { ASSERT_NOT_IN_PROGRESS(); JS_ASSERT(i < size()); return mBeg[i]; }
>+
>+    T *begin()              { ASSERT_NOT_IN_PROGRESS(); return mBeg; }
>+    const T *begin() const  { ASSERT_NOT_IN_PROGRESS(); return mBeg; }
>+
>+    T *end()                { ASSERT_NOT_IN_PROGRESS(); return mEnd; }
>+    const T *end() const    { ASSERT_NOT_IN_PROGRESS(); return mEnd; }
>+
>+    T &back()               { ASSERT_NOT_IN_PROGRESS(); JS_ASSERT(!empty()); return *(mEnd-1); }
>+    const T &back() const   { ASSERT_NOT_IN_PROGRESS(); JS_ASSERT(!empty()); return *(mEnd-1); }

My preference is for inlines with more than one statement-like thing to be multi-liners, which applies to all of these.  Also, spaces around the '-' in the |back| methods.


>+    template <class U> bool pushBack(const U *beg, const U *end);

More unuseful omissions!

>+    /*
>+     * Transfers ownership of the internal buffer used by JSVector to the 
>+     * caller.  After this call, the JSVector is empty.
>+     * N.B. Although a T*, only the range [0, size()) is constructed
>+     */

Should put a period after this last (full) sentence.
Attached patch v.7 (obsolete) — Splinter Review
(In reply to comment #75)
> 
> >+JSBool
> >+js_BooleanToStringBuffer(JSContext *cx, JSBool b, JSVector<jschar> &buf) 
> >+{
> >+    static const jschar trueChars[] = { 't', 'r', 'u', 'e' },
> >+                        falseChars[] = { 'f', 'a', 'l', 's', 'e' };
> >+    return b ? buf.pushBack(trueChars, trueChars + JS_ARRAY_LENGTH(trueChars))
> >+             : buf.pushBack(falseChars, falseChars + JS_ARRAY_LENGTH(falseChars));
> >+}
> 
> This would be more readable, I think, as:
> 
>     const jschar* chars = b ? trueChars : falseChars;
>     return buf.pushBack(chars,
>                         chars + (b ? JS_ARRAY_LENGTH(trueChars) :
> JS_ARRAY_LENGTH(falseChars)));

Lexically shorter, but IMHO, not more readable.  Extra branching means extra mental cycles to understand and extra CPU cycles to execute.

> >  if (JSVAL_IS_STRING(*vp)) {
> >      return JSVAL_TO_STRING(*vp);
> >  } else if (JSVAL_IS_INT(*vp)) {
> >      str = js_NumberToString(cx, JSVAL_TO_INT(*vp));
> >  } else if (JSVAL_IS_DOUBLE(*vp)) {
> >      str = js_NumberToString(cx, *JSVAL_TO_DOUBLE(*vp));
> >  } else if (JSVAL_IS_BOOLEAN(*vp)) {
> >      return ATOM_TO_STRING(cx->runtime->atomState.booleanAtoms[
> >                                JSVAL_TO_BOOLEAN(*vp)? 1 : 0]);
> >  } else if (JSVAL_IS_NULL(*vp)) {
> >      return ATOM_TO_STRING(cx->runtime->atomState.nullAtom);
> >  } else {
> >      JS_ASSERT(JSVAL_IS_VOID(*vp));
> >      return ATOM_TO_STRING(cx->runtime->atomState.typeAtoms[JSTYPE_VOID]);
> >  }
>
> This is an else after a return, so the else is unnecessary.  That makes the
> previous if a single-liner which needs no bracing.

I understand the general rule about else-following-return, but don't you think, in this case, it would obscure the simple branching pattern of the larger control structure?  With sporadic elses removed, one has to look harder to determine that this is just a simple type switch.

> Also a last nit: I think the
> expected name here would be JSAllocator as with most other structs and classes,
> or perhaps just Allocator under the assumption that we're starting to drop
> prefixes as we have with TraceRecorder.

Perhaps "Mallocator" because "allocator" is the name of a general policy?  In contrast, future allocators could be named "GCAllocator" or "ArenaAllocator".  ("MallocAllocator" would just be verbose :)

> Add one for jsval, too, since I suspect that'll see use pretty quickly.

Due to the transparency of typedefs, already covered with the int specialization.

Thanks again!
Attachment #383588 - Attachment is obsolete: true
Attachment #383839 - Flags: review?(jwalden+bmo)
(In reply to comment #76)
> Created an attachment (id=383839) [details]
> v.7

The patch does not include jsvector.h
Comment on attachment 383588 [details] [diff] [review]
v.6

>diff -r 6b5508397a29 js/src/jsvector.h
>+template <class T, class Alloc>
>+class JSVector
>+{
>+  public:
>+    JSVector(JSContext *cx)
>+      : mCx(cx), mBeg(0), mEnd(0), mCap(0)

It is rather problematic to store JSContext inside JSVector as code makes no provisions that JSContext is not destroyed before the vector. I think it would be better for this patch to have a class like JSTempVector that just calls the standard new/malloc and is supposed to be declared on the native stack or at least to be deallocated when the function that allocates it returns.

If a class like vector can be stored in GC-thing, then its destructor must not free the storage. Rather it should just assert that the storage is freed with a separated method taking explicit cx parameter to free/allocate things. The latter requires that cx to be passed to most vector methods. This shows that there could not be one template class that addresses all the needs of SpiderMonkey.
(In reply to comment #76)
> > >  if (JSVAL_IS_STRING(*vp)) {
> > >      return JSVAL_TO_STRING(*vp);
> > >  } else if (JSVAL_IS_INT(*vp)) {
> > >      str = js_NumberToString(cx, JSVAL_TO_INT(*vp));
> > >  } else if (JSVAL_IS_DOUBLE(*vp)) {
> > >      str = js_NumberToString(cx, *JSVAL_TO_DOUBLE(*vp));
> > >  } else if (JSVAL_IS_BOOLEAN(*vp)) {
> > >      return ATOM_TO_STRING(cx->runtime->atomState.booleanAtoms[
> > >                                JSVAL_TO_BOOLEAN(*vp)? 1 : 0]);
> > >  } else if (JSVAL_IS_NULL(*vp)) {
> > >      return ATOM_TO_STRING(cx->runtime->atomState.nullAtom);
> > >  } else {
> > >      JS_ASSERT(JSVAL_IS_VOID(*vp));
> > >      return ATOM_TO_STRING(cx->runtime->atomState.typeAtoms[JSTYPE_VOID]);
> > >  }
> >
> > This is an else after a return, so the else is unnecessary.  That makes the
> > previous if a single-liner which needs no bracing.
> 
> I understand the general rule about else-following-return, but don't you think,
> in this case, it would obscure the simple branching pattern of the larger
> control structure?

No. The context above is missing the common exit block which succeeds the two clauses that do not return early. Including that context shows how the above obfuscates control flow and splits (and indents) code blocks arbitrarily.

> With sporadic elses removed, one has to look harder to
> determine that this is just a simple type switch.

The above is arbitrarily putting common code after only two of N cases using else after return, when it should simply reorder tests and use the right final else clause:

    if (JSVAL_IS_STRING(*vp)) {
        return JSVAL_TO_STRING(*vp);
    } else if (JSVAL_IS_BOOLEAN(*vp)) {
        return ATOM_TO_STRING(cx->runtime->atomState.booleanAtoms[
                                  JSVAL_TO_BOOLEAN(*vp)? 1 : 0]);
    } else if (JSVAL_IS_NULL(*vp)) {
        return ATOM_TO_STRING(cx->runtime->atomState.nullAtom);
    } else if (JSVAL_IS_VOID(*vp)) {
        return ATOM_TO_STRING(cx->runtime->atomState.typeAtoms[JSTYPE_VOID]);
    } else {
        if (JSVAL_IS_INT(*vp)) {
            str = js_NumberToString(cx, JSVAL_TO_INT(*vp));
        } else {
            JS_ASSERT(JSVAL_IS_DOUBLE(*vp));
            str = js_NumberToString(cx, *JSVAL_TO_DOUBLE(*vp));
        }
        if (str)
            *vp = STRING_TO_JSVAL(str);
        return str;
    }

PGO can optimize for best order, but it might be better to put the int and double code first, at the price of a JSVAL_IS_NUMBER(*vp) test. Not sure, not really the point here (I trust PGO).

A common return str; at the end, with all the consequents setting str rather than most returning early, is arguably even better.

But in no wise is there a benefit in arbitrarily splitting control flow to make a seemingly common exit block apply only to two of N cases in the if-else tree.

> Perhaps "Mallocator" because "allocator" is the name of a general policy?  In
> contrast, future allocators could be named "GCAllocator" or "ArenaAllocator". 
> ("MallocAllocator" would just be verbose :)

Mallocator +1

/be
(In reply to comment #74)
> Wow, underscores (two) *and* capital-M -- an RSI inducer for sure. 

Those references to RSI in connection with frequent capital letters or underscores always puzzle me. I do not know about Mac but even Windows from the last millennium allowed (if not directly then at least with easy to get programs) to make a key like CapsLock temporary so it applies only to the following letter. This way typing a capital letter or shift requires no more efforts than to enter any other letter - one just needs to prefix it with that key.
(In reply to comment #80)
> (In reply to comment #74)
> Those references to RSI in connection with frequent capital letters or
> underscores always puzzle me. I do not know about Mac but even Windows from the
> last millennium allowed (if not directly then at least with easy to get
> programs) to make a key like CapsLock temporary so it applies only to the
> following letter.

Most users don't customize enough; even hardcore programmers. Certainly RSI victims do customize, starting with a better keyboard, non-QWERTY even. But the general ergonomic point stands for "most users".

I've had no RSI problems to speak of while programming in C, C++, or JS, but others I know have mentioned this issue (one hacker I know prefers Python to curly-brace languages solely on this basis; curlies require shift by default).

/be
Attached patch v.8 (obsolete) — Splinter Review
> The patch does not include jsvector.h
Doh!

(In reply to comment #78)
In this patch I added an allocator that calls malloc/free.  It deserves the name Mallocator, and the allocator that calls JS_malloc/JS_free is renamed to  JSMallocator.  The JSMallocator comment specifies that the JSContext should outlive the JSVector.  Thus, JSVector is not necessarily temporary and I believe should keep its name.  I would have added the alias:

template <class T> using JSTempVector = JSVector<T, JSMallocator>;

but alas that is a C++0x feature.

Now, it seems that the JSContext* stored in JSVector is wasted for JSVector<T, Mallocator>.  This fixed by doing something I should have done anyway: make Allocators instances.  This way, the JSContext* is stored inside JSMallocator.  
This is what the STL does but I was initially discouraged because even empty allocators still incur a word of storage overhead (e.g. in std::vector).  I got around this by deriving the allocator instead of storing it as a member variable (no size overhead on gcc, I don't know about msvc).  Also STL allocators are given pretty weak guarantees by the containers (because of std::list and splicing), so I explicitly make the stronger guarantee ("same allocator used to allocate/free") in JSVector's comment.
Attachment #383839 - Attachment is obsolete: true
Attachment #383953 - Flags: review?
Attachment #383839 - Flags: review?(jwalden+bmo)
(In reply to comment #82)
> Now, it seems that the JSContext* stored in JSVector is wasted for JSVector<T,
> Mallocator>.

The cx is still necessary to report out-of-memory and range errors (the last patch is not doing that - this is a bug).

Thus either the vector is for temporary stuff that cannot outlive a single cx instance. Then it should use malloc, not JS_malloc, and can store cx for convenience of error reporting. 

Or the vector can outlive cx. This implies that the vector cannot store cx directly or indirectly via an allocator yet it must use JS_malloc/JS_free for proper GC accounting as a long-lived vector would be managed via the GC. So cx should be passed to vector methods explicitly and the destructor can not call the deallocation routine as it would not have access to the cx.

These two cases are very different to try to capture the semantics behind one template class. Since the toString optimization needs only the first kind of vector, I suggest to cover only this case for now. That is, lets have a class like JSTempVector that calls malloc/free and stores cx for convenience of range and OOM error reporting.
(In reply to comment #81)
> one hacker I know prefers Python to curly-brace languages solely on this basis; curlies require shift by default

On default Scandinavian keyboard layouts one has to press Shift to enter =. That was one of the reasons why GNU long option style became --word, not =word. So Stalman is not that bad as it is implied often ;)
Attached patch v.9 (obsolete) — Splinter Review
Ok, I think I understand you better now, thanks.

It seems there is no need for an Allocator policy atm, so for simplicity's sake, I removed it in this patch.
Attachment #383953 - Attachment is obsolete: true
Attachment #383961 - Flags: review?(jwalden+bmo)
Attachment #383953 - Flags: review?
(In reply to comment #42)
> Leading _ is not reserved in all scopes.  Leading _ followed by a capital
> letter is reserved in that manner, as are all names containing __, but _foo is
> a perfectly valid identifier -- with the caveat that the C++ implementation
> might use and define that name globally, in which case the name should be
> qualified somehow (this-> [although it's possible names resolve to member in
> preference to globally in class methods, don't actually know if this happens],
> obj., and so on) to be fully safe.  I lack IRC context, however, so maybe I
> misunderstand what was claimed to be reserved.

I think Jeff's right, here.  The standard's text is:

   All identifiers that begin with an underscore are always reserved for use
   as identifiers with file scope in both the ordinary and tag name spaces.

It wasn't clear to me (and others?) how one could safely define _foo in some
non-file scope without shadowing the file scope definition.  But if a reference
to a shadowed variable were written explicitly in the code, well, that's your
own fault, and if some system header's macro were to introduce a reference to
the shadowed variable, well, then that macro is buggy for assuming the user has
no local definition for the name.  So I was confused.
Comment on attachment 383961 [details] [diff] [review]
v.9

>+js_ValueToStringBuffer(JSContext *cx, jsval v, JSVector<jschar> &buf)

>+    if (JSVAL_IS_STRING(v)) {
>+        JSString *str = JSVAL_TO_STRING(v);
>+        const jschar *chars;
>+        size_t length;
>+        str->getCharsAndLength(chars, length);
>+        return buf.pushBack(chars, chars + length);
>+    } else if (JSVAL_IS_INT(v) || JSVAL_IS_DOUBLE(v)) {
>+        return js_IntDoubleToStringBuffer(cx, v, buf);
>+    } else if (JSVAL_IS_BOOLEAN(v)) {
>+        return js_BooleanToStringBuffer(cx, JSVAL_TO_BOOLEAN(v), buf);
>+    } else if (JSVAL_IS_NULL(v)) {
>+        return pushAtom(cx->runtime->atomState.nullAtom, buf);
>+    } else {
>+        return pushAtom(cx->runtime->atomState.typeAtoms[JSTYPE_VOID], buf);
>+    }
>+}

This should be a series of unbraced ifs, since all return unconditionally.


>+ * N.B: JSVector is not reentrent: T member functions called during JSVector
>+ *      member functions must not call back into the same JSVector.

"reeentrant" still.


>+#ifdef DEBUG
>+    bool mInProgress;
>+
>+    class ReentrancyGuard {
>+        JSVector &mVec;
>+      public:
>+        ReentrancyGuard(JSVector &v)
>+          : mVec(v)
>+        {
>+            JS_ASSERT(!mVec.mInProgress);
>+            mVec.mInProgress = true; 
>+        }
>+        ~ReentrancyGuard() { mVec.mInProgress = false; }
>+    };
>+#endif

I don't think we gain anything by passing in this by reference rather than just as a pointer, would rather just see a |this| not besmirched by a * in the constructor calls to reduce visual clutter.  Also, put the ifdefs around the contents of the constructor and destructor -- this allows you to not ifdef each instantiation, and the compiler will optimize the overhead out.


Nearly there, think this is all that remains...

Oh, in a followup note, JSVector<jsval> would of course be a bad idea because the jsvals would be unrooted -- noted when doing the initial review, then forgot to remove that line from the eventual comment.
Attachment #383961 - Flags: review?(jwalden+bmo) → review-
Thanks!  One small quibble: 

While 'this' may need to be *-besmirched, I think that, in general, '&' as a parameter and member type conveys stronger semantics than '*' and thus should be used when possible.  Specifically, '&' implies (1) non-null, (2) doesn't change, (3) no transfer of ownership.  In the specific cases of ReentrancyGuard (and Impl::growTo) these things are trivial to see, but I would like to use this style uniformly as long as it doesn't conflict with other SpiderMonkey style.
(In reply to comment #88)
> Thanks!  One small quibble: 

True enough, I just don't think any of those concerns are relevant to this limited situation.
(In reply to comment #87)
> (From update of attachment 383961 [details] [diff] [review])
> >+js_ValueToStringBuffer(JSContext *cx, jsval v, JSVector<jschar> &buf)
> 
> >+    if (JSVAL_IS_STRING(v)) {
> >+        JSString *str = JSVAL_TO_STRING(v);
> >+        const jschar *chars;
> >+        size_t length;
> >+        str->getCharsAndLength(chars, length);
> >+        return buf.pushBack(chars, chars + length);
> >+    } else if (JSVAL_IS_INT(v) || JSVAL_IS_DOUBLE(v)) {
> >+        return js_IntDoubleToStringBuffer(cx, v, buf);
> >+    } else if (JSVAL_IS_BOOLEAN(v)) {
> >+        return js_BooleanToStringBuffer(cx, JSVAL_TO_BOOLEAN(v), buf);
> >+    } else if (JSVAL_IS_NULL(v)) {
> >+        return pushAtom(cx->runtime->atomState.nullAtom, buf);
> >+    } else {
> >+        return pushAtom(cx->runtime->atomState.typeAtoms[JSTYPE_VOID], buf);
> >+    }
> >+}
> 
> This should be a series of unbraced ifs, since all return unconditionally.

Draft off Waldo, not looking at the whole patch. Besides else after return, the above does not assert before the last return that JSVAL_IS_VOID(v).

Also, JSVAL_IS_NUMBER and js_NumberToStringBuffer are better than the open-coded INT or DOUBLE condition and the IntDouble name-part.

> >+ * N.B: JSVector is not reentrent: T member functions called during JSVector
> >+ *      member functions must not call back into the same JSVector.
> 
> "reeentrant" still.

Igor's advice about including Temp(orary) in the JSVector name still seems good.

> Oh, in a followup note, JSVector<jsval> would of course be a bad idea because
> the jsvals would be unrooted -- noted when doing the initial review, then
> forgot to remove that line from the eventual comment.

Another reason to use Temp -- if you could template-specialize for jsval and other GC-thing tagged or untagged pointer types, you could mix in or derive from JSAutoTempValueRooter.

/be
(In reply to comment #90)
> Another reason to use Temp -- if you could template-specialize for jsval and
> other GC-thing tagged or untagged pointer types, you could mix in or derive
> from JSAutoTempValueRooter.

Template specialization to do GC protection seems fraught with peril.  The way tempate specialization works, as I understand it, if there is any error trying to instantiate the jsval-specific version of the template, the C++ compiler silently proceeds to the next overloading of that templated name --- which would presumably leave us at the non-GC-protecting generic definition.  So if there is ever a problem on any platform with the jsval-specific template, all values of that type can potentially become silently unrooted.  C++ veterans, am I wrong?

Much better to require some explicit indication at the point of definition that the value should be rooted, such that errors occur when things don't work.
(In reply to comment #91)
> Much better to require some explicit indication at the point of definition that
> the value should be rooted, such that errors occur when things don't work.

EIBTI for sure but don't we have an implicit hazard with the current JSVector approach? You can instantiate it on gc-thing pointer types and there's no safety.

Will 0x give us the contract-y goodness we crave? I can hardly wait.

/be
(In reply to comment #92)
> EIBTI for sure but don't we have an implicit hazard with the current JSVector
> approach? You can instantiate it on gc-thing pointer types and there's no
> safety.

What I'm saying is that, if we try to eliminate the hazard you mention (that JSVector<jsval> is not a compile-time error, even though the jsvals aren't rooted), we should look for a solution that does not introduce a new hazard that is easy to miss (that a typo in the template specialization causes silent fall-back to a non-rooting variant).

I'd love to hear how experienced template metaprogramming folks determine which specialization actually got used.  Add static members with interesting names and then look for them in the linker symbol table in the final executable???

> Will 0x give us the contract-y goodness we crave? I can hardly wait.

I don't know whether contracts can give you a Prolog-style 'cut' behavior: if the template's arguments meet these contracts, then you *must* use this specialization or report an error.  If so, that would solve this problem, and make template specialization much less hazardous.
> Also, JSVAL_IS_NUMBER and js_NumberToStringBuffer are better than the
> open-coded INT or DOUBLE condition and the IntDouble name-part.

JSVAL_IS_NUMBER sounds good.  I originally chose the name js_NumberToStringBuffer, but given that js_NumberToString has the following signature:

JSString *js_NumberToString(JSContxt *, double);

It seemed strange for the -Buffer version to take a jsval, not a double.  I considered making two functions, js_IntToStringBuffer (taking an int) and js_DoubleToStringBuffer (taking a double), but the bodies of these were almost the same.  I suppose these could easily be factored out into a static impl function and then the function names would make sense.
> What I'm saying is that, if we try to eliminate the hazard you mention (that
> JSVector<jsval> is not a compile-time error, even though the jsvals aren't
> rooted), we should look for a solution that does not introduce a new hazard
> that is easy to miss (that a typo in the template specialization causes silent
> fall-back to a non-rooting variant).

The situation you are referring to is the "Substitution Failure Is Not An Error" (SFINAE) rule which lies at the heart of enable_if hackery.  Fortunately, this only comes up when considering what function declarations go into the overload set for a function call.  When choosing a specialization, the rules are simpler: use pattern matching on the arguments to pick the specialization, and any subsequent error during instantiation is a real error.  In particular, the following is an error, not a program printing "general":

template <class> struct Templ { static void f() { cout << "general\n"; } };
template <class T> struct Templ<T *> { T a[1]; static void f() {} };
int main() { Templ<void *>::f(); }

In general, though, having a specialization do something significantly different than the general definition is discouraged C++ style; specializations are intended for optimized implementations of the same behavior.  (An obvious exception is the traits/policy pattern, but indeed 0x contract-y goodness will make these go away.)  To add the rooting functionality, I think a good idea would be to make a separate template class (implemented in terms of JSTempVector) with an appropriate "Root"-bearing name.  This allows for the case that the knowing user might want a temp vector of non-rooted GC-things.  Additionally, assuming there are multiple types of GC-things to root, having a separate template, rather than using specialization, makes implementation reuse much more natural.
(In reply to comment #94)
> I originally chose the name
> js_NumberToStringBuffer, but given that js_NumberToString has the following
> signature:
> 
> JSString *js_NumberToString(JSContxt *, double);
> 
> It seemed strange for the -Buffer version to take a jsval, not a double.

Then name it js_NumberValueToStringBuffer (you're right that the NumberTo name was slightly off); that's the precise full name.

Nit: left brace of that function's body is misplaced:

 js_IntDoubleToStringBuffer(JSContext *cx, jsval v, JSVector<jschar> &buf) {

/be
Attached patch v.10 (obsolete) — Splinter Review
Attachment #383961 - Attachment is obsolete: true
Attachment #385762 - Flags: review?(jwalden+bmo)
Comment on attachment 385762 [details] [diff] [review]
v.10

Final drive-by from me, looks great otherwise:

$ grep '^+[A-Za-z_].*) {$' /tmp/v10
+js_InitContextBusyArrayTable(JSContext *cx) {
+pushAtom(JSAtom *atom, JSTempVector<jschar> &buf) {
+JSTempVector<T>::~JSTempVector() {
+JSTempVector<T>::reserve(size_t newsz) {
+JSTempVector<T>::growBy(size_t amount) {
+JSTempVector<T>::clear() {
+JSTempVector<T>::pushBack(const T &t) {
+JSTempVector<T>::pushBack(const U *begin, const U *end) {
+JSTempVector<T>::extractRawBuffer() {
+JSTempVector<T>::replaceRawBuffer(T *p, size_t length) {

These all put the method's opening brace on the same line as the declarator and formals, but that's not (even evolving) house style. Practical benefit of left brace on its own line in column 1 is easier navigation for vim heads via [[ -- and similar benefits to others anchoring greps, seds, awks, and perls.

For inline methods in class bodies, evolving style puts method type as well as declarator, formals, and left brace on same line where they all fit. This is ok by me, although you can see some places where overlong method heads make the left brace go on its own line. No foolish consistency hobgoblins here! ;-)

But for top-level functions and method definitions, left brace on its own line in column 1, please.

Great to see this patch line up for landing.

/be
Attached patch v.11 (obsolete) — Splinter Review
Oops, not trying to make a fashion statement, just old habits dying slowly.  (Also, I would recommend [{ to said vim heads :)
Attachment #385762 - Attachment is obsolete: true
Attachment #385794 - Flags: review?(jwalden+bmo)
Attachment #385762 - Flags: review?(jwalden+bmo)
Old muscle memory dies harder, and [{ requires a shift-chord. But the grep, etc. anchoring use-case still wants body-{ on its own line in col 1.

For structs and classes with superclasses we've been doing likewise. Seems good.

/be
Attachment #385794 - Flags: review?(jwalden+bmo) → review+
(In reply to comment #90)
> Draft off Waldo, not looking at the whole patch.

Sorry, but if I see something like this, I'm going to say something about it -- happened (happens!) often enough with my JS patches when I was first writing them.  Better to fix before it even gets in than in some random style cleanup driveby checkin later...
http://hg.mozilla.org/tracemonkey/rev/b2256abf53c0
Whiteboard: fixed-in-tracemonkey
(In reply to comment #102)
> http://hg.mozilla.org/tracemonkey/rev/b2256abf53c0

I had to back this out because it doesn't build due to jsd.
Whiteboard: fixed-in-tracemonkey
Attached patch v.12Splinter Review
In a n00b move, I forgot to compile the browser.  Two types of errors arose:

1. Two files #include jsnum.h, which uses JSVector, and were only #including jspubtd.h (directly or through jsapi.h).  jsnum.h is already a private js detail (since its not in jsapi.h), so I think it is ok to simply switch to jsprvtd.h.

2. jsprvtd.h sometimes gets #included by C files (jsd) and thus needs to be #ifdef'd out.  Next, some C++ files (through jsdebug.h) wrap jsprvtd.h externally with an extern "C" block, which does not allow template declarations.  The solution is to temporarily exit, then reenter, the extern block.  The tricky part is how to know when to do this.  Since the C++/extern case is rare, I had jsdebug.h #define WRAPPED_IN_EXTERN_C.  Pretty gross, but (1) it prevents having to dump jsvector.h in all the js headers and slowing compilation and (2) future C++ templated containers (JSHashMap...) can simply add their decl below JSVector's without any fuss.
Attachment #385794 - Attachment is obsolete: true
Attachment #386148 - Flags: review?(jwalden+bmo)
Comment on attachment 386148 [details] [diff] [review]
v.12

I can't believe I'm doing this.

Search for a jsd bug to convert it to C++, and if none exists file such a bug, then include the bug number in a comment here.
Attachment #386148 - Flags: review?(jwalden+bmo) → review+
(In reply to comment #105)
> Search for a jsd bug to convert it to C++, and if none exists file such a bug,
> then include the bug number in a comment here.

Righto. bug 501536
so, why don't you do:

#ifdef __cplusplus

extern "C++" {

...

}
#endif
(In reply to comment #107)
> so, why don't you do:
> 
> #ifdef __cplusplus
> 
> extern "C++" {
> 
> ...
> 
> }
> #endif

Good point. Or just get rid of the issue by adding JS_BEGIN_EXTERN_C ... JS_END_EXTERN_C around jsprvtd.h?

/be
(In reply to comment #107)

Awesome!  I had no idea linkage specifications could nest.  It looks like its portable too (section 7.5.4).

(In reply to comment #108)
> Good point. Or just get rid of the issue by adding JS_BEGIN_EXTERN_C ...
> JS_END_EXTERN_C around jsprvtd.h?

JS_BEGIN_EXTERN_C is added around jsprvtd.h by other headers, viz. jsdebug.h
Tiny patch assuming v.12 has been applied.  
(Yes, I was previously #undef'ing the wrong symbol...)
Attachment #386299 - Flags: review+
Depends on: 501834
http://hg.mozilla.org/mozilla-central/rev/bf952aed3786
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Blocks: peacekeeper
Depends on: 503981
This patches removes a warning.
Attachment #392382 - Flags: review?(graydon)
Attachment #392382 - Flags: review?(graydon) → review+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: