Closed Bug 587257 Opened 9 years ago Closed 9 years ago

Make Array.prototype.join faster

Categories

(Core :: JavaScript Engine, defect)

defect
Not set

Tracking

()

RESOLVED FIXED

People

(Reporter: dmandelin, Assigned: luke)

References

Details

(Whiteboard: fixed-in-tracemonkey)

Attachments

(4 files, 10 obsolete files)

1.59 KB, patch
Details | Diff | Splinter Review
810 bytes, patch
cdleary
: review+
Details | Diff | Splinter Review
6.18 KB, patch
Waldo
: review+
Details | Diff | Splinter Review
14.34 KB, patch
Waldo
: review+
Details | Diff | Splinter Review
We are 1.3x slower than JSC on this microbenchmark:

  function f(a) {
    var s;
    for (var i = 0; i < 100000; ++i) {
      s = a.join('');
    }
  }

  var a = new Array(60);
  for (var i = 0; i < 60; ++i)
    a[i] = 'c';

This is costing us 2 ms on fasta, and probably more on other benchmarks.
  var t0 = new Date;
  f(a);
  print(new Date - t0);
Summary: Make String.join faster → Make Array.prototype.join faster
Sunspider calls Array.prototype.join 2837 times (V8 benchmark does not use it).

Most of these calls have an empty separator (''), 935 calls in fasta, 1001 in format-tofte, 3 in aes and probably some others. V8 and JSC special-case this so they don't append an empty separator for each item, and we can remove the "index + 1 != length" check.

Not sure how much this helps, but it could be a start.
Speeding up array_join_sub (bug 200505 comment 35) was my first bug, so I'd be happy to take this, but not for a month or so.  Comment 1 looks like a great start.
Attached patch special case for empty separator (obsolete) — Splinter Review
No difference on SunSpider (haven't sharked yet), but I cooked up a little microbenchmark for string-join, and this patch does seem to win there:

TEST            COMPARISON            FROM                 TO             

=============================================================================

** TOTAL **:    -                 3157.1ms +/- 5.1%   3013.4ms +/- 3.3% 

=============================================================================

  join:         -                 3157.1ms +/- 5.1%   3013.4ms +/- 3.3% 
    empty-sep:  1.086x as fast     630.8ms +/- 0.2%    581.0ms +/- 0.6%     
    single-sep: 1.027x as fast     695.4ms +/- 1.0%    677.3ms +/- 1.4%     
    long-sep:   -                 1830.9ms +/- 8.8%   1755.1ms +/- 5.5%
Looking at the Shark profiles, I'd say the harness underestimates the win here, because here's a lot of time spent shutting down.
Would it make sense to extend this to empty-sep + dense array? iirc this is true for the string-fasta benchmark and probably others. This could remove the call to GetArrayElement and the repeated checks done there. With my limited SM-knowledge I don't know if this is possible/desirable/worthwhile though...
Attached patch special case for empty separator (obsolete) — Splinter Review
cleanup whitespace
Attachment #467773 - Attachment is obsolete: true
(In reply to comment #6)
> Would it make sense to extend this to empty-sep + dense array? 

That's a good idea. Besides, our behavior here looks pretty suboptimal for extremely sparse arrays.
dmandelin's benchmark from comment #0:

TM tip:  143ms
JSC tip: 113ms
patched:  90ms


better results on my microbenchmark, too:

TEST            COMPARISON            FROM                 TO             

=============================================================================

** TOTAL **:    -                 3245.5ms +/- 13.8%   3196.1ms +/- 20.6% 

=============================================================================

  join:         -                 3245.5ms +/- 13.8%   3196.1ms +/- 20.6% 
    empty-sep:  1.22x as fast      630.2ms +/- 0.5%    518.4ms +/- 0.3%     
    single-sep: 1.067x as fast     689.6ms +/- 2.2%    646.3ms +/- 2.5%     
    long-sep:   ??                1925.7ms +/- 23.6%   2031.4ms +/- 32.8%
Attached patch cleanup, some renames (obsolete) — Splinter Review
We can do more here, but I think this win is good enough to take for now.
Assignee: general → sayrer
Attachment #467786 - Attachment is obsolete: true
Attachment #467863 - Attachment is obsolete: true
Attachment #467873 - Flags: review?(lw)
Mmmm, AutoArrayCycleDetector is nice.  One comment: HashSet's default allocator is the ContextAllocPolicy, which reports OOM, so AutoArrayCycleDetector doesn't need to.
Comment on attachment 467873 [details] [diff] [review]
cleanup, some renames

Drive-by nits:

>+typedef js::HashSet<JSObject *> ObjSet;

No js:: required.

>+class AutoArrayCycleDetector
>+{
>+  public:
>+    AutoArrayCycleDetector(JSContext *cx, JSObject *obj) 
>+        : mContext(cx), mObject(obj), mGenBefore(0), mHashPointer(mContext->busyArrays.lookupForAdd(mObject)) {

Here is where the style guide says (or should say) put the { on its own line. Our four-space indentation otherwise makes it hard to see quickly where the body starts.

>+            if ((!(rval->isMagic(JS_ARRAY_HOLE) || rval->isNullOrUndefined()))

The ! expression doesn't need parens around it.

>                                                                                && !GetElementCharacters(cx, obj, locale, &cb, rval))
..1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890

Way over the 100 column limit, and here is a case where an outer if testing just the !(... || ...) and an inner testing the fallible call would win, IMHO.

>+            if ((!(hole || rval->isNullOrUndefined())) && !GetElementCharacters(cx, obj, locale, &cb, rval))

Same nits here.

/be
Attachment #467873 - Attachment is obsolete: true
Attachment #467911 - Flags: review?(lw)
Attachment #467873 - Flags: review?(lw)
Comment on attachment 467911 [details] [diff] [review]
address comments from lw and brendan

Looks good.

>+class AutoArrayCycleDetector
>+{
>+  public:
>+    AutoArrayCycleDetector(JSContext *cx, JSObject *obj) 
>+        : mContext(cx), mObject(obj), mGenBefore(0),
>+        mHashPointer(mContext->busyArrays.lookupForAdd(mObject))
>+    {
>+        JS_GUARD_OBJECT_NOTIFIER_INIT;
>+    }

I think you need a JS_GUARD_OBJECT_NOTIFIER_PARAM in the ctor parameter list for it to work.

Also, the naming convention for member variables has changed from mFoo to just foo.
Attachment #467911 - Flags: review?(lw) → review+
Three ideas for further speedup:
1. give js_ValueToCharBuffer an inline fast path for v.isString()
2. hoist "if (local)" out of the dense array loop, so we don't check on every iteration
3. I have been repeatedly disappointed by the code produced by gcc on loops over an index that go through the obj->getXYZ(i) inline helpers.  Rewrite the array loop to use:

  for (Value *vp = obj->getDenseArrayElements(), *end = vp + len; vp != end; ++vp)
Comment on attachment 467911 [details] [diff] [review]
address comments from lw and brendan

>+            if (!(rval->isMagic(JS_ARRAY_HOLE) || rval->isNullOrUndefined())) {

Applying De Morgan's theorem seems good here:

>+            if (!rval->isMagic(JS_ARRAY_HOLE) && !rval->isNullOrUndefined()) {

"If noot a hole and not null or undefined, then get the element."

>+            if (!(hole || rval->isNullOrUndefined())) {

Ditto.

/be
patch changed enough to need re-review.
Attachment #467911 - Attachment is obsolete: true
Attachment #468046 - Flags: review?(lw)
(In reply to comment #17)
> Three ideas for further speedup:
> 1. give js_ValueToCharBuffer an inline fast path for v.isString()

I copied the relevant part into GetElementCharacters, and that took us from 90ms to 73ms on dmandelin's test.

> 2. hoist "if (local)" out of the dense array loop, so we don't check on every
> iteration

I deleted the locale check altogether, and didn't get a speedup.

> 3. I have been repeatedly disappointed by the code produced by gcc on loops
> over an index that go through the obj->getXYZ(i) inline helpers.  Rewrite the
> array loop to use:
> 
>   for (Value *vp = obj->getDenseArrayElements(), *end = vp + len; vp != end;
> ++vp)

This worked as well, taking us from 73ms to 65ms on dmandelin's test.

new results for dmandelin's benchmark from comment #0:

TM tip:  143ms
JSC tip: 113ms
patched:  65ms
This patch seems to get me a reliable, small speedup on SunSpider itself.
Attached patch clean up the inline (obsolete) — Splinter Review
Attachment #468046 - Attachment is obsolete: true
Attachment #468047 - Flags: review?(lw)
Attachment #468046 - Flags: review?(lw)
Comment on attachment 468047 [details] [diff] [review]
clean up the inline

Far out!

>+    AutoArrayCycleDetector(JSContext *cx, JSObject *obj) 
>+        : context(cx), object(obj), genBefore(0),
>+        hashPointer(context->busyArrays.lookupForAdd(object)
>+        JS_GUARD_OBJECT_NOTIFIER_PARAM)

AutoArrayCycleDetector(JSContext *cx, JSObject *obj JS_GUARD_OBJECT_NOTIFIER_PARAM)

>+        for (Value *vp = obj->getDenseArrayElements(), *end = vp + len; vp != end; ++vp) {
>+            if (!JS_CHECK_OPERATION_LIMIT(cx))
>+                return false;
>+            *rval = *vp;
>+            if (!rval->isMagic(JS_ARRAY_HOLE) && !rval->isNullOrUndefined()) {
>+                if (!GetElementCharacters(cx, obj, locale, &cb, rval))
>+                    return false;
>+            }
>+        }

One last potential optimization: I think we can remove the "*rval = *vp" assignment and use rval directly in the subsequent test/call.  However, this also requires changing GetElementCharacters to not mutate its argument, which we can do using our new stack-scanning superpowers: just pass rval by const Value& and take out the rval.setObjectOrNull.  (Also, since rval would no longer be an outparam, cb should probably be the last parameter.)
(In reply to comment #23)
> 
> AutoArrayCycleDetector(JSContext *cx, JSObject *obj
> JS_GUARD_OBJECT_NOTIFIER_PARAM)
> 

uh, duh.

> just pass rval by const
> Value& and take out the rval.setObjectOrNull.

I don't think this idea will work without a lot more refactoring, because of js_TryMethod. I commented out the locale parameter just to get an idea of the speedup without the assignment. dmandelin's benchmark went from 65ms to 63ms. I don't think it's worth chasing right now.
Attached patch fix guard object macro (obsolete) — Splinter Review
Attachment #468047 - Attachment is obsolete: true
Attachment #468157 - Flags: review?(lw)
Attachment #468047 - Flags: review?(lw)
Attachment #468157 - Flags: review?(lw) → review+
this passes all the tests.
Attachment #468157 - Attachment is obsolete: true
Attachment #468168 - Flags: review?(lw)
I did my usual Cachegrind run on Sayre's latest patch (on TM, not JM):

---------------------------------------------------------------
| millions of instructions executed                           |
| total                        | on-trace (may overestimate)  |
---------------------------------------------------------------
|   110.883   110.883 (------) |    20.808    20.808 (------) | 3d-cube
|    56.935    56.936 (------) |    22.313    22.313 (------) | 3d-morph
|   145.607   145.641 (------) |    21.968    21.966 (------) | 3d-raytrace
|    76.966    76.966 (------) |    17.407    17.407 (------) | access-binary-
|   183.411   183.412 (------) |   103.705   103.705 (------) | access-fannkuc
|    41.358    41.358 (------) |    17.552    17.552 (------) | access-nbody
|    59.988    59.989 (------) |    27.013    27.013 (------) | access-nsieve
|    15.671    15.672 (------) |     2.862     2.862 (------) | bitops-3bit-bi
|    43.281    43.282 (------) |    30.281    30.281 (------) | bitops-bits-in
|    22.749    22.751 (------) |    10.219    10.219 (------) | bitops-bitwise
|    71.607    71.609 (------) |    39.453    39.453 (------) | bitops-nsieve-
|    42.348    42.350 (------) |    26.655    26.655 (------) | controlflow-re
|    46.905    46.907 (------) |     5.661     5.661 (------) | crypto-md5
|    31.207    31.209 (------) |     7.109     7.109 (------) | crypto-sha1
|   114.340   113.398 (1.008x) |    11.936    11.912 (1.002x) | date-format-to
|    86.148    86.149 (------) |    10.620    10.620 (------) | date-format-xp
|    52.587    52.588 (------) |    27.614    27.614 (------) | math-cordic
|    28.962    28.962 (------) |     6.166     6.166 (------) | math-partial-s
|    28.456    28.456 (------) |    10.286    10.286 (------) | math-spectral-
|    57.966    57.966 (------) |    34.591    34.591 (------) | regexp-dna
|    40.441    40.441 (------) |    10.397    10.397 (------) | string-base64
|   122.858   117.574 (1.045x) |    24.544    24.430 (1.005x) | string-fasta
|   185.180   185.181 (------) |    11.439    11.439 (------) | string-tagclou
|   180.737   180.737 (------) |    10.931    10.931 (------) | string-unpack-
|    51.295    51.295 (------) |     8.419     8.419 (------) | string-validat
-------
|  1897.897  1891.722 (1.003x) |   519.958   519.819 (------) | all


So I can believe it would save 1 or 2 ms overall.
Comment on attachment 468168 [details] [diff] [review]
fix thinko in js_ValueToCharBuffer, fix control flow in AutoArrayCycleDetector

>+    AutoArrayCycleDetector(JSContext *cx, JSObject *obj JS_GUARD_OBJECT_NOTIFIER_PARAM)
>+        : context(cx), object(obj), genBefore(0),
>+        hashPointer(context->busyArrays.lookupForAdd(object)), cycle(!!hashPointer)

Style seems to be to unindent the ": context(cx)" line 2 spaces so context lines up with hashPointer.

>+    ~AutoArrayCycleDetector() {
>+        JS_ASSERT(context);
>+        if (!cycle) {
>+            if (genBefore == context->busyArrays.generation())
>+                context->busyArrays.remove(hashPointer);
>+            else
>+                context->busyArrays.remove(object);
>+        }
>+    }

I think there is a problem if ~AutoArrayCycleDetector is called with !cycle, context->busyArrays.generation() == 0 and !hashPointer.  I believe this could happen if AutoArrayCycleDetector::update fails.  The fix seems to be changing "if (!cycle)" to "if (!cycle && hashPointer)".
Attached patch fix lw's final comments (obsolete) — Splinter Review
Attachment #468168 - Attachment is obsolete: true
Attachment #468370 - Flags: review?(lw)
Attachment #468168 - Flags: review?(lw)
Attachment #468370 - Flags: review?(lw) → review+
http://hg.mozilla.org/tracemonkey/rev/b404ad209cb9
Status: NEW → ASSIGNED
Whiteboard: fixed-in-tracemonkey
I had to back this out due to in-browser jstest failures. (it passed in the shell here).

http://hg.mozilla.org/tracemonkey/rev/c3c185d6ee88
Whiteboard: fixed-in-tracemonkey
Did this get tracked down / fixed?
Ping?
Tom, everyone's really busy fixing blockers for Firefox 4.0.  Nobody will have a chance to look at this until it's released.  Thanks for your patience!
I'll take this over if you don't mind.  I rebased the patches and fixed two bugs (lack of check for js_ProrotypeHasIndexedProperties and an unlikely case with the guards and OOM).

Using a few-months-old jsc and d8 (no crankshaft), I get:
trunk: 250ms
jsc:   244ms
d8:    155ms
patch: 101ms

SS also shows a reliable 1-2ms speedup, mostly in fasta as predicted.
Assignee: sayrer → luke
Might as well apply the same cleanup to array_toSource.
Attachment #522853 - Flags: review?(jwalden+bmo)
Attachment #468370 - Attachment is obsolete: true
Attachment #522855 - Flags: review?(jwalden+bmo)
(Green on try)
Attachment #522852 - Flags: review?(cdleary) → review+
Comment on attachment 522853 [details] [diff] [review]
clean up array_toSource in the same way as array_toString_sub

This detector class is such a great idea.[0]

>diff --git a/js/src/jsarray.cpp b/js/src/jsarray.cpp

>+    ~ArraySharpDetector() {
>+        if (chars)
>+            cx->free(chars);
>+        if (he && !sharp)
>+            js_LeaveSharpObject(cx, NULL);

How can |he| ever be NULL here?  Seems like the last test should just be |!sharp|, as it was previously.


>-    /* Cycles/joins are indicated by sharp objects. */
> #if JS_HAS_SHARP_VARS
>-    if (IS_SHARP(he)) {
>-        JS_ASSERT(sharpchars != 0);
>-        sb.replaceRawBuffer(sharpchars, js_strlen(sharpchars));
>+    if (detector.initiallySharp()) {
>+        jschar *chars = detector.takeSharpChars();
>+        sb.replaceRawBuffer(chars, js_strlen(chars));

I think you lost a |chars != NULL| assertion here.  Looks like that can go in takeSharpChars, I think.


>         /* Get element's character string. */
>         JSString *str;
>         if (hole) {
>             str = cx->runtime->emptyString;
>         } else {
>             str = js_ValueToSource(cx, *vp);
>             if (!str)
>-                goto out;
>+                return false;
>+            vp->setString(str);

Is this purely anchoring?


>+    JS_SET_RVAL(cx, vp, StringValue(str));
>+    return true;

Why not assign to |*vp|?  More traditional, and I suspect the set-rval system isn't going to be any more future-proof if we ever do the pass-arguments-and-length-as-combined-class thing.


>     /* Finalize the buffer. */
>+    JSString *str;
>+    str = sb.finishString();
>+    if (!str)
>         goto out;

Please to be unifying the declaration and initialization.

0. Were you maybe expecting a different adjective?
Attachment #522853 - Flags: review?(jwalden+bmo) → review+
(In reply to comment #41)
> How can |he| ever be NULL here?  Seems like the last test should just be
> |!sharp|, as it was previously.

If js_EnterSharpObject returns NULL.
 
> I think you lost a |chars != NULL| assertion here.  Looks like that can go in
> takeSharpChars, I think.

From previous nits, SM style is to generally not assert != NULL since you will get a clean unexploitable crash if it is NULL.

> Is this purely anchoring?

Yes, I think I was just preserving code here, but it looks like I can rely on stack scanning here and nix the anchoring.

> >+    JS_SET_RVAL(cx, vp, StringValue(str));
> >+    return true;
> 
> Why not assign to |*vp|?  More traditional, and I suspect the set-rval system
> isn't going to be any more future-proof if we ever do the
> pass-arguments-and-length-as-combined-class thing.

I rather prefer JS_SET_RVAL/JS_ARGV/JS_CALLEE to manual vp manipulation.  Plus, this greases the path for bug 580337.

> >+    JSString *str;
> >+    str = sb.finishString();
> 
> Please to be unifying the declaration and initialization.

Verily, the next patch (still r?you) does just that.  (It is currently this way because of the "goto crosses initialization" error.)
Comment on attachment 522855 [details] [diff] [review]
Sayre's patch, rebased and twiddled

>diff --git a/js/src/jshashtable.h b/js/src/jshashtable.h
>+        /* Leaves Ptr uninitialized. */
>+        Ptr() {}
>+
>@@ -137,16 +140,19 @@ class HashTable : private AllocPolicy
>+        /* Leaves AddPtr uninitialized. */
>+        AddPtr() {}

Put something deadbeefy here maybe?


>diff --git a/js/src/jsstr.cpp b/js/src/jsstr.cpp

> /* This function implements E-262-3 section 9.8, toString. */
> bool
>-js::ValueToStringBuffer(JSContext *cx, const Value &arg, StringBuffer &sb)
>+js::ValueToStringBufferSlow(JSContext *cx, const Value &arg, StringBuffer &sb)
> {
>     Value v = arg;
>     if (v.isObject() && !DefaultValue(cx, &v.toObject(), JSTYPE_STRING, &v))
>         return false;
> 
>-    if (v.isString()) {
>-        JSString *str = v.toString();
>-        size_t length = str->length();
>-        const jschar *chars = str->getChars(cx);
>-        if (!chars)
>-            return false;
>-        return sb.append(chars, length);
>-    }
>+    if (v.isString())
>+        return sb.append(v.toString());
>     if (v.isNumber())
>         return NumberValueToStringBuffer(cx, v, sb);
>     if (v.isBoolean())
>         return BooleanToStringBuffer(cx, v.toBoolean(), sb);
>     if (v.isNull())
>         return sb.append(cx->runtime->atomState.nullAtom);
>     JS_ASSERT(v.isUndefined());
>     return sb.append(cx->runtime->atomState.typeAtoms[JSTYPE_VOID]);

Should this perhaps require callers who might have a string to use the inline version that pre-handles the string case?  Then make this assert non-stringness.


>diff --git a/js/src/tests/ecma/Array/15.4.4.3-2.js b/js/src/tests/ecma/Array/15.4.4.3-2.js

Add some lines that test when the property is set both directly on the object and up the prototype chain, as either null or undefined.
Attachment #522855 - Flags: review?(jwalden+bmo) → review+
(In reply to comment #43)
> Should this perhaps require callers who might have a string to use the inline
> version that pre-handles the string case?

I thought about it, but, because of the DefaultValue call above, we already have to check for string, so I didn't see any reason to restrict usage.
You need to log in before you can comment on or make changes to this bug.