Closed Bug 1233642 Opened 4 years ago Closed 4 years ago

Self-host Array.prototype.concat.

Categories

(Core :: JavaScript Engine, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla48
Tracking Status
firefox46 --- affected
firefox48 --- fixed

People

(Reporter: arai, Assigned: arai)

References

(Blocks 1 open bug)

Details

Attachments

(3 files, 26 obsolete files)

3.12 KB, patch
arai
: review+
Details | Diff | Splinter Review
40.16 KB, patch
efaust
: review+
Details | Diff | Splinter Review
1.47 KB, patch
bholley
: review+
Details | Diff | Splinter Review
derived from bug 1165052.

Before supporting @@species in Array.prototype.concat, it should be better to self-host it.

With simple perf test (bug 1165052 comment #25), straight forward implementation seems to be fast enough.
tested WIP patch
https://treeherder.mozilla.org/#/jobs?repo=try&revision=231cb879bd41

only ecma_3/Array/regress-322135-02.js fails with timeout.  however, the code works somehow strangely in m-c.

https://dxr.mozilla.org/mozilla-central/source/js/src/tests/ecma_3/Array/regress-322135-02.js
> var length = 4294967295;
> var array1 = new Array(length);
> var array2 = ['Kibo'];
> var array;
> 
> try
> {
>   array = array1.concat(array2);
> }
> catch(ex)
> {
>   printStatus(ex.name + ': ' + ex.message);
> }

|array| there becomes an Array with 0-length without throwing any exception, maybe because of uint32_t overflow.  I think it should throw |RangeError: invalid array length| or something like that.

Anyway, is it required to run this code quickly?
See Also: → 688852
Here's benchmark result

          | before  | concat only  | concat+slice
----------+---------+--------------+--------------
SunSpider | 164.5ms | 154.6ms      | 154.7ms
Kraken    | 897.3ms | 898.7ms      | 966.0ms
          |         |              |
JetStream | 180.28  | 181.57       | 181.95
Octane    | 29061   | 27882        | 31316


Some regression in Kraken+slice, Octane+concat, and some improvement in SunSpider+concat, Octane+slice.
Blocks: 784288
So, about regress-322135-02.js case, it uses following branch

https://dxr.mozilla.org/mozilla-central/rev/388bdc46ba51ee31da8b8abe977e0ca38d117434/js/src/jsarray.cpp#2633
>     if (isArray && !ObjectMayHaveExtraIndexedProperties(aobj)) {
>         if (!GetLengthProperty(cx, aobj, &length))
>             return false;
> 
>         size_t initlen = GetAnyBoxedOrUnboxedInitializedLength(aobj);
>         narr = NewFullyAllocatedArrayTryReuseGroup(cx, aobj, initlen);
>         if (!narr)
>             return false;
>         CopyAnyBoxedOrUnboxedDenseElements(cx, narr, aobj, 0, 0, initlen);
>         SetAnyBoxedOrUnboxedArrayLength(cx, narr, length);
> 
>         args.rval().setObject(*narr);
> ...

with initlen = 0, length = 4294967295, so it skips iterating |this| value's elements.

https://dxr.mozilla.org/mozilla-central/rev/388bdc46ba51ee31da8b8abe977e0ca38d117434/js/src/jsarray.cpp#2650
>         if (length == initlen) {
>             while (argc) {
> ...
>                 for (size_t i = 0; i < initlen; i++) {
> ...
>                 }
> ...
>             }
>         }

this case could be optimized with current behavior, even in self-hosted code, but might need a bit complicated optimization path when we implement @@species.

Also, I'm not sure if this case really matters on real web.  does anyone know the actual use case with similar code?

anyway, will try implementing it and see how it's hard to do so :)
oh, I mis-understood the code.
|if (length == initlen)| check is used for iterating arguments, not |this|.
|this| elements are copied by |CopyAnyBoxedOrUnboxedDenseElements|.
See Also: → 1234947
I copied the CopyAnyBoxedOrUnboxedDenseElements optimization from native array_concat to self-hosted concat, but pure-JS code is faster than it when the Array length fits into int32_t.

Also, if the Array length doesn't fit into int32_t, it gets extremely slower, and that's the reason of the TIMEOUT in comment #1.
Filed bug 1234947 for it.

If that case is unusual in ordinary web app, maybe we could just disable the testcase and avoid applying the CopyAnyBoxedOrUnboxedDenseElements optimization?

Do you know if such case exists?
Flags: needinfo?(efaustbmo)
As suggested in IRC, redirecting ni?

Is there any known case in wild that we should perform Array.prototype.concat (and Array.prototype.slice for bug 1233643) quickly with an Array that its length doesn't fit into int32_t?
Flags: needinfo?(luke)
Flags: needinfo?(jdemooij)
Flags: needinfo?(efaustbmo)
I'm not aware of any, but I haven't followed Array stuff much.
Flags: needinfo?(luke)
(In reply to Tooru Fujisawa [:arai] from comment #6)
> Is there any known case in wild that we should perform
> Array.prototype.concat (and Array.prototype.slice for bug 1233643) quickly
> with an Array that its length doesn't fit into int32_t?

I'm not aware of such cases and I don't expect it to be common. Sparse arrays definitely show up in the wild (see bug 1088189 and bug 1087963 for instance), but I think that's mostly orthogonal and most of these should still have an int32 length.
Flags: needinfo?(jdemooij)
Attached image Performance comparison (obsolete) —
Thanks :)
I'll focus on other cases then.

performance on sparse arrays is fast enough I think.
short dense arrays may need some more optimization.
Attached patch Part 1: Add IsArray intrinsic. (obsolete) — Splinter Review
This is from bug 1165052.
Attachment #8707459 - Flags: review?(efaustbmo)
This implements general case of Array#concat, and removes native one including Ion code.
Attachment #8707460 - Flags: review?(efaustbmo)
This improves performance with single short array.
will post perf comparison.

Green on try run: https://treeherder.mozilla.org/#/jobs?repo=try&revision=bc18dc840676
Attachment #8707465 - Flags: review?(efaustbmo)
Attached image Performance comparison with Part 3 (obsolete) —
Attachment #8707459 - Flags: review?(efaustbmo) → review+
I must be confused or misreading graphs, or something. Do we understand why these patches make small dense array concats much slower? I imagine that length 10 must be more common than length 2**10, so...
Flags: needinfo?(arai.unmht)
thank you for reviewing :)

here's the characteristic between the length of dense array and the time taken by 50000 loops of |A.concat(A).length|.

In attached graph, current m-c (before) is faster than patched ones in [0, 63] and [208, 1023] range, and the performance gets too worse on length>=1024.

Part1+2 is faster than before in [64, 192] range.

Part1+2+3 is faster than before in [1024-] range, also, it's faster than Part1+2 in [0, 7], [16, 58], and [512-] range.  So, applying Part3 introduces both performance improvement and regression from Part1+2.

Will look into the reason of the spike at length=8, 64, 256, 512, etc.
Attached image time taken by std_Array(len1 + len2). (obsolete) —
so, the spike comes from std_Array(len1 + len2) allocation,

the graph is the result of following function

function ArrayConcat(arg1) {
    var len1 = ToLength(this.length);
    var len2 = ToLength(arg1.length);

    // Step 2 (reordered).
    var A = std_Array(len1 + len2);
    return A;
}

there seems to be a difference in allocation logic between original native concat and sts_Array.  will look into it.
Flags: needinfo?(arai.unmht)
Note that the spike with > ~7 elements is probably because the elements will no longer fit inline in the object. So we need either malloc or nursery allocated elements. Wild guess, maybe Ion does not have a fast path to allocate elements in the nursery when we do Array(x) and x is not a constant?
thanks :)

yeah, N=7 (len1+len2==14) and N=8 (len1+len2==16) uses different branch in following condition

https://dxr.mozilla.org/mozilla-central/rev/584870f1cbc5d060a57e147ce249f736956e2b62/js/src/jit/CodeGenerator.cpp#4803
> void
> CodeGenerator::visitNewArrayDynamicLength(LNewArrayDynamicLength* lir)
> {
> ...
>         if (templateObject->as<ArrayObject>().hasFixedElements()) {
>             size_t numSlots = gc::GetGCKindSlots(templateObject->asTenured().getAllocKind());
>             inlineLength = numSlots - ObjectElements::VALUES_PER_HEADER;
>         } else {
>             canInline = false;
>         }

will investigate how concat performs it quickly.
Added intrinsic for NewFullyAllocatedArrayTryReuseGroup, and called it instead of std_Array.
it reduces the spike at N=8 (length=16) and it draws almost same curve as "before", in N<1024 cases, but spike at N=64 (length=128) is still large, and it's bigger than Part2's case.
Attachment #8715135 - Attachment is obsolete: true
here's the patch for the intrinsic.
Attachment #8707460 - Attachment is obsolete: true
Attachment #8707460 - Flags: review?(efaustbmo)
Attachment #8707465 - Attachment is obsolete: true
Attachment #8707465 - Flags: review?(efaustbmo)
Attachment #8715373 - Attachment is obsolete: true
Here's current performance comparison with octane between m-c and part 2, 3, 4, average of 10 runs.

There is at most 5% improvement in SplayLatency, but other cases have <2% improvement/regression, depends on testcase.

will check other benchmarks.
In kraken, stanford-crypto-pbkdf2 regress about 20-25% in all cases.
stanford-crypto-ccm regress about 15% in 1+2+3 and 1+2+3+4 cases.
stanford-crypto-sha256-iterative and json-stringify-tinderbox regress about 5% in 1+2+3+4 case.
with SunSpider, again, all cases improve/regress about 5% depending on testcase.
Calculated characteristic for 2 cases,
  1. A.concat(A) where the length of A is variable.
  2. A.concat(B) where the length of A is variable and the length of B is fixed.
Attachment #8715372 - Attachment is obsolete: true
Attachment #8703112 - Attachment is obsolete: true
Attachment #8707466 - Attachment is obsolete: true
Now I'm thinking just using only Part 2 might be the best way, as Part 3 and Part 4 adds extra code and complexity, while the performance improvements in benchmarks are not so big :/
efaust, can I have your opinion about what we should do here?
do you think Part 3 and Part 4 (or some other optimization maybe?) should be applied?
Flags: needinfo?(efaustbmo)
In stanford-crypto-pbkdf2 and stanford-crypto-ccm, concat is called with only 1 argument on every time:

stanford-crypto-pbkdf2

  called  | this.length | arguments[0].length
----------+-------------+---------------------
  32768   |      0      |           8
  32760   |      8      |           1
     16   |      0      |          16
      8   |      3      |           1
      8   |      2      |           1
      8   |      0      |           3

stanford-crypto-ccm

  called  | this.length | arguments[0].length
----------+-------------+---------------------
   2254   |      1      |           4
   1056   |      3      |           3
    944   |      4      |           3
     11   |     28      |           4
      9   |     44      |           4
      8   |      8      |           4
      8   |      4      |           4
      8   |     36      |           4
      8   |     34      |           4
      8   |     24      |           4
      8   |      0      |           4
      7   |     47      |           4
      7   |     29      |           4
      7   |     25      |           4
      7   |      2      |           4
      6   |      9      |           4
      6   |      6      |           4
      6   |     46      |           4
      6   |     45      |           4
      6   |     41      |           4
      6   |     39      |           4
      6   |     23      |           4
      5   |      7      |           4
      5   |     49      |           4
      5   |     21      |           4
      5   |     18      |           4
      5   |     17      |           4
      5   |     12      |           4
      4   |      5      |           4
      4   |     48      |           4
      4   |     38      |           4
      4   |     37      |           4
      4   |     35      |           4
      4   |     33      |           4
      4   |     27      |           4
      4   |     22      |           4
      4   |     20      |           4
      4   |     14      |           4
      4   |     11      |           4
      4   |     10      |           4
      3   |     43      |           4
      3   |     42      |           4
      3   |     40      |           4
      3   |     31      |           4
      3   |     30      |           4
      3   |      3      |           4
      3   |     19      |           4
      3   |     16      |           4
      2   |     15      |           4
      1   |     32      |           4
      1   |     13      |           4

Will look for another way to improve performance with short array.
yeah, I think that's my main hangup here. We need to drive that kraken regression closer to 0, and then the rest of this looks great!
Flags: needinfo?(efaustbmo)
here's the performance characteristic with @@species patch.
both in native and self-hosted case, almost offset is applied.

currently self-hosted @@species support is not optimized as much as native one, as it always accesses to constructor and @@species.  maybe we could apply some optimization there.
and Kraken time, with @@species.
stanford-crypto-pbkdf2 time

with original LIST file
  before: 79.9
  native + @@species: 82.2
  part 2 + @@species: 100.7
  part 3 + @@species: 96.2
  part 4 + @@species: 91.3

move stanford-crypto-pbkdf2 to top
  before: 85.6
  native + @@species: 86.4
  part 2 + @@species: 97.2
  part 3 + @@species: 90.1
  part 4 + @@species: 88.0

only stanford-crypto-pbkdf2
  before: 86.4
  native + @@species: 86.9
  part 2 + @@species: 98.3
  part 3 + @@species: 90.0
  part 4 + @@species: 86.2

so, other test that runs before it affects so much.
they're good for native ones, and bad for self-hosted ones.
(In reply to Tooru Fujisawa [:arai] from comment #32)
> here's the performance characteristic with @@species patch.
> both in native and self-hosted case, almost offset is applied.

I meant, almost *same* offset.
placing other stanford-crypto-* test before stanford-crypto-pbkdf2 affects the time of stanford-crypto-pbkdf2.

`sjcl` object is re-initialized at the top of each test, so functions are not shared directly.
maybe GC related issue, or something related to shape?
https://github.com/h4writer/arewefastyet/blob/master/benchmarks/kraken/resources/sunspider-standalone-driver.js#L41
> for (var krakenCounter = 0; krakenCounter < tests.length; krakenCounter++) {
>     var testBase = "tests/" + suiteName + "/" + tests[krakenCounter];
>     var testName = testBase + ".js";
>     var testData = testBase + "-data.js";
>     // load test data
>     load(testData);
>     var startTime = new Date;
>     load(testName);
>     times[krakenCounter] = new Date() - startTime;
>     gc();       <----- this one
> }

in the test harness, gc() is called after each test.  when I increase the number of gc() call from 1 to 2, the score changes, and it's almost same as the score when I run stanford-crypto-pbkdf2 alone.

stanford-crypto-pbkdf2 is affected by this so much, and when we call gc() twice, the Part 4 score is better than before.
others are also affected but the difference are small.
changed gc(); to following to see the result of gc:
    print(gc());
    print(gc());

and executed following command in arewefastyet/benchmarks/kraken:
  js -e "var tests = ['stanford-crypto-ccm', 'stanford-crypto-pbkdf2'], suitePath = 'test/kraken-1.1'; suiteName='kraken-1.1';" -f resources/sunspider-standalone-driver.js

and the output is following:
  before 3809280, after 3805184

  before 3809280, after 856064

  before 958464, after 958464

  before 962560, after 655360

  {
      "stanford-crypto-ccm": 95,
      "stanford-crypto-pbkdf2": 86
  }

so, first gc() call after test does not reduced the memory usage.
terrence, what would be the reason of the difference between 1st and 2nd gc call?
Flags: needinfo?(terrence)
so far, looks like `gc()` call starts background sweeping, but it does not wait for the background sweeping it started.  Also, it waits for already-running sweeping.  So 2nd gc() call waits for the sweeping for 1st gc() call, and after 2nd gc() call returns, gcBytes is reduced.

Is this expected behavior for gc() testing function?


[main thread]

gc();   // JavaScript
|
+- GC
   |
   +- JS::GCForReason
      |
      +- GCRuntime::gc
         |
         +- GCRuntime::collect
             |
             +- GCRuntime::gcCycle
                |
                +- waitBackgroundSweepEnd
                |  |
                |  +- state
                |  |  # returns IDLE     on 1st gc() call
                |  |  # returns SWEEPING on 2nd gc() call
                |  |
                |  +- waitForBackgroundThread
                |     # called only on 2nd gc() call
                |     # after returned, gcBytes is reduced
                |
                +- GCRuntime::incrementalCollectSlice
                   |
                   +- GCRuntime::sweepPhase
                      |
                      +- GCRuntime::endSweepingZoneGroup
                         |
                         +- GCRuntime::queueZonesForBackgroundSweep
                            |
                            +- GCHelperState::maybeStartBackgroundSweep
                               |
                               +- GCHelperState::startBackgroundThread(SWEEPING)
                                  # starts background sweeping

[background thread]

HelperThread::threadLoop()
|
+- HelperThread::handleGCHelperWorkload
   |
   +- GCHelperState::work
      |
      +- GCHelperState::doSweep
         |
         +- sweepBackgroundThings
            |
            +- LifoAlloc::freeAll
               # take some time on 1st call, and 1st gc() returns while
               # LifoAlloc::freeAll is running
Yes, this is the current behavior of direct calls to gc(). GC invoked by the system is smart enough to skip the second GC if the first is still going, but direct calls to gc() from script is generally an indication that the invoker wants to verify that some piece of memory is gone, so we have to be thorough. I am currently improving this situation in bug 1249367.
Flags: needinfo?(terrence)
thanks, if bug 1249367 changes the gc() function behavior to wait for the GC it starts, I can just wait for it.

I'll look into stanford-crypto-ccm next, as it's still slower even if I call gc() twice.
mmm, still the order in the LIST and the number of gc() affect the test result when Part 1-4 are applied.
seems that there's yet another reason.
So far, each function is Ion compiled different times, depending on whether stanford-crypto-pbkdf2 is executed alone or after stanford-crypto-ccm.

for example, when stanford-crypto-pbkdf2 is executed after stanford-crypto-ccm,
sjcl.misc.pbkdf2 is compiled 5 times (compared to 3 times).

maybe, ArrayConcat is executed under Interpreter more times when stanford-crypto-pbkdf2 is executed after stanford-crypto-ccm?
I will look into it.
time consumed by ArrayConcat itself is not so different

stanford-crypto-pbkdf2 only:                      11.5 [ms]
stanford-crypto-pbkdf2 after stanford-crypto-ccm: 12.8 [ms]


anyway, will try figuring out why the Ion code is discarded 2 more times.
See Also: → 1251909
testing with Parts 1-4 applied.

1. the performance difference disappears when I rename all `sjcl` to `sjcl2` in stanford-crypto-pbkdf2.js and stanford-crypto-pbkdf2-data.js

2. I see both stanford-crypto-pbkdf2-data.js and stanford-crypto-ccm-data.js filenames in single iongraph when I execute following
  js -e "var tests = ['stanford-crypto-ccm', 'stanford-crypto-pbkdf2'], suitePath = 'test/kraken-1.1'; suiteName='kraken-1.1';" -f resources/sunspider-standalone-driver.js

looks like, there `sjcl.codec.utf8String.toBits` and `sjcl.bitArray.P` in stanford-crypto-ccm-data.js are called inside some function in stanford-crypto-pbkdf2-data.js (not yet investigated what the function is tho)
Depends on: 1252228
1. the performance difference in stanford-crypto-pbkdf2 between running alone and running with another tests gets somehow different

  with bug 1252228 patch, Part 1+2+3+4, pbkdf2 alone:     88.1 [ms]
  with bug 1252228 patch, Part 1+2+3+4, pbkdf2 after ccm: 83.2 [ms]

now stanford-crypto-pbkdf2 is faster when stanford-crypto-ccm is executed before it.
but now the score doesn't depend on the number of gc() call, nor whether running in different global or not.


2. the patch in bug 1252228 makes stanford-crypto-pbkdf2 faster even without this bug's patches

so, now we have yet another performance regression issue for stanford-crypto-pbkdf2, and pre-existing regression for stanford-crypto-ccm, from m-i (bug 1252228 patch patch applied).
looks like, inlineNewArrayTryReuseGroup fails several times because it tries to optimize too much.
it expects the argument to be always same, but Array#concat can be called with different groups.

maybe, I have to fallback to std_Array's inlining in that case.
Attached file WIP patches (obsolete) —
Here's current WIP patches, including extra Part 5 that changes inlining of NewArrayTryReuseGroup to fallback to std_Array's one, in case there are multiple groups for template.
Also, here's what I'm testing now with stanford-crypto-ccm.

the difference between Part2 and Part3 that impacts the performance in stanford-crypto-ccm is something like following.

following 3 version of ArrayConcat are the simplified Array#concat that works with stanford-crypto-ccm.  That is, fast-path used in Part 3 (works with array.concat(array)).

The difference between them is how to get 1st argument. The score regresses with latter 2 cases about 10ms.

// fast

function ArrayConcat(arg1) {
    var i = 0; i *= 1; var E = arguments[i];
    var len1 = this.length;
    var len2 = E.length;

    var A = std_Array(len1 + len2);

    for (var n = 0; n < len1; n++) {
        if (n in this)
            _DefineDataProperty(A, n, this[n]);
    }

    for (var k = 0; k < len2; k++) {
        if (k in E)
            _DefineDataProperty(A, n, E[k]);

        n++;
    }

    return A;
}

// slow

function ArrayConcat(arg1) {
    var i = 0; var E = arguments[i];
    var len1 = this.length;
    var len2 = E.length;

    var A = std_Array(len1 + len2);

    for (var n = 0; n < len1; n++) {
        if (n in this)
            _DefineDataProperty(A, n, this[n]);
    }

    for (var k = 0; k < len2; k++) {
        if (k in E)
            _DefineDataProperty(A, n, E[k]);

        n++;
    }

    return A;
}

// also slow

function ArrayConcat(arg1) {
    var E = arg1;
    var len1 = this.length;
    var len2 = E.length;

    var A = std_Array(len1 + len2);

    for (var n = 0; n < len1; n++) {
        if (n in this)
            _DefineDataProperty(A, n, this[n]);
    }

    for (var k = 0; k < len2; k++) {
        if (k in E)
            _DefineDataProperty(A, n, E[k]);

        n++;
    }

    return A;
}

Then, here's the minimal testcase that seems to be almost same situation as stanford-crypto-ccm's regression.
the script takes single argument, when it's "A", it uses ArrayConcat_A, that corresponds to slow one above.  and when it's others like "B", it uses ArrayConcat_B, that corresponds to fast one above.

"use strict";

function ArrayConcat_A() {
  var i = 0;
  arguments[i];
  for (var n = 0; n < 0; n++) {
    n in this;
  }
}

function ArrayConcat_B() {
  var i = 0;
  i *= 1;
  arguments[i];
  for (var n = 0; n < 0; n++) {
    n in this;
  }
}

function f() {
  for (var i = 0; i < 10000; i++) {
    [].my_concat([]);
    for (var n = 0; n < 0; n++)
      [].slice();
  }
}

if (scriptArgs[0] == "A")
  Array.prototype.my_concat = ArrayConcat_A;
else
  Array.prototype.my_concat = ArrayConcat_B;

var T = elapsed();
f();
print(elapsed() - T);


Then, when I look the iongraph, ArrayConcat_A is inlined but ArrayConcat_B is not inlined.  so I think the difference comes from inlining, and the reason is the way I used to access arguments.

Then, when I increase the number of the loop (like i < 100000), the score changes and then ArrayConcat_A gets faster.
and now I stuck here.
fixed Part 4 to use dedicated temp register instead of output register that can alias input.
Attachment #8716587 - Attachment is obsolete: true
Attached file WIP patches (Part 1, 2, 2.1, 2.2) (obsolete) —
Now Kraken score gets better by checking IsPackedArray and skip `k in E` in that case (Part 2.1), thanks to jandem :)
Part 2.2 is just for some possible cleanup.
Part3-5 in previous posts are not used.

Here're Kraken and Octane performance comparison, thanks to bbouvier :)

Kraken benchmark score (average of 10 runs)
http://bnjbvr.github.io/simplegraph/#title=Kraken%20benchmark%20score%20%28average%20of%2010%20runs%29&categories=audio-fft,stanford-crypto-pbkdf2,audio-beat-detection,stanford-crypto-ccm,imaging-darkroom,json-parse-financial,audio-oscillator,ai-astar,audio-dft,stanford-crypto-sha256-iterative,json-stringify-tinderbox,imaging-gaussian-blur,stanford-crypto-aes,imaging-desaturate&values=before%2047.8%2074.6%2088%2074.9%2081.6%2036.4%2063.8%2065.8%20109.2%2039.2%2041%2064.2%2051.8%2053%0Apart2%2046.4%2092.6%2089.4%2076.7%2082.1%2036.8%2064.2%2065.6%20109.2%2039.3%2041%2064.3%2051.3%2053.6%0Apart2.1%2046.6%2081.6%2088.3%2074.9%2081.9%2036.6%2064.1%2065.7%20109%2036.4%2038.8%2064.4%2051.5%2053.3%0Apart2.2%2046.6%2081.8%2088.6%2075%2082%2036.6%2063.9%2065.6%20109.4%2036.4%2038.7%2064.4%2051.5%2053.2

Octane benchmark score (average of 10 runs)
http://bnjbvr.github.io/simplegraph/#title=Octane%20benchmark%20score%20%28average%20of%2010%20runs%29&categories=Richards,DeltaBlue,Crypto,RayTrace,EarleyBoyer,RegExp,Splay,SplayLatency,NavierStokes,PdfJS,Mandreel,MandreelLatency,Gameboy,CodeLoad,Box2D,zlib,Typescript,Score&values=before%2032093%2064102%2030975%20117954%2032595%203880%2017804%2012523%2037782%2014402%2031255%2034548%2052844%2017265%2057200%2080568%2031340%2030397%0Apart2%2031867%2065302%2030837%20116592%2032310%203826%2017989%2013004%2037871%2014717%2031203%2033782%2053609%2017144%2057031%2080819%2031307%2030444%0Apart2.1%2032139%2065006%2030676%20116296%2032513%203876%2019411%2018949%2038127%2014807%2031053%2037245%2050540%2017517%2057930%2080317%2031216%2031416%0Apart2.2%2032204%2064728%2030976%20116430%2032611%203890%2018791%2017367%2038041%2014796%2030792%2038123%2050104%2017412%2057067%2080691%2031729%2031222

Kraken stanford-crypto-ccm is as fast as today's m-c, and performance regresion in stanford-crypto-pbkdf2 is about 9% (+7ms).

then, now Octane SplayLatency and MandreelLatency show somewhat strange regression :/

following are SplayLatency result for each run, with part2.1:
  13665
  21148
  21168
  13144
  13569
  21102
  21596
  21513
  21556
  21026

and same for MandreelLatency:
  38947
  38947
  38947
  37767
  37483
  38348
  37767
  28984
  37203
  38055

sometimes they show nice score.


Also, following are SplayLatency result with m-c:
  13224
  12421
  12574
  12452
  12379
  11439
  12123
  12914
  12839
  12862

and same for MandreelLatency:
  38947
  28487
  28816
  28165
  37483
  38947
  38055
  28984
  38645
  38947


SplayLatency doesn't show score around 21000 with m-c, so this will be something I have to investigate next.
MandreelLatency shows around 28000 and 38000 with both m-c and part2.1, but there seems to be some difference in probability (maybe I have to check with more runs)
again, it looks like a issue depends on executing order.

SplayLatency score for 20 runs, runing splay.js only
  11699
  12216
  11524
  16435
  11666
  15628
  11565
  12836
  12246
  11930
  12039
  11977
  11938
  11728
  11779
  11591
  12053
  11961
  12594
  11645

SplayLatency score for 20 runs, runing splay.js after regexp.js
  18539
  18684
  13321
  14097
  14019
  12608
  21694
  18504
  13918
  14068
  13870
  14162
  14233
  13706
  14287
  18777
  22132
  13793
  14151
  21276
just confirmed `SplayLatency: 22321` on m-c, so both SplayLatency and MandreelLatency seem to be the issue of the possibility.
scratch that.  I totally mis-read the Octane results.

so, they shows bigger numbers when applying part 2.1, it means that it improves the performance.
Attachment #8707459 - Attachment is obsolete: true
Attachment #8715141 - Attachment is obsolete: true
Attachment #8716585 - Attachment is obsolete: true
Attachment #8716586 - Attachment is obsolete: true
Attachment #8716588 - Attachment is obsolete: true
Attachment #8716592 - Attachment is obsolete: true
Attachment #8716593 - Attachment is obsolete: true
Attachment #8716597 - Attachment is obsolete: true
Attachment #8721042 - Attachment is obsolete: true
Attachment #8721045 - Attachment is obsolete: true
Attachment #8721609 - Attachment is obsolete: true
Attachment #8724434 - Attachment is obsolete: true
Attachment #8725172 - Attachment is obsolete: true
Attachment #8725579 - Attachment is obsolete: true
Attachment #8725604 - Attachment is obsolete: true
Attachment #8725669 - Attachment is obsolete: true
Attachment #8725746 - Flags: review+
folded Part 2.1 and 2.2, and updated spec date.
Attachment #8725747 - Flags: review?(efaustbmo)
sorry, forgot to revert unnecessary reorder for Step 2.
Attachment #8725747 - Attachment is obsolete: true
Attachment #8725747 - Flags: review?(efaustbmo)
Attachment #8725770 - Flags: review?(efaustbmo)
Comment on attachment 8725770 [details] [diff] [review]
Part 2: Self-host Array.prototype.concat.

sorry again
I should've checked IsArray (or at least IsObject) before IsPackedArray, as IsPackedArray accepts only object.

will check jstests/jit-test first and then post fixed patch
Attachment #8725770 - Flags: review?(efaustbmo)
Moved IsPackedArray to IsArray branch, as we will change the IsArray call there to IsConcatSpreadable, and the difference is only in the loop there.
IsConcatSpreadable also ensures the value is an object, so it can be just replaced.

Green on try run: https://treeherder.mozilla.org/#/jobs?repo=try&revision=8cb1f5318301
Attachment #8725770 - Attachment is obsolete: true
Attachment #8725896 - Flags: review?(efaustbmo)
Comment on attachment 8725896 [details] [diff] [review]
Part 2: Self-host Array.prototype.concat.

Review of attachment 8725896 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good!

::: js/src/builtin/Array.js
@@ +876,5 @@
> +    // Step 1.
> +    var O = ToObject(this);
> +
> +    // Step 2.
> +    var A = std_Array(0);

This will go to ArraySpeciesCreate(0,O) in the other bug I'm reviewing, I assume.

@@ +881,5 @@
> +
> +    // Step 3.
> +    var n = 0;
> +
> +    // Step 4 (skipped).

nit: prefer (implicit in |arguments|), or something

::: js/src/tests/ecma_3/Array/regress-322135-02.js
@@ +1,1 @@
> +// |reftest| skip -- slow

Do we will need to skip this? Do we want or need a followup bug to fix this?

::: js/src/vm/Xdr.h
@@ +35,4 @@
>  static const uint32_t XDR_BYTECODE_VERSION =
>      uint32_t(0xb973c0de - XDR_BYTECODE_VERSION_SUBTRAHEND);
>  
> +static_assert(JSErr_Limit == 446,

I've had to review this for a long time, I guess. Luckily, we don't need this anymore
Attachment #8725896 - Flags: review?(efaustbmo) → review+
Thank you for reviewing :D

(In reply to Eric Faust [:efaust] from comment #61)
> Comment on attachment 8725896 [details] [diff] [review]
> Part 2: Self-host Array.prototype.concat.
> 
> Review of attachment 8725896 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Looks good!
> 
> ::: js/src/builtin/Array.js
> @@ +876,5 @@
> > +    // Step 1.
> > +    var O = ToObject(this);
> > +
> > +    // Step 2.
> > +    var A = std_Array(0);
> 
> This will go to ArraySpeciesCreate(0,O) in the other bug I'm reviewing, I
> assume.

Yes :)


> ::: js/src/tests/ecma_3/Array/regress-322135-02.js
> @@ +1,1 @@
> > +// |reftest| skip -- slow
> 
> Do we will need to skip this? Do we want or need a followup bug to fix this?

Yes, we need to skip this for now.
It's related to bug 1234947.  not sure if the bug can be fixed easily.
the test is slow because of `4294967295` is not int32_t, and Ion cannot use Int32 in the loop in concat.

> var length = 4294967295;
> var array1 = new Array(length);
In Part 2, Array#concat was changed from native to self-hosted, and it causes that xray exposes Array.concat generic.
looks like other self-hosted generic are also not in the list to skip, so I think I can remove concat from there too.
how do you think?
Attachment #8736028 - Flags: review?(efaustbmo)
Comment on attachment 8736028 [details] [diff] [review]
Part 2.1: Exclude concat from Array ctor property skip list in xray.

Review of attachment 8736028 [details] [diff] [review]:
-----------------------------------------------------------------

Forwarding to an xpconnect peer.
Attachment #8736028 - Flags: review?(efaustbmo) → review?(bobbyholley)
Attachment #8736028 - Flags: review?(bobbyholley) → review+
Blocks: 1041586
https://hg.mozilla.org/mozilla-central/rev/ef8f61469172
https://hg.mozilla.org/mozilla-central/rev/70e78d669f9d
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla48
Depends on: 1271850
Depends on: 1287520
You need to log in before you can comment on or make changes to this bug.