Bug 1233642 (Closed) - Self-host Array.prototype.concat.
Opened 8 years ago · Closed 8 years ago
Category: Core :: JavaScript Engine, defect
Status: RESOLVED FIXED
Target Milestone: mozilla48
People: (Reporter: arai, Assigned: arai)
References: (Blocks 1 open bug)
Attachments: (3 files, 26 obsolete files)
3.12 KB, patch (arai: review+)
40.16 KB, patch (efaust: review+)
1.47 KB, patch (bholley: review+)
Derived from bug 1165052. Before supporting @@species in Array.prototype.concat, it would be better to self-host it. With a simple perf test (bug 1165052 comment #25), a straightforward implementation seems to be fast enough.
Comment 1 (Assignee) • 8 years ago
Tested the WIP patch: https://treeherder.mozilla.org/#/jobs?repo=try&revision=231cb879bd41

Only ecma_3/Array/regress-322135-02.js fails, with a timeout. However, the code already behaves strangely on m-c.
https://dxr.mozilla.org/mozilla-central/source/js/src/tests/ecma_3/Array/regress-322135-02.js

> var length = 4294967295;
> var array1 = new Array(length);
> var array2 = ['Kibo'];
> var array;
>
> try
> {
>   array = array1.concat(array2);
> }
> catch(ex)
> {
>   printStatus(ex.name + ': ' + ex.message);
> }

|array| there becomes a 0-length Array without any exception being thrown, maybe because of uint32_t overflow. I think it should throw |RangeError: invalid array length| or something like that. Anyway, is it required to run this code quickly?
Comment 2 (Assignee) • 8 years ago
Here's the benchmark result:

          | before  | concat only | concat+slice
----------+---------+-------------+--------------
SunSpider | 164.5ms | 154.6ms     | 154.7ms
Kraken    | 897.3ms | 898.7ms     | 966.0ms
JetStream | 180.28  | 181.57      | 181.95
Octane    | 29061   | 27882       | 31316

There is some regression in Kraken (slice) and Octane (concat), and some improvement in SunSpider (concat) and Octane (slice).
Comment 3 (Assignee) • 8 years ago
So, about the regress-322135-02.js case: it uses the following branch.
https://dxr.mozilla.org/mozilla-central/rev/388bdc46ba51ee31da8b8abe977e0ca38d117434/js/src/jsarray.cpp#2633

> if (isArray && !ObjectMayHaveExtraIndexedProperties(aobj)) {
>     if (!GetLengthProperty(cx, aobj, &length))
>         return false;
>
>     size_t initlen = GetAnyBoxedOrUnboxedInitializedLength(aobj);
>     narr = NewFullyAllocatedArrayTryReuseGroup(cx, aobj, initlen);
>     if (!narr)
>         return false;
>     CopyAnyBoxedOrUnboxedDenseElements(cx, narr, aobj, 0, 0, initlen);
>     SetAnyBoxedOrUnboxedArrayLength(cx, narr, length);
>
>     args.rval().setObject(*narr);
>     ...

with initlen = 0 and length = 4294967295, so it skips iterating the |this| value's elements.

https://dxr.mozilla.org/mozilla-central/rev/388bdc46ba51ee31da8b8abe977e0ca38d117434/js/src/jsarray.cpp#2650

> if (length == initlen) {
>     while (argc) {
>         ...
>         for (size_t i = 0; i < initlen; i++) {
>             ...
>         }
>         ...
>     }
> }

This case could be optimized with the current behavior even in self-hosted code, but it might need a somewhat complicated optimization path once we implement @@species. Also, I'm not sure whether this case really matters on the real web. Does anyone know of an actual use case with similar code? Anyway, I'll try implementing it and see how hard it is :)
Comment 4 (Assignee) • 8 years ago
Oh, I misunderstood the code. The |if (length == initlen)| check is used for iterating the arguments, not |this|. The |this| elements are copied by |CopyAnyBoxedOrUnboxedDenseElements|.
Comment 5 (Assignee) • 8 years ago
I copied the CopyAnyBoxedOrUnboxedDenseElements optimization from the native array_concat into the self-hosted concat, but pure JS code is faster when the Array length fits into int32_t. Also, if the Array length doesn't fit into int32_t, it gets extremely slow, and that's the reason for the TIMEOUT in comment #1. Filed bug 1234947 for it. If that case is unusual in ordinary web apps, maybe we could just disable the testcase and avoid applying the CopyAnyBoxedOrUnboxedDenseElements optimization? Do you know whether such cases exist?
Flags: needinfo?(efaustbmo)
Comment 6 (Assignee) • 8 years ago
As suggested on IRC, redirecting the ni?. Is there any known case in the wild where Array.prototype.concat (and Array.prototype.slice, for bug 1233643) needs to be fast on an Array whose length doesn't fit into int32_t?
Flags: needinfo?(luke)
Flags: needinfo?(jdemooij)
Flags: needinfo?(efaustbmo)
Comment 7 • 8 years ago
I'm not aware of any, but I haven't followed Array stuff much.
Flags: needinfo?(luke)
Comment 8 • 8 years ago
(In reply to Tooru Fujisawa [:arai] from comment #6)
> Is there any known case in wild that we should perform
> Array.prototype.concat (and Array.prototype.slice for bug 1233643) quickly
> with an Array that its length doesn't fit into int32_t?

I'm not aware of such cases and I don't expect them to be common. Sparse arrays definitely show up in the wild (see bug 1088189 and bug 1087963, for instance), but I think that's mostly orthogonal, and most of these should still have an int32 length.
Flags: needinfo?(jdemooij)
Comment 9 (Assignee) • 8 years ago
Thanks :) I'll focus on other cases then. Performance on sparse arrays is fast enough, I think; short dense arrays may need some more optimization.
Comment 10 (Assignee) • 8 years ago
This is from bug 1165052.
Attachment #8707459 -
Flags: review?(efaustbmo)
Comment 11 (Assignee) • 8 years ago
This implements the general case of Array#concat and removes the native one, including the Ion code.
Attachment #8707460 -
Flags: review?(efaustbmo)
Comment 12 (Assignee) • 8 years ago
This improves performance with a single short array. Will post a perf comparison. Green on try run: https://treeherder.mozilla.org/#/jobs?repo=try&revision=bc18dc840676
Attachment #8707465 -
Flags: review?(efaustbmo)
Comment 13 (Assignee) • 8 years ago

Updated • 8 years ago
Attachment #8707459 -
Flags: review?(efaustbmo) → review+
Comment 14 • 8 years ago
I must be confused or misreading graphs, or something. Do we understand why these patches make small dense array concats much slower? I imagine that length 10 must be more common than length 2**10, so...
Flags: needinfo?(arai.unmht)
Comment 15 (Assignee) • 8 years ago
Thank you for reviewing :) Here's the characteristic between the length of the dense array and the time taken by 50000 iterations of |A.concat(A).length|. In the attached graph, current m-c (before) is faster than the patched builds in the [0, 63] and [208, 1023] ranges, and performance gets much worse at length >= 1024. Part 1+2 is faster than before in the [64, 192] range. Part 1+2+3 is faster than before in the [1024, ...] range; it's also faster than Part 1+2 in the [0, 7], [16, 58], and [512, ...] ranges. So applying Part 3 introduces both performance improvements and regressions relative to Part 1+2. Will look into the reason for the spikes at length = 8, 64, 256, 512, etc.
Comment 16 (Assignee) • 8 years ago
So, the spike comes from the std_Array(len1 + len2) allocation. The graph is the result of the following function:

> function ArrayConcat(arg1) {
>     var len1 = ToLength(this.length);
>     var len2 = ToLength(arg1.length);
>
>     // Step 2 (reordered).
>     var A = std_Array(len1 + len2);
>     return A;
> }

There seems to be a difference in allocation logic between the original native concat and std_Array. Will look into it.
Flags: needinfo?(arai.unmht)
Comment 17 • 8 years ago
Note that the spike with > ~7 elements is probably because the elements will no longer fit inline in the object, so we need either malloc'd or nursery-allocated elements. Wild guess: maybe Ion does not have a fast path to allocate elements in the nursery when we do Array(x) and x is not a constant?
Comment 18 (Assignee) • 8 years ago
Thanks :) Yeah, N=7 (len1+len2 == 14) and N=8 (len1+len2 == 16) use different branches in the following condition:
https://dxr.mozilla.org/mozilla-central/rev/584870f1cbc5d060a57e147ce249f736956e2b62/js/src/jit/CodeGenerator.cpp#4803

> void
> CodeGenerator::visitNewArrayDynamicLength(LNewArrayDynamicLength* lir)
> {
>     ...
>     if (templateObject->as<ArrayObject>().hasFixedElements()) {
>         size_t numSlots = gc::GetGCKindSlots(templateObject->asTenured().getAllocKind());
>         inlineLength = numSlots - ObjectElements::VALUES_PER_HEADER;
>     } else {
>         canInline = false;
>     }

Will investigate how the native concat performs this quickly.
Comment 19 (Assignee) • 8 years ago
Added an intrinsic for NewFullyAllocatedArrayTryReuseGroup and called it instead of std_Array. It reduces the spike at N=8 (length=16) and draws almost the same curve as "before" for N < 1024, but the spike at N=64 (length=128) is still large, and bigger than in Part 2's case.
Attachment #8715135 -
Attachment is obsolete: true
Comment 20 (Assignee) • 8 years ago
here's the patch for the intrinsic.
Comment 21 (Assignee) • 8 years ago
Attachment #8707460 -
Attachment is obsolete: true
Attachment #8707460 -
Flags: review?(efaustbmo)
Comment 22 (Assignee) • 8 years ago
Attachment #8707465 -
Attachment is obsolete: true
Attachment #8707465 -
Flags: review?(efaustbmo)
Comment 23 (Assignee) • 8 years ago
Attachment #8715373 -
Attachment is obsolete: true
Comment 24 (Assignee) • 8 years ago
Here's the current performance comparison on Octane between m-c and parts 2, 3, and 4, averaged over 10 runs. There is at most a 5% improvement in SplayLatency, but other cases show < 2% improvement/regression, depending on the testcase. Will check other benchmarks.
Comment 25 (Assignee) • 8 years ago
In Kraken, stanford-crypto-pbkdf2 regresses about 20-25% in all cases. stanford-crypto-ccm regresses about 15% in the 1+2+3 and 1+2+3+4 cases. stanford-crypto-sha256-iterative and json-stringify-tinderbox regress about 5% in the 1+2+3+4 case.
Comment 26 (Assignee) • 8 years ago
With SunSpider, again, all cases improve/regress by about 5%, depending on the testcase.
Comment 27 (Assignee) • 8 years ago
Calculated the characteristic for 2 cases:
1. A.concat(A), where the length of A is variable.
2. A.concat(B), where the length of A is variable and the length of B is fixed.
Attachment #8715372 -
Attachment is obsolete: true
Updated (Assignee) • 8 years ago
Attachment #8703112 -
Attachment is obsolete: true
Updated (Assignee) • 8 years ago
Attachment #8707466 -
Attachment is obsolete: true
Comment 28 (Assignee) • 8 years ago
Now I'm thinking that using only Part 2 might be the best way, as Part 3 and Part 4 add extra code and complexity while the performance improvements in benchmarks are not so big :/
Comment 29 (Assignee) • 8 years ago
efaust, can I have your opinion on what we should do here? Do you think Part 3 and Part 4 (or maybe some other optimization) should be applied?
Flags: needinfo?(efaustbmo)
Comment 30 (Assignee) • 8 years ago
In stanford-crypto-pbkdf2 and stanford-crypto-ccm, concat is called with only 1 argument every time:

stanford-crypto-pbkdf2
  called | this.length | arguments[0].length
---------+-------------+---------------------
   32768 |           0 |                   8
   32760 |           8 |                   1
      16 |           0 |                  16
       8 |           3 |                   1
       8 |           2 |                   1
       8 |           0 |                   3

stanford-crypto-ccm
  called | this.length | arguments[0].length
---------+-------------+---------------------
    2254 |           1 |                   4
    1056 |           3 |                   3
     944 |           4 |                   3
      11 |          28 |                   4
       9 |          44 |                   4
       8 |           8 |                   4
       8 |           4 |                   4
       8 |          36 |                   4
       8 |          34 |                   4
       8 |          24 |                   4
       8 |           0 |                   4
       7 |          47 |                   4
       7 |          29 |                   4
       7 |          25 |                   4
       7 |           2 |                   4
       6 |           9 |                   4
       6 |           6 |                   4
       6 |          46 |                   4
       6 |          45 |                   4
       6 |          41 |                   4
       6 |          39 |                   4
       6 |          23 |                   4
       5 |           7 |                   4
       5 |          49 |                   4
       5 |          21 |                   4
       5 |          18 |                   4
       5 |          17 |                   4
       5 |          12 |                   4
       4 |           5 |                   4
       4 |          48 |                   4
       4 |          38 |                   4
       4 |          37 |                   4
       4 |          35 |                   4
       4 |          33 |                   4
       4 |          27 |                   4
       4 |          22 |                   4
       4 |          20 |                   4
       4 |          14 |                   4
       4 |          11 |                   4
       4 |          10 |                   4
       3 |          43 |                   4
       3 |          42 |                   4
       3 |          40 |                   4
       3 |          31 |                   4
       3 |          30 |                   4
       3 |           3 |                   4
       3 |          19 |                   4
       3 |          16 |                   4
       2 |          15 |                   4
       1 |          32 |                   4
       1 |          13 |                   4

Will look for another way to improve performance with short arrays.
Comment 31 • 8 years ago
yeah, I think that's my main hangup here. We need to drive that kraken regression closer to 0, and then the rest of this looks great!
Flags: needinfo?(efaustbmo)
Comment 32 (Assignee) • 8 years ago
here's the performance characteristic with @@species patch. both in native and self-hosted case, almost offset is applied. currently self-hosted @@species support is not optimized as much as native one, as it always accesses to constructor and @@species. maybe we could apply some optimization there.
Comment 33 (Assignee) • 8 years ago
and Kraken time, with @@species.
Comment 34 (Assignee) • 8 years ago
stanford-crypto-pbkdf2 time, with the original LIST file:

before:              79.9
native + @@species:  82.2
part 2 + @@species: 100.7
part 3 + @@species:  96.2
part 4 + @@species:  91.3

moving stanford-crypto-pbkdf2 to the top:

before:              85.6
native + @@species:  86.4
part 2 + @@species:  97.2
part 3 + @@species:  90.1
part 4 + @@species:  88.0

only stanford-crypto-pbkdf2:

before:              86.4
native + @@species:  86.9
part 2 + @@species:  98.3
part 3 + @@species:  90.0
part 4 + @@species:  86.2

So, the tests that run before it affect it quite a bit: they're good for the native implementations and bad for the self-hosted ones.
Comment 35 (Assignee) • 8 years ago
(In reply to Tooru Fujisawa [:arai] from comment #32)
> here's the performance characteristic with @@species patch.
> both in native and self-hosted case, almost offset is applied.

I meant, almost *same* offset.
Comment 36 (Assignee) • 8 years ago
Placing other stanford-crypto-* tests before stanford-crypto-pbkdf2 affects the time of stanford-crypto-pbkdf2. The `sjcl` object is re-initialized at the top of each test, so functions are not shared directly. Maybe a GC-related issue, or something related to shapes?
Comment 37 (Assignee) • 8 years ago
https://github.com/h4writer/arewefastyet/blob/master/benchmarks/kraken/resources/sunspider-standalone-driver.js#L41

> for (var krakenCounter = 0; krakenCounter < tests.length; krakenCounter++) {
>     var testBase = "tests/" + suiteName + "/" + tests[krakenCounter];
>     var testName = testBase + ".js";
>     var testData = testBase + "-data.js";
>     // load test data
>     load(testData);
>     var startTime = new Date;
>     load(testName);
>     times[krakenCounter] = new Date() - startTime;
>     gc();    <----- this one
> }

In the test harness, gc() is called after each test. When I increase the number of gc() calls from 1 to 2, the score changes, and it becomes almost the same as the score when running stanford-crypto-pbkdf2 alone. stanford-crypto-pbkdf2 is affected by this a lot, and when we call gc() twice, the Part 4 score is better than before. Other tests are also affected, but the differences are small.
Comment 38 (Assignee) • 8 years ago
Changed gc(); to the following, to see the result of the gc:

> print(gc());
> print(gc());

and executed the following command in arewefastyet/benchmarks/kraken:

> js -e "var tests = ['stanford-crypto-ccm', 'stanford-crypto-pbkdf2'], suitePath = 'test/kraken-1.1'; suiteName='kraken-1.1';" -f resources/sunspider-standalone-driver.js

The output is the following:

> before 3809280, after 3805184
> before 3809280, after 856064
> before 958464, after 958464
> before 962560, after 655360
> {
>     "stanford-crypto-ccm": 95,
>     "stanford-crypto-pbkdf2": 86
> }

So, the first gc() call after a test does not reduce the memory usage.
Comment 39 (Assignee) • 8 years ago
terrence, what would be the reason for the difference between the 1st and 2nd gc() calls?
Flags: needinfo?(terrence)
Comment 40 (Assignee) • 8 years ago
So far, it looks like a `gc()` call starts background sweeping but does not wait for the background sweeping it started; it does, however, wait for an already-running sweep. So the 2nd gc() call waits for the sweeping started by the 1st gc() call, and after the 2nd gc() call returns, gcBytes is reduced. Is this the expected behavior of the gc() testing function?

[main thread]
gc(); // JavaScript
+- GC
   +- JS::GCForReason
      +- GCRuntime::gc
         +- GCRuntime::collect
            +- GCRuntime::gcCycle
               +- waitBackgroundSweepEnd
               |  +- state
               |  |    # returns IDLE on 1st gc() call
               |  |    # returns SWEEPING on 2nd gc() call
               |  +- waitForBackgroundThread
               |       # called only on 2nd gc() call
               |       # after it returns, gcBytes is reduced
               +- GCRuntime::incrementalCollectSlice
                  +- GCRuntime::sweepPhase
                     +- GCRuntime::endSweepingZoneGroup
                        +- GCRuntime::queueZonesForBackgroundSweep
                           +- GCHelperState::maybeStartBackgroundSweep
                              +- GCHelperState::startBackgroundThread(SWEEPING)
                                   # starts background sweeping

[background thread]
HelperThread::threadLoop()
+- HelperThread::handleGCHelperWorkload
   +- GCHelperState::work
      +- GCHelperState::doSweep
         +- sweepBackgroundThings
            +- LifoAlloc::freeAll
                 # takes some time on 1st call, and 1st gc() returns while
                 # LifoAlloc::freeAll is running
Comment 41 • 8 years ago
Yes, this is the current behavior of direct calls to gc(). GC invoked by the system is smart enough to skip the second GC if the first is still going, but direct calls to gc() from script is generally an indication that the invoker wants to verify that some piece of memory is gone, so we have to be thorough. I am currently improving this situation in bug 1249367.
Flags: needinfo?(terrence)
Comment 42 (Assignee) • 8 years ago
Thanks. If bug 1249367 changes the gc() function's behavior to wait for the GC it starts, I can just wait for that. I'll look into stanford-crypto-ccm next, as it's still slower even when I call gc() twice.
Comment 43 (Assignee) • 8 years ago
Hmm, the order in the LIST file and the number of gc() calls still affect the test result when Parts 1-4 are applied. Seems there's yet another reason.
Comment 44 (Assignee) • 8 years ago
So far, each function is Ion-compiled a different number of times depending on whether stanford-crypto-pbkdf2 is executed alone or after stanford-crypto-ccm. For example, when stanford-crypto-pbkdf2 is executed after stanford-crypto-ccm, sjcl.misc.pbkdf2 is compiled 5 times (compared to 3 times). Maybe ArrayConcat is executed in the interpreter more often when stanford-crypto-pbkdf2 runs after stanford-crypto-ccm? I will look into it.
Comment 45 (Assignee) • 8 years ago
The time consumed by ArrayConcat itself is not so different:

stanford-crypto-pbkdf2 only:                      11.5 [ms]
stanford-crypto-pbkdf2 after stanford-crypto-ccm: 12.8 [ms]

Anyway, will try to figure out why the Ion code is discarded 2 more times.
Comment 46 (Assignee) • 8 years ago
Testing with Parts 1-4 applied:

1. The performance difference disappears when I rename all `sjcl` to `sjcl2` in stanford-crypto-pbkdf2.js and stanford-crypto-pbkdf2-data.js.

2. I see both the stanford-crypto-pbkdf2-data.js and stanford-crypto-ccm-data.js filenames in a single iongraph when I execute the following:

> js -e "var tests = ['stanford-crypto-ccm', 'stanford-crypto-pbkdf2'], suitePath = 'test/kraken-1.1'; suiteName='kraken-1.1';" -f resources/sunspider-standalone-driver.js

Looks like `sjcl.codec.utf8String.toBits` and `sjcl.bitArray.P` from stanford-crypto-ccm-data.js are called inside some function in stanford-crypto-pbkdf2-data.js (not yet investigated which function, though).
Comment 47 (Assignee) • 8 years ago
1. The performance difference in stanford-crypto-pbkdf2 between running alone and running with other tests changes with the bug 1252228 patch:

with bug 1252228 patch, Part 1+2+3+4, pbkdf2 alone:     88.1 [ms]
with bug 1252228 patch, Part 1+2+3+4, pbkdf2 after ccm: 83.2 [ms]

Now stanford-crypto-pbkdf2 is faster when stanford-crypto-ccm is executed before it, and the score no longer depends on the number of gc() calls, nor on whether it runs in a different global.

2. The patch in bug 1252228 makes stanford-crypto-pbkdf2 faster even without this bug's patches.

So, we now have yet another performance regression issue for stanford-crypto-pbkdf2, and a pre-existing regression for stanford-crypto-ccm, relative to m-i (with the bug 1252228 patch applied).
Comment 48 (Assignee) • 8 years ago
Looks like inlineNewArrayTryReuseGroup fails several times because it tries to optimize too much: it expects the argument to always be the same, but Array#concat can be called with different groups. Maybe I have to fall back to std_Array's inlining in that case.
Comment 49 (Assignee) • 8 years ago
Here are the current WIP patches, including an extra Part 5 that changes the inlining of NewArrayTryReuseGroup to fall back to std_Array's inlining when there are multiple groups for the template.
Comment 50 (Assignee) • 8 years ago
Also, here's what I'm testing now with stanford-crypto-ccm. The difference between Part 2 and Part 3 that impacts the performance of stanford-crypto-ccm is something like the following. The following 3 versions of ArrayConcat are simplified Array#concat implementations that work with stanford-crypto-ccm; that is, the fast path used in Part 3 (which works with array.concat(array)). The difference between them is how the 1st argument is obtained. The score regresses by about 10ms in the latter 2 cases.

// fast
function ArrayConcat(arg1) {
    var i = 0;
    i *= 1;
    var E = arguments[i];
    var len1 = this.length;
    var len2 = E.length;
    var A = std_Array(len1 + len2);
    for (var n = 0; n < len1; n++) {
        if (n in this)
            _DefineDataProperty(A, n, this[n]);
    }
    for (var k = 0; k < len2; k++) {
        if (k in E)
            _DefineDataProperty(A, n, E[k]);
        n++;
    }
    return A;
}

// slow
function ArrayConcat(arg1) {
    var i = 0;
    var E = arguments[i];
    var len1 = this.length;
    var len2 = E.length;
    var A = std_Array(len1 + len2);
    for (var n = 0; n < len1; n++) {
        if (n in this)
            _DefineDataProperty(A, n, this[n]);
    }
    for (var k = 0; k < len2; k++) {
        if (k in E)
            _DefineDataProperty(A, n, E[k]);
        n++;
    }
    return A;
}

// also slow
function ArrayConcat(arg1) {
    var E = arg1;
    var len1 = this.length;
    var len2 = E.length;
    var A = std_Array(len1 + len2);
    for (var n = 0; n < len1; n++) {
        if (n in this)
            _DefineDataProperty(A, n, this[n]);
    }
    for (var k = 0; k < len2; k++) {
        if (k in E)
            _DefineDataProperty(A, n, E[k]);
        n++;
    }
    return A;
}

Then, here's a minimal testcase that seems to reproduce almost the same situation as stanford-crypto-ccm's regression. The script takes a single argument: when it's "A", it uses ArrayConcat_A, which corresponds to the slow one above; when it's anything else, like "B", it uses ArrayConcat_B, which corresponds to the fast one above.

"use strict";
function ArrayConcat_A() {
    var i = 0;
    arguments[i];
    for (var n = 0; n < 0; n++) {
        n in this;
    }
}
function ArrayConcat_B() {
    var i = 0;
    i *= 1;
    arguments[i];
    for (var n = 0; n < 0; n++) {
        n in this;
    }
}
function f() {
    for (var i = 0; i < 10000; i++) {
        [].my_concat([]);
        for (var n = 0; n < 0; n++)
            [].slice();
    }
}
if (scriptArgs[0] == "A")
    Array.prototype.my_concat = ArrayConcat_A;
else
    Array.prototype.my_concat = ArrayConcat_B;
var T = elapsed();
f();
print(elapsed() - T);

Then, looking at the iongraph, ArrayConcat_A is inlined but ArrayConcat_B is not, so I think the difference comes from inlining, and the reason is the way I access the arguments. But when I increase the number of loop iterations (e.g. i < 100000), the score changes and then ArrayConcat_A gets faster. And now I'm stuck here.
Comment 51 (Assignee) • 8 years ago
Fixed Part 4 to use a dedicated temp register instead of the output register, which can alias an input.
Attachment #8716587 -
Attachment is obsolete: true
Comment 52 (Assignee) • 8 years ago
Now the Kraken score gets better by checking IsPackedArray and skipping `k in E` in that case (Part 2.1), thanks to jandem :) Part 2.2 is just some possible cleanup. Parts 3-5 from the previous posts are not used.

Here are Kraken and Octane performance comparisons, thanks to bbouvier :)

Kraken benchmark score (average of 10 runs):
http://bnjbvr.github.io/simplegraph/#title=Kraken%20benchmark%20score%20%28average%20of%2010%20runs%29&categories=audio-fft,stanford-crypto-pbkdf2,audio-beat-detection,stanford-crypto-ccm,imaging-darkroom,json-parse-financial,audio-oscillator,ai-astar,audio-dft,stanford-crypto-sha256-iterative,json-stringify-tinderbox,imaging-gaussian-blur,stanford-crypto-aes,imaging-desaturate&values=before%2047.8%2074.6%2088%2074.9%2081.6%2036.4%2063.8%2065.8%20109.2%2039.2%2041%2064.2%2051.8%2053%0Apart2%2046.4%2092.6%2089.4%2076.7%2082.1%2036.8%2064.2%2065.6%20109.2%2039.3%2041%2064.3%2051.3%2053.6%0Apart2.1%2046.6%2081.6%2088.3%2074.9%2081.9%2036.6%2064.1%2065.7%20109%2036.4%2038.8%2064.4%2051.5%2053.3%0Apart2.2%2046.6%2081.8%2088.6%2075%2082%2036.6%2063.9%2065.6%20109.4%2036.4%2038.7%2064.4%2051.5%2053.2

Octane benchmark score (average of 10 runs):
http://bnjbvr.github.io/simplegraph/#title=Octane%20benchmark%20score%20%28average%20of%2010%20runs%29&categories=Richards,DeltaBlue,Crypto,RayTrace,EarleyBoyer,RegExp,Splay,SplayLatency,NavierStokes,PdfJS,Mandreel,MandreelLatency,Gameboy,CodeLoad,Box2D,zlib,Typescript,Score&values=before%2032093%2064102%2030975%20117954%2032595%203880%2017804%2012523%2037782%2014402%2031255%2034548%2052844%2017265%2057200%2080568%2031340%2030397%0Apart2%2031867%2065302%2030837%20116592%2032310%203826%2017989%2013004%2037871%2014717%2031203%2033782%2053609%2017144%2057031%2080819%2031307%2030444%0Apart2.1%2032139%2065006%2030676%20116296%2032513%203876%2019411%2018949%2038127%2014807%2031053%2037245%2050540%2017517%2057930%2080317%2031216%2031416%0Apart2.2%2032204%2064728%2030976%20116430%2032611%203890%2018791%2017367%2038041%2014796%2030792%2038123%2050104%2017412%2057067%2080691%2031729%2031222

Kraken stanford-crypto-ccm is as fast as today's m-c, and the performance regression in stanford-crypto-pbkdf2 is about 9% (+7ms).

Now, Octane SplayLatency and MandreelLatency show a somewhat strange regression :/ Following are the SplayLatency results for each run with Part 2.1:

13665 21148 21168 13144 13569 21102 21596 21513 21556 21026

and the same for MandreelLatency:

38947 38947 38947 37767 37483 38348 37767 28984 37203 38055

Sometimes they show a nice score. Also, following are the SplayLatency results with m-c:

13224 12421 12574 12452 12379 11439 12123 12914 12839 12862

and the same for MandreelLatency:

38947 28487 28816 28165 37483 38947 38055 28984 38645 38947

SplayLatency doesn't show scores around 21000 with m-c, so this will be something I have to investigate next. MandreelLatency shows around 28000 and 38000 with both m-c and Part 2.1, but there seems to be some difference in probability (maybe I have to check with more runs).
Comment 53 (Assignee) • 8 years ago
Again, it looks like an issue that depends on execution order.

SplayLatency scores for 20 runs, running splay.js only:

11699 12216 11524 16435 11666 15628 11565 12836 12246 11930 12039 11977 11938 11728 11779 11591 12053 11961 12594 11645

SplayLatency scores for 20 runs, running splay.js after regexp.js:

18539 18684 13321 14097 14019 12608 21694 18504 13918 14068 13870 14162 14233 13706 14287 18777 22132 13793 14151 21276
Comment 54 (Assignee) • 8 years ago
Just confirmed `SplayLatency: 22321` on m-c, so both SplayLatency and MandreelLatency seem to be a matter of run-to-run variance.
Comment 55 (Assignee) • 8 years ago
Scratch that. I totally misread the Octane results: they show bigger numbers when applying Part 2.1, which means it improves performance.
Comment 56 (Assignee) • 8 years ago
Attachment #8707459 -
Attachment is obsolete: true
Attachment #8715141 -
Attachment is obsolete: true
Attachment #8716585 -
Attachment is obsolete: true
Attachment #8716586 -
Attachment is obsolete: true
Attachment #8716588 -
Attachment is obsolete: true
Attachment #8716592 -
Attachment is obsolete: true
Attachment #8716593 -
Attachment is obsolete: true
Attachment #8716597 -
Attachment is obsolete: true
Attachment #8721042 -
Attachment is obsolete: true
Attachment #8721045 -
Attachment is obsolete: true
Attachment #8721609 -
Attachment is obsolete: true
Attachment #8724434 -
Attachment is obsolete: true
Attachment #8725172 -
Attachment is obsolete: true
Attachment #8725579 -
Attachment is obsolete: true
Attachment #8725604 -
Attachment is obsolete: true
Attachment #8725669 -
Attachment is obsolete: true
Attachment #8725746 -
Flags: review+
Comment 57 (Assignee) • 8 years ago
Folded Parts 2.1 and 2.2 together, and updated the spec date.
Attachment #8725747 -
Flags: review?(efaustbmo)
Comment 58 (Assignee) • 8 years ago
Sorry, I forgot to revert an unnecessary reordering for Step 2.
Attachment #8725747 -
Attachment is obsolete: true
Attachment #8725747 -
Flags: review?(efaustbmo)
Attachment #8725770 -
Flags: review?(efaustbmo)
Comment 59 (Assignee) • 8 years ago
Comment on attachment 8725770 [details] [diff] [review]
Part 2: Self-host Array.prototype.concat.

Sorry again, I should've checked IsArray (or at least IsObject) before IsPackedArray, as IsPackedArray accepts only objects. Will check jstests/jit-tests first and then post a fixed patch.
Attachment #8725770 -
Flags: review?(efaustbmo)
Comment 60 (Assignee) • 8 years ago
Moved the IsPackedArray check into the IsArray branch, as we will change the IsArray call there to IsConcatSpreadable, and the difference is only in the loop there. IsConcatSpreadable also ensures the value is an object, so it can simply be swapped in. Green on try run: https://treeherder.mozilla.org/#/jobs?repo=try&revision=8cb1f5318301
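To illustrate the shape of this fast path, here is a rough plain-JS sketch. The names `concatSketch` and `isPackedSketch` are hypothetical stand-ins: the real self-hosted code uses the IsPackedArray, IsArray, and _DefineDataProperty intrinsics, which are not available to ordinary scripts, so crude plain-JS approximations are substituted here.

```javascript
function isPackedSketch(arr) {
  // Crude stand-in for the IsPackedArray intrinsic: a dense array has an
  // own key for every index below its length (illustration only).
  return Object.keys(arr).length === arr.length;
}

function concatSketch(self, ...args) {
  const A = [];
  let n = 0;
  for (const E of [self, ...args]) {
    if (Array.isArray(E)) {          // the branch that later becomes IsConcatSpreadable
      const len = E.length;
      if (isPackedSketch(E)) {
        for (let k = 0; k < len; k++)
          A[n++] = E[k];             // fast path: skip the hole check entirely
      } else {
        for (let k = 0; k < len; k++, n++) {
          if (k in E)
            A[n] = E[k];             // slow path: the `k in E` check preserves holes
        }
      }
    } else {
      A[n++] = E;                    // non-spreadable values are appended as-is
    }
  }
  A.length = n;
  return A;
}
```

Usage: `concatSketch([1, 2], [3])` yields `[1, 2, 3]`, while `concatSketch([1, , 3], [4])` takes the slow path and preserves the hole at index 1.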
Attachment #8725770 -
Attachment is obsolete: true
Attachment #8725896 -
Flags: review?(efaustbmo)
Comment 61 • 8 years ago
Comment on attachment 8725896 [details] [diff] [review]
Part 2: Self-host Array.prototype.concat.

Review of attachment 8725896 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good!

::: js/src/builtin/Array.js
@@ +876,5 @@
> +    // Step 1.
> +    var O = ToObject(this);
> +
> +    // Step 2.
> +    var A = std_Array(0);

This will go to ArraySpeciesCreate(0, O) in the other bug I'm reviewing, I assume.

@@ +881,5 @@
> +
> +    // Step 3.
> +    var n = 0;
> +
> +    // Step 4 (skipped).

nit: prefer "(implicit in |arguments|)", or something.

::: js/src/tests/ecma_3/Array/regress-322135-02.js
@@ +1,1 @@
> +// |reftest| skip -- slow

Do we really need to skip this? Do we want or need a followup bug to fix it?

::: js/src/vm/Xdr.h
@@ +35,4 @@
>  static const uint32_t XDR_BYTECODE_VERSION =
>      uint32_t(0xb973c0de - XDR_BYTECODE_VERSION_SUBTRAHEND);
>
> +static_assert(JSErr_Limit == 446,

I've had to review this for a long time, I guess. Luckily, we don't need this anymore.
Attachment #8725896 -
Flags: review?(efaustbmo) → review+
Comment 62 (Assignee) • 8 years ago
Thank you for reviewing :D

(In reply to Eric Faust [:efaust] from comment #61)
> This will go to ArraySpeciesCreate(0,O) in the other bug I'm reviewing, I
> assume.

Yes :)

> ::: js/src/tests/ecma_3/Array/regress-322135-02.js
> @@ +1,1 @@
> > +// |reftest| skip -- slow
>
> Do we need to skip this? Do we want or need a followup bug to fix this?

Yes, we need to skip this for now. It's related to bug 1234947, and I'm not sure that bug can be fixed easily. The test is slow because `4294967295` is not an int32_t, so Ion cannot use Int32 in the loop in concat.

> var length = 4294967295;
> var array1 = new Array(length);
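As a concrete check of that claim, the loop bound in the test sits just outside the signed 32-bit range, so an index counting up to it cannot stay in an int32 representation:

```javascript
// 4294967295 (2**32 - 1) is a valid array length but exceeds INT32_MAX,
// so a loop counter bounded by it cannot be typed as Int32 by the JIT.
const INT32_MAX = 2 ** 31 - 1;        // 2147483647
const length = 4294967295;            // 2**32 - 1, the maximum array length
console.log(length > INT32_MAX);      // true
console.log((length | 0) === length); // false: it does not round-trip through int32
```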
Comment 63 (Assignee) • 8 years ago
https://hg.mozilla.org/integration/mozilla-inbound/rev/17a5a95bf6cc6ddf4251dec76201baf20bbd3452
Bug 1233642 - Part 1: Add IsArray intrinsic. r=efaust

https://hg.mozilla.org/integration/mozilla-inbound/rev/b535cc24f7d0b2703a43cf43fa371c6087dbb5e4
Bug 1233642 - Part 2: Self-host Array.prototype.concat. r=efaust
Comment 64 (Assignee) • 8 years ago
https://hg.mozilla.org/integration/mozilla-inbound/rev/bc9e586a8f1944d87e53f601d41ea860b272a3bc
Backed out changeset b535cc24f7d0 (bug 1233642)

https://hg.mozilla.org/integration/mozilla-inbound/rev/203c7e4b8b20e9912f330ac630520508ec1e2f98
Backed out changeset 17a5a95bf6cc (bug 1233642)
Comment 65 (Assignee) • 8 years ago
In Part 2, Array#concat was changed from native to self-hosted, which causes xray to expose the Array.concat generic. It looks like other self-hosted generics are also not in the skip list, so I think I can remove concat from there too. What do you think?
Attachment #8736028 -
Flags: review?(efaustbmo)
Comment 66 • 8 years ago
Comment on attachment 8736028 [details] [diff] [review]
Part 2.1: Exclude concat from Array ctor property skip list in xray.

Review of attachment 8736028 [details] [diff] [review]:
-----------------------------------------------------------------

Forwarding to an xpconnect peer.
Attachment #8736028 -
Flags: review?(efaustbmo) → review?(bobbyholley)
Updated • 8 years ago
Attachment #8736028 -
Flags: review?(bobbyholley) → review+
Comment 67 (Assignee) • 8 years ago
https://hg.mozilla.org/integration/mozilla-inbound/rev/ef8f61469172c5f62bf990e1b38f5c25526ce3a0
Bug 1233642 - Part 1: Add IsArray intrinsic. r=efaust

https://hg.mozilla.org/integration/mozilla-inbound/rev/70e78d669f9de949dcbe972f74742045ca094fae
Bug 1233642 - Part 2: Self-host Array.prototype.concat. r=efaust,bholley
Comment 68 • 8 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/ef8f61469172
https://hg.mozilla.org/mozilla-central/rev/70e78d669f9d
Status: NEW → RESOLVED
Closed: 8 years ago
status-firefox48: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla48