Object.keys can be 3x slower in SM
Categories
(Core :: JavaScript Engine: JIT, defect, P2)
Tracking
()
People
(Reporter: jrmuizel, Unassigned)
References
(Blocks 1 open bug)
Details
(Whiteboard: [sp3])
Attachments
(2 files, 2 obsolete files)
1.11 KB,
application/x-javascript
|
Details | |
1.97 KB,
patch
|
Details | Diff | Splinter Review |
Object.keys performance shows up in the heavily used shallowEqual
from React. This is a microbenchmark that attempts to explore the performance characteristics in different engines.
if (!globalThis.performance) {
globalThis.performance = { now: () => preciseTime()*1000 }
}
function f(o) {
return Object.keys(o)
}
function bench(o) {
for (var i = 0; i<1000000; i++) {
f(o);
}
}
o = {about: 5,
ballisitic: 6,
cakebread: 8,
dalespeople: 9,
evilproof: 20,
fairgoing: 30,
gargoylish: 2,
harmonici: 1,
jinniwink: 12,
kaleidoscopical: 2,
labellum: 1,
macadamization: 4,
neutrino: 1,
observership: 0,
quadratomandibular: 9,
rachicentesis: 1,
saltcat: 0,
trousseau: 1,
view: 10,
wheelbox: 2,
xerography: 1,
yez: 3
}
var start = performance.now();
bench(o)
var end = performance.now();
print(`took ${end - start}ms`);
$ js --ion-inlining=off object-keys.js
took 305.216064453125ms
$ jsc --maximumInliningDepth=1 ~/tools/jsbench/object-keys.js
took 10.8701171875ms
d8 --no-turbo-inlining object-keys.js
took 26.507ms
Increasing the number of properties in o
makes Spidermonkey and V8 slower but does not seem to have an impact on JSC.
Reporter | ||
Comment 1•2 years ago
|
||
I suspect https://bugs.webkit.org/show_bug.cgi?id=190047 is responsible for most of JSC's speed
Updated•2 years ago
|
Updated•2 years ago
|
Updated•2 years ago
|
Updated•2 years ago
|
Comment 2•2 years ago
|
||
Profiling the state of this right now, a surprising amount of time is spent growing the properties vector: https://share.firefox.dev/3P7usbM
If I arbitrarily reserve 100 elements before running the benchmark, I can speed it up by 37%:
$ hyperfine '/home/matthew/unified/obj-opt-shell-nodebug-x86_64-pc-linux-gnu/dist/bin/js keys-benchmark.js' 'PREALLOC=1 /home/matthew/unified/obj-opt-shell-nodebug-x86_64-pc-linux-gnu/dist/bin/js keys-benchmark.js'
Benchmark 1: /home/matthew/unified/obj-opt-shell-nodebug-x86_64-pc-linux-gnu/dist/bin/js keys-benchmark.js
Time (mean ± σ): 351.2 ms ± 4.3 ms [User: 352.6 ms, System: 9.8 ms]
Range (min … max): 346.0 ms … 359.0 ms 10 runs
Benchmark 2: PREALLOC=1 /home/matthew/unified/obj-opt-shell-nodebug-x86_64-pc-linux-gnu/dist/bin/js keys-benchmark.js
Time (mean ± σ): 255.8 ms ± 1.2 ms [User: 255.8 ms, System: 9.6 ms]
Range (min … max): 253.8 ms … 257.3 ms 11 runs
Summary
'PREALLOC=1 /home/matthew/unified/obj-opt-shell-nodebug-x86_64-pc-linux-gnu/dist/bin/js keys-benchmark.js' ran
1.37 ± 0.02 times faster than '/home/matthew/unified/obj-opt-shell-nodebug-x86_64-pc-linux-gnu/dist/bin/js keys-benchmark.js'
AFAICT accurately pre-counting is slightly grosser -- seems like a worthwhile patch regardless, but I'll put that in another bug (and we can talk about accurate pre-counting or if we just want to use a magic constant)
Comment 3•2 years ago
|
||
It's worth checking how common it is to have more than 8 keys (the number of elements the vector can store inline I think). I added some logging a few weeks ago and IIRC the vast majority of calls had < 8 keys, so this benchmark might not be super representative of most of the actual calls.
Comment 4•2 years ago
|
||
(That said, increasing from 8 to 16 is an easy improvement and probably worth it!)
Reporter | ||
Comment 5•2 years ago
|
||
(In reply to Jan de Mooij [:jandem] from comment #3)
It's worth checking how common it is to have more than 8 keys (the number of elements the vector can store inline I think). I added some logging a few weeks ago and IIRC the vast majority of calls had < 8 keys, so this benchmark might not be super representative of most of the actual calls.
Yeah, the main purpose of this benchmark is to show that JSC caches the results of Object.keys() and takes O(1) time on objects with cached Object.keys() instead of O(number of keys) like Spidermonkey
Comment 6•2 years ago
|
||
Oh yeah, that's totally understood -- just was a notable problem when looking at a profile and a quick thing to potentially fix.
Comment 7•2 years ago
•
|
||
So, one hypothesized road forward here is the consumption of cached native iterators created by for-in
.
I have built a very preliminary prototype patch that does exactly this to evaluate the potential win we could see here. It's built on top of the patch in Bug 1838529 to avoid the reallocation issues we'd seen elsewhere.
In this prototype patch we directly consume the for-in
cache if it exists; so the microbenchmark on this bug was tweaked to walk the object once with 'for-in' to prime that cached iterator.
The prototype patch is not sound since we don't handle prototype properties at all; it merely says "if there's a pre-existing for-in iterator, put those properties in, then create the keys array." In this sense, this is almost the best case scenario we can hope for performance wise.
The results are... underwhelming.
$ hyperfine '/home/matthew/unified/obj-opt-shell-nodebug-x86_64-pc-linux-gnu/dist/bin/js keys-benchmark.js' 'ENABLED=1 /home/matthew/unified/obj-opt-shell-nodebug-x86_64-pc-linux-gnu/dist/bin/js keys-benchmark.js'
Benchmark 1: /home/matthew/unified/obj-opt-shell-nodebug-x86_64-pc-linux-gnu/dist/bin/js keys-benchmark.js
Time (mean ± σ): 263.8 ms ± 1.0 ms [User: 265.3 ms, System: 7.6 ms]
Range (min … max): 262.4 ms … 265.6 ms 11 runs
Benchmark 2: ENABLED=1 /home/matthew/unified/obj-opt-shell-nodebug-x86_64-pc-linux-gnu/dist/bin/js keys-benchmark.js
Time (mean ± σ): 221.7 ms ± 2.9 ms [User: 221.3 ms, System: 8.0 ms]
Range (min … max): 217.7 ms … 228.9 ms 13 runs
Summary
'ENABLED=1 /home/matthew/unified/obj-opt-shell-nodebug-x86_64-pc-linux-gnu/dist/bin/js keys-benchmark.js' ran
1.19 ± 0.02 times faster than '/home/matthew/unified/obj-opt-shell-nodebug-x86_64-pc-linux-gnu/dist/bin/js keys-benchmark.js'
So a 20% improvement on an already synthetic microbenchmark is all this gives us. Furthermore, we have more work to do in order to filter out prototype properties (fortunately, I think this is comparatively easy -- we can divide the range that comes out of our enumerators into 'own properties' and 'proto properties', so with a single index/offset we can disambiguate own vs proto properties).
Edit: If anyone is interested, here's a profile of the optimized case
Comment 8•2 years ago
•
|
||
Note that most objects don't have enumerable prototype properties. If the iterator's indicesState is Valid
or AvailableOnRequest
, then we don't have to filter anything out.
I would guess that a lot of the remaining time is being spent allocating the array and running GC. We could maybe do scalar replacement, but that's not entirely trivial. Maybe we turn Object.keys
into MObjectToIterator
and then beef up the set of consumers that can do something with an iterator?
Actually, one thing I notice is that this microbenchmark doesn't actually use the result of Object.keys. It might be the case that V8 and JSC are doing some sort of dead code elimination here.
Edit: wait, scratch that, this is being tested with no inlining. So the result escapes, and can't be optimized out. I guess JSC has a fast-path for allocation?
Comment 9•2 years ago
•
|
||
So... if we essentially copy JSC's optimization called by Jeff in Comment 1, we can get a lot faster...
Benchmark 1: /home/matthew/unified/obj-opt-shell-nodebug-x86_64-pc-linux-gnu/dist/bin/js keys-benchmark.js
Time (mean ± σ): 422.8 ms ± 1.3 ms [User: 339.0 ms, System: 95.0 ms]
Range (min … max): 421.6 ms … 425.9 ms 10 runs
Benchmark 2: ENABLED=1 /home/matthew/unified/obj-opt-shell-nodebug-x86_64-pc-linux-gnu/dist/bin/js keys-benchmark.js
Time (mean ± σ): 57.0 ms ± 0.3 ms [User: 53.9 ms, System: 4.3 ms]
Range (min … max): 56.3 ms … 57.9 ms 51 runs
Summary
'ENABLED=1 /home/matthew/unified/obj-opt-shell-nodebug-x86_64-pc-linux-gnu/dist/bin/js keys-benchmark.js' ran
7.41 ± 0.05 times faster than '/home/matthew/unified/obj-opt-shell-nodebug-x86_64-pc-linux-gnu/dist/bin/js keys-benchmark.js'`
This patch is -extremely- not production quality, but the general gist is that it rides on the already existing ShapeCachePtr and stores the previously computed set of properties, and births a new array from these values at a cache hit (this suggests that array allocation isn't ultimately the long pole here as well).
If we wanted to productize the patch, there's a bunch of work we'd need to do
- Adding a new cache type to the ShapePtrCache means that when one cache is active, others don't attach. This conflict likely needs to be resolved in some fashion. If I modify my benchmark to bring back the 'for-in' activating loop mentioned in Comment 7, now this patch provides zero speedup, because we can't attach the cache, since the iterator cache is occupying those bits. (Perhaps the ShapePtrCache needs to instead key into a data table -- IIRC that was JSC's approach)
- This would need some thorough evaluation from performance and memory usage. The above makes it likely very easy to get this wrong. The flush-cache-on-gc thing seems sensible tho, and would limit the potential harms a bit.
- The code itself is definitely not production quality (yee-haw mallocing and almost certainly leaking memory).
Edit: oh and I forgot. most importantly, this needs to be made sound! We need to figure out how to handle indexed properties being added and ensure we flush the cache when that happens.
Comment 10•2 years ago
|
||
Returning to the approach of using the for-in
cache, except this time specializing array construction to consume the property names directly, I've got it up and running and passing jstests and jit-tests.
$ hyperfine 'DISABLED=1 /home/matthew/unified/obj-opt-shell-nodebug-x86_64-pc-linux-gnu/dist/bin/js keys-benchmark.js' 'ENABLED=1 /home/matthew/unified/obj-opt-shell-nodebug-x86_64-pc-linux-gnu/dist/bin/js
keys-benchmark.js'
Benchmark 1: DISABLED=1 /home/matthew/unified/obj-opt-shell-nodebug-x86_64-pc-linux-gnu/dist/bin/js keys-benchmark.js
Time (mean ± σ): 256.6 ms ± 1.3 ms [User: 255.4 ms, System: 10.6 ms]
Range (min … max): 254.9 ms … 259.5 ms 11 runs
Benchmark 2: ENABLED=1 /home/matthew/unified/obj-opt-shell-nodebug-x86_64-pc-linux-gnu/dist/bin/js keys-benchmark.js
Time (mean ± σ): 91.7 ms ± 0.8 ms [User: 88.9 ms, System: 5.7 ms]
Range (min … max): 90.3 ms … 93.8 ms 32 runs
Summary
'ENABLED=1 /home/matthew/unified/obj-opt-shell-nodebug-x86_64-pc-linux-gnu/dist/bin/js keys-benchmark.js' ran
2.80 ± 0.03 times faster than 'DISABLED=1 /home/matthew/unified/obj-opt-shell-nodebug-x86_64-pc-linux-gnu/dist/bin/js keys-benchmark.js'
So now we're 2.8x faster rather than the previous 20% faster.
I'm going to get some SP3 results from try before cleaning these patches up.
Comment 11•2 years ago
|
||
Local testing of the shell SP3 TodoMVC/ReactRedux, React benchmarks showed no benefit to this optimization.
We'll see what a try dispatch shows.
Comment 12•2 years ago
•
|
||
On my machine, the patchset that you pushed to try makes shallowEqual
faster by 11%. (1716 -> 1529 samples), for an overall improvement of the TodoMVC-React-Redux subtest by 0.6%.
Safari only spends 500 samples in this function. Matching Safari's speed would improve the subtest by 3.8%.
Comment 13•2 years ago
|
||
Just for future reference; this optimization attempted on resources/todomvc/architecture-examples/react-redux/index.js
revision 9c41c606c4, has the following execution stats:
attempts 40414
successes 39803
rate 98.488148%
Next steps for me is to try to get back the rest of the performance by attempting to modify native iterator to store tagged values, then I'll put a patch stack up for review.
Comment 14•2 years ago
|
||
So in Comment 10 I suggested a 7x speedup was achievable with a more invasive caching system. That actually was a mistake on my part -- I neglected to notice that in that experiment that -disabled- state also got slower, which created illusory speedup.
A more accurate measure of what is possible would be this:
$ hyperfine 'DISABLED=1 /home/matthew/unified/obj-opt-shell-nodebug-x86_64-pc-linux-gnu/dist/bin/js keys-benchmark.js' 'ENABLED=1 /home/matthew/unified/obj-cache-list/dist/bin/js keys-benchmark.js'
Benchmark 1: DISABLED=1 /home/matthew/unified/obj-opt-shell-nodebug-x86_64-pc-linux-gnu/dist/bin/js keys-benchmark.js
Time (mean ± σ): 282.3 ms ± 22.6 ms [User: 284.3 ms, System: 7.5 ms]
Range (min … max): 272.0 ms … 346.0 ms 10 runs
Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
Benchmark 2: ENABLED=1 /home/matthew/unified/obj-cache-list/dist/bin/js keys-benchmark.js
Time (mean ± σ): 67.2 ms ± 1.0 ms [User: 64.1 ms, System: 4.3 ms]
Range (min … max): 65.7 ms … 70.0 ms 43 runs
Summary
'ENABLED=1 /home/matthew/unified/obj-cache-list/dist/bin/js keys-benchmark.js' ran
4.20 ± 0.34 times faster than 'DISABLED=1 /home/matthew/unified/obj-opt-shell-nodebug-x86_64-pc-linux-gnu/dist/bin/js keys-benchmark.js'
So without being as invasive, we can get to ~2.7x.
In Comment 13 I said I'd attempt to get the NativeIterator to store tagged values instead of untagged pointers. I did that; however it effectively didn't change the performance of the optimization.
I plan on cleaning up my version of this optimization that doesn't change the storage format of NativeIterator and submit that for review. Further optimization can come later.
Comment 15•2 years ago
|
||
Depends on D181004
Updated•2 years ago
|
Comment 16•2 years ago
|
||
Depends on D181411
Comment 17•2 years ago
|
||
Could the patches be attached to a different bug? They only address part of the performance difference so it would be better to keep this bug open until we've closed most of the gap.
Comment 18•2 years ago
|
||
Will do
Comment 19•2 years ago
|
||
Comment on attachment 9339901 [details]
Bug 1836679 - Allow creating and Array from an array of JSLinearStrings r?jandem
Revision D181411 was moved to bug 1839437. Setting attachment 9339901 [details] to obsolete.
Comment 20•2 years ago
|
||
Comment on attachment 9339902 [details]
Bug 1836679 - Use the for-in cache to improve performance on Object.keys r?jandem
Revision D181412 was moved to bug 1839437. Setting attachment 9339902 [details] to obsolete.
Comment 21•2 years ago
|
||
So, with the patches in Bug 1839437 landing, I thought it would be good to leave behind a profile after those patches land:
https://share.firefox.dev/3Nhc91c
In that profile, we're at 78% of the time inside of TryEnumerableOwnPropertiesNative being spent inside of js::NewDenseCopiedArray
Comment 23•1 years ago
|
||
At this point it needs to be re-triaged in terms of priority impact; we've gone from 30x worse to ~3x worse on Linux, and 2.3x worse on Mac; measurements made using the tweaked version of the benchmark from Comment #1 attached here.
The remaining fruit here may be hard to get.
I'm going to open one followup bug, which is that we really ought to have the ability to fill the cache for this situation without requiring the object do a for-of first; fundamentally that's why this version of the test case exists is to show the case where the cache hits. Empirically it does quite a lot in some of the targeted benchmarks, but this might help places where for of
wasn't used too.
Comment 24•1 years ago
|
||
Side note: even on the original benchmark, where the cache accessed by the patches in bug 1839437 aren't used, I measure our speed difference exactly as was measured in Comment 1 as 4.8x locally, not 30x. Presumably other things (like Jon's work on C++ nursery allocation) has substantially improved there as well.
Comment 25•1 years ago
|
||
A patch to observe the hit rate of the for-of cache use in Object.keys
Reporter | ||
Comment 26•1 years ago
|
||
Do we know whether react-todoMVC hits the for-of cache?
Comment 27•1 years ago
|
||
For react
, (running Speedometer/resources/todomvc/architecture-examples/react/index.js
) I see
- attempts 39802 successes 39800 rate 99.994975
For react-redux
, on the shell branch something has changed which has dropped the success rate (running Speedometer/resources/todomvc/architecture-examples/react-redux/index.js
)
For commit 9c41c606 (June 16):
- attempts 40414 successes 39801 rate 98.483199
For commit f339d7b1 (July 17):
- attempts 40414 successes 1 rate 0.002474
Curiously though, if I run react
(not -redux
) from that earlier commit, I see zero impact: attempts 39802 successes 0 rate 0.000000.
This instability really does raise the priority of Bug 1844394, as this is clearly sensitive to changes in the benchmark.
Comment 28•10 months ago
|
||
(Re-summaried to cover state of the world, and unassigning myself as I'm not pushing on this right now)
Description
•