Pretenure thisObject allocations for scripted JSOp::New
Categories
(Core :: JavaScript: GC, task, P2)
Tracking
()
| Tracking | Status | |
|---|---|---|
| firefox149 | --- | fixed |
People
(Reporter: iain, Assigned: iain)
References
(Blocks 1 open bug)
Details
(Whiteboard: [js-perf-next])
Attachments
(4 files)
In bug 1894012, Jon added logging to help find cases where we might want to add allocation sites. I enabled the logging for a run of JS3 to track how often we had missing allocation sites that would have been marked as LongLived if they existed.
Looking at the results and bucketing by op, a clear pattern emerges:
1 RecreateLexi
2 AsyncResolve
2 GlobalOrEval
4 Generator
5 NewContent
8 FunWithProto
8 ObjWithProto
19 SpreadSuperC
24 PushLexicalE
47 AsyncAwait
97 Call
169 SuperCall
1284 New
The most frequent op is JSOp::New, making up 3/4 of all candidates on its own. The next most frequent is JSOp::SuperCall, which is effectively the same thing. Together, they make up 87% of missing-but-pretenurable alloc sites.
We currently don't use alloc sites when allocating the this object for scripted constructors. When we do so in baseline, it's part of a complicated dance where we generate CacheIR that looks like this:
... (guards omitted)
MetaScriptedThisShape thisShapeOffset 40
CallScriptedFunction calleeId 1, argcId 0, flags (format 1, isConstructing, isSameRealm), argcFixed 0
CallScriptedFunction is generic, and is used by all sorts of calls (including non-constructor calls). If the isConstructing flag is set, then it will call CreateThisFromIC. MetaScriptedThisShape doesn't generate any code in baseline ICs, but tells the CacheIR transpiler what shape to use. IIRC, this was all part of an attempt to avoid bloating the set of fields for CallScriptedFunction.
We could probably simplify this by splitting out a CallKnownScriptedConstructor op with a shape field, and eliminating MetaScriptedThisShape. Then we could put an alloc site field in that op as well, and use it in BaselineCacheIRCompiler::createThis.
| Assignee | ||
Comment 1•10 months ago
|
||
| Assignee | ||
Comment 2•10 months ago
•
|
||
Based on the log I've attached, it looks like typescript has the most missing pretenuring. If I run a local comparison report on that benchmark, I do see minor GC as the most important bucket, with a 40x difference in time. WSL also seems to have quite a bit, and a comparison report there again shows that minor GC is the top bucket with a 20x difference.
We probably won't close the entire gap by pretenuring, but that's still enough evidence to mark this as perf-next.
js-perf-next: Add alloc sites to scripted constructor calls in baseline, and use this information to pretenure them. Bug 1898270 is a recent example of adding alloc sites for lambdas. As described above, the tricky part with this patch is that the thisObject is allocated by this code, which is inside shared code used by a variety of different calls. (In particular, both CallScriptedFunction and CallBoundScriptedFunction can reach this point.) Right now, information about the this object is stored in a preceding MetaScriptedThisShape op that generates no code, but is used in the transpiler to allocate this.
I think there are two reasonable options for making this work. One, as described above, is to add a new CallKnownScriptedConstructor op that would replace the MetaScriptedThisShape/CallScriptedFunction pair. (This might also need a CallKnownBoundScriptedConstructor op to handle the bound function case?) The second option, easier but more of a hack, would be to add an alloc site to MetaScriptedThisShape (and maybe rename it), rewrite CacheIRCompiler::emitMetaScriptedThis to store an allocSite offset internally in the CacheIRCompiler, and then use that allocsite offset if it exists in BaselineCacheIRCompiler::createThis to load the alloc site out of the stub fields and use it.
| Assignee | ||
Comment 3•10 months ago
|
||
(Additional evidence of value: Jon's SP3 numbers here show that JSOp::New was the second most common op for missed pretenuring, second only to JSOp::Lambda (which has already been fixed).
Updated•9 months ago
|
Comment 4•7 months ago
|
||
I previously started work on this in bug 1740277 but got stuck.
| Assignee | ||
Comment 5•7 months ago
|
||
Looking at that patch, I think the two options I described in comment 2 should both avoid the problem of doing a VM call too early.
| Assignee | ||
Comment 6•1 month ago
|
||
I based the pretenuring parts of this on bug 1898270.
The name of the CacheIR op is no longer entirely accurate. I rename it in the next patch in preparation for adding more information to allow allocating this in jitcode.
Updated•1 month ago
|
| Assignee | ||
Comment 7•1 month ago
|
||
| Assignee | ||
Comment 8•1 month ago
|
||
Comment 9•1 month ago
|
||
These patches show a significant improvement on the splay benchmark in the shell due to greatly reduced minor GC time.
macOS: +8% splay, +17% splay latency
Linux: +5% splay, +11% splay latency
Comment 10•1 month ago
|
||
Comment 11•1 month ago
|
||
| bugherder | ||
https://hg.mozilla.org/mozilla-central/rev/aa4d5a227c41
https://hg.mozilla.org/mozilla-central/rev/151e9a5ec7fc
https://hg.mozilla.org/mozilla-central/rev/c4a1fc8dffc7
Comment 12•1 month ago
|
||
There are a bunch of performance improvements on splay, likely from this change:
https://treeherder.mozilla.org/perfherder/graphs?highlightAlerts=1&highlightChangelogData=0&highlightCommonAlerts=0&replicates=0&series=autoland,5441044,1,13&timerange=1209600&zoom=1770114444340,1770195061849,193.81796907956743,560.9543095979242
https://treeherder.mozilla.org/perfherder/graphs?timerange=1209600&series=autoland,5441043,1,13
https://treeherder.mozilla.org/perfherder/graphs?timerange=1209600&series=autoland,5441042,1,13
Description
•