Consider replacing self-hosted cloning with stencil-instantiate
Categories
(Core :: JavaScript Engine, task, P2)
Tracking
()
Tracking | Status | |
---|---|---|
firefox92 | --- | fixed |
People
(Reporter: tcampbell, Assigned: tcampbell)
References
(Blocks 1 open bug)
Details
(Keywords: perf-alert, Whiteboard: [overhead:230k])
Attachments
(12 files, 1 obsolete file)
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review |
The selfhosted-js mechanism in SpiderMonkey currently relies on cloning functions on demand into Realms that use APIs implemented by self-hosting. This relies on the brittle script-cloning mechanism (which is being removed in Bug 1662152).
The top-level self-hosted body is almost entirely a bunch of functions with a few decorators (like _SetCanonicalName
). These decorators can be resolved at parse time. Special care must be take about how to organize the stencil.
Another upside of this, is we can leverage Stencil capabilities in order to avoid needing to allocate the self-host realm entirely and minimize startup work, especially when the XDR cache file is available.
Assignee | ||
Updated•4 years ago
|
Assignee | ||
Comment 1•4 years ago
|
||
Removing the self-hosting realm/zone entirely now seems feasible. Here is a rough plan:
- Synthesize default constructors in parser
- Fix existing constructor synth code (eg. breakpoints and fun.name inference)
- Support lazy parsing of synthesized constructors
- Remove JSOp::ClassConstructor / DerivedConstructor
- Cleanup List / Record definitions in Utilites.js
- Maybe turn into simple wrappers around
std_Object_create
- Maybe unify with js::ListObject
- Remove
MakeConstructible
intrinsic
- Maybe turn into simple wrappers around
- Support both self-host name and spec name in stencil
- Repurpose
ScriptStencil::lazyFunctionEnclosingScopeIndex_
- Repurpose
- Generate special self-hosted top-level stencil
- Replace
var x = ...
with lambda similar to field initializer - Parser/BCE should decode "decorator" calls at parse time
_SetCanonicalName
_SetIsInlinableLargeFunction
- Custom stencil instantiation code to compute name->scriptId map. This is
not all that different fromGlobalOrEvalDeclInstantiation
. - No top-level bytecode is generated. This was only used for the
var
statements anyways which should now be turned into lambdas. - Either .js code should use IIFEs for each
var
or the parser should be
modified to do this for us.
- Replace
- Store the Stencil on JSRuntime
- This should be shareable with worker runtimes.
- Selfhosted.xdr and bytecode pinning will be very useful here.
GetIntrinsic
should run IIFE in target realm if there is one- Instantiate stencil (fragment) in target realm, call it, save result in
the normal intrinsic slot of the target realm. - This is not very different than existing clone-on-first-use-in-realm
behaviour. This will create a short-lived BaseScript, but Stencil makes
this impact neglible. - One caveat is that this may generate nursery objects temporarily and
small tweaks must be made in WarpSnapshot. Maybe there are tricks to use
here?
- Instantiate stencil (fragment) in target realm, call it, save result in
- Remove SelfHostingRealm/Zone
- With top-level execution removed and cloning-from-stencil, we can remove
the realm/compartment/zone entirely. - This also removes INIT_SELF_HOSTING GC from every process startup.
- With top-level execution removed and cloning-from-stencil, we can remove
- Optimize lookup of stencil fragment
- In old model, we use a JSAtom to lookup a property on the self-hosting
GlobalObject (which is in dictionary mode). - Store
ScriptIndex
instead of "ClonedName" on the JSFunction clones. - Consider using
TopLevelIndex
stencil's gcthing data to build a hashmap
of stencil fragments. This would allow cross-process sharing.
- In old model, we use a JSAtom to lookup a property on the self-hosting
Assignee | ||
Comment 2•4 years ago
|
||
Using IIFEs for self-hosted vars may be a bit extreme. There are not very vars and if the bytecode is tightened up it may be reasonable to run in each realm. The GlobalOrEvalDeclInstantiation
instruction actually avoids the need for binding bytecode for nearly all of the functions. A remaining issue here is that we must adopt the lazy-self-hosted-function mechanism to avoid pulling in function scripts. Additionally, the data tables seem to generate very verbose bytecode and we should see if ObjLiteral can help.
If the trade-off of always running this (reduced) bytecode is reasonable, this is a simpler solution for both implementation and developers.
Assignee | ||
Comment 3•4 years ago
|
||
Ugly prototype seems to work: https://treeherder.mozilla.org/jobs?repo=try&group_state=expanded&revision=9aa9b6ba1170e66abc69b38ea4889e6f79af5d36
Assignee | ||
Comment 4•4 years ago
|
||
I've run into a silly problem here with Workers. Since currently bytecode is shared per-runtime, and this prototype shares the stencil across multiple runtimes we end up in a situation where the bytecode is de-duped into multiple runtime's script data tables. These tables all reference the common bytecode and we never end up releasing any of it. See js::SweepScriptData
for logic where we need to get ref-count down to 1 before final count is released.
It might make sense to revive Bug 1638669 or otherwise do something clever here.
Assignee | ||
Comment 5•4 years ago
|
||
(In reply to Ted Campbell [:tcampbell] from comment #4)
I've run into a silly problem here with Workers. Since currently bytecode is shared per-runtime, and this prototype shares the stencil across multiple runtimes we end up in a situation where the bytecode is de-duped into multiple runtime's script data tables. These tables all reference the common bytecode and we never end up releasing any of it. See
js::SweepScriptData
for logic where we need to get ref-count down to 1 before final count is released.
Turns out this is actually non-sense. The issue was workers were reparsing the selfhosted.js file instead of reusing parent runtime.
Assignee | ||
Comment 6•4 years ago
|
||
Remove the self-hosting global/realm/zone and replace it with direct
instantiation from a CompilationStencil. Instead of cloning from a specific
self-hosting realm, we instantiate from Stencil into the current global. If
the source of the clone is a global var in the self-hosting script, we
instantiate and run the top-level bytecode in the curent realm. Due to the
intrinsic Value caching these instantiations and evaluations only happen once
per global which is very similar to the previous cloning mechanism.
Currently regresses AWSY due to extra copy of ParserAtom data kept alive that
is never used. Sharing the Stencil across processes would help here.
Assignee | ||
Comment 7•4 years ago
|
||
Here is an ugly prototype that shows the concept. It is a 60kB regression on awsy-base due to the ParserAtoms being kept around. The total stencil is 270kB so if we get bytecode pinning and cross-process XDR sharing, it should shape up as a nice win.
Updated•4 years ago
|
Assignee | ||
Updated•4 years ago
|
Assignee | ||
Updated•4 years ago
|
Updated•3 years ago
|
Assignee | ||
Comment 8•3 years ago
|
||
Later patches will run SetIntrinsic
calls in normal realms instead of relying
on the initial-self-hosting-GC. This means that GetIntrinsic
can occasionally
see nursery objects so WarpBuilder must be adapted. This is more of an edge case
so simply defer Warp while we wait for a minor GC.
Assignee | ||
Comment 9•3 years ago
|
||
Depends on D120536
Assignee | ||
Comment 10•3 years ago
|
||
Make these few functions accessible to Stencil.cpp and clean up function signatures.
Depends on D120537
Assignee | ||
Comment 11•3 years ago
|
||
Instead of defining these intrinsics directly on the self-hosting global and
cloning on demand, directly use the JSFunctionSpec in the target global. This
removes the need to pre-define these on the self-hosting global. Use a binary
search to find the intrinsic on first use in a realm and then cache it on the
intrinsics holder object so that the lookup only happens once per realm.
Depends on D120538
Assignee | ||
Comment 12•3 years ago
|
||
In order to instantiate directly from the self-hosting stencil (instead of
cloning from the special zone), we need the stencil to be part of the JS
runtime. This adds 45kB per content process right now, but will allow us to
remove the self-hosting zone entirely which will more than make up for this.
Depends on D120539
Assignee | ||
Comment 13•3 years ago
|
||
On the JSRuntime, save a mapping from (permanent) JSAtoms to corresponding
self-hosted stencil ScriptIndex. We store both the ScriptIndex and the index of
the next top-level functions. Scripts in this range are sub-scripts of the
target. This is used later to instantiate self-hosted builtins on demand.
Depends on D120540
Assignee | ||
Comment 14•3 years ago
|
||
Instead of using the self-hosted zone as a template to create the initial lazy
self-hosted functions in a Realm, use the stencil data directly. Later patches
will also use this stencil to delazify these functions.
Depends on D120541
Assignee | ||
Comment 15•3 years ago
|
||
The previous patch handled the case where a FunctionSpec uses self-hosting, and
this patch now handles self-hosting references to other self-hosted functions.
Previously this was would do a deep-clone, but now we use the lazy self-hosted
function mechanism and delazify on demand.
Depends on D120542
Assignee | ||
Comment 16•3 years ago
|
||
Instead of using script-cloning, delazify by directly instantiating the script
data from the self-hosting CompilationStencil. We are only instantating a single
(top-level) function of the self-hosting at a time, so a special version of the
instantiation code is introduced to handle sub-trees. The CompilationGCOutput is
also giving offset indices to avoid allocations in most cases (since the output
structure reserves inline space).
Depends on D120543
Assignee | ||
Comment 17•3 years ago
|
||
Where we previously queried information directly on the uncloned self-hosted
functions, we now query these flags and names directly from the stencil.
Depends on D120544
Assignee | ||
Comment 18•3 years ago
|
||
The top-level script for selfhosted.js defines a few builtin values when it
executes. Instead of relying on cloning these objects into normal realms, this
patch now executes the script in the target realm itself. We run this script on
demand when we are first looking up a builtin that is neither a C++ intrinsic,
nor a self-hosted function defined in the stencil. In practice, this is not run
except when certain Intl APIs are used in a realm.
Depends on D120545
Assignee | ||
Comment 19•3 years ago
|
||
The dedicated zone for self-hosting can now be removed entirely. Also remove the
object cloning code that was only used for the old self-hosting mechanism.
Depends on D120546
Assignee | ||
Comment 20•3 years ago
•
|
||
The self-hosted zone provides data for a number of uses that must each by
replaced with new machanisms:
- Permanent Atoms: All atoms generated by parsing selfhosted.js are added to
the permanent atom set that is frozen and shared with worker runtimes
directly. We now use the self-hosted stencil to know what atoms are used and
continue to create them at startup. - C++ Instrinsics: Self-hosted code calls back into the VM using a few hundred
intrinsic functions. Previously these were initialized on the self-hosted
GlobalObject and cloned on demand. Now, we directly search the JSFunctionSpec
static data for a match and create the JSFunction on demand. This is then
cached on the realm intrinsics holder for subsequent calls. - JS_SELF_HOSTED_FN: Methods of builtin interfaces may be implemented with
self-hosted code. These start as special self-hosted lazy JSFunctions that
are delazified as needed. We look up canonical name / nargs from the stencil
now instead of the s-h zone. - JSOp::GetIntrinsic for functions: When self-hosted code calls other
self-hosted functions, GetIntrinsic opcodes are used. Previously this would
trigger a deep-clone, but now we simply create a lazy self-hosted JSFunction. - Delazification: Previously we would clone BaseScript data from the s-h zone
when we needed to delazify. Now we directly instantiate from the stencil.
This uses a special form of stencil instantiation that can work on a partial
sub-tree of the stencil (corresponding to a single top-level function and any
of its inner functions). - Non-delazifying attribute lookups: The getSelfHostedFunctionGeneratorKind
function would directly lookup attributes on the self-hosted zone without
first triggering delazification. Now we do a similar lookup directly on the
stencil data. - JSOp::GetIntrinsic for non-function values: The top-level script defines a
small number of variables with JSOp::SetIntrinsic that were previously cloned
on demand to target realms. We now run the top-level script directly in the
target realm. This is done on demand when a self-hosted value lookup is not
served by any other source. An example islistFormatInternalProperties
.
Assignee | ||
Comment 21•3 years ago
•
|
||
Potential follow-up items:
- Store
ScriptIndex
instead of selfhosted-name in the lazy JSFunction to avoid a hashtable lookup on delazification. - Directly construct the
permanentAtoms
table from the stencil (and well-known atoms/symbols) and simplify ordinary atomization code. - Store instrinsic index directly with JSOp::GetInstrinsic so this is done at compile time instead of binary search
Comment 22•3 years ago
|
||
Comment 23•3 years ago
|
||
Backed out for xpcshell failures on test_ext_background_service_worker.js
Failure logs:
- xpcshell: https://treeherder.mozilla.org/logviewer?job_id=346626569&repo=autoland
- bc: https://treeherder.mozilla.org/logviewer?job_id=346625781&repo=autoland
Backout link: https://hg.mozilla.org/integration/autoland/rev/3cf9062b4b20ebf4099102fe7db58feedc3f266c
Assignee | ||
Comment 24•3 years ago
|
||
There are pre-existing race conditions in the TimelineConsumers
code that are uncovered by this stack. I've filed Bug 1722803 to address that and then we should be able to land again.
Comment 25•3 years ago
|
||
Comment 26•3 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/c1792b8c8e1c
https://hg.mozilla.org/mozilla-central/rev/08d8da6dc11f
https://hg.mozilla.org/mozilla-central/rev/ef682113949b
https://hg.mozilla.org/mozilla-central/rev/77b8d045bb4f
https://hg.mozilla.org/mozilla-central/rev/72288d27e039
https://hg.mozilla.org/mozilla-central/rev/aa5317d3a439
https://hg.mozilla.org/mozilla-central/rev/c5eb9d1ff114
https://hg.mozilla.org/mozilla-central/rev/9cf38f379eab
https://hg.mozilla.org/mozilla-central/rev/25bc16451830
https://hg.mozilla.org/mozilla-central/rev/d3c9f54c757f
https://hg.mozilla.org/mozilla-central/rev/58f8dedb6d58
https://hg.mozilla.org/mozilla-central/rev/becfb18e6c36
Comment 27•3 years ago
|
||
== Change summary for alert #30739 (as of Mon, 02 Aug 2021 08:49:13 GMT) ==
Improvements:
Ratio | Suite | Test | Platform | Options | Absolute values (old vs new) |
---|---|---|---|---|---|
10% | Base Content JS | windows10-64-shippable-qr | 2,163,798.67 -> 1,952,544.00 | ||
10% | Base Content JS | linux1804-64-shippable-qr | 2,154,744.33 -> 1,946,512.00 | ||
10% | Base Content JS | macosx1015-64-shippable | 2,154,476.00 -> 1,947,296.00 | ||
10% | Base Content JS | macosx1015-64-shippable-qr | 2,154,476.00 -> 1,947,296.00 | ||
6% | Base Content Explicit | macosx1015-64-shippable-qr | 8,098,629.33 -> 7,644,826.67 | ||
... | ... | ... | ... | ... | ... |
2% | JS | windows10-64-shippable-qr | 85,680,232.78 -> 83,945,944.76 |
For up to date results, see: https://treeherder.mozilla.org/perfherder/alerts?id=30739
Assignee | ||
Comment 28•3 years ago
|
||
https://treeherder.mozilla.org/perfherder/alerts?id=30727
https://treeherder.mozilla.org/perfherder/alerts?id=30734
This change seemed to help OSX Cold welcome
pageload metrics by 15-20%. I don't really see much of a similar impact on other platforms, so I believe this is just some bi-modal behaviour that we pushed slightly over the line to be faster.
Comment 29•3 years ago
|
||
(In reply to Pulsebot from comment #25)
Pushed by tcampbell@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/becfb18e6c36
Remove code for the (now unused) self-hosting zone. r=jandem,jonco
== Change summary for alert #30751 (as of Tue, 03 Aug 2021 04:10:46 GMT) ==
Improvements:
Ratio | Suite | Test | Platform | Options | Absolute values (old vs new) |
---|---|---|---|---|---|
23% | welcome | dcf | macosx1015-64-shippable-qr | cold webrender | 63.62 -> 49.29 |
22% | welcome | dcf | macosx1015-64-shippable-qr | cold webrender | 65.54 -> 51.33 |
20% | welcome | loadtime | macosx1015-64-shippable-qr | cold webrender | 72.77 -> 58.04 |
16% | welcome | loadtime | macosx1015-64-shippable-qr | cold webrender | 73.65 -> 62.21 |
13% | welcome | fcp | macosx1015-64-shippable-qr | cold webrender | 118.04 -> 102.75 |
For up to date results, see: https://treeherder.mozilla.org/perfherder/alerts?id=30751
Updated•3 years ago
|
Description
•