Closed Bug 1688794 Opened 2 years ago Closed 1 year ago

Consider replacing self-hosted cloning with stencil-instantiate

Categories

(Core :: JavaScript Engine, task, P2)

task

Tracking

()

RESOLVED FIXED
92 Branch
Tracking Status
firefox92 --- fixed

People

(Reporter: tcampbell, Assigned: tcampbell)

References

(Blocks 1 open bug)

Details

(Keywords: perf-alert, Whiteboard: [overhead:230k])

Attachments

(12 files, 1 obsolete file)

48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review

The selfhosted-js mechanism in SpiderMonkey currently relies on cloning functions on demand into Realms that use APIs implemented by self-hosting. This relies on the brittle script-cloning mechanism (which is being removed in Bug 1662152).

The top-level self-hosted body is almost entirely a bunch of functions with a few decorators (like _SetCanonicalName). These decorators can be resolved at parse time. Special care must be take about how to organize the stencil.

Another upside of this, is we can leverage Stencil capabilities in order to avoid needing to allocate the self-host realm entirely and minimize startup work, especially when the XDR cache file is available.

Assignee: nobody → tcampbell
Blocks: 1689821

Removing the self-hosting realm/zone entirely now seems feasible. Here is a rough plan:

  • Synthesize default constructors in parser
    • Fix existing constructor synth code (eg. breakpoints and fun.name inference)
    • Support lazy parsing of synthesized constructors
    • Remove JSOp::ClassConstructor / DerivedConstructor
  • Cleanup List / Record definitions in Utilites.js
    • Maybe turn into simple wrappers around std_Object_create
    • Maybe unify with js::ListObject
    • Remove MakeConstructible intrinsic
  • Support both self-host name and spec name in stencil
    • Repurpose ScriptStencil::lazyFunctionEnclosingScopeIndex_
  • Generate special self-hosted top-level stencil
    • Replace var x = ... with lambda similar to field initializer
    • Parser/BCE should decode "decorator" calls at parse time
      • _SetCanonicalName
      • _SetIsInlinableLargeFunction
    • Custom stencil instantiation code to compute name->scriptId map. This is
      not all that different from GlobalOrEvalDeclInstantiation.
    • No top-level bytecode is generated. This was only used for the var
      statements anyways which should now be turned into lambdas.
    • Either .js code should use IIFEs for each var or the parser should be
      modified to do this for us.
  • Store the Stencil on JSRuntime
    • This should be shareable with worker runtimes.
    • Selfhosted.xdr and bytecode pinning will be very useful here.
  • GetIntrinsic should run IIFE in target realm if there is one
    • Instantiate stencil (fragment) in target realm, call it, save result in
      the normal intrinsic slot of the target realm.
    • This is not very different than existing clone-on-first-use-in-realm
      behaviour. This will create a short-lived BaseScript, but Stencil makes
      this impact neglible.
    • One caveat is that this may generate nursery objects temporarily and
      small tweaks must be made in WarpSnapshot. Maybe there are tricks to use
      here?
  • Remove SelfHostingRealm/Zone
    • With top-level execution removed and cloning-from-stencil, we can remove
      the realm/compartment/zone entirely.
    • This also removes INIT_SELF_HOSTING GC from every process startup.
  • Optimize lookup of stencil fragment
    • In old model, we use a JSAtom to lookup a property on the self-hosting
      GlobalObject (which is in dictionary mode).
    • Store ScriptIndex instead of "ClonedName" on the JSFunction clones.
    • Consider using TopLevelIndex stencil's gcthing data to build a hashmap
      of stencil fragments. This would allow cross-process sharing.

Using IIFEs for self-hosted vars may be a bit extreme. There are not very vars and if the bytecode is tightened up it may be reasonable to run in each realm. The GlobalOrEvalDeclInstantiation instruction actually avoids the need for binding bytecode for nearly all of the functions. A remaining issue here is that we must adopt the lazy-self-hosted-function mechanism to avoid pulling in function scripts. Additionally, the data tables seem to generate very verbose bytecode and we should see if ObjLiteral can help.

If the trade-off of always running this (reduced) bytecode is reasonable, this is a simpler solution for both implementation and developers.

Depends on: 1690863
Depends on: 1690943

I've run into a silly problem here with Workers. Since currently bytecode is shared per-runtime, and this prototype shares the stencil across multiple runtimes we end up in a situation where the bytecode is de-duped into multiple runtime's script data tables. These tables all reference the common bytecode and we never end up releasing any of it. See js::SweepScriptData for logic where we need to get ref-count down to 1 before final count is released.

It might make sense to revive Bug 1638669 or otherwise do something clever here.

(In reply to Ted Campbell [:tcampbell] from comment #4)

I've run into a silly problem here with Workers. Since currently bytecode is shared per-runtime, and this prototype shares the stencil across multiple runtimes we end up in a situation where the bytecode is de-duped into multiple runtime's script data tables. These tables all reference the common bytecode and we never end up releasing any of it. See js::SweepScriptData for logic where we need to get ref-count down to 1 before final count is released.

Turns out this is actually non-sense. The issue was workers were reparsing the selfhosted.js file instead of reusing parent runtime.

Remove the self-hosting global/realm/zone and replace it with direct
instantiation from a CompilationStencil. Instead of cloning from a specific
self-hosting realm, we instantiate from Stencil into the current global. If
the source of the clone is a global var in the self-hosting script, we
instantiate and run the top-level bytecode in the curent realm. Due to the
intrinsic Value caching these instantiations and evaluations only happen once
per global which is very similar to the previous cloning mechanism.

Currently regresses AWSY due to extra copy of ParserAtom data kept alive that
is never used. Sharing the Stencil across processes would help here.

Here is an ugly prototype that shows the concept. It is a 60kB regression on awsy-base due to the ParserAtoms being kept around. The total stencil is 270kB so if we get bytecode pinning and cross-process XDR sharing, it should shape up as a nice win.

Attachment #9203520 - Attachment description: Bug 1688794 - Replace self-hosting realm with Stencil → WIP: Bug 1688794 - Replace self-hosting realm with Stencil
Depends on: 1705819
Depends on: 1705837
Depends on: 1707792
Depends on: 1710953
Whiteboard: [overhead:200k]
Whiteboard: [overhead:200k] → [overhead:230k]
Depends on: 1716901
Attachment #9203520 - Attachment is obsolete: true

Later patches will run SetIntrinsic calls in normal realms instead of relying
on the initial-self-hosting-GC. This means that GetIntrinsic can occasionally
see nursery objects so WarpBuilder must be adapted. This is more of an edge case
so simply defer Warp while we wait for a minor GC.

Make these few functions accessible to Stencil.cpp and clean up function signatures.

Depends on D120537

Instead of defining these intrinsics directly on the self-hosting global and
cloning on demand, directly use the JSFunctionSpec in the target global. This
removes the need to pre-define these on the self-hosting global. Use a binary
search to find the intrinsic on first use in a realm and then cache it on the
intrinsics holder object so that the lookup only happens once per realm.

Depends on D120538

In order to instantiate directly from the self-hosting stencil (instead of
cloning from the special zone), we need the stencil to be part of the JS
runtime. This adds 45kB per content process right now, but will allow us to
remove the self-hosting zone entirely which will more than make up for this.

Depends on D120539

On the JSRuntime, save a mapping from (permanent) JSAtoms to corresponding
self-hosted stencil ScriptIndex. We store both the ScriptIndex and the index of
the next top-level functions. Scripts in this range are sub-scripts of the
target. This is used later to instantiate self-hosted builtins on demand.

Depends on D120540

Instead of using the self-hosted zone as a template to create the initial lazy
self-hosted functions in a Realm, use the stencil data directly. Later patches
will also use this stencil to delazify these functions.

Depends on D120541

The previous patch handled the case where a FunctionSpec uses self-hosting, and
this patch now handles self-hosting references to other self-hosted functions.

Previously this was would do a deep-clone, but now we use the lazy self-hosted
function mechanism and delazify on demand.

Depends on D120542

Instead of using script-cloning, delazify by directly instantiating the script
data from the self-hosting CompilationStencil. We are only instantating a single
(top-level) function of the self-hosting at a time, so a special version of the
instantiation code is introduced to handle sub-trees. The CompilationGCOutput is
also giving offset indices to avoid allocations in most cases (since the output
structure reserves inline space).

Depends on D120543

Where we previously queried information directly on the uncloned self-hosted
functions, we now query these flags and names directly from the stencil.

Depends on D120544

The top-level script for selfhosted.js defines a few builtin values when it
executes. Instead of relying on cloning these objects into normal realms, this
patch now executes the script in the target realm itself. We run this script on
demand when we are first looking up a builtin that is neither a C++ intrinsic,
nor a self-hosted function defined in the stencil. In practice, this is not run
except when certain Intl APIs are used in a realm.

Depends on D120545

The dedicated zone for self-hosting can now be removed entirely. Also remove the
object cloning code that was only used for the old self-hosting mechanism.

Depends on D120546

The self-hosted zone provides data for a number of uses that must each by
replaced with new machanisms:

  • Permanent Atoms: All atoms generated by parsing selfhosted.js are added to
    the permanent atom set that is frozen and shared with worker runtimes
    directly. We now use the self-hosted stencil to know what atoms are used and
    continue to create them at startup.
  • C++ Instrinsics: Self-hosted code calls back into the VM using a few hundred
    intrinsic functions. Previously these were initialized on the self-hosted
    GlobalObject and cloned on demand. Now, we directly search the JSFunctionSpec
    static data for a match and create the JSFunction on demand. This is then
    cached on the realm intrinsics holder for subsequent calls.
  • JS_SELF_HOSTED_FN: Methods of builtin interfaces may be implemented with
    self-hosted code. These start as special self-hosted lazy JSFunctions that
    are delazified as needed. We look up canonical name / nargs from the stencil
    now instead of the s-h zone.
  • JSOp::GetIntrinsic for functions: When self-hosted code calls other
    self-hosted functions, GetIntrinsic opcodes are used. Previously this would
    trigger a deep-clone, but now we simply create a lazy self-hosted JSFunction.
  • Delazification: Previously we would clone BaseScript data from the s-h zone
    when we needed to delazify. Now we directly instantiate from the stencil.
    This uses a special form of stencil instantiation that can work on a partial
    sub-tree of the stencil (corresponding to a single top-level function and any
    of its inner functions).
  • Non-delazifying attribute lookups: The getSelfHostedFunctionGeneratorKind
    function would directly lookup attributes on the self-hosted zone without
    first triggering delazification. Now we do a similar lookup directly on the
    stencil data.
  • JSOp::GetIntrinsic for non-function values: The top-level script defines a
    small number of variables with JSOp::SetIntrinsic that were previously cloned
    on demand to target realms. We now run the top-level script directly in the
    target realm. This is done on demand when a self-hosted value lookup is not
    served by any other source. An example is listFormatInternalProperties.

Potential follow-up items:

  • Store ScriptIndex instead of selfhosted-name in the lazy JSFunction to avoid a hashtable lookup on delazification.
  • Directly construct the permanentAtoms table from the stencil (and well-known atoms/symbols) and simplify ordinary atomization code.
  • Store instrinsic index directly with JSOp::GetInstrinsic so this is done at compile time instead of binary search
Blocks: 1721413
Pushed by tcampbell@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/2cdb8f0337ee
Fix nursery handling for GetIntrinsic in WarpBuilder. r=jandem
https://hg.mozilla.org/integration/autoland/rev/936e59acffdc
Factor js::ScopeIndex to its own file. r=arai
https://hg.mozilla.org/integration/autoland/rev/d1285544f83c
Cleanup self-hosting helpers for later use by stencil methods. r=jandem
https://hg.mozilla.org/integration/autoland/rev/f370efcfeb76
Instantatiate self-hosting C++ intrinsics directly from JSFunctionSpec. r=jandem
https://hg.mozilla.org/integration/autoland/rev/7c57cdfa5b32
Keep self-hosting stencil alive for duration of runtime. r=nbp
https://hg.mozilla.org/integration/autoland/rev/4230722c8e8e
Add mapping from atom to self-hosting ScriptIndex. r=jandem,arai
https://hg.mozilla.org/integration/autoland/rev/073610dd3479
Use the stencil to create lazy self-hosted functions. r=jandem
https://hg.mozilla.org/integration/autoland/rev/eacb72c5069a
Use the stencil when looking up self-hosted values that are functions. r=jandem
https://hg.mozilla.org/integration/autoland/rev/5f36aebf0f6e
Delazify self-hosted functions directly from the stencil. r=jandem,arai
https://hg.mozilla.org/integration/autoland/rev/79d28566a20c
Directly use the self-hosted stencil in more places. r=jandem
https://hg.mozilla.org/integration/autoland/rev/4888b8daa6b9
Run self-hosted top-level in target realm instead of cloning. r=jandem
https://hg.mozilla.org/integration/autoland/rev/121b0b4cf551
Remove code for the (now unused) self-hosting zone. r=jandem,jonco
Depends on: 1722803

There are pre-existing race conditions in the TimelineConsumers code that are uncovered by this stack. I've filed Bug 1722803 to address that and then we should be able to land again.

Flags: needinfo?(tcampbell)
Pushed by tcampbell@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/c1792b8c8e1c
Fix nursery handling for GetIntrinsic in WarpBuilder. r=jandem
https://hg.mozilla.org/integration/autoland/rev/08d8da6dc11f
Factor js::ScopeIndex to its own file. r=arai
https://hg.mozilla.org/integration/autoland/rev/ef682113949b
Cleanup self-hosting helpers for later use by stencil methods. r=jandem
https://hg.mozilla.org/integration/autoland/rev/77b8d045bb4f
Instantatiate self-hosting C++ intrinsics directly from JSFunctionSpec. r=jandem
https://hg.mozilla.org/integration/autoland/rev/72288d27e039
Keep self-hosting stencil alive for duration of runtime. r=nbp
https://hg.mozilla.org/integration/autoland/rev/aa5317d3a439
Add mapping from atom to self-hosting ScriptIndex. r=jandem,arai
https://hg.mozilla.org/integration/autoland/rev/c5eb9d1ff114
Use the stencil to create lazy self-hosted functions. r=jandem
https://hg.mozilla.org/integration/autoland/rev/9cf38f379eab
Use the stencil when looking up self-hosted values that are functions. r=jandem
https://hg.mozilla.org/integration/autoland/rev/25bc16451830
Delazify self-hosted functions directly from the stencil. r=jandem,arai
https://hg.mozilla.org/integration/autoland/rev/d3c9f54c757f
Directly use the self-hosted stencil in more places. r=jandem
https://hg.mozilla.org/integration/autoland/rev/58f8dedb6d58
Run self-hosted top-level in target realm instead of cloning. r=jandem
https://hg.mozilla.org/integration/autoland/rev/becfb18e6c36
Remove code for the (now unused) self-hosting zone. r=jandem,jonco
Regressions: 1723464

== Change summary for alert #30739 (as of Mon, 02 Aug 2021 08:49:13 GMT) ==

Improvements:

Ratio Suite Test Platform Options Absolute values (old vs new)
10% Base Content JS windows10-64-shippable-qr 2,163,798.67 -> 1,952,544.00
10% Base Content JS linux1804-64-shippable-qr 2,154,744.33 -> 1,946,512.00
10% Base Content JS macosx1015-64-shippable 2,154,476.00 -> 1,947,296.00
10% Base Content JS macosx1015-64-shippable-qr 2,154,476.00 -> 1,947,296.00
6% Base Content Explicit macosx1015-64-shippable-qr 8,098,629.33 -> 7,644,826.67
... ... ... ... ... ...
2% JS windows10-64-shippable-qr 85,680,232.78 -> 83,945,944.76

For up to date results, see: https://treeherder.mozilla.org/perfherder/alerts?id=30739

https://treeherder.mozilla.org/perfherder/alerts?id=30727
https://treeherder.mozilla.org/perfherder/alerts?id=30734

This change seemed to help OSX Cold welcome pageload metrics by 15-20%. I don't really see much of a similar impact on other platforms, so I believe this is just some bi-modal behaviour that we pushed slightly over the line to be faster.

(In reply to Pulsebot from comment #25)

Pushed by tcampbell@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/becfb18e6c36
Remove code for the (now unused) self-hosting zone. r=jandem,jonco

== Change summary for alert #30751 (as of Tue, 03 Aug 2021 04:10:46 GMT) ==

Improvements:

Ratio Suite Test Platform Options Absolute values (old vs new)
23% welcome dcf macosx1015-64-shippable-qr cold webrender 63.62 -> 49.29
22% welcome dcf macosx1015-64-shippable-qr cold webrender 65.54 -> 51.33
20% welcome loadtime macosx1015-64-shippable-qr cold webrender 72.77 -> 58.04
16% welcome loadtime macosx1015-64-shippable-qr cold webrender 73.65 -> 62.21
13% welcome fcp macosx1015-64-shippable-qr cold webrender 118.04 -> 102.75

For up to date results, see: https://treeherder.mozilla.org/perfherder/alerts?id=30751

Regressions: 1727364
You need to log in before you can comment on or make changes to this bug.