Closed Bug 1646378 Opened 9 months ago Closed 6 months ago

Warp: Trial Inlining Prototype

Categories

(Core :: JavaScript Engine: JIT, enhancement, P1)

enhancement

Tracking

()

RESOLVED FIXED
82 Branch
Tracking Status
firefox82 --- fixed

People

(Reporter: iain, Assigned: iain)

References

(Blocks 1 open bug)

Details

Attachments

(40 files)

47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review

WarpBuilder relies heavily on transpiling CacheIR. We would like to be able to inline scripted functions in WarpBuilder, but this runs into problems for functions with many different callers. The ICs in such a function are likely to be polymorphic, even if they are monomorphic with respect to each individual caller.

The new approach: somewhere between Baseline and Warp we add a new warmup threshold. When a script reaches that threshold, we analyze that script and its callees to choose a set of callees that we would like to inline. For each call site we decide to inline, we allocate a new ICScript with a fresh set of ICEntries for the callee, and replace the call IC in the caller with a new special stub that will invoke the callee using the new ICScript. (If the callee gets hot, we can inline recursively.) When the root script hits the Warp warmup threshold, we can use the CacheIR from the inlined ICScripts, giving us more accurate information.

Severity: -- → N/A

When we clone a JitScript for inlining, there is some data that we want to share and some data that needs to be separate. This patch moves the separate data (warmup count, ICEntry array) into ICScript and embeds an ICScript into JitScript. The layout of a JitScript in memory is mostly unchanged by this patch.

We want to be able to swap out ICScripts, so we can't hardcode the ICEntry in EmitCallIC. Hoisting the ICStubReg initialization out of EmitCallIC (where it was the same across architectures) lets us change it in a subsequent patch.

Depends on D80196

Depends on D80197

The previous patch put the ICScript pointer in the baseline frame. This patch uses that pointer to find the right ICEntry.

Depends on D80198

A previous patch put the ICScript pointer in the baseline frame. This patch uses that pointer to find the warmup count.

Depends on D80199

There were already three copies of this code, and I was about to add a fourth.

WarpOracle has similar code for accessing stubDataStart. I wasn't sure that we needed an accessor for that yet.

Depends on D80200

Keywords: leave-open

Our plan for invalidating transpiled Warp ICs when guards fail is to bailout (leaving us in the baseline interpreter) and invalidate when attaching a new IC. To make this work with trial inlining, we need to ensure that we can invalidate the root of the inlining tree, not the original version of the inlined script. The most robust way to do this is to unconditionally store the ICScript in BaselineFrame, instead of only storing it in baseline and omitting it from blinterp.

This patch adds code to setInterpreterFields and the blinterp prologue to store the default ICScript, and fixes emit_JumpTarget to find the new entry in the correct ICScript. bumpInterpreterICEntry does not need to be changed. Right now the ICScript in the blinterp frame will always be the default ICScript. In the future, when we actually do Warp inlining, we can add setInterpreterFieldsForBailout and make sure that the correct ICScript is provided.

(Drive-by cleanup: BaselineFrame::addressOfEnvironmentChain was dead code.)

Depends on D80201

My current plan for stub allocation is as follows. A JitScript contains a fallbackStubSpace, which is used for all of its own stubs. When we decide to do trial inlining for a JitScript, we create an InliningRoot, which is owned by the JitScript. The InliningRoot contains an array of unique pointers to inlined ICScripts and has a single fallbackStubSpace which is shared by all those ICScripts.

I wrote this code when I was thinking that we would throw the InliningRoot away after Warp compilation. If we intend to keep it around indefinitely, it might be better to reuse the stubspace in the root JitScript, in which case we don't need this patch.

Depends on D81795

In a later patch it will be useful to be able to declare a CallFlags and give it a value later.

Depends on D81797

When we create the stub that calls an inlined script, we want to ensure that we use all the same guards as the original stub. To do so robustly, this patch adds CacheIRCloner, which can read an op from a CacheIRReader and write a copy of that op to a CacheIRWriter. This lets us clone the parts of the stub that we need and then add new ops.

In the future, this code may also be useful when folding together ICs with the same CacheIR.

Depends on D81798

This is mostly the same as CallScriptedFunction. The key differences are:

  1. We have an extra argument, the ICScript, which must be stored in the JSContext so the Baseline prologue can find it.

  2. We have to ensure that we call the baseline. This is a little bit awkward. We need to verify that the target still has a BaselineScript before calling updateArgc, which will overwrite the input operand for spread calls. However, on x86 32-bit, we don't have a spare register to hold the code pointer while we are pushing the arguments onto the stack. I eventually compromised on just reloading the code pointer for x86-32.

Depends on D81799

This patch sets up the framework to call DoTrialInlining when we want to inline. The next patch actually implements DoTrialInlining.

Depends on D81800

This patch implements the "trial" part of trial inlining. We find inlining candidates, clone them, create new ICScripts, and attach new ICs. When WarpOracle runs, we have accurate information available. The next step is to use that information to actually inline.

With this patch, --warp + --warp-trial-inlining passes almost all jit-tests, with the exception of a couple (ion/for-in-iterator-1.js and self-hosting/is-possibly-wrapped-typed-array.js) that rely on looping until inIon is true. If we do a trial-inlining of the script containing the loop, then the warmup counter for the cloned ICScript gets hot, but the warmup counter for the original script (used in GetOptimizationLevel) is not hot enough to compile. I'm holding off on fixing that until we have decided what our policy is for inlined scripts that get hot.

Depends on D81801

Attachment #9160547 - Attachment description: Bug 1646378: Mark WarmupCounter as unreachable in BaselineDebugModeOSR r=jandem → Bug 1646378: Replace RetAddrEntry::Kind::WarmupCounter with callVMNonOp r=jandem
Pushed by iireland@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/cbe0ca1d37b7
Factor out ICScript from JitScript r=jandem
https://hg.mozilla.org/integration/autoland/rev/4c6c1963eaba
Hoist ICStubReg initialization out of EmitCallIC r=jandem
https://hg.mozilla.org/integration/autoland/rev/5f9cf0364e33
Store ICScript in baseline frame r=jandem
https://hg.mozilla.org/integration/autoland/rev/25c17c4cad25
Use ICScript in baseline frame to call ICs r=jandem
https://hg.mozilla.org/integration/autoland/rev/319464e2d223
Use ICScript in baseline frame to increment warmup counter r=jandem
https://hg.mozilla.org/integration/autoland/rev/a4e296fb3211
Add cacheIRStubInfo accessor to ICStub r=jandem
https://hg.mozilla.org/integration/autoland/rev/f657de7a7c64
Store ICSCript in baseline interpreter frame r=jandem
https://hg.mozilla.org/integration/autoland/rev/3f8a0fa88566
Get stubspace from ICScript in AttachBaselineCacheIRStub r=jandem
https://hg.mozilla.org/integration/autoland/rev/0d73d6296143
Replace RetAddrEntry::Kind::WarmupCounter with callVMNonOp r=jandem
https://hg.mozilla.org/integration/autoland/rev/2804b1fc6641
Refactor CallFlags constructors r=jandem
https://hg.mozilla.org/integration/autoland/rev/01851025fa25
Implement CacheIRCloner r=jandem
https://hg.mozilla.org/integration/autoland/rev/954c1651adb2
Add CallInlinedFunction r=jandem
https://hg.mozilla.org/integration/autoland/rev/8271038c39a6
Add call to DoTrialInlining r=jandem
https://hg.mozilla.org/integration/autoland/rev/7e46319259d9
Implement DoTrialInlining r=jandem
Backout by rmaries@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/e547a6e36279
Backed out 14 changesets for build bustages. CLOSED TREE
Pushed by rmaries@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/75801659b613
Factor out ICScript from JitScript r=jandem
https://hg.mozilla.org/integration/autoland/rev/8af0842d4e7d
Hoist ICStubReg initialization out of EmitCallIC r=jandem
https://hg.mozilla.org/integration/autoland/rev/ea762fef4139
Store ICScript in baseline frame r=jandem
https://hg.mozilla.org/integration/autoland/rev/da3a5a1a4414
Use ICScript in baseline frame to call ICs r=jandem
https://hg.mozilla.org/integration/autoland/rev/55574569ada4
Use ICScript in baseline frame to increment warmup counter r=jandem
https://hg.mozilla.org/integration/autoland/rev/3c9fe96e1f50
Add cacheIRStubInfo accessor to ICStub r=jandem
https://hg.mozilla.org/integration/autoland/rev/88d8782f9cab
Store ICSCript in baseline interpreter frame r=jandem
https://hg.mozilla.org/integration/autoland/rev/62a055d16919
Get stubspace from ICScript in AttachBaselineCacheIRStub r=jandem
https://hg.mozilla.org/integration/autoland/rev/f83599523d2b
Replace RetAddrEntry::Kind::WarmupCounter with callVMNonOp r=jandem
https://hg.mozilla.org/integration/autoland/rev/0f907f953c9e
Refactor CallFlags constructors r=jandem
https://hg.mozilla.org/integration/autoland/rev/1aa0f867fca2
Implement CacheIRCloner r=jandem
https://hg.mozilla.org/integration/autoland/rev/9bd1a1409856
Add CallInlinedFunction r=jandem
https://hg.mozilla.org/integration/autoland/rev/8b4d525ae0e2
Add call to DoTrialInlining r=jandem
https://hg.mozilla.org/integration/autoland/rev/8695942e5375
Implement DoTrialInlining r=jandem
Pushed by iireland@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/1b5170ba3082
Add SMDOC comment for ICScript lifetimes r=jandem DONTBUILD

Almost all of the data in WarpBuilder was already script-specific, so instead of renaming it to WarpScriptBuilder everywhere, I pulled out a WarpCompilation class to hold the data that is shared between scripts.

Depends on D83179

Preparing for inlining, at which point the WarpSnapshot / WarpScriptSnapshot relationship is no longer 1-to-1.

Depends on D83180

When taking a snapshot for inlining, we have to read the CacheIR to find the ICScript and the target. We could store the target in another field in CallInlinedFunction, but that could be awkward in the future if we want to inline without trial inlining, because we won't have a CallInlinedFunction available. Instead, I hoisted the CacheIR-reading code in maybeInlineCall into its own function that can be shared by TrialInlining and the snapshot code.

Depends on D83182

I based the InliningTree / CompileInfo code on IonBuilder::inlineScriptedCall.

In the future, if we want to inline cases where we didn't allocate a new ICScript, it should just be a matter of changing a few lines in maybeInlineIC. We can use the default ICScript off the target script.

(Note: we now check in maybeInlineCall that we don't already have a CallInlinedFunction. This should not normally happen, but could happen if we reset the warmup counter for a script.)

Depends on D83183

Pushed by iireland@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/823d54e72613
Store ICScript and CompileInfo in WarpScriptOracle r=jandem
https://hg.mozilla.org/integration/autoland/rev/2d1ae6a0bfa1
Move compilation-wide data out of WarpBuilder r=jandem
https://hg.mozilla.org/integration/autoland/rev/47000915fd59
Add WarpScriptSnapshotList to WarpSnapshot. r=jandem
https://hg.mozilla.org/integration/autoland/rev/30a38c4ecf26
Factor out FindInlinableCallData r=jandem
https://hg.mozilla.org/integration/autoland/rev/ca2a620dc5a7
Add WarpInlinedCall snapshot r=jandem

We want to use this code in WarpBuilder. It's cleaner if it just returns a bool.

When compiling with --ion-eager, we may do bytecode analysis before we have a jitscript. Instead of adding a flag to JitScript, I reused the uninlineable flag on JSScript.

Depends on D83719

Drawing a clearer distinction between cases where we don't support inlining and cases where we have chosen not to inline.

Depends on D83720

Pushed by iireland@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/3c619409e1f2
Refactor pushCallStack and friends r=jandem
https://hg.mozilla.org/integration/autoland/rev/cf578c3dfcb1
Identify no-return functions in BytecodeAnalysis r=jandem
https://hg.mozilla.org/integration/autoland/rev/3d4681291d56
Limit inlining candidates r=jandem
https://hg.mozilla.org/integration/autoland/rev/9873bd7d59dd
Turn off TI-specific bailout code for Warp r=jandem

Based on IonBuilder::buildInline and IonBuilder::inlineScriptedCall.

Fixing up the relevant TODOs.

There's one remaining TODO in build_Try about supporting try-catch in inlined functions. The corresponding comment in IonBuilder dates back to bug 916914. As far as I can tell, it is out of date, and we support inlining JSOp::Try out of the box. I might be missing something, though, so I've left that comment in for now.

Depends on D83880

When a script becomes observable to the debugger, we invalidate the IonScript. We must also invalidate any scripts that inlined that script. The infrastructure to do this already exists; we just have to make sure inlined Warp compilations register themselves.

Depends on D83881

Pushed by iireland@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/67e965b12b4d
Build inlined calls r=jandem
https://hg.mozilla.org/integration/autoland/rev/86ca33a86ae5
Implement ops with different behaviour in inlined functions r=jandem
https://hg.mozilla.org/integration/autoland/rev/654c839a370a
Invalidate inlining script when inlined scripts are invalidated r=jandem

When we bail out of Warp, we create one or more BaselineFrames, each of which stores an ICScript. Until now, we used the default ICScript for each frame. This works poorly with our invalidation strategy: when we bail out of an inlined function because a transpiled guard failed, we want to resume execution with the inlined ICScript so that attaching a new IC will trigger us to invalidate the inlined version of the script.

Fortunately, bailouts rebuild the stack from the outermost to the innermost frame. The outermost frame will always use its default ICScript, and inner frames will use ICScripts owned by the outermost frame's JitScript. In theory we could recover the ICScript for an inlined call by looking up the ICEntry for that pc and inspecting the stub, but that seems slow and awkward. Instead, I added edges between ICScripts to let us find inlined children.

Note: the set of ICScripts in the inlining tree is the same as the set of ICScripts in InliningRoot::inlinedScripts_. If we moved ICScript ownership into the tree, then the only remaining members of InliningRoot would be the fallback stub space, and a JSScript pointer that I am about to add so that we can find the right script to invalidate. We could consider getting rid of InliningRoot entirely, but since one of the reasons we wanted a separate fallback stub space was because it could make things easier when we cancel inlining, I am holding off until we look at cancelation.

In preparation for fixing bailouts, this patch does two things:

  1. Moves initialization of BaselineFrame::icScript_ out of setInterpreterFields, where it didn't really belong.
  2. Fixes setInterpreterFields to initialize interpreterICEntry_ to use the ICScript stored in the frame instead of getting it from the JitScript.

Depends on D84427

When we replace a stub in an inlined ICScript, we want to invalidate the script into which it was inlined, rather than the original copy of the inlined script.

Depends on D84429

Pushed by iireland@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/a237d6525936
Store inlining data in ICScript r=jandem
https://hg.mozilla.org/integration/autoland/rev/b4d6118f4481
Move ICScript initialization out of setInterpreterFields r=jandem
https://hg.mozilla.org/integration/autoland/rev/86db033f2b35
Use inlined ICScripts when reconstructing bailout frames r=jandem
https://hg.mozilla.org/integration/autoland/rev/dab916754456
Invalidate the root script in MaybeInvalidateWarp r=jandem

This fixes a crash in Elm-TodoMVC.

With these conservative heuristics, trial inlining gives a performance improvement on Octane and does not regress Speedometer.

Depends on D86101

Prior to this patch, if we aborted with AbortReason::Disable (for example, because it contains an unsupported opcode), then we would abort the entire compilation. With this patch, we mark the script uninlineable, clean up, and continue as if we had not done trial inlining. This improves Octane by ~1.3%.

The CacheIR for the callsite still contains a CallInlinedFunction. The best approach I could come up with was to transpile CallInlinedFunction differently depending on whether we were actually inlining. If not, it is treated like a regular CallScriptedFunction.

An alternative approach would be to check all the conditions that could cause AbortReason::Disable beforehand (possibly in BytecodeAnalysis) and preemptively mark the script unlineable, with the downside of creating another piece of code that had to be kept in sync with WarpOracle, and there are just enough quirky cases to make that seem like a bad idea. In the future, we could combine the approaches by preemptively marking the easy cases and relying on this code for the rest.

Depends on D86102

Pushed by iireland@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/768bcbcd20e0
Don't inline cross-realm r=jandem
https://hg.mozilla.org/integration/autoland/rev/69d3d9cdb396
Initial inlining heuristics r=jandem
https://hg.mozilla.org/integration/autoland/rev/c0bcf1c6e42c
Don't abort entire warp compilation when inlining snapshot aborts r=jandem
https://hg.mozilla.org/integration/autoland/rev/95cbd1379138
Disable tests that loop infinitely with trial inlining r=jandem

Limit the maximum depth of the inlining tree and don't inline scripts with too many arguments.

Pushed by iireland@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/3bb3c814482f
Improve inlining heuristics r=jandem

Pulling this out into its own function so that the next patch can add an early return.

We have to loosen this assertion to handle cases where we trial-inline a script because needsArgsObj is false, but then we call argumentsOptimizationFailed on that script before we reach the warp oracle.

Depends on D86771

Pushed by iireland@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/18bb5c07a3b7
Factor out maybeInlineCallIC r=jandem
https://hg.mozilla.org/integration/autoland/rev/0c1308f6d1bf
Make inlining work with arguments analysis r=jandem

If a script cannot be warp-compiled, then we do not want to do trial inlining for that script. Where possible, it seems better to mark scripts proactively, instead of undoing the trial inlining after failing to warp-compile.

Pushed by iireland@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/bd4a5fb3aa65
Preemptively disable warp compilation for unsupported ops r=jandem
https://hg.mozilla.org/integration/autoland/rev/4462d1a3ffbb
Turn trial inlining on by default for Warp r=jandem
Keywords: leave-open
Status: NEW → RESOLVED
Closed: 6 months ago
Resolution: --- → FIXED
Target Milestone: --- → 82 Branch
You need to log in before you can comment on or make changes to this bug.