Closed Bug 1533890 Opened 9 months ago Closed 6 months ago

Migrate call ICs to CacheIR

Categories

(Core :: JavaScript Engine: JIT, enhancement, P1)

enhancement

Tracking

()

RESOLVED FIXED
mozilla69
Tracking Status
firefox67 --- wontfix
firefox68 --- wontfix
firefox69 --- fixed

People

(Reporter: iain, Assigned: iain)

References

Details

Attachments

(28 files)

47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review

I've been crunching away on migrating call ICs to CacheIR, and my patch stack is getting perilously tall. It's time to start thinking about landing some of this code.

Rough outline:

  • The first patch moves some code around in preparation, and adds a preference (off by default) to enable call ICs.
  • The next three* patches add basic CacheIR support for native function calls, scripted function calls, and class hooks, respectively.
  • Tucked in between those patches is a small patch to clean up an oversight in the existing code where the simulator doesn't call IgnoresReturnValue entry points.
  • The next patch adds support for spread calls. (I have a patch for bug 1528270 that is gated on this patch.)
  • The next three patches add constructor support for class hooks, native calls, and scripted calls, respectively.

Remaining work includes:

  • funcall and funapply: I have a patch laying the groundwork to get these working (for both native and scripted targets, unlike the current implementation), and I'm currently working on funcall.
  • CallAnyScripted: This should be easy to implement, but I haven't done so yet. I also have plans to introduce CallAnyNative.
  • ConstStringSplit: Once the above are finished, this will be the last of the existing C++ call ICs.
  • Minor cleanups/improvements: in particular, I want to optimize pushCallArgs to unroll the loop for small argc.

After this code has landed and been fuzzed, we can flip the switch to turn it on by default, and then eventually delete the old code.

This patch lays the groundwork for CacheIR call ICs. It refactors CallIRGenerator slightly, and adds a preference (on by default) to disable the new ICs. This lets us review and land future patches in a logical order without worrying about the attachment order of the new ICs vs the old ones. (For example, we don't want a CacheIR implementation of CallNative to shadow the old implementation of CallScriptedFunApply.)

I also removed the old disableCacheIRBinaryArith pref, after confirming with Matt that it is obsolete.

To avoid huge diffs every time the length of the longest CacheIR op changes, I padded the table out to 80 chars with a comment. Thanks, clang-format!

This patch implements calls to native functions in CacheIR. Spread calls and constructor calls are handled in a later patch.

Notes:

  1. When running in the simulator, native call stubs have an extra stub field to hold the redirected swi wrapper. We use some ugly ifdefs to make that work.
  2. The simulator version of native call stubs also doesn't support ignoresReturnValue calls. This bug is fixed in a later patch.
  3. pushCallArguments is based directly on the existing code. A later patch improves the comments and tightens up the generated code.
  4. Compared to the previous implementation, this patch bakes argc into each stub (by using it to compute the address of |callee| in tryAttachCallNative). This increases the number of versions of the stub code we generate by some (hopefully small) factor. In a future patch (not yet written), this will be changed so that we unroll the loop in pushCallArguments for small argc, and share stub code for large argc.

Comparison points:

  • BaselineCacheIRCompiler::pushCallArguments corresponds to ICCallStubCompiler::pushCallArguments
  • CallIRGenerator::tryAttachCallNative and BaselineCacheIRCompiler::emitCallNativeFunction correspond to TryAttachCallStub and ICCall_Native::Compiler::generateStubCode. (The dividing line between the two functions is different between the old and new code, but aside from the exceptions detailed above the logic should be the same.)

Depends on D22773

Native calls in the simulator have to be redirected to a special swi instruction. In the old implementation, this redirection did not support calling the IgnoresReturnValue version of a native function. This patch fixes that problem in the new implementation.

Depends on D22774

This patch implements calls to native functions in CacheIR. Spread calls and constructor calls are handled in a later patch.

Notes:

  1. This patch adds GuardFunctionHasJitEntry as a separate CacheIR op, which guards against relazification. The same functionality currently exists inside of ScriptedGetterResult. My original intent was to refactor ScriptedGetterResult to use the op, but if Ted's plans to add a delazification trampoline work out then the issue goes away entirely and we can just delete both versions of the code.
  2. Like the callNative patch, this patch bakes argc into the stub code.
  3. The existing code pops callee off the stack and unboxes it, immediately after copying the args. This doesn't make any sense, because we have already unboxed it once and it's still in a register.
  4. There are a couple of places in tryAttachCallScripted where we want to decline to attach without counting it as a failure. Right now, if the CacheIR path fails, the old implementation will do the right thing for us. A future patch will add isTemporarilyUnoptimizable support to CallIRGenerator.

Comparison points:

  • CallIRGenerator::tryAttachCallScripted and BaselineCacheIRCompiler::emitCallScriptedFunction correspond to TryAttachCallStub and ICCallScriptedCompiler::generateStubCode

Depends on D22775

This patch implements class hooks in CacheIR. It only supports call hooks; constructor calls are handled in a later patch.

The implementation of class hooks overlaps almost completely with the implementation of native calls; the only difference is how we load the address of the callee. I factored out the actual call into a lambda, and commoned up the surrounding logic.

Comparison points:

  • The old logic is in TryAttachCallStub and ICCall_ClassHook::Compiler::generateStubCode.

Depends on D22776

This patch adds spread call support for scripted calls, native calls, and class hooks. (The old code did not have spread support for class hooks, but it's easy to add.)

Notes:

  1. We don't support spread constructors yet. That will come later, after regular constructors are supported.
  2. The old implementation called ICCallStubCompiler::guardSpreadCall to limit the size of the spread array. As an undocumented side-effect, guardSpreadCall also modifies the argc input register. The CacheIR version uses a fresh register, and relies on the magic of the CacheIR register allocator to preserve the input register.

Comparison points:

  • GuardAndGetSpreadArgc corresponds to ICCallStubCompiler::guardSpreadCall.
  • BaselineCacheIRCompiler::pushSpreadCallArguments corresponds to ICCallStubCompiler::pushSpreadCallArguments
  • DoSpreadCallFallback is modified to correspond to DoCallFallback.

Depends on D22777

Constructor stubs in the existing implementation contain template objects, which are unused by the stub itself but consumed by Ion. To make this work in CacheIR, this patch introduces meta ops, which generate no code but can be consumed by BaselineInspector.

This patch adds the infrastructure for meta ops and uses it to implement constructor class hooks, because it is a simple place to start. Subsequent patches add support for native constructor calls and scripted constructor calls.

In each case, BaselineInspector needs two pieces of information: a) The template object itself, and b) the callee, which BaselineInspector checks to make sure it is getting the correct template object. (The precise nature of the template object varies: it's a class pointer for hooks, but a pointer to the actual callee object for native and scripted constructors.) The meta op's arguments are a tag (in this case, "ClassTemplateObject") and two stub field offsets. NB: the class offset is reused from an earlier guard. To make this work, addStubField now returns the offset of the new field.

Other notes:

  1. I changed an assertion in CompactBufferReader::seek to make it possible to seek directly to the end of the buffer. This is safe, because we already check to make sure we aren't at the end before reading from the buffer.
  2. It turns out that the code to iterate through CacheIR bytecode is nicer if the CacheIR op table defines the length of the arguments to an op, without including the op itself. I made the change.

Comparison points:

  • CallIRGenerator::getTemplateObjectForClassHook corresponds to GetTemplateObjectForClassHook in BaselineIC.cpp

Depends on D22779

This patch uses the infrastructure from the previous patch to add constructor support for native calls.

Comparison points:

  • CallIRGenerator::getTemplateObjectForNative corresponds to GetTemplateObjectForNative in BaselineIC.cpp

Depends on D22780

This patch adds support for scripted constructors, using the existing template object infrastructure.

Comparison points:

  • BaselineCacheIRCompiler::createThis/updateReturnValue correspond to big "if (isConstructing) ..." blocks in ICCallScriptedCompiler::generateStubCode.
  • CallIRGenerator::getTemplateObjectForScripted corresponds to the block in tryAttachCallStub starting with the comment "// Remember the template object ..."

Notes:

  1. In the original CreateThis code, there was the following comment:
    // Reload callee script. Note that a GC triggered by CreateThis may
    // have destroyed the callee BaselineScript and IonScript. CreateThis
    // is safely repeatable though, so in this case we just leave the stub
    // frame and jump to the next stub.
    This comment was out of date; as of bug 1415853, the code pointer is always valid. The comment does not exist in the new version.

  2. Bug 870478 added code to the old version of |updateReturnValue| to load |this| from the baseline frame instead of the baseline stub frame, because we would not trace the copy of |this| if a rectifier frame was created. A better fix was delivered in bug 945275 (we now trace |this| in rectifier frames). The new version of |updateReturnValue| loads |this| out of the stub frame, because it is both easier to read and more efficient.

  3. Unlike BaselineInspector::getTemplateObjectFor(Native|ClassHook), the scripted version does not use the callee as a guard to make sure that we have the correct template object. This is the same as the status quo. It works because we only retrieve the template object if the IC chain has exactly one non-fallback stub. We still store the callee in the metadata, because it is needed for BaselineInspector::getSingleCallee. This whole area of the BaselineInspector API should probably be redesigned, but it seemed out of scope for this patch.

Depends on D22781

Priority: -- → P1
Keywords: leave-open

I am landing the first set of patches. The remaining work includes:

  • funcall and funapply: I have a series of patches implementing CacheIR function.call and function.apply. It expands coverage to native targets (the existing ICs only cover scripted targets). There's some clean-up required, but the work is mostly done. I also want to do a survey of our test coverage for call/apply with native targets, and potentially fill any holes.
  • CallAnyScripted/CallAnyNative: Most of the code that needed to be written for this already exists because of funcall/funapply, so I just have to hook it up together.
  • ConstStringSplit: Once the above are finished, this will be the last of the existing C++ call ICs.
  • Minor cleanups/improvements, including but not limited to:
    • Unrolling the argument copy loop for small argc
    • don't monitor JSOP_IGNORES_RV calls
    • Move GuardAndUpdateSpreadArgc into the call operation, because it doesn't quite make sense as a standalone guard.
Pushed by iireland@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/93613b622c03
Refactor TryAttachCallStub and add disableCacheIRCalls pref r=mgaudet
https://hg.mozilla.org/integration/autoland/rev/b04b7f58b4d8
Migrate CallNative to CacheIR r=mgaudet
https://hg.mozilla.org/integration/autoland/rev/3fad807ff7d4
Add simulator support for IgnoresReturnValue version of native functions r=mgaudet
https://hg.mozilla.org/integration/autoland/rev/1a502b69ad35
Migrate CallScripted to CacheIR r=mgaudet
https://hg.mozilla.org/integration/autoland/rev/e8af74a64a88
Migrate call hooks to CacheIR r=mgaudet
https://hg.mozilla.org/integration/autoland/rev/33fdefc89831
Add spread call support to CacheIR r=mgaudet
https://hg.mozilla.org/integration/autoland/rev/9fe35422fa32
Add constructor hook support to CacheIR r=mgaudet
https://hg.mozilla.org/integration/autoland/rev/b641ee647a01
Add native constructor call support to CacheIR r=mgaudet
https://hg.mozilla.org/integration/autoland/rev/d808eb8f5dd5
Add scripted constructor support to CacheIR r=mgaudet

With this change, --disable-ion builds successfully.

Pushed by iireland@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/d43f42ce6cf0
Fix template mismatch in MacroAssembler-none r=jandem
  • Doing my best to unify some of this code and provide a single-ish source of truth for things.
  1. Add a "CallFlags" struct that bundles up information about the call, including whether it is a constructor, whether it is a cross-realm call, and how its arguments are being passed. Notes:
    a) After working with it for a while, I'm convinced that isCrossRealm is named backwards. Calls are cross-realm by default. I've changed the flag to isSameRealm so that now we only have to set a non-default value for the flag if we know something about the callee. This makes CallFlags noticeably more ergonomic.
    b) Right now there are two possibilities for how arguments are being passed: standard and spread. In future patches, this will expand to include FunCall, FunApplyArgs, and FunApplyArray.
    c) This all gets packed up into a byte to make it easy to pass via CacheIR.

  2. Unify as much of the argument-address-calculating logic in one place. I've added an ArgumentKind enum that lets us ask for this, callee, newtarget, arg0, or arg1, and then it calculates the slot, possibly using argc.

  3. The first consumer of GetIndexFromArgumentKind is a new CacheIR interface for accessing specific arguments. Instead of doing manual calculations and using LoadStackValue(slotIndex), we use LoadArgument(argumentKind, argcInfo). The correct slot is calculated under the covers and encoded in the cache op. Note that sometimes we are willing to bake in argc at compile time (for example, tryAttachStringSplit) and other times we want the flexibility / codesharing of using the value of argc that we get as an input operand. To make this work, both LoadArgument and the CacheIR op it generates come in two flavours. I'm very open to bikeshedding on the name, which is currently borrowing terminology from LoadEnvironment(Fixed/Variable)SlotResult.

  4. The second consumer of GetIndexFromArgumentKind is a new BaselineCacheIRCompiler-internal interface for loading objects off the stack, because it happens pretty often. Right now, the user of the interface is responsible for tracking the depth of the stack above the arguments, although I'd like to find a way to fix that.

The initial implementation of CacheIR spread calls added a guardAndUpdateSpreadArgc op, which had to be emitted just before the call. This patch moves the argc update inside the call op, where it belonged all along.

In a few patches, fun_apply will also use this code.

Depends on D25868

This patch adds support for scripted fun_call.

The old logic is in TryAttachFunCallStub and ICCall_ScriptedFunCall::Compiler::generateStubCode.

Depends on D25869

This patch adds support for FunCall with a native target, which the old implementation lacked.

Notes:

  1. MacroAssembler::branchIfInterpreted had been unused for quite a while. I updated it to match MacroAssembler::branchIfFunctionHasNoJitEntry (and check WASM_OPTIMIZED in the non-constructor case).
  2. GuardIsNativeFunction was a confusing name, especially after I added GuardFunctionIsNative. I renamed GuardIsNativeFunction to GuardSpecificNativeFunction to better represent what it does.
  3. Native function calls in the simulator need to be wrapped up to trigger a software interrupt. To call an arbitrary native function, we need a trampoline, so I added CallAnyNative.

Depends on D25870

This patch moves ScriptedApplyArgs and ScriptedApplyArray to CacheIR, and adds NativeApplyArgs and NativeApplyArray.

Like a spread call, FunApply updates argc as part of the call op, after we've passed all the guards.

Comparisons:

  • The new code in BaselineCacheIRCompiler::updateArgc corresponds to parts of ICCallStubCompiler::guardFunApply.
  • BaselineCacheIRCompiler::emitGuardFunApply also corresponds to ICCallStubCompiler::guardFunApply.
  • BaselineCacheIRCompiler::pushFunApplyArgs corresponds to ICCallStubCompiler::pushCallerArguments
  • BaselineCacheIRCompiler::pushFunApplyArray corresponds to ICCallStubCompiler::pushArrayArguments
  • CallIRGenerator::tryAttachFunApply corresponds to TryAttachFunApplyStub + ICCall_ScriptedApplyArray::Compiler::generateStubCode

Depends on D25871

This patch moves CallAnyScripted over to CacheIR.

Most of the pieces were already in place to implement FunCall/FunApply. The only new CacheIR op is GuardFunctionIsConstructor. I also took the opportunity to refactor the masm function flag code a little bit.

Depends on D25872

This patch extends megamorphic support to native functions, which were not previously supported.

Depends on D25873

In some cases we want to avoid attaching a stub, but not count it as a failure. To make it easier to express this (as well as deferred stubs, which will be used to implement ConstStringSplit in a future patch), I rewrote all the tryAttachFoo stubs in CallIRGenerator to return an enum instead of a bool. (This seems better than a bunch of mutually exclusive boolean out-params.)

Depends on D25874

This patch adds support for ConstStringSplit. ConstStringSplit is attached after the call is performed, so that it can grab and cache the result. To make it work, I added a new AttachDecision value.

Note:

  1. Testing this code made me realize that a previous change I made was wrong. Even in call ICs that don't use argc, we have to initialize the input location. If we don't, the register can be allocated and clobbered, which will break subsequent ICs.
  2. We only want a ConstStringSplit if this is the first stub that we try attaching. (If there's already another stub, then it can't be a constant string split. The old implementation would remove any existing ConstStringSplit stubs when it realized that the optimization had failed, but there isn't a particularly nice way of doing that in CacheIR and I don't think it matters much.
  3. For unclear reasons, we had an implementation of CallStringSplitResult in IonCacheIRCompiler. It was dead code, so I removed it.

Depends on D25875

To verify that I had covered all of the existing functionality, I added assertions to DoCallFallback and DoSpreadCallFallback to verify that the old C++ implementation never managed to attach a stub if CacheIR had failed to handle it. This patch includes the changes made as a result of this testing:

  1. JSOP_CALLITER is equivalent to JSOP_CALL for our purposes. Adding support for JSOP_CALLITER did not require any additional changes.
  2. JSOP_SUPERCALL and JSOP_SPREADSUPERCALL are just like normal constructors, except we don't save a template object. I added the super guards where necessary.
  3. There were also a number of cases where we tried to attach a CacheIR stub, declined to attach because an identical stub already existed, and then attached an old stub. For example, a spread call IC with more than 16 arguments will fail, but we will try to attach another spread call IC. I examined the failures of this sort by hand. Many of them were invalid (in particular, oomtests that caused us to fail during the process of attaching the stub) or seemed acceptable. I made a few spot fixes to cover easy exceptions.
  4. I added jitspew to the attachment code to print BaselineICFallback messages if we try to attach an identical stub, which will hopefully help when testing new ICs to verify that they are not perma-failing.

Depends on D25876

Pushed by iireland@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/f8a17aa39162
Add LoadCallee op and refactor call flags r=mgaudet
https://hg.mozilla.org/integration/autoland/rev/c1d550195b9f
Move guardAndUpdateSpreadArgc logic inside the call op r=mgaudet
https://hg.mozilla.org/integration/autoland/rev/6d8643e52d18
Add scripted fun_call support to CacheIR r=mgaudet
https://hg.mozilla.org/integration/autoland/rev/f05cdb03558f
Add native fun_call support to CacheIR r=mgaudet
https://hg.mozilla.org/integration/autoland/rev/66ea618f7b13
Add fun_apply support to CacheIR r=mgaudet
https://hg.mozilla.org/integration/autoland/rev/8664fa8a8a10
Add megamorphic scripted stubs to CacheIR. r=mgaudet
https://hg.mozilla.org/integration/autoland/rev/190f9fd0c2a7
Add megamorphic native stubs to CacheIR r=mgaudet
https://hg.mozilla.org/integration/autoland/rev/90f3b654d3ac
Add TemporarilyUnoptimizable support to CallIRGenerator r=mgaudet
https://hg.mozilla.org/integration/autoland/rev/b698d91f51dd
Migrate ConstStringSplit to CacheIR r=mgaudet
https://hg.mozilla.org/integration/autoland/rev/0d7d1ef8d08e
Add support for additional call opcodes r=mgaudet

Based on the stack of patches, I presume that these patches are very unlikely to be backported, right?

Extremely correct.

When pushing arguments for a JIT call, we push the callee value onto the stack, and then immediately pop it off. This is a remnant of the old implementation.

This patch removes that wart, with a small detour to rewrite/recomment alignJitStackBasedOnNArgs for clarity.

While trying to pin down a performance regression, I realized that both the new and the old implementations of ConstStringSplit are broken. Three years ago, as part of some ES6 work, we changed String.prototype.split to call a self-hosted String_split function instead of native str_split. In turn, String_split calls intrinsic_StringSplitString in the case we care about. However, because the call to intrinsic_StringSplitString is in self-hosted code, we would only get value out of this stub if there was only one caller of String_split in the entire program.

This patch changes ConstStringSplit to look for the self-hosted String_Split function, which means we can attach a stub to the user script instead of self-hosted code. It also adds support in BaselineInspector for extracting metadata about the string split from the stub, which enables the Ion version of this optimization.

Depends on D29532

Some native functions (for example, the Array constructor) have constructor-like behaviour even when called in a non-constructing context. We still want to create template objects for these cases. This fixes a noticeable regression in Speedometer (particularly in EmberJS-debug).

Depends on D29533

I started changing StringSplit to attach outside of self-hosted code, to match ConstStringSplit. Upon closer examination, the StringSplit IC doesn't actually add any value, and we're better off deleting it. The generated code calls StringSplitHelper, which ends up doing almost exactly the same thing as the call to intrinsic_StringSplitString it replaces. When we first wrote the patch (bug 1366377), the advantage was that we got to skip a lookup to determine the group of the resulting object. However, a subsequent patch created a single group for every StringSplitString result, which is basically free to look up.

I couldn't write a microbenchmark where the StringSplit IC made any difference, so let's just delete it and simplify our codebase.

Depends on D29534

  1. In bug 1545278, Ted is making our story about guarding on object pointer identity more robust. Since I am adding a new guardSpecificObject call in the ConstStringSplit patch, it seemed reasonable to add guardSpecificFunction now instead of later.
  2. It's not directly relevant in the current patch, but in a previous version of the StringSplit patch (before I realized we could delete the whole thing) it turned out that calling isSelfHostedFunctionWithName on an arbitrary function can trigger assertions, because GetSelfHostedFunctionName assumes isExtended, but isSelfHostedBuiltin does not necessarily imply isExtended (in the case of nested anonymous functions).
  3. Fixing the format string in a JitSpew message I added in a previous stack.

Depends on D29535

This patch enables CacheIR call ICs by default, and turns off the old call ICs when CacheIR ICs are turned on.

Depends on D29536

Pushed by iireland@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/98c5fbe185df
Don't push callee unnecessarily for jit calls r=mgaudet
https://hg.mozilla.org/integration/autoland/rev/274e684c7979
Rewrite ConstStringSplit to work outside of microbenchmarks r=mgaudet
https://hg.mozilla.org/integration/autoland/rev/2ee2a6c2095d
Create template objects for native constructors called without "new" r=mgaudet
https://hg.mozilla.org/integration/autoland/rev/5d03ed4eeeb0
Remove StringSplit call IC r=mgaudet
https://hg.mozilla.org/integration/autoland/rev/cbf988d7f823
Minor cleanups r=mgaudet
Depends on: 1549035
Pushed by iireland@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/0258584daaa6
Turn on CacheIR call ICs by default r=mgaudet

The one remaining wart that I'm aware of is that the fallback stub uses ICStubCompilerBase::pushCallArguments, so we'll continue to have two implementations of that code unless/until we overhaul our fallbacks.

Keywords: leave-open
Pushed by iireland@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/c71a126eea40
Remove obsolete disableCacheIRCalls pref r=mgaudet
Status: NEW → RESOLVED
Closed: 6 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla69
You need to log in before you can comment on or make changes to this bug.