Open Bug 1827914 Opened 1 year ago Updated 5 months ago

Eagerly baseline compile self-hosted code

Categories

(Core :: JavaScript Engine, task, P2)

task

Tracking

()

People

(Reporter: bthrall, Assigned: bthrall)

References

(Blocks 3 open bugs, )

Details

(Whiteboard: [sp3])

Attachments

(22 files)

48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review

Profiling shows that baseline compiling self-hosted code without waiting for warmup could improve Speedometer 3 scores. We can go farther and precompile self-hosted code when the runtime is initialized at the cost of increased pageload times (see SJP-130 for our plans to mitigate this).

Explore compiling self-hosted code using the Baseline Compiler and caching the results for reuse across realms. Currently, the generated bytecode is specific to a realm and script, so we will have to fall back to Baseline Interpreter methods to get the realm, etc. where necessary (only for self-hosted code) so the bytecode can be shared.

Whiteboard: [sp3]

We want to generate code for self-hosted code without a JSScript.

emitInitFrameFields() needs to set the interpreterScript_ offset so it is
available in the BaselineFrame for pushScriptArg() to use. Likewise, when doing
an OSR from BaselineInterpreter to compiled,
prepareForBaselineInterpreterToJitOSR() needs to keep interpreterScript_'s
value.

Some jit-test tests are still crashing, so this isn't a complete fix yet.

The changes aren't dependant on isSelfHosted() yet because there is no way to
set up the compiler from the API that will result in isSelfHosted() returning
true.

Depends on D187605

testSelfHostedCompile() is just for verifying changes in the generated code, it
doesn't properly automate checking the validity of the code.

Depends on D187606

The GC needs to know whether or not the interpreterScript_ is valid to trace.

Depends on D187608

I'm not sure if I'm referring to the JSSCript in a way that makes sure it isn't GC'd.

Reusing the JitCode from the script is failing assertions.

Depends on D187609

Depends on D187610

The shape pointer allows access to the Realm, which should not be shared when
we share the cached JitCode.

Depends on D187611

There appears to be no way to clean up internal watchpoints if "rc" fails,
other than quitting and restarting rr. This means after a few failures, rr
stops working because there are no more available hard watchpoints.

Depends on D187612

The delazified script needs a BaselineScript to store the JitCode that we are sharing.

We can't share the BaselineScript because it during
Zone::forceDiscardJitCode(), it will be deleted multiple times from the
JSScripts that reference it.

An alternative might be to reference count the BaselineScript, but that is more
complicated than I want to try right now. If this turns out to be a valuable
experiment, it would be better to reference count than to copy the
BaselineScript. HG: Enter commit message. Lines beginning with 'HG:' are
removed.

Depends on D187613

When self-hosted code is copied into a new realm, the frames baked into the
native bytecode need to refer to the IC list for the new JitScript or they will
execute the IC scripts using the IC stub fields from the wrong script.

For example, non262/TypedArray/of.js was crashing in
GetPrototypeFromBuiltinConstructor() because the JSContext's realm didn't match
the callee's realm (the assertion on JSObject.h:808 was failing). This was
happening because the IC stub providing the this pointer was from another
realm.

This test is apparently from a time when each newGlobal() created a separate
compartment by default, which is no longer the case.

Depends on D187618

If the realm doesn't have a JitRealm, then it won't know what to do with the
BaselineScript anyway.

The immediate issue was that JSScript::ensureHasJitScript() indirectly calls
ICScript::initICEntries(), which requires a JitRealm:
https://searchfox.org/mozilla-central/rev/d307d4d9f06dab6d16e963a4318e5e8ff4899141/js/src/jit/BaselineIC.cpp#326

Depends on D187619

Calls to handler.script() should be avoided; calls to get the script are still
sometimes needed and use scriptInternal() as long as the script data isn't
baked into the native bytecode.

In a couple cases (such as modules, MaybeIncrementCodeCoverageCounter()), I
haven't bothered to avoid calling script() because self-hosted code doesn't use
those cases yet.

Depends on D187620

The compiler frame's interpreter script pointer is only set for self-hosted
scripts when not in the interpreter, so we need to handle the case where it is
not self-hosted.

Depends on D187621

The JSFunction atom isn't the same as the key used when filling the Jit Cache,
and it doesn't look like the atom is globally unique (for example, the atom for
"ErrorToString" is "toString").

Changes to js.cpp make sure the Jit Cache is initialized in the release build,
since MOZ_ASSERT only produces code for debug builds.

Depends on D187623

This avoids the complexity of making sure initSelfHostingJitCache() is called
with a proper realm for every runtime.

We do run into the issue that the script we are storing in the JIT cache can
lose its baseline script (such as from a GC), so we are possibly having to
compile more often than we would if we kept it around.

Depends on D187624

Passing a parameter to BaselineCompile() isn't reliable, but is redundant.

Depends on D187625

Fixes broken AsyncIterator jstests.

Depends on D187626

JSScript was keeping a reference to the global, which violated expectations on
the Gecko side of things.

I'm not sure if we still need the double BaselineScript::Copy(), but I don't
expect it to cause correctness problems.

Depends on D187627

Depends on D191460

The scope leak is detected by a failed assertion:

Assertion failure: !globalScopeSentinel->IsAlive(), at /home/bryan/src/mozilla-unified/dom/workers/RuntimeService.cpp:2189

Depends on D191461

On Oct. 19, I posted results on Speedometer3 showing early baseline compilation of self-hosted code are showing an overall 1.6% slowdown. Subtests are mostly lower as well, though a few show improvement.

These results are slightly suspect, because of how CI was running Speedometer3 up until Dec. 4 (see bug 1867959 and bug 1864921), so I am attempting to re-run them to see if there is any change.

If there is an actual slowdown due to these patches, my best guess is it is due to generating slower bytecode so we can share JitCode across Realms. The cache has a 99% hit rate over a speedometer3 run, so the patches appear to be caching effectively.

There are other possible changes we could make to improve performance from early baseline compilation, including:

  • Optimizing the bytecode references to script, built-in objects, GC Things, and shapes, as mentioned above.
  • Baseline compilation is happening as needed, so the compilation can still possibly happen during the Speedometer3 timing window. We could address this by pre-compiling a core set (or all) of self-hosted functions at an earlier point.
  • Moving the Baseline compilation to a background thread might allow compilation to happen in parallel, reducing the time it takes for a script to become available to run.
  • The JIT BaselineScript cache is only shared within a JSRuntime, so it might help to allow it to be shared across runtimes (such as for workers).
  • Currently, the BaselineScript is copied every time we share the self-hosted function into a new Realm. If we manage the lifetime of the BaselineScript, we should be able to avoid these extra copies.
Blocks: 1870391
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: