Closed Bug 1960433 Opened 8 months ago Closed 3 months ago

Support trial inlining in the baseline interpreter

Categories

(Core :: JavaScript Engine: JIT, task, P1)

task

Tracking

()

RESOLVED FIXED
144 Branch
Tracking Status
firefox144 --- fixed

People

(Reporter: iain, Assigned: iain)

References

(Blocks 4 open bugs)

Details

(Keywords: perf-alert)

Attachments

(8 files)

We inline when a baseline script reaches a particular warmup threshold. With off-thread baseline compilation enabled, it is possible for small scripts to reach that threshold in the baseline interpreter while we're still compiling the baseline script offthread. This means we won't inline into that script.

The most robust way to fix this is to enable inlining in the baseline interpreter at the same threshold. This should significantly reduce the effect of variance in background thread compilation time on inlining decisions.

Blocks: 1943901
Blocks: 1960435
Depends on: 1967810

When we passed the ICScript in the JSContext, we had to be certain that we were calling baseline code. If we instead called a different tier, it would leave the ICScript in the context, and a subsequent call to baseline code might pick it up, leading to a mismatched set of ICs. Now that we're passing it in the stub frame, it is always safe to call a lower tier, so we can simplify this code.

We have an invariant that a script must have a BaselineScript to have an IonScript. It's a little unclear how that should extend to inlined scripts.

Right now, there are a couple of assertions to enforce that inlined scripts must have a BaselineScript: one here, and another here.

This patch prevents us from reaching that situation. Alternatively, we could decide to relax those assertions.

Blocks: 1969328
Flags: needinfo?(iireland)

Backed out for causing bc crashes @ browser_ext_activeScript.js

There are some r+ patches which didn't land and no activity in this bug for 2 weeks.
:iain, could you have a look please?
If you still have some work to do, you can add an action "Plan Changes" in Phabricator.
For more information, please visit BugBot documentation.

Flags: needinfo?(jdemooij)
Flags: needinfo?(iireland)
Flags: needinfo?(jdemooij)

There's a finicky stack overflow issue in 32-bit Windows mochitests that I have to solve before this lands.

Flags: needinfo?(iireland)
Regressions: 1974484
Regressions: 1974946
Regressions: 1975489
Regressions: 1976691

There have been some performance regressions reported for this bug and I've tracked it down to part 3 ("Pass the ICScript in the stub frame instead of the JSContext"). The problem with this change is that it doesn't handle the case where there's an arguments-rectifier frame between the IC stub frame and the trial-inlined callee, because we set the flag on the stub frame's descriptor and not on the arguments-rectifier frame descriptor.

I tested a fix for that and it does get rid of (most of?) the regressions. However this needs a bit more work and there might be a better solution (one option is to go back to a field on the context, but keep the frame descriptor flag and also set that in the arguments rectifer trampoline like my patch does).

Let's back this out for now and Iain can fix it properly when he's back from leave.

Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Target Milestone: 142 Branch → ---

(In reply to Jan de Mooij [:jandem] from comment #20)

Let's back this out for now and Iain can fix it properly when he's back from leave.

Seems that we may also need to revert bug 1974946? I am once again seeing various compilation failures after the backout.

(In reply to Sandor Molnar[:smolnar] from comment #22)

Backed out for causing performance issues

Backout link: https://hg.mozilla.org/integration/autoland/rev/78c4083d168f41a4ddc2d9d04421d67b66d85bec

Perfherder has detected a browsertime performance change from push 78c4083d168f41a4ddc2d9d04421d67b66d85bec.

If you have any questions, please reach out to a performance sheriff. Alternatively, you can find help on Slack by joining #perf-help, and on Matrix you can find help by joining #perftest.

Improvements:

Ratio Test Platform Options Absolute values (old vs new) Performance Profiles
3% speedometer AngularJS-TodoMVC/DeletingAllItems windows11-64-24h2-shippable fission webrender 37.97 -> 36.94 Before/After

Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests.

If you need the profiling jobs you can trigger them yourself from treeherder job view or ask a performance sheriff to do that for you.

You can run all of these tests on try with ./mach try perf --alert 45971

The following documentation link provides more information about this command.

Keywords: perf-alert

(In reply to Cristina Horotan [:chorotan] from comment #23)

Backout merged to central: https://hg-edge.mozilla.org/mozilla-central/rev/78c4083d168f41a4ddc2d9d04421d67b66d85bec

Perfherder has detected a devtools performance change from push 78c4083d168f41a4ddc2d9d04421d67b66d85bec.

If you have any questions, please reach out to a performance sheriff. Alternatively, you can find help on Slack by joining #perf-help, and on Matrix you can find help by joining #perftest.

Improvements:

Ratio Test Platform Options Absolute values (old vs new)
20% damp custom.jsdebugger.pause.DAMP linux1804-64-shippable-qr e10s fission stylo webrender 928.08 -> 740.28
11% damp custom.jsdebugger.pause.DAMP macosx1470-64-shippable e10s fission stylo webrender 965.17 -> 858.77
8% damp custom.jsdebugger.pause.DAMP windows11-64-24h2-shippable e10s fission stylo webrender 339.20 -> 310.84
7% damp custom.jsdebugger.pause.DAMP windows11-64-24h2-shippable e10s fission stylo webrender 334.42 -> 309.39
4% damp custom.jsdebugger.stepInNewSource.DAMP linux1804-64-shippable-qr e10s fission stylo webrender 2,274.07 -> 2,177.98
2% damp custom.jsdebugger.stepInNewSource.DAMP windows11-64-24h2-shippable e10s fission stylo webrender 871.02 -> 853.36

Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests.

If you need the profiling jobs you can trigger them yourself from treeherder job view or ask a performance sheriff to do that for you.

You can run all of these tests on try with ./mach try perf --alert 45967

The following documentation link provides more information about this command.

Ah, good catch.

Bug 1815498 might be the best way to fix this. We already know the target here; instead of using the rectifier trampoline, we could just push the correct number of arguments in the first place.

Depends on: 1983140

As written, these tests require OSR to get into Ion. In test(Get|Set)Element, however, there are inner loops. If we trigger OSR at one of those inner loops, we break scalar replacement. This shouldn't normally happen because of our heuristics to prioritize outer loops, but it looks like offthread baseline compilation can occasionally disrupt the timing enough to trigger inner loop OSR.

The simplest fix is to split out inner functions that can be compiled on entry, then disable OSR. It doesn't look like inlining disrupts this, but I disabled it just to be sure.

Looking at the other subarray-scalar-replace tests, none of them involve inner loops, so they don't require the same change.

Regressions: 1985913
QA Whiteboard: [qa-triage-done-c145/b144]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: