Closed Bug 1831676 Opened 1 year ago Closed 10 months ago

Refactor JS C++ interpreter in support of partial evaluation / specialization

Categories

(Core :: JavaScript Engine, task, P1)

RESOLVED WONTFIX

People

(Reporter: cfallin, Assigned: cfallin)

Attachments

(5 files)

This bug captures the effort to refactor the C++ interpreter in SpiderMonkey in order to more easily allow partial evaluation for speedups (bug 1831399).

The two main difficulties I found when prototyping this functionality, at least in relation to the interpreter's current design, are that:

  • The interpreter currently optimizes handling of JS-to-JS calls by remaining within the same C++ call-frame. JS state is updated and the interpreter loop then dispatches to the next opcode (the first opcode in the callee), without any call at the C++ level.

    In order to partially evaluate the interpreter over a function body, we need a different activation of the interpreter function for every interpreted function. Hence, we need to (when configured in this mode) always make a C++ call -- recursive at the C++ level, back to the interpreter itself -- on a JS call.

  • Certain error-paths and a few kinds of control-flow, like generators, lead to a "computed target PC" that is only known in practice at runtime, not from the bytecode, or at least not easily. For example, catching an exception leads to a handler denoted from the try-notes, and the logic to find the right handler is very challenging to partially evaluate. Or, when resuming a generator, the next PC to resume at is a value loaded from runtime state. This is challenging because partial evaluation "unrolls" the interpreter loop to a new specialization for each bytecode, and we need to know statically what the CFG is (what the next-PC is from a given opcode, or next-PCs for each branch if conditional).

    To make this workable, we add a notion of "entry kind" to the C++ interpreter and enter it in "specialized" mode. If an exception is thrown, a generator is resumed, or any other case occurs that is difficult to partially evaluate, we "bail out" by tail-calling the interpret function in a "generic" mode (one of several, actually). At the pure C++ level this changes nothing semantically -- we replace the current activation with another that dispatches to the place we would have gone anyway -- but the distinction is very useful for a partial evaluator: it specializes only those calls that are in "specialized" mode, and every other case becomes a bailout to the generic interpreter.
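The bailout idea in the second point can be sketched with a toy interpreter. This is only an illustration under stated assumptions: the opcodes, `BailKind`, `BailState`, and `BailInterpret` are hypothetical names invented here and are not SpiderMonkey's, and the `handlerPc` field merely stands in for a handler that would really be found through the try-notes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy opcodes (hypothetical; not SpiderMonkey's bytecode).
enum class BailOp { Inc, Throw, Halt };

// Hypothetical entry kinds: Specialized promises that pc follows the static
// CFG of the bytecode; Generic allows pc to be computed from runtime state.
enum class BailKind { Specialized, Generic };

struct BailState {
  std::vector<BailOp> code;
  std::size_t handlerPc = 0;  // stand-in for a handler located via try-notes
  int acc = 0;
  int bailouts = 0;           // number of Specialized -> Generic transitions
};

int BailInterpret(BailState& st, std::size_t pc, BailKind kind) {
  for (;;) {
    assert(pc < st.code.size());
    switch (st.code[pc]) {
      case BailOp::Inc:
        st.acc++;
        pc++;  // statically known successor
        break;
      case BailOp::Throw:
        // The next pc comes from runtime state, not from the bytecode.
        if (kind == BailKind::Specialized) {
          // Bail out: replace this activation with a generic one. A partial
          // evaluator would stop specializing at this call.
          st.bailouts++;
          return BailInterpret(st, st.handlerPc, BailKind::Generic);
        }
        pc = st.handlerPc;  // generic mode may jump anywhere
        break;
      case BailOp::Halt:
        return st.acc;
    }
  }
}
```

Run from `pc = 0`, both entry kinds compute the same result for the same program; the only observable difference in this sketch is the `bailouts` counter, which mirrors the point that the distinction matters to the partial evaluator rather than to C++ semantics.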

An important goal of this set of patches is to have zero impact on the current interpreter when it is not being used for my (admittedly weird) use-case. The first patch in the series pulls some state from stack locals into a struct (which the compiler should, hopefully, turn back into locals via SROA), but everything else is behind an ifdef. If I've missed something and there is an impact, that's a bug and I'll try to fix it.

Blocks: 1831399

This is necessary for later parts of this patch-series to split
different parts of the interpretation across multiple invocations of
Interpret().

Assignee: nobody → chris
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true

This is necessary to separate out the one-time setup logic for the
InterpretContext from the actual body of the interpreter loop, which
we may want to split across multiple call-frames.
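
As a rough illustration of this split (not the actual patch), the sketch below gathers interpreter state into a context struct, performs setup once in an outer function, and keeps the loop body in a separate function that can be entered at any pc. `ToyContext`, `ToyOuter`, `ToyInner`, and the opcodes are hypothetical names chosen for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class CtxOp { Push, Add, Halt };
struct CtxInsn { CtxOp op; int arg; };

// Hypothetical analogue of InterpretContext: state that would otherwise live
// in locals of the interpreter function, gathered so it can be shared.
struct ToyContext {
  std::vector<int> stack;
  int setups = 0;  // counts how often one-time setup ran
};

// Loop body only. It performs no setup, so it can be re-entered at any pc
// by another call frame that shares the same context.
int ToyInner(ToyContext& ctx, const std::vector<CtxInsn>& code, std::size_t pc) {
  for (;;) {
    assert(pc < code.size());
    const CtxInsn& insn = code[pc++];
    switch (insn.op) {
      case CtxOp::Push:
        ctx.stack.push_back(insn.arg);
        break;
      case CtxOp::Add: {
        assert(ctx.stack.size() >= 2);
        int rhs = ctx.stack.back();
        ctx.stack.pop_back();
        ctx.stack.back() += rhs;
        break;
      }
      case CtxOp::Halt:
        assert(!ctx.stack.empty());
        return ctx.stack.back();
    }
  }
}

// One-time setup, then hand off to the loop body.
int ToyOuter(ToyContext& ctx, const std::vector<CtxInsn>& code) {
  ctx.stack.clear();
  ctx.setups++;
  return ToyInner(ctx, code, 0);
}
```

Because `ToyInner` takes the context by reference and an explicit starting pc, a second invocation can pick up mid-program without repeating the setup.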

Depends on D177326

This creates a 1-to-1 correspondence between JS calls and C++ calls, when
enabled, allowing a partial evaluation / specialization tool to process
one invocation of InterpretInner() (with constant bytecode PC) to
create a compiled function body for one JS function.
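
A minimal sketch of the 1-to-1 correspondence, assuming a toy bytecode in which each function is a vector of instructions: every `Call` opcode recurses into the interpreter function, so each interpreted function gets its own C++ activation. All names here (`CallOp`, `ToyFunction`, `RunFunction`, `CallStats`) are hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class CallOp { Inc, Call, Ret };
struct CallInsn { CallOp op; std::size_t callee; };  // callee used by Call only
using ToyFunction = std::vector<CallInsn>;

// Tracks C++ recursion depth so the correspondence can be observed.
struct CallStats {
  int depth = 0;
  int maxDepth = 0;
};

// One activation of this function interprets exactly one toy function.
int RunFunction(const std::vector<ToyFunction>& fns, std::size_t fn,
                CallStats& stats) {
  assert(fn < fns.size());
  stats.depth++;
  if (stats.depth > stats.maxDepth) {
    stats.maxDepth = stats.depth;
  }
  const ToyFunction& code = fns[fn];
  int acc = 0;
  for (std::size_t pc = 0;;) {
    assert(pc < code.size());
    const CallInsn& insn = code[pc++];
    switch (insn.op) {
      case CallOp::Inc:
        acc++;
        break;
      case CallOp::Call:
        // Recursive C++ call instead of switching frames inside one loop.
        acc += RunFunction(fns, insn.callee, stats);
        break;
      case CallOp::Ret:
        stats.depth--;
        return acc;
    }
  }
}
```

In this shape the bytecode being executed is fixed for the lifetime of each `RunFunction` activation, which is the property a specialization tool would rely on.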

Depends on D177327

This change is a potential optimization in itself, but also allows
partial-specialization tools to understand the interpreter's current
PC without having to analyze the memory slot for REGS.pc specially.
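
The general pattern can be sketched as follows, under the assumption of a toy register struct: the loop works on a local copy of the pc and writes it back only where other code could observe it. `ToyRegs` and `SumBytes` are hypothetical names; the real interpreter's REGS structure is not reproduced here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical register file holding the pc in memory.
struct ToyRegs {
  const std::uint8_t* pc;
};

// Sums the "bytecode" bytes in [regs.pc, end), advancing pc as it goes.
int SumBytes(ToyRegs& regs, const std::uint8_t* end) {
  const std::uint8_t* pc = regs.pc;  // cache the pc in a local
  int sum = 0;
  while (pc != end) {
    sum += *pc;  // the loop reads and advances only the local
    ++pc;
  }
  regs.pc = pc;  // sync back once, where other code may observe it
  return sum;
}
```

With the pc in a local, a tool (or an optimizing compiler) can track it as an ordinary value rather than reasoning about loads and stores to a memory slot.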

Depends on D177328

This mode does not actually perform any specialization, but it does
distinguish different sorts of calls to InterpretInner(): those for
which pc will always follow some statically-known CFG (e.g.,
conditionals and switches), and those for which it may
be computed arbitrarily, e.g. after resuming a generator. This allows
partial specialization to focus on where it works well and bail out to
a generic interpreter otherwise.

Depends on D177329

(In reply to Chris Fallin [:cfallin] from comment #0)

  • The interpreter currently optimizes handling of JS-to-JS calls by remaining within the same C++ call-frame. JS state is updated and the interpreter loop then dispatches to the next opcode (the first opcode in the callee), without any call at the C++ level.

    In order to partially evaluate the interpreter over a function body, we need a different activation of the interpreter function for every interpreted function. Hence, we need to (when configured in this mode) always make a C++ call -- recursive at the C++ level, back to the interpreter itself -- on a JS call.

This is vaguely similar to an issue that the performance team was having, where we wanted the (perf-based) profiler to attribute samples to the function being interpreted, but that couldn't be determined just by looking at the captured stack. The solution in that case was to add profiler-only trampoline frames when entering a new script in the C++ interpreter. That's not identical to what you need, but it sounds close enough that you might benefit from taking a look.

Very interesting, thanks! I do remember bumping into this when rebasing lately (it looks like it was added recently) and wondering what it was for. I agree that the mechanism here is a bit different in other ways, at least in terms of optimizations: for example, the first patch pulls out common state so that the "shared" rooted things are not re-rooted on each call and do not take up extra stack space.

Severity: -- → N/A
Priority: -- → P1
Blocks: 1832406

The following patches are waiting for review from a reviewer who resigned from the review:

ID | Title | Author | Reviewer Status
D177326 | Bug 1831676: Part 1: Refactor C++ interpreter to put function-local state in InterpretContext. r=jandem | cfallin | jandem: Resigned from review
D177327 | Bug 1831676: Part 2: Move interpreter-loop body into InterpretInner(). r=jandem | cfallin | jandem: Resigned from review
D177328 | Bug 1831676: Part 3: add option to perform a C++-level call of InterpretInner() for each JS call. r=jandem | cfallin | jandem: Resigned from review
D177329 | Bug 1831676: Part 4: Interpreter: cache PC in a local. r=jandem | cfallin | jandem: Resigned from review
D177330 | Bug 1831676: Part 5: add a notion of "specialization mode" to the interpreter. r=jandem | cfallin | jandem: Resigned from review

:cfallin, could you please find another reviewer?

For more information, please visit BugBot documentation.

Flags: needinfo?(chris)

:jandem communicated to me out-of-band that most of the patch series is not likely to be upstreamable, so I'll close the bug.

Flags: needinfo?(chris)
Status: ASSIGNED → RESOLVED
Closed: 10 months ago
Resolution: --- → WONTFIX