Closed Bug 1579770 Opened 5 years ago Closed 4 years ago

Prototype a low-level codegen API as a tentative MASM replacement

Categories

(Core :: JavaScript Engine: JIT, enhancement, P5)

enhancement

Tracking

()

RESOLVED INCOMPLETE
Tracking Status
firefox70 --- wontfix
firefox71 --- fix-optional

People

(Reporter: djvj, Assigned: djvj)

Details

Attachments

(1 file, 1 obsolete file)

This is a placeholder bug for a slightly more sophisticated codegenerator in Spidermonkey than the current masm "call methods to spit out code" approach.

Attached patch GIR.patch (obsolete) — Splinter Review

Incomplete implementation of a low-level IR infrastructure. Just graph-generation for now. Attaching so I don't lose the code.

Assignee: nobody → kvijayan

2 questions:
· Why does this project matters?
· Why not reusing a project in which we already invested a lot of resources and which already solve these issues, such as Cranelift?

Type: task → enhancement
Flags: needinfo?(kvijayan)
Priority: -- → P5

The MacroAssembler design was arrived at because of significant issues with the "smart" assembler used for TraceMonkey (NanoJIT).

To my memory, the issue was that the programmer always had a specific sequence of instructions in mind: because the assembler was "smart", the programmer then was forced to operate at one level of abstraction removed from the code they wanted to generate. Instead of just writing that code, they wrote code to trick the smart assembler into generating that code. It became harder to write, required more testing, and ultimately harder to debug, because it was no longer clear to a reader what was generating what.

The "generate code directly" approach is a good one and I'm suspicious of efforts to change it.

Learns me to make vague bugs at 1 in the morning just to bookmark some code. I think the bug title, written hastily as it was, sounds a lot more portentious than it is.

I should clarify here: whatever an eventual codegen backend ends up becoming, it'll need to be used by the C++ front-end in Spidermonkey. We're in the process of shifting our compiler focus away from Ion and building up from Baseline and CacheIR as our core optimization infrastructure. That work is going faster than anticipated.

I'm hopeful about CraneLift becoming that eventual backend. In the meantime, it would be good for nearer term production-trageted Spidermonkey jit work to be able to leverage some sort of structured code output instead of the very low-level masm layer we have now. It also serves to prepare the JIT frontends in general for working with a structured backend instead of our unstructured masm.

I'm not assuming by default this will get used - but I'm standing up a C++ strawman of what that would look like, designed so that we can quickly get to the point of emitting half-decent code in a production setting. It will use marginally more overhead than masm, and for code translated from the masm API to this structured API, should produce code roughly equivalent to what the masm code would have produced.

Let's say we eventually start looking at actually integration Spidermonkey with a proper codegen (e.g. CraneLift) sometime in the future. What questions will we need to answer? How will we handle generators? How will we handle exceptions and bailout paths? How much can we afford to spend at the compile step? What is the rough shape of the graph-builder API that spidermonkey would need to use? Is it better to include Value and other domain types at the lower level or not? etc. etc.

These are all questions, in my mind, that cannot be answered in the abstract. They have to be answered by some, at least strawman production implementation. I started hacking on that implementation, and I didn't want to lose my progress because I really should be hacking on other stuff, so I made this bug.

So this is intended to be a very lightweight, cross-platform, virtualized-register API wrapper around masm. The "replace masm" in the title, then, is from the perspective of masm's current users. This will use masm as a backend, and masm will still be around, and this is not an actual macroassembler.

IThis exercise should help us understand exactly what Spidermonkey would need from an eventual proper codegen, and help inform that work.. while providing some backend target we can use in production.

I have no particular time-frame for this aside from roughly "end of this half". I have other obligations I'm committed to this half, so I'll be able to push this along intermittently.

Flags: needinfo?(kvijayan)

the programmer then was forced to operate at one level of abstraction removed from the code they wanted to generate. Instead of just writing that code, they wrote code to trick the smart assembler into generating that code.

I failed to respond to this comment before. I agree that those sorts of allowances and abstraction holes are important. The analogue to writing a new MacroAssembler method is writing a custom instruction here (and perhaps emitting it on a platform-specific basis). Those kinds of cases should be rarer. The other issue is that the number of such holes has become too high. Masm has become a convenient dumping ground for all sorts of a mix of high-level and low-level methods and APIs, some domain aware stuff, some low-level stuff, a mix of various method contracts (tempreg state, available register state, etc.).

I'm modeling this more off of CacheIR - in the sense that it's as minimal as possible, almost punishingly so.

I'll describe a bit of the implementation detail so far here. The explicit design intent acknowledges no support for any sophisticated compiler passes - no adding/removing/reordering instructions or blocks, etc. There are a number of explicit constraints on the graph builder API and semantics - an RPO index of blocks is constructed incrementally as the graph is specified, and the graph is free of critical edges by construction. Phis are not computed and must be declared explicitly and must always occur at the earliest control flow join point (as the first sequence of instructions in a block). Instructions (operations and their input definitions) are encoded in a variable-length stream, block by block, in RPO order.

The only passes I expect this to support are:

  1. Some linear pass over instructions for small-scale subexpression matching - to handle cases like:
  • LOAD, STORE - folding away small expression trees into arch-specific addressing modes.
  • ADD, SUB - again, folding const definitions.
  • CMP/BRANCH fusing

This pass should be doable with a single linear scan using zero additional space (i.e. steal the high bit of the opcode byte in the instruction stream to store a material-uses attribute on every definition that isn't folded, and just ignore them for regalloc purposes).

  1. Register allocation - some very simple minded regalloc should suffice. Round robin would probably work for now. The raw machine code perf requirements of the initial use cases for a putative "Baseline++" compiler are light. Basically "somewhat better than baseline" would do, and baseline just uses a straight-up stack-machine model with some lightweight register caching of up to 2 of the top stack values. Any functional register allocator will perform significantly better than that.

After those get implemented, that should stand up enough of an infrastructure that we can start on experimenting with concrete designs for things like deoptimization. For example, can we efficiently represent them with direct control flow instead of special side-structures, and move towards "optimized" deopt paths that allow for fast transitions between tiers? Fast deopt paths didn't matter much in a world where you expected to run mostly hot code, but might matter a lot more for fuzzy type-dynamic, warming-up code that characterizes most web browsing.

Also questions like: can we meaningfully obtain good performance by stitching together CacheIR stub bytecode (generating a single unified graph from a Baseline-Interpreter IC chain set and a baseline compile of the top-level method)?

Summary: Design a MASM replacement → Prototype a low-level codegen API as a tentative MASM replacement
Attached patch GIR.patchSplinter Review

Completed some more implementation detail - mostly beefing up the walker interface for traversing blocks and instructions, and adding support for iterating over instructions in a block in reverse order.

Attachment #9091343 - Attachment is obsolete: true
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → INCOMPLETE
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: