Stackmap creation for wasm-via-Ion

RESOLVED FIXED in Firefox 67

Status

()

enhancement
RESOLVED FIXED
5 months ago
3 months ago

People

(Reporter: jseward, Assigned: jseward)

Tracking

(Blocks 1 bug)

unspecified
mozilla67
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox67 fixed)

Details

Attachments

(1 attachment, 4 obsolete attachments)

Assignee

Description

5 months ago
Create stackmaps when compiling wasm via Ion, so as to facilitate reftype
support in Ion.

From initial investigations, a plausible approach appears to be:

* when lowering wasm-MIR to wasm-LIR, mark LIR instructions requiring a
  Safepoint (root information) via new fn assignWasmSafepoint

* when lowering LIR to machine code, establish code-buffer offsets for the
  Safepoints by calling markSafepointAt

* Let Ion's register allocation fill in Safepoint stack offsets and register
  info

* In CodeGenerator::generateWasm, once code generation is complete, visit the
  completed Safepoints.  Convert them to the wasm::StackMap format and add
  them to the collection of StackMaps maintained by wasm::CompiledCode.

* The StackMaps thereby created are indistinguishable from those created by
  the baseline compiler, and so the same scanning/marking machinery works with
  them, without knowing which compiler created them.

Possible problems/risks

* for each call site, figuring out how many words at the top of the stack are
  outgoing args (and hence should not be included in this function's stack
  map).

* for each call site, ensuring that all args are evaluated before any of them
  are moved to outgoing argument slots, since those slots are not covered by
  the current function's stack map

* wasm::StackMap doesn't record roots-in-registers.  It's not clear yet
  whether doing so is necessary.
Assignee

Comment 1

4 months ago
Posted file ion-maps-2019feb04.tar (obsolete) —

Here's a tarfile containing complete patch stack, to make this easy to try
out. This contains the following patches, a la hg qser:

bug1520478-MIRType-RefOrNull-1.diff
bug1508559-v3-1-change-directives.diff
bug1508559-v3-2-Tgrow-fill-arg.diff
bug1508559-v3-3-enable-ion-gc.diff
bug1508559-v3-4-ion-insns.diff
bug15XXXXX-v3-4-ion-insns-RefOrNull-fixups.diff
bug1508559-v3-5-factor-rabaldr.diff
bug1508559-v3-6-reorganize-baldr.diff
bug15XXXXX-v3-6-reorganize-baldr-RefOrNull-fixups.diff
bug1476251-reinstate-debug-printing-2.diff
bug1508559-fix-tableget-bug-2.diff
bug1508559-fix-select-bug-1.diff
bug1524178-lockfree8-for-simulators-1.diff
bug1517924-ion-maps-3.diff  <-------- the stackmaps patch proper
make-buildable-on-try.diff

To use:

  • check out a copy of m-c
  • hg update the checkout to 456033:68941a0829dc
  • cd .hg
  • untar the tarfile, thus creating .hg/patches
  • cd ..
  • check with hg qser that the series is as shown above
  • hg qpush -a
  • build
Assignee

Comment 2

4 months ago
Posted patch bug1517924-ion-maps-3.diff (obsolete) — Splinter Review

The patch proper, as part of the tarball in comment 1, pulled out.

Assignee: nobody → jseward
Assignee

Updated

4 months ago
Attachment #9041120 - Flags: feedback?(lhansen)
Comment on attachment 9041120 [details] [diff] [review]
bug1517924-ion-maps-3.diff

Review of attachment 9041120 [details] [diff] [review]:
-----------------------------------------------------------------

Some preliminary comments.  Apart from the char representation of signatures and any extra work at runtime that ought be at compile time associated with processing signatures, things look good.  Will read again.  As mentioned on IRC, we should/could factor uncontroversial bits and land them separately.

::: js/src/jit/CodeGenerator.cpp
@@ +10415,5 @@
> +    size_t nBytesReservedBeforeTrap = pair.second;
> +
> +    wasm::StackMap* functionEntryStackMap
> +        = CreateStackMapForFunctionEntryTrap(argTypes, trapExitLayout,
> +                                             trapExitLayoutNumWords,

Can we avoid this cheaply if the frame has no refs?  (Ditto any other cases.)  I think we have an optimization for the baseline compiler but it would be very desirable here too.

::: js/src/jit/MacroAssembler.cpp
@@ +3382,5 @@
> +std::pair<CodeOffset, uint32_t>
> +MacroAssembler::wasmReserveStackChecked(uint32_t amount,
> +                                        wasm::BytecodeOffset trapOffset) {
> +  // This early exit probably isn't correct, and should be removed.  See
> +  // bug 1518785.

nit, cleanup needed.

::: js/src/jit/MacroAssembler.h
@@ +3325,5 @@
>    }
>    MOZ_CRASH("unexpected argType");
>  }
>  
> +static inline MIRType ToMIRType(char summaryChar) {

This really isn't desirable - we don't want to encode types as characters just so that we can pick that encoding apart again.  It'll just get hairier, too, when we have more interesting MIR types (eg, once we have nonnullable refs, 'R' will not be the obvious choice for "RefOrNull").

::: js/src/wasm/WasmBaselineCompile.cpp
@@ +4458,5 @@
> +        // However much we've pushed so far
> +        masm.framePushed() +
> +        // Extra space we'll push to get the frame aligned
> +        call->frameAlignAdjustment +
> +        // Extra space we'll push to get the arg area 16-aligned

Can probably skip the "16-" here, or usa symbolic name.

::: js/src/wasm/WasmGC.h
@@ +16,5 @@
> + * limitations under the License.
> + */
> +
> +#ifndef wasm_gc_h
> +#define wasm_gc_h

TIL that this confusion-inducing abbreviation is actually common.  Go figure :)

@@ +35,5 @@
> +//   L  MIRType::Int64
> +//   R  MIRType::RefOrNull
> +//   F  MIRType::Float32
> +//   D  MIRType::Double
> +//

As noted elsewhere, we don't want this.  Even a static array of enum values would be better (whether that's with a sentinel, a length passed in, or in an open-ended struct where there's a length field).  Indeed, I smell the possibility for optimization in some common cases.  If what we chiefly want is the /size/ of the arguments, then a table of such structs which also have fields for the size of arguments (aligned and not) could be pre-initialized from the main thread if not initialized already, and shared among the compilation threads.  Don't know if that saves enough to be worth the bother, though; possibly the size computation is dominated by calls to other wasm functions.

@@ +43,5 @@
> +
> +// JRS FIXME This comment has become orphaned.  It used to live in
> +// WasmBaselineCompile.cpp.  Rehome?  But where to?
> +// TODO / OPTIMIZE (Bug 1316821): This is expensive; let's roll the iterator
> +// walking into the walking done for passArg.  See comments in passArg.

Good question.  One could note that the code has moved on the bug, which is still open, and then delete the comment.

@@ +51,5 @@
> +// size of the area as a whole.  The size is as determined by the platforms
> +// native ABI.
> +//
> +// StackArgAreaSizeAligned returns the same, but rounded up to the nearest 16
> +// byte boundary.

Could we abstract that "16" somehow behind a name and add static_asserts if there are constraints that it needs to obey?

@@ +55,5 @@
> +// byte boundary.
> +//
> +// Note, StackArgAreaSize{Unaligned,Aligned}() must process all the arguments
> +// in order to take into account all necessary alignment constraints.  The
> +// signature must therefore be the complete call signature.

... ", including any receiver argument."
Attachment #9041120 - Flags: feedback?(lhansen) → feedback+
Assignee

Updated

3 months ago
Depends on: 1508559
Assignee

Updated

3 months ago
Depends on: 1527259
Assignee

Comment 4

3 months ago
Posted patch bug1517924-ion-maps-4.diff (obsolete) — Splinter Review
Attachment #9041120 - Attachment is obsolete: true

Updated

3 months ago
Duplicate of this bug: 1508557
Assignee

Updated

3 months ago
Depends on: 1528983
Assignee

Comment 6

3 months ago
Posted patch bug1517924-ion-maps-9.diff (obsolete) — Splinter Review
Attachment #9041119 - Attachment is obsolete: true
Attachment #9043642 - Attachment is obsolete: true
Attachment #9046110 - Flags: review?(lhansen)
Comment on attachment 9046110 [details] [diff] [review]
bug1517924-ion-maps-9.diff

Review of attachment 9046110 [details] [diff] [review]:
-----------------------------------------------------------------

Very nice indeed.

As a general comment, the StackMap and StackMaps types should move from WasmTypes.h to WasmGC.h, possibly?  Is there any reason we shouldn't?

Can you run a measurement on some large non-gc-using code base (Zen Garden and/or Tanks would be fine) to find out how much more memory we're using in Debug builds with the stack maps generated for everything?  I'm a little concerned that eg opt-debug builds that many people use for testing will be very bloaty and that this will affect performance in various ways.  If it's within reason then we shouldn't worry, but if it's not then we should worry about whether enabling stack maps in debug builds can be controlled somehow; we can discuss this if and when.

::: js/src/jit/CodeGenerator.cpp
@@ +10211,5 @@
>    emitRest(lir, temp2, numActuals, temp0, temp1, numFormals, templateObject,
>             false, ToRegister(lir->output()));
>  }
>  
> +// A stackmap creation helper.  Create a stackmap from an vector of booleans.

"a" vector.

@@ +10321,5 @@
> +  }
> +
> +  // BODY (GENERAL SPILL) AREA and FRAME and INCOMING ARGS
> +  // Deal with roots on the stack.
> +  size_t cursor = vec.length();

"cursor" is not really appropriate because it does not move.  How about eg "nonTrapArea"?

@@ +10333,5 @@
> +    if (gcSlot.stack) {
> +      // It's a slot in the body allocation, so .slot is interpreted
> +      // as an index downwards from the Frame*
> +      MOZ_ASSERT(gcSlot.slot <= nBodyBytes);
> +      uint32_t offB = nBodyBytes - gcSlot.slot;

"offB"?  "offsetFromBottom" worked well above, maybe repeat the success here.

@@ +10354,5 @@
> +    return true;
> +  }
> +#endif
> +
> +  // We're almost done.  Convert vec into a wasm::StackMap.

"We're almost done." can be removed, I think.  (I'm prone to chatty comments like this myself, but I realize they are mostly in the way.)

@@ +10355,5 @@
> +  }
> +#endif
> +
> +  // We're almost done.  Convert vec into a wasm::StackMap.
> +  MOZ_RELEASE_ASSERT(vec.length() * sizeof(void*) == nTotalBytes);

Given earlier code and asserts, this does not need to be a RELEASE assert.

@@ +10369,5 @@
> +  // Take the opportunity to check that we haven't marked any part of the
> +  // Frame itself as a pointer.
> +  stackMap->setFrameOffsetFromTop((nInboundStackArgBytes + nFrameBytes)
> +                                  / sizeof(void*));
> +  for (uint32_t i = 0; i < sizeof(wasm::Frame) / sizeof(void*); i++) {

Wrap the loop in #ifdef DEBUG, I think this is good hygiene.

@@ +10397,5 @@
> +//
> +// The "space reserved before trap" is the space reserved by
> +// MacroAssembler::wasmReserveStackChecked, in the case where the frame is
> +// "small", as determined by that function.
> +static bool CreateStackMapForFunctionEntryTrap(

Same comments for this as for the one above.

I don't really approve of the duplication of code but I see how hard it is to parameterize it.  There are some gratuitous differences, eg this one operates on sizeof(wasm::Frame) several places while the other abstracts that as nFrameBytes; with things like that fixed, the two functions look more similar.  Is it possible to factor head and tail parts and call them from the two functions, and does that help?

@@ +10423,5 @@
> +      sizeof(wasm::Frame) + nInboundStackArgBytes;
> +
> +  // REG DUMP AREA
> +  wasm::ExitStubMapVector trapExitExtras;
> +  if (! GenerateStackmapEntriesForTrapExit(argTypes, trapExitLayout,

Gratuitous space after "!".

@@ +10441,5 @@
> +
> +  // SPACE RESERVED BEFORE TRAP
> +  MOZ_ASSERT(nBytesReservedBeforeTrap % sizeof(void*) == 0);
> +  if (nBytesReservedBeforeTrap > 0 &&
> +      !vec.appendN(false, nBytesReservedBeforeTrap / sizeof(void*))) {

Looking at the code for Vector, it doesn't seem that you need the guard on > 0 here.

::: js/src/wasm/WasmInstance.cpp
@@ +1394,5 @@
> +  //
> +  // In debug builds, the stackmap construction machinery goes to considerable
> +  // efforts to ensure that the stackmaps for consecutive frames abut exactly.
> +  // This is so as to ensure there are no areas of stack inadvertantly ignored
> +  // by a stackmap, nor covered by two stackmaps.  Hence any failure of this

Comment ends mid-sentence.
Attachment #9046110 - Flags: review?(lhansen) → review+
Assignee

Comment 8

3 months ago

(In reply to Lars T Hansen [:lth] from comment #7)

Can you run a measurement on some large non-gc-using code base (Zen Garden
and/or Tanks would be fine) to find out how much more memory we're using in
Debug builds with the stack maps generated for everything?

Here are some numbers from Tanks. I didn't measure Zen Garden on the basis
that the resulting numbers would be similar, but everything would take four
times as long to run.

These numbers were gathered on an Intel Skylake at 2.6 GHz, with gcc-8.2.1
on Fedora 29, configured for a 64 bit build thusly:

--{enable,disable}-debug --enable-optimize="-g -O2"
--enable-valgrind --disable-jemalloc --enable-wasm-reftypes

--disable-jemalloc is possibly not necessary, but makes it easier to do heap
profiling. Run flags are

--wasm-compiler={ion,baseline} --no-threads --wasm-gc

Results (dbg=--enable-debug, rel=--disable-debug)

          |  Time      Time         | #maps       Max heap   of which maps
 Config   |  Insns     Secs    IPC  | created     MB         MB
----------+-------------------------+-------------------------------------
dbg, base |   8.50G    1.46   2.24  |  240,356    131.9      6.8
dbg, ion  | 180.00G   39.2    1.77  |  236,682    135.7      6.7
rel, base |   2.43G    0.43   2.17  |   33,688    124.6      < 1% of max
rel, ion  |  25.60G    6.22   1.58  |        0    103.6      < 1% of max

So the extra space use even in the always-generate-maps case is not excessive,
around 7MB out of a total of 130MB. For the rel/base case, I think there might
be a bug in the omit-the-map logic for the function entry stack map, hence
causing 34k maps to be created, presumably one per function. I'll check.

It's reassuring to see that, in the dbg cases, the two compilers produced more
or less the same number of maps. I assume the difference in the numbers is
due to different in-vs-out-of-line handling of various instance-function cases
(baseline does some stuff OOL that ion does in-line, and vice versa).

Interesting but unsurprising to see that baseline does well on IPC, both on an
absolute basis (2.17 is good) and also relative to Ion.

Also notable from the heap profiles is that wasm::TrapSite accounts for a lot
of space, perhaps 16MB, viz, more than twice as much as the stackmaps.

Assignee

Updated

3 months ago
Depends on: 1530991
Assignee

Comment 9

3 months ago

Patch with fixups for review comments in comment 7, if anyone wants to try
this out in the next few days.

I don't really approve of the duplication of code but I see how hard it is
to parameterize it. There are some gratuitous differences, eg this one
operates on sizeof(wasm::Frame) several places while the other abstracts
that as nFrameBytes; with things like that fixed, the two functions look
more similar. Is it possible to factor head and tail parts and call them
from the two functions, and does that help?

I fixed up the nFrameBytes vs sizeof(wasm::Frame) thing, and another one
similar. I also resequenced the heads of the two functions so as to make them
as similar as possible. That is definitely an improvement. I also looked at
factoring out the heads and tails, but the amount of code it would remove
would be minimal, and, given the parameter-passing that would be needed to
deal with differences in the tail, I reckon it wouldn't help overall
readability. So I didn't proceed further.

Attachment #9046110 - Attachment is obsolete: true

Comment 10

3 months ago
Pushed by jseward@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/01c2d53e22c4
Stackmap creation for wasm-via-Ion.  r=lhansen.

Comment 11

3 months ago
bugherder
Status: NEW → RESOLVED
Last Resolved: 3 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla67
You need to log in before you can comment on or make changes to this bug.