Closed Bug 1476251 Opened 2 years ago Closed Last year

Generate stack maps in the Wasm Baseline compiler

Categories

(Core :: Javascript: WebAssembly, enhancement)

enhancement
Not set

Tracking

()

RESOLVED FIXED
mozilla66
Tracking Status
firefox66 --- fixed

People

(Reporter: jseward, Assigned: jseward)

References

(Blocks 2 open bugs)

Details

Attachments

(1 file, 17 obsolete files)

155.05 KB, patch
lth
: review+
Details | Diff | Splinter Review
In order to support GCable things in Wasm, we'll need to generate some kind
of information that tells the GC which stack slots contain GC pointers, at
the following points:

* interrupt checks

* call insns
Component: JavaScript Engine: JIT → Javascript: Web Assembly
Blocks: 1456824
Blocks: 1488162
Attached patch bug1476251-stackmaps-2.diff (obsolete) — Splinter Review
WIP patch, far from landable.  Generates stackmaps in the baseline
compiler and (on x86_64) generates code to a helper function to check
them against reality, to the extent that that's possible.
Attached patch bug1476251-stackmaps-4.diff (obsolete) — Splinter Review
First "end-to-end" functionality.  Generates stack maps in baseline
compiler, attaches them to the Code(), walks wasm frames at GC time
and hands pointers to the GC.  Still WIP needing a lot of cleanup and
removal of debug printing.
Attachment #9008756 - Attachment is obsolete: true
Attached patch bug1476251-stackmaps-6.diff (obsolete) — Splinter Review
WIP: adds support for exit stub frame tracing, so as to handle interrupts.
Attachment #9011290 - Attachment is obsolete: true
Attached patch bug1476251-stackmaps-7.diff (obsolete) — Splinter Review
With a couple of crash fixes, and cleaned up debug printing.
Attachment #9012192 - Attachment is obsolete: true
Attached patch bug1476251-stackmaps-9.diff (obsolete) — Splinter Review
Now with support for x86_{32,64} and arm{32,64}.  Starting point for cleanup.
Attachment #9012610 - Attachment is obsolete: true
Attached patch bug1476251-stackmaps-11.diff (obsolete) — Splinter Review
First version worthy of serious consideration.  Contains a summary
of the implementation and status in its commit message.
Attachment #9013533 - Attachment is obsolete: true
Attachment #9013691 - Flags: feedback?(lhansen)
Comment on attachment 9013691 [details] [diff] [review]
bug1476251-stackmaps-11.diff

Review of attachment 9013691 [details] [diff] [review]:
-----------------------------------------------------------------

Some initial remarks, I'll need to read this a few more times and I'll have more comments.

Nit: there's actually not a summary of the implementation and status in the commit message.

My general impression is that this is conceptually good but is going to be too slow and too space-intensive, so we'll have to work on that.

::: js/src/jit/JitFrames.cpp
@@ +1293,5 @@
>  
>      activation->traceRematerializedFrames(trc);
>      activation->traceIonRecovery(trc);
>  
> +    // The return address of the immediate callee, viz, the address

"callee"?  I would have thought caller, given the rest of the comment.

::: js/src/wasm/WasmBaselineCompile.cpp
@@ +992,5 @@
> +    // we can't scan the section of stack corresponding to any "extras" since
> +    // those are only created when we take a signal in a loop head or stack
> +    // overflow check.  So skip over them.
> +    for (uint32_t i = map->nExtraWords; i < map->nFrameWords; i++) {
> +        bool      mapIsP = map->getBit(i) == 1;

"mapIsPointer" probably clearer.

@@ +1009,5 @@
> +            definitelyWrong = true;
> +        }
> +
> +        if (mapIsP && actual != 0 &&
> +            (actual <= 0xFFF || actual >= (~((uintptr_t)0)) - 0xFFF)) {

Nit: brace on separate line after multi-line condition.
Nit: assumes 4K page size which is not always true, though obviously no hardship here.

@@ +1846,5 @@
>      using BCESet = uint64_t;
>  
> +    // MachineStackTracker, used for stack-slot pointerness tracking.
> +
> +    class MachineStackTracker {

Can this, and perhaps other machinery, be factored out from BaseCompiler, in the way that there are other top-level classes in this file?

@@ +1864,5 @@
> +        }
> +        void push(bool isGCP) {
> +            vec_.append(isGCP);
> +        }
> +        void pushGCP() {

pushGCPointer is probably clearer (applies broadly in this class)

@@ +2439,5 @@
> +              case LocalRef:    fprintf(f, "LocRef slot %u", slot()); break;
> +              case RegisterI32: fprintf(f, "RegI32 name %s",
> +                                        i32reg().name()); break;
> +              case RegisterI64:
> +#                               ifdef JS_PUNBOX64

Nit: ifdefs are usually aligned left without all these spaces, except one or sometimes two spaces per ifdef nesting level/

@@ +2932,5 @@
> +        createStackMap(who, noExtras);
> +    }
> +
> +    void createStackMap(const char* who,
> +                        const Vector<bool, 32, SystemAllocPolicy>& extras) {

Nit: brace on separate line.

@@ +2933,5 @@
> +    }
> +
> +    void createStackMap(const char* who,
> +                        const Vector<bool, 32, SystemAllocPolicy>& extras) {
> +        // Start with the frame-setup map, and add operand-stack information

The first optimization here would be to not create a map at all if there are no recorded pointers.  Ideally the test would be constant time.  It is possible for the Stk to track this with a simple counter, I think.

@@ +3745,5 @@
> +    //
> +    // Stackmap helpers
> +
> +    void GenerateStackmapEntriesForTrapExit(Vector<bool, 32, SystemAllocPolicy>& extras)
> +    {

This will need cleanup before landing.  Clearly some kind of array mapping register names to slots is a good start.  Also see below.

@@ +3932,5 @@
> +            if (slotNo < 32)
> +                extras[slotNo] = true;
> +        }
> +#else
> +# error "GenerateStackmapEntriesForTrapExit: unimplemented on this arch"

Won't work, because this has to compile properly with no jit support (JS_CODEGEN_NONE), even if wasm isn't available at runtime in such a configuration.  What you want to do here is provide meaningful code for JS_CODEGEN_NONE (can be no-ops or functions that assert when run) and then the MIPS people can come behind us and insert their own thing.

But really we should have one piece of code here, data-driven, with data defined some suitable place.  Right next to IntArgRegs in jit/whatever/Assembler-whatever.h might be a good place, possibly.

@@ +3964,5 @@
>  #ifdef JS_CODEGEN_ARM64
>              static_assert(DebugFrame::offsetOfFrame() % WasmStackAlignment == 0, "aligned");
>  #endif
>              masm.reserveStack(DebugFrame::offsetOfFrame());
> +            // JRS FIXME: do we need to do anything here?  I think if we

Note FIXME.

@@ +8724,5 @@
>      masm.passABIArg(srcDest.low);
>      masm.passABIArg(rhs.high);
>      masm.passABIArg(rhs.low);
>      masm.callWithABI(bytecodeOffset(), callee);
> +    createStackMap("emitDivOrModI64Bui[..]");

This is wrong, for one thing, since callWithABI does not end with a call instruction, and so the stack map will not be associated with the return address that the hypothetical gc would use to look up the map.  In general, we probably want a less brittle mechanism here so that we can guarantee that this problem won't arise in other cases.  insertBreakableInstruction() is instructive; it calls nopPatchableToCall, which records the offset of the instruction we're interested in, regardless of whatever else might be around it.

For another thing, you should not need a stack map here because the callee cannot invoke the GC.  (Discuss.)

Ditto two cases below.

@@ +11022,5 @@
>      MOZ_ASSERT(func_.callSiteLineNums.length() == lastReadCallSite_);
>  
>      masm.flushBuffer();
>  
> +    // FIXMEMOZ_ASSERT(mst_.length() == 0);

Note, FIXME.

::: js/src/wasm/WasmFrameIter.cpp
@@ -478,5 @@
>          MOZ_ASSERT(BeforePushRetAddr == 0);
>          masm.push(lr);
>  # else
>          *entry = masm.currentOffset();
> -        // The x86/x64 call instruction pushes the return address.

Why was this removed?

::: js/src/wasm/WasmInstance.cpp
@@ +1002,5 @@
>  
> +uintptr_t
> +Instance::traceWasmFrame(JSTracer* trc, const wasm::WasmFrameIter& wfi,
> +                         uint8_t* pc, uintptr_t highestByteVisitedInPrevFrame)
> +{

Helpful to document what traceWasmFrame returns.

::: js/src/wasm/WasmTypes.h
@@ +1781,5 @@
>  
> +struct WasmStackMap final
> +{
> +    uint32_t nFrameWords; // total size of the map
> +    uint32_t nExtraWords; // of which this many are "exit stub" extras

Nit: SpiderMonkey style would be numFrameWords, usually, not the Geckoish "n" prefix.

"ExtraWords" is not good; why not numExitStubWords since that's what it's for?

I'm a little nervous about numFrameWords not being interpreted as the total frame size (inclusive of numExitStubWords) but not sure what to do about that yet.

BTW, ideally these fields should not even be here, but if they have to be here they will fit into a single uint32_t given that we have a limit on the stack size in the range of a few MB and the number of extra words will fit in a few bits.

@@ +1783,5 @@
> +{
> +    uint32_t nFrameWords; // total size of the map
> +    uint32_t nExtraWords; // of which this many are "exit stub" extras
> +  private:
> +    uint32_t bitmap[1];

We probably want a comment here that explains what the bitmap layout and semantics are.

@@ +1797,5 @@
> +    }
> +    static WasmStackMap* create(uint32_t nFrameWords) {
> +        uint32_t nBitmap = calcNBitmap(nFrameWords);
> +        char* buf = new char[sizeof(WasmStackMap) + (nBitmap - 1) * sizeof(bitmap[0])];
> +        return ::new(buf) WasmStackMap(nFrameWords);

Memo to self: to investigate: the use of "::new" is probably not kosher.  Also delete[], below.
Attachment #9013691 - Flags: feedback?(lhansen) → feedback+
Attached patch bug1476251-stackmaps-12.diff (obsolete) — Splinter Review
Rebased to m-c of about five hours ago (439351:d7c13cd08257).
No significant changes from the previous patch.
Attachment #9013691 - Attachment is obsolete: true
Blocks: 1498470
Blocks: 1498472
Attached patch bug1476251-stackmaps-18.diff (obsolete) — Splinter Review
This new patch addresses all review comments in comment 7.  It also has
extensive changes as detailed below.

For x64, x86 and arm32 it runs all of jit-test/tests/wasm (including its own
tests) with only block-debugger-findscripts.js failing.  On arm64 it passes
all of tests/wasm/gc at least.

Changes:

Stack maps are only created for frames in which there is a pointer.

The does-this-frame-have-a-pointer check is implemented in constant time using
counters in MachineStackTracker and (a counter associated with) Stk::vec_.

The ugly hack to do with generating stack maps for trap exit stubs has been
removed.  This is now computed from the same RegsToPreserve sets that the stub
itself is created from.  See GenerateStackmapEntriesForTrapExit().

Code compiled with debugEnabled() should be supported, although I don't think
that is really tested properly.

The size of a WasmStackMap for a map containing up to 32 pointers has been
reduced to 8 bytes.

The code for actually computing the maps in the Baseline compiler is now a lot
cleaner.
../../../PATCHES/bug1476251-st
OOM conditions during stack map construction are detected and handled.

After the maps have been created, a debug build will check that the code
address that the map relates (really, the instruction just before the
specified address) contains an expected instruction.  This shakes out
construction of maps in the "wrong" place.

Some naming to do with the instruction addresses to which stack maps are
associated, has been changed to make it clear that what the address is is
actually the first byte of the instruction *after* the instruction of
interest.

One of the tests, stackmaps.js, is probably redundant and should be removed.

Some debug printing remains to be removed.

Some sanity checking for the wasm stack number-of-pointers counter need to be
removed.  Look for "SANITYCHECK" in the patch.

This still has a hack in SuppressGC to remove the gc-suppressing code.  This
hack will presumably have to be removed in coordination with re-enabling gc
properly.

BaseCompiler::trapExitLayout_ and trapExitLayoutNumWords_ are computed once
per run of js::wasm::BaselineCompileFunctions.  Since they are constants they
only need to be computed once per process.  Maybe they could be floated
outwards.

In terms of performance of stack map construction (viz, the cost of a call to
createStackMap), we could almost certainly do a lot better.  I'd prefer to
leave that to a followup though, given that non-reffy code should entail
almost zero extra compilation costs given the fast any-pointers-present?
checks.  I'd also prefer to profile before hacking further on it.
Attachment #9014109 - Attachment is obsolete: true
Attachment #9017200 - Flags: feedback?(lhansen)
Comment on attachment 9017200 [details] [diff] [review]
bug1476251-stackmaps-18.diff

Review of attachment 9017200 [details] [diff] [review]:
-----------------------------------------------------------------

Hi, just wanted to say from a quick scan this looks great and make a few suggestions:

One drive-by style nit: inside namespace 'wasm', we generally avoid including "Wasm" in identifiers.  (The notable exception to this rule, which you've probably seen, is wasm::WasmFrameIter, which was unfortunately motivated by symmetry with other stack iters.)  So specifically I'd remove "Wasm" from wasm::WasmStackMap(s) and wasm::Instance::traceWasmFrame.

::: js/src/jit/MacroAssembler.cpp
@@ +3299,5 @@
>      loadWasmTlsRegFromFrame();
>  
>      call(wasm::CallSiteDesc(bytecode.offset(), wasm::CallSite::Symbolic), imm);
> +
> +    uint32_t retAddrOffset = currentOffset();

The wasm::CallSiteDesc overload of call() captures the return-address-offset as part of creating the wasm::CallSite, so it might be nice to have this overload return the return-address-offset as well so we can have a firm guarantee that no fixup instructions creep in after the call instruction.

::: js/src/wasm/WasmTypes.h
@@ +1849,5 @@
> +    // creation time.
> +
> +    uint32_t numMapWords:20; // total number of stack words covered by the map
> +    uint32_t numExitStubWords:6; // of which this many are "exit stub" extras
> +    uint32_t debugFrameOffsetFromTop:6; // offset of DebugFrame, if nonzero

I haven't traced through all the logic around this field, so I may be misunderstanding, but I think this field shouldn't be necessary: at any point where we're tracing a frame, we'll have a Frame* fp, so if fp->tls->instance()->debugEnabled(), then one can derive the DebugFrame* via DebugFrame::from(fp).
(In reply to Luke Wagner [:luke] from comment #10)

> ::: js/src/jit/MacroAssembler.cpp
> @@ +3299,5 @@
> >      loadWasmTlsRegFromFrame();
> >  
> >      call(wasm::CallSiteDesc(bytecode.offset(), wasm::CallSite::Symbolic), imm);
> > +
> > +    uint32_t retAddrOffset = currentOffset();
> 
> The wasm::CallSiteDesc overload of call() captures the return-address-offset
> as part of creating the wasm::CallSite, so it might be nice to have this
> overload return the return-address-offset as well so we can have a firm
> guarantee that no fixup instructions creep in after the call instruction.

Could do.  But note also that the new function js::wasm::IsValidStackMapKey
is used (in debug builds) to check that all stack maps point at an expected
instruction.  So it would detect any such slippage, both here and at any other
place where we create a stack map.  These checks are cheap; we could even
upgrade them to release asserts if you like.

> > +    uint32_t numMapWords:20; // total number of stack words covered by the map
> > +    uint32_t numExitStubWords:6; // of which this many are "exit stub" extras
> > +    uint32_t debugFrameOffsetFromTop:6; // offset of DebugFrame, if nonzero
> 
> I haven't traced through all the logic around this field, so I may be
> misunderstanding, but I think this field shouldn't be necessary: [..]

Hmm, good observation.  I suspect you're right.  I'll look into it.
(In reply to Julian Seward [:jseward] from comment #11)
> (In reply to Luke Wagner [:luke] from comment #10)
 
> > > +    uint32_t numMapWords:20; // total number of stack words covered by the map
> > > +    uint32_t numExitStubWords:6; // of which this many are "exit stub" extras
> > > +    uint32_t debugFrameOffsetFromTop:6; // offset of DebugFrame, if nonzero
> > 
> > I haven't traced through all the logic around this field, so I may be
> > misunderstanding, but I think this field shouldn't be necessary: [..]

A nice simplification.  Yes.
Some initial numbers, pertaining the the new test case stackmaps.js
in the patch.  This creates 7 stack maps.  The costs, measured as
x86_64 instructions, for js::wasm::BaselineCompileFunctions and its
children, are:

unpatched: 170,214 insns

patched: 193,841
  extra cost = 23,627 (13.8%)
  of which  3,499 ( 2.0%) to compute stack layout
  of which 18,057 (10.6%) in createStackMap

I'm pretty sure these extra costs can be halved.
Attached patch bug1476251-stackmaps-20.diff (obsolete) — Splinter Review
Incorporates Luke's DebugFrame simplification suggestion from comment 10
but is otherwise the same as the version attached in comment 9.
Attachment #9017200 - Attachment is obsolete: true
Attachment #9017200 - Flags: feedback?(lhansen)
Attachment #9018332 - Flags: feedback?(lhansen)
Comment on attachment 9018332 [details] [diff] [review]
bug1476251-stackmaps-20.diff

Review of attachment 9018332 [details] [diff] [review]:
-----------------------------------------------------------------

F+ on test cases.  (We need more, but we should use better tools so as to make them easier to write and maintain.)

::: js/src/jit-test/tests/wasm/gc/stackmaps.js
@@ +1,1 @@
> +if (!wasmGcEnabled() || typeof WebAssembly.Global !== 'function') {

WebAssembly.Global has shipped without ifdefs, so you don't need the second part of this guard.

Also, the new new thing for the guard is this syntax:
// |jit-test| skip-if: !wasmGcEnabled()

See files in jit-test/tests/wasm for many examples of this.

@@ +2,5 @@
> +    quit(0);
> +}
> +
> +// Check basic construction and debug printing of WasmStackMaps
> +// FIXME: incorrect location of loop continuation check

FIXME must be fixed before landing.

@@ +5,5 @@
> +// Check basic construction and debug printing of WasmStackMaps
> +// FIXME: incorrect location of loop continuation check
> +
> +const Module = WebAssembly.Module;
> +const Instance = WebAssembly.Instance;

The cool kids do "const {Module,Instance} = WebAssembly;"

@@ +26,5 @@
> +       (local i32)
> +
> +       ;; call direct 2, passing/returning a ptr, to check that the
> +       ;; call's stackmap notes the return reg is a GCPtr
> +       (call 2 (get_local 0))

This is only a nit, but you can name almost everything in the wast format, eg, $f1, and then reference it from places where you can use things from the appropriate namespace.  It makes test cases a /lot/ more resilient.  I see you figured this out for later test cases; suggest fixing it here too.

@@ +29,5 @@
> +       ;; call's stackmap notes the return reg is a GCPtr
> +       (call 2 (get_local 0))
> +       drop
> +
> +       (set_local 1 (i32.const 0))

You can name locals and params too: (param $x i32)  (local $n i32)  and then reference those names.

@@ +41,5 @@
> +                 (get_local 0) (get_local 0) (i32.const 15))
> +         ;; call indirect 0
> +         (call_indirect 0 (i32.const 10) (get_local 0) (i32.const 12)
> +                          (get_local 0) (get_local 0) (i32.const 15)
> +                          (i32.const 0))  ;; callIx

"callIx" ?

@@ +51,5 @@
> +       get_local 0
> +     )
> +   )`;
> +
> +let i = new Instance(new Module(wasmTextToBinary(t)));

Or simply "wasmEvalText(<string>)"

@@ +57,5 @@
> +function Croissant(chocolate) {
> +    this.chocolate = chocolate;
> +}
> +
> +for (let n = 0; n < 1000; n++) {

If this is a warmup loop, perhaps comment that.

::: js/src/jit-test/tests/wasm/gc/stackmaps2.js
@@ +11,5 @@
> +
> +let t =
> +  `(module
> +     (gc_feature_opt_in 1)
> +     (type (func (result i32) (param i32) (param anyref) (param i32)

Types can be named, too.

@@ +72,5 @@
> +    return new Croissant(true);
> +}
> +
> +let i = new Instance(new Module(wasmTextToBinary(t)),
> +                     {"name1":{name2: allocates}});

Quotes not needed, and you can pass an import object to wasmEvalText if you want to use that.

::: js/src/jit-test/tests/wasm/gc/stackmaps3.js
@@ +21,5 @@
> +     ;; -- fn 0
> +     (func $fn0 (export "fn0")
> +                (result i32) (param i32) (param anyref) (param i32)
> +                             (param anyref) (param anyref) (param i32)
> +       (local i32) ;; #6

And the idea of naming locals does not need to be sold, because it sells itself.

@@ +61,5 @@
> +         i32.add
> +
> +         ;; Do 1M iterations of this loop, to get a good amount of allocation
> +         (set_local 1 (i32.add (get_local 1) (i32.const 1)))
> +         (br_if 0 (i32.lt_s (get_local 1) (i32.const 1000000)))

Although wasm is never interpreted, this may cause some spurious timeouts.

@@ +86,5 @@
> +let i = new Instance(new Module(wasmTextToBinary(t)),
> +                     {"name1":{name2: allocates}});
> +
> +function handler() {
> +    print('XXXXXXXX icallback: START');

Either remove this (and its mate below) before landing or put it under an if (DEBUG) ... where DEBUG is a variable at the head of the file whose normal value is false.
Comment on attachment 9018332 [details] [diff] [review]
bug1476251-stackmaps-20.diff

Review of attachment 9018332 [details] [diff] [review]:
-----------------------------------------------------------------

OK, generally I believe this is roughly right.  I'm sure there are some fine points I've missed, but we'll do another round and can look at loop bounds and similar logic then.

Lots of nits here for style & similar; please fix as much as you can; if it offends your sensibilities, put up a fight.

Outright bugs: storage management of WasmStackMap instances in the presence of non-cleanup following OOM, as discussed on IRC; and you may not use new and delete the way you've used them here.  And OOM handling in beginFunction is probably not right, as an early return from beginFunction probably leaves the system in a not completely reliable state.

Performance concern: we need a better push() solution, I don't want it to have to unpack anything to perform a test and conditional increment, the type system should do this job for us unless I'm mistaken.

(All of this fleshed out in specific comments below.)

Nice job overall!

::: js/src/jit/JitFrames.cpp
@@ +1320,5 @@
>  
>  static void
>  TraceJitActivation(JSTracer* trc, JitActivation* activation)
>  {
> +    fprintf(stderr, "  in TraceJitActivation\n");

Just going to note once (and ignore all other places) that debug prints need to be removed before landing.

::: js/src/jit/MacroAssembler.cpp
@@ +3299,5 @@
>      loadWasmTlsRegFromFrame();
>  
>      call(wasm::CallSiteDesc(bytecode.offset(), wasm::CallSite::Symbolic), imm);
> +
> +    uint32_t retAddrOffset = currentOffset();

Nit: I would put the blank line after this line, to signify that the offset is related to the call, not to what comes next.

More substantively I'm somewhat concerned about the brittleness of this since call() is an abstraction several layers deep and there's no rule the says it /must/ end with the call instruction; it could in principle be a call instruction followed by a jump across a flushed constant pool (on ARM, ARM64)...  Pool flushing bites us reasonably regularly, though not normally at the end of instruction sequences.

::: js/src/wasm/WasmBaselineCompile.cpp
@@ +1788,5 @@
>  }
>  
> +// MachineStackTracker, used for stack-slot pointerness tracking.
> +
> +class MachineStackTracker {

Nit: predominant style in this file is to put the brace on the next line.

@@ +1818,5 @@
> +    MOZ_MUST_USE bool cloneTo(MachineStackTracker* dst) {
> +        MOZ_ASSERT(dst->vec_.empty());
> +        for (auto b : vec_) {
> +            if (!dst->vec_.append(b)) {
> +                return false;

Any reason not to use dst->vec_.appendAll(vec_) here, instead of the loop?

@@ +1828,5 @@
> +
> +    // Notionally push |n| non-pointers on the stack.
> +    MOZ_MUST_USE bool pushNonGCPointers(size_t n) {
> +        for (size_t i = 0; i < n; i++) {
> +            if (!vec_.append(false)) {

Can use appendN() here, cheaper and easier on the eyes.

@@ +2047,5 @@
>  
> +    // Fields for the generation of stack maps.
> +
> +    MachineStackTracker mst_;        // tracks machine stack pointerness
> +    uint32_t            masm_FPaEB_; // masm.framePushed at entry to body

This mysteriously named variable is used only six more times in the entire code base, so give it a better name: framePushedAtEntryToBody_ is fine with me (the masm prefix is implied, I would say, but I'm not going to object if you want to make it masmFramePushed...).

@@ +2495,5 @@
> +
> +    void push(Stk::Kind k, uint32_t v) {
> +        stk_.infallibleEmplaceBack(Stk(k, v));
> +        if (k == Stk::MemRef) {
> +            memRefsOnStk_++;

We do not want a test on this hot path.  We should be able to use the type system for this directly, instead of packaging up a value for commonality and then unpacking it again to test for something.  It may mean some kind of duplication of code here, or a template + specialization, I don't know - but let's figure it out.

@@ +2502,5 @@
> +
> +    void push(Stk item) {
> +        stk_.infallibleEmplaceBack(item);
> +        if (item.kind_ == Stk::MemRef) {
> +            memRefsOnStk_++;

Ditto.

@@ +2863,5 @@
> +
> +    // Create a vanilla stack map.
> +    void createStackMap(const char* who)
> +    {
> +        const Vector<bool, 32, SystemAllocPolicy> noExtras;

Typedef this vector and use that name, don't propagate this information everywhere.  That also allows you to vary the default size more easily, may be useful for it to be smaller or larger on some platforms.

@@ +2864,5 @@
> +    // Create a vanilla stack map.
> +    void createStackMap(const char* who)
> +    {
> +        const Vector<bool, 32, SystemAllocPolicy> noExtras;
> +        createStackMap(who, noExtras, masm.currentOffset(), false);

The old custom is to put a comment in front of bools like these, "/*hasRefTypedDebugFrame=*/ false".  The new custom is to create an enum class for this and use elements of it in order to make it completely self-documenting.  See eg UseABI and InterModule at the top of this file.

@@ +2884,5 @@
> +
> +    // The most general stack map construction.
> +    void createStackMap(const char* who,
> +                        const Vector<bool, 32, SystemAllocPolicy>& extras,
> +                        uint32_t assemblerOffset, bool hasRefTypedDebugFrame)

One param on each line here.

@@ +2905,5 @@
> +        // In the debug case, create the stack map regardless, and cross-check
> +        // the pointer-counting below.  We expect the final map to have
> +        // |countedPointers| in total.  This doesn't include those in the
> +        // DebugFrame, but they do not appear in the map's bitmap.
> +        for (auto b : extras) {

Except for a situation where the return value from a function is of a platform-dependent type, this file does not use auto.  (And even in that situation it really should be using a typedef...)  You should just use "bool" here, and elsewhere.

@@ +2922,5 @@
> +        // Add one new entry for each word pushed on the stack since we
> +        // entered the function body.
> +        MOZ_ASSERT(masm.framePushed() >= masm_FPaEB_);
> +        uint32_t bodyPushedBytes = masm.framePushed() - masm_FPaEB_;
> +        MOZ_ASSERT(0 == (bodyPushedBytes % sizeof(void*)));

Parens not necessary, also below.

@@ +2933,5 @@
> +        // section.
> +        for (Stk& v : stk_) {
> +            if (v.kind() != Stk::MemRef)
> +                continue;
> +            // v.offs() appears to hold masm.framePushed() at the point

"appears to"?

@@ +3738,5 @@
> +        // If this doesn't hold, we can't distinguish saved and not-saved
> +        // registers in the MachineState.  See MachineState::MachineState().
> +        MOZ_ASSERT(trapExitLayoutNumWords_ < 0x100);
> +
> +        for (size_t i = 0; i < trapExitLayoutNumWords_; i++) {

Can't you just resize() it and get the default value instead of having this loop?  or at least use appendN()?

@@ +3740,5 @@
> +        MOZ_ASSERT(trapExitLayoutNumWords_ < 0x100);
> +
> +        for (size_t i = 0; i < trapExitLayoutNumWords_; i++) {
> +            if (!extras.append(false)) {
> +                setOOM();

Return false here, see comment below.

@@ +3771,5 @@
>      //
>      // Function prologue and epilogue.
>  
>      void beginFunction() {
>          JitSpew(JitSpew_Codegen, "# Emitting wasm baseline code");

beginFunction needs to be "MOZ_MUST_USE bool" now, since it can exit early; the caller must check the result and propagate it.  Where you call setOOM below you can just return false.

If there are callees from beginFunction that can themselves set the oom flag then  we should probably worry about testing that flag at appropriate places in beginFunction, or avoiding the situation.  For example, GenerateStackMapEntriesForTrapExit can set that flag.  It is only called from beginFunction, so it should not set the flag - it should return a boolean result, which we should propagate here.

By and large the default OOM handling behavior must still be error propagation; the OOM flag can only be used when there's no risk of creating an inconsistent system state and it is a real hardship to propagate the flag, and the oom flag should probably be set at the point where it is no longer practical to propagate it any further.  createStackMap() is a poster child in this regard.

IMO beginFunction() should return true on its last line, but before that it should assert that the oom flag is not set.  (Open to counterarguments; it's just my default position.)

@@ +3814,5 @@
> +            for (size_t i = 0; i < sizeof(js::Value) / sizeof(void*); i++) {
> +                masm.storePtr(ImmWord(0),
> +                              Address(masm.getStackPointer(),
> +                                      DebugFrame::offsetOfCachedReturnJSValue()
> +                                          + i * sizeof(void*)));

You have 100 chars to use so you can probably save a line here, if you like.

@@ +3825,5 @@
> +        const ValTypeVector& args = funcType().args();
> +        Vector<bool, 32, SystemAllocPolicy> extras;
> +
> +        GenerateStackmapEntriesForTrapExit(extras);
> +        createStackMap("stack check", extras, masm.currentOffset(), false);

An excellent example of code that will, and should, set the oom flag, because it must when used from elsewhere; but where we could usefully test the flag directly here after the call and then return immediately if it is set.

@@ +3854,5 @@
>                case MIRType::Int64:
>                  fr.storeLocalI64(RegI64(i->gpr64()), l);
>                  break;
> +              case MIRType::Pointer: {
> +                uint32_t offs = fr.localOffset(l);

I'm not fond of this but I understand what's going on.  Having to expose fr.localOffset() is probably a symptom of an interface mismatch; should some of this machinery be in the frame abstraction instead?  (I suspect yes.)  However I am OK with leaving it as it is for now; we should prioritize getting the functionality into the code, and then we can readjust abstractions once we know what it will look like.

@@ +10570,5 @@
> +         }; \
> +         MOZ_ASSERT(nActual == memRefsOnStk_); \
> +    } while (0)
> +#else
> +# define CHECK_POINTER_COUNT /**/

I believe "((void)0)" or "do{}while(0)" is canonical for an effect-less define, because it gives the following semicolon something to work on.

@@ +11627,5 @@
>  #undef RABALDR_FLOAT_TO_I64_CALLOUT
> +
> +#ifdef DEBUG
> +bool
> +js::wasm::IsValidStackMapKey(bool debugEnabled, const uint8_t* nextPC)

I approve of this, esp given the relative brittleness of computing the return point (see comment elsewhere).  Even without the brittleness I would approve.  If it turns out to be a maintenance headache we can reconsider.

@@ +11665,5 @@
> +# else
> +    MOZ_CRASH("IsValidStackMapKey: requires implementation on this platform");
> +    return false;
> +
> +# endif

Blank line and return false are both redundant.

::: js/src/wasm/WasmGenerator.cpp
@@ +654,5 @@
> +        WasmStackMaps::Maplet maplet = code.stackMaps.get(i);
> +        maplet.offsetBy(offsetInModule);
> +        if (!metadataTier_->stackMaps.add(maplet)) {
> +            return false;
> +        }

Storage management does not seem right here.  This creates a situation where there are multiple references to the underlying WasmStackMap, one from the code.stackMaps and one from metadataTier_->stackMaps.  There is however no reference counting here, so we depend on cleanup elsewhere to get this right.  Now, the code.stackMaps object is cleared /if the linking succeeds without oom/, but only then.  So if we oom in the call to add() here, we are left with two stackmaps referencing the same WasmStackMap and we'll have a double free, I think.

One simple, local fix may be to add a move() or getAndClear() method to WasmStackMaps that gets the element and clears it at the same time, to avoid that; however in that case you must be sure to free it along the exit path here (since this function is now the owner).

A different fix still may be to call metadataTier_->stackMaps.clear() before returning along the false path here.  However this does create brittle code, because if somebody adds more code below this new section and it can fail, then we'd also need to clear metadataTier_->stackMaps along /that/ path, and that doesn't sound good.  So I would not favor that.

@@ +901,5 @@
> +    // binary-search them at GC time.
> +    metadataTier_->stackMaps.sort();
> +
> +#ifdef DEBUG
> +    // Check that the stack map contain no duplicates, since that could lead

nit: "maps" or "contains"

@@ +903,5 @@
> +
> +#ifdef DEBUG
> +    // Check that the stack map contain no duplicates, since that could lead
> +    // to ambiguities about stack slot pointerness.
> +    uint8_t* previousNIA = nullptr;

"NIA"?

@@ +907,5 @@
> +    uint8_t* previousNIA = nullptr;
> +    for (size_t i = 0; i < metadataTier_->stackMaps.length(); i++) {
> +        const WasmStackMaps::Maplet& maplet = metadataTier_->stackMaps.get(i);
> +        if (i > 0) {
> +            MOZ_ASSERT(uintptr_t(maplet.nextInsnAddr) > uintptr_t(previousNIA));

You can use MOZ_ASSERT_IF(i > 0, ...)

::: js/src/wasm/WasmInstance.cpp
@@ +1075,5 @@
> +    // Specifically, the highest-addressed few words described by |map|
> +    // correspond exactly to the wasm::Frame for this activation.  We have
> +    // to calculate |scanStart|, the lowest address that is described by |map|.
> +
> +    const size_t numMapBytes = map->numMapWords * sizeof(void*);

Cross-reference: see comment about "numMapWords" in WasmTypes.h

@@ +1087,5 @@
> +    // Do what we can to assert that, for consecutive wasm frames, their stack
> +    // maps also abut exactly.  This is a useful sanity check on the sizing of
> +    // stack maps.
> +    if (highestByteVisitedInPrevFrame != 0) {
> +        MOZ_ASSERT(highestByteVisitedInPrevFrame + 1 == scanStart);

MOZ_ASSERT_IF?

@@ +1098,5 @@
> +    // constant created by (code created by) GenerateTrapExit.
> +    if (map->numExitStubWords > 0) {
> +#if defined(JS_CODEGEN_ARM64)
> +        // There's an extra alignment word above the constant word.
> +        MOZ_ASSERT(stackWords[map->numExitStubWords - 2] == 1337);

We should simplify this code by lifting the 2/1 distinction into a global const somewhere, as noted on a different file.

Again, MOZ_ASSERT_IF can be used, too.

Also, we want to lift the 1337 to a global constant, so that we can use it by name and so that we can teach mrgiggles about it.

@@ +1112,5 @@
> +        }
> +
> +        // This assertion seems at least moderately effective in detecting
> +        // discrepancies or misalignments between the map and reality.
> +        bool plausible = js::gc::IsCellPointerValidOrNull((const void*)stackWords[i]);

Need an #ifdef DEBUG around the plausible computation/use here, since it's only needed for debugging I think and we don't want this computation to slow down tracing normally.

@@ +1126,5 @@
> +    }
> +
> +    const uintptr_t scanEndPlus1 = scanStart + numMapBytes;
> +
> +    // Finally, deal with a ref-typed DebugFrame if it is present.

Nice.

::: js/src/wasm/WasmStubs.cpp
@@ +1744,5 @@
> +#ifdef JS_CODEGEN_ARM64
> +    *numWords = 2;
> +#else
> +    *numWords = 1;
> +#endif

We want to consider centralizing parameters like these.  These two values are used in a different file in the patch too.

::: js/src/wasm/WasmTypes.h
@@ +1849,5 @@
> +    // creation time.
> +
> +    uint32_t numMapWords:24; // total number of stack words covered by the map
> +    uint32_t numExitStubWords:7; // of which this many are "exit stub" extras
> +    uint32_t hasRefTypedDebugFrame:1; // note presence of a ref-typed DebugFrame

Nice (also the static_assert).  If I'm to nitpick, I would say that "numMapWords" sounds like the number of words in the map; "numMappedWords" would be slightly clearer.  This propagates into other files and into other variables such as "numMapBytes" in WasmInstance.cpp.

@@ +1861,5 @@
> +        hasRefTypedDebugFrame(0)
> +    {
> +        const uint32_t nBitmap = calcNBitmap(numMapWords);
> +        for (size_t i = 0; i < nBitmap; i++)
> +            bitmap[i] = 0;

Memset?  Also see below about calloc.

@@ +1867,5 @@
> +
> +  public:
> +    static WasmStackMap* create(uint32_t numMapWords) {
> +        uint32_t nBitmap = calcNBitmap(numMapWords);
> +        char* buf = new char[sizeof(WasmStackMap) + (nBitmap - 1) * sizeof(bitmap[0])];

Do we need an OOM test or will placement new take care of that?

@@ +1868,5 @@
> +  public:
> +    static WasmStackMap* create(uint32_t numMapWords) {
> +        uint32_t nBitmap = calcNBitmap(numMapWords);
> +        char* buf = new char[sizeof(WasmStackMap) + (nBitmap - 1) * sizeof(bitmap[0])];
> +        return ::new(buf) WasmStackMap(numMapWords);

You may not use standard new and delete, see https://searchfox.org/mozilla-central/source/js/public/Utility.h#454 for more.

You can instead use js_malloc or (better in this case imo) js_calloc, and js_free, to manage the memory.  There are other options; see the comment block at that location.

@@ +1900,5 @@
> +
> +    // Add extra elements to the front (lowest-addressed end in a grow-down
> +    // stack) of the map, sliding all other entries along (notionally, to
> +    // higher addresses) to make space.
> +    void prependExtras(const Vector<bool, 32, SystemAllocPolicy>& extras) {

We want to typedef this vector type somewhere, ExitStubMapVector maybe, so that we don't have to spread information about alloc policy and number of stack-allocated entries around like this.

@@ +1906,5 @@
> +        uint32_t nExtras = int32_t(extras.length());
> +        if (nExtras == 0) {
> +            return;
> +        }
> +        MOZ_RELEASE_ASSERT(nExtras < (1 << 7));

This 1 << 7 could usefully use a name, and be defined above, along with the bitfield, to keep those values close to each other.

@@ +1909,5 @@
> +        }
> +        MOZ_RELEASE_ASSERT(nExtras < (1 << 7));
> +        MOZ_ASSERT(nExtras <= numMapWords);
> +        int32_t i;
> +        for (i = numMapWords - 1; i >= int32_t(nExtras); i--) {

The int32_t() cast is not needed here because you have that when you compute nExtras, but then inexplicably you store the value in a uint32_t.  Better that nExtras be int32_t to begin with.

Since the `i` here is not the same as the `i` below, perhaps move the declaration here into the for loop header and create a separate variable below, for the sake of hygiene.  (I know, matter of taste.)

@@ +1943,5 @@
> +  private:
> +    static constexpr uint32_t wordsPerBitmapElem = sizeof(bitmap[0]) * 8;
> +
> +    static uint32_t calcNBitmap(uint32_t numMapWords) {
> +        MOZ_RELEASE_ASSERT(numMapWords < (1 << 24));

Ditto for the "24" here, lift this constant to the top where we define the bitfield.
Attachment #9018332 - Flags: feedback?(lhansen) → feedback+
Blocks: 1505768
Attached patch bug1476251-stackmaps-37.diff (obsolete) — Splinter Review
WIP patch.  Incorporates fixes for all review comments in comments 16 and 10,
except for removal of debug printing.  It segfaults during compacting GC.
Attachment #9018332 - Attachment is obsolete: true
Attached file testcase.js (obsolete) —
Segfaulting testcase.
STR for compacting GC segfault.

Testcase is testcase.js from comment 18.  This is a simplified version of
tests/wasm/gc/ref-struct.js.

Apply patch from comment 17 on top of 444435:9f9a9234959f.

cd js ; mkdir BUILD64 ; cd BUILD64

(cd ../src && autoconf-2.13) && \
CC="ccache clang" CXX="ccache clang++" ../src/configure \
  --enable-debug --enable-optimize="-g -Og" \
  --enable-valgrind --disable-jemalloc

# NB 1. This is clang 6.0.0
# NB 2. You probably don't need  --enable-valgrind --disable-jemalloc,
        but I haven't tried it without

make -j8

./dist/bin/js --gc-zeal 2,12 -f ../src/jit-test/lib/prologue.js --js-cache \
   ../src/jit-test/.js-cache --wasm-gc -e 'const platform='"'"'linux2'"'"'' \
   -e 'const libdir='"'"'../src/jit-test/lib/'"'"'' \
   -e 'const scriptdir='"'"'..//src/jit-test/tests/wasm/gc/'"'"'' \
   -f ../src/jit-test/lib/wasm.js \
   -e 'if ((!wasmGcEnabled())) quit(59)' \
   --module-load-path ../src/jit-test/modules/ -f ./testcase.js

Runs for about 8 seconds, eventually ending thusly:

  in TraceJitActivations ---------------------
    in TraceJitActivation
      Exit         PC=(nil)
      BaselineStub PC=0x1a03d1a58b17
      BaselineJS   PC=0x1a03d1a582ea
      Wasm         PC=0x1a03d1a69f8b  no map
      Wasm         PC=0x1a03d1a69b09  { dMap n, nWords 6: PsPsss }
                   Scanning [0x7ffe7b179e00, 0x7ffe7b179e30)
      Wasm         PC=0x1a03d1a69b30  { dMap n, nWords 6: PsPsss }
                   Scanning [0x7ffe7b179e30, 0x7ffe7b179e60)
      Wasm         PC=0x1a03d1a699d6  { dMap n, nWords 4: Psss }
                   Scanning [0x7ffe7b179e60, 0x7ffe7b179e80)
  Segmentation fault (core dumped)

# --gc-zeal 2,12 causes a collection every 12 allocations.  It is very
# sensitive to the second param.  Eg, with 2,11 it runs fine.
Analysis.

After fixing a previous bug in stackmap construction, and adding a bunch more
assertery for it, I'm reasonably confident that the stackmap construction is
correct.

I looked closely at emitStructNew and somewhat at emitStruct{Get,Set,Narrow},
but didn't see any obvious way in which they could create a pointer that
createStackMap() wouldn't notice.

From some poking around, the crash happens in (code generated by)
emitStructGet, when it dereferences the object pointer in order to fish out
some field.  The object pointer is stale.  Running on valgrind leads to
a similar conclusion -- we crash soon after using stale data.

  Conditional jump or move depends on uninitialised value(s)
    at 0x1F905C644A44: ???
    ...
   Uninitialised value was created by a client request
    at 0x1B41BD6: SetMemCheckKind (js/src/jsutil.h:314)
    by 0x1B41AEA: js::AlwaysPoison (js/src/jsutil.h:353)
    by 0x1B452AE: js::NurseryChunk::poisonAndInit (js/src/gc/Nursery.cpp:94)

  Invalid read of size 4
    at 0x1F905C644A9A: ???
  Address 0xfffe2f2f2f2f2f3f is not stack'd, malloc'd or (recently) free'd

I noticed after a while, from the TraceJitActivation printout shown above,
that there's a Wasm frame that has no stackmap

 Wasm         PC=0x1a03d1a69f8b  no map
 
That's unusual for a debug build, because the compiler creates a stackmap for
every call, trap and debug breakpoint, even if there are no pointers.

By disassembling code, it's possible to see that that PC pertains to a call
instruction in code created by GenerateImportJitExit, specifically this line:

    masm.callJitNoProfiler(callee);

Putting this immediately after that line makes it easy to be sure:

    masm.xorPtr(rsi, r8); masm.xorPtr(rsi, r8);

All that said, I don't know whether this "no map for the stub" line of enquiry
is relevant or not.  Having looked at the code for the stub -- at least for
the case it complains about -- it's not clear to me that it does push any
pointers on the stack.

However, that said, if I hack the Instance::traceFrame routine (it is this
that complains about the missing map) thusly, the crash goes away:

     if (!map) {
+        uintptr_t* frame = (uintptr_t*)wfi.frame();
+        TraceRoot(trc, (JSObject**)&frame[-1],
+                       "Instance::traceWasmFrame: ?????");
         return 0;
     }

so it does seem that there's a pointer somewhere *near* the offending frame,
which is somehow being missed.  But by comparing the assembly code of the stub
(which created the frame) with the actual contents of the stack, it's possible
to see that this word is about 12 words up the stack, yet the stub only moves
RSP down by 6 or 8 words in total.  And yet the address of the word doesn't
fall into the wasm frame above (outer relative to) it on the stack.

So I'm generally mystified.
Flags: needinfo?(lhansen)
I can repro locally with the given base revision, though not with a later base revision.  So: rebase & ship it!  :-)
Flags: needinfo?(lhansen)
After rebasing to 445243:63eb34f9b171, it still fails with but with different
--gc-zeal=2,N parameters: for N in 84 77 71 68 67 59 58 53 51 46 39 38 36 32
30 28 26 23 22 18 17 (I tested the range 100 .. 1).
Adding --no-wasm-ion to the command line is fine; this will at least take ion out of the equation.  (Remember that --wasm-gc plus the gc_feature_opt_in will force baseline compilation + no tiering for a module, but will allow ion to be used for other modules.  This test case only has the one module, so it's probably not a big deal to disable ion.)

But adding --no-ion or --no-baseline removes the crasher at least with this base revision.
It is definitely the case that TraceJitActivation skips the innermost true (not counting the exit stub) frame.
This appears to be due to abuse of JitFrameIter::asWasm(); this does not return the appropriate data structure, since the WasmFrameIter usually skips the first frame.  We need a more localized solution here.
What seems to be happening is that JitFrameIter::settle, when it switches from JS frames to Wasm frames, first skips the exit frame by asking for the prevFP, and then constructs a WasmFrameIter that also skips the first frame, assuming that it is an exit frame.  Thus we end up skipping the first real frame incorrectly.
Attached patch bug1476251-dontskip.patch (obsolete) — Splinter Review
Benjamin, see the last several comments on this bug.  It looks like when we walk up the stack and go from jit code into wasm we skip two frames because both the JitFrames iterator and the WasmFrameIter skip a frame, but only one of them can do that.  It also looks like, in this situation, we're the only caller to the WasmFrameIter with a given PC, so we can not-skip the frame safely.  Don't know if this is the best way to solve this.  It does fix the crash though - we now see the frame.
Attachment #9024362 - Flags: feedback?(bbouvier)
s/given PC/given FP/
(In reply to Lars T Hansen [:lth] from comment #27)
> Created attachment 9024362 [details] [diff] [review]
> bug1476251-dontskip.patch

This makes "Linux x64 debug (cgc)" succeed (yay!) but it also causes
assertion failures on x86(32), arm32 and arm64, here:

void
DebugState::adjustEnterAndLeaveFrameTrapsState(JSContext* cx, bool enabled)
{
    MOZ_ASSERT_IF(!enabled, enterAndLeaveFrameTrapsCounter_ > 0);

This is now observing "enabled == false && enterAndLeaveFrameTrapsCounter_ == 0"
for an --enable-simulator=arm build, run thusly:

../src/jit-test/jit_test.py --tbpl -f ./dist/bin/js debug-exception-in-fast-import.js
Comment on attachment 9024362 [details] [diff] [review]
bug1476251-dontskip.patch

Review of attachment 9024362 [details] [diff] [review]:
-----------------------------------------------------------------

That's not correct; when say wasm baseline code calls into the JIT, the following happens:

- wasm baseline calls into the wasm jit exit for the specific JIT function, thus creates a frame. This frame's FP will be passed to the wasm frame iterator ctor, which will skip it, but it's fine; the next FP that's readable will be the caller's FP.
- the wasm jit exit creates a transitional JIT frame WasmToJSJit that the JS Jit frame iterator will see.

So if you don't skip the first frame, in many cases you'll see the exit where you shouldn't be observing it. That's not correct and I expect this would lead to crashes in general.
Attachment #9024362 - Flags: feedback?(bbouvier) → feedback-
That's your patch, but with an addendum from yours truly (sorry, forgot to commit your patch before importing it and fixing the HG rejects; would have made a simpler diff).

After some (ahem) time spent in the debugger, I've found that the innermost frame *was* visited, but somehow no stack maps were found for it.

We have a call from wasm to the JS JIT; transitioning from the JS Jit iterator to the wasm one is done by the JitFrameIter abstraction. In particular, this abstraction will skip transitional frames that don't carry meaningful information. It works nicely, except in js::wasm::Instance::traceFrame where the nextPC value is deduced by hand. The nextPC value was correct in all the cases, except when we transition from a JS Jit iterator to a wasm one; in this case, the last nextPC value read (in the isJSJit() block) maps to the return address in the wasm jit exit.

So the Instance::traceFrame function would try to find a stack map for the wasm jit exit, but there's none in this case; it should instead have used the return address to the innermost wasm function.

There's a data entry for this in the JS Jit iterator called the "returnAddressToFp", which is the previously read return address (that is, the return address that will have code transition to the current FP; thus it's different from fp_->returnAddress, which points to the caller). Wasm should use this too, and store it too.

My patch is the same as yours, with a new returnAddressToFp() func in JitFrameIter as well as in WasmFrameIter. The latter now stores the returnAddressToFP_ in WasmFrameIter::popFrame(), which is called in the ctor in general (and in particular, when transitioning from JS jit to wasm frames, in the case of a wasm->JIT call). This passes all your tests, including the one test case that was crashing.
Attached patch returnaddresstofp.patch (obsolete) — Splinter Review
That's the addendum part only; see previous comment for the explanation.
Attachment #9024507 - Attachment is obsolete: true
Attached patch bug1476251-stackmaps-39.diff (obsolete) — Splinter Review
Addresses all review comments in comments 15 and 16.  Also incorporates all of
Luke's comments in comment 10.

Other changes not in the review comments:

* a bug in stackmap creation was fixed.  This resulted from using
  BaseCompiler::framePushedAtEntryToBody_ while still inside the prologue.

* Benjamin's fix for a bug in computation of PCs for Wasm frames, causing
  segfaulting [comment 31], is incorporated.

* [Per comment 16 review] Handling of OOMs in beginFunction, which the
  previous patch dealt with incorrectly using a flag in class BaseCompiler,
  has been changed to use the usual direct-passing-of-bools scheme.  This
  necessitated changing various other routines in BaseCompiler to return bool
  instead of void.

* The first of the four test cases (wasm/gc/stackmaps.js) has been removed,
  since the remaining 3 cover the same stuff adequately.  The remaining 3 have
  been tidied up per comment 15, and some of the timeouts and iteration
  lengths adjusted with the aim of reducing the tendency to time out on slow
  targets.

* The commit message has been updated accordingly.

Try results are at https://treeherder.mozilla.org/#/jobs?repo=try&revision=6c5cbec31674c77b8797b610acd7fbaa1cc3d742

> Outright bugs:
> * storage management of WasmStackMap instances in the presence of
>   non-cleanup following OOM, as discussed on IRC;
> * you may not use new and delete the way you've used them here.
> * OOM handling in beginFunction is probably not right,

All fixed.

> Just going to note once (and ignore all other places) that debug prints need
> to be removed before landing.

All removed.

> ::: js/src/jit/MacroAssembler.cpp
> @@ +3299,5 @@
> >      loadWasmTlsRegFromFrame();
> >  
> >      call(wasm::CallSiteDesc(bytecode.offset(), wasm::CallSite::Symbolic), imm);
> > +
> > +    uint32_t retAddrOffset = currentOffset();
> 
> More substantively I'm somewhat concerned about the brittleness of this
> since call() is an abstraction several layers deep and there's no rule the
> says it /must/ end with the call instruction; it could in principle be a
> call instruction followed by a jump across a flushed constant pool (on ARM,
> ARM64)...  Pool flushing bites us reasonably regularly, though not normally
> at the end of instruction sequences.

I added plumbing to MacroAssembler so as to return a CodeOffset for all call
instructions we create, and used that CodeOffset instead.

::: js/src/wasm/WasmTypes.h
> @@ +1861,5 @@
> > +        hasRefTypedDebugFrame(0)
> > +    {
> > +        const uint32_t nBitmap = calcNBitmap(numMapWords);
> > +        for (size_t i = 0; i < nBitmap; i++)
> > +            bitmap[i] = 0;
> 
> Memset?  Also see below about calloc.

I changed this to used memset, but left the rest of the initialisations as-is
and didn't use calloc.  We've had problems before with zeroing allocators
taking us into C++ undefined behaviour (hence being optimised out) and I don't
want to go there again.

> ::: js/src/wasm/WasmTypes.h
> @@ +1909,5 @@
> > +        }
> > +        MOZ_RELEASE_ASSERT(nExtras < (1 << 7));
> > +        MOZ_ASSERT(nExtras <= numMapWords);
> > +        int32_t i;
> > +        for (i = numMapWords - 1; i >= int32_t(nExtras); i--) {
> 
> The int32_t() cast is not needed here because you have that when you compute
> nExtras, but then inexplicably you store the value in a uint32_t.  Better
> that nExtras be int32_t to begin with.

I've fixed this somewhat better.  AFAICS nExtras is initially a size_t.  I
wanted to avoid any possible loop overruns (sec problems) resulting from
getting passed an |extras| vector with 2^32 elements or more, or from any
32/64-bit confusion, or from any underflow of an unsigned loop index.

> Since the `i` here is not the same as the `i` below, perhaps move the
> declaration here into the for loop header and create a separate variable
> below, for the sake of hygiene.  (I know, matter of taste.)

Done.
Attachment #9023663 - Attachment is obsolete: true
Attachment #9023664 - Attachment is obsolete: true
Attachment #9024362 - Attachment is obsolete: true
Attachment #9024513 - Attachment is obsolete: true
Attachment #9024935 - Flags: review?(lhansen)
> We've had problems before with zeroing allocators
> taking us into C++ undefined behaviour (hence being optimised out) and I don't
> want to go there again.

I can believe that, but we actually use calloc other places, eg, in the code for Table.  It's no good saying "I think this is a problem but other people don't."  Is anything being done to address this adequately?  (Asking you with your Valgrind hat on.)  Also calling on a couple others to answer the same question.
Flags: needinfo?(nfroyd)
Flags: needinfo?(dmajor)
Comment on attachment 9024935 [details] [diff] [review]
bug1476251-stackmaps-39.diff

Review of attachment 9024935 [details] [diff] [review]:
-----------------------------------------------------------------

Commit message not included in the splinter review so here's a top-to-bottom comment stream for that:

> shadow-wasm-value (operand) stack

"evaluation stack" is probably canonical.

> an trap exit stub not-really-a-frame

Should be "a" and "stub's", probably.

> The zero value pushed by GenerateTrapExit has been changed to 1337

Haven't looked at the patch yet but this needs a symbolic name; it's ok to reveal the value we use for it currently in the commit comment though, to make the argument about nonzero values.

Otherwise good.
Comment on attachment 9024935 [details] [diff] [review]
bug1476251-stackmaps-39.diff

Review of attachment 9024935 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/MacroAssembler.cpp
@@ +3309,5 @@
>      loadWasmTlsRegFromFrame();
>  
> +    CodeOffset raOffset
> +        = call(wasm::CallSiteDesc(bytecode.offset(), wasm::CallSite::Symbolic), imm);
> +    MOZ_ASSERT(raOffset.offset() == currentOffset()); // FIXME RMME

FIXME needs to be fixed.

::: js/src/jit/MacroAssembler.h
@@ +530,5 @@
>      inline void callWithABI(Register fun, MoveOp::Type result = MoveOp::GENERAL);
>      inline void callWithABI(const Address& fun, MoveOp::Type result = MoveOp::GENERAL);
>  
> +    // The returned CodeOffset is the assembler offset for the instruction
> +    // immediately following the call; that is, for the return point.

This also seems to be the case for the two existing call() instructions that return CodeOffset, and unless I'm mistaken it's true even for callWithPatch(), so I think this comment should be moved up to just below "Simple call functions.".

::: js/src/jit/x86-shared/MacroAssembler-x86-shared.cpp
@@ +685,5 @@
>  MacroAssembler::call(wasm::SymbolicAddress target)
>  {
>      mov(target, eax);
> +    CodeOffset raOffset = Assembler::call(eax);
> +    MOZ_ASSERT(raOffset.offset() == currentOffset()); // FIXME RMME

FIXME needs to be fixed.
Comment on attachment 9024935 [details] [diff] [review]
bug1476251-stackmaps-39.diff

Review of attachment 9024935 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit-test/tests/wasm/gc/stackmaps1.js
@@ +28,5 @@
> +
> +     ;; -- fn 0
> +     (func $fn0 (export "fn0")
> +                (result i32) (param $arg1 i32) (param $arg2 anyref) (param $arg3 i32)
> +                             (param $arg4 anyref) (param $arg5 anyref) (param $arg6 i32)

This does not actually check that pointers are handled properly.  We don't touch the referenced objects here or above, we only move the pointers (in the caller) and ignore them here; if we disabled tracing altogether the test would probably not fail.  A slightly stronger test uses objects here that have a little structure, say, a field "x" with a number value, and calls out to JS for each of the objects after each call to alloc to read those values from the objects.  It's extra nice if it will check that those values are correct; an unlikely constant value is sufficient.

@@ +37,5 @@
> +
> +     ;; -- fn 1
> +     (func $fn1 (export "fn1") (param $arg1 anyref) (result i32)
> +       (local $i i32)
> +       (set_local $i (i32.const 0))

nit: Redundant initialization, stack slots are always zero.

::: js/src/jit-test/tests/wasm/gc/stackmaps2.js
@@ +11,5 @@
> +// At the same time we are asynchronously runnning handler(), which does a lot
> +// of allocation.  At some point that will trigger a GC.  Assuming that
> +// handler() runs whilst fn0 is running (the most likely scenario, since fn0
> +// consumes the majority of the wasm running time), then the runtime will
> +// unwind the stack from the wasm exit frame, through fn0, fn1 and finally

nit: I expect you want "walk", not "unwind".

@@ +50,5 @@
> +         (set_local $i (i32.add (get_local $i) (i32.const 1)))
> +         (br_if 0 (i32.lt_s (get_local $i) (i32.const 100)))
> +       )
> +
> +       (i32.add (i32.add (get_local $arg1) (get_local $arg3)) (get_local $arg6))

Again before this line it might be nice to check the integrity of $arg2, $arg4, and $arg5.

@@ +103,5 @@
> +        // will remove the allocation call and make the test pointless.
> +        if (x == q) { sum++; }
> +    }
> +    totAllocs += iters;
> +    if (sum == 133713371337) { print("unlikely!"); }

If this is an artificial use of sum it's good to have a comment to that effect.

::: js/src/jit-test/tests/wasm/gc/stackmaps3.js
@@ +11,5 @@
> +// There's no particular reason that the wasm compiler needs to keep the
> +// results of the $mkBoxedInt calls on the machine stack.  It could equally
> +// cache them in registers or even reorder the call sequences so as to
> +// interleave construction of the list elements with construction of the list
> +// itself.  It just happens that our baseline compiler will behave as

These comments are fine, I just think they're overly pessimistic.  With imported functions a typical wasm compiler will have limited room to maneuver re reordering.  (For sure it can use callee-saves registers but it's unlikely to have 20 of them.)

@@ +174,5 @@
> +        // Without this hoop jumping to create an apparent use of |x|, Ion
> +        // will remove the allocation call and make the test pointless.
> +        if (x == q) { sum++; }
> +    }
> +    if (sum == 133713371337) { print("unlikely!"); }

Same comment as for the previous test case.
Comment on attachment 9024935 [details] [diff] [review]
bug1476251-stackmaps-39.diff

Review of attachment 9024935 [details] [diff] [review]:
-----------------------------------------------------------------

Tremendous work.  There will be a slight amount of merge pain with the table code but really it's just superficial.

::: js/src/vm/Stack.h
@@ +1976,5 @@
>      bool done() const;
>      void operator++();
>  
>      JS::Realm* realm() const;
> +    uint8_t* returnAddressToFp() const;

At a minimum, please copy the comment from returnAddressToFp() in JSJitFrameIter.h to above this definition, if it is appropriate; if not, please add an appropriate comment.  The name of this function does not scan well as English (IMO) and needs documentation.

That comment is not great either, unfortunately; "the return address that returns back to the current frame" does not make much sense to me.  Surely there must be something better.

Judging from the implementation in WasmFrameIter.cpp, "returnAddressInCurrentFrame" might have been a more appropriate name but perhaps I misunderstand precisely what's going on here and it's not the current frame at all, but one above or below.

::: js/src/wasm/WasmBaselineCompile.cpp
@@ +1797,5 @@
> +
> +class MachineStackTracker
> +{
> +    // Simulates the machine's stack, with one bool per word.  Index zero in
> +    // this vector corresponds to highest address in stack (where the SP

nit: "the" highest address?

@@ +2872,5 @@
> +                                     uint32_t assemblerOffset,
> +                                     HasRefTypedDebugFrame refDebugFrame)
> +    {
> +        size_t countedPointers = mst_.numPtrs() + memRefsOnStk_;
> +#if !defined(DEBUG)

nit: #ifndef probably (cf below).

@@ +2879,5 @@
> +        if (countedPointers == 0 && extras.empty() && refDebugFrame == HasRefTypedDebugFrame::No) {
> +            return true;
> +        }
> +#else
> +        // In the debug case, create the stack map regardless, and cross-check

We should discuss whether this is really a good idea, since it'll confuse debugging in various ways.  For the initial landing I'm inclined to think it's OK, and the extra testing we get from it is probably good.  It's possible we want an environment variable to enable or disable the always-create behavior, eventually.

@@ +2883,5 @@
> +        // In the debug case, create the stack map regardless, and cross-check
> +        // the pointer-counting below.  We expect the final map to have
> +        // |countedPointers| in total.  This doesn't include those in the
> +        // DebugFrame, but they do not appear in the map's bitmap.
> +        for (bool b : extras) {

Maybe comment above the loop that countedPointers is debug-only from this point on, I got confused...

@@ +2978,5 @@
> +
> +        // Create the final StackMap.  The initial map is zeroed out, so
> +        // there's no need to write zero bits in it.
> +        uint32_t numFrameWords = smap.length();
> +        StackMap* wsmap = StackMap::create(numFrameWords + extras.length());

"wsmap" vs "smap" is too confusing to be right.  smap isn't even a stack map, it's a machine stack tracker.  Can we clean this up?  At least use "mst" for what is now smap, and then "smap" for what is now wsmap?  (I'm open to all suggestions but the fact that smap isn't a StackMap makes my head explode when I read about the indexing, below.)  Despite what you see elsewhere, and despite occasional sniping from bbouvier to the contrary, I'm not averse to properly descriptive names, especially in complicated functions like this.  "stackMap" and "machineStackTracker" would be fine.

@@ +3004,5 @@
> +        if (!stackMaps_->add((uint8_t*)(uintptr_t)assemblerOffset, wsmap)) {
> +            return false;
> +        }
> +
> +#if defined(DEBUG)

Ditto.

@@ +3750,5 @@
> +    ////////////////////////////////////////////////////////////
> +    //
> +    // Stackmap helpers
> +
> +    MOZ_MUST_USE bool GenerateStackmapEntriesForTrapExit(ExitStubMapVector& extras)

"generateStackmapEntries...", lower-case initial letter for instance methods, upper-case for static methods only.

@@ +3753,5 @@
> +
> +    MOZ_MUST_USE bool GenerateStackmapEntriesForTrapExit(ExitStubMapVector& extras)
> +    {
> +        // At the beginning of a function, we may have live roots in registers
> +        // (as arguments) at the point that we perform a stack overflow check.

nit: "that" => "where" (though I hesitate to correct a native speaker, even a long-time expat...)

@@ +3992,5 @@
>          if (env_.debugEnabled()) {
> +            // If the return type is a ref, we need to note that in the stack
> +            // maps generated here.  Note that this assumes that
> +            // DebugFrame::result* and DebugFrame::cachedReturnJSValue_ are
> +            // either both ref-typed or they are both not ref-typed.  It can't

This will get interesting once we have proper boxing for anyref, but let's cross that bridge when we get to it.

@@ +10643,5 @@
> +
> +#ifdef DEBUG
> +  // Check that the number of ref-typed entries in the operand stack matches
> +  // reality.
> +# define CHECK_POINTER_COUNT \

Neat.

::: js/src/wasm/WasmBaselineCompile.h
@@ +71,5 @@
>  };
>  
> +#ifdef DEBUG
> +// Check whether |nextPC| is a valid code address for a stackmap created by
> +// this compiler.

"... by this compiler" is an interesting qualifier.  It is correct, of course.  But how will we do this for Ion and Cranelift, I wonder?

::: js/src/wasm/WasmFrameIter.h
@@ +95,5 @@
>      DebugFrame* debugFrame() const;
>      jit::FrameType unwoundIonFrameType() const;
>      uint8_t* unwoundIonCallerFP() const { return unwoundIonCallerFP_; }
> +    Frame* frame() const { return fp_; }
> +    uint8_t* returnAddressToFp() const;

As in Stack.h, returnAddressToFp() needs documentation.

::: js/src/wasm/WasmGenerator.cpp
@@ +1000,5 @@
>      }
>  
> +    metadataTier_->stackMaps.offsetBy(uintptr_t(segment->base()));
> +
> +#if defined(DEBUG)

nit: "#ifdef" would be more common here.

::: js/src/wasm/WasmInstance.cpp
@@ +1113,5 @@
> +        DebugFrame* debugFrame = DebugFrame::from(frame);
> +        char* debugFrameP = (char*)debugFrame;
> +
> +        char* resultRefP = debugFrameP + DebugFrame::offsetOfResults();
> +        if (* (intptr_t*)resultRefP) {

nit: superflous space.

::: js/src/wasm/WasmStubs.cpp
@@ +279,5 @@
>  {
> +    //masm.loadPtr(Address(WasmTlsReg, offsetof(TlsData, cx)), scratch);
> +    //masm.add32(Imm32(increment),
> +    //           Address(scratch, offsetof(JSContext, suppressGC) +
> +    //                            js::ThreadData<int32_t>::offsetOfValue()));

This is fine, so long as we don't land it but land a patch that cleans this up at the same time.

@@ +1740,5 @@
> +void
> +wasm::GenerateTrapExitMachineState(MachineState* machine, size_t* numWords)
> +{
> +    // This is the number of words pushed by the initial WasmPush().
> +    *numWords = TrapExitDummyValueOffsetFromTop + 1;

WasmPushSize / sizeof(word) is probably slightly better.

::: js/src/wasm/WasmStubs.h
@@ +48,5 @@
> +// cross-checking during garbage collection.
> +static constexpr uintptr_t TrapExitDummyValue = 1337;
> +
> +// And its offset, in words, down from the highest-addressed word of the trap
> +// exit frame.

You'd want to comment here, and also at the implementation of WasmPush, that the payload is written into the low-addressed words when WasmPush allocates a quantity of storage > 1 word.  If that is not made an invariant then these offsets are not well-defined.

::: js/src/wasm/WasmTypes.h
@@ +1861,5 @@
> +    // Some initial prefix of the array, numExitStubWords, contains entries
> +    // for registers saved by a wasm trap exit stub.  This field is needed in
> +    // order for users of these maps to calculate the correct stack starting
> +    // address.  However, the presence of numExitStubWords has no effect on
> +    // indexing in the array.  numExitStubWords is in most cases zero and in

How can the presence of more words at the beginning not have any effect on indexing?  We're missing a "because ..." clause.

@@ +1991,5 @@
> +    };
> +
> +  private:
> +    bool sorted_;
> +    Vector<Maplet, 8, SystemAllocPolicy> mapping_;

Do we have data suggesting that "8" is a good value here?  If a StackMaps is per-function I could be persuaded to believe that; if it's per-module I don't (and 0 would likely be better).
Attachment #9024935 - Flags: review?(lhansen) → review+
(In reply to Lars T Hansen [:lth] from comment #34)
> > We've had problems before with zeroing allocators
> > taking us into C++ undefined behaviour (hence being optimised out) and I don't
> > want to go there again.
> 
> I can believe that, but we actually use calloc other places, eg, in the code
> for Table.  It's no good saying "I think this is a problem but other people
> don't."  Is anything being done to address this adequately?  (Asking you
> with your Valgrind hat on.)  Also calling on a couple others to answer the
> same question.

I'm not aware of the history of these problems. Do you have any links?

In the absence of any context, my gut feeling is that this is something we'd want to fix at the allocator rather than pushing the burden onto every callsite we write.
Flags: needinfo?(dmajor)
I don't know where Julian's coming from, but in my corner of the world there have been noises in the past that the C++ memory model (for concurrent memory access) does not work well with C's memory allocation functions.  I don't think calloc was singled out particularly though; it's more that the semantics of malloc/free are a poor fit because C++ is all about "objects" and C is all about "bytes", and it's easy to put your foot into some Undefined Behavior if you use malloc/free from C++.
(In reply to Lars T Hansen [:lth] from comment #40)
> I don't know where Julian's coming from, but in my corner of the world there
> have been noises in the past that the C++ memory model (for concurrent
> memory access) does not work well with C's memory allocation functions.  I
> don't think calloc was singled out particularly though; it's more that the
> semantics of malloc/free are a poor fit because C++ is all about "objects"
> and C is all about "bytes", and it's easy to put your foot into some
> Undefined Behavior if you use malloc/free from C++.

I can believe that there are subtleties with the C++ memory model and C's memory allocation functions, but I'm certainly not clever enough to see what those are.  Do you have pointers to relative discussions on that?

My understanding of the situation Julian describes is from the GCC 6 upgrade, where g++ decided to start enforcing rules about object construction, and made code like this break:

struct A
{
 A() {}

 void* operator new(size_t s)
 {
   void* ptr = malloc(s);
   memset(ptr, 0xFF, s);
   return ptr;
 }

 void operator delete(void* ptr) { free(ptr); }

 int value;
};

int main()
{
 A* a =  new A;
 assert(a->value == -1); // You cannot assume this in standard C++
 delete a;
}

since the memory writes to the memory prior to construction are, from the point of view of the C++ standard, dead code.  And g++ started deleting such writes, which broke code in Gecko that did similar things.  I think the concern, then, is that if you consider the similar code:

struct A
{
 A() {}

 void* operator new(size_t s)
 {
   void* ptr = calloc(1, s);
   return ptr;
 }

 void operator delete(void* ptr) { free(ptr); }

 int value;
};

int main()
{
 A* a =  new A;
 assert(a->value == 0); // Is this well-defined?
 delete a;
}

This is...technically bad code?...because you can't assume that the memory writes done by calloc (if those memory writes are actually done by the program and not, e.g., by the OS handing out zero'd pages) are valid.  I think that's at least a consistent position to take, but OTOH for the compiler to optimize out the zero'ing would require quite some sophistication, and not worth worrying about.  On the third hand, compilers only get more clever, and one could imagine embedding situations that statically link the standard library where the compiler might be able to do the necessary analysis and remove the zero'ing that calloc does?

Julian, how close is that explanation to your concerns?

My understanding is that the memset in the constructor would be OK, but relying on calloc() might be (pedantically) sketchy.  I go back and forth on whether we should care about the potential sketchiness even while writing this.
Flags: needinfo?(nfroyd)
(In reply to Nathan Froyd [:froydnj] from comment #41)
> (In reply to Lars T Hansen [:lth] from comment #40)
> > I don't know where Julian's coming from, but in my corner of the world there
> > have been noises in the past that the C++ memory model (for concurrent
> > memory access) does not work well with C's memory allocation functions.  I
> > don't think calloc was singled out particularly though; it's more that the
> > semantics of malloc/free are a poor fit because C++ is all about "objects"
> > and C is all about "bytes", and it's easy to put your foot into some
> > Undefined Behavior if you use malloc/free from C++.
> 
> I can believe that there are subtleties with the C++ memory model and C's
> memory allocation functions, but I'm certainly not clever enough to see what
> those are.  Do you have pointers to relative discussions on that?

Sadly not really, I read things and then ... you know how it goes.

There's this, and though it does not speak to the C++ case at all, ISTR it had something to say about this, since C was/is also moving toward an "object" model of memory: https://www.di.ens.fr/~zappa/readings/c11comp.pdf

Thank you for the detailed explanation of the calloc failure mode.  It makes a lot of sense.
(In reply to Nathan Froyd [:froydnj] from comment #41)
> My understanding of the situation Julian describes is from the GCC 6
> upgrade, where g++ decided to start enforcing rules about object
> construction, and made code like this break:

Sorry to be slow getting back to this.  Yes, that's exactly the situation
I was thinking of.  Gcc 6 started doing "lifetime dead code elimination",
and for at least dozens of classes in Gecko, succeeded in deleting the
zeroing memset that happened before the constructor was entered, causing
a whole lot of hassle.

My understanding is also that a memset in the constructor is fine, because
what the Lifetime-DCE optimisation did was to regard all assignments to the
object *before* entry of the constructor, as dead, and hence removed the
memset.  My concern about using calloc to allocate was that, since the calloc
also occurs prior to constructor entry, it too would get deleted.  

But on further consideration that can't really happen, since the space
still needs to be allocated somehow.  So at best I would guess a compiler
could downgrade the calloc into a malloc, which .. actually .. would be
just as bad.
(In reply to Julian Seward [:jseward] from comment #43)
> But on further consideration that can't really happen, since the space
> still needs to be allocated somehow.  So at best I would guess a compiler
> could downgrade the calloc into a malloc, which .. actually .. would be
> just as bad.

That would be completely evil, but I can totally see a compiler claiming this transformation is 100% valid.
(In reply to David Major [:dmajor] from comment #39)
> I'm not aware of the history of these problems. Do you have any links?

See bug 1232696, really comments 6, 12 and later.
Blocks: 1508223
Blocks: 1508550
Attached patch bug1476251-stackmaps-43.diff (obsolete) — Splinter Review
Updated patch.  Addresses review comments in comments 35, 36, 37 and 38.
Attachment #9024935 - Attachment is obsolete: true
Once bug 1505774 lands (as soon as the trees reopen) the stackmaps patch will need the following very minor adjustments.
Flags: needinfo?(jseward)
Depends on: 1509480
Once bug 1508665 lands (pending review), there will be a slight rebasing required here, since it moves a "public:" from below a function to above a function in BaseStackFrame.  That instance of "public:" has almost certainly moved quite a ways, and the change will need to be re-implemented.  Should not be hard.
Changes from the patch in comment 46 (file bug1476251-stackmaps-43.diff):

* Arguments passed in memory are now accounted for.  This fixes some crashing
  in the previous version.  Also, they are accounted for in the callee's
  stackmap, not the caller's.

* All state and logic for creating stackmaps has been lifted out of
  BaseCompiler and put in a new struct StackMapGenerator.  BaseCompiler pokes
  various fields in StackMapGenerator in a few places but otherwise contains
  little stackmap creation code.

* struct BaseCompiler::Stk (evaluation stack elements) has been lifted to the
  top level and placed above StackMapGenerator, since the latter needs it.

* Construction of wasm::StackMap as done inside
  StackMapGenerator::createStackMap has been simplified somewhat, so as to
  avoid "sliding in" the stub-exit extras.  wasm::StackMap has been simplified
  accordingly.

* Unfortunately the size of the smallest wasm::StackMap has increased from 8
  to 12 bytes.  This is because it is now necessary to include a field that
  records where the implied wasm::Frame inside the stack map is, relative to
  the top (highest addressed word of the map).  This wasn't needed before
  because the wasm::Frame was assumed to be at a fixed location relative to
  the top of the map.  I am not sure it is possible to pack the relevant
  non-bitmap fields into 32-bits.  (I'm not sure it isn't, either.)  This
  would be suitable for a followup investigation.

* Commit message is updated, with further details on the above changes.

* Rebased to 449244:0ae2634de8e1 and best-effort manually reformatted per new
  whitespace guidelines.

* Runs clean on try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=1caa4591e0b300450cd35b6899fa132511777453
Attachment #9026414 - Attachment is obsolete: true
Attachment #9026935 - Attachment is obsolete: true
Flags: needinfo?(jseward)
Attachment #9029778 - Flags: review?(lhansen)
Comment on attachment 9029778 [details] [diff] [review]
bug1476251-stackmaps-47.diff

Review of attachment 9029778 [details] [diff] [review]:
-----------------------------------------------------------------

Time to let this one leave the nest, I think.

::: js/src/wasm/WasmBaselineCompile.cpp
@@ +2170,5 @@
> +    if (!mst_.cloneTo(&augmentedMst)) {
> +      return false;
> +    }
> +
> +    // At this point, augmentedMst contains only contains entries covering the

Duplicated "contains".

@@ +2300,5 @@
> +        }
> +        i++;
> +      }
> +      // Followed by the "main" part of the map.
> +      for (i = 0; i < augmentedMstWords; i++) {

nit: Sort of weird that you reuse the 'i' here, it makes reading the code confusing (as in, "is this a bug?").  Better to scope the two 'i's separately).

@@ +4011,5 @@
> +      if (argLoc.kind() != ABIArg::Stack) {
> +        continue;
> +      }
> +      const ValType& ty = argTys[i.index()];
> +      if (ty.code() != ValType::Ref && ty.code() != ValType::AnyRef) {

Any reason we're not using !ty.isReference() here?
Attachment #9029778 - Flags: review?(lhansen) → review+
Pushed by jseward@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/5ca49059949b
Generate stack maps in the Wasm Baseline compiler.  r=lth.
https://hg.mozilla.org/mozilla-central/rev/5ca49059949b
Status: NEW → RESOLVED
Closed: Last year
Resolution: --- → FIXED
Target Milestone: --- → mozilla66
Regressions: 1567040
You need to log in before you can comment on or make changes to this bug.