Closed Bug 1277973 Opened 8 years ago Closed 8 years ago

Baldr: set Activation::fp in GenerateTrapStub so faults have backtraces

Categories

(Core :: JavaScript Engine, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking


Status: RESOLVED FIXED
Target Milestone: mozilla52
Tracking Status: firefox52 --- fixed

People

(Reporter: luke, Assigned: luke)

References

Details

Attachments

(8 files)

Currently we just auto-align the stack and call out to C++, but this leaves Activation.fp null, which means that, when we try to build a backtrace for the error, we lose all the wasm frames in the activation.  The trick is that we need a static frame depth so we can simply store (fp + frameDepth).  It'd be good to assert this when emitting the masm.jump(JumpTarget).

I think we might also be able to convert JumpTarget::StackOverflow to a Trap, since it also has static frame depth.
D'oh, of course there isn't one static frame-depth: these JumpTargets are shared by all functions.  So I think what we need here is a per-(JumpTarget,function) thunk that bumps sp to point to the frame before jumping to the real target.  This is what CodeGenerator::generateWasm is already doing for stack overflow; so the fix here is just to generalize that.
Summary: Baldr: set Activation::fp in GenerateTrapStub → Baldr: set Activation::fp in GenerateTrapStub so faults have backtraces
What this means is that the jumps to JumpTargets can be plain jumps (no patching); it's just the jumps in the thunks that need to go "far".
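Concretely, each per-(JumpTarget,function) thunk would amount to something like this (an illustrative sketch, not code from a patch; the helper name and signature are invented):

  // Pop the function's frame so fp/sp describe a complete frame, then
  // jump to the shared per-module stub for this JumpTarget.  Only this
  // jump needs to be a patchable "far" jump.
  static void
  GenerateJumpTargetThunk(MacroAssembler& masm, uint32_t framePushed, Label* sharedStub)
  {
      masm.freeStack(framePushed);
      masm.jump(sharedStub);
  }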
Note, there are two additional related cases where we'll fail to have exitFP set:
 - calls to JumpTarget::BadIndirectCall via entries in a Table
 - calls to HandleTrap() in the C++ impl of wasm ops
This prep patch removes 'throw' from the list of traps and makes it a separate kind of stub that is linked using plain labels (since it is only used by stubs created during finishCodegen).
Assignee: nobody → luke
Status: NEW → ASSIGNED
Attachment #8799195 - Flags: review?(bbouvier)
This patch switches profiling prologues to use a symbolic address, rather than the WasmTlsReg, to load the WasmActivation*.  This is necessary since trap stubs will (in the last patch) use profiling prologues/epilogues, and trap stubs do (want to) have WasmTlsReg available.  The downside is that prologues get a bit bigger (which is why I used WasmTlsReg initially).  Another annoyance is that loading a SymbolicAddress on ARM has different codegen when movw/movt are not available (it uses an immediate in a pool), and this breaks the static code offsets necessary for stack walking.  For now, the patch just disables wasm if movw/movt aren't available, since I expect this is a diminishing number of devices; if it ever mattered we could do something more complicated, but I'd like to see if anyone actually notices first.
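For illustration, the prologue's activation load now looks morally like this (register names and the exact SymbolicAddress entry are assumed, not taken from the patch):

  // Materialize the address of the WasmActivation* slot as an absolute
  // immediate (a movw/movt pair on ARM), keeping WasmTlsReg free and the
  // prologue's code offsets static.
  masm.movePtr(wasm::SymbolicAddress::Activation, scratch);
  masm.loadPtr(Address(scratch, 0), activation);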
Attachment #8799196 - Flags: review?(bbouvier)
Attached patch system-page-size
Unrelated: I happened to notice that the SignalUsage-removal patch forgot to add a dynamic check rejecting gc::SystemPageSize() > wasm::PageSize (which we assume can't happen).
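The check itself is tiny; roughly (its exact placement in the patch is assumed):

  // We assume an OS page is no larger than a wasm page; refuse to use
  // wasm otherwise, since the guard-region math depends on it.
  if (gc::SystemPageSize() > wasm::PageSize)
      return false;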
Attachment #8799197 - Flags: review?(bbouvier)
This simple patch generalizes CallSiteAndTarget to use a proper enum instead of the current NOT_DEFINITION, in preparation for the next patch, which needs to add a new case.
Attachment #8799198 - Flags: review?(bbouvier)
This patch simply renames the "thunks" that were added earlier to "far jumps".  "Thunk" is a pretty overloaded term and these methods just emit jumps, so the new name is more accurate and makes the last patch read better.
Attachment #8799199 - Flags: review?(bbouvier)
This simple patch moves MWasmMemoryAccess out of MIR.h and into Assembler-shared.h, where it is named wasm::MemoryAccessDesc, and makes the load/store MIR nodes *contain*, rather than *derive from*, wasm::MemoryAccessDesc, which makes things a bit nicer in a few ways.
Attachment #8799200 - Flags: review?(bbouvier)
Attached patch fix-simulator
This patch fixes the ARM simulator so that f64 loads/stores issue a single 8-byte load/store rather than two 4-byte loads/stores.  Without this patch, the fault-handling emulation gets confused in the case where an f64 memory access partially overlaps the guard page because one access will be "in" and one will be "out".
Attachment #8799201 - Flags: review?(bbouvier)
Attached patch fix-callstack
Here's the final patch, which actually makes the fix.  In addition to setting fp, the patch also reports the trapping instruction's offset, which is what required most of the work; most of the patch is just plumbing the trap information from wasm decoding into MIR and then into masm, which is fortunately pretty mechanical.

The basic approach taken is a 3-step path for a trap (sketched below):
 - the "trap site": the inline hot path which contains the branch
 - the "out-of-line trap path": jumped to by the trap site and placed at the very end of each function; one per trap site so it can record the trapping bytecode offset
 - the "trap exit": the shared stubs that go at the end of the module and are jumped to by all same-kind out-of-line trap paths

Initially I was going to have the out-of-line trap path just set some registers before jumping to the trap exit.  But in practice I ended up duplicating all this CallSite logic, so I found it much simpler to just have the out-of-line trap path make a proper call (with a CallSite) to the trap exit, which then uses a plain profiling prologue/epilogue so it is picked up by the stack iterators without any extra work.

The only tricky thing was memory accesses, due to signal handling.  Fundamentally, precise trap info requires storing a MemoryAccess for each load/store pc that conveys the bytecode offset (whereas before, wasm loads/stores had no metadata).  Considering different options, the simplest thing I found was to give MemoryAccess a 'trapOutOfLineCode' field to which the signal handler redirects the pc on fault.  This keeps the signal handler pretty simple and the rest of the trap path orthogonal to whether the trap was reached via a branch vs. the signal handler.
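In pseudocode, the signal-handler side then reduces to something like this (the lookup helper is invented for illustration; *ppc is the to-be-redirected pc, as in the existing handler code):

  // On a faulting wasm memory access, find the access's metadata by its
  // pc and resume execution at that access's out-of-line trap path.
  const MemoryAccess* access = LookupMemoryAccess(*instance, pc);
  if (access)
      *ppc = access->trapOutOfLineCode();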
Attachment #8799547 - Flags: review?(bbouvier)
Attachment #8799195 - Flags: review?(bbouvier) → review+
Attachment #8799197 - Flags: review?(bbouvier) → review+
Attachment #8799196 - Flags: review?(bbouvier) → review+
Attachment #8799198 - Flags: review?(bbouvier) → review+
Comment on attachment 8799199 [details] [diff] [review]
rename-thunk-to-far-jump

Review of attachment 8799199 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for making a separate patch for this.

::: js/src/asmjs/WasmGenerator.cpp
@@ +304,5 @@
>                  offsets.end = masm_.currentOffset();
>  
>                  if (!metadata_->codeRanges.emplaceBack(CodeRange::CallThunk, offsets))
>                      return false;
> +                if (!metadata_->callThunks.emplaceBack(jumpOffset, cs.funcDefIndex()))

I was wondering whether it'd make sense to rename callThunks to callFarJumps, but it seems that if we did so, we'd have to rename jumpThunks to jumpFarJumps, which is silly; so keeping callThunks / jumpThunks sounds good.

::: js/src/jit/MacroAssembler.h
@@ +509,5 @@
> +    // other uint32_t offset without using a constant pool (thus returning a
> +    // simple CodeOffset instead of a CodeOffsetJump).
> +    CodeOffset farJumpWithPatch() PER_SHARED_ARCH;
> +    void patchFarJump(CodeOffset farJump, uint32_t targetOffset) PER_SHARED_ARCH;
> +    static void repatchFarJump(uint8_t* code, uint32_t farJumpOffset, uint32_t targetOffset) PER_SHARED_ARCH;

For consistency with the previous one, we *could* have farJumpOffset be a CodeOffset too (and be renamed to farJump), but I see how verbose it would be for the single caller.

::: js/src/jit/arm/MacroAssembler-arm.cpp
@@ +5034,5 @@
>      return u32Offset;
>  }
>  
>  void
> +MacroAssembler::patchFarJump(CodeOffset u32Offset, uint32_t targetOffset)

Maybe rename u32Offset to farJump?

@@ +5048,5 @@
>      *u32 = (targetOffset - addOffset) - 8;
>  }
>  
>  void
> +MacroAssembler::repatchFarJump(uint8_t* code, uint32_t u32Offset, uint32_t targetOffset)

(and maybe u32Offset to farJumpOffset?)

::: js/src/jit/mips-shared/MacroAssembler-mips-shared.cpp
@@ +1513,5 @@
>      return u32Offset;
>  }
>  
>  void
> +MacroAssembler::patchFarJump(CodeOffset u32Offset, uint32_t targetOffset)

ditto

@@ +1522,4 @@
>  }
>  
>  void
> +MacroAssembler::repatchFarJump(uint8_t* code, uint32_t u32Offset, uint32_t targetOffset)

and ditto
Attachment #8799199 - Flags: review?(bbouvier) → review+
Comment on attachment 8799200 [details] [diff] [review]
hoist-memory-access-desc

Review of attachment 8799200 [details] [diff] [review]:
-----------------------------------------------------------------

Much cleaner, thank you.

::: js/src/jit/shared/Assembler-shared.h
@@ +740,5 @@
> +
> +    void clearOffset() { offset_ = 0; }
> +};
> +
> +

nit: two blank lines
Attachment #8799200 - Flags: review?(bbouvier) → review+
Comment on attachment 8799201 [details] [diff] [review]
fix-simulator

Review of attachment 8799201 [details] [diff] [review]:
-----------------------------------------------------------------

Curious that ARM wants 8-byte loads/stores to be aligned on a 4-byte boundary.
Attachment #8799201 - Flags: review?(bbouvier) → review+
(In reply to Benjamin Bouvier [:bbouvier] from comment #14)
> Comment on attachment 8799201 [details] [diff] [review]
> fix-simulator
> 
> Review of attachment 8799201 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Curious that ARM wants 8-byte loads/stores to be aligned on a 4-byte
> boundary.

Doubleword accesses are defined on early ARMv7 to be two single-word accesses, i.e., doubleword loads and stores are not always access-atomic.  (LDREXD/STREXD are different, as are 8-byte loads and stores on AArch64.)

On ARMv7 cores that support the Large Physical Address Extension, and presumably on ARMv8, LDRD/STRD are access-atomic *provided* the address is 8-byte aligned.
Comment on attachment 8799547 [details] [diff] [review]
fix-callstack

Review of attachment 8799547 [details] [diff] [review]:
-----------------------------------------------------------------

Looking very clean in general, great comments and good testing! Thanks for doing this.

I think there's going to be a strange behavior for ARMv7 unaligned accesses, see below.

::: js/src/asmjs/WasmFrameIterator.cpp
@@ +632,2 @@
>          // When the pc is inside the prologue/epilogue, the innermost
>          // call's AsmJSFrame is not complete and thus fp points to the the

pre-existing: the the

::: js/src/asmjs/WasmGenerator.cpp
@@ +277,2 @@
>  
> +    OffsetMap callFarJumps;

I liked the previous name a bit better because of the "already" that made it more descriptive imo. Totally ok to remove thunk, though.

::: js/src/asmjs/WasmSignalHandlers.cpp
@@ -1154,5 @@
> -    return HugeMemoryAccess(context, pc, faultingAddress, *instance, ppc);
> -#elif defined(JS_CODEGEN_ARM)
> -    MOZ_RELEASE_ASSERT(signal == Signal::BusError || signal == Signal::SegFault);
> -    if (signal == Signal::BusError)
> -        *ppc = instance->codeSegment().unalignedAccessCode();

Well, this is weird: on ARMv7, even where we allow dynamic unaligned accesses, this will report out-of-bounds instead of an unaligned access, which we should handle correctly at some point.  Shouldn't we revert this change?

::: js/src/jit-test/tests/asm.js/testAtomics.js
@@ +8,5 @@
>  load(libdir + "asserts.js");
>  
>  setJitCompilerOption('asmjs.atomics.enable', 1);
>  
> +const RuntimeError = WebAssembly.RuntimeError;

For the record, noting down my thoughts here: this is safe because asm.js has the same compiler-support preconditions as wasm.  Also, atomics/simd.js error reporting becomes conditional on wasm being enabled; that is safe because all of these are Nightly-only at the moment.  In the future, wasm will likely be enabled *before* atomics/simd.js (if they ever get into JS), so it will be safe then too.

::: js/src/jit-test/tests/wasm/errors.js
@@ +150,5 @@
> +for (var i = 1; i < N; i++) {
> +    assertEq(isWasmFunction(stack[i].name), true);
> +    assertEq(stack[i].line > lastLine, true);
> +    lastLine = stack[i].line;
> +    assertEq(binary[stack[i].line], i == 3 ? CallIndirectCode : CallCode);

Sweet!

::: js/src/jit-test/tests/wasm/profiling.js
@@ +38,5 @@
> +    for (let j = 1; j < array.length; j++) {
> +        if (array[i] !== array[j])
> +            array[++i] = array[j];
> +    }
> +    array.length = i + 1;

or using the shiny ES2015 features,

function removeDuplicates(array) {
  return Array.from(new Set(array));
}

::: js/src/jit/CodeGenerator.cpp
@@ +11657,5 @@
>  void
>  CodeGenerator::visitWasmTrap(LWasmTrap* lir)
>  {
>      MOZ_ASSERT(gen->compilingAsmJS());
> +    masm.jump(wasm::TrapDesc(lir->mir()->trapOffset(), lir->mir()->trap(), masm.framePushed()));

maybe put lir->mir() in a local?

::: js/src/jit/CodeGenerator.h
@@ +65,5 @@
>      ~CodeGenerator();
>  
>    public:
>      MOZ_MUST_USE bool generate();
> +    MOZ_MUST_USE bool generateWasm(wasm::SigIdDesc sigId, wasm::TrapOffset trapOffset, wasm::FuncOffsets *offsets);

nit: more than 100 chars, please split in two lines

::: js/src/jit/MCallOptimize.cpp
@@ +3869,5 @@
>          if (sign == SimdSign::NotApplicable)
>              sign = GetSimdSign(toType);
>  
>          // Possibly expand into multiple instructions.
> +        ins = MSimdConvert::AddLegalized(alloc(), current, arg, mirType, sign, wasm::TrapOffset());

trapOffset = wasm::TrapOffset() could be the default value in the static method's signature

::: js/src/jit/MacroAssembler.cpp
@@ +2869,5 @@
> +            // To solve both problems at once, the out-of-line path (far) jumps
> +            // directly to the trap exit stub. This takes advantage of the fact
> +            // that there is already a CallSite for call_indirect and the
> +            // current stack/register state already match the trap exit's
> +            // assumptions.

Nice comments in this entire function, thanks!

@@ +2891,5 @@
> +            // exit's frame so that unwinding begins at the frame and offset of
> +            // the trapping instruction.
> +            wasm::CallSiteDesc desc(site.bytecodeOffset, wasm::CallSiteDesc::TrapExit);
> +            call(desc, site.trap);
> +        }

I like the idea below of having the breakpoint. In DEBUG mode, could we add a breakpoint() right here too?

::: js/src/jit/arm/CodeGenerator-arm.cpp
@@ +552,5 @@
>          masm.as_cmp(rhs, Imm8(0));
>          if (mir->canTruncateInfinities()) {
>              if (mir->trapOnError()) {
> +                wasm::TrapDesc trap(mir->trapOffset(), wasm::Trap::IntegerDivideByZero,
> +                                    masm.framePushed());

I liked what you've done for the WasmBaselineCompiler; random idea: could we also have a CodeGenerator::trap() method, maybe looking like this, to make call sites less verbose?

template <class T>
wasm::TrapDesc trap(T* mir, wasm::Trap trap) {
  return wasm::TrapDesc(mir->trapOffset(), trap, masm.framePushed());
}

In some cases, when the MIR only has one associated wasm::Trap, it could have a wasm::Trap trap() getter (which would have the nice side-effect of propagating less information in all the CodeGenerator* files), and then we could have a variant like this:

template<class T>
wasm::TrapDesc trap(T* mir) {
  return trap(mir, mir->trap());
}

And a call site would look like this in the worst case:
masm.ma_b(trap(mir, wasm::Trap::IntegerDivideByZero), Assembler::Equal);

Or like this in the best case:
masm.ma_b(trap(mir), Assembler::Equal);

(suppressing many braces as a bonus)

@@ +2398,5 @@
>          if (type == Scalar::Int64) {
>              MOZ_ASSERT(INT64LOW_OFFSET == 0);
> +
> +            load = masm.ma_dataTransferN(IsLoad, 32, /* signed = */ false, HeapReg, ptr, output.low);
> +            masm.append(mir->access(), load.getOffset(), masm.framePushed());

Not sure why we need those here (and on x86 too)?  Is it because of the guard page optimization for small offsets?  (If so, we might need them for the unaligned loads/stores too.)

If this is because of the guard page optimization used to catch accesses with small offsets out of bounds, could we append them conditionally iff the access's offset (not trap offset) is not 0, to reduce the number of MemoryAccesses? (Probably in the append method itself.)  On !HUGE_WASM_MEMORY platforms, the bounds checks have their own TrapSite and will throw before the access is actually run.

::: js/src/jit/shared/Assembler-shared.h
@@ +771,5 @@
> +// the trap out-of-line path.
> +
> +struct TrapDesc : TrapOffset
> +{
> +    enum Kind { Jump, MemoryAccess };

enum class?

::: js/src/jit/x86-shared/CodeGenerator-x86-shared.cpp
@@ +2471,5 @@
>      Register temp = ToRegister(ins->temp());
>  
>      masm.convertFloat32x4ToInt32x4(in, out);
>  
> +    auto *ool = new(alloc()) OutOfLineSimdFloatToIntCheck(temp, in, ins, ins->mir()->trapOffset());

pre-existing: auto* ool
Attachment #8799547 - Flags: review?(bbouvier) → review+
(In reply to Lars T Hansen [:lth] from comment #15)
> (In reply to Benjamin Bouvier [:bbouvier] from comment #14)
> > Comment on attachment 8799201 [details] [diff] [review]
> > fix-simulator
> > 
> > Review of attachment 8799201 [details] [diff] [review]:
> > -----------------------------------------------------------------
> > 
> > Curious that ARM wants 8-byte loads/stores to be aligned on a 4-byte
> > boundary.
> 
> Doubleword accesses are defined on early ARMv7 to be two single-word
> accesses, i.e., doubleword loads and stores are not always access-atomic.
> (LDREXD/STREXD are different, as are 8-byte loads and stores on AArch64.)

Luke, this seems to suggest that the issue you were seeing in the simulator could actually happen in practice with a real device. Also, what happens with unaligned loads/stores, which do several accesses to simulate an atomic one?
Flags: needinfo?(luke)
(In reply to Benjamin Bouvier [:bbouvier] from comment #17)
> > Doubleword accesses are defined on early ARMv7 to be two single-word
> > accesses, i.e., doubleword loads and stores are not always access-atomic.
> > (LDREXD/STREXD are different, as are 8-byte loads and stores on AArch64.)
> 
> Luke, this seems to suggest that the issue you were seeing in the simulator
> could actually happen in practice with a real device.

So the failure here was more to do with the simulator's inner workings: we perform the first loadW, it faults and redirects the pc; then we perform the second, it faults, the handler tries to look up the (now redirected) pc, finds no MemoryAccess, and falls back to the generic out-of-bounds throw path (which provides no callstack, and thus failed jit-tests/tests/wasm/errors.js).  So the attempted fix in fix-simulator was really just preventing that; I suppose instead I could have a "did I fault?" test in-between the first and second loadW.
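That alternative would look roughly like this in the simulator (the fault flag is invented for illustration):

  // Between the two word accesses of an f64 load, bail out if the first
  // access already faulted and redirected the pc.
  int32_t lo = readW(address, instr);
  if (wasmFaultRedirected_)
      return;
  int32_t hi = readW(address + 4, instr);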

> Also, what happens
> with unaligned loads/stores, which do several accesses to simulate an atomic
> one?

Right, I forgot to mention that: since those don't create MemoryAccesses for each loading/storing instruction, they will get the generic out-of-bounds path that won't provide a stack trace.  I had meant to say that this should be a follow-up fix: add MemoryAccesses and add special tests for them.  Sounds like it depends on bug 1308996.
Flags: needinfo?(luke)
(In reply to Benjamin Bouvier [:bbouvier] from comment #16)
Great points!

> ::: js/src/jit-test/tests/asm.js/testAtomics.js
> For the record, noting down my thoughts here:
[...]
> In the future, wasm will likely be enabled *before* atomics/simd.js (if 
> they ever get into JS), so it will be safe too.

Agreed on the overall reasoning, but I'd slightly refine that to say that Atomics will most definitely get into *JS*, just not *asm.js*.  Atomics will continue to throw whatever JS says; but wasm will mostly have its own atomic operations (either via new barrier bits in the existing load/store flags immediate or via new ops for cmpxchg et al.), which will have trapping (and thus RuntimeError) semantics.

> or using the shiny ES2015 features,
> 
> function removeDuplicates(array) {
>   return Array.from(new Set(array));
> }

Ah, cool technique.  Unfortunately, we only want to remove *adjacent* duplicates, not all duplicates.  I'll rename it to removeAdjacentDuplicates, though.
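An adjacent-only variant in the same spirit (illustrative, not the patch's code):

  function removeAdjacentDuplicates(array) {
      // Keep an element only when it differs from its immediate predecessor.
      return array.filter((v, i) => i === 0 || v !== array[i - 1]);
  }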

> I liked what you've done for the WasmBaselineCompiler; random idea: could we
> also have a CodeGenerator::trap() method, maybe looking like this, to make
> call sitess less verbose?

This suggestion in particular really made things cleaner, thanks!

> Not sure why we need those here (and on x86 too)? is it because of the guard
> page optimization for small offsets? (if so, we might need them too for the
> unaligned loads/stores too).

Right and right

> If this is because of the guard page optimization used to catch accesses
> with small offsets out of bounds, could we append them conditionally iff the
> access' offset (not trap offset) is not 0, to reduce the amount of
> MemoryAccesses? (probably in the append method itself). On !HUGE_WASM_MEMORY
> platforms, the bounds check have their own TrapSite and will throw before
> the access is actually run.

Unfortunately, there's the unaligned+partially-oob case too.

> > +struct TrapDesc : TrapOffset
> > +{
> > +    enum Kind { Jump, MemoryAccess };
> 
> enum class?

For nested enums, where the containing struct already nicely scopes the names, I tend not to think the extra type-checking is worth the extra syntactic burden.  I wish one could 'using Kind' within TrapDesc to inject the enumerators and have both...
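For illustration, the tradeoff being weighed:

  struct TrapDesc : TrapOffset
  {
      // Plain nested enum: used as TrapDesc::Jump, already scoped by the
      // struct, but enumerators convert implicitly to int.
      enum Kind { Jump, MemoryAccess };
      // 'enum class Kind' would require TrapDesc::Kind::Jump at use sites:
      // stronger typing, but doubled scoping.  (C++20 later added 'using
      // enum' to inject the enumerators and get both.)
  };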
Pushed by lwagner@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/e3f5d150e1ba
Baldr: make the throw stub not be a JumpTarget (r=bbouvier)
https://hg.mozilla.org/integration/mozilla-inbound/rev/9a51acfd0094
Baldr: replace WasmTlsReg load with SymbolicAddress load in prologue/epilogue (r=bbouvier)
https://hg.mozilla.org/integration/mozilla-inbound/rev/e6d2b67b35e5
Baldr: dynamically check system page size (r=bbouvier)
https://hg.mozilla.org/integration/mozilla-inbound/rev/e29daffaaca4
Baldr: prepare CallSiteAndTarget for another Kind (r=bbouvier)
https://hg.mozilla.org/integration/mozilla-inbound/rev/33b295d58244
Baldr: rename masm 'thunk' to 'farJump' (r=bbouvier)
https://hg.mozilla.org/integration/mozilla-inbound/rev/959f1e7b26fa
Baldr: hoist wasm::MemoryAccessDesc (r=bbouvier)
https://hg.mozilla.org/integration/mozilla-inbound/rev/6b8136cb9a4d
Baldr: avoid clobbering pc in ARM simulator (r=bbouvier)
https://hg.mozilla.org/integration/mozilla-inbound/rev/860ba5468626
Baldr: provide precise trap locations and stacks (r=bbouvier)
Depends on: 1311419
Depends on: 1352500