Closed Bug 1353763 Opened 7 years ago Closed 7 years ago

Crash [@ js::gc::IsInsideNursery]

Tracking

()

Status:

VERIFIED FIXED

Milestone:

mozilla55

Tracking Flags:

Tracking

Status

firefox-esr45

---

unaffected

firefox52

---

unaffected

firefox-esr52

---

unaffected

firefox53

---

wontfix

firefox54

---

wontfix

firefox55

---

verified

People

(Reporter: gkw, Assigned: luke)

References

Details

(Keywords: crash, sec-moderate, testcase, Whiteboard: [jsbugmon:update][adv-main55+])

Crash Data

Attachments

(3 files, 1 obsolete file)

Detailed Crash Information 7 years ago Gary Kwong [:gkw] [:nth10sd] (NOT official MoCo now) 8.64 KB, text/plain		Details
fix-interrupt-fp 7 years ago Luke Wagner [:luke] 41.28 KB, patch		Details \| Diff \| Splinter Review
make-test-faster 7 years ago Luke Wagner [:luke] 2.44 KB, patch	bbouvier : review+	Details \| Diff \| Splinter Review
fix-interrupt-fp 7 years ago Luke Wagner [:luke] 43.50 KB, patch	bbouvier : review+	Details \| Diff \| Splinter Review

Gary Kwong [:gkw] [:nth10sd] (NOT official MoCo now)

Reporter

Description

•

7 years ago

The following testcase crashes on mozilla-central revision b043233ec04f (build with --enable-debug --enable-more-deterministic, run with --fuzzing-safe --no-threads --ion-eager):

// Adapted from randomly chosen test: js/src/jit-test/tests/debug/execution-observability-03.js
p = new Proxy(this, {
    "deleteProperty": function () {
        g = newGlobal();
        g.parent = this;
        g.eval("Debugger(parent).onEnterFrame = function() {};");
        return 1;
    }
});
delete p.f;
// Adapted from randomly chosen test: js/src/jit-test/tests/wasm/timeout/1.js
timeout(0.01);
var code = wasmTextToBinary('(module(func(export "f1")(loop$top br$top)))');
new WebAssembly.Instance(new WebAssembly.Module(code)).exports.f1();


Backtrace:

#0  js::gc::IsInsideNursery (cell=0xe4e4e4e400000000) at /home/ubuntu/shell-cache/js-dbg-64-dm-linux-b043233ec04f/objdir-js/dist/include/js/HeapAPI.h:328
#1  js::gc::Cell::isTenured (this=0xe4e4e4e400000000) at js/src/gc/Heap.h:251
#2  JSObject::readBarrier (obj=0xe4e4e4e400000000) at js/src/jsobj.h:644
#3  js::InternalBarrierMethods<js::WasmInstanceObject*>::readBarrier (v=0xe4e4e4e400000000) at js/src/gc/Barrier.h:266
#4  js::ReadBarrieredBase<js::WasmInstanceObject*>::read (this=0x7f0329759068) at js/src/gc/Barrier.h:553
/snip

For detailed crash information, see attachment.

Setting s-s because GC is on the stack, as a start.

Gary Kwong [:gkw] [:nth10sd] (NOT official MoCo now)

Reporter

Comment 1

•

7 years ago

Attached file Detailed Crash Information — Details

Gary Kwong [:gkw] [:nth10sd] (NOT official MoCo now)

Reporter

Comment 2

•

7 years ago

autoBisect shows this is probably related to the following changeset:

The first bad revision is:
changeset:   https://hg.mozilla.org/mozilla-central/rev/766ead465209
user:        Yury Delendik
date:        Sat Jan 07 10:36:11 2017 -0600
summary:     Bug 1286948 - Extends AbstractFramePtr to reference wasm::DebugFrame. r=luke,shu

Yury, is bug 1286948 a likely regressor?

Blocks: 1286948

Flags: needinfo?(ydelendik)

Ryan VanderMeulen [:RyanVM]

Updated

•

7 years ago

status-firefox52: --- → unaffected

status-firefox53: --- → affected

status-firefox54: --- → affected

status-firefox-esr45: --- → unaffected

status-firefox-esr52: --- → unaffected

Daniel Veditz [:dveditz]

Comment 3

•

7 years ago

what does the 0xe4 memory poison value mean?

calling this "sec-moderate" because it looks like Debugger calls not available to the web are involved, but if the underlying bug can be hit from a normal context we should raise this to sec-high.

Keywords: sec-moderate

Eric Rahm [:erahm]

Comment 4

•

7 years ago

(In reply to Daniel Veditz [:dveditz] from comment #3)
> what does the 0xe4 memory poison value mean?

It's our junk fill [1]. Basically what you get when you malloc something but don't initialize it.

[1] http://searchfox.org/mozilla-central/rev/b8cce081200129a1307e2a682f121135b5ca1d19/memory/mozjemalloc/jemalloc.c#5107-5112

Luke Wagner [:luke]

Assignee

Comment 5

•

7 years ago

Ah hah, what's happening here is that the timeout() is asynchronously interrupting execution while wasm frames are on the stack.  Because the async interrupt doesn't set WasmActivation::exitFP, FrameIter doesn't find any frames and so the soon-to-be-popped frames are left in the debugger's frame map where they later explode.

I think the fix is for the interrupt callback to pass 'fp' to WasmHandleExecutionInterrupt() so that it can set exitFP.  This would actually strengthen our invariants about WasmActivation::exitFP and remove some async-interrupt special cases in other places.

Luke Wagner [:luke]

Assignee

Comment 6

•

7 years ago

Attached patch fix-interrupt-fp (obsolete) — Details — Splinter Review

I'm pretty happy about this patch: it removes an entire case of "interrupted and lost the stack" with a pretty aggressive assert in FrameIterator.  It also means we always get full stacks in slow-script-interrupt situations.

Assignee: nobody → luke

Attachment #8857584 - Flags: review?(bbouvier)

Luke Wagner [:luke]

Assignee

Comment 7

•

7 years ago

Attached patch make-test-faster — Details — Splinter Review

Unrelated, I saw this test timeout on non-JIT and eager runs.  Disabling in these cases allows the test to run in 7s without loss of coverage.

Attachment #8857585 - Flags: review?(bbouvier)

Luke Wagner [:luke]

Assignee

Comment 8

•

7 years ago

Actually, the new tighter assertion shows that we now depend on the builtin thunks (b/c some of them can GC, and thus iterate the stack).  Great timing on that patch Benjamin!

Blocks: 1340219

Luke Wagner [:luke]

Assignee

Comment 9

•

7 years ago

Attached patch fix-interrupt-fp — Details — Splinter Review

Testing on windows revealed that GetThreadContext() on Win64 was never returning fp; the CONTEXT_CONTROL flag had to be upgraded to CONTEXT_FULL to get fp (but only on Win64; on Win32, CONTEXT_CONtroL gives you fp...).  Yay for stronger assertions!

Attachment #8857584 - Attachment is obsolete: true

Attachment #8857584 - Flags: review?(bbouvier)

Attachment #8857666 - Flags: review?(bbouvier)

Benjamin Bouvier [:bbouvier] (inactive)

Comment 10

•

7 years ago

Comment on attachment 8857585 [details] [diff] [review]
make-test-faster

Review of attachment 8857585 [details] [diff] [review]:
-----------------------------------------------------------------

Ha, fun!

Attachment #8857585 - Flags: review?(bbouvier) → review+

Benjamin Bouvier [:bbouvier] (inactive)

Comment 11

•

7 years ago

Comment on attachment 8857666 [details] [diff] [review]
fix-interrupt-fp

Review of attachment 8857666 [details] [diff] [review]:
-----------------------------------------------------------------

So nice.

That makes me wonder about the remaining builtin calls that don't get thunked, but I think that for these we're good as well, because either they have their own exit prologue/epilogue (e.g. callimport, coerce_in_place, traps and debug trap, ), or they'll inherit the FP value of the caller (e.g. throw, OOB, unaligned accesses, and now handle interrupt with this patch). Good.

::: js/src/jit/arm/Simulator-arm.cpp
@@ +1555,5 @@
> +void
> +Simulator::handleWasmInterrupt()
> +{
> +    void* pc = reinterpret_cast<void*>(get_pc());
> +    uint8_t* fp = reinterpret_cast<uint8_t*>(get_register(r11));

s/r11/FramePointer/ ?

@@ +1580,5 @@
>      if (!act)
>          return false;
>  
>      void* pc = reinterpret_cast<void*>(get_pc());
> +    uint8_t* fp = reinterpret_cast<uint8_t*>(get_register(r11));

(same comment as above)

::: js/src/vm/Stack.cpp
@@ +1687,5 @@
> +    // interrupt-during-interrupt.
> +    MOZ_ASSERT(!interrupted());
> +    MOZ_ASSERT(compartment()->wasm.lookupCode(pc)->lookupRange(pc)->isFunction());
> +    MOZ_ASSERT(pc);
> +    MOZ_ASSERT(fp);

I'd move the last two assertions before the lookupCode assertion (since the lookupCode assertion will fail if !pc, but having the pc assertion trigger first is more explanatory).

@@ +1697,5 @@
> +void
> +WasmActivation::finishInterrupt()
> +{
> +    MOZ_ASSERT(resumePC_);
> +    MOZ_ASSERT(exitFP_);

Or just MOZ_ASSERT(interrupted())?

::: js/src/vm/Stack.h
@@ +1764,5 @@
> +    // simulator) and cleared by WasmHandleExecutionInterrupt or WasmHandleThrow
> +    // when the interrupt is handled.
> +    void startInterrupt(void* pc, uint8_t* fp);
> +    void finishInterrupt();
> +    bool interrupted() const { return !!resumePC_; }

Would it make sense that in the dtor of WasmActivation, we'd assert !interrupted()? (although the interrupt stub handler doesn't reset the state, so i'm not sure it holds as is)

::: js/src/wasm/WasmFrameIterator.cpp
@@ -235,4 @@
>      // Only non-imported functions can have debug frames.
>      return code_->metadata().debugEnabled &&
> -           fp_ &&
> -           !missingFrameMessage_ &&

I like these simplifications!

::: js/src/wasm/WasmSignalHandlers.cpp
@@ +843,1 @@
>          // to InterruptRunningCode's use of SuspendThread. When this happens,

pre-existing: InterruptRunningCode has changed name. There's a second occurrence above.

Attachment #8857666 - Flags: review?(bbouvier) → review+

Benjamin Bouvier [:bbouvier] (inactive)

Updated

•

7 years ago

Flags: needinfo?(ydelendik)

Luke Wagner [:luke]

Assignee

Comment 12

•

7 years ago

(In reply to Benjamin Bouvier [:bbouvier] from comment #11)
Thanks!

> s/r11/FramePointer/ ?

Unfortunately, the ARM simulator has its own set of enums for registers so FramePointer isn't compatible.

> Would it make sense that in the dtor of WasmActivation, we'd assert
> !interrupted()? (although the interrupt stub handler doesn't reset the
> state, so i'm not sure it holds as is)

Technically we already do (by asserting resumePC_ == null).  The interrupt state is always cleared on both the continue and stop paths (by WasmHandleExecutionInterrupt and WasmHandleThrow, resp.).  I can change to !interrupted(), though.

Luke Wagner [:luke]

Assignee

Comment 13

•

7 years ago

Landed:
  https://hg.mozilla.org/integration/mozilla-inbound/rev/f90707333e7f6daaaf364bc61890ba2b1849199d
  https://hg.mozilla.org/integration/mozilla-inbound/rev/20e85b5bd3429c3e7386b56fe6c6056d52219eb0

This created an intermittent win8 failure of asm.js/testBug1117255.js that I had missed in try and local runs.  Reducing the shell's watchdog timer interval from .1s to 1ms makes it reproduce easily, though.  With printf logging I can see a most bizarre sequence:
 1. the watchdog interrupts the thread at PC x and attempts to redirect to PC y
 2. an OOB fault at PC x fires (probably it had already fired internally before step 1, but hadn't bubbled up to userspace yet)
 3. the OOB handler handles successfully by redirecting PC and this *clobbers* the redirect in made in step 1
 4. now WasmActivation::interrupt() is left 'true' even though the interrupt handler will never be called (b/c of step 3)

Fortunately, just like the other weird case in the Windows handler, this is precisely detectable and has a simple fix (once you understand the madness):
  https://hg.mozilla.org/integration/mozilla-inbound/rev/b411de2683e99d545e82f55113a0f0dc2920808b

Ryan VanderMeulen [:RyanVM]

Comment 14

•

7 years ago

https://hg.mozilla.org/mozilla-central/rev/20e85b5bd342
https://hg.mozilla.org/mozilla-central/rev/f90707333e7f
https://hg.mozilla.org/mozilla-central/rev/b411de2683e9

Status: NEW → RESOLVED

Closed: 7 years ago

status-firefox55: affected → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla55

Ryan VanderMeulen [:RyanVM]

Comment 15

•

7 years ago

Looks like we'll want to uplift this to 54?

status-firefox53: affected → wontfix

Flags: needinfo?(luke)

Luke Wagner [:luke]

Assignee

Comment 16

•

7 years ago

IIUC, the bug can only be activated by the debugger combined with the slow-script dialog.  I'm not sure what the policy is for rare devtools-only crashes, but I'd guess this didn't require uplift and letting the fix ride the trains would be fine.

Flags: needinfo?(luke)

Ryan VanderMeulen [:RyanVM]

Comment 17

•

7 years ago

Good enough for me.

status-firefox54: affected → wontfix

Daniel Veditz [:dveditz]

Updated

•

7 years ago

Group: javascript-core-security → core-security-release

Fuzzing Team

Updated

•

7 years ago

Status: RESOLVED → VERIFIED

status-firefox55: fixed → verified

Fuzzing Team

Comment 18

•

7 years ago

JSBugMon: This bug has been automatically verified fixed.

Al Billings [:abillings - ex-MoCo]

Updated

•

7 years ago

Whiteboard: [jsbugmon:update] → [jsbugmon:update][adv-main55+]

Daniel Veditz [:dveditz]

Updated

•

6 years ago

Group: core-security-release

David Lawrence [:dkl]

Updated

•

5 months ago

Keywords: bugmon

You need to log in before you can comment on or make changes to this bug.