Closed Bug 1367871 Opened 8 years ago Closed 8 years ago

Assertion failure: funcIndex >= AsmJSFirstDefFuncIndex, at js/src/wasm/AsmJS.cpp:372

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla55

Tracking Flags:

Tracking

Status

firefox-esr52

---

unaffected

firefox53

---

unaffected

firefox54

---

unaffected

firefox55

---

fixed

People

(Reporter: gkw, Assigned: bbouvier)

References

Details

(Keywords: assertion, bugmon, testcase, Whiteboard: [jsbugmon:])

Attachments

(3 files, 1 obsolete file)

Detailed Crash Information 8 years ago Gary Kwong [:gkw] [:nth10sd] (NOT official MoCo now) 10.73 KB, text/plain		Details
Testcase 8 years ago Gary Kwong [:gkw] [:nth10sd] (NOT official MoCo now) 2.89 MB, text/plain		Details
fix.patch 8 years ago Benjamin Bouvier [:bbouvier] (inactive) 4.79 KB, patch		Details \| Diff \| Splinter Review
fix.patch 8 years ago Benjamin Bouvier [:bbouvier] (inactive) 2.97 KB, patch	luke : review+	Details \| Diff \| Splinter Review

Gary Kwong [:gkw] [:nth10sd] (NOT official MoCo now)

Reporter

Description

•

8 years ago

The following testcase crashes on mozilla-central revision f81bcc23d37d (build with --enable-debug --enable-more-deterministic, run with --fuzzing-safe --no-threads --no-baseline --no-ion): See attachment. Backtrace: #0 0x0000000000d02000 in js::AsmJSMetadata::getFuncName (this=this@entry=0x7f9ed0618400, maybeBytecode=<optimized out>, funcIndex=funcIndex@entry=0, name=name@entry=0x7fff96b64fe0) at js/src/wasm/AsmJS.cpp:372 #1 0x0000000000d697e6 in js::wasm::Instance::getFuncName (name=0x7fff96b64fe0, funcIndex=0, this=<optimized out>) at js/src/wasm/WasmInstance.cpp:775 #2 js::wasm::Instance::getFuncAtom (this=<optimized out>, cx=0x7f9ed0672000, funcIndex=0) at js/src/wasm/WasmInstance.cpp:782 #3 0x0000000000d698c5 in js::wasm::FrameIterator::functionDisplayAtom (this=this@entry=0x7fff96b65720) at js/src/wasm/WasmFrameIterator.cpp:177 #4 0x0000000000bdf810 in js::FrameIter::functionDisplayAtom (this=this@entry=0x7fff96b65690) at js/src/vm/Stack.cpp:834 #5 0x0000000000bca1a0 in js::SavedStacks::insertFrames(JSContext*, js::FrameIter&, JS::MutableHandle<js::SavedFrame*>, mozilla::Variant<JS::AllFrames, JS::MaxFrames, JS::FirstSubsumedFrame>&&) (this=this@entry=0x7f9ed064e0f8, cx=cx@entry=0x7f9ed0672000, iter=..., frame=..., frame@entry=..., capture=capture@entry=<unknown type in /home/gkwubu/shell-cache/js-dbg-64-dm-linux-f81bcc23d37d/js-dbg-64-dm-linux-f81bcc23d37d, CU 0x4d7d1f7, DIE 0x4f98adc>) at js/src/vm/SavedStacks.cpp:1352 /snip For detailed crash information, see attachment. This is a reproducible/reliable testcase, but hard to reduce.

Gary Kwong [:gkw] [:nth10sd] (NOT official MoCo now)

Reporter

Comment 1

•

8 years ago

Attached file Detailed Crash Information — Details

Gary Kwong [:gkw] [:nth10sd] (NOT official MoCo now)

Reporter

Comment 2

•

8 years ago

Attached file Testcase — Details

Gary Kwong [:gkw] [:nth10sd] (NOT official MoCo now)

Reporter

Comment 3

•

8 years ago

autoBisect shows this is probably related to the following changeset: The first bad revision is: changeset: https://hg.mozilla.org/mozilla-central/rev/766ead465209 user: Yury Delendik date: Sat Jan 07 10:36:11 2017 -0600 summary: Bug 1286948 - Extends AbstractFramePtr to reference wasm::DebugFrame. r=luke,shu Yury/Benjamin, is bug 1286948 a likely regressor? (Note that line 982 might be the interesting portion of the testcase, reproduced below) s = "timeout(0.01);var code=wasmTextToBinary(`(module(func(export\"f1\")(loop$top br$top)))`);new WebAssembly.Instance(new WebAssembly.Module(code)).exports.f1()"

Blocks: 1286948

Flags: needinfo?(ydelendik)

Flags: needinfo?(bbouvier)

Fuzzing Team

Updated

•

8 years ago

Whiteboard: [jsbugmon:update] → [jsbugmon:]

Fuzzing Team

Comment 4

•

8 years ago

JSBugMon: Cannot process bug: Error: Failed to isolate test from comment

Benjamin Bouvier [:bbouvier] (inactive)

Assignee

Comment 5

•

8 years ago

Looking. So far, I can only reproduce with --enable-deterministic mode, fwiw.

Benjamin Bouvier [:bbouvier] (inactive)

Assignee

Comment 6

•

8 years ago

After some time spent in the debugger and rr, found how this issue could arise and made a smaller test case which reproduces also in non-deterministic mode: function asmJS(glob, ffi) { "use asm"; var out = ffi.out; function f() { out(); } return f; } var ffi = {}; ffi.out = new WebAssembly.Instance(new WebAssembly.Module(wasmTextToBinary('(module (func (export "f") (loop $top (br $top))))'))).exports.f; timeout(0.1); asmJS(this, ffi)(); The stack frame iterator is using the activation->resumePC() (which is just runtime->wasmResumePC(), common to all the wasm::Activation) to recover the Code/CodeRange, but when we're in the asm.js activation, the value of resumePC is incorrect! (it's set to the wasm infinite loop's body) Thus we're getting the wasm Code/CodeRange values, and the function index for the first wasm function (=== 0) is less than AsmJSFirstFuncDef, hence this assertion. Luke, this suggests that the resumePC should actually remain in the WasmActivation, right?

Flags: needinfo?(ydelendik)

Flags: needinfo?(luke)

Flags: needinfo?(bbouvier)

Benjamin Bouvier [:bbouvier] (inactive)

Assignee

Comment 7

•

8 years ago

(duh, bugzilla chose to remove my needinfo on my behalf :/)

Flags: needinfo?(bbouvier)

Luke Wagner [:luke]

Comment 8

•

8 years ago

Ah hah, I think the problem is when there are two WasmActivations on the stack for difference instances. After bug 1360263, WasmActivation::interrupted() just forwards to JSRuntime::wasmResumePC() so both WasmActivations will think they are interrupted when only the inner activation actually is. For the outer WasmActivation, JSRuntime::wasmResumePC will then be used to lookup the Code/CodeRange which will not match fp->instance->code. I think the fix is to make WasmActivation::interrupted() only say 'true' for the innermost; once way would be to have it do a search from cx->activation() down to the first act where act->isWasm() and check whether act == this. Also, it'd be good to assert fp->instance->code == code_ in FrameIterator's ctor.

Flags: needinfo?(luke)

Benjamin Bouvier [:bbouvier] (inactive)

Assignee

Comment 9

•

8 years ago

This fix is not enough, but I am not too sure to understand why: in the jit-tests, wasm/timeout/2.js asserts (no code_ in a WasmFrameIterator ctor): http://searchfox.org/mozilla-central/source/js/src/jit-test/tests/wasm/timeout/2.js I wonder if this is that the gc() after the timeout has caused the ilooping Instance to be deregistered from the wasm compartment, causing the lookupCode at the compartment level to fail, because the resumePC is still in the ilooping Instance? I think the interrupted() function should also check that the resumePC is actually in the activation's instance code. With a bit of refactoring, code_ in WasmFrameIterator can be inferred from the interrupted() call.

Flags: needinfo?(bbouvier)

Benjamin Bouvier [:bbouvier] (inactive)

Assignee

Comment 10

•

8 years ago

Attached patch fix.patch (obsolete) — Details — Splinter Review

Attempt at fix described above.

Assignee: nobody → bbouvier

Status: NEW → ASSIGNED

Attachment #8872692 - Flags: review?(luke)

Luke Wagner [:luke]

Comment 11

•

8 years ago

Comment on attachment 8872692 [details] [diff] [review] fix.patch Review of attachment 8872692 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/vm/Stack.cpp @@ +1710,5 @@ > + continue; > + > + WasmActivation* act = iter->asWasm(); > + if (act != this) > + continue; Hmm, I think what you want to say here is just 'return false'. Also, I wouldn't try to optimize the lookup of code by returning pcode. Rather, I'd just keep this a simple boolean "is this wasm activation interrupted?" query.

Attachment #8872692 - Flags: review?(luke)

Benjamin Bouvier [:bbouvier] (inactive)

Assignee

Comment 12

•

8 years ago

Attached patch fix.patch — Details — Splinter Review

Attachment #8872692 - Attachment is obsolete: true

Attachment #8872707 - Flags: review?(luke)

Luke Wagner [:luke]

Comment 13

•

8 years ago

Comment on attachment 8872707 [details] [diff] [review] fix.patch Review of attachment 8872707 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/vm/Stack.cpp @@ +1713,5 @@ > + if (act != this) > + return false; > + > + wasm::Frame* fp = act->exitFP(); > + return fp && fp->instance()->code().containsFunctionPC(pc); Can you change this to an assert? Because the meaning of !interrupted() is that the WasmActivation exited "normally" and that's not the case if this test fails.

Attachment #8872707 - Flags: review?(luke) → review+

Pulsebot

Comment 14

•

8 years ago

Pushed by bbouvier@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/8f6a8013e62c Loop over the wasm activations to find the one that's actually interrupted; r=luke

Benjamin Bouvier [:bbouvier] (inactive)

Assignee

Comment 15

•

8 years ago

Probably more likely to be caused by bug 1360263. I've checked Aurora and the assertion doesn't trigger there.

Blocks: 1360263
No longer blocks: 1286948

status-firefox53: --- → unaffected

status-firefox54: --- → unaffected

Pulsebot

Comment 16

•

8 years ago

Backout by ryanvm@gmail.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/207e97b313e5 Backed out changeset 8f6a8013e62c for Stack.cpp asserts on a CLOSED TREE.

Benjamin Bouvier [:bbouvier] (inactive)

Assignee

Comment 17

•

8 years ago

This caused https://treeherder.mozilla.org/logviewer.html#?job_id=103431591&repo=mozilla-inbound&lineNumber=3286 and the patch got backed out (thanks RyanVM). I've noticed that wasm::MaybeActiveActivation was doing the same as my manual loop with ActivationIterator (without the cx check), so I could replace the whole chunk with: auto* act = wasm::MaybeActiveActivation(cx_); if (act != this) return false; MOZ_ASSERT(fp && fp->instance()->code().containsFunctionPC(pc)); return true; Unfortunately, this now breaks wasm/timeout/2.js, because fp->instance() is nullptr when act == this... I have no idea why, especially since the two codes seem semantically equivalent. Luke, any idea why?

Flags: needinfo?(luke)

Benjamin Bouvier [:bbouvier] (inactive)

Assignee

Comment 18

•

8 years ago

Wait, the assert (at the origin of the backout) only triggered for windows builds?

Luke Wagner [:luke]

Comment 19

•

8 years ago

That's strange. So you're saying that fp is non-null, fp->tls is non-null but fp->tls->instance is null? That's super-weird b/c TlsData::instance should be simply initialized to 'this' in the Instance ctor. Maybe the nullness is elsehwere? Also, could you confirm that the wasmResumePC is in the instance with the iloop and actually within that iloop?

Flags: needinfo?(luke)

Benjamin Bouvier [:bbouvier] (inactive)

Assignee

Comment 20

•

8 years ago

(In reply to Luke Wagner [:luke] from comment #19) > That's strange. So you're saying that fp is non-null, fp->tls is non-null > but fp->tls->instance is null? Correct, given I modify the code as explained in comment 17. Otherwise this doesn't show up in a linux build... > That's super-weird b/c TlsData::instance > should be simply initialized to 'this' in the Instance ctor. Maybe the > nullness is elsehwere? Will investigate a bit more. > Also, could you confirm that the wasmResumePC is in the instance with the > iloop and actually within that iloop? Yes to all of this.

Benjamin Bouvier [:bbouvier] (inactive)

Assignee

Comment 21

•

8 years ago

(In reply to Benjamin Bouvier [:bbouvier] from comment #20) > Will investigate a bit more. So, fun story: MaybeActiveActivation returns a WasmActivation iff it's at the top or below N JitActivations. The change suggested in comment 17 was causing a crash in wasm/timeout/2, because of a series of events. wasm/timeout/2.js does the following: an instance calls into another instance (of another module), the callee being an infinite loop. There's a timeout function, which final callback does a gc() before continuing. - in the final callback, a gc() call happens (we're in JS, so the top activaiton is an interpreter one). - we trace wasm activations, using a wasm frame iterator - the wasm frame iterator finds the first activation, calls interrupted() - because of the change in comment 17, MaybeActiveActivation would return nullptr (top activation is an interpreter activation). - so the innermost wasm frame would think it's not the interrupted one, and be popped in the wasm frame iterator - since the inner frame has been skipped, the ilooping callee instance is not marked - then it's gc'd - after the final timeout callback, the JS interpreter tries to make a stack trace, using another wasm frame iterator - now it can figure that the innermost wasm frame is the interrupted one (no more interpreter frame above) - but its instance has been gc'd! hence the crash. TL;DR: we should rename MaybeActiveActivation to MaybeActiveTopActivation, or MaybeActiveInnermostActivation, or something that refers to skipping the JitActivation, I don't know. Now, back to the Windows issue.

Luke Wagner [:luke]

Comment 22

•

8 years ago

Wow, good job detectiving that. Agreed on renaming.

Benjamin Bouvier [:bbouvier] (inactive)

Assignee

Comment 23

•

8 years ago

Aha! When scheduling an interrupt on windows, we don't switch thread but actually use the current thread [1] (watchdog thread, in the JS shell case), which never initializes TlsContext. The fix is as easy as mimicking the unrolled ActivationIterator loop, without the cx == TlsContext.get() assertion. [1] http://searchfox.org/mozilla-central/source/js/src/wasm/WasmSignalHandlers.cpp#1588-1591

Pulsebot

Comment 24

•

8 years ago

Pushed by bbouvier@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/db0bd0d1b8fd Loop over the wasm activations to find the one that's actually interrupted; r=luke

Ryan VanderMeulen [:RyanVM]

Comment 25

•

8 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/db0bd0d1b8fd

Status: ASSIGNED → RESOLVED

Closed: 8 years ago

status-firefox55: affected → fixed

Flags: in-testsuite+

Resolution: --- → FIXED

Target Milestone: --- → mozilla55

Ryan VanderMeulen [:RyanVM]

Updated

•

8 years ago

status-firefox-esr52: --- → unaffected

You need to log in before you can comment on or make changes to this bug.