Closed Bug 1370696 Opened 7 years ago Closed 7 years ago

Intermittent js/src/jit-test/tests/wasm/timeout/debug-noprofiling.js | Assertion failure: fp && fp->instance()->code().containsFunctionPC(pc), at js/src/vm/Stack.cpp:1701 (code -11, args "--no-baseline --no-ion")

Categories

(Core :: JavaScript Engine: JIT, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla56
Tracking Status
firefox-esr52 --- unaffected
firefox54 --- unaffected
firefox55 --- fixed
firefox56 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: bbouvier)

References

Details

(Keywords: assertion, intermittent-failure)

Attachments

(3 files, 4 obsolete files)

I've tried 3800 runs and couldn't reproduce with:

while true; do rr record --chaos ./dist/bin/js --no-baseline --no-ion /tmp/test.js || [ $? -eq 6 ] || break; done
It was the SM(arm) build in case that matters.
See Also: → 1371138
Ok, I have a theory: if we interrupt during the first few instructions of the prologue of a call_indirect that switches instance, then 'fp' can still point to the caller's frame which will have a difference instance/code than wasmResumePC (which is in the callee's code).  This is the same case handled by ProfilingFrameIterator (where it looks at which exact instruction of the prologue/epilogue and uses sp), so I think we just need to hoist the logic out so it can be used when we interrupt.

This is actually a pre-existing bug; it's just the assert that's new so if it's hitting too much, we could comment it out until we have the fix above.
First part: a way to make it 100% reproducible, by allowing a way to fake interrupts between each instruction in the ARM simulator.
Assignee: nobody → bbouvier
Status: NEW → ASSIGNED
Attachment #8875796 - Flags: review?(luke)
Comment on attachment 8875796 [details] [diff] [review]
alwaysinterrupt.patch

Review of attachment 8875796 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/wasm/WasmSignalHandlers.h
@@ +54,5 @@
> +
> +// Returns true if wasm code is on top of the activation stack (and fills out
> +// parameters in this case), or false otherwise.
> +bool
> +IsPCInWasmCode(JSContext* cx, uint8_t* pc, const CodeSegment** cs, WasmActivation** act);

There's also a wasm::InCompiledCode() in WasmFrameIterator.h.  To distinguish this one, perhaps name it wasm::InInterruptibleCode()?

Also, I think only the WasmSignalHandlers.cpp use needs a WasmActivation so I think you could boot it out.
Attachment #8875796 - Flags: review?(luke) → review+
Attached patch 2.refactor.patch (obsolete) — Splinter Review
(back to wasm after too much arewefastyet fun!)

This takes code from the WasmFrameIterator into a new function called FixupRegisterState, using new data structures as in and out arguments.

The plan is to use this function in WasmActivation::startInterrupt (passing the CallerRegisterState as an argument of startInterrupt).
Attachment #8879242 - Flags: feedback?(luke)
Comment on attachment 8879242 [details] [diff] [review]
2.refactor.patch

Just realized it's incorrect in the case of non-prologue/non-epilogue.
Attachment #8879242 - Flags: feedback?(luke)
Nice start, I like where this is going (bugs notwithstanding :).

Some naming/style nits:
 - could we reuse JS::RegisterState instead of creating wasm::FullRegisterState?
 - could the factored-out function be named wasm::StartUnwinding()?
 - could wasm::CallerRegisterState be generalized to wasm::UnwindState and contain all the outparams of wasm::StartUnwinding (incl. optional ones)?  I'm thinking that, after the iterator merging, wasm::FrameIterator would want to store a wasm::UnwindState directly.
Attached patch 2.refactor.patch (obsolete) — Splinter Review
Thanks for the feedback! The naming and your suggestion make the interface much simpler, even though StartUnwinding is still a bit hairy.
Attachment #8879242 - Attachment is obsolete: true
Attachment #8880056 - Flags: review?(luke)
Attached patch 3.signalhandlers.patch (obsolete) — Splinter Review
So now in WasmActivation::startInterrupt, we can use the StartUnwinding function. Note we might now have a different value for the actual resumePC and the one returned by StartUnwinding (if we're in the parent state), so we need to store another value of PC in the runtime: the actual resumePC, and the PC used for unwinding in the profiling iterators.

After that, the patch is pretty much mechanical; I've checked and it passes all the timeout tests (including the modified tests with always-interrupt) on x86, x64 and the arm simulator.
Attachment #8880057 - Flags: review?(luke)
Attached patch 3.fixup-interrupts.patch (obsolete) — Splinter Review
Fixed Android build + manual testing on Android.
Attachment #8880057 - Attachment is obsolete: true
Attachment #8880057 - Flags: review?(luke)
Attachment #8880444 - Flags: review?(luke)
Comment on attachment 8880056 [details] [diff] [review]
2.refactor.patch

Review of attachment 8880056 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/wasm/WasmFrameIterator.cpp
@@ +609,5 @@
>  {
> +    // Shorthands.
> +    uint8_t* pc = (uint8_t*) registers.pc;
> +    Frame* fp = (Frame*) registers.fp;
> +    void** sp = (void**) registers.sp;

Could you make these 3 variables const so it's clear there's no funny-business below?

@@ +776,5 @@
> +
> +    bool unwoundCaller;
> +    UnwindState unwindState;
> +    if (!StartUnwinding(*activation_, state, &unwindState, &unwoundCaller)) {
> +        MOZ_ASSERT(!unwindState.codeRange);

This assert doesn't seem necessary since, on failure, you wouldn't expect outparams to be set.
Attachment #8880056 - Flags: review?(luke) → review+
Comment on attachment 8880444 [details] [diff] [review]
3.fixup-interrupts.patch

Review of attachment 8880444 [details] [diff] [review]:
-----------------------------------------------------------------

Great job here!

::: js/src/vm/Stack.h
@@ +1764,3 @@
>      void finishInterrupt();
>      bool interrupted() const;
> +    void* profilingPC() const;

Since it's not just for profiling, but for all stack iteration during an interrupt, could you rename profilingPC to unwindPC (here and throughout the patch; wasmUnwindPC in other cases)?

::: js/src/wasm/WasmSignalHandlers.cpp
@@ +507,5 @@
> +}
> +#endif // XP_DARWIN
> +
> +static JS::ProfilingFrameIterator::RegisterState
> +ToRegisterState(void* pc, CONTEXT* context)

Does 'pc' need to be passed or, for regularity, could it be derived from the CONTEXT as well?
Attachment #8880444 - Flags: review?(luke) → review+
Thanks for the reviews! Addressed all the comments.

(In reply to Luke Wagner [:luke] from comment #15)
> Comment on attachment 8880056 [details] [diff] [review]
> 2.refactor.patch
> 
> Review of attachment 8880056 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: js/src/wasm/WasmFrameIterator.cpp
> @@ +609,5 @@
> >  {
> > +    // Shorthands.
> > +    uint8_t* pc = (uint8_t*) registers.pc;
> > +    Frame* fp = (Frame*) registers.fp;
> > +    void** sp = (void**) registers.sp;
> 
> Could you make these 3 variables const so it's clear there's no
> funny-business below?

Yes (the pointer has to be const, the pointed value doesn't, e.g. uint8_t* const pc = ...).
Pushed by bbouvier@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/59fa15a39162
Add an option to trigger an interrupt every single instruction in the simulator; r=luke
https://hg.mozilla.org/integration/mozilla-inbound/rev/8e9bfb53f309
Factor out register state unwinding to its own function; r=luke
https://hg.mozilla.org/integration/mozilla-inbound/rev/9ff531898139
Use register state unwinding when interrupting within wasm; r=luke
Please request Beta approval on this when you get a chance.
Flags: needinfo?(bbouvier)
Attached patch 2.refactor.patchSplinter Review
Approval Request Comment
[Feature/Bug causing the regression]: wasm
[User impact if declined]: subtle causes of crashes in wasm apps
[Is this code covered by automated tests?]: yes
[Has the fix been verified in Nightly?]: yes
[Needs manual test from QE? If yes, steps to reproduce]: no
[List of other uplifts needed for the feature/fix]: (next patch)
[Is the change risky?]: low to medium risk
[Why is the change risky/not risky?]: risky because it involves a refactoring and sensitive code touching profiling. Not too much because it's restricted to a very small part of the wasm engine.
[String changes made/needed]: n/a
Attachment #8880056 - Attachment is obsolete: true
Flags: needinfo?(bbouvier)
Attachment #8881366 - Flags: review+
Attachment #8881366 - Flags: approval-mozilla-beta?
Approval Request Comment: see previous comment

(rolled up bustage fix)
Attachment #8880444 - Attachment is obsolete: true
Attachment #8881367 - Flags: review+
Attachment #8881367 - Flags: approval-mozilla-beta?
Comment on attachment 8881366 [details] [diff] [review]
2.refactor.patch

kind of scary but let's get this in 55.0b6, we have 6 weeks to go still
Attachment #8881366 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #8881367 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Depends on: 1379430
You need to log in before you can comment on or make changes to this bug.