Closed Bug 1480390 Opened 6 years ago Closed 5 years ago

Crash [@ js::jit::MachineState::read] with Debugger and OOM

Categories

(Core :: JavaScript Engine, defect, P1)

x86_64
Linux
defect

Tracking

()

RESOLVED FIXED
mozilla66
Tracking Status
firefox-esr60 --- wontfix
firefox63 --- wontfix
firefox64 --- wontfix
firefox65 --- wontfix
firefox66 --- fixed

People

(Reporter: decoder, Assigned: iain)

References

(Blocks 1 open bug)

Details

(4 keywords, Whiteboard: [jsbugmon:update])

Crash Data

Attachments

(5 files, 3 obsolete files)

The following testcase crashes on mozilla-central revision bd79b07f57a3 (build with --enable-posix-nspr-emulation --enable-valgrind --enable-gczeal --disable-tests --disable-profiling --enable-debug --enable-optimize, run with --fuzzing-safe):

var g = newGlobal();
g.parent = this;
g.eval("new Debugger(parent).onExceptionUnwind = function () {};");
oomTest(new Function(`
  function* wrapNoThrow() {
    let iter = {
      [Symbol.iterator]() {
        return this;
      },
      next() {
        return { value: 10, done: false };
      },
      return() {}
    };
    for (const i of iter)
      yield i;
  }
  for (const i of wrapNoThrow()) break;
`));


Backtrace:

received signal SIGSEGV, Segmentation fault.
0x00000000008164e0 in js::jit::MachineState::read (reg=..., this=<optimized out>) at js/src/jit/Registers.h:322
#0  0x00000000008164e0 in js::jit::MachineState::read (reg=..., this=<optimized out>) at js/src/jit/Registers.h:322
#1  js::jit::SnapshotIterator::fromRegister (this=0x7fffffffb270, reg=...) at js/src/jit/JSJitFrameIter.h:442
#2  js::jit::SnapshotIterator::allocationValue (this=this@entry=0x7fffffffb270, alloc=..., rm=rm@entry=js::jit::SnapshotIterator::RM_Normal) at js/src/jit/JitFrames.cpp:1740
#3  0x0000000000954209 in js::jit::SnapshotIterator::read (this=this@entry=0x7fffffffb270) at js/src/jit/JSJitFrameIter.h:563
#4  0x00000000010995aa in InitFromBailout (excInfo=0x7fffffffb800, nextCallee=..., startFrameFormals=..., builder=..., invalidate=true, iter=..., script=..., fun=..., frameNo=0, cx=<optimized out>) at js/src/jit/BaselineBailouts.cpp:960
#5  js::jit::BailoutIonToBaseline (cx=<optimized out>, cx@entry=0x7ffff5f17000, activation=<optimized out>, iter=..., invalidate=invalidate@entry=true, bailoutInfo=bailoutInfo@entry=0x7fffffffb398, excInfo=excInfo@entry=0x7fffffffb800) at js/src/jit/BaselineBailouts.cpp:1650
#6  0x000000000109cbcc in js::jit::ExceptionHandlerBailout (cx=cx@entry=0x7ffff5f17000, frame=..., rfe=rfe@entry=0x7fffffffbd58, excInfo=..., overrecursed=overrecursed@entry=0x7fffffffb6e7) at js/src/jit/Bailouts.cpp:217
#7  0x000000000081e3a7 in js::jit::HandleExceptionIon (overrecursed=0x7fffffffb6e7, rfe=0x7fffffffbd58, frame=..., cx=0x7ffff5f17000) at js/src/jit/JitFrames.cpp:204
#8  js::jit::HandleException (rfe=<optimized out>) at js/src/jit/JitFrames.cpp:702
#9  0x000021aaac7303d6 in ?? ()
[...]
#17 0x0000000000000000 in ?? ()
rax	0x101	257
rbx	0x7fffffffb270	140737488335472
rcx	0x1352f10	20262672
rdx	0x7fffffffb400	140737488335872
rsi	0x7fffffffadf4	140737488334324
rdi	0x6	6
rbp	0x7fffffffade0	140737488334304
rsp	0x7fffffffada0	140737488334240
r8	0x1352f40	20262720
r9	0xfff9800000000000	-1829587348619264
r10	0x7ffff5f20e00	140737319669248
r11	0x7ffff5f20f80	140737319669632
r12	0x6	6
r13	0x1	1
r14	0x7fffffffb800	140737488336896
r15	0x7fffffffb270	140737488335472
rip	0x8164e0 <js::jit::SnapshotIterator::allocationValue(js::jit::RValueAllocation const&, js::jit::SnapshotIterator::ReadMethod)+688>
=> 0x8164e0 <js::jit::SnapshotIterator::allocationValue(js::jit::RValueAllocation const&, js::jit::SnapshotIterator::ReadMethod)+688>:	mov    (%rax),%rax
   0x8164e3 <js::jit::SnapshotIterator::allocationValue(js::jit::RValueAllocation const&, js::jit::SnapshotIterator::ReadMethod)+691>:	jmpq   0x8162a8 <js::jit::SnapshotIterator::allocationValue(js::jit::RValueAllocation const&, js::jit::SnapshotIterator::ReadMethod)+120>
I should be the default person to investigate these issues unless jsbugmon blames someone else.
Flags: needinfo?(nicolas.b.pierron)
Whiteboard: [jsbugmon:update,bisect] → [jsbugmon:update]
JSBugMon: Bisection requested, result:
autoBisect shows this is probably related to the following changeset:

The first bad revision is:
changeset:   https://hg.mozilla.org/mozilla-central/rev/20da293c7656
user:        Emanuel Hoogeveen
date:        Wed Jul 26 08:53:00 2017 -0400
summary:     Bug 1342023 - Part 1: Remove ProtectedReallocPolicy from PageProtectingVector. r=jandem

This iteration took 265.611 seconds to run.
Per triage team - Hello Nicolas - Have we been able to investigate this? Thanks.
Emanuel, is bug 1342023 a likely regressor?
Flags: needinfo?(emanuel.hoogeveen)
No, that just removed some extra allocations we were doing to check for memory corruption during realloc. These allocations were only temporary and contained to a single function, so aside from additional OOM risk the only differences caused by them should be timing differences (e.g. taking longer to allocate in one thread compared to another). Perhaps they were masking a pre-existing race condition.
Flags: needinfo?(emanuel.hoogeveen)
If it helps, a small patch that just changes ProtectedReallocPolicy to SystemAllocPolicy in js/src/jit/x86-shared/AssemblerBuffer-x86-shared.h should do the same thing as that patch (the rest was just code removal).
The problem seems to be an issue in IonMonkey.

What happens is that we have an empty MachineState, with fixed (0x100 + reg-index) pointers causing the crash when we are trying to load a register which was not enumerated in the Safepoint (which is strangely empty).
The problem comes from HasLiveStackValueAtDepth [1] which returns "true" for reading the return value instead of "false" while processing an exception.

[1] https://searchfox.org/mozilla-central/rev/f2ac80ab7dbde5400a3400d463e07331194dec94/js/src/jit/BaselineBailouts.cpp#968
Flags: needinfo?(nicolas.b.pierron)
Priority: -- → P1
Flags: needinfo?(nicolas.b.pierron)
Can you help us find someone to take this P1?
Flags: needinfo?(sdetar)
Nicolas is the correct person to look at this, and should do so next week.  I would NI him, but he is already NI'd.
Flags: needinfo?(sdetar)
Assigning to Nicolas as per comment #10
Assignee: nobody → nicolas.b.pierron
Iain, would you like to look at this possibly-OOM issue?
Flags: needinfo?(iireland)
It turns out that this actually has nothing to do with OOM. The only thing oomTest does here is run the code often enough to get us into Ion. Here's an oom-less testcase that triggers the same bug (with --ion-eager and --ionoffthread-compile=off)

```
var g = newGlobal();
g.parent = this;
g.eval("new Debugger(parent).onExceptionUnwind = function () {};");
function* wrapNoThrow() {
    let iter = {
      [Symbol.iterator]() {
        return this;
      },
      next() {
        return { value: 10, done: false };
      },
      return() { return "invalid return value"; }
    };
    for (const i of iter)
      yield i;
  }
function foo() {
    for (const i of wrapNoThrow()) break;
}
try {
    foo();
} catch(e) {}
foo();
```

A rough outline of the bug: we ion-compile foo, but the generator code has to stay at baseline because Ion doesn't support generators. Inside of foo, we break out of the for-of loop. This calls .return() on the iterator. The implementation of return for our custom iterator returns a nonsense value, which throws an exception. (Step 9 here: https://tc39.github.io/ecma262/#sec-iteratorclose). At this point, the stack contains a baseline frame (wrapNoThrow().return()) on top of an ion frame (foo).

The exception triggers the debugger. onExceptionUnwind fires, which triggers the recompilation of the baseline script. 

Eventually the exception propagates out of the baseline frame back to the ion frame, which triggers a bailout to baseline. (The eventual strategy is to throw the exception in that baseline frame: see https://searchfox.org/mozilla-central/source/js/src/jit/JitFrames.cpp#195-211) 

While building the baseline frame, we hit this code: https://searchfox.org/mozilla-central/source/js/src/jit/BaselineBailouts.cpp#1007-1023. This code fixes up the expression stack in cases where we bail out while there are still values there. As far as I can tell, the problem is that HasLiveStackValueAtDepth notices that we are in the middle of a for-of (which would normally leave three values on the stack), but doesn't notice that we are in the process of breaking out, which leaves the stack in a different state. (See https://searchfox.org/mozilla-central/source/js/src/frontend/ForOfLoopControl.cpp#169-228)

I'm still trying to work out where the fix goes.
Flags: needinfo?(iireland)
Notable: I tried adding an assertion just before the switch in HasLiveStackValueAtDepth (https://searchfox.org/mozilla-central/source/js/src/jit/BaselineBailouts.cpp#507), and it appears that prior to this bug we didn't have a single testcase that reached the switch for a trynote with kind JSTRY_FOR_OF_ITERCLOSE.

Based on this, my proposed fix: in HaveLiveStackValueAtDepth, seeing a for-of-iterclose trynote should override the for-of trynote it's nested in.

It's insufficient to just return false if we see for-of-iterclose, because one for-of loop can be nested inside another, and we still need to recover the stack values for the outer loop. Instead, my current approach sets a flag if we see a for-of-iterclose trynote to skip the enclosing for-of trynote, which should be the next one we see at this pc offset. (This relies on trynotes being topologically sorted, with outer scopes coming after inner scopes. This seems to be the case in practice but is not documented anywhere.)

The whole thing feels like a bit of a hack, but I'm not sure what else we can do without overhauling the entire system.
Attachment #9029012 - Flags: feedback?(arai.unmht)
Assignee: nicolas.b.pierron → iireland
Gary: This code looks kind of fragile. Once we've landed a fix for this bug, I'd like to point the fuzzer at this general area of code and see if anything else turns up. Specifically, this template is a good way to trigger the code in question:

```
var g = newGlobal();
g.parent = this;
g.eval("new Debugger(parent).onExceptionUnwind = function () {};");
function* wrapNoThrow() {
    let iter = {
      [Symbol.iterator]() {
        return this;
      },
      next() {
        return { value: 10, done: false };
      },
      return() { return "invalid return value"; }
    };
    for (const i of iter)
      yield i;
  }
[...more code goes here...]
for (const i of wrapNoThrow()) break;
[...more code goes here...]
```

I'm particularly interested in: try/catch/finally, loops (for, for-in, and for-of), break, and array destructuring ([a, b, ...rest] = [10, 20, 30, 40, 50]).

Is it possible for you to steer the fuzzer in this general direction?

PS: The code I want to stress-test handles bailouts from ion to baseline, so ion-eager would probably maximize our chances of finding something.
Flags: needinfo?(nicolas.b.pierron) → needinfo?(nth10sd)
As long as you land the testcase, :decoder and I will have fuzzers that import/interleave/mutate the testcases.
Flags: needinfo?(nth10sd)
Attachment #9029012 - Flags: feedback?(arai.unmht) → feedback+
Depends on D14784
Attachment #9029012 - Attachment is obsolete: true
Attachment #9031966 - Attachment is obsolete: true
Attachment #9032000 - Attachment description: Bug 1480390: Rename TRY_DESTRUCTURING_ITERCLOSE to TRY_DESTRUCTURING r?tcampbell → Bug 1480390: Rename TRY_DESTRUCTURING_ITERCLOSE to TRY_DESTRUCTURING to standardize naming conventions r?tcampbell
Pushed by iireland@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/d30b4fd63e17
Rename TRY_DESTRUCTURING_ITERCLOSE to TRY_DESTRUCTURING to standardize naming conventions r=tcampbell
https://hg.mozilla.org/mozilla-central/rev/d30b4fd63e17
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla66
Should this have been marked leave-open? Would appear that only one cleanup patch landed.
Flags: needinfo?(iireland)
Yes, sorry. Thanks for catching that.
Status: RESOLVED → REOPENED
Flags: needinfo?(iireland)
Keywords: leave-open
Resolution: FIXED → ---
Target Milestone: mozilla66 → ---

Update: every use of TryNoteIter has special logic to handle for-of-iterclose. The obvious thing to do is to move all that logic inside the iterator itself, and then use TryNoteIter in HasLiveStackValueAtDepth instead of handrolling the try note iteration.

While working on this, I found a testcase where we catch an exception in the wrong place. I set it aside to look into later, only to discover that for some reason moving the for-of-iterclose logic inside the iterator fixed the bug. It took me a while to figure out the explanation, but I believe I now understand why.

We have two separate mechanisms for skipping try-notes during abnormal returns: first, second.

The first of these mechanisms handles for-of-iterclose by setting a flag when we see a for-of-iterclose trynote, and ignoring all subsequent trynotes until we see a matching for-of.

The second mechanism is hidden inside the iterator itself. It handles the case when we break/return out of a for-in loop. We inline the code to close all outstanding iterators and invoke all finally blocks; if we throw an exception from there, we don't want to invoke that code again. To avoid this, the second mechanism uses the stack depth to filter out trynotes that have already been handled.

The bug that I discovered in my testcase was due to an interaction between these two mechanisms. With the correct arrangement of nested for-of loops and broken iterators, we can end up in a situation where the enclosing for-of to which mechanism #1 is trying to skip is never returned by the iterator because the stack depth is wrong, causing mechanism #2 to skip right past a valid catch block.

By moving the for-of-iterclose logic into TryNoteIter, I inadvertently fixed this bug, because the for-of-iterclose logic now occurs before the stack depth filtering.

It feels wrong to have two separate mechanisms, and in the long run we should rework this whole mess. In the short run, this fix is better than nothing.

Pushed by iireland@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/025feea5945b
Move ForOfIterClose logic inside TryNoteIter r=tcampbell
Pushed by iireland@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/b3d4bff86800
Update comment for JSTryNoteKind r=tcampbell
https://hg.mozilla.org/integration/autoland/rev/e5caca48603e
Replace TryNoteIter template op with a more general filter op r=tcampbell
Whiteboard: [jsbugmon:update] → [jsbugmon:update,ignore]
JSBugMon: The testcase found in this bug no longer reproduces (tried revision 3dc7d345da52).
Pushed by iireland@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/627f1def8aeb
Use TryNoteIter in HasLiveStackValueAtDepth r=tcampbell
Whiteboard: [jsbugmon:update,ignore] → [jsbugmon:update]

The bustage occurred because changes were made in bug 1518753 (after my last try build) which tightened up same-compartment realms. The fuzz testcase I added was therefore no longer valid.

I've added the mystical incantation (newGlobal() -> newGlobal({newCompartment: true})), so everything should be better now.

Flags: needinfo?(iireland)
Whiteboard: [jsbugmon:update] → [jsbugmon:update,ignore]
JSBugMon: The testcase found in this bug no longer reproduces (tried revision e49161da5784).
Pushed by iireland@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/83d4a8e7b551
Use TryNoteIter in HasLiveStackValueAtDepth r=jandem
Keywords: leave-open
Attachment #9031972 - Attachment is obsolete: true
Status: REOPENED → RESOLVED
Closed: 5 years ago5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla66
Whiteboard: [jsbugmon:update,ignore] → [jsbugmon:update]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: