Closed Bug 1483182 Opened Last year Closed Last year

Assertion failure: !cx->isExceptionPending(), at js/src/vm/JSContext-inl.h:331 with OOM

Categories

(Core :: JavaScript Engine, defect, critical)

x86_64
Linux
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla63
Tracking Status
firefox-esr52 --- unaffected
firefox-esr60 --- unaffected
firefox61 --- unaffected
firefox62 --- unaffected
firefox63 --- fixed

People

(Reporter: decoder, Assigned: njn)

References

(Blocks 2 open bugs)

Details

(4 keywords, Whiteboard: [jsbugmon:])

Attachments

(2 files, 2 obsolete files)

The following testcase crashes on mozilla-central revision bf79440c1376 (build with --enable-posix-nspr-emulation --enable-valgrind --enable-gczeal --disable-tests --disable-profiling --enable-debug --enable-optimize, run with --fuzzing-safe):

var lfLogBuffer = `
  function testOuterForInVar() {
    return eval("for (var x in {}); (function() { return delete x; })");
  }
  testOuterForInVar();
`;
loadFile(lfLogBuffer);
loadFile(lfLogBuffer);
function loadFile(lfVarx) {
    try {
        oomTest(function() {
            eval(lfVarx);
        });
    } catch (lfVare) {}
}


Backtrace:

received signal SIGSEGV, Segmentation fault.
0x00000000004d8cf0 in js::CheckForInterrupt (cx=0x7ffff5f17000) at js/src/vm/JSContext-inl.h:331
#0  0x00000000004d8cf0 in js::CheckForInterrupt (cx=0x7ffff5f17000) at js/src/vm/JSContext-inl.h:331
#1  0x00000000005a79b8 in Interpret (cx=0x7ffff5f17000, state=...) at js/src/vm/Interpreter.cpp:2268
#2  0x00000000005b2956 in js::RunScript (cx=0x7ffff5f17000, state=...) at js/src/vm/Interpreter.cpp:425
#3  0x00000000005b5cbd in js::ExecuteKernel (cx=<optimized out>, cx@entry=0x7ffff5f17000, script=..., script@entry=..., envChainArg=..., newTargetValue=..., evalInFrame=..., evalInFrame@entry=..., result=<optimized out>) at js/src/vm/Interpreter.cpp:773
#4  0x00000000005eeaac in EvalKernel (cx=0x7ffff5f17000, v=..., v@entry=..., evalType=evalType@entry=DIRECT_EVAL, caller=..., env=env@entry=..., pc=<optimized out>, vp=...) at js/src/builtin/Eval.cpp:319
#5  0x00000000005ef2be in js::DirectEval (cx=<optimized out>, v=..., vp=...) at js/src/builtin/Eval.cpp:428
#6  0x00000000005a67c7 in Interpret (cx=0x7ffff5f17000, state=...) at js/src/vm/Interpreter.cpp:3150
#7  0x00000000005b2956 in js::RunScript (cx=0x7ffff5f17000, state=...) at js/src/vm/Interpreter.cpp:425
#8  0x00000000005b5cbd in js::ExecuteKernel (cx=<optimized out>, cx@entry=0x7ffff5f17000, script=..., script@entry=..., envChainArg=..., newTargetValue=..., evalInFrame=..., evalInFrame@entry=..., result=<optimized out>) at js/src/vm/Interpreter.cpp:773
#9  0x00000000005eeaac in EvalKernel (cx=0x7ffff5f17000, v=..., v@entry=..., evalType=evalType@entry=DIRECT_EVAL, caller=..., env=env@entry=..., pc=<optimized out>, vp=...) at js/src/builtin/Eval.cpp:319
#10 0x00000000005ef2be in js::DirectEval (cx=<optimized out>, v=..., vp=...) at js/src/builtin/Eval.cpp:428
#11 0x0000000000721523 in js::jit::DoCallFallback (cx=<optimized out>, frame=0x7fffffffbec8, stub_=<optimized out>, argc=<optimized out>, vp=0x7fffffffbe78, res=...) at js/src/jit/BaselineIC.cpp:2566
#12 0x000002173ffa2d0c in ?? ()
[...]
#44 0x0000000000000000 in ?? ()
rax	0x0	0
rbx	0x7ffff5f17000	140737319628800
rcx	0x7ffff6c1c2dd	140737333281501
rdx	0x0	0
rsi	0x7ffff6eeb770	140737336227696
rdi	0x7ffff6eea540	140737336223040
rbp	0x7fffffff9880	140737488328832
rsp	0x7fffffff9860	140737488328800
r8	0x7ffff6eeb770	140737336227696
r9	0x7ffff7fe6780	140737354033024
r10	0x58	88
r11	0x7ffff6b927a0	140737332717472
r12	0x7fffffffa010	140737488330768
r13	0x1	1
r14	0x7ffff48b3940	140737296152896
r15	0x7fffffff9d70	140737488330096
rip	0x4d8cf0 <js::CheckForInterrupt(JSContext*)+224>
=> 0x4d8cf0 <js::CheckForInterrupt(JSContext*)+224>:	movl   $0x0,0x0
   0x4d8cfb <js::CheckForInterrupt(JSContext*)+235>:	ud2
Whiteboard: [jsbugmon:update,bisect] → [jsbugmon:update]
JSBugMon: Bisection requested, result:
autoBisect shows this is probably related to the following changeset:

The first bad revision is:
changeset:   https://hg.mozilla.org/mozilla-central/rev/ad30dc53e38e
user:        Nicholas Nethercote
date:        Fri Aug 10 18:00:29 2018 +1000
summary:     Bug 1481998 - Make mozilla::Hash{Map,Set}'s entry storage allocation lazy. r=luke,sfink

This iteration took 1.629 seconds to run.
Whiteboard: [jsbugmon:update] → [jsbugmon:]
JSBugMon: Cannot process bug: Unable to automatically reproduce, please track manually.
Flags: needinfo?(n.nethercote)
I will investigate this properly on Monday, but in the meantime I have a question:

> Looks like a dup of bug 1483016: OOM for lookupForAdd() is not handled here [1].

Why do you think the problem is there? If lookupForAdd() OOMs it will return an invalid AddPtr:

  https://searchfox.org/mozilla-central/source/mfbt/HashTable.h#2115

and the subsequent add() call will fail early:

  https://searchfox.org/mozilla-central/source/mfbt/HashTable.h#2136-2138

Am I missing something?
I can reproduce this and I have confirmed that bug 1481998 is at fault. The failure reproduces about 90% of the time for me.
Assignee: nobody → n.nethercote
Flags: needinfo?(n.nethercote)
> Am I missing something?

I see it now: the subsequent add() call is guarded by a condition that might fail. In which case the OOM won't be detected. Adding an explicit check fixes the problem.
Duplicate of this bug: 1483016
Attached patch Handle OOM in Enumerate() (obsolete) — Splinter Review
Luke: This is an unexpected and annoyingly subtle bug. I am going to audit all
the existing lookupForAdd() call sites and add extra comments to HashTable.h to
mention this possibility.
Attachment #9002342 - Flags: review?(luke)
> Luke: This is an unexpected and annoyingly subtle bug.

Actually, it's not as bad as I thought. If lookupForAdd() fails due to OOM, that means the table was empty, which means that the element isn't present and the lookup would "fail" (i.e. not find the element) anyway. The distinction between "failed because the element is missing" and "failed because the element is missing *and* we OOMed when trying to instantiate the storage" doesn't matter in practice -- the OOM is silently swallowed, the behaviour will end up the same anyway.

*Except* when deliberate OOM injections are involved, because the JS engine is looking for a specific exception in response, and it doesn't get one.

So I think this bug is harmless, though certainly worth fixing to satisfy the fuzzers.
To clarify further: the OOM injection mechanism may complain about a function that features an execution path that:

- passes through a `h.lookupForAdd(k)` that fails,

- and reaches a non-failing `return` statement,

- without passing through an intermediate `h.add(p)` with its own check and early return on failure.

Fixing it requires adding an explicit `p.isValid()` to the path.
Attached patch Handle OOM in Enumerate() (obsolete) — Splinter Review
Attachment #9002348 - Flags: review?(luke)
Attachment #9002342 - Attachment is obsolete: true
Attachment #9002342 - Flags: review?(luke)
I guess `AutoEnterOOMUnsafeRegion` could be used instead to fix this.
(In reply to Nicholas Nethercote [:njn] from comment #9)
> Actually, it's not as bad as I thought. If lookupForAdd() fails due to OOM,
> that means the table was empty, which means that the element isn't present
> and the lookup would "fail" (i.e. not find the element) anyway.

Just curious, why does lookupForAdd need to allocate storage for the table?
(I'm asking because, in the Enumerate case, having lookupForAdd not allocate storage could be a small perf optimization in some cases!)
> Just curious, why does lookupForAdd need to allocate storage for the table?

It returns a pointer to an entry: a live entry if the element is present, or a free entry if the element is not present. In order to do that, there must be storage to point to!

Enumerate() could be rearranged so that the `pobj->is<ProxyObject>() || ...` test is done before `lookupForAdd()`. Do you know if that might be a better approach?
(In reply to Nicholas Nethercote [:njn] from comment #15)
> It returns a pointer to an entry: a live entry if the element is present, or
> a free entry if the element is not present. In order to do that, there must
> be storage to point to!

D'oh, sorry, I was looking at this all wrong, confusing it with a plain lookup(). Let's blame it on Monday morning.

> Enumerate() could be rearranged so that the `pobj->is<ProxyObject>() || ...`
> test is done before `lookupForAdd()`. Do you know if that might be a better
> approach?

There's this |if (!ht) ht.emplace(cx, 5);| code there, I think that could be rearranged so we only emplace() if we're add()ing something, but then the lookupForAdd call will have to deal with the case where |ht| is still mozilla::Nothing(). Probably not worth the complexity for this bug.
I think what this bug shows is that lookupForAdd() should not report (i.e., create an exception, such that cx->isExceptionPending()) OOM.  Otherwise, we end up with this super-subtle contract leading to patches that requires comments like this; if for some reason we think lookupForAdd() *should* report on OOM, then I think we need to change the interface to, e.g., use Result<AddPtr, bool> or something, because that's just too subtle.
I'm happy for lookupForAdd() to not report, but I don't know how to implement it. We can't use AutoEnterOOMUnsafeRegion in MFBT. Any suggestions?
Flags: needinfo?(luke)
Currently lookupOrAdd() will allocate if the table has no storage. But it
doesn't report an error if the allocation fails, which can cause problems.

This patch changes things so that lookupOrAdd() doesn't allocate when the table
has no storage. Instead, it returns an AddPtr that is not *valid* (its mTable
is empty) but it is *live*, and can be used in add(), whereupon the allocation
will occur.

The patch also makes Ptr::isValid() and AddPtr::isValid() non-public, because
the valid vs. live distinction is non-obvious and best kept hidden within the
classes.
Attachment #9002688 - Flags: review?(luke)
This fixes a typo.
Attachment #9002689 - Flags: review?(luke)
(In reply to Nicholas Nethercote [:njn] from comment #6)
> > Am I missing something?
> 
> I see it now: the subsequent add() call is guarded by a condition that might
> fail. In which case the OOM won't be detected. 

Yes, exactly. Sorry I didn't get to respond to your question earlier, I was away for the most part of the weekend.
Comment on attachment 9002688 [details] [diff] [review]
Don't allocate in HashTable::lookupOrAdd()

Review of attachment 9002688 [details] [diff] [review]:
-----------------------------------------------------------------

Ah yes, of course, that's much better.

::: mfbt/HashTable.h
@@ +2179,5 @@
>          return false;
>        }
>        mRemovedCount--;
>        aPtr.mKeyHash |= sCollisionBit;
> +

side note: I've never seen the style of adding a trailing whitespace to visually separate else-if branches.  I like it, and have at times considered adding them myself, but I've never seen it anywhere else.
Attachment #9002688 - Flags: review?(luke) → review+
Flags: needinfo?(luke)
Attachment #9002348 - Flags: review?(luke)
Attachment #9002689 - Flags: review?(luke) → review+
Blocks: 1481998
> side note: I've never seen the style of adding a trailing whitespace to visually
> separate else-if branches.  I like it, and have at times considered adding them myself,
> but I've never seen it anywhere else.

I do it all the time and I wouldn't have thought it was at all notable. Huh.
Attachment #9002348 - Attachment is obsolete: true
Pushed by nnethercote@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/8116d3ee3a5a
Don't allocate in HashTable::lookupOrAdd(). r=luke
https://hg.mozilla.org/integration/mozilla-inbound/rev/c3bb26f98555
Do report OOM failures in HashTable::reserve(). r=luke
https://hg.mozilla.org/mozilla-central/rev/8116d3ee3a5a
https://hg.mozilla.org/mozilla-central/rev/c3bb26f98555
Status: NEW → RESOLVED
Closed: Last year
Resolution: --- → FIXED
Target Milestone: --- → mozilla63
Duplicate of this bug: 1482881
You need to log in before you can comment on or make changes to this bug.