Closed Bug 942480 Opened 6 years ago Closed 6 years ago

Crash [@ js::gc::Cell::runtimeFromAnyThread] or Assertion failure: table, at dist/include/js/HashTable.h or Assertion failure: object->runtimeFromMainThread()->isHeapBusy(), at vm/Debugger.cpp

Categories

(Core :: JavaScript Engine, defect, critical)

defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla28
Tracking Status
firefox26 --- disabled
firefox27 --- disabled
firefox28 --- fixed
firefox-esr24 --- disabled
b2g18 --- unaffected

People

(Reporter: gkw, Assigned: shu)

References

(Blocks 1 open bug)

Details

(6 keywords, Whiteboard: [MemShrink])

Crash Data

Attachments

(4 files, 1 obsolete file)

Attached file stacks
x = ParallelArray([0]);
x.shape[0] = Infinity;
x.filter(function(){ new Debugger });

asserts 32-bit js debug shell on m-c changeset ad6589ed742c without any CLI arguments at Assertion failure: table, at dist/include/js/HashTable.h or Assertion failure: object->runtimeFromMainThread()->isHeapBusy(), at vm/Debugger.cpp

It also crashes 32-bit js opt shell at js::gc::Cell::runtimeFromAnyThread

64-bit seems unaffected.

This seems to be involving the Debugger object, probably an OOM, but due to the presence of GC and the varying assertions, I'm locking it s-s for the moment.

jimb, any idea what might be going on here?

My configure flags are:

Debug:

LD=ld CROSS_COMPILE=1 CXX="clang++ -Qunused-arguments -arch i386" RANLIB=ranlib CC="clang -Qunused-arguments -arch i386" AS=$CC AR=ar STRIP="strip -x -S" HOST_CC="clang -Qunused-arguments" HOST_CXX="clang++ -Qunused-arguments" sh ./configure --target=i386-apple-darwin9.2.0 --enable-macos-target=10.5 --enable-optimize --enable-debug --enable-profiling --enable-gczeal --enable-debug-symbols --enable-methodjit --enable-type-inference --disable-tests --with-ccache --enable-threadsafe <other NSPR flags>

Opt:

LD=ld CROSS_COMPILE=1 CXX="clang++ -Qunused-arguments -arch i386" RANLIB=ranlib CC="clang -Qunused-arguments -arch i386" AS=$CC AR=ar STRIP="strip -x -S" HOST_CC="clang -Qunused-arguments" HOST_CXX="clang++ -Qunused-arguments" sh ./configure --target=i386-apple-darwin9.2.0 --enable-macos-target=10.5 --enable-optimize --disable-debug --enable-profiling --enable-gczeal --enable-debug-symbols --enable-methodjit --enable-type-inference --disable-tests --with-ccache --enable-threadsafe <other NSPR flags>
Flags: needinfo?(jimb)
autoBisect shows this is probably related to the following changeset:

The first bad revision is:
changeset:   http://hg.mozilla.org/mozilla-central/rev/77bcbb164b45
user:        Dan Gohman
date:        Fri Aug 16 14:40:50 2013 -0700
summary:     Bug 894813 - IonMonkey: Eliminate ambiguity in pow calls in a CLOSED TREE

Actually, bug 894813 may be a regressor instead (not sure), so switching over to Dan.
Blocks: 894813
Flags: needinfo?(jimb) → needinfo?(sunfish)
Keywords: regression
I can reproduce the "object->runtimeFromMainThread()->isHeapBusy()" assertion failure on 32-bit Linux.

I can reproduce it with --ion-range-analysis=off, so it's probably unrelated to bug 894813 or range analysis in general.

I poked around a little:

The "Assertion failure: table" abort is coming from the "JS_ASSERT(debuggees.empty())" in ~Debugger(). It should perhaps be "JS_ASSERT_IF(debuggees.initialized(), debuggees.empty());", because if the debugger's init function fails due to OOM, it may leave the debuggees table in its "uninitialized" state. Same thing for the similar assert in Debugger::finalize.

The "object->runtimeFromMainThread()->isHeapBusy()" assertion has a comment that says "This always happens in the GC thread, so no locking is required." However, when it fails, the debugger is being called from Debugger::construct because of an OOM failure, which presumably isn't the normal GC case. If that assertion is disabled, the program proceeds into what looks like a double free: the Debugger object which had its init fail due to OOM was deleted explicitly, and later it has its finalizer called from the GC.

Since this looks like a double free, security-sensitive looks appropriate, probably sec-critical. It's not clear when the bug was first introduced; I didn't see anything recent at a quick glance. Someone familiar with the Debugger and/or the GC should look at this.
Flags: needinfo?(sunfish)
> Since this looks like a double free, security-sensitive looks appropriate,
> probably sec-critical. It's not clear when the bug was first introduced; I
> didn't see anything recent at a quick glance. Someone familiar with the
> Debugger and/or the GC should look at this.

CC'ing some Debugger/GC folks, and needinfo'ing a handful.
No longer blocks: 894813
Flags: needinfo?(terrence)
Flags: needinfo?(jimb)
It looks like creating a Debugger object from PJS fails (correctly), but leaves the debugger present in an invalid state. Niko, Shu, could one of you take a look?
Flags: needinfo?(terrence)
Flags: needinfo?(shu)
Flags: needinfo?(nmatsakis)
Flags: needinfo?(jimb)
For debugger, I think it has to be shu.
Flags: needinfo?(nmatsakis)
This doesn't really have anything to do with ParallelArray, this is just an OOM
bug.

~Debugger can be called from GC or during OOM, and it currently incorrectly
assumes it's always called from GC. Jim, I'm not sure what the comment above
isHeapBusy() is talking about in terms of racing. AFAICT it's not possible for
~Debugger to race on the runtime-wide linked lists.

Gary, with this patch I get an uncaught OOM exception, but OS X 10.9 also
starts barfing messages like:

js(37743,0xa19be1a8) malloc: *** mach_vm_map(size=8388608) failed (error code=3)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug

over and over and over and over. I suppose that means it's running of VM space
on 32bit. Probably intended behavior, since we are OOMing.
Attachment #8341488 - Flags: review?(jimb)
Assignee: general → shu
Status: NEW → ASSIGNED
Flags: needinfo?(shu)
That patch doesn't fix all the crashes. It's running out of VM space and it's still crashing in the HashTable destructors. Not sure why -- maybe heap and stack become overlapped?
Whiteboard: [jsbugmon:update] → [jsbugmon:update][MemShrink]
Some further investigation reveals that we basically allocate Debuggers forever until we run out of VM space, at which point we start unwinding, destructing the millions of Debuggers we've allocated.

We don't get very far in trying to destruct the millions of Debugger, though, until we hit a crash. I'm betting that we run out of stack space and bad things start to happen, since I got this in lldb:

  (lldb) p e
  error: Couldn't allocate space for the stack frame: Couldn't malloc: address space is full
  Errored out in Execute, couldn't PrepareToExecuteJITExpression

If that is the case (we're running out of stack while running in the destructor), I know how to fix this. Ideas, Jim?
Flags: needinfo?(jimb)
err, left out an operative word there: I *don't* know how to fix this.
Attached file Reduced testcase
Remove duplicate assert.
Attachment #8341535 - Flags: review?(jimb)
Attachment #8341488 - Attachment is obsolete: true
Attachment #8341488 - Flags: review?(jimb)
Maybe make an upper limit on how many Debuggers can be made so we don't blow the C stack?
Also, I have a hard time reproducing the OOM in an optimized build, but it happens readily on debug.
(maybe something about inlined frames, less stack)
Whiteboard: [jsbugmon:update][MemShrink] → [MemShrink] [jsbugmon:]
JSBugMon: Cannot process bug: Error: Failed to compile specified revision ad6589ed742c (maybe try another?)
Whiteboard: [MemShrink] [jsbugmon:] → [MemShrink]
That assertion in ~Debugger that the heap is busy is obviously incorrect in the OOM case; it should go.

The assertion in ~Debugger that the 'debuggees' table is empty should be protected by a 'debuggees.initialized()' check.

But we really should find out why those Debuggers can't be destroyed nicely. I have no idea.
Flags: needinfo?(jimb)
(In reply to Shu-yu Guo [:shu] from comment #9)
> millions of Debuggers

This bug is so awesome.
(In reply to Jim Blandy :jimb from comment #17)
> That assertion in ~Debugger that the heap is busy is obviously incorrect in
> the OOM case; it should go.
> 
> The assertion in ~Debugger that the 'debuggees' table is empty should be
> protected by a 'debuggees.initialized()' check.
> 
> But we really should find out why those Debuggers can't be destroyed nicely.
> I have no idea.

Yes, that is in fact what the patch does. :)
Comment on attachment 8341535 [details] [diff] [review]
Fix some OOM handling in Debugger construction.

Review of attachment 8341535 [details] [diff] [review]:
-----------------------------------------------------------------

Why, so it is!
Attachment #8341535 - Flags: review?(jimb) → review+
(In reply to Shu-yu Guo [:shu] from comment #9)
> Some further investigation reveals that we basically allocate Debuggers
> forever until we run out of VM space, at which point we start unwinding,
> destructing the millions of Debuggers we've allocated.
>
> We don't get very far in trying to destruct the millions of Debugger,
> though, until we hit a crash. I'm betting that we run out of stack space and
> bad things start to happen, since I got this in lldb:
> 
>   (lldb) p e
>   error: Couldn't allocate space for the stack frame: Couldn't malloc:
> address space is full
>   Errored out in Execute, couldn't PrepareToExecuteJITExpression
> 
> If that is the case (we're running out of stack while running in the
> destructor), I know how to fix this. Ideas, Jim?

That error message, "couldn't PrepareToExecuteJITExpression", is an lldb error message, not a SpiderMonkey error message. So it's just saying that LLDB itself can't get the memory it wants.

Can you make it crash in an interesting way, with your patch applied, if you do something like this?

$ (ulimit -v 4000000; obj~/js -f ~/moz/cool-bug.js)

The parens make bash run the command in a subshell, so the ulimit doesn't affect your main shell, only the JS shell invocation.)
Ah, okay, I'm able to reproduce a double-free. The problem is that Debugger::construct stores the pointer to the js::Debugger in the JSObject's private slot, but then js_deletes the js::Debugger on OOM. Later, the GC runs the JSObject's finalizer, which double-frees the hash tables.
(In reply to Jim Blandy :jimb from comment #22)
> Ah, okay, I'm able to reproduce a double-free. The problem is that
> Debugger::construct stores the pointer to the js::Debugger in the JSObject's
> private slot, but then js_deletes the js::Debugger on OOM. Later, the GC
> runs the JSObject's finalizer, which double-frees the hash tables.

Ooh, nice catch.
(In reply to Shu-yu Guo [:shu] from comment #24)
> Ooh, nice catch.

Glibc's malloc gave me a backtrace for the double free, with hex PC values and no source locations, so I was typing stuff like "list *0x921971" into GDB until I figured out what was going on.
Attachment #8343926 - Flags: review?(shu) → review+
Jim, your double free fix fixes the intermittent crashes in the test cases? Also, do you mind just rolling the 2 patches into 1 and pushing it?
https://hg.mozilla.org/integration/mozilla-inbound/rev/6c6feff2a255
https://hg.mozilla.org/integration/mozilla-inbound/rev/521a84fc4f6f

(That's both patches.)
Flags: in-qa-testsuite-
OS: Mac OS X → All
Hardware: x86 → All
Target Milestone: --- → mozilla28
landed on central
https://hg.mozilla.org/mozilla-central/rev/6c6feff2a255
https://hg.mozilla.org/mozilla-central/rev/521a84fc4f6f
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Is this really sec-critical, given that it's a double-free involving Debugger (which is not available in content)? Also, is bug 926448 a dup of this?
Duplicate of this bug: 926448
(In reply to Christian Holler (:decoder) from comment #30)
> Is this really sec-critical, given that it's a double-free involving
> Debugger (which is not available in content)?

Now that we know that that's the cause: no, I think this is not security-critical.

> Also, is bug 926448 a dup of
> this?

Sure looks like it. Marked as such.
Group: core-security
You need to log in before you can comment on or make changes to this bug.