Closed Bug 1152980 Opened 9 years ago Closed 7 years ago

Startup crash in mozalloc_abort(char const* const) | NS_DebugBreak | mozilla::ipc::MessageChannel::DebugAbort(char const*, int, char const*, char const*, bool) | mozilla::ipc::MessageChannel::~MessageChannel() ...

Categories

(Core :: IPC, defect)

Platform: x86, Windows NT
Type: defect
Priority: Not set
Severity: critical

Tracking

RESOLVED WORKSFORME
Tracking Status
firefox37 --- wontfix
firefox38 + wontfix
firefox39 + unaffected
firefox40 --- unaffected
firefox41 --- unaffected

People

(Reporter: away, Unassigned)

Details

(Keywords: crash)

Crash Data

This bug was filed from the Socorro interface and is report bp-1f15da08-ee54-4d6f-99cc-962562150404.
=============================================================

Topcrash in 37.0.1, and it's a startup crash. 38 beta is affected. bent says: "we spun the event loop while an IPC call was on the stack and one of the events destroyed the channel"

0 	mozalloc.dll 	mozalloc_abort(char const* const) 	memory/mozalloc/mozalloc_abort.cpp
1 	xul.dll 	NS_DebugBreak 	xpcom/base/nsDebugImpl.cpp
2 	xul.dll 	mozilla::ipc::MessageChannel::DebugAbort(char const*, int, char const*, char const*, bool) 	ipc/glue/MessageChannel.cpp
3 	xul.dll 	mozilla::ipc::MessageChannel::~MessageChannel() 	ipc/glue/MessageChannel.cpp
4 	mozalloc.dll 	moz_xmalloc 	memory/mozalloc/mozalloc.cpp
5 	xul.dll 	mozilla::layers::PCompositorChild::~PCompositorChild() 	obj-firefox/ipc/ipdl/PCompositorChild.cpp

(Frame 4 is probably spurious)
[Tracking Requested - why for this release]: Startup topcrash in 37.0.1, and 38 is affected

v39 status unknown. It might be fixed or it might just not be showing up on that channel.

Note that this is different from bug 1122008 whose signature is extremely similar but differs in CompositorParent vs CompositorChild.
(In reply to David Major [:dmajor] from comment #0)
> bp-1f15da08-ee54-4d6f-99cc-962562150404

This shows us releasing the last reference to a ContentChild in a runnable that runs within a nested event loop during a sync XHR. Apparently there is an IPC call on the stack (lost in the JIT frames) that sent an intr message to the parent. Ugh.

I guess we need to figure out how to guard against this in a general sense.
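
One common way to guard against this class of problem is to hold an extra strong reference across any call that can spin a nested event loop, so that an event releasing the owner's reference cannot destroy the object while a call is still on the stack. The following is only a minimal sketch of that idea, not the fix adopted for this bug (the bug was closed without a patch). `SketchActor`, `EventQueue`, `SpinNestedLoop`, and `CallWithDeathGrip` are hypothetical names and do not exist in the Gecko tree.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <memory>
#include <utility>

// Hypothetical stand-in for an IPC actor; records its own destruction.
struct SketchActor {
    bool* destroyedFlag;
    explicit SketchActor(bool* flag) : destroyedFlag(flag) {}
    ~SketchActor() { *destroyedFlag = true; }
};

// Hypothetical event queue: each event is just a callable.
using EventQueue = std::deque<std::function<void()>>;

// Runs every queued event, the way a sync call might spin a nested loop.
inline void SpinNestedLoop(EventQueue& queue) {
    while (!queue.empty()) {
        std::function<void()> event = std::move(queue.front());
        queue.pop_front();
        event();
    }
}

// Simulates a call that spins the loop while the actor is in use.
// Returns true if the actor was still alive when the nested loop returned.
inline bool CallWithDeathGrip(std::shared_ptr<SketchActor>& owner,
                              EventQueue& queue,
                              const bool& destroyed) {
    // Extra strong reference held for the duration of the call.
    std::shared_ptr<SketchActor> grip = owner;
    SpinNestedLoop(queue);  // an event may reset `owner` here
    return !destroyed;      // `grip` keeps the actor alive until we return
}
```

If an event queued before the call resets `owner`, the actor survives until `CallWithDeathGrip` returns and is destroyed only once `grip` goes out of scope.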
Startup crash, tracking!
Ben, we are late in the beta cycle, any chance you could provide a fix for this soon?
Thanks
Flags: needinfo?(bent.mozilla)
I'm not the right assignee for this; you need someone on e10s here.
Flags: needinfo?(bent.mozilla)
Bill, is there anything you can do for 38 here?
Flags: needinfo?(wmccloskey)
(In reply to Ben Turner [:bent] (use the needinfo flag!) from comment #2)
> (In reply to David Major [:dmajor] from comment #0)
> > bp-1f15da08-ee54-4d6f-99cc-962562150404
> 
> This shows us releasing the last reference to a ContentChild in a runnable
> that runs within a nested event loop during a sync XHR. Apparently there is
> an IPC call on the stack (lost in the JIT frames) that sent an intr message
> to the parent. Ugh.
> 
> I guess we need to figure out how to guard against this in a general sense.

Could you look at this again, Ben? I'm not seeing what you're seeing. We're releasing a CompositorChild and I don't see any intr calls. mCxxStackFrames would be non-empty if there were any messages being sent or dispatched on the CompositorChild channel. It's mysterious how we could get from there to executing JS code though. PCompositor is a sync protocol with only normal message priorities, so sending a message shouldn't allow anything to run. So we'd have to be doing something weird while dispatching.

It also seems possible that the channel has already been released and mCxxStackFrames just looks non-empty.
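
To illustrate the invariant being discussed: the destructor appears to abort when the channel still has C++ stack frames recorded, that is, when a send or dispatch is in progress. The sketch below models that bookkeeping with an RAII frame marker. It is a simplified assumption about how such tracking can work, not the actual `MessageChannel` code; `SketchChannel`, `StackFrame`, `SafeToDestroy`, and `Depth` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an IPC channel; not the real MessageChannel.
class SketchChannel {
public:
    // RAII marker: exists while a send or dispatch is on the C++ stack.
    class StackFrame {
    public:
        explicit StackFrame(SketchChannel& channel) : mChannel(channel) {
            mChannel.mFrames.push_back(this);
        }
        ~StackFrame() { mChannel.mFrames.pop_back(); }
        StackFrame(const StackFrame&) = delete;
        StackFrame& operator=(const StackFrame&) = delete;

    private:
        SketchChannel& mChannel;
    };

    // True only when no send or dispatch is in progress on this channel.
    // A destructor could check this and abort when it is false.
    bool SafeToDestroy() const { return mFrames.empty(); }

    // Number of in-progress sends or dispatches.
    std::size_t Depth() const { return mFrames.size(); }

private:
    std::vector<const StackFrame*> mFrames;
};
```

Under this model, a non-empty frame list at destruction time means either that a call really is on the stack, or that the memory has already been freed and the list only looks non-empty, which are the two possibilities raised above.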

It would really help to have more stack frames here. Is that possible dmajor?
Flags: needinfo?(wmccloskey)
Flags: needinfo?(dmajor)
Flags: needinfo?(bent.mozilla)
Oops, we're releasing a *CompositorChild*, not a *ContentChild*... Sorry!

There's nsXMLHttpRequest::Send(JSContext*, mozilla::ErrorResult&) on the stack, and that spins the event loop, so that's how we have JS triggering this.
Flags: needinfo?(bent.mozilla)
Loading the dmp in MSVC shows this:

mozalloc.dll!mozalloc_abort(...) Line 37	C++
xul.dll!mozilla::layers::LayerTransactionChild::Release() Line 32	C++
xul.dll!mozilla::layers::CompositorChild::DeallocPLayerTransactionChild(...) Line 128	C++
xul.dll!mozilla::layers::PCompositorChild::RemoveManagee(...) Line 632	C++
xul.dll!mozilla::layers::PLayerTransactionChild::OnMessageReceived(...) Line 883	C++
xul.dll!mozilla::layers::PCompositorChild::OnMessageReceived(...) Line 969	C++
xul.dll!nsAppShell::EventWindowProc(...) Line 113	C++

And WinDbg shows this:

ntdll!ZwWaitForSingleObject+0x15
KERNELBASE!WaitForSingleObjectEx+0x98
kernel32!WaitForSingleObjectExImplementation+0x75
kernel32!WaitForSingleObject+0x12
xul!google_breakpad::ExceptionHandler::WriteMinidumpOnHandlerThread+0x59
xul!google_breakpad::ExceptionHandler::WriteMinidumpForException+0x25
xul!CrashReporter::WriteMinidumpForException+0x1a
xul!nsXULAppInfo::WriteMinidumpForException+0x9
xul!mozilla::ReportException+0x22
xul!CallWindowProcCrashProtected+0x3388b8
xul!nsWindow::WindowProc+0x37
user32!InternalCallWinProc+0x23
user32!UserCallWinProcCheckWow+0x109
user32!DispatchMessageWorker+0x3bc
user32!DispatchMessageW+0xf
xul!nsAppShell::ProcessNextNativeEvent+0x1de
0x233efd4

No idea why they're all so different (including the stack on Socorro).
(In reply to Ben Turner [:bent] (use the needinfo flag!) from comment #8)
> There's nsXMLHttpRequest::Send(JSContext*, mozilla::ErrorResult&) on the
> stack, and that spins the event loop, so that's how we have JS triggering
> this.

But the XHR is triggered from JIT code. Who's running that JS? As I said above, we would expect that whatever is below the JS is doing compositor stuff. But that's weird, because we shouldn't be able to get from compositor stuff to JS.
> It would really help to have more stack frames here. Is that possible dmajor?
The stack at bp-1f15da08-ee54-4d6f-99cc-962562150404 goes pretty deep. It has some nonsense frames like moz_xmalloc and IsWindowVisible, but if you ignore those, does the rest seem reasonable?
Flags: needinfo?(dmajor)
Too late for 38 but tracking in case it happens again with 39.
There aren't currently any crashes with this signature for 39+ so I'm dropping the tracking.
No crashes matching this signature.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → WORKSFORME