1049275 - Frequent hangs with e10s enabled

Reporter

Description

•

10 years ago

Attached file Sample of Nightly.txt — Details

the stack implicates a deadlock with CPOWs

    +2286 js::jit::DoGetPropFallback(JSContext*, js::jit::BaselineFrame*, js::jit::ICGetProp_Fallback*, JS::MutableHan
dle<JS::Value>, JS::MutableHandle<JS::Value>)  (in XUL) + 712  [0x104350c18]
    +  2286 js::proxy_GetGeneric(JSContext*, JS::Handle<JSObject*>, JS::Handle<JSObject*>, JS::Handle<jsid>, JS::Mutab
leHandle<JS::Value>)  (in XUL) + 9  [0x10458ec79]
    +    2286 js::Proxy::get(JSContext*, JS::Handle<JSObject*>, JS::Handle<JSObject*>, JS::Handle<jsid>, JS::MutableHa
ndle<JS::Value>)  (in XUL) + 446  [0x10458d58e]
    +      2286 js::CrossCompartmentWrapper::get(JSContext*, JS::Handle<JSObject*>, JS::Handle<JSObject*>, JS::Handle<
jsid>, JS::MutableHandle<JS::Value>) const  (in XUL) + 356  [0x1045aa9e4]
    +        2286 js::proxy_GetGeneric(JSContext*, JS::Handle<JSObject*>, JS::Handle<JSObject*>, JS::Handle<jsid>, JS:
:MutableHandle<JS::Value>)  (in XUL) + 9  [0x10458ec79]
    +          2286 js::Proxy::get(JSContext*, JS::Handle<JSObject*>, JS::Handle<JSObject*>, JS::Handle<jsid>, JS::Mut
ableHandle<JS::Value>)  (in XUL) + 446  [0x10458d58e]
    +            2286 CPOWProxyHandler::get(JSContext*, JS::Handle<JSObject*>, JS::Handle<JSObject*>, JS::Handle<jsid>
, JS::MutableHandle<JS::Value>) const  (in XUL) + 93  [0x1021c4b4d]
    +              2286 mozilla::jsipc::WrapperOwner::get(JSContext*, JS::Handle<JSObject*>, JS::Handle<JSObject*>, JS
::Handle<jsid>, JS::MutableHandle<JS::Value>)  (in XUL) + 248  [0x1021c4c58]
    +                2286 mozilla::jsipc::JavaScriptBase<mozilla::jsipc::PJavaScriptParent>::CallGet(unsigned long lon
g const&, mozilla::jsipc::ObjectVariant const&, nsString const&, mozilla::jsipc::ReturnStatus*, mozilla::jsipc::JSVariant*)  (in XUL) + 13  [0x1021c884d]
    +                  2286 mozilla::jsipc::PJavaScriptParent::CallGet(unsigned long long const&, mozilla::jsipc::Obje
ctVariant const&, nsString const&, mozilla::jsipc::ReturnStatus*, mozilla::jsipc::JSVariant*)  (in XUL) + 608  [0x1020ae110]
    +                    2286 mozilla::ipc::MessageChannel::Call(IPC::Message*, IPC::Message*)  (in XUL) + 21  [0x101ef9375]
    +                      2286 mozilla::ipc::MessageChannel::UrgentCall(IPC::Message*, IPC::Message*)  (in XUL) + 262  [0x101ef8e96]
    +                        2286 mozilla::ipc::MessageChannel::SendAndWait(IPC::Message*, IPC::Message*)  (in XUL) + 200  [0x101ef8cd8]
    +                          2286 mozilla::ipc::MessageChannel::WaitForSyncNotify()  (in XUL) + 69  [0x101ef8ff5]
    +                            2286 PR_WaitCondVar  (in libnss3.dylib) + 105  [0x101904219]
    +                              2286 _pthread_cond_wait  (in libsystem_pthread.dylib) + 727  [0x7fff982d0c3b]
    +                                2286 __psynch_cvwait  (in libsystem_kernel.dylib) + 10  [0x7fff8f9f6716]

Bill McCloskey [inactive unless it's an emergency] (:billm)

Assignee

Comment 1

•

10 years ago

Next time it happens can you also get a stack for the child process?

Bill McCloskey [inactive unless it's an emergency] (:billm)

Assignee

Comment 2

•

10 years ago

Also, is this a recent thing or has it always happened? Can you try to see if a particular add-on is triggering it?

Brad Lassey [:blassey] (use needinfo?)

Reporter

Comment 3

•

10 years ago

(In reply to Bill McCloskey (:billm) from comment #2)
> Also, is this a recent thing or has it always happened? Can you try to see
> if a particular add-on is triggering it?

I've been noticing 3 or 4 hangs a day for the last few days.

(In reply to Bill McCloskey (:billm) from comment #1)
> Next time it happens can you also get a stack for the child process?

will do

Brad Lassey [:blassey] (use needinfo?)

Reporter

Comment 4

•

10 years ago

sampled both this time. The parent looks roughly the same (same last few frames anyway). Here's what the Child process looks like:

2439 nsListControlFrame::Reflow(nsPresContext*, nsHTMLReflowMetrics&, nsHTMLReflowState const&, unsigned int&)  (in XUL) + 399  [0x101b7d47f]
   2439 nsListControlFrame::ReflowAsDropdown(nsPresContext*, nsHTMLReflowMetrics&, nsHTMLReflowState const&, unsigned int&)  (in XUL) + 654  [0x101b7db2e]
     2439 nsComboboxControlFrame::GetAvailableDropdownSpace(int*, int*, nsPoint*)  (in XUL) + 124  [0x101b73fbc]
       2439 nsDeviceContext::GetClientRect(nsRect&)  (in XUL) + 37  [0x1009bcd75]
         2439 nsDeviceContext::ComputeClientRectUsingScreen(nsRect*)  (in XUL) + 102  [0x1009bcde6]
           2439 nsScreenManagerProxy::ScreenForNativeWidget(void*, nsIScreen**)  (in XUL) + 122  [0x1014df5ca]
             2439 mozilla::dom::PScreenManagerChild::SendScreenForBrowser(mozilla::dom::PBrowserChild*, mozilla::dom::ScreenDetails*, bool*)  (in XUL) + 483  [0x1005efb23]
               2439 mozilla::ipc::MessageChannel::Send(IPC::Message*, IPC::Message*)  (in XUL) + 197  [0x1003910e5]
                 2439 mozilla::ipc::MessageChannel::DebugAbort(char const*, int, char const*, char const*, bool) const  (in XUL) + 813  [0x10038ff5d]
                   2439 NS_DebugBreak  (in XUL) + 1180  [0x1000719fc]
                     2439 CrashReporter::AppendAppNotesToCrashReport(nsACString_internal const&)  (in XUL) + 315  [0x10205fb6b]
                       2439 mozilla::dom::PCrashReporterChild::SendAppendAppNotes(nsCString const&)  (in XUL) + 441  [0x10048b179]
                        2439 mozilla::ipc::MessageChannel::Send(IPC::Message*)  (in XUL) + 84  [0x100390b74]
                          2439 PR_Lock  (in libnss3.dylib) + 14  [0x107d44f4e]
                            2439 _pthread_mutex_lock  (in libsystem_pthread.dylib) + 372  [0x7fff982d1779]
                              2439 __psynch_mutexwait  (in libsystem_kernel.dylib) + 10  [0x7fff8f9f6746]

Brad Lassey [:blassey] (use needinfo?)

Reporter

Comment 5

•

10 years ago

So, the child process has crashed and is trying to send crash data to the parent, but the parent is issuing an urgent message. From an irc conversation with billm, the child process needs to "win" in this situation.

Brad Lassey [:blassey] (use needinfo?)

Reporter

Updated

•

10 years ago

Blocks: old-e10s-m2

tracking-e10s: ? → +

Bill McCloskey [inactive unless it's an emergency] (:billm)

Assignee

Comment 6

•

10 years ago

I was able to reproduce this while debugging something else. It turns out that we're not waiting on an IPC message; we're just waiting for a lock. It happens like this:

take a lock in the IPC code
do something that causes the IPC code to assert
try to send crash reporter data to the parent, invoking IPC
try to take the same IPC lock, which is already locked

I'm still unsure how to address this. It looks like we don't send any critical data over the IPC channel--just some textual annotations. What's even stranger is that these annotations look like they get attached to the parent process's crashes. I might be wrong about that though--I'll need to look more into how child process crash dumps are generated.

Chris Peterson [:cpeterson]

Comment 7

•

10 years ago

Gavin and Brad think this bug should block M2.

tracking-e10s: + → m2+

Bill McCloskey [inactive unless it's an emergency] (:billm)

Assignee

Comment 9

•

10 years ago

Attached patch avoid-ipc-hang — Details — Splinter Review

Please see the explanation in comment 6. This patch just turns off the crash annotations when we abort. I was wrong about these annotations getting attached to the parent--they are attached to the child minidump. However, they don't seem critical. And right now we're not getting any crash reports because of this, so we really need to fix it.

Attachment #8488331 - Flags: review?(benjamin)

sebboh (use needinfo)

Comment 10

•

10 years ago

The other comments here are way above my head.

I found my way here by searching the E10S bugs for the string 'hang' in the title.

I performed that search because that's what was happening.  It seemed to happen when I'd open a link or bookmark in a new tab.  I've disabled E10S in the meantime.

If I can gather more information, please tell me how.  What tool do you use to generate those stack traces?

I am a Java developer, so I know how to produce a stack trace or a thread dump--in my world, but not this one. :)

sebboh (use needinfo)

Comment 11

•

10 years ago

Pardon me: re: Comment 10: Whoops, this bug is for OSX.  I am on Windows 7, 64bit.

Bill McCloskey [inactive unless it's an emergency] (:billm)

Assignee

Comment 12

•

10 years ago

This might be the cause of your crash, but we won't know until the bug is fixed.

OS: Mac OS X → All

Bill McCloskey [inactive unless it's an emergency] (:billm)

Assignee

Comment 13

•

10 years ago

er, your hangs, sorry.

Benjamin Smedberg

Updated

•

10 years ago

Attachment #8488331 - Flags: review?(benjamin) → review+

Chris Peterson [:cpeterson]

Comment 14

•

10 years ago

Bill: is your r+'d avoid-ipc-hang patch ready to land? We're trying to close out our remaining M2 bugs today.

Flags: needinfo?(wmccloskey)

Bill McCloskey [inactive unless it's an emergency] (:billm)

Assignee

Comment 15

•

10 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/24a8469c2ff6

Flags: needinfo?(wmccloskey)

Phil Ringnalda (:philor)

Comment 16

•

10 years ago

https://hg.mozilla.org/mozilla-central/rev/24a8469c2ff6

Status: NEW → RESOLVED

Closed: 10 years ago

Resolution: --- → FIXED

Target Milestone: --- → mozilla35

Jack Moffitt [:jack]

Comment 17

•

10 years ago

Small followup. I just compiled from changeset 205234:bf5fcc0c4b27 and the hang is fixed. I now get a tab crash page.

After the crash, the browser is quite slow reloading the other tabs. It probably took a minute to get back to normal. I have no idea if that's normal.

Bill McCloskey [inactive unless it's an emergency] (:billm)

Assignee

Comment 18

•

10 years ago

(In reply to Jack Moffitt [:jack] from comment #17)
> Small followup. I just compiled from changeset 205234:bf5fcc0c4b27 and the
> hang is fixed. I now get a tab crash page.
> 
> After the crash, the browser is quite slow reloading the other tabs. It
> probably took a minute to get back to normal. I have no idea if that's
> normal.

Usually session restore reloads tabs on demand. But the tab crashed page actually loads them all right away. Maybe that's what you're seeing.

Chris Peterson [:cpeterson]

Comment 19

•

10 years ago

(In reply to Jack Moffitt [:jack] from comment #17)
> After the crash, the browser is quite slow reloading the other tabs. It
> probably took a minute to get back to normal. I have no idea if that's
> normal.

Jack, you are probably seeing session restore bug 1037179.

Sample of Nightly.txt 10 years ago Brad Lassey [:blassey] (use needinfo?) 115.18 KB, text/plain		Details
avoid-ipc-hang 10 years ago Bill McCloskey [inactive unless it's an emergency] (:billm) 1.85 KB, patch	benjamin : review+	Details \| Diff \| Splinter Review