Closed Bug 1614305 Opened 5 years ago Closed 5 years ago

Crash in [@ IPCError-browser | ShutDownKill | NtYieldExecution]

Categories

(Core :: XPCOM, defect, P3)

Unspecified
Windows 10
defect

Tracking

RESOLVED DUPLICATE of bug 1279293

People

(Reporter: pascalc, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: crash)

Crash Data

This bug is for crash report bp-d90d2a48-9fa2-41a9-800d-3df1c0200210.

Top 10 frames of crashing thread:

0 ntdll.dll NtYieldExecution 
1 user32.dll PeekMessageW 
2 xul.dll SingleNativeEventPump::OnProcessNextEvent widget/windows/nsAppShell.cpp:140
3 xul.dll nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:1124
4 xul.dll NS_ProcessNextEvent xpcom/threads/nsThreadUtils.cpp:486
5 xul.dll mozilla::ipc::MessagePump::Run ipc/glue/MessagePump.cpp:87
6 xul.dll MessageLoop::RunHandler ipc/chromium/src/base/message_loop.cc:308
7 xul.dll MessageLoop::Run ipc/chromium/src/base/message_loop.cc:290
8 xul.dll nsBaseAppShell::Run widget/nsBaseAppShell.cpp:137
9 xul.dll nsAppShell::Run widget/windows/nsAppShell.cpp:406

This signature shows up only on Nightly, which is odd, because I don't see why it wouldn't show up in release builds as well. What makes it worrisome is that, if you look at the crashes, all the threads are stopped waiting. The content processes here didn't even begin shutting down; they're just sitting idle.

Tracking given the volume; let's see whether the crashes make it into 74 beta.

Spinning the event loop is sort of like XPCOM...

Component: General → XPCOM

Restricted to Nightly; there are no crashes in 74 beta, so release is unaffected.

Bugbug thinks this bug is a regression, but please revert this change in case of error.

Keywords: regression

Nathan, the crash volume seems to be high. Could you set the priority flag for this bug?

Flags: needinfo?(nfroyd)

I am apparently dumb, because what's getting changed in ntdll.dll is our hooking of functions using WindowsDllInterceptor, and the code getting changed is completely unrelated to where the crashes are. Thanks to dmajor for educating me.

So I have no idea what's going on with these content processes, unless we're not going through their proper shutdown sequence due to some nightly-only thing. I don't suppose we're able to correlate these with parent process crashes (or maybe the parent process is shutting down cleanly...), gsvelto?

Flags: needinfo?(gsvelto)

Looking at the IPCShutdownState annotation, roughly 30% of the crashes have the annotation set to SendFinishShutdown (sent), so those processes were just being slow: they finished shutting down right after we captured the minidump and before we killed them.

The remaining ones don't have the annotation set at all, which indicates that they hadn't received the shutdown message yet. This can happen if the process was never scheduled between when we sent the shutdown IPC message and when we decided to kill it. That window is 5 seconds, which is a pretty long time, but it could just indicate slowness.
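As an illustration of the breakdown above, here is a minimal sketch of how one might tally the annotation across crash reports. The record layout and the tally function are hypothetical; only the IPCShutdownState annotation name and its "SendFinishShutdown (sent)" value come from the actual reports.

```python
from collections import Counter

def tally_shutdown_state(reports):
    """Count the IPCShutdownState annotation across crash-report dicts.

    A missing annotation means the process never received the shutdown
    message; "SendFinishShutdown (sent)" means the process was merely
    slow and finished shutting down just before it was killed.
    """
    return Counter(r.get("IPCShutdownState", "<missing>") for r in reports)

# Hypothetical sample mirroring the roughly 30% / 70% split described above.
reports = [{"IPCShutdownState": "SendFinishShutdown (sent)"}] * 3 + [{}] * 7
print(tally_shutdown_state(reports))
```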

Which made me remember something important about Windows: since bug 1366356, processes that are not running a foreground tab have their CPU priority demoted. Since we're killing these processes, they're certainly not running a foreground tab. A way to speed them up might be to raise their priority again just before we send them the shutdown IPC message. I'll file a bug for that.
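The proposed sequence can be sketched as follows. All names here are hypothetical, not the actual Firefox implementation: the real logic lives in the parent process's ShutDownKill machinery, and the priority boost would go through the OS scheduler (e.g. SetPriorityClass on Windows).

```python
import threading

SHUTDOWN_KILL_TIMEOUT = 5.0  # seconds, per the timeout described above

def shut_down_child(boost_priority, send_shutdown_message, ack_event, kill,
                    timeout=SHUTDOWN_KILL_TIMEOUT):
    """Boost the child's CPU priority, ask it to shut down, and kill it
    if it has not acknowledged within the timeout (the ShutDownKill path).

    boost_priority / send_shutdown_message / kill are callables standing in
    for the real process-management operations; ack_event models receipt of
    the child's shutdown acknowledgment.
    """
    boost_priority()           # undo the background-tab priority demotion first
    send_shutdown_message()
    if not ack_event.wait(timeout):
        kill()                 # child never answered in time: ShutDownKill
        return "killed"
    return "clean"
```

A child that acknowledges promptly returns "clean"; one that is never scheduled in time takes the "killed" path, which is the crash signature seen here.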

Flags: needinfo?(gsvelto)
See Also: → 1619676

Considering the "regressionwindow-wanted" tag, I could try to find the regression if some steps to reproduce are provided.
Please NI me if any working STR are obtained. Thanks.

QA Whiteboard: [qa-regression-triage]

This is not a new issue, so I removed the regression-related flags.

Dan - see comment 7

Flags: needinfo?(dveditz)

I believe comment 7 is obsolete: comment 8 explains it. I've hidden the comment to avoid future confusion.

Flags: needinfo?(dveditz)

This is one of the top overall Nightly crashes at the moment. Given the resolution of bug 1619676, is there anything else we can do to mitigate this issue?

Flags: needinfo?(gsvelto)

This is just a "content process being slow" issue so I'm dup'ing against bug 1279293 which is where this belongs.

Flags: needinfo?(gsvelto)
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → DUPLICATE