Crash in [@ IPCError-browser | ShutDownKill | mozilla::ipc::MessageChannel::SynchronouslyClose]
Categories: Core :: Networking, defect, P2
People: Reporter: gsvelto, Unassigned
Details: Keywords: crash, Whiteboard: [necko-triaged]
Crash Data
This bug is for crash report bp-eb962072-9915-4d77-897b-7a76a0200209.
Top 10 frames of crashing thread:
0 ntdll.dll NtWaitForAlertByThreadId
1 ntdll.dll RtlSleepConditionVariableSRW
2 kernelbase.dll SleepConditionVariableSRW
3 mozglue.dll mozilla::detail::ConditionVariableImpl::wait mozglue/misc/ConditionVariable_windows.cpp:50
4 xul.dll mozilla::ipc::MessageChannel::SynchronouslyClose ipc/glue/MessageChannel.cpp:2694
5 xul.dll mozilla::ipc::MessageChannel::Close ipc/glue/MessageChannel.cpp:2767
6 xul.dll mozilla::net::SocketProcessBridgeChild::Observe netwerk/ipc/SocketProcessBridgeChild.cpp:168
7 xul.dll nsObserverList::NotifyObservers xpcom/ds/nsObserverList.cpp:65
8 xul.dll nsObserverService::NotifyObservers xpcom/ds/nsObserverService.cpp:292
9 xul.dll mozilla::dom::ContentChild::ShutdownInternal dom/ipc/ContentChild.cpp:3059
This is a content process hung during shutdown and it seems to be happening almost exclusively on nightly.
It seems like the content process was stuck here waiting for something to happen before we were forced to kill it because it was taking too long.
I don't know this code well but that looks like a synchronous IPC message. Those tend to be slow so we might just be too slow here, but we might also be stuck.
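To make the "slow versus stuck" distinction concrete, here is a minimal C++ sketch. It is not Gecko's code: `ToyChannel`, `AcknowledgeClose`, and `CloseWithDeadline` are hypothetical names invented for illustration, and the real `MessageChannel::SynchronouslyClose` may work quite differently. The sketch only assumes what the stack suggests, namely a close that blocks on a condition variable until the other side responds, with a deadline standing in for the parent's shutdown-kill timer.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical toy channel; NOT mozilla::ipc::MessageChannel.
class ToyChannel {
 public:
  // Called on behalf of the peer once it has acknowledged the close request.
  void AcknowledgeClose() {
    {
      std::lock_guard<std::mutex> lock(mMutex);
      mClosed = true;
    }
    mCond.notify_all();
  }

  // Blocks until the peer acknowledges the close or the deadline passes.
  // Returns true if the channel closed in time, false if the caller was
  // still waiting when the deadline expired.
  bool CloseWithDeadline(std::chrono::milliseconds aTimeout) {
    assert(aTimeout.count() >= 0);
    std::unique_lock<std::mutex> lock(mMutex);
    // The predicate guards against spurious wakeups.
    return mCond.wait_for(lock, aTimeout, [this] { return mClosed; });
  }

 private:
  std::mutex mMutex;
  std::condition_variable mCond;
  bool mClosed = false;
};
```

A `false` result here cannot tell a peer that is merely slow from one that will never answer, which is the same ambiguity a single minidump of the waiting thread leaves us with.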
Comment 1•5 years ago (Reporter)
I found another signature for this issue.
Comment 2•5 years ago (Reporter)
Found another signature for this.
Comment 3•5 years ago
Kershaw, Byron, can you take a look?
Comment 4•5 years ago
Seems like the content process is stuck waiting on the socket process. I see a similar signature that looks related to RDD here.
Maybe there's something about IPC channels between content and other types of child process that simply isn't plumbed correctly right now? It would be really nice to know what is going on in the other process (e.g., socket, RDD, what-have-you).
Comment 5•5 years ago (Reporter)
(In reply to Byron Campen [:bwc] from comment #4)
> Maybe there's something about IPC channels between content and other types of child process that simply isn't plumbed correctly right now? It would be really nice to know what is going on in the other process (e.g., socket, RDD, what-have-you).
It's technically possible to do that but it needs specific plumbing within the ContentParent class. ATM we grab a minidump for the affected content process and the main process. It should be possible to also grab minidumps for the socket and RDD and associate them with the crash report. The modifications would be non-trivial though.
Comment 6•5 years ago
I've spent some time on this, but I still can't figure out the root cause.
I think this needs a fresh pair of eyes.
:jld, do you perhaps have an idea about this?
Thanks.
Comment 7•5 years ago (Reporter)
I've had another pass at the crashes and I'm now convinced that this is just content processes being slow during shutdown. Here's why: many crashes have the IPCShutdownState annotation set to RecvShutdown, which is consistent with the stack we see here: shutdown has begun but not finished yet. However, the majority of the crashes have that annotation set to SendFinishShutdown (sent), which happens past the point where this stack trace originates.
Since the minidump and the annotations are not perfectly in sync, it's possible that in most cases we grabbed a minidump, and by the time we grabbed the annotations the content process had already made forward progress. Jed, if you agree with this analysis, feel free to close this as invalid and move the signatures to bug 1279293, since this is just generic slowness and not something we can act upon directly.
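The timing argument can be sketched with a small C++ model. Everything here is an assumption made for illustration: `SampleCrashData` is a hypothetical helper, the timeline is a made-up sequence of IPCShutdownState values indexed by an abstract "tick", and the real crash reporter does not work in discrete ticks. The point is only that two captures taken at different moments can disagree about where the process was.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model: aTimeline[i] is the IPCShutdownState value at tick i.
// The minidump is taken at aDumpTick and the annotations are read
// aAnnotationDelay ticks later. Returns {state at dump, state at annotation read}.
std::pair<std::string, std::string> SampleCrashData(
    const std::vector<std::string>& aTimeline, std::size_t aDumpTick,
    std::size_t aAnnotationDelay) {
  assert(!aTimeline.empty());
  // Clamp both sample points to the last recorded state.
  const std::size_t last = aTimeline.size() - 1;
  const std::size_t dumpIdx = std::min(aDumpTick, last);
  const std::size_t annotationIdx =
      std::min(aDumpTick + aAnnotationDelay, last);
  return {aTimeline[dumpIdx], aTimeline[annotationIdx]};
}
```

With a timeline of RecvShutdown, RecvShutdown, SendFinishShutdown (sent), a dump at tick 1 and a delay of one tick, the stack is captured while the state is still RecvShutdown but the annotation already reads SendFinishShutdown (sent), which is the mismatch seen in most of the reports.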
Comment 8•5 years ago (Reporter)
Naturally, if we could speed up this step it should bring the overall volume down.
Comment 9•5 years ago
cc dthayer, since speeding up shutdown would have other nice knock-on effects.
(In reply to Gabriele Svelto [:gsvelto] from comment #5)
> (In reply to Byron Campen [:bwc] from comment #4)
> > Maybe there's something about IPC channels between content and other types of child process that simply isn't plumbed correctly right now? It would be really nice to know what is going on in the other process (e.g., socket, RDD, what-have-you).
> It's technically possible to do that but it needs specific plumbing within the ContentParent class. ATM we grab a minidump for the affected content process and the main process. It should be possible to also grab minidumps for the socket and RDD and associate them with the crash report. The modifications would be non-trivial though.
We should consider doing this, I think; when the RDD (or socket?) processes were being brought up, I remember fielding questions from people who were puzzled about why we didn't get crash dumps for them, which makes debugging on try annoying.
Comment 10•5 years ago
¡Hola!
Per https://crash-stats.mozilla.org/signature/?product=Firefox&signature=IPCError-browser%20%7C%20ShutDownKill%20%7C%20NtSetIoCompletion, 74 and 75 are affected.
¡Gracias!
Alex
Comment 11•5 years ago
The explanation in comment #7 sounds plausible… but the IPCShutdownState annotation is specific to ContentParent and I don't know it very well; someone from the DOM: Content Processes component might be more helpful.
Comment 12•5 years ago
The priority flag is not set for this bug.
:mayhemer, could you have a look please?
For more information, please visit auto_nag documentation.
Comment 14•5 years ago
¡Hola!
Just noticed that my Firefox Nightly crashed like this earlier today:
bp-28730c5f-abfc-49c7-83fa-01e3f0200502
Updating flags per https://crash-stats.mozilla.org/signature/?product=Firefox&signature=IPCError-browser%20%7C%20ShutDownKill%20%7C%20mozilla%3A%3Aipc%3A%3AMessageChannel%3A%3ASynchronouslyClose FWIW.
¡Gracias!
Alex
Comment 15•5 years ago
(In reply to alex_mayorga from comment #14)
> Updating flags per https://crash-stats.mozilla.org/signature/?product=Firefox&signature=IPCError-browser%20%7C%20ShutDownKill%20%7C%20mozilla%3A%3Aipc%3A%3AMessageChannel%3A%3ASynchronouslyClose FWIW.
Please don't.
Comment 16•4 years ago
I wasn't shutting down Firefox when I hit this - d16ae064-c3c2-46b4-9aeb-82d9e0201011
The entire window became momentarily blurry, as if the window had been duplicated and offset a few pixels (like bad font smoothing), then Firefox crashed without me doing anything.
Comment 17•3 years ago
Recent builds don't seem to be affected.