Closed Bug 1668034 Opened 5 years ago Closed 5 years ago

Crash in [@ shutdownhang | PR_MD_WAIT_CV | PR_Wait | mozilla::layers::ImageBridgeChild::ShutdownSingleton]

Categories

(Core :: Graphics: Layers, defect)

Firefox 82
All
Windows
defect

Tracking

()

RESOLVED DUPLICATE of bug 1613128
Tracking Status
firefox-esr78 --- unaffected
firefox81 --- unaffected
firefox82 + wontfix
firefox83 --- ?

People

(Reporter: philipp, Unassigned)

References

Details

(Keywords: crash, regression, topcrash)

Crash Data

Crash report: https://crash-stats.mozilla.org/report/index/6c3f85b9-1067-4260-b48d-e8fe70200929

Top 10 frames of crashing thread:

0 ntdll.dll NtWaitForSingleObject 
1 kernelbase.dll WaitForSingleObjectEx 
2 nss3.dll PR_MD_WAIT_CV nsprpub/pr/src/md/windows/w95cv.c:252
3 nss3.dll PR_Wait nsprpub/pr/src/threads/prmon.c:305
4 xul.dll static mozilla::layers::ImageBridgeChild::ShutdownSingleton gfx/layers/ipc/ImageBridgeChild.cpp:498
5 xul.dll static mozilla::layers::ImageBridgeChild::ShutDown gfx/layers/ipc/ImageBridgeChild.cpp:485
6 xul.dll static gfxPlatform::ShutdownLayersIPC gfx/thebes/gfxPlatform.cpp:1375
7 xul.dll mozilla::ShutdownXPCOM xpcom/build/XPCOMInit.cpp:632
8 xul.dll ScopedXPCOMStartup::~ScopedXPCOMStartup toolkit/xre/nsAppRunner.cpp:1291
9 xul.dll XREMain::XRE_main toolkit/xre/nsAppRunner.cpp:4962

windows shutdownhangs with this signature seem to be appearing for the first time in firefox 82 - they currently account for 1.4% of browser crashes on beta.

Blocks: gfx-triage
Flags: needinfo?(aosmond)
Keywords: topcrash

:jimm - is this something you are intending to fix for Firefox 82? Any idea how the crash is impacting users?

Flags: needinfo?(jmathies)

In bug 1402592 we changed to ensure we shut down the ImageBridgeParent before the children. It was me who wrote that patch, but at this moment, I'm having trouble recalling the full story behind why. It was related to the CompositorManager changes I did so that all CompositorBridge objects would have a common IPDL ancestor, which allowed us to reason about ordering of IPC messages between different CompositorBridge entities (as well as add messages to the CompositorManager, when we just wanted to talk to the compositor thread in the other process, but not for a specific bridge).

Whatever I was trying to accomplish, I believe bug 1653060 changed it so that now the content processes are shutting down before the GPU process gets to shutting down the ImageBridge channels.

My investigation continues.

Flags: needinfo?(jmathies)
Regressed by: 1653060
See Also: → 1402592

jya, not sure if you have any alternatives to the changes you made in GPUParent::ActorDestroy from the perspective of bug 1653060? ImageBridgeParent/Child are used by video frames, but I'm not sure if I can move it to be sooner, or if it was integral to what you were trying to accomplish in that bug?

Flags: needinfo?(aosmond) → needinfo?(jyavenard)
Flags: needinfo?(aosmond)

This could be signature morph of bug 1613128. It does seem to happen more often now in beta as well.

See Also: → 1613128

(In reply to Andrew Osmond [:aosmond] from comment #4)

This could be signature morph of bug 1613128. It does seem to happen more often now in beta as well.

Now, while I suspect it is possible bug 1653060 has contributed to an increased rate of what we saw in bug 1613128, that it isn't truly a regression and it should be marked as a dupe (and signatures updated).

Status: NEW → RESOLVED
Closed: 5 years ago
Flags: needinfo?(jyavenard)
Flags: needinfo?(aosmond)
No longer regressed by: 1653060
Resolution: --- → DUPLICATE
No longer blocks: gfx-triage

(In reply to Andrew Osmond [:aosmond] from comment #3)

jya, not sure if you have any alternatives to the changes you made in GPUParent::ActorDestroy from the perspective of bug 1653060? ImageBridgeParent/Child are used by video frames, but I'm not sure if I can move it to be sooner, or if it was integral to what you were trying to accomplish in that bug?

bug 1653060 ensured that the content process' RemoteDecoderManagerChild didn't get destroyed before the RDD's RemoteDecoderManagerParent as if it did the images stored in the RemoteDecoderManagerParent would never be freed and be reported as leaked and that is the only thing it changed. It doesnt change on when GPUParent itself is going to be destroyed, this is orthogonal.

It may have made more common the problem on when the GPU process itself is being destroyed but 1653060 had no impact on that, at worse a change in signature.

The entire shutdown sequencing is broken and has been for a while, and this is a shutdown crash. It's a wonder it doesn't cause more problem more often,

We probably need something similar to 1653060, but for the ImageBridgParent/Child instead so that the GPUParent doesn't get killed too early.

Right now the ImageBridge is shutdown whenever XPCOM is shutdown, it pays no attention to any pending transactions which is wrong.

A good start would be to stop using SynchronousTask, it's buggy and here fundamentally wrong. XPCOM is being shutdown and threads are being shutdown.
It's entirely likely that the task dispatched to call ImageBridgeChild::ShutdownStep1 failed to dispatch, if so we have a deadlock as the synchronoustask will never then be marked as done.

You need to log in before you can comment on or make changes to this bug.