Closed Bug 1373540 Opened 2 years ago Closed 2 years ago

Crash in mozilla::layers::CompositorBridgeParent::InitSameProcess when terminating GPU process from about:support

Categories

(Core :: Graphics: Layers, defect, P3, critical)

Unspecified
All
defect

Tracking

()

RESOLVED FIXED
mozilla56
Tracking Status
firefox-esr52 --- unaffected
firefox54 --- unaffected
firefox55 --- unaffected
firefox56 --- fixed

People

(Reporter: m_kato, Assigned: aosmond)

References

Details

(Keywords: crash, Whiteboard: gfx-noted)

Crash Data

Attachments

(1 file, 2 obsolete files)

This bug was filed from the Socorro interface and is 
report bp-95a6e1c6-6561-4b5c-a004-f965f0170616.
=============================================================

- Step
1. Browse about:support
2. Click [Terminate GPU process]

- Result
Chrome process crashes.

- Expected Result
Don't crash chrome process even if GPU process is terminated.
Reproduced on Windows 10 + Nighty 20170615030208
It seems like a regression of Bug 1365927. CompositorManagerParent::CreateSameProcessWidgetCompositorBridge() returned nullptr. And then de-reference the nullptr.
Blocks: 1365927
:aosmond, Bug 1365927 might cause this bug, can you take a look?
Flags: needinfo?(aosmond)
I expect this is mine. I tested Terminate GPU process in about:support on Linux (when I use layers.gpu-process.force-enabled, that is), and just assumed it would be similar on Windows. That doesn't seem to be the case though :(.
Assignee: nobody → aosmond
Flags: needinfo?(aosmond)
Never mind, this reproduces on Linux.
Attachment #8878462 - Flags: review?(dvander)
Status: NEW → ASSIGNED
Has STR: --- → yes
OS: Windows 10 → All
Priority: -- → P3
Whiteboard: gfx-noted
Comment on attachment 8878462 [details] [diff] [review]
Fix switchover from the GPU process compositor to the same process compositor., v1

Review of attachment 8878462 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/layers/ipc/CompositorManagerChild.cpp
@@ +65,5 @@
> +
> +  // GPUChild was torn down first when the GPU process was lost. We can't wait
> +  // for CompositorManagerChild::ActorDestroy because we may recreate the
> +  // CompositorManager pair before that is executed.
> +  sInstance->mCanSend = false;

Do you need to set mCanSend here? It should be okay to wait for ActorDestroy for that part.

::: gfx/layers/ipc/CompositorManagerParent.cpp
@@ +63,5 @@
>    CompositorThreadHolder::Loop()->PostTask(runnable.forget());
>  }
>  
> +/* static */ void
> +CompositorManagerParent::OnGPUProcessShutdown()

Do you need this function? If this is for a GPU process crash, the parent doesn't exist in the UI process yet. (But even if it did, it should go away automatically.)
(In reply to David Anderson [:dvander] from comment #7)
> Comment on attachment 8878462 [details] [diff] [review]
> Fix switchover from the GPU process compositor to the same process
> compositor., v1
> 
> Review of attachment 8878462 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: gfx/layers/ipc/CompositorManagerChild.cpp
> @@ +65,5 @@
> > +
> > +  // GPUChild was torn down first when the GPU process was lost. We can't wait
> > +  // for CompositorManagerChild::ActorDestroy because we may recreate the
> > +  // CompositorManager pair before that is executed.
> > +  sInstance->mCanSend = false;
> 
> Do you need to set mCanSend here? It should be okay to wait for ActorDestroy
> for that part.
> 

Hm, you are right, I was just being paranoid.

> ::: gfx/layers/ipc/CompositorManagerParent.cpp
> @@ +63,5 @@
> >    CompositorThreadHolder::Loop()->PostTask(runnable.forget());
> >  }
> >  
> > +/* static */ void
> > +CompositorManagerParent::OnGPUProcessShutdown()
> 
> Do you need this function? If this is for a GPU process crash, the parent
> doesn't exist in the UI process yet. (But even if it did, it should go away
> automatically.)

This makes sense yes. When I debugged it the first time, I was in CompositorManagerParent, but if I fix CompositorManagerChild first, that error never happens.
Attachment #8878462 - Attachment is obsolete: true
Attachment #8878462 - Flags: review?(dvander)
Attachment #8878520 - Flags: review?(dvander)
Comment on attachment 8878520 [details] [diff] [review]
Fix switchover from the GPU process compositor to the same process compositor., v2

Review of attachment 8878520 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/layers/ipc/CompositorManagerChild.cpp
@@ +63,5 @@
> +
> +  // GPUChild was torn down first when the GPU process was lost. We can't wait
> +  // for CompositorManagerChild::ActorDestroy because we may recreate the
> +  // CompositorManager pair before that is executed.
> +  sInstance = nullptr;

Maybe worth asserting that sInstance is null or |this|
Attachment #8878520 - Flags: review?(dvander) → review+
(In reply to David Anderson [:dvander] from comment #9)
> Comment on attachment 8878520 [details] [diff] [review]
> Fix switchover from the GPU process compositor to the same process
> compositor., v2
> 
> Review of attachment 8878520 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: gfx/layers/ipc/CompositorManagerChild.cpp
> @@ +63,5 @@
> > +
> > +  // GPUChild was torn down first when the GPU process was lost. We can't wait
> > +  // for CompositorManagerChild::ActorDestroy because we may recreate the
> > +  // CompositorManager pair before that is executed.
> > +  sInstance = nullptr;
> 
> Maybe worth asserting that sInstance is null or |this|

Hm, it is a static method, and I think it *could* be null in some circumstances, like the GPU process went down before we even got to setup the CompositorManagerChild...
Pushed by aosmond@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/866d59780f6d
Fix switchover from the GPU process compositor to the same process compositor. r=dvander
Backed out for mass test failures on Windows debug (tests terminated with RunWatchdog):

https://hg.mozilla.org/integration/mozilla-inbound/rev/554c39a4445a5e3e2219f99c527e79e3d50eb6d7

Push with failures: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=866d59780f6d2d2978aa1f5671eab0d6324db4fc&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=retry&filter-resultStatus=usercancel&filter-resultStatus=runnable
Failure log example: https://treeherder.mozilla.org/logviewer.html#?job_id=107944249&repo=mozilla-inbound

22:41:33     INFO -  ###!!! [Child][MessageChannel] Error: (msgtype=0x74001B,name=PGPU::Msg_AccumulateChildKeyedHistograms) Closed channel: cannot send/recv
22:41:33     INFO -  [GPU 388] WARNING: '!ipcActor->SendAccumulateChildKeyedHistograms(keyedAccumulationsToSend)', file z:/build/build/src/toolkit/components/telemetry/ipc/TelemetryIPCAccumulator.cpp, line 268
22:41:35     INFO -  ###!!! [Child][MessageChannel] Error: (msgtype=0x74001A,name=PGPU::Msg_AccumulateChildHistograms) Closed channel: cannot send/recv
22:41:35     INFO -  [GPU 388] WARNING: '!ipcActor->SendAccumulateChildHistograms(accumulationsToSend)', file z:/build/build/src/toolkit/components/telemetry/ipc/TelemetryIPCAccumulator.cpp, line 264
22:41:35     INFO -  ###!!! [Child][MessageChannel] Error: (msgtype=0x74001B,name=PGPU::Msg_AccumulateChildKeyedHistograms) Closed channel: cannot send/recv
22:41:35     INFO -  [GPU 388] WARNING: '!ipcActor->SendAccumulateChildKeyedHistograms(keyedAccumulationsToSend)', file z:/build/build/src/toolkit/components/telemetry/ipc/TelemetryIPCAccumulator.cpp, line 268
22:41:36     INFO -  Hit MOZ_CRASH(Shutdown too long, probably frozen, causing a crash.) at z:/build/build/src/toolkit/components/terminator/nsTerminator.cpp:160
22:41:37     INFO -  #01: _PR_NativeRunThread [nsprpub/pr/src/threads/combined/pruthr.c:397]
22:41:37     INFO -  #02: pr_root [nsprpub/pr/src/md/windows/w95thred.c:95]
22:41:37     INFO -  #03: ucrtbase.DLL + 0x3d5ef
22:41:37     INFO -  #04: kernel32.dll + 0x53c45
22:41:37     INFO -  #05: patched_BaseThreadInitThunk [mozglue/build/WindowsDllBlocklist.cpp:849]
22:41:37     INFO -  #06: ntdll.dll + 0x637f5
22:41:37     INFO -  #07: ntdll.dll + 0x637c8
22:41:37     INFO -  [GPU 388] WARNING: '!compMgr', file z:/build/build/src/xpcom/components/nsComponentManagerUtils.cpp, line 63
22:41:37     INFO -  nsStringStats
22:41:37     INFO -   => mAllocCount:           4877
22:41:37     INFO -   => mReallocCount:           33
22:41:37     INFO -   => mFreeCount:            4877
22:41:37     INFO -   => mShareCount:           5746
22:41:37     INFO -   => mAdoptCount:              0
22:41:37     INFO -   => mAdoptFreeCount:          0
22:41:37     INFO -   => Process ID: 388, Thread ID: 384
22:41:37  WARNING -  TEST-UNEXPECTED-FAIL | file:///Z:/task_1497738271/build/tests/reftest/tests/layout/reftests/w3c-css/submitted/variables/variable-declaration-40.html | application terminated with exit code 1
Flags: needinfo?(aosmond)
Not sure if I follow the exact cause of the failure, the stack traces don't offer many clues, but let's try keeping around the CompositorManagerChild until we actually replace it and see if that helps.

try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=edf15979384fed4e304c79c893097314e6a76b95
Attachment #8878520 - Attachment is obsolete: true
Flags: needinfo?(aosmond)
Pushed by aosmond@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/54f3c792db02
Fix switchover from the GPU process compositor to the same process compositor. r=me,dvander
https://hg.mozilla.org/mozilla-central/rev/54f3c792db02
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla56
Hello Andrew: I still see crashes in 20170621102301 - is it possible there is more than one crash in this signature? The comments I see mention A-Frame and Web VR.
Flags: needinfo?(aosmond)
I will fix this in bug 1374278. It is the same race as in that bug, but happening in different states (reinit GPU process, or move into UI process).
Flags: needinfo?(aosmond)
Hey aosmond,

Do you think this crash report[1], which crashes within mozilla::layers::CompositorBridgeParent::InitSameProcess, but also has the fix for this bug, indicates that there are other crashing cases that need to be addressed? Or is [1] another manifestation of the crasher you fixed in bug 1374278?

[1]: https://crash-stats.mozilla.com/report/index/523b88f1-08b6-45be-b352-947c00170625
(In reply to Mike Conley (:mconley) from comment #18)
> Hey aosmond,
> 
> Do you think this crash report[1], which crashes within
> mozilla::layers::CompositorBridgeParent::InitSameProcess, but also has the
> fix for this bug, indicates that there are other crashing cases that need to
> be addressed? Or is [1] another manifestation of the crasher you fixed in
> bug 1374278?
> 
> [1]:
> https://crash-stats.mozilla.com/report/index/523b88f1-08b6-45be-b352-
> 947c00170625

Thanks -- turns out to be the same signature, but a different problem. Bug 1376590 tracks this. Hopefully the last one.
See Also: → 1376590
You need to log in before you can comment on or make changes to this bug.