Closed Bug 1374258 Opened 3 years ago Closed 3 years ago

Crash in mozilla::ipc::Endpoint<T>::Bind from ImageBridgeParent

Categories

(Core :: Graphics: Layers, defect)

55 Branch
All
Windows
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla56
Tracking Status
firefox-esr52 --- unaffected
firefox54 --- unaffected
firefox55 --- fixed
firefox56 --- fixed

People

(Reporter: philipp, Assigned: dvander)

References

Details

(Keywords: crash, regression)

Crash Data

Attachments

(2 files)

This bug was filed from the Socorro interface and is 
report bp-f6406714-2bf5-4b2d-ae54-c795a0170618.
=============================================================
Crashing Thread (24), Name: Compositor
Frame 	Module 	Signature 	Source
0 	xul.dll 	mozilla::ipc::Endpoint<mozilla::PProfilerChild>::Bind(mozilla::PProfilerChild*) 	obj-firefox/dist/include/mozilla/ipc/ProtocolUtils.h:649
1 	xul.dll 	mozilla::layers::ImageBridgeParent::Bind(mozilla::ipc::Endpoint<mozilla::layers::PImageBridgeParent>&&) 	gfx/layers/ipc/ImageBridgeParent.cpp:221
2 	xul.dll 	mozilla::detail::RunnableMethodImpl<mozilla::dom::HTMLMediaElement* const, void ( mozilla::dom::HTMLMediaElement::*)(nsACString const&), 1, 0, nsCString>::Run() 	obj-firefox/dist/include/nsThreadUtils.h:1133
3 	xul.dll 	MessageLoop::RunTask(already_AddRefed<mozilla::Runnable>) 	ipc/chromium/src/base/message_loop.cc:361
4 	xul.dll 	MessageLoop::DeferOrRunPendingTask(MessageLoop::PendingTask&&) 	ipc/chromium/src/base/message_loop.cc:369
5 	xul.dll 	MessageLoop::DoWork() 	ipc/chromium/src/base/message_loop.cc:444
6 	xul.dll 	base::MessagePumpForUI::DoRunLoop() 	ipc/chromium/src/base/message_pump_win.cc:212
7 	xul.dll 	base::MessagePumpWin::RunWithDispatcher(base::MessagePump::Delegate*, base::MessagePumpWin::Dispatcher*) 	ipc/chromium/src/base/message_pump_win.cc:56
8 	xul.dll 	base::MessagePumpWin::Run(base::MessagePump::Delegate*) 	ipc/chromium/src/base/message_pump_win.h:80
9 	xul.dll 	MessageLoop::RunHandler() 	ipc/chromium/src/base/message_loop.cc:231
10 	xul.dll 	MessageLoop::Run() 	ipc/chromium/src/base/message_loop.cc:211
11 	xul.dll 	base::Thread::ThreadMain() 	ipc/chromium/src/base/thread.cc:180
12 	xul.dll 	`anonymous namespace'::ThreadFunc 	ipc/chromium/src/base/platform_thread_win.cc:28
13 	kernel32.dll 	BaseThreadInitThunk 	
14 	mozglue.dll 	patched_BaseThreadInitThunk 	mozglue/build/WindowsDllBlocklist.cpp:809
15 	ntdll.dll 	__RtlUserThreadStart 	
16 	ntdll.dll 	_RtlUserThreadStart

reports with this signature seem to be increasing in the 55.0b cycle judging from early stability data of the staged rollout there (the signature has been present for a while longer though). reports come from users on windows and generally have MOZ_RELEASE_ASSERT(mMyPid == base::GetCurrentProcId()) that got added in bug 1223240.
I looked at the proto signatures, and almost all of these crashes are in ImageBridgeParent.
Component: IPC → Graphics: Layers
Summary: Crash in mozilla::ipc::Endpoint<T>::Bind → Crash in mozilla::ipc::Endpoint<T>::Bind from ImageBridgeParent
None of these crashes have the "process type" field set, which I think means they are in the main process.
My guess is this is a super rare race here [1]. We call EnsureGPUReady after binding a PID to the pipe. If it returns false, we'll try to bind it in the parent process instead, and the PIDs won't match.

We should probably call EnsureGPUReady at the top and use the return value where we checked it previously (and where we check mGPUChild). This would affect the other pipe-y functions as well.

[1] http://searchfox.org/mozilla-central/rev/7cc377ce3f0a569c5c9b362d589705cd4fecc0ac/gfx/ipc/GPUProcessManager.cpp#802
This was partially fixed by bug 1314543.
Assignee: nobody → dvander
Attached patch fixSplinter Review
Attachment #8879271 - Flags: review?(rhunt)
Attachment #8879271 - Flags: review?(rhunt) → review+
Keywords: checkin-needed
David, wanna land this patch?
Oh dear, I thought I landed this already. Will do.
Pushed by danderson@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/a45473c94b8d
Fix race where in-process GPU endpoints could have the wrong PID. (bug 1374258, r=rhunt)
https://hg.mozilla.org/mozilla-central/rev/a45473c94b8d
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla56
i don't see any more crashes on nightly after build 20170630030203 - could you request an uplift to beta as you deem fit?
Flags: needinfo?(dvander)
Comment on attachment 8879271 [details] [diff] [review]
fix

Approval Request Comment
[Feature/Bug causing the regression]: Compositor Process (firefox 53)
[User impact if declined]: rare crash when the compositor process crashes during launch
[Is this code covered by automated tests?]:
[Has the fix been verified in Nightly?]: Yes
[Needs manual test from QE? If yes, steps to reproduce]: No
[List of other uplifts needed for the feature/fix]:
[Is the change risky?]: No
[Why is the change risky/not risky?]: This is not an invasive change, we're just checking state earlier to avoid a race.
[String changes made/needed]:

Note: I'm not sure if this patch will apply to beta, since this code was refactored recently. If not let me know and I will make a new patch.
Flags: needinfo?(dvander)
Attachment #8879271 - Flags: approval-mozilla-beta?
Comment on attachment 8879271 [details] [diff] [review]
fix

fix a race condition leading to crash, beta55+
Attachment #8879271 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Needs a rebased patch.
Flags: needinfo?(dvander)
Attached patch fixSplinter Review
Minimal fix for beta.
Flags: needinfo?(dvander)
Attachment #8885993 - Flags: review?(rhunt)
Attachment #8885993 - Flags: review?(rhunt) → review+
(In reply to David Anderson [:dvander] from comment #11)
> [Is this code covered by automated tests?]:
> [Has the fix been verified in Nightly?]: Yes
> [Needs manual test from QE? If yes, steps to reproduce]: No

Setting qe-verify- based on David's assessment on manual testing needs.
Flags: qe-verify-
You need to log in before you can comment on or make changes to this bug.