Crash in mozilla::ipc::Endpoint<T>::Bind from ImageBridgeParent

RESOLVED FIXED in Firefox 55

Status

()

defect
--
critical
RESOLVED FIXED
2 years ago
2 years ago

People

(Reporter: philipp, Assigned: dvander)

Tracking

({crash, regression})

55 Branch
mozilla56
All
Windows
Points:
---
Bug Flags:
qe-verify -

Firefox Tracking Flags

(firefox-esr52 unaffected, firefox54 unaffected, firefox55 fixed, firefox56 fixed)

Details

(crash signature)

Attachments

(2 attachments)

(Reporter)

Description

2 years ago
This bug was filed from the Socorro interface and is 
report bp-f6406714-2bf5-4b2d-ae54-c795a0170618.
=============================================================
Crashing Thread (24), Name: Compositor
Frame 	Module 	Signature 	Source
0 	xul.dll 	mozilla::ipc::Endpoint<mozilla::PProfilerChild>::Bind(mozilla::PProfilerChild*) 	obj-firefox/dist/include/mozilla/ipc/ProtocolUtils.h:649
1 	xul.dll 	mozilla::layers::ImageBridgeParent::Bind(mozilla::ipc::Endpoint<mozilla::layers::PImageBridgeParent>&&) 	gfx/layers/ipc/ImageBridgeParent.cpp:221
2 	xul.dll 	mozilla::detail::RunnableMethodImpl<mozilla::dom::HTMLMediaElement* const, void ( mozilla::dom::HTMLMediaElement::*)(nsACString const&), 1, 0, nsCString>::Run() 	obj-firefox/dist/include/nsThreadUtils.h:1133
3 	xul.dll 	MessageLoop::RunTask(already_AddRefed<mozilla::Runnable>) 	ipc/chromium/src/base/message_loop.cc:361
4 	xul.dll 	MessageLoop::DeferOrRunPendingTask(MessageLoop::PendingTask&&) 	ipc/chromium/src/base/message_loop.cc:369
5 	xul.dll 	MessageLoop::DoWork() 	ipc/chromium/src/base/message_loop.cc:444
6 	xul.dll 	base::MessagePumpForUI::DoRunLoop() 	ipc/chromium/src/base/message_pump_win.cc:212
7 	xul.dll 	base::MessagePumpWin::RunWithDispatcher(base::MessagePump::Delegate*, base::MessagePumpWin::Dispatcher*) 	ipc/chromium/src/base/message_pump_win.cc:56
8 	xul.dll 	base::MessagePumpWin::Run(base::MessagePump::Delegate*) 	ipc/chromium/src/base/message_pump_win.h:80
9 	xul.dll 	MessageLoop::RunHandler() 	ipc/chromium/src/base/message_loop.cc:231
10 	xul.dll 	MessageLoop::Run() 	ipc/chromium/src/base/message_loop.cc:211
11 	xul.dll 	base::Thread::ThreadMain() 	ipc/chromium/src/base/thread.cc:180
12 	xul.dll 	`anonymous namespace'::ThreadFunc 	ipc/chromium/src/base/platform_thread_win.cc:28
13 	kernel32.dll 	BaseThreadInitThunk 	
14 	mozglue.dll 	patched_BaseThreadInitThunk 	mozglue/build/WindowsDllBlocklist.cpp:809
15 	ntdll.dll 	__RtlUserThreadStart 	
16 	ntdll.dll 	_RtlUserThreadStart

reports with this signature seem to be increasing in the 55.0b cycle judging from early stability data of the staged rollout there (the signature has been present for a while longer though). reports come from users on windows and generally have MOZ_RELEASE_ASSERT(mMyPid == base::GetCurrentProcId()) that got added in bug 1223240.
I looked at the proto signatures, and almost all of these crashes are in ImageBridgeParent.
Component: IPC → Graphics: Layers
Summary: Crash in mozilla::ipc::Endpoint<T>::Bind → Crash in mozilla::ipc::Endpoint<T>::Bind from ImageBridgeParent
None of these crashes have the "process type" field set, which I think means they are in the main process.
My guess is this is a super rare race here [1]. We call EnsureGPUReady after binding a PID to the pipe. If it returns false, we'll try to bind it in the parent process instead, and the PIDs won't match.

We should probably call EnsureGPUReady at the top and use the return value where we checked it previously (and where we check mGPUChild). This would affect the other pipe-y functions as well.

[1] http://searchfox.org/mozilla-central/rev/7cc377ce3f0a569c5c9b362d589705cd4fecc0ac/gfx/ipc/GPUProcessManager.cpp#802
This was partially fixed by bug 1314543.
Assignee: nobody → dvander
Posted patch fixSplinter Review
Attachment #8879271 - Flags: review?(rhunt)

Updated

2 years ago
Attachment #8879271 - Flags: review?(rhunt) → review+
(Reporter)

Updated

2 years ago
Keywords: checkin-needed
David, wanna land this patch?
Oh dear, I thought I landed this already. Will do.

Comment 8

2 years ago
Pushed by danderson@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/a45473c94b8d
Fix race where in-process GPU endpoints could have the wrong PID. (bug 1374258, r=rhunt)

Comment 9

2 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/a45473c94b8d
Status: NEW → RESOLVED
Last Resolved: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla56
(Reporter)

Comment 10

2 years ago
i don't see any more crashes on nightly after build 20170630030203 - could you request an uplift to beta as you deem fit?
Flags: needinfo?(dvander)
Comment on attachment 8879271 [details] [diff] [review]
fix

Approval Request Comment
[Feature/Bug causing the regression]: Compositor Process (firefox 53)
[User impact if declined]: rare crash when the compositor process crashes during launch
[Is this code covered by automated tests?]:
[Has the fix been verified in Nightly?]: Yes
[Needs manual test from QE? If yes, steps to reproduce]: No
[List of other uplifts needed for the feature/fix]:
[Is the change risky?]: No
[Why is the change risky/not risky?]: This is not an invasive change, we're just checking state earlier to avoid a race.
[String changes made/needed]:

Note: I'm not sure if this patch will apply to beta, since this code was refactored recently. If not let me know and I will make a new patch.
Flags: needinfo?(dvander)
Attachment #8879271 - Flags: approval-mozilla-beta?
Comment on attachment 8879271 [details] [diff] [review]
fix

fix a race condition leading to crash, beta55+
Attachment #8879271 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Needs a rebased patch.
Flags: needinfo?(dvander)
Posted patch fixSplinter Review
Minimal fix for beta.
Flags: needinfo?(dvander)
Attachment #8885993 - Flags: review?(rhunt)

Updated

2 years ago
Attachment #8885993 - Flags: review?(rhunt) → review+
(In reply to David Anderson [:dvander] from comment #11)
> [Is this code covered by automated tests?]:
> [Has the fix been verified in Nightly?]: Yes
> [Needs manual test from QE? If yes, steps to reproduce]: No

Setting qe-verify- based on David's assessment on manual testing needs.
Flags: qe-verify-
You need to log in before you can comment on or make changes to this bug.