This bug was filed from the Socorro interface and is report bp-f6406714-2bf5-4b2d-ae54-c795a0170618. ============================================================= Crashing Thread (24), Name: Compositor Frame Module Signature Source 0 xul.dll mozilla::ipc::Endpoint<mozilla::PProfilerChild>::Bind(mozilla::PProfilerChild*) obj-firefox/dist/include/mozilla/ipc/ProtocolUtils.h:649 1 xul.dll mozilla::layers::ImageBridgeParent::Bind(mozilla::ipc::Endpoint<mozilla::layers::PImageBridgeParent>&&) gfx/layers/ipc/ImageBridgeParent.cpp:221 2 xul.dll mozilla::detail::RunnableMethodImpl<mozilla::dom::HTMLMediaElement* const, void ( mozilla::dom::HTMLMediaElement::*)(nsACString const&), 1, 0, nsCString>::Run() obj-firefox/dist/include/nsThreadUtils.h:1133 3 xul.dll MessageLoop::RunTask(already_AddRefed<mozilla::Runnable>) ipc/chromium/src/base/message_loop.cc:361 4 xul.dll MessageLoop::DeferOrRunPendingTask(MessageLoop::PendingTask&&) ipc/chromium/src/base/message_loop.cc:369 5 xul.dll MessageLoop::DoWork() ipc/chromium/src/base/message_loop.cc:444 6 xul.dll base::MessagePumpForUI::DoRunLoop() ipc/chromium/src/base/message_pump_win.cc:212 7 xul.dll base::MessagePumpWin::RunWithDispatcher(base::MessagePump::Delegate*, base::MessagePumpWin::Dispatcher*) ipc/chromium/src/base/message_pump_win.cc:56 8 xul.dll base::MessagePumpWin::Run(base::MessagePump::Delegate*) ipc/chromium/src/base/message_pump_win.h:80 9 xul.dll MessageLoop::RunHandler() ipc/chromium/src/base/message_loop.cc:231 10 xul.dll MessageLoop::Run() ipc/chromium/src/base/message_loop.cc:211 11 xul.dll base::Thread::ThreadMain() ipc/chromium/src/base/thread.cc:180 12 xul.dll `anonymous namespace'::ThreadFunc ipc/chromium/src/base/platform_thread_win.cc:28 13 kernel32.dll BaseThreadInitThunk 14 mozglue.dll patched_BaseThreadInitThunk mozglue/build/WindowsDllBlocklist.cpp:809 15 ntdll.dll __RtlUserThreadStart 16 ntdll.dll _RtlUserThreadStart reports with this signature seem to be increasing in the 55.0b cycle judging from early stability data of the staged rollout there (the signature has been present for a while longer though). reports come from users on windows and generally have MOZ_RELEASE_ASSERT(mMyPid == base::GetCurrentProcId()) that got added in bug 1223240.
I looked at the proto signatures, and almost all of these crashes are in ImageBridgeParent.
Component: IPC → Graphics: Layers
Summary: Crash in mozilla::ipc::Endpoint<T>::Bind → Crash in mozilla::ipc::Endpoint<T>::Bind from ImageBridgeParent
None of these crashes have the "process type" field set, which I think means they are in the main process.
My guess is this is a super rare race here . We call EnsureGPUReady after binding a PID to the pipe. If it returns false, we'll try to bind it in the parent process instead, and the PIDs won't match. We should probably call EnsureGPUReady at the top and use the return value where we checked it previously (and where we check mGPUChild). This would affect the other pipe-y functions as well.  http://searchfox.org/mozilla-central/rev/7cc377ce3f0a569c5c9b362d589705cd4fecc0ac/gfx/ipc/GPUProcessManager.cpp#802
This was partially fixed by bug 1314543.
Assignee: nobody → dvander
David, wanna land this patch?
Oh dear, I thought I landed this already. Will do.
Pushed by firstname.lastname@example.org: https://hg.mozilla.org/integration/mozilla-inbound/rev/a45473c94b8d Fix race where in-process GPU endpoints could have the wrong PID. (bug 1374258, r=rhunt)
i don't see any more crashes on nightly after build 20170630030203 - could you request an uplift to beta as you deem fit?
Comment on attachment 8879271 [details] [diff] [review] fix Approval Request Comment [Feature/Bug causing the regression]: Compositor Process (firefox 53) [User impact if declined]: rare crash when the compositor process crashes during launch [Is this code covered by automated tests?]: [Has the fix been verified in Nightly?]: Yes [Needs manual test from QE? If yes, steps to reproduce]: No [List of other uplifts needed for the feature/fix]: [Is the change risky?]: No [Why is the change risky/not risky?]: This is not an invasive change, we're just checking state earlier to avoid a race. [String changes made/needed]: Note: I'm not sure if this patch will apply to beta, since this code was refactored recently. If not let me know and I will make a new patch.
Attachment #8879271 - Flags: approval-mozilla-beta?
Comment on attachment 8879271 [details] [diff] [review] fix fix a race condition leading to crash, beta55+
Attachment #8879271 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Needs a rebased patch.
Minimal fix for beta.
Attachment #8885993 - Flags: review?(rhunt)
(In reply to David Anderson [:dvander] from comment #11) > [Is this code covered by automated tests?]: > [Has the fix been verified in Nightly?]: Yes > [Needs manual test from QE? If yes, steps to reproduce]: No Setting qe-verify- based on David's assessment on manual testing needs.
You need to log in before you can comment on or make changes to this bug.