Closed Bug 1261321 Opened 4 years ago Closed 4 years ago
crash in mozilla::layers::Compositable
[Tracking Requested - why for this release]: newly introduced crash in 45 This bug was filed from the Socorro interface and is report bp-61743e6e-3cd2-4e8e-b8ac-a9adc2160320. ============================================================= Crashing Thread (20) Frame Module Signature Source 0 xul.dll mozilla::layers::CompositableClient::IsConnected() gfx/layers/client/CompositableClient.cpp 1 xul.dll mozilla::layers::CompositableClient::Destroy() gfx/layers/client/CompositableClient.cpp 2 xul.dll mozilla::layers::ImageBridgeShutdownStep1 gfx/layers/ipc/ImageBridgeChild.cpp 3 xul.dll RunnableFunction<void (*)(mozilla::layers::ImageClient*, mozilla::layers::PImageContainerChild*), mozilla::Tuple<mozilla::layers::ImageClient*, mozilla::layers::PImageContainerChild*> >::Run() ipc/chromium/src/base/task.h 4 xul.dll MessageLoop::DoWork() ipc/chromium/src/base/message_loop.cc 5 xul.dll base::MessagePumpDefault::Run(base::MessagePump::Delegate*) ipc/chromium/src/base/message_pump_default.cc 6 xul.dll MessageLoop::RunHandler() ipc/chromium/src/base/message_loop.cc 7 xul.dll MessageLoop::Run() ipc/chromium/src/base/message_loop.cc 8 xul.dll base::Thread::ThreadMain() ipc/chromium/src/base/thread.cc 9 xul.dll mozilla::LazyIdleThread::LazyIdleThread(unsigned int, nsACString_internal const&, mozilla::LazyIdleThread::ShutdownMethod, nsIObserver*) xpcom/threads/LazyIdleThread.cpp 10 @0x164ef86f 11 ntdll.dll __RtlUserThreadStart 12 ntdll.dll _RtlUserThreadStart this is a new crash signature in firefox 45 builds and upwards which has apparently been introduced by the work in bug 121976. it is mid- to low-level in volume - on 45.0.1 it is #83 & making up 0.15% of all crashes.
Shutdown bug. Apparently, ImageClient::FromIPDLActor returned nullptr in ImageBridgeShutdownStep1, which is somewhat unexpected. A simple null-check will be enough to fix the crash, although I'd like to understand how we got in this state.
Assignee: nobody → nical.bugzilla
Let's get a simple patch "just stop the crash" ready and into review while we're figuring out the cause?
This is blocking a wrong bug - missing a digit. Do we know the actual bug number?
sorry, that was bug 1219761 - got the wrong number from https://hg.mozilla.org/mozilla-central/rev/932318a4e150
(In reply to Milan Sreckovic [:milan] from comment #2) > Let's get a simple patch "just stop the crash" ready and into review while > we're figuring out the cause? I am working on a fix that should fix this in a reliable way. If we just null-check we'll run into a similar problem as bug 1241875. It's easier to fix this race condition for CompoaitableClient than TextureClient so I should have a patch ready tomorrow.
The fix is getting a bit more involved than I'd like to but there are several issues at play and fixing one mostly makes us fall on the others. The root of the problem, from a high level perspective, is that we have written the gfx ipdl actors with the idea that each actors is fully owned by one object (for example CompositableClient fully owns CompositableChild) and that object's destruction drives the actor's destruction. This is mostly true, except when IPDL shuts down and forces the destruction of the actor. Orchestrating this all becomes a bit complicated when threads get involved. CompositableClient and TextureClient have the same problems (Well, TextureClient's case is more complex but I've tried to leave that out for a followup patch as much as possible), so it makes sense to fix this at the ChildActor<T> level (TextureChild and COmpositableChild both inherit from it). This patch does a bunch of things: - rename DestroyIPDLActor -> DeallocIPDLActor to avoid confusion with the Destroy() methods which involve doing the destruction handshake and sending messages, while DeallocIPDLActor is the hook that gets called when IPDL wants to delete the actor (at this point we can't send messages). - expose "ReleaseActor" rather than "Destroy" to make it clear that after this is called the TextureClient/CompositableClient should NOT keep their pointer to the actor (which may just have been deleted). - track what happens and in which order for the following things: - ***Client is destroyed and calls ReleaseActor - IPDL shuts down (normal or abnormal shutdown) - the ***Child actor's destruction handshake is complete And make sure that the actor is deleted at the right time, ideally when the destruction handshake finishes, if and only if the ***Client has already released its reference to the the ***Child actor (this is where most of the complexity is). When all the pieces come together other crashes like bug 1241875 should get fixed as well.
Getting the more-correct-but-complicated approach to work is proving harder than I expected, so in the mean-time, let's just null-check to avoid some of the crashes. I'll pursue the other approach in another bug.
Comment on attachment 8738626 [details] [diff] [review] Null check Review of attachment 8738626 [details] [diff] [review]: ----------------------------------------------------------------- Do we want a warning when compositable is null?
Attachment #8738626 - Flags: review?(jnicol) → review+
(In reply to Jamie Nicol [:jnicol] from comment #8) > Do we want a warning when compositable is null? This is going to change soon-ish, and there will be better places to report what puts us in this state. So for now let's just paper over the null dereference.
[Tracking Requested - why for this release]: newly introduced crash in 45
Nicolas, could you fill the uplift request for this crash? The volume is quite important. Thanks
Comment on attachment 8738626 [details] [diff] [review] Null check Approval Request Comment [Feature/regressing bug #]: [User impact if declined]: crashes [Describe test coverage new/current, TreeHerder]: none [Risks and why]: just the simples null-check. Absolutely no risk [String/UUID change made/needed]: none
Comment on attachment 8738626 [details] [diff] [review] Null check [Triage Comment] Crash fix, recent regression, let's take this fix for 46. Please uplift to aurora, beta, m-r.
This would land in an RC2 for 46, later this week. Too late for rc1
Attachment #8738626 - Flags: approval-mozilla-esr45+
You need to log in before you can comment on or make changes to this bug.