Closed
Bug 1223193
Opened 8 years ago
Closed 8 years ago
crash in mozilla::layers::ImageBridgeChild::RemoveTextureFromCompositableAsync
Categories
(Core :: Graphics: Layers, defect)
Tracking
()
RESOLVED
FIXED
mozilla46
People
(Reporter: u279076, Unassigned)
References
Details
(Keywords: crash, topcrash-win)
Crash Data
Attachments
(2 files)
1.24 KB,
patch
|
nical
:
review+
ritu
:
approval-mozilla-aurora+
ritu
:
approval-mozilla-beta+
|
Details | Diff | Splinter Review |
4.04 KB,
patch
|
nical
:
review+
ritu
:
approval-mozilla-aurora+
ritu
:
approval-mozilla-beta+
|
Details | Diff | Splinter Review |
This bug was filed from the Socorro interface and is report bp-fb40c0ff-9da5-4289-bd13-285512151108. ============================================================= 0 xul.dll mozilla::layers::ImageBridgeChild::RemoveTextureFromCompositableAsync(mozilla::layers::AsyncTransactionTracker*, mozilla::layers::CompositableClient*, mozilla::layers::TextureClient*) gfx/layers/ipc/ImageBridgeChild.cpp 1 xul.dll mozilla::layers::ImageClient::RemoveTextureWithWaiter(mozilla::layers::TextureClient*, mozilla::layers::AsyncTransactionWaiter*) gfx/layers/client/ImageClient.cpp 2 xul.dll mozilla::layers::ImageClientSingle::FlushAllImages(mozilla::layers::AsyncTransactionWaiter*) gfx/layers/client/ImageClient.cpp 3 xul.dll mozilla::layers::FlushAllImagesSync gfx/layers/ipc/ImageBridgeChild.cpp 4 xul.dll RunnableFunction<void (*)(mozilla::layers::ImageClient*, mozilla::layers::ImageContainer*, mozilla::layers::AsyncTransactionWaiter*), Tuple3<mozilla::layers::ImageClient*, mozilla::layers::ImageContainer*, RefPtr<mozilla::layers::AsyncTransactionWaiter> > >::Run() ipc/chromium/src/base/task.h 5 xul.dll base::MessagePumpDefault::Run(base::MessagePump::Delegate*) ipc/chromium/src/base/message_pump_default.cc 6 xul.dll MessageLoop::RunHandler() ipc/chromium/src/base/message_loop.cc 7 xul.dll MessageLoop::Run() ipc/chromium/src/base/message_loop.cc 8 xul.dll base::Thread::ThreadMain() ipc/chromium/src/base/thread.cc 9 xul.dll `anonymous namespace'::ThreadFunc(void*) ipc/chromium/src/base/platform_thread_win.cc 10 kernel32.dll BaseThreadInitThunk 11 ntdll.dll RtlUserThreadStart 12 kernel32.dll BasepReportFault 13 kernel32.dll BasepReportFault ============================================================= More reports: https://crash-stats.mozilla.com/report/list?product=Firefox&signature=mozilla%3A%3Alayers%3A%3AImageBridgeChild%3A%3ARemoveTextureFromCompositableAsync
[Tracking Requested - why for this release]: this crash first shows up in Firefox 43 and is currently the #20 crash in Beta. There are no strong device/driver correlations. Facebook.com, about:home, and about:newtab rank high in the URLs.
status-firefox42:
--- → unaffected
status-firefox43:
--- → affected
status-firefox44:
--- → affected
status-firefox45:
--- → affected
tracking-firefox43:
--- → ?
OK, tracking for 43. Looks like this first was reported on 2015-10-07. So the regression may have been around then.
This is still frequent in 43 late betas, around the #20 topcrash. There are several other high volume crash signatures containing "mozilla::layers::ImageBridgeChild", maybe related? If they are related then this is probably our top crash on beta.
Here's a link to the beta 8 topcrashes: https://crash-stats.mozilla.com/topcrasher/products/Firefox/versions/43.0b8
Comment 5•8 years ago
|
||
The crash reports show that the crash happens during or after gfx is shut(ting) down (from another thread). The fix for this and the long tail of similar shutdown crashes is to ensure that media shuts down completely before gfx (which can't be shut down any later because the destruction of things gfx relies on happens right after gfx's shut down).
Updated•8 years ago
|
Component: Graphics: Layers → Audio/Video: Playback
[Tracking Requested - why for this release]: this is now the 9th topcrash in Release accounting for 1.06% of crashes reported against Firefox 43.0.1.
Thanks Anthony. We should probably track for 44+ but it's too late for a fix for 43. jya is this something you could have a look at? Or can you help us find an owner for this bug?
Flags: needinfo?(jyavenard)
Comment hidden (obsolete) |
Comment 9•8 years ago
|
||
from comment 5, looks like gfx has a different opinion on the matter, so likely this bug will just bounce back and forth between teams :)
Comment 10•8 years ago
|
||
is there a way to test if gfx is shut down? would be the easiest work around ensuring gfx isn't called when that happens
Flags: needinfo?(nical.bugzilla)
Comment 11•8 years ago
|
||
Upon investigating, the relation between ImageClient, TextureClient and ImageBridgeChild isn't threadsafe either and a race is possible. It's when things are being shutdown while tasks are pending that an issue can occur (not just because the media stack could continue to use gfx after shutdown). I can't see any easy fix here, maybe a quick work around.
Comment 12•8 years ago
|
||
bug 1209724 didn't paper over the issue unfortunately. ImageBridgeChild::sImageBridgeChildSingleton is deleted too late in the process. (btw access to sImageBridgeChildSingleton isn't thread-safe either)
See Also: → 1209724
Comment 13•8 years ago
|
||
Fundamentally unnecessary, because this member is only used for debug build, but I had thought of using this value to detect if we are shut down or not. so seeing that it doesn't hurt..
Attachment #8702804 -
Flags: review?(sotaro.ikeda.g)
Attachment #8702804 -
Flags: review?(nical.bugzilla)
Comment 14•8 years ago
|
||
This really is the same logic as used in bug 1209724. The two bugs really are identical just crashing differently depending on the timing of when the race occurred. By the time sImageBridgeChildThread is cleared, shutdown has already started and if a proxy call is made on the ImageBridgeChild at this time you will get this crash. We can’t use sImageBridgeChildSingleton->mShuttingDown to detect if shutting down is happening for two reasons: 1- Accessing sImageBridgeChildSingleton itself isn’t thread-safe (this needs to be addressed separately) 2- ImageBridgeChild::mShuttingDown is set too late in the process, it's really ImageBridgeChild::mShutDown. I could have replaced !ImageBridgeChild::IsCreated() with ImageBridgeChild::IsShutDown() entirely, but I felt that it didn't matter and it's clearer that way. Seeing that everything is being shut down at the same time (be it gfx, media or what else), and we can't really easily alter the timing of everything; this is the easiest way to proceed at this stage. Nical, Sotaro whichever wants to review this first. Too bad it won't be in 43, the problem could have been foreseen.
Flags: needinfo?(nical.bugzilla)
Attachment #8702805 -
Flags: review?(sotaro.ikeda.g)
Attachment #8702805 -
Flags: review?(nical.bugzilla)
Comment 15•8 years ago
|
||
(In reply to Jean-Yves Avenard [:jya] from comment #14) > 1- Accessing sImageBridgeChildSingleton itself isn’t thread-safe (this needs > to be addressed separately) It doesn't matter. sImageBridgeChildSingleton is assigned at startup before users of ImageBridgeChild are created and cleared at shutdown after all users of ImageBridge *must* have been destoyed. While the latter doesn't hold true we have plenty of other races conditions so adding synchronization primitives around the singleton pointer just paper over some of the issue. > Seeing that everything is being shut down at the same time (be it gfx, media > or what else), and we can't really easily alter the timing of everything; > this is the easiest way to proceed at this stage. I'm fine with r+ing this, but it is really papering over part of a larger issue. I am the author of many similar shutdown workarounds, I have been piling them up for a long while and new problems keep showing up. All of these issues would not exist if code that depends on gfx (especially the ipc stuff) would be entirely shut down before gfx. > > Nical, Sotaro whichever wants to review this first. > Too bad it won't be in 43, the problem could have been foreseen. The problem has been foreseen for quite a while, but my limited knowledge of MediaShutdownManager hasn't been enough for me to figure out why most of the media resources are synchronously destroyed when XPCOM_SHUTDOWN_OBSERVER_ID is fired (which is exactly what we need), but not all of it (where the problems arise). Hard to get help on IRC because timezones and everybody being busy (for very good reasons like EME, etc). Honestly, my biggest mistake was to start papering over shutdown races years ago, thinking a quick fix in gfx would be easier and sufficient while I should have put some strict assertions in place before things got out of hand. I am sorry to make it look like I am bouncing gfx issues toward other teams, but after trying to make gfx stuff not crash when several threads are poking at it while it is shutting down for a long while without great success, I think we really need to look for another approach.
Updated•8 years ago
|
Attachment #8702804 -
Flags: review?(nical.bugzilla) → review+
Updated•8 years ago
|
Attachment #8702805 -
Flags: review?(nical.bugzilla) → review+
Updated•8 years ago
|
Attachment #8702804 -
Flags: review?(sotaro.ikeda.g)
Updated•8 years ago
|
Attachment #8702805 -
Flags: review?(sotaro.ikeda.g)
Comment 16•8 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/327f2836d54a https://hg.mozilla.org/integration/mozilla-inbound/rev/7fd666d36af5
Comment 17•8 years ago
|
||
> It doesn't matter. sImageBridgeChildSingleton is assigned at startup before > users of ImageBridgeChild are created and cleared at shutdown after all > users of ImageBridge *must* have been destoyed. While the latter doesn't > hold true we have plenty of other races conditions so adding synchronization > primitives around the singleton pointer just paper over some of the issue. Well, it is accessed :) I guess I will have to disagree with you in regards to "harmless" data races. We can't know what the compiler will generate as the C++ behaviour is undefined. > I'm fine with r+ing this, but it is really papering over part of a larger > issue. I am the author of many similar shutdown workarounds, I have been > piling them up for a long while and new problems keep showing up. All of > these issues would not exist if code that depends on gfx (especially the ipc > stuff) would be entirely shut down before gfx. > > > > > Nical, Sotaro whichever wants to review this first. > > Too bad it won't be in 43, the problem could have been foreseen. > > The problem has been foreseen for quite a while, but my limited knowledge of > MediaShutdownManager hasn't been enough for me to figure out why most of the > media resources are synchronously destroyed when XPCOM_SHUTDOWN_OBSERVER_ID > is fired (which is exactly what we need), but not all of it (where the > problems arise). Hard to get help on IRC because timezones and everybody > being busy (for very good reasons like EME, etc). I don't know anything about MediaShutdownManager either :) The media stack readers (in those backtrace, mostly MediaFormatReader) are working asynchronously. They return a MozPromise when shutting down which will be resolved once shutdown has completed. Assuming the media is the only thing left accessing gfx, we will want to complete gfx shutdown only once all those MozPromises have been resolved. But having said that, we have multiple threads working concurrently within gfx, all using weak references to their parameters. Everything relies on things happening in a particular order when really there's no guarantee that it will happen in that order. The gfx elements used by the media stack are ref-counted, so it's possible for the gfx stack to know that there are elements still alive. Unfortunately, as almost everything uses weak ptr, it becomes impossible to track what is freed and what isn't. > Honestly, my biggest mistake was to start papering over shutdown races years > ago, thinking a quick fix in gfx would be easier and sufficient while I > should have put some strict assertions in place before things got out of > hand. indeed. Didn't you say you had started a complete rewrite of this? The whole TextureClient/TextureServer, the recycling of D3D objects, all of this appears racy to me. The use of multiple threads with no synchronisation or monitors; plenty of objects accessed concurrently: makes it all fragile. > I am sorry to make it look like I am bouncing gfx issues toward other teams, > but after trying to make gfx stuff not crash when several threads are poking > at it while it is shutting down for a long while without great success, I > think we really need to look for another approach. Yeah, we need a different approach so we can guarantee that gfx will be the last thing to shut down. Or maybe make gfx the owner of all gfx related objects (which from the media point of view it already looks that way. Media never creates a gfx element (be it VideoFrameContainer, layers backend etc..), it is only ever passed a reference to them. It never destroys them either, at worse it calls VideoFrameContainer::ClearCurrentFrame ; seeing that the gfx is the owner, should we stop calling this?
Comment 18•8 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/327f2836d54a https://hg.mozilla.org/mozilla-central/rev/7fd666d36af5
Status: NEW → RESOLVED
Closed: 8 years ago
status-firefox46:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla46
Nical, should we uplift this fix to Beta44? I looked at the last few days of data on this crash signature but it doesn't seem to occur on Nightly46 so we don't have a verification to go by. However, given that this is a top crash on 44.0b2/b4, it seems like uplifting is still a good idea.
Flags: needinfo?(nical.bugzilla)
Comment 20•8 years ago
|
||
I have managed to reproduce another shutdown race in a VM rather consistently.
Comment 21•8 years ago
|
||
Bug 1236384 is fixing the underlying cause for all those problems, which as Nicolas described is due to the improper handling of the MediaShutdownManager. So Nicolas my apologies, this issue is really a Audio/Video playback one.
Comment 22•8 years ago
|
||
Comment on attachment 8702804 [details] [diff] [review] 0001-Bug-1223193-P1.-Make-ImageBridgeChild-mShuttingDown-.patch Approval Request Comment [Feature/regressing bug #]: [User impact if declined]: shutdown crashes [Describe test coverage new/current, TreeHerder]: [Risks and why]: low risk, only affect shutdown in cases where we are already in a bad situation today. [String/UUID change made/needed]:
Flags: needinfo?(nical.bugzilla)
Attachment #8702804 -
Flags: approval-mozilla-beta?
Attachment #8702804 -
Flags: approval-mozilla-aurora?
Comment 23•8 years ago
|
||
Comment on attachment 8702805 [details] [diff] [review] 0002-Bug-1223193-P2.-Cancel-ImageBridge-proxy-functions-i.patch Approval Request Comment Same as the previous patch.
Attachment #8702805 -
Flags: approval-mozilla-beta?
Attachment #8702805 -
Flags: approval-mozilla-aurora?
Comment 24•8 years ago
|
||
(In reply to Jean-Yves Avenard [:jya] from comment #21) > So Nicolas my apologies, this issue is really a Audio/Video playback one. No problem, a million thanks for your patch in bug 1207220!
Comment on attachment 8702804 [details] [diff] [review] 0001-Bug-1223193-P1.-Make-ImageBridgeChild-mShuttingDown-.patch Top crash fix, Beta44+, Aurora45+
Attachment #8702804 -
Flags: approval-mozilla-beta?
Attachment #8702804 -
Flags: approval-mozilla-beta+
Attachment #8702804 -
Flags: approval-mozilla-aurora?
Attachment #8702804 -
Flags: approval-mozilla-aurora+
Comment on attachment 8702805 [details] [diff] [review] 0002-Bug-1223193-P2.-Cancel-ImageBridge-proxy-functions-i.patch Beta44+, Aurora45+
Attachment #8702805 -
Flags: approval-mozilla-beta?
Attachment #8702805 -
Flags: approval-mozilla-beta+
Attachment #8702805 -
Flags: approval-mozilla-aurora?
Attachment #8702805 -
Flags: approval-mozilla-aurora+
Comment 27•8 years ago
|
||
bugherder uplift |
https://hg.mozilla.org/releases/mozilla-aurora/rev/8d5090df5068 https://hg.mozilla.org/releases/mozilla-aurora/rev/3f73d822655e
Comment 28•8 years ago
|
||
bugherder uplift |
https://hg.mozilla.org/releases/mozilla-beta/rev/0ea89d1db175 https://hg.mozilla.org/releases/mozilla-beta/rev/cf6974b88731
Comment 29•8 years ago
|
||
bugherder uplift |
https://hg.mozilla.org/releases/mozilla-b2g44_v2_5/rev/0ea89d1db175 https://hg.mozilla.org/releases/mozilla-b2g44_v2_5/rev/cf6974b88731
status-b2g-v2.5:
--- → fixed
Updated•8 years ago
|
You need to log in
before you can comment on or make changes to this bug.
Description
•