Closed
Bug 1279612
Opened 8 years ago
Closed 8 years ago
Near permafailing mda tests on Win7 debug on beta in test_webvtt_disabled.html | application crashed [@ mozalloc_abort(char const * const)]
Categories
(Core :: Audio/Video: GMP, defect, P1)
Tracking
()
RESOLVED
FIXED
mozilla52
People
(Reporter: KWierso, Assigned: cyu)
References
Details
(Keywords: assertion, crash, intermittent-failure)
Attachments
(1 file)
58 bytes,
text/x-review-board-request
|
mozbugz
:
review+
ritu
:
approval-mozilla-aurora+
ritu
:
approval-mozilla-beta+
|
Details |
This started on the uplift of 48 to beta. https://treeherder.mozilla.org/logviewer.html#?job_id=1152890&repo=mozilla-beta https://treeherder.mozilla.org/#/jobs?repo=mozilla-beta&fromchange=46d72a56c57dafb4dc1061d4741a3e1181ac3d68&filter-tier=1&filter-searchStr=c3a7383f47df96872e0be9293883f6536cc64f7b&selectedJob=1152890
Comment hidden (Intermittent Failures Robot) |
Updated•8 years ago
|
Component: Audio/Video → Audio/Video: Playback
Benjamin - can you look into this?
Flags: needinfo?(bechen)
Priority: -- → P2
Updated•8 years ago
|
Assignee: nobody → bechen
Flags: needinfo?(bechen)
Comment hidden (Intermittent Failures Robot) |
Comment 4•8 years ago
|
||
[Tracking Requested - why for this release]: Extremely high media test failure rate on OSX & Windows, especially on e10s, undermining confidence in what we're shipping. mochitest-media (especially e10s) on OSX/Win is a trainwreck on Beta at the moment. It makes me very concerned about the quality of what we're shipping to our users. The crash this bug was filed for is one of a few a different ones that are hitting nearly every push at the moment: https://treeherder.mozilla.org/logviewer.html#?job_id=1178363&repo=mozilla-beta#L21500 https://treeherder.mozilla.org/logviewer.html#?job_id=1175310&repo=mozilla-beta#L9472 https://treeherder.mozilla.org/logviewer.html#?job_id=1182251&repo=mozilla-beta#L29271 https://treeherder.mozilla.org/logviewer.html#?job_id=1187186&repo=mozilla-beta#L5180 https://treeherder.mozilla.org/logviewer.html#?job_id=1186497&repo=mozilla-beta#L29188 etc... https://treeherder.mozilla.org/#/jobs?repo=mozilla-beta&filter-searchStr=mda&fromchange=46d72a56c57dafb4dc1061d4741a3e1181ac3d68&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=runnable Seems like we have some serious media shutdown issues on 48 AFAICT.
Severity: normal → critical
status-firefox48:
--- → affected
tracking-firefox48:
--- → ?
Flags: needinfo?(ajones)
See Also: → 1238542
JW - is this a media shutdown issue?
Flags: needinfo?(ajones) → needinfo?(jwwang)
Priority: P2 → P1
Comment 7•8 years ago
|
||
It looks like an stl container error. https://treeherder.mozilla.org/logviewer.html#?job_id=1243597&repo=mozilla-beta#L8407 fatal: STL threw length_error: deque<T> too long stack: 0 mozglue.dll!mozalloc_abort(char const * const) [mozalloc_abort.cpp:2b22de79c849 : 33 + 0x2e] 1 mozglue.dll!abort_from_exception [msvc_raise_wrappers.cpp:2b22de79c849 : 18 + 0x8] 2 mozglue.dll!std::moz_Xlength_error(char const *) [msvc_raise_wrappers.cpp:2b22de79c849 : 36 + 0xd] 3 xul.dll!std::deque<MessageLoop::PendingTask,std::allocator<MessageLoop::PendingTask> >::_Growmap(unsigned int) [deque : 1782 + 0xb] Hi Nathan, Do you have any idea about "STL threw length_error: deque<T> too long" or should I ask some MFBT guys?
Flags: needinfo?(jwwang) → needinfo?(nfroyd)
Assignee | ||
Comment 8•8 years ago
|
||
MSVC STL implementation aborts if the container length exceeds 0x7fffffff, which is quite large. I think this is more likely a memory corruption.
Comment 9•8 years ago
|
||
Cervantes's comment sounds like a more likely cause of this problem; a queue in core IPC code growing to 2^31 elements sounds pretty unusual. It's possible that we just keep injecting messages into the queue without processing them, though? Some logging on that deque when it grows to 128 elements, 256 elements, etc. might be informative, along with an NS_ASSERTION(false): we might get some insight as to whether it's an overly long queue if the same stacks are showing up repeatedly. Or we might confirm the memory corruption hypothesis if we only get a few messages before crashing. Looking at the crash report shows that this is a non-main-thread queue that's supposedly overflowing. I bet that the crashing thread is a PBackground thread, since the main thread (Thread 0) certainly looks like it's waiting for a background thread to shut down. This crash is happening in the parent process, so perhaps the PBackground thread is stuck in some sort of error loop, constantly posting error messages (perhaps because it can't reach the child process?), and it can't make progress to unblock the main thread? ni? Kyle for PBackground eyes on this problem.
Flags: needinfo?(nfroyd) → needinfo?(khuey)
The crashing thread is the IPC thread, not the PBackground thread (note the base::MessagePumpForIO::DoRunLoop() on the stack). The main thread is definitely blocked waiting for the PBackground thread to shut down. It's not obvious to me which thread is the PBackground thread. None of the threads remaining are wedged, and none are obviously the PBackground thread. It's possible that it's not even running at all, because we keep spinning until there are no more actors left (http://hg.mozilla.org/releases/mozilla-beta/annotate/tip/ipc/glue/BackgroundImpl.cpp#l1244) but there is the timeout code just above that that should kick in.
Flags: needinfo?(khuey)
Comment 11•8 years ago
|
||
https://hg.mozilla.org/try/rev/1303f0a868c2b8b60ba406e38f28af33cd32ce11#l1.13 I add some sanity checks and assertions. https://treeherder.mozilla.org/#/jobs?repo=try&revision=7358d94e4b2c The tests show that we never have more than 512 tasks in the queue. Maybe it is really a memory corruption?
Comment hidden (Intermittent Failures Robot) |
Assignee | ||
Comment 13•8 years ago
|
||
https://treeherder.mozilla.org/#/jobs?repo=try&revision=0f1512b7fd05 This sets X86 debug register to crash when the length variable of the incoming queue of the IPC thread is corrupted. Hopefully we should see the corruption in the crash stack.
Assignee | ||
Comment 14•8 years ago
|
||
Fix build bustage. https://treeherder.mozilla.org/#/jobs?repo=try&revision=0363e19bf430
Assignee | ||
Comment 15•8 years ago
|
||
On 48: https://treeherder.mozilla.org/#/jobs?repo=try&revision=477f56bccec3&selectedJob=23333260 Strangely, it still crashed like without the watchpoint.
Comment 16•8 years ago
|
||
https://hg.mozilla.org/try/rev/024cec644970bcf7e0ce018c94880c93da011e2e I add some logs to trace the life cycle of MessageLoop and find out... https://treeherder.mozilla.org/logviewer.html#?job_id=23334553&repo=try#L14694 Assertion failure: !mOnDelete->IsRevoked() (jw UAF! mWorkerLoop is about to delete!), at c:/builds/moz2_slave/try-w32-d-00000000000000000000/build/src/ipc/glue/MessageChannel.cpp:2119 It looks like |mWorkerLoop| becomes a dangling pointer and causes UAF. Hi Kyle, Since |MessageChannel::mWorkerLoop| is a raw pointer, do you know what ensures it is always valid?
Flags: needinfo?(khuey)
I don't think anything ensures it's always valid. The target thread/MessageLoop should outlive the channel. Can you figure out if the timer at http://searchfox.org/mozilla-central/rev/a7c8e9f3cc323fd707659175a46826ad12899cd1/ipc/glue/BackgroundImpl.cpp#1228 is firing?
Flags: needinfo?(khuey)
Comment 18•8 years ago
|
||
https://treeherder.mozilla.org/logviewer.html#?job_id=23402844&repo=try#L14536 22:33:33 INFO - jw schedule shutdown timer, closuer=002EF698 timer=10A9FFB0 ... 22:33:33 INFO - Assertion failure: !mOnDelete->IsRevoked() (jw UAF! mWorkerLoop is about to delete!), at c:/builds/moz2_slave/try-w32-d-00000000000000000000/build/src/ipc/glue/MessageChannel.cpp:2119 Can't find logs about timer fired.
Updated•8 years ago
|
Flags: needinfo?(khuey)
There's also "[Parent 1148] WARNING: pipe error: 109: file c:/builds/moz2_slave/try-w32-d-00000000000000000000/build/src/ipc/chromium/src/chrome/common/ipc_channel_win.cc, line 343". That is (presumably) the error that we're trying to handle when we crash. MSDN says system error code 109 is ERROR_BROKEN_PIPE, which makes sense. I would try to figure out how the PBackground thread's MessageLoop got destroyed, because that doesn't make any sense. The main thread is still spinning the event loop in ShutdownBackgroundThread so it shouldn't have called Shutdown() on it yet.
Flags: needinfo?(khuey)
Comment hidden (Intermittent Failures Robot) |
Comment 21•8 years ago
|
||
Benjamin and JW, How is this bug going?
Flags: needinfo?(jwwang)
Flags: needinfo?(bechen)
Comment 22•8 years ago
|
||
Per comment 19, Kyle is figuring out the timing of destroying the PBackground thread.
Flags: needinfo?(jwwang)
Flags: needinfo?(bechen)
Er, no, those were suggestions for you to try ... Anyways, I'm in Taipei this week, so let's look at it in person.
Comment 24•8 years ago
|
||
Can we find someone familiar with PBackground to look into this issue?
I'm the closest thing to that that exists.
Comment hidden (Intermittent Failures Robot) |
Comment 27•8 years ago
|
||
Too late for 48, can we make progress for 49?
Comment 28•8 years ago
|
||
Since Kyle is going, who can find someone else familiar with PBackground to take a look?
Comment 29•8 years ago
|
||
I know something about PBackground... perhaps billm also?
Assignee | ||
Comment 30•8 years ago
|
||
It appears that this crash is related to GMPParent. I added more debug code to the patch in comment #16 and found that the crash happens when the IPC transport notifies channel error after the GMP thread is already shut down. The try push: https://treeherder.mozilla.org/#/jobs?repo=try&revision=1bb9adf0e8dc&selectedJob=25363170 The stack of opening an IPC channel with the crashing MessageLoop: 02:16:27 INFO - jw MessageChannel 198D8438 Open 12636CC0 ^^^^^^^^ this is the message loop address 02:16:27 INFO - [1704] #01: mozilla::gmp::GMPParent::LoadProcess() [dom/media/gmp/GMPParent.cpp:175] 02:16:27 INFO - [1704] #02: mozilla::gmp::GMPParent::EnsureProcessLoaded() [dom/media/gmp/GMPParent.cpp:626] 02:16:27 INFO - [1704] #03: mozilla::gmp::GMPServiceParent::RecvLoadGMP(nsCString const &,nsCString const &,nsTArray<nsCString> &&,nsTArray<unsigned long> &&,unsigned long *,nsCString *,unsigned int *,nsresult *) [dom/media/gmp/GMPServiceParent.cpp:1846] 02:16:27 INFO - [1704] #04: mozilla::gmp::PGMPServiceParent::OnMessageReceived(IPC::Message const &,IPC::Message * &) [obj-firefox/ipc/ipdl/PGMPServiceParent.cpp:297] 02:16:27 INFO - [1704] #05: mozilla::ipc::MessageChannel::DispatchSyncMessage(IPC::Message const &,IPC::Message * &) [ipc/glue/MessageChannel.cpp:1637] 02:16:27 INFO - [1704] #06: mozilla::ipc::MessageChannel::DispatchMessageW(IPC::Message &&) [ipc/glue/MessageChannel.cpp:1600] 02:16:27 INFO - [1704] #07: mozilla::ipc::MessageChannel::OnMaybeDequeueOne() [ipc/glue/MessageChannel.cpp:1572] 02:16:27 INFO - [1704] #08: nsRunnableMethodImpl<bool ( mozilla::ipc::MessageChannel::*)(void),0,1>::Run() [obj-firefox/dist/include/nsThreadUtils.h:751] 02:16:27 INFO - [1704] #09: mozilla::ipc::MessageChannel::DequeueTask::Run() [obj-firefox/dist/include/mozilla/ipc/MessageChannel.h:498] 02:16:27 INFO - [1704] #10: nsThread::ProcessNextEvent(bool,bool *) [xpcom/threads/nsThread.cpp:1067] 02:16:27 INFO - [1704] #11: NS_ProcessNextEvent(nsIThread *,bool) [xpcom/glue/nsThreadUtils.cpp:290] 02:16:27 INFO - [1704] #12: mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate *) [ipc/glue/MessagePump.cpp:384] 02:16:27 INFO - [1704] #13: MessageLoop::RunInternal() [ipc/chromium/src/base/message_loop.cc:241] 02:16:27 INFO - [1704] #14: MessageLoop::RunHandler() [ipc/chromium/src/base/message_loop.cc:235] 02:16:27 INFO - [1704] #15: MessageLoop::Run() [ipc/chromium/src/base/message_loop.cc:215] 02:16:27 INFO - [1704] #16: nsThread::ThreadFunc(void *) [xpcom/threads/nsThread.cpp:469] 02:16:27 INFO - [1704] #17: _PR_NativeRunThread [nsprpub/pr/src/threads/combined/pruthr.c:397] 02:16:27 INFO - [1704] #18: pr_root [nsprpub/pr/src/md/windows/w95thred.c:95] 02:16:27 INFO - [1704] #19: ucrtbase.DLL + 0x862a4 02:16:27 INFO - [1704] #20: kernel32.dll + 0x53c45 02:16:27 INFO - [1704] #21: ntdll.dll + 0x637f5 02:16:27 INFO - [1704] #22: ntdll.dll + 0x637c8 Then we the log entry 02:32:11 INFO - cyu shutdown the GMP thread. pid=3952 shows that the parent process shuts down the GMP thread, but later an error with IPC channel is notified: 02:32:11 INFO - mWorkerLoop is 12636CC0 02:32:11 INFO - Assertion failure: !mOnDelete->IsRevoked() (jw UAF! mWorkerLoop is about to delete!), at c:/builds/moz2_slave/try-w32-d-00000000000000000000/build/src/ipc/glue/MessageChannel.cpp:2132 02:32:11 INFO - [3996] #01: mozilla::ipc::ProcessLink::OnChannelError() [ipc/glue/MessageLink.cpp:418] 02:32:11 INFO - [3996] #02: base::MessagePumpForIO::WaitForIOCompletion(unsigned long,base::MessagePumpForIO::IOHandler *) [ipc/chromium/src/base/message_pump_win.cc:497] 02:32:11 INFO - [3996] #03: base::MessagePumpForIO::DoRunLoop() [ipc/chromium/src/base/message_pump_win.cc:441] 02:32:11 INFO - [3996] #04: base::MessagePumpWin::Run(base::MessagePump::Delegate *) [ipc/chromium/src/base/message_pump_win.h:80] 02:32:11 INFO - [3996] #05: MessageLoop::RunInternal() [ipc/chromium/src/base/message_loop.cc:241] 02:32:11 INFO - [3996] #06: MessageLoop::RunHandler() [ipc/chromium/src/base/message_loop.cc:235] 02:32:11 INFO - [3996] #07: MessageLoop::Run() [ipc/chromium/src/base/message_loop.cc:215] 02:32:11 INFO - [3996] #08: base::Thread::ThreadMain() [ipc/chromium/src/base/thread.cc:192] 02:32:11 INFO - [3996] #09: `anonymous namespace'::ThreadFunc [ipc/chromium/src/base/platform_thread_win.cc:29] 02:32:11 INFO - [3996] #10: kernel32.dll + 0x53c45 02:32:11 INFO - [3996] #11: ntdll.dll + 0x637f5 02:32:11 INFO - [3996] #12: ntdll.dll + 0x637c8 We need to find out which IPC channel opened on the GMP thread is left open when the thread is shut down.
Comment 31•8 years ago
|
||
cpearce, gerald, see comment 30 - GMP is involved here, and this may also be the cause of bug 1238542, which is apparently near permafail on beta (MSG not shutdown issues). This bug started about when 48 uplifted to beta.
Flags: needinfo?(gsquelart)
Flags: needinfo?(cpearce)
Assignee | ||
Comment 32•8 years ago
|
||
It appears that this is a race condition during shutdown. A GMPServiceParent instance is deleted asynchronously in its GMPServiceParent::ActorDestroy(), which depends on PGMPServiceChild::Close(). If the parent process observes "xpcom-shutdown-threads" earlier than the child process, it could shut down the GMP thread while the GMPServiceParent instance is still alive. Later when GMPServiceChild::Close() is called because the child process observes "xpcom-shutdown-threads" and Close() the IPC channel, the parent process will observe that and try to post a task to the already-shutdown GMP thread. I guess we need something similar to mozilla::layers::CompositorThreadHolder() for the GMP thread to wait until all IPC channels running on it are closed.
Comment 33•8 years ago
|
||
(In reply to Cervantes Yu [:cyu] [:cervantes] from comment #32) > I guess we need something similar to > mozilla::layers::CompositorThreadHolder() for the GMP thread to wait until > all IPC channels running on it are closed. You should look at AsyncShutdownBlocker (as used in MediaStreamGraph); this will hold shutdown at that stage of shutdown. (Note: currently that's backed out of beta, but will reland soon. It's enabled on aurora/nightly)
Comment 34•8 years ago
|
||
It looks like it is not caused by WebVTT, so un-assign Benjamin.
Assignee: bechen → nobody
Updated•8 years ago
|
Assignee: nobody → cpearce
Comment 35•8 years ago
|
||
Anthony, looks like Chris is out on PTO till Sept. 5th. Is there anyone else who can fix the tests before the beta-release merge next Monday, or before 49 release a week later?
Flags: needinfo?(ajones)
(In reply to Liz Henry (:lizzard) (needinfo? me) from comment #35) > Anthony, looks like Chris is out on PTO till Sept. 5th. Is there anyone else > who can fix the tests before the beta-release merge next Monday, or before > 49 release a week later? Blake - can you co-ordinate with JW and Cervantes to see what we can do here?
Flags: needinfo?(ajones) → needinfo?(bwu)
Comment 37•8 years ago
|
||
(In reply to Anthony Jones (:kentuckyfriedtakahe, :k17e) from comment #36) > (In reply to Liz Henry (:lizzard) (needinfo? me) from comment #35) > > Anthony, looks like Chris is out on PTO till Sept. 5th. Is there anyone else > > who can fix the tests before the beta-release merge next Monday, or before > > 49 release a week later? > > Blake - can you co-ordinate with JW and Cervantes to see what we can do here? JW is PTO today. Cervantes, Can you help us move this bug forward?
Flags: needinfo?(bwu) → needinfo?(cyu)
Comment 38•8 years ago
|
||
Is there anything left to do here? Now that bug 1239873 appears to be landed for good (not getting backed out from Beta via bug 1255737 every cycle), the media test shutdown issues are largely gone. Anyway, I think the issue is well-understood enough via bug 1255737 and deps that we can just close this out now.
Assignee | ||
Comment 39•8 years ago
|
||
I don't think bug 1239873 and 1255737 fix this bug. Those 2 bugs handles shutdown of the MediaStreanGraph thread, while this bug handles the bug of shutting down the GMP thread. I have a preliminary patch to avoid use-after-free of the GMP thread based on the previous analysis: https://treeherder.mozilla.org/#/jobs?repo=try&revision=124b32d6c992 . We should see no more GMP thread crashes with it. But note that win7 mda chunk has several crashes. I've seen MediaStreamGraph thread, PBackground thread and GMP thread shutdown crashes in this chunk. Those crashes will start showing up after the GMP thread crash is fixed.
Flags: needinfo?(cyu)
Comment 40•8 years ago
|
||
If I load TreeHerder on beta and filter by "job symbol": "mda", I hardly see this test_webvtt_disabled.html failure coming up as "near permafail". I've spotted 2 instances of this failures in the last 50 runs. https://treeherder.mozilla.org/#/jobs?repo=mozilla-beta&fromchange=e48f4467d705d26ddd69ef5ac1825ec973c30344&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=runnable&filter-job_type_symbol=mda&selectedJob=1475053 Of course mda test boxes do look alarmingly orange, it looks like most of these are other bugs in the MSG, i.e. they should be WebRTC or WebAudio related. That's not to say we shouldn't fix the bug; I'd like to see cyu land his fix. It just seems we should be focusing on the other failures in MSG will bring the orange rate down faster. For example bug 1238542, and the *test*peerConnection* failures.
Flags: needinfo?(cpearce)
Assignee | ||
Comment 42•8 years ago
|
||
The original crash seems to disappear from beta runs. I am not sure if some change elsewhere fixes or hides this crash. We need to check if the analysis in comment #32 is still valid. If yes then I'll move forward with the current patch. Otherwise, we can close this bug.
Flags: needinfo?(cyu)
Comment hidden (mozreview-request) |
Comment 44•8 years ago
|
||
Comment on attachment 8789755 [details] Bug 1279612 - Block xpcom-will-shutdown before GMPServiceParent instances are shut down. I'm on PTO, so defer this to Gerald.
Attachment #8789755 -
Flags: review?(cpearce) → review?(gsquelart)
Comment 45•8 years ago
|
||
mozreview-review |
Comment on attachment 8789755 [details] Bug 1279612 - Block xpcom-will-shutdown before GMPServiceParent instances are shut down. https://reviewboard.mozilla.org/r/77856/#review76500 r+ with comments addressed. ::: dom/media/gmp/GMPServiceParent.cpp:1874 (Diff revision 1) > +} > + > +NS_IMETHODIMP > +GeckoMediaPluginServiceParent::BlockShutdown(nsIAsyncShutdownClient*) > +{ > + Remove empty line. ::: dom/media/gmp/GMPServiceParent.cpp:1899 (Diff revision 1) > + MOZ_ASSERT(mServiceUserCount > 0); > + if (--mServiceUserCount == 0) { Are you sure about this assert & decrement? In ServiceUserCreated() above, you allow for a creation after shutdown, but don't modify the count for it (which makes sense). But then when that post-shutdown user gets destroyed, I believe this assertion will fail (either because the count is already 0; or you will decrement it once too many, so shutdown will be unblocked too soon and then the assertion will fail later on). If I am correct, I cannot think of an easy solution to this, other than keeping a list of post-shutdown creations (this list should stay empty is almost all cases anyway), so that you can actively ignore them when they get destroyed. ::: dom/media/gmp/GMPServiceParent.cpp:1945 (Diff revision 1) > + NS_DispatchToMainThread( > + NewRunnableMethod(mService.get(), > + &GeckoMediaPluginServiceParent::ServiceUserDestroyed), > + NS_DISPATCH_SYNC); Does this dispatch really need to be synchronous? Synchronous dispatching between GMPT and MT has been a source of shutdown hangs, so it's best to avoid them when possible! ServiceUserDestroyed doesn't need the object to still be alive, does it? (Even if you implement my proposal from the previous issue, you will just pass the 'this' pointer as an ID instead of a usable pointer, so it should still be fine to dispatch asynchronously -- I don't think an ABA problem is possible, for reasons that cannot fit in this margin.)
Attachment #8789755 -
Flags: review?(gsquelart) → review+
Assignee | ||
Comment 46•8 years ago
|
||
mozreview-review-reply |
Comment on attachment 8789755 [details] Bug 1279612 - Block xpcom-will-shutdown before GMPServiceParent instances are shut down. https://reviewboard.mozilla.org/r/77856/#review76500 > Are you sure about this assert & decrement? > > In ServiceUserCreated() above, you allow for a creation after shutdown, but don't modify the count for it (which makes sense). > But then when that post-shutdown user gets destroyed, I believe this assertion will fail (either because the count is already 0; or you will decrement it once too many, so shutdown will be unblocked too soon and then the assertion will fail later on). > > If I am correct, I cannot think of an easy solution to this, other than keeping a list of post-shutdown creations (this list should stay empty is almost all cases anyway), so that you can actively ignore them when they get destroyed. You are right. The assertion here is incorrect since post-shutdown creation is not handled corretly. Instead of handling the post-shutdown creation problem, I think it makes much more sense if we just deny the creation of GMPServiceParent in GMPServiceParent::Create() if GeckoMediaPluginServiceParent::mShuttingDown is true. After all, we are shutting down the system, and creating a new instances of GMPServiceParent after the "profile-change-teardown" phase doesn't makes much sense. Then we don't need to worry about the inconsistency between ServiceUserCreated() and ServiceUserDestroyed(). > Does this dispatch really need to be synchronous? Synchronous dispatching between GMPT and MT has been a source of shutdown hangs, so it's best to avoid them when possible! > ServiceUserDestroyed doesn't need the object to still be alive, does it? > (Even if you implement my proposal from the previous issue, you will just pass the 'this' pointer as an ID instead of a usable pointer, so it should still be fine to dispatch asynchronously -- I don't think an ABA problem is possible, for reasons that cannot fit in this margin.) It doesn't need to be synchornous. And if we deny the creation of GMPServiceParent in shutting down, then we don't event need to pass the pointer here.
Comment 47•8 years ago
|
||
mozreview-review-reply |
Comment on attachment 8789755 [details] Bug 1279612 - Block xpcom-will-shutdown before GMPServiceParent instances are shut down. https://reviewboard.mozilla.org/r/77856/#review76500 > You are right. The assertion here is incorrect since post-shutdown creation is not handled corretly. Instead of handling the post-shutdown creation problem, I think it makes much more sense if we just deny the creation of GMPServiceParent in GMPServiceParent::Create() if GeckoMediaPluginServiceParent::mShuttingDown is true. After all, we are shutting down the system, and creating a new instances of GMPServiceParent after the "profile-change-teardown" phase doesn't makes much sense. Then we don't need to worry about the inconsistency between ServiceUserCreated() and ServiceUserDestroyed(). Thank you. I'll want to review that, please!
Comment 48•8 years ago
|
||
mozreview-review |
Comment on attachment 8789755 [details] Bug 1279612 - Block xpcom-will-shutdown before GMPServiceParent instances are shut down. https://reviewboard.mozilla.org/r/77856/#review76792 Back to 'r?' so I can review the upcoming big change.
Attachment #8789755 -
Flags: review+ → review?(gsquelart)
Updated•8 years ago
|
Assignee: cpearce → cyu
Component: Audio/Video: Playback → Audio/Video: GMP
Comment hidden (mozreview-request) |
Comment 50•8 years ago
|
||
mozreview-review |
Comment on attachment 8789755 [details] Bug 1279612 - Block xpcom-will-shutdown before GMPServiceParent instances are shut down. https://reviewboard.mozilla.org/r/77856/#review78378 Sorry for the delay, I didn't notice the new revision. I'm usually quick to review, so please buzz me next time you don't hear from me within a day. r+ with concern addressed: ::: dom/media/gmp/GMPServiceParent.cpp:2118 (Diff revision 2) > + > nsAutoPtr<GMPServiceParent> serviceParent(new GMPServiceParent(gmp)); > > nsCOMPtr<nsIThread> gmpThread; > nsresult rv = gmp->GetThread(getter_AddRefs(gmpThread)); > - NS_ENSURE_SUCCESS(rv, nullptr); > + MOZ_RELEASE_ASSERT(NS_SUCCEEDED(rv)); Why change this from a mild test-and-return-null to a very strong assert-and-crash-everywhere? Not being in shutdown does not guarantee that gmp->GetThread will always succeed, e.g.: NS_NewNamedThread could fail (very unlikely, but we should still handle this gracefully where possible). If you want to prevent an unnecessary GMPServiceParent construction&destruction in this case, I think you could move the creation from line 2114 to after this line.
Attachment #8789755 -
Flags: review?(gsquelart) → review+
Assignee | ||
Comment 51•8 years ago
|
||
mozreview-review-reply |
Comment on attachment 8789755 [details] Bug 1279612 - Block xpcom-will-shutdown before GMPServiceParent instances are shut down. https://reviewboard.mozilla.org/r/77856/#review78378 > Why change this from a mild test-and-return-null to a very strong assert-and-crash-everywhere? > Not being in shutdown does not guarantee that gmp->GetThread will always succeed, e.g.: NS_NewNamedThread could fail (very unlikely, but we should still handle this gracefully where possible). > > If you want to prevent an unnecessary GMPServiceParent construction&destruction in this case, I think you could move the creation from line 2114 to after this line. From GeckoMediaPluginService::GetThread(), it looks to me that if we are not in shutdown, then we should be able to get the thread (except when we fail to create the thread). But I think you are right that it the wrong choice to place an assertion here because the caller won't have any chance to handle the failure, even if the failure is very likely. And it's wrong that the caller method of GeckoMediaPluginService::GetThread() crosses the boundary to look at the implementation and place an assertion based on it. So I'll use NS_ENSURE_SUCCESS() here.
Comment hidden (mozreview-request) |
Comment 53•8 years ago
|
||
Pushed by cyu@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/b82fa1825412 Block xpcom-will-shutdown before GMPServiceParent instances are shut down. r=gerald
Comment 54•8 years ago
|
||
Backed out in https://hg.mozilla.org/integration/autoland/rev/79604052ba79 for static analysis bustage, https://treeherder.mozilla.org/logviewer.html#?job_id=3849813&repo=autoland
Assignee | ||
Comment 55•8 years ago
|
||
https://treeherder.mozilla.org/#/jobs?repo=try&revision=caba143550c7
Comment hidden (mozreview-request) |
Comment 57•8 years ago
|
||
Pushed by cyu@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/583124bcd4d9 Block xpcom-will-shutdown before GMPServiceParent instances are shut down. r=gerald
Reporter | ||
Comment 58•8 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/583124bcd4d9
Status: NEW → RESOLVED
Closed: 8 years ago
status-firefox52:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla52
Comment 59•8 years ago
|
||
Not sure which branches (if any) you want to backport this to, but please fill out the approval requests whenever you decide :)
status-firefox50:
--- → affected
status-firefox51:
--- → affected
Flags: needinfo?(gsquelart) → needinfo?(cyu)
Assignee | ||
Comment 60•8 years ago
|
||
Comment on attachment 8789755 [details] Bug 1279612 - Block xpcom-will-shutdown before GMPServiceParent instances are shut down. Approval Request Comment [Feature/regressing bug #]: 1057908 [User impact if declined]: Potential browser crash during shutdown after the user has ever viewed DRM-wrapped content. [Describe test coverage new/current, TreeHerder]: Patch landed on m-c. It was also tested on try on the beta branch. Also performed manual tests with code modification trying to trigger the bug. [Risks and why]: Low to medium. This is a fix of a timing issue during shutdown. We carefully reviewed and tested the patch. It's unlikely to have side effects. [String/UUID change made/needed]: None.
Flags: needinfo?(cyu)
Attachment #8789755 -
Flags: approval-mozilla-beta?
Attachment #8789755 -
Flags: approval-mozilla-aurora?
Comment on attachment 8789755 [details] Bug 1279612 - Block xpcom-will-shutdown before GMPServiceParent instances are shut down. Makes the GMP shutdown more robust, Aurora51+, Beta50+
Attachment #8789755 -
Flags: approval-mozilla-beta?
Attachment #8789755 -
Flags: approval-mozilla-beta+
Attachment #8789755 -
Flags: approval-mozilla-aurora?
Attachment #8789755 -
Flags: approval-mozilla-aurora+
Reporter | ||
Comment 62•8 years ago
|
||
bugherder uplift |
https://hg.mozilla.org/releases/mozilla-aurora/rev/c796e9b24f3c
Reporter | ||
Comment 63•8 years ago
|
||
bugherder uplift |
https://hg.mozilla.org/releases/mozilla-beta/rev/118b9b4f5473
You need to log in
before you can comment on or make changes to this bug.
Description
•