Closed Bug 938107 Opened 6 years ago Closed 6 years ago

Hang at shutdown since Firefox25


(Core :: Audio/Video, defect, critical)

25 Branch
Not set



Tracking Status
firefox25 --- wontfix
firefox26 + wontfix
firefox27 + verified
firefox28 + verified
firefox29 --- verified
firefox-esr24 --- unaffected


(Reporter: alice0775, Assigned: cpearce)




(Keywords: hang, regression, reproducible)


(3 files, 2 obsolete files)

Build Identifier:
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:28.0) Gecko/20100101 Firefox/28.0 ID:20131112030204

stack bp-afb2277f-0887-41aa-b91b-930172131113.

Reproducible: always

Steps To Reproduce:
1. Start Firefox with clean profile
2. Open
3. Exit browser(Ex. Ctrl+F4) as soon as the playback of the sound began

Actual Results:
Process of firefox.exe remains forever in Windows Task Manager

Expected Results:
Process of firefox.exe should be quit immediately
Version: 24 Branch → 25 Branch
Correct STR

> 3. Exit browser(Ex. Ctrl+F4) as soon as the playback of the sound began

3. Exit browser(Ex. Ctrl+F4) at 0:02 the playback time
Regression window(m-c)
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:25.0) Gecko/20130704 Firefox/25.0 ID:20130707172141
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:25.0) Gecko/20130708 Firefox/25.0 ID:20130708031114

Regression window(m-i)
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:25.0) Gecko/20130704 Firefox/25.0 ID:20130707103844
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:25.0) Gecko/20130707 Firefox/25.0 ID:20130707132744

Regressed by: Bug 803414
Blocks: 803414
Component: General → Video/Audio
Keywords: regression
There is absolutely no way bug 803414 caused this regression. bug 803414 introduces new WebAPI-specific code for the Media Recording API with opus files which shouldn't have any effect on mp3 at all.
No longer blocks: 803414
(In reply to Jason Smith [:jsmith] from comment #3)
> There is absolutely no way bug 803414 caused this regression. bug 803414
> introduces new WebAPI-specific code for the Media Recording API with opus
> files which shouldn't have any effect on mp3 at all.

Then, Bug 744115 may triggers
Blocks: 744115
Component: Video/Audio → XPCOM

to ensure that Bug 744115 is actually the "bad one", could you try with a new build without those three lines?

Is this bug only present on windows (7 or not)?

I can reproduce the shutdown hang in Ubuntu
Mozilla/5.0 (X11; Linux x86_64; rv:28.0) Gecko/20100101 Firefox/28.0 ID:20131113030205
OS: Windows 7 → All
I built from m-c tip on windows7 64bit PC.
And I confirmed that the attached patch fixed the hang.

hang: 7b014f0f3b03
Not hang: 7b014f0f3b03 and apply the attached patch
Ok so Bug 744115 caused the regression...
asking to bsmedberg as he reviewed my patch.
Flags: needinfo?(benjamin)
So clearly *something* is relying on being able to dispatch events after it shouldn't be possible. I don't know what that is, although media code is the obvious suspect here given the testcase.

option A is to make this *more* of an error by crashing using NS_RUNTIMEABORT (instead of return NS_ERROR_ILLEGAL_DURING_SHUTDOWN).
option B is to debug this locally, set a breakpoint on the return line, and see when it is hit.
(option C, back it out)

Arnaud, can you try the local debugging option first and see if we can get the stack of the bad interaction here? We probably can't do option A until we've figured out the issue that's breaking here.
Flags: needinfo?(benjamin)

i'm sorry but since a few weeks i can't work anymore on firefox cause i'm not often enough at home...
i'm following this ticket as it's involving one of my patch.
we need to find somebody else to do this debug test.
Alternative url in

Maybe a lot of site is affected.

This problem is very annoying.

So, until Arnaud came back, 
Please take option C of comment#9.
Attached file stack
Media decoder thread:

1383f7b8 56ad5f8d 00f54fa0 00000001 1383f7d7 xul!nsThread::ProcessNextEvent+0x4bc
1383f7cc 569f970c 00f54fa0 00000001 09428480 xul!NS_ProcessNextEvent+0x2d
1383f7f4 5780ebc9 09336120 09428488 09428480 xul!nsThread::Shutdown+0xff
1383f80c 578ee225 12f3f334 094284b8 1383f898 xul!mozilla::MediaDecoderStateMachine::StopAudioThread+0x59
1383f864 578eedc8 12fb4000 12f54fa0 09428480 xul!mozilla::MediaDecoderStateMachine::RunStateMachine+0x67
1383f898 578eee18 09428480 12f54fcc 1383f924 xul!mozilla::MediaDecoderStateMachine::CallRunStateMachine+0x58
1383f8a8 56aac4cb 12f3f334 12f54fcc 12f54fa0 xul!mozilla::MediaDecoderStateMachine::Run+0x21
1383f924 56b1dbaf 01f54fa0 00000001 1383f957 xul!nsThread::ProcessNextEvent+0x3cb

Main thread:
003ef26c 5c31f321 nss3!_PR_WaitCondVar(struct PRThread * thread = 0x00b0f140, struct PRCondVar * cvar = 0x00000000, struct PRLock * lock = 0x00b38080, unsigned int timeout = 0xffffffff)+0x58 [e:\builds\moz2_slave\m-cen-w32-ntly-000000000000000\build\nsprpub\pr\src\threads\combined\prucv.c @ 173]
003ef288 56aac36f nss3!PR_Wait(struct PRMonitor * mon = 0x00b01360, unsigned int ticks = 0xffffffff)+0x51 [e:\builds\moz2_slave\m-cen-w32-ntly-000000000000000\build\nsprpub\pr\src\threads\prmon.c @ 154]
003ef308 56ad5f8d xul!nsThread::ProcessNextEvent(bool mayWait = true, bool * result = 0x003ef327)+0x26f [e:\builds\moz2_slave\m-cen-w32-ntly-000000000000000\build\xpcom\threads\nsthread.cpp @ 602]
003ef31c 569f970c xul!NS_ProcessNextEvent(class nsIThread * thread = 0x00b18340, bool mayWait = true)+0x2d [e:\builds\moz2_slave\m-cen-w32-ntly-000000000000000\build\xpcom\glue\nsthreadutils.cpp @ 251]
003ef344 56c017d6 xul!nsThread::Shutdown(void)+0xff [e:\builds\moz2_slave\m-cen-w32-ntly-000000000000000\build\xpcom\threads\nsthread.cpp @ 465]
003ef360 56bec5d7 xul!nsThreadManager::Shutdown(void)+0x5f [e:\builds\moz2_slave\m-cen-w32-ntly-000000000000000\build\xpcom\threads\nsthreadmanager.cpp @ 131]
003ef38c 56c28888 xul!mozilla::ShutdownXPCOM(class nsIServiceManager * servMgr = 0x00b59104)+0xec [e:\builds\moz2_slave\m-cen-w32-ntly-000000000000000\build\xpcom\build\nsxpcominit.cpp @ 683]
003ef3a8 56c287d5 xul!ScopedXPCOMStartup::~ScopedXPCOMStartup(void)+0x57 [e:\builds\moz2_slave\m-cen-w32-ntly-000000000000000\build\toolkit\xre\nsapprunner.cpp @ 1133]
003ef3c8 56c42d03 xul!XREMain::XRE_main(int argc = 0n4, char ** argv = 0x0040d9d0, struct nsXREAppData * aAppData = 0x003ef52c)+0x101 [e:\builds\moz2_slave\m-cen-w32-ntly-000000000000000\build\toolkit\xre\nsapprunner.cpp @ 4069]
003ef4e8 00f219c2 xul!XRE_main(int argc = 0n4, char ** argv = 0x0040d9d0, struct nsXREAppData * aAppData = 0x003ef52c, unsigned int aFlags = 0)+0x4e [e:\builds\moz2_slave\m-cen-w32-ntly-000000000000000\build\toolkit\xre\nsapprunner.cpp @ 4257]

The fact that the media decoder thread and it's child audio thread are both running at this point is a clear bug in the media code. rjesup, who can make these observe/honor at least xpcom-shutdown-threads if not some earlier shutdown notification?
Flags: needinfo?(rjesup)
> The fact that the media decoder thread and it's child audio thread are both running at this point is a clear bug in the media code. rjesup, who can make these observe/honor at least xpcom-shutdown-threads if not some earlier shutdown notification?

This is something for roc/etc
Component: XPCOM → Video/Audio
Flags: needinfo?(rjesup) → needinfo?(roc)
Adding other media team people
Thanks for looping in the media team Jesup.

MediaDecoder has a shutdown observer, using nsContentUtils::RegisterShutdownObserver(). I'm guessing that's not adequate based on what bsmedberg is saying. We can switch to listening for xpcom-shutdown-threads or whatever is recommended instead.

Or perhaps the problem (or an additional problem) is that our shutdown code is asynchronous, so we still do stuff on other threads after our shutdown notification handler returns?

bsmedberg: I'm guessing all our threads using XPCOM should be shutdown completely shutdown before our xpcom-shutdown-threads observer returns? And is there an earlier notification that we should be listening on instead?

Changing to crash with an NS_RUNTIMEABORT when we do things like sending events to threads in an invalid state would be very helpful in educating users of these APIs as to how they should be used! has a summary if that's helpful. Yes, all XPCOM threads should be .shutdown() by the time xpcom-shutdown-threads returns. It's fine to shut things down earlier, for example from xpcom-shutdown.

We avoided the runtime abort because we didn't know whether there were harmless events being dispatched which would turn into crashes.
Can you do the backout suggested in comment 9 for our final beta to avoid shipping this?
Flags: needinfo?(benjamin)
Since we shipped this in FF25 we will have to live with this again in 26 as it's time for final Beta and there's been no change in status or attempt to test a backout.  Will continue to track for FF27.
I'm sorry, I didn't realize the backout request was for me. I really want somebody from the media team to own this.
Flags: needinfo?(benjamin)
Chris - can you follow up on this?
Assignee: nobody → cpearce
Flags: needinfo?(cpearce)
So if I understand this correctly, this is a regression from a behaviour change in bug 744115, which caused the incorrect usage of threads in the media shutdown code to start failing, whereas previously its incorrect usage of threads worked enough to shutdown without a hang at least.

Before bug 744115 the media code was shutting down without a hang, whereas after bug 744115 we have a hang. So I suggest that we backout bug 744115.

We can backout bug 744115 from Beta, Aurora and Nightly, change the shutdown code in the media stack to have a synchronous path for xpcom shutdown, and land that, and then re-land bug 744115.

The media shutdown code is fragile, so I think any changes to it should ride the trains for maximal testing, hence the backout, fix, reland strategy. We should aim to land the fix sometime at the start of the next cycle, so it has adequate testing.

I propose we add a synchronous path to the media stack shutdown like so:
* Add a new singleton MediaShutdownManager. This is main thread only.
* Remove the shutdown observer code from MediaDecoder.
* Change MediaDecoder ctor to register itself with MediaShutdownManager. This can be a weak reference. The first time this happens, the MediaShutdownManager is constructed, held in a StaticRefPtr.
* The MediaShutdownManager ctor registers it as an xpcom-shutdown listener.
* When MediaDecoders are shutdown normally and their dtor runs (in a runnable that's dispatched to the main thread, as happens currently), they de-register themselves from the MediaShutdownManager. The MediaDecoder's dtor should only run when the reader and state machine have shutdown all their threads.
* When the xpcom shutdown notification comes in, the MediaShutdownManager will iterate over its list of living MediaDecoders, and call their Shutdown() method. The MediaShutdownManager will then block waiting until the last living MediaDecoder has deregistered itself (has been deleted). Then the MediaShutdownManager needs to ensure that the MediaDecoderStateMachine thread has shutdown, and once that's happened MediaShutdownManager's xpcom shutdown observer can return. Note this relies on the MediaDecoder shutdown code still being able to dispatch events to the main thread, and on those events running while the MediaShutdownManager blocks, but that's possible according to .

Benjamin: Is the above an acceptable course of action to you?
Flags: needinfo?(roc)
Flags: needinfo?(cpearce)
Flags: needinfo?(benjamin)
Bug 744115 is not essential to anything except perhaps some intermittent orange. It's fine to back it out of aurora/beta. I'd like to leave it in nightly, though, because I recall it fixed a few orangefactor bugs by accident and would make sheriffing harder if we backed out there.

It sounds like the MediaDecoder/MediaShutdownManager should be a linked list.
Flags: needinfo?(benjamin)
* Add a singleton object MediaShutdownManager that manages the shutdown of the MediaDecoder infrastructure.
* MediaShutdownManager is an xpcom-shutdown observer, and the only such observer in the MediaDecoder infrastructure (apart from HTMLMediaElement). There's a dependency in the order in which things must shutdown; we must shutdown the decoders and then wait for the state machine thread to terminate. If we'd had multiple observers we couldn't guarantee the order in which they're called, so we have a single observer and do things in the right order there.
* The MediaDecoders register themselves with the MediaShutdownManager upon init, and unregister on shutdown. 
* We also wrap the state machine thread in an object, StateMachineThread and register that with the MediaShutdownManager on init, and unregister after shutdown. We wrap the state machine thread so that we can easily pass control of its shutdown from the StateMachineTracker to the MediaShutdownManager.
* The MediaShutdownManager's xpcom-shutdown observer iterates through the registered MediaDecoders and calls their Shutdown() function. Then it iterates through all registered StateMachineThreads, waiting until they finish. Typically there is only one state machine thread, but I think two could be registered at the same time if the shutdown of the old thread and the construction of the new thread interleave.
* Once the state machine threads are shutdown, we exit the xpcom-shutdown observer, all threads that use XPCOM should be shutdown.
* Fixes the shutdown hang for me. :)

Looks green:
Attachment #8344447 - Flags: review?(roc)
(In reply to Chris Pearce (:cpearce) from comment #25)
> Created attachment 8344447 [details] [diff] [review]
> Patch: Centralize MediaDecoder shutdown
> Looks green:

Completely green even. Must be a fluke.
Comment on attachment 8344447 [details] [diff] [review]
Patch: Centralize MediaDecoder shutdown

Review of attachment 8344447 [details] [diff] [review]:

Generally I like this patch a lot. Thanks!

::: content/media/MediaShutdownManager.cpp
@@ +188,5 @@
> +    MediaDecoder* decoder = *(mDecoders.begin());
> +    decoder->Shutdown();
> +    // The decoder should have unregistered itself from our set when its shutdown
> +    // completed. The decoder should now be deleted.
> +    MOZ_ASSERT(mDecoders.find(decoder) == mDecoders.end());

Won't this assertion crash if decoder->Shutdown() indirectly deleted 'this'?

@@ +196,5 @@
> +  // shutting down. Once all the decoders have shutdown the active state
> +  // machine thread will naturally shutdown asynchronously. We may also have
> +  // another state machine thread active, if construction and shutdown of
> +  // the state machine thread has interleaved.
> +  while (!mStateMachineThreads.empty()) {

And won't we crash here for the same reason? Or are you relying on the observer service holding a strong reference to 'this' during Shutdown()? If so, please add a comment to that effect.

::: content/media/MediaShutdownManager.h
@@ +104,5 @@
> +
> +  // Note: These are deliberately weak references! The MediaDecoders must
> +  // unregister themselves before they're deleted.
> +  typedef std::set<MediaDecoder*> DecoderSet;
> +  DecoderSet mDecoders;

Let's not start using std::set here unilaterally. If you want to argue for it, start that discussion on dev-platform.

nsTHashtable<nsPtrHashkey<MediaDecoder> > should work just as well.

@@ +112,5 @@
> +  // the construction and shutdown of these can interleave, so we must track
> +  // individual instances of the state machine threads.
> +  // These are strong references.
> +  typedef std::set<RefPtr<StateMachineThread>> StateMachineThreadSet;
> +  StateMachineThreadSet mStateMachineThreads;

same here

@@ +120,5 @@
> +};
> +
> +// A wrapper for the state machine thread. We must wrap this so
> +// that the state machine threads can shutdown independently from the
> +// StateMachineTracker, under the control of the MediaShutdownManager.

This comment isn't clear enough.
Attachment #8344447 - Flags: review?(roc) → review-
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #27)
> Let's not start using std::set here unilaterally. If you want to argue for
> it, start that discussion on dev-platform.

It's already used in other media code. Eg: ./content/media/webaudio/PannerNode.h
(Note, std::map or ::set must not be used in performance critical code.)
Is a map/hashtable really better than a linked list?
(In reply to Olli Pettay [:smaug] from comment #29)
> (Note, std::map or ::set must not be used in performance critical code.)

Can you point me to why?  Where does it fall down perf-wise?  (Which usecases/edge-cases?)  (I've never looked at std::set<>, but I'll note it's used in some reasonable realtime code inside aka media/webrtc/trunk - paced sender, and rtcp and rtp code.  Not sure how extensively it's used or the sizes of the sets without looking deeper.

std::map<> is used a lot more in webrtc, including in the video encoder framework, bitrate estimator, video capture, and a lot in the RTP code.  Again, what's the perf impact?  Or is it just "use our well-known cross-platform versions to avoid surprises"?

This is all imported code.  We also use std::map<> in PeerConnectionMedia.cpp to keep track of the a/v pipelines (generally only a handful); in some cases (in the future) there might be a few dozen.
Flags: needinfo?(bugs)
It is easy to use std::map/set as hashtables/sets but they don't have O(1) lookup, since they
are usually implements as trees.
std::unordered_map might work as replacement.

Of course if ordering is required, then std::map/set can be useful.

Bug 944810 is an example where std::map would have regressed performance significantly.
Flags: needinfo?(bugs)
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #30)
> Is a map/hashtable really better than a linked list?

I think it's a nice separation of concerns to avoid having to have the decoder declare it belongs to a linked list. This is especially obvious if you need an object to belong to multiple linked lists.
With review comments addressed.

* Using Mozilla Standard Hashtable as a Hashset.
* Rely on observer service's reference to keep MediaShutdownManager alive during xpcom shutdown.
Attachment #8344447 - Attachment is obsolete: true
Attachment #8347804 - Flags: review?(roc)
Landed for Firefox 29:

We'll need to either uplift this to Firefox 28, or backout bug 744115 from there.
I forgot to mention, I forgot to clear sIntance, so we were leaking the MediaShutdownManager. I adjusted the patch before landing to fix that, and update the comments.
And backed out due to build errors that weren't there on Friday on Try:

Somehow the bug number in the commit message was off-by-one, and the landing ended up being attributed to bug 938108.
Backed out again in, since along with the debug build failures that were gone on the relanding, you had crashtest shutdown crashes which were still there on the relanding ( etc.)
This is basically the same as the pervious patch, but I changed the hashset of MediaDecoders to have a refptr key. This means we don't try to deref a dangling pointer to call Shutdown() on a deleted MediaDecoder.

Attachment #8347804 - Attachment is obsolete: true
Attachment #8349134 - Flags: review?(roc)
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla29
Reproduced with local mp3s on nightly 2013-12-15.
Verified fixed FF 29.0a1 2013-12-19, Win 7 x64.
Comment on attachment 8349134 [details] [diff] [review]
Patch: Holding strong refs to MediaDecoders

[Approval Request Comment]
Bug caused by (feature/regressing bug #): bug 744115

User impact if declined: Chance of shutdown hangs if user exits while playing MP3 and/or other media using HMTL5 audio/video.

Testing completed (on m-c, etc.):  This patch has been on Nightly for 3 weeks with no issues reported.

Risk to taking this patch (and alternatives if risky): Risk is low-to-moderate. This touches an area of code that has been traditionally fragile in the past. We've had no issues reported since it landed three weeks ago. Alternative is to back out bug 744115 (we backed bug 744115 out of Firefox 27 (current beta)). Bsmedberg said that backing bug 744115 out from Firefox 28 may change the intermittent orange profile (bug 938107 comment 23). We must either uplift this patch to Firefox 28, or backout bug 744115 from Firefox 28 to prevent this hang.

String or IDL/UUID changes made by this patch: None.
Attachment #8349134 - Flags: approval-mozilla-aurora?
Comment on attachment 8349134 [details] [diff] [review]
Patch: Holding strong refs to MediaDecoders

We'll have to take the plunge eventually here so let's get it on Aurora and then to Beta for wider audience feedback.
Attachment #8349134 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
checkin needed on aurora.
Keywords: checkin-needed
Alice, can you please confirm this is fixed for you now in Firefox 27 and 28?
Flags: needinfo?(alice0775)
I cannot reproduce the problem with URL on Firefox27b4, Aurora28.0a2and Nightly29,0a1.
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:27.0) Gecko/20100101 Firefox/27.0 ID:20140106141415
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:28.0) Gecko/20100101 Firefox/28.0 ID:20140108004006
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:29.0) Gecko/20100101 Firefox/29.0 ID:20140108030203
Mozilla/5.0 (X11; Linux i686; rv:27.0) Gecko/20100101 Firefox/27.0 ID:20140106141415
Mozilla/5.0 (X11; Linux i686; rv:28.0) Gecko/20100101 Firefox/28.0 ID:20140108004006
Mozilla/5.0 (X11; Linux i686; rv:29.0) Gecko/20100101 Firefox/29.0 ID:20140108030203
Flags: needinfo?(alice0775)
Thanks for your help, Alice.
Duplicate of this bug: 964354
Duplicate of this bug: 964354
See Also: → 965523
Depends on: 1004669
You need to log in before you can comment on or make changes to this bug.