Closed Bug 917491 Opened 6 years ago Closed 6 years ago

Guarantee safe shutdown of AsyncLatencyLogger

Categories

(Core :: Audio/Video, defect)

x86
Linux
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla27

People

(Reporter: jesup, Assigned: jesup)

References

()

Details

(Whiteboard: [getusermedia])

Attachments

(2 files, 4 obsolete files)

In gum_test.html, dismiss the device selector, then hit the close box.  Deadlock detector fires on gAsyncLogger.  Doesn't happen if you accept a device, or if you accept and then stop (or close the tab after accept).  Not sure if it happens on other OS's, it might well.

GDB shows that ~nsLayoutStatics() (where AsyncLatencyLogger::Shutdown() is called) isn't being called in this case.  Probably we need to move to Observing xpcom-shutdown directly.

There's probably a better place than nsLayoutStatics to init it as well, though that's far less critical.
Duplicate of this bug: 918093
All thread consumers are responsible for calling nsIThread.Shutdown on any threads they create either before or during xpcom-shutdown-threads (xpcom-shutdown would be fine also). AsyncLatencyLogger should be doing this, so that even if there are leaks we don't end up calling into XPCOM after it shuts down and potentially causing use-after-free issues elsewhere.
The fundamental issue here seems to be a leak of something blocking releasing nsLayoutStatics; because of this a *lot* of stuff doesn't get shut down, not just the latency logger (AudioStreams, etc, etc).

I tried a balance tree for nsLayoutStatics; nothing jumps out at me and there's no media code in any of the stacks (and it was 380K output...)

-> Firefox/dao

I noticed that if I invoke a password save doorhanger and click close, it does call nsLayoutStatics::Shutdown; this implies it's specific to getUserMedia permission requests, but doesn't prove it
Component: WebRTC: Audio/Video → General
Product: Core → Firefox
Summary: Shutdown deadlock on Linux if you quit without answering a gUM request → nsLayoutStatics leaks if we shutdown without closing a getUserMedia permission doorhanger
Whiteboard: [getusermedia]
Depends on: 918234
rjesup, normally our builtin leak tests would catch nsLayoutStatics not being cleaned up. Does that mean we don't have test coverage for this scenario?

Leaking this on shutdown is less important than making sure we don't deadlock, though, so at least joining the latency logging thread is necessary even if we fix this particular way to leak.
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #5)
> rjesup, normally our builtin leak tests would catch nsLayoutStatics not
> being cleaned up. Does that mean we don't have test coverage for this
> scenario?

Yeah, I suspect that shutdown-without-answering isn't covered in the UI tests (not sure about other doorhangers).

> Leaking this on shutdown is less important than making sure we don't
> deadlock, though, so at least joining the latency logging thread is
> necessary even if we fix this particular way to leak.

Yeah, sigh.  Somehow LayoutStatics became the dumping-ground for a ton of startup/shutdown code; quite a bit of which has nothing to do with layout (AudioStream?)  Sometimes it would be far easier to do what's done here: just add yourself to a Statics object and not have to deal with "ok, which observer will I be called on, and what will be shut down or not when I'm called?  And what could be maybe-shutdown because the order of the observer calls isn't guaranteed?"  We should consider an GeckoStatics class (though likely this has been long-ago nixed for good reasons).  Given all the stuff shut down there, I'd guess there are more threads that can leak (though perhaps I'm wrong, we don't use *that* many threads outside webrtc).

I had a patch that did this, then deleted it.  I'll recreate it.  I'm guessing if I can't rely on not leaking, I need something to *always* call ::Shutdown off an xpcom-shutdown (-threads) observer.
Safety-valve to guarantee shutdown of the logger even if the world leaks
Attachment #808096 - Flags: review?(benjamin)
Attachment #808096 - Flags: review?(benjamin)
Using nsContentUtils and RUN_ON_THREAD to register
Attachment #808096 - Attachment is obsolete: true
Attachment #808098 - Attachment is obsolete: true
Comment on attachment 808100 [details] [diff] [review]
Guarantee cleanup of AsyncLatencyLogger on xpcom-shutdown

Cleaner way to register

Note that the thread only gets created on the first media use, not at browser startup.

Note: this does NOT resolve the leak that exposed this problem; this just guarantees it doesn't cause XPCOM thread issues.
Attachment #808100 - Flags: review?(benjamin)
Blocks: 919176
Attachment #808100 - Attachment is obsolete: true
Attachment #808100 - Flags: review?(benjamin)
Attachment #808265 - Flags: review?(benjamin)
(In reply to Randell Jesup [:jesup] from comment #6)
> (In reply to Benjamin Smedberg  [:bsmedberg] from comment #5)
> > rjesup, normally our builtin leak tests would catch nsLayoutStatics not
> > being cleaned up. Does that mean we don't have test coverage for this
> > scenario?
> 
> Yeah, I suspect that shutdown-without-answering isn't covered in the UI
> tests (not sure about other doorhangers).

AFAIK all of our getUserMedia tests run with media.navigator.permission.disabled to avoid the doorhanger entirely, so they wouldn't test this scenario.
Comment on attachment 808265 [details] [diff] [review]
Guarantee cleanup of AsyncLatencyLogger on xpcom-shutdown

I don't understand the difference between AsyncLatencyLogger::Shutdown and ShutdownLogger. Latency.h needs some docs explaining why these two methods exist and how they are different. Also probably explaining the lifetime of this object and thread : once it's created, does it always live for the life of the process, or can it be torn down early if the process stops playing audio?

That would help me understand whether the call to nsContentUtils::RegisterShutdownObserver is safe, or whether it needs a matching call to UnregisterShutdownObserver within Shutdown.
Attachment #808265 - Flags: review?(benjamin) → review-
Component: General → Video/Audio
Product: Firefox → Core
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #13)
> Comment on attachment 808265 [details] [diff] [review]
> Guarantee cleanup of AsyncLatencyLogger on xpcom-shutdown
> 
> I don't understand the difference between AsyncLatencyLogger::Shutdown and
> ShutdownLogger. Latency.h needs some docs explaining why these two methods
> exist and how they are different. Also probably explaining the lifetime of
> this object and thread : once it's created, does it always live for the life
> of the process, or can it be torn down early if the process stops playing
> audio?

Sure.  Right now it lives until process shutdown, since MediaStreamGraph and others don't shut down once started (in theory we could start/stop it, but the overhead of this isn't huge, and the users of it generally aren't on MainThread, and if MSG doesn't shut down it's irrelevant).  The only overhead worth mentioning is the memory overhead of an idle thread.  The thread's lifetime would be until either the logger is destroyed (in nsLayoutStatics) or xpcom-shutdown, and once it shuts down it refuses to start back up so that there's no danger of it restarting and living past xpcom-shutdown.  The logger object's lifetime is until nsLayoutStatics tears down (and AudioStream/MediaStreamGraph have released it - they hold a RefPtr to it for safety).  

I'm trying to keep overhead as low as possible, especially for the "not logging" case, so I can leave this code in for production builds (this will be expanded to be able to be started/stopped at runtime, and to feed data to an upcoming about:webrtc page, etc).  Right now it uses barely anything when disabled (<0.01% of used CPU in a webrtc call IIRC), and not much when logging a call (<0.1% on regular threads, ~0.7% on my machine for the logging thread, and that thread doesn't have realtime requirements.  If I need to, I can drive those numbers down further.

The patch I'm putting up cleans all this up (this logger was inherited from padenot and updated by suhas; I filed off some rough edges and had padenot/kinetik/roc review it).  I would imagine this can be done better in the long run...  But I think this should work.
 
> That would help me understand whether the call to
> nsContentUtils::RegisterShutdownObserver is safe, or whether it needs a
> matching call to UnregisterShutdownObserver within Shutdown.

It's there, but a raw RemoverObserver(), which is wrong though it'll work.  I'll switch the RemoveObserver() to UnregisterShutdownObserver() so they match.
Attachment #808265 - Attachment is obsolete: true
Since this bug has largely become about the shutdown safety issues, retargeting as that and removing UI people.  There's still an issue with *why* we leak the world when a permission doorhanger isn't answered.
Summary: nsLayoutStatics leaks if we shutdown without closing a getUserMedia permission doorhanger → Guarantee safe shutdown of AsyncLatencyLogger
Attachment #808702 - Flags: review?(benjamin)
Comment on attachment 808701 [details] [diff] [review]
interdiffs

Interdiffs from previous patch for ease of reviewing
Attachment #808701 - Flags: review?(benjamin)
Attachment #808701 - Flags: review?(benjamin) → review+
Comment on attachment 808702 [details] [diff] [review]
Guarantee cleanup of AsyncLatencyLogger on xpcom-shutdown

carry r+ from interdiff patch
Attachment #808702 - Flags: review?(benjamin) → review+
https://hg.mozilla.org/mozilla-central/rev/2ebbcdeeab08
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.