Guarantee safe shutdown of AsyncLatencyLogger

RESOLVED FIXED in mozilla27

Status

()

Core
Audio/Video
RESOLVED FIXED
5 years ago
5 years ago

People

(Reporter: jesup, Assigned: jesup)

Tracking

unspecified
mozilla27
x86
Linux
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [getusermedia], URL)

Attachments

(2 attachments, 4 obsolete attachments)

(Assignee)

Description

5 years ago
In gum_test.html, dismiss the device selector, then hit the close box.  Deadlock detector fires on gAsyncLogger.  Doesn't happen if you accept a device, or if you accept and then stop (or close the tab after accept).  Not sure if it happens on other OS's, it might well.

GDB shows that ~nsLayoutStatics() (where AsyncLatencyLogger::Shutdown() is called) isn't being called in this case.  Probably we need to move to Observing xpcom-shutdown directly.

There's probably a better place than nsLayoutStatics to init it as well, though that's far less critical.
(Assignee)

Updated

5 years ago
Duplicate of this bug: 918093

Comment 3

5 years ago
All thread consumers are responsible for calling nsIThread.Shutdown on any threads they create either before or during xpcom-shutdown-threads (xpcom-shutdown would be fine also). AsyncLatencyLogger should be doing this, so that even if there are leaks we don't end up calling into XPCOM after it shuts down and potentially causing use-after-free issues elsewhere.
(Assignee)

Comment 4

5 years ago
The fundamental issue here seems to be a leak of something blocking releasing nsLayoutStatics; because of this a *lot* of stuff doesn't get shut down, not just the latency logger (AudioStreams, etc, etc).

I tried a balance tree for nsLayoutStatics; nothing jumps out at me and there's no media code in any of the stacks (and it was 380K output...)

-> Firefox/dao

I noticed that if I invoke a password save doorhanger and click close, it does call nsLayoutStatics::Shutdown; this implies it's specific to getUserMedia permission requests, but doesn't prove it
Component: WebRTC: Audio/Video → General
Product: Core → Firefox
Summary: Shutdown deadlock on Linux if you quit without answering a gUM request → nsLayoutStatics leaks if we shutdown without closing a getUserMedia permission doorhanger
Whiteboard: [getusermedia]
(Assignee)

Updated

5 years ago
Depends on: 918234

Comment 5

5 years ago
rjesup, normally our builtin leak tests would catch nsLayoutStatics not being cleaned up. Does that mean we don't have test coverage for this scenario?

Leaking this on shutdown is less important than making sure we don't deadlock, though, so at least joining the latency logging thread is necessary even if we fix this particular way to leak.
(Assignee)

Comment 6

5 years ago
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #5)
> rjesup, normally our builtin leak tests would catch nsLayoutStatics not
> being cleaned up. Does that mean we don't have test coverage for this
> scenario?

Yeah, I suspect that shutdown-without-answering isn't covered in the UI tests (not sure about other doorhangers).

> Leaking this on shutdown is less important than making sure we don't
> deadlock, though, so at least joining the latency logging thread is
> necessary even if we fix this particular way to leak.

Yeah, sigh.  Somehow LayoutStatics became the dumping-ground for a ton of startup/shutdown code; quite a bit of which has nothing to do with layout (AudioStream?)  Sometimes it would be far easier to do what's done here: just add yourself to a Statics object and not have to deal with "ok, which observer will I be called on, and what will be shut down or not when I'm called?  And what could be maybe-shutdown because the order of the observer calls isn't guaranteed?"  We should consider an GeckoStatics class (though likely this has been long-ago nixed for good reasons).  Given all the stuff shut down there, I'd guess there are more threads that can leak (though perhaps I'm wrong, we don't use *that* many threads outside webrtc).

I had a patch that did this, then deleted it.  I'll recreate it.  I'm guessing if I can't rely on not leaking, I need something to *always* call ::Shutdown off an xpcom-shutdown (-threads) observer.
(Assignee)

Comment 7

5 years ago
Created attachment 808096 [details] [diff] [review]
Guarantee cleanup of AsyncLatencyLogger on xpcom-shutdown

Safety-valve to guarantee shutdown of the logger even if the world leaks
(Assignee)

Updated

5 years ago
Attachment #808096 - Flags: review?(benjamin)
(Assignee)

Updated

5 years ago
Attachment #808096 - Flags: review?(benjamin)
(Assignee)

Comment 8

5 years ago
Created attachment 808098 [details] [diff] [review]
Guarantee cleanup of AsyncLatencyLogger on xpcom-shutdown

Using nsContentUtils and RUN_ON_THREAD to register
(Assignee)

Updated

5 years ago
Attachment #808096 - Attachment is obsolete: true
(Assignee)

Comment 9

5 years ago
Created attachment 808100 [details] [diff] [review]
Guarantee cleanup of AsyncLatencyLogger on xpcom-shutdown
(Assignee)

Updated

5 years ago
Attachment #808098 - Attachment is obsolete: true
(Assignee)

Comment 10

5 years ago
Comment on attachment 808100 [details] [diff] [review]
Guarantee cleanup of AsyncLatencyLogger on xpcom-shutdown

Cleaner way to register

Note that the thread only gets created on the first media use, not at browser startup.

Note: this does NOT resolve the leak that exposed this problem; this just guarantees it doesn't cause XPCOM thread issues.
Attachment #808100 - Flags: review?(benjamin)
(Assignee)

Updated

5 years ago
Blocks: 919176
(Assignee)

Comment 11

5 years ago
Created attachment 808265 [details] [diff] [review]
Guarantee cleanup of AsyncLatencyLogger on xpcom-shutdown

Lost a line while modifying.  https://tbpl.mozilla.org/?tree=Try&rev=daeeefb73261
(Assignee)

Updated

5 years ago
Attachment #808100 - Attachment is obsolete: true
Attachment #808100 - Flags: review?(benjamin)
(Assignee)

Updated

5 years ago
Attachment #808265 - Flags: review?(benjamin)
(In reply to Randell Jesup [:jesup] from comment #6)
> (In reply to Benjamin Smedberg  [:bsmedberg] from comment #5)
> > rjesup, normally our builtin leak tests would catch nsLayoutStatics not
> > being cleaned up. Does that mean we don't have test coverage for this
> > scenario?
> 
> Yeah, I suspect that shutdown-without-answering isn't covered in the UI
> tests (not sure about other doorhangers).

AFAIK all of our getUserMedia tests run with media.navigator.permission.disabled to avoid the doorhanger entirely, so they wouldn't test this scenario.

Comment 13

5 years ago
Comment on attachment 808265 [details] [diff] [review]
Guarantee cleanup of AsyncLatencyLogger on xpcom-shutdown

I don't understand the difference between AsyncLatencyLogger::Shutdown and ShutdownLogger. Latency.h needs some docs explaining why these two methods exist and how they are different. Also probably explaining the lifetime of this object and thread : once it's created, does it always live for the life of the process, or can it be torn down early if the process stops playing audio?

That would help me understand whether the call to nsContentUtils::RegisterShutdownObserver is safe, or whether it needs a matching call to UnregisterShutdownObserver within Shutdown.
Attachment #808265 - Flags: review?(benjamin) → review-
Component: General → Video/Audio
Product: Firefox → Core
(Assignee)

Comment 14

5 years ago
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #13)
> Comment on attachment 808265 [details] [diff] [review]
> Guarantee cleanup of AsyncLatencyLogger on xpcom-shutdown
> 
> I don't understand the difference between AsyncLatencyLogger::Shutdown and
> ShutdownLogger. Latency.h needs some docs explaining why these two methods
> exist and how they are different. Also probably explaining the lifetime of
> this object and thread : once it's created, does it always live for the life
> of the process, or can it be torn down early if the process stops playing
> audio?

Sure.  Right now it lives until process shutdown, since MediaStreamGraph and others don't shut down once started (in theory we could start/stop it, but the overhead of this isn't huge, and the users of it generally aren't on MainThread, and if MSG doesn't shut down it's irrelevant).  The only overhead worth mentioning is the memory overhead of an idle thread.  The thread's lifetime would be until either the logger is destroyed (in nsLayoutStatics) or xpcom-shutdown, and once it shuts down it refuses to start back up so that there's no danger of it restarting and living past xpcom-shutdown.  The logger object's lifetime is until nsLayoutStatics tears down (and AudioStream/MediaStreamGraph have released it - they hold a RefPtr to it for safety).  

I'm trying to keep overhead as low as possible, especially for the "not logging" case, so I can leave this code in for production builds (this will be expanded to be able to be started/stopped at runtime, and to feed data to an upcoming about:webrtc page, etc).  Right now it uses barely anything when disabled (<0.01% of used CPU in a webrtc call IIRC), and not much when logging a call (<0.1% on regular threads, ~0.7% on my machine for the logging thread, and that thread doesn't have realtime requirements.  If I need to, I can drive those numbers down further.

The patch I'm putting up cleans all this up (this logger was inherited from padenot and updated by suhas; I filed off some rough edges and had padenot/kinetik/roc review it).  I would imagine this can be done better in the long run...  But I think this should work.
 
> That would help me understand whether the call to
> nsContentUtils::RegisterShutdownObserver is safe, or whether it needs a
> matching call to UnregisterShutdownObserver within Shutdown.

It's there, but a raw RemoverObserver(), which is wrong though it'll work.  I'll switch the RemoveObserver() to UnregisterShutdownObserver() so they match.
(Assignee)

Comment 15

5 years ago
Created attachment 808701 [details] [diff] [review]
interdiffs
(Assignee)

Comment 16

5 years ago
Created attachment 808702 [details] [diff] [review]
Guarantee cleanup of AsyncLatencyLogger on xpcom-shutdown
(Assignee)

Updated

5 years ago
Attachment #808265 - Attachment is obsolete: true
(Assignee)

Comment 17

5 years ago
Since this bug has largely become about the shutdown safety issues, retargeting as that and removing UI people.  There's still an issue with *why* we leak the world when a permission doorhanger isn't answered.
Summary: nsLayoutStatics leaks if we shutdown without closing a getUserMedia permission doorhanger → Guarantee safe shutdown of AsyncLatencyLogger
(Assignee)

Updated

5 years ago
Attachment #808702 - Flags: review?(benjamin)
(Assignee)

Comment 18

5 years ago
Comment on attachment 808701 [details] [diff] [review]
interdiffs

Interdiffs from previous patch for ease of reviewing
Attachment #808701 - Flags: review?(benjamin)

Updated

5 years ago
Attachment #808701 - Flags: review?(benjamin) → review+
(Assignee)

Comment 20

5 years ago
Comment on attachment 808702 [details] [diff] [review]
Guarantee cleanup of AsyncLatencyLogger on xpcom-shutdown

carry r+ from interdiff patch
Attachment #808702 - Flags: review?(benjamin) → review+
https://hg.mozilla.org/mozilla-central/rev/2ebbcdeeab08
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.