MediaStreamGraph doesn't shut down when the last user is gone

RESOLVED INVALID

Status

()

defect
P2
normal
Rank:
13
RESOLVED INVALID
2 years ago
a year ago

People

(Reporter: jesup, Unassigned)

Tracking

({perf})

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [qf:p2])

Found by watching about:telemetry.  STR:

Start browser.  Browse to about:telemetry, select Content on right, search for media.  Then in a second tab browse to https://appear.in/whatever.  Accept permissions.  In the first tab, refresh and note that the Runner for MediaStreamGraphStableStateRunnable is getting hit about 50-100 times per second (probably once per DataCallback).

Now leave the room.  close the tab.  You can even go to about:memory and GC/CC/MinimizeMemoryUse.  Hit refresh on the about:telemetry tab.  Note that the entry for StateStateRunnable is still incrementing, and doesn't stop.

This is likely the cause of the high number of reported events for that runnable in telemetry from the field.
So, for some reason it after a browser restart I'm not seeing the 'leak' of a running MSG (or rather, it stops after a few seconds, likely on a GC/CC pass).

This implies that something is leaking references to the MSG somewhere or there's a broken CC impl.  The browser I saw that in is often used for appear.in/moz-webrtc calls, however the count of StableStateRunnables before I started was 0 (the entry wasn't there it appeared) so likely I hadn't used webrtc since restarting.  After loading appear.in/jesup, I saw it start counting, and stopping and closing the tab (and GC/CC/minimize) didn't stop it that time.  

One different might be that before I tried appear.in, I tried youtube.  I tried that again, still shuts down properly.

Note that these are slightly newer builds since it took an update when I restarted (from 8/23? to 8/24?, and then to 8/29).  I don't think any bug was fixed in that timeframe that could affect this.

The Telemetry info from the field, however, implies that something like this is happening.  Though even a small number of users using webrtc for an hour or two could cause a very large number of events at 50-100/s, even if we do properly shut it down when not in use.

Bill, can you check the new data you collected?  And Paul/jya and others, can you periodically check about:telemetry StableState?  Thanks

Note: you can just put StableState in the search box with "Home" selected; you don't need to check Keyed Histogram and Select Content.  Note this seems to be new since 8/12; one I was using to test was an 8/12 build and about:telemetry is a bit different.
Flags: needinfo?(wmccloskey)
Flags: needinfo?(padenot)
Flags: needinfo?(jyavenard)
Out of 252,144 sessions I looked at, only 22,091 saw any instances of MediaStreamGraphStableStateRunnable (so about 10%). Within that 10%, I looked at how highly the runnable ranked. In 1711 sessions, it was #1.

I'm not sure what this means. It's definitely something that only a portion of our users see.
Flags: needinfo?(wmccloskey)
Padenot tells me he landed a fix for an MSG shutdown bug recently.  Paul, when did that land so we can look at sessions with build-id's after that hit central?
(In reply to Bill McCloskey (:billm) from comment #2)
> Out of 252,144 sessions I looked at, only 22,091 saw any instances of
> MediaStreamGraphStableStateRunnable (so about 10%). Within that 10%, I
> looked at how highly the runnable ranked. In 1711 sessions, it was #1.
> 
> I'm not sure what this means. It's definitely something that only a portion
> of our users see.

This is only used when a page uses Web Audio or WebRTC, really, I'm not surprised by those numbers. When Web Audio or WebRTC are being used, then they are going to be used for some time: people are making a call, or playing a game or something.

Randell, I need to know which OS you we're when you observed the msg leak.
Flags: needinfo?(padenot) → needinfo?(rjesup)
Windows.

We're going to add a bug or two about reducing the frequency of RunInStableState, and eliminating some or all of the RunInStableState runnables in favor of atomics.  Right now we post a lot of them, but should only happen when WebRTC, WebAudio or a few related cases are in operation.
Flags: needinfo?(rjesup)
Flags: needinfo?(jyavenard)
Mass change P1->P2 to align with new Mozilla triage process
Priority: P1 → P2

Comment 7

2 years ago
This bug won't be addressed for 57 so I am changing from QF:p1 to QF:p1 for post 57 work.
Whiteboard: [qf:p1] → [qf:p2]
Keywords: perf

Updated

a year ago
Flags: needinfo?(ajones)
See Also: → 1404991
Jan-Ivar - I'm going to move the ni? over to you and let you decide how to follow up with this bug.
Flags: needinfo?(ajones) → needinfo?(jib)
I don't think there is a bug here. Bug 1404991 is what we will do in the future.
To elaborate, we've only been able to see this once, not under controlled conditions.

The large amount of runnable is expected, when running a WebRTC or Web Audio API application. They don't really show up in the profile. Again, bug 1404991 will be the real solution.

I'm closing this. If someone ever sees an MSG not being collected, and it should have been, please open a new bug in Audio/Video: MediaStreamGraph with details.
Status: NEW → RESOLVED
Last Resolved: a year ago
Flags: needinfo?(jib)
Resolution: --- → INVALID
You need to log in before you can comment on or make changes to this bug.