Open Bug 1724417 Opened 3 years ago Updated 6 months ago

EverySecondTelemetryCallback_m is using a lot of power when using Panopto

Categories

(Core :: WebRTC: Signaling, defect)

defect

Tracking

()

People

(Reporter: florian, Unassigned, NeedInfo)

References

(Blocks 1 open bug)

Details

See this profile: https://share.firefox.dev/3lEhumT

mozilla::PeerConnectionCtx::EverySecondTelemetryCallback_m is called once per second, and uses 100% of a CPU core for about 100ms each time.

This process is the one in which panopto (air mozilla) was loaded. I watched a stream there yesterday, and forgot a tab or two (which explains why the process is still alive), but there's no video playing in them.

Here is another profile with more tracks https://share.firefox.dev/3A8Daf8 showing that there's also a lot of activity in the IPC I/O Child and Socket thread on both the content process and the Socket process.

I had 2 panopto tabs. One was the homepage, and the other one was the tab in which a stream was played yesterday. Closing the latter tab made the CPU use drop to almost 0 in the panopto content process.

Re-opening the closed tab doesn't make the problem come back; it's likely something that happens when the tab remained for hours.

There are a lot of things we could do to optimize this, although the fact that this is taking as much time as it is seems kinda pathological (eg; site is keeping hundreds of PeerConnections open for no good reason). What does about:webrtc look like when this occurs?

From the profile, it looks like an easy win would be to keep a cache of the serialized SDP (these fields) inside JsepSessionImpl.

Another easy win might be to elide some of the stats gathering when the PeerConnectionImpl has never actually been used (ie; no sRD or sLD). This may not be a common enough situation to warrant this kind of optimization, though. I would have to see about:webrtc to figure out what we need to be optimizing.

Flags: needinfo?(florian)

(In reply to Byron Campen [:bwc] from comment #2)

What does about:webrtc look like when this occurs?

About:webrtc makes my Firefox parent process use massive amounts of memory (>14GB) and I have to close it quickly after opening it. It shows "Session History", so I think that means there's no active peer connection, but I'm not sure.

Here's a profile of about:webrtc making my browser unusable: https://share.firefox.dev/2VDk5CD

Flags: needinfo?(florian)

It is probably empty because it is trying to get stats from a ridiculous number of PeerConnections, and taking a very long time. Does this still occur if you close all of the airmo tabs?

(In reply to Byron Campen [:bwc] from comment #4)

It is probably empty because it is trying to get stats from a ridiculous number of PeerConnections, and taking a very long time. Does this still occur if you close all of the airmo tabs?

Yes, even with all the airmo tabs closed and the airmo content process no longer alive, about:webrtc still makes my browser unusable. Here's another profile: https://share.firefox.dev/37moXip The profile only lasts 10s, during which about 600MB of additional memory were used and the UI was quite unresponsive.

Are there any other tabs that might have a PeerConnection in them? If it is taking that long when working just from the cached stats, we might be talking thousands of PeerConnections worth of stats... how long has this browser instance been running continuously? Or maybe the PeerConnections are being leaked somehow...

(In reply to Byron Campen [:bwc] from comment #6)

Are there any other tabs that might have a PeerConnection in them?

The only other domain that about:webrtc listed is my bank's web site, but all the tabs (and the content process for it) are closed.

how long has this browser instance been running continuously?

It's a Nightly from 2021-07-29, so about a week.

Panopto (air mozilla)'s use of WebRTC here is abnormal. This was reported as ZenDesk issue #00720429 by Andy after Randell reported internally (and I confirmed from his saved about:webrtc page) over 2000 peer connections after a live airmo call. Andy, is there any update on this?

So I don't think we should rush to optimize EverySecondTelemetryCallback_m. A cap on RTCPeerConnections maybe?

Flags: needinfo?(akochendorfer)

I went back and forth with panopto with them not believing they had that many connections open (or that they used webrtc in the viewer), and they closed it.
"Good afternoon. Our team had reviewed the details shared and they have confirmed that we do not use WebRTC within the product as a whole and specifically not within the viewer used to watch sessions.

Given this scenario, the team has advised that we would need network and console logs from a browser reproducing this behavior in order to identify the calls involved and their source. The team has also advised that having any special steps being followed to generate the behavior, if any, would be most helpful to the review."

I'm sure there are ways we could reduce the overhead here, which would be good, especially for inactive connections

I just encountered this bug again: https://share.firefox.dev/3kyndIZ (on Nightly 93.0a1 (2021-08-25) (64 bits), on OS X 10.15.7 (19H114))

And here is a profile of what happened when I closed the panopto tab where I viewed the Firefox all-hands last Thursday: https://share.firefox.dev/3sVNOU2 (note: the Socket process is not included, it failed to capture the profile, its "IPC I/O Child" thread is using 98% of a CPU core).

(In reply to Randell Jesup [:jesup] (needinfo me) from comment #9)

Given this scenario, the team has advised that we would need network and console logs from a browser reproducing this behavior in order to identify the calls involved and their source.

Should we add profiler markers with a stack when creating a PeerConnection object?

The team has also advised that having any special steps being followed to generate the behavior, if any, would be most helpful to the review."

For me, it's just watching an air mozilla call live, then letting the tab sit for about half a week (I only remember the tab is open when I come back to my laptop and find the fans spinning in the morning).

See Also: → 1728091

I'm still seeing this, with activity every second in both the panopto content process https://share.firefox.dev/3580GyJ and the socket process https://share.firefox.dev/3GX23NR

What's the purpose of this telemetry captured every second? Would there be a way to capture similar data without needing a repeating timer and the IPC round trip to the Socket process?

See Also: → 1828685
Summary: EverySecondTelemetryCallback_m is using a lot of power → EverySecondTelemetryCallback_m is using a lot of power when using Panopto

Hey Florian, what was the STR for the most recent profiles you collected?

Flags: needinfo?(florian)

(In reply to Jim Mathies [:jimm] from comment #13)

Hey Florian, what was the STR for the most recent profiles you collected?

Same as comment 0, watch a stream (not sure if it needs to be a live stream or if it can be a replay of a previous event) on air mozilla, and once the video is done playing, forget the tab for about a day.

Flags: needinfo?(florian)
See Also: → 1863805

Here's a profile captured today: https://share.firefox.dev/3xMuXRO

Same steps to reproduce, forgotting a panopto (air mozilla) video tab in the background for 24 hours.

EverySecondTelemetryCallback_m is less visible in the profile (it is now very short), but we instead see PWebrtcGlobal::Msg_GetStats and MozPromise::ThenValueBase::ResolveOrRejectRunnable long runnables, so I would guess some of the code has been made more async.

You need to log in before you can comment on or make changes to this bug.