Open Bug 1863805 Opened 10 months ago Updated 10 days ago

PWebrtcGlobal::Reply_GetStats is very visible in hang reports

Categories: Core :: WebRTC: Audio/Video, defect

Performance Impact: medium

People: Reporter: florian, Unassigned, NeedInfo

Keywords: perf:resource-use, perf:responsiveness
Whiteboard: [bhr:PWebrtcGlobal::Reply_GetStats]

I noticed that several stacks in the top hangs reported on BHR contain "Task PWebrtcGlobal::Reply_GetStats". They seem to be mostly freeing memory (je_free).

How much we see this hang seems to vary a lot from one day to another.
For the Nightly of 20231101, it's the #2 hang: https://fqueze.github.io/hang-stats/#date=20231101&row=1 (and #1 should be fixed by bug 1862712 that landed on November 3)
For the Nightly of 20231030, it's #6: https://fqueze.github.io/hang-stats/#date=20231030&row=5
For the 20231028 build it's #24: https://fqueze.github.io/hang-stats/#date=20231028&row=23

Severity: -- → S3

Is there a way to check locally how many background hangs you've had?

Flags: needinfo?(florian)

We think this is related to newer features we've added to about:webrtc and debugging in Nightly.

Florian, can you confirm this is getting reported in Nightly?

Flags: needinfo?(na-g)

(In reply to Jeff Muizelaar [:jrmuizel] from comment #2)

> Is there a way to check locally how many background hangs you've had?

I'm not sure. Doug, do you know?

(In reply to Jim Mathies [:jimm] from comment #3)

> Florian, can you confirm this is getting reported in Nightly?

Yes, the reports are from Nightly. We only track hangs in Nightly, so I can't say if the same thing also happens on beta or release.

Flags: needinfo?(florian) → needinfo?(dothayer)

(In reply to Jeff Muizelaar [:jrmuizel] from comment #2)

> Is there a way to check locally how many background hangs you've had?

There's no easy way, no. The data is all visible if you go to about:telemetry, click "current data", select "Archived ping data", and filter to "bhr", but it's not symbolicated and there's no nice UI for exploring it.
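
For anyone who wants a rough local count anyway, here is a minimal sketch that tallies hangs in a single "bhr" ping copied out of about:telemetry as JSON. The field names (`payload.hangs`, `thread`, `runnableName`, `duration`) are assumptions about the ping layout, not something confirmed in this bug, and the sample ping is made up. It only groups by runnable name, since the stacks themselves are not symbolicated.

```python
from collections import defaultdict


def summarize_bhr_ping(ping: dict) -> dict:
    """Return {(thread, runnable): (count, total_duration_ms)} for one ping.

    ASSUMPTION: hangs are stored under ping["payload"]["hangs"], each with
    "thread", "runnableName" and "duration" (milliseconds) fields.
    """
    counts = defaultdict(int)
    durations = defaultdict(float)
    for hang in ping.get("payload", {}).get("hangs", []):
        key = (hang.get("thread", "?"), hang.get("runnableName", "?"))
        counts[key] += 1
        durations[key] += hang.get("duration", 0)
    return {key: (counts[key], durations[key]) for key in counts}


# Hypothetical ping contents, for illustration only.
sample_ping = {
    "type": "bhr",
    "payload": {
        "hangs": [
            {"thread": "Gecko", "runnableName": "PWebrtcGlobal::Reply_GetStats", "duration": 150},
            {"thread": "Gecko", "runnableName": "PWebrtcGlobal::Reply_GetStats", "duration": 210},
            {"thread": "Gecko", "runnableName": "other", "duration": 300},
        ]
    },
}

summary = summarize_bhr_ping(sample_ping)
for (thread, runnable), (count, total_ms) in sorted(summary.items()):
    print(f"{thread} / {runnable}: {count} hang(s), {total_ms:.0f} ms total")
```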

Flags: needinfo?(dothayer)

Is there a way to get a sense of the number of users experiencing the hang? i.e. differentiating 14,000 users have the hang once vs. 1 user having it 14,000 times.

Flags: needinfo?(florian)

(In reply to Jeff Muizelaar [:jrmuizel] from comment #6)

> Is there a way to get a sense of the number of users experiencing the hang? i.e. differentiating 14,000 users have the hang once vs. 1 user having it 14,000 times.

There's no easy way. The raw data came from telemetry, so there must be a way, but I don't know how. Doug can probably give a sense of how much effort would be required. I assume a large part of the challenge will be that the data stored in telemetry is unsymbolicated, but given PWebrtcGlobal::Reply_GetStats is a label frame, maybe that can help.
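
To make the distinction concrete, here is a small sketch of the aggregation being asked for: total hang count versus number of distinct clients for one label frame. The `(client_id, label)` record shape and the sample data are hypothetical; extracting such records from telemetry is the hard part and is not shown.

```python
from collections import defaultdict


def hang_spread(records, label):
    """Summarize how hangs with `label` are spread across clients.

    `records` is an iterable of (client_id, hang_label) pairs; this shape is
    an assumption made for illustration, not the real telemetry schema.
    """
    per_client = defaultdict(int)
    for client_id, hang_label in records:
        if hang_label == label:
            per_client[client_id] += 1
    return {
        "total_hangs": sum(per_client.values()),
        "clients": len(per_client),
        "max_per_client": max(per_client.values(), default=0),
    }


LABEL = "PWebrtcGlobal::Reply_GetStats"

# Two made-up data sets with the same total but very different spread.
many_users = [(f"client-{i}", LABEL) for i in range(6)]
one_user = [("client-0", LABEL)] * 6

print(hang_spread(many_users, LABEL))
print(hang_spread(one_user, LABEL))
```

Both data sets report the same `total_hangs`, so a ranking by hang count alone cannot tell them apart; `clients` and `max_per_client` can.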

Flags: needinfo?(florian) → needinfo?(dothayer)
Flags: needinfo?(na-g)
Flags: needinfo?(na-g)
Flags: needinfo?(docfaraday)

@florian, it's difficult to assess the performance impact here. Are you able to use the impact calculator, or share your thoughts on the impact?

Flags: needinfo?(florian)
Performance Impact: ? → pending-needinfo

The Performance Impact Calculator has determined this bug's performance impact to be medium. If you'd like to request re-triage, you can reset the Performance Impact flag to "?" or needinfo the triage sheriff.

Platforms: [x] Windows [x] macOS [x] Linux
Impact on browser: Causes noticeable jank
Websites affected: Rare
Resource impact: Some
[x] Bug affects multiple sites

Performance Impact: pending-needinfo → medium
Flags: needinfo?(florian)
Flags: needinfo?(na-g)
Flags: needinfo?(docfaraday)

We looked at uses of this and could not see how this would impact everyday performance in the general population. Without more data this is pretty unactionable.

(In reply to Jim Mathies [:jimm] from comment #10)

> Without more data this is pretty unactionable.

Could this be related to bug 1724417?

I would expect this to show up in profiles during webrtc conferencing if it was a problem. I'll capture a profile today when using Meet and see what we can find.

No longer blocks: webrtc-triage

Here's a profile: https://share.firefox.dev/44xtWcV

It doesn't take long enough to be reported as BHR hangs, but "Task PWebrtcGlobal::Reply_GetStats" takes 13ms on the parent process main thread, 4 times per second: https://share.firefox.dev/3JAYhxl
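
As a back-of-the-envelope check on those numbers (13ms per run, 4 runs per second, both taken from the profile above), the share of parent-process main-thread time works out as follows. The helper name is just for illustration.

```python
def main_thread_share(task_ms: float, runs_per_second: float):
    """Return (busy milliseconds per second, fraction of main-thread time)."""
    busy_ms = task_ms * runs_per_second
    return busy_ms, busy_ms / 1000.0


busy_ms, share = main_thread_share(13, 4)
print(f"{busy_ms} ms/s busy, {share:.1%} of main-thread time")
# prints "52 ms/s busy, 5.2% of main-thread time"
```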

I was initially going to put this profile in bug 1724417, since the content process taking plenty of CPU time belongs to panopto.com.

I have only 2 panopto tabs in that content process:

Both tabs have been sitting in the background for about 24 hours. The content process priority in my profile is "background", so no video has been playing.

See Also: → 1724417

I'm seeing jank from this too: https://share.firefox.dev/3XIU5nj

Jim, can someone take a closer look?

Flags: needinfo?(jmathies)