Open Bug 1501982 Opened 2 years ago Updated 2 years ago

Media Panel open on twitch.tv results in a rapid memory leak

Categories

(Core :: Audio/Video: Playback, defect, P3)

65 Branch
defect

Tracking

()

Tracking Status
firefox-esr60 --- wontfix
firefox63 --- wontfix
firefox64 --- wontfix
firefox65 --- wontfix
firefox66 --- fix-optional

People

(Reporter: ke5trel, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: memory-leak, regression, Whiteboard: [MemShrink:P3])

Attachments

(2 files)

Attached file memory-report.json.gz
Ubuntu 18.10

STR:
1. Install Media Panel extension.
2. Go to https://www.twitch.tv/videos/325845815
3. Open DevTools Media-Webrtc panel.

Memory grows rapidly and it is difficult to use the media panel before running out.
Component: Developer Tools → General
Product: WebExtensions → DevTools
This doesn't look as a WebExtension bug, but I'm not sure that it is a DevTools bug neither.
Based on what I saw it looks more like a bug related to the "media debug features" used from the content script.
  
The attached memory report shows that most of the memory is used in the content process where the twitch.tv webpage is being loaded and categorized as "heap-unclassified":

5,522.89 MB (100.0%) -- explicit
├──5,155.53 MB (93.35%) ── heap-unclassified
├────139.15 MB (02.52%) -- window-objects
│    ├──133.10 MB (02.41%) -- top(https://www.twitch.tv/videos/325845815, id=8589934593)
...

When the extensions's media panel is activated, the extension background page will start to execute the extractMediaInfo.js content script in the target tab using tabs.executeScript, this API calls are happening every few seconds and are used to retrieve updated info related to all the media elements into the target webpage (and its subframes).
I doubt that the data sent back from the tabs.executeScript can be the source of that high memory usage on its own, especially because if it would be then we would expect it to be visible in the memory report as "explicit/dom/structured-clone-holder" (https://searchfox.org/mozilla-central/rev/e0c879c86b95bdc752b1dbff6088169735674e4a/dom/base/StructuredCloneBlob.cpp#188-198)

By digging into what the above content script is doing, looking for something "special" and a potential source of the high "heap-unclassified" memory usage, I noticed that extractMediaInfo.js is enabling and using some "Firefox-only debugging facilities" on the media elements (https://searchfox.org/mozilla-central/rev/e0c879c86b95bdc752b1dbff6088169735674e4a/dom/webidl/HTMLMediaElement.webidl#103-113), e.g.:

- HTMLMediaElement.mozEnableDebugLog (https://github.com/mjfroman/media-devtools-panel-react/blob/ebc9f5f34beaac9d29c0a0ff08d98f66bdbbd353/extension/extractMediaInfo.js#L255-L257)
- HTMLMediaElement's mozRequestDebugLog (https://github.com/mjfroman/media-devtools-panel-react/blob/ebc9f5f34beaac9d29c0a0ff08d98f66bdbbd353/extension/extractMediaInfo.js#L208-L227)
- HTMLMediaElement's mozRequestDebugInfo (https://github.com/mjfroman/media-devtools-panel-react/blob/ebc9f5f34beaac9d29c0a0ff08d98f66bdbbd353/extension/extractMediaInfo.js#L180-L206)

These are parts of the media APIs that are not usually enabled in production (and in the webidl, they have been made available only to extensions with the tabs or debugger permissions), and I guess that they may be likely missing the nsIMemoryReporter used to detail their memory usage in the about:memory reports.

To double-check how much "heap-unclassified" memory usage can be triggered by using even just one of these "media debug" features (e.g. HTMLMediaElement.mozEnableDebugLog), I've created a small test extension (composed by just a manifest and a single content script) and then I've used it on the twitch.tv url and after just a few minutes this was the memory usage of the content process where we have called HTMLMediaElement.mozEnableDebugLog in the twitch.tv webpage:

2,947.64 MB (100.0%) -- explicit
├──2,675.38 MB (90.76%) ── heap-unclassified
├─────72.76 MB (02.47%) ++ js-non-window
├─────60.09 MB (02.04%) ++ heap-overhead
├─────53.31 MB (01.81%) ++ window-objects/top(https://www.twitch.tv/videos/325845815, id=4294967297)/active/window(https://www.twitch.tv/videos/325845815)
├─────37.05 MB (01.26%) ++ workers/workers(twitch.tv)/worker(blob:https://www.twitch.tv/434d68f0-5c52-4017-af71-44a0425be761, 0x7f644cf8b800)
├─────37.05 MB (01.26%) ++ media
└─────12.00 MB (00.41%) ++ (22 tiny)
Gerald, do you mind to take a look at this issue and move it into the right bugzilla component?

In comment 1 I've added some additional notes about the issue from the brief investigation I did, and I'm also attaching here a minimal test extension (as a way to simplify further investigation, and to double-check that the "high memory usage" issue can still be triggered by just calling HTMLMediaElement.mozEnableDebugLog from a content script injected only once in the twitch.tv webpage, and even without any developer tools opened on the tab).
Flags: needinfo?(gsquelart)
Regression window: https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=97a987d8875dac5dae5c4724747009a8e8018b32&tochange=dca4b9b236d79b3591434b1170aa46f622887014

Regressed by Bug 1407810.
Blocks: ddlogger, 1407810
Component: General → Audio/Video: Playback
Keywords: regression
Product: DevTools → Core

(Sorry for the delay, I left this in my backlog, as it didn't seem urgent because it's only triggered by the optional DevTools Media panel extension.)

The media panel starts DecoderDoctorLogger, which collects lots of media data. In this case, it seems Twitch makes us produce a massive amout of logging, which takes more time to process than to generate, and then still takes a lot of memory after that.
There are some safety valves ( https://searchfox.org/mozilla-central/rev/5053031ba7621fa8f63f42de4c204ab3561e4e59/dom/media/doctor/DDMediaLogs.cpp#519-526 ), but they're in minutes, so not adequate for this case.
Running Firefox with MOZ_LOG=DDLogger:3 shows some "yet-unprocessed message buffers" climbing quickly (when there should normally hover in low units).

It seems the media panel doesn't even show these log messages anyway, it's quite a waste!
So as a first step, maybe the extension should not enable logging, i.e., don't call HTMLMediaElement.mozEnableDebugLog() unless really needed.
Michael, what do you think? Could you delay mozEnableDebugLog() until the user really wants to see logs?

On top of that, a better solution would be to discard unprocessed logs when they start taking too much memory, but this will be tricky because only DDMediaLogs::ProcessBuffer() is supposed to access it. adding a reader/discarder would need extra concurrency management.
Maybe as a first step ProcessBuffer() could notice the memory issue and not actually process each message (which takes time and memory), so we'll recover memory much more quickly, at the cost of incomplete logs -- but then this means the DevTools Media panel may not be that useful on Twitch anyway. 🤔

Another solution would be to make the safety valves more dynamic, discarding more of the processed logs when the pressure mounts.

And an extra thought: I don't think processed messages are immediately cleared, which means those that contain an nsCString or MediaResult may be also holding onto heap memory. So ideally, MultiWriterQueue::PopAll() should std::move the message and ProcessBuffer() should clear it.

I could look into the DDLogger code changes, but officially I've been out of Media Playback for more than a year! I'd be happy to guide someone else with this, otherwise I'll try to work on it when I've got some free time...

Flags: needinfo?(gsquelart) → needinfo?(mfroman)

(In reply to Gerald Squelart [:gerald] from comment #4)

It seems the media panel doesn't even show these log messages anyway, it's quite a waste!
So as a first step, maybe the extension should not enable logging, i.e., don't call HTMLMediaElement.mozEnableDebugLog() unless really needed.
Michael, what do you think? Could you delay mozEnableDebugLog() until the user really wants to see logs?

I can take a look at delaying or controlling when/if we ask for logs. I think it be some time before it bubbles up to the top of my stack though.

Flags: needinfo?(mfroman)
Severity: normal → enhancement
Priority: -- → P3

Happy to take a patch in nightly 67, or potentially, in beta 66 for this.
I'm marking it fix-optional to remove it from weekly regression triage, since it has a priority assigned.

Whiteboard: [MemShrink] → [MemShrink:P3]
You need to log in before you can comment on or make changes to this bug.