Closed Bug 2011064 Opened 3 months ago Closed 3 months ago

High CPU and memory usage in YouTube embed with CanvasBlocker extension, triggered after early content script execution in temporary about:blank frame

Categories

(WebExtensions :: Developer Outreach, defect, P1)

Firefox 149
x86_64
Windows 10
defect

Tracking

(Not tracked)

RESOLVED MOVED

People

(Reporter: the.ra2.ifv, Assigned: robwu)

References

(Regression)

Details

(Keywords: dev-doc-needed, regression)

Attachments

(3 files)

Steps to reproduce:

  1. Enable https://addons.mozilla.org/en-US/firefox/addon/canvasblocker
  2. Visit https://ylianst.github.io/MeshCentral/meshcentral/
  3. Observe for increased CPU and RAM usage

Expected results:

na

Behavior: High CPU usage (8%) RAM leak (>4G)
Threads profile: https://share.firefox.dev/49xH3NO

(root scope) https://www.google.com/js/th/Yey1O02-OXiYiyXPnwQ2gWM0fdJziOgjbfkkNrIybyM.js line 2 > eval line 3334 > eval line 1 > eval line 1 > eval [https://www.google.com/js/th/Yey1O02-OXiYiyXPnwQ2gWM0fdJziOgjbfkkNrIybyM.js line 2 > eval line 3334 > eval line 1 > eval line 1 > eval:1:11]
Q2 [https://www.google.com/js/th/Yey1O02-OXiYiyXPnwQ2gWM0fdJziOgjbfkkNrIybyM.js line 2 > eval line 3334 > eval line 1 > eval line 1 > eval:1:14]
js::RunScript
S.l</< [https://www.google.com/js/th/Yey1O02-OXiYiyXPnwQ2gWM0fdJziOgjbfkkNrIybyM.js line 2 > eval:3334:52643]
js::RunScript
$n [https://www.google.com/js/th/Yey1O02-OXiYiyXPnwQ2gWM0fdJziOgjbfkkNrIybyM.js line 2 > eval:3334:29658]
b0 [https://www.google.com/js/th/Yey1O02-OXiYiyXPnwQ2gWM0fdJziOgjbfkkNrIybyM.js line 2 > eval:3334:2391]
P [https://www.google.com/js/th/Yey1O02-OXiYiyXPnwQ2gWM0fdJziOgjbfkkNrIybyM.js line 2 > eval:3334:25817]
js::RunScript
p [https://www.google.com/js/th/Yey1O02-OXiYiyXPnwQ2gWM0fdJziOgjbfkkNrIybyM.js line 2 > eval line 3334 > eval line 1 > eval line 1 > eval:1:13]
js::RunScript
...
(root)
OS: Unspecified → Windows 10
Hardware: Unspecified → x86_64

The Bugbug bot thinks this bug should belong to the 'Core::Performance: General' component, and is moving the bug to that component. Please correct in case you think the bot is wrong.

Component: Untriaged → Performance: General
Product: Firefox → Core

Can repro. Looks like https://ylianst.github.io/MeshCentral/meshcentral/ has youtube videos in an iframe, which leads to creation of a new process. When i install the addon and visit the page, the iframe-d youtube process goes berserk.

Note that the addon says that they change some JS functions to prevent fingerprinting. So maybe some important js call gets borked?
Profile: https://share.firefox.dev/3NC7Dhz

Tentatively classifying under webextensions.

Component: Performance: General → Compatibility
Product: Core → WebExtensions

Tested on macOS with Firefox 149.
Observed resource usage stabilizes after initial load and I am not able to reproduce the bug.

Can you please show me exactly where to look in order to see the described behavior?

Flags: needinfo?(the.ra2.ifv)

I can't get a stabilized result, as I don't have enough RAM (8G) to feed. If the RAM usage stabilized in the end, it stabilizes, I think.
Please visit Github issue for more feedbacks.

Flags: needinfo?(the.ra2.ifv)

Yeah, after studying the Github issue, I was able to reproduce on macOS with Firefox 149. Marking bug as confirmed.

Status: UNCONFIRMED → NEW
Ever confirmed: true

Results of bisection:
Bug 543435 - Only reuse inner window for uncommitted initial documents. r=smaug,asuth,credential-management-reviewers,webrtc-reviewers,pehrsons,hsivonen,dimi

Keywords: regression
Regressed by: sync-about-blank

Set release status flags based on info from the regressing bug 543435

:vhilla, since you are the author of the regressor, bug 543435, could you take a look? Also, could you set the severity field?

For more information, please visit BugBot documentation.

Flags: needinfo?(vhilla)

I recorded two profiles

They are recorded with local opt builds and I followed the steps from comment 0. (Opened this bug, installed the extension, activated the profiler, clicked on the second link)

Looking at the markers, the IPC messages sent by the extension differ. And there are several sent right before some script on YouTube enters a some endless microtask. This causes yank and continuously allocates memory. The microtask is some promise callback.

But I have no idea what happens.

Preventing lib/iframeProtection.js from being injected or early-returning iframeProtections.js protect avoids the issue.

(In reply to Vincent Hilla [:vhilla] from comment #11)

Preventing lib/iframeProtection.js from being injected or early-returning iframeProtections.js protect avoids the issue.

And this is because calling extension.createProxyFunction does setupWindowForProxies where it modifies toString. This alteredToString is somehow necessary to trigger the issue. If alteredToString returns "", the issue disappears.

Attached file index.htm

For ease of debugging, a reduced test case instead of https://ylianst.github.io/MeshCentral/meshcentral/

Clicking on the button to embed YouTube in a frame triggers the bug, as observed through a memory spike in about:processes.

Sometimes the bug does not trigger (i.e. no memory spike within seconds). In that case, click the button to remove the iframe and retry.

Here is a profile, with JS Execution Tracing (tracing) and JavaScript Sources (jssources) enabled via about:profiling: https://share.firefox.dev/3LRR3Kf

For some reason the tracing setting disables native frames in the call tree, so here is one without the JS Execution Tracing (tracing) option checked: https://share.firefox.dev/4bZjdxp

I don't immediately see anything obvious in the JS, YouTube/Google's JS code is extremely obfuscated.
The second profile looks similar to the second profile from comment 10.

I spotted a behavioral change triggered by bug 543435: YouTube creates a very temporary <iframe style="display: none;"></iframe>. Previously the extension's content script did not execute there, but now its document_start scripts execute immediately. The behavior can easily be observed through the devtools and adding a conditional breakpoint in any of the extension scripts (e.g. lib/extension.js) with condition new Error().stack.includes("appendChild").

For some reason, the severe memory leak stops occurring if I keep the iframe (ignoring YouTube's request to remove the frame). To do so, the following snippet can be added as a conditional breakpoint in lib/extension.js, or by patching the file (after extraction, and loading the new extension via about:debugging):

new Error().stack.includes("appendChild") && (f=>{
  let p = f.parentNode.wrappedJSObject;
  let rc = p.removeChild.wrappedJSObject;
  if (rc) p.removeChild=exportFunction(function(e) {
    if (e===f) {console.log("nah, did not think so - keep iframe pls")}
    else return rc.call(this, e);
  }, p.ownerDocument.defaultView);
})(frameElement);

Without the above snippet, I see the memory usage spiking within seconds. BUT the Snapshot feature of the devtools fails to record any notable allocations.

And about:memory shows that the vast majority of the leak is attributed to class(Array)/objects -> gc-buffers at YouTube.

Summary: Looping (High CPU RAM use) on specific page with CanvasBlocker extension → High CPU and memory usage in YouTube embed with CanvasBlocker extension, triggered after early content script execution in temporary about:blank frame

Could we get the Priority and Severity field set so as that we understand the urgency wrt releases in Flight? thanks

Flags: needinfo?(lgreco)

P1 because this is a recent regression.
S2 because of high impact to those affected (gigabytes of memory usage on YouTube*). It is unclear whether the issue is specific to the (recommended) CanvasBlocker extension (with 40k users) or even more general.

* I can reliably reproduce the issue in YouTube embeds. While trying out top-level YouTube, I also saw the same memory issue in top-level YouTube documents.

In fact, it might not be YouTube-specific, while trying to confirm the reproducibility of this issue with top-level YouTube, I see that not only YouTube has a 3 GB memory usage, but also the process hosting an iframe from https://tpc.googlesyndication.com. That could be a sign of a common Google library triggering the leak with the CanvasBlocker, but could also mean that there is a more generic issue at play here.

Severity: -- → S2
Flags: needinfo?(lgreco)
Priority: -- → P1
Attached file toStringTamper.zip

I've minimized CanvasBlocker to the attached extension. Apparently all that is needed is to tamper with Function.prototype.toString.

Attached file pageWorldTamper.zip

The leak also occurs if the content script is injected at document_end.

We know that the issue occurs within some temporary blank iframe. Here's an AI written extension that adds a page script, which modifies createElement to hook into this iframe and proxy toString on load. This way, we don't need to use exportFunction. I initially wanted to test this in Chrome, but this extension doesn't trigger the leak in Firefox.

The timing of load vs document_end is slightly different. But as we modify createElement and add a capturing load listener, we should be the first to get to the iframe context. Logging indicates that we did add our proxy in that temporary iframe.

So it seems like installing the proxy from within the page world doesn't cause an issue. Could the cross-compartment function call cause problems?

Another peculiar thing is whether devtools collapses the logging I add to the toString proxy. If toString returns "", identical messages from it are collapsed. If it returns original.call(this), they aren't collapsed.

And whatever is returned from toString is being evaled in the parent window, i.e. the youtube iframe.

I think that I identified the cause of the CPU + memory issue.

To rule out web platform issues, I reduced the extension from comment 17 even further, to

globalThis.exportFunction ??= f => f;
var win = window.wrappedJSObject ?? window;
var orig = win.Function.prototype.toString
win.Function.prototype.toString = exportFunction(function() {
  return orig.call(this, ...arguments);
}, window);

... and added "world": "MAIN" in manifest.json. This did not trigger the problem any more, which made me suspect Xrays or Sandbox.

Along with my prior observation of there being a temporary about:blank frame being involved (comment 14), I formed the hypothesis of the issue being caused by the invalidation of the function generated by exportFunction. You see, when an iframe unloads, the sandbox of the web extension is nuked, and that invalidates anything generated with exportFunction. To validate this hypothesis, I ran a custom build of Firefox with the Cu.nukeSandbox call commented out, which extends the lifetime of functions generated by exportFunction, beyond the lifetime of the frame. This caused the issue to disappear.

Flags: needinfo?(vhilla)

Some more explanation:

  • exportFunction allows extensions to export an function from the (privileged) extension content script scope to the web page. Extensions may use this in order to run code in the content script's scope, isolated from the web page. This is a Firefox-only method, not supported by Chrome or other browsers.
  • The lifetime of exportFunction is bound to the document where the content script is running.
  • YouTube appears to create an iframe, create some objects in it, and then removes the iframe. Past the removal of the frame, it continues to use objects associated with that frame's global.
  • The CanvasBlocker extension modifies the prototype of functions, including Function.prototype.toString, overridden with the result of exportFunction. Because that exported function is destroyed upon frame invalidation, an error is now thrown whenever toString() is called.

I can reproduce this issue independently of the content script sandbox, by running a content script with "world": "MAIN" and the following content (plus test page from comment 13):

var orig = Function.prototype.toString;
Function.prototype.toString = function() {
  if (window.closed) {
    trigger_intentional_error(); // not defined, throws error.
  } else {
    return orig.call(this);
  }
};
See Also: → 2010130

I confirmed the above hypothesis (on the dead wrapper from exportFunction triggering an error) by checking the "Pause on exceptions" and "Pause on caught exceptions" options in the Debugger. This was too noisy because YouTube triggers and catches many exceptions, so I looked for a way to have a conditional error breakpoint. That feature does not exist in the devtools (bug 836298), so I patched the devtools by adding the following snippet to https://searchfox.org/firefox-main/rev/8d8cb89b010e9214fdb87a915511f265eeb06530/devtools/server/actors/thread.js#2049 :

    try {
      if (!this._options.ignoreCaughtExceptions) {
        let errfound = false;
        if (value?.class === "TypeError") {
          errfound = value.unsafeDereference()?.message === "can't access dead object";
        } else if (value?.class === "DOMException") {
          // May sometimes be redacted:
          // https://searchfox.org/firefox-main/rev/8d8cb89b010e9214fdb87a915511f265eeb06530/js/xpconnect/src/ExportHelpers.cpp#326
          errfound = value.unsafeDereference()?.message === "An exception was thrown";
        }
        // Uncomment to trigger for specific origins only, e.g. advert frame on YouTube:
        // errfound ||= youngestFrame.eval("origin == 'https://tpc.googlesyndication.com'").return;
        if (!errfound) {
          return undefined;
        }
      }
    } catch {
      return undefined;
    }

with the above patch to Firefox, following the STR triggers the following breakpoint in YouTube:

(p(m, d.y7, R.apply(Q, d.D)), m.Bv = m.H(), T = 46)
              ^^^^^
              throws TypeError: can't access dead object

In this specific example, R is the Function.prototype.toString method from the temporary about:blank frame, overridden by the extension with exportFunction. But it throws, as explained in comment 21.

I noticed that YouTube was not the only process with excessive memory usage, I saw the same for https://tpc.googlesyndication.com. To figure out the source of that, I modified my breakpoint to trigger on that origin only, and appended the following to the extension.js snippet from comment 20:

Object.assign(win.Function.prototype.toString, {
  origin: origin,
  documentURL: document.URL,
  frameElement: frameElement?.outerHTML,
  ancestors: [].join.call(location.ancestorOrigins),
  window: win,
});

After repeating the previous steps, now with YouTube in a top level document, the breakpoint triggered again

  (t(F.Es, b, y.apply(G, F.I)), b.KU = b.K(), A = 61)
              ^^^^^
              throws TypeError: can't access dead object

and I ran the following in the console to inspect the source: Object.fromEntries(Object.entries(y))

Object {
  ancestors: "https://tpc.googlesyndication.com,https://www.youtube.com"
  documentURL: "about:blank"
​  frameElement: '<iframe style="display: none;"></iframe>'
  origin: "https://tpc.googlesyndication.com"
  window: Window
}

This shows that Google's googlesyndication.com appears to use the same technique as YouTube. I guess that there is some common Google-internal library.

With these findings, I think that the extension could consider deploying a hotfix to not run code when document.URL === "about:blank" && frameElement?.style.display === "none" (or to run code, but keep the iframe in the document). This would be a stopgap work-around to avoid known bugs (with YouTube, etc). It does not fix the fundamental issue of it being unsafe to overwrite built-ins with exportFunction (which is what CanvasBlocker does). What it SHOULD be doing instead is to use a "world": "MAIN" to safely overwrite globals without being affected by unloaded content scripts.

An alternative is for Firefox to be modified to not execute content scripts immediately at about:start, but a bit later. The downside to that is that extensions would not be able to execute their logic immediately.

To be abundantly obvious, this bug is caused by a bug in the extension. While Firefox triggered the regression due to an improvement (bug 543435), the extension modified built-in prototypes in ways that are incompatible with actual usage by third party sites (i.e. YouTube and presumably other Google properties in this case). But besides extensive testing, extension authors could not have known about the pitfall, because the lifetime of exportFunction is not documented on MDN. We should do so: https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/WebExtensions/Content_scripts/exportFunction

Downgrading S2 to S3, due to the improved understanding of the issue. In short, this is not inherently a problem with about:blank, but a bug caused by a combination of an extension relying on exportFunction, which breaks web pages (YouTube, Google Ads, likely other Google properties) (see comment 21 for details, and comment 23 for even more details and action items).

dev-doc-needed: We should document the pitfalls of exportFunction: we should call out the lifetime of the exported function and that a "can't access dead object" error can be thrown if the function is called past the end of the lifetime. In particular, this can happen if an exported function is called from a different frame.

I will take this bug, to submit a work-around PR to the CanvasBlocker extension author. This is not something that we are required to do, but after having spent hours (not just me, also vhilla), it would be nice to at least not have this impact on major sites. Users don't care where a bug is coming from, they just want the web to work.

Nice to haves ideas for the future on the Firefox side:

  • Make detection and diagnosing this issue easier by capturing JSMSG_DEAD_OBJECT errors (potentially with an extension-specific exportFunction / cloneInto implementation?) and logging the warning somewhere when called from web content. This would make it easier to surface the bug even if the web page catches all errors (like YouTube/Google did in this case).
  • Add option to allow built-ins overwritten with exportFunction to be restored to their original (or any other method from the web page) when the wrapper were to become invalid (frame removal as seen here, but also extension unload as seen in bug 2010130).
Assignee: nobody → rob
Severity: S2 → S3
Status: NEW → ASSIGNED
Flags: needinfo?(mixedpuppy)
Keywords: dev-doc-needed
Status: ASSIGNED → RESOLVED
Closed: 3 months ago
Component: Compatibility → Developer Outreach
Resolution: --- → MOVED
Duplicate of this bug: 2012370

(In reply to Rob Wu [:robwu] from comment #21)

I can reproduce this issue independently of the content script sandbox, by running a content script with "world": "MAIN" and the following content (plus test page from comment 13):

var orig = Function.prototype.toString;
Function.prototype.toString = function() {
  if (window.closed) {
    trigger_intentional_error(); // not defined, throws error.
  } else {
    return orig.call(this);
  }
};

Leaving a comment here because the implications didn't immediately become clear to me. Throwing always or always returning orig.call(this) would not cause issues, it's important we only start raising exceptions once window.closed. Then this reproduces the issue, so the exception likely causes some loop in YouTube.

You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: