insertCSS/removeCSS promise may not settle during heavy page loads, causing subsequent async code to fail
Categories
(WebExtensions :: General, defect, P2)
Tracking
(firefox126 wontfix, firefox127 wontfix, firefox128 verified, firefox129 verified)
People
(Reporter: top.car8902, Assigned: robwu)
References
Details
(Whiteboard: [addons-jira])
Attachments
(6 files)
|
50.09 KB,
image/png
|
Details | |
|
1.60 KB,
application/zip
|
Details | |
|
48 bytes,
text/x-phabricator-request
|
Details | Review | |
|
48 bytes,
text/x-phabricator-request
|
Details | Review | |
|
48 bytes,
text/x-phabricator-request
|
phab-bot
:
approval-mozilla-beta+
|
Details | Review |
|
48 bytes,
text/x-phabricator-request
|
phab-bot
:
approval-mozilla-beta+
|
Details | Review |
Steps to reproduce:
Firefox 126, macOS 14.5.
- Download the source code for my comment blocker extension from this branch:
https://github.com/RickyRomero/shut-up-webextension/tree/firefox-css-bug - In the src folder, delete manifest.json and rename manifest.ffx.json to manifest.json
- Open about:debugging#/runtime/this-firefox
- Click "Load Temporary Add-on" and browse to the src folder in my extension
- Grant site access permission in the options page if requested
- Inspect the extension from the debugging tab
- Load https://www.reddit.com/r/aww/comments/1d5sdin/my_17_yr_old_blind_and_deaf_shiba_loves_the/ in a new tab and observe the console output
NOTE: THIS BUG MAY REQUIRE MULTIPLE PAGE LOADS TO APPEAR.
Actual results:
Some page loads may show "result: elapsed" in red text in the console. That's the insertCSS/removeCSS promise failing to settle within 10 seconds, and my own code stepping in with Promise.race() to prevent a softlock.
I've extended the timeout to 10 seconds to show how serious this bug is. Under normal conditions, I time out within 50ms, extending the timeout over multiple attempts until the promise either resolves or rejects. It can sometimes take 5 or 6 of these timeouts for the promise to settle correctly. You can test this behavior yourself by changing the timeout at the top of blocker.js.
Expected results:
If insertCSS or removeCSS fails with an error, the promise should reject. If it succeeds, it should resolve. In other words, it should always either resolve or reject, not block further async code from executing by remaining open indefinitely.
Comment 1•1 year ago
|
||
The Bugbug bot thinks this bug should belong to the 'WebExtensions::Untriaged' component, and is moving the bug to that component. Please correct in case you think the bot is wrong.
| Assignee | ||
Comment 2•1 year ago
|
||
Could you create a minimal test case that demonstrates the issue? I see the scripting.insertCSS / scripting.removeCSS calls in core/blocker.js, but it is mixed with lots of unrelated code.
And could you show the output of the global Browser Console? If you don't see any relevant errors, try enabling the Multi Process option to also include logs from other processes.
| Reporter | ||
Comment 3•1 year ago
|
||
@robwu
I've spent my evening updating my repo with a branch for this bug:
https://github.com/RickyRomero/shut-up-webextension/tree/firefox-css-bug
The branch contains a minimal test case. All relevant code is in src/init.js. You don't need to rename the manifest JSON file anymore either.
You should (hopefully) see that, when loading a simple page, the promises resolve and the operation queue clears every time. Try it on https://rickyromero.com/ to see what I mean. Conversely, https://www.reddit.com/r/aww/comments/1d5sdin/my_17_yr_old_blind_and_deaf_shiba_loves_the/ leaves promises hanging open most times I load the page with this same code.
Unfortunately, the Browser Console doesn't reveal any clues in either parent- or multi-process modes.
Same setup as before: Firefox 126, macOS 14.5.
Comment 4•1 year ago
|
||
I reproduced the issue on the latest Nightly (128.0a1/20240603160728), Beta (127.0/20240603152359) and Release (126.0.1/20240526221752) under Windows 10 x64 and Ubuntu 22.04 LTS.
Loading the extension from Comment 3 and then https://www.reddit.com/r/aww/comments/1d5sdin/my_17_yr_old_blind_and_deaf_shiba_loves_the/ in a new tab and checking the extension console will show continuous “Async code halted; promise hanging open…” errors. See the attached screenshot for more details.
Comment 5•1 year ago
|
||
| Assignee | ||
Updated•1 year ago
|
Updated•1 year ago
|
| Assignee | ||
Comment 6•1 year ago
|
||
Minimal test case demonstrating the specific issue:
- Visit example.com
- Press the extension button. This will insert a slow-loading frame and call
scripting.executeScript.
Expected:
- Slow-loading frame inserted.
scripting.executeScriptresolves, with results:- results from top frame and slow frame (this is Chrome's behavior).
- or at least only the top frame without the slow frame (but definitely be resolving).
- second
scripting.executeScriptcall after frame load reports results from top document and the frame.
Actual:
- Slow-loading frame inserted.
scripting.executeScriptgot stuck.- second
scripting.executeScriptcall after frame load reports results from top document and the frame.
| Assignee | ||
Comment 7•1 year ago
|
||
The issue is not specific to scripting.insertCSS / scripting.removeCSS. It also happens with scripting.executeScript, and the variants of these APIs in the tabs namespace.
A work-around is to use injectImmediately:true in the scripting.executeScript API, or runAt:"document_start" in the tabs.executeScript API. There are no work-arounds available for insertCSS / removeCSS, because these do not have the runAt option available (they are scheduled at document_idle).
The issue is happening because the implementation for execution at document_idle and document_end gets stuck when an initial about:blank frame is encountered (which has document.readyState = "uninitialized"). When the requested frame src is loaded, the frame will navigate to the final destination, and the internal promiseDocumentIdle, promiseDocumentLoaded (or in document_end's case) promiseReady implementations will never resolve.
Note that if the document load is aborted, that the frame will always be at the uninitialized state, without content script execution.
At the very least we should get unstuck, whether by throwing an error or by executing the script after a delay. A point against executing the script after the delay is that the frame could be cross-origin and be in a different process, so there is no guarantee that we could get meaningful results.
In case of declarative content scripts, the content script will execute in the final frame. I haven't checked whether it would try injecting in the initial about:blank.
| Assignee | ||
Comment 8•1 year ago
|
||
The reported bug is caused by an attempt to schedule a content script in
an initial about:blank document/frame for which document_end and
document_idle never resolve. Although the bug is about dynamic execution
(insertCSS/executeScript/etc), the underlying injection logic is also
shared by declarative content scripts.
This test adds test coverage for two relevant scenarios:
-
Content scripts declared via content_scripts in manifest.json (this is
also the underlying implementation of contentScripts.register,
userScripts.register and scripting.registerContentScripts).
This WORKS AS EXPECTED, because scripts never run in initial
about:blank ( which is reported at bug 1415539 + bug 1486036 ). -
Content scripts executed in existing content as part of extension
install via ExtensionPolicyService::InjectContentScripts. This
UNEXPECTEDLY results in executions that are stuck and never-settling.
The same issue is also relevant to the handleActivateScripts logic,
which did not and still does not have explicit coverage for this
scenario. A generic solution to the InjectContentScripts case will also
apply to handleActivateScripts, so I did not add tests for that here.
A test case for the dynamic code execution case is not part of this
patch because the observed result would be that the test gets stuck.
This test case will be included with the next patch that fixes this bug,
which will also update the test expectation for the "UNEXPECTEDLY"
never-settling promise mentioned above.
| Assignee | ||
Comment 9•1 year ago
|
||
This patch fixes the issue by injecting immediately when a content
script execution is scheduled for an initial about:blank document.
This case is only reachable through dynamic content script execution
APIs, and not through declarative content scripts. The latter is out of
scope for this bug and tracked separately at bug 1415539 + bug 1486036.
An alternative solution could have been to not inject, e.g. by ignoring
the frame or throwing an error. This would however be unexpected to an
an extension developer whose intention is to run a code snippet in all
frames that are reachable by web pages (which works in Chrome).
| Assignee | ||
Updated•1 year ago
|
| Assignee | ||
Updated•1 year ago
|
Comment 10•1 year ago
|
||
Comment 11•1 year ago
|
||
Backed out for causing assertion failures @ WebExtensionPolicy.cpp
Backout link: https://hg.mozilla.org/integration/autoland/rev/39acea0a60fe191ead9b51557ac3e7c10782d64e
[Failure log -> 17388] Assertion failure: !mThis.URL().Spec().EqualsLiteral("about:blank"), at /builds/worker/checkouts/gecko/toolkit/components/extensions/WebExtensionPolicy.cpp](https://treeherder.mozilla.org/logviewer?job_id=462466611&repo=autoland&lineNumber=3059)
| Assignee | ||
Comment 12•1 year ago
|
||
The debug assertion is associated with the patch from bug 1901894, and independent of the patches here.
Comment 13•1 year ago
|
||
Comment 14•1 year ago
|
||
| bugherder | ||
https://hg.mozilla.org/mozilla-central/rev/35a4042189bb
https://hg.mozilla.org/mozilla-central/rev/144bdc6f060b
Comment 15•1 year ago
|
||
Verified as Fixed. Tested using the test extension and STR from Comment 6 on the latest Nightly (129.0a1/20240616215431) under Windows 10 x64 and Ubuntu 22.04 LTS.
The first scripting.executeScript resolves with results and does not get stuck, confirming the fix. For comparison, on the latest Release 127, the first scripting.executeScript gets stuck.
Comment 16•1 year ago
|
||
The patch landed in nightly and beta is affected.
:robwu, is this bug important enough to require an uplift?
- If yes, please nominate the patch for beta approval.
- If no, please set
status-firefox128towontfix.
For more information, please visit BugBot documentation.
| Assignee | ||
Comment 17•1 year ago
|
||
The reported bug is caused by an attempt to schedule a content script in
an initial about:blank document/frame for which document_end and
document_idle never resolve. Although the bug is about dynamic execution
(insertCSS/executeScript/etc), the underlying injection logic is also
shared by declarative content scripts.
This test adds test coverage for two relevant scenarios:
-
Content scripts declared via content_scripts in manifest.json (this is
also the underlying implementation of contentScripts.register,
userScripts.register and scripting.registerContentScripts).
This WORKS AS EXPECTED, because scripts never run in initial
about:blank ( which is reported at bug 1415539 + bug 1486036 ). -
Content scripts executed in existing content as part of extension
install via ExtensionPolicyService::InjectContentScripts. This
UNEXPECTEDLY results in executions that are stuck and never-settling.
The same issue is also relevant to the handleActivateScripts logic,
which did not and still does not have explicit coverage for this
scenario. A generic solution to the InjectContentScripts case will also
apply to handleActivateScripts, so I did not add tests for that here.
A test case for the dynamic code execution case is not part of this
patch because the observed result would be that the test gets stuck.
This test case will be included with the next patch that fixes this bug,
which will also update the test expectation for the "UNEXPECTEDLY"
never-settling promise mentioned above.
Original Revision: https://phabricator.services.mozilla.com/D212986
Updated•1 year ago
|
| Assignee | ||
Comment 18•1 year ago
|
||
This patch fixes the issue by injecting immediately when a content
script execution is scheduled for an initial about:blank document.
This case is only reachable through dynamic content script execution
APIs, and not through declarative content scripts. The latter is out of
scope for this bug and tracked separately at bug 1415539 + bug 1486036.
An alternative solution could have been to not inject, e.g. by ignoring
the frame or throwing an error. This would however be unexpected to an
an extension developer whose intention is to run a code snippet in all
frames that are reachable by web pages (which works in Chrome).
Original Revision: https://phabricator.services.mozilla.com/D212987
Updated•1 year ago
|
Comment 19•1 year ago
|
||
beta Uplift Approval Request
- User impact if declined: Extensions that dynamically insert content scripts may get stuck; memory leaks may occur
- Code covered by automated testing: yes
- Fix verified in Nightly: yes
- Needs manual QE test: no
- Steps to reproduce for manual QE testing: https://bugzilla.mozilla.org/show_bug.cgi?id=1900222#c6
- Risk associated with taking this patch: Low
- Explanation of risk level: Specific change in extensions code, well-understood cause and fix, covered by automated unit tests and manual tests
- String changes made/needed: none
- Is Android affected?: yes
| Assignee | ||
Comment 20•1 year ago
|
||
Requested uplift for the above patches and the one from bug 1901894.
Updated•1 year ago
|
Updated•1 year ago
|
Updated•1 year ago
|
Updated•1 year ago
|
Updated•1 year ago
|
Updated•1 year ago
|
Updated•1 year ago
|
Comment 21•1 year ago
|
||
| uplift | ||
Updated•1 year ago
|
Comment 22•1 year ago
|
||
Verified as Fixed. Tested using the test extension and STR from Comment 6 on the latest Beta (128.0b5/20240618204306), available from https://treeherder.mozilla.org/jobs?repo=mozilla-beta&revision=94586d93e2a743a9764c62a4098617652fad8339 under Windows 10 x64 and Ubuntu 22.04 LTS.
The first scripting.executeScript resolves with results and does not get stuck, confirming the fix.
| Reporter | ||
Comment 23•1 year ago
|
||
Chiming in here, on macOS 14.5 and Firefox 128.0b6 the fix appears to have solved my problem. insertCSS/removeCSS are both much more reliable now. Thank you for quickly fixing this @robwu!
Description
•