Firefox terminates the background script of WebExtensions after 30 seconds
Categories
(WebExtensions :: General, defect, P2)
Tracking
(relnote-firefox 117+, firefox-esr102 unaffected, firefox-esr115 unaffected, firefox117+ fixed, firefox118+ verified, firefox119+ verified)
Tracking | Status | |
---|---|---|
relnote-firefox | --- | 117+ |
firefox-esr102 | --- | unaffected |
firefox-esr115 | --- | unaffected |
firefox117 | + | fixed |
firefox118 | + | verified |
firefox119 | + | verified |
People
(Reporter: soeren.hentzschel, Assigned: rpl)
References
(Regression)
Details
(Keywords: regression, Whiteboard: [addons-jira])
Attachments
(4 files)
48 bytes,
text/x-phabricator-request
|
pascalc
:
approval-mozilla-beta+
RyanVM
:
approval-mozilla-release+
|
Details | Review |
48 bytes,
text/x-phabricator-request
|
pascalc
:
approval-mozilla-beta+
RyanVM
:
approval-mozilla-release+
|
Details | Review |
534.42 KB,
image/png
|
Details | |
663.92 KB,
image/png
|
Details |
Since bug 1830767 Firefox terminates the background script of WebExtensions after 30 seconds. I got multiple reports that my add-on Bookmarks Organizer no longer works since Firefox 117.
STR:
- Make sure that you have enough add-ons that a check for broken bookmarks takes longer than 30 seconds, let's say 300 bookmarks or more (it will depend on your internet speed).
- Install https://addons.mozilla.org/en-US/firefox/addon/bookmarks-organizer/
- Check for broken bookmarks.
- After 30 seconds, take a look at the Bookmarks Organizer card on about:debugging#/runtime/this-firefox
Actual:
The check for broken bookmarks hangs forever in step 3. In step 4 you can see that the background script is no longer running.
Expected:
The check for broken bookmarks finishes without any problem.
[Tracking Requested - why for this release]:
This is a critical regression in Firefox 117 for users of affected WebExtensions as they suddenly stopped working as expected. I can't imagine that Bookmarks Organizer is the only add-on that is affected by this change (or does it really do something special?). No required change was communicated to / documented for WebExtension developers at all. So I hope this can be fixed for Firefox 117 with a dot release as soon as possible.
Even if you say that the change for Firefox was intended and the add-on has to do something differently - you shouldn't ship a behavior change without any notice to the add-on developers and giving them enough time to ship an update. Please also consider that it always take multiple days until new add-on versions get approved by Mozilla. So in case of an intended change, maybe you could postpone it at least?
Comment 1•1 year ago
|
||
Set release status flags based on info from the regressing bug 1830767
Updated•1 year ago
|
Comment 2•1 year ago
|
||
Luca, can you please triage this bug? Thanks.
Assignee | ||
Comment 3•1 year ago
|
||
(In reply to Pascal Chevrel:pascalc (PTO until August 21) from comment #2)
Luca, can you please triage this bug? Thanks.
Sure thing, I was already starting to investigate it as soon as I was back this morning.
Assignee | ||
Comment 4•1 year ago
|
||
Details from investigation:
- I've confirmed that what was keeping their event page alive are in fact other events which are not technically event page listener (as I was suspecting)
- in particular, the idle timer was being reset as a side effect of the event page sending to the
runtime.onMessage
listener registered from the UI page) an "update-counters" messages while the process was ongoing, given that the message name suggests the message will be sent every time the counters are updated, that was likely often enough to keep the event page alive for long enough for the long lasting process to complete - and so the part of before Bug 1830767 that have technically changed the idle timer behavior for the bookmark organizers extension is the part that is forcefully setting the resetIdleTimerOnEvent to false for child process EventManager instances or not background contexts (instead, before Bug 1830767 those where making the EventManager instance associated to the UI page to route the internal "background-script-reset-idle" event from the child to the parent process, where it was then resetting the idle timer)
In other words, the following change (plus eventually adjusting tests covering the current expected behavior) would be bringing back the part of the previous behavior that the bookmarks-organizer extension was leveraging to stay alive (even if it may have been more of a side effect than the intended reason to be for those events):
diff --git a/toolkit/components/extensions/ExtensionCommon.sys.mjs b/toolkit/components/extensions/ExtensionCommon.sys.mjs
--- a/toolkit/components/extensions/ExtensionCommon.sys.mjs
+++ b/toolkit/components/extensions/ExtensionCommon.sys.mjs
@@ -2273,12 +2273,8 @@ class EventManager {
this.inputHandling = inputHandling;
this.resetIdleOnEvent = resetIdleOnEvent;
- const isBackgroundParent =
- this.context.envType === "addon_parent" &&
- this.context.isBackgroundContext;
-
// Avoid resetIdleOnEvent overhead by only consider it when applicable.
- if (!isBackgroundParent || context.extension.persistentBackground) {
+ if (context.extension.persistentBackground) {
this.resetIdleOnEvent = false;
}
If we will resort to that, we should also file a followup bug (because the current behavior introduced in Bug 1830767 is actually the intended behavior and should have been the case since the original changes to introduce the event pages) and mark it as dev-doc-needed up front to make sure we would be calling it out to the extension developers that may be hitting this kind of issue without realizing.
Possible Hack to workaround it on the extension side
Given that the current behavior is actually the intended one, I also took a look into what an extension may be doing to workaround the issue, at least until we may finally get an official method for the extension to signal to the browser that some long lasting work is still ongoing and would like to be completed before suspending the event page (tracked by Bug 1781265 / https://github.com/w3c/webextensions/issues/416).
And so alternatively, a workaround may be achieved on the extension side using something like the following keepAliveHack
(which is leveraging storage.session
API) from the following diff (where there are also the changes to using it in the extension):
diff --git a/src/js/core/background.js b/src/js/core/background.js
index a56d040..8717a38 100644
--- a/src/js/core/background.js
+++ b/src/js/core/background.js
@@ -794,7 +794,7 @@ const bookmarksorganizer = {
tasks.push(limiter.take().then(executeTask(bookmark, mode, type)));
}
- return Promise.all(tasks);
+ return keepAliveHack(Promise.all(tasks));
},
/**
diff --git a/src/js/core/keepAliveHack.js b/src/js/core/keepAliveHack.js
new file mode 100644
index 0000000..15a5239
--- /dev/null
+++ b/src/js/core/keepAliveHack.js
@@ -0,0 +1,24 @@
+if (!browser.storage?.session) {
+ console.warn("keepAliveHack: browser.storage or browser.storage.session not available, workaround disabled");
+ globalThis.keepAliveHack = (promise) => promise;
+} else {
+ const RESET_IDLE_TIME_MS = 10000;
+ const KEY_NAME = `keepAliveHack:${Date.now()}`;
+ browser.storage.session.onChanged.addListener(data => {
+ // This storage onChanged listener is only used to keep
+ // the event page alive while running tasks that will be
+ // potentially last longer than the idle timeout.
+ });
+ globalThis.keepAliveHack = async function(promise) {
+ forceResetIdleTimer = () => browser.storage.session.set({[KEY_NAME]: true});
+ clearSessionStore = () => browser.storage.session.remove([KEY_NAME]);
+ forceResetIdleTimer();
+ const interval = setInterval(forceResetIdleTimer, RESET_IDLE_TIME_MS);
+ try {
+ return await promise;
+ } finally {
+ clearInterval(interval);
+ clearSessionStore();
+ }
+ };
+}
diff --git a/src/manifest.json b/src/manifest.json
index 022484f..8a1af11 100644
--- a/src/manifest.json
+++ b/src/manifest.json
@@ -11,7 +11,7 @@
"url": "https://www.soeren-hentzschel.at/firefox-webextensions/bookmarks-organizer/?utm_campaign=webext&utm_term=bookmarksorganizer"
},
"background": {
- "scripts": ["js/core/status.js", "js/core/background.js"]
+ "scripts": ["js/core/keepAliveHack.js", "js/core/status.js", "js/core/background.js"]
},
"action": {
"default_icon": "images/icon.svg",
Reporter | ||
Comment 5•1 year ago
|
||
Thank you very much for the investigation and also for your time to find a workaround that I can use in my add-on! I was already able to confirm that it works and will prepare an update later today.
If the current behavior is intended, a more intuitive way to support use cases like this (like the mentioned bug 1781265) would be much appreciated.
Comment 6•1 year ago
|
||
The bug is marked as tracked for firefox117 (release), tracked for firefox118 (beta) and tracked for firefox119 (nightly). However, the bug still isn't assigned.
:mixedpuppy, could you please find an assignee for this tracked bug? Given that it is a regression and we know the cause, we could also simply backout the regressor. If you disagree with the tracking decision, please talk with the release managers.
For more information, please visit BugBot documentation.
Comment 7•1 year ago
|
||
Luca, what are the next steps on the Firefox side of things? Is there any way we can try to assess how many other extensions might be impacted?
Assignee | ||
Comment 8•1 year ago
|
||
(In reply to Ryan VanderMeulen [:RyanVM] from comment #7)
Luca, what are the next steps on the Firefox side of things?
Hi Ryan, I added this to our team standup to discuss the findings from the investigation with the rest of the team and get to an agreement about how we plan to proceed. Next steps we agreed on are below in this comment.
Is there any way we can try to assess how many other extensions might be impacted?
We can easily enough count how many extensions (and/or popular enough ones) have migrated to either event pages in an manifest_version 2 extension or migrated to manifest_version 3 (where is mandatory for the background context to be an event page), but it wouldn't be possible to determine if they would be actually hitting the issue through automated checks, because how likely the extension can hit the issue will largely depends from the kind of tasks the background page is going to be handling and how many chance those have to be long lasting ones that will not be completed in the time available for an event page between WebExtensions API events being handled (e.g. even for this particular extension, to be able to spot the issue one needs a big enough list of bookmarks or a few that loads very slowly, with just a few bookmarks that can be fetched quickly the issue isn't hit and can go unnoticed).
As for the next steps, even if there is a reasonable workaround that can be applied on the extension side (like the one mentioned in comment 4), we got to the following agreed next steps:
-
still prepare a patch to attach to this issue with the changes on the Gecko side mentioned in comment 4 (along with additional changes we may need to adjust tests expectations) as part of fixing this issue on our side in 119 (and request an uplift to 118).
-
we will file a followup to reintroduce the more strict behavior, along with making sure that API implemented in the child process (like the messaging API) are still going to reset the idle timer when they are expected to
If the final version of the fix stays as small as described in comment 4 then we would be also ok to candidate it for a 117 dot release, at least if you think that would be also reasonable from your perspective (even if we may not be able to produce precise enough numbers about how many extensions would be actually able to hit the issue).
Assignee | ||
Updated•1 year ago
|
Updated•1 year ago
|
Updated•1 year ago
|
Comment 9•1 year ago
|
||
Test case that verifies that extension messaging API events cancel event
page suspension.
Assignee | ||
Comment 10•1 year ago
|
||
Assignee | ||
Comment 11•1 year ago
|
||
(In reply to Rob Wu [:robwu] from comment #9)
Created attachment 9351636 [details]
Bug 1851373 - Test coverage for event page + extension messagingTest case that verifies that extension messaging API events cancel event
page suspension.
I have verified locally and so I can already confirm that as we were expecting, running the tests attached from comment 9 gets us the following results:
-
test_runtime_onMessage_cancels_suspend
passes with the patch from comment 10 applied, which is an additional confirmation of what I manually verified while investigating this issue and described in comment 4 (about the messaging API being used to message between the background page and the tab ui to be the one resetting the idle timer) -
test_runtime_onConnect_cancels_suspend
instead does not pass, that was good to confirm explicitly because I didn't look into it specifically while investigating this issue. I'm not that surprised that this is not yet resetting the idle timer (but that was not the case even before the regressing change and so it would be something to leave it for the followup we are going to file)
Comment 12•1 year ago
|
||
in particular, the idle timer was being reset as a side effect of the event page sending to the
runtime.onMessage
listener registered from the UI page) an "update-counters" messages while the process was ongoing, given that the message name suggests the message will be sent every time the counters are updated, that was likely often enough to keep the event page alive for long enough for the long lasting process to complete
For backgrounds with active ports, Firefox will still force stop after 30 seconds.
Here is the demo:
manifest.json
{
"manifest_version": 3,
"name": "Test",
"version": "1.0",
"description": "Just for test.",
"content_scripts": [
{
"matches": ["https://example.com/"],
"js": ["content_scripts.js"]
}
],
"background": {
"scripts": ["background.js"]
}
}
background.js
function onConnect(port) {
console.debug("Port connected", port.name, port);
port.onDisconnect.addListener(async (port) => {
console.debug("Port disconnected", port.name, port);
});
port.onMessage.addListener((msg, port) => {
console.debug("Port message", port.name, msg);
if (msg.startsWith("ping:")) {
port.postMessage(`pong:${msg.substring(5)}`);
}
});
}
browser.runtime.onConnect.addListener(onConnect);
content_scripts.js
function startPingPort() {
const port = browser.runtime.connect(undefined, { name: `port-${crypto.randomUUID()}` });
console.info("Port connected", port.name, port);
port.onMessage.addListener((msg, port) => {
console.debug("Port onMessage", port.name, msg);
});
const intervalID = setInterval(() => {
const getMsg = () => `ping:${crypto.randomUUID()}`;
const msg = getMsg();
port.postMessage(msg);
console.debug("Port postMessage", port.name, msg);
}, 1_000);
port.onDisconnect.addListener((port) => {
console.info("Port disconnected", port.name);
clearInterval(intervalID);
startPingPort();
});
}
function addButton() {
const button = document.createElement("button");
button.innerText = "Start";
button.style = `position: fixed;
top: 10%;
right: 5%;
z-index: 10000;`;
button.addEventListener("click", startPingPort);
document.body.appendChild(button);
}
addButton();
Steps to reproduce:
- Load the demo extension
- Open https://example.com/
- Click the
Start
button
Comment 13•1 year ago
|
||
Comment 14•1 year ago
|
||
Assignee | ||
Comment 15•1 year ago
|
||
(In reply to i from comment #12)
in particular, the idle timer was being reset as a side effect of the event page sending to the
runtime.onMessage
listener registered from the UI page) an "update-counters" messages while the process was ongoing, given that the message name suggests the message will be sent every time the counters are updated, that was likely often enough to keep the event page alive for long enough for the long lasting process to completeFor backgrounds with active ports, Firefox will still force stop after 30 seconds.
Thanks for the additional STR, looking further into resetting the idle timer also on messaging happening through runtime ports is going to be part of what Bug 1851668 is now tracking.
Comment 17•1 year ago
|
||
Comment 18•1 year ago
|
||
Backed out for xpcshell failure on test_ext_eventpage_messaging.js
Backout link: https://hg.mozilla.org/integration/autoland/rev/6b1fa6e9fd7d906f585f9955dba2f40b95d055b7
Log link: https://treeherder.mozilla.org/logviewer?job_id=428186016&repo=autoland&lineNumber=5383
Comment 19•1 year ago
|
||
The test times out if the test runs too slow, to the extent that the background page suspends before the extension page with the onSuspend listener has loaded. I agreed with Luca to bump the timeout to a "large" value (from 1000ms to 5000ms) to reduce the odds of intermittent failures (at the cost of the test taking about 10 seconds more to run).
Comment 20•1 year ago
|
||
Comment 21•1 year ago
|
||
Backed out for xpcshell failure on test_ext_eventpage_messaging.js
Backout link: https://hg.mozilla.org/integration/autoland/rev/1c46c0e9fe19e8dfba2f991bcf784c5699196d87
Log link: https://treeherder.mozilla.org/logviewer?job_id=428201832&repo=autoland&lineNumber=8610
Comment 22•1 year ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/d62de29e10ee
https://hg.mozilla.org/mozilla-central/rev/e842a119bf40
Comment 23•1 year ago
|
||
Should we uplift these patches to 118? Thanks
Comment 24•1 year ago
|
||
(In reply to Pascal Chevrel:pascalc (PTO until August 21) from comment #23)
Should we uplift these patches to 118? Thanks
Yes, we should but I don't understand why Comment 21 said the patches were backed out and Comment 22 (4 hours later) merged the patches to m-c. Now we have a perma failure apparently (Bug 1851966).
:NarcisB or any Sheriff, could you please comment? Thanks!
![]() |
||
Comment 25•1 year ago
|
||
There is a backout of this on autoland which will get merged to central.
Comment 26•1 year ago
|
||
Backout merged to central: https://hg.mozilla.org/mozilla-central/rev/6b1fa6e9fd7d
Assignee | ||
Comment 27•1 year ago
|
||
The issue that triggered the backouts is originated by the additional tests from the second patch, I'll reland the patch with the actual fix (rolling back to part of the original behaviors before the regressing change landed + the adjustments to the existing test coverage) and mark the bug as leave-open to dig more into the reasons behind the failure on android builds while running the additional tests from the second patch.
Comment 28•1 year ago
|
||
Assignee | ||
Updated•1 year ago
|
Comment 29•1 year ago
|
||
Comment 30•1 year ago
|
||
bugherder |
Comment 31•1 year ago
|
||
bugherder |
Assignee | ||
Comment 32•1 year ago
|
||
Comment on attachment 9351660 [details]
Bug 1851373 - Only forcefully set EventManager resetIdleOnEvent flag to false for extensions with a persistent background page. r?robwu!
Beta/Release Uplift Approval Request
- User impact if declined: Users that are using extensions with an event page (either by opt-in to an event page in a manifest_version 2 extension, or by being ported to manifest_version 3) that is responsible for long lasting tasks (as the extension mentioned in comment 0) may experience a regression in the expected behavior of the extension if the event page is shutdown while the long lasting task is still in progress after getting past the idle timeout, due to the messaging API not resetting the idle timeout in Gecko versions affected by this regression.
- Is this code covered by automated tests?: Yes
- Has the fix been verified in Nightly?: No
- Needs manual test from QE?: No
- If yes, steps to reproduce:
- List of other uplifts needed: None
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): The change is small and brings back part of the behaviors expected in Gecko versions that didn't include the regressing changes. The internals changed by the small patch and related expected behaviors are covered both by pre-existing and new automated tests (provided by the additional patch we are requesting uplift along with the one providing the fix).
- String changes made/needed:
- Is Android affected?: Yes
Assignee | ||
Updated•1 year ago
|
Comment 33•1 year ago
|
||
This seems more severe than S4 given the wider number of affected extensions.
Updated•1 year ago
|
Updated•1 year ago
|
Comment 34•1 year ago
|
||
Comment on attachment 9351660 [details]
Bug 1851373 - Only forcefully set EventManager resetIdleOnEvent flag to false for extensions with a persistent background page. r?robwu!
Approved for 118.0b7, thanks.
Updated•1 year ago
|
Comment 35•1 year ago
|
||
uplift |
Comment 36•1 year ago
|
||
bugherder uplift |
Updated•1 year ago
|
Comment 37•1 year ago
|
||
Verified as Fixed. Tested on the latest Nightly (119.0a1/20230910212034) and Beta (118.0b7/20230910175934) under Windows 10 x64 and Ubuntu 22.04 LTS using the STR provided in Comment 12. Also tested on the latest Release (117.0/20230824132758) for comparison purposes.
Once the STR from Comment 12 is performed, on Nightly and Beta, there is no message logged to console regarding port disconnection after the task runs for 30 seconds. In fact, there is no port disconnection message logged at all, even after 10 minutes. As such, the fix is confirmed.
On Release (which does not contain the fix) once the task is started, after 30 seconds it’s terminated and a port disconnection message is logged to console.
Comment 38•1 year ago
|
||
Comment on attachment 9351636 [details]
Bug 1851373 - Test coverage for event page + extension messaging
Approved for 117.0.1.
Updated•1 year ago
|
Updated•1 year ago
|
Comment 39•1 year ago
|
||
uplift |
Updated•1 year ago
|
Comment 40•1 year ago
|
||
Added to the 117.0.1 release notes:
Fixed a bug causing extensions using an event page for long-running tasks to be terminated while running, causing unexpected behavior changes
Description
•