Closed Bug 1905153 Opened 3 months ago Closed 3 months ago

Non-functional extension APIs at startup due to race condition, when asyncLoadModule is called before loadModule

Categories

(WebExtensions :: General, defect)

defect

Tracking

(firefox-esr128 fixed, firefox127 wontfix, firefox128 wontfix, firefox129 fixed)

RESOLVED FIXED
129 Branch
Tracking Status
firefox-esr128 --- fixed
firefox127 --- wontfix
firefox128 --- wontfix
firefox129 --- fixed

People

(Reporter: robwu, Assigned: robwu)

References

Details

(Whiteboard: [addons-jira])

Attachments

(2 files)

I have been investigating a report where a very simple extension (with only tabs.onCreated & tabs.onUpdated) never woke up its event page at https://discourse.mozilla.org/t/one-of-my-manifest-v3-addons-is-not-waking-up-when-it-should/131349

I audited the code to rule out many issues, and arrived at only one way that the reported issue could possibly have happened (full analysis in this document, which I'm reporting here. The exact impact of the conditions and all affected APIs is listed at the end of this report.

Race condition

If the stars are aligned poorly, a race condition could occur that causes a call to extension.apiManager.getAPI(module, extension, "addon_parent") in primeListeners to throw. This happens via loadModule (specifically _checkLoadModule, failing the module.asyncLoaded check) if asyncLoadModule has been called before for the same module name, possibly from a completely unrelated extension (as all extensions share the same SchemaAPManager instance (aka Management aka ExtensionParent.apiManager).

In the case that I analyzed, the error would have been:

Error: Module 'tabs' currently being lazily loaded

asyncLoadModule has several callers, including _callHandlers which may be called from the apiManager constructor when any add-on was disabled. The “tabs” module does indeed have a registration for “disable” (and “update”) events in ext-browser.json.
For “update”, there is an update handler that also calls asyncLoadModule, but with more levels of asynchronicity, so either the timing was very unlucky, or the “disable” case happened.

In the parent process, this race condition can happen anywhere where there is potential for loading extension API modules synchronously (after an async load). Although the implementation is shared by extension API manager in the parent and child processes, the issue is limited to the parent process because we only perform synchronous API module loading in the child process.

There are four callers of apiManager.getAPI (which performs a synchronous API load). As the breakdown below shows, the impact of this issue is currently limited to EventManager.primeListeners:

Impacted APIs

This affects all APIs whose implementation inherits from ExtensionAPIPersistent and have "events" in their module definition, which are the following APIs (along with the events that should happen at startup for any add-on to trigger this bug):

The impact is that if any of the add-on lifecycle events mentioned above occur at browser startup, that the event page will not wake up despite the a persisted event registration for one of the above events, or even a event registration for an API namespace that happened after the failing event registration.

Pushed by rob@robwu.nl: https://hg.mozilla.org/integration/autoland/rev/cb040184e20b Support loadModule after asyncLoadModule r=rpl
Status: NEW → RESOLVED
Closed: 3 months ago
Resolution: --- → FIXED
Target Milestone: --- → 129 Branch
Attachment #9410660 - Flags: approval-mozilla-beta?

beta Uplift Approval Request

  • User impact if declined: The background script of an extension may not start at browser startup due to a race condition.
  • Code covered by automated testing: yes
  • Fix verified in Nightly: no
  • Needs manual QE test: no
  • Steps to reproduce for manual QE testing: Not reproducible, it is a race condition. But fully covered by automated tests.
  • Risk associated with taking this patch: None
  • Explanation of risk level: This patch removes a redundant "assertion" and updates the code path triggering that assertion to behave as expected. This behavior is verified by automated tests. The issue is very well-understood, and the fix is very targeted.
  • String changes made/needed: none
  • Is Android affected?: yes
Flags: in-testsuite+
Attachment #9410660 - Flags: approval-mozilla-release?

Comment on attachment 9410660 [details]
Bug 1905153 - Support loadModule after asyncLoadModule

We're in RC week already, so it's too late to get this into Beta. Will keep this on the radar in the event of an RC respin or the scheduled mid-cycle dot release otherwise.

Attachment #9410660 - Flags: approval-mozilla-beta? → approval-mozilla-beta-

Can we get this in ESR128? My desire for uplifting is mainly to improve the reliability of ESR128.

Yeah, if it goes into the mid-cycle dot release I'll take it on ESR128 also. Otherwise, we can move the uplift request over there once the approval flags are created.

Attachment #9410660 - Flags: approval-mozilla-esr128?
Attachment #9410660 - Flags: approval-mozilla-release?
Attachment #9410660 - Flags: approval-mozilla-esr128? → approval-mozilla-esr128+

FYI the fix in 129 has been verified externally by the original reporter at https://discourse.mozilla.org/t/one-of-my-manifest-v3-addons-is-not-waking-up-when-it-should/131349/41

Edit: this comment was meant for bug 1905505 (https://bugzilla.mozilla.org/show_bug.cgi?id=1905505#c6). This bug here cannot easily be verified because it is a race condition, one that has been simulated by the automated unit test.

Status: RESOLVED → VERIFIED
Status: VERIFIED → RESOLVED
Closed: 3 months ago3 months ago
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: