Closed Bug 1651838 Opened 3 months ago Closed 2 months ago

Built-in extensions with experiments fail to load when the jar/apk location changes (i.e. app is rebuilt)

Categories

(GeckoView :: Extensions, defect, P1)

Unspecified
All

Tracking

(firefox80 fixed)

RESOLVED FIXED
mozilla80
Tracking Status
firefox80 --- fixed

People

(Reporter: robwu, Assigned: agi)

References

(Blocks 1 open bug)

Details

(Whiteboard: [geckoview:m80])

Attachments

(3 files)

While debugging bug 1651697, we found that Gecko failed to load jar:file:///data/app/org.mozilla.fenix.debug-pZnmlDcQNjaHTbIOkbgbjw==/base.apk!/assets/extensions/webcompat/about-compat/aboutPage.js. This is an experiment script from a built-in extension, packaged in Fenix's apk. The load failed because the location does not exist any more.

This bug happens when the following sequence of events occur:

  1. App using GeckoView + packaged add-on is build.
  2. App is published and installed.
  3. The app is rebuilt, without updating GeckoView (specifically, Gecko).
  4. App is published and installed.

At step 2, as part of parsing the extension's manifest.json, the experimental modules are loaded, script URLs are resolved *(!!) and the result is cached in the extension's StartupCache, keyed by extension ID+version+locale.
After step 4, when the app is started, the cached data is used because the cache is still considered valid because Gecko hasn't updated. The startupCache content would only be nuked when Gecko updates.

*(!!) Built-in extensions are input via a resource:-URL such as resource://android/assets/extensions/webcompat/about-compat/aboutPage.js, but the AddonManager resolves the resource:-URL to get their real location as a jar:file:-URL before initialization. This resolution is being relied upon by the moz-extension:-protocol handler, and failing to resolve it will cause bug 1645264.

( I checked extensions.json and addonsStartup.mozlz4, and those contain the resource://android/ URLs as expected, which means that the addon manager does not persist the resolved resource:-URLs to disk. )


There are three ways to solve this bug:

  1. (AndroidComponents) Require a version bump of the built-in packages whenever the app is built.
  2. (GeckoView) Let GeckoView detect when the package has been updated, and nuke the caches of the built-in add-ons.
  3. (Gecko / WebExtensions) Change the extension internals to store the unresolved resource://-URL instead of the resolved jar:file:-URL.
  • Option 1 can be used by GeckoView consumers (e.g. Fenix) until a fix is available.
  • Option 2 is my preferred solution, because the StartupCache logic relies on the assumption that built-in files aren't changed/moved when the version doesn't change. This option is only acceptable if GeckoView is able to efficiently verify that the app has been updated.
  • Option 3 may also be a reasonable solution, if it doesn't impact the performance of the add-on initialization (read: if we can implement this without additional filesystem access).
See Also: → 1651844

PS. code snippet to parse webext.sc.lz4 and find the keys / paths within the data structure that contains the "pZnmlDcQNjaHTbIOkbgbjw" substring.

{
    let dat = await OS.File.read("/tmp/jo7mcxqb.default/startupCache/webext.sc.lz4")
    let aomStartup = Cc["@mozilla.org/addons/addon-manager-startup;1"].createInstance(Ci.amIAddonManagerStartup);
    let res = aomStartup.decodeBlob(dat.buffer);

    // Serializers so that JSON.stringify will read their content.
    let global = Cu.getGlobalForObject(res);
    global.Map.prototype.toJSON = function() {
        let m = {};
        for (let [a, b] of this.entries()) {
            m[a] = b;
        }
        return m;
    };
    const privateSym = Symbol("privateSym");
    global.StructuredCloneHolder.prototype.toJSON = function() {
        if (privateSym in this) {
            return this[privateSym];
        }
        return this[privateSym] = this.deserialize(global);
    };

    var path = [];

    function visit(o) {
        let str = JSON.stringify(o);
        if (typeof str != "string" || !str.includes('pZnmlDcQNjaHTbIOkbgbjw')) {
            return;
        }
        console.log(path);
        if (o?.[privateSym]) o = o[privateSym];
        if (!o || typeof o !== "object") return;
        let entries = o.entries ? o.entries() : Object.entries(o);
        for (let [k, v] of entries) {
            path.push(k);
            visit(v);
            path.pop();
        }
    }
    visit(res);
}

To simulate the error:

  1. Open Fenix Nightly and open any tab, e.g. example.com.
  2. Use about:debugging on desktop to attach a debugger to Fenix. Switch to geckoview.xhtml to get a console for the privileged browser process.
  3. Run the code snippet below.
  4. Quit Fenix (swipe away and wait a few seconds).
  5. Start Fenix.
  6. Click on "Add-ons" - to reproduce bug 1651697 (tested with Nightly 200709)
/* This script will store an invalid URL in the StartupCache and print the original URL + new URL */
await (async () => {
    let StartupCache = ChromeUtils.import('resource://gre/modules/ExtensionParent.jsm', null).ExtensionParent.StartupCache;
    let modules = StartupCache._data.get('manifests').get('webcompat@mozilla.org').values().next().value.get("en-US").modules;
    let modulesParent = modules.parent.deserialize(window);
    let mod = modulesParent.modules.get("aboutPage");
    console.log(mod.url);
    mod.url = "jar:file:///data/app/org.mozilla.fenix.debug-pZnmlDcQNjaHTbIOkbgbjw==/base.apk!/assets/extensions/webcompat/about-compat/aboutPage.js";
    modules.parent = new StructuredCloneHolder(modulesParent);
    await StartupCache._saveNow();
    return mod.url;
})();

I think this is just a bug in how we compute the StartupCache's compatibility key for android.

In the current version the key is something like this:

[Compatibility]
LastVersion=80.0a1_20200707110539/20200707110539
LastOSABI=Android_x86_64-gcc3
LastPlatformDir=/data/user/0/org.mozilla.geckoview_example
LastAppDir=/data/user/0/org.mozilla.geckoview_example

Note the /data/user/0/org.mozilla.geckoview_example points to where our Android "dummy" executable is. This only captures when the data directory changes, but not when the executable data changes, we should do something like this

[Compatibility]
LastVersion=80.0a1_20200707110539/20200707110539
LastOSABI=Android_x86_64-gcc3
LastPlatformDir=/data/app/org.mozilla.geckoview_example-QyhEDWT7jIwY3rqYDJmCSA==/base.apk
LastAppDir=/data/user/0/org.mozilla.geckoview_example

where LastPlatformDir captures changes in the executable folder (and would catch this case) and LastAppDir catches changes in the data folder, which happens when the app is moved to removable storage (need to double check this last bit)

Agi will attach a patch that improves the update detection, which ought to fix the issue by flushing the caches on app updates.

Assignee: rob → agi

xreDirectory is used to determine when the omnijar location changes and for
some crash reporter logic that is not used on Android.

On Android, this location currently points to the App data folder, something
like

/data/user/0/org.mozilla.<App identifier>

where <App identifier> is the unique App identifier, e.g. org.mozilla.firefox
for Firefox for Android.

while the omnijar is located inside the APK (which is just a zip file) in a
location that looks like this

/data/app/org.mozilla.geckoview_example-<base64 string>/base.apk

where <base64 string> is a base64-encoded string that's unique to the current
APK (it might be a hash of the APK but I'm not sure). This value changes every
time the app is updated.

Code that uses the StartupCache assumes that whenever the location of the
omnijar (and its contents) change, the cache will be invalidated.

The assumption is usually true for code that resides inside the omnijar as part
of Gecko, as anytime the contents of the omnijar change, the Gecko version
changes too, invalidating the cache.

There is an exception however, extension built-in scripts are stored in the
same path as the omnijar (inside the apk) but can change without a Gecko
update.

This causes the StartupCache to have invalid data that points to the old apk
path.

To resolve this proble, we store the apk path as xreDirectory, and the data
path as directory, so that if either change, the StartupCache will be
invalidated.

The new compatibility.ini will look like this:

[Compatibility]
LastVersion=80.0a1_20200707110539/20200707110539
LastOSABI=Android_x86_64-gcc3
LastPlatformDir=/data/app/org.mozilla.geckoview_example-QyhEDWT7jIwY3rqYDJmCSA==/base.apk
LastAppDir=/data/user/0/org.mozilla.geckoview_example

where today LastAppDir and LastPlatformDir are both equal to the data path

/data/user/0/org.mozilla.geckoview_example

Callers looking for NS_GRE_BIN_DIR and NS_GRE_PATH expect the returned folder
to contain native libraries and executables.

The corresponding folder for this on Android is inside the APK and at the
moment there is no way to return a nsIFile to a zipped folder.

Callers already account for Android, either by failing gracefully or by having
Android specific code.

In this patch we flat out refuse to return these paths to avoid confusion.

Whiteboard: [geckoview:m80]
Pushed by asferro@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/558fe8966d86
Remove unused --appomni on Android. r=glandium,snorp
https://hg.mozilla.org/integration/autoland/rev/e831393fd920
Use APK file for xreDirectory. r=glandium,snorp
https://hg.mozilla.org/integration/autoland/rev/1d4650fa1b78
Don't provide NS_GRE_* paths on Android. r=snorp
Status: ASSIGNED → RESOLVED
Closed: 2 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla80
You need to log in before you can comment on or make changes to this bug.