Updating bootstrapped add-ons causes NS_ERROR_FILE_ACCESS_DENIED in XPIProvider.jsm::recursiveRemove when the add-on loads scripts in child processes

VERIFIED FIXED in mozilla6

Status

()

Toolkit
Add-ons Manager
VERIFIED FIXED
7 years ago
7 years ago

People

(Reporter: kmag, Assigned: mao)

Tracking

2.0 Branch
mozilla6
x86
Windows XP
Points:
---
Bug Flags:
in-testsuite +
in-litmus -

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(2 attachments, 5 obsolete attachments)

User-Agent:       Mozilla/5.0 (X11; Linux x86_64; rv:2.0b13pre) Gecko/20110310 Firefox/4.0b13pre
Build Identifier: 

Installing the attached XPI and then upgrading it causes the following error in the Error Console:

Error: ERROR addons.xpi: Failed to remove file C:\Documents and Settings\Administrator\Application Data\Mozilla\Firefox\Profiles\kqr17zwh.default\extensions\trash\pentadactyl@dactyl.googlecode.com.xpi: [Exception... "Component returned failure code: 0x80520015 (NS_ERROR_FILE_ACCESS_DENIED) [nsIFile.remove]"  nsresult: "0x80520015 (NS_ERROR_FILE_ACCESS_DENIED)"  location: "JS frame :: resource://gre/modules/XPIProvider.jsm :: recursiveRemove :: line 1090"  data: no]
Source file: resource://gre/modules/XPIProvider.jsm
Line: 1090

The installation does, however, succeed. Upgrading again causes another such error and a door knocker message that the installation has failed.

Reproducible: Always

Steps to Reproduce:
1. Install attachment 1 [details] [diff] [review].
2. Install again in the same session.
3. Check the Error Console.
4. Install once more.
5. Notice door knocker message and failed installation.



This may have something to do with the use of Components.utils.import. The extension also keeps a component registered between upgrades to handle the re-loading of scripts into module globals, but I don't see how this could be related.

This is only an issue on Windows.
For some reason we seem to be locking the XPI in the trash folder (where we move old add-ons while in the process of upgrading). The first upgrade succeeds because if we fail to clean up the trash afterwards it isn't a big deal, the second fails because we can't empty the trash to start with so can't proceed.

Presumably if you restart between the 1st and 2nd upgrade the 2nd upgrade completes although with the same error message.
It seems that this is only an issue when installing from the network. Updating a version installed from a local file doesn't seem to cause any problems.
Can't reproduce this on Windows 7. Are there perhaps some particular settings you have in your profile that might be causing this?
It looks like InstallTrigger might be involved. I don't have any issues from downloading from bugzilla, but when trying to install from AMO or any page that uses InstallTrigger (http://dactyl.sourceforge.net/nightlies) I have issues.
(Assignee)

Comment 6

7 years ago
I've got the very same problem with NSA ( http://noscript.net/nsa ), which also registers a component and imports modules from content processes.

The error is triggered whenever I try to uninstall the extensions, not just on updates.

FWIW, using Firefox 4 Mobile for Windows I've found the XPI in the trash directory is locked by plugin-container.exe (a content process, I assume?)

Might the problem be related with the unability of content processes to deregister XPCOM components before the uninstallation takes place? The only way to tell them to cleanup is sending an asynchronous message from the bootstrap.js' shutdown function, but when the message gets processed is already too late.
Status: UNCONFIRMED → NEW
Ever confirmed: true
(In reply to comment #6)
> I've got the very same problem with NSA ( http://noscript.net/nsa ), which also
> registers a component and imports modules from content processes.

Thanks, I haven't been able to reproduce this before, I'll try with that.

> FWIW, using Firefox 4 Mobile for Windows I've found the XPI in the trash
> directory is locked by plugin-container.exe (a content process, I assume?)

Odd, the content process shouldn't be touching things I'd think

> Might the problem be related with the unability of content processes to
> deregister XPCOM components before the uninstallation takes place? The only way
> to tell them to cleanup is sending an asynchronous message from the
> bootstrap.js' shutdown function, but when the message gets processed is already
> too late.

I didn't think the content process registered XPCOM components?
(Assignee)

Comment 8

7 years ago
(In reply to comment #7)

> I didn't think the content process registered XPCOM components?

Not sure whether Fennec does on its own, but extensions definitely can and in some instances must, e.g. to install a content policy: two extensions who surely do are the aforementioned experimental NoScript Anywhere (NSA) and the currently shipping Adblock Plus (which, on the other hand, is not restartless hence can't be affected by this issue).
This raises a problem that I really failed to consider, we simply don't support the case where shutdown (or any of the other methods) are asynchronous in nature. That could become a problem we need to address soon (though right now its quite difficult to do).

Kris, is your extension doing anything similar to this?
(Assignee)

Comment 10

7 years ago
(In reply to comment #9)
> This raises a problem that I really failed to consider, we simply don't support
> the case where shutdown (or any of the other methods) are asynchronous in
> nature.

Yes: even if we end to find that this is not the crux of the issue at hand,  bootstrapped add-ons in E10s land definitely need a safe way to cleanup and release resources in the content processes on upgrades, and this can only be done asynchronously.
(In reply to comment #10)
> (In reply to comment #9)
> > This raises a problem that I really failed to consider, we simply don't support
> > the case where shutdown (or any of the other methods) are asynchronous in
> > nature.
> 
> Yes: even if we end to find that this is not the crux of the issue at hand, 
> bootstrapped add-ons in E10s land definitely need a safe way to cleanup and
> release resources in the content processes on upgrades, and this can only be
> done asynchronously.

I've filed bug 652860 to track that issue
(Assignee)

Comment 12

7 years ago
Created attachment 528617 [details] [diff] [review]
Tentative patch which fixes this issue in the E10s bootstrapped case

Debugging a bit and checking for open file handles I've found the culprit to be, at least in the E10s scenario, both the main and the content process locking the file through the JAR cache. While the main process releases the lock because it gets notified ("flush-cache-entry") through the Observer Service ( http://mxr.mozilla.org/mozilla-central/source/modules/libjar/nsJAR.cpp#1353 ), the content process never gets this notification and thus keeps locking the file (strangely enough, even after it gets moved to the trash directory), preventing it (and the trash directory) from being deleted and causing this bug.

The attached patch fixes this issue by remoting the "flush-cache-entry" notification.

Of course there are several ways, probably cleaner too, to do this, so feel free to tore it apart and rebuild it the way you prefer, but the concept does work. 

In fact, I'm putting this very code at the bottom of my bootstrap script (along with a per-session uniqueness check) as a temporary work-around until this bug is fixed.
Assignee: nobody → g.maone
Attachment #528617 - Flags: review?
(In reply to comment #12)
> Created attachment 528617 [details] [diff] [review]
> Tentative patch which fixes this issue in the E10s bootstrapped case

This is going to be something of a race right? It relies on the content process being able to receive and process the message before we go on and attempt to delete the files
(Assignee)

Comment 14

7 years ago
(In reply to comment #13)
> (In reply to comment #12)
> > Created attachment 528617 [details] [diff] [review]
> > Tentative patch which fixes this issue in the E10s bootstrapped case
> 
> This is going to be something of a race right? It relies on the content process
> being able to receive and process the message before we go on and attempt to
> delete the files

Yes and no.

In my tests the lock always went away before it could prevent the deletion (I've got a fairly fast test machine).

But even if we failed to delete the file on first attempt (when the failure is not fatal anyway), we can surely delete it next time getTrashDir() is called, where the actual breakage happen preventing any extension to be installed anymore in the session.
Attachment #528617 - Flags: review? → review?(dtownsend)
Comment on attachment 528617 [details] [diff] [review]
Tentative patch which fixes this issue in the E10s bootstrapped case

Review of attachment 528617 [details] [diff] [review]:
-----------------------------------------------------------------

I guess this is a thing we can do however I'm nervous about just passing any jar flush over to the content process. Instead just create a function in XPIProvider.jsm that we call whenever we flush a jar, dispatch the observer notification and the message to the child processes from there so we only do it for ours. Put the chile process handling stuff in extensions-content.js as that is already running for child processes.

Bonus points if you can figure out how not to do double flushing if we are all in one process as Firefox is right now.
Attachment #528617 - Flags: review?(dtownsend) → review-
(Assignee)

Comment 16

7 years ago
Created attachment 532040 [details]
Version 2 of the tentative patch

This implements the suggested changes.
Attachment #528617 - Attachment is obsolete: true
Attachment #532040 - Flags: review?(dtownsend)
(Assignee)

Updated

7 years ago
Attachment #532040 - Flags: review?(dtownsend)
(Assignee)

Comment 17

7 years ago
Created attachment 532044 [details] [diff] [review]
Version 2 of the tentative patch (flagged as a patch)

It addresses the issues raised by v1 review.
Attachment #532040 - Attachment is obsolete: true
Attachment #532044 - Flags: review?(dtownsend)
(Assignee)

Comment 18

7 years ago
Created attachment 532086 [details] [diff] [review]
Patch v3

More concise and easy to maintain version.
Attachment #532086 - Flags: review?(dtownsend)
Comment on attachment 532086 [details] [diff] [review]
Patch v3

Review of attachment 532086 [details] [diff] [review]:
-----------------------------------------------------------------

Basically there, couple of small comments.

::: bug/XPIProvider.jsm
@@ +150,5 @@
>    locale: 8,
>    multipackage: 32
>  };
>  
> +const MSG_JAR_FLUSH = "WebInstallerJarFlush";

Call this "AddonJarFlush"

@@ +7160,5 @@
>        }
>        else {
> +        if (aSource.isFile()) {
> +          flushCache(aSource);
> +        }

Don't need braces around a single statement.

@@ +7223,5 @@
>      let trashDir = this.getTrashDir();
>  
> +    if (file.leafName != aId) {
> +      flushCache(file);
> +    }

Nor here.

@@ +7437,5 @@
>    }
>  };
>  #endif
>  
> +function flushCache(file) {

Name thus flushJarCache and move it up top amongst the other helper functions (below buildJarURI would be fine).

::: bug/extensions-content.js
@@ +224,5 @@
>    addMessageListener(MSG_INSTALL_CALLBACK, this);
> +  
> +  try {
> +    // only if we live in the child process...
> +    if (Cc["@mozilla.org/xre/app-info;1"].getService(Ci.nsIXULRuntime).processType === 2) {

I guess we might also want this in the jetpack process and really any process that isn't the main one so just compare against nsIXULRuntime.PROCESS_TYPE_DEFAULT
Attachment #532086 - Flags: review?(dtownsend) → review-
Comment on attachment 532044 [details] [diff] [review]
Version 2 of the tentative patch (flagged as a patch)

Definitely prefer the other version
Attachment #532044 - Flags: review?(dtownsend) → review-
(Assignee)

Comment 21

7 years ago
Created attachment 532728 [details] [diff] [review]
Patch v4
Attachment #532044 - Attachment is obsolete: true
Attachment #532086 - Attachment is obsolete: true
Attachment #532728 - Flags: review?(dtownsend)
Comment on attachment 532728 [details] [diff] [review]
Patch v4

Review of attachment 532728 [details] [diff] [review]:
-----------------------------------------------------------------

::: bug/extensions-content.js
@@ +43,5 @@
>  
>  const MSG_INSTALL_ENABLED  = "WebInstallerIsInstallEnabled";
>  const MSG_INSTALL_ADDONS   = "WebInstallerInstallAddonsFromWebpage";
>  const MSG_INSTALL_CALLBACK = "WebInstallerInstallCallback";
> +const MSG_JAR_FLUSH = "AddonJarFlush";

Oh forgot, line the = up with the previous lines here.
Attachment #532728 - Flags: review?(dtownsend) → review+
(Assignee)

Comment 23

7 years ago
Created attachment 532733 [details] [diff] [review]
Patch v5 [checked in]

Fixes constant alignment nit.
Attachment #532728 - Attachment is obsolete: true
Attachment #532733 - Flags: review?(dtownsend)
Attachment #532733 - Flags: review?(dtownsend) → review+
Comment on attachment 532733 [details] [diff] [review]
Patch v5 [checked in]

Landed: http://hg.mozilla.org/mozilla-central/rev/08aa930c3367

In the future please generate patches against the mozilla central tree, this was a bit of a pain to apply.
Attachment #532733 - Attachment description: Patch v5 → Patch v5 [checked in]
Dave, is this fixed now?
We never heard back from Kris about whether the problem Giorgio was the same issue he was seeing but at this point probably best to just morph this and close it and if Kris is still having problems start out with a new bug.
Status: NEW → RESOLVED
Last Resolved: 7 years ago
Flags: in-testsuite+
Flags: in-litmus-
Resolution: --- → FIXED
Summary: Updating certain bootstrapped add-ons causes NS_ERROR_FILE_ACCESS_DENIED in XPIProvider.jsm::recursiveRemove → Updating bootstrapped add-ons causes NS_ERROR_FILE_ACCESS_DENIED in XPIProvider.jsm::recursiveRemove when the add-on loads scripts in child processes
Target Milestone: --- → mozilla6
Kris, can you please verify that this issue is solved for you with the latest Aurora build? I'm still not able to reproduce this failure on my XP installation with Firefox 4. Thanks.
Version: unspecified → 2.0 Branch
Sorry, I lost track of this thread. I'll confirm it later today.
This appears to fix my issue. Thanks.
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.