Loading an XBL binding via chrome:// locks XPI file, prevents restartless extension from updating

RESOLVED FIXED in Firefox 19

Status

()

defect
RESOLVED FIXED
8 years ago
6 years ago

People

(Reporter: gaubugzilla, Assigned: nmaier)

Tracking

Trunk
mozilla20
x86_64
Windows 7
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox17-, firefox19 verified, firefox20 verified, firefox-esr10 affected, b2g18 fixed)

Details

Attachments

(3 attachments, 4 obsolete attachments)

Steps to reproduce:

1. Install attached extension (no browser restart).
2. Now install it again (update).

Expected results:

Update succeeds without an error, the extension's XPI file gets replaced.

Actual results (20120118 trunk nightly on Windows 7 x64):

A message in the Error Console complaining about the XPI file not being removable (the file is locked). Add-on manager "solves" by moving the old XPI into the trash/ subfolder instead of removing it, second update attempt during the same session fails completely however because the trash/ subfolder cannot be removed (extension is inactive, Add-on Manager shows an installation error).

The extension creates a XUL element and assigns an XBL binding to it. The XBL binding points to a file within the XPI file via a chrome:// URL, doing that permanently locks the XPI file. Strange thing is, using the same file via the equivalent jar: URL works correctly (using chromeRegistry.convertChromeURL() on the binding URL is actually my current work-around).
Additional/Alternative steps to reproduce:

1. Install extension in comment #0
2. Remove extension via about:addons.
3. Close about:addons
-> File gets moved into trash directory, but cannot be deleted as it is supposed to be

What works, however:
1. Install extension in comment #0
2. Restart the browser!
3. Remove extension via about:addons.
4. Close about:addons
-> File gets moved into trash, and is then deleted

What also works, is restarting the browser between add-on installation/updates. Works as in: the old file gets immediately removed.

So it seems that at some point during the actual installation a file handle is leaked for whatever reason. This is an indication for me that the XPIProvider.jsm does forget to flush some file handle (jar cache or something) during installation, instead of being a clear-cut XBL problem.
Component: XBL → Add-ons Manager
Product: Core → Toolkit
(In reply to Wladimir Palant from comment #0)
> A message in the Error Console complaining about the XPI file not being
> removable (the file is locked). Add-on manager "solves" by moving the old
> XPI into the trash/ subfolder instead of removing it, second update attempt
> during the same session fails completely however because the trash/
> subfolder cannot be removed (extension is inactive, Add-on Manager shows an
> installation error).

When the second install happens, the original XPI goes through the same file operations as the process of being uninstalled - it *always* gets moved to the trash folder before being deleted. Moving the file to the trash folder succeeds, as it avoids the file lock issues.

What fails is deleting the file when it's in the trash folder. However that's fine (er, with the exception of bug 719185) - the code expects that may fail, and will clean it up on the next restart (which succeeds, because the file won't be locked).

Unfortunately, the Add-ons Manager logging for that failure (deleting the trash directory) is a bit overzealous, and it logs the same error twice. I'd noticed that before and forgot to file a bug - filed bug 779507.



(In reply to Nils Maier [:nmaier] from comment #1)
> What works, however:
> 1. Install extension in comment #0
> 2. Restart the browser!
> 3. Remove extension via about:addons.
> 4. Close about:addons
> -> File gets moved into trash, and is then deleted
> 
> What also works, is restarting the browser between add-on
> installation/updates. Works as in: the old file gets immediately removed.

That works because the add-on doesn't attach an XBL binding after a browser restart (when startup() is called there are no browser windows yet). It only attaches an XBL binding when the addon is installed/enabled when Firefox is already running. And modifying the addon code so the XBL binding is never applied makes the bug go away (ie, the file is removed ok) for the steps given in comment 0. So the file locking issue is indeed an XBL bug.
Component: Add-ons Manager → XBL
Product: Toolkit → Core
(In reply to Blair McBride (:Unfocused) from comment #2)
> That works because the add-on doesn't attach an XBL binding after a browser
> restart (when startup() is called there are no browser windows yet). It only
> attaches an XBL binding when the addon is installed/enabled when Firefox is
> already running. And modifying the addon code so the XBL binding is never
> applied makes the bug go away (ie, the file is removed ok) for the steps
> given in comment 0. So the file locking issue is indeed an XBL bug.

Your assessment is absolutely correct and reproducible. Sorry for me not thinking this through and causing confusion.
No worries - I'd rather have it fully explored :)
Alright, found the underlying issue.
The XBL service will synchronously load the XBL document whenever it is considered a chrome document:
http://mxr.mozilla.org/mozilla-central/source/content/xbl/src/nsXBLService.cpp#1083
http://mxr.mozilla.org/mozilla-central/source/content/xbl/src/nsXBLService.cpp#447
This will result in a call to nsJARChannel::Open instead of ::AsyncOpen, which would be used by the XBL service for other protocols.
While ::AsyncOpen will finally kill the reference to mJarInput (holding the file handle) when the read is done:
http://mxr.mozilla.org/mozilla-central/source/modules/libjar/nsJARChannel.cpp#923
::Open will keep mJarInput referenced until the channel destructor is run. However, since th XBL Document will the channel reference somewhere, the destructor will effectively not be called.

Unreferencing mJarInput in ::Open (while also initializing the mContentLength member) seems to fix the issue described in comment 0.

Also, it explains the workaround from comment 0, using the jar-protocol directly. It will make the XBL service load the binding document asynchronously, hence killing the reference in nsJARChannel:OnStopRequest

While I do think that the nsJARChannel::Open behavior is buggy, the XBL document being still around is unexpected as well (xul/xbl cache not properly cleared? failure to do so by the XPIProvider?)
Component: XBL → Networking: JAR
This will make the procedure from comment #0 work with expected results.

Not tested well (I just run the build, but not the full test suite).
Also, it does not address the potentially dangling XBL document.
Attachment #655245 - Flags: feedback?(taras.mozilla)
Comment on attachment 655245 [details] [diff] [review]
Patch v0, Unref mJarInput in nsJARChannel::Open

+    mContentLength = mJarInput->GetContentLength();
+    NS_IF_RELEASE(mJarInput);

That does not seem safe at all. This relies on caller to keep mJarInput alive.

I think there are 2 ways to solve this
a) Get rid of mJarInput, it doesn't seem to be used for much...this however requires refactoring.

b) Detect EOF(ie a 0length read) in nsJARChannel::Read and release mJarInput in there. I think that should be ok and straightforward.
Attachment #655245 - Flags: feedback?(taras.mozilla) → feedback-
(In reply to Taras Glek (:taras) from comment #7)
> Comment on attachment 655245 [details] [diff] [review]
> Patch v0, Unref mJarInput in nsJARChannel::Open
> 
> +    mContentLength = mJarInput->GetContentLength();
> +    NS_IF_RELEASE(mJarInput);
> 
> That does not seem safe at all. This relies on caller to keep mJarInput
> alive.

The caller has to keep it alive... it is the input stream. Once the caller does not track it any longer, it will get destroyed, but that is perfectly fine from what I gather. The request is done then (closed) and you should not repeatedly call ::Open on a channel anyway as there is no guarantee this will work (this behavior is documented in nsIChannel).

With this patch, the only instance where mJarInput can be safely called on a sync request and yields the wrong result is ::GetOwner. Easily fixed; same deal as with GetContentLength, just forgot about it (BTW, the GetOwner logic is to be removed in bug 726125 anyway).
I believe my patch is OK otherwise...


> b) Detect EOF(ie a 0length read) in nsJARChannel::Read and release mJarInput
> in there. I think that should be ok and straightforward.

There is no nsJARChannel::Read. I think you confused something here.

Detecting EOF is problematic, as the caller may read only a few bytes and may never read to EOF... More importantly, I do not see any added benefit in doing so.
(In reply to Nils Maier [:nmaier] from comment #8)
> (In reply to Taras Glek (:taras) from comment #7)
> > Comment on attachment 655245 [details] [diff] [review]
> > Patch v0, Unref mJarInput in nsJARChannel::Open
> > 
> > +    mContentLength = mJarInput->GetContentLength();
> > +    NS_IF_RELEASE(mJarInput);
> > 
> > That does not seem safe at all. This relies on caller to keep mJarInput
> > alive.
> 
> The caller has to keep it alive... it is the input stream. Once the caller
> does not track it any longer, it will get destroyed, but that is perfectly
> fine from what I gather. The request is done then (closed) and you should
> not repeatedly call ::Open on a channel anyway as there is no guarantee this
> will work (this behavior is documented in nsIChannel).
> 
> With this patch, the only instance where mJarInput can be safely called on a
> sync request and yields the wrong result is ::GetOwner. Easily fixed; same
> deal as with GetContentLength, just forgot about it (BTW, the GetOwner logic
> is to be removed in bug 726125 anyway).
> I believe my patch is OK otherwise...
So we agree to remove mJarInput usage?

> 
> 
> > b) Detect EOF(ie a 0length read) in nsJARChannel::Read and release mJarInput
> > in there. I think that should be ok and straightforward.
> 
> There is no nsJARChannel::Read. I think you confused something here.
> 
> Detecting EOF is problematic, as the caller may read only a few bytes and
> may never read to EOF... More importantly, I do not see any added benefit in
> doing so.

I meant http://mxr.mozilla.org/mozilla-central/source/modules/libjar/nsJARChannel.cpp#163 to reach parity with async reading code.
(In reply to Taras Glek (:taras) from comment #9)
> (In reply to Nils Maier [:nmaier] from comment #8)
> > (In reply to Taras Glek (:taras) from comment #7)
> > > Comment on attachment 655245 [details] [diff] [review]
> > > Patch v0, Unref mJarInput in nsJARChannel::Open
> > > 
> > > +    mContentLength = mJarInput->GetContentLength();
> > > +    NS_IF_RELEASE(mJarInput);
> > > 
> > > That does not seem safe at all. This relies on caller to keep mJarInput
> > > alive.
> > 
> > The caller has to keep it alive... it is the input stream. Once the caller
> > does not track it any longer, it will get destroyed, but that is perfectly
> > fine from what I gather. The request is done then (closed) and you should
> > not repeatedly call ::Open on a channel anyway as there is no guarantee this
> > will work (this behavior is documented in nsIChannel).
> > 
> > With this patch, the only instance where mJarInput can be safely called on a
> > sync request and yields the wrong result is ::GetOwner. Easily fixed; same
> > deal as with GetContentLength, just forgot about it (BTW, the GetOwner logic
> > is to be removed in bug 726125 anyway).
> > I believe my patch is OK otherwise...
> So we agree to remove mJarInput usage?

I have no objections against removing it if it simplifies the code, however it is not really clear to me that it would...
Detecting EOF to close the stream is incorrect, though, as there is no guarantee the caller will read till reaching EOF.

Having thought some more about it, it is my opinion that the correct way is indeed to transfer ownership of the input stream to caller in ::Open, as my patch prototype did. Also, this mirrors the behavior of nsBaseStream/nsFileStream/nsDataChannel which do not track the input stream when ::Open'ed and leave that to the caller.
(In reply to Nils Maier [:nmaier] from comment #10)
> > So we agree to remove mJarInput usage?
> 
> I have no objections against removing it if it simplifies the code, however
> it is not really clear to me that it would...
> Detecting EOF to close the stream is incorrect, though, as there is no
> guarantee the caller will read till reaching EOF.

There isn't, however Mozilla clients typically does read whole files out of jars, just they typically hold the stream alive.

Messing with nsCOMPtr breaks the contract that nsCOMPtr serves. Just because code continues to work with manual AddRef/Release, does not mean it's ok.

I'll r+ a patch to remove mJarInput.

We will also need a testcase to verify this fix.
What nsCOMPtr? mJarInput is a raw pointer...
(In reply to Nils Maier [:nmaier] from comment #12)
> What nsCOMPtr? mJarInput is a raw pointer...


PS: Even if it was a nsCOMPtr, the correct course IMO would be to .swap it then.
(In reply to Nils Maier [:nmaier] from comment #13)
> (In reply to Nils Maier [:nmaier] from comment #12)
> > What nsCOMPtr? mJarInput is a raw pointer...
> 
> 
> PS: Even if it was a nsCOMPtr, the correct course IMO would be to .swap it
> then.

Yuck, good point. My point that we should get rid of it remains.
This now corrects the ownership of the jar channel input stream by removing the mJarInput member completely.

One smallish drawback is, that the input stream is always immediately initialized, which may cause a slight performance regression when .contentLength (stream.available()) and .owner are not accessed or are accessed off the main thread. In case of .owner, that should not matter much, as the logic is pending removal in bug 726125 anyway.
Assignee: nobody → MaierMan
Attachment #655245 - Attachment is obsolete: true
Attachment #656170 - Flags: review?(taras.mozilla)
Some unit tests, including test verifying that any file locks are gone once the stream is closed (and the jar cache dropped).
Attachment #656172 - Flags: review?(taras.mozilla)
Note this patch will have to be rebased on bug 726125 since that's about to land.
Depends on: 726125
I'm going to review this on Wednesday(optimistic) or Thursday.
Comment on attachment 656170 [details] [diff] [review]
Part 1 - Correct jar channel stream ownership

I like your neat logic transplant surgery in this patch.

mUsingJarCache doesn't do anything in this patch. Leftover from debugging?

+    if (cert) {
+        nsCAutoString certFingerprint;
+        rv = cert->GetFingerprint(certFingerprint);
+        if (NS_FAILED(rv))
+            return rv;
+
...

Take out GetOwner stuff. now that the removal bug landed

+    NS_ADDREF(*resultInput = input);
+    NS_ADDREF(*stream = input);
Now this isn't a naked member pointer, use  *stream = input.forget() to make transfer of ownership clear.

+error_cleanup:
+    mIsPending = false;
+    mListenerContext = nullptr;
+    mListener = nullptr;
+    return rv;

I would prefer a RAII cleanup here, but I don't see a way to do this that's cleaner than the fragile(easy to introduce an accidental return into these) C goto soup.

f+ because this is good work, but needs another (quicker) round of review before it can land.
Attachment #656170 - Flags: review?(taras.mozilla) → feedback+
(In reply to Nils Maier [:nmaier] from comment #15)

> One smallish drawback is, that the input stream is always immediately
> initialized, which may cause a slight performance regression when
> .contentLength (stream.available()) and .owner are not accessed or are
> accessed off the main thread. In case of .owner, that should not matter
> much, as the logic is pending removal in bug 726125 anyway.

.available should be cheap on a filehandle...doing it post-open almost guarantees it to be free
Comment on attachment 656172 [details] [diff] [review]
Part 2 - Add jar channel unit tests

This looks good. I trust that you checked that the test fails without your patch and passes with it.
Attachment #656172 - Flags: review?(taras.mozilla) → review+
Alright, here is a rebased after the GetOwner stuff landed and tested patch.

(In reply to Taras Glek (:taras) from comment #19)

> mUsingJarCache doesn't do anything in this patch. Leftover from debugging?

Actually it does. See nsJARInputThunk::Close


> Take out GetOwner stuff. now that the removal bug landed

gone

> +    NS_ADDREF(*resultInput = input);
> +    NS_ADDREF(*stream = input);
> Now this isn't a naked member pointer, use  *stream = input.forget() to make
> transfer of ownership clear.

done and done

> +error_cleanup:
> +    mIsPending = false;
> +    mListenerContext = nullptr;
> +    mListener = nullptr;
> +    return rv;
> 
> I would prefer a RAII cleanup here, but I don't see a way to do this that's
> cleaner than the fragile(easy to introduce an accidental return into these)
> C goto soup.

Turns out the goto-soup isn't really needed after all. It's still not RAII, but I added an additional comment emphasizing not to return early. However, I could probably rewrite the stuff to use one or two additional helper functions (or another helper class for RAII), but that seems a bit over-engineery and is not really worth the ovehead and fuss at this point.
Attachment #656170 - Attachment is obsolete: true
Attachment #657130 - Flags: review?(taras.mozilla)
(In reply to Taras Glek (:taras) from comment #21)
> This looks good. I trust that you checked that the test fails without your
> patch and passes with it.

The Unlocks test is crashing and burning without this patch. The other tests must be and are fine without and with the patch. I just put them in to test the basic functionality and to ensure that the patch does not regress it.
Comment on attachment 657130 [details] [diff] [review]
Part 1 - Correct jar channel stream ownership

I like your goto-soup removal.

Rest looks good. Sorry for the review lag, was away for long weekend.
Attachment #657130 - Flags: review?(taras.mozilla) → review+
Hey Taras, I'm never sure about these things: sr needed?

Also, since current aurora is going to be an ESR FWIK, do you think it is feasible to request approval-mozilla-aurora?
(In reply to Nils Maier [:nmaier] from comment #25)
> Hey Taras, I'm never sure about these things: sr needed?
> 
> Also, since current aurora is going to be an ESR FWIK, do you think it is
> feasible to request approval-mozilla-aurora?

my r+ is enough. Check it in, lets let it bake on mc. What is the impact of not taking this on aurora? Can you make a strong case for functionality that is busted unless this goes upstream? I'm a bit worried about risks involved in changing ownership, handle-lifecycles in patches like this.
I second Nils's motion and would like to see this uplifted. We want more add-on developers to create restartless add-ons, and this limits their possibilities. And serious add-on devs will want to at least support whatever the latest ESR version is, so not including this fix would delay their migration or implementation of certain features.
> Check it in, lets let it bake on mc.
/me has no commit access. So, checkin-needed.

> What is the impact of not taking this on aurora?

What Jorge said.
Keywords: checkin-needed
(In reply to Jorge Villalobos [:jorgev] from comment #27)
> I second Nils's motion and would like to see this uplifted. We want more
> add-on developers to create restartless add-ons, and this limits their
> possibilities. And serious add-on devs will want to at least support
> whatever the latest ESR version is, so not including this fix would delay
> their migration or implementation of certain features.

Ok, sounds like a serious addon limitation, you should flag this for consideration for uplift. We should let this bake for a week before uplift.
Nominating for uplift to Aurora (17), given the reasons in comment #27.
It was. Tryserver kthx (I broke my own rules by landing this without a link to a green Try run).
https://tbpl.mozilla.org/?tree=Try&rev=40ccd580787d
This adds back the missing mIsUnsafe assignments for local files.
Attachment #657130 - Attachment is obsolete: true
Attachment #660131 - Flags: review?(taras.mozilla)
Changes to the previous version are the two additions (in ::Open, ::AsyncOpen) of:
// local files are always considered safe
mIsUnsafe = false;
Attachment #660131 - Flags: review?(taras.mozilla) → review+
When nominating for uplift to Aurora/ESR, please include separate risk evaluations. Thanks!
Keywords: checkin-needed
This leaks on Try.
https://tbpl.mozilla.org/?tree=Try&rev=c7399c61a306

https://tbpl.mozilla.org/php/getParsedLog.php?id=15175344&tree=Try

(In reply to Ryan VanderMeulen from comment #32)
> It was. Tryserver kthx (I broke my own rules by landing this without a link
> to a green Try run).

no, srsly
Keywords: checkin-needed
Unfortunately was quite busy recently and a little ill in between, but tried to reproduce and narrow it down today (and failed due to numerous reasons, involving unrelated tests blowing up in my face :p) or at least psychic-debug it...

Taras, I have the strong feeling that the input thunk no longer holding a reference to the global nsIZipReaderCache jarCache might cause the leak. Any thoughts?
I'd do a try for that, but bug 791078 is stalled somehow.
Depends on: 791078
Not clear that this fix is any more necessary in FF17 than it was in FF16, and there's been little movement. Please nominate for uplift when prepared, but we won't track for release at this point.

Feel free to re-nominate for tracking if I've got that wrong.
Depends on: 812887
Depends on: 814470
Unbitrotted. Exactly the same changes as before (just the context lines have changed), so the re-review should be trivial.
Since the blockers were resolved, we got green without any leaks:
https://tbpl.mozilla.org/?tree=Try&rev=08567a03f6d4
Attachment #660131 - Attachment is obsolete: true
Attachment #686108 - Flags: review?(taras.mozilla)
Comment on attachment 686108 [details] [diff] [review]
Part 1 - Correct jar channel stream ownership

no need for re-review here.
Attachment #686108 - Flags: review?(taras.mozilla) → review+
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/736e9f98d97a
https://hg.mozilla.org/mozilla-central/rev/0e5ca55005ae
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla20
Comment on attachment 686108 [details] [diff] [review]
Part 1 - Correct jar channel stream ownership

I'd like to land the 2 patches in this bug, and the two small bugs they depend on (bug 812887 and bug 814470) on b2g18 (and thus aurora too).

Reason: they contain nsJARChannel changes that will require significant enough changes to the patches for bug 815523 (B2G blocker) that it seems worth just porting these, especially since they've been baking on m-c for a while w/o trouble (november). They also contain the xpcshell test that I'm using to test bug 815523.

No String or UUID changes made by any of these patches.

try build:

  https://tbpl.mozilla.org/?tree=Try&rev=285f52ae4434

I'm not flagging all of the patches for now since they need to be waved through as a group.  If you +a feel free to mark the other patches too.
Attachment #686108 - Flags: approval-mozilla-b2g18?
Attachment #686108 - Flags: approval-mozilla-aurora?
Comment on attachment 686108 [details] [diff] [review]
Part 1 - Correct jar channel stream ownership

If these patches make it significantly easier for us to fix b-b+ bug 815523 and are low risk for Aurora 19, then let's land these on branches.
Attachment #686108 - Flags: approval-mozilla-b2g18?
Attachment #686108 - Flags: approval-mozilla-b2g18+
Attachment #686108 - Flags: approval-mozilla-aurora?
Attachment #686108 - Flags: approval-mozilla-aurora+
Unfortunately, I had to back this out of Aurora (and will be from b2g18 shortly as well) due to bug 822313. I did have to do some manual fixup to get this to apply to Aurora, so it's possible something went wrong in the process. Can you please post a branch-specific patch that has successfully run through Try without leaking? Thanks!
https://hg.mozilla.org/releases/mozilla-aurora/rev/2bc7e0553fe6
Sorry, I know we're all in a hurry--if you look at comment 43 you'll see I mentioned we also need bug 812887 and bug 814470 (I'll go mark them ?a now).   That definitely explains some leaking.   Here's a push to the 18b2g branch with everything I think we need applied

https://tbpl.mozilla.org/?tree=Try&rev=cd22b8f5501a
Sorry for missing that. In the future, a whiteboard comment would help a lot as it's a lot easier to see than scanning through the comments.
Ran into some very persistent (failed 3 out of 7 runs) oranges that now seem to be gone in the next patch that landed after this.  Filed bug 823429 for what looks like new random orange.  Only happens on b2g18 branch--aurora landed fine.
Verified the issue using the testcase and STR from comment 0 for 2012-01-18 Nightly

Verified the fix using the same testcase and STR for Firefox 19.0 beta 4
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:19.0) Gecko/20100101 Firefox/19.0
Build ID: 20130130080006

No message found in the Error Console complaining about the XPI file.
Verified fixed with Firefox 20 beta 7 using the STR from comment 0. No message found in the Error Console complaining about the XPI file.

Build ID: 20130325214615
You need to log in before you can comment on or make changes to this bug.