Closed Bug 368428 Opened 18 years ago Closed 16 years ago

XUL FastLoad cache corruption when application running while upgrading

Categories

(Core :: Networking: JAR, defect)

1.8 Branch
x86
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla1.9.1b1

People

(Reporter: glandium, Assigned: glandium)

Details

Attachments

(1 file, 5 obsolete files)

This is something that does not happen on windows but will happen with unix builds.

Here is the scenario to trigger the issue:
- Start the application
- Upgrade the application while running, replacing the files currently in use (this kind of thing can happen when upgrading through the linux distribution upgrade system while a user is running the application).
- In the already running application, open a new kind of window, control, etc. For example, the preferences panel. It will more than probably show XML errors.
- Stop the application
- Start the application again. Go to the preferences panel again, if you went there and got XML errors. You will get the same errors.
- Stop the application
- Remove the XUL.mfasl file
- Start the application. Try again. The XML errors are gone.

To solve this, maybe it would be better to not cache invalid XUL files...
A bit more precision:

- When you go to the preferences panel the first time, after upgrade, the XUL.mfasl is removed and recreated with the cache content for the broken preferences panel.
- The corruption doesn't seem to happen with flat chrome (no jar files), so this may be related to jar handling vs. the jar file that changing.
I can confirm that bypassing zip cache in nsZipReaderCache::GetZip solves the problem. This cache should be invalidated when the zip file's mtime changes. This probably means adding the old mtime information to the nsJAR or the nsZipArchive class.

Changing bug component accordingly (though it sounds weird that Jar is under Networking)
Component: XP Toolkit/Widgets: XUL → Networking: JAR
QA Contact: xptoolkit.xul → networking.jar
Attached patch patch (obsolete) — Splinter Review
I'm not quite sure it doesn't leak the previously cached nsJAR object, but releasing ends in a dead lock... Not knowing the code doesn't help :-p
Attachment #253083 - Flags: review?
Attachment #253083 - Flags: review? → review?(darin.moz)
(Rereading comment #0, I feel I have to precise it doesn't happen on windows because the installer doesn't want to run if an instance is running)
Comment on attachment 253083 [details] [diff] [review]
patch

Mmmmmm for some reason, that leads to a lock...
Attachment #253083 - Flags: review?(darin.moz)
Attached patch patch (obsolete) — Splinter Review
mZips.Put was trying to release the collisioning zip, which ended in a lock in nsZipReaderCache::ReleaseZip. The new patch removes the zip from mZips beforehand.
Attachment #253083 - Attachment is obsolete: true
Attachment #253090 - Flags: review?(darin.moz)
Comment on attachment 253090 [details] [diff] [review]
patch

sorry, i'm not going to have time to give this a proper review.  over to biesi.
Attachment #253090 - Flags: review?(darin.moz) → review?(cbiesinger)
Mike, is this a problem on the trunk?

biesi, do you think you could review the patch sometime?
Assignee: nobody → mh+mozilla
Attached patch New patch against trunk (obsolete) — Splinter Review
Updated patch for current trunk
Attachment #253090 - Attachment is obsolete: true
Attachment #284746 - Flags: review?(cbiesinger)
Attachment #253090 - Flags: review?(cbiesinger)
Comment on attachment 284746 [details] [diff] [review]
New patch against trunk

+  zipFile->GetLastModifiedTime(&mMtime);

you should check that that succeeded

+      lock.unlock();
+      mZips.Remove(&key);
+      lock.lock();

why are you doing that outside of the lock? And why is that safe?

you should initialize mMtime in the constructor
(In reply to comment #10)
> +      lock.unlock();
> +      mZips.Remove(&key);
> +      lock.lock();
> 
> why are you doing that outside of the lock? And why is that safe?

mZips.Remove does end up locking when it needs to. If you don't unlock, that's a dead lock situation.

I'll update the patch with your other comments when you tell me you're okay with this part.
It locks a different lock... consider ReleaseZip: That also removes the entry while holding the lock.
Comment on attachment 284746 [details] [diff] [review]
New patch against trunk

r-, please call .Remove while holding the lock
Attachment #284746 - Flags: review?(cbiesinger) → review-
Attached patch patch against trunk (obsolete) — Splinter Review
This should solve your concerns.
Attachment #284746 - Attachment is obsolete: true
Attachment #308130 - Flags: review?(cbiesinger)
Comment on attachment 308130 [details] [diff] [review]
patch against trunk

+  if (zip && LL_EQ(Mtime, zip->GetMtime())) {

You can actually use == these days for comparing PRInt64 values.

+      ReleaseZip(zip);

I don't think it's necessary to call this... the mZips.Remove should be enough. Doesn't this cause a deadlock anyway?
Attachment #308130 - Flags: review?(cbiesinger) → review+
Mike, I just noticed that Ubuntu is taking this patch upstream for XULRunner 1.9. Can you address biesi's nits, and then I can work on getting this in for Firefox 3.1 / Gecko 1.9.1? Thanks!
Status: NEW → ASSIGNED
I'll have to do some testing. IIRC, mZips.Remove alone leads to a deadlock situation.
Attached patch patch (obsolete) — Splinter Review
Same patch with changes suggested in comment #15. I verified this works properly without ReleaseZip. Retrospectively, I wonder how I got deadlock situations before with the same code...
Attachment #308130 - Attachment is obsolete: true
Downstream Ubuntu (and probably Debian) are taking this patch to fix corruption issues they are having when upgrading Firefox while it is running. Would be great to get it upstream. Possible to get it for Firefox 3.0.1?
Flags: wanted1.9.0.x?
Flags: blocking1.9.0.1?
Comment on attachment 322150 [details] [diff] [review]
patch

This patch is not good. I just got a lock situation while running epiphany due to the removal of ReleaseZip.

Stacktrace:
#0  0x00007ffc20821d04 in __lll_lock_wait () from /lib/libpthread.so.0
#1  0x00007ffc2081d984 in _L_lock_1007 () from /lib/libpthread.so.0
#2  0x00007ffc2081d7ae in pthread_mutex_lock () from /lib/libpthread.so.0
#3  0x00007ffc26f29709 in PR_Lock (lock=0x1d4a0c0) at ptsynch.c:206
#4  0x00007ffc1a9aef37 in nsZipReaderCache::ReleaseZip (this=0x1d49e10, zip=0x80) at ../../dist/include/xpcom/nsAutoLock.h:219
#5  0x00007ffc1a9af086 in nsJAR::Release (this=0x1d4bf20) at nsJAR.cpp:158
#6  0x00007ffc1b003dc8 in nsSupportsHashtable::Remove (this=<value optimized out>, aKey=<value optimized out>, value=0x0) at nsHashtable.cpp:854
#7  0x00007ffc1a9af161 in nsZipReaderCache::GetZip (this=0x1d49e10, zipFile=0x45bb7f0, result=0x8ee7518) at nsJAR.cpp:1147
#8  0x00007ffc1a9b1dc7 in nsJARInputThunk::EnsureJarStream (this=0x8ee7500) at nsJARChannel.cpp:138
#9  0x00007ffc1a9b1f91 in nsJARChannel::Open (this=0x8a59640, stream=0x7fff2fd750f0) at nsJARChannel.cpp:663
#10 0x00007ffc1a9a3837 in nsStringBundle::LoadProperties (this=0x1f79c90) at nsStringBundle.cpp:128
#11 0x00007ffc1a9a4e1f in nsStringBundle::GetStringFromName (this=0x1f79c90, aName=0x80, aResult=0x1) at nsStringBundle.cpp:269

This looks pretty much like the deadlocks that forced me to force an unlock or do ReleaseZip beforehand.
Attachment #322150 - Attachment is obsolete: true
attachment 322150 [details] [diff] [review] + keep a ref on zip to not trigger ReleaseZip ... which causes the recurring lock attempt.

I was able to 100% reproduce the deadlock by simply touching some .jar files in chrome/ and trying to access new chrome elements afterwards in firefox.

I could verify that this patch fixes it for me.

Any thoughts?
Does the older zip properly get released afterwards, then ?

Replacing antiLockZipGrip = zip by ReleaseZip(zip) works, too...
Yes, putting the grip before the lock on the stack should make release happen after unlock.(In reply to comment #22)
> Does the older zip properly get released afterwards, then ?
> 
> Replacing antiLockZipGrip = zip by ReleaseZip(zip) works, too...
> 

Don't know if I understand you correctly, but yes, this patch tries to take care that actual Release happens happen _outside_ the GetZip lock.
(In reply to comment #22)
> Replacing antiLockZipGrip = zip by ReleaseZip(zip) works, too...
> 

Just tried this approach and I get the same deadlock by just

  sudo touch /usr/lib/xulrunner-1.9/chrome/*.jar

before opening any menu. Clicking on Edit will then deadlock firefox.

Could you please verify that this is the case for you too and then test the updated patch I submitted? Merci.

Attachment 308130 [details] [diff] has always worked fine for me... it's strange that it deadlocks for you with ReleaseZip... there's no reason it should...
well, i have been fast talking, strangely enough, the releasezip patch finally failed for me... while yours works. I'll start applying your patch in next xulrunner upload in debian.
Attachment #323500 - Flags: review?(cbiesinger)
Flags: blocking1.9.0.1? → blocking1.9.0.1-
What's the impact of this bug? Only on XULRunner apps, right?
(In reply to comment #27)
> What's the impact of this bug? Only on XULRunner apps, right?
> 

Comment #24 mentions deadlocking Firefox
Sure, if you upgrade XULRunner while Firefox is running. But is it the case that if a Linux user upgrades Firefox a bunch of their other applications fall over? And the fix is required on the 1.8 branch, not the 1.9 branch, right?
If you upgrade Firefox while it is running, chances are you will screw up your XUL FastLoad cache. This happens with any version and is unrelated to Firefox being built on top of xulrunner or not.
Attachment #323500 - Flags: review?(cbiesinger) → review+
Keywords: checkin-needed
http://hg.mozilla.org/mozilla-central/rev/c61681813a11
Status: ASSIGNED → RESOLVED
Closed: 16 years ago
Keywords: checkin-needed
Resolution: --- → FIXED
Target Milestone: --- → mozilla1.9.1b1
Flags: wanted1.9.0.x? → wanted1.9.0.x-
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: