If you think a bug might affect users in the 57 release, please set the correct tracking and status flags for Release Management.

XUL FastLoad cache corruption when application running while upgrading

RESOLVED FIXED in mozilla1.9.1b1

Status

()

Core
Networking: JAR
RESOLVED FIXED
11 years ago
9 years ago

People

(Reporter: glandium, Assigned: glandium)

Tracking

1.8 Branch
mozilla1.9.1b1
x86
Linux
Points:
---
Bug Flags:
blocking1.9.0.1 -
wanted1.9.0.x -

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment, 5 obsolete attachments)

(Assignee)

Description

11 years ago
This is something that does not happen on windows but will happen with unix builds.

Here is the scenario to trigger the issue:
- Start the application
- Upgrade the application while running, replacing the files currently in use (this kind of thing can happen when upgrading through the linux distribution upgrade system while a user is running the application).
- In the already running application, open a new kind of window, control, etc. For example, the preferences panel. It will more than probably show XML errors.
- Stop the application
- Start the application again. Go to the preferences panel again, if you went there and got XML errors. You will get the same errors.
- Stop the application
- Remove the XUL.mfasl file
- Start the application. Try again. The XML errors are gone.

To solve this, maybe it would be better to not cache invalid XUL files...
(Assignee)

Comment 1

11 years ago
A bit more precision:

- When you go to the preferences panel the first time, after upgrade, the XUL.mfasl is removed and recreated with the cache content for the broken preferences panel.
- The corruption doesn't seem to happen with flat chrome (no jar files), so this may be related to jar handling vs. the jar file that changing.
(Assignee)

Comment 2

11 years ago
I can confirm that bypassing zip cache in nsZipReaderCache::GetZip solves the problem. This cache should be invalidated when the zip file's mtime changes. This probably means adding the old mtime information to the nsJAR or the nsZipArchive class.

Changing bug component accordingly (though it sounds weird that Jar is under Networking)
Component: XP Toolkit/Widgets: XUL → Networking: JAR
QA Contact: xptoolkit.xul → networking.jar
(Assignee)

Comment 3

11 years ago
Created attachment 253083 [details] [diff] [review]
patch

I'm not quite sure it doesn't leak the previously cached nsJAR object, but releasing ends in a dead lock... Not knowing the code doesn't help :-p
Attachment #253083 - Flags: review?
(Assignee)

Updated

11 years ago
Attachment #253083 - Flags: review? → review?(darin.moz)
(Assignee)

Comment 4

11 years ago
(Rereading comment #0, I feel I have to precise it doesn't happen on windows because the installer doesn't want to run if an instance is running)
(Assignee)

Comment 5

11 years ago
Comment on attachment 253083 [details] [diff] [review]
patch

Mmmmmm for some reason, that leads to a lock...
Attachment #253083 - Flags: review?(darin.moz)
(Assignee)

Comment 6

11 years ago
Created attachment 253090 [details] [diff] [review]
patch

mZips.Put was trying to release the collisioning zip, which ended in a lock in nsZipReaderCache::ReleaseZip. The new patch removes the zip from mZips beforehand.
Attachment #253083 - Attachment is obsolete: true
Attachment #253090 - Flags: review?(darin.moz)

Comment 7

11 years ago
Comment on attachment 253090 [details] [diff] [review]
patch

sorry, i'm not going to have time to give this a proper review.  over to biesi.
Attachment #253090 - Flags: review?(darin.moz) → review?(cbiesinger)
Mike, is this a problem on the trunk?

biesi, do you think you could review the patch sometime?
Assignee: nobody → mh+mozilla
(Assignee)

Comment 9

10 years ago
Created attachment 284746 [details] [diff] [review]
New patch against trunk

Updated patch for current trunk
Attachment #253090 - Attachment is obsolete: true
Attachment #284746 - Flags: review?(cbiesinger)
Attachment #253090 - Flags: review?(cbiesinger)
Comment on attachment 284746 [details] [diff] [review]
New patch against trunk

+  zipFile->GetLastModifiedTime(&mMtime);

you should check that that succeeded

+      lock.unlock();
+      mZips.Remove(&key);
+      lock.lock();

why are you doing that outside of the lock? And why is that safe?

you should initialize mMtime in the constructor
(Assignee)

Comment 11

10 years ago
(In reply to comment #10)
> +      lock.unlock();
> +      mZips.Remove(&key);
> +      lock.lock();
> 
> why are you doing that outside of the lock? And why is that safe?

mZips.Remove does end up locking when it needs to. If you don't unlock, that's a dead lock situation.

I'll update the patch with your other comments when you tell me you're okay with this part.
It locks a different lock... consider ReleaseZip: That also removes the entry while holding the lock.
Comment on attachment 284746 [details] [diff] [review]
New patch against trunk

r-, please call .Remove while holding the lock
Attachment #284746 - Flags: review?(cbiesinger) → review-
(Assignee)

Comment 14

10 years ago
Created attachment 308130 [details] [diff] [review]
patch against trunk

This should solve your concerns.
Attachment #284746 - Attachment is obsolete: true
Attachment #308130 - Flags: review?(cbiesinger)
Comment on attachment 308130 [details] [diff] [review]
patch against trunk

+  if (zip && LL_EQ(Mtime, zip->GetMtime())) {

You can actually use == these days for comparing PRInt64 values.

+      ReleaseZip(zip);

I don't think it's necessary to call this... the mZips.Remove should be enough. Doesn't this cause a deadlock anyway?
Attachment #308130 - Flags: review?(cbiesinger) → review+
Mike, I just noticed that Ubuntu is taking this patch upstream for XULRunner 1.9. Can you address biesi's nits, and then I can work on getting this in for Firefox 3.1 / Gecko 1.9.1? Thanks!
Status: NEW → ASSIGNED
(Assignee)

Comment 17

10 years ago
I'll have to do some testing. IIRC, mZips.Remove alone leads to a deadlock situation.
(Assignee)

Comment 18

10 years ago
Created attachment 322150 [details] [diff] [review]
patch

Same patch with changes suggested in comment #15. I verified this works properly without ReleaseZip. Retrospectively, I wonder how I got deadlock situations before with the same code...
Attachment #308130 - Attachment is obsolete: true
Downstream Ubuntu (and probably Debian) are taking this patch to fix corruption issues they are having when upgrading Firefox while it is running. Would be great to get it upstream. Possible to get it for Firefox 3.0.1?
Flags: wanted1.9.0.x?
Flags: blocking1.9.0.1?
(Assignee)

Comment 20

10 years ago
Comment on attachment 322150 [details] [diff] [review]
patch

This patch is not good. I just got a lock situation while running epiphany due to the removal of ReleaseZip.

Stacktrace:
#0  0x00007ffc20821d04 in __lll_lock_wait () from /lib/libpthread.so.0
#1  0x00007ffc2081d984 in _L_lock_1007 () from /lib/libpthread.so.0
#2  0x00007ffc2081d7ae in pthread_mutex_lock () from /lib/libpthread.so.0
#3  0x00007ffc26f29709 in PR_Lock (lock=0x1d4a0c0) at ptsynch.c:206
#4  0x00007ffc1a9aef37 in nsZipReaderCache::ReleaseZip (this=0x1d49e10, zip=0x80) at ../../dist/include/xpcom/nsAutoLock.h:219
#5  0x00007ffc1a9af086 in nsJAR::Release (this=0x1d4bf20) at nsJAR.cpp:158
#6  0x00007ffc1b003dc8 in nsSupportsHashtable::Remove (this=<value optimized out>, aKey=<value optimized out>, value=0x0) at nsHashtable.cpp:854
#7  0x00007ffc1a9af161 in nsZipReaderCache::GetZip (this=0x1d49e10, zipFile=0x45bb7f0, result=0x8ee7518) at nsJAR.cpp:1147
#8  0x00007ffc1a9b1dc7 in nsJARInputThunk::EnsureJarStream (this=0x8ee7500) at nsJARChannel.cpp:138
#9  0x00007ffc1a9b1f91 in nsJARChannel::Open (this=0x8a59640, stream=0x7fff2fd750f0) at nsJARChannel.cpp:663
#10 0x00007ffc1a9a3837 in nsStringBundle::LoadProperties (this=0x1f79c90) at nsStringBundle.cpp:128
#11 0x00007ffc1a9a4e1f in nsStringBundle::GetStringFromName (this=0x1f79c90, aName=0x80, aResult=0x1) at nsStringBundle.cpp:269

This looks pretty much like the deadlocks that forced me to force an unlock or do ReleaseZip beforehand.
Attachment #322150 - Attachment is obsolete: true

Comment 21

9 years ago
Created attachment 323500 [details] [diff] [review]
with deadlock fix (v1)

attachment 322150 [details] [diff] [review] + keep a ref on zip to not trigger ReleaseZip ... which causes the recurring lock attempt.

I was able to 100% reproduce the deadlock by simply touching some .jar files in chrome/ and trying to access new chrome elements afterwards in firefox.

I could verify that this patch fixes it for me.

Any thoughts?
(Assignee)

Comment 22

9 years ago
Does the older zip properly get released afterwards, then ?

Replacing antiLockZipGrip = zip by ReleaseZip(zip) works, too...

Comment 23

9 years ago
Yes, putting the grip before the lock on the stack should make release happen after unlock.(In reply to comment #22)
> Does the older zip properly get released afterwards, then ?
> 
> Replacing antiLockZipGrip = zip by ReleaseZip(zip) works, too...
> 

Don't know if I understand you correctly, but yes, this patch tries to take care that actual Release happens happen _outside_ the GetZip lock.

Comment 24

9 years ago
(In reply to comment #22)
> Replacing antiLockZipGrip = zip by ReleaseZip(zip) works, too...
> 

Just tried this approach and I get the same deadlock by just

  sudo touch /usr/lib/xulrunner-1.9/chrome/*.jar

before opening any menu. Clicking on Edit will then deadlock firefox.

Could you please verify that this is the case for you too and then test the updated patch I submitted? Merci.

(Assignee)

Comment 25

9 years ago
Attachment 308130 [details] [diff] has always worked fine for me... it's strange that it deadlocks for you with ReleaseZip... there's no reason it should...
(Assignee)

Comment 26

9 years ago
well, i have been fast talking, strangely enough, the releasezip patch finally failed for me... while yours works. I'll start applying your patch in next xulrunner upload in debian.

Updated

9 years ago
Attachment #323500 - Flags: review?(cbiesinger)
Flags: blocking1.9.0.1? → blocking1.9.0.1-
What's the impact of this bug? Only on XULRunner apps, right?
(In reply to comment #27)
> What's the impact of this bug? Only on XULRunner apps, right?
> 

Comment #24 mentions deadlocking Firefox
Sure, if you upgrade XULRunner while Firefox is running. But is it the case that if a Linux user upgrades Firefox a bunch of their other applications fall over? And the fix is required on the 1.8 branch, not the 1.9 branch, right?
(Assignee)

Comment 30

9 years ago
If you upgrade Firefox while it is running, chances are you will screw up your XUL FastLoad cache. This happens with any version and is unrelated to Firefox being built on top of xulrunner or not.
Attachment #323500 - Flags: review?(cbiesinger) → review+
Keywords: checkin-needed
http://hg.mozilla.org/mozilla-central/rev/c61681813a11
Status: ASSIGNED → RESOLVED
Last Resolved: 9 years ago
Keywords: checkin-needed
Resolution: --- → FIXED
Target Milestone: --- → mozilla1.9.1b1
Flags: wanted1.9.0.x? → wanted1.9.0.x-
You need to log in before you can comment on or make changes to this bug.