Closed Bug 373908 Opened 18 years ago Closed 18 years ago

Software update hangs part way though, chews CPU

Categories

(Toolkit :: Application Update, defect)

PowerPC
macOS
defect
Not set
major

Tracking

()

RESOLVED FIXED

People

(Reporter: jay, Assigned: moco)

References

Details

(Keywords: fixed1.8.0.12, fixed1.8.1.4, Whiteboard: see also bug #375619)

Attachments

(4 files, 1 obsolete file)

3.0a3pre updater indicates it is updating. "firefox will restart soon" line reached half way and stalls. laptop overheating: fan comes on. 30 minutes later unchanged using up plenty of resources. other applications work and load, but very slowly. no means to close updater. Required REBOOT probably force quit might have worked....
let me know if there's some file somewhere that might help resolve this bug
I'm seeing the same thing. The "Firefox is installing your updates and will start in a few moments..." progress indicator is stuck at about 33%, and the updater is chewing up 100% CPU. I've gathered up a bunch of debugging info, and confirmed it's caught in a loop. More info in a bit.
Severity: critical → major
Summary: REBOOT required → Software update hangs part way though, chews CPU
today updater failed according to dialogue it then downloaded whole application then when restarting, froze as per yesterday with heavy cpu usage etc. closed all applications and started successfully.
then after I closed this window, I discovered a dialogue telling my strongly recommend you upgrade clicked ok hangs on connect to server ~:"
justin writes in irc that he sees this with the trunk nightly on mac, but fwiw, I don't see any changes to the relevant code during the past month (http://bonsai.mozilla.org/cvsquery.cgi?branch=HEAD&file=mozilla/toolkit/mozapps/update/src/updater/&date=month). maybe something changed on the .mar side?
I did some low-level poking around in gdb, and found the update hang (and CPU consumption) is caused by a failure to exit the loop in ArchiveReader::ExtractItemToStream(). Looks like it's trying to process a zero-length file, and is confused when there's no data available. Summary of debugging follows.. Here's our stack: 6 ArchiveReader::ExtractItemToStream () 5 ArchiveReader::ExtractFile () 4 AddFile::Execute () 3 ActionList::Execute () 2 PatchFile::~PatchFile () 1 _pthread_body () --2-- PatchFile::~PatchFile() http://mxr.mozilla.org/seamonkey/source/toolkit/mozapps/update/src/updater/updater.cpp#735 It seems a little weird that this is in our stack. The destructor doesn't seem to obviously call anything other than ensure_remove(), when seems to end up in a nsIFile's remove(). I guess we're hooking the ActionList stuff into that somehow? --3-- ActionList::Execute() http://mxr.mozilla.org/seamonkey/source/toolkit/mozapps/update/src/updater/updater.cpp#1222 (nothing interesting here) --4-- AddFile::Execute() http://mxr.mozilla.org/seamonkey/source/toolkit/mozapps/update/src/updater/updater.cpp#673 We're making a tail call to ArchiveReader::ExtractFile(). --5-- ArchiveReader::ExtractFile() http://mxr.mozilla.org/seamonkey/source/toolkit/mozapps/update/src/updater/archivereader.cpp#73 The two string arguments (name, dest) are both "Contents/MacOS/libxpistub.dylib.bz2". [That's normal, as AddFile::Execute() passes the same variable to both args.] It has created and opened the destination file. --6-- ArchiveReader::ExtractItemToStream() http://mxr.mozilla.org/seamonkey/source/toolkit/mozapps/update/src/updater/archivereader.cpp#98 The first arg is a MarItem, 52 typedef struct MarItem_ { 53 struct MarItem_ *next; /* private field */ 54 PRUint32 offset; /* offset into archive */ 55 PRUint32 length; /* length of data in bytes */ 56 PRUint32 flags; /* contains file mode bits */ 57 char name[1]; /* file path */ 58 } MarItem; Here's the contents: (gdb) x/a $ebp+120xb00805e4: 0x3072a0 <-- that's our MarInfo (gdb) x/xw 0x3072a0 0x3072a0: 0x00000000 (next) (gdb) x/xw 0x3072a0 + 4 0x3072a4: 0x00065ba3 (offset) (gdb) x/xw 0x3072a0 + 8 0x3072a8: 0x00000000 (length) (gdb) x/xw 0x3072a0 + 12 0x3072ac: 0x000001a4 (flags) (gdb) x/s 0x3072a0 + 16 0x3072b0: "Contents/MacOS/libxpistub.dylib.bz2" 113 if (offset < (int) item->length && strm.avail_in == 0) As the loop starts again, item->length == 0 and offset == 0. So the first if() test fails. Should the test here be |offset <= item.length| instead? Or treat offset==length as a failure? 122 strm.next_out = outbuf; 123 strm.avail_out = BUFSIZ; outbuf's contents is currently UCS2 text (2-byte chars): "rsions of Info.plist keys * /\n\nCFBundleNa 0x01/00/00/00 0x01/00/00/00 ....however, I think that's just leftover bits from a previous operation, as the state indicates the buffer is empty. BUFSIZ is 1024. 125 ret = BZ2_bzDecompress(&strm); The BZ2 code is trying to check for a magic byte at the beginning of the input, but ends up doing nothing because there is no input data available. http://mxr.mozilla.org/seamonkey/source/modules/libbz2/src/bzlib.c#852 s->state ==BZ_X_MAGIC_1 calls BZ2_decompress http://mxr.mozilla.org/seamonkey/source/modules/libbz2/src/decompress.c#146 Executes code in GET_UCHAR(BZ_X_MAGIC_1, uc); macro.... http://mxr.mozilla.org/seamonkey/source/modules/libbz2/src/decompress.c#83 if(s->bsLive >= nnn) test fails (s->bsLive == 0, nnn==8) if (s->strm->avail_in == 0) test succeeds returns BZ_OK ...and returns that value. The remaining if() also do nothing, and so the ExtractItemToStream loop starts over again, with the same state as it began.
Yep, empty file... Dib:/Applications/Minefield.app/Contents/MacOS dolske$ ls -l libxpistub.dylib.bz2 -rw-r--r-- 1 dolske dolske 0 Mar 14 11:05 libxpistub.dylib.bz2 I have the contents of updates/0/ available, although at 41MB it's probably too large to attach here.
Dup of bug 372500? Although, I'm surprised it's showing up now; that machine hasn't had that issue for a couple of days now.
even if this the "bad mar" was caused by an out-of-space problem (thanks preed), the client code should be made to be more defensive and handle it better than pigging out on CPU.
This happened with an update downloaded automatically between the 13th and 14th... Dib:/Applications/Minefield.app/Contents/MacOS/updates/0 dolske$ ls -l total 79800 -rw-r--r-- 1 dolske dolske 1643740 Mar 14 11:05 0.patch -rw-r--r-- 1 dolske dolske 640892 Mar 14 11:05 1.patch -rw-r--r-- 1 dolske dolske 376952 Mar 14 11:05 10.patch -rw-r--r-- 1 dolske dolske 1382 Mar 14 11:05 11.patch -rw-r--r-- 1 dolske dolske 26701620 Mar 14 11:05 12.patch -rw-r--r-- 1 dolske dolske 760284 Mar 14 11:05 13.patch -rw-r--r-- 1 dolske dolske 1381040 Mar 14 11:05 14.patch -rw-r--r-- 1 dolske dolske 488632 Mar 14 11:05 15.patch -rw-r--r-- 1 dolske dolske 983568 Mar 14 11:05 16.patch -rw-r--r-- 1 dolske dolske 520680 Mar 14 11:05 17.patch -rw-r--r-- 1 dolske dolske 127712 Mar 14 11:05 18.patch -rw-r--r-- 1 dolske dolske 117876 Mar 14 11:05 19.patch -rw-r--r-- 1 dolske dolske 1951170 Mar 14 11:05 2.patch -rw-r--r-- 1 dolske dolske 324280 Mar 14 11:05 20.patch -rw-r--r-- 1 dolske dolske 724448 Mar 14 11:05 21.patch -rw-r--r-- 1 dolske dolske 355792 Mar 14 11:05 22.patch -rw-r--r-- 1 dolske dolske 37328 Mar 14 11:05 23.patch -rw-r--r-- 1 dolske dolske 1734728 Mar 14 11:05 24.patch -rw-r--r-- 1 dolske dolske 559676 Mar 14 11:05 3.patch -rw-r--r-- 1 dolske dolske 83264 Mar 14 11:05 4.patch -rw-r--r-- 1 dolske dolske 162376 Mar 14 11:05 5.patch -rw-r--r-- 1 dolske dolske 521636 Mar 14 11:05 6.patch -rw-r--r-- 1 dolske dolske 51312 Mar 14 11:05 7.patch -rw-r--r-- 1 dolske dolske 2669 Mar 14 11:05 8.patch -rw-r--r-- 1 dolske dolske 101936 Mar 14 11:05 9.patch -rw-r--r-- 1 dolske dolske 12288 Mar 14 11:05 update.log -rw-r--r-- 1 dolske dolske 7161 Mar 14 11:05 update.manifest -rw------- 1 dolske dolske 418511 Mar 14 04:46 update.mar -rw-r--r-- 1 dolske dolske 9 Mar 14 11:05 update.status drwxr-xr-x 3 dolske dolske 102 Mar 13 13:55 updater.app Dib:/Applications/Minefield.app/Contents/MacOS/updates/0 dolske$ ls -l .. total 184 drwxr-xr-x 32 dolske dolske 1088 Mar 14 11:05 0 -rw-r--r-- 1 dolske dolske 91657 Mar 13 13:55 last-update.log The last-update.log is from the day before, IIRC I manually updated it and it worked fine afterwards.
Hmm, something must be persistently broken after this... I killed off my original hung update yesterday, and made it update itself. It hung again, but I figured it might just be some funky junk getting cleared out. It hung again updating this morning. Stack looked the same, but looked to be hung on a different file.
after justin did all the hard work, I'm going to implement the fix suggested in comment #9: that updater should handle a mar like this gracefully. thanks again to justin for doing the initial analysis.
Assignee: nobody → sspitzer
After a lull of a few days, it looks like updates are being pushed again. Unfortunately my Minefield hung again on both the partial and full updates. Same stack signature, so whatever the root cause of this is (disk space?), it's still an issue.
Attachment #259804 - Attachment mime type: text/plain → text/xml
Attachment #259804 - Attachment mime type: text/xml → application/xml
I'm able to reproduce this with Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US; rv:1.9a3pre) Gecko/20070325 Minefield/3.0a3pre" from http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2007-03-25-04-trunk/firefox-3.0a3pre.en-US.mac.dmg when I run software update, the manifest in the complete mar (update.mar) contains 0 length items, which is hitting this bug. sh-3.1$ mar -t update.mar | less | grep "^0" 0 0644 Plugin.plugin/Contents/Resources/Default 0 0644 Plugin.plugin/Contents/Resources/Default 0 0644 Plugin.plugin/Contents/Resources/Default 0 0644 Plugin.plugin/Contents/Resources/Default 0 0644 Plugin.plugin/Contents/MacOS/Default 0 0644 Plugin.plugin/Contents/MacOS/Default 0 0644 Plugin.plugin/Contents/MacOS/Default 0 0644 Plugin.plugin/Contents/MacOS/Default 0 0644 Plugin.plugin/Contents/MacOS/Default
justin may have spotted the problem here. see bug #375415. the real path is supposed to be: Contents/MacOS/plugins/Default Plugin.plugin/Contents/Resources/Default, but due to bug #375415, we are ending up with Plugin.plugin/Contents/Resources/Default
Comment on attachment 259808 [details] [diff] [review] potential fix, need to make sure that is the most appropriate error code Looks good to me.
Attachment #259808 - Flags: review+
client fix landed on trunk. Checking in archivereader.cpp; /cvsroot/mozilla/toolkit/mozapps/update/src/updater/archivereader.cpp,v <-- ar chivereader.cpp new revision: 1.4; previous revision: 1.3 done I really think we want this for the branch(es), too. I'll go add a test case to litmus using the attachments in this bug.
Status: NEW → RESOLVED
Closed: 18 years ago
Flags: blocking1.8.1.4?
Flags: blocking1.8.0.12?
Resolution: --- → FIXED
Whiteboard: landed on the trunk, wanted for branch to prevent disasters, see bug #375619
Flags: blocking1.8.1.4?
Flags: blocking1.8.1.4+
Flags: blocking1.8.0.12?
Flags: blocking1.8.0.12+
Comment on attachment 259808 [details] [diff] [review] potential fix, need to make sure that is the most appropriate error code seeking approval for the branches.
Attachment #259808 - Flags: approval1.8.1.4?
Attachment #259808 - Flags: approval1.8.0.12?
Comment on attachment 259808 [details] [diff] [review] potential fix, need to make sure that is the most appropriate error code approved for 1.8.0.12 and 1.8.1.4, a=dveditz for release-drivers
Attachment #259808 - Flags: approval1.8.1.4?
Attachment #259808 - Flags: approval1.8.1.4+
Attachment #259808 - Flags: approval1.8.0.12?
Attachment #259808 - Flags: approval1.8.0.12+
in addition to addings this to litmus (which I have not done yet), this would be something good for the test suite.
Flags: in-testsuite?
Software Update is broken now.
> Software Update is broken now. how? what are you seeing? which version of Firefox are you using?
(In reply to comment #28) > > Software Update is broken now. > > how? what are you seeing? which version of Firefox are you using? He might mean bug 376428, which is a trunk-only regression.
What I meant became bug #376483 where Software Update appeared to hang partway through -- now solved. There is still the minor issue of high CPU usage (on trunk at least): bug #376983
fix landed on the MOZILLA_1_8_0_BRANCH: Checking in archivereader.cpp; /cvsroot/mozilla/toolkit/mozapps/update/src/updater/archivereader.cpp,v <-- ar chivereader.cpp new revision: 1.2.4.1.2.1; previous revision: 1.2.4.1 done fix landed on the MOZILLA_1_8_BRANCH: Checking in archivereader.cpp; /cvsroot/mozilla/toolkit/mozapps/update/src/updater/archivereader.cpp,v <-- ar chivereader.cpp new revision: 1.2.4.2; previous revision: 1.2.4.1 done
Whiteboard: landed on the trunk, wanted for branch to prevent disasters, see bug #375619 → see also bug #375619
Product: Firefox → Toolkit
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: