Closed
Bug 380015
Opened 18 years ago
Closed 17 years ago
Crash [@ nsFrame::BoxReflow] on startup when Fx 2.0 libraries not removed from install directory
Categories
(Firefox :: Installer, defect, P2)
Tracking
()
RESOLVED
FIXED
mozilla1.9beta5
People
(Reporter: fantasai.bugs, Assigned: robert.strong.bugs)
References
Details
(Keywords: crash, topcrash)
Crash Data
Attachments
(3 files)
Description:
Nightly builds segfault on startup.
Steps to Reproduce:
1. Download nightly build
2. tar -xvjf f<tab>
3. cd firefox
4. ./firefox
Expected Results:
Working firefox build
Actual Results:
./run-mozilla.sh: line 131: 759 Segmentation fault "$prog" ${1+"$@"}
Tested with ftp.mozilla.org trunk nightlies, 2007-02-03 and 2007-05-07 on Ubuntu Linux 6.06 on Dell D620 machine.
I don't have this problem when I compile a build myself.
Updated•18 years ago
Product: Firefox → Core
QA Contact: general → general
Comment 1•18 years ago
I can reproduce this with a static opt build on ubuntu 7.04.
Comment 2•18 years ago
oops, missed the reported error:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 47007716719648 (LWP 10267)]
0x00002ac0ceed561c in nsFrame::BoxReflow (this=0x64dd70, aState=@0xbf08b0, aPresContext=0xd455e8, aDesiredSize=@0x7fffdc57ebd0,
aRenderingContext=0x177000ef4e10, aX=32767, aY=-823286586, aWidth=6000, aHeight=0, aMoveFrame=0) at nsFrame.cpp:6257
6257 if (metrics->mLastSize.width != aWidth)
Status: UNCONFIRMED → NEW
Ever confirmed: true
Comment 3•18 years ago
(gdb) print metrics
$1 = (nsBoxLayoutMetrics *) 0x0
Updated•18 years ago
Flags: blocking1.9?
Comment 4•18 years ago
Comment 5•18 years ago
The pango_shape valgrind warnings are likely due to bug 381654.
The XftFontOpenInfo warnings are independent.
Comment 6•18 years ago
Attachment 265736 submitted for bug 381654, but I'm not so hopeful that it will help here.
Depends on: 381654
Comment 7•17 years ago
This has the same stack as the crash from bug 383875:
https://crash-reports.mozilla.com/reports/report/index/27a59de6-1758-11dc-a54e-001a4bd43ed6
Would be nice if crash-reports worked. :-/
Comment 8•17 years ago
Should have put more info there: that crash is on Win32 with my patch to embed manifests in all DLLs, and it is 100% reproducible on startup on a machine without the VC8 CRT installed.
Comment 9•17 years ago
FYI, this is our #1 topcrasher over the past few days:
https://crash-reports.mozilla.com/reports/?do_query=1&product=Firefox&version=Firefox%3A3.0a6pre&signature_search=signature&signature_type=contains&signature=&date=&range_value=3&range_unit=days
For instance:
https://crash-reports.mozilla.com/reports/report/index/f5c7b3a3-1aab-11dc-a4b3-001a4bd43ed6
OS: Linux → All
Hardware: PC → All
Summary: segfault on nightly startup → Crash [@ nsFrame::BoxReflow] on startup
Updated•17 years ago
Comment 10•17 years ago
Comment on attachment 265674
backtrace
This backtrace shows a problem with the stack. Frame 5 has a null aPresContext, but its caller just passes it through.
Comment 11•17 years ago
These are opt builds so the parameter values in the stack are not reliable, right?
Comment 12•17 years ago
(In reply to comment #11)
> These are opt builds so the parameter values in the stack are not reliable,
> right?
>
er, yes, of course
Comment 13•17 years ago
do people have any reason to believe this is not a duplicate of bug 292549?
Depends on: 292549
I launched -P and created a new profile (the old profile was just from 2.0.0.6, and was a "stock" profile with no new bookmarks or pref changes), and this started working for me again...
Linux mozilla-qa 2.6.20-16-generic #2 SMP Thu Jun 7 20:19:32 UTC 2007 i686 GNU/Linux
Comment 15•17 years ago
FWIW, according to [1] there were almost zero nsFrame::BoxReflow crashes on trunk from 2007-06-13 through 2007-08-25. Then came a series of Thunderbird crashes, which seem to have gone.
[1] http://crash-stats.mozilla.com/report/list?range_unit=weeks&branch=1.9&range_value=2&signature=nsFrame%3A%3ABoxReflow%28nsBoxLayoutState%26%2C+nsPresContext%2A%2C+nsHTMLReflowMetrics%26%2C+nsIRenderingContext%2A%2C+int%2C+int%2C+int%2C+int%2C+int%29
Comment 16•17 years ago
This is a topcrash on the trunk again. Currently #18.
http://crash-stats.mozilla.com/?do_query=1&product=Firefox&branch=1.9&version=Firefox%3A3.0a9pre&query_search=signature&query_type=contains&query=&date=&range_value=1&range_unit=weeks
I think this is a dupe, so requesting blocking of the other bug as well. :)
Comment 17•17 years ago
+'ing. Seems to be crashing on windows as well. Setting priority to P2 so we can dig into this rather quickly. It's at #16 on top crashers.
Flags: blocking1.9? → blocking1.9+
Comment 18•17 years ago
No real STR, let's watch for it in Beta 4 and see if we can figure out the cause.
Flags: tracking1.9+
Comment 19•17 years ago
(to be clear, not marking wanted-next+ as we're not sure that this hasn't been fixed by some other bug; for now this is a tracking bug for this type of crash, if it becomes a major issue then it should be renominated as blocking1.9?)
Comment 20•17 years ago
That tells me that this bug is still blocking, on investigation and confidence if not on additional code changes. Renominating, because I think we should prefer to have maybe-fixed things on the list than to have to remember to watch for these or detect that a reported crash is similar enough to be renominated.
Flags: blocking1.9?
Comment 21•17 years ago
blocking1.9+, P2, since Shaver is so demanding and all.
Flags: blocking1.9? → blocking1.9+
Priority: -- → P2
It's worth noting that a lot of these stacks (all of the ones that seem unmangled all the way down to main) go through:
http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/toolkit/xre/nsAppRunner.cpp&rev=1.208&mark=3076#3076
Maybe there's something unusual with the way we bring up the modal dialog so early in startup?
Comment 23•17 years ago
So how does one get the extension manager to bring up modal dialogs from that line of code? Is it possible that doing that is reliably crashy on Windows?
Comment 24•17 years ago
(In reply to comment #23)
> So how does one get the extension manager to bring up modal dialogs from that
> line of code? Is it possible that doing that is reliably crashy on Windows?
If you have some extensions installed that aren't compatible with your current version of Firefox (setting extensions.checkCompatibility to false will let you install them), change extensions.lastAppVersion to a lower version number, quit Firefox, and delete the compatibility.ini file from your profile. On the next startup you should get the modal window from there.
It is worth mentioning that there is another point in there where we can launch a modal dialog: the extension updates dialog, which would open from:
http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/toolkit/xre/nsAppRunner.cpp&rev=1.208&mark=3084#3084
Do we get stack traces there?
I wonder if this is related to the kind of reflow loop I see as part of bug 354527 (caused by bug 413336)
So, looking through the topcrashes on Windows showing up with this signature, I'm noticing that most (although not all) of them are 1.9 builds (with firefox.exe showing version 1.9.0.2988) that have a bunch of 1.8.* version libraries loaded (typically xpcom_core.dll, jar50.dll, and myspell.dll; sometimes also spellchk.dll and xpinstal.dll) -- generally all showing the same 1.8.* version (almost all 1.8.20080.20121, but I saw one 1.8.20071.12718). I don't see this happening with other crashes that I looked at.
See bp-c3fbe6ba-f07f-11dc-b0b4-001a4bd43ef6 for an example (the "Modules" tab).
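The pattern described above (a 1.9 firefox.exe loaded alongside 1.8.* libraries) can be checked mechanically. Below is a minimal Python sketch, assuming the module list is available as (name, version) pairs. The function name, the pairing of names to versions, and the xul.dll entry are illustrative assumptions, not data taken from a specific report.

```python
def find_mismatched_modules(modules, main_module="firefox.exe"):
    """Return names of modules whose major.minor version differs from the main executable's."""
    versions = dict(modules)
    expected = tuple(versions[main_module].split(".")[:2])
    return sorted(
        name
        for name, version in versions.items()
        if tuple(version.split(".")[:2]) != expected
    )


# Version strings are the ones quoted in the comment; which module carries
# which version, and the xul.dll row, are hypothetical.
sample = [
    ("firefox.exe", "1.9.0.2988"),
    ("xul.dll", "1.9.0.2988"),
    ("xpcom_core.dll", "1.8.20080.20121"),
    ("jar50.dll", "1.8.20080.20121"),
    ("myspell.dll", "1.8.20080.20121"),
]

print(find_mismatched_modules(sample))
# ['jar50.dll', 'myspell.dll', 'xpcom_core.dll']
```

Comparing only the first two version fields is a deliberate simplification: it separates the 1.8 branch from 1.9 without depending on how the remaining fields are encoded.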
Comment 26•17 years ago
The end of the version ID is the Build ID. 2008020121 is 2.0.0.12:
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.12) Gecko/20080201 Firefox/2.0.0.12 (although this doesn't show the hour)
The fact that that's most of them just proves that we have good security release uptake. :)
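To make the mapping from file version to build ID concrete, here is a small Python sketch. It assumes the build ID is the last two version fields joined together and read as YYYYMMDDHH. The zero-padding of the final field is an extra assumption (a numeric version field would drop a leading zero), and the function names are made up for illustration.

```python
from datetime import datetime


def build_id_from_file_version(file_version):
    """Join the last two dot-separated fields of a 1.8-branch file version."""
    fields = file_version.split(".")
    # Assumption: the final field holds five digits, so restore a dropped leading zero.
    return fields[-2] + fields[-1].zfill(5)


def parse_build_id(build_id):
    """Interpret a 10-digit build ID as year, month, day, hour."""
    return datetime.strptime(build_id, "%Y%m%d%H")


build_id = build_id_from_file_version("1.8.20080.20121")
print(build_id)                  # 2008020121
print(parse_build_id(build_id))  # 2008-02-01 21:00:00
```

The date part (20080201) matches the Gecko/20080201 token in the user agent string quoted above; the trailing 21 is the hour that the user agent omits.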
Yeah, I knew that the 20080.20121 was a build ID. I'm still not sure what the .2988 is.
I just did the experiment of (in a clean directory):
$ (cd /c/Program\ Files/Mozilla\ Firefox/ && tar -c .) | tar -x
$ mkdir deleteme
$ ./firefox.exe -profile deleteme
[quit]
$ (cd /c/Program\ Files/Minefield/ && tar -c .) | tar -x
$ ./firefox.exe -profile deleteme
and the build crashed on startup, producing bp-faffdae1-f089-11dc-9cc6-001a4bd43ed6. I'm still waiting to see if it's this crash, but I'm sort of expecting it will be. (If so, the question then becomes how users end up in that situation and if there's anything we can do to avoid ending up with those dlls there or to avoid using them.)
Comment 28•17 years ago
We changed the versioning on trunk, so the last field is just "days since jan 1, 2000".
Aren't crashes like these the reason we stopped shipping zip builds? I can't fathom how you'd wind up in this situation using the installer, seems like you would have to manually screw yourself by copying files around.
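For reference, the trunk scheme described above turns the .2988 field into a calendar date. A quick Python sketch, assuming the count starts at zero on 2000-01-01 (whether day 0 or day 1 is the epoch is not stated in this bug); the function name is hypothetical.

```python
from datetime import date, timedelta


def trunk_build_date(file_version, epoch=date(2000, 1, 1)):
    """Read the last version field as a number of days since the epoch."""
    days = int(file_version.split(".")[-1])
    return epoch + timedelta(days=days)


print(trunk_build_date("1.9.0.2988"))  # 2008-03-07
```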
Comment 29•17 years ago
Although, I guess this was initially reported on Linux, where we *only* ship tarballs. :-/
Sure enough, this is the crash signature you get when you copy a Minefield install on top of a Firefox 2.0.0.12 install. See bp-faffdae1-f089-11dc-9cc6-001a4bd43ed6 (same one mentioned above), bp-3fce0e4a-f08c-11dc-83ef-001a4bd46e84, and bp-54fcbec7-f08c-11dc-b2c3-001a4bd43ed6.
For what it's worth, I tried the other way around 3.0 to 2.0.0.12 (wondering if I could get a 3.0 thanks to simultaneous use of updater and a fresh install on top), and I got a busted 2.0.0.12 since it (of course) doesn't know to remove the libraries that are new in 3.0 (brwsrcmp.dll), resulting in crashes in nsACString_internal::Assign: TB42544824, TB42544899.
So I guess we need to try to figure out how users are ending up in this situation (enough that it's the #6 topcrash on the first day of release, although it's since dropped to #16).
Summary: Crash [@ nsFrame::BoxReflow] on startup → Crash [@ nsFrame::BoxReflow] on startup when Fx 2.0 libraries not removed from install directory
I just discussed this briefly with Rob Strong. The discussion led to two breakpad/socorro questions:
1. are the files in the module list stored in the database as full pathnames and just shown in the UI as file basenames, or does the database only really have the basename? If the former, could we see some examples from some of these **Windows** incidents (perhaps cleaned up) so we could find out:
+ the name of the directory the install was done into (might tell us something about steps to repro -- e.g., whether it's Firefox, Minefield, etc.)
+ so that we could confirm that this is happening on Windows because of crossing of files within a single install directory rather than some condition that causes dlls from separate install directories to be used at the same time
2. whether it would be possible to make the install.log part of the data sent in the breakpad reports, at least temporarily, so that we could see if there were installation issues causing this and what they were. (Potentially this could be conditioned on something you could test at runtime, like whether there's an "xpcom_core.dll" in the Modules list.)
Comment 32•17 years ago
1. Only the leaf name is being stored in the database, but the dump files contain full pathnames. They get stripped out during processing, so it would currently be slightly tricky to get them back, but I think we could.
2. We could probably send install.log wholesale, but since these are startup crashes I guess we'd have to do it very early in startup. Not terrible, just have to use NSPR file methods. The crash reporter doesn't currently conditionally send any data after the crash; it's all set up beforehand. We don't actually look at the list of modules while we're submitting it, so it'd be easier just to send it all the time. We'd need a db change to store it, though.
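The two points in this exchange (full paths reduced to leaf names, and keying extra data on the presence of xpcom_core.dll) can be sketched in Python as follows. This is only an illustration of the idea, not how the crash reporter or the processing pipeline works; the function names and the sample paths are assumptions.

```python
import ntpath


def leaf_names(module_paths):
    """Strip Windows directory components, keeping only each file's leaf name."""
    return [ntpath.basename(path) for path in module_paths]


def should_attach_install_log(module_paths, marker="xpcom_core.dll"):
    """True when the marker library appears anywhere in the module list."""
    return marker.lower() in (name.lower() for name in leaf_names(module_paths))


# Hypothetical module paths for illustration only.
paths = [
    r"C:\Program Files\Minefield\firefox.exe",
    r"C:\Program Files\Minefield\xpcom_core.dll",
]

print(leaf_names(paths))                 # ['firefox.exe', 'xpcom_core.dll']
print(should_attach_install_log(paths))  # True
```

ntpath is used instead of os.path so that backslash-separated paths are split the same way regardless of the platform running the script.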
Comment 33•17 years ago
Any thoughts on how we can fix this? Fx3 installer changes, with checking/warnings about installing on top of an existing installation?
We could quietly disallow installation on top, provide a warning, or insist on retrying with another install location.
We are past the string deadline if we need to present the user with additional info, but this might deserve an exception. Marking late-l10n until we know more about how we want to address this. If it's late-l10n, the priority should move to P1 as well.
dveditz might also have some ideas on how we have approached this in the past...
Keywords: late-l10n
Comment 34•17 years ago
The installer already removes these files. The only way to get into this situation is to manually extract on top of an existing build.
Assignee
Comment 35•17 years ago
I think I see what is going on here for Win32 and should be able to fix it without a string change.
Assignee
Comment 36•17 years ago
Moving over to toolkit -> nsis installer
Component: General → NSIS Installer
Product: Core → Toolkit
QA Contact: general → installer
Target Milestone: --- → mozilla1.9beta5
Assignee
Updated•17 years ago
Assignee
Comment 38•17 years ago
note: should have a patch over this weekend.
Comment 39•17 years ago
(In reply to comment #30)
> I got a busted 2.0.0.12 since it (of course) doesn't know to remove
> the libraries that are new in 3.0 (brwsrcmp.dll)
We could fix that. If we expect some number of people to install an early 3.0 beta/preview and then downgrade until some site/addon/bug gets fixed we probably should. I'm pretty sure we did that with later versions of the FF1.5 installer.
Filed bug 423226 to add this to the FF2 installer; it should block 2.0.0.14 (unfortunately it's missed 2.0.0.13, unless we think this is respin-worthy and/or some other regression forces a respin). That's pretty late (it will miss beta 5, for example). Is this important enough to stop-ship 2.0.0.13? How common a crash is this?
Fixing the windows installer doesn't help on Mac or Linux of course, not sure what we could do there.
Depends on: 423226
Assignee
Comment 40•17 years ago
Adds delete on reboot support for files listed in removed-files.log that are deleted.
This is similar to the following for comparison purposes
http://lxr.mozilla.org/seamonkey/source/toolkit/mozapps/installer/windows/nsis/common.nsh#3710
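As a rough illustration of the behavior the patch description suggests, the Python sketch below tries to delete each listed file and collects the ones that cannot be removed right away so they can be handled at the next reboot. This is not the NSIS code from the patch. It assumes the patch targets files that fail to delete immediately (for example because they are in use), and the function names and file list are hypothetical.

```python
import os
import tempfile


def remove_listed_files(install_dir, removed_files, delete=os.remove):
    """Try to delete each listed file; return the paths that could not be deleted."""
    pending_on_reboot = []
    for rel_path in removed_files:
        path = os.path.join(install_dir, rel_path)
        if not os.path.exists(path):
            continue  # already gone, nothing to do
        try:
            delete(path)
        except OSError:
            # For example, the file is locked by a running process. A real
            # installer would register it for deletion at the next reboot.
            pending_on_reboot.append(path)
    return pending_on_reboot


def locked_delete(path):
    """Stand-in for os.remove that pretends jar50.dll is in use."""
    if os.path.basename(path) == "jar50.dll":
        raise PermissionError(path)
    os.remove(path)


install_dir = tempfile.mkdtemp()
for name in ("xpcom_core.dll", "jar50.dll"):
    open(os.path.join(install_dir, name), "w").close()

pending = remove_listed_files(
    install_dir, ["xpcom_core.dll", "jar50.dll", "myspell.dll"], delete=locked_delete
)
print([os.path.basename(p) for p in pending])  # ['jar50.dll']
```

Passing the delete function as a parameter keeps the sketch testable without needing a file that is genuinely locked.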
Attachment #310306 - Flags: review?(benjamin)
Assignee
Updated•17 years ago
Whiteboard: [has patch]
Comment 41•17 years ago
Comment on attachment 310306
patch rev 1
Boy, do I wish we had some installer unit-tests... rstrong, can you maybe file a bug about that and we can think about it after FF3 ships?
Attachment #310306 - Flags: review?(benjamin) → review+
Assignee
Comment 42•17 years ago
Filed bug 423754 for the NSIS installer unit tests
Assignee
Comment 43•17 years ago
Checked in to trunk. This should fix the recent rise in the Win32 crashes with trunk builds. bug 423226 is for the branch bug as mentioned in comment #39.
Checking in mozilla/toolkit/mozapps/installer/windows/nsis/common.nsh;
/cvsroot/mozilla/toolkit/mozapps/installer/windows/nsis/common.nsh,v <-- common.nsh
new revision: 1.35; previous revision: 1.34
done
Status: NEW → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
Whiteboard: [has patch]
As far as trunk topcrash data go: this signature was responsible for about 10-40 crashes per build ID in nightlies in the week leading up to this fix landing; in the 3 days since it landed there have been a total of 6 crashes. So it looks like it fixed most (although not quite all) of this problem.
Updated•13 years ago
Crash Signature: [@ nsFrame::BoxReflow]
Updated•1 year ago
Component: NSIS Installer → Installer
Product: Toolkit → Firefox