Closed Bug 602826 Opened 14 years ago Closed 14 years ago

Freshly homemade trunk builds (SeaMonkey, Minefield or Shredder) hang at startup (if built with gcc 4.5.x ?)

Categories: Core :: General, defect, P1
Hardware: All
OS: Linux
Type: defect
Status: VERIFIED FIXED
Target Milestone: mozilla2.0b7
Tracking Status: blocking2.0: beta8+
People: Reporter: fredbezies; Assigned: dbaron
Keywords: hang, regression
Attachments: 4 files, 1 obsolete file

User-Agent:       Mozilla/5.0 (X11; Linux x86_64; rv:2.0b7pre) Gecko/20100101 Firefox/4.0b7pre
Build Identifier: 

Could be related to bug 601676.

I tweaked the Python line in configure.in and added python2.7 in order to get the build process working.

But the problem is that the resulting builds don't work anymore. They just eat CPU (up to 101%) and do nothing else :(

Reproducible: Always

Steps to Reproduce:
1. See details in bug #601676.
Actual Results:  
Not working builds. Just eating CPU.

Expected Results:  
Working builds.

Official nightly builds work perfectly.
> Not working builds. Just eating CPU.

Can you attach a debugger and see what code is running?
OK. I'll do another build after adding "python2.7" to the supported Python versions in configure.in (cf. bug #601676) and tell you what happens.

A debug build would be better, I suppose?
Yes, of course.
So, here is the result when I try to run a debug build (infinite loop?):

###!!! ASSERTION: Computed overflow area must contain frame bounds: 'aNewSize.width == 0 || aNewSize.height == 0 || aOverflowAreas.Overflow(otype).Contains(nsRect(nsPoint(0,0), aNewSize))', file /home/fred/logs/fox/src/layout/generic/nsFrame.cpp, line 6092
###!!! ASSERTION: index out of range: '0 <= aIndex && aIndex < 2', file /home/fred/logs/fox/src/layout/generic/nsHTMLReflowMetrics.h, line 72
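The two assertions above encode simple invariants. The following is a rough, simplified C++ illustration of what they check, not Gecko's code. `Rect`, `OverflowCoversFrame`, `OverflowIndexInRange` and `OverflowAt` are hypothetical stand-ins for nsRect and the overflow-area accessor, and the edge-inclusive `Contains` is an assumption about the containment semantics.

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for Gecko's nsRect.
struct Rect {
  int x, y, width, height;
  int XMost() const { return x + width; }
  int YMost() const { return y + height; }
  // True when r lies entirely inside this rect (edges inclusive; assumed).
  bool Contains(const Rect& r) const {
    return x <= r.x && y <= r.y && r.XMost() <= XMost() && r.YMost() <= YMost();
  }
};

// First assertion: for a non-empty frame, each overflow area must cover the
// frame's own bounds, i.e. the rect at (0,0) with the frame's new size.
bool OverflowCoversFrame(const Rect& overflow, int width, int height) {
  return width == 0 || height == 0 ||
         overflow.Contains(Rect{0, 0, width, height});
}

// Second assertion: there are exactly two overflow types (visual and
// scrollable), so any index outside [0, 2) is a bug.
bool OverflowIndexInRange(int index) { return 0 <= index && index < 2; }

// Accessor over the pair of overflow rects, guarded like the original.
const Rect& OverflowAt(const Rect (&rects)[2], int index) {
  assert(OverflowIndexInRange(index) && "index out of range");
  return rects[index];
}
```

Frames with zero width or height are exempt from the first check. That is why the original condition short-circuits on `aNewSize.width == 0 || aNewSize.height == 0`.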
And before this infinite loop:

WARNING: 1 sort operation has occurred for the SQL statement '0x7f767be12d88'.  See https://developer.mozilla.org/En/Storage/Warnings details.: file /home/fred/logs/fox/src/storage/src/mozStoragePrivateHelpers.cpp, line 138
WARNING: dependent window created without a parent: file /home/fred/logs/fox/src/toolkit/components/startup/src/nsAppStartup.cpp, line 465
++DOCSHELL 0x7f76736df800 == 1
++DOMWINDOW == 1 (0x7f76736e0868) [serial = 1] [outer = (nil)]
WARNING: Subdocument container has no content: file /home/fred/logs/fox/src/layout/base/nsDocumentViewer.cpp, line 2403
WARNING: Context has no global.: file /home/fred/logs/fox/src/dom/base/nsJSEnvironment.cpp, line 2410
++DOMWINDOW == 2 (0x7f76736e3868) [serial = 2] [outer = 0x7f76736e0800]
WARNING: NS_ENSURE_TRUE(sf) failed: file /home/fred/logs/fox/src/docshell/base/nsDocShell.cpp, line 4913
WARNING: NS_ENSURE_TRUE(sf) failed: file /home/fred/logs/fox/src/docshell/base/nsDocShell.cpp, line 4913
WARNING: Subdocument container has no content: file /home/fred/logs/fox/src/layout/base/nsDocumentViewer.cpp, line 2403
LoadPlugin() /opt/java/jre/lib/amd64/libnpjp2.so returned 7f7672245400
WARNING: Unable to retrieve pref: plugins.unloadASAP: file /home/fred/logs/fox/src/modules/plugin/base/src/nsPluginHost.cpp, line 316
LoadPlugin() /usr/lib/mozilla/plugins/libtotem-cone-plugin.so returned 7f76722453d0
WARNING: Unable to retrieve pref: plugins.unloadASAP: file /home/fred/logs/fox/src/modules/plugin/base/src/nsPluginHost.cpp, line 316
LoadPlugin() /usr/lib/mozilla/plugins/libtotem-gmp-plugin.so returned 7f7672245790
WARNING: Unable to retrieve pref: plugins.unloadASAP: file /home/fred/logs/fox/src/modules/plugin/base/src/nsPluginHost.cpp, line 316
LoadPlugin() /usr/lib/mozilla/plugins/libtotem-mully-plugin.so returned 7f76722459a0
WARNING: Unable to retrieve pref: plugins.unloadASAP: file /home/fred/logs/fox/src/modules/plugin/base/src/nsPluginHost.cpp, line 316
LoadPlugin() /usr/lib/mozilla/plugins/libtotem-narrowspace-plugin.so returned 7f76734f71c0
WARNING: Unable to retrieve pref: plugins.unloadASAP: file /home/fred/logs/fox/src/modules/plugin/base/src/nsPluginHost.cpp, line 316
LoadPlugin() /usr/lib/openoffice/program/libnpsoplugin.so returned 7f76734f7400
WARNING: Unable to retrieve pref: plugins.unloadASAP: file /home/fred/logs/fox/src/modules/plugin/base/src/nsPluginHost.cpp, line 316
LoadPlugin() /usr/lib/mozilla/plugins/libflashplayer.so returned 7f76734f7700
WARNING: Unable to retrieve pref: plugins.unloadASAP: file /home/fred/logs/fox/src/modules/plugin/base/src/nsPluginHost.cpp, line 316
Chrome file doesn't exist: /home/fred/logs/fox/objdir-fx/dist/bin/chrome/toolkit/skin/classic/mozapps/update/update.png

$ hg identify
0983c1870159 tip

.mozconfig :

export AUTOCONF=autoconf-2.13

mk_add_options MOZ_OBJDIR=@TOPSRCDIR@/../objdir-fx
mk_add_options MOZ_MAKE_FLAGS=-j4

# Options for ‘configure’ (same as command-line options).
ac_add_options --enable-application=browser
ac_add_options --with-ccache
ac_add_options --enable-debug

configure.in, line 994:

MOZ_PATH_PROGS(PYTHON, $PYTHON python2.7 python2.6 python2.5 python2.4 python)

Hope it helps.

If you want gdb info, just tell me how to get it ;)
Got a backtrace (using the backtrace command in gdb). I will add it ASAP. Hope it helps!
backtrace I got using gdb.
Attachment #481851 - Attachment is obsolete: true
What's the backtrace from?  Is it to where the assert is hit?

How did you determine there's an infinite loop?
This backtrace is from the "infinite loop" problem.

It is the result of the startup problem I reported. I got it this way:

1) Started a debug build
2) Got a reference to do a backtrace from
3) Opened it in gdb and typed backtrace.

I hope that was the right way to do it.
That backtrace shows the build has crashed.  Comment 0 talks about 100% CPU usage...  Crashed processes usually don't use 100% CPU.
I know. But the 100% CPU usage is "random"; it could be because my Linux distribution is completely busted. Anyway, "the loop" refers to those assertions, which appear about 15 to 25 times before it stops beeping and gives me back control.
roc, any idea how we can hit those asserts?
Something to do with dbaron's overflow changes, I'm guessing. Maybe overflow font glyphs are triggering that infinite-reflow-loop bug in XUL stacks where they keep trying to adjust size to include the overflow of the children? I don't have the bug number handy...
Could you try the following (this should work given that it's a debug build):

create a file containing the text "* 1" (as one line)

set the environment variable GECKO_DISPLAY_REFLOW_RULES_FILE to the name of that file

start up the build, and pipe the output to a file

attach that file to this bug (perhaps compressed, if it ends up with gobs of output in an infinite loop)
Hmm, I guess we got rid of the code in nsFrame/nsBox that forced non-XUL frames in a XUL layout to expand to include their overflow. So that's not it.
(In reply to comment #15)
> Could you try the following (this should work given that it's a debug build):
> 
> create a file containing the text "* 1" (as one line)
> 
> set the environment variable GECKO_DISPLAY_REFLOW_RULES_FILE to the name of
> that file
> 
> start up the build, and pipe the output to a file
> 
> attach that file to this bug (perhaps compressed, if it ends up with gobs of
> output in an infinite loop)

Tried what you asked. It gave me an 840 KB errors.log file; bzip2 brings it down to 4 KB.
Not related to Python 3...

I switched to Frugalware, which has Python 2.7...

Weird.
OS: Linux → Windows CE
blocking2.0: --- → ?
Could it be a gcc 4.5.x bug? I can launch official nightlies, which are built with gcc 4.3.3...
Summary: Freshly homemade builds (minefield or shredder) cannot start on a python3 / python 2.7 enabled linux distribution. → Freshly homemade builds (minefield or shredder) cannot start if built with gcc 4.5.x ?
Could you get a stack for the first assertion ("index out of range")?  (And maybe some other occurrences of the "index out of range" assertion?)

There are two ways to do this:

 (1) set the environment variable XPCOM_DEBUG_BREAK=trap, and run in gdb, and use gdb's "bt" command every time it stops, and then "c" to continue

 (2) set the environment variable XPCOM_DEBUG_BREAK=stack, pipe the output to mozilla/tools/rb/fix-linux-stack.pl, and pipe the output of *that* to a file.  For example (in bash):
  XPCOM_DEBUG_BREAK=stack ./firefox 2>&1 | fix-linux-stack > output-file
(In reply to comment #20)
> Could it be a gcc 4.5.x bug ? I can launch official nightlies, based on gcc
> 4.3.3...

Yes, I think you're right. I just built with a gcc-4.6 snapshot (20101002)
and everything is fine again. (version 4.5.1 is bad here)
So all we need to do now is to narrow the problem down to a simple testcase
and file a bug on the gcc bugzilla...
That doesn't necessarily mean it's a gcc bug.  I'd like to see a stack for that assertion, though.
FWIW, my self-built SeaMonkey on the openSUSE 11.3 gcc 4.5 shows this problem, but apparently the SeaMonkey builds from our buildbots seem to work fine, being done on the Mozilla gcc 4.5.1 (with a fix for a js-ctypes problem that turned out to be a gcc bug) - as the test suites being run on those builds show, at least.
OS: Windows CE → Linux
(In reply to comment #22)
> Could you get a stack for the first assertion ("index out of range")?  (And
> maybe some other occurrences of the "index out of range" assertion?)
> 
> There are two ways to do this:
> 
>  (1) set the environment variable XPCOM_DEBUG_BREAK=trap, and run in gdb, and
> use gdb's "bt" command every time it stops, and then "c" to continue
> 
>  (2) set the environment variable XPCOM_DEBUG_BREAK=stack, pipe the output to
> mozilla/tools/rb/fix-linux-stack.pl, and pipe the output of *that* to a file. 
> For example (in bash):
>   XPCOM_DEBUG_BREAK=stack ./firefox 2>&1 | fix-linux-stack > output-file

Followed the last line.

Adding the log, bzip2-compressed.
In nsBlockFrame::ComputeOverflowAreas, if you change (near the end of the function) this:

    NS_FOR_FRAME_OVERFLOW_TYPES(otype) {
      nsRect& o = areas.Overflow(otype);
      o.height = NS_MAX(o.YMost(), bottomEdgeOfContents) - o.y;
    }

to this:

    nsRect& vo = areas.VisualOverflow();
    vo.height = NS_MAX(vo.YMost(), bottomEdgeOfContents) - vo.y;
    nsRect& so = areas.ScrollableOverflow();
    so.height = NS_MAX(so.YMost(), bottomEdgeOfContents) - so.y;

does that help?
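The suggested change replaces the loop over both overflow types with two explicit statements. Here is a minimal C++ sketch under simplifying assumptions. The `Rect`, `OverflowType` and `OverflowAreas` types are hypothetical. The `FOR_OVERFLOW_TYPES` macro's shape is assumed, not copied from nsHTMLReflowMetrics.h. The sketch shows that both forms compute the same heights, so the rewrite should not change behavior; it only removes the loop.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical, simplified stand-in for nsRect.
struct Rect {
  int x, y, width, height;
  int YMost() const { return y + height; }
};

// Fixed underlying type so that the terminating value 2 is a valid value.
enum OverflowType : int { eVisualOverflow = 0, eScrollableOverflow = 1 };

// Loop macro modeled on NS_FOR_FRAME_OVERFLOW_TYPES (shape assumed).
#define FOR_OVERFLOW_TYPES(var_)                      \
  for (OverflowType var_ = OverflowType(0); var_ < 2; \
       var_ = OverflowType(var_ + 1))

struct OverflowAreas {
  Rect rects[2];
  // Same bounds check as the "index out of range" assertion.
  Rect& Overflow(std::size_t aIndex) {
    assert(aIndex < 2 && "index out of range");
    return rects[aIndex];
  }
  Rect& VisualOverflow() { return rects[eVisualOverflow]; }
  Rect& ScrollableOverflow() { return rects[eScrollableOverflow]; }
};

// Original form: extend each overflow rect down to bottomEdge in a loop.
void ExtendLoop(OverflowAreas& areas, int bottomEdge) {
  FOR_OVERFLOW_TYPES(otype) {
    Rect& o = areas.Overflow(otype);
    o.height = std::max(o.YMost(), bottomEdge) - o.y;
  }
}

// Suggested form: the same work written out once per overflow type.
void ExtendUnrolled(OverflowAreas& areas, int bottomEdge) {
  Rect& vo = areas.VisualOverflow();
  vo.height = std::max(vo.YMost(), bottomEdge) - vo.y;
  Rect& so = areas.ScrollableOverflow();
  so.height = std::max(so.YMost(), bottomEdge) - so.y;
}
```

The sketch gives the enum a fixed underlying type (`: int`). Without one, an enum whose only enumerators are 0 and 1 has the value range 0..1, so converting 2 to it is out of range, and C++17 makes that undefined behavior. The thread itself does not say whether this is related to the gcc 4.5.x hang.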
In which file?
(In reply to comment #28)
> In nsBlockFrame::ComputeOverflowAreas, if you change (near the end of the
> function) this:
> 
>     NS_FOR_FRAME_OVERFLOW_TYPES(otype) {
>       nsRect& o = areas.Overflow(otype);
>       o.height = NS_MAX(o.YMost(), bottomEdgeOfContents) - o.y;
>     }
> 
> to this:
> 
>     nsRect& vo = areas.VisualOverflow();
>     vo.height = NS_MAX(vo.YMost(), bottomEdgeOfContents) - vo.y;
>     nsRect& so = areas.ScrollableOverflow();
>     so.height = NS_MAX(so.YMost(), bottomEdgeOfContents) - so.y;
> does that help?

Yes. Firefox starts normally, but unfortunately it segfaults as soon 
as one loads a complex website. (about:buildconfig is displayed fine,
for example, but clicking on the link to http://hg.mozilla.org results
in a segmentation fault.)
(In reply to comment #29)
> In which file ?

layout/generic/nsBlockFrame.cpp line 1491
(In reply to comment #25)
> FWIW, my self-built SeaMonkey on the openSUSE 11.3 gcc 4.5 shows this problem,
> but apparently the SeaMonkey builds from our buildbots seem to work fine, being
> done on the Mozilla gcc 4.5.1 (with a fix for a js-ctypes problem that turned
> out to be a gcc bug) - as the test suites being run on those builds show, at
> least.

I'm seeing this both on SeaMonkey 2.1b2pre builds (and on the 2.1b1pre builds dated Oct.7 but not Oct.6) downloaded from ftp.mozilla.org and also on homemade builds (which take approx. 24 hours to compile); in addition, my Python version is "only" 2.6.5

In reply to comment #20, my gcc version is:
gcc (SUSE Linux) 4.5.0 20100604 [gcc-4_5-branch revision 160292]
Copyright (C) 2010 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Since dbaron wants to keep my bug 602829 duped to here (see comment #14 and comment #21), the Hardware should not remain x86_64 only (I'm on i686 32-bit).

When starting as "./seamonkey/seamonkey -no-remote -ProfileManager" the profile manager does not even come up.
Hardware: x86_64 → All
Summary: Freshly homemade builds (minefield or shredder) cannot start if built with gcc 4.5.x ? → Freshly homemade trunk builds (SeaMonkey, Minefield or Shredder) hang at startup (if built with gcc 4.5.x ?)
Whiteboard: [started happening between 2010-10-06 and 2010-10-07]
Version: unspecified → Trunk
Severity: normal → critical
Keywords: hang, regression
Blocks: 542595
Had it again with the "official" SeaMonkey Oct.9 nightly from ftp.mozilla.org.

Tried it once with -browser, then with -mail, then with -chat: each of them hung without displaying anything, requiring a kill -15. (Some disk movement noise at the start of the first run, which soon ceased.) Then I reinstalled the Oct. 6 nightly, which started like a breeze (with browser and mailer, as per my Preferences for that profile).
If anyone is interested, I have a stdout+stderr log from the script listed below for a SeaMonkey trunk build with Build ID 20101008121130. It is bulky (full or almost-full build including a reconfigure: 8988420 bytes) so I'm not attaching it to this bug; but maybe someone could deduce something from the warnings put in it by gcc 4.5.0. Send me a mail if you want a copy.

#!/bin/bash
export AJM_OBJDIR='obj-i686-pc-linux-gnu'
date && \
echo 'python client.py checkout' && \
python client.py checkout && \
date && \
echo 'make -f client.mk build' && \
make -f client.mk build && \
test -n "$AJM_OBJDIR" && test -d "$AJM_OBJDIR" && \
date && \
echo "make -C $AJM_OBJDIR package" && \
make -C "$AJM_OBJDIR" package
echo 'Exit status' $?
date
I've been trying to build gcc 4.5.1 on my Ubuntu 10.04 machine, but the build keeps failing while linking libstdc++.la.  (I tried both with nearly-default options and with options much more like what Ubuntu 10.04 used to build their gcc.)
(In reply to comment #36)
> I've been trying to build gcc 4.5.1 on my Ubuntu 10.04 machine, but the build
> keeps failing while linking libstdc++.la.  (I tried both with nearly-default
> options and with options much more like what Ubuntu 10.04 used to build their
> gcc.)

Maybe it would be easier to quickly set up an Arch qemu (kvm) image?
 
Arch uses gcc 4.5.1 at the moment with the following options:
% gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-unknown-linux-gnu/4.5.1/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: ../configure --prefix=/usr --enable-languages=c,c++,fortran,objc,obj-c++,ada --enable-shared --enable-threads=posix --enable-__cxa_atexit --enable-clocale=gnu --enable-gnu-unique-object --enable-lto --enable-plugin --disable-multilib --disable-libstdcxx-pch --with-system-zlib --with-ppl --with-cloog --libdir=/usr/lib --libexecdir=/usr/lib --mandir=/usr/share/man --infodir=/usr/share/info
Thread model: posix
gcc version 4.5.1 (GCC)
On #developers, <Anarchy> tells me I need to uninstall gold in order to build gcc.
Changing the symlink to /usr/bin/ld should suffice.
or just:

export LD=/usr/bin/ld.bfd
export LD_FOR_BUILD=/usr/bin/ld.bfd
export LD_FOR_TARGET=/usr/bin/ld.bfd

and then I won't need to change my configuration that's set up to build Firefox quickly.

That said, I actually needed more than that; I also needed --with-ld=/usr/bin/ld.bfd in order to get the gcc to work correctly, and I needed to set LD in my mozconfig as well in addition to setting CXX and CC.

That said, I now have a build that has the hang (although it actually doesn't assert first).
Attached patch: patch
Of the things that worked, this seems the least painful.

(The other one that worked was having a |prev| variable inside the loop in NS_FOR_FRAME_OVERFLOW_TYPES, and checking prev != 1.)
Attachment #482403 - Flags: review?(roc)
Assignee: nobody → dbaron
blocking2.0: ? → beta8+
Priority: -- → P1
Target Milestone: --- → mozilla2.0b8
http://hg.mozilla.org/mozilla-central/rev/e84f3fb9fd56
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Mozilla/5.0 (X11; Linux i686; rv:2.0b8pre) Gecko/20101012 Firefox/4.0b8pre SeaMonkey/2.1b2pre - Build ID: 20101012010401

Previously I was seeing the bug in every SeaMonkey build including those from ftp.mozilla.org -- I VERIFY that this nightly does not show me the bug.

Please check other applications (e.g. Firefox) and/or platforms (e.g. x86_64) as appropriate.
No more problem for me. Answering from a yesterday homemade build of minefield, linux x86_64
IIUC, comments #44 and #45 cover the range of apps, platforms and build methods where this bug was seen => VERIFIED.
Status: RESOLVED → VERIFIED
Target Milestone: mozilla2.0b8 → mozilla2.0b7