Closed Bug 57051 Opened 24 years ago Closed 23 years ago

Can't run under gdb

Categories: Core :: XPCOM, defect, P3 (x86, Linux)

Tracking: RESOLVED WORKSFORME

People: Reporter: akkzilla, Assigned: blizzard

Details: Keywords: helpwanted

On the trunk, I can no longer run in gdb.  I see:

> Note: verifyreflow is disabled
> Note: styleverifytree is disabled
> Note: frameverifytree is disabled
> [New Thread 3076 (runnable)]
> 
> Program received signal SIGTRAP, Trace/breakpoint trap.
> [Switching to Thread 2051 (runnable)]
> 0x0 in ?? ()

and I can't continue past that point.

Dbaron says:

> Reverting xpcom/threads/nsThread.cpp to revision 1.26 and
> xpcom/threads/nsThread.h to revision 1.15 (i.e., reverting dougt's
> checkin at 2000-09-30 22:35) fixes this problem for me.  (I figured
> this out by doing a binary search of release builds, which also show
> the problem.)
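dbaron's search over release builds is an ordinary binary search over build dates. Below is a minimal bash sketch of the idea, not his actual procedure. It assumes the builds are listed oldest first, that the oldest one works and the newest one fails, and that once the problem appears every later build has it too. The check command is also an assumption: for example, a hypothetical `runs-under-gdb.sh` that starts one build under gdb and exits 0 if it gets past startup. Nothing like it appears in this bug.

```bash
#!/bin/bash
# first_bad_build CHECK BUILD...
# Binary search over builds listed oldest first. Assumes the first BUILD is
# good, the last BUILD is bad, and the failure persists in every later build.
# CHECK is a (hypothetical) command that exits 0 when the given build works.
first_bad_build() {
  local check=$1
  shift
  local builds=("$@")
  local lo=0                          # index of a known-good build
  local hi=$(( ${#builds[@]} - 1 ))   # index of a known-bad build
  while (( hi - lo > 1 )); do
    local mid=$(( (lo + hi) / 2 ))
    if "$check" "${builds[mid]}"; then
      lo=$mid                         # regression is after mid
    else
      hi=$mid                         # regression is at or before mid
    fi
  done
  echo "${builds[hi]}"                # first build that fails CHECK
}
```

For example, `first_bad_build ./runs-under-gdb.sh 2000-09-29 2000-09-30 2000-10-01 2000-10-02` would print the date of the first failing build. Each probe halves the remaining range, so n candidate builds need only about log2(n) runs under gdb.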

I'm running gdb-5.0-0 and glibc-2.1.3-15 on a fairly stock RH 6.2 system
(though the gdb isn't stock).

Marking critical because not being able to debug at all is a very serious
regression.

I've been seeing this too lately. The startup looks like this with gdb 4.18:

(gdb) run
Starting program: /usr/local/mozilla/mozilla-bin
(no debugging symbols found)...(no debugging symbols found)...(no debugging
symbols found)...
(no debugging symbols found)...(no debugging symbols found)...(no debugging
symbols found)...
(no debugging symbols found)...[New Thread 20406 (manager thread)]
[New Thread 20404 (initial thread)]
[New Thread 20407]
 I am inside the initialize
 Hey : You are in QFA Startup
(QFA)Talkback loaded Ok.
[New Thread 20408]
Cannot access memory at address 0x2e362e6f
(gdb)

It's always the same address.

bryner and I also see this. dougt said he didn't.

Here are some machine details I compiled via email and IRC:

dbaron & bryner:
 * RH 7.0 plus all errata (dbaron tried with and without glibc errata)
 * gdb-5.0-7
 * glibc-2.1.92-14 and glibc-2.1.94-3
 * binutils-2.10.0.18-1

dbaron:
 * kernel 2.4.test10.pre2
 * SMP (dual P-733)
bryner:
 * kernel 2.2
 * not SMP

akkana & dougt:
 * gdb 5
 * glibc-2.1.3-15
 * binutils-2.9.5.0.22-6

akkana:
 * compat-binutils-5.2-2.9.1.0.23.1

R.K.Aa's comments make me think that this happens when debugging optimized
builds. akk, are you trying to debug an optimized build?

For reference, the output I saw from gdb was:

Note: verifyreflow is disabled
Note: styleverifytree is disabled
Note: frameverifytree is disabled
[New Thread 3076 (LWP 30351)]
ptrace: No such process.
(gdb)

I'm not debugging an optimized build. However, the problem *also* showed up in
release builds for me; that's how I quickly narrowed it down to a half-day
period and figured out it was your checkin.

No, I'm seeing this with my own debug builds.

This is a gdb bug. There's nothing we can do about it in mozilla.
I'm going to mark this INVALID in 3...2...1...
Status: NEW → RESOLVED
Closed: 24 years ago
Resolution: --- → INVALID

What is the bug? Is there a gdb bug number?

It may not be a gdb bug, since people who see the problem and people who don't
are running the same version of gdb. It could be a glibc bug, or something
else. But we do need to figure out which...

Reopening. We need to be able to debug mozilla, and this used to work (and
still does work on the branch).

If we could find a version of gdb that does work to debug mozilla, that would be
a perfectly adequate resolution; but just saying "Oh, well, try to do your
debugging with printf" isn't enough.
Status: RESOLVED → REOPENED
Resolution: INVALID → ---
bruce pointed me at something earlier today:
http://sources.redhat.com/ml/gdb/2000-10/msg00016.html

Basically, gdb is really flaky when it comes to mixing threads and DLLs loaded
at runtime.

That's my message to the gdb mailing list.

This message is about using a GUI to run gdb.
Threads were an issue but only in the sense that the gdb
code needed to service the event loop when the app
was threaded.

I was working on getting Insight (a GUI for gdb) to work on
mozilla. The problem was that gdb went to sleep waiting for the
app (mozilla in my case) to stop. But to stop mozilla I wanted
to click the "Stop" button, and the UI was frozen.

After I figured out how to provide the appropriate info, this was
fixed.

I just updated my tree and it is not crashing.

Is there something one does to make the crash happen?

Err... I run from the console, simply "mozilla -g -d gdb".
That approach worked just fine until quite recently.

I run from rxvt, usually as "gdb mozilla-bin", but I just tried mozilla -g and
mozilla -g -d gdb, and saw the same problem both times.

I built the trunk and it doesn't happen for me.

So I'm copying Akkana's tree to my system and trying that.

When I run my copy of Akkana's code on Akkana's system, it crashes.
When I run it on my system, it does not crash.
(It is NFS-mounted on both systems.)
I checked that the loaded libraries are the same
(the statically linked code must be the same, since I'm running the same program):

[guitar]$ LD_LIBRARY_PATH=. ldd mozilla-bin 
        libgkgfx.so => ./libgkgfx.so (0x40015000)
        libxpcom.so => ./libxpcom.so (0x4006d000)
        libmozjs.so => ./libmozjs.so (0x401a9000)
        libjsj.so => ./libjsj.so (0x40257000)
        libplds4.so => ./libplds4.so (0x40278000)
        libplc4.so => ./libplc4.so (0x4027c000)
        libnspr4.so => ./libnspr4.so (0x40281000)
        libpthread.so.0 => /lib/libpthread.so.0 (0x402cd000)
        libjprof.so => ./libjprof.so (0x402e0000)
        libnsl.so.1 => /lib/libnsl.so.1 (0x402e4000)
        libutil.so.1 => /lib/libutil.so.1 (0x402fa000)
        libresolv.so.2 => /lib/libresolv.so.2 (0x402fd000)
        libdl.so.2 => /lib/libdl.so.2 (0x4030c000)
        libstdc++-libc6.1-1.so.2 => /usr/lib/libstdc++-libc6.1-1.so.2 (0x40310000)
        libm.so.6 => /lib/libm.so.6 (0x40352000)
        libc.so.6 => /lib/libc.so.6 (0x40370000)
        /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)
[guitar]$ sum -r /lib/libpthread.so.0 /lib/libnsl.so.1 /lib/libutil.so.1 /lib/libresolv.so.2 /lib/libdl.so.2 /usr/lib/libstdc++-libc6.1-1.so.2 /lib/libm.so.6 /lib/libc.so.6 /lib/ld-linux.so.2
25259   284 /lib/libpthread.so.0
11717   362 /lib/libnsl.so.1
16688    46 /lib/libutil.so.1
64731   166 /lib/libresolv.so.2
38159    74 /lib/libdl.so.2
62678  1118 /usr/lib/libstdc++-libc6.1-1.so.2
12635   516 /lib/libm.so.6
15111  4006 /lib/libc.so.6
24886   333 /lib/ld-linux.so.2
[guitar]$ 

[accipiter]$ LD_LIBRARY_PATH=. ldd mozilla-bin 
        libgkgfx.so => ./libgkgfx.so (0x40015000)
        libxpcom.so => ./libxpcom.so (0x4006d000)
        libmozjs.so => ./libmozjs.so (0x401a9000)
        libjsj.so => ./libjsj.so (0x40257000)
        libplds4.so => ./libplds4.so (0x40278000)
        libplc4.so => ./libplc4.so (0x4027c000)
        libnspr4.so => ./libnspr4.so (0x40281000)
        libpthread.so.0 => /lib/libpthread.so.0 (0x402cd000)
        libjprof.so => ./libjprof.so (0x402e0000)
        libnsl.so.1 => /lib/libnsl.so.1 (0x402e4000)
        libutil.so.1 => /lib/libutil.so.1 (0x402fa000)
        libresolv.so.2 => /lib/libresolv.so.2 (0x402fd000)
        libdl.so.2 => /lib/libdl.so.2 (0x4030c000)
        libstdc++-libc6.1-1.so.2 => /usr/lib/libstdc++-libc6.1-1.so.2 (0x40310000)
        libm.so.6 => /lib/libm.so.6 (0x40352000)
        libc.so.6 => /lib/libc.so.6 (0x40370000)
        /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)
[accipiter]$ sum -r /lib/libpthread.so.0 /lib/libnsl.so.1 /lib/libutil.so.1 /lib/libresolv.so.2 /lib/libdl.so.2 /usr/lib/libstdc++-libc6.1-1.so.2 /lib/libm.so.6 /lib/libc.so.6 /lib/ld-linux.so.2
25259   284 /lib/libpthread.so.0
11717   362 /lib/libnsl.so.1
16688    46 /lib/libutil.so.1
64731   166 /lib/libresolv.so.2
38159    74 /lib/libdl.so.2
62678  1118 /usr/lib/libstdc++-libc6.1-1.so.2
12635   516 /lib/libm.so.6
15111  4006 /lib/libc.so.6
24886   333 /lib/ld-linux.so.2
[accipiter]$ 

uname -a

Linux guitar.mcom.com 2.2.14-5.0smp #1 SMP Tue Mar 7 21:01:40 EST 2000 i686 unknown

Linux accipiter.mcom.com 2.2.14-5.0smp #1 SMP Tue Mar 7 21:01:40 EST 2000 i686 unknown
[guitar]$ gtk-config --version
1.2.6
[guitar]$ 

[accipiter]$ gtk-config --version
1.2.6
[accipiter]$ 
[guitar]$ gdb -v
GNU gdb 5.0
Copyright 2000 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux".
[guitar]$ 


[accipiter]$ gdb -v
GNU gdb 20001004
Copyright 2000 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu".
[accipiter]$ 

[guitar]$ gcc --version
egcs-2.91.66
[guitar]$ 

[accipiter]$ gcc --version
egcs-2.91.66
[accipiter]$ 
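The comparison above (same ldd output and same `sum -r` checksums on guitar and accipiter) can be scripted. This is a minimal bash sketch, assuming `ldd`, `awk`, `xargs`, and `sum` are available and that it is run from the directory holding mozilla-bin (hence `LD_LIBRARY_PATH=.`, as in the transcripts). The function names and the `.sums` file names are illustrative only. Like the manual check, it skips the `./lib*.so` entries, since those come from the same NFS-mounted tree on both machines.

```bash
#!/bin/bash
# lib_paths: read `ldd` output on stdin and print each library that resolved
# to an absolute path (system libraries such as /lib/libc.so.6).
# Relative entries like ./libxpcom.so are skipped on purpose.
lib_paths() {
  awk '$2 == "=>" && $3 ~ /^\// { print $3 }'
}

# lib_checksums BINARY: run `sum -r` over every system library BINARY loads,
# in the same style as the transcripts above.
lib_checksums() {
  LD_LIBRARY_PATH=. ldd "$1" | lib_paths | xargs sum -r
}
```

Running `lib_checksums mozilla-bin > guitar.sums` on one machine and `lib_checksums mozilla-bin > accipiter.sums` on the other, then `diff guitar.sums accipiter.sums`, repeats the check in one step; no diff output means the system libraries match.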

blizzard, can you look at this?
Assignee: dougt → blizzard
Status: REOPENED → NEW
I've built a recent (10/4) insight/gdb.

When I run it (uninstalled) on Akkana's machine guitar, it does not crash, but
gdb still has a problem.


People are free to install from my build.
Note: as a side effect, it will provide a GUI front end to gdb.
Note: it will install in /usr/local/bin/gdb, so the copy in /usr/bin/gdb
      will still be there. You will need to have the new version first
      on your PATH.

To install:
cd to /u/bstell/downloads/insight/insight+dejagnu-20001004
and, as root, run make install
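Because the new gdb lands in /usr/local/bin while /usr/bin/gdb stays in place, the shell runs whichever copy is found first on PATH. A small bash helper can move a directory to the front of PATH; this is a sketch, and the name `prepend_path` is illustrative, not part of any existing setup.

```bash
#!/bin/bash
# prepend_path DIR: move DIR to the front of PATH (dropping any existing
# copies) so that, e.g., /usr/local/bin/gdb shadows /usr/bin/gdb.
prepend_path() {
  local dir=$1
  local rest=":$PATH:"
  # Remove every occurrence of DIR; loop because adjacent copies share a colon.
  while [[ $rest == *":$dir:"* ]]; do
    rest=${rest//":$dir:"/":"}
  done
  rest=${rest#:}   # strip the helper colons added above
  rest=${rest%:}
  PATH="$dir${rest:+:$rest}"
  export PATH
}
```

After `prepend_path /usr/local/bin`, running `command -v gdb` should report /usr/local/bin/gdb if the install succeeded.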

If you did something in mozilla that breaks gdb, it's a gdb bug. Period. You
shouldn't be able to crash the debugger. For what it's worth, I don't have any
problems. You can also try breaking at main, turning off shared-library
loading, letting it finish running, and loading the libraries later. That makes
it hard to debug certain kinds of bugs, but it does help some.
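The workaround blizzard describes (stop at main with automatic shared-library loading off, then load only the libraries you need) can be captured in a gdb command file and run with `gdb -x FILE`. The bash sketch below writes such a file. The gdb commands in it (`set auto-solib-add`, `break`, `run`, `sharedlibrary`, `continue`) are the standard ones used elsewhere in this bug; the helper name, file path, and library regexes (`thread` and `xpcom`, as in dougt's session later in this bug) are illustrative. It is not the `prun` macro from debugging-hints.html, whose contents aren't reproduced here.

```bash
#!/bin/bash
# write_delayed_solib_gdb FILE REGEX...
# Write a gdb command file that stops at main with automatic shared-library
# symbol loading turned off, then loads symbols only for libraries matching
# each REGEX before continuing.
write_delayed_solib_gdb() {
  local out=$1
  shift
  {
    echo "set auto-solib-add 0"   # don't read symbols for every library
    echo "break main"
    echo "run"
    local re
    for re in "$@"; do
      echo "sharedlibrary $re"    # load symbols for matching libraries only
    done
    echo "continue"
  } > "$out"
}
```

Usage would look like `write_delayed_solib_gdb /tmp/moz.gdb thread xpcom` followed by `gdb -x /tmp/moz.gdb ./mozilla-bin`.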

I've tried various combinations of loading and not loading shared libraries,
and I see the same symptoms.

Whatever the gdb bug is (or bug in something gdb uses), it only shows up on some
people's machines.  If we're going to report a bug against gdb, it might be good
to have some idea of what the conditions are, although I guess I could create an
account on my machine for someone to debug on...

I built with the "--prefix" option so I could run it without installing it.
However, this means it won't install.

I'm rebuilding without the "--prefix".

There's a misperception here. gdb doesn't crash; it starts up mozilla, then
mozilla dies with either a bad memory reference or a SIGTRAP (neither of which
happens when mozilla is run outside of gdb).  It's probably a race condition of
some sort, which would explain why it happens on some machines and not others,
and why it happens in slightly different ways on different machines.

I run with prun (from mozilla's debugging-hints.html), so I already am breaking
at main and delaying loading libraries, because that's the only way I've ever
managed to get mozilla to run under gdb5 and RH6.2, even on my huge-memory
machine.  (I used to be able to run directly under gdb4 and RH6.0.)

It looks like one cannot install off an NFS partition.
I would guess that the install creates a temp file, but
root (you need to be root to install) cannot create
that file on the NFS mount, so the install fails.

Akkana upgraded her gdb (to insight/gdb), and mozilla on the trunk no longer
appears to crash.

get insight here: http://sources.redhat.com/insight/
get gdb here: http://sources.redhat.com/gdb/

Life is good. Insight is pretty cool, too. Check it out!

We should add this info to the Mozilla debugging faq (I can do that, unless bliz
would rather do it).

sea linux 2000102021:
I can again run "mozilla -g -d gdb".
An additional "run" now starts mozilla just fine, as it did before this
weirdness (still using RH6.2 gdb-4.18-11). Just fine, that is, apart from a
line during startup:
warning: find_solib: Can't read pathname for load map: Input/output error

I didn't notice this earlier; perhaps it doesn't always display.

I encountered the same problem with gdb-5.0 from ftp.gnu.org, compiled on RH6.2.
After I grabbed the CVS tree from :pserver:anoncvs@anoncvs.cygnus.com:/cvs/src:
cvs -z9 -d :pserver:anoncvs@anoncvs.cygnus.com:/cvs/src co gdb dejagnu
everything is OK (at least with the 22 Oct 2000 CVS).
So I think it's a gdb bug.

Can someone with a working gdb put his/her binary (or a source tarball) on the
web, please? I can't believe that everyone is supposed to pull the gdb sources
from CVS; it's so sloooowwwww :(

OK, sorry, I yelled at the wrong people. The problem is that the German FTP
mirror is not up to date. Adding a working URL to the gdb snapshot tarball.

People probably want to talk to the gdb folks (or Red Hat folks, etc.) to figure
out how to get a pre-built gdb.

get insight here: http://sources.redhat.com/insight/
get gdb here: http://sources.redhat.com/gdb/

Since upgrading gdb seems to solve this issue, should we close this bug?

Since it isn't even a problem with OLD gdb anymore, I think it's safe to close
this bug, yes.

It isn't? I still can't run under the gdb 5.0 that comes with RH7:

Note: verifyreflow is disabled
Style Data Sharing is Enabled :)
Note: styleverifytree is disabled
Note: frameverifytree is disabled
[New Thread 3076 (LWP 21385)]
[New Thread 4101 (LWP 21386)]
[New Thread 5126 (LWP 21387)]
ptrace: No such process.
(gdb) c
Continuing.
ptrace: No such process.
[Switching to Thread 3076 (unknown thread_db state 1)]
0x0 in ?? ()

Subsequent continues just repeat the same messages, except for the "[switching
to thread" message, which isn't repeated.

Sure would be nice if there were a binary install of gdb available somewhere
which could debug mozilla.

I am using RH7 too, with its version of gdb. I am not having any problems.

How are you starting mozilla?

I basically do this in emacs:

gdb ~/mozilla-bin
br main
cont
shar thread
shar xpcom
cont

Aha, you're right, it does work if I set auto-solib-add 0 and then do prun.
It's just a regular run that doesn't work now (even on a huge-memory machine), i.e.:

gdb mozilla-bin
(gdb) run

The gdb in Red Hat Rawhide seems to have the necessary fixes:

ftp://rpmfind.net/linux/rawhide/1.0/i386/RedHat/RPMS/

Indeed, the rawhide RPM for gdb 5.0 does let me run, even without delayed shlib
loading. However, once it runs it's not very useful: it randomly skips lines
when single-stepping with "next" (usually the lines I most want to step into),
and it randomly loses track of where it is in my source file and starts printing
line numbers from nsCOMPtr.cpp or somewhere else in xpcom even when it's still
stepping through source lines that live in some other file. The latest gdb
built from CVS has the same problem.

akkana, are you still on an RH 6.x box running that rawhide RPM?

No, this is RH 7.0.

UGH. This bit me this morning. I am not sure what gives. Upgrading to the gdb
at the URL that bryner suggested does not help :-(

This is what I did:

(gdb) prun
Breakpoint 1 at 0x8054de7: file nsAppRunner.cpp, line 1216.
Warning: MOZILLA_FIVE_HOME not set.
main (argc=2, argv=0xbffffab4) at nsAppRunner.cpp:1216
[New Thread 1024 (LWP 2022)]
(gdb) shar pthr
Symbols already loaded for /lib/libpthread.so.0
(gdb) c
Continuing.
Warning: MOZILLA_FIVE_HOME not set.
Type Manifest File: /home/builds/cmonkey/mozilla/dist/bin/components/xpti.dat
nsNativeComponentLoader: autoregistering begins.
nsNativeComponentLoader: autoregistering succeeded
nNCL: registering deferred (0)
 --- > Buffered registry read fs hits (31)
[New Thread 2049 (LWP 2025)]
[New Thread 1026 (LWP 2026)]
 --- > Buffered registry read fs hits (32)
GFX: dpi=96 t2p=0.0666667 p2t=15 depth=16
WEBSHELL+ = 1
[New Thread 2051 (LWP 2027)]
********** Got plugins path: /home/builds/cmonkey/mozilla/dist/bin/plugins
Note: verifyreflow is disabled
[New Thread 3076 (LWP 2028)]
Note: styleverifytree is disabled
ptrace: No such process.
(gdb) 

Any ideas?

dougt: try building from gdb's CVS tip. The URL that bstell pasted earlier
can get you to instructions on how to do that.

I tried build 5.0-11, which works better, but it still gets confused during
thread switches.

With gdb 20010102, I get continuable SIG32 breaks:

(gdb) run
Starting program: /project/omega/mozilla/mtrunk-0105/./mozilla-bin 
(no debugging symbols found)...(no debugging symbols found)...
(no debugging symbols found)...(no debugging symbols found)...
(no debugging symbols found)...(no debugging symbols found)...

Gdk-WARNING **: locale not supported by C library
(no debugging symbols found)...
Program received signal SIG32, Real-time event 32.
0x4024bb6e in __sigsuspend (set=0xbfffdc4c)
    at ../sysdeps/unix/sysv/linux/sigsuspend.c:48
48      ../sysdeps/unix/sysv/linux/sigsuspend.c: No such file or directory.
        in ../sysdeps/unix/sysv/linux/sigsuspend.c
(gdb) cont
Continuing.

dmose: Thanks, I had indeed forgotten about that, but I was talking about
nightlies here, and there a "b main" gives: Function "main" not defined.
So I just start them with "(gdb) run", and that causes two or four SIG32 breaks
before a mozilla window (profile manager) comes up, namely between 
	Gdk-WARNING **: locale not supported by C library
and
	Registering plugin 0 for: "*","All types",".*"
On exit, there's the same problem, and sometimes during the run, too. But these
may all be signs of subtle differences between Linux versions (I'm running on
SuSE 6.2/6.4 here). Debugging a self-made debug build doesn't show this problem.

I just started seeing this problem for the first time with a CVS debug build
with build date 2001-03-01 15:58:53 PST. In my case, the following steps
make it happen reproducibly:
1. Start mozilla using either mozilla -g (I have ddd) or
   export LD_LIBRARY_PATH=.
   gdb mozilla-bin
   The results are the same either way.
2. b main
3. r
4. set auto-solib-add 0
5. c

I see:
Program received signal SIGTRAP, Trace/breakpoint trap.
[Switching to Thread 3076 (runnable)]
0x0 in ?? ()
(gdb) 

I was able to debug using exactly this methodology without problem until 
recently. I have gdb 5.0 and gcc 2.95.2. I mention gcc because I only started
to see this problem after the upgrade of gcc from 2.91 to 2.95.2 and full
recompile of mozilla using it, but maybe that is just a coincidence. I get 
the problem with gdb itself compiled with gcc 2.91 or 2.95.2. The gcc business 
is probably just a bad tree I've been barking up.
This is a RH 6.2 system (but with kernel 2.4.2).
Still the original glibc-2.1.3-15. 
This has me absolutely at a standstill since I can't debug at all now. Based on 
reading all the prior comments, I guess I have to find a newer-than-5.0 gdb.

I have made some progress by getting a recent snapshot gdb (last weekly).
However, this is not entirely satisfactory. First, there is some sort of
incompatibility with ddd. When I attempt to do a 'set env', I get 2 alerts:
"GDB is busy", and "GDB terminated abnormally". The latter one has a 
"Restart GDB" button. If I click that, I am able to go ahead and type
for example 'set env XPCOM_BREAK_ON_LOAD msgnews'
this time without error (!?). But on most (but not all) attempts, after it has
stopped (or sometimes before) at the load of libmsgnews.so and I have 
successfully set my breakpoint, everything grinds to a halt with:
  Cannot find thread 33: invalid thread handle.
This last error occurs less often if I just run gdb from the command line, but
it still happens sometimes even then. Once it does, I'm dead.
Is there a specific gdb version that actually works reliably on mozilla?

Seems like this is a fine collection of several gdb problems.

I do get the "ptrace: No such process" message.
Perhaps it is helpful for you to know that this ONLY happens with a
self-compiled version taken from CVS, while the precompiled versions from
mozilla.org work fine.
I do a configure with the following options:
./configure --enable-strip-libs --enable-optimize --disable-debug --disable-pedantic
and "-O2" is set via CXXFLAGS and friends. The compiler is gcc 2.95.2, gdb is
version 5.0, and I am using a SuSE 7.1 Linux distribution.

Please note that this bug prevents me from getting good stack traces, which are
quite useful for Bugzilla.

rumstich: I suspect --enable-strip-libs might be what's causing your problem.
Try removing that and see if it makes any difference.

You may also want to try a post-5.0 gdb. I'm running a 04/01/2001 gdb.
"info threads" still comes up empty, but otherwise it seems to work (SuSE 6.2).

As a side note (probably unrelated): if your gdb stops on SIG32 real-time
events, adding a "handle SIG32 nostop" line to ~/.gdbinit may help.
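To make that setting stick without piling up duplicate lines each time it is tried, a small bash helper can append a line to ~/.gdbinit only if it is not already there. This is a sketch; the helper name is illustrative, and the optional second argument (an alternate file) exists mainly so it can be tried on a scratch file first.

```bash
#!/bin/bash
# add_gdbinit_line LINE [FILE]: append LINE to FILE (default ~/.gdbinit)
# unless an identical line is already present.
add_gdbinit_line() {
  local line=$1
  local file=${2:-$HOME/.gdbinit}
  touch "$file"
  # -q: quiet, -x: match the whole line, -F: fixed string (no regex)
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}
```

For the case above: `add_gdbinit_line 'handle SIG32 nostop'`.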
<aol>me too</aol>

Seriously, this is blocking me from getting any stack traces out of my own debug
builds. I should note that there's some vaguely useful info out there if you
search Google for "ptrace no such process."

Upping to blocker, per trudelle. Adding CCs.
Severity: critical → blocker

dan:
I just found out that I am stupid :-/
I just tested one of the latest binary downloads and it did not work either; it
also showed the ptrace problem. It seems that since I installed the new SuSE a
few months ago, I never downloaded a binary Mozilla (with a slow modem it takes
quite a while).
OK, the debugging works on SuSE 6.4, but not on SuSE 7.1. I guess that one of
the newer things broke this feature. After reading this bug I think it is
gdb 5.0; SuSE 6.4 still has gdb 4.18. But I don't know anything about it...

FWIW, the CVS tree for GDB is scheduled to get some big whacks related to its
handling of threads in the next few weeks. See
<http://sources.redhat.com/ml/gdb-patches/2001-04/msg00240.html> and subsequent
messages for more details.

Now that mozilla has changed to gcc 2.95.3, I can't use gdb on my good old
SuSE 6.4 system anymore; it worked fine here before. I get the following message:

[New Thread 1721 (manager thread)]
[New Thread 1720 (initial thread)]
[New Thread 1722]
[New Thread 1723]
[New Thread 1724]
Cannot access memory at address 0x46203a6c.

It looks like the behaviour under gdb is also connected to the libs dynamically
linked to Mozilla. Without gdb, mozilla works fine.
SuSE 6.4 ships with gcc 2.95.2, so I am quite surprised that I get problems now!

Works for Me on Red Hat 7.1
Status: NEW → RESOLVED
Closed: 24 years ago → 23 years ago
Resolution: --- → WORKSFORME

blizzard:
It doesn't work on a LOT of distributions, for example several SuSE releases:
all those that ship with gdb 5.0.
As described in this bug, this is caused by a bug in gdb, and it should work fine
with an upgraded gdb. I'm not sure you can expect an ordinary user to upgrade to
a new (unofficial?) gdb just to produce a stack trace...

Jens-Uwe: what other solution do you propose?
Moving all threading bugs to XPCOM. See bug 160356.
Component: Threading → XPCOM