Closed Bug 408122 Opened 17 years ago Closed 15 years ago

crashreporter segvs if called manually

Categories

(Toolkit :: Crash Reporting, defect)

x86
Linux
defect
Not set
normal

RESOLVED FIXED
mozilla1.9.3a1

People

(Reporter: jburgess777, Assigned: karlt)

Details

Attachments

(1 file)

User-Agent:       Mozilla/5.0 (X11; U; Linux i686 (x86_64); en-US; rv:1.9b2) Gecko/2007121016 Firefox/3.0b2
Build Identifier: Mozilla/5.0 (X11; U; Linux i686 (x86_64); en-US; rv:1.9b2) Gecko/2007121016 Firefox/3.0b2

I doubt this is important, but I thought I'd try running crashreporter from the shell to see if it suffered from Bug 392919. After I clicked OK on the nice message telling me not to run it like this, it promptly segv'd.

Reproducible: Always

Steps to Reproduce:
1. Install firefox-3.0b2rc1
2. run ./crashreporter
3. Click OK on dialogue
4. Observe error in shell
Actual Results:  
[jburgess@shark firefox3]$ ./crashreporter
Segmentation fault

[jburgess@shark firefox3]$ gdb ./crashreporter
...
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 4160521936 (LWP 32527)]
0xf7d29730 in ?? ()
(gdb) bt
#0  0xf7d29730 in ?? ()
#1  0x0018763e in exit () from /lib/libc.so.6
#2  0x00171398 in __libc_start_main () from /lib/libc.so.6
#3  0x0804adf1 in ?? ()
On what distribution do you see this?

It doesn't crash on Ubuntu 7.10 x86_64.
But I see it crash on 32-bit Fedora 8. Here's the trace with more symbols:

#0  0x00846420 in ?? ()
#1  0x0044bc80 in ORBit_POA_deactivate_object_T (poa=0x99faa28, pobj=0x99fca68, do_etherealize=<value optimized out>, 
    is_cleanup=1 '\001') at poa.c:1106
#2  0x0026bf76 in g_hash_table_foreach () from /lib/libglib-2.0.so.0
#3  0x0044f5a0 in ORBit_POA_deactivate (poa=0x99faa28, etherealize_objects=1 '\001', ev=0xbfdd4478) at poa.c:588
#4  0x0044f80e in ORBit_POA_destroy_T_R (poa=0x99faa28, etherealize_objects=<value optimized out>, ev=0xbfdd4478)
    at poa.c:500
#5  0x0044fa2c in PortableServer_POA_destroy (poa=0x99faa28, etherealize_objects=1 '\001', wait_for_completion=1 '\001', 
    ev=0xbfdd4478) at poa.c:1960
#6  0x0043ace6 in CORBA_ORB_shutdown (orb=0x99fa9a0, wait_for_completion=1 '\001', ev=0xbfdd4478) at corba-orb.c:1228
#7  0x0043ae5d in CORBA_ORB_destroy (orb=0x99fa9a0, ev=0xbfdd4478) at corba-orb.c:1258
#8  0x0043c57f in shutdown_orb () at corba-orb.c:307
#9  0x0734a63e in exit (status=0) at exit.c:75
#10 0x07334398 in __libc_start_main (main=0x804bf80, argc=1, ubp_av=0xbfdd4544, init=0x80504e0, fini=0x80504d0, 
    rtld_fini=0x23d940 <_dl_fini>, stack_end=0xbfdd453c) at libc-start.c:252
#11 0x0804adf1 in ?? ()




Component: General → Breakpad Integration
Product: Firefox → Toolkit
QA Contact: general → breakpad.integration
Version: unspecified → Trunk
I've seen this too, but never got around to filing it.  I think we're not cleaning some GNOME thing up properly.  I don't think this happens if you actually submit a report though, so it might be an easy fix.  To try submitting something from the command line, you can move a dump file (and its extra file) to some directory, and run "crashreporter /path/to/dump.dmp".

Thanks for filing!
Status: UNCONFIRMED → NEW
Ever confirmed: true
I'm seeing the problem on F8 x86-64 (with the 32-bit f3.0b2rc). If I pass an empty file then I get no segv:

[jburgess@shark firefox3]$ touch /tmp/foo
[jburgess@shark firefox3]$ ./crashreporter /tmp/foo
[jburgess@shark firefox3]$

The dialogue which appears says:

##  Unfortunately the crash reporter is unable to submit a report for this crash.
##  Details: The application passed an invalid argument.

I don't have a valid dump file to try it properly (unless you can point me at a simple way to generate one).
Look for bugs with keywords crash and testcase, then don't submit the dump, and move it out of the Crash Reports/pending directory afterwards.  I use bug 378521 since it's been around for a while.
When I open the test case I get a crash and the dump window appears OK. Clicking 'restart firefox' does not work though; I get:

./firefox-bin: error while loading shared libraries: libxul.so: cannot open shared object file: No such file or directory

This looks to be the same effect as mentioned in Bug 407229.

Interestingly, if I run the dump tool manually, the window appears correctly but I still get a segv:

$ ./crashreporter /home/jburgess/.mozilla/firefox/Crash\ Reports/pending/599a350c-845e-2572-56913222-0eb4fde0.dmp
Segmentation fault

The backtrace looks identical to the one I originally generated with no dump file.
The crash also happens when submitting a report.  It just happens when the client is trying to exit (after main is finished), so the report goes through fine.
This bug is pretty old, but this might be useful: do you have libcurl.so.3 (or libcurl.so.4) installed? Apparently crashreporter dlopens it and crashes when it is missing.

I found out when trying to debug firefox crashing on flash-plugin-10, which also needs libcurl.so.3 :-)

It seems that the dependency on libcurl (and other dlopened libraries) needs to be documented/checked/error-handled for binary distributions.

Cheers
Martin
It shouldn't crash if it's missing; that's the whole point of dlopen()ing it. It should simply fail to send the report.
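
A minimal sketch of the behaviour described above, assuming a hypothetical helper named open_optional_library (it is not a function from the crashreporter source). It tries each candidate soname in turn and, if none loads, hands back the dlerror() text so the caller can log it and skip submission rather than crash.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper: try each candidate soname and return the first
 * handle that loads. If none loads, return NULL and copy the most recent
 * dlerror() text into err so the caller has something useful to log. */
void *open_optional_library(const char *const *names, size_t count,
                            char *err, size_t errlen)
{
    if (errlen > 0)
        err[0] = '\0';

    for (size_t i = 0; i < count; i++) {
        void *handle = dlopen(names[i], RTLD_LAZY);
        if (handle)
            return handle;

        /* dlerror() describes the most recent failure; it may be NULL. */
        const char *msg = dlerror();
        snprintf(err, errlen, "%s", msg ? msg : "unknown dlopen error");
    }
    return NULL;
}
```

A caller could pass the two sonames mentioned in this bug, "libcurl.so.4" and "libcurl.so.3", and on a NULL return write err to the submission log and give up on the upload. On older glibc versions this needs linking with -ldl.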
In addition to just not sending the report, it should log a useful message. All it says now is:

[Sun 13 Sep 2009 08:21:38 PM CEST] Crash report submission failed:

and then it catches a segvio. Not very useful.
Yeah, we should definitely fix that, at least.
Fixing the log-message would be great.

But it seems there are two problems here. Even with libcurl installed, "crashreporter" dies with a segvio, this time after sending the report. Kind of annoying.

The stack trace looks like the original post.

Cheers
Martin
Those stack traces don't mean anything to me. Karl, any idea?
It looks like a function registered with atexit (or on_exit) is in an unloaded library.

I'm not sure how that might be happening if what "man 3 atexit" says is true:

   Linux Notes
       Since glibc 2.2.3, atexit() (and  on_exit(3))  can  be  used  within  a
       shared  library  to establish functions that are called when the shared
       library is unloaded.

libORBit-2, which could be loaded by libgconf-2, libgnome-2, and/or
libgnomeui-2 uses atexit (from a function called CORBA_ORB_init).
The source in ORBit2-2.14.16/src/orb/orb-core/corba-orb.c seems to imply it expects some different behavior on Linux, but I don't follow what kind of different behavior might be better.

#ifndef G_OS_WIN32
        /* atexit(), which g_atexit() is just a #define for on Win32,
         * often causes breakage when invoked from DLLs. It causes the
         * registered function to be called when the calling DLL is
         * being unloaded. At that time, however, random other DLLs
         * might also have already been unloaded. There is no
         * guarantee WinSock even works any longer. Etc. Best to avoid
         * atexit() completely on Win32, and hope that just exiting
         * the process and thus severing all connections will be
         * noticed by all peers the process was connected to and acted
         * upon properly.

ORBit2 is deprecated so newer gnome libs may not cause this problem.
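
The failure mode described above (an exit handler whose code has already been unmapped) cannot happen if the library stays mapped until the process ends. One way to sketch that, assuming the RTLD_NODELETE flag provided by glibc and the hypothetical helper names load_pinned and release_pinned; this is an illustration only, not necessarily what any patch on this bug does:

```c
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical helper: load a library and ask the dynamic loader not to
 * unmap it on dlclose(), so any exit handlers registered by code inside
 * it still point at valid memory when exit() runs them. */
void *load_pinned(const char *name)
{
    return dlopen(name, RTLD_LAZY | RTLD_NODELETE);
}

/* Hypothetical helper: drop our reference. Because of RTLD_NODELETE the
 * library's code stays mapped afterwards. A NULL handle is a no-op. */
int release_pinned(void *handle)
{
    return handle ? dlclose(handle) : 0;
}
```

Simply never calling dlclose() on such libraries has the same effect, at the cost of keeping them loaded for the remaining lifetime of the process, which is short for a tool that is about to exit anyway.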
Apparently we can't rely on man 3 atexit.
gnome-vfs also uses atexit().

(gdb) info shared
From        To          Syms Read   Shared Object Library

0xb725f000  0xb729b788  Yes         /usr/lib/libgnomevfs-2.so.0

(gdb) c
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0xb7277030 in ?? ()
(gdb) bt
#0  0xb7277030 in ?? ()
#1  0xb759b68d in __libc_start_main (main=0x804e47f <main>, argc=1,
    ubp_av=0xbf9b1c64, init=0x80571e0 <__libc_csu_init>,
    fini=0x80571d0 <__libc_csu_fini>, rtld_fini=0xb7fa3f50 <_dl_fini>,
    stack_end=0xbf9b1c5c) at libc-start.c:252
#2  0x0804bc71 in _start ()
(gdb) p /x 0xb7277030 - 0xb725f000
$1 = 0x18030

% addr2line -j .text -C -f -i -e /usr/lib/debug/usr/lib/libgnomevfs-2.so.0.2400.0 0x18030
free_stack_tables_to_free
/build/buildd/gnome-vfs-2.24.0/libgnomevfs/gnome-vfs-module-callback.c:445
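
The offset passed to addr2line above is the faulting address minus the "From" address that gdb printed for the library. A small sketch of that arithmetic, assuming a hypothetical helper named offset_in_module and treating the From/To pair as a half-open range (an assumption about how to read gdb's output):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: if addr lies inside [from, to), store its offset
 * from the start of the range in *offset and return true. Otherwise
 * return false and leave *offset untouched. */
bool offset_in_module(uintptr_t addr, uintptr_t from, uintptr_t to,
                      uintptr_t *offset)
{
    if (addr < from || addr >= to)
        return false;
    *offset = addr - from;
    return true;
}
```

With the values from the session above (address 0xb7277030, range 0xb725f000 to 0xb729b788) the offset is 0x18030, matching what gdb printed.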
Assignee: nobody → mozbugz
Status: NEW → ASSIGNED
Attachment #402214 - Flags: review?(ted.mielczarek)
Comment on attachment 402214 [details] [diff] [review]
keep dynamic libraries open for atexit callbacks

I'd drop the bug numbers, they'll be in blame anyway. Thanks for tracking this down!
Attachment #402214 - Flags: review?(ted.mielczarek) → review+
http://hg.mozilla.org/mozilla-central/rev/7a855a5595da
Status: ASSIGNED → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla1.9.3a1
Will this be fixed for 1.9.1/1.9.2 as well?
(In reply to comment #9)
> In addition to just not sending the report, it should log a useful message. All
> it says now is:
> 
> [Sun 13 Sep 2009 08:21:38 PM CEST] Crash report submission failed:

That message is improved in bug 517493.

> 
> and then it catches a segvio. Not very useful.

(In reply to comment #11)
> Even with libcurl installed,
> "crashreporter" dies with an segvio, this time after sending the report. Kind
> of annoying.

To aid in making a decision re whether to fix this on branches, can you comment on what inconvenience is caused, please?

I would have thought that most users wouldn't notice the crash on exit.
Does bug buddy notify the user?
OK, having the message improved solves most of the issue.

The segvio is mostly annoying, especially when running the command manually from a shell. It somehow implies trouble.
Users shouldn't be running the crashreporter manually. There's no reason to do so.
fwiw, atexit basically can't work (as the source comment excerpted in comment 13 kinda explains). If libraries need to be cleaned up before they're unloaded, they need to provide public entry points for that purpose and demand that their consumers use them.

http://mxr.maemo.org/diablo/source/glib2.0-2.12.12/glib/gutils.c?mark=234-236,247-249#216

I have in some cases had success in filing bugs against libraries asking them to clean up their behavior.

https://bugzilla.gnome.org/show_bug.cgi?id=563546 <- developer confirmed it's a bug, but then it got forgotten
https://bugs.maemo.org/show_bug.cgi?id=3420 <- got fixed eventually

For modules which are still alive/maintained, someone should probably file a bug against upstream (that'll probably be me, but it'd help if someone listed the modules I need to bug and their bug trackers).
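
A rough sketch of the "public entry points" idea from this comment, using an entirely hypothetical library prefix mylib_ (none of these names come from GLib, ORBit or gnome-vfs). The library registers nothing with atexit(); the consumer must call mylib_shutdown() before unloading it.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical library state that needs cleanup before unload. */
static char *mylib_buffer = NULL;

/* Public entry point: set the library up. Safe to call more than once. */
bool mylib_init(void)
{
    if (mylib_buffer)
        return true;
    mylib_buffer = malloc(64);
    return mylib_buffer != NULL;
}

/* Public entry point: release everything. The consumer calls this before
 * dlclose() or before exiting, so no exit handler is needed. Calling it
 * twice is harmless. */
void mylib_shutdown(void)
{
    free(mylib_buffer);
    mylib_buffer = NULL;
}

/* Lets a consumer check whether cleanup is still pending. */
bool mylib_is_initialized(void)
{
    return mylib_buffer != NULL;
}
```

The design choice is that cleanup happens at a point the consumer controls, while the library is certainly still mapped, instead of at process exit when it may already be gone.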
(In reply to comment #22)
> For modules which are still alive/maintained, someone should probably file a
> bug against upstream (that'll probably be me, but it'd help if someone listed
> the modules i need to bug and their bug trackers).

The callers of g_atexit() I saw here were libgnomevfs and libORBit-2, both of which are deprecated, so not much point filing bugs there.

However I did file a bug against GLib:

(In reply to comment #13)
> It looks like a function registered with atexit (or on_exit) is in an unloaded
> library.
> 
> I'm not sure how that might be happening if what "man 3 atexit" says is true:
> 
>    Linux Notes
>        Since glibc 2.2.3, atexit() (and  on_exit(3))  can  be  used  within  a
>        shared  library  to establish functions that are called when the shared
>        library is unloaded.

The problem was that the library calling atexit() was GLib, not the library that called g_atexit().

https://bugzilla.gnome.org/show_bug.cgi?id=599855