Open Bug 675585 Opened 13 years ago Updated 2 years ago

crash [@ g_slice_free1] when run firefox-bin -no-remote -process-updates

Categories

(Toolkit :: Startup and Profile System, defect)

x86
Solaris
defect

People

(Reporter: ginnchen+exoracle, Unassigned)

Details

(Keywords: crash)

Run firefox-bin -no-remote -process-updates
Core dumped.

Stack:
 fb2d1db0 ???????? (80e9c68, 0, 60, fe77949c, 8068c40, fe83fc50)
 fe78fdea g_slice_free1 (30, 80dbb50, fe840e6c, fe76094e) + 186
 fe7609a6 g_hash_table_unref (80dbb50, 80dbb50, 8, fe760a1c) + 62
 fe760a46 g_hash_table_destroy (80dbb50, fa1f4384, 0, fa1f43c2) + 36
 fa1f444c free_stack_tables_to_free (8047844, 8047730, feffb93c, fef68e00, 0, fefa06a0) + 94
 fee499c2 _exithandle (feffb93c, 8052566, 0, 8054970, fefcdba4, 0) + 66
 fee3c022 exit     (3, 80478ac, 0, 0, 0, 80478dd) + 12

The address 0xfb2d1db0 belonged to `libgthread-2.0.so.0.2800.6`gthread-impl.c`g_private_get_posix_impl.
It looks like the library had already been unloaded by the time the exit handlers ran.
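The same kind of address-to-object lookup can be done at run time with `dladdr()`. This is a minimal sketch, not the reporter's method (the comment doesn't say which tool produced the backtick-separated object/file/function notation). `describe_address` is a hypothetical helper, and `dladdr` is an extension available on Solaris and glibc rather than strict POSIX. For an address inside a library that has already been unloaded, `dladdr()` typically finds no containing object, which matches the situation described here.

```c
#define _GNU_SOURCE /* glibc: expose dladdr() and Dl_info */
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper: write "object`symbol" for addr into buf.
 * Returns 1 if addr lies inside a currently loaded object, 0 otherwise
 * (for example, because the object was already unloaded). */
static int describe_address(void *addr, char *buf, size_t len)
{
    Dl_info info;

    if (dladdr(addr, &info) == 0 || info.dli_fname == NULL) {
        snprintf(buf, len, "%p: not in any loaded object", addr);
        return 0;
    }
    snprintf(buf, len, "%s`%s", info.dli_fname,
             info.dli_sname != NULL ? info.dli_sname : "???");
    return 1;
}
```

For example, `describe_address((void *)&puts, buf, sizeof buf)` names the C library (or the executable's PLT stub, depending on how it was linked).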

If I run LD_PRELOAD=/usr/lib/libgthread-2.0.so.0.2800.6 dist/bin/firefox-bin -no-remote -process-updates, the problem goes away.

It might be related to
https://bugzilla.redhat.com/show_bug.cgi?id=472253

It doesn't happen with Firefox 6.0.
This issue is triggered by Bug 552864.
Blocks: 552864
Is this similar to Bug 658995? Here, though, the atexit handler is registered by gnome-vfs, so we may need a workaround for it.
(In reply to comment #2)
> Similar bug like Bug 658995? But the atexit is used in gnome-vfs, so we may
> need a workaround for it.

The GNOME folks' habit of using atexit in libraries is annoying... I guess a workaround is to dlopen libgthread ourselves :(
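The dlopen workaround amounts to taking an extra reference on libgthread that is never released, so a later `dlclose()` of libxul cannot drop its reference count to zero. A minimal sketch, assuming the soname `libgthread-2.0.so.0` from the report; `pin_library` and `GTHREAD_SONAME` are hypothetical names, not existing Mozilla code. `RTLD_NODELETE` (supported by the Solaris and glibc linkers) is our own extra safeguard that keeps the object mapped even if the handle is closed later; it was not proposed in the thread.

```c
#define _GNU_SOURCE /* glibc: make sure RTLD_NODELETE is visible */
#include <dlfcn.h>
#include <stdio.h>

#define GTHREAD_SONAME "libgthread-2.0.so.0" /* soname from the report */

/* Hypothetical helper: load a library (or add a reference if it is already
 * loaded) and never release it. The handle is leaked on purpose, since
 * closing it would defeat the point of pinning. */
static void *pin_library(const char *soname)
{
    void *handle = dlopen(soname, RTLD_LAZY | RTLD_NODELETE);
    if (handle == NULL)
        fprintf(stderr, "pin_library: %s\n", dlerror());
    return handle;
}
```

The intended use would be a single `pin_library(GTHREAD_SONAME);` early in startup, before anything can dlclose libxul; the exact call site is an assumption.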
Note that we normally don't dlclose libgnome-vfs for that reason, which evidently works in your case, since the free_stack_tables_to_free function is still available. Now the question is: why does dlclosing libxul unload libgthread if it is still used by libgnome-vfs through libglib?
I think this is how it happens:

1) libxul.so calls g_thread_init(), libgthread-2.0.so.0 is loaded
2) gnome_vfs_init() called, libgnomevfs-2.so.0 loaded
3) The only symbol in libgthread-2.0.so.0 used by libgnomevfs-2.so.0 is g_thread_init(), since g_thread_got_initialized is true now, it isn't called, so ref count doesn't change?
4) dlclose libxul.so, libgthread-2.0.so.0 is also unloaded.
5) The gthread function pointers are left behind in g_thread_functions_for_glib_use, which lives in glib. At exit, gnomevfs-2's atexit handler free_stack_tables_to_free calls into glib, which calls through those stale pointers and crashes.
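The five steps can be replayed with a toy reference-count model. This is a simplified sketch of the hypothesis in this comment, not how the Solaris runtime linker actually tracks dependencies: libgthread has one counter (0 means unloaded), a lazy dependency takes a reference only once one of its symbols is bound, and glib's function table is reduced to a flag saying it points into libgthread. The `lazy_deps` switch models the LAZYLOAD explanation given later in the thread (setting LD_NOLAZYLOAD makes the crash go away).

```c
#include <stdbool.h>

/* Toy model: a library is just a reference count; 0 means "unloaded". */
typedef struct {
    int refs;
} Lib;

/* Replays steps 1-5; returns true if the exit-time call through
 * g_thread_functions_for_glib_use would land in unmapped code. */
static bool replay_bug(bool lazy_deps)
{
    Lib gthread = {0};
    bool glib_points_into_gthread = false;

    /* 1) libxul calls g_thread_init(): libgthread is loaded on libxul's
     *    behalf and installs its functions in glib's table. */
    gthread.refs++;
    glib_points_into_gthread = true;

    /* 2) gnome_vfs_init(): libgnomevfs is loaded. With eager (non-lazy)
     *    dependencies, loading it also references libgthread. */
    if (!lazy_deps)
        gthread.refs++;

    /* 3) With lazy dependencies, a reference would only be taken when
     *    gnomevfs binds a libgthread symbol; its only one, g_thread_init(),
     *    is skipped because threads are already initialized. */

    /* 4) dlclose(libxul) drops libxul's reference on libgthread. */
    gthread.refs--;

    /* 5) At exit, gnomevfs's atexit handler calls into glib, which calls
     *    through the table: fatal if libgthread is gone. */
    return glib_points_into_gthread && gthread.refs == 0;
}
```

`replay_bug(true)` reports the crash and `replay_bug(false)` does not, mirroring the LD_NOLAZYLOAD observation.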

So, technically, libgnome-vfs only uses libglib, and libglib uses libgthread-2.0 only through g_thread_functions_for_glib_use; neither libgnome-vfs nor libglib holds a reference on libgthread-2.0.

Maybe we should intentionally leave libgthread-2.0 or libxul in memory?
(In reply to comment #5)
> I think this is how it happens:
> 
> 1) libxul.so calls g_thread_init(), libgthread-2.0.so.0 is loaded
> 2) gnome_vfs_init() called, libgnomevfs-2.so.0 loaded
> 3) The only symbol in libgthread-2.0.so.0 used by libgnomevfs-2.so.0 is
> g_thread_init(), since g_thread_got_initialized is true now, it isn't
> called, so ref count doesn't change?

Is that an optimization of the Solaris dynamic linker?

> 4) dlclose libxul.so, libgthread-2.0.so.0 is also unloaded.

Why would it, if gnomevfs is still loaded? That should keep gthread alive, at least through glib.
(In reply to comment #6)
> (In reply to comment #5)
> > I think this is how it happens:
> > 
> > 1) libxul.so calls g_thread_init(), libgthread-2.0.so.0 is loaded
> > 2) gnome_vfs_init() called, libgnomevfs-2.so.0 loaded
> > 3) The only symbol in libgthread-2.0.so.0 used by libgnomevfs-2.so.0 is
> > g_thread_init(), since g_thread_got_initialized is true now, it isn't
> > called, so ref count doesn't change?
> 
> Is that an optimization of the solaris dynamic linker?

I think the reason is that the dependencies of libgnomevfs-2.so are marked as LAZYLOAD on Solaris.
If I set LD_NOLAZYLOAD, this bug doesn't happen.

> 
> > 4) dlclose libxul.so, libgthread-2.0.so.0 is also unloaded.
> 
> why should it, if gnomevfs is still loaded, which should keep gthread alive
> at least through glib?

gnomevfs will keep glib alive.
gthread depends on glib, but glib doesn't depend on gthread, so what keeps gthread alive?
(In reply to comment #7)
> (In reply to comment #6)
> > (In reply to comment #5)
> > > I think this is how it happens:
> > > 
> > > 1) libxul.so calls g_thread_init(), libgthread-2.0.so.0 is loaded
> > > 2) gnome_vfs_init() called, libgnomevfs-2.so.0 loaded
> > > 3) The only symbol in libgthread-2.0.so.0 used by libgnomevfs-2.so.0 is
> > > g_thread_init(), since g_thread_got_initialized is true now, it isn't
> > > called, so ref count doesn't change?
> > 
> > Is that an optimization of the solaris dynamic linker?
> 
> I think the reason is that the dependencies of libgnomevfs-2.so are marked as
> LAZYLOAD on Solaris.
> If I set LD_NOLAZYLOAD, this bug doesn't happen.
> 
> > 
> > > 4) dlclose libxul.so, libgthread-2.0.so.0 is also unloaded.
> > 
> > why should it, if gnomevfs is still loaded, which should keep gthread alive
> > at least through glib?
> 
> gnomevfs will keep glib alive.
> gthread depends on glib, but glib doesn't depend on gthread, what keeps
> gthread alive?

So, from what I gather from your comments (I haven't looked at the code), the real problem is that gthread registers callbacks to its functions in glib but doesn't unregister them when it is unloaded. Maybe that's not possible at all, but it does seem like a recipe for disaster.
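The hazard described here is a general one: a module installs function pointers into a table owned by a longer-lived library and never removes them. A minimal sketch of the pattern and of the missing unregistration step, using hypothetical names (`ThreadFuncs`, `install_thread_funcs`, `uninstall_thread_funcs`, `lookup_private`) rather than glib's real API; it also ignores thread safety. Whether glib could actually support uninstalling its thread functions is, as the comment says, an open question.

```c
#include <stddef.h>

/* Hypothetical stand-in for a table like g_thread_functions_for_glib_use. */
typedef struct {
    void *(*private_get)(void *key);
} ThreadFuncs;

/* Owned by the long-lived library; points into the module once installed. */
static ThreadFuncs *installed = NULL;

static void install_thread_funcs(ThreadFuncs *funcs) { installed = funcs; }

/* The step that is missing in the bug: a module about to be unloaded
 * removes its pointers so callers fall back instead of jumping into
 * unmapped code. */
static void uninstall_thread_funcs(const ThreadFuncs *funcs)
{
    if (installed == funcs)
        installed = NULL;
}

/* Long-lived library code: only call through the table while installed. */
static void *lookup_private(void *key)
{
    if (installed == NULL || installed->private_get == NULL)
        return NULL; /* single-threaded fallback */
    return installed->private_get(key);
}

/* "Module" side: the function the table points at while it is loaded. */
static void *module_private_get(void *key) { return key; }
static ThreadFuncs module_funcs = { module_private_get };
```

In this model the module would call `uninstall_thread_funcs(&module_funcs)` from its unload path; without that call, `installed` keeps pointing at code that may no longer be mapped.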

I guess you could change the dlopen call for gnomevfs to use the RTLD_NOW flag, which, aiui, does more than LD_NOLAZYLOAD would, but would still be a good tradeoff.
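For reference, the suggestion corresponds to passing `RTLD_NOW` to `dlopen()`, which asks the linker to resolve all of the object's symbol references at load time instead of on first call. A minimal sketch; the soname `libgnomevfs-2.so.0` comes from the report, while `open_with_immediate_binding` and `GNOMEVFS_SONAME` are hypothetical names. The thread's expectation is that on Solaris this binding also loads and references gnomevfs's lazy dependencies, including libgthread, even when gnomevfs is already loaded; that is not verified here, and other linkers may behave differently. Inside Mozilla code, the same request would go through NSPR's PR_LoadLibraryWithFlags with PR_LD_NOW, as noted later in the thread.

```c
#include <dlfcn.h>
#include <stdio.h>

#define GNOMEVFS_SONAME "libgnomevfs-2.so.0" /* soname from the report */

/* Hypothetical helper: open (or re-open) a library with RTLD_NOW so every
 * symbol reference in it is bound immediately. If the library is already
 * loaded, dlopen() returns the existing handle and adds a reference, so the
 * caller owns exactly one dlclose() (or keeps the handle open on purpose). */
static void *open_with_immediate_binding(const char *soname)
{
    void *handle = dlopen(soname, RTLD_NOW);
    if (handle == NULL)
        fprintf(stderr, "dlopen(%s, RTLD_NOW): %s\n",
                soname != NULL ? soname : "(main program)", dlerror());
    return handle;
}
```

Here the call would be `open_with_immediate_binding(GNOMEVFS_SONAME)`; keeping that handle open, rather than dlclosing it at shutdown, avoids reintroducing the unload problem.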
(In reply to comment #8)

> So, from what I gather from your comments (not looked at the code), the real
> problem is that gthread registers callbacks to its functions in glib, but
> doesn't unregister them when it is unloaded. Maybe that's not possible at
> all, but that does seem like a recipe for disaster.
> 
> I guess you could change the dlopen call for gnomevfs to have the RTLD_NOW
> flag, which is more than LD_NOLAZYLOAD would do, aiui, but still be a good
> tradeoff.

gnome_vfs_init() is called by gnome_program_init(), which is called by nsNativeAppSupportUnix::Start().

We don't dlopen gnomevfs ourselves at that point.
How old is GNOME on Solaris? libgnome hasn't depended on libgnomevfs for a while...
(In reply to comment #9)
> We didn't do dlopen gnomevfs at that time.

Wouldn't RTLD_NOW still trigger binding, though ?
(In reply to comment #10)
> How old is gnome on solaris ? libgnome hasn't depended on libgnomevfs for
> a while...

GNOME 2.30.2

(In reply to comment #11)

> (In reply to comment #9)
> > We didn't do dlopen gnomevfs at that time.
> 
> Wouldn't RTLD_NOW still trigger binding, though ?

Yes.
(In reply to comment #12)
> > Wouldn't RTLD_NOW still trigger binding, though ?
> 
> Yes.

Then wouldn't that work?
(In reply to comment #13)
> (In reply to comment #12)
> > > Wouldn't RTLD_NOW still trigger binding, though ?
> > 
> > Yes.
> 
> Then doesn't it work?

What are you suggesting?
Use RTLD_NOW for all libs in nsGlueLinkingDlOpen.cpp?
Or dlopen gnomevfs with RTLD_NOW in nsGlueLinkingDlOpen.cpp?
AFAIK, gnomevfs is only dlopen()ed from
modules/libpr0n/decoders/icon/gtk/nsIconChannel.cpp, which uses PR_LoadLibrary. Note that there is a PR_LoadLibraryWithFlags function that takes a PR_LD_NOW flag corresponding to RTLD_NOW.
The other places that need it are libmozgnome.so and libnkgnomevfs.so, which are linked against it directly.
Note that modules/libpr0n/decoders/icon/gtk/nsIconChannel.cpp does unload the library on shutdown, so it would still be a problem. See bug 379666 for another place where GNOME libraries hit us with atexit.
We don't get that far in this case.

We only call gnome_program_init(); libgnome loads libgnomevfs and calls gnome_vfs_init().
(In reply to comment #17)
> We didn't reach that far yet.
> 
> We just do gnome_program_init(), libgnome will load libgnomevfs and do
> gnome_vfs_init().

Which comes back to the question from comment 11: wouldn't dlopen(RTLD_NOW) trigger binding, even when the library is already loaded (lazily)?
I think it will.
But "firefox-bin -no-remote -process-updates" will not get into nsIconChannel.
Quick note, in case it makes a difference: the -process-updates command line flag was added to the startup code so we can test applying an update with each app's build without launching the app. It is not meant to be used anywhere else.
(In reply to comment #19)
> I think it will.
> But "firefox-bin -no-remote -process-updates" will not get into
> nsIconChannel.

Well, -process-updates probably doesn't need to initialize gnome before processing the updates...
If I run "firefox -P" and press Exit, it doesn't crash.
From the ld log, I found that dist/bin/components/*.so hold handles to libxul.so, so libxul.so is never unloaded; likewise, libgnomeui-2, libgnome2, libgconf-2, libdbus-glib-1, ... hold handles to libgthread-2.0.so, so libgthread-2.0.so is never unloaded either.

It's kind of tricky.
Severity: normal → critical
Keywords: crash
Moving to startup where this code actually lives (e.g. toolkit/xre).
Component: Application Update → Startup and Profile System
Severity: critical → S2