Closed Bug 417163 Opened 16 years ago Closed 16 years ago

Crash on exit [@ XFreeCursor]

Categories

(Core :: XPCOM, defect, P2)

x86
Linux
defect

Tracking

()

RESOLVED FIXED
mozilla1.9

People

(Reporter: MatsPalmgren_bugz, Assigned: MatsPalmgren_bugz)

References

Details

(Keywords: crash, regression)

Crash Data

Attachments

(6 files, 1 obsolete file)

STEPS TO REPRODUCE
1. start Firefox
2. File->Quit

ACTUAL RESULT
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 47155432012880 (LWP 13618)]
0x00002ae33799404c in XFreeCursor () from /usr/lib/libX11.so.6
(gdb) bt
#0  0x00002ae33799404c in XFreeCursor () from /usr/lib/libX11.so.6
#1  0x00002aaaacef2418 in QCursorData::~QCursorData () from /usr/lib/libqt-mt.so.3
#2  0x00002aaaacef2629 in QCursor::~QCursor () from /usr/lib/libqt-mt.so.3
#3  0x00002aaaacef2666 in ?? () from /usr/lib/libqt-mt.so.3
#4  0x00002ae338d75500 in exit () from /lib/libc.so.6
#5  0x00002ae338d5eb4b in __libc_start_main () from /lib/libc.so.6
#6  0x0000000000400e29 in _start ()
(gdb)


100% reproducible.

PLATFORMS AND BUILDS TESTED
Firefox trunk debug build on Kubuntu 7.10 (x86_64)
It did not occur before jemalloc was enabled for Linux (bug 417066).
The crash seems to always be somewhere in libqt-mt.so.3
Summary: Crash on exit [@ → Crash on exit [@ XFreeCursor]
qt??
It looks like libgtk-x11-2.0.so.0 dlopen it from gtk_theme_engine_get(),
so I'm guessing it has something to do with my KDE theme...
without having malloc or free or anything on the stack I'm not really sure.
Is there a way to disable jemalloc without rebuilding?
can you get symbols for those libraries? :)
remove the LIBS += -ljemalloc in browser/app/Makefile.in

another thing that might be interesting -- run with MALLOC_OPTIONS='P' and see if the stats print before or after the crash?
(In reply to comment #6)
> remove the LIBS += -ljemalloc in browser/app/Makefile.in

Removing that line makes the crash go away.

> another thing that might be interesting -- run with MALLOC_OPTIONS='P' and see
> if the stats print before or after the crash?

Sorry, nothing is printed.
Attached file gdb output
(In reply to comment #5)
> can you get symbols for those libraries? :)

Better yet, I downloaded the Ubuntu source package and built my own
debug version of it.

It looks like a bug in Qt for sure.  It's the destructor for some static
objects that acquires and uses a Display* object that's most likely gone
by that time.  You can see in the attached gdb output that the memory is
zero'ed out.  Now that I think about it, I have seen Valgrind complaining
about these functions too...  but I'm guessing the memory wasn't zeroed
out before jemalloc and therefore didn't crash...

Let me know if there's anything else you need...
Yeah... it certainly looks like it got freed.  Can you try calling malloc_usable_size() (from jemalloc) on the display pointer and see if it can find it?

if it can't, it got freed and just happens to be pointing to memory that hasn't been given away yet.. sounds like a bug in QT... Not sure what to do about it:/

Perhaps instead of the above suggestion, try defining MALLOC_DECOMMIT in this block:
http://mxr.mozilla.org/mozilla/source/memory/jemalloc/jemalloc.c#153 and run with MALLOC_OPTIONS='29f' -- that should cause it to mostly decommit unused pages and throw an access exception on access of uncommited memory (ideally -- its possible that the thing lives in a page that isn't fully decommitable, but worth a shot)
Attached file more data
hrm, i wouldn't expect that sqlite crash...  jason.. any ideas?
There's something wrong with the stack trace.  Frames 1 and 2 show substantially different values for 'size', which is not right.  There are other possible inconsistencies in the stack trace that make it hard to guess what's really going on.  It's possible that there's something wrong with jemalloc's shrinking realloc of large objects, but I can't really tell from the stack trace what the problem might be.

What do I need to do to reproduce this bug on a Ubuntu (not Kubuntu) 7.10 system?
(In reply to comment #12)
> There's something wrong with the stack trace.

Yes, gdb on this system frequently shows "<value optimized out>" or
simply the wrong value.

Does jemalloc come with some tracing/debug feature?  How do I enable it?

> What do I need to do to reproduce this bug on a Ubuntu (not Kubuntu) 7.10
> system?

Don't know, since I don't have a Ubuntu 7.10 system to test (only Kubuntu).
I think the important part is that it should be a 64-bit system (x86_64).
The Firefox build is a pretty standard debug build --
I'll attach the .mozconfig 
Attached file .mozconfig
In order to enable tracing, you'll need to first modify jemalloc.c and define MALLOC_UTRACE (or make sure MALLOC_PRODUCTION is not defined).  Then you can set MALLOC_OPTIONS=U in the environment when running firefox.  That will cause a full trace of all allocation event to be printed to stderr.

I'm starting now on work to reproduce this problem.  If you figure it out in the meanwhile, please let me know.
Attached patch wip (obsolete) — Splinter Review
I looked a bit closer at the crash with MALLOC_OPTIONS='29f'
and it seems like we're filling a bit too much junk when
realloc'ing a large area (more than a page) to a small area
(less than a page).  The attached patch fixes the "29f" crash
for me anyway.

#1  0x00002b037fa2c6de in arena_ralloc (ptr=0x2aaab4681000, size=3336, oldsize=8952) at /usr/moz/xxx/mozilla/memory/jemalloc/jemalloc.c:4364
4364                    memset((void *)((uintptr_t)ptr + size), 0x5a, oldsize - size);
(gdb) list
4359            idalloc(ptr);
4360            return (ret);
4361    IN_PLACE:
4362    #ifdef MALLOC_FILL
4363            if (opt_junk && size < oldsize)
4364                    memset((void *)((uintptr_t)ptr + size), 0x5a, oldsize - size);
4365            else if (opt_zero && size > oldsize)
4366                    memset((void *)((uintptr_t)ptr + oldsize), 0, size - oldsize);
4367    #endif
4368            return (ptr);
(gdb) p size
$4 = 3336
(gdb) p oldsize
$5 = 8952
(gdb) p ptr
$6 = (void *) 0x2aaab4681000
(gdb) p malloc_usable_size(0x2aaab4681000)
$7 = 4096
(gdb)
MALLOC_OPTIONS=U confirms that the original crash (in Qt) is indeed
that Display* has been freed at that point.  I tried using
MALLOC_OPTIONS=jz to turn off filling but it didn't help - the mem
used by Display* is still zeroed out when Qt tries to use it.
Maybe jemalloc recycles freed memory faster than standard malloc?
Regarding comment #16, there is indeed a bug in the code that does in-place shrinking of large objects.  It should merely be a race condition on Linux (the Windows decommit code makes this worse), so I'd be a bit surprised if that is the cause of these crashes.  The problem is that we fill the deallocated trailing page(s) after they are made available for use by other allocations.  However, unless we hit the race condition, I don't see a problem with the MALLOC_FILL code.

I will attach a patch that fixes the race.
This patch fixes a race in realloc() on Linux, but the problem is critical on Windows due to decommit.
Attachment #303566 - Flags: superreview+
Attachment #303566 - Flags: review+
Assignee: nobody → jasone
Flags: blocking1.9?
Target Milestone: --- → mozilla1.9beta4
Attachment #303566 - Flags: approval1.9?
Flags: blocking1.9? → blocking1.9+
Attachment #303492 - Attachment is obsolete: true
Comment on attachment 303566 [details] [diff] [review]
Large realloc() fix

Just confirming that this fixes the crashes I got with
MALLOC_OPTIONS='29f' and decommit enabled per comment 8.
Attachment #303566 - Flags: approval1.9?
Status: NEW → ASSIGNED
(I checked in the patch in the bug) -- Mats: any ideas on what we can do to avoid this crash in the Qt code?  Get a bug upstream maybe?
This is the stack for the theme_init() call.  There is never any call
to theme_exit() or QApplication::~QApplication when exiting Firefox.
I suspect Qt would not crash if theme_exit() is called (I think that
would make QPaintDevice::x11AppDisplay() return null and they do null
check that).  So the blame lays higher up, gtk or even our widget code.
Maybe we're not destroying all widgets before the Display is closed?
If we did maybe that would cause the last ref to the GtkStyle to
unload the theme module which calls theme_exit()?
Attached patch wallpaperSplinter Review
Here's a totally lame wallpaper that fixes the crash for me.
It avoids closing the Display if the theme name is "Qt".
Moving bugs that aren't beta 4 blockers to target final release.
Target Milestone: mozilla1.9beta4 → mozilla1.9
Just need to get some fix in for ship, even the hacky one.
Flags: tracking1.9+ → blocking1.9+
Priority: -- → P2
See bug 331088 comment 6 for a similar crash;  it looks like
libXrender.so.1 is trying to use Display after it was closed/deallocated?
Mats - what's left to do on this?
we should probably just take this wallpaper fix
Assignee: jasone → mats.palmgren
Status: ASSIGNED → NEW
(In reply to comment #28)
> we should probably just take this wallpaper fix
> 

Can we get the right reviews and get this closed on then Mats?
Attachment #304186 - Flags: review+
Checking in toolkit/xre/nsAppRunner.cpp;
/cvsroot/mozilla/toolkit/xre/nsAppRunner.cpp,v  <--  nsAppRunner.cpp
new revision: 1.211; previous revision: 1.210
done
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
(In reply to comment #22)
> There is never any call to theme_exit() or QApplication::~QApplication when
> exiting Firefox.

Right.  gtkthemes.c never empties its engine_hash and so never calls
g_type_module_unuse on the module.

http://svn.gnome.org/viewvc/gtk%2B/trunk/gtk/gtkthemes.c?revision=20900&view=markup

> I suspect Qt would not crash if theme_exit() is called (I think that
> would make QPaintDevice::x11AppDisplay() return null and they do null
> check that).

Right (I think from a quick look), but...

> So the blame lays higher up, gtk or even our widget code.
> Maybe we're not destroying all widgets before the Display is closed?
> If we did maybe that would cause the last ref to the GtkStyle to
> unload the theme module which calls theme_exit()?

... the Display that libqtengine.so is using was not provided directly to the
module by GTK (the engine uses the default display) and GTK makes no promises
to notify theme engines when opening and closing displays.  It is up to the
theme engine to register for close display signals.

The fix here looks the sensible option.

(Just writing this in case it ever becomes an issue in the future:)
If it really becomes necessary to close the display then perhaps Mozilla could
create a QApplication before creating any widgets (and libqtengine would use
that QApplication), and destroy the QApplication before closing the display.
But working out whether the theme is Qt without using it would be difficult,
and using dlsym for C++ constructors and destructors could be hairy.
Crash Signature: [@ XFreeCursor]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: