Closed
Bug 417163
Opened 16 years ago
Closed 16 years ago
Crash on exit [@ XFreeCursor]
Categories
(Core :: XPCOM, defect, P2)
Tracking
()
RESOLVED
FIXED
mozilla1.9
People
(Reporter: MatsPalmgren_bugz, Assigned: MatsPalmgren_bugz)
References
Details
(Keywords: crash, regression)
Crash Data
Attachments
(6 files, 1 obsolete file)
4.32 KB,
text/plain
|
Details | |
7.13 KB,
text/plain
|
Details | |
1.03 KB,
text/plain
|
Details | |
3.60 KB,
patch
|
pavlov
:
review+
pavlov
:
superreview+
|
Details | Diff | Splinter Review |
5.57 KB,
text/plain
|
Details | |
2.31 KB,
patch
|
pavlov
:
review+
|
Details | Diff | Splinter Review |
STEPS TO REPRODUCE 1. start Firefox 2. File->Quit ACTUAL RESULT Program received signal SIGSEGV, Segmentation fault. [Switching to Thread 47155432012880 (LWP 13618)] 0x00002ae33799404c in XFreeCursor () from /usr/lib/libX11.so.6 (gdb) bt #0 0x00002ae33799404c in XFreeCursor () from /usr/lib/libX11.so.6 #1 0x00002aaaacef2418 in QCursorData::~QCursorData () from /usr/lib/libqt-mt.so.3 #2 0x00002aaaacef2629 in QCursor::~QCursor () from /usr/lib/libqt-mt.so.3 #3 0x00002aaaacef2666 in ?? () from /usr/lib/libqt-mt.so.3 #4 0x00002ae338d75500 in exit () from /lib/libc.so.6 #5 0x00002ae338d5eb4b in __libc_start_main () from /lib/libc.so.6 #6 0x0000000000400e29 in _start () (gdb) 100% reproducible. PLATFORMS AND BUILDS TESTED Firefox trunk debug build on Kubuntu 7.10 (x86_64) It did not occur before jemalloc was enabled for Linux (bug 417066). The crash seems to always be somewhere in libqt-mt.so.3
Assignee | ||
Updated•16 years ago
|
Summary: Crash on exit [@ → Crash on exit [@ XFreeCursor]
Comment 1•16 years ago
|
||
qt??
Assignee | ||
Comment 2•16 years ago
|
||
It looks like libgtk-x11-2.0.so.0 dlopen it from gtk_theme_engine_get(), so I'm guessing it has something to do with my KDE theme...
Comment 3•16 years ago
|
||
without having malloc or free or anything on the stack I'm not really sure.
Assignee | ||
Comment 4•16 years ago
|
||
Is there a way to disable jemalloc without rebuilding?
Comment 6•16 years ago
|
||
remove the LIBS += -ljemalloc in browser/app/Makefile.in another thing that might be interesting -- run with MALLOC_OPTIONS='P' and see if the stats print before or after the crash?
Assignee | ||
Comment 7•16 years ago
|
||
(In reply to comment #6) > remove the LIBS += -ljemalloc in browser/app/Makefile.in Removing that line makes the crash go away. > another thing that might be interesting -- run with MALLOC_OPTIONS='P' and see > if the stats print before or after the crash? Sorry, nothing is printed.
Assignee | ||
Comment 8•16 years ago
|
||
(In reply to comment #5) > can you get symbols for those libraries? :) Better yet, I downloaded the Ubuntu source package and built my own debug version of it. It looks like a bug in Qt for sure. It's the destructor for some static objects that acquires and uses a Display* object that's most likely gone by that time. You can see in the attached gdb output that the memory is zero'ed out. Now that I think about it, I have seen Valgrind complaining about these functions too... but I'm guessing the memory wasn't zeroed out before jemalloc and therefore didn't crash... Let me know if there's anything else you need...
Comment 9•16 years ago
|
||
Yeah... it certainly looks like it got freed. Can you try calling malloc_usable_size() (from jemalloc) on the display pointer and see if it can find it? if it can't, it got freed and just happens to be pointing to memory that hasn't been given away yet.. sounds like a bug in QT... Not sure what to do about it:/ Perhaps instead of the above suggestion, try defining MALLOC_DECOMMIT in this block: http://mxr.mozilla.org/mozilla/source/memory/jemalloc/jemalloc.c#153 and run with MALLOC_OPTIONS='29f' -- that should cause it to mostly decommit unused pages and throw an access exception on access of uncommited memory (ideally -- its possible that the thing lives in a page that isn't fully decommitable, but worth a shot)
Assignee | ||
Comment 10•16 years ago
|
||
Comment 11•16 years ago
|
||
hrm, i wouldn't expect that sqlite crash... jason.. any ideas?
Comment 12•16 years ago
|
||
There's something wrong with the stack trace. Frames 1 and 2 show substantially different values for 'size', which is not right. There are other possible inconsistencies in the stack trace that make it hard to guess what's really going on. It's possible that there's something wrong with jemalloc's shrinking realloc of large objects, but I can't really tell from the stack trace what the problem might be. What do I need to do to reproduce this bug on a Ubuntu (not Kubuntu) 7.10 system?
Assignee | ||
Comment 13•16 years ago
|
||
(In reply to comment #12) > There's something wrong with the stack trace. Yes, gdb on this system frequently shows "<value optimized out>" or simply the wrong value. Does jemalloc come with some tracing/debug feature? How do I enable it? > What do I need to do to reproduce this bug on a Ubuntu (not Kubuntu) 7.10 > system? Don't know, since I don't have a Ubuntu 7.10 system to test (only Kubuntu). I think the important part is that it should be a 64-bit system (x86_64). The Firefox build is a pretty standard debug build -- I'll attach the .mozconfig
Assignee | ||
Comment 14•16 years ago
|
||
Comment 15•16 years ago
|
||
In order to enable tracing, you'll need to first modify jemalloc.c and define MALLOC_UTRACE (or make sure MALLOC_PRODUCTION is not defined). Then you can set MALLOC_OPTIONS=U in the environment when running firefox. That will cause a full trace of all allocation event to be printed to stderr. I'm starting now on work to reproduce this problem. If you figure it out in the meanwhile, please let me know.
Assignee | ||
Comment 16•16 years ago
|
||
I looked a bit closer at the crash with MALLOC_OPTIONS='29f' and it seems like we're filling a bit too much junk when realloc'ing a large area (more than a page) to a small area (less than a page). The attached patch fixes the "29f" crash for me anyway. #1 0x00002b037fa2c6de in arena_ralloc (ptr=0x2aaab4681000, size=3336, oldsize=8952) at /usr/moz/xxx/mozilla/memory/jemalloc/jemalloc.c:4364 4364 memset((void *)((uintptr_t)ptr + size), 0x5a, oldsize - size); (gdb) list 4359 idalloc(ptr); 4360 return (ret); 4361 IN_PLACE: 4362 #ifdef MALLOC_FILL 4363 if (opt_junk && size < oldsize) 4364 memset((void *)((uintptr_t)ptr + size), 0x5a, oldsize - size); 4365 else if (opt_zero && size > oldsize) 4366 memset((void *)((uintptr_t)ptr + oldsize), 0, size - oldsize); 4367 #endif 4368 return (ptr); (gdb) p size $4 = 3336 (gdb) p oldsize $5 = 8952 (gdb) p ptr $6 = (void *) 0x2aaab4681000 (gdb) p malloc_usable_size(0x2aaab4681000) $7 = 4096 (gdb)
Assignee | ||
Comment 17•16 years ago
|
||
MALLOC_OPTIONS=U confirms that the original crash (in Qt) is indeed that Display* has been freed at that point. I tried using MALLOC_OPTIONS=jz to turn off filling but it didn't help - the mem used by Display* is still zeroed out when Qt tries to use it. Maybe jemalloc recycles freed memory faster than standard malloc?
Comment 18•16 years ago
|
||
Regarding comment #16, there is indeed a bug in the code that does in-place shrinking of large objects. It should merely be a race condition on Linux (the Windows decommit code makes this worse), so I'd be a bit surprised if that is the cause of these crashes. The problem is that we fill the deallocated trailing page(s) after they are made available for use by other allocations. However, unless we hit the race condition, I don't see a problem with the MALLOC_FILL code. I will attach a patch that fixes the race.
Comment 19•16 years ago
|
||
This patch fixes a race in realloc() on Linux, but the problem is critical on Windows due to decommit.
Updated•16 years ago
|
Attachment #303566 -
Flags: superreview+
Attachment #303566 -
Flags: review+
Updated•16 years ago
|
Assignee: nobody → jasone
Flags: blocking1.9?
Target Milestone: --- → mozilla1.9beta4
Updated•16 years ago
|
Attachment #303566 -
Flags: approval1.9?
Updated•16 years ago
|
Flags: blocking1.9? → blocking1.9+
Assignee | ||
Updated•16 years ago
|
Attachment #303492 -
Attachment is obsolete: true
Assignee | ||
Comment 20•16 years ago
|
||
Comment on attachment 303566 [details] [diff] [review] Large realloc() fix Just confirming that this fixes the crashes I got with MALLOC_OPTIONS='29f' and decommit enabled per comment 8.
Updated•16 years ago
|
Attachment #303566 -
Flags: approval1.9?
Updated•16 years ago
|
Status: NEW → ASSIGNED
Comment 21•16 years ago
|
||
(I checked in the patch in the bug) -- Mats: any ideas on what we can do to avoid this crash in the Qt code? Get a bug upstream maybe?
Assignee | ||
Comment 22•16 years ago
|
||
This is the stack for the theme_init() call. There is never any call to theme_exit() or QApplication::~QApplication when exiting Firefox. I suspect Qt would not crash if theme_exit() is called (I think that would make QPaintDevice::x11AppDisplay() return null and they do null check that). So the blame lays higher up, gtk or even our widget code. Maybe we're not destroying all widgets before the Display is closed? If we did maybe that would cause the last ref to the GtkStyle to unload the theme module which calls theme_exit()?
Assignee | ||
Comment 23•16 years ago
|
||
Here's a totally lame wallpaper that fixes the crash for me. It avoids closing the Display if the theme name is "Qt".
Comment 24•16 years ago
|
||
Moving bugs that aren't beta 4 blockers to target final release.
Target Milestone: mozilla1.9beta4 → mozilla1.9
Comment 25•16 years ago
|
||
Just need to get some fix in for ship, even the hacky one.
Flags: tracking1.9+ → blocking1.9+
Priority: -- → P2
Assignee | ||
Comment 26•16 years ago
|
||
See bug 331088 comment 6 for a similar crash; it looks like libXrender.so.1 is trying to use Display after it was closed/deallocated?
Comment 27•16 years ago
|
||
Mats - what's left to do on this?
Comment 28•16 years ago
|
||
we should probably just take this wallpaper fix
Assignee: jasone → mats.palmgren
Status: ASSIGNED → NEW
Comment 29•16 years ago
|
||
(In reply to comment #28) > we should probably just take this wallpaper fix > Can we get the right reviews and get this closed on then Mats?
Updated•16 years ago
|
Attachment #304186 -
Flags: review+
Comment 30•16 years ago
|
||
Checking in toolkit/xre/nsAppRunner.cpp; /cvsroot/mozilla/toolkit/xre/nsAppRunner.cpp,v <-- nsAppRunner.cpp new revision: 1.211; previous revision: 1.210 done
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Comment 31•16 years ago
|
||
(In reply to comment #22) > There is never any call to theme_exit() or QApplication::~QApplication when > exiting Firefox. Right. gtkthemes.c never empties its engine_hash and so never calls g_type_module_unuse on the module. http://svn.gnome.org/viewvc/gtk%2B/trunk/gtk/gtkthemes.c?revision=20900&view=markup > I suspect Qt would not crash if theme_exit() is called (I think that > would make QPaintDevice::x11AppDisplay() return null and they do null > check that). Right (I think from a quick look), but... > So the blame lays higher up, gtk or even our widget code. > Maybe we're not destroying all widgets before the Display is closed? > If we did maybe that would cause the last ref to the GtkStyle to > unload the theme module which calls theme_exit()? ... the Display that libqtengine.so is using was not provided directly to the module by GTK (the engine uses the default display) and GTK makes no promises to notify theme engines when opening and closing displays. It is up to the theme engine to register for close display signals. The fix here looks the sensible option. (Just writing this in case it ever becomes an issue in the future:) If it really becomes necessary to close the display then perhaps Mozilla could create a QApplication before creating any widgets (and libqtengine would use that QApplication), and destroy the QApplication before closing the display. But working out whether the theme is Qt without using it would be difficult, and using dlsym for C++ constructors and destructors could be hairy.
Updated•13 years ago
|
Crash Signature: [@ XFreeCursor]
You need to log in
before you can comment on or make changes to this bug.
Description
•