2.23 KB, patch
|Details | Diff | Splinter Review|
2.08 KB, patch
|Details | Diff | Splinter Review|
2.54 KB, patch
|Details | Diff | Splinter Review|
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1a2pre) Gecko/2008072900 SeaMonkey/2.0a1pre I'm getting a lot of crashes since a few days (the switch to comm-central?) when quitting, or restarting the app. I mostly noticed them when updating to a new nightly. I submitted crash reports for most of them, e.g. 065aa707-5e17-11dd-97af-001a4bd43ed6 2825c480-5e17-11dd-87cf-001cc45a2ce4 The stack contains MOZ_gdk_display_close as called from toolkit/xre/nsAppRunner.cpp:2399. So either my Gtk libs (gtk+-2.12.8) are broken or the checkin of Bug 417163 (that I see last changed that line) caused something bad. In the topcrashers I see many more crashes with just a hex signature, maybe some others also refer to similar crashes (or they are all mine), but I can't search, because the server is broken.
I updated to the newest available Gtk+ version that I could find for my distro (Gentoo, gtk+-2.12.9-r2) but it crashed again, e.g. 8f7646a6-5e39-11dd-bab9-001cc45a2ce4
Signature @0xb15b5a40 UUID 8f7646a6-5e39-11dd-bab9-001cc45a2ce4 Time 2008-07-30 06:13:43-07:00 Uptime 8 Product SeaMonkey Version 2.0a1pre Build ID 2008072900 OS Linux OS Version 0.0.0 Linux 2.6.25-tuxonice-r6 #1 Mon Jul 7 16:20:39 CEST 2008 i686 GNU/Linux CPU x86 CPU Info GenuineIntel family 2 model 13 stepping 8 Crash Reason SIGSEGV Crash Address 0xb15b5a40 Comments Upgrading Crashing Thread Frame Module Signature Source 0 @0xb15b5a40 1 libgdk-x11-2.0.so.0.1200.9 libgdk-x11-2.0.so.0.1200.9@0x3675b 2 libgobject-2.0.so.0.1600.3 libgobject-2.0.so.0.1600.3@0xb512 3 libgdk-x11-2.0.so.0.1200.9 libgdk-x11-2.0.so.0.1200.9@0x14dc0 4 libxul.so MOZ_gdk_display_close toolkit/xre/nsAppRunner.cpp:2399 5 libxul.so XRE_main 6 seamonkey-bin main 7 libc-2.5.so libc-2.5.so@0x15837 gtk bug, please get symbols for your os and debug...
OK, I recompiled Gtk (now gtk+-2.12.11) with -g and without stripping and now see the following backtrace: #0 0xb2b4f340 in ?? () #1 0xb76e7e66 in XCloseDisplay () from /usr/lib/libX11.so.6 #2 0xb79c5b46 in gdk_display_x11_finalize (object=0x80637e8) at gdkdisplay-x11.c:832 #3 0xb789bc43 in g_object_unref () from /usr/lib/libgobject-2.0.so.0 #4 0xb79a2e78 in IA__gdk_display_close (display=0x80637e8) at gdkdisplay.c:188 #5 0xb7e7c684 in gtk_moz_embed_get_js_status () from <path>/seamonkey/libxul.so #6 0xb7e7e4bc in XRE_main () from <path>/seamonkey/libxul.so #7 0x08048949 in ?? () #8 0x00000001 in ?? () #9 0xbff9c294 in ?? () #10 0x0804f7a8 in ?? () #11 0x0804f928 in ?? () #12 0x48c8e793 in ?? () #13 0xb7f95360 in _dl_make_stack_executable () from /lib/ld-linux.so.2 #14 0x080486e9 in _init () #15 0xb7494fec in __libc_start_main () from /lib/libc.so.6 #16 0x080487f1 in ?? () (I also submitted bp-2302a8e8-7fe5-11dd-8483-001a4bd43e5c but that doesn't give any more info than before.) Seems a lot like the crash in Bug 448076 or Bug 449371. I'm pretty clueless about Linux graphics nowadays, but I do see using lsof that seamonkey has /usr/lib/libXinerama.so.1.0.0 loaded...
(In reply to comment #3) i have similar to yours crash but with firefox Mozilla/5.0 (X11; U; Linux i686; ru_RU; rv:1.9.1b1pre) Gecko/20080925021240 Minefield/3.1b1pre #0 0xb2fbb0f7 in ?? () #1 0xb6ba5bba in XCloseDisplay () from /usr/lib/libX11.so.6 #2 0x00000004 in ?? () #3 0x00000001 in ?? () #4 0xbfe40d88 in ?? () #5 0xb6e875ed in gdk_display_x11_finalize () from /usr/lib/libgdk-x11-2.0.so.0
Created attachment 350311 [details] [diff] [review] confirm problem OK with this patch I at finally confirmed that the problem is the same as solved on OpenSolaris. With the patch I don't get a shutdown crash any my Linux system anymore.
So, either we leak (as in bug 403706) or we crash, both is bad. ;-) Boying Lu: you found out (in bug 449371) about this callback on OpenSolaris. Is there something one can do automatically to find out if this callback was registered? Like, do certain X variables or versions do that? The crash occurs here on a Gentoo Linux running libXinerama 1.0.2. This is a laptop with only one screen (it even has Option "Xinerama" "off" in xorg.conf), but it of course does have a port to plug in another one. On a SuSE Linux workstation with only one screen it does not crash. libXinerama there is coming from the xorg-x11-libs package (version 7.3-64.1). I hope someone can deduce something from that. I'm happy to test any suggestions...
(In reply to comment #6) > So, either we leak (as in bug 403706) or we crash, both is bad. ;-) > > Boying Lu: you found out (in bug 449371) about this callback on OpenSolaris. Is > there something one can do automatically to find out if this callback was > registered? Like, do certain X variables or versions do that? > I don't know if there is any automatical way. I suggest you to read the Xinerama related source code. I think a quick check is removing PR_UnLoadLibrary() of the patch of the bug 449371 and make a build. Run the build to see if it crashes or not.
(In reply to comment #7) > I think a quick check is removing PR_UnLoadLibrary() of the patch of the bug > 449371 and make a build. Run the build to see if it crashes or not. Yes, that's what I already did before CCing you. See comment 5 or attachment 350311 [details] [diff] [review]. It seems that on Linux one can have at least two configurations, one that crashes and one that doesn't...
I built and run the latest firefox on my ubuntu 8.10 box. But I don't see the problem. Peter, can you tell me your build configuration and Xinerama library version? Thanks
Xinerama here is "libXinerama 1.0.2", also see comment 6. The build that I tested most recently has the following .mozconfig: ac_add_options --enable-application=suite ac_add_options --enable-optimize ac_add_options --disable-debug ac_add_options --disable-tests ac_add_options --disable-updater ac_add_options --enable-codesighs Except the --disabler-updater this should be identical to the official SM nightlies.
I added some more debugging to nsScreenManagerGtk.cpp and see that _XnrmIsActive returns true and numScreens is queried by _XnrmQueryScreens as 1. Adding a call to XineramaQueryVersion I get major=1 minor=1, despite the package system telling me that it's libXinerama 1.0.2.
Aha! On the OpenSuSE system where the program does not crash, I see that numScreens=0! Perhaps that can be used to test if we should unload or not?
Created attachment 350955 [details] [diff] [review] possible workaround (with debug output) The actual change is to use the original reported number of screens in ~nsScreenManagerGtk() when checking if we want to unload. But I left the debugging stuff in for now. On the problematic Gentoo system it says here: nsScreenManagerGtk :: Init(): _XnrmIsActive = true! nsScreenManagerGtk :: Init(): version=1.1 nsScreenManagerGtk :: Init(): numScreens=1 mCachedScreenArray.Count()=1, nNumScreens=1 not unloading mXineramalib=0x428cb100 while on the OpenSuSE system I get only nsScreenManagerGtk :: Init(): numScreens=0 mCachedScreenArray.Count()=1, nNumScreens=0 unloading mXineramalib=0xf2a52de0 Boying Lu, could you test this patch on your Ubuntu system and maybe on Solaris, too?
(In reply to comment #13) > > Boying Lu, could you test this patch on your Ubuntu system and maybe on > Solaris, too? Yes. the patch works on both Solaris and Ubuntu.
Created attachment 351139 [details] [diff] [review] cleaned up patch OK, great, thanks. So this is ready to be reviewed, now without debugging stuff. dbaron, as this code comes from you originally, I thought you might want to take a look. :-)
(In reply to comment #14) > Yes. the patch works on both Solaris and Ubuntu. can't wait it in trunk :-)
Why not just make the member variable "PRBool mXineramaIsActive"? That would seem a good bit clearer.
Created attachment 351205 [details] [diff] [review] better solution Yes, indeed. That also removes the weirdness of having two similarly named variables.
Comment on attachment 351205 [details] [diff] [review] better solution + PRBool mXineramaIsActive; PRPackedBool
Comment on attachment 351205 [details] [diff] [review] better solution r=dbaron
Fixed on trunk: http://hg.mozilla.org/mozilla-central/rev/240bf1b22cd4
what about branch 3.1? it still crashes :-(
Mozilla/5.0 (X11; U; Linux i686; ru; rv:1.9.1b3pre) Gecko/20081215 Shiretoko/3.1b3pre (gdb) bt #0 0x4bbb40f7 in ?? () #1 0xb699dbba in XCloseDisplay () from /usr/lib/libX11.so.6 #2 0x00000004 in ?? () #3 0x00000001 in ?? () #4 0xbff08658 in ?? () #5 0xb6c837f7 in gdk_display_x11_finalize () from /usr/lib/libgdk-x11-2.0.so.0 Backtrace stopped: previous frame inner to this frame (corrupt stack?)
Rion: of course branch still crashes. You have to wait until this bug is marked fixed1.9.1. (Actually, I forgot to mark it fixed because of mozilla-central checkin.)
Comment on attachment 351205 [details] [diff] [review] better solution a191=beltzner
Pushed to mozilla-1.9.1: http://hg.mozilla.org/releases/mozilla-1.9.1/rev/8dadfd7b54d0
After the fix of bug 477959, I start to see this bug again on Solaris/SPARC with Xsun. With Xsun, _XnrmIsActive returns false. I guess in this situation we still should not call PR_Unload(). (Bug 477959 hides this issue, since number is uninitiated in this case) I don't know if it is Xsun only problem. I don't have another environment that Xinerame extension is available but _XnrmIsActive returns false.
If so, we should not do PR_Unload() if _XnrmIsActive is ever called. The easiest way is to remove all these code, because _XnrmIsActive is always called if Xinerama library is loaded. (unless the function pointer is null)
I forgot to remove mXineramaIsActive from the header file.
Comment on attachment 364486 [details] [diff] [review] patch r=dbaron if you also remove mXineramaIsActive from nsScreenManagerGtk.h
Created attachment 366518 [details] [diff] [review] patch for 1.9.1 branch This would prevent a random crash on quit with Xsun on Solaris and perhaps other platforms.
Comment on attachment 366518 [details] [diff] [review] patch for 1.9.1 branch a191=beltzner
Pushed to 1.9.1 http://hg.mozilla.org/releases/mozilla-1.9.1/rev/4245fff36291