Last Comment Bug 448512 - Crash on quit [@ XCloseDisplay]
: Crash on quit [@ XCloseDisplay]
Status: RESOLVED FIXED
: crash, fixed1.9.1
Product: Core
Classification: Components
Component: Widget: Gtk (show other bugs)
: Trunk
: x86 Linux
: -- normal with 2 votes (vote)
: ---
Assigned To: Peter Weilbacher
:
Mentors:
Depends on:
Blocks: 451985
  Show dependency treegraph
 
Reported: 2008-07-30 03:26 PDT by Peter Weilbacher
Modified: 2011-06-09 14:58 PDT (History)
4 users (show)
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
confirm problem (746 bytes, patch)
2008-11-27 03:46 PST, Peter Weilbacher
no flags Details | Diff | Review
possible workaround (with debug output) (3.24 KB, patch)
2008-12-02 06:39 PST, Peter Weilbacher
no flags Details | Diff | Review
cleaned up patch (2.18 KB, patch)
2008-12-03 02:26 PST, Peter Weilbacher
no flags Details | Diff | Review
better solution (2.23 KB, patch)
2008-12-03 09:51 PST, Peter Weilbacher
dbaron: review+
roc: superreview+
mbeltzner: approval1.9.1+
Details | Diff | Review
patch (2.08 KB, patch)
2009-02-26 23:33 PST, Ginn Chen
dbaron: review+
Details | Diff | Review
patch for 1.9.1 branch (2.54 KB, patch)
2009-03-09 23:33 PDT, Ginn Chen
mbeltzner: approval1.9.1+
Details | Diff | Review

Description Peter Weilbacher 2008-07-30 03:26:23 PDT
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1a2pre) Gecko/2008072900 SeaMonkey/2.0a1pre

I'm getting a lot of crashes since a few days (the switch to comm-central?) when quitting, or restarting the app. I mostly noticed them when updating to a new nightly. I submitted crash reports for most of them, e.g.

   065aa707-5e17-11dd-97af-001a4bd43ed6
   2825c480-5e17-11dd-87cf-001cc45a2ce4

The stack contains MOZ_gdk_display_close as called from toolkit/xre/nsAppRunner.cpp:2399. So either my Gtk libs (gtk+-2.12.8) are broken or the checkin of Bug 417163 (that I see last changed that line) caused something bad. In the topcrashers I see many more crashes with just a hex signature, maybe some others also refer to similar crashes (or they are all mine), but I can't search, because the server is broken.
Comment 1 Peter Weilbacher 2008-07-30 07:11:50 PDT
I updated to the newest available Gtk+ version that I could find for my distro (Gentoo, gtk+-2.12.9-r2) but it crashed again, e.g.

   8f7646a6-5e39-11dd-bab9-001cc45a2ce4
Comment 2 timeless 2008-07-31 06:11:27 PDT
Signature	@0xb15b5a40
UUID	8f7646a6-5e39-11dd-bab9-001cc45a2ce4
Time	2008-07-30 06:13:43-07:00
Uptime	8
Product	SeaMonkey
Version	2.0a1pre
Build ID	2008072900
OS	Linux
OS Version	0.0.0 Linux 2.6.25-tuxonice-r6 #1 Mon Jul 7 16:20:39 CEST 2008 i686 GNU/Linux
CPU	x86
CPU Info	GenuineIntel family 2 model 13 stepping 8
Crash Reason	SIGSEGV
Crash Address	0xb15b5a40
Comments	Upgrading
Crashing Thread
Frame 	Module 	Signature 	Source
0 		@0xb15b5a40 	
1 	libgdk-x11-2.0.so.0.1200.9 	libgdk-x11-2.0.so.0.1200.9@0x3675b 	
2 	libgobject-2.0.so.0.1600.3 	libgobject-2.0.so.0.1600.3@0xb512 	
3 	libgdk-x11-2.0.so.0.1200.9 	libgdk-x11-2.0.so.0.1200.9@0x14dc0 	
4 	libxul.so 	MOZ_gdk_display_close 	toolkit/xre/nsAppRunner.cpp:2399
5 	libxul.so 	XRE_main 	
6 	seamonkey-bin 	main 	
7 	libc-2.5.so 	libc-2.5.so@0x15837

gtk bug, please get symbols for your os and debug...
Comment 3 Peter Weilbacher 2008-09-11 03:00:47 PDT
OK, I recompiled Gtk (now gtk+-2.12.11) with -g and without stripping and now see the following backtrace:

#0  0xb2b4f340 in ?? ()
#1  0xb76e7e66 in XCloseDisplay () from /usr/lib/libX11.so.6
#2  0xb79c5b46 in gdk_display_x11_finalize (object=0x80637e8)
    at gdkdisplay-x11.c:832
#3  0xb789bc43 in g_object_unref () from /usr/lib/libgobject-2.0.so.0
#4  0xb79a2e78 in IA__gdk_display_close (display=0x80637e8) at gdkdisplay.c:188
#5  0xb7e7c684 in gtk_moz_embed_get_js_status () from <path>/seamonkey/libxul.so
#6  0xb7e7e4bc in XRE_main () from <path>/seamonkey/libxul.so
#7  0x08048949 in ?? ()
#8  0x00000001 in ?? ()
#9  0xbff9c294 in ?? ()
#10 0x0804f7a8 in ?? ()
#11 0x0804f928 in ?? ()
#12 0x48c8e793 in ?? ()
#13 0xb7f95360 in _dl_make_stack_executable () from /lib/ld-linux.so.2
#14 0x080486e9 in _init ()
#15 0xb7494fec in __libc_start_main () from /lib/libc.so.6
#16 0x080487f1 in ?? ()

(I also submitted bp-2302a8e8-7fe5-11dd-8483-001a4bd43e5c but that doesn't give any more info than before.)

Seems a lot like the crash in Bug 448076 or Bug 449371. I'm pretty clueless about Linux graphics nowadays, but I do see using lsof that seamonkey has /usr/lib/libXinerama.so.1.0.0 loaded...
Comment 4 Rion 2008-09-30 05:24:37 PDT
(In reply to comment #3)


i have similar to yours crash but with firefox

Mozilla/5.0 (X11; U; Linux i686; ru_RU; rv:1.9.1b1pre) Gecko/20080925021240 Minefield/3.1b1pre

#0  0xb2fbb0f7 in ?? ()
#1  0xb6ba5bba in XCloseDisplay () from /usr/lib/libX11.so.6
#2  0x00000004 in ?? ()
#3  0x00000001 in ?? ()
#4  0xbfe40d88 in ?? ()
#5  0xb6e875ed in gdk_display_x11_finalize () from /usr/lib/libgdk-x11-2.0.so.0
Comment 5 Peter Weilbacher 2008-11-27 03:46:08 PST
Created attachment 350311 [details] [diff] [review]
confirm problem

OK with this patch I at finally confirmed that the problem is the same as solved on OpenSolaris. With the patch I don't get a shutdown crash any my Linux system anymore.
Comment 6 Peter Weilbacher 2008-11-27 03:59:00 PST
So, either we leak (as in bug 403706) or we crash, both is bad. ;-)

Boying Lu: you found out (in bug 449371) about this callback on OpenSolaris. Is there something one can do automatically to find out if this callback was registered? Like, do certain X variables or versions do that?

The crash occurs here on a Gentoo Linux running libXinerama 1.0.2. This is a laptop with only one screen (it even has Option "Xinerama" "off" in xorg.conf), but it of course does have a port to plug in another one. On a SuSE Linux workstation with only one screen it does not crash. libXinerama there is coming from the xorg-x11-libs package (version 7.3-64.1).

I hope someone can deduce something from that. I'm happy to test any suggestions...
Comment 7 Boying Lu 2008-11-27 18:10:07 PST
(In reply to comment #6)
> So, either we leak (as in bug 403706) or we crash, both is bad. ;-)
> 
> Boying Lu: you found out (in bug 449371) about this callback on OpenSolaris. Is
> there something one can do automatically to find out if this callback was
> registered? Like, do certain X variables or versions do that?
> 
I don't know if there is any automatical way. I suggest you to read the Xinerama 
related source code.

I think a quick check is removing PR_UnLoadLibrary() of the patch of the bug 449371 and make a build. Run the build to see if it crashes or not.
Comment 8 Peter Weilbacher 2008-11-28 06:38:29 PST
(In reply to comment #7)
> I think a quick check is removing PR_UnLoadLibrary() of the patch of the bug
> 449371 and make a build. Run the build to see if it crashes or not.

Yes, that's what I already did before CCing you. See comment 5 or attachment 350311 [details] [diff] [review]. It seems that on Linux one can have at least two configurations, one that crashes and one that doesn't...
Comment 9 Boying Lu 2008-12-02 02:19:22 PST
I built and run the latest firefox on my ubuntu 8.10 box. But I don't see the problem.  

Peter, can you tell me your build configuration and Xinerama library version? Thanks
Comment 10 Peter Weilbacher 2008-12-02 05:55:19 PST
Xinerama here is "libXinerama 1.0.2", also see comment 6.

The build that I tested most recently has the following .mozconfig:
  ac_add_options --enable-application=suite
  ac_add_options --enable-optimize
  ac_add_options --disable-debug
  ac_add_options --disable-tests
  ac_add_options --disable-updater
  ac_add_options --enable-codesighs
Except the --disabler-updater this should be identical to the official SM nightlies.
Comment 11 Peter Weilbacher 2008-12-02 06:06:44 PST
I added some more debugging to nsScreenManagerGtk.cpp and see that _XnrmIsActive returns true and numScreens is queried by _XnrmQueryScreens as 1. Adding a call to XineramaQueryVersion I get major=1 minor=1, despite the package system telling me that it's libXinerama 1.0.2.
Comment 12 Peter Weilbacher 2008-12-02 06:25:52 PST
Aha! On the OpenSuSE system where the program does not crash, I see that numScreens=0! Perhaps that can be used to test if we should unload or not?
Comment 13 Peter Weilbacher 2008-12-02 06:39:31 PST
Created attachment 350955 [details] [diff] [review]
possible workaround (with debug output)

The actual change is to use the original reported number of screens in ~nsScreenManagerGtk() when checking if we want to unload. But I left the debugging stuff in for now.
On the problematic Gentoo system it says here:
   nsScreenManagerGtk :: Init(): _XnrmIsActive = true!
   nsScreenManagerGtk :: Init(): version=1.1
   nsScreenManagerGtk :: Init(): numScreens=1
   mCachedScreenArray.Count()=1, nNumScreens=1
   not unloading mXineramalib=0x428cb100

while on the OpenSuSE system I get only
   nsScreenManagerGtk :: Init(): numScreens=0
   mCachedScreenArray.Count()=1, nNumScreens=0
   unloading mXineramalib=0xf2a52de0

Boying Lu, could you test this patch on your Ubuntu system and maybe on Solaris, too?
Comment 14 Boying Lu 2008-12-03 01:30:17 PST
(In reply to comment #13)
> 
> Boying Lu, could you test this patch on your Ubuntu system and maybe on
> Solaris, too?

Yes. the patch works on both Solaris and Ubuntu.
Comment 15 Peter Weilbacher 2008-12-03 02:26:10 PST
Created attachment 351139 [details] [diff] [review]
cleaned up patch

OK, great, thanks. So this is ready to be reviewed, now without debugging stuff.

dbaron, as this code comes from you originally, I thought you might want to take a look. :-)
Comment 16 Rion 2008-12-03 02:27:41 PST
(In reply to comment #14)
> Yes. the patch works on both Solaris and Ubuntu.

can't wait it in trunk :-)
Comment 17 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2008-12-03 07:56:58 PST
Why not just make the member variable "PRBool mXineramaIsActive"?  That would seem a good bit clearer.
Comment 18 Peter Weilbacher 2008-12-03 09:51:49 PST
Created attachment 351205 [details] [diff] [review]
better solution

Yes, indeed. That also removes the weirdness of having two similarly named variables.
Comment 19 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2008-12-03 14:34:18 PST
Comment on attachment 351205 [details] [diff] [review]
better solution

+  PRBool mXineramaIsActive;

PRPackedBool
Comment 20 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2008-12-04 17:21:23 PST
Comment on attachment 351205 [details] [diff] [review]
better solution

r=dbaron
Comment 21 Peter Weilbacher 2008-12-05 01:19:15 PST
Fixed on trunk:
http://hg.mozilla.org/mozilla-central/rev/240bf1b22cd4
Comment 22 Rion 2008-12-15 01:28:00 PST
what about branch 3.1?
it still crashes :-(
Comment 23 Rion 2008-12-16 00:51:46 PST
Mozilla/5.0 (X11; U; Linux i686; ru; rv:1.9.1b3pre) Gecko/20081215 Shiretoko/3.1b3pre

(gdb) bt
#0  0x4bbb40f7 in ?? ()
#1  0xb699dbba in XCloseDisplay () from /usr/lib/libX11.so.6
#2  0x00000004 in ?? ()
#3  0x00000001 in ?? ()
#4  0xbff08658 in ?? ()
#5  0xb6c837f7 in gdk_display_x11_finalize () from /usr/lib/libgdk-x11-2.0.so.0
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
Comment 24 Peter Weilbacher 2008-12-16 01:59:07 PST
Rion: of course branch still crashes. You have to wait until this bug is marked fixed1.9.1.
(Actually, I forgot to mark it fixed because of mozilla-central checkin.)
Comment 25 Mike Beltzner [:beltzner, not reading bugmail] 2008-12-29 15:19:15 PST
Comment on attachment 351205 [details] [diff] [review]
better solution

a191=beltzner
Comment 26 Peter Weilbacher 2008-12-29 23:58:35 PST
Pushed to mozilla-1.9.1:
http://hg.mozilla.org/releases/mozilla-1.9.1/rev/8dadfd7b54d0
Comment 27 Ginn Chen 2009-02-26 23:11:08 PST
After the fix of bug 477959, I start to see this bug again on Solaris/SPARC with Xsun.

With Xsun, _XnrmIsActive returns false.
I guess in this situation we still should not call PR_Unload().
(Bug 477959 hides this issue, since number is uninitiated in this case)

I don't know if it is Xsun only problem.
I don't have another environment that Xinerame extension is available but _XnrmIsActive returns false.
Comment 28 Ginn Chen 2009-02-26 23:26:12 PST
If so, we should not do PR_Unload() if _XnrmIsActive is ever called.
The easiest way is to remove all these code, because _XnrmIsActive is always called if Xinerama library is loaded. (unless the function pointer is null)
Comment 29 Ginn Chen 2009-02-26 23:33:48 PST
Created attachment 364486 [details] [diff] [review]
patch
Comment 30 Ginn Chen 2009-03-03 20:59:59 PST
I forgot to remove mXineramaIsActive from the header file.
Comment 31 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2009-03-09 15:46:06 PDT
Comment on attachment 364486 [details] [diff] [review]
patch

r=dbaron if you also remove mXineramaIsActive from nsScreenManagerGtk.h
Comment 33 Ginn Chen 2009-03-09 23:33:04 PDT
Created attachment 366518 [details] [diff] [review]
patch for 1.9.1 branch

This would prevent a random crash on quit with Xsun on Solaris and perhaps other platforms.
Comment 34 Mike Beltzner [:beltzner, not reading bugmail] 2009-03-31 10:52:11 PDT
Comment on attachment 366518 [details] [diff] [review]
patch for 1.9.1 branch

a191=beltzner
Comment 35 Ginn Chen 2009-03-31 22:39:16 PDT
Pushed to 1.9.1

http://hg.mozilla.org/releases/mozilla-1.9.1/rev/4245fff36291

Note You need to log in before you can comment on or make changes to this bug.