Closed Bug 448512 Opened 16 years ago Closed 15 years ago

Crash on quit [@ XCloseDisplay]

Categories

(Core :: Widget: Gtk, defect)

x86
Linux
defect
Not set
normal

RESOLVED FIXED

People

(Reporter: mozilla, Assigned: mozilla)

References

Details

(Keywords: crash, fixed1.9.1)

Crash Data

Attachments

(3 files, 3 obsolete files)

Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1a2pre) Gecko/2008072900 SeaMonkey/2.0a1pre

I've been getting a lot of crashes for the past few days (since the switch to comm-central?) when quitting or restarting the app. I mostly noticed them when updating to a new nightly. I submitted crash reports for most of them, e.g.

   065aa707-5e17-11dd-97af-001a4bd43ed6
   2825c480-5e17-11dd-87cf-001cc45a2ce4

The stack contains MOZ_gdk_display_close as called from toolkit/xre/nsAppRunner.cpp:2399. So either my Gtk libs (gtk+-2.12.8) are broken or the checkin of Bug 417163 (that I see last changed that line) caused something bad. In the topcrashers I see many more crashes with just a hex signature, maybe some others also refer to similar crashes (or they are all mine), but I can't search, because the server is broken.
I updated to the newest available Gtk+ version that I could find for my distro (Gentoo, gtk+-2.12.9-r2) but it crashed again, e.g.

   8f7646a6-5e39-11dd-bab9-001cc45a2ce4
Signature	@0xb15b5a40
UUID	8f7646a6-5e39-11dd-bab9-001cc45a2ce4
Time	2008-07-30 06:13:43-07:00
Uptime	8
Product	SeaMonkey
Version	2.0a1pre
Build ID	2008072900
OS	Linux
OS Version	0.0.0 Linux 2.6.25-tuxonice-r6 #1 Mon Jul 7 16:20:39 CEST 2008 i686 GNU/Linux
CPU	x86
CPU Info	GenuineIntel family 2 model 13 stepping 8
Crash Reason	SIGSEGV
Crash Address	0xb15b5a40
Comments	Upgrading
Crashing Thread
Frame 	Module 	Signature 	Source
0 		@0xb15b5a40 	
1 	libgdk-x11-2.0.so.0.1200.9 	libgdk-x11-2.0.so.0.1200.9@0x3675b 	
2 	libgobject-2.0.so.0.1600.3 	libgobject-2.0.so.0.1600.3@0xb512 	
3 	libgdk-x11-2.0.so.0.1200.9 	libgdk-x11-2.0.so.0.1200.9@0x14dc0 	
4 	libxul.so 	MOZ_gdk_display_close 	toolkit/xre/nsAppRunner.cpp:2399
5 	libxul.so 	XRE_main 	
6 	seamonkey-bin 	main 	
7 	libc-2.5.so 	libc-2.5.so@0x15837

gtk bug, please get symbols for your os and debug...
OK, I recompiled Gtk (now gtk+-2.12.11) with -g and without stripping and now see the following backtrace:

#0  0xb2b4f340 in ?? ()
#1  0xb76e7e66 in XCloseDisplay () from /usr/lib/libX11.so.6
#2  0xb79c5b46 in gdk_display_x11_finalize (object=0x80637e8)
    at gdkdisplay-x11.c:832
#3  0xb789bc43 in g_object_unref () from /usr/lib/libgobject-2.0.so.0
#4  0xb79a2e78 in IA__gdk_display_close (display=0x80637e8) at gdkdisplay.c:188
#5  0xb7e7c684 in gtk_moz_embed_get_js_status () from <path>/seamonkey/libxul.so
#6  0xb7e7e4bc in XRE_main () from <path>/seamonkey/libxul.so
#7  0x08048949 in ?? ()
#8  0x00000001 in ?? ()
#9  0xbff9c294 in ?? ()
#10 0x0804f7a8 in ?? ()
#11 0x0804f928 in ?? ()
#12 0x48c8e793 in ?? ()
#13 0xb7f95360 in _dl_make_stack_executable () from /lib/ld-linux.so.2
#14 0x080486e9 in _init ()
#15 0xb7494fec in __libc_start_main () from /lib/libc.so.6
#16 0x080487f1 in ?? ()

(I also submitted bp-2302a8e8-7fe5-11dd-8483-001a4bd43e5c but that doesn't give any more info than before.)

Seems a lot like the crash in Bug 448076 or Bug 449371. I'm pretty clueless about Linux graphics nowadays, but I do see using lsof that seamonkey has /usr/lib/libXinerama.so.1.0.0 loaded...
Summary: Crash on quit → Crash on quit [@ XCloseDisplay]
(In reply to comment #3)


I have a crash similar to yours, but with Firefox.

Mozilla/5.0 (X11; U; Linux i686; ru_RU; rv:1.9.1b1pre) Gecko/20080925021240 Minefield/3.1b1pre

#0  0xb2fbb0f7 in ?? ()
#1  0xb6ba5bba in XCloseDisplay () from /usr/lib/libX11.so.6
#2  0x00000004 in ?? ()
#3  0x00000001 in ?? ()
#4  0xbfe40d88 in ?? ()
#5  0xb6e875ed in gdk_display_x11_finalize () from /usr/lib/libgdk-x11-2.0.so.0
Assignee: general → nobody
Component: General → Widget: Gtk
Product: SeaMonkey → Core
QA Contact: general → gtk
Attached patch confirm problem (obsolete)
OK, with this patch I finally confirmed that the problem is the same as the one solved on OpenSolaris. With the patch I don't get a shutdown crash on my Linux system anymore.
So, either we leak (as in bug 403706) or we crash, both is bad. ;-)

Boying Lu: you found out (in bug 449371) about this callback on OpenSolaris. Is there something one can do automatically to find out if this callback was registered? Like, do certain X variables or versions do that?

The crash occurs here on a Gentoo Linux running libXinerama 1.0.2. This is a laptop with only one screen (it even has Option "Xinerama" "off" in xorg.conf), but it of course does have a port to plug in another one. On a SuSE Linux workstation with only one screen it does not crash. libXinerama there is coming from the xorg-x11-libs package (version 7.3-64.1).

I hope someone can deduce something from that. I'm happy to test any suggestions...
(In reply to comment #6)
> So, either we leak (as in bug 403706) or we crash, both is bad. ;-)
> 
> Boying Lu: you found out (in bug 449371) about this callback on OpenSolaris. Is
> there something one can do automatically to find out if this callback was
> registered? Like, do certain X variables or versions do that?
> 
I don't know of any automatic way. I suggest reading the Xinerama-related
source code.

I think a quick check is to remove the PR_UnloadLibrary() call from the patch in bug 449371 and make a build. Run the build to see whether it crashes.
(In reply to comment #7)
> I think a quick check is to remove the PR_UnloadLibrary() call from the patch
> in bug 449371 and make a build. Run the build to see whether it crashes.

Yes, that's what I already did before CCing you. See comment 5 or attachment 350311. It seems that on Linux one can have at least two configurations, one that crashes and one that doesn't...
I built and ran the latest Firefox on my Ubuntu 8.10 box, but I don't see the problem.

Peter, can you tell me your build configuration and Xinerama library version? Thanks
Xinerama here is "libXinerama 1.0.2", also see comment 6.

The build that I tested most recently has the following .mozconfig:
  ac_add_options --enable-application=suite
  ac_add_options --enable-optimize
  ac_add_options --disable-debug
  ac_add_options --disable-tests
  ac_add_options --disable-updater
  ac_add_options --enable-codesighs
Except for --disable-updater, this should be identical to the official SM nightlies.
I added some more debugging to nsScreenManagerGtk.cpp and see that _XnrmIsActive returns true and numScreens is queried by _XnrmQueryScreens as 1. Adding a call to XineramaQueryVersion I get major=1 minor=1, despite the package system telling me that it's libXinerama 1.0.2.
Aha! On the OpenSuSE system where the program does not crash, I see that numScreens=0! Perhaps that can be used to test if we should unload or not?
The actual change is to use the original reported number of screens in ~nsScreenManagerGtk() when checking if we want to unload. But I left the debugging stuff in for now.
On the problematic Gentoo system it says here:
   nsScreenManagerGtk :: Init(): _XnrmIsActive = true!
   nsScreenManagerGtk :: Init(): version=1.1
   nsScreenManagerGtk :: Init(): numScreens=1
   mCachedScreenArray.Count()=1, nNumScreens=1
   not unloading mXineramalib=0x428cb100

while on the OpenSuSE system I get only
   nsScreenManagerGtk :: Init(): numScreens=0
   mCachedScreenArray.Count()=1, nNumScreens=0
   unloading mXineramalib=0xf2a52de0
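
The decision visible in these two logs can be sketched as follows. This is a minimal model, not the attached patch: `ShouldUnloadXinerama` and its parameter are hypothetical names, and the rule itself (unload only when Xinerama originally reported zero screens) is an assumption inferred from the two log excerpts above.

```cpp
// Hypothetical helper; not code from the attached patch.
// reportedScreens models the count Xinerama originally reported
// (nNumScreens in the logs), not mCachedScreenArray.Count(),
// which is 1 on both systems.
bool ShouldUnloadXinerama(int reportedScreens) {
  // Assumption: a nonzero count means the library was actively used and
  // may still be referenced when the display is closed, so keep it loaded.
  return reportedScreens == 0;
}
```

With the values logged above, the Gentoo system (1 screen reported) keeps the library loaded and the OpenSuSE system (0 screens reported) unloads it.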

Boying Lu, could you test this patch on your Ubuntu system and maybe on Solaris, too?
Attachment #350311 - Attachment is obsolete: true
(In reply to comment #13)
> 
> Boying Lu, could you test this patch on your Ubuntu system and maybe on
> Solaris, too?

Yes. the patch works on both Solaris and Ubuntu.
Attached patch cleaned up patch (obsolete)
OK, great, thanks. So this is ready to be reviewed, now without debugging stuff.

dbaron, as this code comes from you originally, I thought you might want to take a look. :-)
Assignee: nobody → mozilla
Status: NEW → ASSIGNED
Attachment #351139 - Flags: superreview?(roc)
Attachment #351139 - Flags: review?(dbaron)
(In reply to comment #14)
> Yes. the patch works on both Solaris and Ubuntu.

Can't wait to see it in trunk :-)
Why not just make the member variable "PRBool mXineramaIsActive"?  That would seem a good bit clearer.
Attached patch better solution
Yes, indeed. That also removes the weirdness of having two similarly named variables.
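
A rough model of the `mXineramaIsActive` approach, for readers without the attachment at hand. Everything except the member name `mXineramaIsActive` is hypothetical: `FakeLibrary` stands in for the real library handle, `ScreenManagerSketch` for nsScreenManagerGtk, and the assignment to `loaded` for the actual unload call. It is a sketch of the idea under those assumptions, not the reviewed code.

```cpp
// Hypothetical stand-in for the Xinerama library handle; not an NSPR type.
struct FakeLibrary {
  bool loaded = true;
};

// Hypothetical model of the screen manager's load/unload bookkeeping.
class ScreenManagerSketch {
public:
  explicit ScreenManagerSketch(FakeLibrary* lib) : mXineramalib(lib) {}

  // isActive models the result of _XnrmIsActive at initialization time.
  void Init(bool isActive) { mXineramaIsActive = isActive; }

  ~ScreenManagerSketch() {
    // Unload only if Xinerama never reported itself as active.
    if (mXineramalib && !mXineramaIsActive) {
      mXineramalib->loaded = false;  // stands in for the real unload call
    }
  }

private:
  FakeLibrary* mXineramalib;
  bool mXineramaIsActive = false;
};

// Returns whether the library is still loaded after the manager is destroyed.
bool StillLoadedAfterShutdown(bool isActive) {
  FakeLibrary lib;
  {
    ScreenManagerSketch mgr(&lib);
    mgr.Init(isActive);
  }  // destructor runs here
  return lib.loaded;
}
```

A single boolean recorded at initialization replaces the second screen-count variable, so the destructor no longer depends on two similarly named counters.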
Attachment #350955 - Attachment is obsolete: true
Attachment #351139 - Attachment is obsolete: true
Attachment #351205 - Flags: superreview?(roc)
Attachment #351205 - Flags: review?(dbaron)
Attachment #351139 - Flags: superreview?(roc)
Attachment #351139 - Flags: review?(dbaron)
Attachment #351205 - Flags: superreview?(roc) → superreview+
Comment on attachment 351205
better solution

+  PRBool mXineramaIsActive;

PRPackedBool
Comment on attachment 351205
better solution

r=dbaron
Attachment #351205 - Flags: review?(dbaron) → review+
what about branch 3.1?
it still crashes :-(
Attachment #351205 - Flags: approval1.9.1?
Mozilla/5.0 (X11; U; Linux i686; ru; rv:1.9.1b3pre) Gecko/20081215 Shiretoko/3.1b3pre

(gdb) bt
#0  0x4bbb40f7 in ?? ()
#1  0xb699dbba in XCloseDisplay () from /usr/lib/libX11.so.6
#2  0x00000004 in ?? ()
#3  0x00000001 in ?? ()
#4  0xbff08658 in ?? ()
#5  0xb6c837f7 in gdk_display_x11_finalize () from /usr/lib/libgdk-x11-2.0.so.0
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
Rion: of course branch still crashes. You have to wait until this bug is marked fixed1.9.1.
(Actually, I forgot to mark it fixed because of mozilla-central checkin.)
Status: ASSIGNED → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Comment on attachment 351205
better solution

a191=beltzner
Attachment #351205 - Flags: approval1.9.1? → approval1.9.1+
After the fix for bug 477959, I started seeing this bug again on Solaris/SPARC with Xsun.

With Xsun, _XnrmIsActive returns false.
I guess in this situation we still should not call PR_UnloadLibrary().
(Bug 477959 hid this issue, since the number was uninitialized in this case.)

I don't know if it is an Xsun-only problem.
I don't have another environment where the Xinerama extension is available but _XnrmIsActive returns false.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
If so, we should not call PR_UnloadLibrary() if _XnrmIsActive has ever been called.
The easiest way is to remove all this code, because _XnrmIsActive is always called if the Xinerama library is loaded (unless the function pointer is null).
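
Why unloading is unsafe once any library entry point has been called can be illustrated with a toy model. This is only a sketch under assumptions: the thread mentions a callback registered with the display but does not show it, so `DisplaySketch`, `LibrarySketch`, and `ShutdownIsSafe` are hypothetical names, and the model returns false where a real process would jump into unmapped code and crash.

```cpp
#include <functional>
#include <vector>

// Hypothetical stand-in for the dynamically loaded library.
struct LibrarySketch {
  bool loaded = true;
};

// Hypothetical display that runs registered hooks when it is closed,
// loosely analogous to what XCloseDisplay appears to do in the backtraces.
struct DisplaySketch {
  std::vector<std::function<bool()>> closeHooks;

  // Returns false if any hook belongs to code that was already unloaded.
  bool Close() {
    bool ok = true;
    for (auto& hook : closeHooks) {
      ok = hook() && ok;
    }
    return ok;
  }
};

// Simulates shutdown; unloadFirst selects whether the library is unloaded
// before the display is closed.
bool ShutdownIsSafe(bool unloadFirst) {
  LibrarySketch lib;
  DisplaySketch display;
  // Assumption: calling into the library (for example the IsActive query)
  // registers a close hook that lives in the library's own code.
  display.closeHooks.push_back([&lib] { return lib.loaded; });
  if (unloadFirst) {
    lib.loaded = false;  // models unloading the library early
  }
  return display.Close();
}
```

In this model the only safe ordering is to leave the library loaded until after the display is closed, which is what removing the unload code achieves.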
Attached patch patch
Attachment #364486 - Flags: review?
Attachment #364486 - Flags: review? → review?(dbaron)
I forgot to remove mXineramaIsActive from the header file.
Comment on attachment 364486
patch

r=dbaron if you also remove mXineramaIsActive from nsScreenManagerGtk.h
Attachment #364486 - Flags: review?(dbaron) → review+
http://hg.mozilla.org/mozilla-central/rev/bb0e88c6c738
Status: REOPENED → RESOLVED
Closed: 16 years ago → 15 years ago
Keywords: fixed1.9.1
Resolution: --- → FIXED
This would prevent a random crash on quit with Xsun on Solaris and perhaps other platforms.
Attachment #366518 - Flags: approval1.9.1?
Comment on attachment 366518
patch for 1.9.1 branch

a191=beltzner
Attachment #366518 - Flags: approval1.9.1? → approval1.9.1+
Crash Signature: [@ XCloseDisplay]