Closed Bug 467744 Opened 13 years ago Closed 10 years ago
gdk_window_new from moz_drawingarea_create_windows after GdkWindow 0x???????? unexpectedly destroyed
19.96 KB, text/plain
397.27 KB, text/plain
12.83 KB, text/plain
33.84 KB, text/plain
29.46 KB, text/plain
41.19 KB, text/plain
Includes "window unexpectedly destroyed" errors.
Shows which libraries are loaded from /opt/libdbg (subset of C libraries compiled for debugging) and /usr/LOCAL/lib/firefox-3.0.3pre (Firefox libraries compiled for debugging).
Complete listing of /opt/libdbg, which contains the C and gtk/gdk/glib libraries (plus a handful of others) compiled for debugging. These are included in LD_LIBRARY_PATH before running Firefox so one can get useful stack traces. One can see the precise versions using the symbolic links. Not all libraries are compiled for debugging, but the primary libraries which seem to be related to the problems I have encountered over several years are. In general the libraries are versions of the "current" Gentoo Linux source releases.
Here is the gdb trace of the core dump with firefox 3.0.3pre. I am not sure of all of the information you might require but should be able to produce it if you request it. I could upload the files if you provide an FTP area, but the core dump itself is ~1.4 GB and the firefox binaries perhaps ~150MB so it might take a while. It might be better to simply create an ssh login so that you could use my machine "as is". My limited read of the core dump is that it was having a problem in "mozdrawingarea.c" in moz_drawingarea_create_windows() and its calls to gdkwindow.c:gdk_window_new(), which is exactly what it would be doing after destroying the gmail compose & send window and redrawing it as the Inbox listing window. This is also exactly what I would expect to be a problem if the old window was not fully destroyed before the new window was being created and there is not proper locking on the "active" window lists. Now the problem may be in the gdk/gtk window locking code (and multi-threaded applications), which is why I suggested that you are really going to need someone who understands how those libraries work. I've tried going through the code to a limited extent and IMO it would require several days of dedicated effort to get up-to-speed.
It looks like I may finally understand this. I had an instance where Firefox did a core dump without Gmail being involved (I had a tab which failed to load a URL and was either attempting to delete it or return it to the default home page). It did a SIGSEGV core dump with similar characteristics to those found with Gmail. I've managed to learn enough GDB that I can kind of work through the core dumps (and the trace will subsequently be attached) but here is what appears to be the problem:

nsWindow::NativeCreate()
  calls widget/src/gtk2/mozdrawingarea.c:moz_drawingarea_new()
  calls widget/src/gtk2/mozdrawingarea.c:moz_drawingarea_create_windows()
  calls gdk_window_new() (=IA__gdk_window_new())

This calls:
  window = _gdk_window_new (parent, attributes, attributes_mask);
which detects that "parent" has been marked as destroyed, presumably by a separate asynchronous call to _gdk_window_destroy_hierarchy() which sets private->destroyed = TRUE. This is checked for in gdk/x11/gdkwindow-x11.c:_gdk_window_new() using the GDK_WINDOW_DESTROYED(parent) macro.

While gdk_window_new() checks to make sure (parent != NULL), it does not check (window != NULL). Thus GDK_WINDOW_OBJECT(), which presumably translates into g_type_check_instance_cast(), is passed a NULL and gets back a NULL (which it assigns to "private"). Thus the statement
  private->redirect = parent_private->redirect
results in a SIGSEGV due to private being NULL.

This can possibly be considered a bug in gdk_window_new (not making sure the parent window is "undestroyed" before using it). It can also be considered a *BUG* in Mozilla/Firefox because it seems to be attempting to create a new "child" window for a window being destroyed (perhaps asynchronously). This seems to be consistent with my experience of when and where the problem crops up: one sees a "GdkWindow 0xXXXXXXXX unexpectedly destroyed" warning just before one gets an orphan window being created (if one is lucky) or a SIGSEGV and a core dump (if one is unlucky).
Could somebody who understands window creation & deletion interactions *please* fix the locking so that windows are not created as children of windows in the process of being destroyed? In the meantime I'm going to modify my gdk_window_new() so the code is

  if (parent != NULL && window != NULL) ...

Lord only knows what Mozilla/Firefox will do if gdk_window_new() returns "NULL". From the looks of mozdrawingarea.c and mozcontainer.c they always expect gdk_window_new() to work and make subsequent gdk_window_set_user_data() calls. Looks like people haven't considered the possibility that one could run out of memory or swap space in which new windows can be created... (Wait until somebody tries running this on an Android Phone...)
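The failure mode and the proposed guard can be sketched in a few lines of plain C. This is only a model under stated assumptions: ModelWindow, model_backend_window_new() and model_window_new() are hypothetical stand-ins invented for illustration, not the real GdkWindowObject or the real GDK functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a GDK window record; not the real struct. */
typedef struct ModelWindow {
    struct ModelWindow *parent;
    bool destroyed;   /* set once the window has been destroyed */
    void *redirect;   /* field copied from the parent on creation */
} ModelWindow;

/* Models the backend constructor: it refuses to create a child of a
 * destroyed parent and reports that by returning NULL. */
static ModelWindow *model_backend_window_new(ModelWindow *parent)
{
    if (parent != NULL && parent->destroyed)
        return NULL;

    ModelWindow *window = calloc(1, sizeof *window);
    if (window == NULL)
        abort();
    window->parent = parent;
    return window;
}

/* Models the public wrapper with the extra "window != NULL" test.
 * Without that test, the assignment below would dereference NULL
 * whenever the backend rejected a destroyed parent. */
ModelWindow *model_window_new(ModelWindow *parent)
{
    ModelWindow *window = model_backend_window_new(parent);

    if (parent != NULL && window != NULL) {
        assert(window->parent == parent);
        window->redirect = parent->redirect;
    }
    return window; /* may be NULL: callers have to check */
}
```

With the guard in place the crash moves from the library to any caller that uses the result unchecked, so a caller in this model would also need to test for NULL before attaching user data to the new window.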
As pointed out in the attachment this is a somewhat edited GDB trace but I think I have diagnosed the problem correctly (see Comment #6).
This bug has been filed also as Gnome Bug #563592 (@ bugzilla.gnome.org) with references to the related Firefox bugs. It is suggested that the Mozilla/Firefox developers of the gtk2 utilities should interact with the Gnome gdk/gtk developers to determine how window creation & deletion should be "locked" to prevent such activities from interfering with each other. Either that and/or the Mozilla/Firefox developers need to do a complete code review to make sure they are not assuming the success of library functions which may infrequently fail (esp. under high stress/load situations).
Attachment #351160 - Attachment mime type: application/octet-stream → text/plain
Attachment #351161 - Attachment mime type: application/octet-stream → text/plain
Attachment #351163 - Attachment mime type: application/octet-stream → text/plain
Thank you for the analysis. The patch in bug 451341 may help here, or at least catch the problem with destroyed windows earlier. It may require some hand editing to apply to 3.0, or you can use 3.1beta1 or 3.1beta2 (which is going through QA right now), which already have the patch applied.

(In reply to comment #6)
> nsWindow::NativeCreate()
> calls widget/src/gtk2/mozdrawingarea.c:moz_drawingarea_new()
> calls widget/src/gtk2/mozdrawingarea.c:moz_drawingarea_create_windows()
> calls gdk_window_new() (=IA__gdk_window_new())
>
> This calls:
> window = _gdk_window_new (parent, attributes, attributes_mask);
> which detects that "parent" has been marked as destroyed, presumably
> by a separate asynchronous call to
> _gdk_window_destroy_hierarchy() which sets private->destroyed = TRUE

Have you checked the value of the destroyed field to confirm this theory? I think this would be available through

  p ((GdkWindowObject*)parent)->destroyed

> Looks like people haven't considered the possibility that one could run out
> of memory or swap space in which new windows can be created...

g_object_new typically uses glib memory allocation functions that abort on failure and so never return NULL. This doesn't provide any mechanism for recovery but it is always a safe (non-exploitable) crash.
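The abort-on-failure allocation policy described above can be sketched as follows. This is a minimal illustration in standard C, assuming nothing about glib internals; model_malloc_or_abort() is a hypothetical name, not a glib function.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical allocator in the abort-on-failure style: a request for a
 * non-zero size either succeeds or terminates the process, so callers
 * never see NULL for such a request. */
void *model_malloc_or_abort(size_t n_bytes)
{
    void *mem = malloc(n_bytes);

    if (mem == NULL && n_bytes != 0) {
        fprintf(stderr, "failed to allocate %zu bytes\n", n_bytes);
        abort(); /* no recovery path, but no NULL dereference later either */
    }

    /* Postcondition: NULL can only come back for a zero-sized request. */
    assert(mem != NULL || n_bytes == 0);
    return mem;
}
```

Under such a policy, code that skips the NULL check after an allocation is not exposed to an out-of-memory NULL dereference. That is separate from a constructor deliberately returning NULL because its arguments were rejected.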
Status: UNCONFIRMED → NEW
Component: General → Widget: Gtk
Depends on: 451341
Ever confirmed: true
Product: Firefox → Core
QA Contact: general → gtk
Well I had already recompiled the GDK/GTK libraries with the suggested change (and possibly a C compiler upgrade) from the old libraries so GDB doesn't seem to load them so as to get the correct stack trace again. But if I assume the original stack trace (posted) was correct and try:

  p ((GdkWindowObject *)0x7f2ee5e0)->destroyed

it *does* print "$1 = 1". So I read that as a strong indication that parent->destroyed = TRUE, which in my framework gives more support to the case that one thread is trying to create windows while another thread is destroying them and there may be lingering cases where that breaks the libraries if locking isn't handled at higher levels.

I did look briefly at Bug #451341 and it doesn't appear to be the same bug (do you get a "window unexpectedly destroyed" warning?) but I'll have to study it further. In the meantime, I'll start running on firefox-3.0.6pre with the patched gdkwindow.c tomorrow (CVS does not seem to give me firefox-3.1).

I will also note for the record that "--disable-strip-libs" still appears to be unfixed when installing firefox (firefox-bin is unstripped but all the libraries are stripped) so some by-hand copying from the build directories to the install directories is required if one expects "robust" traces.
Component: Widget: Gtk → Toolbars
Product: Core → Firefox
Version: unspecified → 3.0 Branch
Component: Toolbars → Widget: Gtk
Product: Firefox → Core
Version: 3.0 Branch → unspecified
I've got 7 firefox core dumps (mostly in the range 1.2-1.6 GB -- good thing I've got some large disks :-)) for Firefox 3.0.3 dating back 6+ weeks and even though the stack traces don't quite read properly because I've recompiled the gtk/gdk libraries the frame tracing still pretty much works and one can determine what the parent argument to gdk_window_new() is. For 5 of those core dumps the parent->destroyed flag is set to 1. This pretty much seals it from my perspective that (a) there are still some latent bugs in libgdk/libgtk/libglib involving multi-threaded manipulation of the data structures; and(or?) (b) Firefox/Mozilla/et al are not doing a very good job making sure the threads don't step on each other. Lord knows what a mess there is going to be when the 6 & 12 core CPUs start arriving in 2010.
(In reply to comment #11)
> For 5 of those core dumps the parent->destroyed flag is set to 1.

Thanks for checking that out.

(In reply to comment #10)
> which in my framework gives more support to the case that one thread is trying
> to create windows while another thread is destroying them and there may be
> lingering cases where that breaks the libraries if locking isn't handled at
> higher levels.

All Mozilla/GDK interaction happens on one thread. The only possibility of other threads being involved would be if plugins were manipulating windows on another thread, but that seems unlikely. However, it does sound like a window that has been destroyed is being used while creating a child window.

> I did look briefly at Bug #451341 and it doesn't appear to be the same bug

It was the patch, rather than the bug, that I was referring to. The patch includes better cleaning up of destroyed nsWindows, so as not to hold dangling pointers to destroyed GdkWindows, and assertions that may catch some flaws in the window hierarchy.

It would be interesting to know whether Mozilla's window structures know/think that the native window is destroyed. If you have sufficient debug information for any crash, would you mind checking this, please? The best way to do this would be to first run the following command:

(gdb) set print object on

Note that there are two different stacks here, so two different commands would be required to get the information.
In one stack, nsWindow::NativeCreate has aParent, but not aNativeParent:

#7 0x08245015 in nsWindow::NativeCreate (this=0x9bacce00, aParent=0x9b9c6ef0, aNativeParent=0x0, aRect=@0xbfc3ff74, aHandleEventFunction=0x846ee1a <HandleEvent>, aContext=0x54952580, aAppShell=0x0, aToolkit=0x0, aInitData=0xbfc3ff5c) at nsWindow.cpp:3244

(gdb) f 7
(gdb) p aParent->mIsDestroyed

In the other stack, nsWindow::NativeCreate has aNativeParent, but not aParent:

#7 0x08245015 in nsWindow::NativeCreate (this=0x66e6bdf0, aParent=0x0, aNativeParent=0x7f2ee5e0, aRect=@0xbfef3b64, aHandleEventFunction=0x846ee1a <HandleEvent>, aContext=0x620386d0, aAppShell=0x0, aToolkit=0x0, aInitData=0xbfef3b4c) at nsWindow.cpp:3244
#10 0x082806a6 in DocumentViewerImpl::MakeWindow (this=0x6b18e0e0, aSize=@0xbfef3c70) at nsDocumentViewer.cpp:2259

(gdb) f 10
(gdb) p mParentWidget->mIsDestroyed

> (CVS does not seem to give me firefox-3.1).

3.1 version control is handled by mercurial, or a source tarball is available at http://releases.mozilla.org/pub/mozilla.org/firefox/releases/3.1b2/source/

> will also note for the record that "--disable-strip-libs" still appears to be
> unfixed when installing firefox (firefox-bin is unstripped but all the
> libraries are stripped) so some by-hand copying from the build directories to
> the install directories is required if one expects "robust" traces.

It would be good to file a bug on this if there is not one already. I'm not sure if I understand the issue fully, but you can run firefox directly from the dist/bin build directory.
Depends on: 451341
Version: unspecified → 1.9.0 Branch
(In reply to comment #13)
> Is the mouse operating via a different thread?

There is no "mouse" thread within Mozilla. The mouse events come through the kernel and the X server (which is a separate process), and are delivered to the SeaMonkey process as X events. There is potential for race conditions between the SeaMonkey process and the X server, as X server events and actions are asynchronous, but I won't speculate on the cause of the problem based on the current information.

> Side note: I do not be able to build the current beta (Bug #469493), so
> progress in that direction seems to be blocked until they have a 3.1 beta
> distribution which will build under Linux.
> If you wish to contribute you might most various config options which do indeed
> work under Linux.

I use the following and run the executable in the dist/bin directory:

ac_add_options --disable-optimize --enable-debug --disable-installer --disable-libxul
ac_add_options --enable-extensions=" gnomevfs"
ac_add_options --enable-ogg
ac_add_options --with-system-jpeg=/usr
ac_add_options --with-system-lcms=/usr
ac_add_options --with-system-zlib=/usr
ac_add_options --enable-startup-notification
I saw the same kind of crash as described in comment 6 with Firefox 3.0.7 following a link for an identifier search at mxr.mozilla.org. The parent GdkWindow for gdk_window_new is the same window that was previously "unexpectedly destroyed". Its refcount is still 1, held by the MozDrawingArea for which it is the inner_window, and its nsWindow is not destroyed.

I saw a few earlier "unexpectedly destroyed" messages from GDK when opening links in background tabs for other pages, but they did not result in a crash. (This crash happened following a link in the same tab.) All (~4) unexpectedly destroyed windows were inner_window leaves in the GdkWindow tree, which looked intact. Each of the associated nsWindows was of type child, had no nsWindow parent, was 3 nsWindows below what looked like the first (zeroth) nsWindow for a tab, and had the size and position of the document window. The only child nsWindows with no parent that I know of come from DocumentViewerImpl::MakeWindow().

The crash happens because a window that seems to have been destroyed without its nsWindow's initiation or GDK's initiation gets used as the parent in gdk_window_new(). So the root cause seems to be whatever destroyed the window (unexpectedly), which is still a mystery.
It is likely that this will be fixed in 126.96.36.199 as I haven't seen any "unexpectedly destroyed" messages since applying the patch from bug 451341, and this will be applied for Gecko 188.8.131.52. If you see the same kind of crash with 1.9.1 beta 4 (Firefox 3.5 beta 4) or trunk or 184.108.40.206, then please reopen.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → WORKSFORME
I got some "unexpectedly destroyed" messages recently, even with the patch from 451341. The messages were later followed by this crash.
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
What's happening here is:

1) GDK creates a GdkWindow which is a record of an X window. The XID for the X window is recorded on the GdkWindow. So that X events can be traced from the X window XID to the GdkWindow, the GdkWindow is recorded in a hash table keyed by XID (and another reference is added to the GdkWindow).

2) The GdkWindow is destroyed (but not deleted). This calls XDestroyWindow with the associated XID, and so the server sends a DestroyNotify event and deletes the X window. (One reference to the GdkWindow will not be removed until the DestroyNotify event is received.)

3) GDK creates a second GdkWindow, which uses XCreateWindow. XCreateWindow returns an XID equal to the XID associated with the previous window. This new GdkWindow is recorded against the XID in the table, removing the entry pointing to the first GdkWindow (without removing a reference to the first GdkWindow).

4) GDK processes X events, including the DestroyNotify event for the first window. The XID in the event is translated to a GdkWindow using the table, but this now returns the second GdkWindow. This GdkWindow was not expecting a DestroyNotify event and so a warning "GdkWindow %#lx unexpectedly destroyed" is emitted. A reference is removed from this second GdkWindow, and it is marked as destroyed (even though the X window has not been destroyed).

So the first GdkWindow is leaked, and when the second GdkWindow (marked destroyed) gets used as a parent for gdk_window_new, the result is the crash described in comment 6.

I hope the X server is always sending the DestroyNotify event before making the XID available again. In the situation that I saw, the event was sitting in the queue after XCreateWindow was called at step 3. (I hope it was in the queue before the XCreateWindow call.)
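Steps 1 to 4 can be replayed with a small self-contained model. Everything in it is an assumption made for illustration: the fixed-size array stands in for GDK's XID hash table, and ModelWindow, ModelXidTable and the model_* functions are hypothetical names, not GDK or Xlib API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_TABLE_SIZE 8

typedef unsigned long ModelXid;

/* Hypothetical client-side record of an X window. */
typedef struct {
    ModelXid xid;
    int refcount;
    bool destroyed;         /* client-side "destroyed" flag */
    bool expecting_destroy; /* true once the client asked for destruction */
} ModelWindow;

/* Hypothetical XID -> window table; a NULL value marks a free slot. */
typedef struct {
    ModelXid key[MODEL_TABLE_SIZE];
    ModelWindow *value[MODEL_TABLE_SIZE];
    int unexpected_destroys; /* counts "unexpectedly destroyed" warnings */
} ModelXidTable;

static int model_find_slot(const ModelXidTable *t, ModelXid xid)
{
    for (int i = 0; i < MODEL_TABLE_SIZE; i++)
        if (t->value[i] != NULL && t->key[i] == xid)
            return i;
    return -1;
}

/* Steps 1 and 3: record the window under its XID and take a reference.
 * An existing entry for the same XID is overwritten without dropping
 * the old window's reference, which is the leak described in step 3. */
bool model_table_insert(ModelXidTable *t, ModelWindow *win)
{
    int slot = model_find_slot(t, win->xid);

    for (int i = 0; i < MODEL_TABLE_SIZE && slot < 0; i++)
        if (t->value[i] == NULL)
            slot = i;
    if (slot < 0)
        return false; /* table full */

    t->key[slot] = win->xid;
    t->value[slot] = win;
    win->refcount++;
    return true;
}

bool model_window_create(ModelXidTable *t, ModelWindow *win, ModelXid xid)
{
    win->xid = xid;
    win->refcount = 1; /* the creator's own reference */
    win->destroyed = false;
    win->expecting_destroy = false;
    return model_table_insert(t, win);
}

/* Step 2: destroy client-side; the table reference stays until the
 * DestroyNotify event arrives. */
void model_window_destroy(ModelWindow *win)
{
    win->destroyed = true;
    win->expecting_destroy = true;
}

/* Step 4: the event carries only an XID, so a stale event is applied to
 * whichever window currently owns that XID in the table. */
void model_handle_destroy_notify(ModelXidTable *t, ModelXid xid)
{
    int slot = model_find_slot(t, xid);
    if (slot < 0)
        return;

    ModelWindow *win = t->value[slot];
    if (!win->expecting_destroy)
        t->unexpected_destroys++;
    win->destroyed = true;
    t->value[slot] = NULL;
    win->refcount--;
    assert(win->refcount >= 0);
}
```

Creating a window, destroying it, creating a second one with the same XID and then delivering the stale DestroyNotify leaves the second window marked destroyed while the first keeps its table reference. That matches the leak plus "unexpectedly destroyed" warning described above.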
Assignee: nobody → mozbugz
I suspect that I'm seeing this too, but I don't know how to tell.

I'm running Fedora 10's firefox-3.0.10-1.fc10.x86_64.

I did run the "firefox --sync" command under the script command, so I have a list of the output. And I have a core file.

Firefox crashes for me every week or so. It used to crash more frequently until the generate_xid bug was fixed: https://bugs.freedesktop.org/show_bug.cgi?id=20254

The last thing printed before the segfault was

(firefox:3824): Gdk-WARNING **: GdkWindow 0x44f4e45 unexpectedly destroyed

GDB says that the segfault happened here:

#3 0x0000003da7434d4d in IA__gdk_window_new (parent=0x82fabe0, attributes=<value optimized out>, attributes_mask=<value optimized out>) at gdkwindow.c:381

I will attach the typescript showing the stdout & stderr of firefox and gdb output including a traceback. Is there any additional information that I should gather?
the gmail site was not the trigger of this crash.
Yes, that's the same issue, thanks Hugh. I think we have enough information about what is happening in Mozilla and GTK2. I don't know whether this should be fixed in GTK, Xlib, or Xserver.
I can confirm that sometime between gtk+-2.14.3 and gtk+-2.16.1 the gtk developers changed gdk/gdkwindow.c to check for window == NULL. The code, now lines 379-380, reads:

  window = _gdk_window_new (parent, attributes, attributes_mask);
  g_return_val_if_fail (window != NULL, window);

So if my experience is to serve as an example (esp. regarding stack traces provided), then this should fix the SIGSEGV part of this problem. It may not fix the "window unexpectedly destroyed" part of the problem, or the detached window problem, which as Karl's analysis shows are a bit more complicated. But at least the fix may stop Firefox from crashing. So if one wants to avoid the crashes one had best be running a recent release of the gtk+ libraries. However, as Karl's analysis shows, the Firefox code may be stressing gtk/gdk enough that one could trip over other lurking problems of a different nature. So it's clearly a case of "buyer beware".
I wanted to comment that this bug is *NOT* fixed in Firefox 3.0.11 (current Gentoo release). This is presumably the major component to soon be "3.5 production" (something which really shouldn't happen until this long existing bug is resolved). I've got NINE "GdkWindow 0x.... unexpectedly destroyed" messages on my current Firefox console (along with a slew of other gdk_window_set/show/move/hide... assertion failures) from a "large" firefox session (71 windows / 850 tabs) that is currently consuming 30-60% of my CPU doing essentially nothing. As mentioned previously this shows up most frequently (for me) on a machine which either has a very large (and CPU wasting) Firefox session and/or Firefox running simultaneously with system builds (Gentoo emerge's) which compete for CPU time causing Firefox and/or X to lose the CPU. (This is on a single core Pent. IV Prescott.) I had to laugh at the recent 3.5 review I read that commented that the "new" and "improved" Firefox 3.5 could handle a lot more windows/tabs without problems (but what do reviewers know...).
3.5 was released and, surprise, surprise:

(firefox-bin:11131): Gdk-WARNING **: XID collision, trouble ahead
(firefox-bin:11131): Gdk-WARNING **: XID collision, trouble ahead
(firefox-bin:11131): Gdk-WARNING **: XID collision, trouble ahead
(firefox-bin:11131): Gdk-WARNING **: XID collision, trouble ahead
(firefox-bin:11131): Gdk-WARNING **: XID collision, trouble ahead
(firefox-bin:11131): Gdk-WARNING **: XID collision, trouble ahead
(firefox-bin:11131): Gdk-WARNING **: XID collision, trouble ahead
(firefox-bin:11131): Gdk-WARNING **: XID collision, trouble ahead
(firefox-bin:11131): Gdk-WARNING **: XID collision, trouble ahead
(firefox-bin:11131): Gdk-WARNING **: XID collision, trouble ahead
I too am using 3.5 (Fedora 11's firefox-3.5-1.fc11.x86_64). I'm running it with --sync which, I thought, would prevent these problems, but it does not.

Here is the first chunk of what appeared on stdout or stderr. There is more. These are all the messages printed, up to a point in time. I don't know how they are related -- the session lasted some days. I ended the session when FF seemed unable to finish displaying a page.

(firefox:3192): Gdk-CRITICAL **: gdk_x11_xatom_to_atom_for_display: assertion `xatom != None' failed
(firefox:3192): Gdk-CRITICAL **: gdk_x11_xatom_to_atom_for_display: assertion `xatom != None' failed
(firefox:3192): Gdk-CRITICAL **: gdk_x11_xatom_to_atom_for_display: assertion `xatom != None' failed
(firefox:3192): Gdk-CRITICAL **: gdk_x11_xatom_to_atom_for_display: assertion `xatom != None' failed
(firefox:3192): Gdk-CRITICAL **: gdk_x11_xatom_to_atom_for_display: assertion `xatom != None' failed
(firefox:3192): Gdk-CRITICAL **: gdk_x11_xatom_to_atom_for_display: assertion `xatom != None' failed
(firefox:3192): Gdk-CRITICAL **: gdk_x11_xatom_to_atom_for_display: assertion `xatom != None' failed
(firefox:3192): Gdk-WARNING **: XID collision, trouble ahead
(firefox:3192): Gdk-WARNING **: GdkWindow 0x44f263c unexpectedly destroyed
(firefox:3192): Gdk-WARNING **: XID collision, trouble ahead
(firefox:3192): Gdk-WARNING **: GdkWindow 0x44f2607 unexpectedly destroyed
I should have mentioned that Fedora 11 includes the fix for xcb_generate_id https://bugs.freedesktop.org/show_bug.cgi?id=20254
With a freshly started SeaMonkey browser, before even going to any web page, I got this crash as well on Gentoo Linux:

(gecko:20280): Gdk-CRITICAL **: gdk_x11_xatom_to_atom_for_display: assertion `xatom != None' failed
(gecko:20280): Gdk-CRITICAL **: gdk_x11_xatom_to_atom_for_display: assertion `xatom != None' failed
(gecko:20280): Gdk-CRITICAL **: gdk_x11_xatom_to_atom_for_display: assertion `xatom != None' failed
/usr/libexec/mozilla-launcher: line 119: 20280 Segmentation fault (core dumped) $(type -P aoss) "$mozbin" $xulparams "$@"
seamonkey-bin exited with non-zero status (139)

I have also seen the "XID collision, trouble ahead" messages in other sessions.

Mozilla/5.0 (X11; U; Linux i686; en-US; rv:220.127.116.11) Gecko/20090731 SeaMonkey/1.1.17
And the partially resolved stack trace is:

(gdb) where
#0 0xb7ef7424 in __kernel_vsyscall ()
(gdb) where
#0 0xb7ef7424 in __kernel_vsyscall ()
#1 0xb7cf2d5e in raise () from /lib/libpthread.so.0
#2 0xb5d28ff4 in ?? () from /usr/lib/seamonkey/components/libprofile.so
#3 0x0000000b in ?? ()
#4 0xb5d2578b in nsProfileLock::FatalSignalHandler (signo=-1211125772) at nsProfileLock.cpp:206
#5 <signal handler called>
#6 0xb7b923a7 in gtk_widget_event_internal (widget=0x0, event=0xbfd1405c) at gtkwidget.c:4766
#7 0x0000008d in ?? ()
#8 0x00000004 in ?? ()
#9 0x00000020 in ?? ()
#10 0x08b96a88 in ?? ()
#11 0x00000034 in ?? ()
#12 0x080940b0 in ?? ()
#13 0x00000000 in ?? ()
(gdb)
Note that the recent reports, perhaps comment #25, but certainly comment #26 thru comment #29 may (or may not) be a different bug. See Bug #507910. It is critical that people document the versions of libgdk/gtk/glib that they are using on their system to diagnose these 3 problems (xatom conversion, XID collision and Window unexpectedly destroyed) as I am aware of partial fixes for at least the last two which I have had in my library sources at some point (but may have lost during recent upgrades). Please cite the Gnome Bugs database where appropriate.
(In reply to comment #29)

I have a Gentoo system, with gtk+2.16.5, glib-2.20.4.

$ equery belongs libgdk.so
 * Searching for libgdk.so ...
x11-libs/gtk+-1.2.10-r12 (/usr/lib/libgdk.so -> libgdk-1.2.so.0.9.1)
$

Hope this helps somebody.
This was caused by the same issue as caused bug 263160.
No longer blocks: 263160
Status: REOPENED → RESOLVED
Closed: 12 years ago → 10 years ago
Resolution: --- → DUPLICATE
Duplicate of bug: 263160