Bug 263160 - frames open in new windows leaving the firefox window unusable
Status: RESOLVED WORKSFORME
Keywords: crash, topembed
Product: Core
Classification: Components
Component: Widget: Gtk
Version: Trunk
Platform: x86 Linux
Importance: -- critical with 32 votes
Target Milestone: ---
Assigned To: Nobody; OK to take it and work on it
URL: http://bugzilla.gnome.org/show_bug.cg...
Duplicates: 244482 339251 348734 349497 352178 354104 354970 365734 367211 367832 368260 370787 370915 381270 395999 399436 402774 409059 410325 467744
Depends on: 130078 widget-removal
Blocks: 362955
Reported: 2004-10-06 06:23 PDT by Mikael Hedberg
Modified: 2011-09-20 01:03 PDT
CC List: 61 users
mbeltzner: blocking1.9-
mbeltzner: wanted1.9+


Attachments
gtk errors logged to stderr when bug occurs (50.99 KB, text/plain)
2006-10-26 03:04 PDT, Adrian Mettler
Backtrace from the creation of a rogue window in Epiphany (47.24 KB, text/plain)
2007-02-02 21:45 PST, Braden
Example of Window creation & destruction with NSPR_LOG_MODULES=Widget:4 (2.46 KB, text/plain)
2007-02-21 08:08 PST, Robert Bradbury
Example of unusual window destruction case. (49.06 KB, text/plain)
2007-02-21 08:18 PST, Robert Bradbury
gdb trace of SegFault while attempting to trigger this bug (43.66 KB, text/plain)
2007-02-23 05:31 PST, Robert Bradbury
gdb trace of seamonkey around the time of problem (7.46 KB, text/plain)
2007-02-24 13:09 PST, Robert Bradbury
Firefox segfaulting in gdk_window_get_root_origin? (28.01 KB, text/plain)
2007-02-25 08:12 PST, Robert Bradbury
case of using RobertB firefox-bin 2.0 version with debug instrumentation (3.15 KB, text/plain)
2007-03-03 18:01 PST, erik red
gdb trace of Gmail hung during loading Inbox (22.78 KB, text/plain)
2007-03-13 09:35 PDT, Robert Bradbury
gdb trace of gmail hung with 4 untitled windows (22.60 KB, text/plain)
2007-03-13 09:37 PDT, Robert Bradbury
gdb trace of SeaMonkey hung in gmail with 4 untitled windows (35.88 KB, text/plain)
2007-03-13 12:12 PDT, Robert Bradbury
Full trace of GdkWindow %#lx unexpectedly destroyed (19.74 KB, text/plain;charset=iso-8859-1)
2007-05-12 11:28 PDT, Robert Bradbury
Traces of window (frame) destroy's (79.80 KB, text/plain)
2007-05-14 10:12 PDT, Robert Bradbury
Screenshot displaying window error (291.41 KB, image/jpeg)
2007-06-09 12:07 PDT, Tom Simnett
Screenshot displaying window error (291.41 KB, image/jpeg)
2007-06-09 12:07 PDT, Tom Simnett
Yet another example of Firefox going south... (17.36 KB, text/plain)
2007-11-06 14:25 PST, Robert Bradbury
craziness (#1) (208.62 KB, image/png)
2007-11-27 01:06 PST, Reed Loden [:reed]
craziness (#2) (292.01 KB, image/png)
2007-11-27 01:08 PST, Reed Loden [:reed]
GDB log of window unexpectedly destroyed errors (64.84 KB, text/plain)
2008-04-24 04:44 PDT, Robert Bradbury
Window destroyed problems opening new tabs (8.06 KB, text/plain)
2008-04-24 05:13 PDT, Robert Bradbury
Set of gdb stack traces of destroyed windows (63.98 KB, text/plain)
2008-05-25 12:09 PDT, Robert Bradbury

Description Mikael Hedberg 2004-10-06 06:23:55 PDT
User-Agent:       Mozilla/5.0 (X11; U; Linux i686; rv:1.7.3) Gecko/20041002 Firefox/0.10.1
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; rv:1.7.3) Gecko/20041002 Firefox/0.10.1

After using Firefox for a while with a number of tabs open, sometimes links will
open in new (separate) windows rather than in the frame where they belong. These
new windows will lack a close button and Firefox crashes if you try to forcibly
close them.

The "main" or "real" Firefox window is left unusable where the frame should have
been. The space is not drawn, and does not react to mouse clicks. Reloading the
page or reopening the link (if it's in a frame) can sometimes cause the bogus
window to disappear and the page to be drawn in its proper place.

This happens when opening links inside a tab, reloading a page in a tab,
opening a link in a new tab, and opening a link which opens inside a different
frame on the same page. It seems to happen more the longer you've had Firefox
open, and possibly the more tabs you've been using, though that could just be a
"how much you've used it" criterion. It can sometimes pop up even after
relatively short periods of usage. I haven't found any way to reproduce it
reliably.

A restart (of Firefox) "solves" the problem.

I've never experienced this running on Windows, but I'm unsure if it's ever
happened on Solaris. I'm running WindowMaker 0.80.2 on x.org 6.7.0 and Linux
2.4.26, but both different x.org/XFree86 versions and linux kernel versions seem
to be affected.

Also, this has been around since at least Firefox 0.9.

Reproducible: Couldn't Reproduce
Steps to Reproduce:
Comment 1 Mikael Hedberg 2004-10-07 07:40:18 PDT
Managed to grab a couple of screenshots. Note the ad banner image (which is an
iframe) that has turned up in the wrong place, its own window. Also, the
blue-ish box in the middle of the page is my desktop background image hanging
around from a workspace change:
http://www.mozilla.se/bugs/firefox-bug-1.png

Right after that, reloading the page made it even worse. Now both the iframe of
the banner and the frame of the whole tab got ripped out. Note how the window
covers the toolbar of the "real" firefox window, and the "Screen Shot" window
showing through the firefox window is actually the first screenshot, which again
is on a different workspace. That area isn't updated at all - dragging a window
across it creates a "trail".
http://www.mozilla.se/bugs/firefox-bug-2.png
Comment 2 Ed Anderson 2005-03-18 08:40:37 PST
I've experienced this bug several times as well.  It happens after I've had
firefox open for several weeks.  On each page reload, on the frames page, any of
the frames may "jump out" of the page in a new window.  Between 80 and 90% of
the time, the frame will actually fix itself on reload, but may pop out on
subsequent reloads.

Restarting firefox does fix the problem.  Everything Mikael said was true for me
as well.  

I run Gentoo Linux with Xorg and Blackbox WM.
I took a screenshot: http://www.nilbus.com/pub/firefox-frames-bug.png
Comment 3 Trevor Watson 2005-04-06 03:13:43 PDT
I have this happen at least once a day on FF 1.0.2 on Solaris, running under
GNOME. It has been happening since FF 0.9 and on more than one GNOME release.
Comment 4 Gervase Markham [:gerv] 2005-09-27 01:48:07 PDT
This is an automated message, with ID "auto-resolve01".

This bug has had no comments for a long time. Statistically, we have found that
bug reports that have not been confirmed by a second user after three months are
highly unlikely to be the source of a fix to the code.

While your input is very important to us, our resources are limited and so we
are asking for your help in focussing our efforts. If you can still reproduce
this problem in the latest version of the product (see below for how to obtain a
copy) or, for feature requests, if it's not present in the latest version and
you still believe we should implement it, please visit the URL of this bug
(given at the top of this mail) and add a comment to that effect, giving more
reproduction information if you have it.

If it is not a problem any longer, you need take no action. If this bug is not
changed in any way in the next two weeks, it will be automatically resolved.
Thank you for your help in this matter.

The latest beta releases can be obtained from:
Firefox:     http://www.mozilla.org/projects/firefox/
Thunderbird: http://www.mozilla.org/products/thunderbird/releases/1.5beta1.html
Seamonkey:   http://www.mozilla.org/projects/seamonkey/
Comment 5 Gervase Markham [:gerv] 2005-10-13 10:16:45 PDT
This bug has been automatically resolved after a period of inactivity (see above
comment). If anyone thinks this is incorrect, they should feel free to reopen it.
Comment 6 Phil Ringnalda (:philor) 2006-09-26 15:54:26 PDT
Reopening, since I have a couple more to mark as duplicates...
Comment 7 Phil Ringnalda (:philor) 2006-09-26 15:54:52 PDT
*** Bug 348734 has been marked as a duplicate of this bug. ***
Comment 8 Phil Ringnalda (:philor) 2006-09-26 15:55:05 PDT
*** Bug 354104 has been marked as a duplicate of this bug. ***
Comment 9 Christian Hernmarck 2006-10-04 03:58:16 PDT
Mozilla/5.0 (X11; U; Linux i686 (x86_64); de; rv:1.8.0.4) Gecko/20060508 Firefox/1.5.0.4

I can confirm this bug - I often leave ff (and Mozilla-suite) open for days/weeks.  Early 2006 I started "facing" this bug and hating it :-)
I often work with Typo3, phpMyAdmin and other tools heavily using frames.
This summer I also noticed this bug in mozilla
Mozilla/5.0 (X11; U; Linux x86_64; de-AT; rv:1.7.13) Gecko/20060411
There I have about 30 tabs open - all the time (news sites etc).

I never saw it on MS Windows, I mostly work on linux (SuSE 10.0 - Xorg 6.8.2, kde). Until end of 2005 I had SuSE 9.1 (XFree 4.3.99, kde) and can't remember this behaviour.
So I think it's also depending on the Window-Manager...

I always try to continue working without closing firefox. When closing the tab with the frames the frame-windows are also disappearing. But sometimes it's useless to try opening the same framepage in another tab or ff window - you always get the "flying frame windows". Just now: after waiting (and reporting here) I can open the last problem frame page again without problems, so maybe it's also a time problem...???

Reproducible: no step-by-step.
It happens when it happens (having ff open for some days and using many frame websites).

/Christian



Comment 10 Rob 2006-10-04 04:09:44 PDT
For me (see Bug 354104) it happens on SLES9, XFree86 4.3.99, Window Maker 0.92.0
Comment 11 Adrian Mettler 2006-10-26 03:04:21 PDT
Created attachment 243592 [details]
gtk errors logged to stderr when bug occurs
Comment 12 Adrian Mettler 2006-10-26 03:06:20 PDT
I can confirm occurrence of this bug with Debian, KDE, on i386 and amd64; I usually keep firefox open for weeks.  
At the moment, this and bug 341731 are more or less the only stability issues I'm experiencing.  One can interact with the dislocated content as usual, but closing them or clicking in the pane where they should be rendered causes a crash; reloading the page several times will eventually result in the frames being rendered correctly, but loading another page will often cause the dislocation to occur again.  "tail -n 1000 .xsession-errors | grep Gecko" is attached.
Comment 13 dev.null 2006-12-05 14:37:38 PST
I can confirm this bug. It happens very often on my Debian Computer too. It seems to be connected to the time the xsession (and/or the computer) is running. Furthermore I've noticed this bug the first time after upgrading to a dual-core system using smp.

Why is the status still unconfirmed after several people have confirmed this bug?
Comment 14 Phil Ringnalda (:philor) 2006-12-05 15:17:26 PST
It's unconfirmed partly because it's certainly in the wrong product and component, though the right one isn't clear, and partly because nobody knows what's at fault, and mostly because it doesn't make any difference: if someone chooses to work on it, the difference between this report being UNCO and NEW won't matter to them, while there's a slight chance that someone looking through UNCO bugs will say "oh, I know something that causes that..."
Comment 15 dev.null 2006-12-06 03:01:11 PST
Sorry, but I can't follow that argumentation. I'd think the contrary is the case. If there is a confirmed (and very nasty) bug reported by several people, I would assume somebody feels the obligation to correct this bug before the next Firefox release. I mean somebody has to be responsible for the quality of Firefox. If the bug is unconfirmed, developers might not care about it because to them, it's very possible that the bug is a problem somewhere else and not in their product.

You're right that if someone chooses to work on it, the difference between this report being UNCO or NEW won't matter to them. Though, I think marking this bug as new will improve the chances that somebody chooses to work on it.

As I can see, the bug was opened in 2004. How many Firefox releases have there been since the first report? Why does nobody fix the bugs for new releases? (Or at least stop the new releases until somebody is willing to fix the bug.) How can you release new Firefox versions with open bugs? Or do the developers think there are no bugs because even after several confirmations you leave the bug unconfirmed?

I suggest increasing the severity to at least critical because for users who happen to stumble upon this bug, it's very painful.
Comment 16 Braden 2006-12-17 10:30:31 PST
I'm experiencing this in Epiphany; see <http://bugzilla.gnome.org/show_bug.cgi?id=352408>.
Comment 17 Aaron Brick 2007-01-17 14:03:48 PST
*** Bug 367211 has been marked as a duplicate of this bug. ***
Comment 18 Aaron Brick 2007-01-17 14:07:20 PST
this bug is seriously awful, i also am at a loss why it hasn't been prioritized upwards.
Comment 19 Robert O'Callahan (:roc) (email my personal email if necessary) 2007-01-17 14:23:36 PST
It is awful. It's very hard to debug because it's very hard to reproduce. If you can figure out a reliable way to reproduce it, that would help very very much.
Comment 20 Braden 2007-01-17 14:59:30 PST
If my experience is any indication, Epiphany might be a better context in which to reproduce this than Firefox. This bug bites me *frequently* in Epiphany. While I don't have a magic formula for reliably reproducing it, it tends to happen within an hour or so of use. "Busy" pages with a lot of IFRAMEs seem especially likely to trigger it. A heavily customized Google home page is one such animal; cnn.com seems to be another.

While Epiphany is my preferred browser, I've tried reproducing this in Firefox--and I haven't had any luck so far.

Oh, and when I do close one of these rogue windows causing the browser to "crash", I don't get a stack. I get a program exit with a return code of 1.

And FWIW, I'm on x86_64.

I'd be happy to help someone familiar with the Mozilla code to chase this down; but without a stack, I'm not sure where to start.
Comment 21 Robert O'Callahan (:roc) (email my personal email if necessary) 2007-01-17 15:14:22 PST
My guess is that what happens is a subframe's window gets created as a top level widget by mistake. (The crash occurs later so a stack might not help.) I really have no idea how that could happen. A mess of logging code in nsWindow::NativeCreate might help; maybe run under gdb for a while and use a breakpoint with commands to dump a detailed stack every time a top-level GTK window is created?
Comment 22 Braden 2007-01-17 15:40:13 PST
I'll give it a shot.
Comment 23 Braden 2007-02-02 21:45:05 PST
Created attachment 253838 [details]
Backtrace from the creation of a rogue window in Epiphany

This is a backtrace from the creation of one of these rogue top-level windows when using Epiphany.
Comment 24 Robert O'Callahan (:roc) (email my personal email if necessary) 2007-02-04 19:36:57 PST
In that call to NativeCreate, aNativeParent is non-null. So we should be hitting either
http://lxr.mozilla.org/seamonkey/source/widget/src/gtk2/nsWindow.cpp#2722
or
http://lxr.mozilla.org/seamonkey/source/widget/src/gtk2/nsWindow.cpp#2724
and setting parentGdkWindow or parentGtkContainer to something, which should ensure that this window gets created inside some other window. Can you figure out why it isn't?
Comment 25 Phil Ringnalda (:philor) 2007-02-17 18:43:35 PST
*** Bug 370787 has been marked as a duplicate of this bug. ***
Comment 26 erik red 2007-02-17 18:57:16 PST
Glad to get my bug report 370787 classified as a duplicate of this one. What Mikael Hedberg and the rest of you have described is exactly what is happening to me.

I think the key to reproducing the bug is to open LOTS of windows with LOTS of tabs in them, say 10-20 windows and a 100 tabs. The more the merrier. And try some of the more prone web sites such as www.marketwatch.com or www.huffingtonpost.com.

I have a hunch that web pages with lots of subframes or flash frames also may provoke the bug faster.

This is really a major annoyance, and seeing how many other people have the problem I vote for upgrading it to critical.
Comment 27 Phil Ringnalda (:philor) 2007-02-19 12:26:37 PST
*** Bug 370915 has been marked as a duplicate of this bug. ***
Comment 28 erik red 2007-02-19 13:00:00 PST
Bug 370915 is another excellent description of the same problem.

I have some observations to add: In my experience, it is not strictly required
to run out of memory and start chewing up swap space before the problem occurs.
For example, at the moment of this writing I have 3G ram, 1.4G used, 0G swap
used and I already have two disembodied Untitled windows popping up.

I can also confirm the console error messages syndrome, although I had not made
the connection with the disembodied window problem before.

I can however confirm that the combination of X and my 2-3 firefox processes is chewing up quite a lot of cpu while this is going on.
Comment 29 Robert Bradbury 2007-02-20 06:32:54 PST
Ok, I, as author of Bug 370915, agree that we are all talking about the same bug and am moving discussion from Bug 370915 to Bug 263160.  Wrestling with this is difficult unless you are adept at building Firefox and system libraries completely in debug mode.  It took me months to work out how to do this but I now have such a system (a "debug" version of firefox-bin alone without the shared libraries is 130MB).  To debug this specific problem easily it appears that you also need to be using gtk+ libraries (gdk & gtk) and the glib libraries (glib) compiled for debugging.  Having libstdc++ and glibc compiled for debugging helps as well.  (minus points to the Firefox developers for not releasing a static binary for Linux with all of these included!).

I am currently *still* running the gdb & firefox instance in the state that produces the bug in the hope that we can figure out what to do with it (I'm filing this bug report using a different seamonkey process set).  As stated in the series of bug reports, the bug isn't easy to reproduce -- but once you get Firefox+X into the state where it is consuming a significant fraction of CPU time (40-60% minimum?), and depending on what other processes are consuming CPU time, you can make it happen without too much trouble.

I thought memory usage was the problem initially but I now no longer think that is the real problem.  The problem is that if you leave Firefox running for days, and/or have opened and closed lots of windows the Firefox heap becomes increasingly fragmented and it takes more CPU time to allocate or deallocate anything from the heap.  This becomes problematic if one is near the system physical memory limits (active resident memory ~= total physical memory) because running through the fragmented heap may require paging which will of course make the process slower.

My current working hypothesis is that there is a subtle coordination/timing problem between Firefox, GDK/GLIB & X.

Here is the scenario. My Firefox is currently using 214 MiB of X Server Memory according to the Process Monitor.  I believe that X programs map shared memory and then coordinate when Firefox can write into it and when X can read from it.  Firefox says "create a new tab".  Firefox talks to GDK/GLIB they talk to X and begin this process. (I am relatively well qualified to debug C programs but relatively illiterate about Firefox/GDK/GLIB/X).  Firefox says "create a window", GDK/GLIB<-->X then go off to do this.  Remember there is no window title at this point (that gets assigned later when the page <TITLE> gets loaded.)  Firefox gets handed a presumed-to-be-complete window descriptor.  It then goes off and starts modifying it (presumably trying to make it a <TAB> subwindow of the parent master window).  *But* because the CPU is heavily loaded GDK/GLIB<-->X haven't gotten their act together to really have completed creating the window.  So Firefox is running along trying to modify a bunch of NULL fields in the window description (leading to all the .... "!= NULL" and (null) errors in the console log file resulting from internal checks in the GDK/GTK/GLIB libraries).

The key error seems to be the "GdkWindow ...... unexpectedly destroyed" error which seems to occur at the start of each one of these "untitled window" situations.  What if instead the error should be "GdkWindow ...... I haven't finished creating it yet!"?  (Perhaps due to delays in getting X to allocate and attach the memory for the window???).

It sounds to me like (a) Firefox is missing a window creation method which includes something like:
   window = gdk_create_window(WAIT_UNTIL_WINDOW_REALLY_EXISTS);
or (b) needs some kind of spin wait after requesting the window creation:
   window = create_window();
   wait_until_window_is_completely_valid(window);
   /* now go attach the window as a tab in the parent window */
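
As an illustration only, option (b) amounts to polling with a timeout. The sketch below is in Python rather than GTK/C, and every name in it (wait_until, FakeWindow, is_valid) is hypothetical; it does not model how GDK or X actually report that a window exists, only the control flow of "do not touch the window until a validity check passes".

```python
import time


def wait_until(predicate, timeout=1.0, interval=0.01):
    """Poll predicate() until it returns True or `timeout` seconds elapse.

    Returns True if the predicate became true, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


class FakeWindow:
    """Hypothetical stand-in for a window whose creation finishes later."""

    def __init__(self, ready_after_polls):
        self._polls_left = ready_after_polls

    def is_valid(self):
        # Report "not ready" for the first few checks, then "ready".
        if self._polls_left > 0:
            self._polls_left -= 1
            return False
        return True


window = FakeWindow(ready_after_polls=3)
ready = wait_until(window.is_valid, timeout=1.0, interval=0.001)
if ready:
    print("window is valid; safe to attach it to the parent")
else:
    print("gave up waiting; do not modify the window")
```

The timeout matters: a bare spin wait with no upper bound would turn a slow window creation into a hang.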

It may be that this is a bug in GDK/GTK/GLIB in which case this bug needs to be shifted to those (gnome.org?) developers.  Or it may be the case that Firefox is taking advantage of a library behavior which is "undefined" which happens to work under high CPU load conditions under MS Windows but not under high CPU load conditions under Linux (perhaps because X windows is separate from the O.S. under Linux?)

In any case, it *should* be possible for Firefox to work around this problem by determining what fields must be valid in *window before it goes attempting to modify them (and generating all of the NULL/(null) errors).  This should be a required Firefox fix for Linux systems because there should be no requirement that people upgrade older system GDK/GTK/GLIB libraries to run Firefox without problems.  (You wouldn't require the MS Windows users to upgrade to Vista to run Firefox reliably would you...?)

As stated in my previous messages you *can* work around the problem without restarting Firefox if in the parent window you select the (blank) tab associated with the "untitled window" and close it.  This will close the tab and close the untitled window.  Unless however you close lots of tabs/windows you are unlikely to lower your CPU usage to the point where there is minimal chance of the problem happening again.  If Firefox has been running a long time and Firefox (and/or X?) have fragmented heaps the CPU usage will tend to be higher than if you restart them from scratch.  (This is a long term problem that should be resolved... but that's another bug.)

This entire thread does point out an interesting problem with Firefox stress testing.  Either there isn't any or it isn't very robust.  As I mentioned in Bug 370915, the problem seemed to happen to me either when Firefox+X were consuming lots of CPU time or when I was running other high CPU load jobs, e.g. a package emerge of openoffice, seamonkey, etc. -- something that takes hours of CPU time and involves thousands of processes with relatively short execution times (seconds).  Stress tests which rely on a single CPU bound process running for hours (while testing Firefox performance) do *not* have the same scheduling characteristics under Linux as many short running processes (something I only found out by reading some of the Linux kernel scheduler documentation).

So, perhaps someone could go digging into the Firefox tests and see if there are any many-window/many-tab performance tests and/or whether they are good enough to test under (a) High alternate CPU load and (b) High alternate memory load conditions.  For example, is there a test which simply calls Firefox with the home pages of the largest 500, 1000, 2000 corporations (one could hand them to Firefox as command line arguments)?  Is there a test that opens up 2000 tabs @ Amazon.com?  All of the URLs on the first 20 pages of digg.com?  The last 5000 slashdot.org stories?  (I can easily write a script that opens OMIM or PubMed URLs (there are hundreds of thousands and millions of those respectively) but the National Library of Medicine has bandwidth constraints on page download frequency).
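
A minimal sketch of the "hand them to Firefox as command line arguments" idea, assuming the browser accepts a list of URLs on its command line as suggested above. The function names, the example.com URLs, and the batch size are all made up for illustration; the code only builds the argument lists and does not launch anything.

```python
from itertools import islice


def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def build_commands(urls, tabs_per_launch=50, binary="firefox"):
    """Return one argv list per launch, each passing a batch of URLs."""
    return [[binary, *batch] for batch in chunked(urls, tabs_per_launch)]


# Placeholder URL list; a real test would read the URLs from a file.
urls = [f"http://example.com/page{i}" for i in range(120)]
commands = build_commands(urls, tabs_per_launch=50)

for argv in commands:
    print(argv[0], "with", len(argv) - 1, "URLs")
```

Batching keeps each command line to a manageable length. Each argv list could then be passed to subprocess.Popen while a separate job generates the CPU or memory load described above.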

In the meantime if anyone can tell me precisely what source file in the mozilla source tree does the window/tab creation, I may see if I can use that to track down what is causing the "DestroyNotify" error coming out of GTK.
Comment 30 erik red 2007-02-20 08:28:51 PST
Robert Bradbury,

I think you are an absolute hero for getting this close to identifying the bug. Operations on a window that X has not yet finished creating seems like a good theory. Please keep up the good work!!

While we are on the topic, is it not strange that X and/or Firefox does not seem to have robust heap management and garbage collection? I mean, If I create huge amounts of tabs, X/Firefox will get very large, but they do not seem to shrink much if I delete the window. Maybe I'm wrong, I'm certainly no expert on the innards of X nor Firefox.




Comment 31 Robert Bradbury 2007-02-20 11:26:22 PST
Update.  I strongly suspect the top level problem area is in:
  mozilla/widget/src/gtk2/nsWindow.cpp - nsWindow::NativeCreate(...)
between lines ~2825 to 2897.  The lower level calls include such functions
as:
gtk_window_new(), gtk_window_group_add_window(), gtk_container_add(),
gtk_widget_realize() and gtk_window_set_focus().  The problem is that
...realize() and ...set_focus() functions are called from other than
NativeCreate() so debugging it is tricky.

In the course of trying to debug across a Create-TAB request, gdb ended up
handing me a "Cannot get thread event message: debugger service failed" error. 
Attempts to continue Firefox ended up with a SEGFAULT and core dump (so I've
lost the error state, though I have the sessionstore.js file for it).

There is some interaction between the Create-TAB request and creating new
threads so the pthread() library gets dragged into this discussion (along with
GDK/GTK/GLIB).  I think it would help if people also made clear:
1) What processor you are using.
2) What thread library you are using.

I'm running a Pentium IV (Prescott) which has only 1 core but does support
hyperthreading.  I'm using the most recent release of the Linux Posix pthread
library (glibc 2.5 I think) and it looks like GLIB is supposed to be using
pthread_mutex_lock() and pthread_mutex_unlock() to get and release locks.

It appears that it may be impossible using gdb (at least on my system) to debug
pthread_mutex functions (setting breakpoints at them results in the "... thread
event..." message mentioned previously).

It may be necessary to compile GLIB with G_DEBUG_LOCKS (glib/gthreads.h) and
set the proper debug flags and/or compile mozilla with MOZ_LOGGING at least for
the widget/src/gtk2 functions (see #define LOG() in
widget/src/gtk2/nsCommonWidget.h).  Of course adding logging to either the
widget functions or GLIB may disrupt the timing sufficiently to make the
problem go away.  One thing that appears key is the need to find out where the
DestroyNotify is coming from (see
gtk+/gdk/x11/gdkevents-x11.c:gdk_event_translate() -- case DestroyNotify:).
If your Gtk/Gdk library is compiled with debugging, running Firefox with
"export GDK_DEBUG=events" may help provide destroy notify messages on the
console log, but what one really wants is a way to do "_gdk_debug_flags |=
GDK_DEBUG_EVENTS;" (see GDK_NOTE macro in gdk/gdkinternals.h) after you have
loaded up all of the windows & tabs that lead up to the problem state.

I hope the above helps to put our hands around the problem.
Comment 32 Robert Bradbury 2007-02-20 11:28:55 PST
(In reply to comment #31)

Comment 33 Robert Bradbury 2007-02-20 12:34:28 PST
Sorry about comment #32. I was trying to correct a spelling error in #31 but there doesn't appear to be an easy way to do that.

Erik, regarding Firefox memory usage, I can kind of explain that problem.  I believe that Firefox does have garbage collection for Java and perhaps Javascript.  However everything else, the image management, the TCP/IP management, the window and tab management, etc. seem to all be written in C++ and C.  So those will normally go through the C++: new()/delete() functions, or C: malloc()/free() functions.  These all end up on normal Linux systems using the "standard" GNU glibc memory management functions which in turn rely upon the standard Linux(UNIX) sbrk() and brk() system calls.

The Glibc memory management function system *is* robust (I think it is 90+ pages of code).  The problem is that it was not designed to handle situations of "run-for-days" allocating and deallocating many small memory fragments.  Once you allocate such memory (in C++ or C) its location has to remain fixed in the virtual memory address space.  Over time that means that the heap memory becomes increasingly fragmented (lots of little small holes) and total memory usage tends to creep up.  This isn't the same as a "memory leak" where you are losing track of the memory.  The glibc memory management system *knows* where all the small fragments are -- the problem is it can't relocate the in use fragments (defragment the heap) to turn all of the unused small fragments into a single large free fragment (preferably at the end of the virtual address space) which could then be returned to Linux (and shrink your VM memory requirements).  In contrast, I believe Java, and perhaps Javascript, are sufficiently object oriented that you can relocate objects and perform garbage collection thus preventing the problem of excessive memory fragmentation in their heaps.

In practice, if you watch what Firefox is doing on the System Monitor you may sometimes see VM shrink if you open a window, open lots of tabs in it and then close that window, particularly if you have opened all of those tabs sometime previously.  But if they are "new" URLs, then those records may get added to your history list (which may be at the end of Virtual Memory).  In this case you can delete the window and the memory will be returned to the glibc pool, but because the history records are locking up the end of the glibc pool, glibc will not return the memory to Linux.  In practice you only see VM shrink at the very end of a normal "Quit" request when Firefox has closed all of the windows, closed the bookmarks file, closed the history file, closed all TCP/IP handles, i.e. freed up *all* of the memory in the glibc memory pool.  Only when all of the memory in the heap is completely free will the glibc memory allocator reunite it all as one big hunk of free memory and return it all to Linux (in practice this is done by issuing a brk() system call to lower the program break, i.e. the last virtual address of the process heap).
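
A minimal sketch of the "pinned heap top" effect described above, assuming Linux with a brk-based malloc such as glibc's. The names measure_pinned_heap, BreakReadings and current_break are made up for illustration; sbrk(0) is used only to read the current program break. What the readings look like depends on the allocator and its thresholds, so this is an experiment to try, not a statement of what any given system will do.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <unistd.h>  // sbrk(): assumes Linux with a brk-based malloc (e.g. glibc)

// Program break readings, in bytes relative to the break at function entry.
struct BreakReadings {
    long after_alloc;    // all blocks allocated
    long after_partial;  // every block freed except the last one
    long after_full;     // last block freed as well
};

// Hypothetical helper: the current program break as an integer.
static std::intptr_t current_break() {
    return reinterpret_cast<std::intptr_t>(sbrk(0));
}

BreakReadings measure_pinned_heap(std::size_t count, std::size_t block_size) {
    BreakReadings r = {0, 0, 0};
    if (count == 0 || block_size == 0) return r;

    const std::intptr_t base = current_break();

    // The pointer table is allocated first, so it normally ends up below the blocks.
    void** blocks = static_cast<void**>(std::calloc(count, sizeof(void*)));
    if (!blocks) return r;

    for (std::size_t i = 0; i < count; ++i) {
        blocks[i] = std::malloc(block_size);
        // Touch the block so the allocation cannot be optimized away.
        if (blocks[i]) static_cast<volatile char*>(blocks[i])[0] = 1;
    }
    r.after_alloc = static_cast<long>(current_break() - base);

    // Free everything except the most recently allocated block
    // (typically the one nearest the top of the heap).
    for (std::size_t i = 0; i + 1 < count; ++i) std::free(blocks[i]);
    r.after_partial = static_cast<long>(current_break() - base);

    // Free the last block as well, so nothing is left in use near the top.
    std::free(blocks[count - 1]);
    r.after_full = static_cast<long>(current_break() - base);

    std::free(blocks);
    return r;
}
```

Calling measure_pinned_heap(2000, 4096) and printing the three fields would, on a typical glibc system, be expected to show after_partial staying close to after_alloc (the one live block keeps the break where it is) and after_full dropping once that block is freed. Other allocators, or different block sizes, may behave differently.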

The "right" way to make this problem better is to put (a) the history records; (b) the Bookmarks file; and (c) image files into separately managed heaps away from the glibc functions (so glibc is primarily handling smaller short term data storage requirements associated with the creation of tabs, windows, data being loaded into them, etc.).  I've seen messages indicating that at least some of the Firefox core developers seem to be aware of this (they may need to use the mmap() function under Linux to get separate heaps) but they generally consider that this is either hard (cough) or not worth doing (because it's only a problem under Linux???) and doesn't make any difference when users should be upgrading their machines to support more memory anyway... :-( If you google my name + Firefox + memory + heap you can probably find multiple rants by me on this topic on /. and other BBS systems.

It might be possible to adjust the glibc memory manager to better handle the behavior of Firefox (the code does have ways its behavior can be tuned) but I've never seen any messages about anyone attempting to do this.  If someone wants to play with it... you want to look at .../glibc/glibc-2.#/malloc/ (malloc.c, mtrace.c & arena.c).  You may be able to determine how your system is compiled with something along the lines of "strings /lib/libc.so.6 | egrep 'MALLOC_|MMAP_'".  Googling might turn up a tutorial or two on how to change the environment variables.
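
For anyone who wants to experiment with that tuning, here is a minimal sketch using glibc's mallopt() from <malloc.h>. It assumes glibc; the wrapper name tune_glibc_malloc and the threshold values are illustrative only, and whether any setting actually helps Firefox is untested. M_TRIM_THRESHOLD controls how much free space must sit at the top of the heap before free() gives it back via the program break, and M_MMAP_THRESHOLD sets the request size from which malloc() may use mmap() instead of the brk heap, so that such blocks can be unmapped independently when freed.

```cpp
#include <cassert>
#include <malloc.h>  // mallopt, M_TRIM_THRESHOLD, M_MMAP_THRESHOLD (glibc-specific)

// Hypothetical wrapper: apply both thresholds, report whether glibc accepted them.
// mallopt() returns 1 on success and 0 on error.
bool tune_glibc_malloc(int trim_threshold_bytes, int mmap_threshold_bytes) {
    bool ok = mallopt(M_TRIM_THRESHOLD, trim_threshold_bytes) == 1;
    ok = (mallopt(M_MMAP_THRESHOLD, mmap_threshold_bytes) == 1) && ok;
    return ok;
}
```

This would have to be called early, before the bulk of the allocations, for example as tune_glibc_malloc(128 * 1024, 64 * 1024). The same two parameters can be set without recompiling through the environment variables MALLOC_TRIM_THRESHOLD_ and MALLOC_MMAP_THRESHOLD_ (note the trailing underscore), which glibc reads at startup.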

(All opinions are obviously my own...)

(Also, if we are going to start a discussion on Firefox memory allocation I'm sure there are other bugs better qualified for discussion of *that* problem.)
Comment 34 Braden 2007-02-20 12:59:13 PST
FWIW... I'm observing this on a machine with 6 GB of physical memory.

So I am very skeptical that throwing more physical memory at this problem will alleviate it in the least. On the contrary, I'd be more optimistic about a theory that suggested large amounts of physical memory could aggravate the problem. But I don't have one.

That is not to say that fragmentation of the process address space could not be related to this. It does make a certain amount of sense--just considering the fact that this bug seems only to surface after the process has been running for a while. But, please, let's try to avoid clogging this report with *too* much speculation.
Comment 35 Robert O'Callahan (:roc) (email my personal email if necessary) 2007-02-20 15:37:52 PST
Robert: thanks for all the info!

My current hypothesis is that window creation is failing somehow and GTK isn't picking it up, or we're not checking GTK results correctly, and then we create another window with the failed window as its parent and this new window ends up as a rogue toplevel window.

So what I'd like you (or someone else) to try is setting a breakpoint at nsWindow::NativeCreate after you've received the first "unexpectedly destroyed" message, perhaps conditional on aNativeParent being non-null and aParent being null. When that gets hit, step through and see what happens. In particular see if the parentGdkWindow or the parentGtkContainer is associated with one of the windows that was unexpectedly destroyed.

There are a lot of reports of these "unexpectedly destroyed" messages happening to various apps over the years, but nothing much in the way of information about what causes them or how to resolve them...
Comment 36 erik red 2007-02-20 17:29:20 PST
Robert Bradbury, thanks for the lesson on firefox heap management. Great stuff.
I wish I had a better way of finding this kind of information from a web search.
Too many false hits on anything that relates to firefox.

Braden, Robert O'Callahan, Mettler, Phil Ringnalda and everyone else:

I should have included you all in the initial kudos. Not that it matters much, I'm a total nobody around here anyway :-). But I'm sure you agree that Bradbury did a great job with his gdb magic.

I'm rooting for you all; unfortunately I'm not capable of contributing much other than anecdotal evidence about the bug.

Comment 37 Braden 2007-02-21 01:50:12 PST
(In reply to comment #24)
> In that call to NativeCreate, aNativeParent is non-null. So we should be
> hitting either
> http://lxr.mozilla.org/seamonkey/source/widget/src/gtk2/nsWindow.cpp#2722
> or
> http://lxr.mozilla.org/seamonkey/source/widget/src/gtk2/nsWindow.cpp#2724
> and setting parentGdkWindow or parentGtkContainer to something, which should
> ensure that this window gets created inside some other window.

We hit

2270        else if (aNativeParent && GDK_IS_WINDOW(aNativeParent))

and do

2271            parentGdkWindow = GDK_WINDOW(aNativeParent);

> Can you figure out why it isn't?

Umm... It seems that mWindowType == eWindowType_popup. That doesn't seem right; any idea why it might be happening?
Comment 38 Robert Bradbury 2007-02-21 07:41:02 PST
Morning update.  After the gdb debacle yesterday (gdb needs some work... :-(), I made the following changes.
1) Recompile gtk+ with --enable-debug=all (because GDK_DEBUG wasn't working);
2) Set up firefox-3.0a2 (with all the debug code) to run with:
     export NSPR_LOG_MODULES=Widget:4
     export NSPR_LOG_FILE=nsprlog.txt (= the Netscape error log)
     export GDK_DEBUG=events
     export GOBJECT_DEBUG=objects
       (GOBJECT_DEBUG may not be important for this problem but does seem
        to reveal some interesting problems with GLIB "leftovers" when
        firefox exits).
     firefox-bin 2>firefox.err (= the GDK error log)
3) Run firefox restoring the previous session (which had demonstrated the problem).

Now, the session that first showed the problem had 73 windows & 424 tabs.  When gdb went belly up I was up to 76 windows and 438 tabs (demonstrating the tab-start=>"untitled window" problem with some frequency but not every time).

I'm now up to 100 windows and 586 tabs (mainly using random URLs from the first 25 pages of Digg) with firefox-bin consuming 1.1 GiB of VM and 1.2 GiB Mem (is this including page tables???).  No problem.  I can't push this much further because I normally run firefox with a 1.4 GiB virtual memory limit (ulimit -Sv 1400000) -- due to Firefox's poor handling of memory allocation failures it will likely core dump if I push it to 1.4.  (If one allows Firefox VM >> system PhysMem (1.5 GiB for me) ==> watch the system turn into a dog -- but this is really a Linux paging problem somewhat aggravated by the Firefox heap management problems so not for this discussion).

Firefox has been running for ~12 hours.  I ran it for a while with Java and Javascript disabled (because the logging seems to slow down new tab/window creation), but Javascript is now enabled without making much difference.  One difference may be that the AdBlock addon may not have been active in the previous instance when the problem did occur.  Noscript is active and is blocking Javascript on most sites (gmail exempted).  Gmail works fine (and it tends to be a moderately reliable "helper" (?) in my case to trigger the problem state).

CPU-wise firefox-bin+X are consuming 40-60% of the CPU time.

The debug log files (for NSPR & GDK) are rather large (10's of MB).  I am concerned that outputting the debug messages has changed the timing of Firefox+GDK/GTK/GLIB+X just enough to make the problem "disappear".

I'm not sure I understand yet the discussion of the nsWindow code, but would argue that until we know *precisely* where the DestroyNotify is coming from and why it is happening it may be difficult to know whether changing the upper level code has fixed or simply masked the real problem.  But given the number (5362) of "Gdk-Message: destroy notify" events I'm seeing in the GDK log file, it isn't going to be as simple as setting a breakpoint in gdkevents-x11.c::gdk_event_translate():case DestroyNotify.  What one needs is a stack trace for the DestroyNotify that is at the start of the "untitled window" error sequence (which is presumably different from all of the other DestroyNotify stacks).  Given the problem of logging perhaps changing subtle timing requirements for the problem to occur it isn't clear that one could simply stick some stack tracing code into the DestroyNotify code.

Even if one fixes the nsWindow code so it works, that doesn't explain why it works almost all of the time now but fails in these random cases (when the CPU load is too heavy?  when Javascript has fragmented the heap?  when some race condition occurs in GDK/GLIB?)

One question might be whether there is an explicit way when Firefox is running to stop asynchronous activity (e.g. garbage collectors, which I presume are in separate threads)?  (In the fragmented heap situation, I presume that these will contribute to excessive CPU consumption by firefox, so it would be useful to (a) create the problem state; (b) turn off async CPU consuming threads; (c) see if the probability of the problem taking place diminishes).

It would also be nice if there were some in-place hooks so that one could put the stack trace code into the DestroyNotify code, restart firefox with an error-prone session and then hit the process with a SIGUSR1 to activate the DestroyNotify stack tracing after one has encountered the problem the first time.  At least then one would be dealing with perhaps dozens of traces rather than thousands.
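
A rough sketch of what such a signal-armed hook could look like, assuming Linux/glibc (backtrace() and backtrace_symbols_fd() come from <execinfo.h>). The names g_trace_armed, install_trace_toggle and maybe_dump_stack are invented for this example and are not part of GDK or Mozilla; the idea is only that the instrumented DestroyNotify path would call maybe_dump_stack(), which does nothing until the process has received SIGUSR1.

```cpp
#include <cassert>
#include <execinfo.h>  // backtrace, backtrace_symbols_fd (glibc-specific)
#include <signal.h>    // sigaction, SIGUSR1, sig_atomic_t, raise
#include <unistd.h>    // STDERR_FILENO

// Set from the signal handler, read from the instrumented code path.
static volatile sig_atomic_t g_trace_armed = 0;

// The handler only sets a flag; no tracing work happens in signal context.
extern "C" void on_sigusr1(int) { g_trace_armed = 1; }

// Install the SIGUSR1 handler. Returns true on success.
bool install_trace_toggle() {
    struct sigaction sa = {};
    sa.sa_handler = on_sigusr1;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGUSR1, &sa, nullptr) == 0;
}

// Meant to be called from the instrumented event path (hypothetical placement).
// Returns the number of frames written to stderr, or 0 while tracing is not armed.
int maybe_dump_stack() {
    if (!g_trace_armed) return 0;
    void* frames[64];
    int n = backtrace(frames, 64);
    // Writes one line per frame straight to the file descriptor.
    backtrace_symbols_fd(frames, n, STDERR_FILENO);
    return n;
}
```

With this in place one would reproduce the first orphan window, run "kill -USR1 <pid>" against the browser, and from then on every pass through the instrumented path would leave a trace on stderr. The traces are raw addresses unless the binary exports symbols (e.g. linked with -rdynamic), and the extra output may of course still disturb the timing.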
Comment 39 Robert Bradbury 2007-02-21 08:08:15 PST
Created attachment 255907 [details]
Example of Window creation & destruction with NSPR_LOG_MODULES=Widget:4

Example of "normal" window creation & destruction trace (for comparison purposes).
Comment 40 Robert Bradbury 2007-02-21 08:18:04 PST
Created attachment 255908 [details]
Example of unusual window destruction case.

This is an example of a window destruction trace which occurs when firefox is "inactive", i.e. no firefox windows or tabs are being manipulated by the keyboard or mouse.  If firefox is free to create & destroy windows "behind the scenes" then debugging Bug #263160 is going to be a problem.
Comment 41 erik red 2007-02-21 08:49:56 PST
Robert Bradbury,

I have a suggestion that might help on getting the problem to show up more
quickly and with fewer windows/tabs:

After starting up the gdb/firefox combo, start up a 2nd firefox (I use firefox
--class=mysecondprofile) and have it load another session with plenty of
tabs/windows. Start a 3rd, 4th, ... fox as well if you like.

When I do this, each of the foxes need not have so many windows and tabs before
the Untitled windows start popping up.

This method rhymes with your working assumption about the bug being related to
X interactions: having multiple foxes creates more competition for getting the
attention of X, and competition for X resources will cause unpredictable delays
in creating windows etc.

Also, this way, the size of the gdb/fox combo may also be more manageable.
Comment 42 erik red 2007-02-21 08:58:17 PST
Robert Bradbury,

In response to comment #40, I also get Untitled windows popping up when FF is idle. No need to be doing any keyboard or mouse operations. Presumably this happens because one of the "busy" web pages (www.marketwatch.com) is updating/changing/re-creating some advertisement frame.

So it is probably some javascript or java doing the deed (I have javascript enabled everywhere, with Adblock and Flashblock add-ons, and popups blocked).
Comment 43 Robert O'Callahan (:roc) (email my personal email if necessary) 2007-02-21 11:01:08 PST
(In reply to comment #37)
> 2270        else if (aNativeParent && GDK_IS_WINDOW(aNativeParent))
> 
> and do
> 
> 2271            parentGdkWindow = GDK_WINDOW(aNativeParent);

Which is what? null? or something that was destroyed?

> > Can you figure out why it isn't?
> 
> Umm... It seems that mWindowType == eWindowType_popup. That doesn't seem right;
> any idea why might it be happening?

No. Are you sure? Look up the call stack to see where it got set...

Comment 44 Robert O'Callahan (:roc) (email my personal email if necessary) 2007-02-22 12:04:46 PST
*** Bug 244482 has been marked as a duplicate of this bug. ***
Comment 45 jnoyes-sf 2007-02-22 12:35:42 PST
So how does a bug opened 5 months earlier get resolved as the duplicate?

What a convenient way to make this bug look 5 months younger, drop its vote counts, and reset all its flags.

Sheesh.  This is never gonna get fixed.
Comment 46 Robert O'Callahan (:roc) (email my personal email if necessary) 2007-02-22 13:38:30 PST
Because this bug has all the analysis and people actually helping to work on it, that's why.
Comment 47 Robert Bradbury 2007-02-22 17:24:41 PST
Jnoyes, not so fast.  Robert O'C. is right -- to solve this bug we are going to have to concentrate experiences and knowledge in one place.  But not solving it is not in the cards now that I have a handle on it (i.e. I've got at least one complete core dump of the process state when the problem took place as well as at least some understanding of the complexity of the problem.)

Let me recount experiences over the last day.  I backed off on running Firefox with the debug flags enabled (generating the large log files).  I've now recreated most of the original situation (i.e. all the URLs) + some more.  Am up to 107 windows and 658 tabs in Firefox and it has been running most of the day without any problems.  It's consuming 1.1 GiB VirMem, 1.2GiB Mem, 511 MiB X server Mem (on a 1.5 GiB PhysMem w/ a 1.4 VirMem ulimit set).

I tried to lean on the process switching Firefox+X CPU use theory.  Nada.  I ran glxgears (on a non-DRI X instantiation).  That drove X use up to the point where Firefox was very slow (30+ seconds to minimize a Firefox window).  As Firefox was pretty unusable in that situation I backed off to a situation of running a continual loop of MPlayer video+sound side-by-side with Firefox.  That bumps X usage a bit but does not present the problem with Firefox working after a fashion.  Overall the system is currently @ 100% CPU use, varying something like 30-60% firefox-bin, 10-30% X, and most of the rest gnome-system-monitor.

Now here is the interesting part.  While I was trying to resume the complex Firefox session last night, my seamonkey session went belly-up (with the *same* problem) -- i.e. starts out with "(Gecko:Process-#): Gdk-WARNING **: GdkWindow 0x######## unexpectedly destroyed" followed by many errors regarding GDK/GLib trying to manipulate null objects or assertion failures.

In all cases that I've seen, this problem starts out with the "unexpectedly destroyed" WARNING.  So for people who want to work on this you have to start your Firefox/Epiphany/Seamonkey session from a terminal in which you can log the GDK/GLib WARNINGs/errors.  (And Gtk+(GDK+GTK) & Glib need to be compiled on your system in --enable-debug=yes mode to allow such messages to appear).

Now Firefox 3.0a2 and Seamonkey 1.0.7 are relatively distinct on my system.  The Firefox instantiation is running with the debug libs (separately downloaded and compiled for debugging).  Seamonkey was running with the standard system libraries (though I'm running it with the debug libraries now).  Firefox was compiled in full debug mode, Seamonkey was not.

Now, common aspects.  My normal Firefox runs (including the current run) are running with NoScript enabled (currently not showing the problem when one might expect that it should).  The Firefox 3.0a2 run when the "unexpectedly destroyed" error took place may not have had NoScript in effect.  Seamonkey also does not normally have a NoScript activity in effect.  So this raises the open question of whether Javascripts are running amok in such a way as to corrupt GDK/GLIB memory?  Or perhaps whether a Javascript garbage collection takes place from time to time when CPU resources should be shifting directly between Firefox and X.

To resolve this we are going to need to answer the question of whether the problem ever occurs when Javascript is entirely disabled?

I have yet to see the problem occur when Javascript is completely disabled.  Indeed the probability of the problem taking place seems to correlate with the number of sites allowed to use Javascript.

So it seems to me we are back to
(a) how do we define a case which reliably demonstrates the problem?
(b) does the problem vary with the Javascript usage of "active" URLs?

It is worth noting that the completely asynchronous appearance of the problem is most likely due to sites doing auto-refreshes of some kind.  This problem can occur when the user is doing nothing and the sites are doing something behind the scenes.

Also important to diagnosis...
1) Are you running a hyperthread enabled or multi-core CPU?
2) Are your libraries compiled to provide the GTK debug warnings?
3) What are you running for Linux pthreads? (older glibc or other or newer Posix?)

Erik, I will consider the --class alternative.  I would need to write a perl script to split the sessionstore.js file into a hundred individual firefox invocations (this would take a few days or more given my other priorities).  I agree that the idea has merit but I would like to see how my current state (100+ windows in Firefox + a dozen or so in Seamonkey) plays out.

In the mean time I would encourage people interested in this problem to move in the direction of having their gtk+ and glib system libraries compiled with debug symbols available.
Comment 48 Robert Bradbury 2007-02-23 05:31:17 PST
Created attachment 256156 [details]
gdb trace of SegFault while attempting to trigger this bug

Well, after a day Firefox finally core dumped.  But it does not appear to be related to the destroy window activity.  Firefox was running at around 1.1-1.2 GiB (with a 1.4GiB ulimit set) so I don't think it was a problem of hitting the memory limit either.  I believe the activity was returning from a message in gmail back to the Inbox (so it was attempting to redraw the list of messages in the Inbox).  As firefox is so slow after one has 100 windows open I was working in other windows and am not precisely sure what it was doing.  If anyone wants to dig into a 1.16 GiB core file however I'd be happy to hand it over.
Comment 49 erik red 2007-02-23 09:28:14 PST
Robert Bradbury,

I should correct myself. What I was doing was

firefox --class=Firefox1 -P default
firefox --class=Firefox2 -P profile2
firefox --class=Firefox3 -P profile3
...

and so on. But you probably realized what I meant already. I encourage you to give it a try. Even if you load the same session in all of the instantiations, I think it will do the trick of tripping up the bug.
Comment 50 Robert Bradbury 2007-02-24 13:09:39 PST
Created attachment 256310 [details]
gdb trace of seamonkey around the time of problem

The attached trace is a debug of seamonkey in the "problem state".  I threw two "untitled windows" right after each other in seamonkey when accessing the NY Times.

Notes:
1) I run Firefox with Noscript active; Seamonkey (as currently installed) does not limit Javascript.
2) Seamonkey is not compiled with -g2, so code information may be limited.  It was however running on -g2 compiled gtk+ libs.
3) It is most likely not a memory consumption related problem.  Firefox had aborted several days ago and Seamonkey was only consuming ~300MB on a 1.5 GiB main memory machine.  (In contrast Firefox was consuming 1.1-1.2 GiB and not throwing the problem.)
4) I note from the stack trace that one of the threads (thread 4) appears to be doing a DNS lookup on "graphics8.nytimes.com" which may be being driven out of a javascript enabled thread.

This may be important.  I normally run Firefox with NoScript enabled except primarily for gmail.com and NCBI PubMed.  Seamonkey runs with Javascript completely operational (and thus the NY Times advertisements can run away with the browser).  This tends to be consistent with my experience in Firefox.  E.g. I am much more likely to spring the problem when gmail (with javascript) is running than when it is not.

Now, while bearing in mind that the stack trace is *after* two subsequent "untitled window" errors had taken place, it is interesting that when GDB attached to the process it was still doing a DNS lookup.  The problem in my mind is not explicitly related to Javascript running or the async DNS lookups but in the interference in GLIB/GDK/GTK they may introduce.

Further information regarding the parent "tab".  You can take the tab which does not normally display anything once the "untitled window" has appeared and run it <BACK>.  In that case it will properly display the previous window.  But it does not destroy or manipulate the "orphan" untitled window.  Only closing the tab for the orphan window will eliminate it.  So there is still some link between the orphan window and the tabs.
Comment 51 Robert Bradbury 2007-02-25 08:12:08 PST
Created attachment 256359 [details]
Firefox segfaulting in gdk_window_get_root_origin?

I'm not an expert at interpreting these stack traces yet, but it looks like this segfault is in gdk/x11/gdkwindow-x11.c:gdk_window_get_root_origin() which is called from nsWindow.cpp:WidgetToScreen.  My suspicion is that GTK_WIDGET(mContainer)->window is hosed but I don't understand the structures enough to know for sure.  If anyone wants to advise on anything that needs to be looked at in the core dump let me know.

What I was doing at the time was a Google search and had just done a search along the lines of "firefox fetch CVS site:mozilla.org".  It segfaulted before it could display any of the results.

It was not a memory problem (the core file is only 282 MB).  Nor was the system particularly busy.  The libraries were gtk+-2.10.9 and glib-2.12.9.

Just as an FYI, for those of you who build firefox from scratch and who want to debug these types of problems, I would highly urge you to compile toolkit/xre/nsSigHandlers.cpp (aka xpfe/bootstrap/nsSigHandlers.cpp) with the -DDEBUG so you can get stack traces from within Firefox rather than having to mess around with gdb.  You don't want to compile the entire source with DEBUG because there are 1200+ instances where it adds additional code which one may not want in general.

It also looks as if we can arrange glib so if we can tell it *when* there is a problem (using a signal or the debugger to set a flag) we could have it produce stack traces in gdk_event_translate() on every DestroyNotify by calling the Netscape function DumpStackToFile(stderr).  (Of course libgtk won't be usable by anything other than Mozilla/Firefox at that point but we are debugging a *real* problem here.)
Comment 52 erik red 2007-02-28 09:37:49 PST
Robert B,

Would you consider making your debug environment available by ftp? If I had your setup, including a dot.gdb file or equivalent gdb setup commands, I could try some runs and see if any usable data came out of it. Or is it fair to say that you have enough data at this point? I'm on Fedora Core 5.

Comment 53 Robert Bradbury 2007-03-01 09:15:04 PST
Erik, I'm working towards precisely what you ask.  I am *almost* at the point where I have GDK turn on stack tracing after the first orphan window condition occurs.  This is a *REAL* pain as it involves a callback from the GDK library which is written in C to the Mozilla stack trace code which is in C++ (I am learning things I really didn't want to have to learn about allowing C to call C++).
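
For readers unfamiliar with the C-to-C++ callback problem mentioned above, a minimal sketch of the usual approach follows: the C side only ever sees a plain function pointer, and a small extern "C" bridge function, compiled as C++, forwards to the C++ code and keeps exceptions from escaping into C frames. All names here (tracing::dump_stack, set_trace_hook, fire_trace_hook, trace_hook_bridge) are hypothetical stand-ins, not the actual GDK or Mozilla functions, and the registry functions that would live in the C library are written in the same file only to keep the example self-contained.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Stand-in for C++-only stack dumping code (hypothetical).
namespace tracing {
int dump_stack(const std::string& tag) {
    std::fprintf(stderr, "trace requested for: %s\n", tag.c_str());
    return static_cast<int>(tag.size());
}
}  // namespace tracing

extern "C" {

// Function pointer type the C library would store and call.
typedef int (*trace_hook_fn)(const char* tag);

// In real life these two would be compiled as C inside the library.
static trace_hook_fn g_hook = nullptr;
void set_trace_hook(trace_hook_fn fn) { g_hook = fn; }
int fire_trace_hook(const char* tag) { return g_hook ? g_hook(tag) : -1; }

// Bridge with C linkage: callable from C, implemented in C++.
// Exceptions are caught here because they must not unwind through C code.
int trace_hook_bridge(const char* tag) noexcept {
    try {
        return tracing::dump_stack(tag ? tag : "");
    } catch (...) {
        return -1;
    }
}

}  // extern "C"
```

The application would register the bridge once at startup with set_trace_hook(trace_hook_bridge); the library then calls fire_trace_hook("DestroyNotify") at the point of interest without needing to know anything about C++.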

But getting a workable static debug variant of Firefox 3.0a2 is proving difficult (there are problems in the Mozilla cairo/pango code -- even the latest CVS sources refuse to compile).

I am making available the relatively static debugable Firefox 2.0 I managed to assemble at one point.

URL: http://www.aeiveos.com:8080/~bradbury/Firefox

Note the firefox-2.0d link.

This is a complete 2.0 install directory which does work for me (Firefox has some real problems if you don't give it a complete environment on startup -- but that is another bug).

At any rate you should note the firefox-bin file (which is the key component) at 139 MB (debugging doesn't come cheap...).  If you do a ldd analysis on that firefox-bin you will see that it uses a limited set of system libraries.  That is because almost all of the other libraries have been compiled into it having been compiled with debugging enabled.  So if you choose to use it to diagnose the problem (since we seem to have it narrowed down to a mozilla / gtk / pthread / glib arena) it may be useful.

Also please note, the downloads are going to be fed via a standard U.S. DSL line so one's use should be balanced.
Comment 55 erik red 2007-03-01 14:45:13 PST
Robert B,

The file date was surprisingly old,
is this the right one?

firefox-bin             24-Dec-2006 12:26  139M  
Comment 56 erik red 2007-03-01 14:51:46 PST
Robert B,

I saw you said "managed to assemble at some point", so that probably means that
2006-12-24 is indeed the version you meant.
Comment 57 Robert Bradbury 2007-03-02 07:58:19 PST
Yes, that version is the Firefox 2.0 version.  As such it is old.  I have not yet been able to get the 3.0 version to assemble in the same way.  But given the history of this problem I think it is present in most if not all versions of Linux based Firefox.  You can attach firefox-bin in gdb and set a breakpoint at "g_log_default_handler" and that is the initial entry point for all subsequent errors.  Under "normal" operations, Firefox should never be posting GDK/GLIB errors.  The problem (IMO) is that the g_log_default_handler is an upper level reporting function.  What we really need to get our hands on is the DestroyNotify condition which is occurring at a much lower level.  I believe that this is being driven either by (a) misuse of the gdk code at a much higher level (as some discussion seems to indicate); or (b) a much lower level problem in the gdk and threads usage.

But it seems apparent at this point that it is a Linux/GTK-specific problem so we are not going to receive assistance from Firefox developers focused on other operating systems.

Comment 58 erik red 2007-03-02 11:28:22 PST
RB,

I tried the binary last night. Here's what I did:

1. make a copy of my firefox-2.0.0.1 to firefox-2.0d
2. drop your firefox-bin into firefox-2.0d
3. env LD_LIBRARY_PATH=/usr/local/firefox-2.0d MOZ_NO_REMOTE=1 /usr/local/firefox-2.0d/firefox --class=Firefox1
4. fix some .so filenames (link .so files to .so.6 or .so.11, for example)
5. try again, program starts
6. gdb attach

At this point I was not quite sure what to do in gdb, I tried "cont" and later "run". I didn't get any windows, though, and there was hardly any cpu usage.

Is my recipe sound?
Comment 59 Robert Bradbury 2007-03-03 13:21:01 PST
Erik, It sounds as if you have the right approach.  This entire process is a real *pain* due to the need of getting the .so's right.  That is why I attempted a 2.0 compile with as many static debug libs as possible.  It appears at this time that given general Linux system configurations and the Firefox dependencies on system libraries that it is impossible to actually compile a fully "static" Firefox under Linux (see bug 372269 for example).

Now, with respect to your debugging scenario.  If you have the firefox-bin installed correctly (and all of the libraries upon which it may depend -- dicey question I'll admit), the firefox-bin should start up and run (if it doesn't then there is some kind of library compatibility problem).

So the simple question is: can you get firefox-bin up and running on your machine?  If it runs, then at least we have the possible library incompatibility problems under control.

Now, firefox-bin is compiled with debugging symbols for Firefox as well as being a static link to the gtk+, glib, stdc++ and glibc with debug symbol libraries.  So in theory you should be able to grab almost anything of significance in the binary.  Firefox still seems to load "dynamic" libraries, (e.g. thai fonts from either cairo or pango) so it is not fully "stand-alone".  The "debugability" of those libraries depends upon how they are compiled on your system (but since that isn't the focus of this bug it may be a no-op).

Once you have firefox-bin up and running normally, you can attach to it from gdb.  Simply give gdb the path to the binary and the process number.  You then want to set a breakpoint at g_log_default_handler.  This should never be reached in normal operation of the browser.  It is triggered when the first "window unexpectedly destroyed" error takes place.  The problem is that that error is being reported at a level higher than that which is responsible for the problem (blame the GTK+ library architecture).  So once Mozilla/Firefox is in its "handicapped" state, it is necessary to get down into the guts by setting a breakpoint around: gdk/x11/gdkevents-x11.c: 1703 which is where the DestroyNotify call is most likely being generated.  Then you have to lean on Mozilla/Firefox in the handicapped state and get stack traces from it in the various DestroyNotify conditions to understand precisely when and why it is destroying windows.

I would fully expect, given my experience that M/F in this state "works" some of the time, that one is going to be dealing with kind of a 50:50 probability of the stack traces being useful.  The "gold ring" in this case is having someone say this is a stack trace for a DestroyNotify Event which in turn generated Firefox attempting to mess around with a window which no longer existed.

The fundamental questions may be why was the window "destroyed" and why did M/F not recognize it as such (and one would hope compensate for it).

As a warning, I do not know if you will be able to work with this in your proposed multi-profile scenario.  I suspect that one is going to have a problem of attaching multiple gdbs to multiple firefox-bin's and attempting to manage that.  That sounds a bit tricky.
Comment 60 erik red 2007-03-03 13:38:08 PST
Thanks, RB. So is it "run" or "continue" once I get in gdb? Or does it not matter?
And what about the 300s delay -- should I leave gdb alone for 300s before I try,
or "continue" gdb and expect a 300s delay before anything happens? Can we reduce the 300s delay without recompiling?

My main problem was that Firefox just sat there and didn't even give me one window. And no CPU usage.

Multiple profiles: The extra firefox instantiations is just to create contention for memory, X and network bandwidth. I was planning on connecting gdb only to "Firefox1".
Comment 61 Robert Bradbury 2007-03-03 16:11:05 PST
> Thanks, RB. So is it "run" or "continue" once I get in gdb?

If you are running straight from the shell, e.g. "gdb firefox-bin" then you have to use "run" to get things going.  If you attach an existing process, e.g. "gdb firefox-bin process-id-#", then once gdb gets going it will suspend the process and you have to "continue" it to get it to respond again.


> And what about the 300s delay -- should I leave gdb alone for 300s before I
> try, or "continue" gdb and expect a 300s delay before anything happens? Can we
> reduce the 300s delay without recompiling?

I am unsure about the 300s delay.  Firefox-bin is *not* a small program and loading in all of the symbol tables takes a while (depends on the speed of your system).  But once you get a gdb prompt you should be able to set breakpoints, do stack traces, run, continue, etc. without excessive delays. 

I would strongly urge you to find a "GDB Quick Reference" manual.  Google should offer it up and it's only 2 pages.  There are perhaps a dozen commands from that that are essential (which I can advise on but is perhaps better done one-on-one).

> My main problem was that Firefox just sat there and didn't even give me one
> window. And no CPU usage.
Sounds like it was not running or "suspended".  When I run firefox on my machine, the first thing it attempts is to request which profile I would like to run (I've got several).  If you get that far it is "running".


If you run it from the start out of gdb, you can also set breakpoints using the gdb "break" command, e.g. "break XRE_main" or "break gtk_init" or "break gdk_init".  Those should trigger a stopping of Firefox (i.e. it becomes unusable) and an activation of gdb, such that you can do "backtrace"(s).  The critical "incantation" in gdb is "thread apply all bt".  That will show you what the various firefox threads are doing.  (Firefox has to go through all of these breakpoints before it is really "running".)
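
To make the "run" versus "continue" distinction concrete, here is a small sketch in Python that only builds the gdb command line for the two cases described above. The helper name gdb_command and the process number 12345 are made up for illustration; it assumes gdb's "-ex" and "--args" options and that the "-ex" commands are executed in order once the binary (and process, if any) has been loaded.

```python
import shlex


def gdb_command(binary, pid=None, breakpoints=()):
    """Build a gdb argv: launch `binary` fresh, or attach to `pid` if given."""
    argv = ["gdb"]
    for symbol in breakpoints:
        argv += ["-ex", f"break {symbol}"]
    # A freshly launched program has to be started with "run"; an attached
    # process is suspended by gdb and has to be resumed with "continue".
    argv += ["-ex", "continue" if pid is not None else "run"]
    if pid is not None:
        argv += [binary, str(pid)]
    else:
        argv += ["--args", binary]
    return argv


if __name__ == "__main__":
    # Launch from the start, stopping at XRE_main.
    print(shlex.join(gdb_command("./firefox-bin", breakpoints=["XRE_main"])))
    # Attach to a (hypothetical) running process and wait for the GLib error.
    print(shlex.join(gdb_command("./firefox-bin", pid=12345,
                                 breakpoints=["g_log_default_handler"])))
```

Once a breakpoint is hit, "thread apply all bt" is still typed at the gdb prompt by hand.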

> 
> Multiple profiles: The extra firefox instantiations is just to create
> contention for memory, X and network bandwidth. I was planning on connecting
> gdb only to "Firefox1".
>

Nothing wrong with that if Firefox1 is the one which springs the error.  But if I understand your approach it sounds like you are going to have multiple instantiations of Firefox (each with a separate process-ID affiliated with specific profiles).  I'm not optimistic that you will be able to break 1 vs. 2 or 3.  (Take it from someone who has tried 100 windows and half-a-thousand tabs.)    Fortunately, if you start them up in separate shell windows, I believe the Glib/GDK/GTK errors provide the process-ID of the process which throws the "unexpectedly destroyed" error.  You should be able to attach GDB to that specific process, do stack traces, set breakpoints, and continue the process (so in theory if you throw more URLs at it it will throw the error again).
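
As a rough sketch of picking out which process threw the error, the Python below scans captured stderr lines for the process-ID. It assumes the usual GLib default log layout "(program:PID): Domain-LEVEL **: message"; the sample lines and PIDs in the test are invented, and the function name is hypothetical.

```python
import re

# Assumed GLib default log layout: "(program:PID): Domain-LEVEL **: message"
LOG_LINE = re.compile(
    r"\((?P<prog>[^:()]+):(?P<pid>\d+)\): "
    r"(?P<domain>[\w-]+)-(?P<level>[A-Z]+) \*\*: (?P<msg>.*)"
)


def pids_with_destroyed_windows(lines):
    """Return the PIDs (in first-seen order) that logged 'unexpectedly destroyed'."""
    pids = []
    for line in lines:
        match = LOG_LINE.search(line)
        if match and "unexpectedly destroyed" in match.group("msg"):
            pid = int(match.group("pid"))
            if pid not in pids:
                pids.append(pid)
    return pids
```

The resulting number is what would be handed to gdb when attaching to that specific firefox-bin.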

So, Q1 is will firefox-bin run on your machine (ignoring gdb involvement).
Then Q2 is whether you can generate the window unexpectedly destroyed (untitled window) error in a relatively reliable fashion (I can do it but I can't do it reliably).  Then Q3 is whether once you have generated the untitled window problem you can use gdb to attach to the specific firefox-bin which threw the error.  Then at that point things start to get into the "we need more detailed information" category and until we have it Q4 remains an open question.

It would be useful to know whether the version of Linux you are running is running with glibc with linux-pthreads.  That is how the firefox-bin I provided was compiled.  Linux-pthreads is supposed to be significantly faster than older implementations of threads and so its probability of generating errors (if we are dealing with very subtle timing issues) may be much different from those which may have been present in earlier releases of Firefox and/or running on earlier releases of Linux.

(And as an aside to any "real" Firefox developers reading this thread, given the increase in the number of cores on processors that we can anticipate (if this turns out to be a subtle timing problem) -- you have a serious Q/A problem.  Because if you can't guarantee that Firefox should fail gracefully on a machine with 64B of memory [1] (which it does not), how can you assert that it will work on machines with 2, 4, or 8 cores?)  And I don't particularly care if it works on a 4 core Windows Vista machine.  I only care if it works on an 8 core Linux machine or an 8 core FreeBSD machine.

1. Netscape 4.72 did not even come close to this requirement for memory.

Comment 62 erik red 2007-03-03 18:01:19 PST
Created attachment 257189 [details]
case of using RobertB firefox-bin 2.0 version with debug instrumentation
Comment 63 erik red 2007-03-03 18:09:32 PST
RB,

I realize now that my firefox process based on your debug binary never really ran properly in my environment. It got killed by signal 11, but then it strangely continued with assorted messages about how to attach gdb, and so on.
The problem is related to symbol SSL_ImplementedCiphers.

Please see the attachment above. You will also see the message about the 300sec delay, which perhaps you don't get when you run. I thought maybe it was there to give me time to start gdb.

Summary: There is something fundamentally wrong with my setup using your binary, so I have not been able to produce any useful data.

About linux-pthread: "locate linux-pthread" produces no matching filenames on my computer. I'm running Fedora Core 5.
Comment 64 Hixie (not reading bugmail) 2007-03-06 18:32:36 PST
FWIW, I've been seeing this a _lot_ in the last few months.
Comment 65 Robert Bradbury 2007-03-12 08:29:06 PDT
Erik, I've been wrestling with trying to get a 3.0 version compiled (not yet but getting there).  But briefly on your gdb attachment.

Netscape has two "portable" system library sets, the portable runtime "nspr" [1], and the security "nss" [2].  Presumably so one can share them between Mozilla, Firefox, Seamonkey, Thunderbird, etc.  These are usually in subdirectories under /usr/lib, etc.

While the binary I released has the "nspr" functions, compiled with debugging, loaded into the binary, the "nss" functions are *much* more problematic.  Some of them may be compiled into the binary (I'd have to check), but it may try to access others at runtime (from your system libraries).  If so then there could quite possibly be problems intermixing the nss libraries.  There is also interaction between the nss libraries and the SSL libraries on your system (probably).  So there is ample opportunity for difficulties.

Do you run into the problem if you completely avoid https: pages (or anything likely to request a password)?

One possibility - try renaming your nss libraries, e.g. mv /usr/lib/nss /usr/lib/NSS and see if it runs [I've never tried this].  You might be able to run fine and then only get a runtime error at those times that it tried to use encryption.  (Or it might fault in a clearer location saying that the NSS libraries are unavailable -- in which case we have another "bug" that the Mozilla code doesn't cleanly handle missing security libraries).  [There are lots of examples of this -- you ought to try starting it without the various subdirectories containing the icons, "pseudo-code", etc. under MOZILLA_FIVE_HOME (MOZILLA_LIBDIR) sometime... :-(]

1. http://en.wikipedia.org/wiki/Netscape_Portable_Runtime
2. http://developer.mozilla.org/en/docs/NSS_FAQ
Comment 66 Robert Bradbury 2007-03-13 08:29:15 PDT
Further update.  I've been working with SeaMonkey 1.1.1 because its memory requirements seem significantly lower than Firefox (why is this???).  At any rate this morning I managed to spring the untitled window problem several times.  The system is *not* memory constrained (I've got 20+MB of kernel file system buffers).  It is however relatively busy, running mplayer and a fairly high network load (~20-30% busy over a DSL line).  The same URL will not always reproduce the problem.  I sprang it both on the result of an eBay query as well as google query results (where I commonly make multiple "Open Link in New Tab" requests in rapid succession before the initial request has completed).  I think this may be a critical aspect of this -- trying to get the browser to open multiple tabs (or the resizing of window contents of half-downloaded/drawn windows(?)) at the same time.  This may explain why it is so hard to reproduce -- because it's the user + network + Gtk/X window system simultaneous activity that is essential.  Again I'm back to my earlier comments that I think this is a subtle thread/locking/serialization issue.   But it's a combination of the machine load + the Mozilla use of GTK that is required to trip over it.

The work-around is to select the tab with no displayed contents, i.e. the untitled windows' "parent" tab and copy its URL.  Then close that tab (which will close all the untitled windows [which although they display the URL don't have the "window dressing", e.g. scroll bars, required to do anything with them]).  Then open an entirely new browser window (ctrl-N), paste the copied URL.  The page seems to always display properly for me.  It is highly annoying to have to do this however.

I've been running SeaMonkey for 2 days, but it isn't heavily loaded.  About 20 windows, maybe 60-70 tabs, only consuming 176MB (VirMem) / 51MB (ResMem).  The trick seems to be to make enough process switching take place that the Mozilla/GTK threads experience delays (which is probably why one frequently runs into it when one is either up against the system memory limits or when one is running large system builds in the background).
Comment 67 Robert Bradbury 2007-03-13 09:23:13 PDT
Jackpot sort of.  It turns out that with my system in this state the bug may be easier to reproduce.  I was able to reproduce the problem 3 times using gmail.  Gmail appears to be particularly adept at reproducing the problem due to its interface.

The steps appear to be:
0) Make your network "relatively" busy (i.e. such that an Inbox "fetch" from gmail will not be "instantaneous".)
1) Ctrl-T to open a new tab.
2) Enter http://www.gmail.com/ in the URL box.

When you do this, gmail should startup (I've got it setup so it does an autologin) and give you the [LOADING...] message in the upper left corner of the tab window.  Now, shortly after this, gmail will throw up 2 untitled windows (if you are going to encounter the problem).  Gmail seems to treat one of these as the "Inbox" window and the other as the "messagebox" window.  Now, unlike the classical case, gmail will usually throw up identical images in both the "tab" window and the separate "Inbox" window.  If you check a box in the tab window *or* Inbox window the image will usually change in *both* windows.  If you click on a message it will show up in the separate messagebox window (and the tab window most of the time I think).  I think what is happening is that Gmail has two window images (Inbox & Messagebox) that it is flipping back and forth into the tab window.

Points of note.  If you open something else, e.g. www.google.com in the "fresh" tab *before* you open www.gmail.com it doesn't seem to spring the problem (i.e. you will get normal gmail behavior within the tab).  You have to open gmail in a "fresh" tab.

Gmail can also get "stuck".  This appears to happen when it is downloading the Inbox contents (on a bandwidth-limited connection).  In which case you will not get the 2 untitled windows and the inbox will not display in the tab window.  Now interestingly, after I closed the tab in this case, then tried it again I got *4* untitled windows (2 apparently from the closed tab and 2 from the new tab).  When I closed the new tab all 4 untitled windows went away.  Very briefly during the inbox/window setup process an item seemed to appear at the bottom of the window indicating that it was contacting (Waiting for/Transferring data from) "chatenabled.google.com" (which I presume it does after downloading one's Inbox to setup the google chat aspects of one's gmail screen).

I've got a couple of gdb traces I'll be attaching.
Comment 68 Robert Bradbury 2007-03-13 09:35:11 PDT
Created attachment 258427 [details]
gdb trace of Gmail hung during loading Inbox

This was the trace where gmail hung during loading the Inbox.  It did not seem to spring the message it sometimes does involving the fact that loading the Inbox was taking too long (perhaps because the "secondary" window creation was hung as well?).
Comment 69 Robert Bradbury 2007-03-13 09:37:18 PDT
Created attachment 258428 [details]
gdb trace of gmail hung with 4 untitled windows

This is the trace when gmail had brought up 4 untitled windows (presumably 2 from the previous request to gmail, and 2 from the current request).
Comment 70 Robert Bradbury 2007-03-13 09:46:16 PDT
*** Bug 352178 has been marked as a duplicate of this bug. ***
Comment 71 Robert Bradbury 2007-03-13 11:52:37 PDT
The plot thickens even more...  Ok.  While running moderately high network load (perhaps 50+% of outgoing bandwidth) *and* while running a limited Firefox build (80+% CPU consumption)...  New Window (pulls up www.google.com as the default window), change the URL to www.gmail.com.  Spawns the typical 2 untitled windows and hangs in the primary tab window with a red "[Loading...]" box in the upper right hand corner.  Other SeaMonkey windows appear to work fine.  Start a second new window (starts with www.google.com), reset URL to www.gmail.com, bang, 2 more untitled windows (now we have 4).  This time it appears to hang for a long time with "transferring data from 'chatenabled.mail.gmail.com' [a correction from the previous 'chatenabled' labeled URL]".  Eventually (10+ min?) all of the gmail inbox downloads, chat "enabling" appear to complete.  However the gmail windows (which have the associated "untitled windows") are not functional.  In a normal browser window/tab (in the same session) I can scroll up & down normally.  In the gmail windows I cannot -- they will scroll but it takes minutes for them to do so.

Ok, minor qualification.  If I have one window which has 2 tabs, one a "normal" URL, the other a "hung", 2 untitled window "enabled", gmail ... If I attempt to move up or move down in the gmail window (i.e. by single clicking above or below the window position marker in the scroll bar) nothing will happen.  If I attempt to drag up or drag down the window position marker nothing will happen.  If I click up or click down *and* switch to the non-gmail tab and switch back to the gmail tab the view will have scrolled (up or down).

I would guess we have a thread activation or priority problem.  It would appear that Mozilla/GTK is ignoring signals coming from the current window (i.e. scroll this window up or down) but is responding to those signals when the window is reactivated after having been deactivated.  Of course in the case of gmail there is a complex interaction with javascript going on.
Comment 72 Robert Bradbury 2007-03-13 12:12:04 PDT
Created attachment 258441 [details]
gdb trace of SeaMonkey hung in gmail with 4 untitled windows

This is an alternate to the previous trace.  This is a completely different instantiation of a gmail window (with untitled windows) being "hung".  Indeed under this trace there are 4 untitled windows attached to 2 "gmail" tabs at least one of which is fairly unresponsive (i.e. it will seem to respond if I switch to other tabs or windows but not if I sit and wait on the primary window).
Comment 73 Robert Bradbury 2007-03-14 04:58:34 PDT
A bug report for this was filed in the Gnome bug database (#417973) which was marked as a duplicate of:
http://bugzilla.gnome.org/show_bug.cgi?id=352408
which is where I am now posting in an attempt to get the attention of the gtk+ developers.

I will confirm an additional situation of opening a new window, starting gmail and getting untitled popup windows.  This did *not* occur when non-browser network traffic was stopped.  So I think a key element to reproducing this bug involves having a system load which delays prompt processing of the browser network requests (perhaps DNS lookups which I think may involve a specific SeaMonkey/Firefox thread(?)).
Comment 74 Robert Bradbury 2007-03-14 06:07:21 PDT
I have refiled this as a GTK+ bug.  See:
http://bugzilla.gnome.org/show_bug.cgi?id=418199
We shall have to see whether the GTK developers bounce it back into the Mozilla camp.
Comment 75 erik red 2007-03-22 09:02:38 PDT
Shoot, no response over in GTK+ land. Could it be that the bug title did not ring a bell when people read it? It may not sound like a GTK+ bug just from reading the title.

"Gmail / Firefox / SeaMonkey / Epiphany fail to manage windows properly"

I like the bug title on *this* site better, is there a way to change the bug title and see if anyone reacts? Something like

"frames open in new GTK+ windows, leaving firefox unusable"

Only intended as a friendly suggestion!
Comment 76 Howard Chu 2007-04-03 15:36:08 PDT
I'm also seeing this behavior in Suiterunner. It's happened off and on for the past several weeks using the browser. Today is the first time I saw it with MailNews though - I had just finished reading a message in the preview pane, deleted it, and the next new message popped up in a separate window instead of in the preview pane. All the same symptoms as above - GDK errors, etc...
Comment 77 Robert Bradbury 2007-05-12 11:28:29 PDT
Created attachment 264615 [details]
Full trace of GdkWindow %#lx unexpectedly destroyed

This is a subset of a much larger debugging session.  This is running firefox under gdb with args set to "--sync".  In particular this shows the start of a window unexpectedly destroyed error set.

In this instance I think I know what caused it because the full thread trace reveals images being loaded from a site which "auto refreshes", in particular the page which displayed the "untitled window", had the following code:
  <meta http-equiv="REFRESH" content="900">
Now, the page in question also had multiple javascripts, and though I was running with NoScript enabled, the page appears to have some very strange code which appears that it might be designed to prevent NoScript from functioning properly (I don't know how NoScript disables javascripts).
I also know that system builds were running and the CPU was maxed out.

This suggests that one could build a debug test case by simply opening a lot of pages which frequently refresh (1-5 sec instead of 900) and max out the CPU by repeatedly building firefox.
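
A minimal sketch of generating such a set of test pages, in Python.  The file names, the page body and the write_pages helper are all made up for illustration; only the REFRESH meta tag comes from the page quoted above, with the interval shortened to the 1-5 second range.

```python
from pathlib import Path

# Template for one auto-refreshing page; only the meta tag matters here.
PAGE = """<html>
<head>
  <title>refresh test {index}</title>
  <meta http-equiv="REFRESH" content="{seconds}">
</head>
<body><p>Page {index}, reloads every {seconds} s.</p></body>
</html>
"""


def refresh_page(index, seconds):
    """Return the HTML for one page that reloads itself every `seconds` seconds."""
    return PAGE.format(index=index, seconds=seconds)


def write_pages(directory, count, min_seconds=1, max_seconds=5):
    """Write `count` pages, cycling the refresh interval through the given range."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    span = max_seconds - min_seconds + 1
    paths = []
    for index in range(count):
        seconds = min_seconds + index % span
        path = directory / f"refresh_{index:03d}.html"
        path.write_text(refresh_page(index, seconds))
        paths.append(path)
    return paths
```

Opening each generated file in its own tab while a build maxes out the CPU would approximate the conditions described above; whether that actually springs the bug is untested.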

I am more convinced than ever this is a thread sequencing problem as when gmail has the problem it seems to be in the Destroy Frame code, when one is "archiving" a message and returning to the Inbox screen.  It looks as if gmail may be running multiple Javascript threads -- one which is closing the message window (frame?) while the other is redrawing the Inbox window (frame?).  It is worth noting that because gmail "monitors" one's Inbox and perhaps one's potential chat partners, it does its own internal equivalent of a "REFRESH" at random intervals.  But it is being done by javascript rather than the HTML code.

Firefox seems to be assuming that actions by gdk are synchronous and they really don't appear to be.  So if it starts some activity in a window and that thread gets suspended, then destroys the window, the thread which was doing stuff to the window finds a destroyed window when it is unsuspended.

The question is whether in gdk there is some way to guarantee that all pending operations within a window are complete before one destroys it?  

There is also an interesting question that might be asked, "What happens to javascripts which are dealing with a window when the window is destroyed (or refreshed with a completely different window)?"
Comment 78 David Baron :dbaron: ⌚️UTC-10 2007-05-12 12:01:07 PDT
Mozilla interacts with Gdk only on a single thread (often called "the main thread").
Comment 79 Robert Bradbury 2007-05-12 16:28:46 PDT
David, that is true, thread 1 through DoProcessNextNativeEvent calls g_main_context_iteration which ends up eventually invoking the gtk_widget functions.

But glib is designed to handle inputs from multiple threads and has locks on the data structures to prevent different threads from interfering with each other.  But in the situations where I am seeing the untitled window problem there are anywhere from 9-12 threads running.  Thread 2 appears to be the network interface (to fetch the contents of web pages) and in some cases a third thread may be doing a DNS lookup, but I have to assume that the other threads are presumably asynchronous javascript or plugin threads (though in my test case I've limited plugins to just NoScript and AdBlock).

Glib seems entirely event driven, so if one has a case where thread A is destroying a window and thread B is manipulating the window and thread A isn't locking the window down so it cannot be manipulated during its destruction (or it cannot be destroyed during its manipulation) then you have a recipe for what we are seeing when the machine gets busy -- thread B starts a sequence of operations on the window, gets suspended, thread A deletes it, and thread B resumes only to discover its assertions are failing because the window no longer exists.

The REFRESH and gmail examples are pretty clear.  To "destroy" the window, you have to start the process of freeing all of the data structures within the window -- that may take a while (especially if memory is in short supply and paging of the fragmented heap causes thread suspensions).  If an event comes along and tries to manipulate partially destroyed window structures there may be problems.

I'm just guessing, but it looks like in nsWindow.cpp:nsWindow::Destroy there isn't a lot of effort made to preclude threads from manipulating the window contents.  It looks like Destroy() is trying to indicate the window is "Destroyed" and disconnect signal handlers from the window but I have no idea whether any asynchronous threads are paying attention to these details.

Comment 80 Robert Bradbury 2007-05-14 10:12:37 PDT
Created attachment 264760 [details]
Traces of window (frame) destroy's

This is a set of backtraces from closing the gmail login window.  The gmail login window does have a number of javascripts that run, including the counter that updates how much free space one has (it uses the setTimeout() function).

From studying this and the glib functions it looks to me like the nsBaseAppShell is running and processing events, which in turn invokes the g_main_context functions to process pending gtk/glib events.  Glib maintains its own event queue which through previously setup callbacks ends up back in the Firefox functions in nsWindow.cpp -- delete_event_cb() and OnDeleteEvent() which go through all kinds of functions to delete all of the Firefox data that is associated with the window.  Now, on a machine with a heavy CPU load and/or one where Firefox is using a lot of memory [1], particularly with complex windows, the destruction and freeing can easily become suspended.  If at that time, the result of an HTML REFRESH takes place or a javascript timer goes off and they "think" they have control over the window, I suspect they may do things which add actions to the glib events queue.  Once the g_closure_invoke function completes and backs out to the g_main_context level it will continue to process these events only to discover that the window that the events were being applied to no longer exists -- and this is what causes the Gtk error messages.
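
The sequence I suspect can be shown with a toy model.  This is a Python simulation of a single event queue, not Mozilla or glib code, and the Window and EventLoop classes are invented for illustration: a destroy request is queued, a timer then queues a redraw for the same window, and when the queue is drained the redraw lands on a window that no longer exists.

```python
from collections import deque


class Window:
    def __init__(self, name):
        self.name = name
        self.destroyed = False


class EventLoop:
    """Toy single-threaded event queue; events are (window, action) pairs."""

    def __init__(self):
        self.queue = deque()
        self.errors = []

    def post(self, window, action):
        # Nothing checks whether the window is about to go away.
        self.queue.append((window, action))

    def run(self):
        while self.queue:
            window, action = self.queue.popleft()
            if action == "destroy":
                window.destroyed = True
            elif window.destroyed:
                # Stand-in for the "unexpectedly destroyed" style of error.
                self.errors.append(f"{window.name}: '{action}' on destroyed window")


if __name__ == "__main__":
    loop = EventLoop()
    inbox = Window("inbox")
    loop.post(inbox, "destroy")   # user closes the window
    loop.post(inbox, "redraw")    # a REFRESH or timer fires before the loop runs
    loop.run()
    print(loop.errors)
```

Note that no second thread is needed for the stale event to appear; ordering within one queue is enough.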

I think the nsWindow::OnDeleteEvent function may need to be setting a flag or a lock on the entire window such that any asynchronous activities (REFRESH, Timeouts, ???) do not attempt to alter the window while it is being deleted.

I can see this as being a bug which is likely to only occur in gtk/glib due to the way it seems to be handling queueing and dispatching events.  But the firefox windows code may be at fault for not preventing any activities within the windows while they are being deleted.

1. The problem shows up under high memory usage because the Linux paging functions aren't particularly adept at dealing with this (and one can wait many seconds for a page to get swapped in).  It shows up when you leave Firefox running for an extended period because the heap becomes fragmented and one is more likely to have pages required for freeing data structures (in the fragmented heap) unavailable forcing a suspension of the window destroy thread.
Comment 81 David Baron :dbaron: ⌚️UTC-10 2007-05-14 11:03:54 PDT
Why do you think there are multiple threads involved?  There should only be one.
Comment 82 Robert Bradbury 2007-05-14 13:00:34 PDT
(In reply to comment #81)
> Why do you think there are multiple threads involved?  There should only be
> one.
> 

I've got a debug compiled Firefox currently running under gdb with several dozen open windows (and perhaps a hundred+ tabs).  If I ctrl-C it and say "thread apply all bt" I get 7 active threads (1, 2, 3, 12, 25, 26, 27).  Now gdb cannot give me meaningful stack traces for the later threads, but I view this as a gdb bug in tracing system calls.  #1 is, as you have said, the "main" thread apparently controlling much of the activity with gtk/gdk/glib.  #2 is apparently the single network communication thread, which is of course highly questionable if my system (or network) happens to have separate connections for dial-up, DSL, Cable, Satellite and WiFi links to various sources.  Single-threading network communications at the application level is wrong.  The distribution and collection of network requests is an OS level management problem.  If thread creation is not prohibitively expensive (and it will not be on 2, 4 & 8 core processors) then all network requests should be running on individual threads.  I do not fully understand what threads are created and destroyed within the current Mozilla/Firefox model -- if there is a document which clearly outlines this I would be happy to review it.  (I strongly suspect that the "model" for the program is lost within the minds of a few developers who wrote core aspects of the code -- presumably many of whom were writing for a Windows paradigm rather than an open source Linux paradigm.)

I have no evidence that asynchronous operations, e.g. the HTML REFRESH operation or asynchronous javascript timers are or are not operating in different threads (the handling of asynchronous operations isn't exactly documented to the best of my knowledge).  It really doesn't matter.  If an async interrupt occurs in the "default" main thread it can still add operations to the glib "event" processing queue unless it recognizes that it is adding such operations to a window (or its subcomponents) which are in the process of being deleted.

I am willing to be wrong about this.  Point out the precise functions where REFRESH and/or Javascript Timeouts are being handled and point out the precise locations where they will lock or block on access to the window in which they are running.  Or point out that all of these window activities are being bounced back up to the top level where they are being enqueued and the enqueuing will detect an attempt to enqueue on a "Destroyed" object.  Because as things stand right now -- it looks like the code is destroying the window and it is enqueuing things to be done to the window which has been destroyed.  One should not perform magic upon an object incapable of supporting magic.  At least in my humble opinion.

Comment 83 David Baron :dbaron: ⌚️UTC-10 2007-05-14 13:13:07 PDT
There's lots of asynchronous stuff happening, but everything to do with Gdk/X, windows, and scripts should be running on the main thread.  The other threads (timer thread, socket transport thread, and a few others) should have nothing to do with it.

Please don't use this bug to discuss general complaints about threading design; if we have that debate here we won't be able to find the part relevant to this bug anymore.
Comment 84 Robert Bradbury 2007-05-14 13:26:35 PDT
David, understood.

If indeed all Gtk/glib requests are being submitted through the main thread, there should *still* be a constraint that they should *NOT* be submitted while a window deletion is in progress.  Indeed, all frames, windows, etc. should have a flag indicating that "modifications to this window/frame will be bounced" and such a flag should be set once a window destruction (or redraw) process is invoked.
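
What I mean by such a flag, as a toy Python sketch (again a simulation with invented names, not a patch against nsWindow.cpp): once destruction has begun, any attempt to queue further work for the window is bounced instead of being enqueued.

```python
from collections import deque


class Window:
    def __init__(self, name):
        self.name = name
        self.destroying = False   # set as soon as destruction is requested


class GuardedLoop:
    """Toy event queue that refuses work for windows being destroyed."""

    def __init__(self):
        self.queue = deque()

    def begin_destroy(self, window):
        window.destroying = True
        self.queue.append((window, "destroy"))

    def post(self, window, action):
        if window.destroying:
            return False          # bounced: the window is on its way out
        self.queue.append((window, action))
        return True

    def run(self):
        handled = []
        while self.queue:
            window, action = self.queue.popleft()
            handled.append((window.name, action))
        return handled
```

Work queued before begin_destroy() still runs ahead of the destroy itself; only requests made afterwards are turned away.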

The form of the error, i.e. "Window unexpectedly destroyed" followed by several window data structure consistency checks reeks of the fact that one is adding things to do to windows that are in the middle of the downwards spiral for destruction.  That is a really bad idea.  One should not be attempting to schedule activities for windows which are effectively dead!

Robert
Comment 85 Adam Guthrie 2007-05-14 14:56:44 PDT
*** Bug 339251 has been marked as a duplicate of this bug. ***
Comment 86 Nico R. 2007-05-16 13:01:15 PDT
The following bugs seem to be duplicates of this bug report:
bug 349497,
bug 354970,
bug 366896.

They are probably filed with the wrong product and/or component.

Please have a look at them and take appropriate action. Thanks!


By the way, I have also experienced this bug with the Firefox Preferences window two days ago, and with its Downloads window twice today.
Comment 87 Braden 2007-05-16 13:59:57 PDT
*** Bug 349497 has been marked as a duplicate of this bug. ***
Comment 88 Braden 2007-05-16 14:00:34 PDT
*** Bug 354970 has been marked as a duplicate of this bug. ***
Comment 89 D. Hugh Redelmeier 2007-05-16 15:22:26 PDT
I'm a refugee from bug 349497.  And before that, https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=193274

I've had this problem for a long time.  I experienced this using firefox on Fedora Core 5 and then 6 on x86_64.  I put some more detail into the Redhat bugzilla entry.
Comment 90 Robert Bradbury 2007-05-16 18:10:24 PDT
I would agree that #349497 and #354970 are the same bug.  Bug #366896 might be if one is getting a case where gtk/glib is not catching a case of the window being destroyed and is attempting to manipulate deleted (free-ed) memory structures.  Given the number of structure consistency checks in Gtk/Glib it is easy to imagine that they may not have caught them all (I've encountered a few of these even with the most up-to-date libraries).  If a user is running on libraries which do not have the debug (& consistency check) options enabled, SEGFAULTs would not be out of the realm of possibility.  One needs a stack trace to determine whether the faults are taking place explicitly within Gtk/Glib.

We are clearly in the realm of items being added to various "event" queues for a window while the window is in the process of being destroyed.  In cases of high CPU use and/or high memory use -- a window "destroy" operation is not a "guaranteed to go to completion" situation and therefore adding anything to pending event queues (where when they come to the head of a queue they are dealing with a semi- (or completely) destroyed object) is quite problematic.

I maintain my position -- that this is a Firefox (and associated program) problem that one should not be attempting to add activities to a window queue during its destruction.  Whether you view this as a Firefox problem or a library problem is open to debate.  (For example the most immediate action when one detects a window destroy request is to destroy all Javascripts (and timeouts) or window REFRESH's associated with said window *before* one destroys the window itself!)
Comment 91 Phil Ringnalda (:philor) 2007-05-19 11:16:19 PDT
*** Bug 381270 has been marked as a duplicate of this bug. ***
Comment 92 Howard Chu 2007-06-04 00:02:10 PDT
Something else I've noticed recently - as noted before, the problems don't start right away, they only start happening after mozilla has been running for a while. Over the course of time things run a lot more sluggishly. Just now I noticed in top that the X server was taking 16% of my CPU even though nothing was happening anywhere. The seamonkey binary was taking like 1%, nothing else was really taking anything. But when I exited seamonkey, the X server's CPU usage dropped back to 0.7%. So it appears that seamonkey is doing something weird that confuses the X server, and that's when things start going wrong.
Comment 93 Howard Chu 2007-06-04 12:30:38 PDT
Seeing the X server slowdowns again, after some prolonged use. As a wild-ass-guess I suspect seamonkey is using up the X server's backing store resources. I wonder if this has anything to do with the cached page renderings.
Comment 94 Robert Bradbury 2007-06-04 13:16:09 PDT
(In reply to comment #93)
> Seeing the X server slowdowns again, after some prolonged use. As a
> wild-ass-guess I suspect seamonkey is using up the X server's backing store
> resources. I wonder if this has anything to do with the cached page renderings.
> 

Howard, I've seen the X server CPU usage go up in conjunction with the Firefox CPU usage but it generally happens only when I've got hundreds of tabs (windows) open.  It seems to be aggravated if I have many other programs open which may be using X as well.  Firefox is not a particularly large user of "X Server Memory". Right now, on my machine, the System Monitor is indicating that acroread is consuming 30 MiB, epiphany: 19 MiB, soffice.bin: 16.5 MiB, nautilus: 747 KiB, gnumeric: 336 KiB.  Firefox is only consuming 335 KiB.  If anything, Firefox may be consuming CPU because it may be constantly transferring data from its own memory to the X-server memory.  (I suspect for example, one can't run Firefox in "DRI" mode for things like animated GIFs, video, etc.).

But in any case, this is not a bug for discussing Firefox+X CPU usage -- a separate bug report should be filed for that if one has not been filed already.  This bug should *only* be discussing cases where one is destroying a window and one gets the "Window --address-- unexpectedly destroyed" error on the Firefox (Seamonkey/Epiphany) console.

The working hypothesis is that this usually only happens under high CPU use conditions (or high swapping conditions) because there are thread race conditions that allow window commands to be added to a window event queue *after* the window deletion process has begun.  Unless you are consistently seeing "window unexpectedly destroyed" errors, you should not be discussing your problem(s) under this bug.

If you find or create a bug which discusses the Firefox-X CPU usage problems, you may want to make a note of it here as it is certainly true that the high CPU usage may make this bug more likely to occur.  This bug is more critical than high CPU usage, because if one closes the Untitled Windows (rather than the tabs associated with them) it will crash Firefox.
Comment 95 Howard Chu 2007-06-04 13:31:43 PDT
Those "window unexpectedly destroyed" messages start showing up shortly after the X server starts to get sluggish, that's the reason I got here in the first place.

In my case there are no other CPU or memory-hungry processes on the machine. Nor am I watching any videos inside a browser when the problems occur. And as I noted before, the problem doesn't just affect browser windows - in Seamonkey once things start going wrong, any window can be affected, e.g. the Print Status dialog, different panes in the MailNews module, etc. have all been affected at various times.

If Window *destruction* is the root cause of the problem, then why is the visible effect disrupting window *creation/rendering* ?

I think you're just chasing a symptom here, and you really need to find out why the CPU usage got to this bad state in the first place.
Comment 96 Robert Bradbury 2007-06-04 14:10:33 PDT
Ok, Howard, sounds like a legitimate case of this bug.  The reason that CPU usage or memory usage (and swapping) are critical is that to redraw *any* previously drawn window or tab requires destroying the previous window and all of the objects within it (this includes javascripts, GDK/GTK/GLIB "window object" structures, etc.)  It is very complex, and therefore time consuming.  It is also difficult to debug because the "destroy" (and memory deallocation) processes are usually not explicit in the code but are linked into the data structures and may be called in some very strange situations.

In a "light load" situation, window destructions will tend to run to completion in one operation (with no interference).  In heavy load situations, particularly if large amounts of the heap are paged out, a delete operation may be suspended (freeing heap memory may require scanning through much of a highly fragmented heap to find out how to insert the freed memory back into the free memory list).  Window operations (window or sub-window "redraws") can also occur asynchronously -- through either HTTP, e.g.
  <meta http-equiv="REFRESH" content="#-seconds-delay">
or Javascript, using the setTimeout() functions and window.location.reload(true) when the timeout expires.  Google mail may also use something similar to update the screen when one's remote mailbox content has changed.

It *would* be useful to know what URLs you have open when you see this happening as well as whether you think there are pages that happen to be doing some type of "auto-refresh".  If you don't know all the pages, and you don't have privacy concerns, send me your sessionstore.js file and I'll turn it into a list of URLs and try to track down the offending URLs.

It isn't that the URLs are really doing anything wrong -- it is that the Firefox/Seamonkey/GDK/GTK/GLIB code is allowing actions (e.g. derivatives of window refresh/redraw) to be added to the window event queue while it is in the process of being deleted.

I suspect there is no test case in Mozilla for a set of windows which attempts to bury the machine (e.g. 100% CPU and/or Swap usage based on nothing more than window refresh/redraw requests).  If there were we probably would have resolved this bug long ago.

My statements still stand that there is a second, probably unrelated bug involving excessive CPU usage (I suspect this is due to poor management of "inactive" windows but haven't begun to investigate it yet).  For example, I suspect a few hundred windows monitoring a few hundred active RSS feeds would drag the machine into the ground -- but it would not do so if the X utilization of visible vs. non-visible windows were managed properly.
Comment 97 Howard Chu 2007-06-04 14:26:53 PDT
Thanks for the recap.

One site that seems to suffer from the problem pretty consistently is the discussion threads at http://www.realworldtech.com/forums/index.cfm

It still takes a long time before the problem first appears, but once it starts, it pretty much keeps on going. For some sites the problem will appear on a window, but hitting Return in the location bar will cause the page to be displayed correctly (with the bogus window disappearing). For this site, once the problem appears, any kind of navigation (forward to new threads, back to previous pages, etc) has the problem.
Comment 98 Howard Chu 2007-06-04 14:29:49 PDT
PS: I've also tried LD_PRELOADING tcmalloc_minimal.so as Google's malloc library generally performs better than glibc's in multithreaded programs. It seems to delay the onset of the problem but doesn't cure it.
Comment 99 Robert Bradbury 2007-06-04 15:07:14 PDT
Howard, looking at the realworldtech.com site, it looks like they are making use of:
<!-- BEGIN: AdSolution-Tag 4.3: Global-Code [PLACE IN HTML-HEAD-AREA!] -->
<script type="text/javascript" language="javascript" src="http://a.as-us.falkag.net/dat/dlv/aslmain.js"></script>
<!-- END: AdSolution-Tag 4.3: Global-Code -->

Taking a look at the aslmain.js script, sure enough they have an assembly of complex ad manipulation code, including a setTimeout() call, lots of calls writing to the window() and even options for handling java and/or shockwave "ads".  I am reasonably sure they are attempting to redraw the window, presumably with rotating ads, on a regular basis thus producing your frequent encounters with the destroy window problem.

You may be able to avoid this by turning off javascript entirely (but it looks like the realworldtech.com site depends heavily on it (boo, hiss!)), or use the NoScript addon to enable it *only* for specific "worthy" sites (e.g. gmail, amazon, perhaps realworldtech, etc. as I do).

However the advertising people are presumably trying to get increasingly clever and shove their "services" down your throat (avoiding NoScript and similar tools) so selective "enabling" is likely to have a limited lifetime.

The only way to fix the problem ultimately is to completely disable Web 2.0 (push) type applications -- i.e. download & draw the page once & don't download anything else until I explicitly request a reload.
Comment 100 Tom Simnett 2007-06-09 12:07:03 PDT
Created attachment 267808 [details]
Screenshot displaying window error
Comment 101 Tom Simnett 2007-06-09 12:07:40 PDT
Created attachment 267809 [details]
Screenshot displaying window error

I've been seeing this problem for some time now too. Still in 2.0.0.4 and somewhat surprised nothing has been done to resolve it since 2004 (0.9).

A page that causes this problem for me is: http://www.developertutorials.com/tutorials/php/php-singleton-design-pattern-050729/page1.html

Attached is the screenshot for this bug to show what is going on.
Comment 102 Robert Bradbury 2007-06-09 14:04:59 PDT
It looks a lot like the developertutorials.com site using a javascript from http://kona.kontera.com which is an "advertising" site.  I haven't traced through all the javascript file loads that result from the use of KonaLibInLine.js but it looks like they are setting up timeouts in the initial file (e.g. dc_ALTimeout=900).

PLEASE! - ALL FUTURE ADDITIONS TO THIS BUG SHOULD INDICATE WHETHER JAVASCRIPT IS ENABLED OR BLOCKED.  Esp. with respect to "unknown" sites.  Tracing javascript timeout calls is very difficult if they go through several levels of indirection to get to the code which is setting and springing the timeouts.

HTTP Refresh problems causing destroyed+detached windows are probably much easier to trace than complex javascript timeout window manipulations.

It should be noted that this is *not* a case of Firefox getting "worse" (the problem has been around for ages), the problem is that the advertisers are making more frequent use of AJAX/Web 3.0/Timeouts in attempts to shove different ads down your throat -- "Well they didn't click on that ad after 5 minutes -- let's throw a different ad at them.".

If you are using "unsafe" sites with Javascript enabled (NoScript solves many of these problems) then (cough) you get what you deserve.  You *are* allowing commercial enterprises (and corrupted web sites) to run programs (not merely display text) on your computer.  Of course one should only worry about browser security, after say what -- half a million bugs?  A browser with less than that, say 383,866 bugs, well certainly that's got to be a "safe" system.
Comment 103 cburroughs 2007-07-12 20:04:26 PDT
I have created bug 386429 to track the separate X usage bug first described in comment #92.
Comment 104 Robert Bradbury 2007-07-28 14:58:14 PDT
I am confirming that the bug *still* exists in Firefox 3.0a7pre using CVS source dated 19 Jul 07.  I managed to spring it by reopening (via Back) a gmail window.  It looks like gmail may be managing more windows-within-windows as it threw up 6-8 untitled windows in quick succession as it attempted to redraw the entire gmail environment.  In the past springing the bug in gmail usually only sprang 2-3 new untitled windows.

I will also confirm that it isn't a memory use by the current Firefox problem (the 3.0a7pre version was only consuming about 30% of main memory).  However, main memory was fully in use and a high (nice -19'ed) CPU load had been generated by starting a Gentoo package emerge sequence.

The problem clearly seems to be a window delete (or redraw / resize?) operation is stuck into the glib events queue at the same time various processes are operating on the window.  When the subsequent operations go to work on the deleted window the errors (and new untitled windows) are the result.

It seems to me that this might present a security problem as one is depending on the integrity of the glib code to detect the fact that a window has been deleted and prevent operations on it -- if there are cases where it misses that situation the code (which might be foreign Javascript) could be copying things to/from random parts of memory (e.g. former window memory reallocated to contain form data such as CC #'s, SS #'s, etc).
Comment 105 Aaron Lehmann 2007-08-02 02:35:30 PDT
On the topic of possible heap fragmentation, has anyone tried linking Firefox with TCMalloc? It seems like others have had some pretty good results: http://wiki.wikked.net/wiki/Squid_memory_fragmentation_problem
Comment 106 Howard Chu 2007-08-03 07:19:47 PDT
(In reply to comment #105)
> On the topic of possible heap fragmentation, has anyone tried linking Firefox
> with TCMalloc? It seems like others have had some pretty good results:
> http://wiki.wikked.net/wiki/Squid_memory_fragmentation_problem
> 
Yes, I run with tcmalloc all the time now. It's only a band-aid, not a true fix. The problems still occur, it just takes longer for them to begin.
Comment 107 Robert Bradbury 2007-08-04 11:41:03 PDT
Getting back on topic (firefox/mozilla heap memory usage is a separate problem from the untitled window problem) [1].

This comment is to confirm that in Epiphany (which has no NoScript option so Javascript is enabled), even somewhat moderate browser usage (VirMem ~735 MiB ResMem ~481 MiB) can trigger rather frequent untitled windows if one subjects the CPU to even moderate non-browser loads.  The two specific URLs are the NY Times home page [2] and a recent CNN news article [3].  I suspect both of these sites, like many news sites, are on auto HTML refresh and/or Javascript managed page reloads, e.g. "window.location.reload(true)" or some equivalent.  They seem to trigger page reloads (and popup/refresh the untitled windows) at least once an hour [4].

Someone who is skilled in HTML/Javascript needs to write a test program [5] which attempts to swamp the CPU (and/or network) with page reloads (increasingly shorter times between page reload requests should do it).  Open up a couple of hundred pages (tabs) with refreshes every 1-5 seconds and I'm reasonably sure the problem will reveal itself.  I would hazard a guess that this type of stress  testing of various asynchronous browser features on various operating systems has not been done.
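A rough sketch of what such a test could look like, written as plain JavaScript that only builds strings (it can be run under Node to write out the test pages). The helper names reloadDelays and stressPage, the shrink factor and the page layout are all assumptions made for illustration; nothing here comes from the Mozilla test suite. Each generated page reloads itself with a fixed delay, so the "increasingly shorter" part comes from opening one tab per delay value.

```javascript
// Hypothetical helper: list of reload delays (milliseconds) that shrink
// from startMs down to floorMs by a constant factor.
function reloadDelays(startMs, floorMs, factor) {
  if (!(factor > 0 && factor < 1)) {
    throw new RangeError("factor must be between 0 and 1");
  }
  if (!(floorMs > 0 && startMs >= floorMs)) {
    throw new RangeError("need startMs >= floorMs > 0");
  }
  const delays = [];
  for (let d = startMs; d > floorMs; d = Math.floor(d * factor)) {
    delays.push(d);
  }
  delays.push(floorMs); // always finish on the shortest delay
  return delays;
}

// Hypothetical helper: HTML for one test tab that reloads itself after
// delayMs, using the setTimeout + window.location.reload(true) pattern
// mentioned in the comments above.
function stressPage(delayMs) {
  return [
    "<!DOCTYPE html>",
    "<html><head><title>reload every " + delayMs + " ms</title></head>",
    "<body>",
    "<script>",
    "setTimeout(function () { window.location.reload(true); }, " + delayMs + ");",
    "</script>",
    "</body></html>",
  ].join("\n");
}

// One page per delay; save each to a file and open them in separate tabs.
const pages = reloadDelays(5000, 1000, 0.5).map(stressPage);
console.log(pages.length + " test pages generated");
```

Whether this is enough to trigger the bug is untested; it would presumably still need the high CPU or memory load described earlier.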

I would note that given my limited investigation of the window redraw (delete + draw) code thus far I do not believe a patch involving suspending or producing errors on window operations during a destroy window would be that difficult (and it could be applied back to version 2.0.0.X as part of the ongoing security upgrades).  But it should be developed by someone who really understands how the code works and not by someone like me.


1. I'm sure there are more than a few bugs active regarding Firefox memory usage (I filed a few myself).  If anyone runs across them, they may want to post the references to them here and to this topic in the most closely related.  There is a relation between these bugs because as Firefox is used for long periods of time extensive numbers of window management data structures are allocated by gtk/gdk/glib causing heap fragmentation.  The more fragmented the heap is the more CPU time (and paging) will be required to execute a "Delete Window" operation and the more likely an asynchronous operation on the window will be triggered during the deletion operation (thus leading to the window unexpectedly destroyed error).
2. http://www.nytimes.com/
3. http://politicalticker.blogs.cnn.com/2007/08/04/tancredo-bomb-muslim-holy-sites-first/
4. I am unsure whether Epiphany is using the same window management code as is found in the mozilla sources or simply code which involves similar gdk/gtk/glib functions for the reload/redraw window operations.  This may open the question as to whether this is a Mozilla problem or a system library problem -- it ultimately revolves around what operations should be permitted on windows slated for destruction and whether the application or the libraries should manage that.
5. I'm not going to reread all of the comments on this bug but I think I may have encouraged/suggested this in my comments 2 months ago (June '07).
Comment 108 Tony Mechelynck [:tonymec] 2007-08-04 14:30:29 PDT
(In reply to comment #107)
Epiphany is a Gecko product, see http://geckoisgecko.org/
Comment 109 kmike 2007-08-25 02:22:20 PDT
Looks like these bugs are the duplicates:
bug 362955
bug 368260

Also, I'm copying console errors I get when this happens so this bug shows up in the searches:

(seamonkey-bin:3243): Gdk-CRITICAL **: gdk_window_set_back_pixmap: assertion `GDK_IS_WINDOW (window)' failed

(seamonkey-bin:3243): GLib-GObject-CRITICAL **: g_object_set_data: assertion `G_IS_OBJECT (object)' failed

(seamonkey-bin:3243): GLib-GObject-CRITICAL **: g_object_set_data: assertion `G_IS_OBJECT (object)' failed

(seamonkey-bin:3243): Gdk-CRITICAL **: gdk_window_move_resize: assertion `GDK_IS_WINDOW (window)' failed

(seamonkey-bin:3243): Gdk-CRITICAL **: gdk_window_hide: assertion `GDK_IS_WINDOW (window)' failed

(seamonkey-bin:3243): Gdk-CRITICAL **: gdk_window_show_unraised: assertion `GDK_IS_WINDOW (window)' failed

(seamonkey-bin:3243): Gdk-WARNING **: GdkWindow 0x20ea62f unexpectedly destroyed

(seamonkey-bin:3243): Gdk-CRITICAL **: gdk_window_set_user_data: assertion `window != NULL' failed

(seamonkey-bin:3243): Gdk-CRITICAL **: gdk_window_set_back_pixmap: assertion `GDK_IS_WINDOW (window)' failed

(seamonkey-bin:3243): GLib-GObject-CRITICAL **: g_object_set_data: assertion `G_IS_OBJECT (object)' failed

(seamonkey-bin:3243): GLib-GObject-CRITICAL **: g_object_set_data: assertion `G_IS_OBJECT (object)' failed

(seamonkey-bin:3243): Gdk-CRITICAL **: gdk_window_move_resize: assertion `GDK_IS_WINDOW (window)' failed

(seamonkey-bin:3243): Gdk-CRITICAL **: gdk_window_hide: assertion `GDK_IS_WINDOW (window)' failed

(seamonkey-bin:3243): Gdk-CRITICAL **: gdk_window_hide: assertion `GDK_IS_WINDOW (window)' failed

(seamonkey-bin:3243): Gdk-CRITICAL **: gdk_window_set_user_data: assertion `window != NULL' failed

(seamonkey-bin:3243): Gdk-CRITICAL **: _gdk_window_destroy_hierarchy: assertion `window != NULL' failed

(seamonkey-bin:3243): GLib-GObject-CRITICAL **: g_object_unref: assertion `G_IS_OBJECT (object)' failed
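For anyone comparing logs across reports, a small triage sketch may help. It tallies the CRITICAL assertions by function and collects the addresses from the "unexpectedly destroyed" warnings. The function name tallyGdkLog is hypothetical, and the regular expressions are written only against the message shapes quoted above, so other GLib message formats may not match.

```javascript
// Hypothetical triage helper for stderr logs like the one quoted above.
function tallyGdkLog(text) {
  const counts = {};    // "Domain function" -> number of failed assertions
  const destroyed = []; // addresses from "unexpectedly destroyed" warnings
  for (const line of text.split("\n")) {
    const crit = line.match(/\): ([\w-]+)-CRITICAL \*\*: (\w+): assertion/);
    if (crit) {
      const key = crit[1] + " " + crit[2];
      counts[key] = (counts[key] || 0) + 1;
      continue;
    }
    const warn = line.match(
      /Gdk-WARNING \*\*: GdkWindow (0x[0-9a-f]+) unexpectedly destroyed/
    );
    if (warn) destroyed.push(warn[1]);
  }
  return { counts, destroyed };
}

// Three lines copied from the log above.
const sample = [
  "(seamonkey-bin:3243): Gdk-CRITICAL **: gdk_window_hide: assertion `GDK_IS_WINDOW (window)' failed",
  "(seamonkey-bin:3243): Gdk-WARNING **: GdkWindow 0x20ea62f unexpectedly destroyed",
  "(seamonkey-bin:3243): GLib-GObject-CRITICAL **: g_object_set_data: assertion `G_IS_OBJECT (object)' failed",
].join("\n");

console.log(tallyGdkLog(sample));
```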
Comment 110 Robert Bradbury 2007-11-06 14:25:22 PST
Created attachment 287587 [details]
Yet another example of Firefox going south...

This is yet another example of X spitting out errors when Firefox has gone south.
Comment 111 Jeremy Baron 2007-11-06 15:28:04 PST
*** Bug 402774 has been marked as a duplicate of this bug. ***
Comment 112 Robert Bradbury 2007-11-07 16:52:35 PST
Jeremy, the reason that I assert solving this bug is not difficult is because the code to DELETE Javascript and HTML redraw asynchronous operations *should* be in the base level code.  You have to call such functions before one "really" destroys a window otherwise one has the browser running code/functions which are attached to a non-existent window (or perhaps you don't have such code and that is why Firefox memory and CPU usage grows over time -- orphaned active window subroutines suck down the machine).

The problem is that "code invoked" redraw (delete and recreate) window operations do not clean things up the same way that a legitimate window delete operation should.

I recently experienced this problem a lot with gmail and a maxed out machine.  Gmail apparently redraws its primary window asynchronously under the control of Javascript.

It is *very* simple.  Whenever a window redraw is issued:
a) Shut down all javascripts on the window.
b) Shut down all HTML redraws on the window.

Then redraw the window.

It isn't easy for me to fix since I don't know the functions required.
But for someone who does it shouldn't be that difficult.
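The ordering proposed above can be illustrated with a toy model. This is only a sketch in plain JavaScript, not Gecko, GDK or GTK code: ToyWindow, schedule, cancelPending, redraw and drain are names invented for the illustration, and the "event queue" is just an array. It assumes, as suggested in this comment, that a redraw amounts to destroying and recreating the window.

```javascript
// Toy model only: none of these names are real Gecko/GDK/GTK APIs.
class ToyWindow {
  constructor() {
    this.generation = 0; // bumped each time the window is destroyed and recreated
    this.pending = [];   // queued async operations (script timeouts, HTTP refreshes)
    this.staleOps = 0;   // operations delivered to a window that no longer exists
  }

  // Queue an async operation bound to the window as it exists right now.
  schedule(kind) {
    this.pending.push({ kind: kind, generation: this.generation });
  }

  // Steps (a) and (b): drop every queued script timeout and HTML refresh.
  cancelPending() {
    this.pending = [];
  }

  // A redraw is modelled as destroy + recreate.
  redraw(cancelFirst) {
    if (cancelFirst) this.cancelPending();
    this.generation += 1;
  }

  // Deliver what is still queued, counting operations whose target is gone.
  drain() {
    for (const op of this.pending) {
      if (op.generation !== this.generation) this.staleOps += 1;
    }
    this.pending = [];
    return this.staleOps;
  }
}

function simulate(cancelFirst) {
  const w = new ToyWindow();
  w.schedule("javascript-timeout");
  w.schedule("http-refresh");
  w.redraw(cancelFirst);
  return w.drain();
}

console.log("stale ops without cleanup:", simulate(false));
console.log("stale ops with cleanup:", simulate(true));
```

In this model, redrawing without cleanup leaves both queued operations pointing at a destroyed window, while cancelling first leaves none. Whether the real code can be arranged this way is for someone who knows the functions involved to say.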
Comment 113 Reed Loden [:reed] (use needinfo?) 2007-11-27 01:06:49 PST
Created attachment 290364 [details]
craziness (#1)

This is what I've been getting lately. :(

Windows start appearing and doubling... they eventually come to a peak and then all close.
Comment 114 Reed Loden [:reed] (use needinfo?) 2007-11-27 01:08:42 PST
Created attachment 290366 [details]
craziness (#2)

another one
Comment 115 Reed Loden [:reed] (use needinfo?) 2007-11-27 01:16:51 PST
I seem to get this when I have lots of tinderbox tabs open. Note that http://tinderbox.mozilla.org/Firefox/ has ~10 different http refreshes in it (the main page and all the tiny tinderboxen on the left side). I've had it happen at least 5-6 times now (maybe more). I thought it was mochitest doing strange things, but it wasn't. This bug caused me to file bug 403040.
Comment 116 Robert Bradbury 2007-11-27 04:38:23 PST
Reed, if the page you cite indeed contains that many HTTP refresh commands then that would be a very good way to trigger this problem.

It is very clear to me what the fix needs to be.

Before one redraws a window/tab (which internally appears to be a delete and recreate window operation) one has to delete any pending HTTP refresh and any Javascript equivalents.  One cannot have asynchronous operations attempting to redraw a window which has just been destroyed.
Comment 117 David Baron :dbaron: ⌚️UTC-10 2007-11-27 09:41:59 PST
I also saw this again recently; it may be due to the massive pixmap leaks in bug 403481 (which vlad fixed yesterday).
Comment 118 Michael Ventnor 2007-11-27 12:45:14 PST
Any way to reproduce this? I've never seen anything like this before.
Comment 119 Robert Bradbury 2007-11-27 17:35:17 PST
Michael, it helps to be strongly up against the system limits.  I do not know how this problem displays under Windows as I believe it is an X-windows interface problem.  I could reproduce it relatively frequently when Firefox was consuming 60-70% of system memory and if I was running Gentoo emerges on various programs (and thus ~100% CPU usage) at the same time.

It is a combination of the X architecture (that one can submit actions to disembodied windows) with Firefox activities which place an unusual load on activity.  Linux is not extremely responsive to "paging on demand", so an excessively large Firefox heap is going to stress this and delay responses to adding or deleting anything from the Firefox heap memory space (because there will be delays in paging things in or out).  And thus Firefox "delete" and "recreate" windows may have long time windows and allow for interruption by async windows operations -- which is what bothers the X windows manager -- remember the messages are about operations on "deleted windows".

The way to test for this is a stress test on Firefox HTTP and/or javascript page refresh commands when you are stressing the system under high load conditions.  If Firefox 3.0 has not been stress tested to the max, i.e. what are the limits to reliable page refreshes under *Linux* [1], then IMO it should not be released.

1. At one point I started to write a test page which would evolve continually decreasing times for HTTP refresh and/or javascript page refresh commands.  I never finished it but it seems to be quite feasible.  At some point such a diagnostic should swamp one's system.  If your system is sufficiently loaded with other processes I believe that will trigger the observed bug.
Comment 120 Tony Mechelynck [:tonymec] 2007-11-28 06:58:07 PST
(In reply to comment #119)
For a stress test of one kind of page refreshes, I suggest opening various Tinderbox pages in increasingly many tabs: links to such pages can be found at http://tinderbox.mozilla.org/showbuilds.cgi (Note: Some pages are more active than others). For the other kind, I suppose some repeating-reload kind of pages could similarly be crafted, then loaded in several tabs.
Comment 121 Tony Mechelynck [:tonymec] 2007-11-28 07:00:20 PST
P.S. Of course, to avoid "DoD attacks", use _copies_ of the original pages.
Comment 122 Tony Mechelynck [:tonymec] 2007-11-28 07:01:43 PST
P.P.S. :-( I meant DoS attacks.
Comment 123 Howard Chu 2007-11-28 08:16:45 PST
I've found that running Seamonkey inside gdb slows Mozilla down enough to cause the problems to occur much more often.

E.g. invoke as "seamonkey -g" and then just "run". Gdb prints a message every time a thread starts or exits, and this appears to be enough overhead to trip things up.
Comment 124 D. Hugh Redelmeier 2007-11-28 13:26:41 PST
I have a dual-core AMD system running Fedora Core 7 (and before that, FC6).  I used to get this kind of FireFox crash every few days.  After I turned off one core I seem to get a lot fewer.  I cannot be sure that turning off the core was the cause of fewer FireFox misbehaviours -- it could be a coincidence of time.

(I turned off one core because of a Linux kernel bug that showed up this spring.)

It seems to me that this mildly supports the race condition theory.
Comment 125 Robert Bradbury 2007-11-29 04:49:46 PST
The conditions documented by "D. Hugh" indicate the potential problems of a multi-CPU system with software not designed for such.  The problem is that when an async window/tab refresh comes in the second CPU may get it while the first CPU is dealing with the process of deleting and redrawing the first window.

You have to delete the async refresh operations before you attempt to redraw a tab or window.

The X server is taking things stuck into a queue and if you stick a "delete window" operation into the queue and then an async operation sticks a "do something with that window" into the queue it is not surprising that problems result.

The X (windows) server I believe is working ok.  It is Firefox which is not recognizing that it is attempting operations on windows in the process of being deleted.

There should either be a block on operations on windows being deleted or there should be an elimination of refresh operations on windows being deleted.
Comment 126 Ed Anderson 2007-11-29 06:08:52 PST
Note that this bug is not limited to multi-CPU systems.  I was only on a single-CPU when I experienced this problem.
Comment 127 erik red 2007-11-30 16:03:30 PST
I can also confirm that the bug exists independently of whether a single-core or dual-core CPU is being used. To me, the bug has NOT been tripped more frequently after I got a dual core system, rather the other way around if anything. That observation does in some sense rhyme with Robert Bradbury's earlier observations that higher CPU load matters -- it is likely that a dual core is more lightly loaded.
Comment 128 Christian Persch (GNOME) (away; not receiving bug mail) 2007-12-02 16:43:56 PST
*** Bug 368260 has been marked as a duplicate of this bug. ***
Comment 129 erik red 2007-12-11 10:38:52 PST
Here's a twist that may be useful for differentially debugging this problem:

I recently switched to fedora8_x86_64, but I stayed with a 32b version of firefox, because it was easier to get Java to work that way.

Since this change, I have NOT seen the bug. No more disembodied windows, no gtk console messages. Instead, however, firefox has started crashing regularly, and it tends to correlate with visiting previously mentioned popular sites that tend to exercise the original bug.

To complicate matters, this is with a slightly newer firefox binary (2.0.0.10), but presumably someone else can confirm that 2.0.0.10 is still buggy when run on a 32b linux.

Here's the plain and simple error message:
/usr/lib/firefox-2.0.0.10/run-mozilla.sh: line 131: 31854 Segmentation fault      (core dumped) "$prog" ${1+"$@"}


Comment 130 timeless 2007-12-25 08:44:11 PST
*** Bug 409059 has been marked as a duplicate of this bug. ***
Comment 131 Robert Bradbury 2008-01-19 08:25:31 PST
(In reply to comment #129)

> I recently switched to fedora8_x86_64, but I stayed with a 32b version of
> firefox, because it was easier to get Java to work that way.

Erik, see my comment #48 under Bug 244482 (I didn't realize it had been marked duplicate) regarding the probability of the bug *not* appearing on multi-core (or perhaps simply faster) CPUs.  I view 64 bit kernels & libraries (including the X server) as being inherently faster than 32-bit equivalents due to the increased number of registers available on the 64 bit architecture compared with the 32 bit architecture.  So it may simply be due to the fact that critical aspects of the system (like how fast X is processing window operations) run faster and make it harder to trigger the bug.
Comment 132 Robert Bradbury 2008-01-19 08:56:06 PST
Note: This is using Epiphany rather than Firefox, but the symptoms are the same.

I have definitely confirmed that the NY Times home page (www.nytimes.com) can trigger this error.  I was doing other work and this morning a minimized window sitting on the NY Times home page sprang an "untitled" window with a refreshed home page.  The log file contained the typical "Gdk-WARNING **: GdkWindow 0x2123dc0 unexpectedly destroyed" followed by 6 Gdk-CRITICAL/GLib-GObject-CRITICAL warnings.

I tried to save the "Untitled Window" and that did not work.  Going back to the original window and executing a "save" did work, but the "Save" window failed to exit properly (I think the entire browser window set was effectively hung).  Trying to minimize the now dysfunctional "save" pop-up window seemed to result in: "Gdk-CRITICAL **: gdk_window_hide: assertion `GDK_IS_WINDOW (window)' failed" (followed by 3 Gtk-CRITICAL/GLib-GObject-CRITICAL warnings, similar to those when the "Untitled Window" first appears).

Killing the original NY Times home page window (clicking on the window X) deleted the original window, the untitled window and the save pop-up window (at least there is a work-around).  Of course it's a bad work-around if you have other useful tabs in the same window as the one causing the problem (though I believe killing the dysfunctional tab might have worked).

Now, I've looked at the homepage window and it does not have a:
  meta http-equiv="refresh"
command to refresh the window using HTML.  Given that I don't think I've seen the error occur when I have Javascript disabled, I think the NY Times is using a javascript window refresh timeout.

It is worth noting that I think you could reproduce the bug (at least on a 32 bit single core CPU) if one simply copied down a number of pages (newspaper or TV station "homepages" might be a good bet) and hacked them to contain a line like
   <meta http-equiv="refresh" content="1">
The problem is getting the full impact of loading a time-consuming, network-bandwidth-limited page.  This probably requires something like:
    content="1;url=http://www.nytimes.com/"
    content="1;url=http://www.washingtonpost.com/"
     etc.
Then the problem is how to get it to repeat itself.  It might require one master reloading file which sets up multiple foreign-site reloads.

But I think if you do something like this and max out either the CPU or the network bandwidth you should eventually get to the point where the bug becomes reproducible.
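
A minimal sketch of the "master reloading file" idea described above, written in Python. It only generates static HTML files. The layout (one small wrapper page per site that reloads itself via meta refresh and embeds the site in an iframe, plus a master page that frames all the wrappers) is an assumption made here to get the reload to repeat; it is not something the thread confirms will trigger the bug. The names wrapper_page, master_page, write_testcase and the file names are hypothetical.

```python
import os
from html import escape

# The two URLs are the ones named in the comment above; extend the list as needed.
SITES = ["http://www.nytimes.com/", "http://www.washingtonpost.com/"]


def wrapper_page(url, seconds=1):
    """Return HTML that reloads itself every `seconds` and embeds `url` in a frame."""
    return (
        "<html><head>\n"
        f'<meta http-equiv="refresh" content="{seconds}">\n'
        "</head><body>\n"
        f'<iframe src="{escape(url, quote=True)}" width="100%" height="600"></iframe>\n'
        "</body></html>\n"
    )


def master_page(wrapper_names):
    """Return HTML that frames every wrapper page so they all reload side by side."""
    frames = "\n".join(
        f'<iframe src="{escape(name, quote=True)}"></iframe>' for name in wrapper_names
    )
    return f"<html><body>\n{frames}\n</body></html>\n"


def write_testcase(directory, sites=SITES):
    """Write wrapperN.html for each site plus master.html; return the wrapper names."""
    os.makedirs(directory, exist_ok=True)
    names = []
    for i, url in enumerate(sites):
        name = f"wrapper{i}.html"
        with open(os.path.join(directory, name), "w", encoding="ascii") as fh:
            fh.write(wrapper_page(url))
        names.append(name)
    with open(os.path.join(directory, "master.html"), "w", encoding="ascii") as fh:
        fh.write(master_page(names))
    return names
```

Calling write_testcase("testcase") and opening testcase/master.html in the browser would start all the reload loops at once. Sites that refuse to be framed will simply show an empty frame.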

It is also useful to note that the NY Times home page includes the lines:
   <meta http-equiv="Expires" content="0">
   <meta http-equiv="Pragma" content="no-cache">
which I believe function to prevent caching of the page contents.  Generally speaking, if you have a fast network connection, the pages you are loading should probably contain such lines.  If, on the other hand, you max out your network bandwidth before you max out your CPU (or memory), you may want to try loading pages which are more static and can be cached (to reduce the network load).
Comment 133 Paul Brannan 2008-01-19 09:30:30 PST
> Killing original NY Times home page window (clicking on the window X) deleted
> the original window, the untitled window and the save pop-up window (at least
> there is a work-around).  Of course its a bad work around if you have other
> useful tabs in the same window as the one causing the problem (though I
> believe killing the dysfunctional tab might have worked).

Usually killing the tab also destroys the unwanted window, for me.  Sometimes it takes down firefox, but I think that's because this bug is usually triggered in low-memory situations.
Comment 134 Robert Bradbury 2008-01-19 09:39:01 PST
The NY Times Javascript timeout function appears to be in the file:
  http://graphics8.nytimes.com/js/home/screen/common.js
I fetched it using "wget".

It looks like it times out every 15 minutes.  I still haven't figured out where it
gets called from (perhaps it is simply set up when the common.js file is loaded from the home page).  It gives you a good idea of how to set up the timeouts
using Javascript (which I don't speak).

Since Javascript appears to have a millisecond timer vs. HTML which uses seconds,
one ought to be able to max out the machine by having a function like the NY
Times timer function start with a 5 second refresh, then deduct 10-100
milliseconds for each successive refresh until the machine gets maxed out.
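
The shrinking-timer idea above could be sketched roughly as follows, in Python that emits the page text. The 5000 ms start and the 10-100 ms step range come from the comment; the 50 ms default step, the 100 ms floor, the "?delay=" query parameter and the function names are assumptions made for illustration. The NY Times common.js code is not reproduced here.

```python
def next_delay_ms(current_ms, step_ms=50, floor_ms=100):
    """Deduct one step from the delay, never going below the floor."""
    return max(floor_ms, current_ms - step_ms)


def refresh_schedule(start_ms=5000, step_ms=50, floor_ms=100):
    """Yield each successive refresh delay, ending once the floor is reached."""
    delay = start_ms
    while True:
        yield delay
        if delay <= floor_ms:
            return
        delay = next_delay_ms(delay, step_ms, floor_ms)


def timer_page(delay_ms, step_ms=50, floor_ms=100):
    """Return HTML whose script reloads the page after delay_ms with a shorter delay."""
    nxt = next_delay_ms(delay_ms, step_ms, floor_ms)
    return (
        "<html><body>\n"
        f"<p>reloading in {delay_ms} ms</p>\n"
        "<script>\n"
        f'setTimeout(function () {{ window.location.href = "?delay={nxt}"; }}, {delay_ms});\n'
        "</script>\n"
        "</body></html>\n"
    )
```

A server would need to read the delay parameter and answer with timer_page(delay) for the loop to continue; a static file would just keep reloading with the same delay.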
Comment 135 Robert O'Callahan (:roc) (email my personal email if necessary) 2008-01-20 13:50:05 PST
There are so many comments here that it's going to be hard to get anything done.

What we need here is a testcase that will reproduce the bug, not just for one person but for many or hopefully all people. This may involve writing HTML or possibly even using a Python web server. Or you may be able to get away with enabling popups and using window.open and document.write. Please let's focus on that and not discuss the details of hardware configurations or speculate about what might be causing the bug.
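
As a starting point for the kind of testcase requested here, a Python web server along these lines might work. It is a sketch only: it serves a page that asks to be refreshed every second while the body is deliberately sent slowly, so that a refresh can fall due while the previous load is still in progress. Whether that timing actually reproduces the bug is not established in this thread. The chunk count, delay, port and all names are assumptions; only the standard-library http.server module is used.

```python
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

REFRESH_SECONDS = 1   # assumption: the 1-second meta refresh discussed earlier
FILLER_CHUNKS = 20    # assumption: number of body pieces sent one at a time
CHUNK_DELAY = 0.1     # assumption: seconds to sleep between pieces


def page_chunks(refresh_seconds=REFRESH_SECONDS, filler_chunks=FILLER_CHUNKS):
    """Return the page as a list of byte strings: head, filler blocks, tail."""
    head = (
        "<html><head>"
        f'<meta http-equiv="refresh" content="{refresh_seconds}">'
        '<meta http-equiv="Pragma" content="no-cache">'
        "<title>slow refresh testcase</title></head><body>\n"
    )
    chunks = [head.encode("ascii")]
    for i in range(filler_chunks):
        chunks.append(f"<p>filler block {i}</p>\n".encode("ascii"))
    chunks.append(b"</body></html>\n")
    return chunks


class SlowRefreshHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Cache-Control", "no-cache")
        self.end_headers()
        try:
            for chunk in page_chunks():
                self.wfile.write(chunk)
                self.wfile.flush()
                time.sleep(CHUNK_DELAY)  # simulate a bandwidth-limited page
        except (BrokenPipeError, ConnectionResetError):
            return  # the browser abandoned this load, e.g. to refresh again


def serve(port=8000):
    """Serve the slow page on localhost until interrupted."""
    server = ThreadingHTTPServer(("127.0.0.1", port), SlowRefreshHandler)
    try:
        server.serve_forever()
    finally:
        server.server_close()
```

Call serve() and open http://127.0.0.1:8000/ in many tabs to apply load; nothing is started until serve() is called.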
Comment 136 Robert Bradbury 2008-01-21 14:16:38 PST
Please note Bug #413390 and the NYT-test.sh attachment to it.  In a perfect world, i.e. if Firefox could launch tabs until one's swap space was exhausted, I suspect that script (or multiple invocations thereof) would in fact provide the test case ":roc" desires.  The script might provide a test case if one increases MAXSESSIONS to 250+ and increases INTERVAL to 30+ but it is going to require running the script for several hours to generate a sufficient number of tabs (running a sufficient number of page refreshes) that an "untitled window" may appear.

I suspect the INTERVAL and MAXSESSIONS values are going to be highly CPU dependent.  The goal is to generate a sufficiently large number of asynchronous Javascript timeouts that one or more of them will expire before a previous page refresh (delete and redraw) operation has completed, leading to the GDK errors and the "untitled window".

To the best of my knowledge there is no way to obtain a "ps" within Firefox for currently pending Javascript timeouts.  This is another "bug".
Comment 137 :Mook 2008-02-21 07:35:24 PST
*** Bug 365734 has been marked as a duplicate of this bug. ***
Comment 138 Tony Mechelynck [:tonymec] 2008-03-23 04:17:46 PDT
*** Bug 367832 has been marked as a duplicate of this bug. ***
Comment 139 Robert Bradbury 2008-04-04 08:02:39 PDT
Please note the creation of Bug #427024, using a very recent release of Firefox 3.0pre.  That bug provides both a sessionstore.js file and firefox log files for a reproducible case (at least within an existing firefox session) of the "window unexpectedly destroyed" and "untitled window" problems using Gmail, which means the problem is being generated using Javascript.  This is slightly different from many of the problems reported under this bug, which are typically generated by window redraw commands from HTTP timeouts (sometimes used by news providers, advertisers, etc.).
Comment 140 Tony Mechelynck [:tonymec] 2008-04-19 15:26:09 PDT
*** Bug 399436 has been marked as a duplicate of this bug. ***
Comment 141 Robert Bradbury 2008-04-24 04:44:26 PDT
Created attachment 317518 [details]
GDB log of window unexpectedly destroyed errors

Ok, here you go: finally, after more than a year of encountering this problem, here is a set of stack traces.  This is a "current" firefox (CVS compiled 29 Mar 08 / version 3.0pre).  Firefox itself, gdk+, glib and libc are compiled with debug symbols (-g2).

The firefox session has been running 3 days (since the system was rebooted).  It was a restart of a previous long running session so it currently has 53 windows and 445 tabs open.

The problem is gmail is completely dysfunctional!  The symptoms first appeared when an old (working) gmail window could not compose a message.  One could enter the To: line and the Subject: line but the window would fail to echo text typed into the main message body.  One could discard the partial message and start a new message and it exhibited the same problem.  One could close the old gmail window and attempt to open a new gmail window and that would produce the standard "GdkWindow ... unexpectedly destroyed" messages *consistently*.  There are 14 GdkWindow warnings followed by 3 gdk_x11_visual_get_xvisual warnings.  I presume these are due to the initial Gmail Javascript setup code.

It should be noted that a gmail "window" is displayed, with a title "Gmail - Inbox(4) - robert.bradbury@gmail.com - Mozilla Firefox" but no text is displayed within the window.

The debugger was attached to firefox, some breakpoints were set and deleted when they were determined to be too "chatty".  The last ~18 backtraces involved the g_log messages resulting from a fresh gmail window restart.

I will attempt to keep this firefox/gdb session open in the hope that someone who understands widget/src/gtk2/nsWindow.cpp : nsWindow::Destroy() (and the gtk/glib code) will contact me so this problem can finally be debugged.

It should be noted, that even with a troublesome sessionstore.js file (many windows and tabs) it still usually takes several days of use to get Firefox into the window destroying problem state.
Comment 142 Robert Bradbury 2008-04-24 05:13:41 PDT
Created attachment 317520 [details]
Window destroyed problems opening new tabs

Ok, here are the Firefox traces from the same problematic firefox session.  In this case however, the problem is not gmail, instead it is opening new tabs from http://www.sciencedaily.com/.  There was a previous window with tabs open to the sciencedaily home page.  Sciencedaily is normally accessed with Noscript on (so there should be no Javascript running from that site).  Various URLs from the home page were right-clicked and opened in a new tab.  Each of these results in a single window unexpectedly destroyed breakpoint & message.  It appears that it is complaining about the window normally opened when one right-clicks on a URL as the breakpoint/message occurs after that window is opened & closed but before the new tab associated with the new URL is created.
Comment 143 Robert Bradbury 2008-04-24 05:27:02 PDT
Regarding the "semi-responsiveness" of the hung gmail window (mentioned in Attachment #317518 [details]).

It may be worth noting that although the Firefox gmail window contents is "dead" (i.e. the window body displays the contents of what was previously at that location on the monitor), it is still "alive".

One can move the window around on the monitor, can move it between workspaces and interestingly enough it is still communicating with gmail.  If I use Galeon to access my gmail mailbox and send myself a test message the title bar on the window does change from 4 unread messages to 5 unread messages.  The time for this to happen however is some number of seconds, significantly longer than for the same change to be reflected in the Galeon window for my mailbox.

Comment 144 Robert Bradbury 2008-05-24 22:14:15 PDT
Ok, here is the stack trace from an "unexpectedly destroyed" error in the context of returning from a gmail message back to the Inbox index:

Breakpoint 1, IA__g_logv (log_domain=0xb79e98f4 "Gdk", log_level=G_LOG_LEVEL_CRITICAL, 
    format=0xb783deec "%s: assertion `%s' failed", args1=0xbfb43510 "x���� ��\b") at gmessages.c:396
396	  gboolean was_fatal = (log_level & G_LOG_FLAG_FATAL) != 0;
(gdb) thread apply bt all
(gdb) thread apply all bt

Thread 7 (Thread 0xb67a8b90 (LWP 12544)):
#0  0xb7f83410 in __kernel_vsyscall ()
#1  0xb71e65b7 in *__GI___poll (fds=0xb67a7f98, nfds=2, timeout=65535000)
    at ../sysdeps/unix/sysv/linux/poll.c:87
#2  0xb7dffe5a in PR_Poll () from /usr/local/lib/firefox-3.0pre/libnspr4.so
#3  0x080d0a0c in ?? ()
#4  0x08cbd450 in ?? ()
#5  0x00000002 in ?? ()
#6  0x03e7fc18 in ?? ()
#7  0xb7dfe516 in PR_ExitMonitor () from /usr/local/lib/firefox-3.0pre/libnspr4.so
#8  0x080d137d in ?? ()
#9  0x08cbcf70 in ?? ()
#10 0x00000001 in ?? ()
#11 0xb67a8208 in ?? ()
#12 0xb7eb7ff4 in ?? () from /usr/local/lib/firefox-3.0pre/libxpcom_core.so
#13 0x08cbd7b8 in ?? ()
#14 0x00000001 in ?? ()
#15 0xb67a8218 in ?? ()
#16 0xb7e81dd7 in ?? () from /usr/local/lib/firefox-3.0pre/libxpcom_core.so
#17 0x08cbd7d8 in ?? ()
#18 0x00000000 in ?? ()

Thread 6 (Thread 0xb5f75b90 (LWP 12545)):
#0  0xb7f83410 in __kernel_vsyscall ()
#1  0xb7f73b12 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /usr/local/lib/libpthread.so.0
#2  0xb7dfd3a5 in ?? () from /usr/local/lib/firefox-3.0pre/libnspr4.so
#3  0x08c3497c in ?? ()
#4  0x08c58410 in ?? ()
#5  0xb5f7528c in ?? ()
#6  0xb7f745f5 in __pthread_getspecific (key=93793) at pthread_getspecific.c:27
#7  0xb7dfe194 in PR_WaitCondVar () from /usr/local/lib/firefox-3.0pre/libnspr4.so
#8  0xb7e8646f in ?? () from /usr/local/lib/firefox-3.0pre/libxpcom_core.so
#9  0x08c34978 in ?? ()
#10 0x00051fb9 in ?? ()
#11 0x08c58410 in ?? ()
#12 0xb7eb7ff4 in ?? () from /usr/local/lib/firefox-3.0pre/libxpcom_core.so
#13 0x08e2a420 in ?? ()
#14 0x00000000 in ?? ()

Thread 5 (Thread 0xb4679b90 (LWP 12549)):
#0  0xb7f83410 in __kernel_vsyscall ()
#1  0xb7f737e5 in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/local/lib/libpthread.so.0
#2  0xb7dfe226 in PR_WaitCondVar () from /usr/local/lib/firefox-3.0pre/libnspr4.so
#3  0x088daa5a in ?? ()
#4  0x090aeb18 in ?? ()
#5  0xffffffff in ?? ()
#6  0xb1c8b2f0 in ?? ()
#7  0xb1746cb0 in ?? ()
#8  0x08bff4a8 in ?? ()
#9  0x00000000 in ?? ()

Thread 4 (Thread 0xb4e7ab90 (LWP 12550)):
#0  0xb7f83410 in __kernel_vsyscall ()
#1  0xb7f737e5 in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/local/lib/libpthread.so.0
#2  0xb7dfe226 in PR_WaitCondVar () from /usr/local/lib/firefox-3.0pre/libnspr4.so
#3  0x088c6de3 in ?? ()
#4  0x09290278 in ?? ()
#5  0xffffffff in ?? ()
#6  0x092901dc in ?? ()
#7  0xb7e0cff4 in ?? () from /usr/local/lib/firefox-3.0pre/libnspr4.so
#8  0x092902b8 in ?? ()
#9  0x00000000 in ?? ()

Thread 3 (Thread 0xb2593b90 (LWP 12563)):
#0  0xb7f83410 in __kernel_vsyscall ()
#1  0xb7f737e5 in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/local/lib/libpthread.so.0
#2  0xb7dfe226 in PR_WaitCondVar () from /usr/local/lib/firefox-3.0pre/libnspr4.so
#3  0xb7dfe287 in PR_Wait () from /usr/local/lib/firefox-3.0pre/libnspr4.so
#4  0xb7e812f5 in nsEventQueue::GetEvent () from /usr/local/lib/firefox-3.0pre/libxpcom_core.so
#5  0xb7e8243d in ?? () from /usr/local/lib/firefox-3.0pre/libxpcom_core.so
#6  0x09eb3868 in ?? ()
#7  0x00000001 in ?? ()
#8  0xb2593314 in ?? ()
#9  0x00000000 in ?? ()

Thread 1 (Thread 0xb6eca6d0 (LWP 12543)):
#0  IA__g_logv (log_domain=0xb79e98f4 "Gdk", log_level=G_LOG_LEVEL_CRITICAL, 
    format=0xb783deec "%s: assertion `%s' failed", args1=0xbfb43510 "x���� ��\b") at gmessages.c:396
#1  0xb77ed9b9 in IA__g_log (log_domain=0xb79e98f4 "Gdk", log_level=G_LOG_LEVEL_CRITICAL, 
    format=0xb783deec "%s: assertion `%s' failed") at gmessages.c:517
#2  0xb77edbfc in IA__g_return_if_fail_warning (log_domain=0xb79e98f4 "Gdk", 
    pretty_function=0xb7a0ae78 "gdk_window_set_back_pixmap", expression=0xb7a020b3 "GDK_IS_WINDOW (window)")
    at gmessages.c:532
#3  0xb79de9c2 in IA__gdk_window_set_back_pixmap (window=0x0, pixmap=0x0, parent_relative=0)
    at gdkwindow-x11.c:3059
#4  0x08268ecb in ?? ()
#5  0x00000000 in ?? ()
Comment 145 Robert Bradbury 2008-05-24 22:26:56 PDT
Another example with clearer traces:
(the former may involve extended Glib errors while this involves current glib errors.)

Breakpoint 1, IA__g_logv (log_domain=0xb79e98f4 "Gdk", log_level=G_LOG_LEVEL_WARNING, 
    format=0xb7a0a2e4 "GdkWindow %#lx unexpectedly destroyed", 
    args1=0xbfb43cfc "?M?\001\210??\0368??\b\v\023\236?????\001") at gmessages.c:396
396	  gboolean was_fatal = (log_level & G_LOG_FLAG_FATAL) != 0;
(gdb) bt
#0  IA__g_logv (log_domain=0xb79e98f4 "Gdk", log_level=G_LOG_LEVEL_WARNING, 
    format=0xb7a0a2e4 "GdkWindow %#lx unexpectedly destroyed", 
    args1=0xbfb43cfc "?M?\001\210??\0368??\b\v\023\236?????\001") at gmessages.c:396
#1  0xb77ed9b9 in IA__g_log (log_domain=0xb79e98f4 "Gdk", log_level=G_LOG_LEVEL_WARNING, 
    format=0xb7a0a2e4 "GdkWindow %#lx unexpectedly destroyed") at gmessages.c:517
#2  0xb79e1477 in IA__gdk_window_destroy_notify (window=0x206e51e8) at gdkwindow-x11.c:1222
#3  0xb79c6c68 in gdk_event_translate (display=0x8c18018, event=0x2211c2e8, xevent=0xbfb43e5c, 
    return_exposes=0) at gdkevents-x11.c:1744
#4  0xb79c82d7 in _gdk_events_queue (display=0x8c18018) at gdkevents-x11.c:2285
#5  0xb79c879f in gdk_event_dispatch (source=0x8c1f188, callback=0, user_data=0x0) at gdkevents-x11.c:2345
#6  0xb77e47f8 in IA__g_main_context_dispatch (context=0x8c1f1d0) at gmain.c:2009
#7  0xb77e7a4e in g_main_context_iterate (context=0x8c1f1d0, block=0, dispatch=1, self=0x8c01c48)
    at gmain.c:2642
#8  0xb77e7f9c in IA__g_main_context_iteration (context=0x8c1f1d0, may_block=0) at gmain.c:2705
#9  0x08246aec in ?? ()
#10 0x00000000 in ?? ()
Comment 146 Robert Bradbury 2008-05-24 22:44:47 PDT
My recent reports on this bug have been generated by Gmail, which seems adept at generating this bug under heavy load conditions (i.e. 277+ active tabs in the browser).

And so I am saying to the people who wish to verify firefox functionality -- you will not know it until you test it.  My Gmail problems do not seem to appear until I have multiple sites active.
Comment 147 Robert Bradbury 2008-05-24 22:59:15 PDT
Let us seriously discuss this question.  The bug has been here for 4+ years and has still not been resolved.  Therefore it must be an issue between the Mozilla developers and the X developers, who do not choose to cross-pollinate with respect to potential X bugs.  OR we must generally consent to the fact that the masses are generally immune to Linux and proceed along their general way.
Comment 148 Robert Bradbury 2008-05-25 00:03:50 PDT
As this has been a Firefox bug for 4+ years and is still unresolved, I feel compelled to point out that it appears in a relatively static mode with respect to the glib stack dumps and their position.

The fundamental problem appears to be "do this operation on window X" when window X and its subunits have been deleted.

That requires a commitment from the release "gods" of firefox 3.0 that they will not release it with "known bugs on deck".  It is insufficient if a program can be claimed to work for Windows and not for Linux.  IMO, that is a non-functionable.
Comment 149 Robert Bradbury 2008-05-25 12:09:01 PDT
Created attachment 322449 [details]
Set of gdb stack traces of destroyed windows

Here is a set of gdb stack traces of firefox throwing the "window unexpectedly destroyed" bug (plus a few other glib errors).  The URLs involved were gmail (searching ones own mailbox) and the Internet Movie Database (www.imdb.com).

I've got a firefox setup *NOW* which is regularly throwing these errors into gdb.  It will not get resolved until someone, presumably someone who understands Firefox's javascript enabled use of windows timers, creation and destruction, contacts me for further information.

I can verify that this is currently the *real* bug, because in gmail when it is throwing bugs I see it create and subsequently delete the little untitled windows before it returns to the main screen.
Comment 150 Robert Bradbury 2008-06-03 02:09:36 PDT
It also may be of use to look at Bug #437021 which is a distinct bug of its own because it relates to Firefox SEGFAULTing under Linux (repeatable as I have 5+ traces involving the problem with the associated crashes of Firefox), only the most recent of which involved getting a gdb trace.  But Firefox *was* in the state where it was repeatedly throwing the "window unexpectedly destroyed" messages and it was being generated usually from "gmail" which probably means an improperly handled Javascript window timeout (or destruction) problem.
Comment 151 Cameron McCormack (:heycam) 2008-07-15 16:22:08 PDT
I used to get this kind of behaviour before Firefox 3, where occasionally I'd get a tab's frame open in a separate top-level window (with no window title, and strangely isn't focussable -- using Sawfish as my window manager).  Since Firefox 3 I don't get this as much, but I have noticed it happening with Flash sometimes.  I have the FlashBlock extension running.  Sometimes the new top-level window has the Flash object running in it (despite the fact that the replacement graphic in the main page's window is showing), and sometimes it is an empty, grey window.  Sorry I haven't got any more useful information to provide.
Comment 152 Robert Bradbury 2008-12-07 11:20:48 PST
Maybe, just maybe, I have located at least one source of this problem.  People plagued by this over the years may want to look at Bug #467744.

But what I am seeing in that bug is consistent with this bug.  It depends entirely on *when* the thread destroying the parent window marks it as "destroyed".  The gdk/gtk libraries seem to have this interesting feature that the windows don't immediately disappear when they are "destroyed" but are simply marked as such for a period of time.  Of course if one is able to create a new window as a "child" of a window in the process of being destroyed then one is likely to end up with the "orphan" windows we see with this bug.

The asynchronous (multi-threaded) aspect of window creation and destruction is why this bug was/is so sensitive to the machine CPU/memory (swapping) usage and so difficult to get a handle on.
Comment 153 David Baron :dbaron: ⌚️UTC-10 2008-12-07 11:37:48 PST
(In reply to comment #152)
> The asynchronous (multi-threaded) aspect of window creation and destruction is
> why this bug was/is so sensitive to the machine CPU/memeory (swapping) usage
> and so difficult to get a handle on.

As I said in comment 78, all of Mozilla's interaction with GTK/Gdk/X11 is on a single thread.
Comment 154 erik red 2008-12-23 17:35:49 PST
David Baron,

But what if one has multiple firefoxes all pounding on GTK/Gdk/X11 at the same time?  Could that be part of the problem? I just got an Untitled window again, this time in Fedora 10 64b with 5 instances of firefox 3.0.4 running and maybe 500 tabs altogether.

On a possibly related note, in fedora 10 my X11 process has been going wild using up 11-12G of main memory, and there appears to be a correlation with whether the browsers are started sequentially/gradually or all loaded up at once (maximum stress on the GTK/Gdk/X11 window system). This is with Nvidia Geforce6100 or 7200, and either the public (aka. nv) or the proprietary (aka. nvidia) driver.
Comment 155 Robert Bradbury 2008-12-24 14:59:59 PST
Erik/David, related to your comments regarding the problem, see my comments on Bug #467744 # 6.

Regarding David's claim that the GDK access is single threaded: I want the function names that ensure this.  I have been reading C since 1974, and I have actually met both Dennis Ritchie and Ken Thompson at various points.  You can claim **** but this is a "trust but verify" world.  One of the shortcomings IMO of the mozilla perspective is that they do *NOT* have a perspective for bringing one "up-to-speed".

Getting back to Erik's points, there is a question of whether or not asynchronous processes (threads) get to address the display manager (GDK).  Given his many valid points about when and how the display manager may be addressed, there is the issue of how one is managing that.  (Note that I do not see messages between the Firefox developers and the GDK/GTK developers revealing that they might understand the capabilities and limits of their software systems, which, when you are attempting to operate on a deleted window, clearly shows you do not understand.)
Comment 156 Karl Tomlinson (back Dec 13 :karlt) 2009-05-25 01:41:47 PDT
I'd expect this to result from http://bugzilla.gnome.org/show_bug.cgi?id=581526

Changes to gdk_window_new before gtk+-2.14.0 would have meant that the crash of bug 467744 resulted instead:
http://git.gnome.org/cgit/gtk+/commit/?h=gtk-2-14&id=4111cf2065e1f7edc614a936f1fa35e750f13a0f

But gtk+-2.15.1 and newer will probably start showing these symptoms again as the crash of bug 467744 is patched up here:
http://git.gnome.org/cgit/gtk+/commit/?h=gtk-2-16&id=27d8d8ea2bb815df0733f5d4d57d93542e2c160a
Comment 157 ovemen 2009-06-10 09:24:02 PDT
It used to happen to me every time on RH9 and FC6, using Firefox 1.5 (and I believe also 2.0), after the browser was used for a day or so.

Now, with FC6, FF 3.0.10 it doesn't happen very often, but it happens, especially recently.

** (evince:16737): WARNING **: Unimplemented named action: POPPLER_DEST_FITBH, please post a bug report in Evince bugzilla (http://bugzilla.gnome.org) with a testcase.

** (evince:16737): WARNING **: Unimplemented named action: POPPLER_DEST_FITBH, please post a bug report in Evince bugzilla (http://bugzilla.gnome.org) with a testcase.

** (evince:16737): WARNING **: Unimplemented named action: POPPLER_DEST_FITBH, please post a bug report in Evince bugzilla (http://bugzilla.gnome.org) with a testcase.

** (evince:16737): WARNING **: Unimplemented named action: POPPLER_DEST_FITBH, please post a bug report in Evince bugzilla (http://bugzilla.gnome.org) with a testcase.

(Gecko:28775): Gtk-CRITICAL **: gtk_drag_set_icon_pixbuf: assertion `GDK_IS_DRAG_CONTEXT (context)' failed

(Gecko:28775): Gdk-WARNING **: GdkWindow 0x28d6e54 unexpectedly destroyed

(Gecko:28775): Gdk-WARNING **: GdkWindow 0x28d6e47 unexpectedly destroyed

(Gecko:28775): Gdk-WARNING **: GdkWindow 0x28d664f unexpectedly destroyed

(Gecko:28775): Gdk-CRITICAL **: gdk_window_set_user_data: assertion `window != NULL' failed

(Gecko:28775): Gdk-CRITICAL **: gdk_window_set_back_pixmap: assertion `GDK_IS_WINDOW (window)' failed

(Gecko:28775): GLib-GObject-CRITICAL **: g_object_set_data: assertion `G_IS_OBJECT (object)' failed

(Gecko:28775): GLib-GObject-CRITICAL **: g_object_set_data: assertion `G_IS_OBJECT (object)' failed

(Gecko:28775): Gdk-CRITICAL **: gdk_x11_visual_get_xvisual: assertion `visual != NULL' failed

(Gecko:28775): Gdk-CRITICAL **: gdk_window_hide: assertion `GDK_IS_WINDOW (window)' failed

(Gecko:28775): Gdk-CRITICAL **: gdk_window_set_user_data: assertion `window != NULL' failed

(Gecko:28775): Gdk-CRITICAL **: _gdk_window_destroy_hierarchy: assertion `window != NULL' failed

(Gecko:28775): GLib-GObject-CRITICAL **: g_object_unref: assertion `G_IS_OBJECT (object)' failed

(Gecko:28775): Gdk-WARNING **: GdkWindow 0x28d6650 unexpectedly destroyed

(Gecko:28775): Gdk-CRITICAL **: gdk_window_set_user_data: assertion `window != NULL' failed

(Gecko:28775): Gdk-CRITICAL **: gdk_window_set_back_pixmap: assertion `GDK_IS_WINDOW (window)' failed

(Gecko:28775): GLib-GObject-CRITICAL **: g_object_set_data: assertion `G_IS_OBJECT (object)' failed

(Gecko:28775): GLib-GObject-CRITICAL **: g_object_set_data: assertion `G_IS_OBJECT (object)' failed

(Gecko:28775): Gdk-CRITICAL **: gdk_x11_visual_get_xvisual: assertion `visual != NULL' failed

(Gecko:28775): Gdk-CRITICAL **: gdk_window_hide: assertion `GDK_IS_WINDOW (window)' failed

(Gecko:28775): Gdk-CRITICAL **: gdk_window_set_user_data: assertion `window != NULL' failed

(Gecko:28775): Gdk-CRITICAL **: _gdk_window_destroy_hierarchy: assertion `window != NULL' failed

(Gecko:28775): GLib-GObject-CRITICAL **: g_object_unref: assertion `G_IS_OBJECT (object)' failed

(Gecko:28775): Gdk-WARNING **: GdkWindow 0x28d6653 unexpectedly destroyed

(Gecko:28775): Gdk-CRITICAL **: gdk_window_set_user_data: assertion `window != NULL' failed

(Gecko:28775): Gdk-CRITICAL **: gdk_window_set_back_pixmap: assertion `GDK_IS_WINDOW (window)' failed

(Gecko:28775): GLib-GObject-CRITICAL **: g_object_set_data: assertion `G_IS_OBJECT (object)' failed

(Gecko:28775): GLib-GObject-CRITICAL **: g_object_set_data: assertion `G_IS_OBJECT (object)' failed

(Gecko:28775): Gdk-CRITICAL **: gdk_x11_visual_get_xvisual: assertion `visual != NULL' failed

(Gecko:28775): Gdk-CRITICAL **: gdk_window_hide: assertion `GDK_IS_WINDOW (window)' failed

(Gecko:28775): Gdk-CRITICAL **: gdk_window_set_user_data: assertion `window != NULL' failed

(Gecko:28775): Gdk-CRITICAL **: _gdk_window_destroy_hierarchy: assertion `window != NULL' failed

(Gecko:28775): GLib-GObject-CRITICAL **: g_object_unref: assertion `G_IS_OBJECT (object)' failed

(Gecko:28775): Gdk-WARNING **: GdkWindow 0x28d6664 unexpectedly destroyed

(Gecko:28775): Gdk-WARNING **: GdkWindow 0x28d6637 unexpectedly destroyed

(Gecko:28775): Gdk-WARNING **: GdkWindow 0x28d6672 unexpectedly destroyed

(Gecko:28775): Gdk-WARNING **: GdkWindow 0x287af57 unexpectedly destroyed

(Gecko:28775): Gdk-CRITICAL **: gdk_x11_visual_get_xvisual: assertion `visual != NULL' failed

(Gecko:28775): Gdk-CRITICAL **: gdk_window_set_user_data: assertion `window != NULL' failed

(Gecko:28775): Gdk-CRITICAL **: gdk_window_set_back_pixmap: assertion `GDK_IS_WINDOW (window)' failed

(Gecko:28775): GLib-GObject-CRITICAL **: g_object_set_data: assertion `G_IS_OBJECT (object)' failed

(Gecko:28775): GLib-GObject-CRITICAL **: g_object_set_data: assertion `G_IS_OBJECT (object)' failed

(Gecko:28775): Gdk-CRITICAL **: gdk_window_move_resize: assertion `GDK_IS_WINDOW (window)' failed

(Gecko:28775): Gdk-CRITICAL **: gdk_window_hide: assertion `GDK_IS_WINDOW (window)' failed

(Gecko:28775): Gdk-CRITICAL **: gdk_window_show_unraised: assertion `GDK_IS_WINDOW (window)' failed

(Gecko:28775): Gdk-WARNING **: GdkWindow 0x287af5a unexpectedly destroyed

(Gecko:28775): Gdk-WARNING **: GdkWindow 0x287af1f unexpectedly destroyed

(Gecko:28775): Gdk-WARNING **: GdkWindow 0x287aeef unexpectedly destroyed

(Gecko:28775): Gdk-WARNING **: GdkWindow 0x287afc4 unexpectedly destroyed

(Gecko:28775): Gdk-WARNING **: GdkWindow 0x287afb5 unexpectedly destroyed

(Gecko:28775): Gdk-CRITICAL **: gdk_window_get_origin: assertion `GDK_IS_WINDOW (window)' failed

(Gecko:28775): Gdk-WARNING **: GdkWindow 0x287ae6f unexpectedly destroyed

(Gecko:28775): Gdk-WARNING **: GdkWindow 0x287ad2c unexpectedly destroyed

(Gecko:28775): Gdk-WARNING **: GdkWindow 0x287ad6b unexpectedly destroyed

(Gecko:28775): Gdk-WARNING **: GdkWindow 0x287ad59 unexpectedly destroyed

(Gecko:28775): Gdk-WARNING **: GdkWindow 0x287ad58 unexpectedly destroyed

(Gecko:28775): Gdk-WARNING **: GdkWindow 0x287ad43 unexpectedly destroyed

(Gecko:28775): Gdk-WARNING **: GdkWindow 0x287afcb unexpectedly destroyed

(Gecko:28775): Gdk-WARNING **: GdkWindow 0x287abf7 unexpectedly destroyed

(Gecko:28775): Gdk-WARNING **: GdkWindow 0x287af28 unexpectedly destroyed

(Gecko:28775): Gdk-WARNING **: GdkWindow 0x287af09 unexpectedly destroyed

(Gecko:28775): Gdk-WARNING **: GdkWindow 0x287afaa unexpectedly destroyed

(Gecko:28775): Gdk-WARNING **: GdkWindow 0x287af86 unexpectedly destroyed

(Gecko:28775): Gdk-WARNING **: GdkWindow 0x287ae0d unexpectedly destroyed

(Gecko:28775): Gdk-WARNING **: GdkWindow 0x287afb6 unexpectedly destroyed

(Gecko:28775): Gdk-WARNING **: GdkWindow 0x287ad81 unexpectedly destroyed

(Gecko:28775): Gdk-WARNING **: GdkWindow 0x287ae48 unexpectedly destroyed

(Gecko:28775): Gdk-WARNING **: GdkWindow 0x287ad88 unexpectedly destroyed

(Gecko:28775): Gdk-CRITICAL **: gdk_window_set_user_data: assertion `window != NULL' failed

(Gecko:28775): Gdk-CRITICAL **: gdk_window_set_back_pixmap: assertion `GDK_IS_WINDOW (window)' failed

(Gecko:28775): GLib-GObject-CRITICAL **: g_object_set_data: assertion `G_IS_OBJECT (object)' failed

(Gecko:28775): GLib-GObject-CRITICAL **: g_object_set_data: assertion `G_IS_OBJECT (object)' failed

(Gecko:28775): Gdk-CRITICAL **: gdk_window_move_resize: assertion `GDK_IS_WINDOW (window)' failed

(Gecko:28775): Gdk-CRITICAL **: gdk_window_hide: assertion `GDK_IS_WINDOW (window)' failed

(Gecko:28775): Gdk-CRITICAL **: gdk_window_set_user_data: assertion `window != NULL' failed

(Gecko:28775): Gdk-CRITICAL **: gdk_window_set_back_pixmap: assertion `GDK_IS_WINDOW (window)' failed

(Gecko:28775): GLib-GObject-CRITICAL **: g_object_set_data: assertion `G_IS_OBJECT (object)' failed

(Gecko:28775): GLib-GObject-CRITICAL **: g_object_set_data: assertion `G_IS_OBJECT (object)' failed

(Gecko:28775): Gdk-CRITICAL **: gdk_window_move_resize: assertion `GDK_IS_WINDOW (window)' failed

(Gecko:28775): Gdk-CRITICAL **: gdk_window_hide: assertion `GDK_IS_WINDOW (window)' failed

(Gecko:28775): Gdk-CRITICAL **: gdk_window_show_unraised: assertion `GDK_IS_WINDOW (window)' failed

(Gecko:28775): Gdk-CRITICAL **: gdk_window_show_unraised: assertion `GDK_IS_WINDOW (window)' failed

(Gecko:28775): Gdk-CRITICAL **: gdk_window_hide: assertion `GDK_IS_WINDOW (window)' failed

(Gecko:28775): Gdk-CRITICAL **: gdk_window_set_user_data: assertion `window != NULL' failed

(Gecko:28775): Gdk-CRITICAL **: _gdk_window_destroy_hierarchy: assertion `window != NULL' failed

(Gecko:28775): GLib-GObject-CRITICAL **: g_object_unref: assertion `G_IS_OBJECT (object)' failed

(Gecko:28775): Gdk-CRITICAL **: gdk_window_hide: assertion `GDK_IS_WINDOW (window)' failed

(Gecko:28775): Gdk-CRITICAL **: gdk_window_set_user_data: assertion `window != NULL' failed

(Gecko:28775): Gdk-CRITICAL **: _gdk_window_destroy_hierarchy: assertion `window != NULL' failed

(Gecko:28775): GLib-GObject-CRITICAL **: g_object_unref: assertion `G_IS_OBJECT (object)' failed

(Gecko:28775): Gdk-WARNING **: GdkWindow 0x287afae unexpectedly destroyed

(Gecko:28775): Gdk-WARNING **: GdkWindow 0x287ad2e unexpectedly destroyed

(Gecko:28775): Gdk-WARNING **: GdkWindow 0x287ad2d unexpectedly destroyed

(Gecko:28775): Gdk-WARNING **: GdkWindow 0x287ada6 unexpectedly destroyed

(Gecko:28775): Gdk-WARNING **: GdkWindow 0x287af00 unexpectedly destroyed

(Gecko:28775): Gdk-WARNING **: GdkWindow 0x287ad30 unexpectedly destroyed

(Gecko:28775): Gdk-CRITICAL **: gdk_window_hide: assertion `GDK_IS_WINDOW (window)' failed

(Gecko:28775): Gdk-CRITICAL **: gdk_window_hide: assertion `GDK_IS_WINDOW (window)' failed

(Gecko:28775): Gdk-CRITICAL **: gdk_window_set_user_data: assertion `window != NULL' failed

(Gecko:28775): Gdk-CRITICAL **: _gdk_window_destroy_hierarchy: assertion `window != NULL' failed

(Gecko:28775): GLib-GObject-CRITICAL **: g_object_unref: assertion `G_IS_OBJECT (object)' failed
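
For anyone comparing stderr logs like the one above, a small Python helper along these lines could condense them. It is a sketch that assumes the line layout seen in this log ("(program:pid): Domain-LEVEL **: message"); the function name summarize and the two-part return value are arbitrary choices. It collects the addresses reported as "unexpectedly destroyed" and counts failed assertions per function. Lines in any other shape, such as the evince warnings, are ignored.

```python
import re
from collections import Counter

# "(Gecko:28775): Gdk-WARNING **: ..." style lines, as seen in the log above.
LINE_RE = re.compile(
    r"^\((?P<prog>[^:()]+):(?P<pid>\d+)\): "
    r"(?P<domain>[\w-]+)-(?P<level>WARNING|CRITICAL) \*\*: (?P<msg>.*)$"
)
DESTROYED_RE = re.compile(r"^GdkWindow (0x[0-9a-fA-F]+) unexpectedly destroyed$")
ASSERT_RE = re.compile(r"^(\w+): assertion `")


def summarize(lines):
    """Return (destroyed window addresses in order, Counter of failing functions)."""
    destroyed = []
    failed = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # blank lines and messages in other formats
        msg = m.group("msg")
        d = DESTROYED_RE.match(msg)
        if d:
            destroyed.append(d.group(1))
            continue
        a = ASSERT_RE.match(msg)
        if a:
            failed[a.group(1)] += 1
    return destroyed, failed
```

Feeding it the lines of a saved log file (for example via open(path).read().splitlines()) gives a quick count of how many windows were reported destroyed and which GDK/GObject calls subsequently failed.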
Comment 158 Jesse Ruderman 2009-11-15 00:27:31 PST
*** Bug 395999 has been marked as a duplicate of this bug. ***
Comment 159 Jesse Ruderman 2009-11-15 14:38:15 PST
*** Bug 410325 has been marked as a duplicate of this bug. ***
Comment 160 Richard D 2010-03-24 14:46:34 PDT
This appears to have been fixed for me, in a later 3.5.x release - currently on 3.5.8.  I last noticed it in 3.5.2.  

Haven't had one of these errors in quite a few months, though it's possible it has transformed into a crash bug.
Comment 161 Tony Mechelynck [:tonymec] 2011-09-19 16:57:26 PDT
(In reply to Richard D from comment #160)
> This appears to have been fixed for me, in a later 3.5.x release - currently
> on 3.5.8.  I last noticed it in 3.5.2.  
> 
> Haven't had one of these errors in quite a few months, though it's possible
> it has transformed into a crash bug.

The above comment is now about 1½ years old. Has anyone seen this bug or one of its dupes on any recent version of Firefox or even SeaMonkey? Let's say Fx4 or newer, Sm2.1 or newer?
Comment 162 Howard Chu 2011-09-19 17:19:13 PDT
I haven't seen this bug in quite a long time.
Comment 163 Ed Martin 2011-09-19 18:18:33 PDT
I haven't seen this bug in well over a year.
Comment 164 Christian Hernmarck 2011-09-20 00:29:17 PDT
Me neither - I haven't seen the bug in the last months/years.
In late 2008 I upgraded to opensuse 11.1 - recently I moved to debian on Desktop.

AFAIK I never met this bug in these times... (still running FF 3.6.<newest>)
Comment 165 Karl Tomlinson (back Dec 13 :karlt) 2011-09-20 01:00:33 PDT
Work in bug 130078 and bug 352093 means we no longer create new GdkWindows on loading new pages.

GDK client-side windows (first in GTK+-2.18) also would have changed timing, perhaps enough that this bug did not show.
Comment 166 Karl Tomlinson (back Dec 13 :karlt) 2011-09-20 01:03:34 PDT
*** Bug 467744 has been marked as a duplicate of this bug. ***
