Closed Bug 92735 Opened 23 years ago Closed 23 years ago

File->Close crashes

Categories

(Core :: XUL, defect)

x86
Linux
defect
Not set
major

Tracking

()

RESOLVED FIXED
mozilla0.9.4

People

(Reporter: irathore, Assigned: bryner)

Details

(Keywords: crash, Whiteboard: <Gdk-ERROR **: BadWindow> [@nsEventListenerManager::HandleEventSubType])

Attachments

(6 files)

Closing a window through the menubar File->Close closes all windows and quits. In
other words, the behaviour of Quit and Close is the same. Using the close button on
the window frame (the window manager's Close) works correctly. The user has to use the
window manager's close instead of File->Close.

The behaviour has been the same since around 0.9.0

FT
More likely it's bound to crash. Please provide a log or, ideally, run
./mozilla -g
If you get an error when you select File>Close and gdb [or ddd] starts, try
|bt| or |where|

please include glib ver, x11 impl name/ver, kernel ver, distro ver, 
about:plugins, jvm impl vendor/ver, distro name/ver, any other things you think 
are relevant, compiler name/ver, whether you built mozilla (cvs pull date/ftp 
sourceball date) or whether you downloaded it (url + date)
I can confirm that on a 2001072711 build on Mac, not on Linux. It was caused by a
Type 3 error, but I haven't seen a Talkback window. I'm sure that it was the
full-installer version of 12MB (with Talkback - the very first in the list), but
support for Talkback sometimes doesn't seem to work (or isn't included in the
Mac build).

I've never seen the error with only 1 window, you had to have at least 2 windows
open (at the time of the crash, or before) :
- start mozilla
- open window
- open 2nd window
- file->close
- crash
johan: could you install macsbug?

when you crash, enter:
stdlog [standard log]
es     [kill application?]
(that should work, at worst there are other commands, gf [go finder] rs 
[restart])
-
then attach the stdlog (it will be on your desktop) to this bug.
Keywords: crash, stackneeded
Summary: File->Close seems to be bound to quit → File->Close crashes
This time it crashed on the very first window. Immediately after reboot (for
installing Macsbug), I started Zilla. It opened the homepage from the cache
(there was no network connection on my cablemodem yet). I selected File->Close,
and it crashed in nsEventListenerManager::HandleEventSubType.

I'm off to work now - I already missed my bus :-(
Cannot send a backtrace because the program is exiting, not crashing (with code 1).

I will compile again and post more debug info soon. For now, below is the system
info you asked for:

The crash happens on out of the box RedHat-6.1, RedHat-6.2, RedHat-7,
Mandrake-8, with or without JVM, does not seem to depend on system.

The system I am debugging on is a bastardized version of Mandrake-8
linux-2.4.7, glibc-2.2.2, XFree86-2.1.0, gtk-1.2.10, glib-1.2.10, jdk1.3.1.
Still no backtrace. Even with all debugging enabled all I see is

Document http://www.google.com/ loaded successfully
         (This is where I hit File->Close)
Gdk-ERROR **: BadWindow (invalid Window parameter)
serial 11010 error_code 3 request_code 15 minor_code 0
  serial 11756 error_code 3 request_code 15 minor_code 0
LWP 6863 exited.
LWP 6893 exited.
LWP 6865 exited.
LWP 6864 exited.

Program exited with code 01.
(gdb) bt
No stack.
(gdb) 

I tried hard to find the exit path, but it is such a huge app. Is there a place I
could set a breakpoint and then do "bt"? Or if only I knew where File->Close is.

If it helps. Attached is the "strace" of execution right after I hit
"File->Close"
Apparently I couldn't attach it to this message, so it arrived in the last
message. Sorry about that.

Feel free to ask if you want me to try anything or any patches. I can reproduce
the result on all my system (I have about 20 all with different
hardware/configurations)
FT



ah the Gdk exit is what's killing you :)... um, iirc there's a flag to force 
gdk errors to crash so you can get useful debug info  ... iirc it was 
-g-fatal-errors but use google or groups.google to find it.

open("/home/rathore/.mozilla/rathore/7cnf33h5.slt/bookmarks.html", 
O_WRONLY|O_CREAT|O_TRUNC, 0666) = 28
shmat(28, 0x1, 0x1ptrace: umoven: Input/output error
)                     = ?

this also intrigued me...

confirming since both crashes have real data...
Assignee: asa → trudelle
Status: UNCONFIRMED → NEW
Component: Browser-General → XP Toolkit/Widgets
Ever confirmed: true
QA Contact: doronr → aegis
Whiteboard: <Gdk-ERROR **: BadWindow> [@nsEventListenerManager::HandleEventSubType]
bryner, can you look at this?
Assignee: trudelle → bryner
Here's what I'm seeing (note that I have mozilla set to open new windows with a
blank page):

- Launch mozilla
- File->New Navigator Window
- In the newly opened window, File->Close

Upon doing this, I see:

###!!! ASSERTION: bad param: 'aScope', file
/home/bryner/Source/mozilla/js/src/xpconnect/src/nsXPConnect.cpp, line 484
###!!! Break: at file
/home/bryner/Source/mozilla/js/src/xpconnect/src/nsXPConnect.cpp, line 484
###!!! ASSERTION: This is not supposed to fail!: 'Error', file
/home/bryner/Source/mozilla/js/src/xpconnect/src/nsXPConnect.cpp, line 343
###!!! Break: at file
/home/bryner/Source/mozilla/js/src/xpconnect/src/nsXPConnect.cpp, line 343
###!!! ASSERTION: NS_ENSURE_TRUE(NS_SUCCEEDED(result)) failed: '(!((result) &
0x80000000))', file
/home/bryner/Source/mozilla/content/events/src/nsEventListenerManager.cpp, line 1041
###!!! Break: at file
/home/bryner/Source/mozilla/content/events/src/nsEventListenerManager.cpp, line 1041
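
The expression quoted in the last assertion, '(!((result) & 0x80000000))', is the success test applied to an XPCOM result code: the high bit of the 32-bit value marks failure. A minimal sketch of that check follows. The names `ResultCode`, `Succeeded`, `Failed` and the `k...` constants are illustrative assumptions and not the Mozilla macros themselves; the two numeric values are the usual NS_OK and NS_ERROR_FAILURE codes.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative alias for a 32-bit XPCOM-style result code (assumption).
using ResultCode = std::uint32_t;

constexpr ResultCode kResultOk = 0x00000000u;       // value of NS_OK
constexpr ResultCode kResultFailure = 0x80004005u;  // value of NS_ERROR_FAILURE

// Success means the high (severity) bit is clear, as in the assertion text.
constexpr bool Succeeded(ResultCode rv) { return !(rv & 0x80000000u); }

// Failure means the high bit is set.
constexpr bool Failed(ResultCode rv) { return (rv & 0x80000000u) != 0; }

static_assert(Succeeded(kResultOk), "zero is a success code");
static_assert(Failed(kResultFailure), "high bit set is a failure code");
```

So the assertion at line 1041 fired because the call just before it returned a code with the high bit set.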

However, the window closes, and I don't crash.  But, when I try to load a URL
into the remaining browser window, I now crash, with these assertions coming first:

WARNING: XPConnect was passed aJSContext from a foreign JSRuntime!, file
/home/bryner/Source/mozilla/js/src/xpconnect/src/nsXPConnect.cpp, line 294
WARNING: XPConnect was passed aJSContext from a foreign JSRuntime!, file
/home/bryner/Source/mozilla/js/src/xpconnect/src/nsXPConnect.cpp, line 294
WARNING: XPConnect was passed aJSContext from a foreign JSRuntime!, file
/home/bryner/Source/mozilla/js/src/xpconnect/src/nsXPConnect.cpp, line 294
WARNING: XPConnect was passed aJSContext from a foreign JSRuntime!, file
/home/bryner/Source/mozilla/js/src/xpconnect/src/nsXPConnect.cpp, line 294
Assertion failure: me == CurrentThreadId(), at
/home/bryner/Source/mozilla/js/src/jslock.c:934

Stack trace coming up.  CC'ing xpconnect and js guys.
Target Milestone: --- → mozilla0.9.4
Status: NEW → ASSIGNED
Keywords: stackneeded
Things are completely screwed up by the time the crash happens. I'd wager you
have a garbage JSContext by then. The 'WARNING's you get in the second block of
output bryner posted indicate to me that the code is probably already to that
point. 

FWIW, given the evolved invariant demanding that xpconnect be used on only one
JSRuntime, we should change that NS_WARNING at nsXPConnect.cpp line 295 to an
NS_ERROR.

More editorial comment... I *hate* the fact that people are routinely running
past botched assertions. There are *no* ignorable assertions in XPConnect.
Botching any one of them indicates that something is seriously wrong and needs to
be fixed.

The place with the (likely) interesting stack - and the place to debug - is the
very first botched assertion where 'aScope' is null. Whatever code is calling
xpconnect at that point is making a serious error by passing a null JSObject. We
need to backtrack from there. This may take us into jst's domain.

bryner's last stack shows the null JSObject is coming from 
nsEventListenerManager::CompileEventHandlerInternal @ 
nsEventListenerManager.cpp:1040 ...

  result = xpc->WrapNative(cx, ::JS_GetGlobalObject(cx), ...

I'd ask jst to jump in here and help debug this. 

The JSContext may already be whacked at this point. Or the lifecycle of the 
window may be such that we have either not yet set the context global or have 
already cleared it. It is likely that this chunk of code is going to have to be 
more defensive. But there is deeper stuff going on and Johnny will have a clue. 
I wish I could reproduce this on NT.
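
A minimal sketch of what "more defensive" could look like, assuming the simplest option of bailing out when the context has no global object. Everything here is a hypothetical stand-in: `FakeContext`, `FakeObject`, `GetGlobal`, `WrapNativeStub` and the result constants only mimic the shape of JSContext, ::JS_GetGlobalObject and the WrapNative call; this is not the actual nsEventListenerManager code.

```cpp
#include <cassert>
#include <cstdint>

using ResultCode = std::uint32_t;                   // stand-in for nsresult
constexpr ResultCode kResultOk = 0x00000000u;
constexpr ResultCode kResultFailure = 0x80004005u;

struct FakeObject {};                               // stand-in for JSObject
struct FakeContext { FakeObject* global = nullptr; };  // stand-in for JSContext

// Stand-in for ::JS_GetGlobalObject(cx); may return null during teardown.
FakeObject* GetGlobal(const FakeContext& cx) { return cx.global; }

// Counts how often the wrapper is reached, so callers can verify the guard.
static int gWrapCalls = 0;

// Stand-in for the wrapping call, which must never see a null scope.
ResultCode WrapNativeStub(const FakeContext&, FakeObject* scope) {
    ++gWrapCalls;
    return scope ? kResultOk : kResultFailure;
}

// Defensive caller: check the global before handing it on as the scope.
ResultCode CompileHandlerDefensively(const FakeContext& cx) {
    FakeObject* global = GetGlobal(cx);
    if (!global) {
        // The window is not fully set up yet, or is already torn down.
        return kResultFailure;
    }
    return WrapNativeStub(cx, global);
}
```

A guard like this only hides the symptom. It does not explain why the context has lost its global, which is the deeper question raised above.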
Hi
I have still not been able to make it crash. It just exits with 1 (for gdb) and
prints the same gdk message.

I have also tried compiling with "--enable-crash-on-assert". It shows the same
behaviour outside a debugger, but when run with gdb I get an ASSERTION FAILURE
even before the first window is shown, and it crashes with the following (I will
post the output of script with the whole stack trace, it might help):
Delayed SIGSTOP caught for LWP 13257.
###!!! ASSERTION: bad width: 'metrics.width>=0', file nsLineLayout.cpp, line 1093
###!!! Break: at file nsLineLayout.cpp, line 1093
###!!! Abort: at file nsLineLayout.cpp, line 1093

Program received signal SIGABRT, Aborted.
[Switching to Thread 1024 (LWP 13202)]
0x405ff2d1 in kill () at ../../../dist/include/nsCOMPtr.h:409
409
      NS_EXPORT ~nsCOMPtr_base() { }
Current language:  auto; currently c++

FT Rathore -- That particular assertion seems to happen sometimes when running
under gdb... it's most likely a bug in gdb.  I've never seen it outside of the
debugger.

I would guess my crash is different from Brian Ryner's, because I get absolutely
no assertions during my test, besides the one I mentioned (and only inside gdb).
I am compiling with all debug enabled and --enable-crash-on-assert. I never get
any assertions. Brian, are you using some special flag for configure? (Would
electric fence help?)

Today's compile is giving me a slightly different result. Here it is:

$ ./mozilla
..

File->"New Navigator Window"
File->Close
nsWidget::~nsWidget() of toplevel: 43 widgets still exist.
nsWidget::~nsWidget() of toplevel: 42 widgets still exist.
nsWidget::~nsWidget() of toplevel: 40 widgets still exist.
nsWidget::~nsWidget() of toplevel: 38 widgets still exist.
nsWidget::~nsWidget() of toplevel: 36 widgets still exist.
nsWidget::~nsWidget() of toplevel: 34 widgets still exist.
Gdk-ERROR **: BadWindow (invalid Window parameter)
  serial 10815 error_code 3 request_code 15 minor_code 0

And all windows gone (no other unusual failures)
Bryner and I sat down and looked at this last night, and we found the reason for
this crash. The nsJSContext pointed to by an nsJSEventListener (through a weak
reference, i.e. a raw nsIScriptContext pointer) had already been destroyed, so we
ended up asking for the JSContext through deleted memory. We happened to get the
right JSContext, but it no longer had a global object in it. We came up with a
fix that makes nsJSEventListener hold a strong reference to the
nsIScriptContext, and we think this is the right thing to do. Pre-XPCDOM,
holding the weak reference was safe due to the ownership model between the
objects in question, but I don't see how that can be true any more after the
XPCDOM landing. We did a few runs with the leak detection stuff turned on and
didn't see any new leaks, so I believe we found the right fix. Bryner has the
fix and will attach a patch...
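
The ownership change described in this comment can be illustrated with a small, self-contained sketch. It is not the attached patch: `ScriptContext` and `EventListener` are hypothetical stand-ins for nsJSContext and nsJSEventListener, and `std::shared_ptr` / `std::weak_ptr` stand in for XPCOM reference counting, purely as an assumption to keep the example standalone.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical stand-in for nsJSContext; records whether it is still alive.
struct ScriptContext {
    explicit ScriptContext(bool* alive) : alive_(alive) { *alive_ = true; }
    ~ScriptContext() { *alive_ = false; }
    bool* alive_;
};

// Hypothetical stand-in for nsJSEventListener after the fix: it owns a
// strong reference, so the context cannot be destroyed underneath it.
class EventListener {
public:
    explicit EventListener(std::shared_ptr<ScriptContext> ctx)
        : context_(std::move(ctx)) {}
    const ScriptContext* context() const { return context_.get(); }
private:
    std::shared_ptr<ScriptContext> context_;
};

// Before the fix: a non-owning reference is left behind once the window
// drops its own reference. weak_ptr can at least report this; a raw
// pointer, as in the original code, cannot.
bool WeakReferenceExpires() {
    bool alive = false;
    auto windowRef = std::make_shared<ScriptContext>(&alive);
    std::weak_ptr<ScriptContext> weak = windowRef;  // does not keep it alive
    windowRef.reset();                              // window closes
    return weak.expired() && !alive;
}

// After the fix: the listener's strong reference keeps the context alive
// even though the window released it first.
bool StrongReferenceKeepsContextAlive() {
    bool alive = false;
    auto windowRef = std::make_shared<ScriptContext>(&alive);
    EventListener listener(windowRef);
    windowRef.reset();                              // window closes
    return alive && listener.context() != nullptr;
}
```

The trade-off of a strong reference is the risk of a reference cycle keeping objects alive forever, which is why the leak detection runs mentioned above matter.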
Attached patch
The patch did not fix my problem. I am still getting the same Gdk-ERROR. I will
try a little more to get a stack trace. Another problem is that gdb runs me out
of resources pretty quick.
Ok, so these do sound like two separate problems (especially since you say this
has been happening since 0.9, well before xpcdom).  Are you running on a remote
X server?
Hi
I am running mozilla locally and remotely, both on the same display. Your question
gave me an idea, and I started trying different combinations of displays and
environments. Now I know a lot more about this problem.

It seems to happen only when running sawfish with focus-follow-mouse enabled.
Since most of my systems had this in common, it was crashing on all of
them. So:

This bug only happens on a display (remote or local) that is using sawfish
as the window manager and has the focus policy "Enter-Only" or "Enter-Exit" (both
forms of focus-follow-mouse).

This might not be a mozilla bug then, or maybe it is something that is getting
exposed in this particular case.

Most of my displays are running this combination (gnome + sawfish-0.38 on
Mdk-8). Mozilla itself can be running on any system configuration (kernel, gtk
version, Xlib version, etc.; Linux for now, I'll test HP-UX and Solaris soon),
so the problem is with the display, and particularly the window manager.

Please allow me to do a little more investigation before closing the bug.
 
Running sawfish with enter-only mouse focus on RH 7.1, I can close windows in
today's build with File->Close in both browser and editor.  Am I missing a step
in reproducing this?  Should I try it with a release build?
sr=jst on bryner's patch.
r=jag
Patch is checked in.

Since this doesn't fix the actual crash experienced by the reporter, leaving
this bug open.
Akanna: the bug cannot be replicated with RH-7.1 since RH uses sawfish-0.36-7,
whereas Mandrake 8 ships with sawfish 0.38. I have already tried using the
sawfish RPM from RedHat-7.1 and it works fine.

I will try to spend today figuring it out. I will also try talking to the
sawfish guys.
FT
OK, more corrections. It will only crash if sawfish was compiled with:
$ ./configure --with-gdk-pixbuf

So if sawfish is using gdk-pixbuf instead of imlib, I get that error. I have
tested with sawfish 0.37, 0.38, 0.99 and 1.0 with gdk-pixbuf-0.10.1-1mdk, and
all of them will produce the crash if compiled with gdk-pixbuf.

The reason it is not happening in RedHat-7.1 is that RedHat is using imlib
instead of gdk-pixbuf, whereas Mandrake compiles with gdk-pixbuf.

I will post it in sawfish mailing list.

Why does it always have to be a distro issue ;-)
Sorry about this bug..

It is now fixed in sawfish-1.0-3mdk in Mandrake cooker and will be available in
Mandrake 8.1 beta 1 (should be available next week)..

And BTW, next time you find a distribution specific bug, please contact
distribution people first.. Thanks
Hi
It did not seem to be distribution specific bug when I reported it. It is
probably a gdk-pixbuf problem.

I have noticed that loki's site comes up correctly now after using imlib. Maybe
I'm imagining it.

I guess this one was a false alarm, (though this bug report did get a weak
reference problem licked). It can probably be closed now.

I will be back with more.
In that case...
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
FYI, I checked in this fix on the m092 branch as a fix for bug 94374.
May God have mercy on us all. The 212 bug spam-o-rama is Now!
QA Contact: aegis → jrgm