Closed Bug 72069 Opened 24 years ago Closed 24 years ago

Crash loading page

Categories

(Core :: DOM: UI Events & Focus Handling, defect)

x86
Linux
defect
Not set
critical

Tracking

VERIFIED FIXED

People

(Reporter: jg, Assigned: blizzard)

Details

(Keywords: crash, regression)

Attachments

(2 files)

Visiting this page crashes my ~1-hour-old Linux opt build. Other *.mozilla.org pages work fine.
- Reproducible: sometimes.
- Steps to reproduce: Sometimes start the browser with a blank page, go directly to this page, crash. Sometimes go to bugzilla.mozilla.org, then click Today's Bugs, and crash. Sometimes it works, sometimes it doesn't.
- Result: About a 50% crash rate or more right now.
No idea which component is causing the crash, so sending to browser-general (shudder). Marking critical since this is a crash, and on an important page for QA people to use. No known workaround.
Keywords. Worth noting perhaps that my build tree from three or more days ago wasn't crashing at all like this. During the 0.8.1 freeze my tree wouldn't build, so last night I checked out a brand new clean tree and built overnight. All seems to be working bar this crash.
Keywords: crash, regression
Are there any error messages in the console? (If you start mozilla from the command line...)
Other than the standard crash message which we all get running mozilla on linux, there's nothing. Nominating for 0.8.1 to get some investigation underway, 0.9 otherwise. I have a strange gut feeling this has something to do with cookies, but no evidence. I'm doing a debug build right now, but it may take a few hours, and then a while for me to figure out how to get a stack trace. But it's coming.
I am not seeing the crash: Linux Mandrake 7.1, opt build from today.
No problems here (Linux 2001031409 on SuSE 6.4). James: can you start mozilla with the -g option to run it within the debugger? When it crashes, it will tell you where. And if you enter "where" after that, it tells you more exactly how it got to the crash. Posting that information would probably help to find the component that caused the crash, and get this bug out of "browser general". Did you use any sophisticated options when compiling mozilla?
Confirming with PC Linux 2001031408 and Red Hat 7.0.
Document http://bugzilla.mozilla.org/ loaded successfully
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 1024 (LWP 2788)]
0x404e00be in NSGetModule () from /l/appl/mozilla/2001-03-14-Mtrunk/components/libwidget_gtk.so
(gdb) bt
#0 0x404e00be in NSGetModule () from /l/appl/mozilla/2001-03-14-Mtrunk/components/libwidget_gtk.so
#1 0x4066dc4f in gdk_event_dispatch () from /usr/lib/libgdk-1.2.so.0
#2 0x406a0987 in g_main_dispatch () from /usr/lib/libglib-1.2.so.0
#3 0x406a1001 in g_main_iterate () from /usr/lib/libglib-1.2.so.0
#4 0x406a11cc in g_main_run () from /usr/lib/libglib-1.2.so.0
#5 0x405b7aa3 in gtk_main () from /usr/lib/libgtk-1.2.so.0
#6 0x404d85ec in NSGetModule () from /l/appl/mozilla/2001-03-14-Mtrunk/components/libwidget_gtk.so
#7 0x403b055a in NSGetModule () from /l/appl/mozilla/2001-03-14-Mtrunk/components/libnsappshell.so
#8 0x0804e025 in JS_PushArguments ()
#9 0x0804e885 in JS_PushArguments ()
#10 0x4025df31 in __libc_start_main (main=0x804e758 <JS_PushArguments+12836>, argc=1, ubp_av=0xbffff83c, init=0x804b074 <_init>, fini=0x8054b04 <_fini>, rtld_fini=0x4000e274 <_dl_fini>, stack_end=0xbffff834) at ../sysdeps/generic/libc-start.c:129
Backtrace coming up (finally). Got this by going direct to Today's bug list, then go to query, find some bugs, go to one of the bugs found, hit the back button's drop-down, select the original bug list, then hit reload.
(gdb) bt
#0 0x4084d6f5 in handle_gdk_event (event=0x81dcb04, data=0x0) at /usr/src/cvs/mozilla/mozilla/widget/src/gtk/nsGtkEventHandler.cpp:795
#1 0x409ee4d7 in gdk_wm_protocols_filter () from /usr/lib/libgdk-1.2.so.0
#2 0x40a1b2b9 in g_get_current_time () from /usr/lib/libglib-1.2.so.0
#3 0x40a1b8c3 in g_get_current_time () from /usr/lib/libglib-1.2.so.0
#4 0x40a1ba5c in g_main_run () from /usr/lib/libglib-1.2.so.0
#5 0x4093ebd7 in gtk_main () from /usr/lib/libgtk-1.2.so.0
#6 0x408419e5 in nsAppShell::Run (this=0x80b8e10) at /usr/src/cvs/mozilla/mozilla/widget/src/gtk/nsAppShell.cpp:360
#7 0x40602f1e in nsAppShellService::Run (this=0x80b6800) at /usr/src/cvs/mozilla/mozilla/xpfe/appshell/src/nsAppShellService.cpp:407
#8 0x080572c6 in main1 (argc=1, argv=0xbffff40c, nativeApp=0x0) at /usr/src/cvs/mozilla/mozilla/xpfe/bootstrap/nsAppRunner.cpp:1004
#9 0x080580e1 in main (argc=1, argv=0xbffff40c) at /usr/src/cvs/mozilla/mozilla/xpfe/bootstrap/nsAppRunner.cpp:1298
#10 0x40389f5c in __libc_start_main () from /lib/libc.so.6
cc blizzard since this looks like gtk. This is proving very hard to reproduce, more so than on an optimised build. Build pulled from the tip early this morning, with options:
mk_add_options MOZ_OBJDIR=@TOPSRCDIR@/../obj-debug
ac_add_options --with-gtk
ac_add_options --with-extensions=cookie,wallet,inspector
and PSM built at make -f client.mk time. System:
ii libgtk1.2 1.2.9-2 The GIMP Toolkit set of widgets for X
ii libglib1.2 1.2.8-helix1 The GLib library of C routines
This resembles what I see in bug 71833.
-> Event Handling. CC Pavlov for GTK issues. Blizzard: didn't you do some event handling fu here?
Component: Browser-General → Event Handling
I did but I don't see that code in this trace at all. In fact, try changing your WM and see what happens because I don't see these crashes at all.
reassigning
Assignee: asa → joki
QA Contact: doronr → gerardok
I tested this without any window manager - still crashing.
I made two different builds: one without the patch for bug 67370 (interleaving events and xlib events) and one with it. Preliminary tests show:
- without the patch for 67370: no crashes and no GTK/GDK asserts/errors
- with the patch for 67370: crashes and asserts/errors
Because this bug is sometimes hard to reproduce, I will run another round of tests for it later. Now I go to get some sleep :)
Oh, great.
Same results for the second test round. Can somebody else confirm these results?
OK, last night right before I fell asleep I thought of a possible race condition in the event handling code that might very well cause this. I'll have a patch for testing in a few minutes.
Attached patch: patch
The race condition that might exist ( and this patch would fix ) involves event ordering. Imagine a situation where you have just scheduled a window delete event because of a mouse down. That deletion doesn't happen immediately and there's still a mouse up on the X queue for your window. You get the mouse up event and enter handle_gdk_event(), the event queue is processed, the window is destroyed, and you continue to process the mouse up event for the window which has just been destroyed. "Boom." There are no associated Gdk windows and no associated Gtk widget which would cause this behaviour exactly. Asko, can you test this patch please? I still can't reproduce the bug but you seem very adept at it.
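The ordering problem described above can be modeled in a few lines. This is a toy Python sketch, not Mozilla or GTK code: `Window`, `handle_event`, `run`, and the `recheck` flag are hypothetical names made up for illustration, and the guard shown is only one plausible way to close the race, not necessarily what the attached patch does.

```python
from collections import deque


class Window:
    """Hypothetical stand-in for a GdkWindow and the widget it points at."""

    def __init__(self, name):
        self.name = name
        self.destroyed = False
        self.widget = object()  # stand-in for the GtkObject in user_data

    def destroy(self):
        self.destroyed = True
        self.widget = None  # in C++ this would be freed memory


def handle_event(window, event, pending, recheck):
    """Toy model of an event handler that drains queued work first.

    Returns a string describing what happened to `event`.
    """
    # Step 1: run queued application work before handling the event.
    # Any of these callbacks may destroy `window`.
    while pending:
        pending.popleft()()

    # Step 2 (the assumed guard): look at the window again after the drain.
    if recheck and window.destroyed:
        return "dropped " + event

    # Step 3: dispatch. Without the guard this touches a dead widget.
    if window.widget is None:
        return "crash: " + event + " delivered to destroyed " + window.name
    return "handled " + event


def run(recheck):
    window = Window("bug-list")
    # The mouse down scheduled a deferred destroy of the window...
    pending = deque([window.destroy])
    # ...and the matching mouse up is still queued for that same window.
    return handle_event(window, "mouse-up", pending, recheck)


if __name__ == "__main__":
    print(run(recheck=False))
    print(run(recheck=True))
```

With `recheck=False` the model reports the mouse up being delivered to a destroyed window, which corresponds to the "Boom" case; with `recheck=True` the stale event is dropped instead.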
Assignee: joki → blizzard
Thanks, I will test the patch but unfortunately those test builds will take some time...
Okay, preliminary test: With the new patch I cannot see any crashes or GTK/GDK asserts/errors. So the patch seems to fix this bug.
Asko, are you sure? Can you run it for a long time just to make sure? The warnings didn't happen every time in my case.
Normally I can crash mozilla in 30-60 seconds with bugzilla if it has this bug. When I installed your new patch I couldn't crash it or see any asserts/errors relating to GTK/GDK in the preliminary test which took 10-15 minutes. But if you want I will continue testing it. When will you need the results?
OK, I managed to reproduce this in a debugger without my patch. The window has obviously been destroyed out from under the event:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 1024 (LWP 30202)]
0x4070d17c in handle_gdk_event (event=0x817af68, data=0x0) at ../../../../mozilla/widget/src/gtk/nsGtkEventHandler.cpp:795
Current language: auto; currently c++
(gdb) print object
$1 = (GtkObject *) 0x81216b8
(gdb) print *object
$2 = {klass = 0x690066, flags = 7536754, ref_count = 4391028, object_data = 0x690068}
(gdb) print *event->any.window
$3 = (GdkWindow *) 0x8337db8
(gdb) print (GdkWindowPrivate *)event->any.window
$4 = {user_data = 0x81216b8}
(gdb) print *(GdkWindowPrivate *)event->any.window
$5 = (GdkWindowPrivate *) 0x8337db8
(gdb) print *(GdkWindowPrivate *)event->any.window
$6 = {window = {user_data = 0x81216b8}, parent = 0x8131050, xwindow = 46137543, xdisplay = 0x8057d80, x = 0, y = 0, width = 600, height = 563, resize_count = 1 '\001', window_type = 2 '\002', ref_count = 2, destroyed = 1, mapped = 0, guffaw_gravity = 1, extension_events = 0, filters = 0x0, colormap = 0x807dd68, children = 0x0}
I'm going to try it with the patch and see if I can reproduce it then.
Status: NEW → ASSIGNED
Argh, bugzilla ate my comments... Retrying. I have now tested this for over an hour. During the test period my patched debug build didn't crash and I didn't see the GTK/GDK asserts or errors. The only assertion I got was
###!!! ASSERTION: all buffered data should be gone: '!mBuffer', file nsMultiMixedConv.cpp, line 332
###!!! Break: at file nsMultiMixedConv.cpp, line 332
but I also got it with the build which didn't have the patch for bug 67370. So I am pretty sure your patch fixes this bug.
In a build without the patch I could usually reload about 4 times before I would crash. I applied the patch and sat there and reloaded for about a minute straight and never saw a warning or crash. I removed the patch, rebuilt and loaded about 5 times before it crashed. I applied the patch, rebuilt and sat there and reloaded over and over and didn't have any problems. I would say that the patch fixes this problem.
I have an sr=shaver on this with a comment change.
The patch is good as gold. I crashed all the time the past 5 days - had reverted to NC4.75. All assertions and crash as reported in bug 71833 are also gone after this patch was applied.
*** Bug 71833 has been marked as a duplicate of this bug. ***
Attached patch: patch #2
Lotsa newlines after label end: but it's your code -- r/sr=brendan@mozilla.org on patch #2. /be
Seems better with that comment. I was calculating how long it would take before overflow when you posted it :-)
Fix checked into the tip and the branch.
Status: ASSIGNED → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
Thanks.
*** Bug 72499 has been marked as a duplicate of this bug. ***
QA contact updated
QA Contact: gerardok → madhur
Verified on Linux, build ID: 20010716
Status: RESOLVED → VERIFIED
Component: Event Handling → User events and focus handling