Status


--
critical
VERIFIED FIXED
18 years ago
18 years ago

People

(Reporter: jg, Assigned: blizzard)

Tracking

({crash, regression})

Trunk
x86
Linux
crash, regression
Points:
---

Firefox Tracking Flags

(Not tracked)

Details

(URL)

Attachments

(2 attachments)

(Reporter)

Description

18 years ago
Visiting this page is causing me to crash in my ~1hour-old linux opt build.
Other *.mozilla.org pages work fine.

- Reproducible: sometimes.
- Steps to reproduce: Sometimes I start the browser with a blank page, go directly
to this page and crash. Sometimes I go to bugzilla.mozilla.org, then click Today's
Bugs and crash. Sometimes it works, sometimes it doesn't.
- Result: About 50% crash rate or more right now.

No idea which component is causing crash, so sending to browser-general
(shudder). Marking critical since this is a crash and on an important page for
QA people to use. No known workaround.
(Reporter)

Comment 1

18 years ago
Adding keywords. Perhaps worth noting: my build tree from three or more days ago
wasn't crashing like this at all. During the 0.8.1 freeze my tree wouldn't build,
so last night I checked out a brand-new clean tree and built it overnight.
Everything seems to be working apart from this crash.
Keywords: crash, regression

Comment 2

18 years ago
Are there any error messages in the console (if you start mozilla from the command line)?
(Reporter)

Comment 3

18 years ago
Other than the standard crash message which we all get running mozilla on linux,
there's nothing. Nominating for 0.8.1 to get some investigation underway, 0.9
otherwise.

I have a strange gut feeling this has something to do with cookies, but no evidence.

I'm doing a debug build right now, but it may take a few hours, and then a
while for me to figure out how to get a stack trace. But it's coming.
Keywords: mozilla0.8.1, mozilla0.9

Comment 4

18 years ago
I am not seeing the crash (Linux Mandrake 7.1, opt build from today).

Comment 5

18 years ago
No problems here (Linux build 2001031409 on SuSE 6.4).
James: can you start mozilla with the -g option to run it within the debugger?
When it crashes, it will tell you where. If you then enter "where", it tells you
more exactly how it got to the crash. Posting that information would probably
help identify the component that caused the crash, so this bug can be moved out
of "browser general".
Did you use any sophisticated options when compiling mozilla?

Comment 6

18 years ago
Confirming with PC Linux 2001031408 and Red Hat 7.0.

Comment 7

18 years ago
Document http://bugzilla.mozilla.org/ loaded successfully

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 1024 (LWP 2788)]
0x404e00be in NSGetModule ()
   from /l/appl/mozilla/2001-03-14-Mtrunk/components/libwidget_gtk.so
(gdb) bt
#0  0x404e00be in NSGetModule ()
   from /l/appl/mozilla/2001-03-14-Mtrunk/components/libwidget_gtk.so
#1  0x4066dc4f in gdk_event_dispatch () from /usr/lib/libgdk-1.2.so.0
#2  0x406a0987 in g_main_dispatch () from /usr/lib/libglib-1.2.so.0
#3  0x406a1001 in g_main_iterate () from /usr/lib/libglib-1.2.so.0
#4  0x406a11cc in g_main_run () from /usr/lib/libglib-1.2.so.0
#5  0x405b7aa3 in gtk_main () from /usr/lib/libgtk-1.2.so.0
#6  0x404d85ec in NSGetModule ()
   from /l/appl/mozilla/2001-03-14-Mtrunk/components/libwidget_gtk.so
#7  0x403b055a in NSGetModule ()
   from /l/appl/mozilla/2001-03-14-Mtrunk/components/libnsappshell.so
#8  0x0804e025 in JS_PushArguments ()
#9  0x0804e885 in JS_PushArguments ()
#10 0x4025df31 in __libc_start_main (main=0x804e758 <JS_PushArguments+12836>, 
    argc=1, ubp_av=0xbffff83c, init=0x804b074 <_init>, fini=0x8054b04 <_fini>, 
    rtld_fini=0x4000e274 <_dl_fini>, stack_end=0xbffff834)
    at ../sysdeps/generic/libc-start.c:129

(Reporter)

Comment 8

18 years ago
Backtrace coming up (finally). Got this by going directly to Today's bug list,
then going to the query page, finding some bugs, opening one of the bugs found,
using the Back button's drop-down to select the original bug list, then hitting reload.

(gdb) bt
#0  0x4084d6f5 in handle_gdk_event (event=0x81dcb04, data=0x0) at
/usr/src/cvs/mozilla/mozilla/widget/src/gtk/nsGtkEventHandler.cpp:795
#1  0x409ee4d7 in gdk_wm_protocols_filter () from /usr/lib/libgdk-1.2.so.0
#2  0x40a1b2b9 in g_get_current_time () from /usr/lib/libglib-1.2.so.0
#3  0x40a1b8c3 in g_get_current_time () from /usr/lib/libglib-1.2.so.0
#4  0x40a1ba5c in g_main_run () from /usr/lib/libglib-1.2.so.0
#5  0x4093ebd7 in gtk_main () from /usr/lib/libgtk-1.2.so.0
#6  0x408419e5 in nsAppShell::Run (this=0x80b8e10) at
/usr/src/cvs/mozilla/mozilla/widget/src/gtk/nsAppShell.cpp:360
#7  0x40602f1e in nsAppShellService::Run (this=0x80b6800) at
/usr/src/cvs/mozilla/mozilla/xpfe/appshell/src/nsAppShellService.cpp:407
#8  0x080572c6 in main1 (argc=1, argv=0xbffff40c, nativeApp=0x0) at
/usr/src/cvs/mozilla/mozilla/xpfe/bootstrap/nsAppRunner.cpp:1004
#9  0x080580e1 in main (argc=1, argv=0xbffff40c) at
/usr/src/cvs/mozilla/mozilla/xpfe/bootstrap/nsAppRunner.cpp:1298
#10 0x40389f5c in __libc_start_main () from /lib/libc.so.6

cc blizzard since this looks like gtk. This is proving very hard to reproduce,
more so than on an optimised build. Build pulled from the tip early this
morning, with options:
mk_add_options MOZ_OBJDIR=@TOPSRCDIR@/../obj-debug
ac_add_options --with-gtk
ac_add_options --with-extensions=cookie,wallet,inspector
and PSM built at make -f client.mk time.

System:
ii  libgtk1.2      1.2.9-2        The GIMP Toolkit set of widgets for X
ii  libglib1.2     1.2.8-helix1   The GLib library of C routines

Comment 9

18 years ago
This resembles what I see in bug 71833.
(Reporter)

Comment 10

18 years ago
-> Event Handling. CC Pavlov for GTK issues.

Blizzard: didn't you do some event handling fu here?
Component: Browser-General → Event Handling
(Assignee)

Comment 11

18 years ago
I did but I don't see that code in this trace at all.  In fact, try changing
your WM and see what happens because I don't see these crashes at all.

Comment 12

18 years ago
reassigning
Assignee: asa → joki
QA Contact: doronr → gerardok

Comment 13

18 years ago
I tested this without any window manager - still crashing.

Comment 14

18 years ago
I made two different builds: one without the patch for bug 67370 (interleaving
events and xlib events) and one with it.

Preliminary tests show:
  without the patch for 67370:  no crashes and no GTK/GDK asserts/errors
  with patch for 67370:         crashes and asserts/errors

Because this bug is sometimes hard to reproduce, I will run another round of
tests later. Now I'm going to get some sleep :)
(Assignee)

Comment 15

18 years ago
Oh, great.

Comment 16

18 years ago
Same results for the second test round.

Can somebody else confirm these results?
(Assignee)

Comment 17

18 years ago
OK, last night right before I fell asleep I thought of a possible race condition
in the event handling code that might very well cause this.  I'll have a patch
for testing in a few minutes.
(Assignee)

Comment 19

18 years ago
The race condition that might exist (and that this patch would fix) involves
event ordering.

Imagine a situation where you have just scheduled a window delete event because
of a mouse down.  That deletion doesn't happen immediately and there's still a
mouse up on the X queue for your window.  You get the mouse up event and enter
handle_gdk_event(), the event queue is processed, the window is destroyed, and
you continue to process the mouse up event for the window which has just been
destroyed.  "Boom."  There are no associated Gdk windows and no associated Gtk
widget which would cause this behaviour exactly.

Asko, can you test this patch please?  I still can't reproduce the bug but you
seem very adept at it.
Assignee: joki → blizzard

Comment 20

18 years ago
Thanks, I will test the patch but unfortunately those test builds will take some
time...

Comment 21

18 years ago
Okay, preliminary test:

With the new patch I cannot see any crashes or GTK/GDK asserts/errors. So the
patch seems to fix this bug.
(Assignee)

Comment 22

18 years ago
Asko, are you sure?  Can you run it for a long time just to make sure?  The
warnings didn't happen every time in my case.

Comment 23

18 years ago
Normally I can crash mozilla in 30-60 seconds with bugzilla if the build has this
bug. With your new patch installed I couldn't crash it or see any asserts/errors
relating to GTK/GDK in the preliminary test, which took 10-15 minutes.

But if you want I will continue testing it. When will you need the results?
(Assignee)

Comment 24

18 years ago
OK, I managed to reproduce this in a debugger without my patch.  The window has
obviously been destroyed out from under the event:


Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 1024 (LWP 30202)]
0x4070d17c in handle_gdk_event (event=0x817af68, data=0x0)
    at ../../../../mozilla/widget/src/gtk/nsGtkEventHandler.cpp:795
Current language:  auto; currently c++
(gdb) print object
$1 = (GtkObject *) 0x81216b8
(gdb) print *object
$2 = {klass = 0x690066, flags = 7536754, ref_count = 4391028, 
  object_data = 0x690068}
(gdb) print *event->any.window
$3 = (GdkWindow *) 0x8337db8
(gdb) print (GdkWindowPrivate *)event->any.window
$4 = {user_data = 0x81216b8}
(gdb) print *(GdkWindowPrivate *)event->any.window
$5 = (GdkWindowPrivate *) 0x8337db8
(gdb) print *(GdkWindowPrivate *)event->any.window
$6 = {window = {user_data = 0x81216b8}, parent = 0x8131050, 
  xwindow = 46137543, xdisplay = 0x8057d80, x = 0, y = 0, width = 600, 
  height = 563, resize_count = 1 '\001', window_type = 2 '\002', 
  ref_count = 2, destroyed = 1, mapped = 0, guffaw_gravity = 1, 
  extension_events = 0, filters = 0x0, colormap = 0x807dd68, children = 0x0}

I'm going to try it with the patch and see if I can reproduce it then.
Status: NEW → ASSIGNED

Comment 25

18 years ago
Argh, bugzilla ate my comments... Retrying.

Now I have tested this for over an hour.

During the test period my patched debug build didn't crash and I didn't see
the GTK/GDK asserts or errors. The only assertion I got was

###!!! ASSERTION: all buffered data should be gone: '!mBuffer', file
nsMultiMixedConv.cpp, line 332
###!!! Break: at file nsMultiMixedConv.cpp, line 332

but I got it also with the build which didn't have the patch for bug 67370.

So I am pretty sure your patch fixes this bug.
(Assignee)

Comment 26

18 years ago
In a build without the patch I could usually reload about 4 times before I would
crash.  I applied the patch and sat there and reloaded for about a minute
straight and never saw a warning or crash.  I removed the patch, rebuilt and
loaded about 5 times before it crashed.  I applied the patch, rebuilt and sat
there and reloaded over and over and didn't have any problems.  I would say that
the patch fixes this problem.
(Assignee)

Comment 27

18 years ago
I have an sr=shaver on this with a comment change.

Comment 28

18 years ago
The patch is good as gold. I crashed all the time over the past 5 days and had
reverted to NC4.75. The assertions and crash reported in bug 71833 are also gone
after applying this patch.
(Assignee)

Comment 29

18 years ago
*** Bug 71833 has been marked as a duplicate of this bug. ***
Lotsa newlines after label end: but it's your code -- r/sr=brendan@mozilla.org
on patch #2.

/be

Comment 32

18 years ago
Seems better with that comment.

I was calculating how long it would take before overflow when you posted it :-)
(Assignee)

Comment 33

18 years ago
Fix checked into the tip and the branch.
Status: ASSIGNED → RESOLVED
Last Resolved: 18 years ago
Resolution: --- → FIXED

Comment 34

18 years ago
Thanks.
(Assignee)

Comment 35

18 years ago
*** Bug 72499 has been marked as a duplicate of this bug. ***

Comment 36

18 years ago
QA contact updated
QA Contact: gerardok → madhur

Comment 37

18 years ago
Verified on Linux, build ID 20010716.
Status: RESOLVED → VERIFIED