Last Comment Bug 398499 - Crash in DispatchEventToHandlers(EventTargetRec*, OpaqueEventRef*, HandlerCallRec*
: Crash in DispatchEventToHandlers(EventTargetRec*, OpaqueEventRef*, HandlerCal...
Status: VERIFIED FIXED
[radar:5466547] 1.8.1.x-only crash
: crash, regression, verified1.8.1.12
Product: Core Graveyard
Classification: Graveyard
Component: Widget: Mac (show other bugs)
: 1.8 Branch
: x86 Mac OS X
: -- critical (vote)
: ---
Assigned To: Steven Michaud [:smichaud] (Retired)
:
Mentors:
http://en.wikipedia.org/w/index.php?t...
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2007-10-03 18:19 PDT by Marcia Knous [:marcia - use ni]
Modified: 2009-11-21 15:10 PST (History)
10 users (show)
dveditz: wanted1.8.1.x+
dveditz: wanted1.8.0.x-
See Also:
QA Whiteboard:
Iteration: ---
Points: ---


Attachments
Apple crash report (82.30 KB, application/rtf)
2007-10-03 18:19 PDT, Marcia Knous [:marcia - use ni]
no flags Details
Text version of crash (81.63 KB, text/plain)
2007-10-04 10:55 PDT, Marcia Knous [:marcia - use ni]
no flags Details
gdb stack on 10.4.10 (157.42 KB, text/plain)
2007-10-16 21:20 PDT, Jeremy Baron
no flags Details
CrashReporter log; 10.4.10 +rosetta (30.22 KB, text/plain)
2007-10-18 20:03 PDT, Jeremy Baron
no flags Details
Stack trace from build with debug symbols (71.81 KB, text/plain)
2007-10-23 10:00 PDT, Steven Michaud [:smichaud] (Retired)
no flags Details
Fix (1.29 KB, patch)
2007-10-26 09:39 PDT, Steven Michaud [:smichaud] (Retired)
mark: review+
roc: superreview+
dveditz: approval1.8.1.12+
Details | Diff | Splinter Review

Description Marcia Knous [:marcia - use ni] 2007-10-03 18:19:08 PDT
Created attachment 283490 [details]
Apple crash report

Seen while testing Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US; rv:1.8.1.8pre) Gecko/2007100307 BonEcho/2.0.0.8pre.

STR:
1. Load the URL. Crash.

Apple Crash report attached.
Comment 1 Marcia Knous [:marcia - use ni] 2007-10-03 18:21:25 PDT
Additional STR:

1. Change focus to another application, such as Thunderbird and put that window on top. Then switch back to FF and that should generate the crash.
Comment 2 Marcia Knous [:marcia - use ni] 2007-10-03 18:25:58 PDT
I checked the trunk and I do not get this crash when following the same steps. So far now this is 1.8 branch only.
Comment 3 Marcia Knous [:marcia - use ni] 2007-10-03 18:36:55 PDT
nominating, not sure how critical this is.
Comment 4 Olli Pettay [:smaug] (vacation Aug 25-28) 2007-10-04 06:32:52 PDT
Not DOM Events issue. Maybe Widget: Mac?
Comment 5 Marcia Knous [:marcia - use ni] 2007-10-04 10:55:24 PDT
Created attachment 283577 [details]
Text version of crash
Comment 6 Marcia Knous [:marcia - use ni] 2007-10-04 10:56:17 PDT
BTW thanks to Jeremy for pointing me to this bug, he should get the credit for finding it.
Comment 7 Jeremy Baron 2007-10-04 15:21:52 PDT
1.8 is widget:mac (Carbon) and trunk is Cocoa.  This crashes in a Carbon system library (which I'm guessing is not used from Cocoa apps) and requires interaction with another app so it makes sense that it crashes only 1.8 and makes me think it's triggered in or related to widget:mac. [mid-air with smaug]

(In reply to comment #1)
> Additional STR:
> 1. Change focus to another application, such as Thunderbird and put that window
> on top. Then switch back to FF and that should generate the crash.
No need to switch back.  AFAICT this also requires mouse activity over any part of the moz app which is covered by another app.

(In reply to comment #3)
> nominating, not sure how critical this is.
fwiw, I'm 98% sure that a stack in my local CrashReporter (apple) log from march was *not* on a MediaWiki site.  And that happens to be the first Firefox crash in the log with HIToolbox 1.4.9 (the previous crash was 1.4.8 and a different stack)  So this is probably not a MediaWiki specific issue.

The CrashReporter log from marcia has a truncated stack (as does my own on OS X 10.4.10).  I'll attach a my own gdb trace for 10.4.10.  I'll walk Marcia through getting a gdb trace if anyone wants it for Leopard.

(In reply to comment #6)
> BTW thanks to Jeremy for pointing me to this bug, he should get the credit for
> finding it.
btw, thanks to Marcia to for reproducing on Leopard! :)  It was hard to miss on branch when browsing MediaWiki diffs :-{

<rdar://problem/5466547>
Comment 8 Josh Aas 2007-10-15 19:13:38 PDT
Can somebody confirm that this still happens reliably on a fully software-updated Mac OS X 10.5 build 9a559? Does it happen reliably on this build? How likely are users to hit this bug?
Comment 9 Jeremy Baron 2007-10-16 21:16:29 PDT
(In reply to comment #8)
> Can somebody confirm that this still happens reliably on a fully
> software-updated Mac OS X 10.5 build 9a559? Does it happen reliably on this
> build? How likely are users to hit this bug?

Are you having trouble reproducing?

Marcia repro'd on that build and it was prolly up to date but I'll have to double check with her. (there's prolly enough info in the report to determine the answer if you knew enough about their build changes but I don't :P) See attachment 283577 [details] for a CrashReporter log on Leopard.  I see I forgot to attach my gdb trace (10.4.10) after my last comment; coming right up.

This reliably happens on 10.4.10 on many tens of Mediawiki generated pages (mostly but maybe not exclusively diff pages, I forget) and therefore will likely effect the majority of OS X users that are more than just casual editors on any Mediawiki wiki.  (Haven't tried to reduce to MW specific versions yet)

FWIW, I'm 95% certain a matching stack in my log from several months back is *not* from Mediawiki.

from a message I just sent to an apple guy that responded to the radar:
> btw, i'd love to get some numbers on how many people are actually crashing
> from this using statistics from talkback but I can't get it to run in rosetta
> (I even tried invoking oah/traslate directly and it segfaulted)  ISTR firefox
> once worked in rosetta on this machine and I hear it still works for other
> people so probably a local issue.  Later this week or next I'll try again to
> get ahold of a PPC tester / find a public PPC machine / try with rosetta on
> another box.  Also, I know CrashReporter generates lots of reports and it
> would be nice to get stats from those too.
Comment 10 Jeremy Baron 2007-10-16 21:20:39 PDT
Created attachment 285198 [details]
gdb stack on 10.4.10
Comment 11 Jeremy Baron 2007-10-18 20:03:51 PDT
Created attachment 285439 [details]
CrashReporter log; 10.4.10 +rosetta

CrashReporter log of crashing on the same page under Rosetta.  Note HIToolbox doesn't even show up in the stack.
Comment 12 Marcia Knous [:marcia - use ni] 2007-10-19 10:00:30 PDT
I just tested this again on the latest Leopard seed 9a559 on a PPC mac and I crash hitting that URL (it is a bit of a delayed crash). Talkback fires, but I am having trouble submitting the report due to a network error (perhaps a separate bug). I have an Apple report from this PPC crash which I can attach.
Comment 13 Steven Michaud [:smichaud] (Retired) 2007-10-22 12:22:38 PDT
I've reproduced this crash on OS X 10.4.10 (on a PPC Mac).  I'll test
later on Leopard, but it's already pretty clear that this is a bad
bug, and not Leopard-specific.

By the way, Jeremy, what your gdb stack (attachment 285198 [details], comment
#10) has at it's bottom is truly bizarre -- it looks like you've quit
Firefox!  Something similar (and equally bizarre) shows up in a stack
trace posted for a seemingly related bug (in 1.8-branch Firefox,
attachment 285380 [details], bug 400291).  I now suspect that the bottom parts
of both stacks are corrupt.

If possible, Jeremy, please do a debug build of 1.8-branch Firefox and
test with that -- because it will have debug symbols, gdb and
crashreporterd will be less likely to generate corrupt stacks from it.

(I'll do the same, of course ... but I probably won't have time until
tomorrow.)
Comment 14 Steven Michaud [:smichaud] (Retired) 2007-10-22 12:28:27 PDT
> Something similar (and equally bizarre) shows up in a stack trace
> posted for a seemingly related bug ...

I should have said "seemingly unrelated".  Aside from the (apparent)
stack-trace corruption, this bug doesn't seem related to bug 400291.
Comment 15 Steven Michaud [:smichaud] (Retired) 2007-10-22 12:42:19 PDT
I see this bug (on OS X 10.4.10) on all the Firefox 2.0.0.X versions
that I tested, but not on any of the Firefox 1.5.0.X versions.  So it
seems to have been introduced (a long time ago) on the 1.8 branch (as
opposed to the 1.8.0 branch).
Comment 16 Steven Michaud [:smichaud] (Retired) 2007-10-22 14:29:24 PDT
This bug is a regression that took place on the 1.8 branch between the
2006-05-30-07-mozilla1.8 nightly (which doesn't have the bug) and the
2006-05-31-08-mozilla1.8 nightly (which does).

This regression window almost certainly implicates the single patch
that was landed as a fix for bug 106695 and bug 339640:

http://bonsai.mozilla.org/cvsquery.cgi?treeid=default&module=all&branch=MOZILLA_1_8_BRANCH&branchtype=match&dir=&file=&filetype=match&who=&whotype=match&sortby=Date&hours=2&date=explicit&mindate=2006-05-30+12%3A34&maxdate=2006-05-30+12%3A34&cvsroot=%2Fcvsroot
Comment 17 Steven Michaud [:smichaud] (Retired) 2007-10-22 14:31:47 PDT
This bug isn't Leopard-specific.
Comment 18 Steven Michaud [:smichaud] (Retired) 2007-10-23 10:00:35 PDT
Created attachment 285889 [details]
Stack trace from build with debug symbols

Here's a stack trace made using a build with debug symbols (built from
current 1.8-branch source code).

It includes both a gdb trace of the top and bottom of the main thread
stack, and a crashreporterd trace of all the threads' stacks.
Comment 19 Steven Michaud [:smichaud] (Retired) 2007-10-26 09:27:50 PDT
> This regression window almost certainly implicates the single patch
> that was landed as a fix for bug 106695 and bug 339640:

I was wrong about this.  The actual culprit was the single patch
(landed slightly later) to fix bug 54488, bug 339370 and bug 339482:

http://bonsai.mozilla.org/cvsquery.cgi?treeid=default&module=all&branch=MOZILLA_1_8_BRANCH&branchtype=match&dir=&file=&filetype=match&who=&whotype=match&sortby=Date&hours=2&date=explicit&mindate=2006-05-30+12%3A43%3A00&maxdate=2006-05-30+12%3A43%3A00&cvsroot=%2Fcvsroot
Comment 20 Steven Michaud [:smichaud] (Retired) 2007-10-26 09:39:20 PDT
Created attachment 286306 [details] [diff] [review]
Fix

Here's a patch that fixes this bug.

Contrary to appearances, the bug isn't caused by an infinite recursion
... just by a very large non-infinite one :-)

This bug's URL has a very large number of nsMacControl objects on its
page.  For each of these, a handler for kEventWindowActivated and
kEventWindowDeactivated Carbon events is installed on the window to
which the control belongs.  Currently (before my patch), this handler
calls CallNextEventHandler() to invoke all previously installed
handlers.  But if there are n nsMacControl objects in a page/window,
using CallNextEventHandler() makes the handler recurse n-1 times.

This bug's URL has so many controls that this recursion (apparently)
causes a stack overflow.

Calling CallNextEventHandler() from this handler servers no purpose,
so I've eliminated the call.

Even with my patch, this bug's URL still redraws incredibly slowly,
and sometimes fails to redraw at all.  But that's another problem :-(
Comment 21 Steven Michaud [:smichaud] (Retired) 2007-10-26 09:40:48 PDT
Needless to say, this isn't an Apple bug.

Jeremy, please let them know that.
Comment 22 Mark Mentovai 2007-10-26 14:52:10 PDT
Comment on attachment 286306 [details] [diff] [review]
Fix

This is OK with me.  Make sure that you test that the sheet show/hide stuff in nsMacWindow still works in conjunction with controls, since I know that it uses {kEventClassWindow, kEventWindowActivated} too.
Comment 23 Steven Michaud [:smichaud] (Retired) 2007-10-26 15:05:50 PDT
> Make sure that you test that the sheet show/hide stuff in
> nsMacWindow still works in conjunction with controls, since I know
> that it uses {kEventClassWindow, kEventWindowActivated} too.

Please give me some pointers about how to do this -- examples of URLs
(or browser menus/functions) that use this stuff, and so forth.
Comment 24 Mark Mentovai 2007-10-26 15:09:08 PDT
It's probably enough if you bring up a Bugzilla page and then try to navigate through all of the prefs sheets in Google Toolbar.
Comment 25 Jeremy Baron 2007-10-27 12:19:43 PDT
(In reply to comment #21)
> Jeremy, please let them know that.
Done.
Comment 26 Steven Michaud [:smichaud] (Retired) 2007-10-29 08:06:52 PDT
> It's probably enough if you bring up a Bugzilla page and then try to
> navigate through all of the prefs sheets in Google Toolbar.

Thanks, Mark.

I did this (while occasionally switching to another browser window and
back, or to another app and back), and saw no problems.
Comment 27 Steven Michaud [:smichaud] (Retired) 2007-10-29 08:14:18 PDT
Comment on attachment 286306 [details] [diff] [review]
Fix

Roc, Jeremy has already pointed out the following error in my comment.
I'll fix it when I land the patch:

+  // actually "consume" an activate or deactive event here, there should be

should be

+  // actually "consume" an activate or deactivate event here, there should be
Comment 28 Jeremy Baron 2007-10-29 08:49:20 PDT
Comment on attachment 286306 [details] [diff] [review]
Fix

(In reply to comment #27)
As long as we're enumerating errata here, I'll add to the pile:
>+  // for each of them.  So CallNextEventHandler() recurses through this
>+  // handler once for each "additional" control, and if there are too many
>+  // controls a stack overflow can result (bmo bug 398499).  Since we never
This sentence could probably use an extra comma or to be split into 2 sentences.

>+  // no problem alwaye returning eventNotHandledErr.
s/alwaye/always/
Comment 29 Steven Michaud [:smichaud] (Retired) 2007-11-05 07:41:51 PST
Roc? :-)
Comment 30 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2007-11-05 13:35:24 PST
Comment on attachment 286306 [details] [diff] [review]
Fix

sorry, I lost track of this one somehow
Comment 31 Steven Michaud [:smichaud] (Retired) 2007-11-05 14:12:43 PST
Comment on attachment 286306 [details] [diff] [review]
Fix

This patch's risk is minimal (the call to CallNextEventHandler()
didn't serve any useful purpose).

It's benefits are perhaps moderate (very few web pages will have
enough controls to trigger the crash my patch fixes).  Or you could
argue that, since Wikipedia is an important website, the patch's
benefits are substantial.
Comment 32 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2007-11-05 14:30:58 PST
Personally I don't think we should take this on branch.
Comment 33 Steven Michaud [:smichaud] (Retired) 2007-11-05 14:34:38 PST
Oh?  Why not?

(The problem is branch-only, by the way.)
Comment 34 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2007-11-05 14:38:38 PST
Because we typically only take crash fixes on branch if they have security implications, and AFAIK this doesn't. We might make exceptions for crashes that are regressions from other branch changes, or for crashes that are very frequent, but this doesn't seem to meet those criteria either.
Comment 35 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2007-11-05 14:41:35 PST
> We might make exceptions for crashes that are regressions from other branch
> changes

What I mean specifically is that we might take a fix for a crash that was introduced since Firefox 2.0.0.0 shipped. But it seems this bug was in Firefox 2.0.0.0.
Comment 36 Steven Michaud [:smichaud] (Retired) 2007-11-05 14:47:57 PST
> Because we typically only take crash fixes on branch if they have
> security implications, and AFAIK this doesn't.

No, I agree this crash doesn't have security implications.

But this policy seems _way_ too restrictive ... at least in my
opinion.  Even if you sometimes make exceptions to it.

Though it'd be another matter if this bug didn't have a patch.

Given that a patch exists, I'm not sure it makes sense to
distinguish between "primordial" crash bugs and regressions.

(By the way, one reason this bug got a lot of attention is that at
first it seemed to be Leopard-only.)
Comment 37 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2007-11-05 14:59:03 PST
We've had some pretty bad regressions in the 2.0.0.x series. You could argue that we're not restrictive *enough*.

> Given that a patch exists, I'm not sure it makes sense to
> distinguish between "primordial" crash bugs and regressions.

It does make sense. Crashes that 2.0.0.x has always incurred get worked around by Web developers and users. New crashes are a lot more painful.
Comment 38 Steven Michaud [:smichaud] (Retired) 2007-11-05 15:01:48 PST
You also need to remember that this patch is close to zero risk --
basically a no-brainer.

But it's out of my hands ... do with it what you will :-)
Comment 39 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2007-11-05 15:02:07 PST
Now that we have a lot of test automation on trunk, the situation for the Firefox 3 branch should be a lot better, so we may want to relax our policy there. Or not. But anyway, this is not the right place for that discussion.
Comment 40 Steven Michaud [:smichaud] (Retired) 2007-11-05 15:13:14 PST
> We've had some pretty bad regressions in the 2.0.0.x series. You
> could argue that we're not restrictive *enough*.

Regressions caused by narrowly-scoped fixes for crash bugs?

I'd be surprised to hear there were more than one or two ... or even
that many.

I _do_ think your approach is correct for new features ... but I'd bet
that almost all of these regressions you talk about came from adding
new features.
Comment 41 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2007-11-05 15:20:05 PST
> Regressions caused by narrowly-scoped fixes for crash bugs?

Yes.

> I'd be surprised to hear there were more than one or two ...

Be surprised.

> I'd bet that almost all of these regressions you talk about came from adding
> new features.

None of the ones I'm aware of came from adding new features (not surprising, since we haven't added any new features in the 2.0.0.x series except, arguably, some Vista compatibility stuff).
Comment 42 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2007-11-05 15:23:35 PST
Note that in many of the 2.0.0.x releases we fixed > 50 security bugs each. It's not surprising that some of these fixes caused regressions, even though we each one as safe as we could.
Comment 43 Steven Michaud [:smichaud] (Retired) 2007-11-05 15:35:42 PST
> Note that in many of the 2.0.0.x releases we fixed > 50 security
> bugs each.

Yes, I can see that security bugs deserve priority.  And it's hard to
keep track of the implications of that many different changes.  And
they inevitably (and legitimately) draw attention away from other
bugs (even crash bugs).

And I grant that, even without the crash, 1.8-branch Firefox behaves
very badly on this bug's URL.
Comment 44 Daniel Veditz [:dveditz] 2007-11-07 15:17:20 PST
Comment on attachment 286306 [details] [diff] [review]
Fix

1.8.1.10 isn't the release to take this in (short, focused). We'll look later.
Comment 45 Daniel Veditz [:dveditz] 2007-12-17 15:13:17 PST
Comment on attachment 286306 [details] [diff] [review]
Fix

approved for 1.8.1.10, a=dveditz for release-drivers
Comment 46 Daniel Veditz [:dveditz] 2007-12-17 15:22:53 PST
I meant 1.8.1.12
Comment 47 Steven Michaud [:smichaud] (Retired) 2007-12-17 15:52:08 PST
Landed on 1.8 branch.
Comment 48 Samuel Sidler (old account; do not CC) 2007-12-17 16:07:19 PST
Steven, please use the fixed1.8.1.* keyword (12, in this case) after checking in on the branch.
Comment 49 Steven Michaud [:smichaud] (Retired) 2007-12-17 16:14:27 PST
I'll try to remember next time :-)
Comment 50 Al Billings [:abillings] 2008-01-22 17:06:44 PST
Verified in 1.8.1.12 with Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US; rv:1.8.1.12pre) Gecko/2008012208 BonEcho/2.0.0.12pre.

Note You need to log in before you can comment on or make changes to this bug.