398499 - Crash in DispatchEventToHandlers(EventTargetRec*, OpaqueEventRef*, HandlerCallRec*

Reporter

Description

•

17 years ago

Attached file Apple crash report (obsolete) — Details

Seen while testing Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US; rv:1.8.1.8pre) Gecko/2007100307 BonEcho/2.0.0.8pre.

STR:
1. Load the URL. Crash.

Apple Crash report attached.

Marcia Knous [:marcia]

Reporter

Comment 1

•

17 years ago

Additional STR:

1. Change focus to another application, such as Thunderbird and put that window on top. Then switch back to FF and that should generate the crash.

URL: http://en.wikipedia.org/w/index.php?... → http://en.wikipedia.org/w/index.php?t...

Marcia Knous [:marcia]

Reporter

Comment 2

•

17 years ago

I checked the trunk and I do not get this crash when following the same steps. So far now this is 1.8 branch only.

Marcia Knous [:marcia]

Reporter

Comment 3

•

17 years ago

nominating, not sure how critical this is.

Severity: normal → critical

Flags: blocking1.8.1.8?

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 4

•

17 years ago

Not DOM Events issue. Maybe Widget: Mac?

Assignee: nobody → joshmoz

Component: DOM: Events → Widget: Mac

QA Contact: events → mac

Marcia Knous [:marcia]

Reporter

Comment 5

•

17 years ago

Attached file Text version of crash — Details

Attachment #283490 - Attachment is obsolete: true

Marcia Knous [:marcia]

Reporter

Comment 6

•

17 years ago

BTW thanks to Jeremy for pointing me to this bug, he should get the credit for finding it.

Daniel Veditz [:dveditz]

Updated

•

17 years ago

Flags: blocking1.8.1.8? → blocking1.8.1.9?

Jeremy Baron

Comment 7

•

17 years ago

1.8 is widget:mac (Carbon) and trunk is Cocoa.  This crashes in a Carbon system library (which I'm guessing is not used from Cocoa apps) and requires interaction with another app so it makes sense that it crashes only 1.8 and makes me think it's triggered in or related to widget:mac. [mid-air with smaug]

(In reply to comment #1)
> Additional STR:
> 1. Change focus to another application, such as Thunderbird and put that window
> on top. Then switch back to FF and that should generate the crash.
No need to switch back.  AFAICT this also requires mouse activity over any part of the moz app which is covered by another app.

(In reply to comment #3)
> nominating, not sure how critical this is.
fwiw, I'm 98% sure that a stack in my local CrashReporter (apple) log from march was *not* on a MediaWiki site.  And that happens to be the first Firefox crash in the log with HIToolbox 1.4.9 (the previous crash was 1.4.8 and a different stack)  So this is probably not a MediaWiki specific issue.

The CrashReporter log from marcia has a truncated stack (as does my own on OS X 10.4.10).  I'll attach a my own gdb trace for 10.4.10.  I'll walk Marcia through getting a gdb trace if anyone wants it for Leopard.

(In reply to comment #6)
> BTW thanks to Jeremy for pointing me to this bug, he should get the credit for
> finding it.
btw, thanks to Marcia to for reproducing on Leopard! :)  It was hard to miss on branch when browsing MediaWiki diffs :-{

<rdar://problem/5466547>

Whiteboard: [radar:5466547]

Josh Aas

Comment 8

•

17 years ago

Can somebody confirm that this still happens reliably on a fully software-updated Mac OS X 10.5 build 9a559? Does it happen reliably on this build? How likely are users to hit this bug?

Jeremy Baron

Comment 9

•

17 years ago

(In reply to comment #8)
> Can somebody confirm that this still happens reliably on a fully
> software-updated Mac OS X 10.5 build 9a559? Does it happen reliably on this
> build? How likely are users to hit this bug?

Are you having trouble reproducing?

Marcia repro'd on that build and it was prolly up to date but I'll have to double check with her. (there's prolly enough info in the report to determine the answer if you knew enough about their build changes but I don't :P) See attachment 283577 [details] for a CrashReporter log on Leopard.  I see I forgot to attach my gdb trace (10.4.10) after my last comment; coming right up.

This reliably happens on 10.4.10 on many tens of Mediawiki generated pages (mostly but maybe not exclusively diff pages, I forget) and therefore will likely effect the majority of OS X users that are more than just casual editors on any Mediawiki wiki.  (Haven't tried to reduce to MW specific versions yet)

FWIW, I'm 95% certain a matching stack in my log from several months back is *not* from Mediawiki.

from a message I just sent to an apple guy that responded to the radar:
> btw, i'd love to get some numbers on how many people are actually crashing
> from this using statistics from talkback but I can't get it to run in rosetta
> (I even tried invoking oah/traslate directly and it segfaulted)  ISTR firefox
> once worked in rosetta on this machine and I hear it still works for other
> people so probably a local issue.  Later this week or next I'll try again to
> get ahold of a PPC tester / find a public PPC machine / try with rosetta on
> another box.  Also, I know CrashReporter generates lots of reports and it
> would be nice to get stats from those too.

Jeremy Baron

Comment 10

•

17 years ago

Attached file gdb stack on 10.4.10 — Details

Jeremy Baron

Comment 11

•

17 years ago

Attached file CrashReporter log; 10.4.10 +rosetta — Details

CrashReporter log of crashing on the same page under Rosetta.  Note HIToolbox doesn't even show up in the stack.

Marcia Knous [:marcia]

Reporter

Comment 12

•

17 years ago

I just tested this again on the latest Leopard seed 9a559 on a PPC mac and I crash hitting that URL (it is a bit of a delayed crash). Talkback fires, but I am having trouble submitting the report due to a network error (perhaps a separate bug). I have an Apple report from this PPC crash which I can attach.

Josh Aas

Updated

•

17 years ago

Assignee: joshmoz → smichaud

Steven Michaud [:smichaud] (Retired)

Assignee

Comment 13

•

17 years ago

I've reproduced this crash on OS X 10.4.10 (on a PPC Mac).  I'll test
later on Leopard, but it's already pretty clear that this is a bad
bug, and not Leopard-specific.

By the way, Jeremy, what your gdb stack (attachment 285198 [details], comment
#10) has at it's bottom is truly bizarre -- it looks like you've quit
Firefox!  Something similar (and equally bizarre) shows up in a stack
trace posted for a seemingly related bug (in 1.8-branch Firefox,
attachment 285380 [details], bug 400291).  I now suspect that the bottom parts
of both stacks are corrupt.

If possible, Jeremy, please do a debug build of 1.8-branch Firefox and
test with that -- because it will have debug symbols, gdb and
crashreporterd will be less likely to generate corrupt stacks from it.

(I'll do the same, of course ... but I probably won't have time until
tomorrow.)

Steven Michaud [:smichaud] (Retired)

Assignee

Comment 14

•

17 years ago

> Something similar (and equally bizarre) shows up in a stack trace
> posted for a seemingly related bug ...

I should have said "seemingly unrelated".  Aside from the (apparent)
stack-trace corruption, this bug doesn't seem related to bug 400291.

Steven Michaud [:smichaud] (Retired)

Assignee

Comment 15

•

17 years ago

I see this bug (on OS X 10.4.10) on all the Firefox 2.0.0.X versions
that I tested, but not on any of the Firefox 1.5.0.X versions.  So it
seems to have been introduced (a long time ago) on the 1.8 branch (as
opposed to the 1.8.0 branch).

Steven Michaud [:smichaud] (Retired)

Assignee

Comment 16

•

17 years ago

This bug is a regression that took place on the 1.8 branch between the
2006-05-30-07-mozilla1.8 nightly (which doesn't have the bug) and the
2006-05-31-08-mozilla1.8 nightly (which does).

This regression window almost certainly implicates the single patch
that was landed as a fix for bug 106695 and bug 339640:

http://bonsai.mozilla.org/cvsquery.cgi?treeid=default&module=all&branch=MOZILLA_1_8_BRANCH&branchtype=match&dir=&file=&filetype=match&who=&whotype=match&sortby=Date&hours=2&date=explicit&mindate=2006-05-30+12%3A34&maxdate=2006-05-30+12%3A34&cvsroot=%2Fcvsroot

Keywords: regression

Steven Michaud [:smichaud] (Retired)

Assignee

Comment 17

•

17 years ago

This bug isn't Leopard-specific.

Summary: [10.5] Crash in DispatchEventToHandlers(EventTargetRec*, OpaqueEventRef*, HandlerCallRec* → Crash in DispatchEventToHandlers(EventTargetRec*, OpaqueEventRef*, HandlerCallRec*

Steven Michaud [:smichaud] (Retired)

Assignee

Comment 18

•

17 years ago

Attached file Stack trace from build with debug symbols — Details

Here's a stack trace made using a build with debug symbols (built from
current 1.8-branch source code).

It includes both a gdb trace of the top and bottom of the main thread
stack, and a crashreporterd trace of all the threads' stacks.

Steven Michaud [:smichaud] (Retired)

Assignee

Comment 19

•

17 years ago

> This regression window almost certainly implicates the single patch
> that was landed as a fix for bug 106695 and bug 339640:

I was wrong about this.  The actual culprit was the single patch
(landed slightly later) to fix bug 54488, bug 339370 and bug 339482:

http://bonsai.mozilla.org/cvsquery.cgi?treeid=default&module=all&branch=MOZILLA_1_8_BRANCH&branchtype=match&dir=&file=&filetype=match&who=&whotype=match&sortby=Date&hours=2&date=explicit&mindate=2006-05-30+12%3A43%3A00&maxdate=2006-05-30+12%3A43%3A00&cvsroot=%2Fcvsroot

Steven Michaud [:smichaud] (Retired)

Assignee

Comment 20

•

17 years ago

Attached patch Fix — Details — Splinter Review

Here's a patch that fixes this bug.

Contrary to appearances, the bug isn't caused by an infinite recursion
... just by a very large non-infinite one :-)

This bug's URL has a very large number of nsMacControl objects on its
page.  For each of these, a handler for kEventWindowActivated and
kEventWindowDeactivated Carbon events is installed on the window to
which the control belongs.  Currently (before my patch), this handler
calls CallNextEventHandler() to invoke all previously installed
handlers.  But if there are n nsMacControl objects in a page/window,
using CallNextEventHandler() makes the handler recurse n-1 times.

This bug's URL has so many controls that this recursion (apparently)
causes a stack overflow.

Calling CallNextEventHandler() from this handler servers no purpose,
so I've eliminated the call.

Even with my patch, this bug's URL still redraws incredibly slowly,
and sometimes fails to redraw at all.  But that's another problem :-(

Attachment #286306 - Flags: review?(joshmoz)

Steven Michaud [:smichaud] (Retired)

Assignee

Comment 21

•

17 years ago

Needless to say, this isn't an Apple bug.

Jeremy, please let them know that.

Josh Aas

Updated

•

17 years ago

Attachment #286306 - Flags: review?(joshmoz) → review?(mark)

Mark Mentovai

Comment 22

•

17 years ago

Comment on attachment 286306 [details] [diff] [review]
Fix

This is OK with me.  Make sure that you test that the sheet show/hide stuff in nsMacWindow still works in conjunction with controls, since I know that it uses {kEventClassWindow, kEventWindowActivated} too.

Attachment #286306 - Flags: review?(mark) → review+

Steven Michaud [:smichaud] (Retired)

Assignee

Comment 23

•

17 years ago

> Make sure that you test that the sheet show/hide stuff in
> nsMacWindow still works in conjunction with controls, since I know
> that it uses {kEventClassWindow, kEventWindowActivated} too.

Please give me some pointers about how to do this -- examples of URLs
(or browser menus/functions) that use this stuff, and so forth.

Mark Mentovai

Comment 24

•

17 years ago

It's probably enough if you bring up a Bugzilla page and then try to navigate through all of the prefs sheets in Google Toolbar.

Jeremy Baron

Comment 25

•

17 years ago

(In reply to comment #21)
> Jeremy, please let them know that.
Done.

Steven Michaud [:smichaud] (Retired)

Assignee

Comment 26

•

17 years ago

> It's probably enough if you bring up a Bugzilla page and then try to
> navigate through all of the prefs sheets in Google Toolbar.

Thanks, Mark.

I did this (while occasionally switching to another browser window and
back, or to another app and back), and saw no problems.

Steven Michaud [:smichaud] (Retired)

Assignee

Comment 27

•

17 years ago

Comment on attachment 286306 [details] [diff] [review]
Fix

Roc, Jeremy has already pointed out the following error in my comment.
I'll fix it when I land the patch:

+  // actually "consume" an activate or deactive event here, there should be

should be

+  // actually "consume" an activate or deactivate event here, there should be

Attachment #286306 - Flags: superreview?(roc)

Jeremy Baron

Comment 28

•

17 years ago

Comment on attachment 286306 [details] [diff] [review]
Fix

(In reply to comment #27)
As long as we're enumerating errata here, I'll add to the pile:
>+  // for each of them.  So CallNextEventHandler() recurses through this
>+  // handler once for each "additional" control, and if there are too many
>+  // controls a stack overflow can result (bmo bug 398499).  Since we never
This sentence could probably use an extra comma or to be split into 2 sentences.

>+  // no problem alwaye returning eventNotHandledErr.
s/alwaye/always/

Steven Michaud [:smichaud] (Retired)

Assignee

Comment 29

•

17 years ago

Roc? :-)

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 30

•

17 years ago

Comment on attachment 286306 [details] [diff] [review]
Fix

sorry, I lost track of this one somehow

Attachment #286306 - Flags: superreview?(roc) → superreview+

Steven Michaud [:smichaud] (Retired)

Assignee

Comment 31

•

17 years ago

Comment on attachment 286306 [details] [diff] [review]
Fix

This patch's risk is minimal (the call to CallNextEventHandler()
didn't serve any useful purpose).

It's benefits are perhaps moderate (very few web pages will have
enough controls to trigger the crash my patch fixes).  Or you could
argue that, since Wikipedia is an important website, the patch's
benefits are substantial.

Attachment #286306 - Flags: approval1.8.1.10?

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 32

•

17 years ago

Personally I don't think we should take this on branch.

Steven Michaud [:smichaud] (Retired)

Assignee

Comment 33

•

17 years ago

Oh?  Why not?

(The problem is branch-only, by the way.)

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 34

•

17 years ago

Because we typically only take crash fixes on branch if they have security implications, and AFAIK this doesn't. We might make exceptions for crashes that are regressions from other branch changes, or for crashes that are very frequent, but this doesn't seem to meet those criteria either.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 35

•

17 years ago

> We might make exceptions for crashes that are regressions from other branch
> changes

What I mean specifically is that we might take a fix for a crash that was introduced since Firefox 2.0.0.0 shipped. But it seems this bug was in Firefox 2.0.0.0.

Steven Michaud [:smichaud] (Retired)

Assignee

Comment 36

•

17 years ago

> Because we typically only take crash fixes on branch if they have
> security implications, and AFAIK this doesn't.

No, I agree this crash doesn't have security implications.

But this policy seems _way_ too restrictive ... at least in my
opinion.  Even if you sometimes make exceptions to it.

Though it'd be another matter if this bug didn't have a patch.

Given that a patch exists, I'm not sure it makes sense to
distinguish between "primordial" crash bugs and regressions.

(By the way, one reason this bug got a lot of attention is that at
first it seemed to be Leopard-only.)

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 37

•

17 years ago

We've had some pretty bad regressions in the 2.0.0.x series. You could argue that we're not restrictive *enough*.

> Given that a patch exists, I'm not sure it makes sense to
> distinguish between "primordial" crash bugs and regressions.

It does make sense. Crashes that 2.0.0.x has always incurred get worked around by Web developers and users. New crashes are a lot more painful.

Steven Michaud [:smichaud] (Retired)

Assignee

Comment 38

•

17 years ago

You also need to remember that this patch is close to zero risk --
basically a no-brainer.

But it's out of my hands ... do with it what you will :-)

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 39

•

17 years ago

Now that we have a lot of test automation on trunk, the situation for the Firefox 3 branch should be a lot better, so we may want to relax our policy there. Or not. But anyway, this is not the right place for that discussion.

Steven Michaud [:smichaud] (Retired)

Assignee

Comment 40

•

17 years ago

> We've had some pretty bad regressions in the 2.0.0.x series. You
> could argue that we're not restrictive *enough*.

Regressions caused by narrowly-scoped fixes for crash bugs?

I'd be surprised to hear there were more than one or two ... or even
that many.

I _do_ think your approach is correct for new features ... but I'd bet
that almost all of these regressions you talk about came from adding
new features.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 41

•

17 years ago

> Regressions caused by narrowly-scoped fixes for crash bugs?

Yes.

> I'd be surprised to hear there were more than one or two ...

Be surprised.

> I'd bet that almost all of these regressions you talk about came from adding
> new features.

None of the ones I'm aware of came from adding new features (not surprising, since we haven't added any new features in the 2.0.0.x series except, arguably, some Vista compatibility stuff).

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 42

•

17 years ago

Note that in many of the 2.0.0.x releases we fixed > 50 security bugs each. It's not surprising that some of these fixes caused regressions, even though we each one as safe as we could.

Steven Michaud [:smichaud] (Retired)

Assignee

Comment 43

•

17 years ago

> Note that in many of the 2.0.0.x releases we fixed > 50 security
> bugs each.

Yes, I can see that security bugs deserve priority.  And it's hard to
keep track of the implications of that many different changes.  And
they inevitably (and legitimately) draw attention away from other
bugs (even crash bugs).

And I grant that, even without the crash, 1.8-branch Firefox behaves
very badly on this bug's URL.

Daniel Veditz [:dveditz]

Comment 44

•

17 years ago

Comment on attachment 286306 [details] [diff] [review]
Fix

1.8.1.10 isn't the release to take this in (short, focused). We'll look later.

Attachment #286306 - Flags: approval1.8.1.10? → approval1.8.1.11?

Daniel Veditz [:dveditz]

Updated

•

17 years ago

Flags: wanted1.8.1.x+

Flags: wanted1.8.0.x-

Whiteboard: [radar:5466547] → [radar:5466547] 1.8.1.x-only crash

Daniel Veditz [:dveditz]

Comment 45

•

17 years ago

Comment on attachment 286306 [details] [diff] [review]
Fix

approved for 1.8.1.10, a=dveditz for release-drivers

Attachment #286306 - Flags: approval1.8.1.12? → approval1.8.1.12+

Daniel Veditz [:dveditz]

Comment 46

•

17 years ago

I meant 1.8.1.12

Steven Michaud [:smichaud] (Retired)

Assignee

Comment 47

•

17 years ago

Landed on 1.8 branch.

Status: NEW → RESOLVED

Closed: 17 years ago

Resolution: --- → FIXED

Samuel Sidler (old account; do not CC)

Comment 48

•

17 years ago

Steven, please use the fixed1.8.1.* keyword (12, in this case) after checking in on the branch.

Keywords: fixed1.8.1.12

Steven Michaud [:smichaud] (Retired)

Assignee

Comment 49

•

17 years ago

I'll try to remember next time :-)

Al Billings [:abillings - ex-MoCo]

Comment 50

•

17 years ago

Verified in 1.8.1.12 with Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US; rv:1.8.1.12pre) Gecko/2008012208 BonEcho/2.0.0.12pre.

Status: RESOLVED → VERIFIED

Keywords: fixed1.8.1.12 → verified1.8.1.12

Samuel Sidler (old account; do not CC)

Updated

•

17 years ago

Flags: blocking1.8.1.12?

timeless

Updated

•

15 years ago

Product: Core → Core Graveyard

Apple crash report 17 years ago Marcia Knous [:marcia] 82.30 KB, application/rtf		Details
Text version of crash 17 years ago Marcia Knous [:marcia] 81.63 KB, text/plain		Details
gdb stack on 10.4.10 17 years ago Jeremy Baron 157.42 KB, text/plain		Details
CrashReporter log; 10.4.10 +rosetta 17 years ago Jeremy Baron 30.22 KB, text/plain		Details
Stack trace from build with debug symbols 17 years ago Steven Michaud [:smichaud] (Retired) 71.81 KB, text/plain		Details
Fix 17 years ago Steven Michaud [:smichaud] (Retired) 1.29 KB, patch	mark : review+ roc : superreview+ dveditz : approval1.8.1.12+	Details \| Diff \| Splinter Review