Closed
Bug 373122
Opened 17 years ago
Closed 17 years ago
Crash and/or Cocoa exception with Flash, overflow: scroll, and display: -moz-box
Categories
(Core :: Widget: Cocoa, defect)
Tracking
()
RESOLVED
FIXED
mozilla1.9alpha8
People
(Reporter: jruderman, Assigned: smichaud)
References
Details
(4 keywords, Whiteboard: [sg:critical?] post 1.8-branch)
Attachments
(3 files, 2 obsolete files)
626 bytes,
text/html
|
Details | |
11.56 KB,
text/plain
|
Details | |
4.03 KB,
patch
|
jaas
:
review+
mikepinkerton
:
superreview+
|
Details | Diff | Splinter Review |
Steps to reproduce: 1. Load the testcase and let it reload a few times. (It will reload faster if it's local.) Result: firefox-bin[8985] *** NSRunLoop ignoring exception '*** -[NSCFArray objectAtIndex:]: index (6) beyond bounds (6)' that raised during posting of delayed perform with target 34b4a950 and selector 'runAppShell' (this should be considered a crash, see bug 340453) and Firefox stops painting. OR Firefox crashes with a sg:critical-looking stack.
Flags: blocking1.9?
Reporter | ||
Updated•17 years ago
|
Whiteboard: [sg:critical?]
Reporter | ||
Comment 1•17 years ago
|
||
Triggers the bug both opt and debug (Mac trunk).
Reporter | ||
Comment 2•17 years ago
|
||
Regressed between 2007-03-04_1651 and 2007-03-04_1750 (http://hourly-archive.localgho.st/ ftw). Regression from bug 363253? The patch for that bug had the checkin comment "Move scrollframe attribute-setting out of reflow to a post-reflow callback".
Blocks: 363253
Keywords: regression
Possibly. It looks to me like the problem is we get into a particular Mac widget setup and then, during Mac painting, a call to WillPaint flushes notifications which does "display:none" on the plugin, destroying its widget, and the Mac widget code gets all confused. I'm not sure what this has to do with bug 363253, but I can see how it could be a problem.
This whole WillPaint and notification-flushing mess is one of the things I really want to fix with the compositor...
Reporter | ||
Comment 5•17 years ago
|
||
Bug 373586 has the same regression range.
Reporter | ||
Comment 6•17 years ago
|
||
The fix for bug 373586 did not take care of this bug.
Comment 7•17 years ago
|
||
josh/colin, any thoughts on a fix? roc, are there bugs filed for the compositor work you mentioned in comment 4?
Comment 8•17 years ago
|
||
Breaking on -[NSException raise] should allow you to get a trace from where the exception is getting raised. Jesse said it best in the original report though: fixing bug 340453 should turn this into just a regular crash, with a nice stack trace.
Updated•17 years ago
|
Flags: blocking1.9? → blocking1.9+
Assignee | ||
Comment 9•17 years ago
|
||
Here's a trace of what seems to cause this bug: The browser modifies the window's view hierarchy during a call (to [NSView drawRect:], from the OS) to draw one of the views in that hierarchy. Specifically it removes a view at the same level in the hierarchy (one that "belongs" to the same superview, though not (in this case) the view currently being drawn). This changes the size of the relevant "subviews" array. But the method from which [NSView drawRect:] is being called ([NSView _recursiveDisplayRectIfNeededIgnoringOpacity]) isn't aware of this. So when it tries to fetch the next subview from the array (the subview after the one currently being drawn), this causes a "beyond bounds" error. (The "Posed ..." entries in this trace are artifacts of how I got this information -- I used poseAsClass to "hook" calls to [NSCFArray objectAtIndex:], [NSCFArray removeObjectAtIndex:] and [NSView _recursiveDisplayRectIfNeededIgnoringOpacity]. It was _not_ easy to figure out how to hook methods of the NSCFArray class. If anyone's interested in how I did this, I can attach a short howto.) In my next message I'll post a preliminary patch that seems to fix this problem. With luck it'll also fix bug 374260.
Comment 10•17 years ago
|
||
Er... if any ProcessReflowCommands() can trigger ProcessPendingRestyles() (as this stack shows), I think we have a problem. At the very least, all callers that expect to flush out already-ending layout updates but not style reresolves (PresShell::WillPaint is one of them) will be likely to crash... See also bug 375436. I guess one way to go is to say that any layout update might wipe out the frame tree (and run arbitrary script?) and make all consumers deal...
Assignee | ||
Comment 11•17 years ago
|
||
Here's a patch that (in my tests) fixes this bug. It stops the current view from being removed from its superview during nsChildView::TearDownView() -- instead it postpones this until the next time the OS runs the main run loop (on the main thread). I don't know that this is the only way to fix the problem, or even the best way. But it works, and I can see nothing wrong with it in principle.
Assignee | ||
Comment 12•17 years ago
|
||
> With luck it'll also fix bug 374260. It seems to (in a "plain" copy of Minefield, patched only with my "preliminary patch" (attachment 264783 [details] [diff] [review])). But my version of Minefield with all my debugging information still crashes (though in a different location), so I'll need to keep looking. I'll post further comments at bug 374260 (unless what I have to say will also be relevant here).
Assignee | ||
Comment 13•17 years ago
|
||
(In reply to comment #10) > I guess one way to go is to say that any layout update might wipe > out the frame tree (and run arbitrary script?) and make all > consumers deal... Are you saying that (as things now stand) a call to PresShell::WillPaint() might destroy the current view and the current window? :-)
Comment 14•17 years ago
|
||
I don't know what the ownership model for those is, assuming you mean NSViews and so forth. It can absolutely destroy nsIViews. And possibly nsGlobalWindows.
Assignee | ||
Comment 15•17 years ago
|
||
I meant both Gecko objects and their OS-specific counterparts (in this
case things like NSViews and NSWindows).
I assume that Gecko objects wrap their OS-specific counterparts, so
that the OS-specific objects would have the same ownership model
... but I don't really know.
> It can absolutely destroy nsIViews.
Can it destroy the "current" nsIView (the one the browser and/or the
OS is currently "working on")? (In this case, "working on" would be
"drawing" (via [NSView drawRect:]).) I suppose the answer is "yes".
Comment 16•17 years ago
|
||
There is no "current" nsIView at that point in the code...
Assignee | ||
Comment 17•17 years ago
|
||
> There is no "current" nsIView at that point in the code... OK, I misunderstood. I'm just talking about OS-specific objects (like NSViews and NSWindows). (By the way, if there's no "current" nsIView at that point in the code (of which my trace is a snapshot), I'd guess that my patch can deal with a call to nsChildView::TearDownView() on the "current" NSView (when an nsPaint event is dispatched as that NSView is being "drawn" in [NSView drawRect:]). Bug 374260 may be a case of this. I don't know what would happen if the "current" NSWindow got destroyed, though (the one to which the "current" NSView belongs).)
Assignee | ||
Comment 18•17 years ago
|
||
Here's a revised patch, hopefully final. I've changed it to accomodate bug 381087 (which I just spun off from from bug 374260). I've also added comments. My previous patch "worked" for bug 381087 (the crash-bug part of bug 374260) -- it stopped the crashes. But I discovered that it introduces problems of its own. Bug 381087 is caused by an NSWindow's contentView being destroyed before the NSWindow is. My first patch caused the contentView to be destroyed after the NSWindow ... but this could cause a double-free (since the call that destroys an NSWindow also destroys its contentView). Since an NSWindow "owns" its contentView (and destroys it when it gets destroyed), I no either release a ChildView that's a contentView or remove it from its superview (in nsChildView::TearDownView()).
Attachment #264783 -
Attachment is obsolete: true
Assignee | ||
Updated•17 years ago
|
Attachment #265268 -
Flags: review?(joshmoz)
Assignee | ||
Comment 19•17 years ago
|
||
Comment on attachment 265268 [details] [diff] [review] Revised patch (accomodates bug 381087) In my tests I'm now leaking ChildView objects that are contentViews. This wasn't happening before. I need to find out why.
Attachment #265268 -
Flags: review?(joshmoz)
Assignee | ||
Comment 20•17 years ago
|
||
Alright, _this_ patch should (I hope) be the final version. I wasn't counting retains and releases correctly :-(
Attachment #265268 -
Attachment is obsolete: true
Attachment #265296 -
Flags: review?(joshmoz)
Attachment #265296 -
Flags: review?(joshmoz) → review+
Attachment #265296 -
Flags: superreview?(mikepinkerton)
Comment 21•17 years ago
|
||
pink, can you sr?
Updated•17 years ago
|
Flags: wanted1.8.1.x-
Flags: wanted1.8.0.x-
Whiteboard: [sg:critical?] → [sg:critical?] post 1.8-branch
Comment 22•17 years ago
|
||
+ // Also calling [mView removeFromSuperviewWithoutNeedingDisplay] causes + // mView to be released again and dealloced, while remaining win's + // contentView. If that's the case, then we were always doing multiple releases, since this call was followed directly by a release. How did this ever work?
Assignee | ||
Comment 23•17 years ago
|
||
If you call [mView removeFromSuperviewWithoutNeedingDisplay] in addition to [mView release], mView will be dealloced immediately (though since the call to [mView release] was second, that's what triggered the dealloc). As far as I know, this didn't cause any problems unless mView was also its NSWindow's contentView -- since mView was dealloced after it had been removed from the NSView hierarchy. (Though, of course, removing an NSView from the NSView hierarchy during a call to [NSView drawRect:] caused other problems.)
Assignee | ||
Comment 24•17 years ago
|
||
Or maybe you're asking why [NSWindow dealloc] didn't get into trouble trying to release an invalid contentView ... which in any case is a valid question. I'll try to find out.
Assignee | ||
Comment 25•17 years ago
|
||
> I'll try to find out. The answer seems to be that [NSWindow dealloc] never touches the window's contentView directly. All NSWindow objects appear to have an NSNextStepFrame object (a subclass of NSView) as their top-level NSView -- above the contentView. I tested in Cocoa Firefox and Safari, and also found the following doc: http://www.cocoadev.com/index.pl?NSNextStepFrame The only NSView object touched directly by [NSWindow dealloc] is the NSWindow's NSNextStepFrame object -- [NSWindow dealloc] calls [NSView release] on it, from which (normally) [NSNextStepFrame dealloc] is called in turn. [NSNextStepFrame dealloc] releases all its subviews. If the NSWindow object's contentView is still part of its NSView hierarchy, it will get released (and normally also dealloced). If the NSWindow's contentView is no longer part of its NSView hierarchy, it will never get touched. But if the NSWindow's contentView was previously dealloced (without calling [NSWindow setContentView:nil]), it still stores a pointer to the (now deleted) contentView (in its _contentView variable). And anything that calls [NSWindow contentView] will get back the invalid pointer.
Comment 26•17 years ago
|
||
Comment on attachment 265296 [details] [diff] [review] Revised patch (count retains and releases correctly) sr=pink
Attachment #265296 -
Flags: superreview?(mikepinkerton) → superreview+
Comment 27•17 years ago
|
||
landed on trunk
Status: NEW → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
Updated•17 years ago
|
Assignee: joshmoz → smichaud
Depends on: 383821
Updated•17 years ago
|
Group: security
Flags: in-testsuite?
Comment 28•17 years ago
|
||
Of note, this seems to have started bug 385408 on OS X (Windows version of the same bug started sometime earlier). See bug 385408 comment 5.
Comment 29•15 years ago
|
||
crash test landed http://hg.mozilla.org/mozilla-central/rev/015bcccac065
Flags: in-testsuite? → in-testsuite+
You need to log in
before you can comment on or make changes to this bug.
Description
•