Closed Bug 373122 Opened 13 years ago Closed 13 years ago
Crash and/or Cocoa exception with Flash, overflow: scroll, and display: -moz-box
Steps to reproduce: 1. Load the testcase and let it reload a few times. (It will reload faster if it's local.) Result: firefox-bin *** NSRunLoop ignoring exception '*** -[NSCFArray objectAtIndex:]: index (6) beyond bounds (6)' that raised during posting of delayed perform with target 34b4a950 and selector 'runAppShell' (this should be considered a crash, see bug 340453) and Firefox stops painting. OR Firefox crashes with a sg:critical-looking stack.
Triggers the bug both opt and debug (Mac trunk).
Regressed between 2007-03-04_1651 and 2007-03-04_1750 (http://hourly-archive.localgho.st/ ftw). Regression from bug 363253? The patch for that bug had the checkin comment "Move scrollframe attribute-setting out of reflow to a post-reflow callback".
Possibly. It looks to me like the problem is we get into a particular Mac widget setup and then, during Mac painting, a call to WillPaint flushes notifications which does "display:none" on the plugin, destroying its widget, and the Mac widget code gets all confused. I'm not sure what this has to do with bug 363253, but I can see how it could be a problem.
This whole WillPaint and notification-flushing mess is one of the things I really want to fix with the compositor...
Bug 373586 has the same regression range.
The fix for bug 373586 did not take care of this bug.
josh/colin, any thoughts on a fix? roc, are there bugs filed for the compositor work you mentioned in comment 4?
Breaking on -[NSException raise] should allow you to get a trace from where the exception is getting raised. Jesse said it best in the original report though: fixing bug 340453 should turn this into just a regular crash, with a nice stack trace.
Here's a trace of what seems to cause this bug: The browser modifies the window's view hierarchy during a call (to [NSView drawRect:], from the OS) to draw one of the views in that hierarchy. Specifically it removes a view at the same level in the hierarchy (one that "belongs" to the same superview, though not (in this case) the view currently being drawn). This changes the size of the relevant "subviews" array. But the method from which [NSView drawRect:] is being called ([NSView _recursiveDisplayRectIfNeededIgnoringOpacity]) isn't aware of this. So when it tries to fetch the next subview from the array (the subview after the one currently being drawn), this causes a "beyond bounds" error. (The "Posed ..." entries in this trace are artifacts of how I got this information -- I used poseAsClass to "hook" calls to [NSCFArray objectAtIndex:], [NSCFArray removeObjectAtIndex:] and [NSView _recursiveDisplayRectIfNeededIgnoringOpacity]. It was _not_ easy to figure out how to hook methods of the NSCFArray class. If anyone's interested in how I did this, I can attach a short howto.) In my next message I'll post a preliminary patch that seems to fix this problem. With luck it'll also fix bug 374260.
Er... if any ProcessReflowCommands() can trigger ProcessPendingRestyles() (as this stack shows), I think we have a problem. At the very least, all callers that expect to flush out already-ending layout updates but not style reresolves (PresShell::WillPaint is one of them) will be likely to crash... See also bug 375436. I guess one way to go is to say that any layout update might wipe out the frame tree (and run arbitrary script?) and make all consumers deal...
Here's a patch that (in my tests) fixes this bug. It stops the current view from being removed from its superview during nsChildView::TearDownView() -- instead it postpones this until the next time the OS runs the main run loop (on the main thread). I don't know that this is the only way to fix the problem, or even the best way. But it works, and I can see nothing wrong with it in principle.
> With luck it'll also fix bug 374260. It seems to (in a "plain" copy of Minefield, patched only with my "preliminary patch" (attachment 264783 [details] [diff] [review])). But my version of Minefield with all my debugging information still crashes (though in a different location), so I'll need to keep looking. I'll post further comments at bug 374260 (unless what I have to say will also be relevant here).
(In reply to comment #10) > I guess one way to go is to say that any layout update might wipe > out the frame tree (and run arbitrary script?) and make all > consumers deal... Are you saying that (as things now stand) a call to PresShell::WillPaint() might destroy the current view and the current window? :-)
I don't know what the ownership model for those is, assuming you mean NSViews and so forth. It can absolutely destroy nsIViews. And possibly nsGlobalWindows.
I meant both Gecko objects and their OS-specific counterparts (in this case things like NSViews and NSWindows). I assume that Gecko objects wrap their OS-specific counterparts, so that the OS-specific objects would have the same ownership model ... but I don't really know. > It can absolutely destroy nsIViews. Can it destroy the "current" nsIView (the one the browser and/or the OS is currently "working on")? (In this case, "working on" would be "drawing" (via [NSView drawRect:]).) I suppose the answer is "yes".
There is no "current" nsIView at that point in the code...
> There is no "current" nsIView at that point in the code... OK, I misunderstood. I'm just talking about OS-specific objects (like NSViews and NSWindows). (By the way, if there's no "current" nsIView at that point in the code (of which my trace is a snapshot), I'd guess that my patch can deal with a call to nsChildView::TearDownView() on the "current" NSView (when an nsPaint event is dispatched as that NSView is being "drawn" in [NSView drawRect:]). Bug 374260 may be a case of this. I don't know what would happen if the "current" NSWindow got destroyed, though (the one to which the "current" NSView belongs).)
Here's a revised patch, hopefully final. I've changed it to accomodate bug 381087 (which I just spun off from from bug 374260). I've also added comments. My previous patch "worked" for bug 381087 (the crash-bug part of bug 374260) -- it stopped the crashes. But I discovered that it introduces problems of its own. Bug 381087 is caused by an NSWindow's contentView being destroyed before the NSWindow is. My first patch caused the contentView to be destroyed after the NSWindow ... but this could cause a double-free (since the call that destroys an NSWindow also destroys its contentView). Since an NSWindow "owns" its contentView (and destroys it when it gets destroyed), I no either release a ChildView that's a contentView or remove it from its superview (in nsChildView::TearDownView()).
Attachment #264783 - Attachment is obsolete: true
Comment on attachment 265268 [details] [diff] [review] Revised patch (accomodates bug 381087) In my tests I'm now leaking ChildView objects that are contentViews. This wasn't happening before. I need to find out why.
Alright, _this_ patch should (I hope) be the final version. I wasn't counting retains and releases correctly :-(
pink, can you sr?
Whiteboard: [sg:critical?] → [sg:critical?] post 1.8-branch
+ // Also calling [mView removeFromSuperviewWithoutNeedingDisplay] causes + // mView to be released again and dealloced, while remaining win's + // contentView. If that's the case, then we were always doing multiple releases, since this call was followed directly by a release. How did this ever work?
If you call [mView removeFromSuperviewWithoutNeedingDisplay] in addition to [mView release], mView will be dealloced immediately (though since the call to [mView release] was second, that's what triggered the dealloc). As far as I know, this didn't cause any problems unless mView was also its NSWindow's contentView -- since mView was dealloced after it had been removed from the NSView hierarchy. (Though, of course, removing an NSView from the NSView hierarchy during a call to [NSView drawRect:] caused other problems.)
Or maybe you're asking why [NSWindow dealloc] didn't get into trouble trying to release an invalid contentView ... which in any case is a valid question. I'll try to find out.
> I'll try to find out. The answer seems to be that [NSWindow dealloc] never touches the window's contentView directly. All NSWindow objects appear to have an NSNextStepFrame object (a subclass of NSView) as their top-level NSView -- above the contentView. I tested in Cocoa Firefox and Safari, and also found the following doc: http://www.cocoadev.com/index.pl?NSNextStepFrame The only NSView object touched directly by [NSWindow dealloc] is the NSWindow's NSNextStepFrame object -- [NSWindow dealloc] calls [NSView release] on it, from which (normally) [NSNextStepFrame dealloc] is called in turn. [NSNextStepFrame dealloc] releases all its subviews. If the NSWindow object's contentView is still part of its NSView hierarchy, it will get released (and normally also dealloced). If the NSWindow's contentView is no longer part of its NSView hierarchy, it will never get touched. But if the NSWindow's contentView was previously dealloced (without calling [NSWindow setContentView:nil]), it still stores a pointer to the (now deleted) contentView (in its _contentView variable). And anything that calls [NSWindow contentView] will get back the invalid pointer.
Comment on attachment 265296 [details] [diff] [review] Revised patch (count retains and releases correctly) sr=pink
Attachment #265296 - Flags: superreview?(mikepinkerton) → superreview+
landed on trunk
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Depends on: 383821
Of note, this seems to have started bug 385408 on OS X (Windows version of the same bug started sometime earlier). See bug 385408 comment 5.
crash test landed http://hg.mozilla.org/mozilla-central/rev/015bcccac065
Flags: in-testsuite? → in-testsuite+
You need to log in before you can comment on or make changes to this bug.