"Assertion failure in -[NSNextStepFrame lockFocus]" (NSInternalInconsistencyException)

RESOLVED FIXED

Status

()

Core
Widget: Cocoa
--
critical
RESOLVED FIXED
8 years ago
7 years ago

People

(Reporter: Jesse Ruderman, Assigned: smichaud)

Tracking

(Blocks: 1 bug, {assertion, crash, testcase})

Trunk
x86
Mac OS X
assertion, crash, testcase
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(blocking2.0 -, status1.9.2 .17-fixed, status1.9.1 unaffected)

Details

(Whiteboard: [sg:dos])

Attachments

(3 attachments, 4 obsolete attachments)

(Reporter)

Description

8 years ago
Created attachment 430506 [details]
testcase (triggers the bug 30% of the time; requires focus)

2010-03-04 19:51:24.453 firefox-bin[35063:613] *** Assertion failure in -[NSNextStepFrame lockFocus], /SourceCache/AppKit/AppKit-1038.25/AppKit.subproj/NSView.m:5237
2010-03-04 19:51:24.456 firefox-bin[35063:613] Mozilla has caught an Obj-C exception [NSInternalInconsistencyException: -[NSNextStepFrame(0x1ec5d9b0) lockFocus] failed with window=0x1ec5d6c0, windowNumber=1220, [self isHiddenOrHasHiddenAncestor]=0]
2010-03-04 19:51:24.457 firefox-bin[35063:613] Generating stack trace for Obj-C exception...
2010-03-04 19:51:27.970 firefox-bin[35063:613] Stack trace:
Looking up symbols in process 35063 named:  firefox-bin
__raiseError (in CoreFoundation) + 381
objc_exception_throw (in libobjc.A.dylib) + 56
+[NSException raise:format:arguments:] (in CoreFoundation) + 136
-[NSAssertionHandler handleFailureInMethod:object:file:lineNumber:description:] (in Foundation) + 116
-[NSView lockFocus] (in AppKit) + 280
-[NSView _displayRectIgnoringOpacity:isVisibleRect:rectIsVisibleRectForView:] (in AppKit) + 2917
-[NSView displayIfNeeded] (in AppKit) + 818
-[NSNextStepFrame displayIfNeeded] (in AppKit) + 98
-[NSWindow displayIfNeeded] (in AppKit) + 204
_handleWindowNeedsDisplay (in AppKit) + 696
__CFRunLoopDoObservers (in CoreFoundation) + 1186
__CFRunLoopRun (in CoreFoundation) + 557
CFRunLoopRunSpecific (in CoreFoundation) + 452
CFRunLoopRunInMode (in CoreFoundation) + 97
RunCurrentEventLoopInMode (in HIToolbox) + 392
ReceiveNextEventCommon (in HIToolbox) + 158
BlockUntilNextEventMatchingListInMode (in HIToolbox) + 81
_DPSNextEvent (in AppKit) + 847
-[NSApplication nextEventMatchingMask:untilDate:inMode:dequeue:] (in AppKit) + 156
-[NSApplication run] (in AppKit) + 821
nsAppShell::Run() (in XUL) (nsAppShell.mm:863)
nsAppStartup::Run() (in XUL) (nsAppStartup.cpp:183)
XRE_main (in XUL) (nsAppRunner.cpp:3494)
main (in firefox-bin) (nsBrowserApp.cpp:158)
start (in firefox-bin) + 54

Comment 1

8 years ago
In case it's helpful, I have a site that crashes Firefox with this same error with 100% reliability.  

My config: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3

Site:  http://www.geniustools.net/Products/search.aspx

If you just click on the pulldown menu under the text "WRENCHES & TORQUE WRENCHES" (about 2/3 down in left column), Firefox will crash immediately and ignominiously.
(Reporter)

Updated

8 years ago
blocking2.0: --- → ?
Keywords: crash
(In reply to comment #1)
> In case it's helpful, I have a site that crashes Firefox with this same error
> with 100% reliability.  

Somewhat, but not entirely, unsurprisingly, it doesn't crash or hit an exception in (1.9.2-based) Camino, presumably since we use our own <select> implementation.

However, that URL does also crash Safari 4.0.5 on 10.5.8, but at least Safari is able to generate a crash report! (no exceptions with Safari, though).
(Assignee)

Comment 3

8 years ago
This could be an Apple bug.  Sounds like the problem started with Leopard.

http://www.cocoabuilder.com/archive/cocoa/206587-assertion-failure-in-nsnextstepframe-lockfocus.html
http://lists.apple.com/archives/Cocoa-dev/2007/Oct/msg01672.html

Whatever it turns out to be, though, I'll probably need to figure out a way to work around it.

Comment 4

8 years ago
Blocking 1.9.3+ for now.
Assignee: nobody → smichaud
blocking2.0: ? → final+
http://tinderbox.mozilla.org/showlog.cgi?log=Electrolysis/1282056626.1282057018.14741.gz
Rev3 MacOSX Leopard 10.5.8 electrolysis opt test mochitests-5/5 on 2010/08/17 07:50:26
s: talos-r3-leopard-006

10164 INFO TEST-START | /tests/toolkit/content/tests/widgets/test_popupincontent.xul
2010-08-17 07:56:34.462 firefox-bin[343:10b] *** Assertion failure in -[NSNextStepFrame lockFocus], /SourceCache/AppKit/AppKit-949.54/AppKit.subproj/NSView.m:4755
2010-08-17 07:56:34.462 firefox-bin[343:10b] Mozilla has caught an Obj-C exception [NSInternalInconsistencyException: -[NSNextStepFrame(0x24ec6c10) lockFocus] failed with window=0x243028c0, windowNumber=793, [self isHiddenOrHasHiddenAncestor]=0]
INFO | automation.py | Application ran for: 0:03:11.787227
Blocks: 438871
Whiteboard: [orange]
(Reporter)

Updated

8 years ago
Blocks: 599143
(Reporter)

Comment 7

8 years ago
Orange is bug 599143. I'm not sure it's the same bug.
No longer blocks: 438871
Whiteboard: [orange]

Comment 8

8 years ago
(In reply to comment #1)

> Site:  http://www.geniustools.net/Products/search.aspx
> 
> If you just click on the pulldown menu under the text "WRENCHES & TORQUE
> WRENCHES" (about 2/3 down in left column), Firefox will crash immediately and
> ignominiously.

I can't reproduce this.

Comment 9

8 years ago
Until we have a better idea of what is going on here and some more evidence that this crash strikes frequently enough to be a top crasher I don't think we should block.
blocking2.0: final+ → -
Bug 601897 (a dup of this bug) has much better STR.
(Assignee)

Updated

8 years ago
Duplicate of this bug: 601897
The STR from bug 601897 make me crash at nsCocoaWindow::Show():

bp-41ee32a4-39d4-4137-a0c0-6073f2101006

But there are only 30 of these crashes in the last week -- so it's nowhere near a topcrasher.
All of the (79) crashes over the last 3 weeks are on either OS X 10.5.X or 10.6.X.

Comment 14

8 years ago
This is definitely a top crasher. This only started since that latest update of Mozilla. I will have to make a work around to avoid the browser crash. 

To replicate, go to http://sbgi.tapdev.co.uk/events/content/179/utility_street_works and select a drop down menu.

Comment 15

8 years ago
(In reply to comment #14)
> To replicate, go to
> http://sbgi.tapdev.co.uk/events/content/179/utility_street_works and select a
> drop down menu.

Awesome, that reproduced on the first try for me. Thanks!

Comment 16

8 years ago
I am also experiencing this precise crash on a page I developed. One thing it has in common with the test case above is the use of jQuery 1.4.2.

The original report I received from a user was for FF 3.6.10 under Windows 7. There, the select list would simply fail to drop down.

I am able to reproduce the actual crash under Mac OS X 10.5.8; sometimes it crashes, sometimes the select list drops down briefly and then disappears.

At this point, I'd really like some suggestions about what I might be able to do as a workaround. Turning off jQuery does solve it, but that's not an option that would fly with the bosses :-).

Comment 17

8 years ago
For what it's worth, I was able to avoid the crash by changing the CSS of one of the containing DIVs to "display: inline-block", where it had been "inline".

This doesn't seem to apply to the SBGI site. I tried changing a few things, and couldn't find anything analogous to prevent the crash. So the fact that I was able to prevent it this way in my site may be a coincidence.

Comment 18

8 years ago
Any progress on this?
(Assignee)

Updated

8 years ago
Duplicate of this bug: 625843
Bug 625843 has close to 100% effective STR for this "crash".
Many (most?) of this bug's "crashes" don't bring up Breakpad, and so
aren't included in the statistics at http://crash-stats.mozilla.com.
So what I said in comment #30 about this not being a topcrasher is
misleading -- we actually have no way of knowing whether or not its a
topcrasher.

So I'm going to try to find time next week for this bug, and see
whether or not it has an easy fix/workaround.  If it does, this bug
should probably block the FF4 final release.
> So what I said in comment #30 about this not being a topcrasher is
> misleading -- we actually have no way of knowing whether or not its a
> topcrasher.

It was comment #12.
This bug is our fault (not Apple's):  On the trunk it's a regression
caused by the combination of the patch for bug 517804
(http://hg.mozilla.org/mozilla-central/rev/3cdaf0a84414) and part of
the patch for bug 498340
(http://hg.mozilla.org/mozilla-central/rev/ea520d552f8c).

The problem is that the NS_WILL_PAINT event sent from [ChildView
viewWillDraw] can cause our ChildView object's NSWindow to be closed:

[NSView viewWillDraw] is called on an NSView object that the OS has
set up for drawing (including already having successfully called
[NSView canDraw] on it).  So no matter what we do in [ChildView
viewWillDraw], the OS *will* try to draw it (or at least its NSWindow)
after this method has returned.

But if we close our NSWindow in [ChildView viewWillDraw], every NSView
in it will become undrawable.  Then [NSView lockFocus] will throw an
exception when (subsequently) it calls [NSView lockFocusIfCanDraw] on
one of our NSWindow's NSViews and this call fails (returns FALSE).

There are two possible solutions to this problem:

1) Stop cross-platform code from deleting our widget (our ChildView
   object's mGeckoChild) while handling the NS_WILL_PAINT event, which
   in turn stops our ChildView's NSWindow from being closed.

2) Hook an undocumented NSView method
   (_displayRectIgnoringOpacity:isVisibleRect:rectIsVisibleRectForView:)
   to stop the OS from trying to draw an NSView that (since the last,
   successful, call to canDraw) no longer has an NSWindow, or whose
   NSWindow has since been closed.

Clearly #1 is preferable.  But if (for some reason) #1 is difficult
(or difficult to do quickly), we can safely fall back to option #2.

I'll post patches for both these options in later comments.

The regression range for this bug on the trunk is:

firefox-2009-11-15-03-mozilla-central
firefox-2009-11-16-04-mozilla-central

The patch for bug 517804 landed (on the trunk) on 2009-11-14 at
01:41:34 PST.  The relevant part of the patch for bug 498340 landed at
2009-11-15 at 14:59:38 PST.  I used 'hg bisect' to confirm that this
part of the patch for bug 498340 is what triggers this bug.

The regression range for this bug on the 1.9.2 branch is:

firefox-2009-11-18-03-mozilla-1.9.2
firefox-2009-11-19-03-mozilla-1.9.2

The patch for bug 517804 landed on the 1.9.2 branch on 2009-11-18 at
05:17:19 PST.

This bug doesn't happen on the earlier branches (see bug 625843
comment #6).
Blocks: 517804, 498340
Created attachment 505950 [details]
Stack trace of our NSWindow being closed while handling an NS_WILL_PAINT event

Here's a stack trace of our ChildView's NSWindow being closed while
sending an NS_WILL_DRAW event to Gecko from [ChildView viewWillDraw].
I made it using my debug patch, which I can post if people would like
to see it.

I admit that this stack trace puzzles me -- it seems to be missing a
few "levels".  But it does clearly show the ChildView's PopupWindow
being closed (and its widget and parent widget being deleted).
Created attachment 506015 [details] [diff] [review]
First stab at a Gecko patch

This patch sort-of works.

It gets rid of this bug's crashes, but doesn't fully resolve the
underlying problem -- the ChildView's NSWindow still gets closed (and
its widget deleted) in [ChildView viewWillDraw] whenever a browser
window is closed (though no crash or OS assertion results).

I also don't really understand how/why it works (to the extent that it
does work).

Roc, can you help me find a better solution?
How about in viewWillDraw, if the refcount of the window is 1 before we release the reference held by the nsPaintEvent, add a reference and dispatch an XPCOM event to release the last reference?
Created attachment 506560 [details] [diff] [review]
Cocoa patch rev1

> How about in viewWillDraw, if the refcount of the window is 1 before
> we release the reference held by the nsPaintEvent, add a reference
> and dispatch an XPCOM event to release the last reference?

I don't fully understand this -- retaining an NSWindow won't prevent
it being closed.  But I get your drift:  You don't think it's
worthwhile trying to prevent Gecko from deleting the current widget
(or its parents) while processing an NS_WILL_DRAW event in the general
case (in cross-platform code).

So here's another Cocoa-specific patch/workaround, which is much
simpler than my previous Cocoa patch (attachment 506017 [details] [diff] [review]).

I can't use the traditional "kung-fu death grip" to retain mGeckoChild
or its parent(s) -- they need to be retained until after [ChildView
viewWillDraw] returns.  And it turns out I don't need to retain
mGeckoChild -- only its parent(s).  But Cocoa provides a very
straightforward way to postpone releasing the parent widget(s) until
the next time through the main thread's run loop, which is all we
need.

This patch works fine in all my tests.  I don't know why I didn't
think of it last week.
Attachment #506015 - Attachment is obsolete: true
Attachment #506017 - Attachment is obsolete: true
Attachment #506560 - Flags: review?(roc)
Attachment #506560 - Flags: review?(joshmoz)
This patch is OK but I think it would be simpler if you just used a heap-allocated nsTArray<nsRefPtr<nsIWidget> >. Probably less leaky too --- what deletes aWidgetArray? I guess you'd need to use an XPCOM event instead of performWithSelector in that case.
In order to use performSelector:withObject:afterDelay:0 I need to use
an NSArray (a Cocoa object that can be retained and released using
[NSObject retain] and [NSObject release]).  This NSArray object gets
retained by the call to performSelector:withObject:afterDelay:0 and
then released after the delayed call to releaseWidgets: has returned.

This is very simple and straightforward (simpler than using an XPCOM
event), and won't result in any leaks.

Furthermore, if I did use an XPCOM event it might get run "too early"
(before the next time through the main thread's run loop), thanks to
nesting between the "native" and "Gecko" event loops.
Comment on attachment 506560 [details] [diff] [review]
Cocoa patch rev1

Oops, this patch might lead to releaseWidgets: messages being sent to
a deleted ChildView object.  I'll change it to also retain and release
mGeckoChild, which will fix the problem.
Attachment #506560 - Flags: review?(joshmoz)

Comment 32

8 years ago
Comment on attachment 506560 [details] [diff] [review]
Cocoa patch rev1

+    if (parent)
+      widgetArray = [NSMutableArray arrayWithCapacity:2];

I prefer brackets around all "if" blocks.
Created attachment 506746 [details] [diff] [review]
Cocoa patch rev2 (also retain and release mGeckoChild)

> I prefer brackets around all "if" blocks.

Also fixed in this patch.
Attachment #506560 - Attachment is obsolete: true
Attachment #506746 - Flags: review?(joshmoz)

Comment 34

8 years ago
Comment on attachment 506746 [details] [diff] [review]
Cocoa patch rev2 (also retain and release mGeckoChild)

Are you sure you don't want to addref "mGeckoChild" unless it has a parent?
> Are you sure you don't want to addref "mGeckoChild" unless it has a
> parent?

I'm not sure I understand.

Right now I'm only addrefing mGeckoChild if I send a (delayed)
releaseWidgets: message, and I only send a message if mGeckoChild has
a parent.

As best I can tell, we don't have trouble if mGeckoChild is deleted
during viewWillDraw and it *doesn't* have a parent.  I didn't see any
problems in my tests.  And I think
_displayRectIgnoringOpacity:isVisibleRect:rectIsVisibleRectForView: is
only ever called on top-level views (like NSGrayFrame or
NSNextStepFrame).  So when one of the [NSView _recursiveDisplay...]
methods is called, is only recurses through NSViews that are still
subviews of the top-level view.

Comment 36

8 years ago
(In reply to comment #35)

> Right now I'm only addrefing mGeckoChild if I send a (delayed)
> releaseWidgets: message, and I only send a message if mGeckoChild has
> a parent.

Right. I'm just making sure that was intentional. You say in the next bit of your reply why, that's all I was looking for.

> As best I can tell, we don't have trouble if mGeckoChild is deleted
> during viewWillDraw and it *doesn't* have a parent.

Comment 37

8 years ago
Comment on attachment 506746 [details] [diff] [review]
Cocoa patch rev2 (also retain and release mGeckoChild)

>+    if (parent) {
>+      widgetArray = [NSMutableArray arrayWithCapacity:3];
>+    }
>+    while (parent) {
>+      NS_ADDREF(parent);
>+      [widgetArray addObject:[NSNumber numberWithUnsignedInteger:(NSUInteger)parent]];
>+      parent = parent->GetParent();
>+    }

You could put the "while" loop inside the "if (parent)" block. That would be a little cleaner logic-wise and save a check when !parent.

>+    if (widgetArray) {
>+      NS_ADDREF(mGeckoChild);
>+      [widgetArray addObject:[NSNumber numberWithUnsignedInteger:(NSUInteger)mGeckoChild]];
>+      [self performSelector:@selector(releaseWidgets:)
>+                 withObject:widgetArray
>+                 afterDelay:0];
>+    }

You could put this stuff inside the "if (parent)" block as well. That makes it more clear that none of this work happen unless there is a parent.
Attachment #506746 - Flags: review?(joshmoz) → review+
Created attachment 506868 [details] [diff] [review]
Cocoa patch rev3 (follow Josh's suggestions)

I assume this is OK, Josh.
Attachment #506746 - Attachment is obsolete: true
(Assignee)

Updated

8 years ago
Attachment #506868 - Flags: review?(joshmoz)
Attachment #506868 - Flags: approval2.0?

Updated

8 years ago
Attachment #506868 - Flags: review?(joshmoz)
Attachment #506868 - Flags: review+
Attachment #506868 - Flags: approval2.0?
Attachment #506868 - Flags: approval2.0+
Comment on attachment 506868 [details] [diff] [review]
Cocoa patch rev3 (follow Josh's suggestions)

Landed on trunk:
http://hg.mozilla.org/mozilla-central/rev/5daf5e03b8e1
(Assignee)

Updated

8 years ago
Status: NEW → RESOLVED
Last Resolved: 8 years ago
Resolution: --- → FIXED
Comment on attachment 506868 [details] [diff] [review]
Cocoa patch rev3 (follow Josh's suggestions)

This bug also happens on the 1.9.2 branch.  So we should presumably also fix it there, after letting this patch bake on the trunk for a while.
Attachment #506868 - Flags: approval1.9.2.15?
(In reply to comment #40)
> This bug also happens on the 1.9.2 branch.  So we should presumably also
> fix it there, after letting this patch bake on the trunk for a while.

Why? We fix lots of bugs that also happen on old branches. What bad things happen to whom and how often with this bug vs. saving the regression risk and QA time if we take the patch?
status1.9.1: --- → unaffected
There's no good way to find out how many people experience this bug,
either on the trunk or the 1.9.2 branch (see comment #21).

But this bug is arguably worse on the 1.9.2 branch -- the "crash"
(actually an OS-level process abort, following the assertion) only
happens in 32-bit mode, and 64-bit mode isn't available on the
branches.  (And of course it's the "crash" that we should be most
concerned about.)

Also, the fix is very simple and straightforward.

But, like I said, it makes sense to let it bake on the trunk for a
while.  If it makes you feel better, maybe we should wait until after
the FF4 release (which will put the trunk patch into the hands of a
much larger number of users/testers).
Whiteboard: [sg:dos]
Unfortunately this appears to have missed 4.0b10 so we won't get serious feedback for a few weeks until b11 goes out
Comment on attachment 506868 [details] [diff] [review]
Cocoa patch rev3 (follow Josh's suggestions)

Approved for 1.9.2.15, a=dveditz for release-drivers
Attachment #506868 - Flags: approval1.9.2.15? → approval1.9.2.15+

Comment 45

7 years ago
When will this be applied to FF 3.6?

Comment 46

7 years ago
A better question is is there a chance this getting applied to FF 3.6 because this happens for me in 3.6 and not 4? See https://bugzilla.mozilla.org/show_bug.cgi?id=625843 which was marked as a duplicate of this issue.
Comment on attachment 506868 [details] [diff] [review]
Cocoa patch rev3 (follow Josh's suggestions)

Landed on the 1.9.2 branch:
http://hg.mozilla.org/releases/mozilla-1.9.2/rev/16c83c0d5e7e
(Assignee)

Updated

7 years ago
status1.9.2: --- → .5-fixed
> When will this be applied to FF 3.6?

It's now been landed on the 1.9.2 branch, which means it will appear in FF 3.6.15.

It will also appear in the FF 3.6.15pre nightlies, starting with tomorrow's.

Comment 49

7 years ago
Thank you very much and will help verify this tomorrow.

Comment 50

7 years ago
I tried the FF 3.6.15pre that I found in the below url and the fix worked! Good job guys, thanks!

http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/latest-firefox-3.6.x/
status1.9.2: .5-fixed → .15-fixed
You need to log in before you can comment on or make changes to this bug.