Crash dismissing modal dialogs/in AppKit display when interleaved with layout [@ _NSRaiseError].[@ NSException raise:format:].[@ NSArrayRaiseBoundException]..[@ -[NSView displayIfNeeded]]

RESOLVED FIXED

Status

()

--
critical
RESOLVED FIXED
13 years ago
13 years ago

People

(Reporter: jaas, Assigned: mark)

Tracking

Trunk
PowerPC
Mac OS X
Points:
---

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(2 attachments, 3 obsolete attachments)

(Reporter)

Description

13 years ago
When dismissing a modal dialog using a Cocoa build of Firefox, the browser crashes. To reproduce, just go to some site that pops up an http auth dialog.
(Reporter)

Comment 1

13 years ago
Created attachment 222417 [details]
crash log
(Assignee)

Comment 2

13 years ago
I can't reproduce in a current cocoafox build, by either clicking OK or Cancel in an HTTP auth dialog, or using the key equivalents.  This stack and the other one you e-mailed me are identical.

The most recent checkins to widget/src/cocoa I made were to flip [NSApp updateWindows] on and off.  It's back in there now, does the crash disappear if you comment that line out?
(Assignee)

Comment 3

13 years ago
I did manage to crash with this signature after pressing command-right arrow for history forward.
(Assignee)

Comment 4

13 years ago
Consistent testcase to reproduce (for me, anyway): start browser, type www.wikipedia.org into the location bar, press return.  Browser crashes before drawing any page content.
(Assignee)

Comment 5

13 years ago
Created attachment 222442 [details] [diff] [review]
exploratory work, for reference

This appears to be directly caused by doing our own event dispatch manually.  When I bypass all of the manual dispatch, these crashes don't occur.  (I'm bypassing by modifying nsAppShell::ProcessNextNativeEvent to return PR_FALSE immediately if !aMayWait, this forces all native events to wait to be processed by [NSApp run] when the method is called with aMayWait true.)

Darin keeps asking if bugs like this are caused by not having [NSApp run] on the stack at all.  I've been skeptical all along, but I decided to give it a try anyway in this case, given the above.  This patch is a quick and dirty way to get a single [NSApp run] on the stack.  It's enough to show that things are still broken when we're doing our own manual dispatch on top of [NSApp run].

Moral: there's something more that we're not doing with nextEventMatchingMask/sendEvent that the system does inside [NSApp run].
(Assignee)

Comment 6

13 years ago
I took a look at the GNUStep AppKit implementation of NSApplication, and we've accounted for everything significant in our manual dispatch except for making [NSApp isRunning] appear true.  I also tried accounting for the insignificant things like updating the main, window, and services menus after each event.  And I subclassed NSApplication to make isRunning always appear true (except to internal consumers in the real NSApplication, but this couldn't be the problem anyway because I saw the same results in manual dispatch with the exploratory patch where isRunning was true for real.)  And I still get the same result.

Comment 7

13 years ago
Maybe we can ask someone at Apple about this?
(Assignee)

Comment 8

13 years ago
Just for fun, I decided to write my own implementation of [NSApplication run], to use when calling ProcessNextNativeEvent with aMayWait true.  I then bypassed the nonblocking [NSApp nextEventMatchingMask:...] call as I did in comment 5.  That had the effect of putting all native event dispatch through this:

  while (mRunning) {
    if (NSEvent* event = [NSApp nextEventMatchingMask:NSAnyEventMask
                                            untilDate:[NSDate distantFuture]
                                               inMode:NSDefaultRunLoopMode
                                              dequeue:YES]) {
      NSAutoreleasePool* localPool = [[NSAutoreleasePool alloc] init];
      [NSApp sendEvent:event];
      [localPool release];
    }
  }

I fully expected this to cause the same problem, because it follows the manual event dispatch to the tee.  Much to my surprise, the app runs as well with this code as it did for my test in comment 5.

Even stranger, I still don't crash when I revise the above code to not block by using untilDate:nil, creating an ugly CPU-hogging loop.
(Assignee)

Comment 9

13 years ago
Created attachment 222673 [details] [diff] [review]
More exploratory work (modified version of 338249.stop.1)

This patch does not fix anything.  But it proves something very important.

All of the comments above assumed that I was doing something wrong in the AppShell, and that that's what was causing the crashes.  I had assumed that because the blocking event fetch wouldn't crash and the nonblocking fetch would crash, the problem had something to do with the nonblocking fetch.  That now seems to not be the case.

This patch is a modified version of the second attachment to bug 338249, attachment 222628 [details] [diff] [review].  Functionally, it's slightly inferior, but the point of this patch isn't to fix anything, it's to show that even when all event fetches are made nonblocking, WITHOUT bypassing event processing in the !aMayWait case, the crashes still occur.

Given that, along with the results I referred to as "strange" in comment 8, it's now apparent that the problem isn't so much our event loop model as it is some weird state that the NSView hierarchy is getting itself into, most likely as a result of Gecko events.  When ProcessNextNativeEvent is called with aMayWait PR_FALSE, it's because there are more Gecko events waiting, and not having processed all of them apparently is what produces this condition.  When native event handling only occurs when there are no Gecko events, this bug is not experienced.  So there must be some Gecko event out there that creates this state, which is later fixed up by some other Gecko event.  Maybe they're paired or something.  I suspect it's got something to do with nsChildView.
(Assignee)

Comment 10

13 years ago
OK, progress.  Notes (mostly so I don't forget, but there's a question for Josh at the end):

At least for the Wikipedia case, I tracked this down to the offending Gecko events being NS_PAINT events.  I started poking around in [ChildView drawRect:] and learned that the last rect asked to draw itself is 16x16, and that if I skip calling UpdateWidget for 16x16 rects, the crash is avoided.  That's a start.  I changed drawRect: to just fill in 16x16 rects instead of calling UpdateWidget and learned that the problem rect belongs to the favicon in the URL bar.  But not all favicons are problems.  Google's seems fine.

Indeed, I do see that around the affected UpdateWidget call (which eventually dispatches the NS_PAINT event), the number of sibling views drops from 3 to 2, which is consistent with the NSRangeException, "Uncaught exception: <NSRangeException> *** -[NSCFArray objectAtIndex:]: index (2) beyond bounds (2)".  The disappearing view was 0x0@0x0.  The depth of the problem view in the view hierarchy also seems to match the stack trace (although it's difficult to say for sure without examining AppKit source): you can trace the 16x16 problem view back up to its great-grandparent.

Josh, since I haven't really been able to reproduce according to your steps in comment 0, I'll offer an explanation (that you're free to refute): could you have gone to a site that uses http auth, entered a valid username and password, and loaded page that uses a bum favicon?  Or could you have hit cancel and been taken to a 401 page that has a favicon?
Severity: major → critical
(Assignee)

Comment 11

13 years ago
Created attachment 222786 [details] [diff] [review]
Temporary patch (I've got something better)

I figured this one out.  On the NS_PAINT event that was giving us trouble, we were getting into layout, which was responsible for calling Show(FALSE) on the sibling widget I saw getting unhooked in comment 10.  Apparently, AppKit's NSViews somehow cache the number of subviews and don't invalidate or update the cached value when subviews are removed (or added?) under certain circumstances.  Because the subview list is internally just an NSArray, getting a subview at [subviews objectAtIndex:index] where index >= [subviews count] is a serious problem.  Sounds like an AppKit bug to me.

This is the first test patch I wrote as a workaround.  It's not complete (not accounting for nsNativeScrollbarViews) and not as efficient as it could be, but it does avoid the crash.  I'm posting it because the comments describe the problem in greater detail.

After writing the test patch, I realized that our minimum runtime is now 10.3, so we can avoid adding and removing views to set visibility, and leverage the newer [NSView setVisibility:] functionality.  That's...
(Assignee)

Comment 12

13 years ago
Created attachment 222787 [details] [diff] [review]
Use [NSView setHidden:] to implement show/hide

...this patch.  (I meant [NSView setHidden:] in my previous comment.)

This implementation avoids the problem entirely by implementing nsIWidget::Show(PRBool) using [NSView setHidden:] instead of adding and removing NSViews from the native NSView hierarchy.  [NSView setHidden:] is new in 10.3, and we haven't moved any of the ppc builds to the 10.3 SDK yet, so I've provided declarations in category interfaces.  In addition to fixing nsChildView, I've also updated nsNativeScrollbarView, which is the other class that holds an NSView that implements mozView.
Attachment #222442 - Attachment is obsolete: true
Attachment #222673 - Attachment is obsolete: true
Attachment #222786 - Attachment is obsolete: true
Attachment #222787 - Flags: review?(joshmoz)
(Assignee)

Updated

13 years ago
Status: NEW → ASSIGNED
Summary: crash dismissing modal dialogs → Crash dismissing modal dialogs/in AppKit display when interleaved with layout [@ _NSRaiseError].[@ NSException raise:format:].[@ NSArrayRaiseBoundException]..[@ -[NSView displayIfNeeded]]
(Reporter)

Comment 13

13 years ago
Comment on attachment 222787 [details] [diff] [review]
Use [NSView setHidden:] to implement show/hide

+        [mParentView respondsToSelector: @selector(getNativeWindow)])

No space between ":" and the argument

#if MAC_OS_X_VERSION_MAX_ALLOWED < MAC_OS_X_VERSION_10_3

Why is this necessary in trunk code? Shouldn't it always be false?

+  NSRect orientation = NSMakeRect(0, 0, 100, 16);

I know those constants were there before your patch, but so long as you're touching that code can you macro-ize them? I don't like loose constants like that.
Attachment #222787 - Flags: review?(joshmoz) → review+
(Assignee)

Comment 14

13 years ago
Comment on attachment 222787 [details] [diff] [review]
Use [NSView setHidden:] to implement show/hide

MAC_OS_X_VERSION_MAX_ALLOWED refers to the SDK version - see also comment 12.  I'll make the other two changes on checkin.
Attachment #222787 - Flags: superreview?(mikepinkerton)
Comment on attachment 222787 [details] [diff] [review]
Use [NSView setHidden:] to implement show/hide

sr=pink
Attachment #222787 - Flags: superreview?(mikepinkerton) → superreview+
(Assignee)

Comment 16

13 years ago
Checked in on trunk.  We now return you to your regularly-scheduled instability.
Status: ASSIGNED → RESOLVED
Last Resolved: 13 years ago
Resolution: --- → FIXED
(Assignee)

Updated

13 years ago
Depends on: 339495
You need to log in before you can comment on or make changes to this bug.