Last Comment Bug 968838 - Crash on australis first-run when right side of window is off screen
: Crash on australis first-run when right side of window is off screen
Status: NEW
[Australis:P-]
: crash, leave-open
Product: Core
Classification: Components
Component: Layout (show other bugs)
: 30 Branch
: x86 Mac OS X
-- normal with 1 vote (vote)
: ---
Assigned To: Markus Stange [:mstange] (away until Feb 22)
:
: Jet Villegas (:jet)
Mentors:
https://www.mozilla.org/en-US/firefox...
Depends on: 985947
Blocks: fx-UITour
  Show dependency treegraph
 
Reported: 2014-02-06 07:51 PST by Axel Hecht [:Pike]
Modified: 2014-08-17 10:38 PDT (History)
16 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
+
fixed
+
fixed


Attachments
Problem report from OSX (169.01 KB, text/plain)
2014-02-07 06:33 PST, Axel Hecht [:Pike]
no flags Details
bottom of the stack from gdb (19.96 KB, text/plain)
2014-02-19 16:09 PST, Timothy Nikkel (:tnikkel)
no flags Details
a few iterations of the loop gdb stack (63.47 KB, text/plain)
2014-02-19 16:09 PST, Timothy Nikkel (:tnikkel)
no flags Details
part 1: don't constrain popups in nsCocoaWindow::DoResize (1.60 KB, patch)
2014-03-06 06:47 PST, Markus Stange [:mstange] (away until Feb 22)
enndeakin: review+
Details | Diff | Splinter Review
part 2: refuse reentrant nsCocoaWindow::DoResize as an additional safety measure (2.62 KB, patch)
2014-03-06 06:49 PST, Markus Stange [:mstange] (away until Feb 22)
smichaud: review+
sledru: approval‑mozilla‑aurora+
sledru: approval‑mozilla‑beta-
Details | Diff | Splinter Review

Description User image Axel Hecht [:Pike] 2014-02-06 07:51:01 PST
I reproducably crash when opening https://www.mozilla.org/en-US/firefox/29.0a2/whatsnew/ when the right side of the window is off screen, i.e., where the mixed popup would show up.

If the window is fully visible, the page works fine.

I've got three stack traces, all useless.

https://crash-stats.mozilla.com/report/index/2a5652cf-ad07-4a8e-8b90-33a6f2140206
https://crash-stats.mozilla.com/report/index/bp-ce6428bf-8639-40a9-9443-aa6b12140206
https://crash-stats.mozilla.com/report/index/bp-6d090c2c-1a10-4eae-b9e9-927a42140206

As there's no stack, I don't have a good idea where to file.

I didn't have a chance to test on non-mac.
Comment 1 User image Benjamin Smedberg [:bsmedberg] 2014-02-06 18:11:12 PST
Can you catch it in a debugger?
Comment 2 User image Timothy Nikkel (:tnikkel) 2014-02-07 00:13:26 PST
I tested this on OS X 10.8 and couldn't get a crash. I tried both just moving the right side of the window offscreen (the popup adjusted to be on screen) and with the right side of the window on an external monitor.

What version of OS X did you use?
Comment 3 User image Axel Hecht [:Pike] 2014-02-07 00:26:28 PST
I'm on 10.8.5. I'll try to reproduce in a debug build later today.
Comment 4 User image Axel Hecht [:Pike] 2014-02-07 06:33:36 PST
Created attachment 8372266 [details]
Problem report from OSX
Comment 5 User image Axel Hecht [:Pike] 2014-02-07 06:39:52 PST
(In reply to Axel Hecht [:Pike] from comment #4)
> Created attachment 8372266 [details]
> Problem report from OSX

That's me reproducing this crasher in a fresh debug build from today. Does this help? Is there something I can help with? I have to say though that objective c is code I never looked at in earnest, so I'm not sure how much help I'll be.
Comment 6 User image Timothy Nikkel (:tnikkel) 2014-02-07 14:02:21 PST
Thank you that helps a lot.

Looks like we are hitting a stack overflow, hitting a loop where we move a window, which triggers a viewWillDraw from the OS, which makes us update our current rendering state, which moves any windows which need to be moved, which triggers a viewWillDraw, and repeat.

Markus, haven't we run into this kind of problem before? (And by before I mean in the past 5 or so years)
Comment 7 User image Timothy Nikkel (:tnikkel) 2014-02-19 14:56:30 PST
In the absence of being able to reproduce this myself, Axel, do you have the ability to make your own builds? Or use a custom build with some logging?
Comment 8 User image Timothy Nikkel (:tnikkel) 2014-02-19 16:08:33 PST
Talked with Axel on irc, got some stacks from gdb that I need to analyze. I will upload them here so they don't get lost.
Comment 9 User image Timothy Nikkel (:tnikkel) 2014-02-19 16:09:03 PST
Created attachment 8378666 [details]
bottom of the stack from gdb
Comment 10 User image Timothy Nikkel (:tnikkel) 2014-02-19 16:09:41 PST
Created attachment 8378668 [details]
a few iterations of the loop gdb stack
Comment 11 User image Matthew N. [:MattN] (PM if requests are blocking you) 2014-02-19 16:17:20 PST
I was just commenting that I can reproduce on 10.9 but I see that you got the info from Axel.

In a clean profile, I position the window about half-way off-screen on the right side of my primary display (with my secondary display on the right). Opening the URL for the tour then causes a crash most of the time.
Comment 12 User image Markus Stange [:mstange] (away until Feb 22) 2014-02-20 17:19:29 PST
(In reply to Timothy Nikkel (:tn) from comment #6)
> Markus, haven't we run into this kind of problem before? (And by before I
> mean in the past 5 or so years)

Indeed, in bug 708278 and bug 788189. But those were caused by the context menu offset... I can't think of a reason that would cause us to oscillate between two completely different rects, which is what attachment 8378668 [details] seems to show (and aHeight=2880 really seems a bit much).
Comment 13 User image Axel Hecht [:Pike] 2014-02-21 03:52:39 PST
(In reply to Markus Stange [:mstange] from comment #12)
> (and aHeight=2880 really seems a bit much).

I'm testing this on a 24' monitor in portrait mode, height arguments can be big. The browser window wasn't, though.
Comment 14 User image :Gijs 2014-02-24 16:33:59 PST
(In reply to Markus Stange [:mstange] from comment #12)
> (In reply to Timothy Nikkel (:tn) from comment #6)
> > Markus, haven't we run into this kind of problem before? (And by before I
> > mean in the past 5 or so years)
> 
> Indeed, in bug 708278 and bug 788189. But those were caused by the context
> menu offset... I can't think of a reason that would cause us to oscillate
> between two completely different rects, which is what attachment 8378668 [details]
> [details] seems to show (and aHeight=2880 really seems a bit much).

Actually, it seems like it starts out with quite small parameters and does a "multiply by 3, divide by 2" sequence (swapping between two screen positions), presumably until it hits either window or screen constraints. Here's a grep through the loop for CocoaWindow::DoResize:

#42397 aX=1568.5, aY=50, aWidth=600, aHeight=960,
#42439 aX=4706, aY=150, aWidth=1800, aHeight=2880,
#42462 aX=1568.5, aY=50, aWidth=600, aHeight=960,
#42504 aX=4706, aY=150, aWidth=1800, aHeight=2880,
#42527 aX=1568.5, aY=50, aWidth=600, aHeight=960,
#42569 aX=4706, aY=150, aWidth=1800, aHeight=2880,
#42592 aX=1568.5, aY=50, aWidth=600, aHeight=960,
#42634 aX=4706, aY=150, aWidth=1800, aHeight=2367,
#42657 aX=1568.5, aY=50, aWidth=600, aHeight=789,
#42699 aX=4706, aY=150, aWidth=1476, aHeight=1578,
#42722 aX=1568.5, aY=50, aWidth=492, aHeight=526,
#42764 aX=4706, aY=150, aWidth=984, aHeight=1052,
#42787 aX=1568.5, aY=50, aWidth=328, aHeight=350.5,
#42829 aX=4706, aY=150, aWidth=656, aHeight=701,
#42852 aX=1568.5, aY=50, aWidth=218.5, aHeight=233.5,
#42894 aX=4706, aY=150, aWidth=437, aHeight=467,
#42917 aX=1568.5, aY=50, aWidth=145.5, aHeight=155.5,


From the other attachment, it's clear we started at:

#43177 0x000000010305e94b in nsCocoaWindow::DoResize (this=0x126a47820, aX=1560.5, aY=111.5, aWidth=23, aHeight=23, aRepaint=true, aConstrainToCurrentScreen=false) at nsCocoaWindow.mm:1373 .

Which looks like it's probably the circular highlight on the main panel menu?
Comment 15 User image Justin Dolske [:Dolske] 2014-02-25 23:45:21 PST
This sounds pretty serious (especially since all of our users will be exposed to this when Australis is released), and it would likely be a startup crash, so P1.

Timothy, can you own this? (I will assign assuming yes :), feel free to reassign if you can't). Do note that beta uplift is just a few short weeks away.
Comment 16 User image Timothy Nikkel (:tnikkel) 2014-02-25 23:51:00 PST
I don't know if I will have cycles for this before the current train ships, so if you need this you might want to find another owner.
Comment 17 User image :Gijs 2014-02-26 02:07:46 PST
(In reply to Timothy Nikkel (:tn) from comment #16)
> I don't know if I will have cycles for this before the current train ships,
> so if you need this you might want to find another owner.

Could you suggest someone who is sufficiently at home in this code to stand a chance of fixing it? :-)
Comment 18 User image Timothy Nikkel (:tnikkel) 2014-02-26 02:12:36 PST
Perhaps one of mats, smichaud, enndeakin, mstange, jfkthame. It seems to be an issue with retina displays, so someone who has one can probably reproduce and get to the bottom quicker (unlike myself).
Comment 19 User image Markus Stange [:mstange] (away until Feb 22) 2014-02-26 04:36:49 PST
I'll take it.
Comment 20 User image Mike Conley (:mconley) 2014-03-03 12:38:36 PST
Any updates on this, mstange? We're quickly approaching Beta uplift point...
Comment 21 User image Markus Stange [:mstange] (away until Feb 22) 2014-03-04 09:32:56 PST
I don't have a patch yet, but I know roughly what the problem is.

There are at least two problems here:
 1. The fact that nsCocoaWindow::DoResize attempts the widget to the target screen.
 2. The mismatch of scale factors between a popup widget and its parent window.

I reproduced the crash like this: I connected a 1920x1080 external, non-Retina monitor to my Retina Macbook. I made the external monitor the main screen, and positioned the internal Retina screen to the left hand side of it. Now the screens were sized and positioned like this in screen coordinates:
 - main screen, external, non-Retina, positioned at 0,0 and sized 1920x1080
 - secondary screen, internal, Retina, positioned at -1440,0 and sized 1440x900

Now I opened the Firefox window on the external screen, and moved it to the right so that the hamburger button was off-screen. Then I navigated to https://www.mozilla.org/en-US/firefox/29.0a2/whatsnew/ and watched the crash.

With the window position / size in my test, nsView::DoResetWidgetBounds wants to resize the circular highlight popup to this rect: 2018, 124, 46, 46. This rect is off the screen. So the first call to nsCocoaWindow::DoResize constrains the target rect to the main screen, and ends up with 1874, 124, 46, 46. During the resize call, viewWillDraw is called, which ends up in nsView::DoResetWidgetBounds again, and notices that the popup is not at the position that it was asked to be at. So nsCocoaWindow::Move is called and the popup is moved properly off screen, to 2018, 124, 46, 46.
Now what happens is that the popup's backing scale factor changes. It's not on any screen, which makes it default to a scale factor of 2 (I don't know why), so now its mBounds are doubled (4036, 248, 92, 92). When nsView::DoResetWidgetBounds is called the next time, it compares the widget's client bounds, which are now 4036, 248, 92, 92, to the intended bounds (still 2018, 124, 46, 46), doesn't notice that the mismatch is due to backing scale factor differences, and calls ResizeClient again. This brings us back to the beginning and we enter an infinite loop (growing the stack all the time).
Comment 22 User image :Gijs 2014-03-06 06:22:25 PST
(In reply to Markus Stange [:mstange] from comment #21)
> I don't have a patch yet, but I know roughly what the problem is.
> 
> There are at least two problems here:
>  1. The fact that nsCocoaWindow::DoResize attempts the widget to the target
> screen.
>  2. The mismatch of scale factors between a popup widget and its parent
> window.
> 
> I reproduced the crash like this: I connected a 1920x1080 external,
> non-Retina monitor to my Retina Macbook. I made the external monitor the
> main screen, and positioned the internal Retina screen to the left hand side
> of it. Now the screens were sized and positioned like this in screen
> coordinates:
>  - main screen, external, non-Retina, positioned at 0,0 and sized 1920x1080
>  - secondary screen, internal, Retina, positioned at -1440,0 and sized
> 1440x900
> 
> Now I opened the Firefox window on the external screen, and moved it to the
> right so that the hamburger button was off-screen. Then I navigated to
> https://www.mozilla.org/en-US/firefox/29.0a2/whatsnew/ and watched the crash.
> 
> With the window position / size in my test, nsView::DoResetWidgetBounds
> wants to resize the circular highlight popup to this rect: 2018, 124, 46,
> 46. This rect is off the screen. So the first call to
> nsCocoaWindow::DoResize constrains the target rect to the main screen, and
> ends up with 1874, 124, 46, 46. During the resize call, viewWillDraw is
> called, which ends up in nsView::DoResetWidgetBounds again, and notices that
> the popup is not at the position that it was asked to be at. So
> nsCocoaWindow::Move is called and the popup is moved properly off screen, to
> 2018, 124, 46, 46.

Dumb question: can we make DoResetWidgetBounds be aware of 'off screen' and not try to move things off screen?
Comment 23 User image Markus Stange [:mstange] (away until Feb 22) 2014-03-06 06:40:06 PST
(In reply to :Gijs Kruitbosch from comment #22)
> Dumb question: can we make DoResetWidgetBounds be aware of 'off screen' and
> not try to move things off screen?

We could do that, but then the highlight circle would no longer be over the thing that it's supposed to highlight. I think our case is a very valid use case for placing popups partly or completely offscreen.

Incidentally, the highlighter popup (#UITourHighlightContainer) has flip="none" set on it, which means "don't attempt to keep this panel on-screen".

For other flip values, nsMenuPopupFrame.cpp already takes care of keeping the popup on-screen, so DoResetWidgetBounds shouldn't need to handle it.
Comment 24 User image Markus Stange [:mstange] (away until Feb 22) 2014-03-06 06:47:57 PST
Created attachment 8386772 [details] [diff] [review]
part 1: don't constrain popups in nsCocoaWindow::DoResize

This fixes the endless recursion because nsView::DoResetWidgetBounds now no longer needs to move the popup to the right place, so it only calls Resize; and even though curBounds in newBounds still differ due to HiDPI scaling, recursion is stopped at the second Resize call because nsCocoaWindow::DoResize exits early (because newFrame == [mWindow frame]).

Constraining popups to the screen shouldn't be needed in nsCocoaWindow::Resize because nsMenuPopupFrame.cpp should already have done all necessary constrain-to-screen calculations. And any constraining in nsCocoaWindow::Resize will be reverted by nsView::DoResetWidgetBounds in another call to nsCocoaWindow::Move.
Comment 25 User image Markus Stange [:mstange] (away until Feb 22) 2014-03-06 06:49:21 PST
Created attachment 8386773 [details] [diff] [review]
part 2: refuse reentrant nsCocoaWindow::DoResize as an additional safety measure

In case there are other ways to hit an infinite recursion here, this additional safety measure should at least prevent us from crashing.
Comment 26 User image Neil Deakin 2014-03-06 06:53:38 PST
Comment on attachment 8386772 [details] [diff] [review]
part 1: don't constrain popups in nsCocoaWindow::DoResize

This seems fine for nsMenuPopupFrame. There are two other cases where eWindowType_popup is used (html:select and the popup flag to window.open) where this might be needed but I'm not familiar with the former enough to know.
Comment 27 User image Markus Stange [:mstange] (away until Feb 22) 2014-03-06 07:54:57 PST
(In reply to Neil Deakin from comment #26)
> html:select

Hmm, looks like we don't constrain those to the screen at all.
In any case, this also sizes its widget through an nsView, so it goes through nsView::DoResetWidgetBounds, so it probably does the same resize->move dance as XUL popups.

> and the popup flag to window.open

Huh! I didn't know about that one. I'll do some tests.
Comment 28 User image Neil Deakin 2014-03-06 08:03:27 PST
(In reply to Markus Stange [:mstange] from comment #27)
> Huh! I didn't know about that one. I'll do some tests.

I think we probably should end up removing support for the flag entirely. It's not documented and I think it is only used for the alert notifications.
Comment 29 User image Markus Stange [:mstange] (away until Feb 22) 2014-03-06 08:18:28 PST
(In reply to Neil Deakin from comment #28)
> (In reply to Markus Stange [:mstange] from comment #27)
> > Huh! I didn't know about that one. I'll do some tests.
> 
> I think we probably should end up removing support for the flag entirely.
> It's not documented and I think it is only used for the alert notifications.

I agree. I couldn't find any other users apart from nsXULAlerts.cpp either.
Comment 30 User image Steven Michaud [:smichaud] (Retired) 2014-03-06 08:46:18 PST
Comment on attachment 8386773 [details] [diff] [review]
part 2: refuse reentrant nsCocoaWindow::DoResize as an additional safety measure

Looks fine to me.
Comment 31 User image Markus Stange [:mstange] (away until Feb 22) 2014-03-13 05:24:22 PDT
I've pushed part 2 to inbound, because it already fixes the crash on its own:
https://hg.mozilla.org/integration/mozilla-inbound/rev/0daba7f5bcf7

Neil, about part 1: Do you want me to make any changes, or do any more tests, before you review it?
Comment 32 User image Neil Deakin 2014-03-13 05:26:48 PDT
I assumed you might test or ask someone more familiar with the other popup uses.
Comment 33 User image Ryan VanderMeulen [:RyanVM] 2014-03-13 12:39:03 PDT
https://hg.mozilla.org/mozilla-central/rev/0daba7f5bcf7
Comment 34 User image :Gijs 2014-03-15 12:12:40 PDT
Markus, can we land the crash fix on 29 as well? (ie get aurora or beta approval)
Comment 35 User image Markus Stange [:mstange] (away until Feb 22) 2014-03-16 00:38:58 PDT
Comment on attachment 8386773 [details] [diff] [review]
part 2: refuse reentrant nsCocoaWindow::DoResize as an additional safety measure

Sure.

[Approval Request Comment]
I'm requesting approval for 29.

Bug caused by (feature/regressing bug #): unknown, probably in the initial HiDPI implementation
User impact if declined: crash when the UI tour starts and the user has multiple monitors, one of them HiDPI, and positions the window offscreen
Testing completed (on m-c, etc.): manual testing + 1 day on mozilla-central
Risk to taking this patch (and alternatives if risky): very low risk, this patch is just a recursion guard
String or IDL/UUID changes made by this patch: none
Comment 36 User image Sylvestre Ledru [:sylvestre] 2014-03-16 08:42:10 PDT
Comment on attachment 8386773 [details] [diff] [review]
part 2: refuse reentrant nsCocoaWindow::DoResize as an additional safety measure

28 is going to be publish in a few days. We cannot accept this patch for beta.
Comment 37 User image Matthew N. [:MattN] (PM if requests are blocking you) 2014-03-16 11:42:51 PDT
Part 2 pushed to aurora: https://hg.mozilla.org/releases/mozilla-aurora/rev/d88bcc90ab03
Comment 38 User image Justin Dolske [:Dolske] 2014-03-18 19:04:48 PDT
What's this staying open for? Tests?
Comment 39 User image Matthew N. [:MattN] (PM if requests are blocking you) 2014-03-18 19:06:44 PDT
Part 1 needs review and landing
Comment 40 User image :Gijs 2014-03-24 06:01:24 PDT
(as the crash is now fixed, I think the remainder of this is much less important - we have few users on mac, and even fewer will have a second screen, and still fewer will open their browser halfway on each screen)
Comment 41 User image Mike Conley (:mconley) 2014-03-27 11:24:45 PDT
Removing from the Australis tracking dashboard.
Comment 42 User image Steven Michaud [:smichaud] (Retired) 2014-04-16 16:05:18 PDT
Bug 985947 blocks (and is triggered by) this bug's part 2 patch ("refuse reentrant nsCocoaWindow::DoResize as an additional safety measure").

Note You need to log in before you can comment on or make changes to this bug.