Closed Bug 865702 Opened 7 years ago Closed 6 years ago

Weirdness with comboboxes pushed aside by Bookmarks sidebar

Categories

(Core :: Widget: Cocoa, defect)

x86
macOS
defect
Not set

Tracking

()

VERIFIED FIXED
mozilla24
Tracking Status
firefox21 --- affected
firefox22 --- verified
firefox23 --- verified
firefox24 --- verified

People

(Reporter: smichaud, Assigned: tnikkel)

References

Details

Attachments

(6 files)

I'm recording this so it doesn't get lost.  I currently have no idea what's the cause, which platforms it happens on, and how far back it goes.  But it does happen (at least on OS X 10.8.3) in FF 20 and today's mozilla-central nightly.

I hope to come back later to fill in more information.

1) Visit a page in bugzilla (this bug will do).
2) Scroll down to the bottom and click on the Additional Comments box (to focus it).
3) Press Cmd-b to open the Bookmarks sidebar.
4) Click on the Bookmarks sidebar close button to close it.
5) Click on one of the Status comboboxes -- it should open.
6) Click again on the Additional Comments box.
7) Press Cmd-b to open the Bookmarks sidebar.
8) Click on the Bookmarks sidebar close button to close it.
9) Click on one of the Status comboboxes -- this time it won't open.

I suspect these STR can be reduced.
Forgot to add that you can get out of this weird state by scrolling the page up or down.
I can't this to happen on Windows (Windows XP) with FF 20.0.1.  So presumably this is Mac only, and probably a Cocoa widgets bug.
Component: Untriaged → Widget: Cocoa
Product: Firefox → Core
André, you want to try your hand with this? :-)
It'd be worth testing whether HiDPI mode makes a difference here.

But the machine I was testing on is connected to one, non-Retina display, and doesn't support HiDPI mode.
Looks like a coordinate system issue: my window leaves some space till the bottom of the screen and I could see the combo box menu appear there. I'm not sure where and when the coordinate system is set. I can look into it, but if you have a hint, it's more than welcome.
I made my previous test on a Retina MBP on the HiDPI display.
Then reading comment #4, I tried to reproduce on a LoDPI connected display.
Nightly crashed. I submitted the report mentioning bug 865702 (this one) so that we may link the info here (I don't know where those reports go and how they are handled).
(In reply to comment #5)

> Looks like a coordinate system issue: my window leaves some space
> till the bottom of the screen and I could see the combo box menu
> appear there.

Interesting.

> if you have a hint

I'd start by finding out exactly what code is placing the combo box in
the wrong location, and work backwards from there.  The combo box is a
PopupWindow object, so you should be able to find this code by adding
logging to any code that sets or changes the location of a PopupWindow
object.
> Nightly crashed. I submitted the report ...

What's the bug number?
(In reply to Steven Michaud from comment #8)
> > Nightly crashed. I submitted the report ...
> 
> What's the bug number?

I didn't open a new bug in bugzilla. I'm talking about the Crash Reporter tool that shows up when FF crashes and proposes to send us a crash log. That's where I mentioned this bug number (865702).
OK then, what's the Socorro bug id?

(An example is 1987d390-9c36-43ef-84ac-639942130420.  Cite it here prefaced with "bp-" and bugzilla will automatically link to the bug report.  E.g. bp-1987d390-9c36-43ef-84ac-639942130420.)
I guess that's what I find in:
Users/andre/Library/Application Support/Firefox/Crash Reports/submitted

Here it is:
bp-52cfb6fd-dd8d-48f1-bc79-09dff2130425
Unfortunately that crash dump is completely useless :-(

"Signature EMPTY: no crashing thread identified; corrupt dump"

Congratulations, though.  I've never before seen such an empty Socorro crash log :-)
I crashed Nightly on purpose again, and the report is empty again.
Now that's reproducible, maybe we should file a bug about Socorro ?
Or read bsmedberg's blog, since he's posted several times about that this week, pointing at bug 837835.
André, one thing you can do (which may be very important) is check whether the reports in your ~/Library/Application Support/Firefox/Crash Reports/ directory are as empty/corrupt as the data corresponding to them in Socorro.  Whatever you find, you should probably report it at bug 837835.
Also check in ~/Library/Application Support/CrashReporter/ for any Apple crash logs you may have corresponding to the empty crash logs on Socorro.
> Or read bsmedberg's blog

And no, I don't read bsmedberg's blog (or *any* blog).  Where is it?
I noticed that the offset of the popup menu is twice that of the page: if the page is at the top, the "show bookmarks" trick doesn't affect the popup position (there are plenty of such popups in the bugzilla form). Then just try to scroll a few pixels down repeat the "show bookmarks" trick... It looks as if the offset was twice the scroll distance. Which could be due to a + / - inversion somewhere. Still looking into it...
One thing to do here is find a regression range.  That should provide more clues.
It will be my first time working with regression.

Meanwhile I have some more clues:

I tested the bug on an older (non-retina) MBP with the same Mac OS and FF versions, and the bug does NOT show. Which suggests the bug is specific to retina MBP.

Back to my retina MBP, I checked the bug affects windows created both on the retina display and on the external display. I tested that because bug 866027 shows there is something different between windows created on LoDPI and HiDPI devices.

I also plan on running FF under valgrind, because I suspect crash from comment 6 is caused by addressing pixels outside of a specified buffer. I noticed FF crashes only when the menu would be drawn below the bottom of the lower screen (in my case the LoDPI screen is above the retina MBP's screen).
> Which suggests the bug is specific to retina MBP.

Not so.  I can reproduce on a Mac Pro running OS X 10.8.3 connected to a non-Retina display.  (So far I haven't tested on my Retina MBP.)

You really need to look for a regression range in mozilla-central nightlies, and then find (by using hg bisect or bisecting by hand) which patch triggered/caused this bug.  This should help a lot in figuring out the bug.
> crash from comment 6

I'd bet this is a separate bug.  But you certainly should also investigate it.

One thing I find very useful in tracking down memory access bugs is libgmalloc (http://developer.apple.com/library/mac/#technotes/tn2124/_index.html#//apple_ref/doc/uid/DTS10003391-CH1-SECGMALLOC).  To use this with Firefox you need to turn off jemalloc (by setting the NO_MAC_JEMALLOC environment variable).

The reason it's useful is that it tends to crash as soon as the code makes any sort of illegal access, so you can get a stack of the actual problem (instead of one of its symptoms).
I tried on yet another Mac (MacBook Air, no retina), and had no issue. Neither on Nightly nor on Release.

Now trying on FF Release on my MBP retina, and still having the issue. I'm wondering what combination of factors triggers the bug! IMO, figuring that out would help a lot too.
It's possible you and I are seeing different bugs.  If I have some time (later this week) I'll dig a bit more into the one I saw (and reported), and look for its regression range.

In the meantime, please do find the regression range for your bug.  It may turn out to be different from mine.
I've done some retesting on my non-Retina Mac Pro.  I found that the bug only occurs if the browser window is "just" wide enough for there to be no horizontal scroll bar.  (I don't yet know exactly what "just" means.)

Yes, this bug keeps getting stranger :-)

Hopefully I'll have time later this week to look for a regression range.
Attached image Stack trace in Xcode
I launched Nightly under lldb with NO_MAC_JEMALLOC. In those conditions, when the crash occurs, the combo-box is only *painted white* : no text, not even the grey background are drawn. And the stack is 45624 frames deep from "start", which hints at an uncontrolled recursion. The bt command in lldb gave no results, but Xcode was displaying it, so I made a screen capture. Still investigating...
Please generate a text-mode all-thread stack trace for one of your crashes, and attach it here.  To do this you may need to run FF in command-line lldb or gdb.  And since the trace will be very large, you'll probably need to gzip it before uploading it.

Did you also load libgmalloc?

Are your crashes reproducible?  Do you have STR?

Note that (for unknown reasons) you can't start FF nightlies in commandline gdb (when you try it FF always crashes).  You need to run FF and then use gdb to attach to the process (e.g. "gdb firefox [firefox pid]").
Ok, I think I finally got it: the bug triggers *if and only if* showing/hiding the bookmarks pane also shows/hides the horizontal scrollbar (because the content pane width changes). For windows too wide, the scrollbar will never show. For windows too narrow, the scrollbar will always show. In both cases, we don't trigger the bug.

Now we have a reproducible crash!

I expect, this window width trick to be the reason why I could not reproduce the bug on the 2 others Macs I tested in comment 23.
Obtained with NO_MAC_JEMALLOC=1 but without libgmalloc.
Again the popup menu displayed only a while rectangle on screen, then FF crashed before drawing any background or strings in it.
Tested on one of the macs that was previously immune: I set the "right" window width, and made it crash :-)
The following line, which sends an event to Gecko, is causing -[ChildView viewWillDraw] to be re-entered infinitely:

http://hg.mozilla.org/mozilla-central/annotate/e474b6cfebce/widget/cocoa/nsChildView.mm#l3010

André, could you find which patch triggered this?  It may be http://hg.mozilla.org/mozilla-central/rev/919c98296dbf, but only testing will tell for sure.
Do you crash only in HiDPI mode?
(In reply to Steven Michaud from comment #32)
> Do you crash only in HiDPI mode?

No, I managed to crash both on HiDPI and LoDPI screens.

(In reply to Steven Michaud from comment #31)
> The following line, which sends an event to Gecko, is causing -[ChildView
> viewWillDraw] to be re-entered infinitely:
> 
> http://hg.mozilla.org/mozilla-central/annotate/e474b6cfebce/widget/cocoa/
> nsChildView.mm#l3010
> 
> André, could you find which patch triggered this?  It may be
> http://hg.mozilla.org/mozilla-central/rev/919c98296dbf, but only testing
> will tell for sure.

I'd be surprised this patch to be the cause, because it's more recent than FF Release, which also crashes on the same conditions.

As far as I can tell, this patch only adds an intermediary method call (nsChildView::WillPaintWindow) before the call to the same listener method (nsIWidgetListener::WillPaintWindow). I have to look further back in history. But I'm afraid it's been a long time ago...
I'm not sure but here is where the paint logic seems to have changed:

https://hg.mozilla.org/mozilla-central/annotate/d1b17b12a472/widget/cocoa/nsChildView.mm#l2685

From what I understand, we switched from posting a "gecko event" to directly calling a method. May this be the cause of the ill recursion?

Anyway, I guess there were good reasons to change the paint logic (has to do with OMTC ?). So maybe we'd better figure out why we have an infinite recursion and how to avoid it, while keeping the new logic. Am I wrong?
What you really need to do, as I've said several times above, is find the regression range for your crashes.

Find which mozilla-central nightly is the first with which you can reproduce the bug, then bisect the tree until you find the individual patch that causes/triggers the problems.

Until we have that information, speculation is pretty useless.
I checked that the regression window is from nightly 2012-08-15 to 2012-08-16. Now trying to figure out how to confirm that changeset d1b17b12a472 is the cause. As far as I can tell, this is the only patch touching nsChildView.mm in this window.
> the regression window is from nightly 2012-08-15 to 2012-08-16

Testing on OS X 10.8.3 with my STR from comment #0 (as revised by comment #25), I get the same regression range.  So I think we're making progress.

I've never seen one of André's crashes, but it seems likely that our bugs are at least related.

For convenience, here's my revised STR in one sequence:

1) Visit a page in bugzilla (this bug will do).
2) Make the page just narrow enough for the horizontal scrollbar to disappear.
3) Scroll down to the bottom and click on the Additional Comments box (to focus it).
4) Press Cmd-b to open the Bookmarks sidebar.
5) Click on the Bookmarks sidebar close button to close it.
6) Click on one of the Status comboboxes -- it should open.
7) Click again on the Additional Comments box.
8) Press Cmd-b to open the Bookmarks sidebar.
9) Click on the Bookmarks sidebar close button to close it.
10) Click on one of the Status comboboxes -- this time it won't open.
The image is the concatenation of 3 screen captures taken while having a breakpoint in nsCocoaWindow::DoResize at line [mWindow setFrame:newFrame display:YES] and performing steps from comment 37.

During the deadly recursion, the "window" where the popup should be drawn is moved and resized in a 3 steps cycle. As previously noticed the popup is misplaced, much below the position it should have, but this time none of the 3 positions is vertically aligned with the webpage's control.
I'm thinking we could work around this bug by simulating a quick 1 pixel scroll back and forth after the horizontal scroll bar is displayed or hidden. Don't know how to do that yet, but I'm sure I'll figure it out much faster than the subtleties of the layout engine. Then we can make a follow-up bug, which will be an "improvement". Meanwhile FF won't crash any more on this one. Any thoughts?
By using hg bisect, I found that this bug was caused/triggered by the following changeset:

http://hg.mozilla.org/mozilla-central/rev/f8d76d32e29b

At least the bug I found was.  André, please test to see if this was also the trigger for your crashes.
Blocks: 780661
I've now got STR for André's crashes.  Interestingly they only work in mozilla-central builds dated 2013-04-16 and later.  And the top (the "signature") is different (not surprising since they're infinite recursion crashes).  And there are some trivial differences due (presumably) to the fact that I've been testing on OS X 10.7.5, and André's been testing on OS X 10.8.3.  Otherwise they're identical.

Here are the steps:

1) Plug in an external monitor to your MacBook Pro and arrange it on top of your laptop's display.
2) Visit a page in bugzilla (this bug will do).
3) Move that page to the external monitor, if it doesn't open there.
4) Make the page just narrow enough for the horizontal scrollbar to disappear.
5) Scroll down to the bottom of the page.
6) Press Cmd-b to open the Bookmarks sidebar.
7) Click on the Status button -- its combobox should open.
8) Press Cmd-b again to close the Bookmarks sidebar.
9) Press Cmd-b again to open the Bookmarks sidebar.
10) Click on the Status button again ... and crash.

Now that I can reproduce the crashes I'll take this bug.  But I won't be able to work on it until next week.
Assignee: nobody → smichaud
For the record here's a stack trace of one of my crashes, zipped up because it's so large.
Thanks for looking into this Steven.  It looks like we have an infinite recursion
stemming from the WillPaintWindow flush.  See bug 851641 for a recent discussion
on a similar problem.

...
#472 0x0000000101c02ddd in nsViewManager::WillPaintWindow ()
#473 0x0000000101c01549 in nsView::WillPaintWindow ()
#474 0x000000010225354e in -[ChildView viewWillDraw] ()
#475 0x00007fff98edef08 in -[NSView viewWillDraw] ()
#476 0x00007fff98eddc4d in -[NSView _sendViewWillDrawInRect:clipRootView:suppressRecursion:] ()
#477 0x00007fff98edc9b8 in -[NSView displayIfNeeded] ()
#478 0x00007fff99035fe4 in -[NSNextStepFrame displayIfNeeded] ()
#479 0x00007fff98f9ca8f in -[NSWindow _setFrameCommon:display:stashSize:] ()
#480 0x00007fff98f9c093 in -[NSWindow setFrame:display:] ()
#481 0x0000000102242c01 in nsCocoaWindow::DoResize ()
#482 0x0000000102289908 in nsBaseWidget::ResizeClient ()
#483 0x0000000101bffd87 in nsView::DoResetWidgetBounds ()
#484 0x0000000101c021e9 in nsViewManager::ProcessPendingUpdatesForView ()
#485 0x0000000101c0220e in nsViewManager::ProcessPendingUpdatesForView ()
#486 0x0000000101c0220e in nsViewManager::ProcessPendingUpdatesForView ()
#487 0x0000000101c0220e in nsViewManager::ProcessPendingUpdatesForView ()
#488 0x0000000101c0220e in nsViewManager::ProcessPendingUpdatesForView ()
#489 0x0000000101c02ddd in nsViewManager::WillPaintWindow ()
#490 0x0000000101c01549 in nsView::WillPaintWindow ()
#491 0x000000010225354e in -[ChildView viewWillDraw] ()
#492 0x00007fff98edef08 in -[NSView viewWillDraw] ()
#493 0x00007fff98eddc4d in -[NSView _sendViewWillDrawInRect:clipRootView:suppressRecursion:] ()
#494 0x00007fff98edc9b8 in -[NSView displayIfNeeded] ()
#495 0x00007fff99035fe4 in -[NSNextStepFrame displayIfNeeded] ()
#496 0x00007fff99035f79 in -[NSWindow display] ()
#497 0x00007fff9953caeb in -[NSWindow _updateSettingsFromSavedScreen:scaleFactor:] ()
#498 0x00007fff98ff2809 in -[NSWindow _setFrame:updateBorderViewSize:] ()
#499 0x00007fff98ff1f56 in -[NSWindow _oldPlaceWindow:] ()
#500 0x00007fff98f9c86e in -[NSWindow _setFrameCommon:display:stashSize:] ()
#501 0x00007fff98f9c093 in -[NSWindow setFrame:display:] ()
#502 0x0000000102242c01 in nsCocoaWindow::DoResize ()
#503 0x0000000102289908 in nsBaseWidget::ResizeClient ()
#504 0x0000000101bffd87 in nsView::DoResetWidgetBounds ()
#505 0x0000000101c021e9 in nsViewManager::ProcessPendingUpdatesForView ()
#506 0x0000000101c0220e in nsViewManager::ProcessPendingUpdatesForView ()
#507 0x0000000101c0220e in nsViewManager::ProcessPendingUpdatesForView ()
#508 0x0000000101c0220e in nsViewManager::ProcessPendingUpdatesForView ()
#509 0x0000000101c0220e in nsViewManager::ProcessPendingUpdatesForView ()
#510 0x0000000101c02ddd in nsViewManager::WillPaintWindow ()
...
I've been looking into this.

The only way WillPaintWindow calls ProcessPendingUpdates is if the view has a forced repaint, which happens because it has been resized. And you can see from the stack that we are constantly doing a move or resize on the widget. So the flush that ProcessPendingUpdates causes via calling WillPaint on the presshell is likely moving the view/widget around in an infinite loop.

In debugging this I see that the view for the select dropdown popup is about 6000 pixels off screen, this happens because when we reflow the scroll frame for the page (because the width of the page changed when we add/remove the bookmarks toolbar) we set the scroll position temporarily to 0,0. In the post reflow callback we position the view based on a scroll position of 0,0, and then the scroll frame reflow code sets the scroll position back to what it was. Based on that this would seem to be a general problem that applies to all views so I'm trying to get a testcase where, say, the view for an iframe gets wrong coordinates. Either that or there is something that is specific to select dropdowns that I haven't determined yet.
Assignee: smichaud → tnikkel
http://hg.mozilla.org/mozilla-central/rev/f8d76d32e29b caused this to show up because we use to position the frame view every time, now we check if the position of the frame has changed relative to its parent frame (it hasn't), so the erroneous view position doesn't get corrected.
Attached patch patchSplinter Review
The problem is that nsContainerFrame::FinishReflowChild moves the frame to the passed in position even if the NO_MOVE_FRAME bit is passed in. When the scroll frame code calls FinishReflowChild is just passes 0,0 for the position and the NO_MOVE_FRAME bit. The frame's position gets fixed up after reflow by scrolling calls. But before that we update some view coordinates and they don't get fixed if we don't have to do any actual scrolling after reflow.

The bug has existed since the initial checkin of this code in 1999.
Attachment #753089 - Flags: review?(roc)
https://hg.mozilla.org/mozilla-central/rev/87e9d2c528c0
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla24
Blocks: 876346
Comment on attachment 753089 [details] [diff] [review]
patch

[Approval Request Comment]
Bug caused by (feature/regressing bug #): bug 780661
User impact if declined: select dropdowns won't open, or be in the wrong place, or maybe a crash in some specific situations
Testing completed (on m-c, etc.): been on m-c for several days now
Risk to taking this patch (and alternatives if risky): this is pretty low risk, the rest of the cycle should be enough time to shake out any regressions it might cause
String or IDL/UUID changes made by this patch: none
Attachment #753089 - Flags: approval-mozilla-aurora?
Comment on attachment 753089 [details] [diff] [review]
patch

[Triage Comment]
Looks like we need this on beta as well (bug 876346) - please carry over the a+ if it lands cleanly there too.
Attachment #753089 - Flags: approval-mozilla-beta+
Attachment #753089 - Flags: approval-mozilla-aurora?
Attachment #753089 - Flags: approval-mozilla-aurora+
(In reply to Timothy Nikkel (:tn) from comment #49)
> Risk to taking this patch (and alternatives if risky): this is pretty low
> risk, the rest of the cycle should be enough time to shake out any
> regressions it might cause

WRT risk on Beta, we're only going to beta 3 tomorrow so we still have time to backout if there were unforseen regressions from this uplift.
Adding verifyme to get a little testing around this.
Keywords: verifyme
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:22.0) Gecko/20100101 Firefox/22.0
Build ID: 20130528181031

Verified as fixed on the latest Firefox 22 beta 3 build.
Verified as fixed on Firefox 23.0b3 (07/03), 24.0a2 (07/04) and 25.0a1 (07/04).
Status: RESOLVED → VERIFIED
Keywords: verifyme
QA Contact: ioana.budnar
Blocks: 890495
No longer blocks: 890495
Depends on: 890495
You need to log in before you can comment on or make changes to this bug.