Closed Bug 491700 Opened 11 years ago Closed 11 years ago

hang while resizing windows (CPU hits 100%, no interaction is possible)

Categories

(Core :: Layout, defect)

x86
Windows XP
defect
Not set

Tracking

()

RESOLVED FIXED

People

(Reporter: alice0775, Assigned: tnikkel)

References

Details

(Keywords: regression)

Attachments

(2 files, 5 obsolete files)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1b5pre) Gecko/20090506 Firefox/3.5.0
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2a1pre) Gecko/20090506 Minefield/3.6a1pre ID:20090506045618

Can anyone confirm the following issue?
Since CPU becomes 100%,  the View Souse Window can not be resized and Operation of Firefox becomes impossible. 


Reproducible: Always

Steps to Reproduce:
1. Start Minfield with New profile.
2. Open http://www.mozilla.org/projects/minefield/
3. View > Page Source
4. Resize width of the Page Source window.
5. Resize width of the Page Source window again.
Actual Results:  
CPU 100%
You can not operate Minefield.


Expected Results:  
Low CPU usage.
You can operate Minefield.


Works Fine:
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2a1pre) Gecko/20090505 Minefield/3.6a1pre ID:20090505041741

Broken:
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2a1pre) Gecko/20090506 Minefield/3.6a1pre ID:20090506045618

P.S. You can push "Window key" to enable operation again.
Keywords: regression
Summary: Since CPU becomes 100%, the View Souse Window can not be resized and Operation of Firefox becomes impossible. → Since CPU becomes 100%, the View Souse Window can not be resized and can not operate Firefox.
Summary: Since CPU becomes 100%, the View Souse Window can not be resized and can not operate Firefox. → Since CPU becomes 100%, the View Source Window can not be resized and can not operate Firefox.
Component: General → Layout
Product: Firefox → Core
Version: unspecified → Trunk
don't see this in Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2a1pre) Gecko/20090503 Minefield/3.6a1pre (.NET CLR 3.5.30729) ID:20090503044327

just updated in today's latest nightly, so restarting and shall see...
ok, I confirm this in Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2a1pre) Gecko/20090506 Minefield/3.6a1pre (.NET CLR 3.5.30729) ID:20090506045618

Also noticed that it only happens with resize operations that alter the window's width, not its height. In other words dragging the left side, right side or any of the corners of the window, but not when dragging the upper or lower side of it.

Hope this small detail helps.
QA Contact: general → layout
Hmm.  I can't seem to reproduce this on Mac.  I'll see what I can do about reproducing this on Windows.  Fun times!
And in a VM, vmware seems to go bonkers if I resize any window whatsoever (uses tons of CPU until I resize the vmware window).  But if I do resize the vmware window, there's no untoward Firefox CPU usage.

If someone has a Windows build, does removing the "toolkit && toolkit->UserIsMovingWindow()" check at http://mxr.mozilla.org/mozilla-central/source/widget/src/windows/nsWindow.cpp#7036 (in nsWindow::HasPendingInputEvent) help?
Status: UNCONFIRMED → NEW
Ever confirmed: true
Flags: blocking1.9.2?
(In reply to comment #6)
> does removing the "toolkit &&
> toolkit->UserIsMovingWindow()" check at
> http://mxr.mozilla.org/mozilla-central/source/widget/src/windows/nsWindow.cpp#7036
> (in nsWindow::HasPendingInputEvent) help?

Yep, it does. I can reproduce the bug, and removing that line fixes it.
Gavin did one more test for me.  While the CPU is at 100% there, we keep hitting that code and UserIsMovingWindow() keeps returning true.  ere, Neil, any idea how we get into that state, given that the user isn't actually moving or resizing the window anymore.
Offhand the only idea I have is to use Spy++ to watch the WM_ENTER/EXITSIZEMOVE messages to ensure that they are matched up correctly.
I can not reproduce this on win2k if that helps.
This issue only occurs if you have "Show windows contents while dragging" selected in the Display->Appearance->Settings control panel.
I was locally able to demonstrate that resizing the window could trigger a strange behaviour that prevents the system from exiting the resize state. Unfortunately this condition is extremely difficult to debug.
(In reply to comment #11)
> This issue only occurs if you have "Show windows contents while dragging"
> selected in the Display->Appearance->Settings control panel.

Ooops.  Meant to say Display->Appearance->Effects.
This issue is not confined to the view source window.  It also occurs with web content.

As pointed out in the Mozillazine forums, navigating to this page:

http://livedocs.adobe.com/coldfusion/7/

and if it isn't in two frames already, click the arrow in the top left to get another on the left, with a menu.

Horizontally resizing the window at this point produces the same result as resizing the view source window.

Removing the check as suggested in comment #6 "cures" the issue on this page as well.
Simply mozillazine forum index is enough to trigger this bug, although not 100% of the time.

Now, I'm not sure if this is helpful, but on the pages where it does reproduce there's a funny reflow effect when changing size: part of the page reflows immediately, and part of the page reflows only where dragging pauses/ends.

I made a screenshot of this effect on mozillazine http://members.ii.net/~syskin/blue.png
This was taken *during* upsizing the window. The top part was fully reflowing while changing window size, while the bottom part was only increasing blue border size. As soon as I paused that drag (not necessarily ending it), the bottom part would reflow as well.

Now, the freeze seems to happen before/during the bottom part's reflow. Frozen browser can have such blue border displayed.

Hopefully this helps.
> part of the page reflows immediately, and part of the page reflows only where
> dragging pauses/ends.

That's what I'd expect in some cases, sure.  But if you're seeing this, I wouldn't expect the 100% cpu thing to also happen.  That should happen effectively when the bottom part never reflows.

We could just take the resize check out, I guess, which would suck for one of the main use cases of ireflow (resizing large pages).
OK, so I think I know what the issue is.

Because of problems with plugins, nsAppShell.cpp explicitly processes keyboard messages first, then mouse messages, and then other messages. (Note that this unfortunately promotes keyboard messages over mouse messages which makes it difficult to mix the two on a slow system. It does at least manage not to change the relative ordering of keyboard and IME messages.) Thus when reflow is interrupted by a GetQueueStatus(QS_INPUT) test, these messages are immediately processed and cleared before the next attempt to reflow.

The problem here is that the resize loop is a modal message loop that presumably follows standard Windows message ordering, which is to promote posted messages over input messages. As nsAppShell::ScheduleNativeEventCallback posts a message to do its work, the next message is another attempt to reflow, which promptly interrupts itself ad nauseum (well, until the window loses focus, at which point the modal event loop bails out.)
Is the resize loop something we control?  And if so, can we change the ordering?

Alternately, is there a way to reduce the priority of a particular posted message?

If the answers to both of those are "no", then the obvious options seem to be either not looking at the resize state if no input events have been processed since the last time we interrupted due to resizing, or to put the reflow event for interrupted reflow off on a timeout instead of just a simple runnable...
No, we can't change the ordering, which is
    * Sent messages
    * Posted messages
    * Input (hardware) messages and system internal events
    * Sent messages (again)
    * Paint messages (these are synthetically generated)
    * Timer messages (synthetic events, not Gecko timers)

Looking at other events is no good, as it's the event that we're about to post that causes the problem. Firing it on a zero timeout would work though.
> as it's the event that we're about to post that causes the problem

Yeah, but if we look at other events and then _don't_ interrupt based on that, we don't post the event.

I'll try the zero-timeout thing as a first cut and see how that goes.
If I visit [url=http://blip.tv/file/2106626]this[/url] page, resizing the window in -any- direction (not just horizontal) freezes up firefox and explorer until I alt-tab to make it lose focus. Is this the same bug? 

tested on: Vista x64 with SP1
user-agent: Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.2a1pre) Gecko/20090514 Minefield/3.6a1pre
Sorry about the broken link. For convenience: http://blip.tv/file/2106626
Almost certainly the same issue, yes.
Here is an extremely reduced testcase for the freeze that happens with horizontal resizing. It seems to be linked to the vertical length of the page.
Honestly, we don't need a reduced testcase.  We know exactly why the problem happens and how we need to fix it....
Sorry, I didn't get that from the comments. I know there's a one-line fix, but I thought you were trying to get to the bottom of why that line is making it fail.
Ah, ok.  The getting to the bottom was done in comment 17.  ;)
Attached patch Proposed fix (obsolete) — Splinter Review
Gavin, do you think you could try this out?
Assignee: nobody → bzbarsky
Status: NEW → ASSIGNED
Attachment #378959 - Attachment is patch: true
Attachment #378959 - Attachment mime type: application/octet-stream → text/plain
Comment on attachment 378959 [details] [diff] [review]
Proposed fix

This did not fix the problem for me (in a winxp vm).
Does not checking for resize in widget fix the problem in a vm?  For me, in a Vista vm trying to resize a window just kills vmware until I do something to make it wake up; see comment 6.

If it does, does changing the attached patch to use 10 instead of 0 as the timer delay help?
I'm using VirtualBox, and the behaviour I see wrt resizing is the same (as far as I can tell) as I see natively on Windows.

(In reply to comment #30)
> Does not checking for resize in widget fix the problem in a vm?  For me, in a
> Vista vm trying to resize a window just kills vmware until I do something to
> make it wake up; see comment 6.

Making the change described in comment 6 fixes the problem.

> If it does, does changing the attached patch to use 10 instead of 0 as the
> timer delay help?

Changing the 0 to 10 does not fix the problem.
What about changing it to 100?
No change.(In reply to comment #32)
> What about changing it to 100?

Doesn't have any noticeable effect.
Neil, any idea why?  That patch with a 100ms timer should resolve the issue you describe in comment 17, right?
I just tried the patch as is with 0ms timer and it seems to work for me.
That patch doesn't fix this bug for me.
adjusting the summary as this is not specific to the Page Source window. I hit this issue on Vista when resizing a large Google Docs document.
Summary: Since CPU becomes 100%, the View Source Window can not be resized and can not operate Firefox. → When resizing the Page Source window or some Web pages CPU hits 100% and no interaction is possible
Duplicate of this bug: 496016
Summary: When resizing the Page Source window or some Web pages CPU hits 100% and no interaction is possible → hang while resizing windows (CPU hits 100%, no interaction is possible)
Attached patch Proposed fix v2 (obsolete) — Splinter Review
So the reason that running the interrupted reflow off on a timeout didn't work is that another reflow is posted via PresShell::ProcessReflowCommands. This patch runs that off a timeout as well and it fixes the problem for me.

I noticed that there is another Windows only problem when resizing. Since HavePendingInputEvent will always return true during a resize, after sInterruptChecksToSkip calls to CheckForInterrupt we will always have an interrupt. This causes a noticeable problem when we have a block frame with more than ~200 lines -- the lines after ~200 will not get reflowed (or is that reflown? I'm no good at imaginary grammar) and so draw the same width as before the resize started. File a followup for this?
Attachment #381473 - Flags: review?(bzbarsky)
Attached patch Fix some comments (obsolete) — Splinter Review
Attachment #381474 - Flags: review?(bzbarsky)
Comment on attachment 381474 [details] [diff] [review]
Fix some comments

Good catch on the bogosity, but instead of removing "until SetInterruptState is called" it should be "until ReflowStarted is called" and similar in the other comment.

Also s/returend/returned/ and s/where returns true/where this returns true/.

And you can just roll this into the other patch.
Attachment #381474 - Flags: review?(bzbarsky) → review-
Comment on attachment 381473 [details] [diff] [review]
Proposed fix v2

Excellent catch!  You need to keep doing the non-timer thing, I think if !interrupted.  But other than that, this makes perfect sense.

Do you want me to just update the patch (it'll need a one-over from roc anyway), or do you want to do it?
Attachment #381473 - Flags: review?(bzbarsky) → review-
Attached patch patch v3 (obsolete) — Splinter Review
Using superreview to get roc to look at it, use whatever flag is appropriate.
Attachment #378959 - Attachment is obsolete: true
Attachment #381473 - Attachment is obsolete: true
Attachment #381474 - Attachment is obsolete: true
Attachment #381706 - Flags: superreview?(roc)
Attachment #381706 - Flags: review?(bzbarsky)
Followup for issue mentioned in comment 39 filed at bug 496500.
Attachment #381706 - Flags: review?(bzbarsky) → review+
Comment on attachment 381706 [details] [diff] [review]
patch v3

Beautiful.  Thank you!
With the patch from bug 496500 I can get this hang again; although it is hard to come by, so I can't really place blame on that patch.

It is caused by the posted reflow in FrameNeedsReflow. After a reflow we call DidDoReflow which calls HandlePostedReflowCallbacks which updates the attributes for the scroll bars, and that calls FrameNeedsReflow on some scroll bar frames.

Relevant stack:

gklayout.dll!PresShell::FrameNeedsReflow(nsIFrame * aFrame=0x03b2d0e0, nsIPresShell::IntrinsicDirty aIntrinsicDirty=eStyleChange, unsigned int aBitToAdd=1024)  Line 3572	C++
gklayout.dll!nsSliderFrame::AttributeChanged(int aNameSpaceID=0, nsIAtom * aAttribute=0x024a6db8, int aModType=1)  Line 323	C++
gklayout.dll!nsCSSFrameConstructor::AttributeChanged(nsIContent * aContent=0x03b2d0e0, int aNameSpaceID=0, nsIAtom * aAttribute=0x024a6db8, int aModType=1, unsigned int aStateMask=0)  Line 7736 + 0x15 bytes	C++
gklayout.dll!PresShell::AttributeChanged(nsIDocument * aDocument=0x03861240, nsIContent * aContent=0x026c72b8, int aNameSpaceID=0, nsIAtom * aAttribute=0x024a6db8, int aModType=1, unsigned int aStateMask=0)  Line 5025	C++
gklayout.dll!nsNodeUtils::AttributeChanged(nsIContent * aContent=0x0390948c, int aNameSpaceID=0, nsIAtom * aAttribute=0x024a6db8, int aModType=1, unsigned int aStateMask=0)  Line 110 + 0xbb bytes	C++
gklayout.dll!nsGenericElement::SetAttrAndNotify(int aNamespaceID=0, nsIAtom * aName=0x024a6db8, nsIAtom * aPrefix=0x00000000, const nsAString_internal & aOldValue={...}, nsAttrValue & aParsedValue={...}, int aModification=1, int aFireMutation=0, int aNotify=1, const nsAString_internal * aValueForAfterSetAttr=0x0012ede4)  Line 4369 + 0x1d bytes	C++
gklayout.dll!nsGenericElement::SetAttr(int aNamespaceID=0, nsIAtom * aName=0x024a6db8, nsIAtom * aPrefix=0x00000000, const nsAString_internal & aValue={...}, int aNotify=1)  Line 4300 + 0x22 bytes	C++
gklayout.dll!nsXBLPrototypeBinding::AttributeChanged(nsIAtom * aAttribute=0x024a6db8, int aNameSpaceID=0, int aRemoveFlag=0, nsIContent * aChangedElement=0x03a9cf00, nsIContent * aAnonymousContent=0x04668808, int aNotify=1)  Line 604	C++
gklayout.dll!nsXBLBinding::AttributeChanged(nsIAtom * aAttribute=0x024a6db8, int aNameSpaceID=0, int aRemoveFlag=0, int aNotify=1)  Line 969	C++
gklayout.dll!nsGenericElement::SetAttrAndNotify(int aNamespaceID=0, nsIAtom * aName=0x024a6db8, nsIAtom * aPrefix=0x00000000, const nsAString_internal & aOldValue={...}, nsAttrValue & aParsedValue={...}, int aModification=1, int aFireMutation=0, int aNotify=1, const nsAString_internal * aValueForAfterSetAttr=0x0012f200)  Line 4359	C++
gklayout.dll!nsGenericElement::SetAttr(int aNamespaceID=0, nsIAtom * aName=0x024a6db8, nsIAtom * aPrefix=0x00000000, const nsAString_internal & aValue={...}, int aNotify=1)  Line 4300 + 0x22 bytes	C++
gklayout.dll!nsGfxScrollFrameInner::SetCoordAttribute(nsIContent * aContent=0x03a9cf00, nsIAtom * aAtom=0x024a6db8, int aSize=14917441)  Line 2626	C++
gklayout.dll!nsGfxScrollFrameInner::FinishReflowForScrollbar(nsIContent * aContent=0x03a9cf00, int aMinXY=0, int aMaxXY=14917441, int aCurPosXY=1632420, int aPageIncrement=22920, int aIncrement=1140)  Line 2364	C++
gklayout.dll!nsGfxScrollFrameInner::ReflowFinished()  Line 2433 + 0x1b bytes	C++
gklayout.dll!PresShell::HandlePostedReflowCallbacks(int aInterruptible=1)  Line 4808 + 0x6 bytes	C++
gklayout.dll!PresShell::DidDoReflow(int aInterruptible=1)  Line 7034	C++
gklayout.dll!PresShell::ProcessReflowCommands(int aInterruptible=1)  Line 7290	C++
gklayout.dll!PresShell::FlushPendingNotifications(mozFlushType aType=Flush_InterruptibleLayout)  Line 4917 + 0x11 bytes	C++
gklayout.dll!PresShell::ReflowEvent::Run()  Line 6983	C++

Posting the reflow off a timer if HasPendingInterrupt() is true seems to fix it.
Hmm.  That might just be the right approach, yes.  If HasPendingInterrupt() is true, that means that the previous reflow we tried interrupted.  If we don't have a reflow event posted, that means that we're about to post one (it can't be that we posted one and then fired it already, since that would have reset what HasPendingInterrupt() returns.

I guess we break out of the ProcessReflowCommands loop if we interrupt on some root, so we can't clobber state that way either.  So yeah, that sounds pretty good in general.  If we do that check in PostReflowEvent() itself (and add a new function for the guts of it, that the timer could call directly), we can undo some of the other changes the earlier patches in this bug made, too.

Want to do that?
Attached patch patch v4 (obsolete) — Splinter Review
Attachment #381706 - Attachment is obsolete: true
Attachment #382447 - Flags: superreview?(roc)
Attachment #382447 - Flags: review?(bzbarsky)
Attachment #381706 - Flags: superreview?(roc)
Comment on attachment 382447 [details] [diff] [review]
patch v4

Boris is the right person to sr this.
Attachment #382447 - Flags: superreview?(roc) → superreview?(bzbarsky)
Robert, I wrote half the patch... I'd really appreciate a sanity-check on the approach.  ;)
   void     PostReflowEvent();
+  void     DoPostReflowEvent();

It would be useful to have a comment explaining what the difference is.

You also need a comment explaining why we use a timer instead of a regular XPCOM event. It seems to be something to do with Windows event processing.
Attached patch patch v5Splinter Review
Robert's comments addressed.
Attachment #382447 - Attachment is obsolete: true
Attachment #382927 - Flags: superreview?(bzbarsky)
Attachment #382927 - Flags: review?(roc)
Attachment #382447 - Flags: superreview?(bzbarsky)
Attachment #382447 - Flags: review?(bzbarsky)
Are there not any tinderbox tests being ran that should be hit by this? For instance, the opening of a file input upload window without having to resize at all.
(In reply to comment #53)
> Are there not any tinderbox tests being ran that should be hit by this? For
> instance, the opening of a file input upload window without having to resize at
> all.

I you encounter a hang not related to resizing it has nothing to do with this bug.  Your issue sounds more like bug 439687.
Attachment #382927 - Flags: superreview?(bzbarsky)
Attachment #382927 - Flags: superreview+
Attachment #382927 - Flags: review?(roc)
Attachment #382927 - Flags: review+
Comment on attachment 382927 [details] [diff] [review]
patch v5

>+PresShell::PostReflowEventOffTimer()
>+  if (!mReflowEvent.IsPending() && !mReflowContinueTimer) {

The only caller of this method is PostReflowEvent now, and so we'll never be called when mReflowEvent.IsPending().  I'll switch that to an assert, and with that looks good.
Pushed http://hg.mozilla.org/mozilla-central/rev/121b6b6d728b with that change.

Timothy, thanks for digging into this and fixing it!
Assignee: bzbarsky → tnikkel
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
(In reply to comment #54)
> (In reply to comment #53)
> > Are there not any tinderbox tests being ran that should be hit by this? For
> > instance, the opening of a file input upload window without having to resize at
> > all.
> 
> I you encounter a hang not related to resizing it has nothing to do with this
> bug.  Your issue sounds more like bug 439687.

I don't think that is correct since I haven't been able to reproduce the hang simply by trying to upload a file with today's build.
(In reply to comment #57)
> (In reply to comment #54)
> > (In reply to comment #53)
> > > Are there not any tinderbox tests being ran that should be hit by this? For
> > > instance, the opening of a file input upload window without having to resize at
> > > all.
> > 
> > I you encounter a hang not related to resizing it has nothing to do with this
> > bug.  Your issue sounds more like bug 439687.
> 
> I don't think that is correct since I haven't been able to reproduce the hang
> simply by trying to upload a file with today's build.

Windows does go into a modal loop when a dialog is open, so a similar hang could happen in such a case. However the resize modal loop is a special case because we are constantly reflowing the page. So the resize hang only ends when focus is changed. So with a dialog I would think the hang would end when reflow is finished.
I don't think it could go even into a similar hang in that case, as the hang here is caused by the reflow interrupting itself because it detects the resize in progress but being unable to defer itself because the modal resize loop prefers to redispatch the reflow event rather than respond to the mouse.
I'm still getting the hang on this page: http://iran.whyweprotest.net/news-current-events/8590-green-brief-20-july-06-a.html I can't resize it unless I turn off 'show window contents while dragging'.
(In reply to comment #60)
> I'm still getting the hang on this page:
> http://iran.whyweprotest.net/news-current-events/8590-green-brief-20-july-06-a.html
> I can't resize it unless I turn off 'show window contents while dragging'.

Hmm, I'm not able to reproduce this.
That's odd - I tried it in Safe Mode and it still happens. Can anyone reproduce? I can -maybe- get it to change the size once (it'll respond very slowly and jump to the new size all at once), but after that it locks up. (taking up 100% of a core) It happens to this bug as well - vertical resizing works fine by horizontal resizing makes it lock up.

I'm on Vista x64 w/SP2.
yeah confirm, resizing the window of page source of this bugzilla page is sluggish 

Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2a1pre) Gecko/20090706 Minefield/3.6a1pre (.NET CLR 3.5.30729) ID:20090706042748
Sluggishness is totally different from a hang where no interaction is possible.
so file new bug.?
There are reports of slowness when resizing at bug 499447, so if you have anything to add do it there unless you think what you are seeing is different (although that bug is kind of vague right now since it hasn't been pinned down yet). The patch in bug 496788 might fix the slowness issue. But hard hangs that never end, with no interaction possible, that are solved by alt-tabbing or hitting the windows key (or anything else that focuses something else) are this bug.
To clarify, I'm getting a hard hang sometimes (i.e. if I try to resize the window slowly) preceded by a single visible resize. After I alt-tab out the window is reset to the size it had before I began resizing. (still happens on latest nightly)
That definitely sounds like the same hang this bug should have fixed. I still can't reproduce. Are there any other pages it happens on? I think the patch bug 496788 would fix this, but unfortunately its not going to be taken and the tryserver builds with it have expired.
I still get the hang on the test case I attached - any long page seems to do. The moment the vertical scroll bar gets small enough (I couldn't really tell you how small, I'm not sure if it's always the same) the hang starts to happen. Even for say, any google search, if I make the window thin enough it starts to hang.
I do not want to sound condescending, but are you sure you are using the latest nightly? Check Help > About Mozilla Firefox, it will report a date like 20090715.
Yep, 20090722 at time of writing. I usually update daily, and since I'd just come home from my holiday I made sure to update right away. And don't worry, I know how frustrating it is to get reports you can't reproduce.

Worse, now -I'm- having trouble reproducing the problem. :\ I wasn't doing anything particularly extraordinary before, and I was definitely getting hangs. If it happens again I'll record exactly what tabs I had open at the time.
You need to log in before you can comment on or make changes to this bug.