Closed Bug 499447 Opened 11 years ago Closed 10 years ago

Hanging when changing width of main window - on BBC

Categories

(Core :: Layout, defect, P1, major)

x86
Windows XP
defect

Tracking

RESOLVED FIXED
Tracking Status
status1.9.2 --- beta1-fixed

People

(Reporter: toolz4schoolz, Assigned: sylvain.pasche)

Attachments

(4 files, 1 obsolete file)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2a1pre) Gecko/20090618 Minefield/3.6a1pre
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2a1pre) Gecko/20090618 Minefield/3.6a1pre

When I change the width of the main window while on the above BBC url, it does so with some reluctance. The redrawing of the screen lags and stutters. A build from June 9th was *much* worse: it would hog one core for more than 10 seconds and prevent me from moving the mouse onto the Task Bar, so I'd have to Ctrl-Alt-Del to see Task Manager.

No problems when changing height.

This is a clean profile created today.

Other sites fine. With this tab focused (offending tab still open) I can change width normally.

Reproducible: Always

Steps to Reproduce:
1. Go to above url.
2. Try to change width of main window.
Actual Results:  
Lagging redraw, high CPU usage.

Expected Results:  
Fluid redraw.
Seems to be caused by bug 67752. Looks like bug 491700, only it is not a hang anymore, just slower than with a 4 May 2009 build.
Blocks: ireflow, 491700
Component: General → Layout
Product: Firefox → Core
QA Contact: general → layout
Version: unspecified → Trunk
Also reproducible on a large Google Docs document.
Status: UNCONFIRMED → NEW
Ever confirmed: true
I can't seem to reproduce with the bbc url when comparing the 4 May 2009 nightly to a current nightly. Please include any other urls where this is happening.

But bug 491700 is specifically about a hang, so if it's just slow then it's probably not related, as there are still issues with ireflow on Windows when resizing.
That's strange, I can't reproduce this on another computer (same OS and using 3.6). I'll try to investigate this when I get a chance.
Any information on reproducing (including the details on the focus thing from comment 0) would be much appreciated.
Just downloaded build from 22 June and the problem is still occurring.
All pages below that main url are affected (i.e. all low-res BBC News pages). Going down into Sports:
http://news.bbc.co.uk/sport/low
it seems appreciably slower - never less than 2 secs - one core on 100%.
Going down into Football:
http://news.bbc.co.uk/sport2/low/football/default.stm
it's slower still - often 7 or 8 seconds.
Mouse can't be moved onto Task Bar. This is what I call a *hang*.

[We can see that the feature that's changing as we drill down the site is the wrapped-line bunches of urls for the child sections (Football, Formula1, Olympics, ...).]
I'm very sorry for the updates. This url does *not* display any slowness whatsoever:
http://news.bbc.co.uk/sport2/low/football/teams/n/newcastle_united/8113312.stm
That kind of puts the lid on the 'wrapped line bunches of urls' theory.
In fact none of the 'story' pages seem to have the problem. All the 'category' pages I've tried have the problem.
These two are cut-down versions of the above to try to isolate the problem. They're the smallest pages I can get to exhibit the behaviour:
http://toolz4schoolz.com/bbcmiddle7.htm
http://toolz4schoolz.com/bbcmiddle8.htm
Tried again on two machines, these specific steps:
 - create a new clean profile
 - open http://news.bbc.co.uk/low (or one of Jon's testcases above)
 - click and drag the right side of the window to widen/shrink it 5-10 times.

On one machine, it sometimes hangs for up to 1 minute (unless I alt-tab). On the other machine, it's just slow but I don't see any long hang. Both machines have the same OS (win7 RC 64-bit) and somewhat similar CPUs. I wonder whether something related to the graphics card driver sends messages differently during the resizing; that's kind of strange.

Also tried in a XP vm -> no hang, Vista vm -> hangs.
Trying to reproduce on http://news.bbc.co.uk/low I am able to consistently get a crash with the June 22 nightly. It doesn't seem to crash on other pages (a bugzilla bug, a bugzilla bug list, about:buildconfig, and a long wikipedia page).

Steps to reproduce: open http://news.bbc.co.uk/low, grab the right edge of the window and shrink it to the minimum width, then quickly move the mouse left and right so that the window grows a little bit and then goes back down to the minimum size.

Here is the windbg stack for the crash.
Reproduced the crash in a debug build. I hit the assertion "overflow list w/o frames" at http://mxr.mozilla.org/mozilla-central/source/layout/generic/nsInlineFrame.cpp#346 before crashing. Attaching a Visual Studio stack for the assertion with line numbers.
It's probably worth fixing some of the known bad frametree deps of bug 67752 before worrying about that more...
Flags: blocking1.9.2?
Flags: blocking1.9.2? → blocking1.9.2+
Priority: -- → P2
For me the hangs/freezes are most visible with a debug build running on Win7. With an opt build, I still see some lags but that's less visible. I couldn't reproduce the lags on XP though.

I did some debugging and I saw that when resizing the window, native input events are never dispatched (nsAppShell::ProcessNextNativeEvent is never called while the event loop is looping many times). That's similar to bug 491700 symptoms: HasPendingInputEvent() is always returning true (because GetQueueStatus() has input events) and so we keep running reflow events.

I saw that nsAppShell::ProcessNextNativeEvent is never called because mBlockNativeEvent is true when nsBaseAppShell::OnProcessNextEvent() is called. That's apparently because there are some nested event loops (see stack). As an experiment, I tried commenting out the "if (mBlockNativeEvent) {" condition in nsBaseAppShell::OnProcessNextEvent so that nsAppShell::ProcessNextNativeEvent gets called, but strangely the hang is still there. I didn't investigate further.

I tried adding a 10ms delay to the reflow timer and the hangs are gone. Resizing is smooth with both the debug and opt builds. I guess that could be an easy fix for this issue if there are no other side effects from delaying the timer a bit (and if that fixes the issue for everybody).
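
To illustrate the starvation described above, here is a minimal toy model in Python. It is not Gecko code: `simulate_resize`, its parameters, and all the numbers are made up for illustration. It assumes that a ready reflow event always runs before native input is dispatched, that a reflow gives up as soon as input is pending, and that an interrupted reflow keeps the work it has already done.

```python
from collections import deque


def simulate_resize(reflow_delay_ms, work_ms=20, input_every_ms=5,
                    drag_ms=200, horizon_ms=1000):
    """Toy event loop (hypothetical model, not Gecko).

    Returns (finish_time_ms or None, number_of_interrupted_reflows).
    """
    # Timestamps at which native input events arrive during the drag.
    arrivals = deque(range(0, drag_ms, input_every_ms))
    pending = 0          # native input events queued but not yet dispatched
    remaining = work_ms  # reflow work still to do
    next_reflow_at = 0   # earliest time the reflow timer may fire
    interrupts = 0
    now = 0

    def pump():
        # Move every input event that has arrived by `now` into the queue.
        nonlocal pending
        while arrivals and arrivals[0] <= now:
            arrivals.popleft()
            pending += 1

    while now < horizon_ms:
        pump()
        if next_reflow_at <= now:
            # Assumption: a ready reflow event runs before native events.
            # It works in 1ms slices and stops as soon as input is pending.
            while remaining > 0 and pending == 0:
                now += 1
                remaining -= 1
                pump()
            if remaining == 0:
                return now, interrupts
            interrupts += 1
            now += 1  # cost of an interrupted attempt
            next_reflow_at = now + reflow_delay_ms
        elif pending:
            pending -= 1  # dispatch one native input event
            now += 1
        else:
            now += 1      # idle tick
    return None, interrupts


if __name__ == "__main__":
    for delay in (0, 10, 30):
        print(delay, simulate_resize(delay))
```

In this model a 0ms timer never lets the input queue drain, so the reflow keeps being interrupted and never completes, while a delay long enough for the loop to dispatch the pending input lets it finish. The model says nothing about which delay value is right for real builds.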
Jon (and others), do you still see this issue on trunk?

I can't reproduce with a release build, but I do with a debug build. This patch fixes the issue for me with the debug build. It could also improve the situation in nightly builds for people that still see this issue.
I can still reproduce this in a mozilla-central debug build on XP.
http://news.bbc.co.uk/sport/low is sluggish with high CPU when resizing
to narrow window widths.  The problem is even worse on
http://en.wikipedia.org/wiki/United_states which more or less deadlocks
at 100% CPU but can be "unlocked" by pressing the Windows Start Menu key.

I tested the patch that raises the timer to 10ms and it seems to fix
both those problems, and there's no regression for the file dialog at
http://dn.se (bug 496788) for me.
Comment on attachment 393957 [details] [diff] [review]
raises reflow timer from 0ms to 10ms

Thanks for the testing Mats.

Boris, do you think we should try that?
Attachment #393957 - Flags: review?(bzbarsky)
Not sure.  This can lead to serious pageload performance issues, I think, especially in cases where the user is interacting with the page.  I'd rather we tried not interrupting unless reflow has been running for a bit and seeing whether that helps (e.g. allows the reflow to complete in this case).
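
The alternative suggested here, not interrupting until the reflow has been running for a while, could be sketched as follows. `ReflowInterruptPolicy`, `min_run_ms`, and the injected clock are hypothetical names for illustration, not Gecko APIs, and the comment does not say how long "a bit" should be.

```python
import time


class ReflowInterruptPolicy:
    """Hypothetical sketch: ignore pending input until the reflow has
    run for at least `min_run_ms` milliseconds."""

    def __init__(self, min_run_ms, clock=time.monotonic):
        self._min_run_ms = min_run_ms
        self._clock = clock      # injectable so the policy can be tested
        self._started = None

    def start(self):
        # Call when a reflow begins.
        self._started = self._clock()

    def should_interrupt(self, has_pending_input):
        # Interrupt only if input is waiting AND the grace period is over.
        if self._started is None:
            raise RuntimeError("start() must be called first")
        elapsed_ms = (self._clock() - self._started) * 1000.0
        return has_pending_input and elapsed_ms >= self._min_run_ms
```

A reflow loop using this policy would call `start()` once and then check `should_interrupt(...)` between units of work, so a short reflow always completes even while the user is resizing.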
OK, so I do think raising that timer value is the right thing to do; just fixing bug 519590 is not enough to fix, e.g. bug 507260.

10ms is not enough, in my testing.  I'm going to give 30ms a shot and see how it goes.
Blocks: 519590, 507260
No longer depends on: 519590
Comment on attachment 393957 [details] [diff] [review]
raises reflow timer from 0ms to 10ms

r+, but please don't push this; I'll land it together with bug 519590 to minimize risk of talos issues.
Attachment #393957 - Flags: review?(bzbarsky) → review+
Assignee: nobody → sylvain.pasche
Attachment #393957 - Attachment is obsolete: true
Attachment #404089 - Flags: review+
Comment on attachment 404089 [details] [diff] [review]
With the 30ms timer.

roc, can you give this a once-over?
Pushed http://hg.mozilla.org/mozilla-central/rev/871d8ef60b3b

Sylvain, Timothy, Jon, Mats, thank you all for helping out with this stuff!
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
I think this should be a beta blocker.  It's a change that we really need beta testing on (and conversely, beta testing without this change is much less useful).
Priority: P2 → P1
Blocks: 496788