Open Bug 519591 Opened 15 years ago Updated 2 years ago

Investigate why allowing reflow to proceed for > 22ms causes ireflow issues

Categories

(Core :: Layout, defect)

x86
macOS
defect

Tracking

()

People

(Reporter: bzbarsky, Unassigned)

References

Details

Attachments

(3 files, 1 obsolete file)

STEPS TO REPRODUCE:
1)  Apply the patch in bug 519590
2)  Change this line:

+  mozilla::TimeDuration::FromMilliseconds(21);

to have 100 instead of 21.

3)  Load http://www.whatwg.org/specs/web-apps/current-work/ and wait for the browser to start responding again after the throbber stops.

4)  Scroll to the bottom of the page

5)  Drag the resizer to change the window width and see whether the window responds.

EXPECTED RESULTS:  Window responds just like it does when the 100 is replaced with 0.

ACTUAL RESULTS: Window responds much more choppily.
Boris, I hope you don't mind my attaching your original email. I believe it provides a helpful background.

On Sep 29, 2009, at 3:19 PM, Boris Zbarsky wrote:

So a summary of the problem:

Right now we interrupt reflow as soon as a certain number of lines has been reflowed.  It would be better to interrupt once we've spent a certain amount of time in the reflow.  This would help Windows CPU usage issues while resizing, as well as various flicker issues (we think).

When I tried doing this, using 100ms as the threshold, I discovered that resizing the html5 single-page spec no longer updated the browser ui frame smoothly on Mac.  Experimentation showed that around 21-23ms there was a state transition from good (21ms) to not (23ms) with 22ms somewhere in between.

I tried asking Josh whether he has any bright ideas, and he didn't offhand.  Here's what he had to say:

<josh> bz: I did a quick look for live resizing optimizations on Apple's end, you might want to look at setPreservesContentDuringLiveResize, a method on NSWindow
<josh> bz: I also looked for places where we might modify normal live resizing behavior
<josh> bz: http://mxr.mozilla.org/mozilla-central/source/widget/src/cocoa/nsCocoaWindow.mm#1598
<josh> bz: just something you might want to look at

that link is as of 8/11/09 12:08 AM EDT or so (the file has changed since then).

What would be useful here, I think is:

1)  Someone with mac widget clue (joelr?  mstange?  Someone else?) to look into why this behavior happens on mac.  I've attached my patch to this mail (using a 21ms timeout, but it's pretty easy to tweak).
2)  Someone with Windows widget clue (robarnold?) to check whether this approach really solves the Windows issues.  It seems to in my VM, but the issues are actually pretty different in a VM and not, from what I saw in past bugs.   In particular this means investigating the effect of various values of the timeout.
3)  Someone checking whether this approach fixes the flicker issues (I can do this; I haven't been putting much time into it because the whole approach wasn't even working on Mac.  I'll report on this either tonight or tomorrow morning).
4)  Someone to sanity-test this on Linux; I can probably do this too if we have no one else around.
Attached file Instruments time profile (obsolete) —
My working hypothesis is that we are not doing enough event processing to avoid native event starvation. From bug 516732, comment #10 (smichaud):

Native events (Cocoa events) can be processed from nsAppShell::Run() and from nsAppShell::ProcessNextNativeEvent().  Calls to nsAppShell::ProcessNextNativeEvent() (from various places in nsBaseAppShell::OnProcessNextEvent()) are needed to prevent native event starvation while processing Gecko events.
Summary: Investigate while allowing reflow to proceed for > 22ms causes ireflow issues → Investigate why allowing reflow to proceed for > 22ms causes ireflow issues
That seems entirely possible...  Do we end up handling the posted reflow event before we have finished processing the pending events?  Does changing the timer in PresShell::PostReflowEventOffTimer to, say, 10ms instead of "as soon as possible, asynchronously" help?
Oh, and I assume you can reproduce the issue I'm seeing?
I _can_ reproduce the issue you are seeing. 

Where in the trace do you see us posting the reflow event?

I'll look into the rest in an hour or so.
I can't get Instruments to do anything useful with the attached trace; I'm just trying to guess at ways we might be able to alleviate the event starvation issue.
You don't see the stack trace in Instruments? 

I don't know yet how the reflow event is posted and what it looks like. Please clue me in!
> You don't see the stack trace in Instruments? 

Nope.  I do File > Open on that .trace file, and nothing shows up anywhere.

A reflow event is posted any time we need to reflow, including after reflow completion if the reflow was interrupted.  In this last case, it's posted off a timer, not directly.  See PresShell::PostReflowEvent.
I think it's a simple matter of not processing Cocoa events often enough. I think you answered your own question in comment #1 where you talk about the transition from smooth resizing at 21ms to jumpy resizing after 23ms.

This chunk of nsPresContext::CheckForInterrupt in nsPresContext.cpp is the key:

  // Don't interrupt if it's been less than 
  mHasPendingInterrupt = 
    mozilla::TimeStamp::Now() - mReflowStartTime > sInterruptTimeout && 
    HavePendingInputEvent() &&
    !IsChrome();

You can be assured that there are pending input events as you are dragging. This causes us to break frequently to process drawing and other native events. Things start to get jumpy as you delay that processing longer and longer. Simply removing the timeout condition (or reducing the timeout) puts smooth resizing back. 

Going back to your original email, it seems that we don't interrupt reflow after a number of lines but rather after a number of lines _or_ an input event. It seems then that we can't delay the interrupt for more than 20ms since that interferes with event processing.
To answer my own question, going to 10ms for the timer doesn't seem to help.  Going to 50 helps some, though (with the reflow timeout set at 100ms).  Going to 100 makes things pretty smooth.  I guess it's possible that if something triggers a reflow while the timer is pending and the reflow takes longer than the timer delay we'll fire the timer immediately after the reflow and preempt other events.  Let me test that...
That would still leave the 22ms mystery, but perhaps that's just a matter of being "enough longer" than the typical actual delay of a "0ms" xpcom timer?
When do we reflow on the 0ms xpcom timer?

What is the relationship between that timer and the reflow interrupt as you implemented? Why does firing the timer every 50ms and not interrupting until 100ms have passed actually work during resizing?

I think that firing the timer every 50ms gives enough time to process events, redraw the screen, i.e. process everything accumulated during the 100ms chunk of reflow. With a 22ms reflow chunk and a 0ms timer, the timer must be firing too frequently to process pending events resulting in a jumpy resizing.
> When do we reflow on the 0ms xpcom timer?

Any time we interrupt reflow, the continuation is done off that timer (or more precisely, when the timer fires, we post an nsIRunnable to do the actual reflow).

> Why does firing the timer every 50ms and not interrupting until
> 100ms have passed actually work during resizing?

Presumably because it gives a bit more time between reflow completion and new reflow start during which we can process native events.  Still experimenting with that.

Something like comment 13 paragraph 3 sounds about right to me, I think.
OK, so we are in fact taking pretty variable amounts of time to fire the timer; between 500us and 166ms (!).  I just tried again, and on my hardware a 30ms time and a 100ms timeout for the interrupt seems to give decent responsiveness.
For future reference, this means that reflow (and other) processing takes between 500us and 166ms. 30ms are then enough to process system events accumulated during the 100ms reflow period and guarantee smooth resizing.

I'm working on DTrace static probes to see what we are doing during 500us-166ms.
Don't forget these in your mozconfig:

ac_add_options --enable-dtrace
ac_add_options --disable-strip
ac_add_options --disable-install-strip
Attachment #404013 - Attachment is obsolete: true
dtrace -x specsize=128m -x bufresize=auto -s reflow-timer.d -p 15030 > /tmp/reflow.log 2>&1

15030 is my firefox-bin pid.
OK.  So to sum up:

As best as we can tell this is just an event starvation scenario.  It's still not clear why the magic number is 22ms, exactly, but it may just a matter of system event processing voodoo.

The "do the next reflow" timer takes a lot longer than 0ms to actually fire in the usual case.  Increasing that timer value from 0ms to 30ms seems to make things smooth over here.  Joel's log shows that delays in that timer firing seem to be largely attributable to painting (that's whats going on in that log between the timer being primed and fired).  So it's possible that we were effectively processing reflow for a bit, then paint, and then immediately reflow without ever giving native events a chance...

The bug assignee didn't login in Bugzilla in the last 7 months, so the assignee is being reset.

Assignee: joelr → nobody
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: