Open Bug 745315 Opened 8 years ago Updated 7 years ago

up to 16ms delay before paint starts after setViewport

Categories

(Firefox for Android :: General, defect)

defect
Not set

Tracking

()

REOPENED
Firefox 14
Tracking Status
firefox14 - ---
blocking-fennec1.0 --- -

People

(Reporter: gal, Assigned: jrmuizel)

References

(Blocks 1 open bug)

Details

Attachments

(2 files, 1 obsolete file)

Native Fennec sends setViewport events from the Java frontend to Gecko every time the user pans the screen around. In Gecko these are batched and the paint is executed at the next refresh driver tick. This is bad, because the tick can be up to 16ms away.

Even on slow devices this delay is often larger than the paint time itself (trace from cnn.com below, panning a small amount):

I/RenderDump( 2797): paint time 9
I/RenderDump( 2797): 28442334 syncViewportInfo 1868 13229 720 1035 3.897975 1988.2378 13540.98 3.8979745
I/RenderDump( 2797): 28442337 adjustViewport v=RectF(1988.2378, 13541.187, 2468.2385, 14231.187) p=(3991.9995,14266.998) z=3.8979745 DisplayPortMetrics(1868.2375,13231.998,2588.2388,14266.998,3.8979745)

The 9 here is ms, obviously.

The right thing to do here is to request a lot viewer viewport changes, and then when we execute one it should draw immediately and pre-schedule the next refresh driver tick.
a lot => a lot less
JP, can you get someone to take a look and double check my math above, and if confirmed, we might want to get someone on this.
I agree with the problem described in this bug; I was seeing similar latencies caused by the refresh driver in bug 740883. Usually it wouldn't be the full 16ms but some smaller number like 4-5ms. Something I tried that helped was to just queue a DRAW event in nsAppShell right after we queued a Viewport:Change event; the DRAW coalescing behaviour would take care of any extra draws. Although this did provide some improvement I didn't think it was really enough to justify the hackiness of the solution. You can see the WIP I tried at https://bug740883.bugzilla.mozilla.org/attachment.cgi?id=611305
(In reply to Andreas Gal :gal from comment #0)
> The right thing to do here is to request a lot viewer viewport changes, and
> then when we execute one it should draw immediately and pre-schedule the
> next refresh driver tick.

Can you expand on this? I don't quite follow your suggested solution.
The refresh driver is driven by a timer right now. We should allow that timer to fire early when we need an immediate paint (trigger the refresh driver manually, set a flag so that the next timer tick doesn't actually paint).

We likely also have to reduce the frequency of setViewport events as well. Right now we hammer the Gecko thread with setViewport events. When you move a page around with your finger, the Gecko thread is constantly painting (up to 60 times a second, if the page paints quickly enough), even if you just move a tad. This adds additional latency. When you need a paint to happen, Gecko is usually busy servicing the previous paint event. That has to finish first, then we wait for a refresh driver tick, and then we start the paint we desperately need right away to avoid checker-boarding.
(In reply to Andreas Gal :gal from comment #5)
> The refresh driver is driven by a timer right now. We should allow that
> timer to fire early when we need an immediate paint 

Agreed

> (trigger the refresh
> driver manually, set a flag so that the next timer tick doesn't actually
> paint).
> 

That's one option. Another would be to modify the refresh driver scheduling to the same as BenWa is doing in the compositor, which seems better to me. This algorithm is: if we need to paint and have not painted in the last 16ms, paint immediately; otherwise schedule a paint such that it will happen 16ms after the last paint. This algorithm keeps the 16ms throttling but also removes the latency on first draw.

> We likely also have to reduce the frequency of setViewport events as well.
> Right now we hammer the Gecko thread with setViewport events. When you move
> a page around with your finger, the Gecko thread is constantly painting (up
> to 60 times a second, if the page paints quickly enough), even if you just
> move a tad. This adds additional latency. When you need a paint to happen,
> Gecko is usually busy servicing the previous paint event. That has to finish
> first, then we wait for a refresh driver tick, and then we start the paint
> we desperately need right away to avoid checker-boarding.

I don't necessarily agree this will help. We were doing this with the fixed-margin strategy earlier, and the problem I was seeing is that if you send fewer viewport changes, the area that gets drawn on each change is larger. Rather than drawing 1 row of pixels 50 times (for example) we will draw a block of 50 pixels once. This larger draw takes much more time (I think maybe even super-linear) and by the time it's done the user has already scrolled somewhere else. With the 50 small draws, however, the gecko thread is more responsive and the drawn content actually gets displayed to the user.

However, I concede that this behaviour is dependent on a lot of factors including the content being drawn. You can twiddle with the danger zone prefs in the fixed margin strategy to play with how often we send viewport changes. I'm adding similar prefs to the velocity bias in bug 744390.
Here's a version of kats' patch that works better. First it doesn't add the DRAW to the event queue, it just does it right away. This helps us avoid leaving extra DRAWs around when we coalesce VIEWPORTs. Second it uses the bounds of the window instead of a 0x0 rect. I originally tried it with a 0x0 rect and found that it did not work. Using the full bounds fixed it.

I've yet to measure the impact of this patch.
After thinking about this more, I'm pretty sure we want to do it. I've yet to try to measure any real difference but I think it's at least justifiable theoretically.
Attachment #615965 - Attachment is obsolete: true
Attachment #616189 - Flags: review?(bugmail.mozilla)
Comment on attachment 616189 [details] [diff] [review]
Draw directly from viewport changes

Review of attachment 616189 [details] [diff] [review]:
-----------------------------------------------------------------

Patch looks fine to me, but I really would like to see a measured benefit from this before we land it. It's entirely possible that because of this we'll be doing extra draw calls that just burn CPU cycles. In particular the DRAW event that the refresh driver will queue in response to a viewport change will be turned into a no-op.
(In reply to Kartikaya Gupta (:kats) from comment #9)
> Comment on attachment 616189 [details] [diff] [review]
> Draw directly from viewport changes
> 
> Review of attachment 616189 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Patch looks fine to me, but I really would like to see a measured benefit
> from this before we land it. It's entirely possible that because of this
> we'll be doing extra draw calls that just burn CPU cycles. In particular the
> DRAW event that the refresh driver will queue in response to a viewport
> change will be turned into a no-op.

The refresh driver will only queue a DRAW if there's an invalid region to draw.
Attachment #616189 - Flags: review?(bugmail.mozilla) → review+
I ran this on the red page with robocop and got the following results:

Before:
__start_report0.97285944__end_report
__startTimestamp1334864757196__endTimestamp

After:
__start_report0.97513074__end_report
__startTimestamp1334869192814__endTimestamp
Attachment #616189 - Flags: approval-mozilla-central?
Attachment #616189 - Flags: approval-mozilla-central? → approval-mozilla-central+
https://hg.mozilla.org/integration/mozilla-inbound/rev/c048eb7cc728
Assignee: nobody → jmuizelaar
Target Milestone: --- → Firefox 14
https://hg.mozilla.org/mozilla-central/rev/c048eb7cc728
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Backed out: https://hg.mozilla.org/mozilla-central/rev/9c88cc584f04

(Please include the bug number in your backout commit message in the future as well.  Thanks!)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Do we know why this regressed?
(In reply to Andreas Gal :gal from comment #17)
> Do we know why this regressed?

I'm sort of looking into it because applying this patch on top of your heuristic stuff is also giving odd results. What I see is that applying this patch seems to result in two draws. Specifically, the NS_PAINT event handler in nsViewManager kicks off an invalidate which results in another draw. Here is the stack from where this happens:

#0  nsWindow::Invalidate (this=0x5e1c2880, aRect=...) at /Users/kats/zspace/mozilla-git/widget/android/nsWindow.cpp:545
#1  0x61687316 in nsViewManager::InvalidateWidgetArea (this=<optimized out>, aWidgetView=0x5ff86f60, aDamagedRegion=<optimized out>)
    at /Users/kats/zspace/mozilla-git/view/src/nsViewManager.cpp:557
#2  0x6168744c in nsViewManager::FlushDirtyRegionToWidget (this=<optimized out>, aView=<optimized out>) at /Users/kats/zspace/mozilla-git/view/src/nsViewManager.cpp:437
#3  0x616874da in nsViewManager::ProcessPendingUpdatesForView (this=0x601b99d0, aView=0x5ff86f60, aFlushDirtyRegion=true)
    at /Users/kats/zspace/mozilla-git/view/src/nsViewManager.cpp:419
#4  0x61687540 in nsViewManager::ProcessPendingUpdates (this=0x601b99d0) at /Users/kats/zspace/mozilla-git/view/src/nsViewManager.cpp:1357
#5  0x616865b2 in nsViewManager::DispatchEvent (this=0x601b99d0, aEvent=0x5b7600f8, aView=0x5ff86f60, aStatus=0x5b7601bc)
    at /Users/kats/zspace/mozilla-git/view/src/nsViewManager.cpp:786
#6  0x6168670e in nsViewManager::DispatchEvent (this=0x601b99d0, aEvent=0x5b760218, aView=0x5ff86f60, aStatus=0x5b7601bc)
    at /Users/kats/zspace/mozilla-git/view/src/nsViewManager.cpp:808
#7  0x61684068 in HandleEvent (aEvent=0x5b760218) at /Users/kats/zspace/mozilla-git/view/src/nsView.cpp:158
#8  0x61b7ee08 in nsWindow::DispatchEvent (this=0x5e1c2880, aEvent=0x5b760218) at /Users/kats/zspace/mozilla-git/widget/android/nsWindow.cpp:673
#9  0x61b800ca in nsWindow::DrawTo (this=0x5e1c2880, targetSurface=0x5ff56c40, invalidRect=<optimized out>)
    at /Users/kats/zspace/mozilla-git/widget/android/nsWindow.cpp:1064
#10 0x61b8032a in nsWindow::DrawTo (this=0x5e1c2280, targetSurface=0x5ff56c40, invalidRect=<optimized out>)
    at /Users/kats/zspace/mozilla-git/widget/android/nsWindow.cpp:1112
#11 0x61b80570 in nsWindow::OnDraw (this=0x5e1c2280, ae=0x60432870) at /Users/kats/zspace/mozilla-git/widget/android/nsWindow.cpp:1179
Even if I hack around the will-paint event shown in the last comment, I still get the refresh driver queueing invalidates, which shouldn't be happening AIUI. (this is on the red page, so there's no animations or anything).

#0  nsWindow::Invalidate (this=0x5e1c2880, aRect=...) at /Users/kats/zspace/mozilla-git/widget/android/nsWindow.cpp:545
#1  0x603da316 in nsViewManager::InvalidateWidgetArea (this=<optimized out>, aWidgetView=0x5f886f60, aDamagedRegion=<optimized out>)
    at /Users/kats/zspace/mozilla-git/view/src/nsViewManager.cpp:557
#2  0x603da44c in nsViewManager::FlushDirtyRegionToWidget (this=<optimized out>, aView=<optimized out>) at /Users/kats/zspace/mozilla-git/view/src/nsViewManager.cpp:437
#3  0x603da4da in nsViewManager::ProcessPendingUpdatesForView (this=0x5fab99d0, aView=0x5f886f60, aFlushDirtyRegion=true)
    at /Users/kats/zspace/mozilla-git/view/src/nsViewManager.cpp:419
#4  0x603da540 in nsViewManager::ProcessPendingUpdates (this=0x5fab99d0) at /Users/kats/zspace/mozilla-git/view/src/nsViewManager.cpp:1357
#5  0x6008ac64 in nsRefreshDriver::Notify (this=0x5f7d16a0, aTimer=<optimized out>) at /Users/kats/zspace/mozilla-git/layout/base/nsRefreshDriver.cpp:447
#6  0x60a7abdc in nsTimerImpl::Fire (this=0x5d2ec4c0) at /Users/kats/zspace/mozilla-git/xpcom/threads/nsTimerImpl.cpp:511
That definitely explains the regression. Very encouraging. If we fix the cause we should be able to land this after all.
Yeah, but I'll need help from somebody who understands this code.
Boris might have some idea; failing that, perhaps Matt does, as he's looked at invalidates a lot.
Why are there pending invalidates in the refresh driver in this case?
(In reply to Boris Zbarsky (:bz) from comment #23)
> Why are there pending invalidates in the refresh driver in this case?

I was going to look into that.
And a perhaps related question: if we're invalidating the whole viewport anyway (are we?) should we just clear other pending invalidates?
(In reply to Boris Zbarsky (:bz) from comment #25)
> And a perhaps related question: if we're invalidating the whole viewport
> anyway (are we?) should we just clear other pending invalidates?

Doesn't this already happen during paint?
I don't know.  It might not if the invalidates are queued up in our view system and haven't been flushed to the widget layer yet....
I took a look at this. WILLPAINT clears our pending invalidates, but this causes us to queue a draw(). We need to clear before calling paint.
We could fix this by disabling nsWindow::Invalidate() during our Draw() but that's pretty ugly. Better suggestions welcome.
This change did not fix the regression.
While this doesn't fix the regression, it does fix the double draw problem. The regression may come from scheduling a draw to early, which better displayport prediction should avoid. We can retry this when we have the better prediction.
kats, want to try this patch with your patch?
blocking-fennec1.0: --- → ?
blocking-fennec1.0: ? → -
minusing because its not clear that this is a win or even a problem.
You don't think its a problem that we delay urgently needed paints to avoid checkerboarding by up to 16ms? You don't think it would be a win not to do that?
Comment on attachment 617580 [details] [diff] [review]
Version with invalidation disabling

Review of attachment 617580 [details] [diff] [review]:
-----------------------------------------------------------------

The way this is supposed to work is that you send an NS_WILLPAINT event (and have your NS_PAINT event set didSendWillPaint). The NS_WILLPAINT handler may trigger invalidation. That invalidation should be covered by your NS_PAINT event. In the Android case, since you always paint everything, you should just ignore invalidations that happen during the NS_WILLPAINT. It is not correct to ignore invalidations that happen during NS_PAINT, although there won't normally be any.
Though it's a bit more complicated because Gonk uses mDirtyRegion. So nsWindow::DoDraw needs to send NS_WILLPAINT before copying mDirtyRegion into event.region and clearing mDirtyRegion.
As for this 16ms delay, I think the right thing to do is for nsRefreshDriver::ScheduleViewManagerFlush to schedule a refresh driver run with *zero* delay (but still async) if there isn't a timer running and the delay since the last refresh driver tick is more than one refresh driver period.

It might make sense to do that for some other callers of EnsureTimerStarted too, the non-animation ones --- AddStyleFlushObserver, AddLayoutFlushObserver.

Does that sound right, Boris?
Another option is to add API to request "urgent" early refresh driver ticks in response to "critical" user input where we must minimize latency. The question is of course, what would be treated as "critical". If everything's critical then we won't be throttling paints anymore.
roc, can we simply skip the next tick when we schedule an early draw?
Sure, if a tick happens early then we won't need to schedule another one (unless an animation observer requests it).

That still means we'd end up painting furiously if too many "critical" requests happen.
Shouldn't be the case. Even pretty quick paints take a few ms. We shouldn't fire a new one at least until the last one is done. It would be nice if we make paints interruptible.
On desktop, we definitely had problems in the past with painting at too high a rate and wasting cycles since you can't see more than 60fps anyway. Hence bug 598482.

Really we should be thinking about this in terms of OMTC.
(In reply to Andreas Gal :gal from comment #35)
> You don't think its a problem that we delay urgently needed paints to avoid
> checkerboarding by up to 16ms? You don't think it would be a win not to do
> that?

I've been continuing to look at this, and I have yet been able to see clear wins. Tryserver has been flaky which has made measuring this a pain, but I hope to get better results soon.
On the red div page this delay takes longer than the actual paint time. Happy to help measure it.
My current patch https://hg.mozilla.org/try/rev/27f790c8d5e5 still causes tcheckerboard to regress from .77 to .73-.71 and I don't know why.
> Does that sound right, Boris?

We don't want to do an "immediate" tick for style/layout observers, I don't think.  Though maybe it would be ok.
You need to log in before you can comment on or make changes to this bug.