745315 - up to 16ms delay before paint starts after setViewport

Reporter

Description

•

13 years ago

Native Fennec sends setViewport events from the Java frontend to Gecko every time the user pans the screen around. In Gecko these are batched and the paint is executed at the next refresh driver tick. This is bad, because the tick can be up to 16ms away. Even on slow devices this delay is often larger than the paint time itself (trace from cnn.com below, panning a small amount): I/RenderDump( 2797): paint time 9 I/RenderDump( 2797): 28442334 syncViewportInfo 1868 13229 720 1035 3.897975 1988.2378 13540.98 3.8979745 I/RenderDump( 2797): 28442337 adjustViewport v=RectF(1988.2378, 13541.187, 2468.2385, 14231.187) p=(3991.9995,14266.998) z=3.8979745 DisplayPortMetrics(1868.2375,13231.998,2588.2388,14266.998,3.8979745) The 9 here is ms, obviously. The right thing to do here is to request a lot viewer viewport changes, and then when we execute one it should draw immediately and pre-schedule the next refresh driver tick.

Andreas Gal :gal

Reporter

Comment 1

•

13 years ago

a lot => a lot less

Andreas Gal :gal

Reporter

Comment 2

•

13 years ago

JP, can you get someone to take a look and double check my math above, and if confirmed, we might want to get someone on this.

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Comment 3

•

13 years ago

I agree with the problem described in this bug; I was seeing similar latencies caused by the refresh driver in bug 740883. Usually it wouldn't be the full 16ms but some smaller number like 4-5ms. Something I tried that helped was to just queue a DRAW event in nsAppShell right after we queued a Viewport:Change event; the DRAW coalescing behaviour would take care of any extra draws. Although this did provide some improvement I didn't think it was really enough to justify the hackiness of the solution. You can see the WIP I tried at https://bug740883.bugzilla.mozilla.org/attachment.cgi?id=611305

Blocks: checkerboarding

Jeff Muizelaar [:jrmuizel]

Assignee

Comment 4

•

13 years ago

(In reply to Andreas Gal :gal from comment #0) > The right thing to do here is to request a lot viewer viewport changes, and > then when we execute one it should draw immediately and pre-schedule the > next refresh driver tick. Can you expand on this? I don't quite follow your suggested solution.

Andreas Gal :gal

Reporter

Comment 5

•

13 years ago

The refresh driver is driven by a timer right now. We should allow that timer to fire early when we need an immediate paint (trigger the refresh driver manually, set a flag so that the next timer tick doesn't actually paint). We likely also have to reduce the frequency of setViewport events as well. Right now we hammer the Gecko thread with setViewport events. When you move a page around with your finger, the Gecko thread is constantly painting (up to 60 times a second, if the page paints quickly enough), even if you just move a tad. This adds additional latency. When you need a paint to happen, Gecko is usually busy servicing the previous paint event. That has to finish first, then we wait for a refresh driver tick, and then we start the paint we desperately need right away to avoid checker-boarding.

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Comment 6

•

13 years ago

(In reply to Andreas Gal :gal from comment #5) > The refresh driver is driven by a timer right now. We should allow that > timer to fire early when we need an immediate paint Agreed > (trigger the refresh > driver manually, set a flag so that the next timer tick doesn't actually > paint). > That's one option. Another would be to modify the refresh driver scheduling to the same as BenWa is doing in the compositor, which seems better to me. This algorithm is: if we need to paint and have not painted in the last 16ms, paint immediately; otherwise schedule a paint such that it will happen 16ms after the last paint. This algorithm keeps the 16ms throttling but also removes the latency on first draw. > We likely also have to reduce the frequency of setViewport events as well. > Right now we hammer the Gecko thread with setViewport events. When you move > a page around with your finger, the Gecko thread is constantly painting (up > to 60 times a second, if the page paints quickly enough), even if you just > move a tad. This adds additional latency. When you need a paint to happen, > Gecko is usually busy servicing the previous paint event. That has to finish > first, then we wait for a refresh driver tick, and then we start the paint > we desperately need right away to avoid checker-boarding. I don't necessarily agree this will help. We were doing this with the fixed-margin strategy earlier, and the problem I was seeing is that if you send fewer viewport changes, the area that gets drawn on each change is larger. Rather than drawing 1 row of pixels 50 times (for example) we will draw a block of 50 pixels once. This larger draw takes much more time (I think maybe even super-linear) and by the time it's done the user has already scrolled somewhere else. With the 50 small draws, however, the gecko thread is more responsive and the drawn content actually gets displayed to the user. However, I concede that this behaviour is dependent on a lot of factors including the content being drawn. You can twiddle with the danger zone prefs in the fixed margin strategy to play with how often we send viewport changes. I'm adding similar prefs to the velocity bias in bug 744390.

Jeff Muizelaar [:jrmuizel]

Assignee

Comment 7

•

13 years ago

Attached patch Draw directly from viewport changes (obsolete) — Details — Splinter Review

Here's a version of kats' patch that works better. First it doesn't add the DRAW to the event queue, it just does it right away. This helps us avoid leaving extra DRAWs around when we coalesce VIEWPORTs. Second it uses the bounds of the window instead of a 0x0 rect. I originally tried it with a 0x0 rect and found that it did not work. Using the full bounds fixed it. I've yet to measure the impact of this patch.

Jeff Muizelaar [:jrmuizel]

Assignee

Comment 8

•

13 years ago

Attached patch Draw directly from viewport changes — Details — Splinter Review

After thinking about this more, I'm pretty sure we want to do it. I've yet to try to measure any real difference but I think it's at least justifiable theoretically.

Attachment #615965 - Attachment is obsolete: true

Attachment #616189 - Flags: review?(bugmail.mozilla)

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Comment 9

•

13 years ago

Comment on attachment 616189 [details] [diff] [review] Draw directly from viewport changes Review of attachment 616189 [details] [diff] [review]: ----------------------------------------------------------------- Patch looks fine to me, but I really would like to see a measured benefit from this before we land it. It's entirely possible that because of this we'll be doing extra draw calls that just burn CPU cycles. In particular the DRAW event that the refresh driver will queue in response to a viewport change will be turned into a no-op.

Jeff Muizelaar [:jrmuizel]

Assignee

Comment 10

•

13 years ago

(In reply to Kartikaya Gupta (:kats) from comment #9) > Comment on attachment 616189 [details] [diff] [review] > Draw directly from viewport changes > > Review of attachment 616189 [details] [diff] [review]: > ----------------------------------------------------------------- > > Patch looks fine to me, but I really would like to see a measured benefit > from this before we land it. It's entirely possible that because of this > we'll be doing extra draw calls that just burn CPU cycles. In particular the > DRAW event that the refresh driver will queue in response to a viewport > change will be turned into a no-op. The refresh driver will only queue a DRAW if there's an invalid region to draw.

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Updated

•

13 years ago

Attachment #616189 - Flags: review?(bugmail.mozilla) → review+

Jeff Muizelaar [:jrmuizel]

Assignee

Comment 11

•

13 years ago

I ran this on the red page with robocop and got the following results: Before: __start_report0.97285944__end_report __startTimestamp1334864757196__endTimestamp After: __start_report0.97513074__end_report __startTimestamp1334869192814__endTimestamp

Jeff Muizelaar [:jrmuizel]

Assignee

Updated

•

13 years ago

Attachment #616189 - Flags: approval-mozilla-central?

Joe Drew (not getting mail)

Updated

•

13 years ago

tracking-firefox14: --- → +

Joe Drew (not getting mail)

Updated

•

13 years ago

Attachment #616189 - Flags: approval-mozilla-central? → approval-mozilla-central+

Jeff Muizelaar [:jrmuizel]

Assignee

Comment 12

•

13 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/c048eb7cc728

Assignee: nobody → jmuizelaar

Target Milestone: --- → Firefox 14

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Comment 13

•

13 years ago

This seems to have noticeably regressed both tcheckerboard and tcheckerboard2, unfortunately. :( http://graphs.mozilla.org/graph.html#tests=[[175,63,20]]&sel=none&displayrange=7&datatype=running http://graphs.mozilla.org/graph.html#tests=[[201,63,20]]&sel=none&displayrange=7&datatype=running

Jeff Muizelaar [:jrmuizel]

Assignee

Comment 14

•

13 years ago

I've backed it out: https://hg.mozilla.org/integration/mozilla-inbound/rev/9c88cc584f04

(no longer active)

Comment 15

•

13 years ago

https://hg.mozilla.org/mozilla-central/rev/c048eb7cc728

Status: NEW → RESOLVED

Closed: 13 years ago

Resolution: --- → FIXED

(no longer active)

Comment 16

•

13 years ago

Backed out: https://hg.mozilla.org/mozilla-central/rev/9c88cc584f04 (Please include the bug number in your backout commit message in the future as well. Thanks!)

Status: RESOLVED → REOPENED

Resolution: FIXED → ---

Andreas Gal :gal

Reporter

Comment 17

•

13 years ago

Do we know why this regressed?

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Comment 18

•

13 years ago

(In reply to Andreas Gal :gal from comment #17) > Do we know why this regressed? I'm sort of looking into it because applying this patch on top of your heuristic stuff is also giving odd results. What I see is that applying this patch seems to result in two draws. Specifically, the NS_PAINT event handler in nsViewManager kicks off an invalidate which results in another draw. Here is the stack from where this happens: #0 nsWindow::Invalidate (this=0x5e1c2880, aRect=...) at /Users/kats/zspace/mozilla-git/widget/android/nsWindow.cpp:545 #1 0x61687316 in nsViewManager::InvalidateWidgetArea (this=<optimized out>, aWidgetView=0x5ff86f60, aDamagedRegion=<optimized out>) at /Users/kats/zspace/mozilla-git/view/src/nsViewManager.cpp:557 #2 0x6168744c in nsViewManager::FlushDirtyRegionToWidget (this=<optimized out>, aView=<optimized out>) at /Users/kats/zspace/mozilla-git/view/src/nsViewManager.cpp:437 #3 0x616874da in nsViewManager::ProcessPendingUpdatesForView (this=0x601b99d0, aView=0x5ff86f60, aFlushDirtyRegion=true) at /Users/kats/zspace/mozilla-git/view/src/nsViewManager.cpp:419 #4 0x61687540 in nsViewManager::ProcessPendingUpdates (this=0x601b99d0) at /Users/kats/zspace/mozilla-git/view/src/nsViewManager.cpp:1357 #5 0x616865b2 in nsViewManager::DispatchEvent (this=0x601b99d0, aEvent=0x5b7600f8, aView=0x5ff86f60, aStatus=0x5b7601bc) at /Users/kats/zspace/mozilla-git/view/src/nsViewManager.cpp:786 #6 0x6168670e in nsViewManager::DispatchEvent (this=0x601b99d0, aEvent=0x5b760218, aView=0x5ff86f60, aStatus=0x5b7601bc) at /Users/kats/zspace/mozilla-git/view/src/nsViewManager.cpp:808 #7 0x61684068 in HandleEvent (aEvent=0x5b760218) at /Users/kats/zspace/mozilla-git/view/src/nsView.cpp:158 #8 0x61b7ee08 in nsWindow::DispatchEvent (this=0x5e1c2880, aEvent=0x5b760218) at /Users/kats/zspace/mozilla-git/widget/android/nsWindow.cpp:673 #9 0x61b800ca in nsWindow::DrawTo (this=0x5e1c2880, targetSurface=0x5ff56c40, invalidRect=<optimized out>) at /Users/kats/zspace/mozilla-git/widget/android/nsWindow.cpp:1064 #10 0x61b8032a in nsWindow::DrawTo (this=0x5e1c2280, targetSurface=0x5ff56c40, invalidRect=<optimized out>) at /Users/kats/zspace/mozilla-git/widget/android/nsWindow.cpp:1112 #11 0x61b80570 in nsWindow::OnDraw (this=0x5e1c2280, ae=0x60432870) at /Users/kats/zspace/mozilla-git/widget/android/nsWindow.cpp:1179

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Comment 19

•

13 years ago

Even if I hack around the will-paint event shown in the last comment, I still get the refresh driver queueing invalidates, which shouldn't be happening AIUI. (this is on the red page, so there's no animations or anything). #0 nsWindow::Invalidate (this=0x5e1c2880, aRect=...) at /Users/kats/zspace/mozilla-git/widget/android/nsWindow.cpp:545 #1 0x603da316 in nsViewManager::InvalidateWidgetArea (this=<optimized out>, aWidgetView=0x5f886f60, aDamagedRegion=<optimized out>) at /Users/kats/zspace/mozilla-git/view/src/nsViewManager.cpp:557 #2 0x603da44c in nsViewManager::FlushDirtyRegionToWidget (this=<optimized out>, aView=<optimized out>) at /Users/kats/zspace/mozilla-git/view/src/nsViewManager.cpp:437 #3 0x603da4da in nsViewManager::ProcessPendingUpdatesForView (this=0x5fab99d0, aView=0x5f886f60, aFlushDirtyRegion=true) at /Users/kats/zspace/mozilla-git/view/src/nsViewManager.cpp:419 #4 0x603da540 in nsViewManager::ProcessPendingUpdates (this=0x5fab99d0) at /Users/kats/zspace/mozilla-git/view/src/nsViewManager.cpp:1357 #5 0x6008ac64 in nsRefreshDriver::Notify (this=0x5f7d16a0, aTimer=<optimized out>) at /Users/kats/zspace/mozilla-git/layout/base/nsRefreshDriver.cpp:447 #6 0x60a7abdc in nsTimerImpl::Fire (this=0x5d2ec4c0) at /Users/kats/zspace/mozilla-git/xpcom/threads/nsTimerImpl.cpp:511

Andreas Gal :gal

Reporter

Comment 20

•

13 years ago

That definitely explains the regression. Very encouraging. If we fix the cause we should be able to land this after all.

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Comment 21

•

13 years ago

Yeah, but I'll need help from somebody who understands this code.

Joe Drew (not getting mail)

Comment 22

•

13 years ago

Boris might have some idea; failing that, perhaps Matt does, as he's looked at invalidates a lot.

Boris Zbarsky [:bzbarsky]

Comment 23

•

13 years ago

Why are there pending invalidates in the refresh driver in this case?

Jeff Muizelaar [:jrmuizel]

Assignee

Comment 24

•

13 years ago

(In reply to Boris Zbarsky (:bz) from comment #23) > Why are there pending invalidates in the refresh driver in this case? I was going to look into that.

Boris Zbarsky [:bzbarsky]

Comment 25

•

13 years ago

And a perhaps related question: if we're invalidating the whole viewport anyway (are we?) should we just clear other pending invalidates?

Jeff Muizelaar [:jrmuizel]

Assignee

Comment 26

•

13 years ago

(In reply to Boris Zbarsky (:bz) from comment #25) > And a perhaps related question: if we're invalidating the whole viewport > anyway (are we?) should we just clear other pending invalidates? Doesn't this already happen during paint?

Boris Zbarsky [:bzbarsky]

Comment 27

•

13 years ago

I don't know. It might not if the invalidates are queued up in our view system and haven't been flushed to the widget layer yet....

Jeff Muizelaar [:jrmuizel]

Assignee

Comment 28

•

13 years ago

I took a look at this. WILLPAINT clears our pending invalidates, but this causes us to queue a draw(). We need to clear before calling paint.

Jeff Muizelaar [:jrmuizel]

Assignee

Comment 29

•

13 years ago

We could fix this by disabling nsWindow::Invalidate() during our Draw() but that's pretty ugly. Better suggestions welcome.

Jeff Muizelaar [:jrmuizel]

Assignee

Comment 30

•

13 years ago

We'll see if this fixes it: https://tbpl.mozilla.org/?tree=Try&rev=dfef37e38b75

Jeff Muizelaar [:jrmuizel]

Assignee

Comment 31

•

13 years ago

This change did not fix the regression.

Jeff Muizelaar [:jrmuizel]

Assignee

Comment 32

•

13 years ago

Attached patch Version with invalidation disabling — Details — Splinter Review

While this doesn't fix the regression, it does fix the double draw problem. The regression may come from scheduling a draw to early, which better displayport prediction should avoid. We can retry this when we have the better prediction.

Andreas Gal :gal

Reporter

Comment 33

•

13 years ago

kats, want to try this patch with your patch?

JP Rosevear [:jpr]

Updated

•

13 years ago

blocking-fennec1.0: --- → ?

Brad Lassey [:blassey] (use needinfo?)

Updated

•

13 years ago

blocking-fennec1.0: ? → -

tracking-firefox14: + → -

Brad Lassey [:blassey] (use needinfo?)

Comment 34

•

13 years ago

minusing because its not clear that this is a win or even a problem.

Andreas Gal :gal

Reporter

Comment 35

•

13 years ago

You don't think its a problem that we delay urgently needed paints to avoid checkerboarding by up to 16ms? You don't think it would be a win not to do that?

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 36

•

13 years ago

Comment on attachment 617580 [details] [diff] [review] Version with invalidation disabling Review of attachment 617580 [details] [diff] [review]: ----------------------------------------------------------------- The way this is supposed to work is that you send an NS_WILLPAINT event (and have your NS_PAINT event set didSendWillPaint). The NS_WILLPAINT handler may trigger invalidation. That invalidation should be covered by your NS_PAINT event. In the Android case, since you always paint everything, you should just ignore invalidations that happen during the NS_WILLPAINT. It is not correct to ignore invalidations that happen during NS_PAINT, although there won't normally be any.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 37

•

13 years ago

Gonk needs a similar fix.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 38

•

13 years ago

Though it's a bit more complicated because Gonk uses mDirtyRegion. So nsWindow::DoDraw needs to send NS_WILLPAINT before copying mDirtyRegion into event.region and clearing mDirtyRegion.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 39

•

13 years ago

As for this 16ms delay, I think the right thing to do is for nsRefreshDriver::ScheduleViewManagerFlush to schedule a refresh driver run with *zero* delay (but still async) if there isn't a timer running and the delay since the last refresh driver tick is more than one refresh driver period. It might make sense to do that for some other callers of EnsureTimerStarted too, the non-animation ones --- AddStyleFlushObserver, AddLayoutFlushObserver. Does that sound right, Boris?

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 40

•

13 years ago

Another option is to add API to request "urgent" early refresh driver ticks in response to "critical" user input where we must minimize latency. The question is of course, what would be treated as "critical". If everything's critical then we won't be throttling paints anymore.

Andreas Gal :gal

Reporter

Comment 41

•

13 years ago

roc, can we simply skip the next tick when we schedule an early draw?

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 42

•

13 years ago

Sure, if a tick happens early then we won't need to schedule another one (unless an animation observer requests it). That still means we'd end up painting furiously if too many "critical" requests happen.

Andreas Gal :gal

Reporter

Comment 43

•

13 years ago

Shouldn't be the case. Even pretty quick paints take a few ms. We shouldn't fire a new one at least until the last one is done. It would be nice if we make paints interruptible.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 44

•

13 years ago

On desktop, we definitely had problems in the past with painting at too high a rate and wasting cycles since you can't see more than 60fps anyway. Hence bug 598482. Really we should be thinking about this in terms of OMTC.

Jeff Muizelaar [:jrmuizel]

Assignee

Comment 45

•

13 years ago

(In reply to Andreas Gal :gal from comment #35) > You don't think its a problem that we delay urgently needed paints to avoid > checkerboarding by up to 16ms? You don't think it would be a win not to do > that? I've been continuing to look at this, and I have yet been able to see clear wins. Tryserver has been flaky which has made measuring this a pain, but I hope to get better results soon.

Andreas Gal :gal

Reporter

Comment 46

•

13 years ago

On the red div page this delay takes longer than the actual paint time. Happy to help measure it.

Jeff Muizelaar [:jrmuizel]

Assignee

Comment 47

•

13 years ago

My current patch https://hg.mozilla.org/try/rev/27f790c8d5e5 still causes tcheckerboard to regress from .77 to .73-.71 and I don't know why.

Boris Zbarsky [:bzbarsky]

Comment 48

•

13 years ago

> Does that sound right, Boris? We don't want to do an "immediate" tick for style/layout observers, I don't think. Though maybe it would be ok.

BMO Automation

Comment 50

•

5 years ago

We have completed our launch of our new Firefox on Android. The development of the new versions use GitHub for issue tracking. If the bug report still reproduces in a current version of [Firefox on Android nightly](https://play.google.com/store/apps/details?id=org.mozilla.fenix) an issue can be reported at the [Fenix GitHub project](https://github.com/mozilla-mobile/fenix/). If you want to discuss your report please use [Mozilla's chat](https://wiki.mozilla.org/Matrix#Connect_to_Matrix) server https://chat.mozilla.org and join the [#fenix](https://chat.mozilla.org/#/room/#fenix:mozilla.org) channel.

Status: REOPENED → RESOLVED

Closed: 13 years ago → 5 years ago

Resolution: --- → INCOMPLETE

Nobody; OK to take it and work on it

Updated

•

5 years ago

Product: Firefox for Android → Firefox for Android Graveyard

Draw directly from viewport changes 13 years ago Jeff Muizelaar [:jrmuizel] 2.70 KB, patch		Details \| Diff \| Splinter Review
Draw directly from viewport changes 13 years ago Jeff Muizelaar [:jrmuizel] 3.16 KB, patch	kats : review+ joe : approval-mozilla-central+	Details \| Diff \| Splinter Review
Version with invalidation disabling 13 years ago Jeff Muizelaar [:jrmuizel] 3.83 KB, patch		Details \| Diff \| Splinter Review