Closed Bug 814864 Opened 12 years ago Closed 12 years ago

Low precision tile rendering takes too much time away from high precision rendering

Categories

(Firefox for Android Graveyard :: Toolbar, defect)

All
Android
defect
Not set
normal

Tracking

(firefox19 fixed, firefox20 fixed)

RESOLVED FIXED
Firefox 20
Tracking Status
firefox19 --- fixed
firefox20 --- fixed

People

(Reporter: cwiiis, Assigned: cwiiis)

References

Details

Attachments

(6 files, 2 obsolete files)

(Most of) the reason for the regression in bug 814437 is that low precision tile rendering isn't free, and we're spending too much time doing it when we could be rendering high precision tiles. Some of this is down to plain silliness; some of it is because we aren't being intelligent enough.

The silliness is that we interleave low and high precision rendering when that's totally unnecessary, and we spend time drawing stale low precision content when we could be drawing high precision content from a more recent paint request. Patches for both of these are incoming. There also appear to be rounding errors that cause slivers of content to be re-rendered quite a bit; although that isn't a huge deal in terms of paint time, it causes whole-texture uploads which will hurt our smoothness. I need to look into this; it can likely be solved by a ceil/floor in the right place...

The intelligence aspect is that we always render low precision tiles, regardless of whether we'll need them or not. The Java side has enough context to predict whether low precision rendering might become visible and, if it isn't going to be, cancel all low precision rendering (it can do this by looking at the current pan velocity and the position of the viewport within the displayport). This would likely need tuning, but if we want our synthetic performance numbers to go up instead of down, we'll need to do this (if done right, it should enhance perceived performance too). I'll tackle this during the week.
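For illustration, here's a minimal Java sketch of the kind of heuristic described above; the class, the names and the repaint-time constant are all hypothetical, not the code that eventually landed:

    import android.graphics.PointF;
    import android.graphics.RectF;

    class LowPrecisionHeuristic {
        // Assumed tuning constant: how long a high-precision repaint of the
        // displayport typically takes. Purely illustrative.
        private static final float REPAINT_TIME_MS = 300.0f;

        // Returns true if, at the current pan velocity, the viewport could
        // reach the edge of the displayport before a high-precision repaint
        // finishes, i.e. low-precision content might actually become visible.
        static boolean lowPrecisionMightBeVisible(RectF viewport, RectF displayPort,
                                                  PointF velocityPxPerMs) {
            float speed = (float) Math.hypot(velocityPxPerMs.x, velocityPxPerMs.y);
            if (speed == 0.0f) {
                return false; // Not panning; high precision will catch up first.
            }
            // Smallest margin between the viewport and the displayport edges.
            float margin = Math.min(
                    Math.min(viewport.left - displayPort.left,
                             displayPort.right - viewport.right),
                    Math.min(viewport.top - displayPort.top,
                             displayPort.bottom - viewport.bottom));
            // Time until the viewport could leave the displayport at this speed.
            float timeToEdgeMs = Math.max(margin, 0.0f) / speed;
            return timeToEdgeMs < REPAINT_TIME_MS;
        }
    }

If a check like this returns false, all low precision rendering could be cancelled for that frame.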

For the record, none of the problems mentioned above are hard to solve; it's likely about a week's worth of work.
This is still on a layer-by-layer basis, so it's still possible to interleave if a small layer starts drawing low precision while a large layer is still drawing high precision - this should be an extremely rare case though (it would require quite unusual layer sizes).
Attachment #684858 - Flags: review?(bgirard)
Attachment #684859 - Flags: review?(bugmail.mozilla)
Comment on attachment 684858 [details] [diff] [review]
Draw low precision updates after high precision

Review of attachment 684858 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/layers/basic/BasicTiledThebesLayer.cpp
@@ -627,5 @@
>      }
> -
> -    // Re-add the high-precision valid region intersection so that we can
> -    // maintain coherency when the valid region changes.
> -    lowPrecisionInvalidRegion.Or(lowPrecisionInvalidRegion, invalidHighPrecisionIntersect);

Just to note that this comment was nonsense - lowPrecisionInvalidRegion is not used beyond this point and the region we don't validate won't get added to mLowPrecisionValidRegion so it'll maintain coherency.
No performance numbers yet because the tree is closed :(
Try push: https://tbpl.mozilla.org/?tree=Try&rev=d0471b703622
Compare to: https://tbpl.mozilla.org/?tree=Try&rev=db95149913a3

This has the patch from bug 814437 applied - as I'm not sure the calculations are correct, take any numbers with a pinch of salt (there should still be an improvement though).
Attachment #684859 - Flags: review?(bugmail.mozilla) → review+
Comment on attachment 684859 [details] [diff] [review]
Part 2 - Abort old, stale, low-precision updates

Review of attachment 684859 [details] [diff] [review]:
-----------------------------------------------------------------

Actually, you took out similar code in cset d014526b795e, will those problems be re-introduced with this change?
(In reply to Kartikaya Gupta (:kats) from comment #6)
> Comment on attachment 684859 [details] [diff] [review]
> Abort old, stale, low-precision updates
> 
> Review of attachment 684859 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Actually, you took out similar code in cset d014526b795e, will those
> problems be re-introduced with this change?

Aborting stale draws can cause temporary coherency issues, but we don't care so much about that with the low precision buffer. It will resolve itself, as it only cancels when another draw is guaranteed.
The numbers have actually gone down :( This is likely because this patch also fixes an issue where low precision updates could previously be cut short by high precision updates finishing without cancelling them.

I'm running a test to determine the cost of just expanding the displayport without actually drawing in it, which should get us a theoretical number to aim for, then I'll have a look at adding some intelligence (probably not by today, but we'll see).

It may also be worth adding culling to TiledThebesLayerOGL so we don't waste time drawing things that are off-screen - I'd hope that GPUs are sensible enough that this isn't a large cost, but I may well be mistaken...
https://tbpl.mozilla.org/?tree=Try&rev=2948612f5b33 - This is a try build with low precision tiles disabled (so the displayport expansion is still happening, but nothing is validated in the extra area). Given the huge decrease in the numbers vs. before these patches, I'm starting to think more strongly that the measurement is wrong or too expensive.

I would expect some drop, 1) because I've fixed some incorrect aborting, and 2) because a larger displayport isn't free, but a decrease of this magnitude is unexpected.

I also realise that even with low-precision drawing disabled, this new patch is likely scheduling an extra useless transaction, so I'll fix that in a revised patch.
Looking at the checkerboard measurement, the numbers look correct, but I notice that the recording is the number of complete frames, so if the frame-rate dips, this test will be affected negatively.

I think it's worth trying the culling first to see how that improves the numbers. I also don't like that it's more expensive to measure, but it still shouldn't be so expensive as to have a significant impact.
Actually, I hadn't thought this through - no extra rendering is occurring with low precision tiles off, so I guess this may well just be the cost of the increased display-port? Need to profile. It could also just be the extra work required to draw, and/or the increased cost of measurement impacting the frame-rate.

Culling may still be a smart thing to do regardless.
Expanded displayport, no low-precision, clipped to composition bounds: https://tbpl.mozilla.org/?tree=Try&rev=66052a573ffa
Expanded displayport, with low-precision, clipped: https://tbpl.mozilla.org/?tree=Try&rev=91003df51fae

This also includes the fix for the extra, useless transaction that was scheduled when low precision is disabled. Next I'll add a patch so that browser.js checks the pref and stops expanding the displayport when low-precision is disabled - this should get us back to the old numbers (and hopefully better, due to culling).
https://tbpl.mozilla.org/?tree=Try&rev=c5456a863608 - This is with browser.js fixed to respect the low-precision pref, culling added and low-precision disabled. Numbers look good on this one :)

I noticed a bug that could cause tiles to draw in the wrong direction (which would badly affect this test). I don't think it'll make a difference, but here's another try push with that fixed too (and low-precision enabled): https://tbpl.mozilla.org/?tree=Try&rev=c0f1a4a814a7

Going to double-check that the numbers are correct for the checkerboard measurement with low-res tiles; these numbers still surprise me...
Only with low-precision/an enlarged displayport is the value wrong.

The problem is that we calculate the checkerboard by first calculating how much of the page area is covered by the displayport, then multiplying that by the integrity of what's rendered inside it.

With an enlarged displayport, that first calculation is wrong, so any time we checkerboard, we record much higher values of checkerboarding than are actually happening (which is why the values recorded in bug 814437 are so high even when just enlarging the displayport and doing nothing else).

Hopefully we can just use the composition bounds stored in FrameMetrics to do the same thing correctly on Gecko side.
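As a hedged illustration of the corrected approach (treating the rendered region as a single rectangle for simplicity; the class and method are hypothetical, not the actual robocop/Gecko code):

    import android.graphics.RectF;

    class CheckerboardMetric {
        // Fraction of the visible area (composition bounds) not covered by
        // rendered content. Measuring against the composition bounds rather
        // than the (possibly enlarged) displayport avoids over-reporting
        // checkerboarding when the displayport is much larger than the screen.
        static float visibleCheckerboardFraction(RectF compositionBounds, RectF rendered) {
            RectF visibleRendered = new RectF();
            if (!visibleRendered.setIntersect(rendered, compositionBounds)) {
                return 1.0f; // Nothing rendered inside the visible area at all.
            }
            float visibleArea = compositionBounds.width() * compositionBounds.height();
            float renderedArea = visibleRendered.width() * visibleRendered.height();
            return 1.0f - (renderedArea / visibleArea);
        }
    }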
Commented on wrong bug, sorry. See bug 814437.
No longer blocks: 814437
Depends on: 814437
Comment on attachment 684858 [details] [diff] [review]
Draw low precision updates after high precision

Marking as obsolete - I have a revised version of the patch, but fixing bug 814437 first.
Attachment #684858 - Attachment is obsolete: true
Attachment #684858 - Flags: review?(bgirard)
This refactors the code slightly (for speed and readability) and makes sure that 1) low res tiles are only drawn after high res, and 2) low res tile updates complete regardless of how many high-res updates happen.

Without more intelligent cancelling, this may result in a slight regression (but should be an improvement in terms of the coherency of what the user sees).
Attachment #685154 - Flags: review?(bgirard)
Attachment #684859 - Attachment description: Abort old, stale, low-precision updates → Part 2 - Abort old, stale, low-precision updates
Clip drawing to composition bounds to stop unnecessary rendering. I'll push some try builds to see if this actually helps or not. I'd expect it to possibly improve frame-rate on large screens/poor GPUs, but at the cost of CPU use (so it may not help at all).
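The actual patch is C++ in TiledThebesLayerOGL.cpp; purely as a conceptual Java sketch of the clipping/culling step (the screenToLayer transform is a hypothetical stand-in for the accumulated inverse layer transform):

    import android.graphics.Matrix;
    import android.graphics.Rect;
    import android.graphics.RectF;

    class CompositionBoundsClip {
        // Convert the composition bounds from screen space to the layer's
        // space and round outwards, mirroring the gfxRect::RoundOut step.
        static Rect layerSpaceClipRect(RectF compositionBoundsScreen, Matrix screenToLayer) {
            RectF bounds = new RectF(compositionBoundsScreen);
            screenToLayer.mapRect(bounds);
            Rect clip = new Rect();
            bounds.roundOut(clip);
            return clip;
        }

        // Any tile that doesn't intersect the clip rect can be skipped entirely.
        static boolean shouldDrawTile(Rect tileRect, Rect clipRect) {
            return Rect.intersects(tileRect, clipRect);
        }
    }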
Attachment #685156 - Flags: review?(bgirard)
At the moment, browser.js always sets both a critical and a normal displayport - there's no need to set the former if low precision tiles are disabled, and doing so results in a slight performance decrease due to the increased cost of DLBI and layer-building when the displayport is larger.

This patch checks the pref on startup (it gets cached statically in gfxPlatform too, so there's no need to listen to updates) and reverts to just setting a displayport if it's disabled.

This gives us parity with prior performance numbers if low precision tiles are disabled.
Attachment #685159 - Flags: review?(bugmail.mozilla)
Comment on attachment 685159 [details] [diff] [review]
Part 4 - Respect low-precision rendering pref in browser.js

Review of attachment 685159 [details] [diff] [review]:
-----------------------------------------------------------------

Looks straightforward enough.
Attachment #685159 - Flags: review?(bugmail.mozilla) → review+
Comment on attachment 685156 [details] [diff] [review]
Part 3 (optional) - Clip drawing to composition bounds

Review of attachment 685156 [details] [diff] [review]:
-----------------------------------------------------------------

Ohh I guess with the bigger display port we still try to draw each tile that falls outside the screen =\. We certainly need some patch to do this. I'd like to see the performance numbers before we take this.

::: gfx/layers/opengl/TiledThebesLayerOGL.cpp
@@ +356,5 @@
>  
> +  // Walk up the tree to calculate the transform necessary to convert the
> +  // composition bounds from screen-space to layer-space.
> +  // XXX Taken from ../basic/BasicTiledThebesLayer.cpp
> +  //     Perhaps this should be a function in Layers.h?

I think we should add this function and another helper to transform an nsIntRect by a gfx3DMatrix.

@@ +369,5 @@
> +  // Get the composition bounds in the layer's coordinate space
> +  const FrameMetrics& metrics = mOGLManager->GetRoot()->AsContainerLayer()->GetFrameMetrics();
> +  gfxRect compositionBounds = transformScreenToLayer.TransformBounds(metrics.mCompositionBounds);
> +  compositionBounds.RoundOut();
> +  nsIntRect clipRect = nsIntRect(compositionBounds.x, compositionBounds.y,

Perhaps we should move this calculation to TransformShadowTree. Also, it's easier to compute these transforms from top to bottom than bottom-up for each layer.
Attachment #685156 - Flags: review?(bgirard)
Comment on attachment 685154 [details] [diff] [review]
Part 1 - Update low precision tiles after high precision

Review of attachment 685154 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/layers/basic/BasicTiledThebesLayer.cpp
@@ +553,5 @@
>  
> +  // Fast path for no progressive updates, no low-precision updates and no
> +  // critical display-port set.
> +  if (!gfxPlatform::UseProgressiveTilePainting() &&
> +      !gfxPlatform::UseLowPrecisionBuffer() &&

Do we support low precision without progressive drawing? I don't think we should. If not we should simplify this logic.
Attachment #685154 - Flags: review?(bgirard) → review+
(In reply to Benoit Girard (:BenWa) from comment #22)
> Comment on attachment 685154 [details] [diff] [review]
> Part 1 - Update low precision tiles after high precision
> 
> Review of attachment 685154 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: gfx/layers/basic/BasicTiledThebesLayer.cpp
> @@ +553,5 @@
> >  
> > +  // Fast path for no progressive updates, no low-precision updates and no
> > +  // critical display-port set.
> > +  if (!gfxPlatform::UseProgressiveTilePainting() &&
> > +      !gfxPlatform::UseLowPrecisionBuffer() &&
> 
> Do we support low precision without progressive drawing? I don't think we
> should. If not we should simplify this logic.

We do - low precision will draw progressively, regardless of UseProgressiveTilePainting - I think of it like enabling the screenshot layer, but hopefully with fewer problems. I don't feel strongly about this though, so if you think low-precision tiles shouldn't work with progressive disabled, I don't mind writing a patch for that.
(In reply to Benoit Girard (:BenWa) from comment #21)
> Comment on attachment 685156 [details] [diff] [review]
> Part 3 (optional) - Clip drawing to composition bounds
> 
> Review of attachment 685156 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Ohh I guess with the bigger display port we still try to draw each tile that
> falls outside the screen =\. We certainly need some patch to do this. I'd
> like to see the performance numbers before we take this.
> 
> ::: gfx/layers/opengl/TiledThebesLayerOGL.cpp
> @@ +356,5 @@
> >  
> > +  // Walk up the tree to calculate the transform necessary to convert the
> > +  // composition bounds from screen-space to layer-space.
> > +  // XXX Taken from ../basic/BasicTiledThebesLayer.cpp
> > +  //     Perhaps this should be a function in Layers.h?
> 
> I think we should add this function and another helper to transform a
> nsIntRect by a gfx3DMatrix.

Agreed, I've written out the code to transform to/from screen-space too many times now, it'd be good for it to be centralised so that I can get it wrong in one place :) I think I'd prefer to do this in a separate patch if that's ok though. Ditto with the gfx3DMatrix/nsIntRect, though I'm not sure the best place to put that... in gfx3DMatrix.h even?

> @@ +369,5 @@
> > +  // Get the composition bounds in the layer's coordinate space
> > +  const FrameMetrics& metrics = mOGLManager->GetRoot()->AsContainerLayer()->GetFrameMetrics();
> > +  gfxRect compositionBounds = transformScreenToLayer.TransformBounds(metrics.mCompositionBounds);
> > +  compositionBounds.RoundOut();
> > +  nsIntRect clipRect = nsIntRect(compositionBounds.x, compositionBounds.y,
> 
> Perhaps we should move this calculation to TransformShadowTree. Also it's
> easier to compute these transform from top to bottom then bottom up for each
> layer.

Do you mean the other way round? (or am I understanding this the wrong way round?) Perhaps we should calculate this when doing ComputeEffectiveTransforms, that way we can cut down on the amount of calculation required and we can cache the resultant matrices for use with GetEffectiveTransformToSurface / GetEffectiveTransformToLayer? Again, sounds like a separate patch though - I promise to not write this code again before then :p
(In reply to Chris Lord [:cwiiis] from comment #23)
> We do - low precision will draw progressive, regardless of
> UseProgressiveTilePainting though - I think of it like enabling the
> screenshot layer, but with hopefully fewer problems. I don't feel strongly
> about this though, so if you think low-precision tiles shouldn't work with
> progressive disabled, I don't mind writing a patch for that.

As long as low res->progressive. I wouldn't want it to be possible to use only low res unless it was useful for testing.

(In reply to Chris Lord [:cwiiis] from comment #24)
> I think I'd prefer to do this in a separate patch if that's
> ok though. Ditto with the gfx3DMatrix/nsIntRect, though I'm not sure the
> best place to put that... in gfx3DMatrix.h even?
> 

Separate patch is fine. For the placement I would think gfx3DMatrix.h.

> > @@ +369,5 @@
> > > +  // Get the composition bounds in the layer's coordinate space
> > > +  const FrameMetrics& metrics = mOGLManager->GetRoot()->AsContainerLayer()->GetFrameMetrics();
> > > +  gfxRect compositionBounds = transformScreenToLayer.TransformBounds(metrics.mCompositionBounds);
> > > +  compositionBounds.RoundOut();
> > > +  nsIntRect clipRect = nsIntRect(compositionBounds.x, compositionBounds.y,
> > 
> > Perhaps we should move this calculation to TransformShadowTree. Also it's
> > easier to compute these transform from top to bottom then bottom up for each
> > layer.
> 
> Do you mean the other way round? (or am I understanding this the wrong way
> round?) Perhaps we should calculate this when doing
> ComputeEffectiveTransforms, that way we can cut down on the amount of
> calculation required and we can cache the resultant matrices for use with
> GetEffectiveTransformToSurface / GetEffectiveTransformToLayer? Again, sounds
> like a separate patch though - I promise to not write this code again before
> then :p

Alright let's look at it then.
(In reply to Benoit Girard (:BenWa) from comment #21)
> Comment on attachment 685156 [details] [diff] [review]
> Part 3 (optional) - Clip drawing to composition bounds
> -----------------------------------------------------------------
> Ohh I guess with the bigger display port we still try to draw each tile that
> falls outside the screen =\. We certainly need some patch to do this. I'd
> like to see the performance numbers before we take this.

Without: https://tbpl.mozilla.org/?tree=Try&rev=fa86d0af92ce
With: https://tbpl.mozilla.org/?tree=Try&rev=5a5d30ff1531

Summary: Checkerboard tests better, pan test worse (though going on previous numbers, not significantly - it's hard to tell exactly what rp is measuring).

Hard to say from this whether it's worth it or not.
That's within the variance for turbopan. It measures the sum of squares of the delta time between two frames. We should normalize these values, but currently we don't, so there's a large numeric difference even though the test is relatively stable.
This, irritatingly, makes rck numbers worse (but makes rp *much* better), but trying it out on the actual device, the experience is much better. Page load times aren't bogged down by unnecessary low precision updates, and when panning fast it's extremely hard to get any blank area to appear. Worth taking for the improved user experience and rp results imo.

Without low-precision tiles: https://tbpl.mozilla.org/?tree=Try&rev=64f98304d873
With low-precision tiles: https://tbpl.mozilla.org/?tree=Try&rev=f93394bc9cb0
With on-demand low-precision tiles: https://tbpl.mozilla.org/?tree=Try&rev=20ea6b7f4ec8
Attachment #685803 - Flags: review?(bugmail.mozilla)
Comment on attachment 685803 [details] [diff] [review]
Part 5 - Only draw low precision tiles when there's a danger of checkerboarding

Review of attachment 685803 [details] [diff] [review]:
-----------------------------------------------------------------

::: mobile/android/base/gfx/GeckoLayerClient.java
@@ +393,5 @@
> +            if (!FloatUtils.fuzzyEquals(resolution, mProgressiveUpdateDisplayPort.resolution) ||
> +                !FloatUtils.fuzzyEquals(x, mProgressiveUpdateDisplayPort.getLeft()) ||
> +                !FloatUtils.fuzzyEquals(y, mProgressiveUpdateDisplayPort.getTop()) ||
> +                !FloatUtils.fuzzyEquals(x + width, mProgressiveUpdateDisplayPort.getRight()) ||
> +                !FloatUtils.fuzzyEquals(y + height, mProgressiveUpdateDisplayPort.getBottom())) {

It might be simpler to add a fuzzyEquals(float left, float top, float right, float bottom, float resolution) function onto the DisplayPortMetrics object to encapsulate this a bit better. Your call.

@@ +421,5 @@
> +            // If we're not doing low precision draws and we're about to
> +            // checkerboard, give up and move onto low precision drawing.
> +            if (DisplayPortCalculator.aboutToCheckerboard(viewportMetrics,
> +                  mPanZoomController.getVelocityVector(), mProgressiveUpdateDisplayPort)) {
> +                Log.d(LOGTAG, "Aborting high precision update due to danger of checkerboarding");

nit: remove log line (or comment it out)

@@ +436,5 @@
> +            if (Math.max(viewportMetrics.viewportRectLeft, viewportMetrics.pageRectLeft) + 1 < x ||
> +                Math.max(viewportMetrics.viewportRectTop, viewportMetrics.pageRectTop) + 1 < y ||
> +                Math.min(viewportMetrics.viewportRectRight, viewportMetrics.pageRectRight) - 1 > x + width ||
> +                Math.min(viewportMetrics.viewportRectBottom, viewportMetrics.pageRectBottom) - 1 > y + height) {
> +                Log.d(LOGTAG, "Aborting low precision update due to viewport not in display-port");

ditto.
Attachment #685803 - Flags: review?(bugmail.mozilla) → review+
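Regarding the reviewer's suggestion above, a rough sketch of such a fuzzyEquals helper on DisplayPortMetrics might look like the following (the method placement is assumed; only the getters and the resolution field visible in the quoted diff are taken from the source):

    // Intended to live on DisplayPortMetrics; sketch only, not the landed code.
    public boolean fuzzyEquals(float left, float top, float right, float bottom,
                               float resolution) {
        return FloatUtils.fuzzyEquals(left, getLeft())
            && FloatUtils.fuzzyEquals(top, getTop())
            && FloatUtils.fuzzyEquals(right, getRight())
            && FloatUtils.fuzzyEquals(bottom, getBottom())
            && FloatUtils.fuzzyEquals(resolution, this.resolution);
    }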
Not committing this yet, as Part 5 exposes invalidation errors (to do with cancelling) - these errors already exist, but the patch makes them much worse due to its more aggressive cancelling behaviour.
This is slightly different from the last patch in that it won't cancel high-precision draws until they stop drawing what's visible on the screen (same as the prior behaviour) - this fixes issues with sites that invalidate lots of previously-visible content while scrolling (which would otherwise cause the entire page to re-render in low precision, which both looked bad and recorded badly).

This gives much nicer talos results: https://tbpl.mozilla.org/?tree=Try&rev=867fab306154 - it recovers rck1, while rck2 and 3 are mostly unaffected - rp gains the same speed-up as before.

I would be happy with this level of performance (though I still think there's room to improve).

Unfortunately, it uncovers a bug where if our low precision and high precision buffers have a different frame resolution, we don't reconcile the difference - so I'm currently fixing that, which I'll append as a part 6.
Attachment #685803 - Attachment is obsolete: true
Attachment #686110 - Flags: review?(bugmail.mozilla)
Attachment #686110 - Flags: review?(bugmail.mozilla) → review+
As of Part 5, it's quite possible to zoom in and have the low precision buffer not update until you start panning fast. In this situation, if the low precision buffer has content drawn at a previous frame resolution, it will draw incorrectly (and you'll be able to see this if you zoom out or pan more quickly than the low precision buffer can update to catch up).

This fixes that by applying the necessary transformations when drawing the layer.
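For reference, the reconciliation amounts to scaling the stale buffer content by the ratio of resolutions when it's composited; a minimal sketch (names assumed, the real fix lives in the C++ compositor):

    class LowPrecisionResolutionFix {
        // If the low-precision buffer was validated at an older frame
        // resolution, scale it at draw time so its content still lines up
        // with the page. E.g. content painted at a 0.5x frame resolution
        // while the layer now renders at 1.0x must be drawn scaled by 2.0.
        static float staleBufferScale(float bufferResolution, float currentResolution) {
            return currentResolution / bufferResolution;
        }
    }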
Attachment #686171 - Flags: review?(bgirard)
Attachment #686171 - Flags: review?(bgirard) → review+
Blocks: 814437
No longer depends on: 814437
Blocks: 817575
Note: this was uplifted to 19 as part of bug 817575.
Product: Firefox for Android → Firefox for Android Graveyard