Closed Bug 724786 Opened 10 years ago Closed 9 years ago

Layers are invalidated after every pan due to slightly different scrolled root origin

Categories

(Firefox for Android Graveyard :: General, defect, P2)

ARM
Android
defect

Tracking

(fennec11+)

RESOLVED INCOMPLETE
Tracking Status
fennec 11+ ---

People

(Reporter: pcwalton, Unassigned)

References

Details

(Keywords: perf, Whiteboard: [gfx])

Attachments

(2 files, 1 obsolete file)

After every pan, all layers in Native Fennec get completely invalidated, defeating any buffer rotation optimizations. This is due to hitting the following condition:

http://mxr.mozilla.org/mozilla-central/source/layout/base/FrameLayerBuilder.cpp#880

That is, activeScrolledRootTopLeft != data->mActiveScrolledRootPosition. The data is off by a few subpixels, causing layout to invalidate the entire layer we wanted to scroll. Turns out that the position of the top left corner of the scrolled root is changing slightly, so layout is bailing out and repainting everything. It's *very* slight -- the first five or so decimal places are always the same. (FuzzyEquals is not a solution here -- it changes more than FuzzyEquals' tolerance.)

As it stands now (in the kiwifox branch), paints take 74.33 ms (13 fps). After commenting out that check, paints drop to 43.08 ms (23 fps) when panning quickly and 32.74 ms (30 fps) when panning slowly. That's over a 2x improvement in panning performance and will lead to correspondingly less checkerboarding.

I think we definitely want to fix this—it will help the Java compositor as well.
Keywords: perf
Does the amount of error depend on the scale factor?

Bug 681192 might be relevant.
Assignee: nobody → roc
tracking-fennec: --- → 11+
Priority: -- → P1
Sorry, I'm already completely swamped.
Assignee: roc → nobody
Yes, this looks exactly like what Oleg was seeing in bug 681192. I guess this is a Fennec bug that's been around forever. I'll try to apply his patches and see if they still apply.
Depends on: 681192
Doug, shall I take this or should Chris?
Assignee: nobody → pwalton
Assignee: pwalton → nobody
Assignee: nobody → cpeterson
Problem with patches from bug 681192, is broken reftest, which seems related to some other bug, and that other bug need to be fixed in order to push 681192
romaxa, is there a good test to verify performance improvements from bug 681192's patches? 

I enabled pcwalton's frame rate HUD. I don't see much frame rate improvements when panning and zooming timecube.com on my Galaxy Nexus. Without the patches, frame time is about 1-3 ms. With the patches, frame time is about 2-5 ms. However, timecube.com is checkerboarding *badly*, which may invalidate this test.
(In reply to Oleg Romashin (:romaxa) from comment #5)
> Problem with patches from bug 681192, is broken reftest, which seems related
> to some other bug, and that other bug need to be fixed in order to push
> 681192

Can we just disable the reftest? This is very important to land.

(In reply to Chris Peterson (:cpeterson) from comment #6)
> romaxa, is there a good test to verify performance improvements from bug
> 681192's patches? 
> 
> I enabled pcwalton's frame rate HUD. I don't see much frame rate
> improvements when panning and zooming timecube.com on my Galaxy Nexus.
> Without the patches, frame time is about 1-3 ms. With the patches, frame
> time is about 2-5 ms. However, timecube.com is checkerboarding *badly*,
> which may invalidate this test.

The patch won't affect framerate. It's purely a checkerboarding improvement.
cpeterson, if you want to measure checkerboarding you can use the talos test we have in robocop. ping me on IRC if you need help with that.
Bug 724786 part 1 - Tweak browser.js' viewport zoom factor to avoid layer invalidation when panning. Android only.

FrameLayerBuilder.cpp floors offset to an nscoord (int32_t) before scaling. It then compares scaledOffset to NSToIntRoundUp(scaledOffset). To minimize the delta, we want to find the nearest integral scaled value and compute a new zoom between the integral offsets for each axis.
Attachment #596238 - Flags: review?(roc)
Attachment #596238 - Flags: feedback?(pwalton)
Bug 724786 part 2 - Use fuzzy coordinate comparisons when invalidating recycled Thebes layers.

This patch changes core layout code! This change is a rebased subset from bug 681192's patch 557768, which was r=roc.

roc, this patch adds a fuzzy tolerance to CreateOrRecycleThebesLayer() of 1e-4 in a code path I believe is shared by desktop code. What is the smallest tolerance you would accept?

This patch is a "nice to have". My bug-724786-part-1.patch actually does a pretty good job of tweaking the zoom values to induce near-zero deltas in CreateOrRecycleThebesLayer(). Adding the fuzzy tolerance just gives browser.js a "bigger target" to hit. :)
Attachment #596239 - Flags: review?(roc)
Attachment #596239 - Flags: feedback?(pwalton)
Status: NEW → ASSIGNED
Comment on attachment 596238 [details] [diff] [review]
bug-724786-part-1-tweak-viewport-zoom-factor.patch

Review of attachment 596238 [details] [diff] [review]:
-----------------------------------------------------------------

What are the differences you see between scaleX and scaleY here? Hopefully they're very small, otherwise you could get distortion.

Also, I'm somewhat surprised that changing scaleX and scaleY doesn't cause layer invalidation in some other part of layout. But if you didn't observe layer invalidation, this is fine by me!
Attachment #596238 - Flags: feedback?(pwalton) → feedback+
Attachment #596239 - Flags: feedback?(pwalton) → feedback+
Comment on attachment 596238 [details] [diff] [review]
bug-724786-part-1-tweak-viewport-zoom-factor.patch

Review of attachment 596238 [details] [diff] [review]:
-----------------------------------------------------------------

::: mobile/android/chrome/content/browser.js
@@ +1577,5 @@
> +    // To minimize the delta, we want to find the nearest integral scaled value
> +    // and compute a new zoom between the integral offsets for each axis.
> +    let thebesOffsetX = Math.floor(aViewport.x);
> +    aViewport.x = Math.floor(thebesOffsetX / aViewport.zoom);
> +    let zoomFromIntToIntX = aViewport.x ? (thebesOffsetX / aViewport.x) : aViewport.zoom;

I don't really understand what you're doing here. You're leaving aViewport.x divided by its zoom, which seems beyond the scope of what I thought you were going to do here.

However, I don't actually need to review this code.
Attachment #596238 - Flags: review?(roc)
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → DUPLICATE
Duplicate of bug: 681192
Cwiiis, can you please review these browser.js changes? I'm trying to tweak the viewport zoom to minimize the offset rounding error causing layer invalidations in FrameLayerBuilder.cpp's CreateOrRecycleThebesLayer() in patch part-2:

https://bugzilla.mozilla.org/attachment.cgi?id=596239&action=diff
Attachment #596238 - Attachment is obsolete: true
Attachment #597632 - Flags: review?(chrislord.net)
Attachment #597632 - Flags: feedback?(bugmail.mozilla)
Comment on attachment 597632 [details] [diff] [review]
bug-724786-part-1-zoom-to-integer-viewport-offsets-v2.patch

Review of attachment 597632 [details] [diff] [review]:
-----------------------------------------------------------------

I don't want to r- this, but I can't r+ this until the questions I have below are answered.

::: mobile/android/chrome/content/browser.js
@@ +1591,5 @@
> +    // Y axis is longer and thus more senstive to zoom rounding errors, so use
> +    // its new zoom for both axes.
> +    aViewport.y = Math.floor(aViewport.y / aViewport.zoom);
> +    if (aViewport.y > 0)
> +        aViewport.zoom = thebesOffsetY / aViewport.y;

Changing the zoom is going to cause problems when we send viewport updates back to Gecko - updateViewport in GeckoLayerClient/GeckoSoftwareLayerClient uses the zoom to determine whether to synchronise the page size when drawing is finished.

Do we not want to do this on the Java side, so that the Java layers have the correct zoom level? I may not have the entire picture here, so sorry if this question is invalid.
Comment on attachment 597632 [details] [diff] [review]
bug-724786-part-1-zoom-to-integer-viewport-offsets-v2.patch

Review of attachment 597632 [details] [diff] [review]:
-----------------------------------------------------------------

Yeah, this doesn't seem like the right place to be doing this. This almost seems like we're changing our front-end stuff to be bug-compatible with the backend. Why does FrameLayerBuilder.cpp floor offset to nscoord (int32_t) before scaling?
Attachment #597632 - Flags: feedback?(bugmail.mozilla) → feedback-
(In reply to Kartikaya Gupta (:kats) from comment #16)
> Comment on attachment 597632 [details] [diff] [review]
> bug-724786-part-1-zoom-to-integer-viewport-offsets-v2.patch
> 
> Review of attachment 597632 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Yeah, this doesn't seem like the right place to be doing this. This almost
> seems like we're changing our front-end stuff to be bug-compatible with the
> backend. Why does FrameLayerBuilder.cpp floor offset to nscoord (int32_t)
> before scaling?

See https://bugzilla.mozilla.org/show_bug.cgi?id=681192#c8. roc says the invalidation code is correct here.

CreateOrRecycleThebesLayer() doesn't offset to nscoord before scaling -- it uses the nscoord to convert `offset` from app units (1/60 of a pixel, used for most of layout) to doubles.

The right fix is pretty complex -- it basically involves adding a new ScrollTo() API. You either have to add compensation for this issue in the frontend or add it to the implementation of ScrollTo(). Based on what Chris has been telling me, we may end up having to do the latter anyway. But this is an attempt at making a simpler fix.

Note that Oleg actually implemented the latter, and it's probably bitrotted, but as I understand things the only problem was a failing reftest. We could just disable that reftest on mobile.
Attachment #597632 - Flags: review?(chrislord.net)
Status: RESOLVED → REOPENED
Resolution: DUPLICATE → ---
Depends on: 728977
Assignee: cpeterson → bgirard
I don't know why I moved this in the first place, putting back to beta
blocking-fennec1.0: --- → beta+
Whiteboard: [gfx]
Duplicate of this bug: 728977
Priority: P1 → P2
Who is actually the right assignee here?
Matt, its hard for us to tell how bad this is, can you take a look?
Assignee: bgirard → matt.woodrow
blocking-fennec1.0: beta+ → soft
Note that I landed a hack early on in Maple to work around this (it basically just comments out the offending code). I don't know whether my hack still exists. We should fix this for real, as there can be (minor) graphical corruption with my hack.
blocking-fennec1.0: soft → ---
Just talked to Patrick; the behavior described in the bug isn't happening but he would like to figure out why. Assigning to him. This can also be a release blocker.
Assignee: matt.woodrow → pwalton
blocking-fennec1.0: --- → beta+
blocking-fennec1.0: beta+ → -
Was the minusing intentional? Just want to make sure this does not fall off the radar if it was not.
I didn't minus this myself. Damon, did you minus this when you were reviewing the bug list on my laptop Friday?
I think I'm seeing the graphical corruption that this bug entails -- it's annoying -- so this should be fixed the right way.
Putting it back on the beta radar then.
blocking-fennec1.0: - → beta+
Jeff, are we seeing the invalidation in the profiler?
You won't see the invalidation, because the code is ifdef'd out:

http://mxr.mozilla.org/mozilla-central/source/layout/base/FrameLayerBuilder.cpp#912

This is my hack, but it causes minor graphical corruption (one pixel). It should be fixed the right way. I'm not sure if it needs to block beta though, because there *is* that hack there.
Assignee: pwalton → nobody
blocking-fennec1.0: beta+ → +
Are we expecting the "fix this the right way" to happen for Fennec beta? That would require finishing 681192, landing it, and then making further Fennec front-end changes.

This bug is confusing. The bug as filed was fixed by pcwalton's workaround. That introduced a new bug --- the 1px corruption --- but that bug is less important and it's not clear to me that we need to block on it.
If we do want to block on the 1px corruption, please file a new bug, make it block, and make it depend on bug 681192 (and make that block too).
re-noming based on comment 30
blocking-fennec1.0: + → ?
This bug is worked around, which causes the single-pixel corruption. Let's close this out, and I'll open a new bug on the single-pixel corruption.
Status: REOPENED → RESOLVED
blocking-fennec1.0: ? → ---
Closed: 10 years ago9 years ago
Resolution: --- → INCOMPLETE
The new bug is bug 749731.
Product: Firefox for Android → Firefox for Android Graveyard
You need to log in before you can comment on or make changes to this bug.