Regression: Panning and zooming yields flickering and or vanishing content

VERIFIED FIXED in Firefox 30

Status

()

defect
--
major
VERIFIED FIXED
6 years ago
5 years ago

People

(Reporter: aaronmt, Assigned: cwiiis)

Tracking

({regression, reproducible})

30 Branch
Firefox 31
ARM
Android
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox30 verified, firefox31 verified, fennec30+)

Details

()

Attachments

(3 attachments, 3 obsolete attachments)

There seems to be a big graphics related regression on Android where mainly on desktop sites, panning and zooming around will black/white/other out content

Pan around and zoom on e.g, http://newegg.com

See video: http://youtu.be/qB8BftH8ek8

This is reproducible on my Samsung Galaxy SIV (Android 4.4.2)

Last good revision: 41d962d23e81 (2014-03-11)
First bad revision: 44ae8462d6ab (2014-03-12)
Pushlog:

https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=41d962d23e81&tochange=44ae8462d6ab

https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=aae54a60278d&tochange=f4911ad70a26


--
Nightly (03/13)
Actually the range is different from bug 983157, probably unrelated. From the inbound range in comment 0 I would say gw280 or mattwoodrow.
Flags: needinfo?(matt.woodrow)
Flags: needinfo?(gwright)
See Also: 983157
I'd be very surprised it was related to the shadows patch, as that should only affect canvas
Flags: needinfo?(gwright)
Cwiiis, more tiling goodness that all smells like its related to bug 982651. Dupe as appropriate.
Assignee: nobody → chrislord.net
Unfortunately, this isn't a dupe of bug 982651. Something has broken things quite badly and I don't suspect that it's the tiling work either. Looking into this and bug 983202.
Both this and bug 983202 are fixed for me if I disable progressive rendering, so I suspect something has broken in that code, possible due to bug 935219. I don't want to see that get backed out even if it is the case though, so looking into fixing this.
Flags: needinfo?(matt.woodrow)
I made a page for this.
http://htmlpad.org/bigpaint/

It's a big image with a gif set as the CSS background, so that it repaints all the time.
(In reply to microrffr from comment #7)
> I made a page for this.
> http://htmlpad.org/bigpaint/
> 
> It's a big image with a gif set as the CSS background, so that it repaints
> all the time.

This is extremely useful, thanks :)

Note, this is a different issue to bug 983202, and while I still suspect that bug 935219 affected it, it was likely already at least slightly broken. Clearly ComputeProgressiveUpdateRegion is incorrect somehow, or is receiving incorrect data.
Status: NEW → ASSIGNED
So instrumenting the function to look at the values while viewing the site in comment #7, I get these values:

I/Gecko   ( 7497): XXX Composition bounds: 0, 0, 1080x3554
I/Gecko   ( 7497): XXX Scroll offset: 0, 0
I/Gecko   ( 7497): XXX Java composition bounds: 0, 0, 1080x1701 (x1.10)
I/Gecko   ( 7497): XXX Transformed composition bounds: 0, 0, 396x624

Those composition bounds are definitely wrong, they don't reflect the screen dimensions at all. They look like they may also be in the wrong coordinate system, but it's hard to make any reasoning when the initial values are wrong. The composition bounds from Java are, as expected, correct, but I'm not really sure anymore what the transform is meant to do to them.

I'm needinfo'ing botond, as we certainly need to fix the incorrect composition bounds (these are derived from the FrameMetrics composition bounds in ClientTiledThebesLayer::BeginPaint).
Flags: needinfo?(botond)
I had a look at the page from comment 7 and the composition bounds calculated in RecordFrameMetrics is indeed wrong. It is 1470 x 4838 on my x86 Motorola device, which is way too big and the wrong aspect ratio.

The culprit seems to be the change we made to the composition bounds patch in https://bugzilla.mozilla.org/show_bug.cgi?id=935219#c92 where if the scroll frame's view does not itself have a widget, we use the view bounds rather than the bounds of the nearest widget for the composition bounds. In this case the 1470 x 4838 is the view bounds, and it's clearly not what we want.

Timothy, do you know what is the correct thing to do in this case? Should we go back to using the bounds of the nearest widget on Android?

In the meantime, here is a workaround patch that restores the old behaviour (using the nearest widget's bounds).
Flags: needinfo?(botond) → needinfo?(tnikkel)
See Also: → 984264
It sounds like the view bounds aren't what I would have expected.

In general using the widget bounds would be incorrect. The widget bounds correspond to the bounds of the parent document in this situation. The size of the child document (iframe) could be bigger or smaller than those, there is no relation. But on Fennec specifically we treat the child document like it is the top level document and sort of ignore the parent document, so that's why it worked before to use the widget bounds.

It should be fine for now just to use the old widget code and remove the view stuff because I don't think we'll hit a situation where that causes a problem on b2g and fennec.

But we should fix this properly by:
1) understanding why the view bounds are what they are
2) either fixing that, or changing the code to get the bounds in a different way to be correct.
Flags: needinfo?(tnikkel)
Attachment #8392265 - Flags: review+
Setting checkin-needed as the trees are closed. Setting leave-open as this patch alone does not resolve the flickering/vanishing content problem.
Duplicate of this bug: 985500
This is my current patch for this issue and it fixes some of the bug, but not all of it - Low precision tiles can still end up with odd black areas... I'm not sure why this is yet.

There's also still flashing when scrolling around the constantly-updating-page test, which suggests that the 'coherent region' code still isn't quite correct.
Attachment #8393682 - Flags: feedback?(botond)
In fact, the calculation of the composition bounds in the progressive update code is definitely wrong. It's not taking into account the scroll position/displayport correctly. Getting closer though, I'll finish this off tomorrow.
Comment on attachment 8393682 [details] [diff] [review]
WIP: Fix progressive update calculations

Review of attachment 8393682 [details] [diff] [review]:
-----------------------------------------------------------------

On the whole, the code makes more sense to me with these changes than it did without, but I'd still like to go over it with you to understand some parts.

I think part of the source of confusion is that we overload the term "screen pixels" (and the types Screen[Pixel|Point|Rect]). In some places they refer to global screen coordinates, i.e. the thing we get from the device. In other places they refer to the local/tranformed screen coordinates of some layer.

If you look at my coordinate systems diagram, https://bug935219.bugzilla.mozilla.org/attachment.cgi?id=8380975, you can see that there is a big difference between "global screen pixels" and "L's screen pixels". It would be helpful to know which is being used when we use e.g. a ScreenRect.

Note that the composition bounds of a layer L used to be stored in L's screen pixels (not global screen pixels), and bug 935219 changed it to be stored in L's ParentLayer pixels.

::: gfx/layers/client/ClientTiledThebesLayer.cpp
@@ +107,4 @@
>  
> +  // Compute the critical display port in LayoutDevice space
> +  mPaintData.mCriticalDisplayPort = LayoutDeviceIntRect::ToUntyped(RoundedOut(
> +    ((metrics.mCriticalDisplayPort + metrics.GetScrollOffset()) * metrics.GetZoomToParent()) / metrics.GetParentResolution()));

Whose LayoutDevice pixels is mPaintData.mCriticalDisplayPort supposed to be in?

We have a few different layers in play here:
  - the thebes layer, let's call it T
  - the scrollable (container) layer, let's call it L
  - L's parent layer, let's call it M

(I chose L and M to match up with the terminology in https://bug935219.bugzilla.mozilla.org/attachment.cgi?id=8380975).

The metrics belongs to L, to metrics.mCriticalDisplayPort is in L's CSS pixels. GetScrollOffset() is likewise in L's CSS pixels, so adding those is fine. So far, so good.

Multiplying L's CSS pixels by L's GetZoomToParent() gives you M's Layer pixels (also known as L's ParentLayer pixels).

Dividing M's layer pixels by L's parent resolution (which is M's cumulative resolution) gives you M's LayoutDevice pixels. Is that what you want?
Blocks: 984430
(In reply to Botond Ballo [:botond] from comment #18)
> Comment on attachment 8393682 [details] [diff] [review]
> WIP: Fix progressive update calculations
> 
> Review of attachment 8393682 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> On the whole, the code makes more sense to me with these changes than it did
> without, but I'd still like to go over it with you to understand some parts.
> 
> I think part of the source of confusion is that we overload the term "screen
> pixels" (and the types Screen[Pixel|Point|Rect]). In some places they refer
> to global screen coordinates, i.e. the thing we get from the device. In
> other places they refer to the local/tranformed screen coordinates of some
> layer.

Yes, this is confusing :) I think in this patch, whenever I use Screen coordinates, I mean global screen coordinates. Note that the progressive update for b2g is likely broken still in this patch, I'll fix that before submitting for review of course.

> If you look at my coordinate systems diagram,
> https://bug935219.bugzilla.mozilla.org/attachment.cgi?id=8380975, you can
> see that there is a big difference between "global screen pixels" and "L's
> screen pixels". It would be helpful to know which is being used when we use
> e.g. a ScreenRect.
> 
> Note that the composition bounds of a layer L used to be stored in L's
> screen pixels (not global screen pixels), and bug 935219 changed it to be
> stored in L's ParentLayer pixels.
> 
> ::: gfx/layers/client/ClientTiledThebesLayer.cpp
> @@ +107,4 @@
> >  
> > +  // Compute the critical display port in LayoutDevice space
> > +  mPaintData.mCriticalDisplayPort = LayoutDeviceIntRect::ToUntyped(RoundedOut(
> > +    ((metrics.mCriticalDisplayPort + metrics.GetScrollOffset()) * metrics.GetZoomToParent()) / metrics.GetParentResolution()));
> 
> Whose LayoutDevice pixels is mPaintData.mCriticalDisplayPort supposed to be
> in?

They should be in the parent layer's, I believe.

> We have a few different layers in play here:
>   - the thebes layer, let's call it T
>   - the scrollable (container) layer, let's call it L
>   - L's parent layer, let's call it M
> 
> (I chose L and M to match up with the terminology in
> https://bug935219.bugzilla.mozilla.org/attachment.cgi?id=8380975).
> 
> The metrics belongs to L, to metrics.mCriticalDisplayPort is in L's CSS
> pixels. GetScrollOffset() is likewise in L's CSS pixels, so adding those is
> fine. So far, so good.
> 
> Multiplying L's CSS pixels by L's GetZoomToParent() gives you M's Layer
> pixels (also known as L's ParentLayer pixels).
> 
> Dividing M's layer pixels by L's parent resolution (which is M's cumulative
> resolution) gives you M's LayoutDevice pixels. Is that what you want?

I think it's what I almost want - possible I need one further resolution multiply to get it into L's LayoutDevice space(?) This last point I'm not totally sure about.


I realised the part that's wrong with the TiledContentClient changes is that the calculated composition bounds need to be offset by the displayport origin. If I haven't figured out the correct way of doing that, we'll talk about it later.
So this fixes the issue of the flashing and incorrect coordinates for me, but I don't think it's entirely correct (specifically, using ParentLayer coordinates instead of Layer coordinates in places).

I'd like to discuss this with you, but if for whatever reason we don't get to that today, please leave any relevant feedback here :)

This only fixes the Android path, the B2G path remains untested (and likely broken). I'll get to the B2G path once we're sure the Android path is correct. Note that the B2G path will have to deal with scrollable sub-frames and not just the root scrollable frame, so the part that calculates mScrollOffset in ClientTiledThebesLayer::BeginPaint will need to be #ifdef'd to do the right thing when !android (I think).
Attachment #8393682 - Attachment is obsolete: true
Attachment #8393682 - Flags: feedback?(botond)
Attachment #8394025 - Flags: feedback?(botond)
(In reply to Timothy Nikkel (:tn) from comment #11)
> But we should fix this properly by:
> 1) understanding why the view bounds are what they are
> 2) either fixing that, or changing the code to get the bounds in a different
> way to be correct.

Can we file a separate bug to track the possibly-wrong view bounds?
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #21)
> (In reply to Timothy Nikkel (:tn) from comment #11)
> > But we should fix this properly by:
> > 1) understanding why the view bounds are what they are
> > 2) either fixing that, or changing the code to get the bounds in a different
> > way to be correct.
> 
> Can we file a separate bug to track the possibly-wrong view bounds?

I filed bug 985992.
tracking-fennec: ? → 30+
(In reply to Chris Lord [:cwiiis] from comment #19)
> (In reply to Botond Ballo [:botond] from comment #18)
> > Comment on attachment 8393682 [details] [diff] [review]
> > WIP: Fix progressive update calculations
> > 
> > Review of attachment 8393682 [details] [diff] [review]:
> > -----------------------------------------------------------------
> > 
> > On the whole, the code makes more sense to me with these changes than it did
> > without, but I'd still like to go over it with you to understand some parts.
> > 
> > I think part of the source of confusion is that we overload the term "screen
> > pixels" (and the types Screen[Pixel|Point|Rect]). In some places they refer
> > to global screen coordinates, i.e. the thing we get from the device. In
> > other places they refer to the local/tranformed screen coordinates of some
> > layer.
> 
> Yes, this is confusing :) I think in this patch, whenever I use Screen
> coordinates, I mean global screen coordinates. 

OK, good to know. Before bug 935219 landed, mCompositionBounds was in the
local screen coordinates of the scrollable layer, so that explains why this
code was so confusing before.

> They should be in the parent layer's, I believe.
> 
> > We have a few different layers in play here:
> >   - the thebes layer, let's call it T
> >   - the scrollable (container) layer, let's call it L
> >   - L's parent layer, let's call it M
> > 
> > (I chose L and M to match up with the terminology in
> > https://bug935219.bugzilla.mozilla.org/attachment.cgi?id=8380975).
> > 
> > The metrics belongs to L, to metrics.mCriticalDisplayPort is in L's CSS
> > pixels. GetScrollOffset() is likewise in L's CSS pixels, so adding those is
> > fine. So far, so good.
> > 
> > Multiplying L's CSS pixels by L's GetZoomToParent() gives you M's Layer
> > pixels (also known as L's ParentLayer pixels).
> > 
> > Dividing M's layer pixels by L's parent resolution (which is M's cumulative
> > resolution) gives you M's LayoutDevice pixels. Is that what you want?
> 
> I think it's what I almost want - possible I need one further resolution
> multiply to get it into L's LayoutDevice space(?) This last point I'm not
> totally sure about.

If you look at https://bug935219.bugzilla.mozilla.org/attachment.cgi?id=8380975,
you can see that there is a big difference between M's LayoutDevice space and
L's LayoutDevice space. If what you want is L's LayoutDevice space, and what
you're starting from is L's CSS space, all you need to do is multiply by
metrics.mDevPixelsPerCSSPixels.
Spending a bit more time on this, actually, I think I was right before using ParentLayer space and LayoutDevice space rather than global Screen space. But some of the transforms were wrong - I'll work up another patch hopefully before the daily meeting.
tracking-fennec: 30+ → ?
tracking-fennec: ? → 30+
I think this might be the one, but I'll talk to you about it now and we can go through it.
Attachment #8394025 - Attachment is obsolete: true
Attachment #8394025 - Flags: feedback?(botond)
Attachment #8394884 - Flags: review?(botond)
Address the issues that we identified when we went over the patch.
Attachment #8394884 - Attachment is obsolete: true
Attachment #8394884 - Flags: review?(botond)
Attachment #8394993 - Flags: review?(botond)
Note, this doesn't fix bug 983169 (though it can make it better), and there are still odd glitches in the low precision buffer that I'm not really sure about. These can be investigated separately.
Comment on attachment 8394993 [details] [diff] [review]
Fix progressive update calculations v4

Review of attachment 8394993 [details] [diff] [review]:
-----------------------------------------------------------------

r=me with the possible null pointer access addressed

::: gfx/layers/client/ClientTiledThebesLayer.cpp
@@ +123,5 @@
> +  // Calculate the transform required to convert ParentLayer space of our
> +  // display port parent to LayoutDevice space of this layer.
> +  gfx::Matrix4x4 transform = scrollParent->GetTransform();
> +  for (ContainerLayer* parent = scrollParent->GetParent();
> +       parent != displayPortParent->GetParent()->GetParent();

Could displayPortParent->GetParent() be null?

@@ +157,5 @@
> +  // async transforms have occurred, we can use the zoom for this.
> +  mPaintData.mResolution = displayportMetrics.GetZoomToParent();
> +
> +  // Store the parent composition bounds in LayoutDevice units.
> +  // This is actually in LayoutDevice units of the ParentLayer, but because

s/the ParentLayer/scrollParent's parent layer
Attachment #8394993 - Flags: review?(botond) → review+
I wouldn't be surprised if this caused some failures (build or test), so let's push to try first in case. This code is only run on Android and it could affect b2g building, so I'll restrict to those.

https://tbpl.mozilla.org/?tree=Try&rev=c3d9914aff94
Rebased (s/mScrollId/GetScrollId()) and pushed:

https://hg.mozilla.org/integration/mozilla-inbound/rev/fef29327a1b1
Comment on attachment 8392265 [details] [diff] [review]
Workaround for wrong composition bounds

[Approval Request Comment]
Bug caused by (feature/regressing bug #): Progressive update regions end up being too large
User impact if declined: 
Testing completed (on m-c, etc.): On m-c for a while
Risk to taking this patch (and alternatives if risky): Low
String or IDL/UUID changes made by this patch: None
Attachment #8392265 - Flags: approval-mozilla-aurora?
Attachment #8392265 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment on attachment 8394993 [details] [diff] [review]
Fix progressive update calculations v4

[Approval Request Comment]
Bug caused by (feature/regressing bug #): This bug
User impact if declined: Awful experience on many sites
Testing completed (on m-c, etc.): On m-c for a few days
Risk to taking this patch (and alternatives if risky): I think there's risk that this isn't 100% correct, but the situation prior to the patch is definitely worse.
String or IDL/UUID changes made by this patch: None.
Attachment #8394993 - Flags: approval-mozilla-aurora?
(In reply to Wes Kocher (:KWierso) from comment #14)
> https://hg.mozilla.org/mozilla-central/rev/b8f26064637f

https://hg.mozilla.org/releases/mozilla-aurora/rev/f7543e0fd69e

Also, should this bug still be open?
Target Milestone: --- → Firefox 31
This is fixed now that Chris' patch has landed (the 'leave-open' was intended for the first patch which landed earlier).
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Keywords: leave-open
Resolution: --- → FIXED
The second patch needs to be rebased for Aurora uplift.
Flags: needinfo?(chrislord.net)
Rebased Chris' patch to apply to aurora. Hope I did it right!
Flags: needinfo?(chrislord.net)
Attachment #8394993 - Flags: approval-mozilla-aurora?
Status: RESOLVED → VERIFIED
No longer blocks: 976424
Depends on: 988370
Duplicate of this bug: 986991
Depends on: 988882
Depends on: 988889
Depends on: 989278
Not sure whether to comment here or in bug 986991, but still a big issue when typing out comments in Facebook.
Depends on: 993554
Depends on: 997038
Depends on: 1009127
While panning and zooming  there is no flickering, so:
Verified fixed on:
Build: Firefox for Android 30 Beta 4
Device: Alcatel One Touch (Android 4.1.2)
You need to log in before you can comment on or make changes to this bug.