Closed Bug 936500 Opened 11 years ago Closed 10 years ago

Requesting a displayport outside the currently visible region fails to build a layer for the content

Categories

(Core :: Layout, defect)

x86_64
Windows 8.1
defect
Not set
normal

Tracking

()

VERIFIED FIXED
mozilla29
blocking-b2g 1.3+
Tracking Status
firefox28 + verified
firefox29 --- verified
b2g-v1.3 --- verified
b2g-v1.3T --- fixed
b2g-v1.4 --- verified

People

(Reporter: jimm, Assigned: cwiiis)

References

Details

(Whiteboard: [beta28][layout])

Attachments

(4 files, 2 obsolete files)

Attached video WP_20131108_09_27_59_Pro.webmhd.webm (obsolete) —
Str:

1) open a long page of text
2) flick the page down repeatedly

result: often you'll see a while area oposite the scroll direction and text will turn blurry. 

3) stop the scroll
4) scroll a small amount

result: the view corrects
to view the video, use chrome or in fx, start around seven seconds in.
Whiteboard: [triage]
Whiteboard: [triage] → [block28]
The patch on bug 899154 should help here, if not fix it entirely. (I'm just saying that based on the symptoms described in comment 0)
Depends on: 899154
The blurry text problem seems to have goner away. However I'm seeing pretty broken scrolling via flicks on the nightly from sunday for long pages of text. This seems to have become worse over the last few days.
Can you describe the exact symptoms you're seeing (along with STR if you have any)? I can intermittently repro an issue on the alice in wonderland page where if I flick around for a while the content area eventually blanks to the background color of the page and any attempts at panning or zooming have no visible effect. Reloading the page fixes it. Is that the thing you're seeing?
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #4)
> Can you describe the exact symptoms you're seeing (along with STR if you
> have any)? I can intermittently repro an issue on the alice in wonderland
> page where if I flick around for a while the content area eventually blanks
> to the background color of the page and any attempts at panning or zooming
> have no visible effect. Reloading the page fixes it. Is that the thing
> you're seeing?

yep!
The issue I was seeing appears to be because the scrollable container layer for the content vanishes. Possibly another layout bug, not really sure. I determined this by turning on the APZC logging in APZCTreeManager.cpp, which showed me that the APZC instance was getting destroyed at the same time that I saw the screen blank. I added a call to aRoot->Manager()->Dump() at [1] to see what the layer tree looked like to cause that, and the layer tree dump there showed there were no scrollable layers in the tree. The output is attached showing the layer tree. The content repaint requests from the APZC just prior to this happening all look sane so I'm not sure why this is happening.

[1] http://mxr.mozilla.org/mozilla-central/source/gfx/layers/composite/APZCTreeManager.cpp?rev=53e39839a03f#96
I got a longer log and based on that I suspect it might be that the displayport calculation puts it really far from the scroll offset (because of the high panning velocity). It's still within the page bounds so it should be valid but maybe it's tripping some sanity check somewhere. I'll keep digging.
Ok, yeah this seems to happen when the displayport doesn't intersect the visible area. The attached patch allows reproducing it easily - just load a long page and scroll down slowly. The patch pushes the displayport down based on the y-scroll-offset, so you'll see the top part of the visible area increasingly blank. Once the blank white reaches the bottom of the screen it ends up in the state that this bug is about. We should be able to fix this in the APZC displayport calculation code but I want to understand a bit better why this is happening and if there are performance implications.
Assignee: nobody → bugmail.mozilla
Summary: Repeated flicks can result in broken bounds / blurry text → Requesting a displayport outside the currently visible region fails to build a layer for the content
I discussed this with :tn a bit, and he pointed out that since there's no layer created there's nowhere for the painted content to go, and so it's likely that no paint is happening at all or that it is discarded. This means that when the displayport is outside the visible area (which is a legitimate use case) we end up not having the displayport at all and so we can't async pan to it. So the "correct" fix here seems to be that we should still paint and layerize the displayport even if it doesn't intersect the visible region.

As a fallback approach we can modify APZC::CalculatePendingDisplayPort to not generate a displayport that is outside the composition bounds and that would also prevent us from getting stuck in this state, but it is not as good of a fix.

I'll continue looking into this to see if I can figure out exactly which bit of layout code needs updating.
Component: Panning and Zooming → Layout
Version: 26 Branch → Trunk
I spent some more time looking at this and as tn suspected it was in fact the ComputeVisibility calculation that was coming back with an empty region. I did find the call to SubtractFromVisibleRegion that was causing the region to get emptied out but I still don't have a good intuition as to what the different rects (particularly when nested a few layers down in the displaylist tree) so it's hard for me say what needs to change.
Assignee: bugmail.mozilla → nobody
Assignee: nobody → tnikkel
Turns out it's Metro only, so tn can't cover it.
Assignee: tnikkel → ajones
ajones not the right person for it, I'll find somebody else.
Assignee: ajones → nobody
Whiteboard: [block28] → [layout][layout]
Whiteboard: [layout][layout] → [layout]
Blocks: metrobacklog
No longer blocks: metro-apzc
With the patch in bug 944938, this bug is reproducible on Mac (e.g. on planet.mozilla.org), so tn could look into it now if he wants to.
Sorry, messed up those flags.
Whiteboard: [layout] → [beta28][layout]
Blocks: metrov1backlog
No longer blocks: metrobacklog
Assignee: nobody → tnikkel
I spent too long trying to debug this and didn't get anywhere. I don't think ComputeVisibility is at fault - nsDisplayScrollLayer doesn't alter the visible region and it still returns true in the case that the displayport doesn't intersect with the frame rect.

I was suspect of GetBounds, in that nsDisplayScrollLayer will return a bounds that relates to its displayport, I think, where as I think it should return the bounds of the scrollframe - though making that change didn't seem to help (I may have made it incorrectly though, so if there's any sense in this, it's worth double-checking by someone with more layout knowledge).

I also tried modifying SubtractFromVisibleRegion so it does nothing, but the bug persists, so that further rules out ComputeVisibility I think.

At this point, I think it's not so simple that I can easily fix it, so I'm going to work around the issue in bug 943846 and we can remove that work-around when this bug gets fixed. I'm dubious of the displayport heuristics in APZC anyway, but that's an aside.
(In reply to Chris Lord [:cwiiis] from comment #15)
> I was suspect of GetBounds, in that nsDisplayScrollLayer will return a
> bounds that relates to its displayport, I think, where as I think it should
> return the bounds of the scrollframe - though making that change didn't seem
> to help (I may have made it incorrectly though, so if there's any sense in
> this, it's worth double-checking by someone with more layout knowledge).

That sounds plausible. I can take a look at your patch.
Requesting tracking since as a result of this bug, flings can result in blank pages being displayed, which the user must close to fix.
(In reply to Jim Mathies [:jimm] from comment #17)
> Requesting tracking since as a result of this bug, flings can result in
> blank pages being displayed, which the user must close to fix.

Just to note, if the page goes *completely* blank, or shifts in a direction, revealing blank area, this is likely a different bug. If you're talking about individual elements going blank, that's more likely to be caused by this bug.
(In reply to Timothy Nikkel (:tn) from comment #16)
> (In reply to Chris Lord [:cwiiis] from comment #15)
> > I was suspect of GetBounds, in that nsDisplayScrollLayer will return a
> > bounds that relates to its displayport, I think, where as I think it should
> > return the bounds of the scrollframe - though making that change didn't seem
> > to help (I may have made it incorrectly though, so if there's any sense in
> > this, it's worth double-checking by someone with more layout knowledge).
> 
> That sounds plausible. I can take a look at your patch.

It was quite small, so I didn't save it - but I just implemented GetBounds on nsDisplayScrollLayer, and in the implementation I check if there's a displayport (in the same way that ComputeVisibility does). If there isn't, I just chain up to GetBounds on nsDisplayWrapList, otherwise I returned what I was hoping would be the scroll frame bounds - something like mScrollFrame->GetVisualOverflowRectRelativeToSelf() + mScrollFrame->GetOffsetToCrossDoc(ReferenceFrame());

I tried a few variations of different things, but couldn't get it quite right - I don't think my testing methodology was thorough enough though, and I wasn't sure what values I really wanted.
(In reply to Chris Lord [:cwiiis] from comment #18)
> (In reply to Jim Mathies [:jimm] from comment #17)
> > Requesting tracking since as a result of this bug, flings can result in
> > blank pages being displayed, which the user must close to fix.
> 
> Just to note, if the page goes *completely* blank, or shifts in a direction,
> revealing blank area, this is likely a different bug. If you're talking
> about individual elements going blank, that's more likely to be caused by
> this bug.

I'm assuming we're still dealing with the bug discussed in comments 4 - 8. If not we need to file a new bug for it.

To reproduce, fling a long page of text a few times quickly. After a few flings, the page will go blank, although it will maintain the correct background color. Further panning doesn't correct the issue.

This page is a good test case: http://www.gutenberg.org/files/11/11-h/11-h.htm
Attachment #829314 - Attachment is obsolete: true
Just to note, I'm going to push a work-around in bug 943846, but that should be backed out when this is fixed.
So changing nsDisplayScrollLayer::GetBounds to return this:

  return mScrollFrame->GetRect() -
         mScrollFrame->GetPosition() +
         mScrollFrame->GetOffsetToCrossDoc(ReferenceFrame());

does not fix the issue, for reference. Given that its ComputeVisibility doesn't change aVisibleRegion and still returns true in this case, I'm thinking the layer disappears somewhere in FrameLayerBuilder. Checking there. tn, if you're looking at this, do carry on, you'll certainly get there quicker than me, just had a moment and didn't want to give up :p
ooh, just noticed ShouldBuildLayerEvenIfInvisible...
ok, so returning true from ShouldBuildLayerEvenIfInvisible stops the layer from not being created, so it must be disappearing because its itemVisibleRect in FrameLayerBuilder comes up empty - which means either its GetBounds are wrong, or the mVisibleRect it sets in ComputeVisibility is wrong. Getting somewhere.
Got waylaid by a meeting, but just to update, the layer is disappearing because of the display item's clipping rect (which is the bounds of the scroll frame) not intersecting with the displayport in ContainerState::ProcessDisplayItems.

I wonder how we want to handle this... If nsDisplayScrollLayer just returns true from ShouldBuildLayerEvenIfInvisible, that stops the layer from disappearing, but it doesn't stop it from turning blank - it just means that as soon as another displayport gets set, it's fine.

I'm not really sure why that is, if the clip was being applied to drawing, we wouldn't see anything outside of the viewport at all...
(In reply to Chris Lord [:cwiiis] from comment #25)
> Got waylaid by a meeting, but just to update, the layer is disappearing
> because of the display item's clipping rect (which is the bounds of the
> scroll frame) not intersecting with the displayport in
> ContainerState::ProcessDisplayItems.
> 
> I wonder how we want to handle this... If nsDisplayScrollLayer just returns
> true from ShouldBuildLayerEvenIfInvisible, that stops the layer from
> disappearing, but it doesn't stop it from turning blank - it just means that
> as soon as another displayport gets set, it's fine.
> 
> I'm not really sure why that is, if the clip was being applied to drawing,
> we wouldn't see anything outside of the viewport at all...

So I actually think just the ShouldBuildLayerEvenIfInvisible being true is enough to fix this, and there's a separate issue that b2g is just not setting the right displayport sometimes, once its gotten far enough ahead. I'll continue to look into this.
Attachment #8343868 - Flags: review?(tnikkel)
I can confirm, my attached patch is enough to fix the layout issue, there is a further issue in AsyncPanZoomController - once the acceleration gets high enough, there's no guarantee that a displayport will be set once the scrolling finishes that actually contains the screen, it seems - I've only been able to reproduce this scrolling upwards though.
It confuses me that this happens, because if I read the code right, at the end of a fling, the Axis will set its velocity to zero and RequestContentRepaint() will get called, which will calculate a new displayport based on that velocity - which is lower than gMinSkateSpeed, so the displayport should be centred around the composition bounds.
Actually, we only need to do this if it has a displayport set.
Attachment #8343868 - Attachment is obsolete: true
Attachment #8343868 - Flags: review?(tnikkel)
Attachment #8343986 - Flags: review?(tnikkel)
So the reason the element can still end up white after this patch is that the fling seems to stop way too early - I've not debugged why yet, but it stops long before the velocity reaches gMinSkateSpeed or gFlingStoppedThreshold.

I'm going to be PTO on Monday and Tuesday, so if this needs to be fixed before then, best find someone else... Needinfo'ing milan, in case.
Flags: needinfo?(milan)
Timothy is probably our best bet.
Flags: needinfo?(milan) → needinfo?(tnikkel)
(In reply to Milan Sreckovic [:milan] from comment #32)
> Timothy is probably our best bet.

For the layout issue yes, but the stuff from comment 31 sounds like APZC, for which there are better bets then I. :)
Flags: needinfo?(tnikkel) → needinfo?(milan)
Sure, I was going off the component setting - Timothy, can you and Kats sort out where the bug is and set the component to panning & zooming if it's APZC side?
Flags: needinfo?(milan)
Comment on attachment 8343986 [details] [diff] [review]
Always build nsDisplayScrollLayer if it has a display-port

The reason just changing the bounds didn't work is that you need to change both the bounds and the visible region (they are both currently the display port) because ProcessDisplayItems intersects the visible region with the (clipped) bounds to get the visible region for the layer.

But to make that change we will also need to change ProcessDisplayItems to not set the visible region for scroll layer items (because it doesn't have the right visible region, it needs the display port).

I think we should fix the coordinate issue properly, but that is a little bit more invasive patch, so this patch seems like a reasonable quick fix.
Attachment #8343986 - Flags: review?(tnikkel) → review+
Well, that ended up being simpler than I expected :p
Assignee: tnikkel → chrislord.net
Status: NEW → ASSIGNED
Attachment #8347262 - Flags: review?(bugmail.mozilla)
Attachment #8347262 - Flags: review?(bugmail.mozilla) → review+
Blocks: 949548
https://hg.mozilla.org/mozilla-central/rev/56a287aa20fd
https://hg.mozilla.org/mozilla-central/rev/5d3f25ce4a6a
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla29
No longer blocks: metrov1backlog
The FlingAnimation error was a regression caused by bug 839911 so adding a dependency relationship.
Blocks: 839911
Whoops, I thought this landed in time for 28 for some reason - We should track this for b2g.
blocking-b2g: --- → 1.3?
Comment on attachment 8343986 [details] [diff] [review]
Always build nsDisplayScrollLayer if it has a display-port

We want this in both b2g 1.3 and Metro Aurora.

[Approval Request Comment]
Bug caused by (feature/regressing bug #): When flinging (without the displayport work-around patch), the flinged element can suddenly disappear
User impact if declined: 
Testing completed (on m-c, etc.): Been on m-c for a few days and tested locally
Risk to taking this patch (and alternatives if risky): Low
String or IDL/UUID changes made by this patch: None
Attachment #8343986 - Flags: approval-mozilla-aurora?
Comment on attachment 8347262 [details] [diff] [review]
Fix FlingAnimation

We want this on both b2g 1.3 and Metro Aurora

[Approval Request Comment]
Bug caused by (feature/regressing bug #): When flinging fast, you can end up with bits of the flinged element missing
User impact if declined: 
Testing completed (on m-c, etc.): Been on m-c for a few days
Risk to taking this patch (and alternatives if risky): Low, and possibly fixes some crashes (apparently)
String or IDL/UUID changes made by this patch: None
Attachment #8347262 - Flags: approval-mozilla-aurora?
The approval for this also needs to go hand-in-hand with the backout of bug 943846.
Chris, we need to backout bug 943846 from Aurora, so that one needs a 1.3+ as well?
blocking-b2g: 1.3? → 1.3+
(In reply to Milan Sreckovic [:milan] from comment #44)
> Chris, we need to backout bug 943846 from Aurora, so that one needs a 1.3+
> as well?

Requested on that bug, but not sure if I need to open a new bug to track that... I'll let whoever handles the flag-setting educate me.
Attachment #8343986 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Attachment #8347262 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
(In reply to Timothy Nikkel (:tn) from comment #35)
> Comment on attachment 8343986 [details] [diff] [review]
> Always build nsDisplayScrollLayer if it has a display-port
> 
> The reason just changing the bounds didn't work is that you need to change
> both the bounds and the visible region (they are both currently the display
> port) because ProcessDisplayItems intersects the visible region with the
> (clipped) bounds to get the visible region for the layer.
> 
> But to make that change we will also need to change ProcessDisplayItems to
> not set the visible region for scroll layer items (because it doesn't have
> the right visible region, it needs the display port).
> 
> I think we should fix the coordinate issue properly, but that is a little
> bit more invasive patch, so this patch seems like a reasonable quick fix.

I filed bug 951467 for this (with patch).
Target Milestone: mozilla29 → mozilla28
The target milestone is supposed to reflect the train on m-c when the patch landed. The status flags track uplifts and whether or not the bug was fixed on other branches.
Target Milestone: mozilla28 → mozilla29
Keywords: verifyme
Mozilla/5.0 (Windows NT 6.3; WOW64; rv:28.0) Gecko/20100101 Firefox/28.0

Verified as fixed on Windows 8.1 x64 using latest Aurora Metro 28.0a2 (buildID: 20131219004003).
The issue doesn't reproduce on the latest master build, when scrolling a text is not blurry

Device: Buri 1.4 MOZ
BuildID: 20140219040204
Gaia: ac06cfbd2baf6494ffbb668cc599e3892cd5e17b
Gecko: bf0e76f2a7d4
Version: 30.0a1
Firmware Version: v1.2-device.cfg
Verified as fixed v1.3. This issue does NOT reproduce on the latest v1.3 build:

3/26 Environmental Variables:
Device: Buri 1.3 MOZ RIL
BuildID: 20140326004002
Gaia: 812838ad0fabf51fa14435af562ddac6d26fa936
Gecko: ba97efb0da4b
Version: 28.0
Firmware Version: V1.2-device.cfg

Verified as fixed v1.4. This issues does NOT reproduce on the latest v1.4 build:

3/26 Environmental Variables:
Device: Buri 1.4 MOZ RIL
BuildID: 20140326000201
Gaia: 7e705dd4718d528974d99ac31866318d7e201152
Gecko: 4889124accfa
Version: 30.0a2
Firmware Version: V1.2-device.cfg

-

1) I am currently unable to verify this issue on the v1.3T Branch, leaving verifyme keyword.

2) Changing bug status to verified as this was tested against master v1.4.
Status: RESOLVED → VERIFIED
Keywords: verifyme
You need to log in before you can comment on or make changes to this bug.