Closed Bug 874950 Opened 7 years ago Closed 6 years ago

Opaque content can incorrectly occlude content that has a displayport set

Categories

(Core :: Layout, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla31
tracking-b2g backlog
Tracking Status
firefox28 --- affected
firefox29 --- affected
firefox30 --- fixed
firefox31 --- fixed

People

(Reporter: wesj, Assigned: cwiiis)

Details

Attachments

(4 files, 5 obsolete files)

Scrolling towards the top of the page on Twitter, I often see some areas of low res content flashing by at the top of the page. It looks like our viewport prediction is off, but Chris tells me that its actually likely a problem with us not drawing the content under opaque fixed position layers, leading to us not having anything nice to show as we scroll.
Yup - it's actually a bug that we retain the low-res content, but I guess it's marginally nicer than seeing a blank area... This is likely a bug in the display-list code.
Cool ... this might explain an intermittent oddness I see in testing bug 840144 ...
This worth tracking?
(In reply to Aaron Train [:aaronmt] from comment #3)
> This worth tracking?

It's a pretty long-standing bug, so probably not. Unless someone has the spare cycles to take a look at it...
Given we've fixed similar bugs quite recently (for example, bug 974643), I think it'd be good to get this back on the radar.

This manifests quite badly on mobile Twitter with APZC enabled, but also, more worryingly, manifests on the Firefox Marketplace. It's a lot more obvious on Twitter, where it manifests as a grey blank space in content that shifts during content painting as you scroll upwards.

It's more obviously visible on devices that perform well (such as the Galaxy Nexus, or I assume, some of the newly announced devices), but is quite visible on every device I've tried.

Fixing this would also improve our scrolling performance in these situations, as we wouldn't have partial-tile invalidations during scrolling.

n?tn as I think the fix for this will touch similar areas to the fix for bug 974643.
blocking-b2g: --- → 1.3?
Component: General → Layout
Flags: needinfo?(tnikkel)
OS: Linux → All
Product: Firefox for Android → Core
Hardware: x86 → All
Version: unspecified → Trunk
fwiw, this only affects us when using a displayport, so this affects B2G (as of 1.3), Firefox for Android and Metro Firefox.
If this has been a long standing bug, then I don't think we would block on this.
blocking-b2g: 1.3? → backlog
(In reply to Jason Smith [:jsmith] from comment #7)
> If this has been a long standing bug, then I don't think we would block on
> this.

It's a long-standing bug, but in code that previously we only ever shipped on fennec. Now we're shipping that code on FirefoxOS (and Metro) and the results of it are a lot more obvious. I think the fix will be relatively simple, but fair enough, not my call.
To get an idea of the impact of this bug, here's me scrolling upwards on Twitter (ignore the stretched bar at the top, which is bug 962629): https://www.dropbox.com/s/5hlpe6jo1x3ttxv/2014-03-11%2016.53.23.mp4
For what it's worth, I didn't make clear that this would be a regression in 1.3 vs. <=1.2. Taking that into consideration, and with the video above, does that alter your decision?
Flags: needinfo?(jsmith)
Retitling this bug for accuracy. The real problem is that anything can occlude a display item that has a displayport - if the resultant layer will be async scrolled, we can't assume that content above it will remain above it unless they share animated geometry roots, I guess.
Summary: Content is not rendered underneath opaque fixed position content layers → Opaque content can incorrectly occlude content that has a displayport set
(In reply to Chris Lord [:cwiiis] from comment #10)
> For what it's worth, I didn't make clear that this would be a regression in
> 1.3 vs. <=1.2. Taking that into consideration, and with the video above,
> does that alter your decision?

I don't think so - mainly because Fennec has already shipped with the problem for multiple releases.
Flags: needinfo?(jsmith)
(In reply to Jason Smith [:jsmith] from comment #12)
> (In reply to Chris Lord [:cwiiis] from comment #10)
> > For what it's worth, I didn't make clear that this would be a regression in
> > 1.3 vs. <=1.2. Taking that into consideration, and with the video above,
> > does that alter your decision?
> 
> I don't think so - mainly because Fennec has already shipped with the
> problem for multiple releases.

ok. I still think this is important, but the decision has been made. One final note, this doesn't affect fennec as badly as low precision content means the blank area you see in the video is usually still rendered, but in low resolution (and because of the generally higher power of Android devices, it's displayed for less time).

I'll see if I can cook up a patch for this and maybe that will change things, as I think this is a really visible bug we shouldn't ship with on b2g (and it doesn't just affect Twitter, it's just a good example - it affects any page with an opaque fixed position content. Not to mention the increased checkerboarding it causes).
Let's at least make sure QA is aware of this so that they are aware and can make sure to properly categorize the severity.
A review from either of you would be great, please unset the other unless you think they should have input too :)

If this isn't the way to do this, advice would be greatly appreciated.
Assignee: nobody → chrislord.net
Status: NEW → ASSIGNED
Attachment #8390035 - Flags: review?(tnikkel)
Attachment #8390035 - Flags: review?(matt.woodrow)
Flags: needinfo?(tnikkel)
So tn pointed out that mDisallowFixedOcclusion (or whatever I called it) was uninitialised, and that was the only reason this worked on b2g. mattwoodrow suggested that maybe it could be done without the boolean.

Here's my attempt at addressing both issues - the patch works for me on b2g, but I've not tested it on Android yet.
Attachment #8390035 - Attachment is obsolete: true
Attachment #8390035 - Flags: review?(tnikkel)
Attachment #8390035 - Flags: review?(matt.woodrow)
Attachment #8390164 - Flags: review?(tnikkel)
Attachment #8390164 - Flags: review?(matt.woodrow)
Discussion on IRC was that the last method wouldn't work as fixed items and scrolled items share the same list - so this alters ComputeVisibilityForSublist to take a frame as argument and uses that instead of assuming the first in the list will be suitable.
Attachment #8390164 - Attachment is obsolete: true
Attachment #8390164 - Flags: review?(tnikkel)
Attachment #8390164 - Flags: review?(matt.woodrow)
Attachment #8390816 - Flags: review?(tnikkel)
Attachment #8390816 - Flags: review?(matt.woodrow)
Comment on attachment 8390816 [details] [diff] [review]
Don't let fixed position items occlude displayport lists v3

Review of attachment 8390816 [details] [diff] [review]:
-----------------------------------------------------------------

I think tn should probably review this, he knows it better than I do.

::: layout/base/nsDisplayList.cpp
@@ +1030,5 @@
>    PROFILER_LABEL("nsDisplayList", "ComputeVisibilityForRoot");
>    nsRegion r;
>    r.And(*aVisibleRegion, GetBounds(aBuilder));
> +  return ComputeVisibilityForSublist(aBuilder,
> +                                     aBuilder->RootReferenceFrame()->GetFirstPrincipalChild(),

This feels a bit fragile, or at least non-obvious that it's finding the scrollable frame.

Could we pass in 'rootScrollFrame' from the nsLayoutUtils caller instead?
Attachment #8390816 - Flags: review?(matt.woodrow)
(In reply to Matt Woodrow (:mattwoodrow) from comment #19)
> ::: layout/base/nsDisplayList.cpp
> @@ +1030,5 @@
> >    PROFILER_LABEL("nsDisplayList", "ComputeVisibilityForRoot");
> >    nsRegion r;
> >    r.And(*aVisibleRegion, GetBounds(aBuilder));
> > +  return ComputeVisibilityForSublist(aBuilder,
> > +                                     aBuilder->RootReferenceFrame()->GetFirstPrincipalChild(),
> 
> This feels a bit fragile, or at least non-obvious that it's finding the
> scrollable frame.
> 
> Could we pass in 'rootScrollFrame' from the nsLayoutUtils caller instead?

That'd be fine by me :)
n?vingtetun to find out if this helps with performance of position:fixed and position:sticky at all.
Flags: needinfo?(21)
This (hopefully) addresses the issues we spoke about on IRC.
Attachment #8390816 - Attachment is obsolete: true
Attachment #8390816 - Flags: review?(tnikkel)
Attachment #8391681 - Flags: review?(tnikkel)
Comment on attachment 8391681 [details] [diff] [review]
Don't let fixed position items occlude displayport lists v4

I think we need to stop the loop when we hit the root frame of the document with the displayport frame. Otherwise the item would be in a fixed subtree in the parent document. You can use GetRootFrame on the presshell of the display port frame.

Also please check that it has fixed style, but also that it has a parent, and the parent does not have a parent (ie is the root frame) so we don't include fixed stuff which gets turned into abs stuff inside a transform.

I think it would be cleaner if we just called SetDisplayPort in ComputeVisibility of the two display items, that's when we need it, and saves duplicating the code.

I think using mScrollFrame for nsDisplayScrollLayer items is more consistent (we use the root scroll frame for the one case).
"I think we need to stop the loop when we hit the root frame of the document with the displayport frame. Otherwise the item would be in a fixed subtree in the parent document. You can use GetRootFrame on the presshell of the display port frame."

Done. Is this right though? Don't we still want to stop occlusion even if it's in the parent document? (although is this even possible in terms of displaylist layout?)

"Also please check that it has fixed style, but also that it has a parent, and the parent does not have a parent (ie is the root frame) so we don't include fixed stuff which gets turned into abs stuff inside a transform."

Done.

"I think it would be cleaner if we just called SetDisplayPort in ComputeVisibility of the two display items, that's when we need it, and saves duplicating the code."

I removed the functions and member variables entirely, just went with optional parameters, which I think is smaller and cleaner.

"I think using mScrollFrame for nsDisplayScrollLayer items is more consistent (we use the root scroll frame for the one case)."

Done.
Attachment #8391681 - Attachment is obsolete: true
Attachment #8391681 - Flags: review?(tnikkel)
Attachment #8392166 - Flags: review?(tnikkel)
Sorry for churn, forgot to remove the member variables from nsDisplayList.
Attachment #8392166 - Attachment is obsolete: true
Attachment #8392166 - Flags: review?(tnikkel)
Attachment #8392168 - Flags: review?(tnikkel)
Comment on attachment 8392168 [details] [diff] [review]
Don't let fixed position items occlude displayport lists v5.1

>@@ -1148,19 +1151,45 @@ nsDisplayList::ComputeVisibilityForSubli
>+        nsIFrame* displayPortRoot = aDisplayPortFrame->PresContext()->PresShell()->GetRootFrame();

Can you just calculate this once, outside the loop at the start?

>+   * @param aInDisplayPort true if the item for which this list corresponds
>+   * is within a displayport.
>+   * @param aDisplayPortFrame If the item for which this list corresponds is
>+   * within a displayport, the scroll frame for which that display port
>+   * applies.

No more aInDisplayPort.

This is a minor point, but I'd rather not have a false comment lieing around: aDisplayPortFrame is the root frame (viewport) in the nsDisplaySubDocument case, not the (root) scroll frame. It doesn't matter to the code though. Maybe just add "For root scroll frames you can pass the root frame instead."
Attachment #8392168 - Flags: review?(tnikkel) → review+
(In reply to Chris Lord [:cwiiis] from comment #24)
> "I think we need to stop the loop when we hit the root frame of the document
> with the displayport frame. Otherwise the item would be in a fixed subtree
> in the parent document. You can use GetRootFrame on the presshell of the
> display port frame."
> 
> Done. Is this right though? Don't we still want to stop occlusion even if
> it's in the parent document? (although is this even possible in terms of
> displaylist layout?)

Correct, we don't want to occlude if a fixed position item in a parent document is on top of an async-scrollable item in a child document. But that's not the situation we have here. Fixed position items that are in a parent document to the document of aDisplayPortFrame can never be in the display list. When we process the display list for the parent document is when we would consider this situation, and so then we could determine that the fixed items should not occlude.

Stopping at the root frame of aDisplayPortFrame's document also saves a bunch of needless pointer chasing if we are in a subdocument, but there is also the correctness aspect.
Comments addressed and pushed to inbound: https://hg.mozilla.org/integration/mozilla-inbound/rev/7364948c6566
Comment on attachment 8392168 [details] [diff] [review]
Don't let fixed position items occlude displayport lists v5.1

Review of attachment 8392168 [details] [diff] [review]:
-----------------------------------------------------------------

::: layout/base/nsDisplayList.cpp
@@ +1164,5 @@
> +      // is and they're in the same document, don't let it occlude this list.
> +      bool occlude = true;
> +      if (aDisplayPortFrame) {
> +        nsIFrame* displayPortRoot = aDisplayPortFrame->PresContext()->PresShell()->GetRootFrame();
> +        for (nsIFrame* frame = item->Frame(); frame && frame != displayPortRoot;

So for every visible item, we're traversing (almost) all frame ancestors. That's a lot of overhead :-(. Isn't there another way to do this?

Honestly I think we should try just removing all occlusion culling from ComputeVisibility (in which case we should rename it!). That would remove a bunch of overhead. We'd still do per-ThebesLayer culling after FrameLayerBuilder has run.
Attachment #8392168 - Flags: feedback-
Okay, I figured it would be roughly the same amount of work as getting the animated geometry root, but I guess we don't actually do that for every display item in a lot of cases.
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #29)
> Comment on attachment 8392168 [details] [diff] [review]
> Don't let fixed position items occlude displayport lists v5.1
> 
> Review of attachment 8392168 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: layout/base/nsDisplayList.cpp
> @@ +1164,5 @@
> > +      // is and they're in the same document, don't let it occlude this list.
> > +      bool occlude = true;
> > +      if (aDisplayPortFrame) {
> > +        nsIFrame* displayPortRoot = aDisplayPortFrame->PresContext()->PresShell()->GetRootFrame();
> > +        for (nsIFrame* frame = item->Frame(); frame && frame != displayPortRoot;
> 
> So for every visible item, we're traversing (almost) all frame ancestors.
> That's a lot of overhead :-(. Isn't there another way to do this?
> 
> Honestly I think we should try just removing all occlusion culling from
> ComputeVisibility (in which case we should rename it!). That would remove a
> bunch of overhead. We'd still do per-ThebesLayer culling after
> FrameLayerBuilder has run.

I tried removing that occlusion culling block while experimenting and it killed the frame-rate on b2g (I guess we ended up with more layers(?)).

Could we mark any fixed items when they're added to the displaylist instead, then propagate that flag when adding children?
That sounds like a good idea. Couldn't we even just flip a flag on the builder in nsFrame::BuildDisplayListForChild when considering a fixed child (flipping it back when done), the flag would then make the builder mark any new items created as fixed?
https://hg.mozilla.org/mozilla-central/rev/7364948c6566
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla31
(In reply to Chris Lord [:cwiiis] from comment #31)
> I tried removing that occlusion culling block while experimenting and it
> killed the frame-rate on b2g (I guess we ended up with more layers(?)).

That would have likely removed all occlusion culling. We do two passes: a global compute visibility pass, and then another per-layer that only considers the items in one layer. The same machinery is used for both. roc's suggestion was to just skip the first.
What do you think of doing this? We have to use an extra flag on display items.
Attachment #8393362 - Flags: review?(roc)
One wrinkle: if we don't unset the bit when going into subdocument we wouldn't do any occlusion in subdocuments that are in fixed pos in the parent doc. This makes the fix not apply if a subdoc in a fixed pos subtree in the parent doc occludes things in an important manner, but that's probably not too important.
Attachment #8393364 - Flags: review?(roc)
But appying my patches no longer seems to fix the bug...
The fix was obvious. I'll fold this in when landing.
Attachment #8393965 - Flags: review?(roc)
I'm not sure what flags to set to ask for uplift to 1.4 here, so requesting blocking.

This is a highly visible issue that causes poor performance and visual glitches whenever there's opaque fixed position content in a page. See the video in comment #9 for an example, where it manifests on twitter.com - the video was taken on a Galaxy Nexus, which has superior specifications to most of the phones that ship (where this bug has a larger impact). Note that this is a regression - without APZC, this bug is not exposed.

We have a fix, and this affects many more sites and apps than just Twitter. I really think we should uplift this.
blocking-b2g: backlog → 1.4?
Comment on attachment 8392168 [details] [diff] [review]
Don't let fixed position items occlude displayport lists v5.1

I'm told Aurora approval is what's required to get this in 1.4.

[Approval Request Comment]
Bug caused by (feature/regressing bug #): Blank areas and poor performance when scrolling sites with opaque, fixed position elements
User impact if declined: Sites like Twitter.com feel bad
Testing completed (on m-c, etc.): Initial patch on m-c for a few days, tested locally
Risk to taking this patch (and alternatives if risky): Low risk. Small risk that some layers do more drawing than they could get away with.
String or IDL/UUID changes made by this patch: None

Please consider this request to include the other patches that were committed to improve the performance of this fix.
Attachment #8392168 - Flags: approval-mozilla-aurora?
(In reply to Chris Lord [:cwiiis] from comment #41)
> I'm not sure what flags to set to ask for uplift to 1.4 here, so requesting
> blocking.
> 

There was already active decision to blocking- on 1.3, so that decision still stands for 1.4. Asking for aurora approval will allow this to land on 1.4 if it's approved.
blocking-b2g: 1.4? → backlog
Attachment #8392168 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Pushed to aurora: https://hg.mozilla.org/releases/mozilla-aurora/rev/d6b4eebd92ed
Target Milestone: mozilla31 → mozilla30
TM = when it was fixed on m-c. Status flags track uplifts.
Target Milestone: mozilla30 → mozilla31
(In reply to Chris Lord [:cwiiis] from comment #21)
> n?vingtetun to find out if this helps with performance of position:fixed and
> position:sticky at all.

I didn't measure it directly, but I removed a lot of position: sticky in our apps and I still see more checkerboarding for the contacts app (where I have not yet replaced position: sticky with some custom code).
Flags: needinfo?(21)
Depends on: 1000423
No longer depends on: 1000423
blocking-b2g: backlog → ---
You need to log in before you can comment on or make changes to this bug.