Closed Bug 993554 Opened 10 years ago Closed 10 years ago

Overflow / positioned blocks disappear when scrolled

Categories

(Firefox for Android Graveyard :: Toolbar, defect)

ARM
Android
defect
Not set
normal

Tracking

(firefox30 verified, firefox31 verified, fennec30+)

VERIFIED FIXED
Firefox 31
Tracking Status
firefox30 --- verified
firefox31 --- verified
fennec 30+ ---

People

(Reporter: jryans, Assigned: kats)

References

()

Details

(Keywords: regression)

Attachments

(1 file, 2 obsolete files)

Fennec 31 (2014-04-08)
LG G2 (Android 4.2.2)

STR:

1. Go to https://istlsfastyet.com/#server-performance
2. In the "Server Performance" section, there is a status table with green / yellow / red blocks.
3. On small width devices, you can scroll the table horizontally
4. Try to scroll the table by dragging on it

ER:

The table should scroll smoothly (as it does on desktop, if you use Responsive Mode to shrink down to emulate a small width display).

AR:

When you drag, the entire table disappears for ~5 seconds.  It then reappears, seemingly scrolled to the right position, though it's hard to tell since you can't see it move.

The page applies the following styles to the tbody element at small widths:

@media only screen and (max-width: 40em) {
  tbody { 
    display: block; 
    width: auto; 
    position: relative; 
    overflow-x: auto; 
    white-space: nowrap; 
  }
}

I've seen this issue with code blocks, etc. on other pages too, likely using similar styling for phones.
Subframes are still scrolled synchronously on Fennec, so it's not going to be smooth unless it's really cheap to paint and/or you have a fast device. 5 seconds with the table disappearing seems excessive though. Are you able to profile this using the Gecko profiler [1]?

[1] https://developer.mozilla.org/en-US/docs/Performance/Profiling_with_the_Built-in_Profiler
Spent some time bisecting for a bit...

Last good revision: 50ae5e011f79
First bad revision: fef29327a1b1
Pushlog:
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=50ae5e011f79&tochange=fef29327a1b1

Looks like this issue was caused by bug 983208.  Before the changes of that bug, scrolling is visually glitchy as it mentions, but the content does not disappear entirely for many seconds as it does now.

Would the profiler output still be helpful here?
Blocks: 983208
Keywords: regression
No, in that case this is a tiling bug, no need for profiler output.
tracking-fennec: --- → ?
Assignee: nobody → bugmail.mozilla
Status: NEW → ASSIGNED
So I investigated this a bit. What appears to be happening is that the subframe gets layerized while it's being actively scrolled. When that happens, the paint code in ClientTiledThebesLayer::RenderLayer cannot distinguish it properly (i.e. it ends up with the same scrollParent and displayportParent as the "main" frame). It then proceeds to clip the paint area by the untransformed critical displayport of the main frame. This will almost always result in blank showing up, unless the subframe happens to be at the top of the page (scroll the subframe on https://people.mozilla.org/~kgupta/griddiv.html to see an example of the clipping behaviour in action). When you stop scrolling the layer gets flattened and it shows up again.

The child layer has a 2D translation transform set on it by the FrameLayerBuilder to put it in the right place, but the code at [1] presumably misunderstands that as something else and doesn't do the right thing. I don't know if this will be a problem on B2G because actively scrolling subframes do have their own FrameMetrics and critical displayport, and so wouldn't find and use the parent's.

[1] http://mxr.mozilla.org/mozilla-central/source/gfx/layers/client/ClientTiledThebesLayer.cpp?rev=c5a753c7a84a#73
Attached patch Patch (obsolete) — Splinter Review
This is the most conservative fix I could come up with, because this will need to be uplifted to Aurora before the merge to avoid bustage on Fennec beta. I can file a follow-up to get a proper fix, presumably one that Chris can review in detail.
Attachment #8408234 - Flags: review?(botond)
Attachment #8408234 - Flags: review?(bgirard)
Comment on attachment 8408234 [details] [diff] [review]
Patch

Botond pointed out on IRC that this will affect the main frame as well because that also gets a 2D transform, so this is probably not a good idea.
Attachment #8408234 - Flags: review?(botond)
Attachment #8408234 - Flags: review?(bgirard)
tracking-fennec: ? → 30+
So after investigating some more I came to the conclusion that there are really only four options - two of which are bad one is ugly. I'll resummarize the problem as I understand it first. Consider the page at https://people.mozilla.org/~kgupta/griddiv.html. While scrolling the subframe, the layer tree looks something like this:

ClientLayerManager (0x88393ec0)
  ClientContainerLayer (0x89cc0c00) ... [metrics={ ... scrollId=0 }]
    ClientContainerLayer (0x85162c00) ... [metrics={ ... scrollId=5 }]
      ClientThebesLayer (0x856b0c00) [transform=[ 1 0; 0 1; 0 -325; ]] ...
      ClientThebesLayer (0x8ad99000) [transform=[ 1 0; 0 1; -108 -570; ]] ...
      ClientThebesLayer (0x8ad99800) [transform=[ 1 0; 0 1; -108 -570; ]] ...

The scrollId=5 container layer is "primary scrollable layer" that the async transform is applied to. The first ClientThebesLayer is the main document content, and has a CSS transform corresponding to the gecko-side scroll position. The other two ClientThebesLayers are created while the subframe is being actively scrolled (I think one is for the border or background or something, and the other is for the actual text).

Given this layer tree and the current code, all three of the ClientThebesLayer instances are treated the same. Specifically, they are all clipped by the critical displayport of the container layer. The critical displayport, however, moves only with respect to the main document (because Fennec doesn't set a displayport on subframes). So the rendering of the first ClientThebesLayer is ok, but the other two get clipped improperly when their transforms diverge from the transform of the first ClientThebesLayer.

The two bad solutions are (1) disabling tiling entirely via the prefs and (2) the patch that I posted above, which does effectively the same thing (technically slightly different because we get low-res and progressive painting only while the main frame is at scroll position (0,0) because then its transform is the identity). These solutions are bad because they are Fennec regressions.

The ugly solution is to modify my patch above so that low-res and progressive painting still applies to the first ClientThebesLayer and doesn't apply to the other two. Because there isn't really any flag or marker on the layer itself to make this distinction, we'd basically have to use LayerManager::GetPrimaryScrollableFrame (which is what Fennec uses already to find the container layer) and check if the ClientThebesLayer is the first child of that. If it is, we go ahead and allow the tiling code to run. If it isn't, we do an early-abort like the patch above.

The last possible solution, which is perhaps less ugly, is to disable tiling "properly" on subframes in Fennec. Already in B2G we have code that only enables tiling on scrollable frames. This change would then enforce the property "tiling is only enabled on async-scrollable frames". I think tiling doesn't provide much advantage on sync-scrollable frames anyway so this wouldn't be a regression, and it should solve this problem. I haven't yet actually implemented this to see if it works but it should in theory.
I tried a bunch of things, all with the attempt of passing the right value into the call at [1] so that only thebes layers for things with displayports would be "SCROLLABLE". Combined with removing the ifdef B2G at [2], that should have had the desired effect. However I was unable to get the magic combination of flags and state checks to get that working. (For the record what I was trying to do was to add a flag to ContainerState that indicated if the layer was async-scrollable, and was computed based on whether or not the call sites had displayports on their relevant content frames.. but I think there is not a 1:1 mapping between the call sites, ContainerState instances, and ThebesLayers, so the state wasn't quite right).

I'm falling back to the "ugly solution" described above to see if that works, so that at least we'll have a viable fallback that we can land in the short term.

[1] http://mxr.mozilla.org/mozilla-central/source/layout/base/FrameLayerBuilder.cpp?rev=93265ade3bc9#1506
[2] http://mxr.mozilla.org/mozilla-central/source/gfx/layers/client/ClientThebesLayer.cpp?rev=ebcacae1532c#137
Attached patch Patch v2 (obsolete) — Splinter Review
Updated patch to implement the "ugly" solution as described above. Seems to work, although my test case still suffers from the glitching which is described in bug 992218 and friends. I will investigate those issues.
Attachment #8408234 - Attachment is obsolete: true
Attachment #8409685 - Flags: review?(botond)
Attachment #8409685 - Flags: review?(bgirard)
Comment on attachment 8409685 [details] [diff] [review]
Patch v2

Review of attachment 8409685 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/layers/client/ClientTiledThebesLayer.cpp
@@ +86,5 @@
>      return;
>    }
> +#ifdef MOZ_WIDGET_ANDROID
> +  if (Layer* scrollable = ClientManager()->GetPrimaryScrollableLayer()) {
> +    if (scrollable->GetFirstChild() != this) {

I wonder if it would be more robust to consider the first child which is a thebes layer. Maybe BenWa can comment more on the robustness of this hack.

@@ +88,5 @@
> +#ifdef MOZ_WIDGET_ANDROID
> +  if (Layer* scrollable = ClientManager()->GetPrimaryScrollableLayer()) {
> +    if (scrollable->GetFirstChild() != this) {
> +      // Subframes on Fennec are not async scrollable because they have no displayport.
> +      // However, the code in RenderLayer() picks up a displayport from the nearest

nit: I think the code that picks up a displayport from the nearest scrollable ancestor is actually in function (the loop just below computes displayportParent, and then we use its metrics).
Attachment #8409685 - Flags: review?(botond) → review+
Comment on attachment 8409685 [details] [diff] [review]
Patch v2

Review of attachment 8409685 [details] [diff] [review]:
-----------------------------------------------------------------

r- but maybe you can convince me on my second comment and turn this into an r+.

::: gfx/layers/client/ClientTiledThebesLayer.cpp
@@ +86,5 @@
>      return;
>    }
> +#ifdef MOZ_WIDGET_ANDROID
> +  if (Layer* scrollable = ClientManager()->GetPrimaryScrollableLayer()) {
> +    if (scrollable->GetFirstChild() != this) {

I agree. It looks like it would break with a simple difference in the layer tree like a background color layer.

@@ +89,5 @@
> +  if (Layer* scrollable = ClientManager()->GetPrimaryScrollableLayer()) {
> +    if (scrollable->GetFirstChild() != this) {
> +      // Subframes on Fennec are not async scrollable because they have no displayport.
> +      // However, the code in RenderLayer() picks up a displayport from the nearest
> +      // scrollable ancestor container layer anyway, which is incorrect for Fennec. This

> However, the code in RenderLayer() picks up a displayport from
> the nearest scrollable ancestor container layer

Isn't this a real bug then for both platforms? Even with proper sub-APZC we should support have a mix of all sorts of layers (thebes without any sub-APZC, scroll info layer and thebes layer with APZC).

Wouldn't the proper more general fix to fix the code in RenderLayer to not pick up a displayport when there isn't one? This would also remove this nasty ifdef.
Attachment #8409685 - Flags: review?(bgirard) → review-
Attached patch Patch v3Splinter Review
Updated as per our IRC discussion. I'll file a follow-up bug to remove this ifdef block and fix it properly.
Attachment #8409685 - Attachment is obsolete: true
Attachment #8410497 - Flags: review?(botond)
Attachment #8410497 - Flags: review?(bgirard)
Attachment #8410497 - Flags: review?(bgirard) → review+
Comment on attachment 8410497 [details] [diff] [review]
Patch v3

Review of attachment 8410497 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/layers/client/ClientTiledThebesLayer.cpp
@@ +93,5 @@
> +  // behaviour results in the subframe getting clipped improperly and perma-blank areas
> +  // while scrolling the subframe. To work around this, we detect if this layer is
> +  // the primary scrollable layer and disable the tiling behaviour if it is not.
> +  bool isPrimaryScrollableThebesLayer = false;
> +  if (Layer* scrollable = ClientManager()->GetPrimaryScrollableLayer()) {

[suggestion] I think it would enhance the readability of the code to extract a GetPrimaryScrollableThebesLayer() function, and then just test for "this == GetPrimaryScrollableThebesLayer()" here. Given that this is just a temporary workaround, it's up to you if you care to do this.
Attachment #8410497 - Flags: review?(botond) → review+
(In reply to Botond Ballo [:botond] from comment #20)
> [suggestion] I think it would enhance the readability of the code to extract
> a GetPrimaryScrollableThebesLayer() function, and then just test for "this
> == GetPrimaryScrollableThebesLayer()" here. Given that this is just a
> temporary workaround, it's up to you if you care to do this.

Yeah, that would make it more readable but I think since it's temporary code I'd rather have it all in one place.

Try push for android at https://tbpl.mozilla.org/?tree=Try&rev=bf0217299890
Comment on attachment 8410497 [details] [diff] [review]
Patch v3

[Approval Request Comment]
Bug caused by (feature/regressing bug #): bug 983208 and b2g tiling work
User impact if declined: lots of scrollable subframes (iframes, overflow:scroll divs, text inputs, and so on) show up in low-res or just blank
Testing completed (on m-c, etc.): locally
Risk to taking this patch (and alternatives if risky): affects fennec only, and the patch is well contained. it should make things strictly better than they were before, although we might find that there are still some issues unfixed.
String or IDL/UUID changes made by this patch: none

(Since we are close to the merge date, note that this is a request to uplift to 30, which will become beta in a few days)
Attachment #8410497 - Flags: approval-mozilla-aurora?
https://hg.mozilla.org/mozilla-central/rev/bd59c5ef0677
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 31
Attachment #8410497 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
On Nightly build (2014-04-25), table does not disappears but flickers when scrolling horizontally. I used Nexus 5 (Android 4.4.2).
I don't see any flicker on my Nexus 4. Can you attach a video?
Blocks: 996062
No longer depends on: 996062
I can't reproduce this anymore.
Verified as fixed on:
Builds: Nightly (2014-04-27) and Aurora (2014-04-27)
Devices: Nexus 5 (Android 4.4.2) and Galaxy Nexus (Android 4.2.1)
Status: RESOLVED → VERIFIED
Product: Firefox for Android → Firefox for Android Graveyard
You need to log in before you can comment on or make changes to this bug.