Closed Bug 962791 Opened 6 years ago Closed 6 years ago

[B2G][Core][Layout] Cancel button disappears while reseting data with APZ enabled

Categories

(Core :: Layout, defect)

28 Branch
ARM
Gonk (Firefox OS)
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla30
blocking-b2g 1.3+
Tracking Status
firefox27 --- wontfix
firefox28 --- wontfix
firefox29 --- wontfix
firefox30 --- fixed
b2g-v1.2 --- unaffected
b2g-v1.3 --- fixed
b2g-v1.4 --- fixed

People

(Reporter: jschmitt, Assigned: tnikkel)

References

Details

(Keywords: regression, Whiteboard: dogfood1.3, [ETA: 2/5])

Attachments

(13 files, 7 obsolete files)

7.46 MB, video/mpeg
Details
22.84 KB, text/plain
Details
46.18 KB, text/plain
Details
114.14 KB, text/plain
Details
113.69 KB, text/plain
Details
67.87 KB, text/plain
Details
67.87 KB, text/plain
Details
48.29 KB, text/plain
Details
47.95 KB, text/plain
Details
53.66 KB, text/plain
Details
48.05 KB, text/plain
Details
1.73 KB, patch
botond
: review+
kats
: checkin-
Details | Diff | Splinter Review
5.14 KB, patch
kats
: checkin+
Details | Diff | Splinter Review
Attached video Cost_Control.mpeg
Description:
With APZ enabled the cancel button disappears while sliding on the option of the reset data page

Repro Steps:
1) Updated Buri to BuildID: 20140121004137
2) Proceed to settings and turn on APZ in the developer options
3) Load the Cost Control app
4) Proceed to the setting in the app
5) Select 'Reset'
6) Long press 'Reset mobile & wifi data' and scroll up

Actual:
Cancel button disappears from the reset data page

Expected:
The cancel button still visible 

Environmental Variables:
Device: Buri 1.3 MOZ
BuildID: 20140121004137
Gaia: 47049555282a9a01fb60d1e1421b57e2810c96f5
Gecko: 6f7dfe36ab6c
Version: 28.0a2
Firmware Version: V1.2-device.cfg

Notes:
Repro frequency: 100%
See attached: Cost_Control.mpeg video
Attached file log.txt
I have attached a logcat of the issue.

Tested the bug on buri 1.4 and the issue occurs, the cancel button disappears. Buri 1.2 has different functionality and the issue does not occur.

Environmental Variables:
Device: Buri 1.2 COM
BuildID: 20140121004053
Gaia: 539a25e1887b902b8b25038c547048e691bd97f6
Gecko: c9f305c1d9a7
Version: 26.0
RIL Version: 01.02.00.019.102
Firmware Version: V1.2-device.cfg
Blocks: gaia-apzc
blocking-b2g: --- → 1.3?
Component: Gaia::Cost Control → Panning and Zooming
Product: Firefox OS → Core
Version: unspecified → 28 Branch
Keywords: regression
Unsure if this is on the app side or in the Gecko APZ - Botond, can you check?
Assignee: nobody → botond
blocking-b2g: 1.3? → 1.3+
Blocks: 942267
(In reply to Milan Sreckovic [:milan] from comment #2)
> Unsure if this is on the app side or in the Gecko APZ - Botond, can you
> check?

Looks like you can't open the Usage app if you don't have a sim card. I will ask jlin to get me a sim card. In the meantime, if someone knows a way of opening the Usage app without a sim card, please let me know.
I got a sim card from jlin and I can reproduce the bug now.

I haven't nailed down the problem yet, but I noticed something suspicious: the "menu" element that contains the buttons "Reset mobile usage" etc. is scrollable (i.e. it gets a scroll info layer and an APZC) even though I don't think it shouldn't be. When determining whether or not to create a scroll layer item for the scroll frame, the frame's scroll range is reported [1] as 1 app unit, suggesting that it should really be 0 but there is a round-off error somewhere. CC'ing Timothy.

http://mxr.mozilla.org/mozilla-central/source/layout/generic/nsGfxScrollFrame.cpp#2408
(In reply to Botond Ballo [:botond] from comment #4)
> the "menu" element that contains the buttons "Reset mobile usage" etc. is
> scrollable (i.e. it gets a scroll info layer and an APZC) even though I
> don't think it shouldn't be

Er, "I don't think it should be".
I think getting 1 app unit of scroll range is valid. We don't snap layout coordinates until painting, so if the size of the scrolled content just happens to be 1 app unit bigger than the scroll port we will get this result.
I'm continuing to investigate. My efforts are impeded by the fact than I cannot seem to get App Manager to work on the affected screen. I filed bug 963330 for this.
I looked at content dumps before and after the gesture that makes the Cancel button go away, and I can confirm that the button does not disappear from the DOM. This suggests that this is not a Gaia issue, it's a Gecko issue. I'm not sure yet whether it's a Layout or APZ issue.
The "Long press 'Reset mobile & wifi data' and scroll up" step of the STR can be simplified to "do a pan gesture anywhere over the top three buttons". What this does is causes a displayport to be set on the scrollable element (this is the one with a scroll range of just 1 app unit, which shouldn't really be scrollable), which causes a relayerization (though interestingly the layer for the scrollable element remains a scroll info layer), which messes things up.

I'm attaching display list dumps before and after the pan gesture that triggers the relayerization.
Attached file Display list dump after relayerization (obsolete) —
Attachment #8364784 - Flags: feedback?(tnikkel)
Attachment #8364783 - Flags: feedback?(tnikkel)
Comment on attachment 8364784 [details]
Display list dump after relayerization

I think the problem is that the scroll info layer item ends up in the middle of a bunch of scroll layer items. Our current merging and flattening code is written to only expect scrollinfo layer items to be below any scroll layer items for the same scroll frame. In this case it looks like some ::before generated content is wrapped in a scroll layer item that is below the scroll info layer item.

We'll have to see exactly how the frame tree looks and how the display list building codes creates a display list from it to know how to get the scroll info layer item before any scroll layer items for ::before content.
Attachment #8364784 - Flags: feedback?(tnikkel)
(In reply to Botond Ballo [:botond] from comment #9)
> The "Long press 'Reset mobile & wifi data' and scroll up" step of the STR
> can be simplified to "do a pan gesture anywhere over the top three buttons".
> What this does is causes a displayport to be set on the scrollable element
> (this is the one with a scroll range of just 1 app unit, which shouldn't
> really be scrollable), which causes a relayerization (though interestingly
> the layer for the scrollable element remains a scroll info layer), which
> messes things up.

Just to summarize:

I think 1 app unit of scroll range is valid for some pages. Whether there is a bug somewhere that makes us have 1 app unit of scroll range here I can't say.

Having a scroll info layer when a display port set is something that might happen, but we don't want it to happen for any cases we care about. It means we can't async scroll that scrollable element.

But if it does happen we should still operate correctly, just without async scrolling.

So at minimum we should figure out why the rendering is incorrect in this situation, and we should figure how to make this element have a regular scroll layer.
Timothy and Botond, thanks for all the action and investigations on this bug - are you two OK covering this, or is the bug in the area where we need to get others to help?
Flags: needinfo?(tnikkel)
Flags: needinfo?(botond)
(In reply to Milan Sreckovic [:milan] from comment #14)
> Timothy and Botond, thanks for all the action and investigations on this bug
> - are you two OK covering this, or is the bug in the area where we need to
> get others to help?

I'm OK continuing to investigate from my side as I can reproduce the issue. I will need some help from the layout side.
Flags: needinfo?(botond)
(In reply to Timothy Nikkel (:tn) from comment #12)
> We'll have to see exactly how the frame tree looks and how the display list
> building codes creates a display list from it to know how to get the scroll
> info layer item before any scroll layer items for ::before content.

I am attaching frame dumps before and after relayerization.
Attachment #8365111 - Flags: feedback?(tnikkel)
Note that the frame dumps and the display list dumps are from different runs, so the addresses in them won't match. If this is a problem I can generate both again from the same run.
Component: Panning and Zooming → Layout
Attachment #8365336 - Flags: feedback?(tnikkel)
Just wanted to give an update on the progress of the bug. 

I am continuing to investigate this with Timothy's help. The layer structure after relayerization is wrong (the <menu> element, since it now has a displayport, should be a scrollable layer rather than a scroll info layer). This is caused by the display list structure being unexpected (the code expects the ScrollInfoLayer item to be below all the ScrollLayer items, but it's not), which is in turn likely caused by one of the frames inside the menu being out-of-flow. Likely the display list building code will need to be updated to preserve the invariant that a ScrollInfoLayer ends up below all ScrollLayers even in this situation, but more debugging is needed to see how this invariant is broken and how that can be fixed. 

Progress on this is impeded by the fact that we are unable to debug this app in App Manager or Gaia-in-Firefox, so it's very difficult to tell what computed style various elements have that might be causing the frame to be out-of-flow. If anyone on the Gaia team has any insights on how we can look at this app in Inspector, please let me know.

It's not clear yet whether the wrong layer structure is causing the button to disappear, or if that's an independent problem.
(In reply to Milan Sreckovic [:milan] from comment #14)
> Timothy and Botond, thanks for all the action and investigations on this bug
> - are you two OK covering this, or is the bug in the area where we need to
> get others to help?

I think we got this for now, if that changes we'll let you know.
Flags: needinfo?(tnikkel)
I *think* it's this style which is being applied and causing the scroll layer item below the scroll info layer item:
https://github.com/mozilla-b2g/gaia/blob/190fadcb06f6d9e4a50946a743eea9f5dbd10b4d/shared/style/action_menu.css#L296

The fact that is has z-index: -1 means that it will probably get sorted below the scroll info layer (which will probably be in the normal stacking context plain at z-index 0).

So we will either have to figure out how to get the scroll info layer item below negative z-index items at the stacking context level, or write a more robust merging and flattening loop in nsDisplayList::ComputeVisibilityForSublist that can handle this structure.
There is also the matter that things should still be rendering properly even with a scroll info layer.

Botond, could you write a quick hack in nsDisplayList::ComputeVisibilityForSublist that manually forces scroll info layer items to be below scroll layer items in the array by scanning and swapping?
Attachment #8365111 - Flags: feedback?(tnikkel)
Attachment #8365336 - Flags: feedback?(tnikkel)
Actually negative zindex items are still in front of background border items, so there must be more to it still.
Hi all. First, answering the mail of Botond Ballo.

> Nor am I able to load the Cost Control app when running Gaia in desktop Firefox. (Maybe this is related to the
> app requiring a SIM card to be present when launched on the phone?)

Nope, it is not possible to run in Gecko Desktop as, AFAIK, it depends on some not implemented APIs.

> Are you able to debug the Cost Control app in App Manager? When I try, I'm either unable to debug it completely 
> (i.e. pressing the Debug button has no effect), or I can get to the Inspector screen but the DOM that is shown 
> does not look like what I see on the phone screen, and it does not update as I interact with the app. Also the 
> "select element with mouse" feature does not work. (I filed bug 963330 for this, by the way.)

Thank you for filling the bug. I don't know why App Manager is not working for the  Cost Control application. What I do for this cases is to isolate the component in another sample as it is a shared building block component (see bug 908100 to see an example). This allows us to have a very isolated version of the bug for Gecko debugging and it could solve your problems with App Manager or even be another test case for bug 96330.

Finally, I'm rewording the issue to retag as layout error.
Summary: [B2G][Cost Control] Cancel button disappears while reseting data with APZ enabled → [B2G][Core][Layout] Cancel button disappears while reseting data with APZ enabled
(In reply to Salvador de la Puente González [:salva] from comment #27)
> What I do for this cases is to isolate
> the component in another sample as it is a shared building block component
> (see bug 908100 to see an example). This allows us to have a very isolated
> version of the bug for Gecko debugging and it could solve your problems with
> App Manager or even be another test case for bug 96330.

That would be great. It doesn't even have to be as reduced as that example. Something you can use without a sim card would be a great improvement. Even a complicated testcase you can load on desktop would be excellent.
(In reply to Timothy Nikkel (:tn) from comment #25)
> There is also the matter that things should still be rendering properly even
> with a scroll info layer.
> 
> Botond, could you write a quick hack in
> nsDisplayList::ComputeVisibilityForSublist that manually forces scroll info
> layer items to be below scroll layer items in the array by scanning and
> swapping?

I wrote such a hack, and as a result a scrollable layer now gets created when the displayport is set.

However, the button still disappears, suggesting that that's an independent problem.
It looks like the scrollable layer's visible region is too small - it is large enough for the three buttons which are visible, but not for the Cancel button.

Attached is a display list dump after the displayport is set, with the hack from comment #29 in place. Note that in the "after optimization" phase of the dump, the display item for the "Cancel" button is a child of the ScrollLayer display item, suggesting that it should be drawn on the scrollable layer created by this display item.

I notice that, in the "after optimization" phase of the dump, the visible region of the display item for the "Cancel" button is (0,0,0,0), while that is not the case for the other buttons. This is likely to be the proximate cause of the problem.
Regenerated "display list dump after relayerization" with later m-c that includes extra information (the scroll frame associated with each ScrollLayer item).
Attachment #8364784 - Attachment is obsolete: true
Whoops, the previous one was incorrect.
Attachment #8366225 - Attachment is obsolete: true
Attached patch Patch that fixes the problem (obsolete) — Splinter Review
After further debugging with Timothy, it seems the problem is that the Cancel button is position:absolute, with its containing block being the <section> element which is an ancestor of the scrollable <menu> element. This means the Cancel button should not scroll with the contents of the <menu> element, and the clip that is applied to the scrolled content should not apply to it. The problem is we currently are applying this clip, and that's cutting off the Cancel button.

A potential solution suggested by Timothy is to avoid wrapping a display item for a position:absolute element into a ScrollLayer item if the child item's containing block is outside the scrolled content. Attached is a patch that implements this. With this patch applied, the Cancel button no longer disappears.

Try run with this patch: https://tbpl.mozilla.org/?tree=Try&rev=093cd8d755a6
Attachment #8366343 - Flags: feedback?(tnikkel)
Matt, did you have a chance to look at this yesterday?  Rob, can you take a peak as well?
Flags: needinfo?(roc)
Flags: needinfo?(matt.woodrow)
Comment on attachment 8366343 [details] [diff] [review]
Patch that fixes the problem

Looks good, but I had enough of a hand in this patch that roc should review.
Attachment #8366343 - Flags: review?(roc)
Attachment #8366343 - Flags: feedback?(tnikkel)
Attachment #8366343 - Flags: feedback+
Comment on attachment 8366343 [details] [diff] [review]
Patch that fixes the problem

Review of attachment 8366343 [details] [diff] [review]:
-----------------------------------------------------------------

Driving by.

::: layout/generic/nsGfxScrollFrame.cpp
@@ +2192,5 @@
> +      return aItem;
> +    } else {
> +      SetCount(++mCount);
> +      return new (aBuilder) nsDisplayScrollLayer(aBuilder, aItem, aItem->Frame(), mScrolledFrame, mScrollFrame);
> +    }

The test is almost getting complicated enough that it may be useful to break it down?

bool shouldBeScrolled = .....
if (shouldBeScrolled){
  ...
} else {
  ...
}


Also, I think the style guide says:

if (a &&
    b)
{

rather then

if (a
    && b) {
Attached patch Patch that fixes the problem (obsolete) — Splinter Review
Updated patch to address Milan's review comments.

(In reply to Milan Sreckovic [:milan] from comment #39)
> Also, I think the style guide says:
> 
> if (a &&
>     b)
> {
> 
> rather then
> 
> if (a
>     && b) {

You're right. It says to put && and || at the end of a line but other operators at the beginning. I probably violated this rule in some previous commits; my apologies. (In my defense, I think it's a bit counterintuitive to treat && and || differently from other operators.)
Attachment #8366343 - Attachment is obsolete: true
Attachment #8366343 - Flags: review?(roc)
Attachment #8366846 - Flags: review?(roc)
Comment on attachment 8366846 [details] [diff] [review]
Patch that fixes the problem

Review of attachment 8366846 [details] [diff] [review]:
-----------------------------------------------------------------

::: layout/generic/nsGfxScrollFrame.cpp
@@ +2187,5 @@
> +    // should only scroll with the scrolled content if its containing block
> +    // is within the scrolled content.
> +    bool shouldWrap = !aItem->Frame()->IsAbsolutelyPositioned() ||
> +                      aItem->Frame()->GetContainingBlock() == mScrolledFrame ||
> +                      nsLayoutUtils::IsProperAncestorFrame(mScrolledFrame, aItem->Frame()->GetContainingBlock(), nullptr);

I think we can just check nsLayoutUtils::IsProperAncestorFrame(mScrolledFrame, aItem->Frame()) and avoid the containing block stuff. We can still check aItem->Frame()->IsAbsolutelyPositioned(), for speed.
Attachment #8366846 - Flags: review?(roc) → review-
Attached patch Patch that fixes the problem (obsolete) — Splinter Review
Updated patch to address roc's review comments.
Attachment #8366846 - Attachment is obsolete: true
Attachment #8366925 - Flags: review?(roc)
The patch has now been reviewed and has a clean Try push. It is ready to land.

Unfortunately both b2g-i and m-i are closed. I will land the patch when they reopen.
Keywords: checkin-needed
Updated commit message to change "r=tn" to "r=roc". Carrying r+.
Attachment #8366925 - Attachment is obsolete: true
Attachment #8366939 - Flags: review+
https://hg.mozilla.org/mozilla-central/rev/4f0d8916386e
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla29
Depends on: 966282
Depends on: 966258
With the regressions this bug has caused on Android, I'd like to figure out if we need to backout or conditionally build for b2g.

We really can't ship the regressions on Android. I'd like to avoid them on Beta, before they sneak out into final Release.
Attached patch Backout patch (obsolete) — Splinter Review
We need to back this patch out of Aurora as it breaks panning and zooming on several websites and we don't want this to reach Beta.

[Approval Request Comment]
Bug caused by (feature/regressing bug #): patch for this bug
User impact if declined: panning and zooming working poorly or not at all on several websites
Testing completed (on m-c, etc.): this is a backout, patch was demonstrated to be the cause of regressions
Risk to taking this patch (and alternatives if risky): low
String or IDL/UUID changes made by this patch: none
Attachment #8368952 - Flags: approval-mozilla-aurora?
(In reply to Botond Ballo [:botond] from comment #49)
> We need to back this patch out of Aurora as it breaks panning and zooming on
> several websites and we don't want this to reach Beta.

And therefore we'll need a revised fix for this bug. Reopening.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(In reply to Botond Ballo [:botond] from comment #50)
> (In reply to Botond Ballo [:botond] from comment #49)
> > We need to back this patch out of Aurora as it breaks panning and zooming on
> > several websites and we don't want this to reach Beta.
> 
> And therefore we'll need a revised fix for this bug. Reopening.

Can we just back this out on all branches if we already know the regression rate high? I'd rather not keep trunk busted in the process with the regressions.
https://hg.mozilla.org/releases/mozilla-aurora/rev/6a87e8247640

I went ahead and pushed this a=backout given the consensus from the email thread that this needed to be.
Merge of backout:
https://hg.mozilla.org/mozilla-central/rev/00b86eca0baf

Should be in tomorrow's trunk/aurora nightlies.
I think it's clear now that we have to fix ScrollLayer stuff so we can always smooth-scroll even if the scrolling content must fragment into multiple layers. We can do that by extending the APZC layers attributes to associate a set of layers with a scrollable element. It might not even be that hard.

The question is what to do for B2G 1.3. Would it make sense to disable APZC for non-viewport elements in 1.3?
Attachment #8368952 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Incorrectly wrapping fixed position items seems to work surprisingly well using a testcase where there is content both above and below it. The async scrolling worked fine but there are the expected visual problems, but they seemed surprisingly minor.

The incorrect clipping noticed in this bug on non-viewport elements might be the worst problem. We could deal with that in the short term by failing to create a scrollable layer if we have abs pos content that will be incorrectly clipped, and then subsequently not propagating the incorrect clip to abs pos content when we flatten its containing scroll layer item. Gross, but we've been broken on this for a long time and only noticed now.
Comment on attachment 8366939 [details] [diff] [review]
Patch that fixes the problem

checkin- to indicate this was backed out.
Attachment #8366939 - Flags: checkin-
I'm planning to implement comment 56 to fix this specific problem for 1.3.
QA Contact: tnikkel
Assignee: botond → tnikkel
QA Contact: tnikkel
Thanks for the update guys.  We need an ETA here.  When does this close?
This fixes the issue in a testcase I constructed to replicate the issue in the cost control app for me.
Attachment #8369861 - Flags: review?(roc)
(In reply to Faramarz (:faramarz) from comment #59)
> Thanks for the update guys.  We need an ETA here.  When does this close?

Whenever the patch lands I guess.
Some specific configurations needed nsIFrameInlines.h to be included for IsAbsolutelyPositioned
https://hg.mozilla.org/integration/mozilla-inbound/rev/91971a2ab3bb
Combined the include fix up for easier uplifting.
Attachment #8369861 - Attachment is obsolete: true
https://hg.mozilla.org/mozilla-central/rev/96f918d5006d
https://hg.mozilla.org/mozilla-central/rev/91971a2ab3bb
Status: REOPENED → RESOLVED
Closed: 6 years ago6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla30
Comment on attachment 8369880 [details] [diff] [review]
fail to create a scrollable layer if it will clip abs pos items incorrectly (combined patch, commit msg, and review)

Just to be clear, this patch is the one that needs uplifting.
Attachment #8369880 - Flags: checkin+
Whiteboard: dogfood1.3 → dogfood1.3, [ETA: 2/5]
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #55)
> I think it's clear now that we have to fix ScrollLayer stuff so we can
> always smooth-scroll even if the scrolling content must fragment into
> multiple layers. We can do that by extending the APZC layers attributes to
> associate a set of layers with a scrollable element. It might not even be
> that hard.

I filed bug 967844 for this.
You need to log in before you can comment on or make changes to this bug.