Closed Bug 1502638 Opened 6 years ago Closed 6 years ago

Fennec fling redux:- fling bounce or blank screen on some flings

Categories

(Firefox for Android Graveyard :: Toolbar, defect)

Firefox 61
defect
Not set
normal

Tracking

(firefox63 unaffected, firefox64 fixed, firefox65 fixed)

RESOLVED FIXED
Firefox 65
Tracking Status
firefox63 --- unaffected
firefox64 --- fixed
firefox65 --- fixed

People

(Reporter: mark.paxman99, Assigned: botond)

Details

(Keywords: regression)

Attachments

(7 files)

Attached video 20181027_135712.mp4
User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:65.0) Gecko/20100101 Firefox/65.0

Steps to reproduce:

I think this regression happened around Nightly 2018-10-26 so might be related to the fix for bug 1499941. Or not.

I'm using Nightly 2018-10-26

Make sure the toolbar is set to auto hide (enable Full Screen browsing in Settings > General)

I'm using uk.mobile.reuters.com but have seen it on arstechnica.com too

At the top of the page, with the toolbar fully visible

Start a fling by dragging slowly upwards until the toolbar just starts to move. Complete the fling by flicking upwards briskly and lifting your finger off the screen
- sometimes the screen goes blank (fix it by dragging back downwards)
- sometimes there's a bounce.

There's a knack to it :)

Very occasionally I see the same blank screen when drag scrolling on other parts of the document, but it's very very rare, I don't know how to replicate. 

On arstechnica.com the screen doesn't go blank it goes to a grayscale dark at the top light at the bottom so I think it's a page element of some kind.

Video attached
I see it on my old Snapdragon 425 phone too on uk.mobile.reuters.com . The screen only goes blank perhaps 1 time in 20, most of the time it bounces or stops. You have to be very persistent to get the blank screen.
Attached video 20181027_165943.mp4
Another video. Another way to reproduce:- scroll to middle of document. Make sure toolbar is hidden. Slowly drag down until most (but not all) of toolbar is visible. Without lifting finger, do a very short flick upwards & release finger to start the fling. Result shown in video:- first fling up, velocity reversal. Second fling up: correct behaviour. Third fling up: "blank screen".

I think it's similar to last time, a fling where all the touch events take place while the toolbar is in motion. I also think it's a velocity error, the bug causes a velocity with the incorrect sign.

The "blank screen" is actually the document's background (I'm sure there's a more technical term). e.g. arstechnica.com seems to have that grayscale as a background. So I wonder if this bug sometimes causes the scroll offset to go outside the document e.g. a very big or negative scroll offset? Or maybe the scroll velocity becomes unreasonably large? I'm not sure because any drag from the "blank screen" returns me to where I expected to be on the document.

Anyway, I'm sure you're delighted to read all this. Have a great weekend ;)
I'm not having any luck reproing this.

The "blank screen" could just be checkerboarding (showing an area of the screen we haven't painted yet, usually because we moved there faster than we can paint it). If you pause in that state for a few seconds, does content appear? What about if you tap the page without scrolling in that state?
Pants 

I don't think it's checkerboarding because the grayscale background appears on arstechnica.com. It doesn't clear, I'm looking at it now, after 30 secs it's still there. A tap brings back the correct page at more or less the position from where I started the fling. Sometimes I see the content and the gray alternate for a split second gray-content-gray.  I see it on both my Sony Xperia (Oreo) and my BQ Aquaris (Nougat). I just upgraded to today's Nightly and still there.

Try dragging up until the toolbar is juuuuuust starting to move out of the way then do the tiniest flick upwards. You really need to get your finger off the screen v quick. I can get the scroll reversal pretty much every time now :) but the blank screen takes persistence :) :)
I turned on the minimap using apz.minimap.enabled true and then flung until I got my grayscale screen. The minimap shows a very funny result, most of the document (green) is gone and a black border shows. The viewport (purple) does seem to be outside what's left of the document (green) and the displayport (yellow) is not to be seen. I tried a few times and the the purple viewport always seems to be directly below the bottom of what's left of the green document.

screenshot 1 shows this, mostly a black border but parts of the green document can be seen above the purple viewport which is directly adjacent but (maybe) outside it. Displayport not visible at all.

screenshot 2 shows what happens when I tap the screen from the grayscale without dragging, the purple viewport doesn't move but the document seems to sort itself out and the displayport comes back.

When I have the grayscale, if I rotate my phone I see a flash of content and then the gray comes back as Fennec re-orients. I can re-orient indefinitely and this happens over and over.

I do just wonder if we get a very high velocity or NaN which shoots the viewport off the bottom of the document?

I have Nightly 2018-10-22 running on another phone which IIRC does not have the fix for bug 1499941 and I couldn't replicate this behaviour so I think it's new since bug 1499941 was fixed.
Attached image screenshot 1.png
Attached image screenshot 2.png
I'm still looking into this with some frame by frame stuff but making little headway. 

If I set apz.velocity_relevance_time_ms to 16 or 32 and restart I see the same problem in normal scrolling - scrolls often go opposite to my fling, and I get the gray screen once in a while. Even if I turn the dynamic toolbar off (toolbar always visible).

So I think the problem I am seeing in normal use is to do with what's in the velocity queue.

IIUC correctly, while the toolbar is in motion and my finger is on the screen (drag), the toolbar is "in charge" of the motion of the content, it intercepts the touch events, moves itself and tells the content to move in step. Basically the content is attached to the toolbar. But when my finger comes off the screen (start of fling) the motion separates:- the toolbar continues to move at [velocity?] while the content starts to fling using the Chrome physics.

Is the first step of the Chrome physics is to do a least squares fit to a touch sample history to get an initial velocity?

If so does it use apz.velocity_relevance_time_ms worth of samples for that fit?

Or what? https://bugzilla.mozilla.org/show_bug.cgi?id=1432019#c73 talks about the toolbar animation generating the velocity?

What would happen if some of the samples in the touch history were corrupt? 

I'm just wondering if I'm seeing a problem due to the race between vsync and touch event, see https://bugzilla.mozilla.org/show_bug.cgi?id=1432019#c73. Sometimes during a drag with the toolbar animating, the vsync will fire and start the whole toolbar next frame, but the new touch event will come in too late. What goes into the sample history in that case?
Yeah the problem I see is like setting apz.velocity_relevance_time_ms to a short value. 16 is awful, 32 is pretty bad. Set it to 16 and do short flings up and down, basically quickly reverse the fling direction while a fling is going on. I expect you'll see fling velocity sign errors and occasionally the "blank screen".

If I turn the Chrome physics off using apz.android.chrome_fling_physics.enabled FALSE while keeping apz.velocity_relevance_time_ms 16 the problems go away, no fling velocity sign errors, no blank screen.

So that all that makes me think of the polynomial least squares fit in AndroidVelocityTracker::ComputeVelocity.

I wonder if, when I am dragging and the toolbar is moving, the (touch event??) history sometimes acquires some bad values. When the drag turns into a fling the polynomial least squares fit fails, like it seems to do when I set apz.velocity_relevance_time_ms to a foolishly short value.

I don't know where those bad values might come from.  I'm less sure the stuff I said in comment 8 about the race between vsysnc and touch event is relevant. I see pretty much the same problem moving from hardware to software vsync @ 60 fps. Software vsync @ 60 fps mostly fixes the race in bug 1432019.
Thanks, the suggestion to increase the repro rate by decreasing apz.velocity_relevance_time_ms was very hepful! I was able to reproduce the symptoms fairly readily with that.

(In reply to Mark from comment #8)
> Is the first step of the Chrome physics is to do a least squares fit to a
> touch sample history to get an initial velocity?

Yep.

> If so does it use apz.velocity_relevance_time_ms worth of samples for that
> fit?

Yep.

> What would happen if some of the samples in the touch history were corrupt? 

I don't think the issue is that they're corrupt, but rather that there are too few of them. We generally get one sample per vsync interval, so a relevance time of 16 ms means most of the time we only have one sample. Can't really compute a useful velocity from one position sample.
Assignee: nobody → botond
All the issues seem to be related to edge cases when the history contains only 1 or 2 samples.

The "bounce" occurs when there is only 1 sample. I tried to fix this in bug 1498329 by falling back to the "instantaneous velocity" computed by AddPosition(), but I failed to realize that AddPosition() will also not compute a velocity if there is only 1 sample. We basically end up using a stale velocity value in this case, which is sometimes in the opposite direction. On the other hand, we are basically discarding a sample by not making use of the position passed in to StartTracking(). So the fix for this is: (1) get an extra sample out of the position passed in to StartTracking(), like SimpleVelocityTracker does; and (2) if there truly is only one sample to work with, just don't fling.

The "blank screen" occurs when there are 2 samples with the same timestamp. The computed velocity is infinite (or sometimes NaN, if the position values are also the same), and that causes things to go bad. The fix for this is to add a guard for two samples with the same timestamp (which again SimpleVelocityTracker already has).
The previous value of the axis velocity could be stale, e.g. in the opposite
direction.

Depends on D10449
Pushed by bballo@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/ef8a9e120272
Make use of the position passed in to AndroidVelocityTracker::StartTracking(). r=kats
https://hg.mozilla.org/integration/autoland/rev/e8ab07c1c98f
Guard against two samples with the same timestamp in AndroidVelocityTracker. r=kats
https://hg.mozilla.org/integration/autoland/rev/e05552012e19
If the velocity tracker can't compute a velocity, zero out the axis velocity. r=kats
https://hg.mozilla.org/mozilla-central/rev/ef8a9e120272
https://hg.mozilla.org/mozilla-central/rev/e8ab07c1c98f
https://hg.mozilla.org/mozilla-central/rev/e05552012e19
Status: UNCONFIRMED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 65
This seems fixed - thanks! The worst that happens is the fling doesn't occur and the content stays where it is. Which is fine.

Of course it begs the question: why is the touch sample history often very short when the toobar is animating? But I expect that's a can of worms not worth opening :)

Cheers
Comment on attachment 9021711 [details]
Bug 1502638 - Make use of the position passed in to AndroidVelocityTracker::StartTracking(). r?kats

[Beta/Release Uplift Approval Request]

Feature/Bug causing the regression: Bug 1457586

User impact if declined: Sometimes flings can go in the wrong direction (creating the appearance of a "bounce").
Sometimes flings can get you into a bad state where nothing is rendered until further user interaction.
These issues are more prevalent when the fling starts while the dynamic toolbar is animating.

Is this code covered by automated tests?: No

Has the fix been verified in Nightly?: Yes

Needs manual test from QE?: No

If yes, steps to reproduce: 

List of other uplifts needed: None

Risk to taking this patch: Low

Why is the change risky/not risky? (and alternatives if risky): The changes are straightforward and well-understood.

String changes made/needed:
Attachment #9021711 - Flags: approval-mozilla-beta?
^ This uplift request applies to all three patches.
Comment on attachment 9021711 [details]
Bug 1502638 - Make use of the position passed in to AndroidVelocityTracker::StartTracking(). r?kats

android apz fix, approved for 64.0b9
Attachment #9021711 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #9021713 - Flags: approval-mozilla-beta+
Attachment #9021714 - Flags: approval-mozilla-beta+
Product: Firefox for Android → Firefox for Android Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: