Closed Bug 980679 Opened 11 years ago Closed 11 years ago

[B2G][Youtube] The screen will jump up and down when scrolling through the Youtube app

Categories

(Core :: Panning and Zooming, defect)

28 Branch
ARM
Gonk (Firefox OS)
defect
Not set
normal

Tracking

()

VERIFIED FIXED
mozilla31
blocking-b2g 1.3+
Tracking Status
firefox28 --- wontfix
firefox29 --- wontfix
firefox30 --- fixed
firefox31 --- fixed
b2g-v1.3 --- fixed
b2g-v1.3T --- verified
b2g-v1.4 --- verified
b2g-v2.0 --- verified
b2g-master --- verified

People

(Reporter: KTucker, Assigned: kats)

References

Details

(Keywords: regression, Whiteboard: [fxos-bug-bash-1.4])

Attachments

(3 files, 3 obsolete files)

Attached file youtubelog.txt
Description:
If the user scrolls the Youtube app as soon as they open it, the screen will jump up and down.

Repro Steps:
1)  Updated Buri to Build ID: 20140306040204
2)  Install the Youtube app from the Marketplace.
3)  Open the Youtube app.
4)  As soon as the app loads, try to scroll through the videos.

Actual:
The screen will appear to be jumping up and down as the user scrolls down through the Youtube app.

Expected:
The screen scrolls smoothly without issue. 

Environmental Variables
Device: Buri v 1.4.0 Mozilla RIL
Build ID: 20140306040204
Gecko: https://hg.mozilla.org/mozilla-central/rev/8122ffa9e1aa
Gaia: 9cb35e701df44766d9b3560b0defe0a401a0ecdd
Platform Version: 30.0a1
Firmware Version: v1.2-device.cfg

Notes:
Repro frequency: 100%
See attached: videoclip, logcat
This issue also occurs on the Buri v 1.3.0 Mozilla RIL

1.3 Environmental Variables:
Device: Buri 1.3 MOZ
BuildID: 20140306004002
Gaia: 8aed4fafbaeb86d6884d31ce7d3cbeb87bcbf837
Gecko: 3d2d84d52141
Version: 28.0
Firmware Version: v1.2-device.cfg

The screen jumps up and down when scrolling the Youtube app.
Attached video Youtubevideo.mp4
Component: Preinstalled B2G Apps → General
Keywords: qawanted
Product: Tech Evangelism → Firefox OS
Version: Trunk → unspecified
Does this happen on 1.1?
(In reply to Jason Smith [:jsmith] from comment #3)
> Does this happen on 1.1?

Following the STR in comment #0, the page will flash white then will jump to where the user would have scrolled normally while the page is loading. After the page fully loads the user can scroll smoothly.

Environmental Variables:
Device: Buri 1.1 MOZ
BuildID: 20140306041201
Gaia: 44a2ddf63373f8e95c784faf4ed4d60081699c61
Gecko: 1421a6b7fc51
Version: 18.0
Firmware Version: V1.2-device.cfg
Keywords: qawanted
blocking-b2g: --- → 1.3?
Component: General → Panning and Zooming
Product: Firefox OS → Core
Version: unspecified → 28 Branch
Blocks: gaia-apzc
1.3+ for youtube
blocking-b2g: 1.3? → 1.3+
Assignee: nobody → bugmail.mozilla
From my initial investigation on this bug it seems like what happens is that the content does a scrollTo(0,1). Layout sends a layers update to the APZC with mUpdateScrollOffset=true, and continues to do so until the scroll offset update is acknowledged. The APZ code dispatches such an update, but it takes a while to actually get processed because it has to do an IPC jump and also Gecko is busy with doing other things. Until the scroll offset update is acknowledged, every layers update that is received by the APZ (and there's a bunch of them, because the page is loading) has the mUpdateScrollOffset flag set to true, and so all of these layer updates keep clobbering the async scroll offset. Therefore, as the user scrolls, the scroll position keeps getting reset back to (0,1) causing the visual jumps.

Not sure what the right fix here is. There might be something else at play here too because it feels like the latency for processing the acknowledgement is much larger than I would expect.
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #6)
> There might be something else at play
> here too because it feels like the latency for processing the
> acknowledgement is much larger than I would expect.

Nothing else at play, it seems. Just a really big latency with parent -> child messaging.
Also for the record in the test I just ran the message is sent from the APZC on the parent side at 16:02:07.349 and isn't received in TabChild until 16:02:09.109, which is 1.760 seconds (according to logcat timestamps). During this time APZC receives another 19 NotifyLayersUpdated calls with mUpdateScrollOffset=true; it receives an additional 5 for inflight calls dispatched before the acknowledgement is processed.
QA Contact: bzumwalt
Last working Build:
Environmental Variables:
Device: Buri v1.3 Mozilla RIL
BuildID: 20140110004009
Gaia: c606b129a2d1647c0fc7bfb197555026e9b27f9e
Gecko: c5109884ae3a
Version: 28.0a2
Firmware Version: v1.2-device.cfg


First Broken Build:
Environmental Variables:
Device: Buri v1.3 Mozilla RIL
BuildID: 20140111004005
Gaia: 1f029aa4c924d6c59fd7e8c6d9786a7370755d49
Gecko: 89c68fcb1448
Version: 28.0a2
Firmware Version: v1.2-device.cfg


Last Working Gaia/First Broken Gecko: Issue DOES Reproduce
Gaia: c606b129a2d1647c0fc7bfb197555026e9b27f9e
Gecko: 89c68fcb1448

First Broken Gaia/Last Working Gecko: Issue Does NOT Reproduce
Gaia: 1f029aa4c924d6c59fd7e8c6d9786a7370755d49
Gecko: c5109884ae3a

http://hg.mozilla.org/releases/mozilla-aurora/pushloghtml?fromchange=c5109884ae3a&tochange=89c68fcb1448

Appears to be Gecko issue. Unable to narrow regression window further due to lack of Tinderbox and Inbound windows for this date range.
Bug 949132 is in that range, which fits based on what I described in comments 6-8.
Blocks: 949132
Maybe if we did the scroll offset update acknowledgement from the refresh driver it would get processed sooner on the content side?
Can you elaborate on that? I'm not sure when the refresh driver would do the acknowledgement.
Flags: needinfo?(tnikkel)
One possible solution I think is to use a tri-state flag. State 1 would be when some non-apz code has set the scroll offset (this corresponds to originOfLastScroll != nullptr && originOfLastScroll != nsGkAtoms::apz). State 2 would be after RecordFrameMetrics has been called while in State 1 (that is, the mUpdateScrollOffset flag has been sent to the APZ side). During State 2 the code in APZCCallbackHelper would still not set scroll offset updates fromthe APZ. We would move into State 3 once the acknowledgement arrives, and originOfLastScroll is reset back to null.

This would prevent RecordFrameMetrics from sending multiple mUpdateScrollOffset=true metrics to the APZ side, which would fix this problem. However it should not regress any existing behaviour because the APZ code still can't set a scroll offset until the acknowledgement is processed.
Actually putting the tri-state flag in the RecordFrameMetrics code doesn't seem to work, because each call to RecordFrameMetrics can still be associated with multiple layer transactions (not sure why exactly). Even with tiling disabled I see multiple NotifyLayersUpdated with mUpdateScrollOffset=true even though there was only one RecordFrameMetrics that set it.
Since we can get multiple NotifyLayersUpdated calls where aLayerMetrics.mUpdateScrollOffset is true (but they all have the same scroll generation) we can deduplicate the scroll update using the generation counter. This prevents us from constantly clobbering the APZ's frame metrics (we will only do it once per scroll generation) and also reduces the number of acknowledgement messages we send back to the child process (one per generation rather than one per NotifyLayersUpdated).
Attachment #8389965 - Flags: review?(botond)
Flags: needinfo?(tnikkel)
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #12)
> Can you elaborate on that? I'm not sure when the refresh driver would do the
> acknowledgement.

I guess we need to back up a little. How does the scroll update acknowledgement get from the APZC thread to layout?
(In reply to Timothy Nikkel (:tn) from comment #16)
> I guess we need to back up a little. How does the scroll update
> acknowledgement get from the APZC thread to layout?

The acknowledgement originates at [1] which is on the compositor thread in the parent process. That call jumps over to the UI thread of the parent process at [2], and is sent to the child process at [3]. The gap between this send call and the corresponding receive call at [4] is where the latency is. I don't think we can get it to the child process any faster if the child process is busy. The only thing we could is make it a higher priority message so that it somehow jumps over whatever else is waiting in the child process' event queue and gets processed first, but that's probably a bad idea.

[1] http://mxr.mozilla.org/mozilla-central/source/gfx/layers/ipc/AsyncPanZoomController.cpp?rev=fb97d6af06af#1780
[2] http://mxr.mozilla.org/mozilla-central/source/layout/ipc/RenderFrameParent.cpp?rev=aae54a60278d#530
[3] http://mxr.mozilla.org/mozilla-central/source/dom/ipc/TabParent.cpp?rev=aae54a60278d#515
[4] http://mxr.mozilla.org/mozilla-central/source/dom/ipc/TabChild.cpp?rev=aae54a60278d#1500
Comment on attachment 8389965 [details] [diff] [review]
Filter duplicate scroll update messages in the APZ

Review of attachment 8389965 [details] [diff] [review]:
-----------------------------------------------------------------

LGTM.

I also took this opportunity to have a look at the patch that introduced scroll generations to begin with, which I hadn't reviewed but have been meaning to. Very nice overall!

::: layout/generic/nsGfxScrollFrame.cpp
@@ +1551,5 @@
>    , mResizerBox(nullptr)
>    , mOuter(aOuter)
>    , mAsyncScroll(nullptr)
>    , mOriginOfLastScroll(nsGkAtoms::other)
> +  , mScrollGeneration(1)

Please add a comment explaining why this starts at 1 instead of 0.
Attachment #8389965 - Flags: review?(botond) → review+
(In reply to Botond Ballo [:botond] from comment #18)
> Please add a comment explaining why this starts at 1 instead of 0.

Done and landed:

https://hg.mozilla.org/integration/mozilla-inbound/rev/8764e9f061d0
https://hg.mozilla.org/mozilla-central/rev/8764e9f061d0
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla30
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #17)

Ok, that's not where I thought the latency was.
Since kats is on PTO, can you please request approval for b2g28 when this is ready? Thanks :)
Flags: needinfo?(tnikkel)
Botond reviewed this, so I'll bump the needinfo over to him.
Flags: needinfo?(tnikkel) → needinfo?(botond)
Comment on attachment 8389965 [details] [diff] [review]
Filter duplicate scroll update messages in the APZ

NOTE: This flag is now for security issues only. Please see https://wiki.mozilla.org/Release_Management/B2G_Landing to better understand the B2G approval process and landings.

[Approval Request Comment]
Bug caused by (feature/regressing bug #): Bug 963278
User impact if declined: Jumps in scrolling while a page is loading
Testing completed: I just tested it locally. Presumably Kats tested it as well.
Risk to taking this patch (and alternatives if risky): It's no more risky than the bug 963278 patch whose regression it fixes, which was approved for uplift in bug 980041. The alternative would be backing out the patches uplifted in bug 980041, which would regress the bugs that those patches fixed.
String or UUID changes made by this patch: None
Attachment #8389965 - Flags: approval-mozilla-b2g28?
Flags: needinfo?(botond)
Attachment #8389965 - Flags: approval-mozilla-b2g28? → approval-mozilla-b2g28+
Doesn't apply cleanly to b2g28.
Flags: needinfo?(bugmail.mozilla)
Attached patch Patch for uplifting to 1.3 (obsolete) — Splinter Review
Rebased the patch to apply to 1.3.
Flags: needinfo?(bugmail.mozilla)
Depends on: 985529
Comment on attachment 8389965 [details] [diff] [review]
Filter duplicate scroll update messages in the APZ

Moving a- - this caused a smoketest regression.
Attachment #8389965 - Flags: approval-mozilla-b2g28+ → approval-mozilla-b2g28-
The regression in bug 985529 happened because there is a case that falls through when this patch is applied.

Say a scroll frame is created, and so sends a scroll update to the APZC instance with scroll generation = 1. This is acknowledged by the APZC. Then, the scroll frame is destroyed and re-created, but is mapped to the same APZC (this happens on occasion). The new scroll frame will again send a scroll update with scroll generation = 1, but this time it will be ignored by the APZC because it already acknowledged an update with that generation. So things go out of sync and stay that way.

The best way I can think of to fix this is to start the ScrollFrameHelper::mScrollGeneration at an always-increasing initial value (using a static variable in nsGfxScrollFrame.cpp). This way when the scroll frame is re-created it will have a higher scroll generation than the old one so the APZC will accept the update.
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #29)
> The best way I can think of to fix this is to start the
> ScrollFrameHelper::mScrollGeneration at an always-increasing initial value
> (using a static variable in nsGfxScrollFrame.cpp). This way when the scroll
> frame is re-created it will have a higher scroll generation than the old one
> so the APZC will accept the update.

Actually even this doesn't work in all cases, because the scroll frame might be scrolled once before being destroyed and re-created in which case it will end up with the same scroll generation with and without the destruction.
Updated version, needs more testing. The only difference is the deletion of the RemoteId in the ScrollFrameHelper destructor - this ensures we use a new APZC if the ScrollFrameHelper is destroyed and re-created, and so avoids the problem where we can't tell if the ScrollFrameHelper is destroyed and re-created.
Attachment #8389965 - Attachment is obsolete: true
Attachment #8393735 - Attachment is obsolete: true
Depends on: 985218
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #31)
> Created attachment 8394409 [details] [diff] [review]
> Filter duplicate scroll update messages in the APZ (v2)
> 
> Updated version, needs more testing. The only difference is the deletion of
> the RemoteId in the ScrollFrameHelper destructor - this ensures we use a new
> APZC if the ScrollFrameHelper is destroyed and re-created, and so avoids the
> problem where we can't tell if the ScrollFrameHelper is destroyed and
> re-created.

I wonder if this risks regressing bug 934420, where we took care to preserve an APZC for a given ScrollableLayerGuid even if its layer was re-created. Do we know whether, in that case, it was just the layer being re-created, and not the scroll frame?
(In reply to Botond Ballo [:botond] from comment #32)
> I wonder if this risks regressing bug 934420, where we took care to preserve
> an APZC for a given ScrollableLayerGuid even if its layer was re-created. Do
> we know whether, in that case, it was just the layer being re-created, and
> not the scroll frame?

That's a good point. I tested the test case attached to that bug on the B2G browser and it still behaves fine with this patch, so that's a good sign. But yeah we'll need to make sure this doesn't regress behaviour before we can uplift it anywere. I might even consider it too risky for uplift to 1.3.
Comment on attachment 8394409 [details] [diff] [review]
Filter duplicate scroll update messages in the APZ (v2)

Review of attachment 8394409 [details] [diff] [review]:
-----------------------------------------------------------------

tn, any objections to the line of code I added in the ScrollFrameHelper destructor?
Attachment #8394409 - Flags: review?(tnikkel)
Attachment #8394409 - Flags: review?(botond)
Botond suggested using a global counter for the scroll generation which also works and is less risky. It solves the problem outlined in comment 29 without incurring the problem in comment 30.
Attachment #8394409 - Attachment is obsolete: true
Attachment #8394409 - Flags: review?(tnikkel)
Attachment #8394409 - Flags: review?(botond)
Attachment #8394883 - Flags: review?(tnikkel)
Attachment #8394883 - Flags: review?(botond)
Attachment #8394883 - Flags: review?(botond) → review+
Another option would be to store the scroll generation on the content node, not sure if that is practically any better though.
I did consider that briefly but figured it would be more expensive to keeping updating a property every time it's incremented.
Blocks: 986247
Attachment #8394883 - Flags: review?(tnikkel) → review+
Thanks!

https://hg.mozilla.org/integration/mozilla-inbound/rev/e5f36dfbb383
Whiteboard: [fxos-bug-bash-1.4] → [fxos-bug-bash-1.4][bake on m-c until april 06]
https://hg.mozilla.org/mozilla-central/rev/e5f36dfbb383
Status: REOPENED → RESOLVED
Closed: 11 years ago11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla31
No longer blocks: 986247
Ready to go with uplift requests? :)
Flags: needinfo?(bugmail.mozilla)
Yup, thanks for reminding me! :)
Flags: needinfo?(bugmail.mozilla)
Whiteboard: [fxos-bug-bash-1.4][bake on m-c until april 06] → [fxos-bug-bash-1.4]
Comment on attachment 8394883 [details] [diff] [review]
Filter duplicate scroll update messages in the APZ (v3)

NOTE: This flag is now for security issues only. Please see https://wiki.mozilla.org/Release_Management/B2G_Landing to better understand the B2G approval process and landings.

[Approval Request Comment]
Bug caused by (feature/regressing bug #): bug 949132
User impact if declined: If the user scrolls the Youtube app as soon as they open it, the screen will jump up and down.
Testing completed (on m-c, etc.): on m-c for a week, no reports of regressions so far
Risk to taking this patch (and alternatives if risky): medium risk since this it's hard to ensure all cases are covered. this code is also not covered by tests so it's easy to regress. Affects B2G and metro.
String or IDL/UUID changes made by this patch: none
Attachment #8394883 - Flags: approval-mozilla-b2g28?
Attachment #8394883 - Flags: approval-mozilla-aurora?
Comment on attachment 8394883 [details] [diff] [review]
Filter duplicate scroll update messages in the APZ (v3)

accepting for aurora so it's in b2g 1.4 - not concerned about risk to metro, as we are not shipping that.
Attachment #8394883 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Flagging this for QA to verify before uplifting it to 1.3 and also some exploratory tests given we have no tests. :kats any other testcases you can through in here other than youtube for QA to do some exploratory testing here.
Keywords: verifyme
It would be good to test the behaviour while on a page that is driving scrolling via script. So create a test page or app that is doing scrollTo calls every half a second or so and scroll around and lock/unlock the device. Make sure the behavious is as expected.
:njpark is helping verify this so passing this onto him.
QA Contact: bzumwalt → npark
created a test javascript autoscroll page based on this script: http://www.brownielocks.com/autoscroll1.html
The page is: http://pageeasy.com/Autoscrolltest/.

The lock/unlock of device while scrolling does not cause any unexpected behavior.  It picks up where it left off when the device was locked, and continues scrolling.  When tries to manually scroll faster or to opposite direction, it stutters a bit, but that is expected according to kats.  (It shows similiar behavior on desktop)  

Tested on this version of 1.4
 Gaia      8dff633372022723e2ebad17fe3c826436b3b258                         │
│ Gecko     https://hg.mozilla.org/releases/mozilla-aurora/rev/bc14179fc49c  │
│ BuildID   20140414000201                                                   │
│ Version   30.0a2 

And this version of Master:
│ Gaia      f3abbd2d0a60f1a1618db93f8b1957cae6de379c                         │
│ Gecko     https://hg.mozilla.org/mozilla-central/rev/215080b813a7          │
│ BuildID   20140414040203                                                   │
│ Version   31.0a1
Comment on attachment 8394883 [details] [diff] [review]
Filter duplicate scroll update messages in the APZ (v3)

Thanks for the help with verification :njpark, looks good to land.
Attachment #8394883 - Flags: approval-mozilla-b2g28? → approval-mozilla-b2g28+
Rebased and landed on b2g28:

https://hg.mozilla.org/releases/mozilla-b2g28_v1_3/rev/ab227cdd984c

Note that as part of rebasing this I basically had to pull in the patch at [1]. I didn't explicitly request b2g28 approval on that because the other patch on that bug was already approved for uplift into 1.3 and the second patch was fairly trivial (we had no reason to uplift it before but probably should have in hindsight).

[1] https://bug967671.bugzilla.mozilla.org/attachment.cgi?id=8375575
Verified on the latest Tarako build

1.3T Environmental Variables:
Device: Tarako 1.3T
BuildID: 20140530014002
Gaia: e68858693b71d917c9c5ee7e215f7ceea04635f7
Gecko: 1945abae19ff
Version: 28.1
Firmware Version: SP6821a-Gonk-4.0-4-29

1.3 tarako: The screen scrolls smoothly without issue on the YouTube app
Verified fixed on following build: 
Environmental Variables:
Device: Flame 2.5
BuildID: 20150924030228
Gaia: 4bb17b24620818cbda0ba0c0d69e0ce3f914e1b7
Gecko: bf2bc1aa78c0b72d9b6b13f7a8c6ae61c60a51dc
Gonk: c4779d6da0f85894b1f78f0351b43f2949e8decd
Version: 44.0a1 (2.5) 
Firmware Version: v18D
User Agent: Mozilla/5.0 (Mobile; rv:44.0) Gecko/44.0 Firefox/44.0

RESULT: 
the menu list of youtube videos within the app scrolls smoothly as expected.
QA Whiteboard: [Qanalysts-Triage?]
Flags: needinfo?(jmercado)
Keywords: regression, verifyme
QA Contact: npark → sleedavid
Status: RESOLVED → VERIFIED
Keywords: regression
QA Whiteboard: [Qanalysts-Triage?] → [Qanalysts-Triage+]
Flags: needinfo?(jmercado)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: