Closed Bug 1724755 Opened 3 years ago Closed 9 months ago

[Bug]: Fenix drops scroll events on polygon.com with pull-to-refresh enabled

Categories

(Fenix :: General, defect, P2)

Unspecified
Android
defect

Tracking

(firefox123 verified)

VERIFIED FIXED
Tracking Status
firefox123 --- verified

People

(Reporter: kbrosnan, Unassigned)

References

()

Details

Attachments

(1 file)

From github: https://github.com/mozilla-mobile/fenix/issues/20659.

Steps to reproduce

  1. with pull to refresh enabled…
  2. Visit https://www.polygon.com/22526689/halo-infinite-release-date-gameplay-xbox-e3-2021 (it may work on other pages)
  3. After ads load part way down the page…
  4. Scroll up and down

Expected behaviour

Scrolling works smoothly

Actual behaviour

After scrolling down, the first scroll up event will frequently be dropped. Sometimes, you'll see the pull to refresh icon drop from the top partially but it never triggers to actually reload the page. This effect is exacerbated by playing the video content. I'm unable to reproduce this behavior with Pull to refresh disabled.

Device name

Pixel 2

Android version

Android 11

Firefox release type

Firefox Nightly

Firefox version

92.0a1 (2021-08-03)

Device logs

No response

Additional information

Video recording of the issue (with both pull-to-refresh enabled and disabled): https://user-images.githubusercontent.com/759372/128097205-d1dd668b-dc76-4758-910a-e3a2cf038ebc.mp4

Change performed by the Move to Bugzilla add-on.

Whiteboard: [apz-android-needstriage]

I experience the same issue on many websites.

I can reproduce this, intermittently but fairly often, on the mentioned page.

Not quite an S2 (a workaround exists, namely if you try scrolling again it will work), but I agree it's quite annoying and worth investigating, hence marking it a P2.

It's not clear to me whether the issue is on the Fenix side of the pull-to-refresh implementation or the GeckoView side, but we can start the investigation here.

Severity: -- → S3
Priority: -- → P2
Whiteboard: [apz-android-needstriage]

(In reply to Matt Cosentino from comment #1)

I experience the same issue on many websites.

Matt, if you have other example sites you are able to provide, that would potentially be helpful (e.g. if we need to investigate the page's Javascript code, that might be easier on some sites where it's less obfuscated).

It has been reported to GitHub that this problem occurs on Google Search when Google Search Fixer is installed and some pages of GitHub.
https://github.com/mozilla-mobile/fenix/issues/21536
https://github.com/mozilla-mobile/fenix/issues/25964
I have confirmed that it also occurs on Yahoo! JAPAN and Daily Portal Z.

Blocks: 1807071

(In reply to Botond Ballo [:botond] from comment #3)

(In reply to Matt Cosentino from comment #1)

I experience the same issue on many websites.

Matt, if you have other example sites you are able to provide, that would potentially be helpful (e.g. if we need to investigate the page's Javascript code, that might be easier on some sites where it's less obfuscated).

See also
amazon.com
amazon.co.uk
gsmarena.com
reuters.com/world

In all cases, on Nightly,
Enable pull to refresh

  1. fling upwards (content moves upwards & inertia scrolls)
  2. wait for the inertia scroll to completely stop
  3. drag finger downwards
    -> content does not pan, INCORRECT, but toolbar does animate into view CORRECT
  4. repeat drag downwards
    -> content pans only on every second drag , INCORRECT

disable pull to refresh and content now pans correctly

Correction, it's any article on m.gsmarena.com not the main page, eg
m.gsmarena.com/motoroa_moto_g72-review-2517.php has the issue

In the case of gsmarena.com, the site does something weird. There's a passive event listener for touchmove doing preventDefault(). I am not confident it does prevent from scrolling though.

draggable: function (e, t, n) {
(snip)
 r= !1,
(snip)
document.addEventListener('touchmove', function (e) {
      r && (e.preventDefault(), c({
        screenX: e.touches[0].clientX,
        screenY: e.touches[0].clientY
      }))
    }, {
      passive: !1
    }),
~~~

I have confirmed that adding ##+js (addEventListener-defuser.js) in uBlock Origin's MyFilter for the site where the problem occurs will not cause this issue and will allow scrolling (but the site will not work properly).

I am almost 100% sure that this is not an issue in Gecko, not an issue in GeckoView either. This is an issue in android-component.

I dumped InputResultDetail values in Gecko side, in nsWindow.cpp. When the problem happens Gecko returns the proper value, handled: handled-by-root, scrollable: [top, bottom], overscroll: either.

I also dumped InputResultDetail values in android-component, in SwipeRefreshFeature.kt the place where we tell the content can be scrolled upward or not for pull-to-refresh. When the problem happens I got "Input with unknown handling" which means the InputResultDetail hasn't been properly set. It was just initialized, well, I would say it's uninitialized state.

GeckoView.onTouchEventForDetailResult returns GeckoResult<InputResultDetail>, which means the InputResultDetail will be resolved asynchronously via GeckoResult<>. So android-component needs to defer to tell whether the content can be scrolled or not.

Moving into Fenix:General since I could find any bugzilla component corresponding to android-component.

Component: Panning and Zooming → General
Product: Core → Fenix

I've confirmed that a naive way, just adding SCROLL_DIRECTIONS_UNKNOWN = -1 and initializing InputResultDetail with the value here solves the issue, FWIW.

See Also: → 1807073
See Also: → 1817330

Could not reproduce the issue on https://www.polygon.com/22526689/halo-infinite-release-date-gameplay-xbox-e3-2021.
But could reproduce it for any article from gsmarena.com

The issue to me seems to be rather the delay with which APZ->GeckoView responds with the InputResultDetail.
While on most of the websites I see <50ms on gsmarena I see >100ms.
This difference seems to be sufficient for the app to start the pull to refresh functionality which will be cancelled after ~100ms but this seems to indeed consume the gesture.

Using a default of "SCROLL_DIRECTIONS_UNKNOWN = -1" as suggested in https://bugzilla.mozilla.org/show_bug.cgi?id=1724755#c10 will indeed help this exact scenario - as this will effectively disable pull to refresh until we receive the InputResultDetail from GeckoView but this will break the overall pull to refresh functionality.
The way it works right now: the platform's SwipeRefreshLayout needs a complete stream of events from ACTION_DOWN forward. If we don't pass this in or if we block it from starting pull to refresh it cannot be started anymore for the current gesture.
So a solution here would still have to come from APZ - faster inference of how the gesture is handled to pass this on to pull to refresh inside the actionable window - the way it happens for most of the websites.
As such I still think this is an issue for APZ to solve, not Fenix. The issue depends on the website which Fenix doesn't really care about.
I'd also like to remind about my proposal of rethinking the whole gesture support from https://github.com/mozilla-mobile/android-components/pull/12198 which should help fix the issues described here.

Flags: needinfo?(hikezoe.birchill)
Flags: needinfo?(botond)

(In reply to Petru-Mugurel Lingurar [:petru] from comment #11)

The issue to me seems to be rather the delay with which APZ->GeckoView responds with the InputResultDetail.
While on most of the websites I see <50ms on gsmarena I see >100ms.

It is expected that the time it takes to determine the InputResultDetail value varies from one site to another.

For this particular site (m.gsmarena.com), the reason it takes longer than many other sites is that the page has a non-passive touchmove event listener, as Hiro described in comment 7. This means the page may want to handle some touch gestures itself in the page's JS, in which case it calls preventDefault() on the first touchmove event, and in that case, no browser default actions (including no scrolling, and no pull-to-refresh) should happen.

This means we need to wait for the page to receive the touchstart and first touchmove events, let it run its event listeners, see whether it has called preventDefault(), and factor that into the InputResultDetail (if the page called preventDefault(), the handledResult value is set to INPUT_RESULT_HANDLED_CONTENT), and only then can we provide the InputResultDetail to GeckoView.

The time required for all this is, to a significant extent, outside of our control: if the page has a slow event listener, it slows down the response time, and if the page does other heavy JS processing (such that e.g. the touchmove event is sitting in the event queue while the page is running another expensive operation), that also slows down the response time.

We do put a time limit on how much time the page has to run the event listener and produce a response, but that time limit is 600ms on mobile. This has been chosen experimentally in bug 1247280 comment 2 to achieve a 99% correctness rate -- that is, we found 600ms was sufficient for the page to complete running its event listener in 99% of cases, and not sufficient in 1% of cases (in those 1% of cases, we may perform a default browser action such as scrolling even if the page intended to handle the gesture itself).

While we may be able to consider slight revisions to the time limit, getting it to be <100ms or <50ms is not realistic (we would do the wrong thing, e.g. scroll when we shouldn't, way too often).

Therefore, the application layer needs to gracefully handle cases where the response (InputResultDetail) takes up to 600ms to arrive.

If I'm reading the code around here correctly, it seems like inputResultDetail starts off with a default value of InputResultDetail.newInstance(true), and then is assigned the actual value once the GeckoResult promise is resolved. So, the variable spends a period of time being set to a default value that does not reflect the actual value from Gecko yet. I think the code that reads inputResultDetail needs to take care to avoid reading this default value -- like maybe the value should be null during this period, and code that reads it should check for null and treat it as "we don't know the answer yet".

The way it works right now: the platform's SwipeRefreshLayout needs a complete stream of events from ACTION_DOWN forward. If we don't pass this in or if we block it from starting pull to refresh it cannot be started anymore for the current gesture.

Can we save the events that arrive before we know the answer in a queue, and "replay" them once we know the answer?

This is what we do for scrolling inside APZ: we keep the touch events in a queue until the preventDefault() answer arrives, and then process them (to cause scrolling) or drop them (if they should not cause scrolling).

Flags: needinfo?(petru.lingurar)
Flags: needinfo?(hikezoe.birchill)
Flags: needinfo?(botond)

(In reply to Petru-Mugurel Lingurar [:petru] from comment #11)

I'd also like to remind about my proposal of rethinking the whole gesture support from https://github.com/mozilla-mobile/android-components/pull/12198 which should help fix the issues described here.

I'm eager to have a discussion about this, but as I mentioned previously (e.g. in bug 1785754 comment 2) we need someone familiar with the GeckoView layer to help us understand this as well.

I mentioned this to Amedyne and Bryan at the Hawaii All Hands, and more recently at a meeting a few weeks ago; if you can also put in a good word to help make such a discussion happen sooner, that would be appreciated.

Alternatively, if you're familiar enough with the GeckoView layer yourself, and would be able to expand on the RFC to outline its implications on the API between Gecko and GeckoView, perhaps we could make progress that way.

Flags: needinfo?(petru.lingurar)
Flags: needinfo?(petru.lingurar)

(In reply to Botond Ballo [:botond] from comment #12)

(In reply to Petru-Mugurel Lingurar [:petru] from comment #11)

The issue to me seems to be rather the delay with which APZ->GeckoView responds with the InputResultDetail.
While on most of the websites I see <50ms on gsmarena I see >100ms.

It is expected that the time it takes to determine the InputResultDetail value varies from one site to another.
...

While we may be able to consider slight revisions to the time limit, getting it to be <100ms or <50ms is not realistic (we would do the wrong thing, e.g. scroll when we shouldn't, way too often).

Agree

Therefore, the application layer needs to gracefully handle cases where the response (InputResultDetail) takes up to 600ms to arrive.
Can we save the events that arrive before we know the answer in a queue, and "replay" them once we know the answer?

This is what we do for scrolling inside APZ: we keep the touch events in a queue until the preventDefault() answer arrives, and then process them (to cause scrolling) or drop them (if they should not cause scrolling).

Cannot say for sure now. Have to investigate more.


(In reply to Botond Ballo [:botond] from comment #13)

I mentioned this to Amedyne and Bryan at the Hawaii All Hands, and more recently at a meeting a few weeks ago; if you can also put in a good word to help make such a discussion happen sooner, that would be appreciated.

Amedyne is aware of the current hurdles and complexity and Jonathan is also now investigating the overall status.

See Also: → 1815732

Since we already have a wrapper over platform's pull to refresh functionality - VerticalSwipeRefreshLayout I looked into caching here the MotionEvents we receive in onInterceptTouchEvent.
While we could do a deep copy of the said MotionEvents and also update the eventTime and downTime properties and then feed these to the platform's SwipeRefreshLayout onInterceptTouchEvent it seems like those events will not actually trigger the pull to refresh functionality.
May be because the platform somehow prevents ghost touches as indicated here [1] but I haven't been able to find out more. If that would be the case then we'd need to have our own complete implementation of a SwipeRefreshLayout that would update the UI maybe on even simpler APIs than MotionEvents.
That would require a discussion about whether we want to invest more into this approach or wait until the other discussed API driven entirely by GeckoView is ready.

[1] https://stackoverflow.com/a/5240821/4249825

Flags: needinfo?(petru.lingurar)

(In reply to Petru-Mugurel Lingurar [:petru] from comment #15)

That would require a discussion about whether we want to invest more into this approach or wait until the other discussed API driven entirely by GeckoView is ready.

I wonder if the API driven by GeckoView might have the same problem. In between the MotionEvent being received from the OS, and the later time when GeckoView invokes the API with the information needed for the application to handle the event, doesn't the MotionEvent still need to be remembered / stored somewhere?

In the proposed new API we would not use MotionEvents anymore from which to then infer the gesture and distance.
If GeckoView would send us the distances then we would use that to animate the toolbar or the pull to refresh throbber to a new position without caring about the user gesture.
I tested the delay scenarion with the newly proposed api and added here a video with the following code

 Handler(Looper.getMainLooper()).postDelayed ({
 ...            
 }, 3000)

wrapping the nestedScrollDispatcher?.coroutineScope?.launch calls from NestedGeckoView.

The visible result in the app shows that the new implementation would automatically support any delay from GeckoView informing about the scroll/overscroll distances.
The proposed approach in the RFC uses a now deprecated API for pull to refresh similar to the one in the View based system which shows the same bug - a scroll in GeckoView is synced to animating the pull to refresh throbber. Seems like in the meantime Google is migrating to a similar API based on distances and a such API is already available through androidx.compose.material.pullrefresh which uses a PullRefreshState which allows specifying the progress. Have not tested but I'm confident that with this we would get the same behavior as for the toolbar, being able to translate the throbber to any position (progress) without caring about a user gesture or delays in handling touches.

(In reply to Petru-Mugurel Lingurar [:petru] from comment #17)

In the proposed new API we would not use MotionEvents anymore from which to then infer the gesture and distance.

Does this mean it wouldn't use SwipeRefreshLayout?

I'm just wondering why this change can't be made independently of the larger refactoring where GeckoView sends information about the distances.

What I understood from comment 15 is that there is an issue with trying to hold MotionEvents in a queue and replay them later, and from comment 17 that there is an alternative way to drive the swipe-to-refresh animation based on distances that doesn't involve MotionEvents.

So, I'm wondering: in the shorter term, could we still keep the A-C logic to infer the distance from the MotionEvent, put that information into a queue (instead of the MotionEvent itself), and dequeue it when we get the InputResultDetail from GeckoView? And then, once the larger refactoring to have GeckoView provide the distance information is in place, we can switch to consuming the distances from that?

Apologies if I'm overlooking something obvious in making this suggestion.

androidx.compose.material.pullrefresh which would use distance data instead of MotionEvents is an API available for Compose. The app has basic support for this, which I based my RFC on while in production it is using the old xml based layouts and the older SwipeRefreshLayout.
With the long term goal for the app being to be migrated to Compose I suggest migrating to compose and androidx.compose.material.pullrefresh at the same time with migrating to a new distances based API from GeckoView.
If that effort is deemed to take a longer time then we can also implement our own throbber and animate it "manually" based on cached data (distances) from MotionEvents but with not actually sending the MotionEvents to the SwipeRefreshLayout. While this doesn't seem to be supported now in official libraries there may be some 3rd party ones available or we could implement our own.

One interesting data point may be to check which of these APIs Chrome uses on Android. Presumably they've needed to overcome similar challenges to ours.

See Also: → 1830805
See Also: → 1831349
See Also: → 1832880
See Also: → 1836298

There has been no movement for several months, but have there been any new developments on this issue?
The pull to refresh feature should be deprecated as an incomplete feature until this issue is resolved, as it is occurring on a large number of sites and is interfering with a pleasant web experience in Firefox.

See Also: 1831349

I have checked the latest Beta and Nightly versions and this bug no longer occurs on each site where it used to occur, so I assume it has recently been fixed.
Can anyone confirm further information?

Probably bug 1847305 fixed this case too.

I agree, requesting QA verification based on the comment above.

Flags: qe-verify+

Hi,
Verified with the page given in the description.
While reloading the page takes more time than usual, the refresh is completed and the page scrolls as expected.

Devices used: Samsung Galaxy S23 Ultra and Google Pixel 7 (both with Android 14).

Marking the ticket as verified.

Status: NEW → RESOLVED
Closed: 9 months ago
Flags: qe-verify+
Resolution: --- → FIXED
Status: RESOLVED → VERIFIED

Great to hear that! Thanks!

You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: