Rough scrolling in GeckoView
Categories
(GeckoView :: General, defect, P1)
Tracking
(firefox66 unaffected, firefox67 unaffected, firefox68 fixed)
Tracking | Status | |
---|---|---|
firefox66 | --- | unaffected |
firefox67 | --- | unaffected |
firefox68 | --- | fixed |
People
(Reporter: rbarker, Assigned: emilio)
References
(Blocks 1 open bug, Regression)
Details
(Keywords: regression, Whiteboard: [geckoview:fxr:p1] [geckoview:fenix:p1] [gvtv:p2])
Attachments
(4 files)
When scrolling in GeckoView, scrolling is now no longer smooth. Bisecting m-c has show https://bugzilla.mozilla.org/show_bug.cgi?id=1535788 to be the cause of the regression. This scrolling regression is particularly noticeable in Firefox Reality where scrolling now causes a strobing effect. Stepping through video it appears that the page no longer smoothly tracks the finger and will miss frames and then jump to catch up.
Assignee | ||
Comment 1•6 years ago
|
||
I'm happy to take a look at this, but FWIW I'm somewhat surprised bug 1535788 regressed something like this (it effectively just moved some object around, though arguably wasn't a small change).
Specially I'm surprised it regressed scrolling performance, given scrolling is asynchronous. Though Geckoview is the only one using some of the stylesheet service features, so... maybe?
Is there a chance to get a profile or some more detailed information?
Reporter | ||
Comment 2•6 years ago
|
||
(In reply to Emilio Cobos Álvarez (:emilio) from comment #1)
I'm happy to take a look at this, but FWIW I'm somewhat surprised bug 1535788 regressed something like this (it effectively just moved some object around, though arguably wasn't a small change).
Specially I'm surprised it regressed scrolling performance, given scrolling is asynchronous. Though Geckoview is the only one using some of the stylesheet service features, so... maybe?
Is there a chance to get a profile or some more detailed information?
I'm trying to capture video to show the issue. I was surprised as well after doing the bisect. To make double sure, I checked out a branch with 1535788 at the head. The issue is 100% reproducible. I then backed it out and the issue is gone. Unfortunately it is very hard to see on a phone. However it is very visible on the Oculus Go headset running Firefox Reality. I would guess one reason it is more noticeable in VR is that the pixel density is 1.
Assignee | ||
Comment 3•6 years ago
|
||
I assume you mean https://hg.mozilla.org/mozilla-central/rev/60c20a0f320c? (I guess there's a bit of a chance of it not building, but the other one surely shouldn't have caused hazards).
Just wanted to double-check, since between the first and the second commit from that bug there's a bunch of other stuff that landed.
Reporter | ||
Comment 4•6 years ago
|
||
This is the commit: https://phabricator.services.mozilla.com/D23903 GeckoView is able to build with that revision as the head and it exhibits the issue. If I back that revision out, the strobing and flickering go away.
Updated•6 years ago
|
Reporter | ||
Comment 5•6 years ago
|
||
This video shows the finger moving and composite happening every other frame.
Reporter | ||
Comment 6•6 years ago
|
||
This shows the finger and composite happening in the same frame.
Reporter | ||
Comment 7•6 years ago
|
||
Note: the videos run at 60Hz. In order to see the issue it is necessary to step through one frame at a time.
Assignee | ||
Comment 8•6 years ago
|
||
Is there any chance to see a profile of that? I don't have any VR hardware handy.
Also, do you know if fxr does something with the stylesheet service or such to inject stylesheets? That's pretty much the only thing that I'd think can affect this.
Reporter | ||
Comment 9•6 years ago
|
||
The page used in the video was https://webxr.today which is the default home page for Firefox Reality.(In reply to Emilio Cobos Álvarez (:emilio) from comment #8)
Is there any chance to see a profile of that? I don't have any VR hardware handy.
I can try and get some profiles tomorrow.
Also, do you know if fxr does something with the stylesheet service or such to inject stylesheets? That's pretty much the only thing that I'd think can affect this.
For this page it does not alter the css or run any content scripts. There are content scripts for youtube.com but they should not be active for the page in the videos.
The page used in the video is https://webxr.today which is the default home page for Firefox Reality.
Reporter | ||
Comment 10•6 years ago
•
|
||
Profile with scrolling regression present: https://perfht.ml/2OYYL3b
Using GeckoView nightly 68.0.20190404063228
Reporter | ||
Comment 11•6 years ago
|
||
Profile with out scrolling regression: https://perfht.ml/2uXFqGC
Using GeckoView nightly 68.0.20190403060632
Reporter | ||
Comment 12•6 years ago
|
||
Profiles were captured on Oculus Go using https://webxr.today webpage.
Updated•6 years ago
|
Comment 13•6 years ago
|
||
Huh, at first glance looking at the marker chart, both of those profiles have significant jank, and it's hard for me to see a significant difference.
Ehsan, anything obvious I'm missing?
Updated•6 years ago
|
Updated•6 years ago
|
Assignee | ||
Comment 14•6 years ago
|
||
Styling time seems comparable in both cases, I don't see any significant difference there either.
Longest rasterization is slower in the profile with the regression. One of them is painting a bunch of gradients, the other one is painting a bunch of shadows. But I don't know how my patch could've potentially caused that, I think it's more likely that the content on-screen was different at the point they were taken...
Also, I thought GeckoView used e10s these days, and the profile claims to have been taken on Fennec, which is weird... Do you know what's up with that?
Comment 15•6 years ago
|
||
(In reply to Bobby Holley (:bholley) from comment #13)
Huh, at first glance looking at the marker chart, both of those profiles have significant jank, and it's hard for me to see a significant difference.
Ehsan, anything obvious I'm missing?
I don't really have a lot of context for this problem, but I think you should start by finding what's interesting to compare in the two profiles. Randall is probably the best person to help walk you through what's happening on the FXR side, but from the existing info on the bug it looks like the problem is that we're running out of frame budget in some cases, so the first step would be to narrow down which parts of the before and after profile would correspond to the interesting times to pay attention to.
Reporter | ||
Comment 16•6 years ago
|
||
(In reply to Emilio Cobos Álvarez (:emilio) from comment #14)
Also, I thought GeckoView used e10s these days, and the profile claims to have been taken on Fennec, which is weird... Do you know what's up with that?
GeckoView can use e10s but e10s is still broken on android for a number of WebVR sites so it is currently disable in FxR. I don't believe the profiler knows how to distinguish between GeckoView based apps and Fennec. I collected the profiles from the no-api version of FxR which runs on a standard phone.
Reporter | ||
Comment 17•6 years ago
|
||
(In reply to :Ehsan Akhgari from comment #15)
(In reply to Bobby Holley (:bholley) from comment #13)
Huh, at first glance looking at the marker chart, both of those profiles have significant jank, and it's hard for me to see a significant difference.
Ehsan, anything obvious I'm missing?
I don't really have a lot of context for this problem, but I think you should start by finding what's interesting to compare in the two profiles. Randall is probably the best person to help walk you through what's happening on the FXR side, but from the existing info on the bug it looks like the problem is that we're running out of frame budget in some cases, so the first step would be to narrow down which parts of the before and after profile would correspond to the interesting times to pay attention to.
So maybe the regression is caused by this patch degrading performance enough that pages already on edge of missing a frame are pushed over the threshold?
Assignee | ||
Comment 18•6 years ago
|
||
(In reply to Randall Barker [:rbarker] from comment #16)
GeckoView can use e10s but e10s is still broken on android for a number of WebVR sites so it is currently disable in FxR. I don't believe the profiler knows how to distinguish between GeckoView based apps and Fennec.
I see, thanks.
I collected the profiles from the no-api version of FxR which runs on a standard phone.
Do you know how can I build that locally? Do I need WebVR hardware to see the regression? I'd like to at least run a FxR build with some logging around the code I touched to confirm nothing silly is going on there.
(In reply to Randall Barker [:rbarker] from comment #17)
So maybe the regression is caused by this patch degrading performance enough that pages already on edge of missing a frame are pushed over the threshold?
Even if that was the case, my patch should be performance-neutral, so I still want to understand what's going on.
Reporter | ||
Comment 19•6 years ago
•
|
||
FxR GitHub repository: https://github.com/MozillaReality/FirefoxReality
The README has instructions for building with a local build of GeckoView. If you select the no-api build variant in AndroidStudio you can run it on a regular phone with out any special VR hardware.
The regression is harder to see on a regular phone than in VR hardware, but the video I attached to this bug was captured from the no-api version so it does happen there.
It's probably more noticeable on the Oculus Go because the hardware is so under powered.
Updated•6 years ago
|
Comment 20•6 years ago
|
||
Emilio, what are the next steps for this bug? Randall has STR, a regression range, and a video demonstrating the frame drops. We can get you an Oculus Go headset or a low-end Android phone if that will help with debugging.
Can we back out style set bug 1535788 in the meantime? Randall says the bug 1535788 can no longer be backed out cleanly because other style changes since landed.
Assignee | ||
Comment 21•6 years ago
|
||
(In reply to Chris Peterson [:cpeterson] from comment #20)
Emilio, what are the next steps for this bug?
It's on my list of things to do getting a FxR no-api build and try to repro / investigate the regression.
We can get you an Oculus Go headset or a low-end Android phone if that will help with debugging.
According to Randall it should repro on a regular phone, so I'll try to repro on my phone first, but I'll shout if I can't figure out.
Can we back out style set bug 1535788 in the meantime? Randall says the bug 1535788 can no longer be backed out cleanly because other style changes since landed.
I'd rather not back it out. It's not only a big change, but it's also a necessary step to fix some important compat issues like bug 1490401 (which is probably what causes it not to be rotten).
Comment 22•6 years ago
|
||
I think I can reproduce this with a simple geckoview_example build on the HD8 (no FxR required).
I'll try to find a moment tomorrow to take a look.
Assignee | ||
Comment 23•6 years ago
|
||
So, to put here some of what we've been debugging over IRC, so I don't forget...
- This reproduces in GVE.
- We came up with a consistent way to "see" the regression (via
layers.acceleration.draw-fps
). In the "bad" build the "[CC] Build" time is consistently larger (in the "good" build it's always < 2ms). - The regression only happens on the first session. Anything that causes the session to be re-created (switching to private mode, toggling e10s...) makes the bug go away.
- I thought that it was going to be because of this code calling into widget/ too early during startup, but it turns out not to be the case.
So this is likely some pre-existing bug that my patch happened to somehow trigger. But I still need to figure out what exactly and how.
Assignee | ||
Comment 24•6 years ago
|
||
Assignee | ||
Comment 25•6 years ago
|
||
Those profiles show the difference clearly, and they're from the same build.
In the "bad" case, we spend a lot of time waiting on the graphics driver. In the "good" case, we don't.
But I don't know yet why... Or how this relates to my change at all.
Assignee | ||
Comment 26•6 years ago
|
||
I'll figure this out no matter what, this is so bizarre :)
Assignee | ||
Comment 28•6 years ago
|
||
Just for reference, some things I've tried to make the lifetime of the styleset the same as before, and that have not fixed it, and other quite suspicious bits that I tried to remove as well:
https://hg.mozilla.org/try/rev/d8af7fda056c293fdec0463b0673d1ca350eee3b
https://hg.mozilla.org/try/rev/8544dd93226be8589c3ccb7b2b32ee540497cad4
https://hg.mozilla.org/try/rev/e1ba6a0d41ab2184bdef86fa5b7e5002f5af83df
https://hg.mozilla.org/try/rev/37bf11e004b8821f842d72e8054071c4a1e0449a
https://hg.mozilla.org/try/rev/7b6dd59216dca6ce3e8de94ea19596a9ae2c2243
https://hg.mozilla.org/try/rev/1b5969476b61810c53488dfd5408f92336cd30f9
https://hg.mozilla.org/try/rev/ca9763003065855755e568aaddc4688cb0ef9f80
So still out of ideas... Will try to re-implement the bad revision on top of the good revision.
Comment 29•6 years ago
|
||
(In reply to Randall Barker [:rbarker] from comment #10)
Profile with scrolling regression present: https://perfht.ml/2OYYL3b
Using GeckoView nightly 68.0.20190404063228
(In reply to Randall Barker [:rbarker] from comment #11)
Profile with out scrolling regression: https://perfht.ml/2uXFqGC
Using GeckoView nightly 68.0.20190403060632
The marker chart shows a pretty clear difference in the timing of the composites.
Bad: https://perfht.ml/2UfXtlw Good: https://perfht.ml/2UfXyFQ
In the "good" profile, there is one composite per Vsync tick, which is what we would expect. 1 1 1 1
In the "bad" profile, on every other vsync tick, there are two composites! 1 2 1 2
I don't know what could cause those extra composites. Extra composites can definitely cause stuttering.
The other difference I can see is in the types of DOM events that get triggered on the main thread:
In the "good" profile, the types of DOM events on the main thread are "scroll", "scrollend" and "mozvisualscroll". https://perfht.ml/2Ug7SNY
In the "bad" profile, in addition to those events, we also seem to fire "touchmove" events. https://perfht.ml/2UgyGxw
Maybe the touchmove events cause extra composites somehow? And maybe those are only dispatched because of emilio's change?
Assignee | ||
Comment 30•6 years ago
|
||
(I think the profiles in comment 24 are easier to analyze, since they are from the same build, and repeating the same action, which is just scrolling about:support
).
Assignee | ||
Comment 31•6 years ago
|
||
I manually bisected my patch, by writing it from scratch in smaller, potentially slightly broken but not too broken pieces, this is the piece that introduces the regression: https://hg.mozilla.org/try/rev/b4f0258b9ae0b3c55cfa8746b03c7b6a7ba3c07e
How, I don't know yet. I did notice yesterday while I was looking at the patch that that changed the order on which quirks.css is applied, but fixing that still doesn't fix the regression here, so not sure what's going on just yet, but at least I have it narrowed down...
Assignee | ||
Comment 32•6 years ago
|
||
Hmm, also, this affects XUL docs, which don't even have quirks mode. So the only thing in that patch that gets lost is the EnsureStyleFlush() call. If that fixes it I'd be sad :(
Assignee | ||
Comment 33•6 years ago
|
||
And bingo...
Assignee | ||
Comment 34•6 years ago
|
||
Sorry for the vague commit message, but I haven't dug yet on why or how the
Android code is depending on this.
This call used to be part of nsPresContext::CompatibilityModeChanged, which
unconditionally called PresShell::EnsureStyleFlush.
This was not (in theory, at least) always necessary. There's there's no point in
ensuring a style flush is going to happen if styles haven't changed, and
CompatibilityModeChanged() didn't actually guarantee that the compat mode was
different at all before my patch.
Styles only change if the compat mode actually changes (since then selectors
become case-sensitive or case-insensitive), or more obviously when you insert or
remove the quirks.css stylesheet, and in that case ApplicableStylesChanged makes
sure that the flush happens.
Yet here we are, and not having that early call to EnsureStyleFlush, even in the
case there's no quirks mode or quirks sheet change or anything of that sort
(this happens even on XUL docs, which are always FullStandards) makes the first
(and only the first) browsing session in Geckoview have terrible scrolling
performance.
I'm calling it a day for today, will investigate as time permits in bug 1544185.
Assignee | ||
Comment 35•6 years ago
|
||
Following the reasoning from the previous commit. Selectors may become case
sensitive or case insensitive as a result of this operation, something I forgot
about when writing the patch for bug 1535788.
Now Document::CompatibilityModeChanged is actually only called if the
compatibility mode actually changes, and thus we always need to restyle.
The Stylist already takes care of fully invalidating the document when it
changes (via mark_origins_dirty and co.), so we technically just need to
guarantee that a style flush will happen.
In practice I doubt this call will have any effect in practice given how early
in the document's lifetime the compatibility mode can change, but still I think
it's worth landing.
Depends on D27416
Assignee | ||
Comment 36•6 years ago
|
||
Can you confirm that just applying the first patch fixes the issue? The second one is a bit drive-by, while I reasoned about the shape of the code before my patch.
Comment 37•6 years ago
|
||
Comment 38•6 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/3b45bccf61cf
https://hg.mozilla.org/mozilla-central/rev/6ff81d173403
Reporter | ||
Comment 39•6 years ago
|
||
Can confirm it fixes the issue in FxR. Thanks!
Updated•3 years ago
|
Description
•