Closed Bug 1288751 Opened 8 years ago Closed 8 years ago

[Win 10] Scrolling does not work on recent Windows 10 Insider Preview Builds

Categories

(Firefox :: General, defect)

48 Branch
All
Windows 10
defect
Not set
normal

Tracking

()

VERIFIED FIXED
Tracking Status
e10s ? ---
firefox48 blocking verified
firefox49 --- verified
firefox50 --- unaffected

People

(Reporter: avaida, Assigned: jimm)

References

Details

(Keywords: qablocker, regression)

[Affected versions]:
- latest Aurora 49.0a2
- latest Beta 48.0b10

[Affected platforms]:
- Windows 10 x64 Insider Preview Build 14393
- Windows 10 x64 Insider Preview Build 14390

[Steps to reproduce]:
1. Launch Firefox with a new profile.
2. Go to a webpage and start scrolling up and down using your mouse wheel or trackpad.

[Expected result]:
- Scrolling via mouse wheel or trackpad is working on the webpage.

[Actual result]:
- Any attempt to scroll the page using the mouse wheel or trackpad has no effect.

[Regression range]:
m-a:
- Last good revision: 63be002b4a803df1122823841ef7633b7561d873
- First bad revision: 829d3be6ba648b838ee1953fdfa1a477dace752f
Pushlog: https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=63be002b4a803df1122823841ef7633b7561d873&tochange=829d3be6ba648b838ee1953fdfa1a477dace752f

m-b:
- Reproducible starting with 48.0b1.

[Additional notes]:
- The issue is reproducible only on touch devices with wired mouse, I've tested with MS Surface Pro 2 and Dell XPS 12. Fx50 doesn't seem to be affected. Stable releases of Windows 10 don't seem to be affected.
- I was able to reproduce this issue either on the first or second profile created.
- If you can't reproduce the issue from the first profile please create a 2nd or a 3nd profile and try again.

[Tracking Requested - why for this release]:
User-facing bug affecting fast-ring Insider Preview builds of Windows 10.
Guys, can you help with this? Thanks
Flags: needinfo?(jaws)
Flags: needinfo?(gijskruitbosch+bugs)
Flags: needinfo?(dolske)
This is quite critical, no need to explain why...
There are a number of scrolling changes in the regression range, most of which involve :kats or :botond, so it seems like they will likely be better-placed to look into this than we will be - it's a core issue rather than a Firefox frontend issue.
Flags: needinfo?(jaws)
Flags: needinfo?(gijskruitbosch+bugs)
Flags: needinfo?(dolske)
Flags: needinfo?(bugmail)
Flags: needinfo?(botond)
What fixed this for 50, as the report says it's not affected? Have you verified this is unrelated to whether e10s is on/off?
Flags: needinfo?(andrei.vaida)
(In reply to :Gijs Kruitbosch from comment #4)
> What fixed this for 50, as the report says it's not affected? Have you
> verified this is unrelated to whether e10s is on/off?

I couldn't pin point a reason for which 50 is unaffected.

This bug is _not reproducible with e10s disabled_. I've used a user.js file with all the prefs associated to e10s set to false and on any new profile I've created with 48.0b10-build1, scrolling worked fine.
Flags: needinfo?(andrei.vaida)
Keywords: qablocker
e10s related, ni more people so that they are aware.
Flags: needinfo?(jmathies)
Flags: needinfo?(elancaster)
This spans a month, is there any way we can shrink this range down a bit? Alternatively, can we find the patch that fixed it in nightly?

https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=63be002b4a803df1122823841ef7633b7561d873&tochange=829d3be6ba648b838ee1953fdfa1a477dace752f
Flags: needinfo?(jmathies) → needinfo?(andrei.vaida)
Yes, please narrow down the regression range. This bug has been reported before a couple of times, see related.
Flags: needinfo?(bugmail)
See Also: → 1282846, 1233795
Also just to be clear, based on the behavior described here and in the related bugs I'm fairly sure the problem is related to the way we disable e10s based on detecting accessibility tools. The disabling probably happens in such a way that APZ is left in an inconsistent state and so scrolling doesn't work. When the browser is restarted the state is fully consistent again (with e10s disabled) and everything works fine.
I ran mozregression once again using the --find-fix parameter on the following range:
  - good: 2016-07-24
  - bad: 2016-06-06
but I'm still skeptical about the results. Isolating a regression range for this bug has been an issue from the beginning, I'm afraid there's not much else I can do.


mozilla-central
===============
* First good revision: d700dc054751333e0735f975fce3d3adf153c62a (2016-06-30)
* Last bad revision: e45890951ce77c3df05575bd54072b9f300d77b0 (2016-06-29)
* Pushlog:
https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=e45890951ce77c3df05575bd54072b9f300d77b0&tochange=d700dc054751333e0735f975fce3d3adf153c62a

mozilla-inbound
===============
* First good revision: d700dc054751333e0735f975fce3d3adf153c62a
* Last bad revision: 0edb9df3c97ff7d205a99f8af607f99a858f140b
* Pushlog:
https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=0edb9df3c97ff7d205a99f8af607f99a858f140b&tochange=d700dc054751333e0735f975fce3d3adf153c62a

fx-team
=======
* First good revision: 56fc33e6b14eb446916cdf28edee82df4c548082
* Last bad revision: 78fe61c78057529235d21d80832f6c01e868129c
* Pushlog:
https://hg.mozilla.org/integration/fx-team/pushloghtml?fromchange=78fe61c78057529235d21d80832f6c01e868129c&tochange=56fc33e6b14eb446916cdf28edee82df4c548082
* Looks like the following bug has the changes which introduced the fix:
https://bugzilla.mozilla.org/show_bug.cgi?id=1280381
Flags: needinfo?(andrei.vaida)
Hey Andrei, could you please confirm comment 9? (A restart of the browser after running into the issue fixes it.)
Flags: needinfo?(andrei.vaida)
(In reply to Jim Mathies [:jimm] from comment #11)
> Hey Andrei, could you please confirm comment 9? (A restart of the browser
> after running into the issue fixes it.)
Sure. Answers below. 

(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #9)
> Also just to be clear, based on the behavior described here and in the
> related bugs I'm fairly sure the problem is related to the way we disable
> e10s based on detecting accessibility tools. The disabling probably happens
> in such a way that APZ is left in an inconsistent state and so scrolling
> doesn't work.
I'm not sure if I got this right, but bear with me: you're saying that this bug is reproducible when e10s is disabled due to accessibility reasons. This is incorrect, because the bug is only reproducible while e10s is enabled.

Note that you'll continue to reproduce this bug no matter how many times you restart the browser, unless you disable e10s.

> When the browser is restarted the state is fully consistent
> again (with e10s disabled) and everything works fine.
There's no need for a restart if e10s is disabled to start with. Per Comment 5, I've created new profiles and started Firefox with e10s forcefully disabled (via user.js file) and scrolling worked perfectly fine on first run.
Flags: needinfo?(andrei.vaida)
(In reply to Andrei Vaida, QA [:avaida] – please ni? me from comment #12)
> (In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #9)
> > Also just to be clear, based on the behavior described here and in the
> > related bugs I'm fairly sure the problem is related to the way we disable
> > e10s based on detecting accessibility tools. The disabling probably happens
> > in such a way that APZ is left in an inconsistent state and so scrolling
> > doesn't work.
> I'm not sure if I got this right, but bear with me: you're saying that this
> bug is reproducible when e10s is disabled due to accessibility reasons. This
> is incorrect, because the bug is only reproducible while e10s is enabled.

Not quite what I was saying. On a first-run with the default settings, Firefox will start up with e10s *enabled*. However, during this first-run, it detects that accessibility tools are running (because the laptop is touch-enabled), and so sets a flag that disables e10s for future runs. So the first time you run Firefox, it will run with e10s enabled, but then if you close and restart firefox without changing anything, it will then run with e10s disabled for accessibility. If you are manually overriding this behaviour by setting prefs then obviously the prefs will take precedence. My experience with the related bugs was that on that first run, wheel scrolling doesn't work, but after restarting Firefox it works fine.

> Note that you'll continue to reproduce this bug no matter how many times you
> restart the browser, unless you disable e10s.

So it sounds like from what you're saying that e10s remains enabled even after a restart, and wheel scrolling remains broken. If that's the case - it means that the behaviour on Windows 10 Insider Preview is different from regular Windows 10, in that e10s doesn't get disabled because of accessibility. So maybe on Windows 10 Insider Preview, the fact that it's a touch-enabled laptop doesn't trigger our accessibility detection code?

> > When the browser is restarted the state is fully consistent
> > again (with e10s disabled) and everything works fine.
> There's no need for a restart if e10s is disabled to start with. Per Comment
> 5, I've created new profiles and started Firefox with e10s forcefully
> disabled (via user.js file) and scrolling worked perfectly fine on first run.

This part makes sense.
Ok, lets try to confirm this.  Andrei, could you please try these steps:

1) fresh profile, launch browser, open a page
2) try to wheel scroll
3) close browser
4) reuse previous profile, launch browser
5) try to wheel scroll

Results?
Flags: needinfo?(andrei.vaida)
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #13)
> Not quite what I was saying. On a first-run with the default settings,
> Firefox will start up with e10s *enabled*. However, during this first-run,
> it detects that accessibility tools are running (because the laptop is
> touch-enabled), and so sets a flag that disables e10s for future runs. So
> the first time you run Firefox, it will run with e10s enabled, but then if
> you close and restart firefox without changing anything, it will then run
> with e10s disabled for accessibility. If you are manually overriding this
> behaviour by setting prefs then obviously the prefs will take precedence. My
> experience with the related bugs was that on that first run, wheel scrolling
> doesn't work, but after restarting Firefox it works fine.
> 
> > Note that you'll continue to reproduce this bug no matter how many times you
> > restart the browser, unless you disable e10s.
> 
> So it sounds like from what you're saying that e10s remains enabled even
> after a restart, and wheel scrolling remains broken. If that's the case - it
> means that the behaviour on Windows 10 Insider Preview is different from
> regular Windows 10, in that e10s doesn't get disabled because of
> accessibility. So maybe on Windows 10 Insider Preview, the fact that it's a
> touch-enabled laptop doesn't trigger our accessibility detection code?

Argh... yeah, my initial assessment on this matter was wrong.

After restarting the browser, scrolling works properly, which means that e10s is successfully disabled as it detects accessibility tools, per Kartikaya's description in Comment 9.

Sorry for the back and forth. That being said, I can confirm that Comment 9 and Comment 11 are 100% accurate: restarting the browser fixes this issue.
Flags: needinfo?(andrei.vaida)
Is requiring the user to restart browser good enough for 48? Are there changes we can make to improve the experience for 49?
Flags: needinfo?(elancaster)
So... on 50/nightly, does this really not reproduce (anymore), or is it just similar to the others (broken on first run, working after a restart) ?
Flags: needinfo?(botond) → needinfo?(andrei.vaida)
(In reply to Erin Lancaster [:elan] from comment #16)
> Is requiring the user to restart browser good enough for 48?

Yes product made the decision on this. 

> Are there
> changes we can make to improve the experience for 49?

Yeah, broken wheel scroll seems pretty bad. Generally we prompt telling people to restart when we detect accessibility clients. However apparently if the request is made really early in startup the prompt doesn't show up. Touchscreen users with fresh profiles get hit by this. Note the code that adds prefs indicating a11y was recently in use landed back in Fx45. So existing users with touch screens have a "I've recently used a11y" pref set, and as such will not get e10s.

This bug could show up for fresh installs on touchscreen devices though, which makes for a pretty kludgy first experience. I'll look into fixing the prompt.
Assignee: nobody → jmathies
Flags: needinfo?(andrei.vaida)
(In reply to :Gijs Kruitbosch (Gone July 28 - Aug 11) from comment #17)
> So... on 50/nightly, does this really not reproduce (anymore), or is it just
> similar to the others (broken on first run, working after a restart) ?

I'll check this too.
felipe, how do we distribute the the system addon that enables e10s? Is it something we download on first run?
Flags: needinfo?(felipc)
Spoke with felipe on irc, It's bundled with 48 install.
Flags: needinfo?(felipc)
Component: Widget: Win32 → General
Product: Core → Firefox
Maybe block 48.
Flags: needinfo?(mozillamarcia.knous)
I can't repo this on a touchscreen laptop running a local build of nightly.
Jim, this is already marked as blocking 48.
FYI, I will start the build of 48 RC1 soon. We can do a second RC later.
Flags: needinfo?(mozillamarcia.knous)
(In reply to Jim Mathies [:jimm] from comment #23)
> I can't repo this on a touchscreen laptop running a local build of nightly.

Ugh, sorry, this was on Windows 8.1. I need to track down access to windows 10 insider preview builds for this. I do have a windows 10 release system but it doesn't have touchscreen.

blassey curious if you could help here, can you repo this on your surface book with a fresh profile?
Flags: needinfo?(blassey.bugs)
ni to tracy as well for testing on a Surface with Windows 10 release.
Flags: needinfo?(twalker)
(In reply to Jim Mathies [:jimm] from comment #25)
> (In reply to Jim Mathies [:jimm] from comment #23)
> > I can't repo this on a touchscreen laptop running a local build of nightly.
> 
> Ugh, sorry, this was on Windows 8.1. I need to track down access to windows
> 10 insider preview builds for this. I do have a windows 10 release system
> but it doesn't have touchscreen.
> 
> blassey curious if you could help here, can you repo this on your surface
> book with a fresh profile?

I don't use insider builds (nor do I have a surface book, but I do have a touch screen). Scrolling works on nightly with release win10
Flags: needinfo?(blassey.bugs)
Are these Insider Previews of what's going out on the Aug 2 anniversary update?

From the conclusions, if my understanding is correct, it looks like the problem is during the first session, when e10s is enabled and a11y was activated, so it's waiting on a restart to deactivate e10s.

The good news is that the a11y deactivation code is already shipped and running ahead of time for Firefox 47 users. So if they have a touch screen device and upgrade from 47 to 48 they won't get e10s even once because we will already have detected this condition.

This limits the problem to new users. If desired we could create a simple patch that delays e10s activation for one session to avoid it. Maybe a bit of a bummer because people who download Firefox fresh with the intention of seeing e10s in action might be fooled into thinking it doesn't make a difference. However, for the first few weeks they will be playing against the 1% odds anyway so maybe it's not an issue.

(For simplicity the delay would be for every user, not only new users. Unless there's a very simple and safe way to detect first run which I don't know about. But we could limit this behavior to Win10)
(In reply to :Gijs Kruitbosch (Gone July 28 - Aug 11) from comment #17)
> So... on 50/nightly, does this really not reproduce (anymore), or is it just
> similar to the others (broken on first run, working after a restart) ?
I was unable to reproduce this bug _at all_ on Fx50 using Dell XPS 12 and MS Surface Pro 2. I've double checked it with the latest nightly (2016-07-25), just to be sure.

(In reply to :Felipe Gomes (needinfo me!) from comment #28)
> Are these Insider Previews of what's going out on the Aug 2 anniversary
> update?
Yes. This was the case with previous large updates as well.
I'd like to understand why this affects Windows Insider Preview builds and not the current release version of Windows 10. I have a laptop that's getting updated. I'm somewhat at the mercy of Microsoft's update mechanism here though.
For beta build testing, how do you insure you'll get into the e10s test pool?
Flags: needinfo?(andrei.vaida)
Ok I can reproduce this on Aurora with the current release version of Windows 10. I'm not sure what rev of 10 I have but it's what came down in the Windows 8 -> 10 free update. I didn't need the Insider Preview build.

Interestingly I can't reproduce in Nightly, same set up. Which means there's something in 50 that fixed this.
(In reply to Jim Mathies [:jimm] from comment #31)
> For beta build testing, how do you insure you'll get into the e10s test pool?
By creating and setting the pref: "browser.tabs.remote.force-enable" to true.


(In reply to Jim Mathies [:jimm] from comment #32)
> Ok I can reproduce this on Aurora with the current release version of
> Windows 10. I'm not sure what rev of 10 I have but it's what came down in
> the Windows 8 -> 10 free update. I didn't need the Insider Preview build.
I was unable to reproduce the bug on the current release build of Windows 10, perhaps it's something that has to do with my test environment. Anyway, it sounds like this is no longer an issue isolated to or directly impacting the upcoming anniversary release of Windows 10.


(In reply to Jim Mathies [:jimm] from comment #32)
> Interestingly I can't reproduce in Nightly, same set up. Which means there's
> something in 50 that fixed this.
I assume that the data I provided in Comment 10 on this matter didn't help?
Flags: needinfo?(andrei.vaida)
Here's the fix range on mozilla-central:

https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=c9a70b64f2faa264296f0cc90d68a2ee2bac6ac5&tochange=f47a1bc057ff5a6084e776040425da258a2851ca

I'm guessing Milan's bug 1282584 (3ee6220f83eb) fixed it.

Also, I am occasionally seeing the accessibility restart prompt while testing. There is a bug with that that I can track down, fix, and potentially uplift. I'd suggest we look at uplifting the scroll fix instead, which looks pretty safe. I'd like to do some testing to be sure first.
Flags: needinfo?(twalker)
(In reply to Jim Mathies [:jimm] from comment #34)
> I'm guessing Milan's bug 1282584 (3ee6220f83eb) fixed it.
> 

I agree, this looks like a promising fix. I had suspected that this issue was a race condition not limited to Windows (given bug 1282846 was on Ubuntu, and given the general hard-to-reproduce/intermittent nature of the bug). In particular from the APZ code it seemed like this situation might arise if the parent process thought e10s/APZ was enabled but the child process did not. And Milan's fix seems to address exactly such a scenario.
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #35)
> (In reply to Jim Mathies [:jimm] from comment #34)
> > I'm guessing Milan's bug 1282584 (3ee6220f83eb) fixed it.
> > 
> 
> I agree, this looks like a promising fix. I had suspected that this issue
> was a race condition not limited to Windows (given bug 1282846 was on
> Ubuntu, and given the general hard-to-reproduce/intermittent nature of the
> bug). In particular from the APZ code it seemed like this situation might
> arise if the parent process thought e10s/APZ was enabled but the child
> process did not. And Milan's fix seems to address exactly such a scenario.

Ok, great. So I've confirmed this fix addresses the problem. What I'd like to propose is that we uplift 3ee6220f83eb (which essentially keeps APZ mechanics in the content process running after we've updated prefs as a result of accessibility detection) to address the scroll problem in 48. I'll file a follow up on the intermittent prompt problem, fix that and uplift to 49.

The group of users that will be impacted by this then will be new install accessibility users who have a client that triggers the prompting problem. I don't know yet if this use case exists, I'll do some testing to see using common a11y clients. These a11y users would have access to browser chrome but would not 'see' content through their a11y client until a restart. Generally I think it's a safe trade off.
Marcia, does this sound ok with you? We'll need to do another 48 rc as a result.
Flags: needinfo?(mozillamarcia.knous)
First, I would like to congrat you guys for this amazing team work. This is impressive work. Bravo!

Second, I agree that we should uplift bug 1282584 and do a RC2. I pinged Milan in the bug.
Flags: needinfo?(mozillamarcia.knous)
(In reply to Jim Mathies [:jimm] from comment #36)
> Ok, great. So I've confirmed this fix addresses the problem. What I'd like
> to propose is that we uplift 3ee6220f83eb (which essentially keeps APZ
> mechanics in the content process running after we've updated prefs as a
> result of accessibility detection) to address the scroll problem in 48. I'll
> file a follow up on the intermittent prompt problem, fix that and uplift to
> 49.

Sounds good to me, thanks!
(In reply to Sylvestre Ledru [:sylvestre] from comment #38)
> First, I would like to congrat you guys for this amazing team work. This is
> impressive work. Bravo!
> 
> Second, I agree that we should uplift bug 1282584 and do a RC2. I pinged
> Milan in the bug.

uplifted now!
By the way, this bug can be hit with an alternative workflow, and without a racing condition:
* Start Firefox with E10S on.
* Disable E10S in about:config - parent process still thinks E10S is on
* Close all content tabs (about:config is not running in the parent process, so that one is OK) - this will get rid of the content process
* Open another tab, go to a site that would scroll - content process is recreated, reads the prefs and now thinks there is no E10S
* No scrolling!
Blocks: 1289473
Fixed by the uplift in bug 1282584.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Depends on: 1282584
Flags: qe-verify+
This is verified fixed on 48.0-build2 (20160726073904) and 49.0a2 (2016-07-27). Confirmed for both of the scenarios mentioned in Comment 0 and Comment 41.

Testing was performed on Dell XPS 12 and MS Surface Pro, using Windows 10 x64 - Insider Preview Builds 14393.0 and 14393.5.
Status: RESOLVED → VERIFIED
while monitoring the trees i noticed that it seems that this change somehow cause a assertion failure on windows xp like mentioned in https://bugzilla.mozilla.org/show_bug.cgi?id=1290089 

Note that this is a assertion failure so this output assertion failure something you only see in debug builds.

Not sure what impact this has to release build, but just in case mentioning this here.
Status: VERIFIED → REOPENED
Flags: needinfo?(jmathies)
Resolution: FIXED → ---
lets deal with that over in bug 1290089.
Flags: needinfo?(jmathies)
Status: REOPENED → RESOLVED
Closed: 8 years ago8 years ago
Resolution: --- → FIXED
Status: RESOLVED → VERIFIED
Glad we got this one for 48, and I'm told 14393.5 is flagged as "gold" for the Anniversary release.
You need to log in before you can comment on or make changes to this bug.