Closed Bug 1697091 Opened 1 month ago Closed 1 month ago

Scrolling does not continue after scrolling up/down on nightly

Categories

(Core :: Panning and Zooming, defect)

Firefox 88
defect

Tracking

()

VERIFIED FIXED
88 Branch
Tracking Status
firefox-esr78 --- unaffected
firefox86 --- disabled
firefox87 --- disabled
firefox88 --- verified

People

(Reporter: daphnei.mee, Assigned: dthayer)

References

(Regression)

Details

(Keywords: regression)

Attachments

(2 files)

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:86.0) Gecko/20100101 Firefox/86.0

Steps to reproduce:

Scroll up/down on a webpage on nightly.

Actual results:

Scrolling freezes after my fingers leave the touchpad

Expected results:

Scrolling should have continued automatically down as it does in stable

Performance Profiles to compare:
Stable: https://share.firefox.dev/3cg6aY4
Nightly: https://share.firefox.dev/3l4WC67

Note: restarting with addons disabled does not fix the issue, but oddly enough going into safe mode and then out of safe mode fixes the issue. This issue also seems to return almost every time a new nightly version is released.

Another note: This issue still occurs on a fresh profile and with all addons disabled and can be temporarily resolved in the same way.

Component: Untriaged → Panning and Zooming
Product: Firefox → Core

What happens if you set apz.windows.force_disable_direct_manipulation to true?

Hmm, I can reproduce this. Setting to apz.windows.force_disable_direct_manipulation true does fix this. But there seems to be something about the profile involved (mozregression finds both builds as good). Do you know when you first started seeing this?

Status: UNCONFIRMED → NEW
Ever confirmed: true

Can you tell me what hardware you are seeing this on? I want to try to see if it might be due to drivers or not.

Flags: needinfo?(daphnei.mee)
See Also: → 1697756

It seems like how Firefox is launched determines whether or not this bug shows up. If the user performs some action in the browser to trigger Firefox to restart then the bug won't appear (such as clicking the restart in safe mode button in about:support or toggling the about:home cache in nightly experiments in preferences). If Firefox is launched from the start menu or command line then the bug will show up.

(In reply to Timothy Nikkel (:tnikkel) from comment #4)

Can you tell me what hardware you are seeing this on? I want to try to see if it might be due to drivers or not.

I am using a Dell XPS 15 9560 with
Intel 7700HQ
Intel 630+Nvidia 1050
32gb ram
ssd

I have the latest intel graphics drivers installed from the vendor website. I have tried the generic intel drivers as well and the issue still occurs.

Flags: needinfo?(daphnei.mee)

Found reproducible steps. Bisected, this is caused by

https://hg.mozilla.org/mozilla-central/rev/c5bcb9f8eab98e66d57b3a2f2b08bc08e9d3810e

-> bug 1680258. I'm not sure how.

Doug, any idea what the skeleton ui might be changing to cause a change in how the OS sends us direct manipulation inertia events? How does it change the hwnd structure of Firefox? Does it change any window styles or anything like that? This effect does not go away, it is not something that just happens early in the process. Does it poke win32 in a weird way that maybe puts our app in a weird mode as far as dmanip is concerned? Does the skeleton ui window stick around? Can we easily disable big chunks of the code enabled by the pref to see what caused this?

Flags: needinfo?(dothayer)

(In reply to Timothy Nikkel (:tnikkel) from comment #7)

Found reproducible steps. Bisected, this is caused by

https://hg.mozilla.org/mozilla-central/rev/c5bcb9f8eab98e66d57b3a2f2b08bc08e9d3810e

-> bug 1680258. I'm not sure how.

Doug, any idea what the skeleton ui might be changing to cause a change in how the OS sends us direct manipulation inertia events? How does it change the hwnd structure of Firefox?

In theory it shouldn't - we create the first window much earlier than usual, trying our best to exactly match the first toplevel window that nsWindow creates, and then when nsWindow actually comes along and tries to create its first toplevel window, it just consumes the one we already created.

Does it change any window styles or anything like that?

It's possible that there is a window style mismatch. On debug builds we assert that the style of the early window matches the style that we intend to create, though - so we should be failing that assertion on debug builds for anyone who would be experiencing this - if a window style discrepancy is the underlying cause.

This effect does not go away, it is not something that just happens early in the process.

Does dmanip work for new windows created after the first one?

Does it poke win32 in a weird way that maybe puts our app in a weird mode as far as dmanip is concerned? Does the skeleton ui window stick around? Can we easily disable big chunks of the code enabled by the pref to see what caused this?

You could try locally returning a null from ConsumePreXULSkeletonUI - this will leave the skeleton UI window floating around but your real window should be created in the normal fashion.

My bet suspicion I can think of right now for this is that perhaps a problem arises due to the fact that with the skeleton UI enabled, we show a window prior to initializing dmanip. To test this you could try commenting out the call to sShowWindow in PreXULSkeletonUI.cpp?

I wouldn't mind debugging this and solving it myself, though maybe I would need to get loaner hardware to reproduce this? I do most of my work on a desktop Windows machine, and my one Windows laptop does not seem to exhibit this problem (scrolling continues after fingers leave the touchpad with or without the skeleton UI enabled.)

Flags: needinfo?(dothayer)

Thanks for your quick and thorough reply!

(In reply to Doug Thayer [:dthayer] (he/him) from comment #8)

In theory it shouldn't - we create the first window much earlier than usual, trying our best to exactly match the first toplevel window that nsWindow creates, and then when nsWindow actually comes along and tries to create its first toplevel window, it just consumes the one we already created.

Ah, it we re-use the skeleton hwnd as our hwnd, that could be involved somehow, I didn't realize that.

Does it change any window styles or anything like that?

It's possible that there is a window style mismatch. On debug builds we assert that the style of the early window matches the style that we intend to create, though - so we should be failing that assertion on debug builds for anyone who would be experiencing this - if a window style discrepancy is the underlying cause.

Yeah, I didn't hit any assert when reproducing a debug build.

This effect does not go away, it is not something that just happens early in the process.

Does dmanip work for new windows created after the first one?

Yes. We call RecreateDirectManipulationIfNeeded when we create a new window.

<snip some debugging suggestions>

I'll give those a try.

I wouldn't mind debugging this and solving it myself, though maybe I would need to get loaner hardware to reproduce this? I do most of my work on a desktop Windows machine, and my one Windows laptop does not seem to exhibit this problem (scrolling continues after fingers leave the touchpad with or without the skeleton UI enabled.)

I'm not 100% sure how to reproduce this on demand, but I did develop these complicated steps which seemed to work 100% of the time and allowed me to bisect this:

run mozregression, ignore the instance that it runs though (keep it open though). Use the firefox.exe that it creates in the tmp dir and run with my default nightly development profile, test scrolling in that, close, run it again, test scrolling. Then run that firefox.exe again with no cmd line arguments (this seems to cause it to use/create some sort of fresh profile), test scrolling, close it. Then run firefox.exe again with no cmd line arguments (so that regular about:home comes up), test scrolling, the scrolling will now have no inertia.

For your laptop to reproduce it would need to have a precision touchpad, you can verify this in Windows settings under touchpad, it'll say precision touchpad if you have one. I've also heard from a couple developers with thinkpads that they were able to find and install some drivers that then "activated" the precision touchpad on their machine.

I do almost all of my Windows development on a desktop machine, I bought a usb touchpad so I could implement dmanip, specifically this https://www.amazon.ca/gp/product/B07TM8FZYB/ and it works quite well. I'm going to spend some time today debugging this though.

(In reply to Doug Thayer [:dthayer] (he/him) from comment #8)

Does dmanip work for new windows created after the first one?

Ah, I see what you are asking now. Creating a second window does not have the issue when the first window does.

Making ConsumePreXULSkeletonUI return null seems to fix it. Commenting out the call to sShowWindow does not fix it.

Another clue is that when the bug happens the mouse cursor stays at the "busy" circle seemingly forever, I don't think it does that when the bug does not happen.

Note that when developing dmanip for firefox we found it was necessary to add the flags WS_EX_LAYERED | WS_EX_TRANSPARENT to the compositor window here

https://searchfox.org/mozilla-central/rev/f6ffb71dca9eb491e85aa95042380b2602008b00/widget/windows/WinCompositorWindowThread.cpp#171

after talking to Microsoft they confirmed that is correct and not a bug.

However, the compositor window does not seem to be involved here as disabling it (by changing the gfx setup we use by changing prefs) does not fix the problem. I mention that only because dmanip has been shown to have unexpected interactions in the past.

One other possible thought: when I say we don't get inertia events from dmanip I've been simplifying, we do get inertia events but they never seem to exceed about 2 pixels total even for large, fast flings. We actually can get lots of them for very small increments that total no more than 2 pixels. So many some scale is getting set wrong by a factor of 10 to 100 to 500 or more?

Hmm - the only thing we do that's scaling related is we use our own WndProc (PreXULSkeletonUIProc) which calls EnableNonClientDpiScaling for the HWND. It seems unlikely that this is the source of the problem, but you may be able to test this by commenting out these lines.

That being said, I think I'm just going to go ahead and order that touchpad.

Yeah, ordering the touchpad seems like a good idea, I didn't really make much progress on this.

We were skipping the initial ResizeDirectManipulationViewport call when the
skeleton UI showed a maximized window, because it pseudo-ignores the first
Move/Resize in order to not break the maximization. There's no reason
ResizeDirectManipulationViewport should have been in the else clause - it just
wasn't properly considered.

Assignee: nobody → dothayer
Status: NEW → ASSIGNED

This change is just cleanup and have no behavioral difference, given that
SetThemeRegion only does anything for eWindowType_popups, which never have a
skeleton UI.

Depends on D108849

Pushed by dothayer@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/382fac447105
Ensure ResizeDirectManipulationViewport called on move/resize r=tnikkel
https://hg.mozilla.org/integration/autoland/rev/b5075bfc56f2
Move SetThemeRegion inside skeleton UI maximization else clause r=tnikkel
Regressed by: 1680258
Status: ASSIGNED → RESOLVED
Closed: 1 month ago
Resolution: --- → FIXED
Target Milestone: --- → 88 Branch
Flags: qe-verify+

I tried to reproduce this issue on Nightly 2021-03-08 and 86.0 Release with the steps on comment 0. Scrolling works properly on Windows 10x64. We checked this on two different machines.

DreamlikeCuttlefish, could you please help us verify this issue?

Flags: needinfo?(daphnei.mee)

(In reply to Giorgia Nichita, Release Desktop QA from comment #20)

I tried to reproduce this issue on Nightly 2021-03-08 and 86.0 Release with the steps on comment 0. Scrolling works properly on Windows 10x64. We checked this on two different machines.

DreamlikeCuttlefish, could you please help us verify this issue?

I am no longer able to reproduce the issue, at least from my 20 minutes of testing on old and current nightly builds. The issue tended to occur after I had been using a session of nightly for several hours, and then the issue would just start occurring. And it wouldn't go away even if i closed/reopened nightly until I restarted with addons disabled or in safe mode.

I wish there was more information I could give you right now. Please let me know if there is anything else I can test.

Flags: needinfo?(daphnei.mee)

**And the issue no longer occurs at all for me on new nightly builds

(In reply to Giorgia Nichita, Release Desktop QA from comment #20)

I tried to reproduce this issue on Nightly 2021-03-08 and 86.0 Release with the steps on comment 0. Scrolling works properly on Windows 10x64. We checked this on two different machines.

DreamlikeCuttlefish, could you please help us verify this issue?

Ah, I should have posted a note in here: this issue only manifests with a maximized window. You have to maximize the window, then close Firefox and reopen.

(In reply to Doug Thayer [:dthayer] (he/him) from comment #23)

Ah, I should have posted a note in here: this issue only manifests with a maximized window. You have to maximize the window, then close Firefox and reopen.

Thanks Doug, for this extra detail. But unfortunately, I still can't reproduce the issue.

(In reply to DreamlikeCuttlefish from comment #22)

**And the issue no longer occurs at all for me on new nightly builds

Can you please also verify this on 88.0 RC, so we can close the issue, in case everything works as expected there as well? Thank you!

Flags: needinfo?(daphnei.mee)

(In reply to Giorgia Nichita, Release Desktop QA from comment #24)

(In reply to Doug Thayer [:dthayer] (he/him) from comment #23)

Ah, I should have posted a note in here: this issue only manifests with a maximized window. You have to maximize the window, then close Firefox and reopen.

Thanks Doug, for this extra detail. But unfortunately, I still can't reproduce the issue.

(In reply to DreamlikeCuttlefish from comment #22)

**And the issue no longer occurs at all for me on new nightly builds

Can you please also verify this on 88.0 RC, so we can close the issue, in case everything works as expected there as well? Thank you!

Just tested on 88 beta release candidate and I can not reproduce the issue there. Thank you all for the help with this. It can now be closed.

Flags: needinfo?(daphnei.mee)
Status: RESOLVED → VERIFIED
Flags: qe-verify+
You need to log in before you can comment on or make changes to this bug.