Closed Bug 1611660 Opened 4 years ago Closed 4 years ago

window.scrollY always returns 0 when Restore (Maximize) button is used

Categories

(Core :: Panning and Zooming, defect, P3)

72 Branch
defect

Tracking

RESOLVED FIXED
mozilla79
Tracking Status
firefox-esr68 --- wontfix
firefox72 --- wontfix
firefox73 --- wontfix
firefox74 --- wontfix
firefox75 --- wontfix
firefox76 --- wontfix
firefox77 --- wontfix
firefox78 --- wontfix
firefox79 --- fixed

People

(Reporter: d3_max_t, Assigned: botond)

References

(Regression)

Details

(Keywords: perf, perf-alert, regression)

Attachments

(2 files)

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36 Edg/79.0.309.71

Steps to reproduce:

I have some very simple JavaScript code running to detect when a user has scrolled away from the top of the page and when they have scrolled back to the top. I am simply checking the value of "window.scrollY". When I switch between maximizing and restoring the browser window, scrollY starts to always return 0, but only in the restored state and only when using the mouse wheel. If I use the vertical scrollbar, everything works as it should. The mouse wheel also works perfectly fine when the browser window is maximized, so I know the "onscroll" events are firing. In fact, I can confirm they are firing in the restored window state as well, but scrollY never changes from 0 until I scroll to about the middle of the page, at which point scrollY suddenly starts returning a value.

This happens in both the current version 72.0.2 and the latest developer edition.

Here is the JS code I have:

function scrollDetect() {
  console.log('Scroll event fired');
  console.log(window.scrollY);
  if (window.scrollY > 0) {
    console.log('Scrolled down');
  } else {
    console.log('At the top');
  }
}

window.onscroll = function () { scrollDetect(); };

Steps to reproduce:

  1. Open the console page in the developer tools to monitor the value of window.scrollY

  2. Maximize the browser window and scroll up and down a few times using the mouse wheel. Then scroll all the way to the top and stay at the top of the page.

  3. Hit the restore button on the browser (so the window is not maximized) and scroll up and down a few times using the mouse wheel. Then scroll all the way to the top and stay at the top of the page.

  4. Repeat step 2. It is important to stay at the top of the page.

  5. Hit the restore button again, and gently scroll down using the mouse wheel by one tick. Make sure you don't go too far down, so you don't hit the middle of the page. I would suggest making the body very long, something like 100rem, so you can clearly observe this bug (see the sketch after this list). You will notice that scrollY returns 0, even when scrolled down a little.

  6. Either manually scroll using the scrollbar or scroll down past the middle of the page using the mouse wheel and you will see scrollY starting to return values again.
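
For reference, here is a minimal sketch of the kind of setup I am describing (illustrative only, not my actual page; the 100rem minimum height is just to make sure there is enough content to scroll). It can be pasted into the devtools console on an otherwise blank page, after which the maximize/restore steps above apply:

// Sketch of a reproducer: make the page tall enough to scroll, then log
// window.scrollY on every scroll event so the "stuck at 0" behaviour is easy to spot.
document.body.style.minHeight = '100rem';
window.onscroll = function () {
  console.log('Scroll event fired, window.scrollY =', window.scrollY);
};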

Actual results:

scrollY seems to get stuck returning 0 when the mouse wheel is used after the window has been maximized and restored a few times and is left in the restored state, as long as scrolling does not go past the middle of the page.

Expected results:

scrollY should return a non-zero value as soon as the mouse wheel is used to scroll down.

I should have mentioned that this works perfectly fine in Chrome and the new Chromium-based Edge.

Component: Untriaged → DOM: Core & HTML
Product: Firefox → Core

Open the console page in the developer tools to monitor the value of window.scrollY

I checked it with the DOM pane of DevTools, but I cannot reproduce this bug with Nightly or Firefox 72.

Emilio, you have touched nsGfxScrollFrame.cpp a lot in the last several months; do you have any ideas?

Flags: needinfo?(emilio)

This is pretty hard to debug without a URL or a reproducer. Is there any chance you could attach a page to this bug (using the "attach new file" button), or share a URL where we could reproduce this?

I suspect what's happening is that the main thread scroll offset is not getting synced properly from the APZ thread, or something related to scroll-anchoring. Does the bug reproduce with layout.css.scroll-anchoring.enabled=false?
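
One way to check that (a sketch, assuming you are comfortable editing your profile; the same pref can also be toggled in about:config) is to add this line to the profile's user.js and restart the browser:

// Hypothetical user.js entry to disable CSS scroll anchoring for the profile.
user_pref("layout.css.scroll-anchoring.enabled", false);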

Flags: needinfo?(emilio) → needinfo?(d3_max_t)

Sure:
https://hummingmindidp.azurewebsites.net

The site is in the early stages of development, but you can observe this happening on that page. It's supposed to change the background color of the navigation bar when scrolled down. Just make sure that when you are in the restored state, the browser window is still big enough to stay within the same media query (let's say over 1200px wide).

Flags: needinfo?(d3_max_t)

I can reproduce the problem on the URL from comment #5 with Nightly 74.0a1 on Windows 10.
The problem seems to happen in both the normal and maximized sizemodes.

STR:

  1. Put the browser in normal sizemode
  2. Resize the browser to 1920x768px (on a 1920x1080 screen)
  3. Maximize the browser
  4. Open the url https://hummingmindidp.azurewebsites.net and wait for the page to load
  5. Restore the browser sizemode to normal
  6. Scroll down a little with the mouse wheel
  7. Observe the background color of the header

(If it does not reproduce:)

  8. Scroll to the top
  9. Maximize the browser
  10. Scroll down a little with the mouse wheel
  11. Observe the background color of the header

Actual Results:
The header background color does not change to black,
and the web console shows 0 when evaluating window.scrollY.

Expected Results:
The header background color should change to black.

Regression window:
https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=b62d582090e11edcdd01c2c707e0b07b33ad0b12&tochange=ecafdbe899a7b2e574922250e356f8610788ad75

Regressed by: 1423011
Has Regression Range: --- → yes

I can repro on Linux too, with and without webrender... Botond, any idea of what might be going on?

Status: UNCONFIRMED → NEW
Component: DOM: Core & HTML → Panning and Zooming
Ever confirmed: true
Flags: needinfo?(botond)

It looks like APZ's knowledge of the visual and layout viewports is getting out of sync for a brief period of time, with the layout viewport reflecting the older (maximized) size and the visual viewport reflecting the newer (unmaximized) size. Need to dig a bit further to understand how this comes to be.

I haven't verified this, but I suspect this code is at fault. I'd really like to get rid of it, but last time I tried it broke other things.

(In reply to Botond Ballo [:botond] [spotty responses until Feb 19] from comment #9)

I'd really like to get rid of it, but last time I tried it broke other things.

Specifically, these are the things that broke when I tried to get rid of this code.

Flags: needinfo?(botond)

At least the macOS crashtest that used to fail without that code is passing now: https://treeherder.mozilla.org/#/jobs?repo=try&tier=1&revision=a628ac932168c607ee63948c752d08c204a7c7f6

We can try removing that code again and see what happens.

Previously, we would wait until the following frame (for uncertain reasons
that date back to B2G), but this meant the layout and visual viewports would
be out of sync for a frame, causing APZ to misbehave.

Assignee: nobody → botond
Status: NEW → ASSIGNED
Priority: -- → P3
Pushed by bballo@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/9dc9deb3ba3b
Accept layout viewport updates from the main thread right away. r=tnikkel
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla74
Regressions: 1603237
Regressions: 1572900
No longer regressions: 1603237
Flags: needinfo?(botond)

:botond, will you get to this for 74?

Flags: needinfo?(botond)

I will try, but how soon we can re-land this fix will depend on how tricky it is to track down the intermittent regression it caused (bug 1572900).

Flags: needinfo?(botond)

Wontfix for 74 as we are nearing the end of the beta cycle.

So the intermittent failure tracked in bug 1572900 has had a low background occurrence rate on a variety of platforms since long before these patches landed, but, looking at the failures from the day these patches landed and were backed out, they seem to have caused a spike on "windows7-32-mingwclang" -- which is a Tier 2 platform -- in particular.

I did before-and-after try pushes, and indeed, reftests on windows7-32-mingwclang have a 65% failure rate with the patches applied, compared to 0% without them (out of 20 runs for each task).

Kats, do you think we should block the re-landing of these patches on figuring out what's going on with those mingw failures? On the one hand, they're Tier 2 failures and these patches fix some user-noticeable regressions. On the other hand, the high failure rate with these patches may indicate a potential cause for concern even on a Tier 2 platform.

Flags: needinfo?(kats)

I don't think a high failure rate on only mingw-clang should block landing these patches, no. If we can disable the test on mingw-clang, that would be good to save the sheriffs some headache, but regardless we should land these and continue investigating the failure. If there's a subtle bug in the patch, then maybe we'll get some additional user reports that will help us track down the problem (since debugging the test via try pushes will likely be somewhat painful).

Flags: needinfo?(kats)

I don't think the failure points to a single test we could disable. The harness itself seems to be timing out near the end. Shutdown hang perhaps?

Hm, ok. If you want me to investigate the failure let me know. I'm happy to offload stuff from you so you can focus more on desktop zooming.

Tom, IIRC you've worked on MinGW support. Do you have any theories for why a patch might be causing a reftest harness error, intermittently but very frequently (65% failure rate), only on the "windows7-32-mingwclang" platform?

Do you have any suggestions for how one might investigate such a failure? Is it straightforward to build this configuration locally on a Windows machine?

Flags: needinfo?(tom)

No theories. This build is a cross build; it builds on Linux. It's not trivial to build it locally, but doing so won't get you more than downloading the builds from Try and running them on a Windows machine. FWIW I would confirm the platform: is it only x86 and not x64? Is it both debug and opt or only one of them?

Does the failure reproduce outside the harness? If so, you can just debug with WinDbg. If it doesn't, then one would need to run the test and harness locally, which is not easy for a cross build. I've heard it's been done, but I've never succeeded myself (and filed Bug 1577101 for it). Some people have succeeded in debugging it using an interactive task on try, so that's another option.

This seems to be indicative of a general problem; we just happened to reproduce it reliably in a place that is annoying to reproduce locally. If you want to disable the MinGW test, I'm okay with that (just please file a new bug about investigating it).

Flags: needinfo?(tom)

(In reply to Tom Ritter [:tjr] (ni for response to sec-[approval|rating|advisories|cve]) from comment #27)

FWIW I would confirm the platform: is it only x86 and not x64?

It looks to be only x86 and not x64, yes.

Is it both debug and opt or only one of them?

I've only tested debug. I can't find a way to trigger opt reftests for mingw, not even in mach try fuzzy --full.

Does the failure reproduce outside the harness?

It's not a failure of a specific test, but a failure in the harness itself. I previously speculated it might be a shutdown hang, but looking at the logs more closely, it actually appears to be a startup hang, occurring during the startup of one of the many browser invocations performed in the reftest chunk. It's not even the same test directory every time; a different one fails in each run.

This means there isn't a single test (or even a single directory of tests) that we could disable. We'd have to disable all reftests on mingw, which we presumably don't want to do.

(In reply to Max from comment #5)

Sure:
https://hummingmindidp.azurewebsites.net

Max, this link doesn't resolve any more.

Could you provide another link which does (or, preferably, attach a local testcase to the bug)? Thanks!

Flags: needinfo?(d3_max_t)

I explored writing an alternative fix to this bug with the hope of avoiding the MinGW test failures. Unfortunately, a Try push shows that the alternative fix triggers the same failures as well.

Based on debugging on Try, it looks like for the affected browser invocations, the reftest webextension is not getting installed or initialized, even though at least some other webextensions are. No idea what might be happening there...

We don't even get as far as the point where the marionette client tries to install the reftest extension.

It looks like the browser process never gets as far as initializing the marionette server, and the client hangs when trying to connect to the server.

The browser process does not even get as far as gBrowserInit.onLoad().

It gets as far as creating a document viewer for the top-level chrome document and calling LoadStart() on it, but LoadComplete() (which is what would trigger firing the onload event and calling gBrowserInit.onLoad() as its handler) is never called on that document viewer.

It's not immediately clear to me where additional logging could be added to diagnose how far the loading process gets / why LoadComplete() never gets called. I'm open to suggestions for how to investigate this further.

There are some r+ patches which didn't land and no activity in this bug for 2 weeks.
:botond, could you have a look please?
For more information, please visit auto_nag documentation.

Flags: needinfo?(botond)

The patches are not in a state where they can land, as they cause high-frequency intermittents for which we do not have a fix yet.

Flags: needinfo?(botond)

The high-frequency intermittents (at least the ones in bug 1572900) seem to have gone away in the last day or two, probably due to changes in the workers. It's probably a good time to rebase these patches and do a try push and reland.

Flags: needinfo?(botond)

Huh, weird. But good news for this bug, I guess!

Flags: needinfo?(botond)

Unfortunately, I'm still seeing a high volume of the same failure as before: https://treeherder.mozilla.org/#/jobs?repo=try&revision=956bd07738080bedbb2970079313f4aeed73fa8a

Hm, OK, I'll continue investigating using the Win7 MinGW platform. Hopefully it's the same problem I was chasing down on win10-aarch64.

Thinking about this more - even if I do track down all the different hang locations, it's unlikely that they will be fixed soon since the root cause is not obvious and may be inside mingw.

Tom, can you elaborate on these mingw builds - are they builds that we ship to users, or do we just keep them to ensure a mingw-based build doesn't break? In the latter case, do we need to keep running win 7 reftests on these builds? I feel like an increase in intermittent failures in a tier-2 test suite that we're not using for anything in particular shouldn't prevent an obviously-unrelated patch from landing.

Flags: needinfo?(tom)

I assumed they were used for Tor Browser, but I have nothing to back that up.

(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #42)

Tom, can you elaborate on these mingw builds - are they builds that we ship to users,

No.

or do we just keep them to ensure a mingw-based build doesn't break?

Yes.

In the latter case, do we need to keep running win 7 reftests on these builds?

No.

I feel like an increase in intermittent failures in a tier-2 test suite that we're not using for anything in particular shouldn't prevent an obviously-unrelated patch from landing.

You are correct. Feel free to disable the test(s) you need to, and file a bug against Bug 1475994 for re-enabling them, with details and a link back here. You can also make the hang bugs children of the new bug you file.

Flags: needinfo?(tom)

Bug 1642720 is on autoland so this should be good to re-land now.

Pushed by bballo@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/5cab400444e9
Accept layout viewport updates from the main thread right away. r=tnikkel
https://hg.mozilla.org/integration/autoland/rev/5a874f1887ff
Adjust WR test expectations. r=gw
Status: REOPENED → RESOLVED
Closed: 4 years ago4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla79
Regressions: 1557133

Seeing as this patch has caused a number of other regressions/spikes, should we consider backing it out until we have a better understanding of what's going on?

== Change summary for alert #26127 (as of Thu, 04 Jun 2020 10:45:24 GMT) ==

Regressions:

39% raptor-tp6-google-firefox-cold replayed windows10-64-shippable opt 812.71 -> 497.75
37% raptor-tp6-google-firefox-cold replayed windows7-32-shippable opt 740.00 -> 466.17
28% raptor-tp6-google-firefox-cold replayed linux64-shippable opt 850.29 -> 614.75

Improvements:

64% raptor-tp6-google-firefox-cold loadtime windows7-32-shippable opt 1,101.83 -> 397.08
58% raptor-tp6-google-firefox-cold loadtime windows10-64-shippable opt 1,112.67 -> 464.50
23% raptor-tp6-google-firefox-cold windows7-32-shippable opt 385.83 -> 298.67
21% raptor-tp6-google-firefox-cold windows10-64-shippable opt 384.43 -> 304.04

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=26127

Whiteboard: perf, perf-alert

As the above alert is reporting improvements while also regressing the "replayed" metric, should we investigate this further?

Flags: needinfo?(tarek)
Flags: needinfo?(gmierz2)

I think yes, because we are seeing a simultaneous improvement in loadtime and a regression in the replayed value; the improvements are likely false positives.

Flags: needinfo?(gmierz2)
Flags: needinfo?(tarek)
No longer regressions: 1557133
Keywords: perf, perf-alert
Whiteboard: perf, perf-alert
Flags: needinfo?(fstrugariu)
Regressions: 1650398
No longer regressions: 1572900

Regression bug created; removing the ni?.

Flags: needinfo?(fstrugariu)

(Clearing out some old needinfos)

Flags: needinfo?(d3_max_t)