1222190 - Blank white persistent frames happen intermittently when scrolling with the scroll wheel

Reporter

Description

•

10 years ago

Scroll-wheel scrolling will sometimes cause all (or more rarely, most) of a page to be blank white. Arrow-key scrolling (which I'm told doesn't use APZ) consistently behaves fine.

Kelsey Gilbert [:jgilbert]

Reporter

Comment 1

•

10 years ago

It was more obvious on longer pages. I saw it frequently on: https://news.ycombinator.com/item?id=10512882 After a restart, it's not showing up. I restarted because I was unable to access the start menu to open the screenshotting util. Perhaps these are related, and there's handle or resource exhaustion? Alternatively, my machine resets its GPU about one in three times I sleep+wake it, so perhaps it's due to device reset handling.

Kelsey Gilbert [:jgilbert]

Reporter

Comment 2

•

10 years ago

I had restarted the browser a couple times, but that didn't seem to fix it, or didn't fix it for long.

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 3

•

10 years ago

Resolving since this went away and is not actionable any more. If you encounter it again please reopen this bug and don't reboot. We'll need to enable various loggings to diagnose the problem when somebody is able to reproduce it.

Status: NEW → RESOLVED

Closed: 10 years ago

Resolution: --- → WORKSFORME

Kelsey Gilbert [:jgilbert]

Reporter

Comment 4

•

10 years ago

I'm still getting this, though it's closer to 1 in 300 frames right now. Previously it was as bad as 1 in 3 before I restarted.

URL: https://news.ycombinator.com/item?id=...

Status: RESOLVED → REOPENED

Resolution: WORKSFORME → ---

Kelsey Gilbert [:jgilbert]

Reporter

Comment 5

•

10 years ago

Attached image partial rendering failure — Details

Kelsey Gilbert [:jgilbert]

Reporter

Comment 6

•

10 years ago

Attached image full rendering failure — Details

Given that this also happens sometimes partially, maybe it's happening along tile boundaries?

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 7

•

10 years ago

Hm, interesting. Can you set the apz.highlight_checkerboarded_areas pref to true (requires browser restart)? That will make checkerboarding appear as a pink color. If the white is replaced by pink then it's a perma-checkerboarding problem. If not then it's likely that the painted layers themselves are bad. A layers dump might also help.

Kelsey Gilbert [:jgilbert]

Reporter

Comment 8

•

10 years ago

Attached image partial-failure-gmail.jpg — Details

Kelsey Gilbert [:jgilbert]

Reporter

Comment 9

•

10 years ago

Attached image full-failure-with-pref.JPG — Details

Kelsey Gilbert [:jgilbert]

Reporter

Comment 10

•

10 years ago

Attached image partial-failure-with-pref.JPG — Details

Kelsey Gilbert [:jgilbert]

Reporter

Updated

•

10 years ago

Flags: needinfo?(bugmail.mozilla)

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 11

•

10 years ago

Ok, so that definitely looks like perma-checkerboarding. :( Either repaint requests aren't getting through or the painted content isn't getting through. Likely the former since scrolling with keyboard works. What device are you seeing this on? It would be easiest to debug if I can reproduce it locally, otherwise I'll need to send you a special build with extra logging or ask you to attach a debugger and see what's going on. If you *do* want to attach a debugger, I'd be interested to know if, in the parent process, we enter the function at [1] and where in that function code flow goes through, and also if we hit the code at [2]. [1] https://dxr.mozilla.org/mozilla-central/rev/a8ed7dd831d1969a5a1a8636e63bd93d6aeaf94a/gfx/layers/apz/src/AsyncPanZoomController.cpp#2732 [2] https://dxr.mozilla.org/mozilla-central/rev/a8ed7dd831d1969a5a1a8636e63bd93d6aeaf94a/gfx/layers/apz/src/AsyncPanZoomController.cpp#2790

Blocks: all-aboard-apz

Flags: needinfo?(bugmail.mozilla)

Kelsey Gilbert [:jgilbert]

Reporter

Comment 12

•

10 years ago

It's a 2013 15" retina macbook pro running Win10. This is from my about:memory: 5,488.44 MB ── canvas-2d-pixels <<<<<< What?! 58.04 MB ── d3d11-shared-textures 0.00 MB ── d3d9-shared-texture 0.00 MB ── d3d9-shared-textures 0.00 MB ── gfx-d2d-vram-draw-target 0.00 MB ── gfx-d2d-vram-source-surface 0.00 MB ── gfx-surface-win32 0.00 MB ── gfx-textures 0.00 MB ── gfx-tiles-waste 2 ── ghost-windows 0.00 MB ── gpu-committed 75.40 MB ── gpu-dedicated 224.34 MB ── gpu-shared 1,630.80 MB ── heap-allocated 2,110 ── heap-chunks 1.00 MB ── heap-chunksize 1,666.61 MB ── heap-committed 2,110.00 MB ── heap-mapped 2.19% ── heap-overhead-ratio 0 ── host-object-urls 3.84 MB ── imagelib-surface-cache-estimated-locked 3.94 MB ── imagelib-surface-cache-estimated-total 0 ── imagelib-surface-cache-overflow-count 10.59 MB ── js-main-runtime-temporary-peak 4,174.24 MB ── private 2,849.61 MB ── resident 0.18 MB ── shmem-allocated 0.18 MB ── shmem-mapped 50.17 MB ── system-heap-allocated 5,861.49 MB ── vsize 129,988,053.61 MB ── vsize-max-contiguous 2 ── webgl-buffer-count 0.00 MB ── webgl-buffer-memory 2 ── webgl-context-count 0 ── webgl-renderbuffer-count 0.00 MB ── webgl-renderbuffer-memory 4 ── webgl-shader-count 2 ── webgl-texture-count 0.00 MB ── webgl-texture-memory

Kelsey Gilbert [:jgilbert]

Reporter

Comment 13

•

10 years ago

After scrolling around some of my GL spec PDFs (pdf.js) some more, it's at: 5,630.24 MB ── canvas-2d-pixels Maybe, or maybe not unrelated.

Alistair Vining

Comment 14

•

10 years ago

I see this as well on a PC with Intel HD Graphics 2000. With apz.highlight_checkerboarded_areas set to true I see the fetching pink colour. (Most of the time -- though I did once see pink in one tab and white in another.) On pages where it has started to occur (usually long pages), it will reliably happen if I make the smallest possible movement of the mouse wheel. Nightly 45.0a1, for about the last week or more.

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 15

•

10 years ago

(In reply to Alistair Vining from comment #14) > Nightly 45.0a1, for about the last week or more. If this is a recent regression, any chance you could use mozregression to narrow down when it started? I don't know if you can reproduce it reliably enough for that though.

Alistair Vining

Comment 16

•

10 years ago

Yes, I had trouble with mozregression for a while because of false negatives, but after reading Jeff's comments about restarting, I think the following STR were reliable for me: * Open URL in new tab (e.g. this page, or the original URL), no need to scroll or resize browser * Hibernate PC (didn't try with sleep) * Restart PC and scroll page with mouse wheel That gave the following pushes: https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=d53a52b39a95dced722cca90ac74529b66dd5253&tochange=5ca03a00d26823ce91ee0eaa2937bed605bd53c1 https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=076f95c512c459f9bd668222fc65b75f442ae4e0&tochange=c9af4f7acba9c7cb9c486c4343b3bad26e367925 Bug 1207412: Mouse wheel smooth-scroll "jumps forward" every ~250ms with APZ enabled - regression from 2015-08-24 Obviously if the STR are not 100% then this is just a red herring. Without hibernating the PC it comes and goes; yesterday it seemed to be present all day, this morning I couldn't otherwise see it or it would happen once and go away.

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 17

•

10 years ago

Thank you so much for getting that regression window. It certainly seems reliable enough if you were able to bisect it down that far, and the regressing changeset is definitely related so your result makes sense. Brian, any thoughts on how hibernation might affect the timestamps on the wheel events?

Blocks: 1207412

Flags: needinfo?(bbirtles)

Brian Birtles (:birtles)

Comment 18

•

10 years ago

(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #17) > Thank you so much for getting that regression window. It certainly seems > reliable enough if you were able to bisect it down that far, and the > regressing changeset is definitely related so your result makes sense. > Brian, any thoughts on how hibernation might affect the timestamps on the > wheel events? That changeset only switched a call from TimeStamp::NowLoRes to TimeStamp::Now. It's possible the underlying API calls behave differently in the face of hibernation but I really don't know without stepping through with a debugger. Unfortunately I won't have time to look into it this week. If anyone else can have a look, I'd be most grateful.

Flags: needinfo?(bbirtles)

David Anderson [:dvander] - inactive, e-mail if emergency

Comment 19

•

10 years ago

(In reply to Alistair Vining from comment #16) > Yes, I had trouble with mozregression for a while because of false > negatives, but after reading Jeff's comments about restarting, I think the > following STR were reliable for me: > * Open URL in new tab (e.g. this page, or the original URL), no need to > scroll or resize browser > * Hibernate PC (didn't try with sleep) > * Restart PC and scroll page with mouse wheel > > That gave the following pushes: > > https://hg.mozilla.org/mozilla-central/ > pushloghtml?fromchange=d53a52b39a95dced722cca90ac74529b66dd5253&tochange=5ca0 > 3a00d26823ce91ee0eaa2937bed605bd53c1 > > https://hg.mozilla.org/integration/mozilla-inbound/ > pushloghtml?fromchange=076f95c512c459f9bd668222fc65b75f442ae4e0&tochange=c9af > 4f7acba9c7cb9c486c4343b3bad26e367925 > > Bug 1207412: Mouse wheel smooth-scroll "jumps forward" every ~250ms with APZ > enabled - regression from 2015-08-24 > > Obviously if the STR are not 100% then this is just a red herring. Without > hibernating the PC it comes and goes; yesterday it seemed to be present all > day, this morning I couldn't otherwise see it or it would happen once and go > away. Alistair, I've been trying to reproduce this on my laptop but haven't been able to get it yet. Should it be sufficient to close the lid, wait for hibernation to complete, then reopen it? (I did make it is hibernating, rather than sleeping.)

Alistair Vining

Comment 20

•

10 years ago

That should be sufficient, David: I have a desktop PC and the power button is set to Hibernate (S4), so I just push that and wait for the power light to go out. I have just tested sleep and that seems to reproduce the bug as well.

David Anderson [:dvander] - inactive, e-mail if emergency

Comment 21

•

10 years ago

(In reply to Alistair Vining from comment #20) > That should be sufficient, David: I have a desktop PC and the power button > is set to Hibernate (S4), so I just push that and wait for the power light > to go out. I have just tested sleep and that seems to reproduce the bug as > well. I'm having trouble reproducing this on my Desktop as well. What version of Windows do you have? Could you also attach a copy of about:support? (Just the graphics section is enough)

Alistair Vining

Comment 22

•

10 years ago

Attached file Contents of about:support (normal + safe mode) — Details

I am running the recent Windows 10 Home version 1511; I have attached the contents of about:support and below that the same details from safe mode where the bug also shows up.

David Anderson [:dvander] - inactive, e-mail if emergency

Comment 23

•

10 years ago

(In reply to Alistair Vining from comment #22) > Created attachment 8691021 [details] > Contents of about:support (normal + safe mode) > > I am running the recent Windows 10 Home version 1511; I have attached the > contents of about:support and below that the same details from safe mode > where the bug also shows up. Well, I'm out of ideas for trying to repro this locally then :) Alistair, I'll try to get you an instrumented build with extra logging.

David Anderson [:dvander] - inactive, e-mail if emergency

Updated

•

10 years ago

Assignee: nobody → dvander

Status: REOPENED → ASSIGNED

Erin Lancaster [:elan]

Updated

•

10 years ago

Priority: -- → P1

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 24

•

9 years ago

Jeff Gilbert was looking at this today, he caught it in the debugger but with questionable symbols. It looked like the timestamp object was getting corrupted when it was getting passed into TaskThrottler::TaskComplete from AsyncPanZoomController. Also it looked like some of the TimeStamp values were 0 and otherwise kind of dodgy. I'm hoping that dvander's deletion of the TaskThrottler in bug 1192919 will just make this problem go away but if it doesn't then we'll have to debug it further.

Depends on: 1192919

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 25

•

9 years ago

I'm able to reproduce this issue using Hibernate on a Surface running windows 10. I also tested using a try build that had the patches from bug 1192919, and I can still repro it. I'll investigate.

Assignee: dvander → bugmail.mozilla

No longer depends on: 1192919

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Updated

•

9 years ago

Blocks: 1233053

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 26

•

9 years ago

So the problem seems to happen at the start of a wheel-scroll animation in the APZ. What happens it that mLastSampleTime gets reset to Now() at [1] which has little relationship to the vsync timestamp that comes in at [2]. By "little relationship" I mean that Now() is almost always ahead of the vsync timestamp, and by sometimes large amounts. The computed sampleTimeDelta value is effectively garbage on the first animation sample, which results in the axis velocity getting set to garbage at [3], and that throws the displayport calculation way off. I'm not entirely sure why this only happens after hibernation, though. As far as I can tell it should happen even during normal operation. I have a patch that band-aids the problem but I'm not really satisfied that I've found the root cause yet, so I'll dig a bit more. [1] https://dxr.mozilla.org/mozilla-central/rev/0711218a018d912036f7d3be2ae2649e213cfb85/gfx/layers/apz/src/AsyncPanZoomController.cpp#2573 [2] https://dxr.mozilla.org/mozilla-central/rev/0711218a018d912036f7d3be2ae2649e213cfb85/gfx/layers/apz/src/AsyncPanZoomController.cpp#2910 [3] https://dxr.mozilla.org/mozilla-central/rev/0711218a018d912036f7d3be2ae2649e213cfb85/gfx/layers/apz/src/WheelScrollAnimation.cpp#61

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 27

•

9 years ago

(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #26) > The computed > sampleTimeDelta value is effectively garbage on the first animation sample, > which results in the axis velocity getting set to garbage at [3], and that > throws the displayport calculation way off. So this is true. The difference between the pre- and post-hibernation scenarios is that in the post-hibernation scenarios almost all the animation samples are "first" animation samples. This happens because the wheel transaction keeps getting terminated and we start a new wheel block, which cancels and restarts the animation. The wheel transaction getting terminated, in turn, because we keep getting eMouseHitTest event types coming in to the APZCTreeManager and hitting the MaybeTimeout call at [1]. For some reason the timestamps on the eMouseHitTest events are a few seconds off from the timestamps on the wheel events, causing the problem. But wait! The call at [1] is actually using TimeStamp::Now() instead of the timestamp on the eMouseHitTest (but they're about the same, so it doesn't matter). Some more investigation reveals that the wheel events get their timestamp from SystemTimeConverter::GetTimeStampFromSystemTime(::GetMessageTime()) at [2]. And this is different from TimeStamp::Now(). Pre-hibernation the difference is less than 20ms but post-hibernation this goes up to ~3300ms. The code in SystemTimeConverter is supposed to ensure this never happens, but on windows the GetTimeAsyncForPossibleBackwardsSkew function for the callback isn't implemented, and so backwards skew never actually gets corrected. [1] https://dxr.mozilla.org/mozilla-central/rev/66fb852962c0d5f6f5fe0604204da4f5d17763c9/gfx/layers/apz/src/APZCTreeManager.cpp#994 [2] https://dxr.mozilla.org/mozilla-central/rev/66fb852962c0d5f6f5fe0604204da4f5d17763c9/widget/windows/nsWindow.cpp#3822

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 28

•

9 years ago

Attached patch Patch — Details — Splinter Review

This fixes the skew post-hibernate and solves the problem. Looking at the Linux implementation I suppose it would be better to do a ::PostMessage and then run the skew fix when that message is delivered, but to do a PostMessage I think I need a window handle which this code doesn't have access to.

Attachment #8700090 - Flags: review?(bbirtles)

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 29

•

9 years ago

And a try push to go with: https://treeherder.mozilla.org/#/jobs?repo=try&revision=3eb9ada1a709

Brian Birtles (:birtles)

Comment 30

•

9 years ago

Comment on attachment 8700090 [details] [diff] [review] Patch Review of attachment 8700090 [details] [diff] [review]: ----------------------------------------------------------------- Looks good to me. If Karl is available it would be good for him to check it over too.

Attachment #8700090 - Flags: review?(karlt)

Attachment #8700090 - Flags: review?(bbirtles)

Attachment #8700090 - Flags: review+

Pulsebot

Comment 31

•

9 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/203644fd8bbf

Carsten Book [:Tomcat]

Comment 32

•

9 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/203644fd8bbf

Status: ASSIGNED → RESOLVED

Closed: 10 years ago → 9 years ago

status-firefox46: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla46

Karl Tomlinson (back Dec 8 :karlt)

Comment 33

•

9 years ago

Comment on attachment 8700090 [details] [diff] [review] Patch There are a couple of issues here. One is that I don't know of anything to provide that native messages already in the queue will be processed before this Gecko event is processed. Without that, the skew adjustment can change between two associated native events that have the same native time, which will result in them no longer having the same timestamp and so they will appear not to be associated. The other is that GetMessageTime() returns the time of the last native message processed and there is nothing to provide that another message will be generated after aNow was generated. CompensateForBackwardsSkew() assumes that aReferenceTime is after aLowerBound, which is not necessarily true here, and so the UpdateReferenceTime() call can over-correct. Pending events will seem to be in the future when received and so their timestamps will compressed together. Using GetTickCount() instead of GetMessageTime() would address the second issue but not the first. (In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #28) > Looking at the > Linux implementation I suppose it would be better to do a ::PostMessage and > then run the skew fix when that message is delivered, but to do a > PostMessage I think I need a window handle which this code doesn't have > access to. Yes, I think that would produce the desired results (though I'm not familiar with the WINAPI methods). https://msdn.microsoft.com/en-us/library/windows/desktop/ms644944%28v=vs.85%29.aspx indicates that the hWnd parameter is optional and can have the NULL value.

Attachment #8700090 - Flags: review?(karlt)

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Updated

•

9 years ago

Blocks: 1234838

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 34

•

9 years ago

Thanks, I filed bug 1234838 to switch to use ::PostMessage. I'll test my patch and put it up shortly.

Vasilica Tamas, QA [:vtamas]

Updated

•

9 years ago

Flags: qe-verify+

QA Contact: vasilica.mihasca

Petruta Horea [:phorea]

Comment 35

•

9 years ago

We were unable to reproduce this. Alistair or Jeff, could you please confirm that this works as expected on latest versions? Thank you!

Flags: needinfo?(jgilbert)

Flags: needinfo?(alistair.vining)

Alistair Vining

Comment 36

•

9 years ago

This has worked fine for me on Nightly for months, presumably since the checkin in comment 32.

Flags: needinfo?(alistair.vining)

Petruta Horea [:phorea]

Comment 37

•

9 years ago

Thanks a lot! Marking as verified.

Status: RESOLVED → VERIFIED

Flags: qe-verify+

Flags: needinfo?(jgilbert)

partial rendering failure 10 years ago Kelsey Gilbert [:jgilbert] 503.13 KB, image/jpeg		Details
full rendering failure 10 years ago Kelsey Gilbert [:jgilbert] 180.70 KB, image/jpeg		Details
partial-failure-gmail.jpg 10 years ago Kelsey Gilbert [:jgilbert] 118.90 KB, image/jpeg		Details
full-failure-with-pref.JPG 10 years ago Kelsey Gilbert [:jgilbert] 202.89 KB, image/jpeg		Details
partial-failure-with-pref.JPG 10 years ago Kelsey Gilbert [:jgilbert] 468.23 KB, image/jpeg		Details
Contents of about:support (normal + safe mode) 10 years ago Alistair Vining 4.19 KB, text/plain		Details
Patch 9 years ago Kartikaya Gupta (email:kats@mozilla.staktrace.com) 1.70 KB, patch	birtles : review+	Details \| Diff \| Splinter Review