Open Bug 1449027 Opened 6 years ago Updated 2 years ago

Website scrolling incorrectly on Firefox (checkerboarding due to slow paints)

Categories: Core :: Graphics: Layers, defect, P3; x86_64; macOS
Tracking: Performance Impact low
People: Reporter: edmorales.97, Assigned: bas.schouten
Details: Keywords: perf, Whiteboard: [gfx-noted]
Attachments: 1 file

User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:61.0) Gecko/20100101 Firefox/61.0
Build ID: 20180326132207

Steps to reproduce:

1. Go to waleteros.com
2. Quickly scroll down


Actual results:

Notice that the green menu sidebar on the right appears when you scroll fast, and scrolling is very laggy.


Expected results:

Smooth scrolling; the green menu sidebar shouldn't appear.
Try this in any other browser (Safari, Chrome, Opera) and it works perfectly fine.

Maybe an issue with JavaScript, CSS or HTML?
Hi Eduardo,

I tested this on Mac OS X 10.12 with Firefox Nightly 61.0a1 (2018-03-29) and can't reproduce it; in my case scrolling is smooth and the menu bar is displayed.

Can you please retest this in safe mode? Here is a link that can help you:
https://support.mozilla.org/t5/Procedures-to-diagnose-and-fix/Troubleshoot-Firefox-issues-using-Safe-Mode/ta-p/1687#w_how-to-start-firefox-in-safe-mode

It would also be a good idea to retest this with a new profile; the steps are here: https://support.mozilla.org/en-US/kb/profile-manager-create-and-remove-firefox-profiles?redirectlocale=en-US&redirectslug=Managing-profiles#w_starting-the-profile-manager
Component: Untriaged → Panning and Zooming
Flags: needinfo?(edmorales.97)
Product: Firefox → Core
Version: 61 Branch → Trunk
I could reproduce this on macOS with Firefox 60.0b4 (and a build going in the background to eat some CPU). It looks like checkerboarding - we sometimes don't paint fast enough and so the background layer with the green sidebar peeks through.

I got a Gecko profile: https://perfht.ml/2GGfDde which seems to show a lot of time being spent in painting. However, I'm not sure whether that's the same root cause as Eduardo's.

Eduardo, instead of the stuff Ovidiu requested (which would be nice but probably not too helpful), can you go to https://developer.mozilla.org/en-US/docs/Mozilla/Performance/Reporting_a_Performance_Problem and follow the instructions to get a performance profile while scrolling on this page? That will give us a better idea of what's going on and why it's showing up like this.
I will mark this as New, based on comment 2.
Status: UNCONFIRMED → NEW
Ever confirmed: true
I ran the test using the Gecko Profiler:

https://perfht.ml/2E7kIpN

I can run it again if you'd like, as this is my first time using the add-on and I'm not sure I used it properly.

Thanks,

Eduardo
Flags: needinfo?(edmorales.97)
Thanks! Your profile shows similar results to mine, where a lot of time (~17%) is spent in ARGBSetRow_X86 during painting. Moving to layers and tagging with [qf] since this is basically painting performance.
Component: Panning and Zooming → Graphics: Layers
Keywords: perf
OS: Unspecified → Mac OS X
Hardware: Unspecified → x86_64
Summary: Website scrolling incorrectly on Firefox → Website scrolling incorrectly on Firefox (checkerboarding due to slow paints)
Whiteboard: [qf]
Priority: -- → P3
Whiteboard: [qf] → [qf] [gfx-noted]
I can reproduce jank here on Linux nightly, too.  Here's a profile of me hitting page-down 3 times, and then page-up 3 times (in a fresh profile): https://perfht.ml/2IJRX5p
Bas will do some further investigative work on this bug.
Assignee: nobody → bas
Reminder for Kannan: Follow up with Bas when he's in the office.
Flags: needinfo?(kvijayan)
This website appears to be broken in both Edge and Firefox at the moment. I 'can't' scroll down.
Flags: needinfo?(edmorales.97)
(In reply to Bas Schouten (:bas.schouten) from comment #9)
> This website appears to be broken both in Edge and Firefox at the moment.. I
> 'can't' scroll down..

That's not how it was a month ago, but I'm seeing the same behavior now. They must've done a rewrite/update.

I also "can't" scroll down in Chrome (v68 dev, on Linux).  And Firefox Nightly and Firefox Release on Linux. So unless there's an OS dependency here, it seems like the site is now broken in the same way everywhere (unrelated to the original jank issue here).  (I was about to file a webcompat issue when I realized it was broken in Chrome too :))
The website seems to have been fixed.
It wasn't working yesterday. It works today with the original issue still reproducing.
www.waleteros.com
Flags: needinfo?(edmorales.97)
OK, restoring needinfo=Bas from before comment 9.
Flags: needinfo?(bas)
Hrm, there doesn't appear to be any issue here with D2D; however, I do see some 'jankiness' when using Skia. Note that there's something interesting going on: when you scroll down fast, you briefly see a green 'rectangle' on the lower right of the screen that immediately disappears again. I see this in Edge as well, so it's unlikely to be a correctness issue.

This does indeed appear to be coming mostly from ARGBSetRow_X86, in particular the one on shmem creation (which is presumably where the initial page fault for the memory would occur). This is on a 4K screen, so this would just be the clearing of the immense non-opaque layers that would be constructed here. I don't think this is a painting performance issue per se; rather, it seems like an issue with poor layerization decisions, likely related to that crazy green square. This is Jamie's area of expertise.
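To give a sense of scale for "immense", here is a back-of-the-envelope estimate of how much memory has to be cleared for one freshly allocated non-opaque layer buffer. It assumes 4 bytes per pixel (32-bit ARGB) and a buffer exactly the size of one 3840x2160 screen; both are assumptions for illustration, not numbers taken from the profiles (real layers cover the displayport, which can be larger than the viewport).

```python
# Rough size of a single non-opaque layer buffer that has to be cleared.
# Assumptions (not from the profiles): 4 bytes per pixel, one 4K screen.
BYTES_PER_PIXEL = 4


def layer_buffer_bytes(width: int, height: int, bytes_per_pixel: int = BYTES_PER_PIXEL) -> int:
    """Bytes that must be written (e.g. cleared to transparent) for a width x height buffer."""
    return width * height * bytes_per_pixel


def to_mib(n_bytes: int) -> float:
    """Convert bytes to mebibytes."""
    return n_bytes / (1024 * 1024)


if __name__ == "__main__":
    size = layer_buffer_bytes(3840, 2160)
    print(f"{size} bytes = {to_mib(size):.1f} MiB per layer")  # 33177600 bytes = 31.6 MiB per layer
```

Under these assumptions, every reallocation of such a buffer means writing roughly 32 MiB before any content is drawn, which fits with clearing code like ARGBSetRow_X86 dominating when layerization keeps changing.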

Jamie, let me know if you agree.
Flags: needinfo?(bas) → needinfo?(jnicol)
Yeah. Lots of time spent painting, especially allocating or clearing buffers, is very often caused by the page's layerization changing drastically.

I don't have a green sidebar, but I do get choppy scrolling.

From enabling paint flashing, I see two problems:

* When the header bar appears, the content (text, images) gets repainted, but not the background. This probably means something is splitting the page into two separate layers.

* When you scroll further down, at some point everything gets repainted. I'm guessing that the thing splitting the layers from the previous bullet has now been scrolled out of the display port so the layers get recombined. (Although I don't know why that would invalidate the whole background rather than just the text and images again, so maybe something else is happening.)
Flags: needinfo?(jnicol)
For the triage meeting, in case I don't make it in time: based on the data here, this is an example of the type of layerization issue we see on a variety of pages.

Jamie, is there a metabug for this somewhere?

I'd imagine this particular website may not be important enough to justify spending a lot of time on with a high priority, but arguably this class of issues in general is something we should keep finding and addressing.

The question, though, is whether, with most of Gfx tied up in WebRender, this is more promising than other perf work we're doing (since the Gfx team outside of WebRender is already doing strictly performance work at the moment).
Flags: needinfo?(jnicol)
No we don't have a metabug. Having one would be nice.

It's hard to quantify the impact that fixing each individual site will have. We just don't know if each one is indicative of others or not. (And unfortunately sometimes tweaking the heuristics for one makes others worse).

Overall I think it's definitely worthwhile though. We can drastically improve performance on affected sites, rather than percentage wins. And in my experience they fairly often do benefit other sites too.

For this site, I first need to diagnose it in more depth than above. Then we'll have a better idea of how difficult the fix would be, so we can prioritize better.
Flags: needinfo?(jnicol)
We have telemetry for checkerboarding, and in theory a change that drastically improves or hurts checkerboarding should have an impact there. For smaller changes the signal might get lost in the noise.
Had a deeper look. My guesses were partly right.

* At the same time as the header bar appears, the in-content app store badges get `display: none` set on them. This causes the rest of the content to move up some pixels, which means it all needs to be re-rasterized.

* Most of the page is contained in a transform. The bit at the top, with the happy people paying for their drinks, is nested within another transform, which has `transform: translate3d(0, 0, 0); transform-style: preserve3d;`. We therefore make it active, which makes its parent active, which makes the rest of the page go in a ClientPaintedLayer. When the page is scrolled down, this 3d transform leaves the displayport, so the parent transform no longer needs to be active, and the rest of the page is now rendered as a BasicPaintedLayer.
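The second bullet can be modelled roughly as follows. This is a hypothetical, heavily simplified sketch of the behaviour described above, not Gecko's actual layerization code: `TransformNode`, `is_active` and `contents_layer` are made-up names, and the only rules encoded are "a 3D transform inside the displayport is active" and "an active descendant makes the enclosing transform active".

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TransformNode:
    """Hypothetical stand-in for a transformed element."""
    name: str
    has_3d_transform: bool = False  # e.g. transform: translate3d(0, 0, 0)
    in_displayport: bool = True     # does the element intersect the displayport?
    children: list[TransformNode] = field(default_factory=list)


def is_active(node: TransformNode) -> bool:
    # A 3D transform inside the displayport is made active...
    if node.has_3d_transform and node.in_displayport:
        return True
    # ...and an active descendant makes the enclosing transform active too.
    return any(is_active(child) for child in node.children)


def contents_layer(node: TransformNode) -> str:
    # Active: the rest of the transform's content goes in a ClientPaintedLayer.
    # Inactive: the content is rendered as a BasicPaintedLayer instead.
    return "ClientPaintedLayer" if is_active(node) else "BasicPaintedLayer"


if __name__ == "__main__":
    hero = TransformNode("hero", has_3d_transform=True)
    page = TransformNode("page", children=[hero])
    print(contents_layer(page))  # ClientPaintedLayer
    hero.in_displayport = False  # scrolled far enough down
    print(contents_layer(page))  # BasicPaintedLayer
```

The point the model makes is that the layer type for the whole page flips purely because of scroll position, and each flip means the page content has to be painted again into a different kind of layer.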

I'm not sure what we can do about the first thing - I think the website should be using `visibility: hidden` instead?
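The difference between the two CSS approaches can be illustrated with a toy top-to-bottom block layout. This is a hypothetical sketch (the `Block` type and the block heights are made up): `display: none` removes the element's box from layout, so everything after it moves up, while `visibility: hidden` keeps the box and its space and only stops it from being drawn.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Block:
    """Hypothetical block-level box in a simple vertical layout."""
    name: str
    height: int
    display_none: bool = False       # generates no box at all
    visibility_hidden: bool = False  # box kept in layout, just not drawn


def layout(blocks: list[Block]) -> dict[str, int]:
    """Return the y offset of every block that generates a box."""
    y = 0
    offsets: dict[str, int] = {}
    for block in blocks:
        if block.display_none:
            continue  # removed from layout: later blocks move up
        offsets[block.name] = y
        y += block.height  # a visibility:hidden block still takes up space
    return offsets


def painted(blocks: list[Block]) -> list[str]:
    """Names of the blocks that are actually drawn."""
    return [b.name for b in blocks if not b.display_none and not b.visibility_hidden]


if __name__ == "__main__":
    page = [Block("intro", 100), Block("badges", 60), Block("article", 2000)]
    print(layout(page)["article"])  # 160
    page[1].display_none = True
    print(layout(page)["article"])  # 100: the article moved, so it must be re-rasterized
    page[1].display_none = False
    page[1].visibility_hidden = True
    print(layout(page)["article"])  # 160: nothing below the badges moves
```

With `visibility: hidden`, the content below the badges keeps its position, so only the badges' own area would need repainting.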

I'm not sure why the 3d transform is being used. I see this fairly often. I think it might be a trick some websites use to ensure they get active layers, but that's not always desirable. I've thought before about ignoring it when the transform is zero, but it might hurt in some cases. Maybe that's the sort of change where we could use checkerboarding telemetry to see whether it helps.

Another idea is to remember that the transform was previously active, and therefore keep it active. Once we've stopped scrolling it can change back to inactive to save memory, but whilst we're scrolling it'd be nice to avoid as much invalidation as possible.
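The "remember it was active" idea amounts to a simple hysteresis. Below is a hypothetical sketch of that policy only; `ActivityTracker` and its `update` method are made-up names, and how Gecko would actually learn that scrolling has stopped is left out.

```python
class ActivityTracker:
    """Hypothetical sketch: keep a transform active for the duration of a scroll."""

    def __init__(self) -> None:
        self.active = False

    def update(self, wants_active: bool, scrolling: bool) -> bool:
        if wants_active:
            # Normal heuristics say "active" (e.g. 3D transform in the displayport).
            self.active = True
        elif not scrolling:
            # Scrolling has stopped: drop back to inactive to save memory.
            self.active = False
        # Otherwise we are mid-scroll: keep whatever state we had, so a
        # previously active transform stays active and avoids invalidation.
        return self.active


if __name__ == "__main__":
    tracker = ActivityTracker()
    print(tracker.update(wants_active=True, scrolling=True))    # True
    print(tracker.update(wants_active=False, scrolling=True))   # True: left displayport mid-scroll
    print(tracker.update(wants_active=False, scrolling=False))  # False: scroll ended
```

The trade-off is the one noted above: holding the layer active costs memory while scrolling, in exchange for not re-rasterizing the page when the 3D transform leaves the displayport.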
Flags: needinfo?(kvijayan)
Hey Bas, can you provide a work assessment on this issue? Given Jamie's investigation, what's a reasonable prioritization?
Flags: needinfo?(bas)
(In reply to Kannan Vijayan [:djvj] from comment #19)
> Hey bas, can you provide a work assessment on this issue?  Given Jamie's
> investigation - what's a reasonable prioritization?

The work that needs to be done here isn't obvious (there are a couple of possible approaches, but none of them are perfect). As I understand Jamie's comment, there's a tech evangelism solution here (i.e. visibility:hidden instead of display:none). In the end, this problem seems to primarily affect platforms other than Windows, and I'm hard pressed to believe this website is particularly important, so I'd say maybe P2 or even P3 from a QF perspective. Our layerization logic does need to be better; we know this, and it's being worked on as best we can at the moment.
Flags: needinfo?(bas)
Whiteboard: [qf] [gfx-noted] → [qf:p3:f64] [gfx-noted]
Whiteboard: [qf:p3:f64] [gfx-noted] → [qf:p3][gfx-noted]
Performance Impact: --- → P3
Whiteboard: [qf:p3][gfx-noted] → [gfx-noted]
Severity: normal → S3
