Open Bug 1439148 Opened 2 years ago Updated 10 months ago

black flashes when scrolling by scrollbar thumb

Categories

(Core :: Graphics, defect, P3)

53 Branch
Unspecified
All
defect

Tracking Status
firefox-esr52 --- unaffected
firefox-esr60 --- wontfix
firefox58 --- wontfix
firefox59 --- wontfix
firefox60 --- wontfix
firefox61 --- wontfix
firefox62 --- fix-optional

People

(Reporter: alice0775, Unassigned)

References

(Depends on 1 open bug)

Details

(Keywords: regression, Whiteboard: [gfx-noted])

Attachments

(2 files)

Attached video screenshot
Mozillazine reported ( http://forums.mozillazine.org/viewtopic.php?p=14792327 )

STR
1. Open https://www.ozbargain.com.au/deals
2. Scroll by dragging the scrollbar thumb
They have `body {background-color: #333;}`. If I uncheck that rule in Inspector, I stop getting those black flashes.
Yep, this page shows it too: http://www.ecma-international.org/ecma-262/7.0/index.html, with white flashing :(

Anyway, disabling APZ fixes the annoying flash.
This looks like regular checkerboarding. The "black" is just the #333 color of the background.
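
As a rough illustration of what checkerboarding means here (a minimal sketch, not Gecko's actual APZ code; the function name and the one-dimensional model are assumptions made for the example): the compositor can scroll the viewport asynchronously, and any part of the viewport that falls outside the region content has already painted can only show the page background, which is #333 on this site.

```python
def checkerboarded_height(viewport_top, viewport_height, painted_top, painted_height):
    """Return how many pixels of the viewport fall outside the painted region.

    Hypothetical one-dimensional model: the viewport and the painted region
    are vertical spans measured in CSS pixels from the top of the page.
    """
    viewport_bottom = viewport_top + viewport_height
    painted_bottom = painted_top + painted_height
    # Overlap between what is visible and what has been painted so far.
    overlap = max(0, min(viewport_bottom, painted_bottom) - max(viewport_top, painted_top))
    # Whatever is visible but not painted shows the background colour.
    return viewport_height - overlap
```

Under this model, a thumb drag that moves the viewport beyond the painted span faster than painting can catch up leaves a growing unpainted strip, which would be seen as a flash of the background colour.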

Can you get a gecko profile (see https://perf-html.io/) while reproducing this? That would be helpful so we can see why it's slow.
Flags: needinfo?(ampersand100000)
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #3)
> Can you get a gecko profile (see https://perf-html.io/) while reproducing
> this? That would be helpful so we can see why it's slow.

Unsure if I got it right: https://perfht.ml/2onM8mb. I recorded this on a fresh profile of v60.0a1 (2018-02-23) (64-bit).
Flags: needinfo?(ampersand100000)
Thanks, that looks good. The profile seems to indicate we're spending a lot of time in the advanced layers compositor, and since the compositor thread is busy it blocks the other threads trying to communicate with it. Bas, can you take a look at the compositor thread in the profile and see if anything looks amiss there?
Flags: needinfo?(bas)
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #5)
> Thanks, that looks good. The profile seems to indicate we're spending a lot
> of time in the advanced layers compositor, and since the compositor thread
> is busy it blocks the other threads trying to communicate with it. Bas, can
> you take a look at the compositor thread in the profile and see if anything
> looks amiss there?

So, looking at the 'long' composite times (and to be fair, there aren't that many; most are fast, but a couple are 50ms+): in these cases it's not advanced-layers-specific code, it seems like the GPU is just fairly busy in the background. Some of them are 'opening of shared resources', which is communicating with the Windows kernel to get an object wrapping a surface created in the content process; in the other cases it's just us waiting for the GPU to finish executing our work.
Flags: needinfo?(bas)
Ok, thanks. Bug 1441324 will probably help here somewhat in that it will stop the main thread from blocking on the compositor as much.
Depends on: 1441324
Priority: -- → P3
Whiteboard: [gfx-noted]
Now that bug 1441324 is fixed, could you get another profile on the latest nightly build? If the issue isn't improved at least the profile will tell us what to look at next.
Flags: needinfo?(ampersand100000)
The black flashes feel about the same to me. Recorded https://perfht.ml/2DKLaVV on 61.0a1 (2018-03-21) (64-bit).
Flags: needinfo?(ampersand100000)
Hm, that profile shows the content process is spending a lot of time (7%) just stuck inside CompositorBridgeChild::FlushAsyncPaints(), which is called from ClientLayerManager::BeginTransactionWithTarget. I thought the normal flow was to use BeginTransaction, not BeginTransactionWithTarget. There also appear to be some traces of OMTP in this profile (4% of time is in ClientPaintedLayer::PaintOffMainThread()). Ryan, is the BeginTransactionWithTarget flow a normal part of OMTP? If so, is there anything we can do to reduce the blocking in FlushAsyncPaints()?
Flags: needinfo?(rhunt)
Yeah, BeginTransaction() should be what's called, not BeginTransactionWithTarget, but BeginTransaction always immediately calls BeginTransactionWithTarget, so maybe it's being inlined?

So FlushAsyncPaints() is expected to block in cases where the main thread is getting ahead of the paint thread, like when we have a slow paint.
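
To make that blocking behaviour concrete, here is a minimal sketch in Python rather than Gecko's C++. The class and method names are hypothetical and only model the idea: the main thread counts paints it has queued, the paint thread decrements the count as each one finishes, and a flush waits until the count reaches zero.

```python
import threading


class AsyncPaintTracker:
    """Hypothetical stand-in for the bookkeeping behind a blocking flush."""

    def __init__(self):
        self._cond = threading.Condition()
        self._outstanding = 0

    def notify_begin_async_paint(self):
        # Main thread: a paint has been queued for the paint thread.
        with self._cond:
            self._outstanding += 1

    def notify_finished_async_paint(self):
        # Paint thread: a queued paint has completed.
        with self._cond:
            self._outstanding -= 1
            if self._outstanding == 0:
                self._cond.notify_all()

    def flush_async_paints(self):
        # Blocks the caller until every queued paint has finished.
        with self._cond:
            self._cond.wait_for(lambda: self._outstanding == 0)

    @property
    def outstanding(self):
        with self._cond:
            return self._outstanding
```

In this model the flush returns immediately when the paint thread keeps up, and the time spent waiting grows with how slow the outstanding paint is.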

I noticed there is some D2D lock contention on PathBuilder in rasterizing mask layers on the main thread. Fixing bug 1420825 would help here. There might be more we could do here, but we'd need to see what's taking so long on the paint thread.
Flags: needinfo?(rhunt)
Would you be able to get another profile?

This time, expand '> Settings' and add ',Paint' to Threads. This will give us more information about why painting is taking so long.
Sure, here: https://perfht.ml/2IMiNuj
Although this profile has the paint thread, it's not symbolicated. Can you try again? You might need to wait for the symbolication to complete before sharing the profile. The status shows up in a slightly-hard-to-see popup message at the top of the profile page.
Strange, because the top of the page says 'Sharing will be available once symbolication is complete' and the `Share` and `Save as file` buttons don't show up for me until then.

Anyway, I recorded https://perfht.ml/2IP2YCW and https://perfht.ml/2IP3O2y.
https://perfht.ml/2G7ph9e

This is a symbolicated profile with latest nightly and Paint thread. While quick scrolling, the black flashes were quite prominent.
(In reply to Bruce from comment #15)
> Anyway, I recorded https://perfht.ml/2IP2YCW and https://perfht.ml/2IP3O2y.

These are both still not symbolicated. Are you using a regular Nightly build, or something different (a local build maybe)?

(In reply to Mayank Bansal from comment #16)
> https://perfht.ml/2G7ph9e

This one is symbolicated. The only thing that stands out to me is that it seems to be painting six different pages:

PresShell::Paint https://www.ozbargain.com.au/deals?page=1
PresShell::Paint https://www.ozbargain.com.au/deals?page=7
PresShell::Paint https://www.ozbargain.com.au/deals?page=6
PresShell::Paint https://www.ozbargain.com.au/deals
PresShell::Paint https://www.ozbargain.com.au/deals?page=3
PresShell::Paint https://www.ozbargain.com.au/deals?page=2

But it looks like when scrolling on that page it automatically updates the URL based on how far down the infinite scroller you are. It just means you're scrolling rapidly through about 7 "pages" of content, and I guess we just can't keep up.
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #17)
> (In reply to Bruce from comment #15)
> > Anyway, I recorded https://perfht.ml/2IP2YCW and https://perfht.ml/2IP3O2y.
> 
> These are both still not symbolicated. Are you using a regular Nightly
> build, or something different (a local build maybe)?

I'm just using regular Nightly, the same that I've used for the 2 other profiles I submitted that apparently were symbolicated. Could you tell me how I can know if the profile I'm about to share is symbolicated or not? Looking at https://i.imgur.com/4O1Gjwc.png, this would appear to be a bug that I can file at perf.html's github.

> But it looks like when scrolling on that page it automatically updates the
> URL based on how far down the infinite scroller you are. It just means
> you're scrolling rapidly through about 7 "pages" of content, and I guess we
> just can't keep up.

I can reliably reproduce black flashes even if I ensure that I'm scrolling within the top ~2/3rd of the page. This prevents the website from dynamically appending pages; the url always remains `https://www.ozbargain.com.au/deals`.
(In reply to Bruce from comment #18)
> I'm just using regular Nightly, the same that I've used for the 2 other
> profiles I submitted that apparently were symbolicated. Could you tell me
> how I can know if the profile I'm about to share is symbolicated or not?
> Looking at https://i.imgur.com/4O1Gjwc.png, this would appear to be a bug
> that I can file at perf.html's github.

A symbolicated profile has symbol names instead of addresses, see attached image for comparison.
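
If it helps to check before sharing, the same distinction can be expressed as a small heuristic. This is only a sketch under an assumption: that unsymbolicated frames appear as bare hex addresses such as `0x7ff6a1b2c3d4`. The function name and threshold are made up for the example and are not part of perf.html.

```python
import re

# Assumption: an unsymbolicated frame is displayed as a bare hex address.
_ADDRESS_RE = re.compile(r"^0x[0-9a-fA-F]+$")


def looks_symbolicated(frame_names, threshold=0.5):
    """Return True if most frame names are symbol names rather than addresses."""
    if not frame_names:
        return False
    named = sum(1 for name in frame_names if not _ADDRESS_RE.match(name))
    return named / len(frame_names) > threshold
```

For example, a list of frames like `PresShell::Paint` would pass, while a list made up only of hex addresses would not.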

> I can reliably reproduce black flashes even if I ensure that I'm scrolling
> within the top ~2/3rd of the page. This prevents the website from
> dynamically appending pages; the url always remains
> `https://www.ozbargain.com.au/deals`.

Ok, good to know.
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #19)
> A symbolicated profile has symbol names instead of addresses, see attached
> image for comparison.

Thanks. Recorded https://perfht.ml/2uqinqp on v61.0a1 (2018-03-28).
Ryan, do you see any possible OMTP improvements in the profile? ^
Flags: needinfo?(rhunt)
It looks like there is some heavy Direct3D/2D contention going on here, with the main thread creating D3D11 textures for texture clients, the paint thread executing D2D commands, and the compositor compositing.

I'm not sure what is contending with what, especially because the compositor is in a different process. Bas, do you have any thoughts? Could this just be heavy GPU load?

I also just stumbled on this when reading about DXGI threading support [1]. Do you think it's worth detecting this and disabling OMTP when we can't create resources and issue commands? Or maybe we could work around it.

[1] https://msdn.microsoft.com/en-us/library/windows/desktop/ff476893(v=vs.85).aspx
Flags: needinfo?(rhunt) → needinfo?(bas)
So it seems like we might be running into contention with the D2D global lock and the creation of paths. Here's a build [1] put together by Bas which tries to work around this.

Could you test with this build and see if it's any better?

[1] https://queue.taskcluster.net/v1/task/S5UM8vppSnOZaUTT8cc-0w/runs/0/artifacts/public/build/target.zip
Flags: needinfo?(bas) → needinfo?(ampersand100000)
Unfortunately the black flashes appear to be about the same for me.
(Sorry for the late response)
Flags: needinfo?(ampersand100000)

Moving this out of APZ since it seems like it's a D2D contention problem with OMTP.

Component: Panning and Zooming → Graphics