Open Bug 1448778 Opened 6 years ago Updated 2 years ago

Really sluggish performance on a fast machine (CrossProcessSemaphoreReadLock blocking on specific kernel)

Categories

(Core :: Graphics: Layers, defect, P3)

59 Branch
x86_64
Linux
defect

Performance Impact none

People

(Reporter: jmvalin, Unassigned)

References

Details

(Keywords: perf, Whiteboard: [gfx-noted])

Attachments

(2 files)

Firefox performance (especially scrolling, but not just that) is really bad on my Fedora 27 workstation. The machine is pretty fast: it's a dual Xeon E5-2640 (20 cores, 40 threads) with 128 GB RAM and a Radeon RX 560 video card, so it really shouldn't be slow. As a comparison, my Mozilla-issued W540 laptop with just the Intel GPU (nvidia GPU turned off) feels much faster when running Firefox.

To make sure it wasn't just the drivers, I ran some tests. I have no performance problem when running Eclipse (which is usually much slower than Firefox). Chromium also seems to run fine. I tried Firefox 57, 58 and 59 (the Fedora-packaged versions) as well as 60 (Developer Edition downloaded directly from Mozilla) and the problem is roughly the same. Overall, glmark2 reports about 3x higher performance on my workstation (on which FF is slow) than on my laptop (on which it's fast).

To give an idea of what I mean by slow, I mean really jerky scrolling (especially when using the arrow key, but also with the mouse). Sometimes, a "page up" or "page down" can take 0.5 to 1 second to redraw a page when it's almost instantaneous on my other machine.
Attached file about:support
Attached file X.org log file
So I managed to narrow down the problem a bit. It seems like the problem only occurs with kernel 4.15, not with 4.13 and 4.14. That being said I'm not convinced it's just a kernel bug, since Chrome and other applications are unaffected.
It would really help if you could get a performance profile when the browser is misbehaving. Here is a guide to getting a profile:

https://developer.mozilla.org/docs/Mozilla/Performance/Reporting_a_Performance_Problem
Component: General → Panning and Zooming
Flags: needinfo?(jmvalin)
Product: Firefox → Core
Whiteboard: [qf]
Also try forcing hw acceleration by starting Firefox with MOZ_ACCELERATED=1 in your environment. That will probably help.
It's slow even with MOZ_ACCELERATED=1. Here's a profile:
https://perfht.ml/2EkIZZC
Here's what I did during the profiling:
1) Open a new tab
2) go to washingtonpost.com (not the only site affected)
3) scroll down through to the bottom of the page using the down arrow key
4) scroll up to the top using the up arrow key
5) repeat 3) and 4) another 3 times (four round trips total)
This is with kernel 4.15.14-300.fc27 and the scrolling is really sluggish.
Flags: needinfo?(jmvalin)
And here's the profile with kernel 4.14.11-300.fc27 doing exactly the same thing as in my previous comment:
https://perfht.ml/2GBswWK
The scrolling is pretty smooth.
The profile with the jank shows that the content process is getting blocked in CrossProcessSemaphoreReadLock::TryReadLock (called from TextureClient::TryReadLock) during a layers transaction and that seems to block the transactions. In the other profile this doesn't seem to happen.

It also makes sense that a kernel update might affect how the cross-process semaphores work and could be affecting this behaviour.

Moving to layers and cc'ing some folks who might know what's going on here.
Component: Panning and Zooming → Graphics: Layers
Summary: Really sluggish performance on a fast machine → Really sluggish performance on a fast machine (CrossProcessSemaphoreReadLock blocking on specific kernel)
Keywords: perf
Priority: -- → P3
Whiteboard: [qf] → [qf] [gfx-noted]
I tracked the regression down to commit 648bc3574716400acc06f99915815f80d9563783 in the kernel. That being said, it's still unclear to me whether the issue is with the kernel, with Firefox or both. As far as I can tell, Firefox is the only application affected -- or at least it's by far the most noticeable.
> https://perfht.ml/2EkIZZC   from Comment 6
> https://perfht.ml/2GBswWK   from Comment 7

Comparing the two profiles, GLContextGLX::SwapBuffers() also seemed to take a long time. The stack was as follows.

-----------------

pthread_cond_wait
__driDriverGetExtensions_virtio_gpu
amdgpu_winsys_create
__driDriverGetExtensions_virtio_gpu
__driDriverGetExtensions_virtio_gpu
__glx_Main
mozilla::gl::GLContextGLX::SwapBuffers()
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #8)
> The profile with the jank shows that the content process is getting blocked
> in CrossProcessSemaphoreReadLock::TryReadLock (called from
> TextureClient::TryReadLock) during a layers transaction and that seems to
> block the transactions. In the other profile this doesn't seem to happen.

The stack of CrossProcessSemaphoreReadLock::TryReadLock was as follows.

do_futex_wait
__new_sem_wait_slow
mozilla::CrossProcessSemaphore::Wait(mozilla::Maybe<mozilla::BaseTimeDuration<mozilla::TimeDurationValueCalculator> > const&)
mozilla::layers::CrossProcessSemaphoreReadLock::TryReadLock(mozilla::BaseTimeDuration<mozilla::TimeDurationValueCalculator>)
mozilla::layers::TextureClient::TryReadLock()
CrossProcessSemaphoreReadLock::TryReadLock was called with a 500 ms timeout. Judging from that, the cross-process semaphore might not be receiving a signal from the UI process.

> TextureClient::TryReadLock()
>   mReadLock->TryReadLock(TimeDuration::FromMilliseconds(500))

https://dxr.mozilla.org/mozilla-central/source/gfx/layers/client/TextureClient.cpp#476
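For readers unfamiliar with this kind of wait, here is a minimal sketch of a timed wait on a POSIX semaphore, which is what the `do_futex_wait` / `__new_sem_wait_slow` frames above appear to correspond to on Linux. This is not the Gecko implementation: `TimedSemWait` is a hypothetical helper name, and the sketch only assumes that a bounded wait like the 500 ms one above can be expressed with `sem_timedwait`.

```cpp
#include <cassert>
#include <cerrno>
#include <chrono>
#include <semaphore.h>
#include <time.h>

// Hypothetical helper (not Gecko code): wait on a POSIX semaphore for at most
// `timeout`. Returns true if the semaphore was acquired, false on timeout or
// any other error.
bool TimedSemWait(sem_t* sem, std::chrono::milliseconds timeout) {
  assert(sem != nullptr);

  // sem_timedwait takes an absolute CLOCK_REALTIME deadline, not a duration.
  timespec deadline;
  clock_gettime(CLOCK_REALTIME, &deadline);
  deadline.tv_sec += timeout.count() / 1000;
  deadline.tv_nsec += (timeout.count() % 1000) * 1000000L;
  if (deadline.tv_nsec >= 1000000000L) {
    deadline.tv_sec += 1;
    deadline.tv_nsec -= 1000000000L;
  }

  // Retry if a signal handler interrupted the wait.
  int rv;
  while ((rv = sem_timedwait(sem, &deadline)) == -1 && errno == EINTR) {
  }
  return rv == 0;
}
```

If the other side never posts the semaphore, every call like this blocks for the full timeout before returning false, which would be consistent with the 0.5 to 1 second stalls described earlier in this bug.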
mattwoodrow, can you comment on this bug?
Flags: needinfo?(matt.woodrow)
Don't know if it helps, but the commit has to do with transparent huge pages (presumably in the AMD driver):
"drm/ttm: add transparent huge page support for DMA allocations v2".

Apparently I'm not the only one hit by that problem. See these posts by another Firefox user also finding a regression on the same commit:
https://grfilms.net/v-video-performance-loss-with-kernel-4-15-x-drm-ttm-problem-YqtleU5YBlA.html
https://bbs.archlinux.org/viewtopic.php?id=233701

I've been profiling the scrolling because it was easier to notice, but I'm also seeing similar issues with video.
So the author of the kernel patch that causes the regression is blaming Firefox:
"What we found is that firefox is doing something rather strange by allocating large textures and then just trowing them away again immediately." and "somebody needs to figure out why firefox and/or the user space stack is doing this constant allocation/freeing of memory".

He also points to two bugs already filed on this:
https://bugzilla.kernel.org/show_bug.cgi?id=198511
https://bugs.freedesktop.org/show_bug.cgi?id=105038
The CrossProcessSemaphoreReadLock::TryReadLock call is waiting on the compositor to finish reading from texture memory, so that we can start writing to it again without racing.

It seems like in this case, it's the compositor thread that is having slowdowns (since we're doing a poor job of memory management), and so the main-thread waiting on the semaphore is the expected behaviour (to prevent us over-producing frames, and allocating even more memory).

For the video case, we have code around to recycle texture memory, but it's implemented per-backend, so it's possible that we're missing it for some media formats and/or platforms.

For scrolling it would be unexpected for us to be re-allocating texture memory every time; that seems like quite a big problem.
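To illustrate what recycling means here as opposed to the allocate-and-free pattern the kernel author describes, below is a minimal sketch of a size-keyed buffer pool. It is not how Gecko's texture recycling is written: `Buffer` and `BufferPool` are hypothetical names, and a plain byte vector stands in for texture memory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a texture allocation; not a Gecko type.
using Buffer = std::vector<std::uint8_t>;

// Hypothetical pool that hands back previously released buffers instead of
// allocating a fresh one for every frame.
class BufferPool {
 public:
  explicit BufferPool(std::size_t maxCached) : mMaxCached(maxCached) {}

  // Reuse a cached buffer of exactly `size` bytes if one exists; otherwise
  // allocate a new one and count the allocation.
  std::unique_ptr<Buffer> Acquire(std::size_t size) {
    for (auto it = mFree.begin(); it != mFree.end(); ++it) {
      if ((*it)->size() == size) {
        std::unique_ptr<Buffer> buf = std::move(*it);
        mFree.erase(it);
        return buf;
      }
    }
    ++mAllocations;
    return std::make_unique<Buffer>(size);
  }

  // Return a buffer to the pool; it is freed if the cache is already full.
  void Release(std::unique_ptr<Buffer> buf) {
    assert(buf != nullptr);
    if (mFree.size() < mMaxCached) {
      mFree.push_back(std::move(buf));
    }
  }

  // Number of real allocations performed so far.
  std::size_t Allocations() const { return mAllocations; }

 private:
  std::size_t mMaxCached;
  std::size_t mAllocations = 0;
  std::vector<std::unique_ptr<Buffer>> mFree;
};
```

With a pool like this, repeated acquire/release cycles at the same size cost one allocation in total, whereas a path that bypasses the pool pays for an allocation and a free on every frame.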
Flags: needinfo?(matt.woodrow)
As far as I can tell, a lot of the investigation here was done after acceleration was enabled, which is not the default situation on Linux. It's hard to say based on this bug whether there's a similar pathological case on, for example, accelerated Windows with Skia drawing. But in this bug I see no evidence of that.
Whiteboard: [qf] [gfx-noted] → [qf-] [gfx-noted]
FYI, I did try with layers.acceleration.force-enabled set to both true and false. The problem was still there.
Performance Impact: --- → -
Whiteboard: [qf-] [gfx-noted] → [gfx-noted]
Severity: normal → S3