Bug 1501378: Slower performance on low end nvidia hardware (https://aliexpress.com)
Opened 7 years ago; closed 3 years ago
Categories: Core :: Graphics: WebRender, defect, P2
Status: RESOLVED WORKSFORME
People: Reporter: jrmuizel; Assignee: Unassigned
Attachments: 1 file (5.39 MB, application/x-7z-compressed)
From: https://www.reddit.com/r/firefox/comments/9qpa8t/firefox_beta_640_released_webrender_enabled_by/e8as2m8/
WR ON: https://perfht.ml/2Jbf2z1
WR OFF: https://perfht.ml/2JcOk9w
The profiles show us spending all of our time waiting for the GPU. This is from an AMD FX-8320 with an NVIDIA GeForce 210.
Updated•7 years ago (Reporter)
Blocks: stage-wr-trains
Priority: -- → P2
Comment 1•7 years ago (Reporter)
I can reproduce this locally on a lower-end system.
Comment 2•7 years ago
I looked at a GPU capture of the top of AliExpress. Nothing stands out; the workload appears to be well suited to WR:
- just below 60 draw calls with ~550 primitives in total
- very little alpha overdraw (101% of opaque pass, 38% of transparent pass)
- few GPU cache updates (~70 rows per frame)
- about 1Mb of texture data uploaded
- the counts of render targets and texture cache targets are normal
Perhaps low-end NV hardware or drivers dislike something specific that we do. This would be a good case to raise with our NV contacts.
Comment 3•7 years ago
What do the reported GPU times in the WR profiler look like?
Most of our time is spent in void mozilla::wr::RenderCompositorANGLE::WaitForPreviousPresentQuery. I don't really understand why we are blocking the CPU on the GPU here (even if only on the previous frame). PC GPU drivers often queue up several frames of work, so blocking here seems suboptimal, if I understand the code correctly.
Is this actually necessary, and why? What does the performance look like if we just remove this blocking call (even if it leads to a correctness issue while testing)?
Comment 5•7 years ago (Reporter)
Comment 6•7 years ago
This is a comparison between a GeForce 210 and an integrated Intel GPU (HD4600).
https://gpu.userbenchmark.com/Compare/Nvidia-GeForce-210-vs-Intel-HD-4600-Mobile-115-GHz/m7740vsm7676
If that benchmark is reliable, the timings we are seeing are probably not surprising right now: the GPU is rendering at 1080p and is much weaker than an HD4600 (roughly 20% of its throughput in most tests).
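A rough back-of-the-envelope reading of that ~20% figure, assuming the frame is entirely GPU-bound and that frame time scales inversely with throughput (both simplifications; the 60 fps target is also an assumption here):

```latex
t_{210} \approx \frac{t_{\mathrm{HD4600}}}{0.2} = 5\,t_{\mathrm{HD4600}}
\qquad
5\,t_{\mathrm{HD4600}} \le \frac{1000}{60}\,\mathrm{ms} \approx 16.7\,\mathrm{ms}
\;\Rightarrow\;
t_{\mathrm{HD4600}} \lesssim 3.3\,\mathrm{ms}
```

In other words, a page that the HD4600 renders in more than about 3.3 ms per frame would, under these assumptions, already miss 60 fps on the GeForce 210.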
Picture caching / tiling _should_ help a lot with this, but it's still at least a couple of weeks away from landing.
If those benchmarks are accurate, should we consider blacklisting GPUs that old from the WR compositor for now?
Comment 7•7 years ago
> Most of the time we are spending is in void mozilla::wr::RenderCompositorANGLE::WaitForPreviousPresentQuery. I don't really understand why we are blocking the CPU on the GPU here
It's expected for the CPU to wait if the GPU is too slow, which seems to be the case here. The details of how and where to wait, and what the frame queue size should be, are still being worked out (in the bug referenced in comment 4).
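To illustrate why the CPU ends up waiting when the GPU is the bottleneck, here is a toy model (hypothetical, not WebRender's actual code) of a renderer that caps the number of frames in flight. The CPU blocks whenever max_in_flight submitted frames are still unfinished on the GPU, which is only loosely analogous to waiting on a present query. The function simulate and its parameters are made up for illustration.

```python
from collections import deque


def simulate(num_frames, cpu_ms, gpu_ms, max_in_flight):
    """Return (total_time_ms, cpu_wait_ms) for a toy pipelined renderer.

    Hypothetical model: the CPU records each frame in cpu_ms and submits it;
    the GPU executes frames in order, each taking gpu_ms. Before recording a
    new frame, the CPU blocks until fewer than max_in_flight submitted frames
    are still unfinished on the GPU.
    """
    if max_in_flight < 1:
        raise ValueError("max_in_flight must be at least 1")
    cpu_time = 0.0
    gpu_free_at = 0.0
    cpu_wait = 0.0
    in_flight = deque()  # GPU completion times of submitted frames, oldest first

    for _ in range(num_frames):
        # Retire frames the GPU has already finished.
        while in_flight and in_flight[0] <= cpu_time:
            in_flight.popleft()
        # Queue full: block the CPU until the oldest frame completes.
        if len(in_flight) >= max_in_flight:
            done = in_flight.popleft()
            cpu_wait += done - cpu_time
            cpu_time = done
        # Record the frame on the CPU, then hand it to the GPU.
        cpu_time += cpu_ms
        start = max(cpu_time, gpu_free_at)
        gpu_free_at = start + gpu_ms
        in_flight.append(gpu_free_at)

    return gpu_free_at, cpu_wait


for depth in (1, 2, 3):
    print(depth, simulate(10, 4.0, 16.0, depth))
```

With cpu_ms=4 and gpu_ms=16, allowing 2 or 3 frames in flight gives the same total of 164 ms for 10 frames, set by the GPU's 16 ms per frame; the CPU still waits 12 ms per frame in steady state either way, so a deeper queue mostly adds latency rather than throughput. With only 1 frame in flight, CPU and GPU work serialize and the total rises to 200 ms.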
The numbers are certainly at odds with NVidia's description of that wonder piece:
> Modern games and 3D applications demand more graphics performance than ever before and Intel integrated graphics simply aren’t good enough. With 16 processing cores, the GeForce 210 delivers over 10x the performance¹ of Intel integrated solutions! If you want to play popular mainstream games like World of Warcraft, Spore or Sims3, the GeForce 210 is an essential addition to your PC.
I do wonder, though, whether we are hitting a slow path somewhere: say, depth rejection or instancing is far more expensive on this hardware, or something else is. It would be useful to know this and to have a mitigation path within WR.
The Nvidia GeForce 210 (D3D10 hardware) is a 2009 GPU built on a 40nm process. NVIDIA's description is accurate when compared to Intel integrated GPUs of the same period; comparing it to the much more modern Intel Haswell HD4600 integrated GPU (D3D11.1 hardware, 22nm FinFET process, 2013) is kinda silly.
Comment 9•7 years ago
My understanding is that lower-end NVIDIA hardware is now excluded from the MVP. Can we remove this from the release blocker list?
Comment 10•7 years ago
(In reply to Nicolas Silva [:nical] from comment #9)
> My understanding is that lower end nvidia hardware is now excluded from the
> MVP, can we remove this from the release blocker list?
In what bug did that happen?
Comment 11•7 years ago
(I agree we _decided_ to exclude it, I just want to be sure the code landed)
Comment 12•7 years ago
(In reply to Bobby Holley (:bholley) from comment #10)
> In what bug did that happen?
bug 1478150 + bug 1491141
Comment 13•7 years ago
(In reply to Jan Andre Ikenmeyer [:darkspirit] from comment #12)
> (In reply to Bobby Holley (:bholley) from comment #10)
> > In what bug did that happen?
>
> bug 1478150 + bug 1491141
Those both landed several months ago, and the report here is only 8 days old. Unless we think the reddit user force-enabled WR, we may need to tighten the qualified hardware list further.
Comment 14•7 years ago
Also bug 1501533 blocked a bunch.
Comment 15•7 years ago
> Also bug 1501533 blocked a bunch.
The reporter's Geforce 210 is one of the devices blocked by that bug.
Comment 16•7 years ago
(In reply to Nicolas Silva [:nical] from comment #15)
> > Also bug 1501533 blocked a bunch.
>
> The reporter's Geforce 210 is one of the devices blocked by that bug.
Thanks.
Comment 17•3 years ago
This appears to run well on the Intel/AMD hardware I tested, and the nv210 (GeForce 210) is now blocked hardware.
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → WORKSFORME