Closed Bug 1501378 Opened 6 years ago Closed 2 years ago

Slower performance on low end nvidia hardware (https://aliexpress.com)

Categories

(Core :: Graphics: WebRender, defect, P2)

Tracking

()

RESOLVED WORKSFORME

People

(Reporter: jrmuizel, Unassigned)

Attachments

(1 file)

From: https://www.reddit.com/r/firefox/comments/9qpa8t/firefox_beta_640_released_webrender_enabled_by/e8as2m8/

WR ON: https://perfht.ml/2Jbf2z1

WR OFF: https://perfht.ml/2JcOk9w

The profiles show us spending all of our time waiting for the GPU. This is from an AMD FX-8320 with an NVIDIA GeForce 210.
Priority: -- → P2
I can reproduce this locally on a lower end system.
I looked at a GPU capture of the top of AliExpress. Nothing stands out; the workload appears to be perfectly suited for WR:
  - just below 60 draw calls with ~550 primitives in total
  - very little alpha overdraw (101% of opaque pass, 38% of transparent pass)
  - not too many GPU cache updates (~70 rows per frame)
  - about 1 MB of texture data uploaded
  - the counts of render targets and texture cache targets are totally normal

It might be that low-end NV hardware/driver doesn't like something specific that we do? Should be a good case to ask NV contacts about.
What do the reported GPU times in the WR profiler look like?

Most of the time we are spending is in void mozilla::wr::RenderCompositorANGLE::WaitForPreviousPresentQuery. I don't really understand why we are blocking the CPU on the GPU here (even if it's one previous frame). PC GPU drivers often queue up several frames of work, so blocking here seems sub-optimal, if I'm understanding the code.

Is this actually necessary, and why? What does the performance look like if we just remove this blocking call (even if it leads to a correctness issue while testing)?
bug 1500017 has some discussion on this.
Depends on: 1500017
Attached file GPUView trace
This is a comparison between a GeForce 210 and an integrated Intel GPU (HD4600).

https://gpu.userbenchmark.com/Compare/Nvidia-GeForce-210-vs-Intel-HD-4600-Mobile-115-GHz/m7740vsm7676

If that's a reliable benchmark, then the timings we are seeing are probably not completely surprising right now, since it's running at 1080p on a GPU that is much weaker than an HD4600 (it seems to have ~20% of the throughput in most tests).

Picture caching / tiling _should_ help a lot with this, but it's still at least a couple of weeks away from landing.

If those benchmarks are accurate, I wonder if we should consider blacklisting a GPU that old from WR compositor, for now?
> Most of the time we are spending is in void mozilla::wr::RenderCompositorANGLE::WaitForPreviousPresentQuery. I don't really understand why we are blocking the CPU on the GPU here

It's expected for the CPU to wait if the GPU is too slow, which seems to be the case here. The details of how and where to wait, and what the frame queue size should be, are still being figured out (in the bug mentioned in comment 4).
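
To make the trade-off concrete, here is a minimal sketch of a two-stage frame pipeline. It is a toy model, not the actual RenderCompositorANGLE logic: the function name `simulate`, its parameters, and the 4 ms / 20 ms timings are all hypothetical, chosen only to represent a GPU-bound case.

```python
def simulate(cpu_ms, gpu_ms, max_in_flight, frames):
    """Toy model: the CPU builds frame i, then the GPU renders it.

    Before building frame i, the CPU blocks until frame i - max_in_flight
    has finished on the GPU. max_in_flight=1 means "wait for the previous
    frame". Returns (time the last frame finishes, total CPU blocked time).
    All names and timings are assumptions made for illustration.
    """
    cpu_done = []  # time at which the CPU finished building each frame
    gpu_done = []  # time at which the GPU finished rendering each frame
    cpu_wait = 0
    for i in range(frames):
        cpu_free = cpu_done[i - 1] if i > 0 else 0
        gated_on = i - max_in_flight
        gate = gpu_done[gated_on] if gated_on >= 0 else 0
        start = max(cpu_free, gate)
        cpu_wait += start - cpu_free  # time spent blocked on the GPU
        cpu_done.append(start + cpu_ms)
        gpu_free = gpu_done[i - 1] if i > 0 else 0
        # The GPU renders frames in order, once the frame has been submitted.
        gpu_done.append(max(gpu_free, cpu_done[i]) + gpu_ms)
    return gpu_done[-1], cpu_wait


if __name__ == "__main__":
    for depth in (1, 2, 3):
        total, waited = simulate(cpu_ms=4, gpu_ms=20, max_in_flight=depth, frames=10)
        print(f"depth={depth}: total={total} ms, CPU blocked={waited} ms")
```

In this model, going from one to two frames in flight lets CPU and GPU work overlap, but a deeper queue does not help further: once the GPU needs 20 ms per frame, throughput is bounded by the GPU and the CPU still ends up waiting.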

The numbers are definitely at odds with NVIDIA's description of that wonder piece:

> Modern games and 3D applications demand more graphics performance than ever before and Intel integrated graphics simply aren’t good enough. With 16 processing cores, the GeForce 210 delivers over 10x the performance of Intel integrated solutions! If you want to play popular mainstream games like World of Warcraft, Spore or Sims3, the GeForce 210 is an essential addition to your PC.

I do wonder though if we are just hitting a slow path somewhere. Say, depth rejection is way more expensive, or instancing, or something else. It would be useful to know this and have a mitigation path within WR.
The Nvidia GeForce 210 (D3D10 hardware) is a 2009 GPU built on a 40nm process. The description is accurate when compared to Intel integrated GPUs of the same time period; comparing it to the much more modern Intel Haswell HD4600 integrated GPU (D3D11.1 hardware, 22nm FinFET process, from 2013) is kinda silly.
My understanding is that lower end nvidia hardware is now excluded from the MVP. Can we remove this from the release blocker list?
(In reply to Nicolas Silva [:nical] from comment #9)
> My understanding is that lower end nvidia hardware is now excluded from the
> MVP, can we remove this from the release blocker list?

In what bug did that happen?
(I agree we _decided_ to exclude it, I just want to be sure the code landed)
(In reply to Bobby Holley (:bholley) from comment #10)
> In what bug did that happen?

bug 1478150 + bug 1491141
(In reply to Jan Andre Ikenmeyer [:darkspirit] from comment #12)
> (In reply to Bobby Holley (:bholley) from comment #10)
> > In what bug did that happen?
> 
> bug 1478150 + bug 1491141

Those both landed several months ago, and the report here is only 8 days old. Unless we think the reddit user force-enabled WR, we may need to tighten the qualified hardware list further.
> Also bug 1501533 blocked a bunch.

The reporter's GeForce 210 is one of the devices blocked by that bug.
(In reply to Nicolas Silva [:nical] from comment #15)
> > Also bug 1501533 blocked a bunch.
> 
> The reporter's Geforce 210 is one of the devices blocked by that bug.

Thanks.
Blocks: stage-wr-next
No longer blocks: stage-wr-trains

Appears to run well on the Intel/AMD hardware I tested, and nv210 is blocked hardware now.

Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → WORKSFORME