Closed Bug 1739908 Opened 3 years ago Closed 3 years ago

bad GPU performance regression on Intel since Firefox 94 with Canvas2D

Categories

(Core :: Graphics: WebRender, defect)

x86_64
Windows 10
defect

Tracking


RESOLVED FIXED
97 Branch
Tracking Status
firefox-esr91 --- unaffected
firefox94 --- wontfix
firefox95 --- wontfix
firefox96 --- wontfix
firefox97 --- fixed

People

(Reporter: tempel.julian, Assigned: bobowen)

References

(Regression)

Details

(Keywords: perf, power, regression)

Attachments

(2 files)

User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:95.0) Gecko/20100101 Firefox/95.0

Steps to reproduce:

With an Intel Gemini Lake mobile device, open www.vsynctester.com

Actual results:

Since Firefox 94, it drops lots of frames. According to GPU driver sensors (read with the Core Temp tool for convenience), GPU power consumption jumps to ~8 W, so the whole SoC starts to throttle its clocks.

Expected results:

With Firefox 93, GPU power consumption was reported at ~3 W and rendering was correspondingly much smoother. In both cases, WebRender with GPU acceleration was used.

It is 100% reproducible. The latest Windows 11 and Intel graphics drivers are in use; the display is 1080p at 60 Hz.

I've created GPU performance profiler traces for Firefox 93 and the latest Nightly:
93:
https://share.firefox.dev/3bLA7PU

nightly:
https://share.firefox.dev/3kdZajb

Please don't just switch to software WebRender as a workaround: Gemini Lake only has two weak CPU cores, so putting more load on the CPU would be bad as well. Incidentally, Linux with native OpenGL instead of ANGLE is affected too.

Component: Untriaged → Graphics: WebRender
Product: Firefox → Core
Blocks: gfx-triage
Keywords: perf, power, regression
OS: Unspecified → Windows 10
Hardware: Unspecified → x86_64

Are you able to bisect this with mozregression?

Flags: needinfo?(tempel.julian)

2021-11-09T15:00:48.060000: DEBUG : Found commit message:
Bug 1709603: Use a separate permanent canvas back buffer when texture has synchronization. r=lsalzman

Differential Revision: https://phabricator.services.mozilla.com/D125201

2021-11-09T15:00:48.060000: DEBUG : Did not find a branch, checking all integration branches
2021-11-09T15:00:48.076000: INFO : The bisection is done.
2021-11-09T15:00:48.076000: INFO : Stopped

Flags: needinfo?(tempel.julian)

Excellent, thank you!

Status: UNCONFIRMED → NEW
Ever confirmed: true
Flags: needinfo?(bobowencode)
Regressed by: 1709603
Has Regression Range: --- → yes

Hi, thanks for reporting this.

The Nightly performance trace doesn't seem to have any GPU Process tracks; would you be able to create another one, please?

Flags: needinfo?(bobowencode) → needinfo?(tempel.julian)
Flags: needinfo?(tempel.julian)

Bob, does the new trace help?

Flags: needinfo?(bobowencode)

(In reply to Ryan VanderMeulen [:RyanVM] from comment #6)

Bob, does the new trace help?

I've had a fairly quick look and it's a little difficult to tell, but there are some longer spikes in the later one, although I'm not sure if that is just because it contains more of the overall run; it is quite a bit longer.

The wait time in the canvas code actually seems smaller in the Nightly trace, but then it seems like the issue is more load on the GPU.
The change could cause extra copying of surfaces, but only in the case where the first write in each frame covers the entire canvas.
Maybe vsynctester does that; I'll have to check when I get a chance.

Flags: needinfo?(bobowencode)

It's unfortunately extremely common for Canvas2D content to "reset before drawing" by doing a full-screen draw that clears the canvas at the beginning of each frame, before proceeding with the rest of the frame.
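
As a rough illustration (hypothetical page code, not taken from vsynctester itself), the pattern looks something like this minimal sketch:

    // Common "reset before drawing" Canvas2D pattern: the first write of every
    // frame covers the whole canvas before the rest of the frame is drawn.
    const canvas = document.querySelector('canvas')!;
    const ctx = canvas.getContext('2d')!;

    function drawFrame(now: number) {
      // Full-canvas write at the start of the frame (the case discussed above).
      ctx.fillStyle = '#000';
      ctx.fillRect(0, 0, canvas.width, canvas.height);

      // ...then the rest of the frame is drawn on top.
      ctx.fillStyle = '#0f0';
      ctx.fillRect((now / 10) % canvas.width, 20, 40, 40);

      requestAnimationFrame(drawFrame);
    }
    requestAnimationFrame(drawFrame);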

Bob mentioned he'd follow up.

No longer blocks: gfx-triage
Flags: needinfo?(bobowencode)

I remembered when talking to jgilbert that before the remote canvas we were using a PersistentBufferProviderBasic on Windows anyway.
I think this uses a permanent back buffer with a similar copy at the end as well.

So it would be interesting to know if you see the same performance issues with the pref gfx.canvas.remote set to false in about:config.
You'll need to restart the browser for it to take effect.
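
For convenience, the same pref can also be set from a user.js file in the profile directory; this is just an alternative to the about:config step described above:

    // user.js in the Firefox profile directory -- applied at next startup.
    // Turns off remote canvas, so drawing uses the non-remote path described above.
    user_pref("gfx.canvas.remote", false);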

Flags: needinfo?(bobowencode) → needinfo?(tempel.julian)

Indeed, with gfx.canvas.remote = false GPU power consumption is again at normal 2W.

Flags: needinfo?(tempel.julian)

The severity field is not set for this bug.
:jimm, could you have a look please?

For more information, please visit auto_nag documentation.

Flags: needinfo?(jmathies)
Summary: bad GPU performance regression on Intel since Firefox 94 → bad GPU performance regression on Intel since Firefox 94 with Canvas2D

Maybe our current canvas work will help here?

Severity: -- → S4
Flags: needinfo?(jmathies) → needinfo?(lsalzman)

I did notice one thing that we're doing extra from the GPU point of view with remote versus non-remote canvas.
When I switched to using a permanent back buffer, I didn't fix up PersistentBufferProviderShared::PreservesDrawingState, so we push and pop the clips and transforms every frame.
I'm doubtful that would cause such a big difference, but here is a test build with that fixed if you wouldn't mind testing:
https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/ZpWsYuLPSga-hC5ZXaBCjQ/runs/0/artifacts/public/build/install/sea/target.installer.exe

Flags: needinfo?(tempel.julian)

Yes, still bad with that build.

Flags: needinfo?(tempel.julian)
Flags: needinfo?(lsalzman)

I've thought some more about the difference, from the GPU's point of view, between the current remote canvas and non-remote.
I realised that if we have a hidden canvas, where the texture isn't forwarded, we might keep copying into the front buffer for remote, whereas for non-remote that copy only happens on forwarding.
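
For context, the "hidden canvas" case can be as simple as the following sketch (hypothetical page code): a canvas that is drawn to every frame but never attached to the document, so its texture is never handed to the compositor.

    // A canvas that is never inserted into the DOM: it is drawn to every
    // frame, but nothing is ever forwarded to the compositor for display.
    const hidden = document.createElement('canvas');
    hidden.width = 256;
    hidden.height = 256;
    const hctx = hidden.getContext('2d')!;

    function tick() {
      hctx.clearRect(0, 0, hidden.width, hidden.height);
      hctx.fillText(new Date().toISOString(), 10, 20);
      // e.g. used only for off-screen measurement or readback, never displayed
      requestAnimationFrame(tick);
    }
    requestAnimationFrame(tick);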

It'll need some more work, but here's a build that tries to work around that, would you mind trying it out:
https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/ex-RYD1cTHmivZBRMxM4Dw/runs/0/artifacts/public/build/install/sea/target.installer.exe

Flags: needinfo?(tempel.julian)

Yep, with that build, power consumption is back to normal. :)

Flags: needinfo?(tempel.julian)

(In reply to walmartguy from comment #17)

Yep, with that build, power consumption is back to normal. :)

Great, thanks for testing.
Now I just need to make sure it doesn't break things in other ways. :-)

Will the fix also improve situation for Linux/Android builds with native GL Webrender?

(In reply to walmartguy from comment #19)

Will the fix also improve situation for Linux/Android builds with native GL Webrender?

No, I think this would only be affecting Windows, sorry.

(In reply to Bob Owen (:bobowen) from comment #18)

(In reply to walmartguy from comment #17)

Yep, with that build, power consumption is back to normal. :)

Great, thanks for testing.
Now I just need to make sure it doesn't break things in other ways. :-)

Unfortunately that change did have other problems.
I hope I've come up with an alternative though.
This one doesn't start using the permanent back buffer unless we try to read lock a front buffer (for a copy or snapshot) that is still in use by the compositor.
Hopefully this will mean that we only use one buffer (and no copies) for the unforwarded case (like on vsynctester), and also for the case where the JS always writes to the full canvas at the start of each frame and doesn't need a snapshot between frames.
Anyway, if you wouldn't mind testing another test build, sorry:
https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/Q06ArVpvTse5YOWTst_UMw/runs/0/artifacts/public/build/install/sea/target.installer.exe
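
To make the intended buffer flow easier to follow, here is a rough TypeScript-style pseudocode model of the behaviour described above. Apart from mPermanentBackBuffer, all names are hypothetical; this is a simplified sketch of the idea, not the actual Gecko implementation:

    // Simplified model: keep drawing into the front buffer, and only switch to
    // a permanent back buffer the first time we would have to read lock a front
    // buffer that the compositor is still using.
    interface Texture {
      stillInUseByCompositor(): boolean;
    }

    class BufferProviderSketch {
      private permanentBackBuffer: Texture | null = null; // ~ mPermanentBackBuffer

      constructor(
        private frontBuffer: Texture,
        private allocate: () => Texture,
        private copy: (from: Texture, to: Texture) => void,
      ) {}

      // Called whenever the canvas needs a buffer to draw into (or snapshot).
      borrowDrawTarget(): Texture {
        if (this.permanentBackBuffer) {
          return this.permanentBackBuffer; // already switched over
        }
        if (!this.frontBuffer.stillInUseByCompositor()) {
          // Common case (e.g. an unforwarded canvas): one buffer, no copies.
          return this.frontBuffer;
        }
        // Contention: only now fall back to a permanent back buffer,
        // preserving the existing contents with a one-off copy.
        this.permanentBackBuffer = this.allocate();
        this.copy(this.frontBuffer, this.permanentBackBuffer);
        return this.permanentBackBuffer;
      }
    }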

Flags: needinfo?(tempel.julian)

Power consumption is still low with that build.

I'll recheck the situation on Linux, just to be sure.

Flags: needinfo?(tempel.julian)

(In reply to walmartguy from comment #22)

Power consumption is still low with that build.

I'll recheck the situation on Linux, just to be sure.

Excellent, thanks for all the testing.
I'll get those changes up for review.

This measure was originally put in to help with what was believed to be an issue
with ClearCachedResources, but we now think it was down to textures being
re-forwarded on tab switch when already read locked.
A change in bug 1717209 fixed this, so I think we can safely remove
mTextureLockIsUnreliable, which would cause some complications with the
following patch.

Assignee: nobody → bobowencode
Status: NEW → ASSIGNED

This removes some of the changes that meant we started using
mPermanentBackBuffer straight away; we now wait until we actually try to
lock a read-locked texture.
While this might still give a very small risk of contention, it gives
improvements in the following two circumstances.

  • If a canvas texture is never forwarded and never read locked, it means we will
    only use one texture with no copies.
  • If a canvas is always fully overwritten at the start of the frame (and a
    snapshot is not taken between frames), then we avoid a copy on each frame.

This also adds back in code so that on an OPEN_READ_WRITE lock we cache the data
surface if required, because that texture will be the new front buffer and we
won't be using mPermanentBackBuffer at that point.

Depends on D132601

Tested that build, it's still fine.

I've retested Linux and the situation is different from what I initially thought. It hasn't regressed with 94, but 93 was already relatively bad, with ~8.2 W SoC power consumption (Xorg EGL backend forced on, no Xorg compositor active). Anyway, it's not as pressing a matter as the >11 W total SoC power on Windows with 93.

I'm going to wait and land this as soon as we merge, and then give it a little while on Nightly just in case we do see a re-emergence of the lock contention.

Pushed by bobowencode@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/366bdd769b86
p1: Remove PersistentBufferProviderShared::mTextureLockIsUnreliable. r=lsalzman
https://hg.mozilla.org/integration/autoland/rev/757d31ebc575
p2: Only use PersistentBufferProviderShared::mPermanentBackBuffer when first needed. r=lsalzman
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → 97 Branch

The patch landed in nightly and beta is affected.
:bobowen, is this bug important enough to require an uplift?
If not please set status_beta to wontfix.

For more information, please visit auto_nag documentation.

Flags: needinfo?(bobowencode)

== Change summary for alert #32689 (as of Fri, 10 Dec 2021 11:29:07 GMT) ==

Improvements:

Ratio  Test      Platform                   Options               Absolute values (old vs new)
7%     pdfpaint  windows10-64-shippable-qr  e10s stylo webrender  614.64 -> 573.73
5%     pdfpaint  windows10-64-shippable-qr  e10s stylo webrender  611.55 -> 579.65

For up to date results, see: https://treeherder.mozilla.org/perfherder/alerts?id=32689

(In reply to Release mgmt bot [:sylvestre / :calixte / :marco for bugbug] from comment #31)

The patch landed in nightly and beta is affected.
:bobowen, is this bug important enough to require an uplift?
If not please set status_beta to wontfix.

It doesn't look like any issues have been introduced/re-introduced, but given the time of year I think uplifting a performance bug to Beta is probably not the right call.

Flags: needinfo?(bobowencode)
