Closed Bug 1599862 Opened 5 years ago Closed 5 years ago

Rendering corruption after GPU switch to AMD GPU on macOS 10.15 (Catalina)

Categories

(Core :: Graphics: WebRender, defect)

All
macOS
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla73
Tracking Status
firefox-esr68 --- unaffected
firefox70 --- disabled
firefox71 --- disabled
firefox72 --- fixed
firefox73 --- fixed

People

(Reporter: mstange, Assigned: Gankra)

References

(Regression)

Details

(Keywords: regression)

Attachments

(2 files)

Steps to reproduce:

  1. Start Firefox with WebRender enabled while the Intel GPU is active.
  2. Use gfxCardStatus to force a switch to the discrete GPU.
  3. Interact some more with Firefox.

Expected results:
No UI corruption.

Actual results:
Some glyphs start turning into garbage after some more interaction, some black text turns teal, some images turn red.

This seems to be a regression from bug 1579664, in a way. Between bug 1574538 and bug 1579664, we would never migrate GL contexts across GPUs. (Non-CoreAnimation migrated GL contexts automatically, enabling CoreAnimation broke the automatic migration, and bug 1579664 added manual migration.)

I don't know when the actual bug was caused. I think there are the following possibilities:

  1. The bug may have existed for a long time and would have been reproducible before CoreAnimation was enabled.
  2. The bug may have been introduced while GPU migration was broken.
  3. The bug may have been caused by the manual migration. The manual migration might work differently than the automatic pre-CoreAnimation migration.

My hunch is that the corruption starts happening once we enlarge the texture cache. Maybe copying between texture array layers is broken in the migrated context.

But in general, the capabilities of the two GPUs are different. If WebRender uses functionality that's only supported on one of the GPUs, things are bound to go wrong after migration, unless WebRender re-queries the capabilities. There's no notification for migration at the moment, so Gecko is not really giving WebRender a chance to handle this properly.

gonna do some investigation on this

Assignee: nobody → a.beingessner

After a bit of testing, it doesn't seem to happen on the 2014 macbook pros, which have nvidia cards.

Also to be clear, part of your STR is enabling gfx.webrender.compositor, right?

(In reply to Alexis Beingessner [:Gankra] from comment #3)

After a bit of testing, it doesn't seem to happen on the 2014 macbook pros, which have nvidia cards.

That's unfortunate, but not too surprising I guess.

Also to be clear, part of your STR is enabling gfx.webrender.compositor, right?

No, this bug happens both with and without gfx.webrender.compositor.

So after a bit of discussion, here's my thoughts/conclusion:

  • it's gonna take us a bit to work out exactly what's going wrong, which is annoying for wr mac dogfooders (on 2015+ macs)
  • putting gpu migration behind a pref here should make the correctness issues go away: https://searchfox.org/mozilla-central/source/widget/cocoa/nsChildView.mm#1711
  • gpu migration is "just" a perf/power optimization, and we only "need" to disable it with wr, so that seems fine to do short-term
  • also even if it's disabled, macos seems to aggressively want to be mac on integrated graphics, so pref issues may go away quickly in some cases (e.g. the discrete-demanding page/application is closed)

So I think I'm going to add a gfx.compositor.gpu-migration flag, with one of two designs:

  • tristate of "force-disable" (0), "force-enable" (2), "use where it works" (1, the default)
  • just a "force-enable" boolean to opt into using it on wr

(not sure which is better, also not sure if I would make it a live pref)

Either way the default would be for gpu migration to be off with WR and on without (which we would change as the impl gets better or worse).

I used mozregression a bit and figured out what's going on, I think.

This is basically bug 1558167. https://hg.mozilla.org/mozilla-central/rev/8d93170e3aaf made us use 256-byte aligned texture uploads, but only when WebRender is initialized with the AMD GPU. The patch was never handling dynamic changes of the GPU. So if we start on the Intel GPU, we don't apply the workaround, and once we switch to the AMD GPU bug 1558167 comes back.
At the time the fix landed, CoreAnimation was on by default and GPU migration was broken, so we never caught on to the fact that the fix didn't handle dynamic GPU changes.

I suggest we change the code to use 256-aligned texture uploads if any of the renderers are AMD, not just based on the current renderer. (We get the name of the current renderer using gl.get_string(gl::RENDERER). Getting the names of all renderers is harder.)

Alternatively, is it possible to re-initialize the Device once the renderer changes?

Flags: needinfo?(dmalyshau)
Summary: Rendering corruption after GPU switch on my machine (Intel HD 530 -> AMD Radeon Pro 460) → Rendering corruption after GPU switch to AMD GPU on macOS 10.15 (Catalina)

We can use this bug for a short-term workaround. I have filed bug 1600178 on finding the "real" fix to GPU switching.

Pushed by abeingessner@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/b303f7c7f5ed
hide macos gpu migration on WR behind a pref. r=mstange
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla73
Flags: needinfo?(dmalyshau)

Alexis, could you request Beta approval? This will help people who manually turned on WebRender.

Flags: needinfo?(a.beingessner)

Comment on attachment 9112785 [details]
Bug 1599862 - hide macos gpu migration on WR behind a pref.

Beta/Release Uplift Approval Request

  • User impact if declined: Anyone who enables webrender manually will get very nasty UI corruption bugs. Also it may be harder for us to debug issues with this feature.
  • Is this code covered by automated tests?: No
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): It just disables a (significant) optimization when running webrender on macos, keeping us on a simpler/safer path.
  • String changes made/needed:
Flags: needinfo?(a.beingessner)
Attachment #9112785 - Flags: approval-mozilla-beta?

Comment on attachment 9112785 [details]
Bug 1599862 - hide macos gpu migration on WR behind a pref.

webrender fix for macs, approved for 72.0b5

Attachment #9112785 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Has Regression Range: --- → yes
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: