Closed Bug 1586627 Opened 3 months ago Closed 3 months ago

[macOS] Graphics glitches in firefox caused by using core animation on HD 3000

Categories

(Core :: Widget: Cocoa, defect, P2)

70 Branch
All
macOS
defect

Tracking

()

VERIFIED FIXED
mozilla71
Tracking Status
firefox-esr60 --- unaffected
firefox-esr68 --- unaffected
firefox67 --- unaffected
firefox68 --- unaffected
firefox69 --- unaffected
firefox70 + verified
firefox71 --- verified

People

(Reporter: david.leibovic, Assigned: mstange, NeedInfo)

References

(Regression)

Details

(Keywords: regression)

Attachments

(12 files, 2 obsolete files)

7.85 MB, video/mp4
794.29 KB, video/mp4
595 bytes, text/html
725 bytes, text/html
47 bytes, text/x-phabricator-request
640 bytes, text/html
16.34 KB, text/plain
15.54 KB, image/png
27.34 KB, image/png
6.75 KB, text/html
57.22 KB, image/png
66.06 KB, image/png
Attached video flicker2.mp4

User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:70.0) Gecko/20100101 Firefox/70.0

Steps to reproduce:

Operating System: Mac OS 10.13.6
Hardware: MacBook Pro (13-inch, Early 2011), 2.7 GHz Intel Core i7, 4 GB 1333 MHz DDR3, Intel HD Graphics 3000 384 MB
Firefox version: 70.0b11 (64-bit)

Tried to create an event in google calendar.

Actual results:

Black graphics glitches flicker on the screen and I am unable to create an event.

Expected results:

No glitches

Bugbug thinks this bug should belong to this component, but please revert this change in case of error.

Component: Untriaged → Graphics
Product: Firefox → Core

Is this a regression? Did it start happening recently? If so would you be willing to use mozregression to track down what caused it?

https://mozilla.github.io/mozregression/

Alternatively you could try toggling the pref gfx.core-animation.enabled in about:support to see if that makes a difference.

Flags: needinfo?(david.leibovic)

(In reply to Timothy Nikkel (:tnikkel) from comment #2)

Is this a regression? Did it start happening recently? If so would you be willing to use mozregression to track down what caused it?

https://mozilla.github.io/mozregression/

Alternatively you could try toggling the pref gfx.core-animation.enabled in about:support to see if that makes a difference.

Results of running mozregression: https://gist.github.com/dasl-/2225ec7268bed74a6a215bb7d2d5b392

Also, confirmed that setting gfx.core-animation.enabled to false in about:config fixes my issue.

Flags: needinfo?(david.leibovic)

Thanks!

Has Regression Range: --- → yes
Flags: needinfo?(mstange)
Priority: -- → P2
Regressed by: 1574538
Status: UNCONFIRMED → NEW
Ever confirmed: true
OS: Unspecified → macOS
Hardware: Unspecified → x86_64
Component: Graphics → Widget: Cocoa
Hardware: x86_64 → All

I have been trying to reproduce this but haven't had any luck so far. I've set my window size to the same as in the video, used RDM to set my resolution to 1440x900@1x, and enabled the same calendar layout. So far, everything's working fine unfortunately.

The only obvious remaining difference I can see is the fact that my machine is much newer, has more main memory and GPU memory, and is running 10.12 instead of 10.13.
I can try to reproduce it tonight on my non-Retina Macbook Pro.

Assignee: nobody → mstange
Status: NEW → ASSIGNED
Flags: needinfo?(mstange)

David, when this happens, does rendering break permanently? Or do other page interactions still work?

Flags: needinfo?(david.leibovic)

(In reply to Markus Stange [:mstange] from comment #6)

David, when this happens, does rendering break permanently? Or do other page interactions still work?

I am able to switch to another tab when the glitch happens. Other tabs do not have the glitch.

Flags: needinfo?(david.leibovic)

If I stay on the calendar tab, sometimes the glitch persists until I refresh the page. Other times it goes away without a refresh.

Would a regression range with the core animation pref force enabled help perhaps?

Flags: needinfo?(mstange)

Good idea, it might.

David, can you re-run mozregression and, whenever it launches a build, set gfx.core-animation.enabled to true and restart before you test? I think the "easiest" way to restart the browser during mozregression is to open the error console (Cmd+Shift+J) and then press Cmd+Option+R.

Also, can you capture a screenshot of what it looks like when it's not broken? I wonder if the Create event UI contains some visual element for you that I don't have and which might be the trigger of the breakage.

Flags: needinfo?(mstange) → needinfo?(david.leibovic)

You should be able to just pass the pref on the command line so you don't have to flip it each time.

mozregression --pref "gfx.core-animation.enabled:true"

If you are using the gui there is a way to set prefs there too.

Summary: Graphics glitches in firefox → [macOS] Graphics glitches in firefox caused by using core animation

Oh, that's so much better!

Sure, I can do that from the command line. Any suggestion how far back in time I should go for the mozregression --good option?

Flags: needinfo?(david.leibovic)

2019-08-14

Results of running mozregression --pref "gfx.core-animation.enabled:true" --good 2019-08-14: https://gist.github.com/dasl-/516c3056c74e02ced55cdfa681d1852e
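
For anyone scripting repeated runs, the command line above can be assembled programmatically. This is only a minimal sketch: the helper name `mozregression_args` is hypothetical, and it uses just the `--pref "name:value"` and `--good` options quoted in this thread.

```python
import shlex


def mozregression_args(prefs, good=None):
    """Build an argv list for mozregression (helper name is hypothetical)."""
    args = ["mozregression"]
    if good is not None:
        args += ["--good", good]
    for name, value in prefs.items():
        # Each pref is passed as "name:value"; booleans are written lowercase.
        text = str(value).lower() if isinstance(value, bool) else str(value)
        args += ["--pref", f"{name}:{text}"]
    return args


cmd = mozregression_args({"gfx.core-animation.enabled": True}, good="2019-08-14")
print(shlex.join(cmd))  # shell-safe string; pass `cmd` itself to subprocess.run
```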

Thanks!

So this broke with the initial set of CoreAnimation patches. Those basically just redirected the compositor's rendering to an offscreen framebuffer and put that on the screen as one big layer. So nothing really changed for the compositor itself, other than the fact that it's now rendering to an offscreen framebuffer instead of the onscreen framebuffer.
Oh, and the other thing that changed is the size of the swap chain - the GLContext was double buffered whereas our CoreAnimation solution usually is triple buffered. So it might still be a GPU memory issue.
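
To put a rough number on the swap chain difference, here is a back-of-the-envelope sketch. It assumes window-sized buffers at the 1440x900@1x resolution mentioned earlier in this thread and 4 bytes per pixel; both are assumptions for illustration, and the real compositor allocates other surfaces as well.

```python
def swap_chain_bytes(width, height, buffers, bytes_per_pixel=4):
    """Memory for a swap chain of `buffers` full-size color buffers."""
    return width * height * bytes_per_pixel * buffers


WIDTH, HEIGHT = 1440, 900  # assumed buffer size, taken from the 1440x900@1x setting

double_buffered = swap_chain_bytes(WIDTH, HEIGHT, 2)
triple_buffered = swap_chain_bytes(WIDTH, HEIGHT, 3)
extra_mib = (triple_buffered - double_buffered) / (1024 * 1024)

print(double_buffered, triple_buffered, round(extra_mib, 2))
```

Under these assumptions the third buffer costs one additional full-size buffer, which can be compared against the 384 MB reported for the affected GPU.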

Could you make another screen recording, with the first broken build, and recording some more interactions with the browser in the broken state? I'm curious about what it looks like all the way from the glitches first appearing to the glitches disappearing once you switch tabs. Please launch Firefox using

mozregression --launch 03a1e5d857b555b7f400a94ca904786cba968c50 --pref "gfx.core-animation.enabled:true"

I'm hoping that looking at the size and position of the broken rectangles will give me some ideas.

Flags: needinfo?(david.leibovic)

Ran the test. Observations:

  1. attempting to create an event in google calendar (either by clicking the "Create" button or clicking on a day of the month) triggers the glitches.
  2. In this version of firefox the glitches turn the entire address bar and adjacent components gray, rather than the black bars I see in later versions of firefox. Furthermore, the entire tab bar turns black, and the entire google calendar header turns white.
  3. switching from the glitching tab to another tab and then back to the glitching tab does not fix the glitches -- the glitches persist.
  4. refreshing the page removes the glitches
  5. Clicking anywhere on the page after triggering the glitches removes the glitches.

Video: https://www.youtube.com/watch?v=cQ3UgYtNWOE

Flags: needinfo?(david.leibovic)

fwiw, I've encountered graphics glitch bugs in the past that seemed to only affect older macs: https://github.com/signalapp/Signal-Desktop/issues/2603

Regressed by: 1491442
No longer regressed by: 1574538

This worries me a bit, mstange, any luck here? I would likely still take a last minute patch for 70.

Flags: needinfo?(mstange)

It worries me too. I have found a similar machine in the Toronto office and will try to reproduce the bug there.
Bug 1587435 is somewhat similar but also a bit different. But so far, these two bugs are the only ones about visual glitches that I've seen, so the problems might not be widespread.

Flags: needinfo?(mstange)

Tested this on my MacBook Pro (13-Inch, Mid-2012) Running Mac OS 10.15 Catalina. 2.5GHz Intel Core i5, 16 GB 1600 MHz DDR3, Intel HD Graphics 4000 1536 MB. Running Firefox 70 and haven't been able to reproduce that issue. Site works as expected.

I can reproduce this on a Mid-2011 Macbook Air (Intel HD 3000, 4gb ram, macOS 10.13.6), the symptoms are identical to what David described.

Attached video recording.mp4

saw the call for testing on reddit...

Sierra/10.12.6
MacBook Air (13-Inch Late 2010)
2.13 core2/duo
4gb 1067 ddr3
nvidia 320m 256.

FF 70.0b13

cannot reproduce

Hello! I've seen the call for testing on reddit. Here are my results:

Mojave/10.14.6
MacBook Air (11-inch early 2015, 7,1)
Intel Core i5 / 1.6 GHz
4 gb ddr3
Intel HD Graphics 6000

Firefox 70.0b13

I cannot reproduce the problem.

I'm unable to reproduce with 70.0b13 (64-bit) on:

Mojave/10.14.6
MacBook Air (11-inch, Mid 2012)
Intel Core i5 1.7 GHz (Ivy Bridge)
4 GB 1600 MHz DDR3
Intel HD Graphics 4000 1536 MB

If Sandy Bridge/HD 3000 is the culprit, then it looks like you need someone with:

  • MacBook Air Mid 2011
  • MacBook Pro Early 2011 (13" preferable, 15" and 17" have a second GPU and auto-switching)
  • MacBook Pro Late 2011 (13" preferable, 15" and 17" have a second GPU and auto-switching)
  • Mac Mini Mid 2011 (Macmini5,1 or Macmini5,3 only)

It looks like iMacs from that era all had discrete GPUs.

We've obtained a system with an HD 3000 and have now reproduced the problem on 10.9.

Summary: [macOS] Graphics glitches in firefox caused by using core animation → [macOS] Graphics glitches in firefox caused by using core animation on HD 3000

Thanks to everyone who contributed! It looks like these problems only appear on systems with an Intel HD 3000 GPU.
Now that we have a machine with that GPU and can reproduce the problem, I will try to find a workaround.

I'm still interested in hearing from people who 1) have an Intel HD Graphics 3000 and do not see the problem, and from people who 2) have something that's not an Intel HD Graphics 3000 and do see the problem. Thanks!

Attached file testcase

This reduced testcase displays a green square when things are working. But on Intel HD Graphics 3000 + Firefox 70 with CoreAnimation it displays a red square with rounded corners and text.

It seems like compositing just "gives up" once the problematic element is encountered. Will debug more.

Here's a similar testcase that produces the wrong output even before CoreAnimation, e.g. on Firefox Release 69.0.3, on this particular GPU. It renders correctly on other GPUs or in other browsers.
This demonstrates that there was an existing problem on this GPU that websites could run into, but it was harder to hit, so fewer websites were affected by it.

The testcase is different to the previous one in that it wraps everything inside another opacity group, so it forces the bad rendering to happen inside an intermediate surface. Intermediate surfaces are offscreen framebuffers in the OpenGL compositor.
This finding indicates that the brokenness affects offscreen framebuffers but not the default framebuffer ("framebuffer 0"). When we use CoreAnimation, we always render "offscreen" (we render into a framebuffer that is backed by an IOSurface), so we now hit the bug even for the window itself, not only for intermediate surfaces.

Attached file another testcase, no rounded clip (obsolete) —

Looks like the mask layer was unnecessary; all we need is two nested intermediate surfaces with component alpha inside.

The glitch is related to the "copy background into intermediate surface" business that the compositor does for container layers that contain component alpha layers. If I disable the code that does the copying, the glitches disappear.

Attachment #9100348 - Attachment is obsolete: true

Here's a standalone OpenGL app that reproduces the bug outside of Firefox.

The bug seems to be triggered when the deletion of the most recently read-from framebuffer interleaves with two draw calls in a certain way. More specifically, if you have two framebuffers A and B, the following sequence of events breaks subsequent drawing to B:

  1. A becomes "most recently read-from framebuffer".
  2. B is drawn to.
  3. A is deleted, and other GL state (such as the GL_SCISSOR_TEST enabled state) is touched.
  4. B is drawn to again.

Now all draws to framebuffer B, including the draw from step 4, will render at the wrong position and upside down. The wrong transform is a flip transform that seems to be intended to counteract a flip transform on the "screen" framebuffer, but on offscreen framebuffers it just breaks things. The vertical offset of the flip is always based on the height of framebuffer zero. In headless GL contexts, the flip offset is zero, i.e. all drawing is just mirrored along the framebuffer's bottom edge.
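
The four steps and the resulting flip can be written down as a small toy model. This is a sketch of the observed behaviour only, not of the driver's internals, and it makes no real OpenGL calls: the class and method names are hypothetical, and the flip is assumed to be exactly y -> fb0_height - y, which is one reading of "based on the height of framebuffer zero".

```python
class BuggyDriverModel:
    """Toy model of the failure sequence; it makes no real OpenGL calls."""

    def __init__(self, fb0_height):
        self.fb0_height = fb0_height  # height of framebuffer zero; 0 if headless
        self.last_read = None         # most recently read-from framebuffer
        self.last_drawn = None        # target of the latest draw call
        self.suspect = None           # target drawn to just before a bad delete
        self.touched = False          # other GL state touched after that delete
        self.broken = set()           # framebuffers that now render flipped

    def read_from(self, fb):
        # Step 1: fb becomes the most recently read-from framebuffer.
        self.last_read = fb

    def delete(self, fb):
        # Step 3, first half: the most recently read-from framebuffer is
        # deleted after a draw to some other framebuffer.
        if fb == self.last_read and self.last_drawn not in (None, fb):
            self.suspect = self.last_drawn

    def touch_state(self):
        # Step 3, second half: other GL state is touched after the delete.
        if self.suspect is not None:
            self.touched = True

    def draw(self, fb, y):
        # Steps 2 and 4: returns the y coordinate at which the draw lands.
        if fb == self.suspect and self.touched:
            self.broken.add(fb)
        self.suspect, self.touched, self.last_drawn = None, False, fb
        # ASSUMPTION: the wrong transform is y -> fb0_height - y.
        return self.fb0_height - y if fb in self.broken else y


gl = BuggyDriverModel(fb0_height=900)
gl.read_from("A")            # 1. A becomes most recently read-from
first = gl.draw("B", 100)    # 2. B is drawn to
gl.delete("A")               # 3. A is deleted ...
gl.touch_state()             #    ... and other GL state is touched
second = gl.draw("B", 100)   # 4. B is drawn to again
print(first, second)
```

With `fb0_height=0` the same model mirrors every draw along the bottom edge, which corresponds to the headless case described above.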

(from my regular work machine which has an Intel HD Graphics 530)

(from the Early 2011 machine with the Intel HD Graphics 3000)

Attached file WebGL testcase

This driver bug is also observable through WebGL.

(this is with a workaround to GLContext.cpp, on the broken machine)

I'll put up two patches with bug workarounds: One that's scoped to CompositorOGL.cpp, and one that's in GLContext.cpp and fixes both the compositor and WebGL. I'll leave it to Jeff Gilbert and Jeff Muizelaar to decide which approach we should take. I think having the workaround in GLContext is preferable because doing so will benefit both the compositor and WebGL, and potentially WebRender in the future. On the other hand, I briefly attempted running the WebGL conformance suite on the Intel HD Graphics 3000 machine and there were so many other failures that I don't know if this is worth doing.

Attachment #9100645 - Attachment description: Bug 1586627 - WIP: Work around a bug in AppleIntelHD3000GraphicsGLDriver. → Bug 1586627 - Work around a bug in AppleIntelHD3000GraphicsGLDriver in CompositorOGL. r=jrmuizel, r=jgilbert

CompositorOGL encounters this bug the following way:

Say you have a 3 level deep nesting of ContainerLayers with intermediate surfaces which contain a component alpha layer: A > B > C > L.
For container layers with intermediate surfaces that contain component alpha layers, the background behind that container layer has to be copied into the container layer surface before drawing can happen for its contents. This copy is implemented as a call to glCopyTexImage2D which reads from the "parent" surface. So now the initialization of A copies from the screen, the initialization of B copies from A, C from B.
When C is created, B becomes the framebuffer that has most recently been read from. And B is deleted right after it is drawn to A. The deletion of B happens between two draw calls to A: The first draw call is the draw that puts B into A, and the second draw call is whatever is drawn to A on top. Every CompositorOGL::DrawGeometry call touches the scissor state. So now A is broken.
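
The sequence for A > B > C > L can be traced as a list of events and checked against the trigger condition. This is a hedged sketch: the event names and the `broken_targets` checker are hypothetical, the point at which C is deleted is an assumption (right after it is drawn into B, by analogy with B), and each draw is preceded by a state touch because every CompositorOGL::DrawGeometry call touches the scissor state.

```python
# Event trace for A > B > C > L, following the description above.
TRACE = [
    ("read", "screen"),          # A is initialized by copying from the screen
    ("read", "A"),               # B is initialized by copying from A
    ("read", "B"),               # C is initialized by copying from B
    ("state",), ("draw", "C"),   # L is drawn into C
    ("state",), ("draw", "B"),   # C is drawn into B
    ("delete", "C"),             # assumed: C is deleted right after that draw
    ("state",), ("draw", "A"),   # B is drawn into A
    ("delete", "B"),             # B is the most recently read-from framebuffer
    ("state",), ("draw", "A"),   # whatever is drawn to A on top
]


def broken_targets(trace):
    """Return targets hit by: draw, delete of last-read FB, state touch, draw."""
    last_read = last_drawn = suspect = None
    touched = False
    broken = set()
    for event in trace:
        kind = event[0]
        if kind == "read":
            last_read = event[1]
        elif kind == "delete":
            if event[1] == last_read and last_drawn not in (None, event[1]):
                suspect = last_drawn
        elif kind == "state":
            touched = suspect is not None
        elif kind == "draw":
            if event[1] == suspect and touched:
                broken.add(event[1])
            suspect, touched, last_drawn = None, False, event[1]
    return broken


print(broken_targets(TRACE))
```

In this model the deletion of C is harmless because C was never read from, while the deletion of B lands between two draws to A.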

We can try to fast-track a fix into 70 RC2 if a fix lands on m-c and can be verified tomorrow.

Flags: qe-verify+
Pushed by mstange@themasta.com:
https://hg.mozilla.org/integration/autoland/rev/eea4ecbe16b6
Work around a bug in AppleIntelHD3000GraphicsGLDriver in CompositorOGL. r=jrmuizel

Comment on attachment 9100990 [details]
Bug 1586627 - Work around a bug in AppleIntelHD3000GraphicsGLDriver in GLContext. r=jrmuizel,jgilbert

Revision D49205 was moved to bug 1588676. Setting attachment 9100990 [details] to obsolete.

Attachment #9100990 - Attachment is obsolete: true

I've landed the Compositor variant on autoland and moved the more general fix to bug 1588676.
I am comfortable with the risk level of the Compositor fix: I've confirmed that it fixes the bug on the affected machine, and it's written in such a way that, on drivers that work correctly, it won't have any observable effects. The code it adds also isn't even executed on GPUs that are not the Intel HD 3000 model.

Here are two try builds:
mozilla-central: https://treeherder.mozilla.org/#/jobs?repo=try&revision=da7294753f0de8ed59b9fbad15a436c64bbf8c70
Beta: https://treeherder.mozilla.org/#/jobs?repo=try&revision=63dd225d1699d4036a5123d3cfa858e421b2c149

I didn't bother triggering any tests because our test machines don't have the affected GPU, so the new code won't ever get executed in our CI.

Comment on attachment 9100645 [details]
Bug 1586627 - Work around a bug in AppleIntelHD3000GraphicsGLDriver in CompositorOGL. r=jrmuizel, r=jgilbert

Beta/Release Uplift Approval Request

  • User impact if declined: Visible glitches on certain GPUs on macOS. According to telemetry, around 0.25% of total "sessions" are on machines with this hardware IIRC.
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: No
  • Needs manual test from QE?: Yes
  • If yes, steps to reproduce: Find a Mac machine that has an Intel HD Graphics 3000 in it. Create an event on Google calendar. No glitches should appear on the screen. I think scrolling on about:preferences also shows glitches.
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): The added code isn't executed on any other GPUs, and it's completely "neutral" / harmless in the sense that it normally wouldn't have any effect. It just happens to prevent the driver from going down the faulty path.
    I've tested the fix on the single affected machine we could find.
  • String changes made/needed:
Attachment #9100645 - Flags: approval-mozilla-beta?

Build for testing: target.dmg

David, bgstandaert, could you download this build and test it out for a bit? Does it fix the glitches for you? Do you notice any other problems with it? Thanks!

(This build says 71.0b1, and has Nightly branding, but I think code-wise it should be very similar to a 70 RC 2 build with the patch included.)

Flags: needinfo?(david.leibovic)
Flags: needinfo?(bgstandaert)
Status: ASSIGNED → RESOLVED
Closed: 3 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla71

Comment on attachment 9100645 [details]
Bug 1586627 - Work around a bug in AppleIntelHD3000GraphicsGLDriver in CompositorOGL. r=jrmuizel, r=jgilbert

70 is on release now, moving the request accordingly.

Attachment #9100645 - Flags: approval-mozilla-beta? → approval-mozilla-release?

Comment on attachment 9100645 [details]
Bug 1586627 - Work around a bug in AppleIntelHD3000GraphicsGLDriver in CompositorOGL. r=jrmuizel, r=jgilbert

Workaround for older Macs, OK for uplift for the RC2 build.

Attachment #9100645 - Flags: approval-mozilla-release? → approval-mozilla-release+

Reproduced the issue with 70.0b11 on macOS 10.12.6 Intel HD Graphics 3000 384 MB.
Verified as fixed with Nightly 71.0a1 and 70.0 build from taskcluster on macOS 10.12.6 Intel HD Graphics 3000.

Flags: qe-verify+ → qe-verify-
Flags: qe-verify-

I'm currently seeing a similar issue on Firefox 71.0 with an Intel HD Graphics 5100. Google calendar doesn't trigger it; for me, the act of switching tabs can trigger it. And if I don't switch out of the "glitchy" tab it can even trigger a reboot.

Firefox 71.0
Mojave/10.14.6
MacBook Pro (Retina, 13-inch, Late 2013) (MacBookPro11,1)
Intel Core i5 / 2.6 GHz
Memory: 16 GB
Intel HD Graphics 5100
VRAM (Dynamic, Max): 1536 MB

I've filed bug 1606603 for comment 56.
