Open Bug 1784093 Opened 2 years ago Updated 3 months ago

GPU hangs on ivybridge and sandybridge with backdrop filter blur

Categories

(Core :: Graphics, defect, P2)

Firefox 103
defect

Tracking

Tracking Status
firefox-esr91 --- unaffected
firefox-esr102 --- unaffected
firefox103 --- wontfix
firefox104 + wontfix
firefox105 + wontfix
firefox106 + affected

People

(Reporter: kml, Assigned: bradwerth, NeedInfo)

Attachments

(7 files, 5 obsolete files)

User Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:103.0) Gecko/20100101 Firefox/103.0

Steps to reproduce:

After the last Firefox update (103.0.2) Intel Graphics driver 9.17.10.2932 crashes constantly on some sites. If between restarts it's possible to close the tab with the site causing problems, the crashes stop.

User agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:103.0) Gecko/20100101 Firefox/103.0
Driver Version: 9.17.10.2932 (latest from laptop manufacturer)
Computer: laptop ASUS K56CB (Intel Core-i5 3317U + HD Graphics 4000)
Windows 7 64 bit (Windows_NT 6.1 7601)

Example site that causes crash:
https://www.asus.com/bt/SupportOnly/K56CB/HelpDesk_Knowledge/
and scroll down
(Some sites cause a crash instantly, some only after scrolling down.)

This didn't happen until the latest Firefox update.

The Bugbug bot thinks this bug should belong to the 'Core::Graphics' component, and is moving the bug to that component. Please correct in case you think the bot is wrong.

Component: Untriaged → Graphics
Product: Firefox → Core

Thank you for filing. Would you please post your "about:support" to this Bug? We'll try to correlate this to a crash report and figure out where the crash is occurring.

Blocks: gfx-triage
Severity: -- → S2
Flags: needinfo?(kml)
Priority: -- → P2
Attached file about:support
(In reply to Brad Werth [:bradwerth] from comment #2)
> Thank you for filing. Would you please post your "about:support" to this Bug? We'll try to correlate this to a crash report and figure out where the crash is occurring.

I attached the "about:support" info.

(In reply to GMA from comment #3)

https://www.intel.com/content/www/us/en/download/18606/intel-graphics-driver-for-windows-15-33.html
https://www.intel.com/content/www/us/en/support/articles/000005654/graphics.html

Try updating to the last Win7 driver version released by Intel, 10.18.10.5161. 9.17.10.2932 is a very old version.

When I try to install driver version 15.33, I can't do this because the following error appears: "The driver being installed is not validated for this computer. Please obtain the appropriate driver from the computer manufacturer."
Of course, I can try to find workarounds, but still this is the last official release of the manufacturer, and the driver did not crash until the last update.

Flags: needinfo?(kml)

I'll add the old driver to the blocklist.

Assignee: nobody → bwerth
No longer blocks: gfx-triage

I'll add something like the blocking of old nvidia drivers, but for intel. This will activate software WebRender for users in a similar situation, which should solve this problem for this class of users.

See Also: → 1784368

I can confirm this error.
After updating to version 103.0.1, FF has started to cause an Intel video driver (v 9.17.10.4229) error on some sites
on Lenovo monoblocks with Win 7 x64 OS in our office

We should try reproducing this locally to get a regression range.

(In reply to Alex AC from comment #9)

> I can confirm this error.
> After updating to version 103.0.1, FF has started to cause an Intel video driver (v 9.17.10.4229) error on some sites
> on Lenovo monoblocks with Win 7 x64 OS in our office

I've updated the patch to include this version in the blocklist. Obviously an imperfect solution, but if the problem is only occurring with a 10-year-old driver, we can decide if we want to draw the line there.

Alex AC, can you attach the graphics section of your about:support to the bug as well?

Flags: needinfo?(alexac)

kml,

Are you able to run mozregression to find out what change introduced the problem?

Flags: needinfo?(kml)
Attached file mozregression_log.txt

(In reply to Jeff Muizelaar [:jrmuizel] from comment #13)

> kml,
>
> Are you able to run mozregression to find out what change introduced the problem?

I have attached two new files - log and buildinfo text files from mozregression. I got this from there:

pushlog_url: https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=c8d1ebf43e8cecf24e0932a5ab547b92f68a6028&tochange=e7cc9622ec6a955a42bb1e258ca2debb424bee57

At the end of the log was the following message:

2022-08-16T01:48:13.070000: DEBUG : Found commit message:
Bug 1578503 - Enable backdrop-filter by default r=gfx-reviewers,jrmuizel
There are still a few remaining issues with the updated backdrop
filter implementation, specifically:

  • We don't use reflectMode yet for blurs (quality issue in some cases)
  • Performance may not be optimal in all use cases

However, we can try enabling by default now and work on these as
follow ups.

Differential Revision: https://phabricator.services.mozilla.com/D148684
2022-08-16T01:48:13.070000: DEBUG : Did not find a branch, checking all integration branches
2022-08-16T01:48:13.072000: INFO : The bisection is done.
2022-08-16T01:48:13.074000: INFO : Stopped

In addition, when the driver crashed, the following errors appeared in the log:

2022-08-16T01:45:18.359000: INFO : b'[Parent 16720, IPC I/O Parent] WARNING: file /builds/worker/checkouts/gecko/ipc/chromium/src/base/process_util_win.cc:167'
2022-08-16T01:45:34.849000: INFO : b'[GFX1-]: Internal D3D11 error: HRESULT: 0x887A0005: Error allocating VertexShader'
2022-08-16T01:45:34.879000: INFO : b'[GFX1-]: Context has been lost.'
2022-08-16T01:45:34.879000: INFO : b'[GFX1-]: Failed to link shader program: cs_scale'
2022-08-16T01:45:34.889000: INFO : b''
2022-08-16T01:45:34.889000: INFO : b'[2022-08-15T22:45:34Z ERROR webrender::device::gl] Failed to link shader program: cs_scale'
2022-08-16T01:45:34.889000: INFO : b''
2022-08-16T01:45:34.889000: INFO : b'[GFX1-]: Failed to compile vertex shader: cs_scale_TEXTURE_2D'
2022-08-16T01:45:34.889000: INFO : b''
2022-08-16T01:45:34.889000: INFO : b'[2022-08-15T22:45:34Z ERROR webrender::device::gl] Failed to compile vertex shader: cs_scale_TEXTURE_2D'
2022-08-16T01:45:34.889000: INFO : b''
2022-08-16T01:45:34.889000: INFO : b'[GFX1-]: wr_renderer_render: Shader(Link("cs_scale", ""))'
2022-08-16T01:45:34.889000: INFO : b'[GFX1-]: wr_renderer_render: Shader(Compilation("cs_scale_TEXTURE_2D", ""))'
2022-08-16T01:45:34.889000: INFO : b'[GFX1]: Device reset due to WR device: 0x887a0006'
2022-08-16T01:45:34.889000: INFO : b'[GFX1-]: GFX: RenderThread detected a device reset in PostUpdate'
2022-08-16T01:45:35.905000: INFO : b'[GFX1-]: Fallback WR to SW-WR + D3D11'
2022-08-16T01:45:35.931000: INFO : b'[GFX1-]: Failed to make render context current during destroying.'
2022-08-16T01:45:44.258000: INFO : b'[GFX1-]: Receive IPC close with reason=AbnormalShutdown'
2022-08-16T01:45:44.261000: INFO : b'[GFX1-]: Receive IPC close with reason=AbnormalShutdown'
2022-08-16T01:45:44.262000: INFO : b'[GFX1-]: Receive IPC close with reason=AbnormalShutdown'
2022-08-16T01:45:44.262000: INFO : b'[GFX1-]: Receive IPC close with reason=AbnormalShutdown'
2022-08-16T01:45:44.273000: INFO : b'Exiting due to channel error.'
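For anyone triaging a similar mozregression log, the graphics messages can be pulled out mechanically. This is only a minimal sketch: it assumes the line layout seen in the excerpt above (timestamp, level, then a `b'...'` payload starting with `[GFX1]` or `[GFX1-]`), and the helper name `gfx_messages` is made up for illustration.

```python
import re

# Matches mozregression lines such as:
#   2022-08-16T01:45:34.849000: INFO : b'[GFX1-]: Internal D3D11 error: ...'
GFX_LINE = re.compile(r"^(?P<ts>\S+): \w+ : b'\[GFX1-?\]: (?P<msg>.*)'$")
HRESULT = re.compile(r"0x[0-9A-Fa-f]{8}")


def gfx_messages(lines):
    """Yield (timestamp, message, hresults) for each [GFX1] / [GFX1-] line."""
    for line in lines:
        m = GFX_LINE.match(line.strip())
        if m:
            msg = m.group("msg")
            yield m.group("ts"), msg, HRESULT.findall(msg)


# Three lines copied from the log above; the empty b'' line is skipped.
sample = [
    "2022-08-16T01:45:34.849000: INFO : b'[GFX1-]: Internal D3D11 error: HRESULT: 0x887A0005: Error allocating VertexShader'",
    "2022-08-16T01:45:34.889000: INFO : b''",
    "2022-08-16T01:45:34.889000: INFO : b'[GFX1]: Device reset due to WR device: 0x887a0006'",
]

for ts, msg, codes in gfx_messages(sample):
    print(ts, codes, msg)
```

Collecting the HRESULT values this way makes it easier to see that the log contains two distinct codes rather than one.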

Hope this helps.

Flags: needinfo?(kml)

Yes, that helps a lot.

(In reply to Jeff Muizelaar [:jrmuizel] from comment #17)

> Yes, that helps a lot.

I set the preference "layout.css.backdrop-filter.enabled" to "false" in "about:config" and video driver crashes stopped.
It might be useful for you to know this.

Glenn, any guesses as to how backdrop filters would cause the vertex shader not to build?

Flags: needinfo?(gwatson)
Summary: Graphics driver crashes constantly on some sites after last update 103.0.2: Intel GFX driver 9.17.10.2932 / Windows 7 64 bit → Graphics driver crashes constantly on some sites after last update 103.0.2: Intel GFX driver 9.17.10.2932 / Windows 7 64 bit caused by backdrop filter

No, that doesn't make any sense to me at all. There are no shaders that are specific to backdrop-filter, so I can't imagine why it would cause a link failure in cs_scale.

From "[GFX1-]: Internal D3D11 error: HRESULT: 0x887A0005: Error allocating VertexShader", maybe it's some kind of coincidental corruption or another bug coming from ANGLE or the driver?

Flags: needinfo?(gwatson)

FWIW, that HRESULT is DXGI_ERROR_DEVICE_REMOVED.
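As a rough aid for reading these codes, the sketch below splits an HRESULT into its severity, facility and code fields and names the DXGI device-loss values. The table covers only three constants from Microsoft's DXGI error list, and `describe_hresult` is a hypothetical helper, not anything in Firefox.

```python
# DXGI device-loss codes as documented by Microsoft (DXGI_ERROR constants).
DXGI_ERRORS = {
    0x887A0005: "DXGI_ERROR_DEVICE_REMOVED",
    0x887A0006: "DXGI_ERROR_DEVICE_HUNG",
    0x887A0007: "DXGI_ERROR_DEVICE_RESET",
}


def describe_hresult(value):
    """Split an HRESULT into (failed, facility, code, name)."""
    failed = bool((value >> 31) & 1)     # severity bit: 1 means failure
    facility = (value >> 16) & 0x1FFF    # same mask as the HRESULT_FACILITY macro
    code = value & 0xFFFF                # low 16 bits
    name = DXGI_ERRORS.get(value, "unknown")
    return failed, facility, code, name


for hr in (0x887A0005, 0x887A0006):
    failed, facility, code, name = describe_hresult(hr)
    print(f"{hr:#010x}: failed={failed} facility={facility:#x} code={code} {name}")
```

The second value in the log, 0x887a0006, maps to DXGI_ERROR_DEVICE_HUNG in this table.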

Attached file graph.html
Flags: needinfo?(alexac)
Summary: Graphics driver crashes constantly on some sites after last update 103.0.2: Intel GFX driver 9.17.10.2932 / Windows 7 64 bit caused by backdrop filter → Graphics driver crashes constantly on some sites after last update 103.0.2: Intel GFX driver 9.17.10.2932 / Windows 7 64 bit caused by backdrop filter on ivybridge

I believe I can reproduce this locally

I see it on Win10 with 9.17.10.4459

Summary: Graphics driver crashes constantly on some sites after last update 103.0.2: Intel GFX driver 9.17.10.2932 / Windows 7 64 bit caused by backdrop filter on ivybridge → Graphics driver crashes on ivybridge with backdrop filter
Summary: Graphics driver crashes on ivybridge with backdrop filter → Graphics driver crashes on ivybridge with backdrop filter blur

9.17.10.4459 is the newest driver available to me on Windows update

Status: UNCONFIRMED → NEW
Ever confirmed: true
Attached file A mostly reduced test case (obsolete)

It seems the 'background-color' is important for reproducing the problem

Attachment #9290110 - Attachment is obsolete: true
See Also: → 1785091

Recording the problem in GPUView suggests that it's a GPU hang. I see a packet submitted by Firefox taking 14-15 seconds.

See Also: → 1784908
Attachment #9289588 - Attachment is obsolete: true

I can't reproduce this, and we're not going to handle it as a driver blocklist issue, so I'll take myself off the bug.

Assignee: bwerth → nobody

I cannot reproduce the problem with the 10.18.10.5161 driver.

It also doesn't reproduce with 10.18.10.4425.

(In reply to kml from comment #0)

> User Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:103.0) Gecko/20100101 Firefox/103.0
>
> Steps to reproduce:
>
> After the last Firefox update (103.0.2) Intel Graphics driver 9.17.10.2932 crashes constantly on some sites. If between restarts it's possible to close the tab with the site causing problems, the crashes stop.
>
> User agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:103.0) Gecko/20100101 Firefox/103.0
> Driver Version: 9.17.10.2932 (latest from laptop manufacturer)
> Computer: laptop ASUS K56CB (Intel Core-i5 3317U + HD Graphics 4000)
> Windows 7 64 bit (Windows_NT 6.1 7601)
>
> Example site that causes crash:
> https://www.asus.com/bt/SupportOnly/K56CB/HelpDesk_Knowledge/
> and scroll down
> (Some sites cause a crash instantly, some only after scrolling down.)
>
> This didn't happen until the latest Firefox update.

You need to update your video driver to 15.33.53.5161 because it's likely a driver bug.

Please download the latest video driver from here https://www.intel.com/content/www/us/en/products/sku/65707/intel-core-i53317u-processor-3m-cache-up-to-2-60-ghz/downloads.html and see if your issue goes away.

Just FYI that driver may not be available for Windows 7?

(In reply to Ashley Hale from comment #34)

> Just FYI that driver may not be available for Windows 7?

It explicitly states "Windows 7, 32-bit*,Windows 8.1, 32-bit*,Windows 7, 64-bit* 3 More"

Oh cool, thanks for the correction. I was going by another bug where a Windows 7 laptop could not update, but that may have just been OEM restrictions.

(In reply to Ashley Hale from comment #36)

> Oh cool, thanks for the correction. I was going by another bug where a Windows 7 laptop could not update, but that may have just been OEM restrictions.

Lenovo is notorious for OEM lock. Not sure if an older Asus with Ivy Bridge will be too. I hope not.

(In reply to Arthur K. [He/Him] from comment #33)

> You need to update your video driver to 15.33.53.5161 because it's likely a driver bug.
>
> Please download the latest video driver from here https://www.intel.com/content/www/us/en/products/sku/65707/intel-core-i53317u-processor-3m-cache-up-to-2-60-ghz/downloads.html and see if your issue goes away.

Well, as I wrote in comment 5, when I try to install driver version 15.33, I can't do this because the following error appears: "The driver being installed is not validated for this computer. Please obtain the appropriate driver from the computer manufacturer."
I have the latest available driver from the manufacturer installed (as well as through Windows Update). The driver at your link is in the form of an exe-file, so I can't install it manually.

(In reply to kml from comment #38)

> (In reply to Arthur K. [He/Him] from comment #33)
> > You need to update your video driver to 15.33.53.5161 because it's likely a driver bug.
> >
> > Please download the latest video driver from here https://www.intel.com/content/www/us/en/products/sku/65707/intel-core-i53317u-processor-3m-cache-up-to-2-60-ghz/downloads.html and see if your issue goes away.
>
> Well, as I wrote in comment 5, when I try to install driver version 15.33, I can't do this because the following error appears: "The driver being installed is not validated for this computer. Please obtain the appropriate driver from the computer manufacturer."
> I have the latest available driver from the manufacturer installed (as well as through Windows Update). The driver at your link is in the form of an exe-file, so I can't install it manually.

That's what I get for not scrolling through the discussion. The ZIP is located here: https://www.intel.com/content/www/us/en/download/18606/intel-graphics-driver-for-windows-15-33.html If you have 7zip or some other freebie extractor, you can extract the .ZIP to some temp folder.

When updating your driver, you should be able to bypass this warning by using the "Have Disk" method and point it to the .inf for the newer driver and just force it to use the newer driver when it complains about it not being "from the computer manufacturer". This method has worked for me for eons. Up to you if you want to go the extra mile.

Summary: Graphics driver crashes on ivybridge with backdrop filter blur → Graphics driver crashes on ivybridge and sandybridge with backdrop filter blur

To replace the OEM driver with Intel's release, you may need to manually remove the OEM driver first. The detailed steps are:

  1. Disconnect from the internet so Windows Update won't automatically reinstall a previous OEM driver.
  2. Open Device Manager > Display Adapters > right-click [Intel Graphics] > Uninstall Device.
    Important: Check "Delete the driver software for this device".
  3. Right-click anywhere in Device Manager > select Scan for Hardware Changes.
    Note: Several older driver versions may be stored on the system for rollback.
  4. If another Intel Graphics driver is reinstalled, repeat steps 2 and 3 until Basic Display Adapter is shown instead of the Intel driver.

The bug is marked as tracked for firefox104 (beta) and tracked for firefox105 (nightly). We have limited time to fix this, the soft freeze is in a day. However, the bug still isn't assigned.

:bhood, could you please find an assignee for this tracked bug? If you disagree with the tracking decision, please talk with the release managers.

For more information, please visit auto_nag documentation.

Flags: needinfo?(bhood)
Assignee: nobody → jmuizelaar
Flags: needinfo?(bhood)

Reducing the size of the Firefox window to 1920/2 prevents the problem from happening

and reducing the blur radius to 1px doesn't help

Does reproduce with:
9.17.10.4229 5/25/2015
9.17.10.2867 9/26/2012
9.17.10.2843 8/21/2012

Does not reproduce with:
8.15.10.2351 4/10/2011
8.15.10.2401 5/21/2011
8.15.10.2559 10/21/2011
8.15.10.2778 6/6/2012
8.15.10.2879 10/30/2012
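The reports in this bug all point at the 9.17.10.x driver series, with 8.15.10.x and 10.18.10.x unaffected. The sketch below checks a version string against that observed range. It is an assumption drawn only from the versions listed in this bug, not the actual blocklist rule, and the bounds and function names are hypothetical.

```python
def parse_driver_version(text):
    """Turn '9.17.10.2932' into (9, 17, 10, 2932) for numeric comparison."""
    parts = tuple(int(p) for p in text.strip().split("."))
    if len(parts) != 4:
        raise ValueError(f"expected four dotted fields, got {text!r}")
    return parts


# Hypothetical bounds: lowest and highest 9.17.10.x builds reported as hanging here.
AFFECTED_MIN = parse_driver_version("9.17.10.2843")
AFFECTED_MAX = parse_driver_version("9.17.10.4459")


def in_reported_range(version_text):
    """True if the version falls inside the range observed to reproduce the hang."""
    return AFFECTED_MIN <= parse_driver_version(version_text) <= AFFECTED_MAX


for v in ("9.17.10.2932", "8.15.10.2879", "10.18.10.5161"):
    print(v, in_reported_range(v))
```

Comparing integer tuples rather than strings matters here: as plain text, "10.18.10.5161" would sort before "9.17.10.2843".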

I haven't been able to reproduce this with a capture even after updating mozangle to the same version of ANGLE as is in Firefox.

I was able to get an apitrace recording. The hang happens when executing an ID3D11DeviceContext4::Draw call. I believe this Draw is coming from here: https://searchfox.org/mozilla-central/rev/14fd7ed50b087ca4d46d33e0f818360c32294afa/gfx/angle/checkout/src/libANGLE/renderer/d3d/d3d11/Clear11.cpp#797 when we try to clear a depth buffer.

If I change the preceding RSSetScissorRects to set 0 rects instead of 1 the draw call doesn't hang.

See Also: → 1638672

Disabling enable_clear_scissor seems to prevent the crash

During the daily I noticed that ANGLE's Clear fallback is still using a scissor when drawing to the depth:
https://searchfox.org/mozilla-central/source/gfx/angle/checkout/src/libANGLE/renderer/d3d/d3d11/Clear11.cpp#790
But the clear fallback that we use elsewhere does not.

(In reply to Jeff Muizelaar [:jrmuizel] from comment #51)

> It seems plausible that the hangs are related to this problem: https://gitlab.freedesktop.org/mesa/mesa/-/commit/714b4f6184db84a738cf2d063980f0e19ab03b4b

I take it we're current with ANGLE version such that there's nothing there that would work around or fix it?

This is generated from the apitrace of Firefox. It hasn't been reduced that much yet.

This version of the program is somewhat readable.

There are two draws. The first one uses dual source blending. The second one does not. The second one hangs. I suspect this bug has the same underlying cause as bug 1633628

Attachment #9292128 - Attachment is obsolete: true
See Also: → 1633628

This is still present in 104?

The underlying problem is still in 104 but bug 1785366 which is in 104 tries to avoid hitting it.

Summary: Graphics driver crashes on ivybridge and sandybridge with backdrop filter blur → GPU hangs on ivybridge and sandybridge with backdrop filter blur

So it turns out I misdiagnosed bug 1633628. The cause of that was not the ClearView call hanging but the depth-only draw that happened afterward. I confirmed this by getting a new apitrace recording of that hang and replaying successfully past ClearView. The reason the fix there helped is that by avoiding ClearView we cleared the color and depth targets together, thus avoiding a depth-only draw.

I'm not sure why we're doing a depth-only clear in this case, but avoiding that is a temporary option for avoiding this hang.

I'm not sure what a better fix for ANGLE is at this point.

Glenn, what would be a good way to measure the performance gain from scissoring during clear?

Flags: needinfo?(gwatson)

kvark wrote a small gl-benchmarking harness [1]. We could probably extend that slightly to support scissored clears, and run that along with the fill benchmark on a variety of intel GPUs?

[1] https://github.com/kvark/gl-bench/blob/master/src/main.rs

Flags: needinfo?(gwatson)

(In reply to Brad Werth [:bradwerth] from comment #7)

> I'll add something like the blocking of old nvidia drivers, but for intel. This will activate software WebRender for users in a similar situation, which should solve this problem for this class of users.

Given we don't have a better solution, we will handle this by adding to the blocklist.

Assignee: jmuizelaar → bwerth
Attachment #9347282 - Attachment is obsolete: true

(In reply to Brad Werth [:bradwerth] from comment #62)

> Given we don't have a better solution, we will handle this by adding to the blocklist.

I misunderstood. We still have hope of affecting this calling pattern either in ANGLE or within WebRender.

(In reply to Glenn Watson [:gw] from comment #61)

> kvark wrote a small gl-benchmarking harness [1]. We could probably extend that slightly to support scissored clears, and run that along with the fill benchmark on a variety of intel GPUs?
>
> [1] https://github.com/kvark/gl-bench/blob/master/src/main.rs

so something like this gives me full clear:

1| windows | "4.6.0 - Build 31.0.101.4502" | "Intel(R) Iris(R) Xe Graphics" | 1920x1200 | 1 | 0.50 ms | 0 mcs | 217 mcs | 34 mcs |

Scissored:

1| windows | "4.6.0 - Build 31.0.101.4502" | "Intel(R) Iris(R) Xe Graphics" | 1920x1200 | 1 | 0.17 ms | 2 mcs | 74 mcs | 31 mcs |

So scissored seems faster, but also I'm scissoring to only a quarter of the screen so not sure how representative that is.
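To put the two rows side by side, here is a back-of-the-envelope calculation. It assumes the `ms` column is the measured clear time (the rows carry no header) and takes "a quarter of the screen" at face value as an area fraction of 0.25.

```python
full_clear_ms = 0.50     # full 1920x1200 clear, from the first row
scissored_ms = 0.17      # clear scissored to a quarter of the screen
area_fraction = 0.25     # assumption: scissor rect covers 25% of the pixels

# Fraction of the full-clear time that the scissored clear took.
time_fraction = scissored_ms / full_clear_ms

# If cost scaled purely with pixel count, this ratio would be 1.0.
relative_to_area = time_fraction / area_fraction

print(f"time fraction: {time_fraction:.2f}")
print(f"relative to area fraction: {relative_to_area:.2f}x")
```

Under those assumptions the scissored clear takes about a third of the full-clear time, somewhat more than the quarter that pure pixel-count scaling would predict, which fits the caveat about how representative a single rectangle size is.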

Glenn, does the commit above seem reasonable? Do the results match your expectations? Thanks

Flags: needinfo?(gwatson)

It looks like there's only one way that the ColorRenderTarget clear_color is ever set to None. That was done as part of Bug 1764005. If that target also will return true for needs_depth(), then we're setting up a depth-only clear. And of course if the clear_color is set to None here and then later the conditions change to make needs_depth() true, then that would lead to the same problem. Tricky.

Glenn, should we be doing something more complicated here to ensure we don't attempt a depth-only clear?

This adds a capabilities boolean to note whether or not the device can
successfully depth-only clear. It is set false for Sandybridge and
Ivybridge hardware; true for others. At the point of clearing, it panics
if a depth-only clear is attempted. A later part will need to detect when
we are about to submit a depth-only clear and supply a color when required
by the device.
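The "later part" described above could look roughly like the following. This is a language-neutral sketch in Python, not the WebRender patch: the names `Capabilities`, `supports_depth_only_clear`, `ClearRequest` and `prepare_clear` are all hypothetical, and choosing a fallback color that is actually correct for the target is left open.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Color = Tuple[float, float, float, float]


@dataclass
class Capabilities:
    # Hypothetical flag name; the patch description only says "a capabilities boolean".
    supports_depth_only_clear: bool


@dataclass
class ClearRequest:
    color: Optional[Color]   # None means "do not clear color"
    depth: Optional[float]   # None means "do not clear depth"


def prepare_clear(caps: Capabilities, req: ClearRequest,
                  fallback_color: Color = (0.0, 0.0, 0.0, 0.0)) -> ClearRequest:
    """Return a request that is safe to submit on this device."""
    depth_only = req.color is None and req.depth is not None
    if depth_only and not caps.supports_depth_only_clear:
        # Supply a color so that color and depth are cleared together.
        return ClearRequest(color=fallback_color, depth=req.depth)
    return req


ivybridge = Capabilities(supports_depth_only_clear=False)
print(prepare_clear(ivybridge, ClearRequest(color=None, depth=1.0)))
```

Keeping the decision in one function means the device check happens at a single point before submission, which is where the description says the panic currently sits.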

Jeff, in your reproduction case, does a build with attachment 9350178 applied hit the panic instead of the driver crash?

Flags: needinfo?(jmuizelaar)

If I'm reading that correctly, it suggests that scissored clears are still likely to be a clear performance win on Xe, at least for that specific rectangle size (the overall percentage, I guess, depends on what our average or typical clear region within targets is).

Flags: needinfo?(gwatson)

Are we waiting on something here, or have we pursued this as far as we can? Looking to see if this can be considered stalled.

I should have PiKVM set up for a machine that experiences this in the next couple of days.

Flags: needinfo?(jmuizelaar)
Flags: needinfo?(jmuizelaar)