Crash in [@ core::result::unwrap_failed | webrender::renderer::Renderer::render_impl]
Categories
(Core :: Graphics: WebRender, defect, P1)
Tracking
()
Tracking | Status | |
---|---|---|
firefox-esr68 | --- | unaffected |
firefox75 | --- | unaffected |
firefox76 | --- | unaffected |
firefox77 | --- | fixed |
firefox78 | --- | fixed |
People
(Reporter: gsvelto, Assigned: kvark)
References
(Blocks 1 open bug, Regression)
Details
(Keywords: crash, regression)
Crash Data
Attachments
(1 file)
47 bytes,
text/x-phabricator-request
|
pascalc
:
approval-mozilla-beta+
|
Details | Review |
This bug is for crash report bp-88ab7b43-bba5-4f43-859d-63f4d0200419.
Top 10 frames of crashing thread:
0 xul.dll RustMozCrash mozglue/static/rust/wrappers.cpp:16
1 xul.dll mozglue_static::panic_hook mozglue/static/rust/lib.rs:89
2 xul.dll core::ops::function::Fn::call<fn ../f3e1a954d2ead4e2fc197c7da7d71e6c61bad196/src/libcore/ops/function.rs:232
3 xul.dll std::panicking::rust_panic_with_hook ../f3e1a954d2ead4e2fc197c7da7d71e6c61bad196//src/libstd/panicking.rs:475
4 xul.dll std::panicking::begin_panic_handler ../f3e1a954d2ead4e2fc197c7da7d71e6c61bad196//src/libstd/panicking.rs:375
5 xul.dll core::panicking::panic_fmt ../f3e1a954d2ead4e2fc197c7da7d71e6c61bad196//src/libcore/panicking.rs:84
6 xul.dll core::result::unwrap_failed ../f3e1a954d2ead4e2fc197c7da7d71e6c61bad196//src/libcore/result.rs:1188
7 xul.dll webrender::renderer::Renderer::render_impl gfx/wr/webrender/src/renderer.rs:3407
8 xul.dll webrender::renderer::Renderer::render gfx/wr/webrender/src/renderer.rs:3185
9 xul.dll webrender_bindings::bindings::wr_renderer_render gfx/webrender_bindings/src/bindings.rs:599
The oldest buildid with this crash is 20200414094457. The raw crash reason is:
called `Result::unwrap()` on an `Err` value: Shader(Compilation("gpu_cache_update", ""))
From what I can tell we should be failing around here.
Comment 1•5 years ago
•
|
||
Builds before and including first crash:
Range between 20200413034542 and 20200413225327: https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=da2b1829e1fd5bb09b405214cfb8d80bd911f111&tochange=73dad4bd258511ddea0b6ad87c47f4a01b5c7a6b
Range between 20200413225327 and 20200414094457 (first crash): https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=73dad4bd258511ddea0b6ad87c47f4a01b5c7a6b&tochange=078326f48100828dc60d2e0ca2819164b8803b7d
Comment 2•5 years ago
|
||
Hey Glenn,
Could this have something to do with Bug 1629693?
Updated•5 years ago
|
Comment 3•5 years ago
|
||
It does seem like that is the bug that regressed it.
But it's hard to imagine why that would fail to compile, it's a really simple shader.
Dzmitry, Sotaro, Jeff, do you have any ideas what may cause that shader to fail compilation on the machine(s) in this crash report? I didn't see any shader compiler errors in the log, but I might have missed them.
Comment 4•5 years ago
|
||
I see that "System Memory Use Percentage" is at 98% - could this somehow be an OOM on startup!?
Assignee | ||
Comment 5•5 years ago
|
||
It's the only shader that assigns to gl_PointSize
and is non-polygonal from Angle's point of view. It should work, of course. I'm not seeing anything inherently suggesting that it wouldn't. The good thing is - if it's Angle failing to compile the shader, we should be able to reproduce it. Just need to find a similar "UHD Graphics 620 (Whiskey Lake) (0x3ea0)" machine with Windows 10 and check what HLSL source gets generated, if any.
Side note - why are we even compiling it at render time? We should add it to the non-lazy list of shaders on the platforms that need it.
Comment 6•5 years ago
|
||
Sotaro was unable to reproduce this on the same GPU. Looking closer at the crash report, it seems to be low on both physical memory and also available page file memory. So I suspect the system in this case was heavily loaded and we're seeing an OOM crash in D3D / ANGLE.
Let's see if this continues to have reported crashes in the next few days?
Reporter | ||
Comment 7•5 years ago
|
||
Looking at all the crashes ~60% of them have very low available page file numbers which indicate they must be very close to an OOM condition. The rest however have plenty of memory available. Could they be running out of another resource? One other interesting tidbit is that almost all the crashes have the following error in the Graphics critical error annotation:
Failed to load a program object with a program binary: gpu_cache_update renderer ANGLE (...)
IIUC the shader had already been compiled and was available in binary form but it failed to load, so the code tried to recompile it and failed, does that sound correct?
Comment 8•5 years ago
|
||
Maybe checking for a glError after calling glProgramBinary would be a good idea. It should also be possible to get the debug messages out of ANGLE in this case.
Comment 9•5 years ago
|
||
(In reply to Gabriele Svelto [:gsvelto] from comment #7)
Looking at all the crashes ~60% of them have very low available page file numbers which indicate they must be very close to an OOM condition. The rest however have plenty of memory available. Could they be running out of another resource? One other interesting tidbit is that almost all the crashes have the following error in the Graphics critical error annotation:
Failed to load a program object with a program binary: gpu_cache_update renderer ANGLE (...)
IIUC the shader had already been compiled and was available in binary form but it failed to load, so the code tried to recompile it and failed, does that sound correct?
Yes, from the error log, it seemed to fail to load program binary at first.
Comment 10•5 years ago
|
||
(In reply to Jeff Muizelaar [:jrmuizel] from comment #8)
Maybe checking for a glError after calling glProgramBinary would be a good idea. It should also be possible to get the debug messages out of ANGLE in this case.
If seems possible to forward ANGLE error to gfx critical note. WebRender already uses GL_KHR_debug for logging.
And rust log Error message is already forwarded to gfx critical note.
Comment 11•5 years ago
|
||
(In reply to Sotaro Ikeda [:sotaro] from comment #10)
If seems possible to forward ANGLE error to gfx critical note. WebRender already uses GL_KHR_debug for logging.
Created Bug 1632096 for it.
Comment 12•5 years ago
|
||
Since Bug 1632096 fix, crash reports had following ANGLE errors. It seems that ANGLE failed to allocate GPU resources.
- Internal D3D11 error: HRESULT: 0x8007000E: Error allocating Texture2D
- Internal D3D11 error: HRESULT: 0x887A0005: Error allocating VertexShader
Comment 13•5 years ago
•
|
||
I just ran into this locally on my gen6 machine.
Updated•5 years ago
|
Updated•5 years ago
|
Comment 14•5 years ago
|
||
It looks like this is still an issue we should worry about. Glenn, should we disable 1629693 on Beta?
Comment 15•5 years ago
•
|
||
There were 5-10 crashes per day, but the last 5 Nightlies do not seem to be affected anymore:
Last crashing build: 20200504205324
First seemingly good: 20200505094621
Do you see a commit that could have fixed this crash?
https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=34d71b4a00863e7615463592662dfe362c4a993e&tochange=2027edd3c62a65a57708984407f9f2aa0174f78e
Updated•5 years ago
|
Comment 16•5 years ago
|
||
Oh, it seems it only changed its signature.
bp-af37f4f7-0c5d-4818-bc26-a86f10200506 [@ core::option::expect_none_failed ]
called
Result::unwrap()
on anErr
value: Shader(Compilation("gpu_cache_update", ""))
Comment 17•5 years ago
|
||
So it looks like this is only happening after we have a device lost.
Comment 18•5 years ago
|
||
It's probably a good idea to disable on beta, I think.
It's a significant performance win on gen7.5 and below. It's probably / possibly a measurable performance win on all Windows/ANGLE devices, but would be less noticeable on higher end devices.
We really need to find a way to repro this, since we definitely want to enable this path everywhere, if possible.
Jeff, you said you saw it on your gen6 machine - was it a one off? Did you manage to repro again at all?
Do we know how many unique users have seen that crash? Initially we thought it might be an OOM, but maybe this hits a weird driver case sometime causing an OOM?
Comment 19•5 years ago
|
||
Ok, so I looked at this some more and it looks like it's just us being sloppy with our error handling. i.e.:
https://searchfox.org/mozilla-central/source/gfx/wr/webrender/src/renderer.rs#1480 can return an error if the shader fails to compile because the device has been lost.
We then panic on that error here: https://searchfox.org/mozilla-central/source/gfx/wr/webrender/src/renderer.rs#3717. Instead we should handle it like we do for other shader compilation.
Dzmitry, can you take a look at fixing this?
Assignee | ||
Updated•5 years ago
|
Assignee | ||
Comment 20•5 years ago
|
||
Comment 21•5 years ago
|
||
Comment 22•5 years ago
|
||
bugherder |
Comment 23•5 years ago
|
||
I a not seeing any crashes in 77 beta so marking 77 as wontfix.
Updated•5 years ago
|
Comment 24•5 years ago
|
||
The regressing bug landed in 77 so I think this should be still be uplifted.
Updated•5 years ago
|
Assignee | ||
Comment 25•5 years ago
|
||
Comment on attachment 9146892 [details]
Handle a case where WebRender can't initialize GPU cache
Beta/Release Uplift Approval Request
- User impact if declined: GPU crash in some cases where we are out of memory or failing to load a specific shader in WebRender
- Is this code covered by automated tests?: No
- Has the fix been verified in Nightly?: Yes
- Needs manual test from QE?: No
- If yes, steps to reproduce:
- List of other uplifts needed: None
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): This is just turning a panic into something that reports an error. No functional changes otherwise.
- String changes made/needed:
Comment 26•5 years ago
|
||
Comment on attachment 9146892 [details]
Handle a case where WebRender can't initialize GPU cache
Low risk patch for a crash, was on nightly for 2 weeks, approved for uplift on beta before we build our RC build tomorrow.
Comment 27•5 years ago
|
||
bugherder uplift |
Updated•5 years ago
|
Description
•