Crash in [@ mozilla::layers::NativeLayerCA::HandlePartialUpdate<T>]
Categories
(Core :: Graphics: WebRender, defect, P2)
Tracking
()
Tracking | Status | |
---|---|---|
firefox-esr102 | --- | unaffected |
firefox110 | --- | unaffected |
firefox111 | --- | wontfix |
firefox112 | + | fixed |
People
(Reporter: aryx, Assigned: bradwerth)
References
(Regression)
Details
(Keywords: crash, regression, topcrash)
Crash Data
Attachments
(2 files)
15 crashes from 11 installations of 112.0a1 on macOS since bug 1658986 landed.
Crash report: https://crash-stats.mozilla.org/report/index/b399b5e2-f24e-4bd3-9ef9-35a0d0230219
MOZ_CRASH Reason: MOZ_CRASH()
Top 10 frames of crashing thread:
0 XUL mozilla::layers::NativeLayerCA::HandlePartialUpdate<mozilla::layers::NativeLayerCA::NextSurfaceAsFramebuffer gfx/layers/NativeLayerCA.mm:1231
0 XUL mozilla::layers::NativeLayerCA::NextSurfaceAsFramebuffer gfx/layers/NativeLayerCA.mm:1297
1 XUL mozilla::wr::RenderCompositorNativeOGL::Bind gfx/webrender_bindings/RenderCompositorNative.cpp:548
2 XUL <webrender_bindings::bindings::WrCompositor as webrender::composite::Compositor>::bind gfx/webrender_bindings/src/bindings.rs:1332
3 XUL webrender::renderer::Renderer::draw_frame gfx/wr/webrender/src/renderer/mod.rs:4370
4 XUL webrender::renderer::Renderer::render_impl gfx/wr/webrender/src/renderer/mod.rs:1480
5 XUL webrender::renderer::Renderer::render gfx/wr/webrender/src/renderer/mod.rs:1197
6 XUL wr_renderer_render gfx/webrender_bindings/src/bindings.rs:614
7 XUL mozilla::wr::RendererOGL::UpdateAndRender gfx/webrender_bindings/RendererOGL.cpp:186
7 XUL mozilla::wr::RenderThread::UpdateAndRender gfx/webrender_bindings/RenderThread.cpp:578
Updated•1 year ago
|
Assignee | ||
Comment 1•1 year ago
|
||
I'll figure this out. This is an expected result of landing Bug 1658986, which added an intentional crash that logs additional information to the graphics log. As an example:
[GFX1]: The update region [0,0,513,392] must cover the invalid region [0,391,728,392].
is from a crash in 112.0a1 and shows that WebRender is giving us an update region that is narrower than we expect, though the invalid region is only one pixel high which probably means that we have a rounding error in our invalid-region tracking logic.
Updated•1 year ago
|
Comment 2•1 year ago
|
||
The bug is linked to a topcrash signature, which matches the following criterion:
- Top 10 desktop browser crashes on nightly
:bradwerth, could you consider increasing the severity of this top-crash bug?
For more information, please visit auto_nag documentation.
Assignee | ||
Comment 3•1 year ago
|
||
Yep, I'm on it. I need to confirm that most or all of the crash signatures are similar to what was noted in comment 1: that this shows our invalid region tracking is accepting a rounding error to int pixels which is leaving us with a one-pixel tall or wide invalid region.
Ideas on how to fix this:
- Find where the rounding error is occurring (possibly within WebRender) and make it more accurate.
- Change our macOS invalid region management code to clean up a remaining one-pixel tall or wide region, and leave the asserts in. Eventually remove the asserts because they have a runtime cost.
- Just remove the asserts now. We probably don’t have invalid pixels, just an invalid region. If we do have invalid pixels, we’ll get new bug reports on that issue – but why aren’t we already seeing invalid pixels on other platforms?
I think I'll pursue option 2 to get this fixed fast, and we can clean it up later.
Assignee | ||
Comment 4•1 year ago
|
||
Assignee | ||
Comment 5•1 year ago
|
||
I think the proposed patch will resolve this issue, but we should leave the Bug open while we check. And if it does resolve the issue, then we still have a rounding error to fix somewhere, and remove all the runtime asserts, and we could use this Bug for that work.
Comment 6•1 year ago
|
||
No, let's have one landing per bug and use new bugs for follow-up work. Having one bug per landing makes it much easier to track regressions and uplifts.
Assignee | ||
Comment 7•1 year ago
|
||
(In reply to Markus Stange [:mstange] from comment #6)
No, let's have one landing per bug and use new bugs for follow-up work. Having one bug per landing makes it much easier to track regressions and uplifts.
Good point. I'll take off the leave-open on this Bug, and I've opened Bug 1818540 for the follow-up work.
Assignee | ||
Updated•1 year ago
|
Pushed by bwerth@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/2d7bbfc27139 Make nsNativeLayerCA clean up rounding errors in update regions. r=mstange
Comment 9•1 year ago
|
||
(In reply to Brad Werth [:bradwerth] from comment #7)
Good point. I'll take off the leave-open on this Bug, and I've opened Bug 1818540 for the follow-up work.
Thanks!
Comment 10•1 year ago
|
||
bugherder |
Assignee | ||
Comment 11•1 year ago
|
||
Comment on attachment 9319487 [details]
Bug 1817691: Make nsNativeLayerCA clean up rounding errors in update regions.
Beta/Release Uplift Approval Request
- User impact if declined: macOS users will continue to occasionally experience crashes while scrolling under some unknown conditions.
- Is this code covered by automated tests?: No
- Has the fix been verified in Nightly?: No
- Needs manual test from QE?: No
- If yes, steps to reproduce:
- List of other uplifts needed: Bug 1658986
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): The patch in Bug 1817691 moves some asserts around, and this patch probably solves the problem in a way that is very low-risk. There is no meaningful risk added by these patches.
- String changes made/needed:
- Is Android affected?: No
Comment 12•1 year ago
|
||
Comment on attachment 9319487 [details]
Bug 1817691: Make nsNativeLayerCA clean up rounding errors in update regions.
Approved for 111.0b6
Comment 13•1 year ago
|
||
bugherder uplift |
Updated•1 year ago
|
Reporter | ||
Comment 14•1 year ago
|
||
Crash volume is unchanged after the code landed.
Assignee | ||
Comment 15•1 year ago
|
||
Better leave this open even as we land partial fixes.
Assignee | ||
Comment 16•1 year ago
|
||
Now that NotifySurfaceReady will clean up a 1-pixel strip invalid region,
this change makes HandlePartialUpdate allow a 1-pixel strip invalid
region. This should hopefully make HandlePartialUpdate only crash on the
cases that will also crash NotifySurfaceReady.
Comment 17•1 year ago
|
||
Pushed by bwerth@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/451ce16f5fd1 Part 2: Make HandlePartialUpdate tolerate the same fuzziness as NotifySurfaceReady. r=mstange
Comment 18•1 year ago
|
||
Backed out for causing region update related assertion failures.
Failure logs:
https://treeherder.mozilla.org/logviewer?job_id=407644608&repo=autoland
https://treeherder.mozilla.org/logviewer?job_id=407645242&repo=autoland
https://treeherder.mozilla.org/logviewer?job_id=407643391&repo=autoland
Backout link: https://hg.mozilla.org/integration/autoland/rev/b55abe26174eaa37f7bebce994001acf05e1bd3f
Assignee | ||
Comment 19•1 year ago
|
||
This was just a binary logic error in the original patch. Fast fix.
Comment 20•1 year ago
|
||
Pushed by bwerth@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/18f8401e2b78 Part 2: Make HandlePartialUpdate tolerate the same fuzziness as NotifySurfaceReady. r=mstange
Comment 21•1 year ago
|
||
bugherder |
Comment 22•1 year ago
|
||
:bradwerth, the crash volume has dropped.
In terms of your investigation, do you plan on landing anything else under this OR should the leave-open be removed and the bug resolved?
Next step if you are finished with the investigation. If you could consider a release uplift request on the last patch, we could consider it for inclusion in a later 111 build.
Assignee | ||
Comment 23•1 year ago
|
||
I don't think anything else will happen in this Bug. The follow-up Bug 1818540 will have the rest of the work. I'll remove the leave-open.
I don't think the Part 2 patch needs to be uplifted because it affects only code in a #ifdef NIGHTLY_BUILD
protected block. It should have no impact outside of Nightly.
Assignee | ||
Updated•1 year ago
|
Updated•1 year ago
|
Updated•1 year ago
|
Comment 24•1 year ago
|
||
Comment on attachment 9319487 [details]
Bug 1817691: Make nsNativeLayerCA clean up rounding errors in update regions.
Removing the uplift approved flag to remove this from missed 111 uplift queries.
Description
•