Crash in [@ mozilla::layers::NativeLayerCA::HandlePartialUpdate<T>]
Categories
(Core :: Graphics: WebRender, defect, P2)
Tracking
()
Tracking | Status | |
---|---|---|
firefox-esr102 | --- | unaffected |
firefox110 | --- | unaffected |
firefox111 | --- | wontfix |
firefox112 | + | fixed |
People
(Reporter: aryx, Assigned: bradwerth)
References
(Regression)
Details
(Keywords: crash, regression, topcrash)
Crash Data
Attachments
(2 files, 1 obsolete file)
15 crashes from 11 installations of 112.0a1 on macOS since bug 1658986 landed.
Crash report: https://crash-stats.mozilla.org/report/index/b399b5e2-f24e-4bd3-9ef9-35a0d0230219
MOZ_CRASH Reason: MOZ_CRASH()
Top 10 frames of crashing thread:
0 XUL mozilla::layers::NativeLayerCA::HandlePartialUpdate<mozilla::layers::NativeLayerCA::NextSurfaceAsFramebuffer gfx/layers/NativeLayerCA.mm:1231
0 XUL mozilla::layers::NativeLayerCA::NextSurfaceAsFramebuffer gfx/layers/NativeLayerCA.mm:1297
1 XUL mozilla::wr::RenderCompositorNativeOGL::Bind gfx/webrender_bindings/RenderCompositorNative.cpp:548
2 XUL <webrender_bindings::bindings::WrCompositor as webrender::composite::Compositor>::bind gfx/webrender_bindings/src/bindings.rs:1332
3 XUL webrender::renderer::Renderer::draw_frame gfx/wr/webrender/src/renderer/mod.rs:4370
4 XUL webrender::renderer::Renderer::render_impl gfx/wr/webrender/src/renderer/mod.rs:1480
5 XUL webrender::renderer::Renderer::render gfx/wr/webrender/src/renderer/mod.rs:1197
6 XUL wr_renderer_render gfx/webrender_bindings/src/bindings.rs:614
7 XUL mozilla::wr::RendererOGL::UpdateAndRender gfx/webrender_bindings/RendererOGL.cpp:186
7 XUL mozilla::wr::RenderThread::UpdateAndRender gfx/webrender_bindings/RenderThread.cpp:578
Updated•2 years ago
|
Assignee | ||
Comment 1•2 years ago
|
||
I'll figure this out. This is an expected result of landing Bug 1658986, which added an intentional crash that logs additional information to the graphics log. As an example:
[GFX1]: The update region [0,0,513,392] must cover the invalid region [0,391,728,392].
is from a crash in 112.0a1 and shows that WebRender is giving us an update region that is narrower than we expect, though the invalid region is only one pixel high which probably means that we have a rounding error in our invalid-region tracking logic.
Updated•2 years ago
|
Comment 2•2 years ago
|
||
The bug is linked to a topcrash signature, which matches the following criterion:
- Top 10 desktop browser crashes on nightly
:bradwerth, could you consider increasing the severity of this top-crash bug?
For more information, please visit auto_nag documentation.
Assignee | ||
Comment 3•2 years ago
|
||
Yep, I'm on it. I need to confirm that most or all of the crash signatures are similar to what was noted in comment 1: that this shows our invalid region tracking is accepting a rounding error to int pixels which is leaving us with a one-pixel tall or wide invalid region.
Ideas on how to fix this:
- Find where the rounding error is occurring (possibly within WebRender) and make it more accurate.
- Change our macOS invalid region management code to clean up a remaining one-pixel tall or wide region, and leave the asserts in. Eventually remove the asserts because they have a runtime cost.
- Just remove the asserts now. We probably don’t have invalid pixels, just an invalid region. If we do have invalid pixels, we’ll get new bug reports on that issue – but why aren’t we already seeing invalid pixels on other platforms?
I think I'll pursue option 2 to get this fixed fast, and we can clean it up later.
Assignee | ||
Comment 4•2 years ago
|
||
Assignee | ||
Comment 5•2 years ago
|
||
I think the proposed patch will resolve this issue, but we should leave the Bug open while we check. And if it does resolve the issue, then we still have a rounding error to fix somewhere, and remove all the runtime asserts, and we could use this Bug for that work.
Comment 6•2 years ago
|
||
No, let's have one landing per bug and use new bugs for follow-up work. Having one bug per landing makes it much easier to track regressions and uplifts.
Assignee | ||
Comment 7•2 years ago
|
||
(In reply to Markus Stange [:mstange] from comment #6)
No, let's have one landing per bug and use new bugs for follow-up work. Having one bug per landing makes it much easier to track regressions and uplifts.
Good point. I'll take off the leave-open on this Bug, and I've opened Bug 1818540 for the follow-up work.
Assignee | ||
Updated•2 years ago
|
Comment 9•2 years ago
|
||
(In reply to Brad Werth [:bradwerth] from comment #7)
Good point. I'll take off the leave-open on this Bug, and I've opened Bug 1818540 for the follow-up work.
Thanks!
Comment 10•2 years ago
|
||
bugherder |
Assignee | ||
Comment 11•2 years ago
|
||
Comment on attachment 9319487 [details]
Bug 1817691: Make nsNativeLayerCA clean up rounding errors in update regions.
Beta/Release Uplift Approval Request
- User impact if declined: macOS users will continue to occasionally experience crashes while scrolling under some unknown conditions.
- Is this code covered by automated tests?: No
- Has the fix been verified in Nightly?: No
- Needs manual test from QE?: No
- If yes, steps to reproduce:
- List of other uplifts needed: Bug 1658986
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): The patch in Bug 1817691 moves some asserts around, and this patch probably solves the problem in a way that is very low-risk. There is no meaningful risk added by these patches.
- String changes made/needed:
- Is Android affected?: No
Comment 12•2 years ago
|
||
Comment on attachment 9319487 [details]
Bug 1817691: Make nsNativeLayerCA clean up rounding errors in update regions.
Approved for 111.0b6
Comment 13•2 years ago
|
||
bugherder uplift |
Updated•2 years ago
|
![]() |
Reporter | |
Comment 14•2 years ago
|
||
Crash volume is unchanged after the code landed.
Assignee | ||
Comment 15•2 years ago
|
||
Better leave this open even as we land partial fixes.
Assignee | ||
Comment 16•2 years ago
|
||
Now that NotifySurfaceReady will clean up a 1-pixel strip invalid region,
this change makes HandlePartialUpdate allow a 1-pixel strip invalid
region. This should hopefully make HandlePartialUpdate only crash on the
cases that will also crash NotifySurfaceReady.
Comment 17•2 years ago
|
||
Comment 18•2 years ago
|
||
Backed out for causing region update related assertion failures.
Failure logs:
https://treeherder.mozilla.org/logviewer?job_id=407644608&repo=autoland
https://treeherder.mozilla.org/logviewer?job_id=407645242&repo=autoland
https://treeherder.mozilla.org/logviewer?job_id=407643391&repo=autoland
Backout link: https://hg.mozilla.org/integration/autoland/rev/b55abe26174eaa37f7bebce994001acf05e1bd3f
Assignee | ||
Comment 19•2 years ago
|
||
This was just a binary logic error in the original patch. Fast fix.
Comment 20•2 years ago
|
||
Comment 21•2 years ago
|
||
bugherder |
Comment 22•2 years ago
|
||
:bradwerth, the crash volume has dropped.
In terms of your investigation, do you plan on landing anything else under this OR should the leave-open be removed and the bug resolved?
Next step if you are finished with the investigation. If you could consider a release uplift request on the last patch, we could consider it for inclusion in a later 111 build.
Assignee | ||
Comment 23•2 years ago
|
||
I don't think anything else will happen in this Bug. The follow-up Bug 1818540 will have the rest of the work. I'll remove the leave-open.
I don't think the Part 2 patch needs to be uplifted because it affects only code in a #ifdef NIGHTLY_BUILD
protected block. It should have no impact outside of Nightly.
Assignee | ||
Updated•2 years ago
|
Updated•2 years ago
|
Updated•2 years ago
|
Comment 24•2 years ago
|
||
Comment on attachment 9319487 [details]
Bug 1817691: Make nsNativeLayerCA clean up rounding errors in update regions.
Removing the uplift approved flag to remove this from missed 111 uplift queries.
Comment 25•6 months ago
•
|
||
These crashes are still happening, on macOS. In fact they've more than doubled in frequency over the last month or so.
Comment 26•6 months ago
|
||
These crashes now all have the following "moz crash reason":
MOZ_RELEASE_ASSERT(IntRect({}, mSize).Contains(aUpdateRegion.GetBounds())) (The update region should be within the surface bounds.)
Assignee | ||
Comment 27•6 months ago
|
||
Okay, let's re-open it and I'll land something else, probably just removing the assert since I think this is a harmless issue. Maybe can keep the assert in and have it report the update region bounds so we might be able to do something actionable.
I'm having trouble getting Bugzilla to reopen the Bug, but I'll figure that out in time. Consider this active again.
Comment 28•6 months ago
•
|
||
(In reply to Brad Werth [:bradwerth] from comment #27)
I'm having trouble getting Bugzilla to reopen the Bug
Me too. Might be a bugzilla bug -- presumably a recent one. Or maybe an unfortunate (and recent) settings change for BMO's bugzilla.
Assignee | ||
Comment 29•6 months ago
|
||
This would be a deep root cause, but seemingly the only way this could happen is if this WebRender code generates a scissor_rect
outside the valid_rect
. I'm hesitant to touch this, because of how deep this is, but I guess that's what reviews are for!
Assignee | ||
Comment 30•6 months ago
|
||
This is an extra-double-sure clipping of the scissor rect to the valid
rect. This shouldn't matter since the dirty rect is clipped before the
creation of either of the other two rects, but will ensure that rounding
issues from the application of the transform in get_surface_rect
won't
leave us with an out-of-bounds scissor rect.
Comment 31•6 months ago
|
||
A patch has been attached on this bug, which was already closed. Filing a separate bug will ensure better tracking. If this was not by mistake and further action is needed, please alert the appropriate party. (Or: if the patch doesn't change behavior -- e.g. landing a test case, or fixing a typo -- then feel free to disregard this message)
Assignee | ||
Comment 32•6 months ago
|
||
Guess I'll attach this patch to a new Bug, since even Bugzilla is upset that we can't re-open this one.
Comment 33•6 months ago
|
||
Comment on attachment 9420416 [details]
Bug 1817691 Part 3: Ensure WebRender provides update rects within the bounds of valid rects.
Revision D219916 was moved to bug 1914673. Setting attachment 9420416 [details] to obsolete.
Assignee | ||
Comment 34•5 months ago
|
||
If I'm reading the crash reports correctly, there's no new sightings in 131, not since the landing of Bug 1914673. I'll keep watching.
Comment 35•5 months ago
•
|
||
(In reply to Brad Werth [:bradwerth] from comment #34)
If I'm reading the crash reports correctly, there's no new sightings in 131, not since the landing of Bug 1914673. I'll keep watching.
Even before the patch for bug 1914673 landed, this bug's crashes happened in very small numbers on mozilla-central and beta. We'd need a few weeks without any crashes to be sure.
And I do see two crashes: bp-3cddac4a-e630-457d-b8d0-ed93e0240829 and bp-f3e616d1-9538-4f37-ac60-5f5fc0240907
So it seems bug 1914673 didn't fix these crashes.
Description
•