Closed Bug 1817691 Opened 1 year ago Closed 1 year ago

Crash in [@ mozilla::layers::NativeLayerCA::HandlePartialUpdate<T>]

Categories

(Core :: Graphics: WebRender, defect, P2)

defect

Tracking

()

RESOLVED FIXED
112 Branch
Tracking Status
firefox-esr102 --- unaffected
firefox110 --- unaffected
firefox111 --- wontfix
firefox112 + fixed

People

(Reporter: aryx, Assigned: bradwerth)

References

(Regression)

Details

(Keywords: crash, regression, topcrash)

Crash Data

Attachments

(2 files)

15 crashes from 11 installations of 112.0a1 on macOS since bug 1658986 landed.

Crash report: https://crash-stats.mozilla.org/report/index/b399b5e2-f24e-4bd3-9ef9-35a0d0230219

MOZ_CRASH Reason: MOZ_CRASH()

Top 10 frames of crashing thread:

0  XUL  mozilla::layers::NativeLayerCA::HandlePartialUpdate<mozilla::layers::NativeLayerCA::NextSurfaceAsFramebuffer  gfx/layers/NativeLayerCA.mm:1231
0  XUL  mozilla::layers::NativeLayerCA::NextSurfaceAsFramebuffer  gfx/layers/NativeLayerCA.mm:1297
1  XUL  mozilla::wr::RenderCompositorNativeOGL::Bind  gfx/webrender_bindings/RenderCompositorNative.cpp:548
2  XUL  <webrender_bindings::bindings::WrCompositor as webrender::composite::Compositor>::bind  gfx/webrender_bindings/src/bindings.rs:1332
3  XUL  webrender::renderer::Renderer::draw_frame  gfx/wr/webrender/src/renderer/mod.rs:4370
4  XUL  webrender::renderer::Renderer::render_impl  gfx/wr/webrender/src/renderer/mod.rs:1480
5  XUL  webrender::renderer::Renderer::render  gfx/wr/webrender/src/renderer/mod.rs:1197
6  XUL  wr_renderer_render  gfx/webrender_bindings/src/bindings.rs:614
7  XUL  mozilla::wr::RendererOGL::UpdateAndRender  gfx/webrender_bindings/RendererOGL.cpp:186
7  XUL  mozilla::wr::RenderThread::UpdateAndRender  gfx/webrender_bindings/RenderThread.cpp:578
Flags: needinfo?(bwerth)

I'll figure this out. This is an expected result of landing Bug 1658986, which added an intentional crash that logs additional information to the graphics log. As an example:

[GFX1]: The update region [0,0,513,392] must cover the invalid region [0,391,728,392].

is from a crash in 112.0a1 and shows that WebRender is giving us an update region that is narrower than we expect, though the invalid region is only one pixel high which probably means that we have a rounding error in our invalid-region tracking logic.

Assignee: nobody → bwerth
Flags: needinfo?(bwerth)

The bug is linked to a topcrash signature, which matches the following criterion:

  • Top 10 desktop browser crashes on nightly

:bradwerth, could you consider increasing the severity of this top-crash bug?

For more information, please visit auto_nag documentation.

Flags: needinfo?(bwerth)
Keywords: topcrash

Yep, I'm on it. I need to confirm that most or all of the crash signatures are similar to what was noted in comment 1: that this shows our invalid region tracking is accepting a rounding error to int pixels which is leaving us with a one-pixel tall or wide invalid region.

Ideas on how to fix this:

  • Find where the rounding error is occurring (possibly within WebRender) and make it more accurate.
  • Change our macOS invalid region management code to clean up a remaining one-pixel tall or wide region, and leave the asserts in. Eventually remove the asserts because they have a runtime cost.
  • Just remove the asserts now. We probably don’t have invalid pixels, just an invalid region. If we do have invalid pixels, we’ll get new bug reports on that issue – but why aren’t we already seeing invalid pixels on other platforms?

I think I'll pursue option 2 to get this fixed fast, and we can clean it up later.

Severity: -- → S2
Flags: needinfo?(bwerth)
Priority: -- → P2

I think the proposed patch will resolve this issue, but we should leave the Bug open while we check. And if it does resolve the issue, then we still have a rounding error to fix somewhere, and remove all the runtime asserts, and we could use this Bug for that work.

Keywords: leave-open

No, let's have one landing per bug and use new bugs for follow-up work. Having one bug per landing makes it much easier to track regressions and uplifts.

Blocks: 1818540

(In reply to Markus Stange [:mstange] from comment #6)

No, let's have one landing per bug and use new bugs for follow-up work. Having one bug per landing makes it much easier to track regressions and uplifts.

Good point. I'll take off the leave-open on this Bug, and I've opened Bug 1818540 for the follow-up work.

Keywords: leave-open
Pushed by bwerth@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/2d7bbfc27139
Make nsNativeLayerCA clean up rounding errors in update regions. r=mstange

(In reply to Brad Werth [:bradwerth] from comment #7)

Good point. I'll take off the leave-open on this Bug, and I've opened Bug 1818540 for the follow-up work.

Thanks!

Status: NEW → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED
Target Milestone: --- → 112 Branch

Comment on attachment 9319487 [details]
Bug 1817691: Make nsNativeLayerCA clean up rounding errors in update regions.

Beta/Release Uplift Approval Request

  • User impact if declined: macOS users will continue to occasionally experience crashes while scrolling under some unknown conditions.
  • Is this code covered by automated tests?: No
  • Has the fix been verified in Nightly?: No
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: Bug 1658986
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): The patch in Bug 1817691 moves some asserts around, and this patch probably solves the problem in a way that is very low-risk. There is no meaningful risk added by these patches.
  • String changes made/needed:
  • Is Android affected?: No
Attachment #9319487 - Flags: approval-mozilla-beta?

Comment on attachment 9319487 [details]
Bug 1817691: Make nsNativeLayerCA clean up rounding errors in update regions.

Approved for 111.0b6

Attachment #9319487 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Crash volume is unchanged after the code landed.

Status: RESOLVED → REOPENED
Flags: needinfo?(bwerth)
Resolution: FIXED → ---
Target Milestone: 112 Branch → ---

Better leave this open even as we land partial fixes.

Flags: needinfo?(bwerth)
Keywords: leave-open

Now that NotifySurfaceReady will clean up a 1-pixel strip invalid region,
this change makes HandlePartialUpdate allow a 1-pixel strip invalid
region. This should hopefully make HandlePartialUpdate only crash on the
cases that will also crash NotifySurfaceReady.

Pushed by bwerth@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/451ce16f5fd1
Part 2: Make HandlePartialUpdate tolerate the same fuzziness as NotifySurfaceReady. r=mstange

This was just a binary logic error in the original patch. Fast fix.

Flags: needinfo?(bwerth)
Pushed by bwerth@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/18f8401e2b78
Part 2: Make HandlePartialUpdate tolerate the same fuzziness as NotifySurfaceReady. r=mstange

:bradwerth, the crash volume has dropped.
In terms of your investigation, do you plan on landing anything else under this OR should the leave-open be removed and the bug resolved?
Next step if you are finished with the investigation. If you could consider a release uplift request on the last patch, we could consider it for inclusion in a later 111 build.

Flags: needinfo?(bwerth)

I don't think anything else will happen in this Bug. The follow-up Bug 1818540 will have the rest of the work. I'll remove the leave-open.

I don't think the Part 2 patch needs to be uplifted because it affects only code in a #ifdef NIGHTLY_BUILD protected block. It should have no impact outside of Nightly.

Flags: needinfo?(bwerth)
Keywords: leave-open
Status: REOPENED → RESOLVED
Closed: 1 year ago1 year ago
Resolution: --- → FIXED
Target Milestone: --- → 112 Branch

Comment on attachment 9319487 [details]
Bug 1817691: Make nsNativeLayerCA clean up rounding errors in update regions.

Removing the uplift approved flag to remove this from missed 111 uplift queries.

Attachment #9319487 - Flags: approval-mozilla-beta+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: