Closed Bug 1817691 Opened 2 years ago Closed 2 years ago

Crash in [@ mozilla::layers::NativeLayerCA::HandlePartialUpdate<T>]

Categories

(Core :: Graphics: WebRender, defect, P2)

defect

Tracking

()

RESOLVED FIXED
112 Branch
Tracking Status
firefox-esr102 --- unaffected
firefox110 --- unaffected
firefox111 --- wontfix
firefox112 + fixed

People

(Reporter: aryx, Assigned: bradwerth)

References

(Regression)

Details

(Keywords: crash, regression, topcrash)

Crash Data

Attachments

(2 files, 1 obsolete file)

15 crashes from 11 installations of 112.0a1 on macOS since bug 1658986 landed.

Crash report: https://crash-stats.mozilla.org/report/index/b399b5e2-f24e-4bd3-9ef9-35a0d0230219

MOZ_CRASH Reason: MOZ_CRASH()

Top 10 frames of crashing thread:

0  XUL  mozilla::layers::NativeLayerCA::HandlePartialUpdate<mozilla::layers::NativeLayerCA::NextSurfaceAsFramebuffer  gfx/layers/NativeLayerCA.mm:1231
0  XUL  mozilla::layers::NativeLayerCA::NextSurfaceAsFramebuffer  gfx/layers/NativeLayerCA.mm:1297
1  XUL  mozilla::wr::RenderCompositorNativeOGL::Bind  gfx/webrender_bindings/RenderCompositorNative.cpp:548
2  XUL  <webrender_bindings::bindings::WrCompositor as webrender::composite::Compositor>::bind  gfx/webrender_bindings/src/bindings.rs:1332
3  XUL  webrender::renderer::Renderer::draw_frame  gfx/wr/webrender/src/renderer/mod.rs:4370
4  XUL  webrender::renderer::Renderer::render_impl  gfx/wr/webrender/src/renderer/mod.rs:1480
5  XUL  webrender::renderer::Renderer::render  gfx/wr/webrender/src/renderer/mod.rs:1197
6  XUL  wr_renderer_render  gfx/webrender_bindings/src/bindings.rs:614
7  XUL  mozilla::wr::RendererOGL::UpdateAndRender  gfx/webrender_bindings/RendererOGL.cpp:186
7  XUL  mozilla::wr::RenderThread::UpdateAndRender  gfx/webrender_bindings/RenderThread.cpp:578
Flags: needinfo?(bwerth)

I'll figure this out. This is an expected result of landing Bug 1658986, which added an intentional crash that logs additional information to the graphics log. As an example:

[GFX1]: The update region [0,0,513,392] must cover the invalid region [0,391,728,392].

is from a crash in 112.0a1 and shows that WebRender is giving us an update region that is narrower than we expect, though the invalid region is only one pixel high which probably means that we have a rounding error in our invalid-region tracking logic.

Assignee: nobody → bwerth
Flags: needinfo?(bwerth)

The bug is linked to a topcrash signature, which matches the following criterion:

  • Top 10 desktop browser crashes on nightly

:bradwerth, could you consider increasing the severity of this top-crash bug?

For more information, please visit auto_nag documentation.

Flags: needinfo?(bwerth)
Keywords: topcrash

Yep, I'm on it. I need to confirm that most or all of the crash signatures are similar to what was noted in comment 1: that this shows our invalid region tracking is accepting a rounding error to int pixels which is leaving us with a one-pixel tall or wide invalid region.

Ideas on how to fix this:

  • Find where the rounding error is occurring (possibly within WebRender) and make it more accurate.
  • Change our macOS invalid region management code to clean up a remaining one-pixel tall or wide region, and leave the asserts in. Eventually remove the asserts because they have a runtime cost.
  • Just remove the asserts now. We probably don’t have invalid pixels, just an invalid region. If we do have invalid pixels, we’ll get new bug reports on that issue – but why aren’t we already seeing invalid pixels on other platforms?

I think I'll pursue option 2 to get this fixed fast, and we can clean it up later.

Severity: -- → S2
Flags: needinfo?(bwerth)
Priority: -- → P2

I think the proposed patch will resolve this issue, but we should leave the Bug open while we check. And if it does resolve the issue, then we still have a rounding error to fix somewhere, and remove all the runtime asserts, and we could use this Bug for that work.

Keywords: leave-open

No, let's have one landing per bug and use new bugs for follow-up work. Having one bug per landing makes it much easier to track regressions and uplifts.

(In reply to Markus Stange [:mstange] from comment #6)

No, let's have one landing per bug and use new bugs for follow-up work. Having one bug per landing makes it much easier to track regressions and uplifts.

Good point. I'll take off the leave-open on this Bug, and I've opened Bug 1818540 for the follow-up work.

Pushed by bwerth@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/2d7bbfc27139 Make nsNativeLayerCA clean up rounding errors in update regions. r=mstange

(In reply to Brad Werth [:bradwerth] from comment #7)

Good point. I'll take off the leave-open on this Bug, and I've opened Bug 1818540 for the follow-up work.

Thanks!

Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 112 Branch

Comment on attachment 9319487 [details]
Bug 1817691: Make nsNativeLayerCA clean up rounding errors in update regions.

Beta/Release Uplift Approval Request

  • User impact if declined: macOS users will continue to occasionally experience crashes while scrolling under some unknown conditions.
  • Is this code covered by automated tests?: No
  • Has the fix been verified in Nightly?: No
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: Bug 1658986
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): The patch in Bug 1817691 moves some asserts around, and this patch probably solves the problem in a way that is very low-risk. There is no meaningful risk added by these patches.
  • String changes made/needed:
  • Is Android affected?: No
Attachment #9319487 - Flags: approval-mozilla-beta?

Comment on attachment 9319487 [details]
Bug 1817691: Make nsNativeLayerCA clean up rounding errors in update regions.

Approved for 111.0b6

Attachment #9319487 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Crash volume is unchanged after the code landed.

Status: RESOLVED → REOPENED
Flags: needinfo?(bwerth)
Resolution: FIXED → ---
Target Milestone: 112 Branch → ---

Better leave this open even as we land partial fixes.

Flags: needinfo?(bwerth)
Keywords: leave-open

Now that NotifySurfaceReady will clean up a 1-pixel strip invalid region,
this change makes HandlePartialUpdate allow a 1-pixel strip invalid
region. This should hopefully make HandlePartialUpdate only crash on the
cases that will also crash NotifySurfaceReady.

Pushed by bwerth@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/451ce16f5fd1 Part 2: Make HandlePartialUpdate tolerate the same fuzziness as NotifySurfaceReady. r=mstange

This was just a binary logic error in the original patch. Fast fix.

Flags: needinfo?(bwerth)
Pushed by bwerth@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/18f8401e2b78 Part 2: Make HandlePartialUpdate tolerate the same fuzziness as NotifySurfaceReady. r=mstange

:bradwerth, the crash volume has dropped.
In terms of your investigation, do you plan on landing anything else under this OR should the leave-open be removed and the bug resolved?
Next step if you are finished with the investigation. If you could consider a release uplift request on the last patch, we could consider it for inclusion in a later 111 build.

Flags: needinfo?(bwerth)

I don't think anything else will happen in this Bug. The follow-up Bug 1818540 will have the rest of the work. I'll remove the leave-open.

I don't think the Part 2 patch needs to be uplifted because it affects only code in a #ifdef NIGHTLY_BUILD protected block. It should have no impact outside of Nightly.

Flags: needinfo?(bwerth)
Keywords: leave-open
Status: REOPENED → RESOLVED
Closed: 2 years ago2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 112 Branch

Comment on attachment 9319487 [details]
Bug 1817691: Make nsNativeLayerCA clean up rounding errors in update regions.

Removing the uplift approved flag to remove this from missed 111 uplift queries.

Attachment #9319487 - Flags: approval-mozilla-beta+

These crashes are still happening, on macOS. In fact they've more than doubled in frequency over the last month or so.

These crashes now all have the following "moz crash reason":

MOZ_RELEASE_ASSERT(IntRect({}, mSize).Contains(aUpdateRegion.GetBounds())) (The update region should be within the surface bounds.)

Okay, let's re-open it and I'll land something else, probably just removing the assert since I think this is a harmless issue. Maybe can keep the assert in and have it report the update region bounds so we might be able to do something actionable.

I'm having trouble getting Bugzilla to reopen the Bug, but I'll figure that out in time. Consider this active again.

(In reply to Brad Werth [:bradwerth] from comment #27)

I'm having trouble getting Bugzilla to reopen the Bug

Me too. Might be a bugzilla bug -- presumably a recent one. Or maybe an unfortunate (and recent) settings change for BMO's bugzilla.

This would be a deep root cause, but seemingly the only way this could happen is if this WebRender code generates a scissor_rect outside the valid_rect. I'm hesitant to touch this, because of how deep this is, but I guess that's what reviews are for!

This is an extra-double-sure clipping of the scissor rect to the valid
rect. This shouldn't matter since the dirty rect is clipped before the
creation of either of the other two rects, but will ensure that rounding
issues from the application of the transform in get_surface_rect won't
leave us with an out-of-bounds scissor rect.

A patch has been attached on this bug, which was already closed. Filing a separate bug will ensure better tracking. If this was not by mistake and further action is needed, please alert the appropriate party. (Or: if the patch doesn't change behavior -- e.g. landing a test case, or fixing a typo -- then feel free to disregard this message)

Guess I'll attach this patch to a new Bug, since even Bugzilla is upset that we can't re-open this one.

Comment on attachment 9420416 [details]
Bug 1817691 Part 3: Ensure WebRender provides update rects within the bounds of valid rects.

Revision D219916 was moved to bug 1914673. Setting attachment 9420416 [details] to obsolete.

Attachment #9420416 - Attachment is obsolete: true
See Also: → 1914673

If I'm reading the crash reports correctly, there's no new sightings in 131, not since the landing of Bug 1914673. I'll keep watching.

You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: