Closed Bug 1769254 Opened 9 months ago Closed 6 months ago

Crash in [@ mozilla::wr::RenderMacIOSurfaceTextureHost::GetSize]

Categories

(Core :: Graphics: WebRender, defect)

Unspecified
macOS
defect

Tracking

()

VERIFIED FIXED
105 Branch
Tracking Status
firefox-esr91 --- unaffected
firefox-esr102 --- disabled
firefox100 --- unaffected
firefox101 --- unaffected
firefox102 - disabled
firefox103 + disabled
firefox104 --- disabled
firefox105 + verified

People

(Reporter: aryx, Assigned: sotaro)

References

(Blocks 2 open bugs, Regression)

Details

(Keywords: crash, regression, topcrash)

Crash Data

Attachments

(3 files)

11 crashes from 6 installations, all with Firefox 102.0a1 (on macOS obviously), first reported build ID is 20220505185614.

Crash report: https://crash-stats.mozilla.org/report/index/4b88927e-489d-4eb6-be12-5024c0220512

Reason: EXC_BAD_ACCESS / KERN_INVALID_ADDRESS

Top 10 frames of crashing thread:

0 XUL mozilla::wr::RenderMacIOSurfaceTextureHost::GetSize const gfx/webrender_bindings/RenderMacIOSurfaceTextureHost.cpp:68
1 XUL mozilla::layers::NativeLayerCA::AttachExternalImage gfx/layers/NativeLayerCA.mm:779
2 XUL mozilla::wr::RenderCompositorNative::AttachExternalImage gfx/webrender_bindings/RenderCompositorNative.cpp:321
3 XUL webrender::renderer::Renderer::update_native_surfaces gfx/wr/webrender/src/renderer/mod.rs:4688
4 XUL webrender::renderer::Renderer::render_impl gfx/wr/webrender/src/renderer/mod.rs:1976
5 XUL webrender::renderer::Renderer::render gfx/wr/webrender/src/renderer/mod.rs:1737
6 XUL wr_renderer_render gfx/webrender_bindings/src/bindings.rs:616
7 XUL mozilla::wr::RenderThread::UpdateAndRender gfx/webrender_bindings/RenderThread.cpp:537
8 XUL mozilla::wr::RenderThread::HandleFrameOneDoc gfx/webrender_bindings/RenderThread.cpp:387
9 XUL mozilla::detail::RunnableMethodImpl<mozilla::wr::RenderThread*, void  xpcom/threads/nsThreadUtils.h:1200

I can reproduce this crash with SWGL enabled after accelerated Canvas2D bug 1773712 landed. However, this crash bug is not a regression from that Canvas2D bug because the Canvas2D change landed today and this crash bug was filed a month ago.

I bisected the crash with my STR to this pushlog with Canvas2D bug 1773712:

https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=ec7e694733a0dc3d956b6a3ea3a323fc1d41264c&tochange=5c16ac03eca7a366057cb2ac7a6f376f98dd8bf0

Steps to reproduce

  1. Enable SWGL (gfx.webrender.software = true) (UPDATED: and gfx.canvas.accelerated = true)
  2. Load https://results.enr.clarityelections.com/CA/Contra_Costa/114138/web.285569/#/summary
  3. Scroll down to the "GOVERNOR" section.
  4. Click on the "Show Chart" button to the right of the "GOVERNOR" section title.

Result

Crash bp-842adff9-37bc-4858-b749-f9cd30220612 in [@ mozilla::wr::RenderMacIOSurfaceTextureHost::GetSize ]

Severity: S2 → --
Has Regression Range: --- → no
Has STR: --- → yes
Flags: needinfo?(lsalzman)
Keywords: regression
OS: Unspecified → macOS
See Also: → 1773712

[Tracking Requested - why for this release]:

Kelsey, this crash is a regression from canvas color space bug 1703654.

Since accelerated Canvas2D bug 1773712 just enabled the gfx.canvas.accelerated pref, I bisected my crash STR again with that pref force-enabled and landed on this earlier pushlog for canvas color space bug 1703654:

https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=88a64ff079e144ce530a7f3121244604c29ddf92&tochange=f2d46bb91c9ddd9e6b43c315be70917b334d52a0

Has Regression Range: no → yes
Flags: needinfo?(lsalzman) → needinfo?(jgilbert)
Regressed by: 1703654

The crashes are gated to Nightly builds, so there is no need to track this for 102 and ESR 102. We should track it for Nightly, though, because this is a top crasher on macOS, and we should decide quickly whether to back out bug 1773712 on Nightly.

Requesting a backout of bug 1773712 until the crash is investigated.

Setting 103 to disabled; bug 1773712 was backed out of central.

Blocks: 1773712
See Also: 1773712

The STR I found required the accelerated canvas be enabled (bug 1773712).

But this crash signature started in Nightly 102 before the accelerated canvas was enabled and we still have some crash reports from Beta (and DevEdition) 102. Example: bp-cf91cbe1-d3c6-4c54-ba60-59fc70220613

So there may be other STR that hit this same crash without the accelerated canvas. I bisected the original crash to bug 1703654, which landed in Nightly 102.

I believe this comes from the following code:

  wr::RenderMacIOSurfaceTextureHost* texture = aExternalImage->AsRenderMacIOSurfaceTextureHost();
  MOZ_ASSERT(texture);  // I bet this is the issue, and that we would crash here if we did MOZ_RELEASE_ASSERT
  mTextureHost = texture;

  gfx::IntSize oldSize = mSize;

  mSize = texture->GetSize(0);  // crashes here
Flags: needinfo?(jgilbert)

(In reply to Chris Peterson [:cpeterson] from comment #1)

I can reproduce this crash with SWGL enabled after accelerated Canvas2D bug 1773712 landed. However, this crash bug is not a regression from that Canvas2D bug because the Canvas2D change landed today and this crash bug was filed a month ago.

I bisected the crash with my STR to this pushlog with Canvas2D bug 1773712:

https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=ec7e694733a0dc3d956b6a3ea3a323fc1d41264c&tochange=5c16ac03eca7a366057cb2ac7a6f376f98dd8bf0

Steps to reproduce

  1. Enable SWGL (gfx.webrender.software = true).
  2. Load https://results.enr.clarityelections.com/CA/Contra_Costa/114138/web.285569/#/summary
  3. Scroll down to the "GOVERNOR" section.
  4. Click on the "Show Chart" button to the right of the "GOVERNOR" section title.

Result

Crash bp-842adff9-37bc-4858-b749-f9cd30220612 in [@ mozilla::wr::RenderMacIOSurfaceTextureHost::GetSize ]

I tried this STR, setting gfx.webrender.software and gfx.canvas.accelerated to true, but I don't get a crash. Is there anything else I am missing to repro this?

Flags: needinfo?(cpeterson)

My hunch is that, with out-of-process WebGL, the SharedSurfaceIO is not keeping the MacIOSurface alive long enough. It exports a SurfaceDescriptor from the GPU process to the content process, which then goes back to the GPU process for WebRender. In that gap, for some reason, the SharedSurfaceIO goes away, so when the MacIOSurfaceTextureHostOGL tries to create the surface from the SurfaceDescriptor for WebRender, it is already gone. We end up with a null IOSurface that causes these crashes downstream.

I would need a more reliable repro to test this, but Sotaro is working on something to fix a similar problem in D3D11 that might fix the issue here as well if this is actually the problem.
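To make the suspected lifetime gap concrete, here is a minimal sketch of a descriptor that outlives the surface it names. It is only a model of the hunch above, not Gecko code: `Surface`, `SurfaceRegistry`, and `LookupFailsAfterProducerRelease` are hypothetical names, and a `std::weak_ptr` registry stands in for whatever mechanism resolves a SurfaceDescriptor back to an IOSurface.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>

// Hypothetical stand-in for a shared surface; not the real MacIOSurface.
struct Surface {
  int width;
  int height;
};

// Hypothetical registry that resolves an ID (the "descriptor") to a surface.
// It holds weak references only, so it does not keep surfaces alive.
class SurfaceRegistry {
 public:
  std::uint32_t Register(const std::shared_ptr<Surface>& surface) {
    std::uint32_t id = mNextId++;
    mSurfaces[id] = surface;
    return id;
  }

  // Returns nullptr if the ID is unknown or the surface was already released.
  std::shared_ptr<Surface> Lookup(std::uint32_t id) const {
    auto it = mSurfaces.find(id);
    if (it == mSurfaces.end()) {
      return nullptr;
    }
    return it->second.lock();
  }

 private:
  std::uint32_t mNextId = 1;
  std::map<std::uint32_t, std::weak_ptr<Surface>> mSurfaces;
};

// Models the gap: the descriptor is still in flight when the producer
// drops its only strong reference, so the later lookup comes back null.
bool LookupFailsAfterProducerRelease() {
  SurfaceRegistry registry;
  std::shared_ptr<Surface> producer = std::make_shared<Surface>(Surface{64, 64});
  std::uint32_t descriptor = registry.Register(producer);

  bool aliveBefore = registry.Lookup(descriptor) != nullptr;
  producer.reset();  // the producer goes away while the descriptor travels
  bool aliveAfter = registry.Lookup(descriptor) != nullptr;

  return aliveBefore && !aliveAfter;
}
```

If this model matches what happens in the browser, the consumer has to either hold a strong reference for as long as the descriptor is in flight or treat a null lookup as a normal, recoverable case.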

See Also: → 1712486
Attached file about:support

(In reply to Lee Salzman [:lsalzman] from comment #8)

I tried this STR, setting gfx.webrender.software and gfx.canvas.accelerated to true, but I don't get a crash. Is there anything else I am missing to repro this?

I don't know of any other steps or settings missing from my STR. I'll attach my about:support info. Maybe there is something peculiar about my hardware (a 2015 MacBook Pro).

Summarizing my findings:

My STR crashes on this build with a clean profile when gfx.webrender.software and gfx.canvas.accelerated are true:

mach mozregression --launch 5c16ac03eca7a366057cb2ac7a6f376f98dd8bf0 --pref "gfx.webrender.software:true" "gfx.canvas.accelerated:true" -a "https://results.enr.clarityelections.com/CA/Contra_Costa/114138/web.285569/#/summary"

But it doesn't crash on the same build if either gfx.webrender.software or gfx.canvas.accelerated is false:

mach mozregression --launch 5c16ac03eca7a366057cb2ac7a6f376f98dd8bf0 --pref "gfx.webrender.software:true" "gfx.canvas.accelerated:false" -a "https://results.enr.clarityelections.com/CA/Contra_Costa/114138/web.285569/#/summary"

mach mozregression --launch 5c16ac03eca7a366057cb2ac7a6f376f98dd8bf0 --pref "gfx.webrender.software:false" "gfx.canvas.accelerated:true" -a "https://results.enr.clarityelections.com/CA/Contra_Costa/114138/web.285569/#/summary"

Flags: needinfo?(cpeterson)
Severity: -- → S2

Set release status flags based on info from the regressing bug 1703654

[Tracking Requested - why for this release]:

Lee, I can reproduce this SWGL crash again (using the STR in comment #1) now that gfx.canvas.accelerated has been re-enabled in bug 1773712.

bp-43999729-23c4-40db-ab68-930260220806

Flags: needinfo?(lsalzman)
Regressed by: 1773712

What's the target release for shipping accelerated canvas, Lee?

I hit this saving an edited (pdfjs.annotationEditorMode=0) PDF to PDF in yesterday(?)'s nightly on macOS.

I've hit this four times while using Google Maps over the last two days.

Sotaro, any ideas here?

Flags: needinfo?(lsalzman) → needinfo?(sotaro.ikeda.g)

100% repro visiting starlink.sx

Assignee: nobody → lsalzman
Status: NEW → ASSIGNED
Keywords: leave-open
Pushed by lsalzman@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/0f1fe5d29884
Check for null texture host. r=aosmond

This patch should at least avoid the crashes while we're investigating this bug.
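As a rough illustration of what a null texture host check looks like, here is a minimal sketch. It is not the landed patch: the classes below are simplified, hypothetical stand-ins for the Gecko types named in this bug, `TryGetAttachSize` is an invented helper, and `GetSize()` drops the channel argument used in the real code.

```cpp
#include <cassert>

struct IntSize {
  int width;
  int height;
};

// Simplified stand-in for wr::RenderMacIOSurfaceTextureHost.
class RenderMacIOSurfaceTextureHost {
 public:
  explicit RenderMacIOSurfaceTextureHost(IntSize size) : mSize(size) {}
  IntSize GetSize() const { return mSize; }

 private:
  IntSize mSize;
};

// Simplified base class: the downcast helper returns nullptr unless the
// host really is backed by an IOSurface.
class RenderTextureHost {
 public:
  virtual ~RenderTextureHost() = default;
  virtual RenderMacIOSurfaceTextureHost* AsRenderMacIOSurfaceTextureHost() {
    return nullptr;
  }
};

class IOSurfaceBackedHost : public RenderTextureHost {
 public:
  explicit IOSurfaceBackedHost(IntSize size) : mSurface(size) {}
  RenderMacIOSurfaceTextureHost* AsRenderMacIOSurfaceTextureHost() override {
    return &mSurface;
  }

 private:
  RenderMacIOSurfaceTextureHost mSurface;
};

// A host of a different kind, for example one backed by shared memory.
class ShmemBackedHost : public RenderTextureHost {};

// Hypothetical helper: bail out instead of dereferencing a null downcast.
// A debug-only assertion would not protect release builds here.
bool TryGetAttachSize(RenderTextureHost* external, IntSize* outSize) {
  if (!external) {
    return false;
  }
  RenderMacIOSurfaceTextureHost* texture =
      external->AsRenderMacIOSurfaceTextureHost();
  if (!texture) {
    return false;
  }
  *outSize = texture->GetSize();
  return true;
}

// Exercises both the IOSurface path and the wrong-type path.
void DemoTryGetAttachSize() {
  IOSurfaceBackedHost io(IntSize{640, 480});
  ShmemBackedHost shmem;
  IntSize size{0, 0};
  assert(TryGetAttachSize(&io, &size) && size.width == 640 && size.height == 480);
  assert(!TryGetAttachSize(&shmem, &size));
}
```

A guard like this only avoids the crash. It does not explain why a host of the wrong type reaches the native layer in the first place.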

We mark every external image in an async image pipeline as preferring a compositor surface:
https://searchfox.org/mozilla-central/rev/6a37a2ab9328bec6a29f688d1b2fba6974d34905/gfx/layers/wr/AsyncImagePipelineManager.cpp#453

There are additional requirements that must be met, so I am guessing that is why or part of why it doesn't always trip:
https://searchfox.org/mozilla-central/rev/43ba67391e71c57a14420e554e9d381543292611/gfx/wr/webrender/src/picture.rs#2520

As such, I don't think there is any guarantee we will get the expected type:
https://searchfox.org/mozilla-central/rev/43ba67391e71c57a14420e554e9d381543292611/gfx/layers/NativeLayerCA.mm#805

Should we be performing more checks before we set the flag? Or is it the responsibility of the compositing code to do the check?

The problem seems to be in the canUpdate check in AsyncImagePipelineManager::UpdateImageKeys(). It could not detect a TextureHost change from MacIOSurfaceTextureHostOGL to ShmemTextureHost when the format and size were the same.

It happened when the accelerated canvas fell back to the software canvas.
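The canUpdate observation above can be sketched as follows. This is an assumption-heavy model rather than the real check: `HostKind`, `ImageState`, and both functions are hypothetical, and the actual conditions in UpdateImageKeys() may include more than is shown here.

```cpp
#include <cassert>

// Hypothetical tag for the kind of texture host behind an image key.
enum class HostKind { MacIOSurface, Shmem };

// Hypothetical snapshot of what is known about the current image.
struct ImageState {
  int width;
  int height;
  int format;
  HostKind kind;
};

// Compares size and format only. A swap from an IOSurface-backed host to a
// shmem-backed host with identical dimensions looks like a plain update.
bool CanUpdateBySizeAndFormat(const ImageState& prev, const ImageState& next) {
  return prev.width == next.width && prev.height == next.height &&
         prev.format == next.format;
}

// Same comparison, but it additionally requires the host kind to match, so
// a change of backing type forces the image to be re-added instead.
bool CanUpdateWithTypeCheck(const ImageState& prev, const ImageState& next) {
  return CanUpdateBySizeAndFormat(prev, next) && prev.kind == next.kind;
}
```

With two states that differ only in `kind`, the first function reports that an update is possible and the second does not, which is the distinction that matters when the accelerated canvas falls back to the software canvas.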

Assignee: lsalzman → sotaro.ikeda.g
Flags: needinfo?(sotaro.ikeda.g)
Pushed by sikeda.birchill@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/74587348428c
Add TextureHost type check for using update in AsyncImagePipelineManager::UpdateImageKeys() r=gfx-reviewers,lsalzman
Keywords: leave-open
Status: ASSIGNED → RESOLVED
Closed: 6 months ago
Resolution: --- → FIXED
Target Milestone: --- → 105 Branch
Flags: qe-verify+

Reproduced this issue on an affected Nightly build from 2022-05-13 using the STR from Comment 1, on macOS 10.15.
Verified as fixed on Firefox 105.0b5 (20220830185924) on the above OS.

Status: RESOLVED → VERIFIED
Flags: qe-verify+