Crash in static bool mozilla::wr::Moz2DRenderCallback MOZ_RELEASE_ASSERT(ret)

RESOLVED FIXED in Firefox 65

Status

()

defect
P2
critical
RESOLVED FIXED
Last year
5 months ago

People

(Reporter: mats, Assigned: kats)

Tracking

(Blocks 2 bugs, 4 keywords)

Trunk
mozilla65
Unspecified
All
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox-esr60 unaffected, firefox62 disabled, firefox63 disabled, firefox64 disabled, firefox65 fixed)

Details

(crash signature, )

Attachments

(6 attachments, 1 obsolete attachment)

Reporter

Description

Last year
This bug was filed from the Socorro interface and is
report bp-a1423f52-5a1d-4120-876c-39ad40180527.
=============================================================

MOZ_CRASH Reason:  MOZ_RELEASE_ASSERT(ret)


Top 9 frames of crashing thread:

0 xul.dll static bool mozilla::wr::Moz2DRenderCallback gfx/webrender_bindings/Moz2DImageRenderer.cpp:324
1 xul.dll wr_moz2d_render_cb gfx/webrender_bindings/Moz2DImageRenderer.cpp:363
2 xul.dll static void rayon_core::job::{{impl}}::execute<closure> third_party/rust/rayon-core/src/job.rs:156
3 xul.dll static void rayon_core::registry::WorkerThread::wait_until<rayon_core::latch::CountLatch> third_party/rust/rayon-core/src/registry.rs:540
4 xul.dll static void std::sys_common::backtrace::__rust_begin_short_backtrace<closure,  src/libstd/sys_common/backtrace.rs:133
5 xul.dll static void alloc::boxed::{{impl}}::call_box< src/liballoc/boxed.rs:815
6 xul.dll static void std::sys::windows::thread::{{impl}}::new::thread_start src/libstd/sys/windows/thread.rs:54
7 kernel32.dll BaseThreadInitThunk 
8 ntdll.dll RtlUserThreadStart 

=============================================================
Crash Signature: [@ static bool mozilla::wr::Moz2DRenderCallback] → [@ static bool mozilla::wr::Moz2DRenderCallback] [@ mozilla::wr::Moz2DRenderCallback ]
OS: Windows 10 → All
Crash Signature: [@ static bool mozilla::wr::Moz2DRenderCallback] [@ mozilla::wr::Moz2DRenderCallback ] → [@ static bool mozilla::wr::Moz2DRenderCallback] [@ mozilla::wr::Moz2DRenderCallback ] [@ wr_moz2d_render_cb ]
Duplicate of this bug: 1470437
Bug 1470437 has a reproducible testcase
bp-294a6c6a-c031-45f4-85bb-cd6fd0180622

"good" = thread 'WRRenderBackend#1' panicked at 'Vector image error Unknown error', gfx/webrender/src/resource_cache.rs:854:29
"bad" = Assertion failure: ret, at /builds/worker/workspace/build/src/gfx/webrender_bindings/Moz2DImageRenderer.cpp:247

RUST_BACKTRACE=1 mozregression --good 2018-03-05 --bad 2018-03-16 -B debug --pref gfx.webrender.all:true startup.homepage_welcome_url:'https://bug1470437.bmoattachments.org/attachment.cgi?id=8987065'
> 11:30.57 INFO: Last good revision: 28707a88967e92bee769fcbae13584adc75ced84
> 11:30.57 INFO: First bad revision: 7dad306cf8fede99816cf454466ff16cbf38053c
> 11:30.57 INFO: Pushlog:
> https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=28707a88967e92bee769fcbae13584adc75ced84&tochange=7dad306cf8fede99816cf454466ff16cbf38053c

> 7dad306cf8fe	Jeff Muizelaar — Bug 1391255. Crash earlier if recording playback fails. r=kats

> +    MOZ_RELEASE_ASSERT(ret);
Blocks: 1391255
Has Regression Range: --- → yes
Has STR: --- → yes
Keywords: testcase
Priority: -- → P1
Please also set the 'regression' keyword when you find a regression.
Keywords: regression
We can't release this to the field, but we can let this ride to beta -- unless the crash rate goes up when we enable WR in Nightly.
Priority: P1 → P2

Comment 7

10 months ago
Posted file ASAN crash report
I can reproduce it reliably while editing a google slide presentation, attached is the ASAN crash report.
(In reply to Francois Guerraz from comment #7)
> Created attachment 9007176 [details]
> ASAN crash report
> 
> I can reproduce it reliably while editing a google slide presentation,
> attached is the ASAN crash report.

Any chance you can give repeatable steps to reproduce?
Flags: needinfo?(kubrick)

Comment 9

10 months ago
bug1486198
Yes, create a new presentation:
https://docs.google.com/presentation/
Click the "blank template", start typing a title in the big title box, and voilà.

It's pretty much unusable, anything I try to do with slides leads to a crash.
Flags: needinfo?(kubrick)
(In reply to Francois Guerraz from comment #9)
> Yes, create a new presentation:
> https://docs.google.com/presentation/
> Click the "blank template", start typing a title in the big title box, and
> voilà.

Those steps don't reproduce for me. What platform are you on?
Flags: needinfo?(kubrick)
If Google Slides is broken we need to block nightly.
Blocks: stage-wr-nightly
No longer blocks: stage-wr-trains
Priority: P2 → P1
I can't reproduce on Mac or Windows.
Also are you using the latest nightly?

Comment 14

10 months ago
I'm running the latest nightly, on Linux, (X)Wayland, HiDPI, webrender enabled, Iris Graphics 540, arch.

but...

I've tried to reproduce it with a fresh profile and it doesn't crash, but it very reliably crashes with my "usual" profile.

I'll try to find out what it is that makes it die.
Flags: needinfo?(kubrick)
Summary: Crash in static bool mozilla::wr::Moz2DRenderCallback → Crash in static bool mozilla::wr::Moz2DRenderCallback [Google Slides]
Assignee: nobody → jmuizelaar
Comment hidden (offtopic)
Comment hidden (offtopic)
Comment hidden (offtopic)
(In reply to Kartikaya Gupta (email:kats@mozilla.com) (parental leave) from comment #3)
> Bug 1470437 has a reproducible testcase

RUST_BACKTRACE=1 mozregression --launch 2018-08-08 -B debug --pref gfx.webrender.all:true -a https://bug1470437.bmoattachments.org/attachment.cgi?id=8987065
> 0:34.83 INFO: Assertion failure: ret, at /builds/worker/workspace/build/src/gfx/webrender_bindings/Moz2DImageRenderer.cpp:336

This bug still has its original crash reason while MOZ_RELEASE_ASSERT(aBlob.length() > sizeof(size_t)) from comment 15 and comment 17 is tracked in bug 1486198.
See Also: → 1486198
Summary: Crash in static bool mozilla::wr::Moz2DRenderCallback [Google Slides] → Crash in static bool mozilla::wr::Moz2DRenderCallback MOZ_RELEASE_ASSERT(ret) [Google Slides]
Comment hidden (offtopic)
Comment hidden (offtopic)
(In reply to Mayank Bansal from comment #17)
> I got crashes like  these when opening blank google sheets and typing something. But its not 100% reproducible.
> https://crash-stats.mozilla.com/report/index/85aaeae6-440d-4664-ac92-35f760180909

Google Slides has the crash reason of bug 1486198. Moving it over there.
Assignee: jmuizelaar → nobody
Blocks: stage-wr-trains
No longer blocks: stage-wr-nightly
Priority: P1 → P2
Summary: Crash in static bool mozilla::wr::Moz2DRenderCallback MOZ_RELEASE_ASSERT(ret) [Google Slides] → Crash in static bool mozilla::wr::Moz2DRenderCallback MOZ_RELEASE_ASSERT(ret)

Updated

9 months ago
Flags: needinfo?(jmuizelaar)
Priority: P2 → P3
Priority: P3 → P2
I'll take a look at this. I can repro some sort of lockup using the testcase in bug 1470437.
Assignee: nobody → kats
The recorded stream has a CreateSimilarDrawTarget with a giant size, and trying to do that operation on a skia draw target fails. Presumably the better thing to do is to fail earlier on the content side when creating the stream.
I'm not sure this is the right approach.
Do you want to (a) try to detect and avoid it on the content side, or (b) handle it more gracefully on the compositor side?

If (a) is your concern that we're going to be creating these temporary drawtargets? Or that it might still work on the content side but fail on the compositor side? Or something else? Another option for (a) side is to somehow extract the logic from skia drawtarget creation where it does the sanity checking of parameters and reuse that.

If you prefer (b) I can try to write something that does that, I just assumed (a) was the way to go.
(In reply to Jeff Muizelaar [:jrmuizel] from comment #27)
> I'm not sure this is the right approach.

To elaborate I don't want to pay the cost of allocating the surface during recording. What does the caller do to handle the error?
On the content side the call stack looks like this:

#01: gfxCallbackDrawable::MakeSurfaceDrawable(gfxContext*, mozilla::gfx::SamplingFilter)[/Users/kats/zspace/mozilla-wr/obj-host-debug/dist/NightlyDebug.app/Contents/MacOS/XUL +0x121888e]
#02: gfxCallbackDrawable::Draw(gfxContext*, mozilla::gfx::RectTyped<mozilla::gfx::UnknownUnits, double> const&, mozilla::gfx::ExtendMode, mozilla::gfx::SamplingFilter, double, mozilla::gfx::BaseMatrix<double> const&)[/Users/kats/zspace/mozilla-wr/obj-host-debug/dist/NightlyDebug.app/Contents/MacOS/XUL +0x1218bb8]
#03: gfxUtils::DrawPixelSnapped(gfxContext*, gfxDrawable*, mozilla::gfx::SizeTyped<mozilla::gfx::UnknownUnits, double> const&, mozilla::image::ImageRegion const&, mozilla::gfx::SurfaceFormat, mozilla::gfx::SamplingFilter, unsigned int, double, bool)[/Users/kats/zspace/mozilla-wr/obj-host-debug/dist/NightlyDebug.app/Contents/MacOS/XUL +0x12c32fd]
#04: mozilla::image::DynamicImage::Draw(gfxContext*, mozilla::gfx::IntSizeTyped<mozilla::gfx::UnknownUnits> const&, mozilla::image::ImageRegion const&, unsigned int, mozilla::gfx::SamplingFilter, mozilla::Maybe<mozilla::SVGImageContext> const&, unsigned int, f[/Users/kats/zspace/mozilla-wr/obj-host-debug/dist/NightlyDebug.app/Contents/MacOS/XUL +0x1340227]
#05: DrawImageInternal(gfxContext&, nsPresContext*, imgIContainer*, mozilla::gfx::SamplingFilter, nsRect const&, nsRect const&, nsPoint const&, nsRect const&, mozilla::Maybe<mozilla::SVGImageContext> const&, unsigned int, mozilla::gfx::ExtendMode, float)[/Users/kats/zspace/mozilla-wr/obj-host-debug/dist/NightlyDebug.app/Contents/MacOS/XUL +0x388dd65]
#06: nsLayoutUtils::DrawImage(gfxContext&, mozilla::ComputedStyle*, nsPresContext*, imgIContainer*, mozilla::gfx::SamplingFilter, nsRect const&, nsRect const&, nsPoint const&, nsRect const&, unsigned int, float)[/Users/kats/zspace/mozilla-wr/obj-host-debug/dist/NightlyDebug.app/Contents/MacOS/XUL +0x3890395]
#07: mozilla::nsImageRenderer::Draw(nsPresContext*, gfxContext&, nsRect const&, nsRect const&, nsRect const&, nsPoint const&, nsSize const&, mozilla::gfx::IntRectTyped<mozilla::CSSPixel> const&, float)[/Users/kats/zspace/mozilla-wr/obj-host-debug/dist/NightlyDebug.app/Contents/MacOS/XUL +0x3c22970]
#08: mozilla::nsImageRenderer::DrawLayer(nsPresContext*, gfxContext&, nsRect const&, nsRect const&, nsPoint const&, nsRect const&, nsSize const&, float)[/Users/kats/zspace/mozilla-wr/obj-host-debug/dist/NightlyDebug.app/Contents/MacOS/XUL +0x3c23e5e]
#09: nsCSSRendering::PaintStyleImageLayerWithSC(nsCSSRendering::PaintBGParams const&, gfxContext&, mozilla::ComputedStyle*, nsStyleBorder const&)[/Users/kats/zspace/mozilla-wr/obj-host-debug/dist/NightlyDebug.app/Contents/MacOS/XUL +0x3bbe52a]
#10: nsCSSRendering::PaintStyleImageLayer(nsCSSRendering::PaintBGParams const&, gfxContext&)[/Users/kats/zspace/mozilla-wr/obj-host-debug/dist/NightlyDebug.app/Contents/MacOS/XUL +0x3bbd26e]
#11: nsDisplayBackgroundImage::PaintInternal(nsDisplayListBuilder*, gfxContext*, nsRect const&, nsRect*)[/Users/kats/zspace/mozilla-wr/obj-host-debug/dist/NightlyDebug.app/Contents/MacOS/XUL +0x3be8c50]
#12: nsDisplayCanvasBackgroundImage::Paint(nsDisplayListBuilder*, gfxContext*)[/Users/kats/zspace/mozilla-wr/obj-host-debug/dist/NightlyDebug.app/Contents/MacOS/XUL +0x39141ad]
#13: mozilla::layers::PaintItemByDrawTarget(nsDisplayItem*, mozilla::gfx::DrawTarget*, mozilla::gfx::PointTyped<mozilla::LayoutDevicePixel, float> const&, nsDisplayListBuilder*, RefPtr<mozilla::layers::BasicLayerManager> const&, mozilla::gfx::SizeTyped<mozilla[/Users/kats/zspace/mozilla-wr/obj-host-debug/dist/NightlyDebug.app/Contents/MacOS/XUL +0x10c8521]
#14: mozilla::layers::WebRenderCommandBuilder::GenerateFallbackData(nsDisplayItem*, mozilla::wr::DisplayListBuilder&, mozilla::wr::IpcResourceUpdateQueue&, mozilla::layers::StackingContextHelper const&, nsDisplayListBuilder*, mozilla::gfx::RectTyped<mozilla::L[/Users/kats/zspace/mozilla-wr/obj-host-debug/dist/NightlyDebug.app/Contents/MacOS/XUL +0x10c72a6]

And the first couple of frames gracefully handle not getting back a drawtarget by just skipping the drawing. So we end up with a blank page on the testcase, which is the same as we get with non-WR.

I can try to extract the logic for determining if the surface allocation parameters are valid and just use that without actually allocating the surface.
Is that the callstack for the crashtest?
Flags: needinfo?(kats)
Yeah. I don't have any other STR beyond that test case. it's possible there are other causes.
Flags: needinfo?(kats)
Ah, I see now that the test case use -moz-element. I'd be interested to know what mstange's thoughts on the best way to fix this are.
Flags: needinfo?(mstange)
Here's an alternative fix that works on the test case as well, and avoids the texture allocation by testing the size. If this approach is acceptable I can update the phabricator patch.
Attachment #9016717 - Attachment description: cancreate → Alternative part 1
There should be an optimization that avoids allocating a surface in the case where the image is not repeated. I don't know if that optimization still exists.
It would also to be good to have a size check in -moz-element-specific code that refuses to create "dynamic images" of sizes that are larger than what we would allow for regular pixel images.
And we probably want to keep some detection in the DrawTargetRecording code for large DTs, but maybe it should just be a crash so that we can root out badly-behaved consumers.
Flags: needinfo?(mstange)
(In reply to Markus Stange [:mstange] from comment #35)
> There should be an optimization that avoids allocating a surface in the case
> where the image is not repeated. I don't know if that optimization still
> exists.

I looked for something like this but it's hard to look for something that may or may not exist. I didn't find anything but that doesn't mean much.

> It would also to be good to have a size check in -moz-element-specific code
> that refuses to create "dynamic images" of sizes that are larger than what
> we would allow for regular pixel images.

Just so I'm clear, does this mean nsImageRenderer::DrawableForElement should return null for large image sizes? I'm not too familiar with this code. It looks like that function is also invoked from some border rendering code, do we want that affected as well? If not the check could go at the relelvant call site [1].

> And we probably want to keep some detection in the DrawTargetRecording code
> for large DTs, but maybe it should just be a crash so that we can root out
> badly-behaved consumers.

Ok, but we should avoid crashing on the gpu side, right? It's better to crash on the content side when the bad consumer tries to create the DrawTargetRecording, that way we get possibly-useful stacks. So we'd want something like my patch (the one in phabricator) but that crashes instead of returns null when the underlying drawtarget returns null.

[1] https://searchfox.org/mozilla-central/rev/39cb1e96cf97713c444c5a0404d4f84627aee85d/layout/painting/nsImageRenderer.cpp#507
Flags: needinfo?(mstange)
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #36)
> (In reply to Markus Stange [:mstange] from comment #35)
> > There should be an optimization that avoids allocating a surface in the case
> > where the image is not repeated. I don't know if that optimization still
> > exists.
> 
> I looked for something like this but it's hard to look for something that
> may or may not exist. I didn't find anything but that doesn't mean much.

We create a gfxCallbackDrawable for drawing -moz-element here: https://searchfox.org/mozilla-central/rev/6e0e603f4852b8e571e5b8ae133e772b18b6016e/layout/svg/nsSVGIntegrationUtils.cpp#1314
And here's gfxCallbackDrawable::Draw: https://searchfox.org/mozilla-central/rev/6e0e603f4852b8e571e5b8ae133e772b18b6016e/gfx/thebes/gfxDrawable.cpp#161-172
It only calls MakeSurfaceDrawable if the image is repeated (and in a few other cases). Otherwise, it draws directly onto the destination. At least that's the idea.

> > It would also to be good to have a size check in -moz-element-specific code
> > that refuses to create "dynamic images" of sizes that are larger than what
> > we would allow for regular pixel images.
> 
> Just so I'm clear, does this mean nsImageRenderer::DrawableForElement should
> return null for large image sizes? I'm not too familiar with this code. It
> looks like that function is also invoked from some border rendering code, do
> we want that affected as well? If not the check could go at the relelvant
> call site [1].

I think it would be reasonable to do that to nsImageRenderer::DrawableForElement in general.

> > And we probably want to keep some detection in the DrawTargetRecording code
> > for large DTs, but maybe it should just be a crash so that we can root out
> > badly-behaved consumers.
> 
> Ok, but we should avoid crashing on the gpu side, right? It's better to
> crash on the content side when the bad consumer tries to create the
> DrawTargetRecording, that way we get possibly-useful stacks. So we'd want
> something like my patch (the one in phabricator) but that crashes instead of
> returns null when the underlying drawtarget returns null.

"when the underlying drawtarget would return null", yes. We don't actually want to allocate pixels on the recording side, but we do want the size check on the recording side.
Flags: needinfo?(mstange)
So it looks like we're hitting that MakeSurfaceDrawable call because the image is repeated, that seems to happen at [1]. The aExtendMode argument to that function is set to CLAMP but doTile is true.

[1] https://searchfox.org/mozilla-central/rev/17f55aee76b7c4610a974cffd3453454e0c8de7b/layout/base/nsLayoutUtils.cpp#6948
Attachment #9015981 - Attachment is obsolete: true
Badly-behaved consumers of DrawTargetRecording can trigger recording of
draw calls that will fail to allocate required draw targets when the
recording is replayed. This patch tries to guard against this by
detecting these situations at record-time rather than crashing at
replay-time. When such a situation is detected, it will crash (for
content processes, to catch such scenarios) or gracefully fail (for
other processes).

Depends on D8257

Comment 41

7 months ago
Pushed by kgupta@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/e80a0b2502e9
Add a gfxCriticalNote to provide more details on replay failure. r=jrmuizel
https://hg.mozilla.org/integration/autoland/rev/75de649833be
Robustify DrawTargetRecording codepaths that create new drawtargets. r=mstange
https://hg.mozilla.org/integration/autoland/rev/dae86d421021
Prevent creation of DynamicImage instances that are excessively large. r=mstange
Depends on: 1508811
Depends on: 1508822
Depends on: 1513133
Depends on: 1524418
You need to log in before you can comment on or make changes to this bug.