Closed Bug 1187466 Opened 4 years ago Closed 3 years ago

[Windows] crash in mozilla::layers::BasicCompositor::DrawQuad(mozilla::gfx::RectTyped<T> const&, mozilla::gfx::RectTyped<T> const&, mozilla::layers::EffectChain const&, float, mozilla::gfx::Matrix4x4 const&)

Categories

(Core :: Graphics: Layers, defect, critical)

30 Branch
x86
Windows
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla49
Tracking Status
firefox39 --- wontfix
firefox40 --- wontfix
firefox41 --- wontfix
firefox42 --- wontfix
firefox46 --- wontfix
firefox47 --- wontfix
firefox48 --- fixed
firefox49 --- fixed
firefox-esr45 --- affected
firefox50 --- affected

People

(Reporter: ashughes, Assigned: jerry)

References

Details

(Keywords: crash, topcrash, topcrash-win, Whiteboard: [gfx-noted])

Crash Data

Attachments

(1 file, 2 obsolete files)

This bug was filed from the Socorro interface and is 
report bp-7b27d517-8e78-482b-a580-7b8da2150724.
=============================================================
0 	xul.dll 	mozilla::layers::BasicCompositor::DrawQuad(mozilla::gfx::RectTyped<mozilla::gfx::UnknownUnits> const&, mozilla::gfx::RectTyped<mozilla::gfx::UnknownUnits> const&, mozilla::layers::EffectChain const&, float, mozilla::gfx::Matrix4x4 const&) 	gfx/layers/basic/BasicCompositor.cpp
1 	xul.dll 	mozilla::layers::ImageHost::Composite(mozilla::layers::EffectChain&, float, mozilla::gfx::Matrix4x4 const&, mozilla::gfx::Filter const&, mozilla::gfx::RectTyped<mozilla::gfx::UnknownUnits> const&, nsIntRegion const*) 	gfx/layers/composite/ImageHost.cpp
2 	xul.dll 	mozilla::layers::ImageLayerComposite::RenderLayer(nsIntRect const&) 	gfx/layers/composite/ImageLayerComposite.cpp
3 	xul.dll 	mozilla::layers::RenderLayers<mozilla::layers::ContainerLayerComposite>(mozilla::layers::ContainerLayerComposite*, mozilla::layers::LayerManagerComposite*, mozilla::gfx::IntRectTyped<mozilla::RenderTargetPixel> const&) 	gfx/layers/composite/ContainerLayerComposite.cpp
4 	xul.dll 	mozilla::layers::ContainerRender<mozilla::layers::ContainerLayerComposite>(mozilla::layers::ContainerLayerComposite*, mozilla::layers::LayerManagerComposite*, nsIntRect const&) 	gfx/layers/composite/ContainerLayerComposite.cpp
5 	xul.dll 	mozilla::layers::ContainerLayerComposite::RenderLayer(nsIntRect const&) 	gfx/layers/composite/ContainerLayerComposite.cpp
6 	xul.dll 	mozilla::layers::RenderLayers<mozilla::layers::ContainerLayerComposite>(mozilla::layers::ContainerLayerComposite*, mozilla::layers::LayerManagerComposite*, mozilla::gfx::IntRectTyped<mozilla::RenderTargetPixel> const&) 	gfx/layers/composite/ContainerLayerComposite.cpp
7 	xul.dll 	mozilla::layers::ContainerRender<mozilla::layers::ContainerLayerComposite>(mozilla::layers::ContainerLayerComposite*, mozilla::layers::LayerManagerComposite*, nsIntRect const&) 	gfx/layers/composite
=============================================================
More reports: https://crash-stats.mozilla.com/report/list?product=Firefox&signature=mozilla%3A%3Alayers%3A%3ABasicCompositor%3A%3ADrawQuad%28mozilla%3A%3Agfx%3A%3ARectTyped%3CT%3E+const%26%2C+mozilla%3A%3Agfx%3A%3ARectTyped%3CT%3E+const%26%2C+mozilla%3A%3Alayers%3A%3AEffectChain+const%26%2C+float%2C+mozilla%3A%3Agfx%3A%3AMatrix4x4+const%26%29


Note, I've filed this as a Windows bug but there are reports of this on Mac OS X. I suspect the Mac OS X crashes deserve a different bug for the following reasons:

1. The crash reasons are different.
* Windows shows EXCEPTION_ACCESS_VIOLATION_READ
* Mac shows EXC_BAD_ACCESS/KERN_INVALID_ADDRESS

2. The crashes start with different Firefox versions
* Windows crashes started with Firefox 30
* Mac crashes started with Firefox 39

3. Volume is higher on Mac OS X than it is on Windows, especially when you consider user population.
* Windows has 43% of the crashes with 87% of the users
* Mac OS X has 57% of the crashes with 7% of the users

4. The URLs reported are quite different on Windows compared to Mac 
* Top URL on Windows is Facebook.com with 83% whereas only 2% of the crashes on Mac OS X are with Facebook.com
* Top URL on Mac is Google (Maps and Chrome) with 70% whereas these URLs don't show up at all for Windows users

5. The stacks are slightly different.
* Frame 2 on Windows is in mozilla::layers::ImageLayerComposite::RenderLayer(nsIntRect const&)
* Frame 2 on Mac is in mozilla::layers::CanvasLayerComposite::RenderLayer(nsIntRect const&)

The Mac OS X case is covered by bug 1187464. Feel free to merge these bugs if you think they're the same.
Note that this is the #125 topcrash for Windows-specific crashes. It's #48 overall which I suspect is driven by the Mac-specific case in bug 1187464 (ranked #6 for Mac crashes).
Summary: crash in mozilla::layers::BasicCompositor::DrawQuad(mozilla::gfx::RectTyped<T> const&, mozilla::gfx::RectTyped<T> const&, mozilla::layers::EffectChain const&, float, mozilla::gfx::Matrix4x4 const&) → [Windows] crash in mozilla::layers::BasicCompositor::DrawQuad(mozilla::gfx::RectTyped<T> const&, mozilla::gfx::RectTyped<T> const&, mozilla::layers::EffectChain const&, float, mozilla::gfx::Matrix4x4 const&)
Crash Signature: [@ mozilla::layers::BasicCompositor::DrawQuad(mozilla::gfx::RectTyped<T> const&, mozilla::gfx::RectTyped<T> const&, mozilla::layers::EffectChain const&, float, mozilla::gfx::Matrix4x4 const&)] → [@ mozilla::layers::BasicCompositor::DrawQuad(mozilla::gfx::RectTyped<T> const&, mozilla::gfx::RectTyped<T> const&, mozilla::layers::EffectChain const&, float, mozilla::gfx::Matrix4x4 const&)] [@ mozilla::layers::BasicCompositor::DrawQuad]
See Also: → 1187464
Flags: needinfo?(sotaro.ikeda.g)
Whiteboard: [gfx-noted]
Peter, this is just outside the Top 20 on Beta, but almost in top 10 in Aurora; anybody you have that could take a look?
Flags: needinfo?(howareyou322)
See Also: → 1256517
Jerry, please help to check it.
Assignee: nobody → hshih
Flags: needinfo?(howareyou322)
I will check the crash report first.
Now in top 10 for Aurora. Note that we wouldn't see this as a crash in beta and release, because we turn it into telemetry instead.   Example crash: https://crash-stats.mozilla.com/report/index/b8be9da7-86dc-4011-8f32-9d33d2160325
From comment 5, it shows an interesting log:
|[172][GFX1 21]: Bad for basic with DataTextureSourceD3D11 and 00000000 (t=1.31793e+006)

The log comes from this:
https://hg.mozilla.org/releases/mozilla-aurora/annotate/bf6d2a4b1031/gfx/layers/basic/BasicCompositor.cpp#l514

The texture type is "DataTextureSourceD3D11", but we don't implement the AsSourceBasic() in the class DataTextureSourceD3D11. So we hit the checking.

I would like to check why we pass a DataTextureSourceD3D11 to the BasicCompositor.
I'm trying to force resetting the device while video playing and check whether we hit this assert.
Status: NEW → ASSIGNED
It's my assumption. If the device changes from d3d11 to software while playing a video, we might still receive a DataTextureSourceD3D11 and pass it to BasicCompositor.
Gecko will invalidate all frames when driver reset, but maybe the ImageBridge still the old configuration.

Is it possible?
Flags: needinfo?(nical.bugzilla)
Flags: needinfo?(sotaro.ikeda.g)
Flags: needinfo?(sotaro.ikeda.g)
(In reply to Jerry Shih[:jerry] (UTC+8) from comment #9)
> Is it possible?

Yea, it is possible, I think. Compositor reset seems not directly affect to ImageBridge's TextureClient.

And media framework's implementation seems isolated from all frames invalidation. Compositor backend type is cached in MediaFormatReader::InitLayersBackendType().
  https://dxr.mozilla.org/mozilla-central/source/dom/media/MediaFormatReader.cpp#145
Flags: needinfo?(sotaro.ikeda.g)
As Sotaro said.
Flags: needinfo?(nical.bugzilla)
Bug 1254874 might be used to address the problem. But I am not sure about it.
I will turn to check compatibility between the texture type and the current compositor. And just have a early return before drawQuad.
Try to check the TextureSource type before calling the DrawQuad().

In the crash report: https://crash-stats.mozilla.com/report/index/b8be9da7-86dc-4011-8f32-9d33d2160325
We can see a gfx message like:
  |[172][GFX1 21]: Bad for basic with DataTextureSourceD3D11 and 00000000 (t=1.31793e+006)

We hit the assert when the TextureSource->AsSourceBasic() return nullptr.
https://hg.mozilla.org/releases/mozilla-aurora/annotate/bf6d2a4b1031/gfx/layers/basic/BasicCompositor.cpp#l514

I have an assumption that the backend might change during the video playback. So the TextureSource type might not be compatible with current compositor.
So I add the type checking before the draw call.
Attachment #8741673 - Attachment is obsolete: true
Comment on attachment 8742305 [details] [diff] [review]
check TextureSource before draw quad. v1

Please check comment 14
Attachment #8742305 - Flags: review?(mstange)
It seems like we should prevent this from happening at a point that's closer to the source. Unfortunately I don't know this code well enough to say where that might be.
Matt, can you help?
Flags: needinfo?(matt.woodrow)
Attachment #8742305 - Flags: review?(mstange)
CC'ing dvander, since recovering from a Compositor change is something he's worked on a lot.

It seems like we should be calling SetCompositor() on all TextureHosts when we switch Compositors, and that should destroy any existing TextureSources that don't match.

We also need to make sure that media is being notified on this compositor change so that we can start sending the right texture types for video playback to resume.
Flags: needinfo?(matt.woodrow) → needinfo?(dvander)
(In reply to Matt Woodrow (:mattwoodrow) from comment #17)
> CC'ing dvander, since recovering from a Compositor change is something he's
> worked on a lot.
> 
> It seems like we should be calling SetCompositor() on all TextureHosts when
> we switch Compositors, and that should destroy any existing TextureSources
> that don't match.
> 
> We also need to make sure that media is being notified on this compositor
> change so that we can start sending the right texture types for video
> playback to resume.

Aurora is missing some of the follow-up patches for device resets, namely bug 1256517 and bug 1255711 which I will request uplift on ASAP. That could reduce a lot of these crashes.

That said,

(1) I didn't address the media problem you mentioned. That's still a big TODO, potentially a little trickier than normal content since there is one ImageBridge for multiple compositors.

(2) OMTC currently takes a lazy approach to ensuring Compositables are bound to the right compositor. It would be much cleaner to atomically switch everything attached to all layers, and SetCompositor on new stuff as soon as it comes in. The new could eliminate tons of checks. Layer transactions could also forward along the last known compositor ID, which would make it a lot easier to reject old transactions. Currently we rely on IPDL ordering which might make solving (1) harder.

Maybe nical knows if there's a reason it's lazy right now, making it up-front seems like it would simplify things quite a bit.
Flags: needinfo?(dvander) → needinfo?(nical.bugzilla)
(In reply to David Anderson [:dvander] from comment #18)
> 
> Maybe nical knows if there's a reason it's lazy right now, making it
> up-front seems like it would simplify things quite a bit.

I think that it organically grew into beeing lazy. We'd run into cases where a texture didn't have a compositor in various places in the code and we'd add SetCompositor calls there.
It wasn't a design choice as far as I know, and I have no objections if you want to make it more up-front.
Flags: needinfo?(nical.bugzilla)
Comment on attachment 8742305 [details] [diff] [review]
check TextureSource before draw quad. v1

There is no notification from parent to child side decoder for compositor change. So I just do the simply way that just skip the frame which is not usable by current compositor.
Maybe we could do the notification task later.
Attachment #8742305 - Flags: feedback?(nical.bugzilla)
Comment on attachment 8742305 [details] [diff] [review]
check TextureSource before draw quad. v1

Review of attachment 8742305 [details] [diff] [review]:
-----------------------------------------------------------------

This should not be needed because calling SetCompositor with an incompatible compositor/texture pair should put the TextureHost in a state where it fails to Lock() and we never end up calling DrawQuad with the incompatible TextureSource. I would prefer that we make sure that this holds true for all texture types, rather than add another layer of checks in in the compositor.
Attachment #8742305 - Flags: feedback?(nical.bugzilla) → feedback-
If we can do that (prevent the bad thing from happening), I would want to see gfxDevCrash() on the bad compositor/texture source mixtures.  So, prevent it from happening, but report if it does happen, and make it fatal on nightly and aurora by using gfxDevCrash.
This is currently #27 in Release [2629 crashes], #15 in Aurora [67 crashes], #22 in Nightly [22 crashes] (it doesn't show up in the topcrash report for Beta).
Attachment #8757642 - Flags: review?(nical.bugzilla)
Attachment #8757642 - Flags: review?(dvander)
Attachment #8742305 - Attachment is obsolete: true
Comment on attachment 8757642 [details] [diff] [review]
check compositor status in TextureHost::lock(). v1

Review of attachment 8757642 [details] [diff] [review]:
-----------------------------------------------------------------

With bug 1256517, we have some checking for compositor type in CompositableParentManager::ReceiveCompositableUpdate().
If the driver reset is occurred during scheduling, we will use the incompatible compositor for painting.

There are a lot of checking in SetCompositor()[1][2]. If we set an incompatible compositor(e.g. a BasicCompositor) to a textureHost(e.g. a DXGITextureHostD3D9), we will clear all textureSource in SetCompositor().
But in TextureHost::lock(), gecko still create the textureSource(e.g. a DataTextureSourceD3D9) back[3]. Then this textureSource is still incompatible with current compositor.

So, my solution is that we check the compositor in lock(). If the compositor is nullptr, make the lock() fail.

[1]
https://hg.mozilla.org/mozilla-central/annotate/4d63dde701b47b8661ab7990f197b6b60e543839/gfx/layers/d3d9/TextureD3D9.cpp#l950
[2]
https://hg.mozilla.org/mozilla-central/annotate/4d63dde701b47b8661ab7990f197b6b60e543839/gfx/layers/d3d11/TextureD3D11.cpp#l673
[3]
https://hg.mozilla.org/mozilla-central/annotate/4d63dde701b47b8661ab7990f197b6b60e543839/gfx/layers/d3d11/TextureD3D11.cpp#l700

ReceiveCompositableUpdate
                            |
      ------------------->  |
                            |
       Schedule composition |
                +-----------+
                |           |
                |           |
                |           | The driver reset is
                |           | happened during scheduling
                |           | <----------
                |           |
                |           |
                +---------> |We still use incompatible
                            |compositor during painting
                            |
                            |
                            v
See Also: → 1275179
And I think the attachment 8757642 [details] [diff] [review] could also solve the problem in bug 1275179.
(In reply to Jerry Shih[:jerry] (UTC+8) from comment #25)
> Comment on attachment 8757642 [details] [diff] [review]
...
> There are a lot of checking in SetCompositor()[1][2]. If we set an
> incompatible compositor(e.g. a BasicCompositor) to a textureHost(e.g. a
> DXGITextureHostD3D9), we will clear all textureSource in SetCompositor().
> But in TextureHost::lock(), gecko still create the textureSource(e.g. a
> DataTextureSourceD3D9) back[3]. Then this textureSource is still
> incompatible with current compositor.
...

Good point, Jerry.
Attachment #8757642 - Flags: review?(nical.bugzilla) → review+
This is a top 20 on Windows, I'd like to see it uplifted to 48 Aurora (probably too late for 47 beta at this point, but it would be a good ride along.)  Please nominate once it lands on nightly.  Thanks!
Version: Trunk → 30 Branch
Attachment #8757642 - Flags: review?(dvander) → review+
Duplicate of this bug: 1275179
https://hg.mozilla.org/mozilla-central/rev/98a0044f91bd
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla49
Comment on attachment 8757642 [details] [diff] [review]
check compositor status in TextureHost::lock(). v1

Approval Request Comment
[Feature/regressing bug #]:
Bug 1187466, bug 1275179.

[User impact if declined]:
This is a top 20 crash on Windows. It has chance to crash when we have driver reset during video playback.

[Describe test coverage new/current, TreeHerder]:
land at mozilla49.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=83da2a15425d

[Risks and why]: 
low.
Just have early return for incompatible compositor case.

[String/UUID change made/needed]:
none
Attachment #8757642 - Flags: approval-mozilla-aurora?
Comment on attachment 8757642 [details] [diff] [review]
check compositor status in TextureHost::lock(). v1

Fix a top crash, taking it
Attachment #8757642 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
(In reply to Milan Sreckovic [:milan] from comment #29)
> This is a top 20 on Windows, I'd like to see it uplifted to 48 Aurora
> (probably too late for 47 beta at this point, but it would be a good ride
> along.)  Please nominate once it lands on nightly.  Thanks!

Hi Milan, 47 has been on the release channel for a week. I looked at the # of instances of this crash on release channel and there is only ~30 instances. Given that, I would prefer to wontfix this for Fx47. Please let me know if you have any concerns. Thanks!
Flags: needinfo?(milan)
It's good on 48.
Flags: needinfo?(milan)
Crash volume for signature 'mozilla::layers::BasicCompositor::DrawQuad':
 - nightly (version 50): 2 crashes from 2016-06-06.
 - aurora  (version 49): 19 crashes from 2016-06-07.
 - beta    (version 48): 54 crashes from 2016-06-06.
 - release (version 47): 173 crashes from 2016-05-31.
 - esr     (version 45): 1202 crashes from 2016-04-07.

Crash volume on the last weeks:
             Week N-1   Week N-2   Week N-3   Week N-4   Week N-5   Week N-6   Week N-7
 - nightly          0          0          0          0          2          0          0
 - aurora           3          3          2          1          4          3          0
 - beta             6          8          4         10          9         11          4
 - release         31         21         28         29         21         26          9
 - esr            172        121        147        144        130        103         83

Affected platforms: Windows, Linux
You need to log in before you can comment on or make changes to this bug.