Crash in mozilla::layers::BasicCompositor::DrawQuad

RESOLVED FIXED in Firefox 51

Status

()

--
critical
RESOLVED FIXED
3 years ago
2 years ago

People

(Reporter: milan, Assigned: jerry)

Tracking

(Blocks: 1 bug, {crash})

Trunk
mozilla51
x86
Windows 10
crash
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox48 wontfix, firefox49 wontfix, firefox-esr45 wontfix, firefox50 unaffected, firefox51 fixed, firefox52 wontfix)

Details

(Whiteboard: [gfx-noted], crash signature)

Attachments

(1 attachment, 4 obsolete attachments)

This bug was filed from the Socorro interface and is 
report bp-f12bfd98-be91-47c0-89ca-d15ef2160824.
=============================================================

Follow up on bug 1187464.  These crashes should only show up in nightly and aurora, but this particular crash has some information we didn't have last time - we are in fact in the basic compositor with DataTextureSourceD3D11, so something is out of sync after a device reset.
Whiteboard: [gfx-noted]
A quick look suggests that the media decoder is shipping D3D11 frames after a device reset. We're supposed to protect against this by destroying layers and barring old or incompatible textures from getting attached in CompositableTransactionParent.

But something's not working right, the texture is still leaking into the D3D11 compositor.
Crash volume for signature 'mozilla::layers::BasicCompositor::DrawQuad':
 - nightly (version 51): 10 crashes from 2016-08-01.
 - aurora  (version 50): 4 crashes from 2016-08-01.
 - beta    (version 49): 21 crashes from 2016-08-02.
 - release (version 48): 25 crashes from 2016-07-25.
 - esr     (version 45): 1975 crashes from 2016-05-02.

Crash volume on the last weeks (Week N is from 08-22 to 08-28):
            W. N-1  W. N-2  W. N-3
 - nightly       4       0       1
 - aurora        1       1       1
 - beta          8       3       3
 - release      12       9       1
 - esr         204     217     178

Affected platforms: Windows, Mac OS X, Linux

Crash rank on the last 7 days:
           Browser     Content   Plugin
 - nightly #110
 - aurora  #1573
 - beta    #1506
 - release #1812
 - esr     #37
status-firefox48: --- → affected
status-firefox49: --- → affected
status-firefox50: --- → affected
status-firefox-esr45: --- → affected
(In reply to David Anderson [:dvander] from comment #1)
> A quick look suggests that the media decoder is shipping D3D11 frames after
> a device reset. We're supposed to protect against this by destroying layers
> and barring old or incompatible textures from getting attached in
> CompositableTransactionParent.
> 
> But something's not working right, the texture is still leaking into the
> D3D11 compositor.

To be clear, in CompositableTransactionParent, https://dxr.mozilla.org/mozilla-central/source/gfx/layers/ipc/CompositableTransactionParent.cpp#80, we don't need to worry about early return if compositor->GetCompositor() is null?  Is it possible we have a valid compositor, but it's already the "new one"?
If the compositable hasn't been attached to any compositor yet, the compositor will be null. The attach happens in a separate operation, which we also try to guard against:

https://dxr.mozilla.org/mozilla-central/rev/7963ebdd52b93f96b812eff2eab8d94097147b9c/gfx/layers/ipc/LayerTransactionParent.cpp#625

But maybe this is not robust enough. Possibly we should attach a device counter to all layer transactions so they can be verified against the compositor.

(On a side note, wow that's a super confusing check - "IsValid" actually returns whether it's invalid. There must have been a refactoring gone wrong on my part... will fix.)
Assignee: nobody → hshih
Status: NEW → ASSIGNED
If we set the basicCompositor at this line:
https://hg.mozilla.org/mozilla-central/annotate/1a5b53a831e5a6c20de1b081c774feb3ff76756c/gfx/layers/composite/ImageHost.cpp#l331

The textureHost::Lock() should failed at this line:
https://hg.mozilla.org/mozilla-central/annotate/1a5b53a831e5a6c20de1b081c774feb3ff76756c/gfx/layers/composite/ImageHost.cpp#l335

So I still can't know why there is a path for BasicCompositor::DrawQuad with d3d11TextureHost.

Let me digging more.
Oh. I found the problem! \o/

This change break the protection of driver removed. The dxr doesn't show the latest code before.
https://hg.mozilla.org/mozilla-central/diff/7b1349cb7487/gfx/layers/d3d11/TextureD3D11.cpp#l736

Let me think about how to fix this.
Attachment #8786284 - Attachment is obsolete: true
Hi Kip,
I still can't find the usage for the removing of compositor checking:
https://hg.mozilla.org/mozilla-central/diff/7b1349cb7487/gfx/layers/d3d11/TextureD3D11.cpp#l736
I try to search the lock() call usage in that webvr implementations, but I can't see that.
Could you please tell more about that?

TextureHostD3D11 uses that checking to prevent the mismatched textureHost and compositor pair(e.g. d3d11Texture and basicCompositor). If we remove the checking, we might hit an assert like bp-f12bfd98-be91-47c0-89ca-d15ef2160824.
Flags: needinfo?(kgilbert)
Given that we've had these crashes before and it's regressed, then it might be worth trying to think about a way to add tests for this.

I guess we'd need to fake a error that causes us to recreate the compositor and fallback to Basic. Maybe we could add something to the PCompositor protocol that forces a compositor recreation and tries to composite with the current layer tree?

Could be difficult, but seems like a nice thing to have if it's possible.
(In reply to Jerry Shih[:jerry] (UTC+8) from comment #8)
> Hi Kip,
> I still can't find the usage for the removing of compositor checking:
> https://hg.mozilla.org/mozilla-central/diff/7b1349cb7487/gfx/layers/d3d11/
> TextureD3D11.cpp#l736
> I try to search the lock() call usage in that webvr implementations, but I
> can't see that.
> Could you please tell more about that?
> 
> TextureHostD3D11 uses that checking to prevent the mismatched textureHost
> and compositor pair(e.g. d3d11Texture and basicCompositor). If we remove the
> checking, we might hit an assert like
> bp-f12bfd98-be91-47c0-89ca-d15ef2160824.

In the case of WebVR, we do not use a compositor object at all and VR is not supported without DX11 and a discrete GPU.  In this case mCompositor will always be null.  The VR equivalent currently uses DX11 calls only.  Perhaps we could come up with a higher level interface that could be shared between the 2d compositor types and the VR compositing functions?  Alternately, would it help to have a boolean set by the VR case to inform TextureHost that mCompositor is expected to be null?
Flags: needinfo?(kgilbert)
If there is a special case for "mCompositor is null and that's OK", it sounds like a boolean to tag that case would be a way to go.
There are to cases for video update.

a) If the image compositable hasn't been attached, the attachment will be done at:
https://dxr.mozilla.org/mozilla-central/rev/7963ebdd52b93f96b812eff2eab8d94097147b9c/gfx/layers/ipc/LayerTransactionParent.cpp#625
and we have a guard checking for driver-reset.

b) If the image compositable is already attached and chrome side just receives the new textureHosts, the textureHosts will be set at:
https://dxr.mozilla.org/mozilla-central/rev/3ba5426a03b495b6417fffb872d42874edb80855/gfx/layers/ipc/CompositableTransactionParent.cpp#80
We also have a driver-reset guard checking here, too.

For case b, bug 1298501 fix the typo for that invalid compositor checking. Before that change, I think that's the reason why we still have the combination of basicCompositor and d3d11TextureHost.

I add some patch in textureHost::Lock() for the last-ditch driver-reset checking before. Maybe that changes could reverted after bug 1298501.
Depends on: 1298501
Comment hidden (obsolete)
After bug 1298501, maybe there will be no incompatible textureHost/compositor pair later.
However, I think we still could have a last-ditch checking before the DrawQuad() call.
This patch is try to fix the checking which we had bfore in WebVR case.
Attachment #8787544 - Flags: review?(nical.bugzilla)
Attachment #8787544 - Flags: review?(milan)
Attachment #8787544 - Flags: review?(dvander)
Attachment #8787542 - Attachment is obsolete: true
Attachment #8787542 - Flags: review?(nical.bugzilla)
Attachment #8787542 - Flags: review?(milan)
Attachment #8787542 - Flags: review?(dvander)
Comment on attachment 8787544 [details] [diff] [review]
use different lock/unlock function for WebVR composition path. v2

Review of attachment 8787544 [details] [diff] [review]:
-----------------------------------------------------------------

Overall, this is fine.  Minor changes to consider:

Naming things - while the "I don't need the compositor" is currently (only) used by VR, it seems that at the level of TextureHost, the distinction is LockWithoutCompositor and UnlockWithoutCompositor or some such, rather than LockVR/UnlockVR.  TextureHost doesn't and shouldn't know about VR, so polluting its function names may be better avoided.  In other words, name things for what they do, not how they're currently used.

The AutoLock... classes are small enough that it probably doesn't make sense to have a base and derived classes, but if we start adding more things in there, it would be something to consider.
Attachment #8787544 - Flags: review?(milan) → review+
The comment 14 is wrong.
The behavior of [1] in bug 1298501 is eventually the same as before. So, it's still a path go with an incompatible textureHost/compositor pair.

[1]
https://bug1298501.bmoattachments.org/attachment.cgi?id=8785454
refine the comment and function naming for comment 15.
Attachment #8787544 - Attachment is obsolete: true
Attachment #8787544 - Flags: review?(nical.bugzilla)
Attachment #8787544 - Flags: review?(dvander)
fix the typo of lock/unlock function prototypes in textureHost.h.
Attachment #8788081 - Attachment is obsolete: true

Comment 20

3 years ago
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/d64ac5fa8914
Use different lock/unlock function for WebVR composition path. r=milan
Keywords: checkin-needed
Blocks: 1300699

Comment 21

3 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/d64ac5fa8914
Status: ASSIGNED → RESOLVED
Last Resolved: 3 years ago
status-firefox51: affected → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla51
Not a very big volume but we could take an uplift to 50. Thanks
status-firefox48: affected → wontfix
status-firefox49: affected → wontfix
status-firefox-esr45: affected → wontfix
Flags: needinfo?(hshih)
In comment 7, the change

https://hg.mozilla.org/mozilla-central/diff/7b1349cb7487/gfx/layers/d3d11/TextureD3D11.cpp#l736

is in firefox51. So, I think this patch doesn't need to uplift.

Sylvestre, do you see the similar drawQuad() crash report in ff50?
Flags: needinfo?(hshih) → needinfo?(sledru)
On:
https://crash-stats.mozilla.com/signature/?signature=mozilla%3A%3Alayers%3A%3ABasicCompositor%3A%3ADrawQuad
I can see 4 crashes over the last 7 days.

We see it on many different older version (44, 45, etc)
Maybe this is a different issue ?!
Flags: needinfo?(sledru)
OK, thanks. Are you going to report a new bug for the other? thanks
Flags: needinfo?(sledru)
50 is unaffected based on comment 25.
status-firefox50: affected → unaffected

Updated

3 years ago
Blocks: 1297204
Crash volume for signature 'mozilla::layers::BasicCompositor::DrawQuad':
 - nightly (version 52): 2 crashes from 2016-09-19.
 - aurora  (version 51): 0 crashes from 2016-09-19.
 - beta    (version 50): 29 crashes from 2016-09-20.
 - release (version 49): 179 crashes from 2016-09-05.
 - esr     (version 45): 2546 crashes from 2016-07-25.

Crash volume on the last weeks (Week N is from 10-17 to 10-23):
            W. N-1  W. N-2  W. N-3  W. N-4
 - nightly       0       2       0       0
 - aurora        0       0       0       0
 - beta          2      10       9       3
 - release      66      57      36       7
 - esr         283     248     296     312

Affected platform: Windows

Crash rank on the last 7 days:
           Browser   Content     Plugin
 - nightly
 - aurora
 - beta    #2314     #4182
 - release #841
 - esr     #33
status-firefox52: --- → affected
Sounds like nothing useful is going to come from this bug for 52 at this point.
status-firefox52: affected → wontfix
You need to log in before you can comment on or make changes to this bug.