Crash in mozilla::layers::BasicCompositor::DrawQuad

RESOLVED FIXED in Firefox 51

Status

()

Core
Graphics
--
critical
RESOLVED FIXED
a year ago
11 months ago

People

(Reporter: milan, Assigned: jerry)

Tracking

(Blocks: 1 bug, {crash})

Trunk
mozilla51
x86
Windows 10
crash
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox48 wontfix, firefox49 wontfix, firefox-esr45 wontfix, firefox50 unaffected, firefox51 fixed, firefox52 wontfix)

Details

(Whiteboard: [gfx-noted], crash signature)

Attachments

(1 attachment, 4 obsolete attachments)

(Reporter)

Description

a year ago
This bug was filed from the Socorro interface and is 
report bp-f12bfd98-be91-47c0-89ca-d15ef2160824.
=============================================================

Follow up on bug 1187464.  These crashes should only show up in nightly and aurora, but this particular crash has some information we didn't have last time - we are in fact in the basic compositor with DataTextureSourceD3D11, so something is out of sync after a device reset.
(Reporter)

Updated

a year ago
Whiteboard: [gfx-noted]
(Reporter)

Updated

a year ago
Depends on: 1187464
A quick look suggests that the media decoder is shipping D3D11 frames after a device reset. We're supposed to protect against this by destroying layers and barring old or incompatible textures from getting attached in CompositableTransactionParent.

But something's not working right, the texture is still leaking into the D3D11 compositor.
Crash volume for signature 'mozilla::layers::BasicCompositor::DrawQuad':
 - nightly (version 51): 10 crashes from 2016-08-01.
 - aurora  (version 50): 4 crashes from 2016-08-01.
 - beta    (version 49): 21 crashes from 2016-08-02.
 - release (version 48): 25 crashes from 2016-07-25.
 - esr     (version 45): 1975 crashes from 2016-05-02.

Crash volume on the last weeks (Week N is from 08-22 to 08-28):
            W. N-1  W. N-2  W. N-3
 - nightly       4       0       1
 - aurora        1       1       1
 - beta          8       3       3
 - release      12       9       1
 - esr         204     217     178

Affected platforms: Windows, Mac OS X, Linux

Crash rank on the last 7 days:
           Browser     Content   Plugin
 - nightly #110
 - aurora  #1573
 - beta    #1506
 - release #1812
 - esr     #37
status-firefox48: --- → affected
status-firefox49: --- → affected
status-firefox50: --- → affected
status-firefox-esr45: --- → affected
(Reporter)

Comment 3

a year ago
(In reply to David Anderson [:dvander] from comment #1)
> A quick look suggests that the media decoder is shipping D3D11 frames after
> a device reset. We're supposed to protect against this by destroying layers
> and barring old or incompatible textures from getting attached in
> CompositableTransactionParent.
> 
> But something's not working right, the texture is still leaking into the
> D3D11 compositor.

To be clear, in CompositableTransactionParent, https://dxr.mozilla.org/mozilla-central/source/gfx/layers/ipc/CompositableTransactionParent.cpp#80, we don't need to worry about early return if compositor->GetCompositor() is null?  Is it possible we have a valid compositor, but it's already the "new one"?
If the compositable hasn't been attached to any compositor yet, the compositor will be null. The attach happens in a separate operation, which we also try to guard against:

https://dxr.mozilla.org/mozilla-central/rev/7963ebdd52b93f96b812eff2eab8d94097147b9c/gfx/layers/ipc/LayerTransactionParent.cpp#625

But maybe this is not robust enough. Possibly we should attach a device counter to all layer transactions so they can be verified against the compositor.

(On a side note, wow that's a super confusing check - "IsValid" actually returns whether it's invalid. There must have been a refactoring gone wrong on my part... will fix.)
(Assignee)

Updated

a year ago
Assignee: nobody → hshih
Status: NEW → ASSIGNED
(Assignee)

Comment 5

a year ago
If we set the basicCompositor at this line:
https://hg.mozilla.org/mozilla-central/annotate/1a5b53a831e5a6c20de1b081c774feb3ff76756c/gfx/layers/composite/ImageHost.cpp#l331

The textureHost::Lock() should failed at this line:
https://hg.mozilla.org/mozilla-central/annotate/1a5b53a831e5a6c20de1b081c774feb3ff76756c/gfx/layers/composite/ImageHost.cpp#l335

So I still can't know why there is a path for BasicCompositor::DrawQuad with d3d11TextureHost.

Let me digging more.
(Assignee)

Comment 6

a year ago
Created attachment 8786284 [details] [diff] [review]
don't choose image again in ImageHost::Lock(). v1
(Assignee)

Comment 7

a year ago
Oh. I found the problem! \o/

This change break the protection of driver removed. The dxr doesn't show the latest code before.
https://hg.mozilla.org/mozilla-central/diff/7b1349cb7487/gfx/layers/d3d11/TextureD3D11.cpp#l736

Let me think about how to fix this.
(Assignee)

Updated

a year ago
Attachment #8786284 - Attachment is obsolete: true
(Assignee)

Comment 8

a year ago
Hi Kip,
I still can't find the usage for the removing of compositor checking:
https://hg.mozilla.org/mozilla-central/diff/7b1349cb7487/gfx/layers/d3d11/TextureD3D11.cpp#l736
I try to search the lock() call usage in that webvr implementations, but I can't see that.
Could you please tell more about that?

TextureHostD3D11 uses that checking to prevent the mismatched textureHost and compositor pair(e.g. d3d11Texture and basicCompositor). If we remove the checking, we might hit an assert like bp-f12bfd98-be91-47c0-89ca-d15ef2160824.
Flags: needinfo?(kgilbert)
Given that we've had these crashes before and it's regressed, then it might be worth trying to think about a way to add tests for this.

I guess we'd need to fake a error that causes us to recreate the compositor and fallback to Basic. Maybe we could add something to the PCompositor protocol that forces a compositor recreation and tries to composite with the current layer tree?

Could be difficult, but seems like a nice thing to have if it's possible.
(In reply to Jerry Shih[:jerry] (UTC+8) from comment #8)
> Hi Kip,
> I still can't find the usage for the removing of compositor checking:
> https://hg.mozilla.org/mozilla-central/diff/7b1349cb7487/gfx/layers/d3d11/
> TextureD3D11.cpp#l736
> I try to search the lock() call usage in that webvr implementations, but I
> can't see that.
> Could you please tell more about that?
> 
> TextureHostD3D11 uses that checking to prevent the mismatched textureHost
> and compositor pair(e.g. d3d11Texture and basicCompositor). If we remove the
> checking, we might hit an assert like
> bp-f12bfd98-be91-47c0-89ca-d15ef2160824.

In the case of WebVR, we do not use a compositor object at all and VR is not supported without DX11 and a discrete GPU.  In this case mCompositor will always be null.  The VR equivalent currently uses DX11 calls only.  Perhaps we could come up with a higher level interface that could be shared between the 2d compositor types and the VR compositing functions?  Alternately, would it help to have a boolean set by the VR case to inform TextureHost that mCompositor is expected to be null?
Flags: needinfo?(kgilbert)
(Reporter)

Comment 11

a year ago
If there is a special case for "mCompositor is null and that's OK", it sounds like a boolean to tag that case would be a way to go.
(Assignee)

Comment 12

a year ago
There are to cases for video update.

a) If the image compositable hasn't been attached, the attachment will be done at:
https://dxr.mozilla.org/mozilla-central/rev/7963ebdd52b93f96b812eff2eab8d94097147b9c/gfx/layers/ipc/LayerTransactionParent.cpp#625
and we have a guard checking for driver-reset.

b) If the image compositable is already attached and chrome side just receives the new textureHosts, the textureHosts will be set at:
https://dxr.mozilla.org/mozilla-central/rev/3ba5426a03b495b6417fffb872d42874edb80855/gfx/layers/ipc/CompositableTransactionParent.cpp#80
We also have a driver-reset guard checking here, too.

For case b, bug 1298501 fix the typo for that invalid compositor checking. Before that change, I think that's the reason why we still have the combination of basicCompositor and d3d11TextureHost.

I add some patch in textureHost::Lock() for the last-ditch driver-reset checking before. Maybe that changes could reverted after bug 1298501.
Depends on: 1298501
Comment hidden (obsolete)
(Assignee)

Comment 14

a year ago
Created attachment 8787544 [details] [diff] [review]
use different lock/unlock function for WebVR composition path. v2

After bug 1298501, maybe there will be no incompatible textureHost/compositor pair later.
However, I think we still could have a last-ditch checking before the DrawQuad() call.
This patch is try to fix the checking which we had bfore in WebVR case.
Attachment #8787544 - Flags: review?(nical.bugzilla)
Attachment #8787544 - Flags: review?(milan)
Attachment #8787544 - Flags: review?(dvander)
(Assignee)

Updated

a year ago
Attachment #8787542 - Attachment is obsolete: true
Attachment #8787542 - Flags: review?(nical.bugzilla)
Attachment #8787542 - Flags: review?(milan)
Attachment #8787542 - Flags: review?(dvander)
(Reporter)

Comment 15

a year ago
Comment on attachment 8787544 [details] [diff] [review]
use different lock/unlock function for WebVR composition path. v2

Review of attachment 8787544 [details] [diff] [review]:
-----------------------------------------------------------------

Overall, this is fine.  Minor changes to consider:

Naming things - while the "I don't need the compositor" is currently (only) used by VR, it seems that at the level of TextureHost, the distinction is LockWithoutCompositor and UnlockWithoutCompositor or some such, rather than LockVR/UnlockVR.  TextureHost doesn't and shouldn't know about VR, so polluting its function names may be better avoided.  In other words, name things for what they do, not how they're currently used.

The AutoLock... classes are small enough that it probably doesn't make sense to have a base and derived classes, but if we start adding more things in there, it would be something to consider.
Attachment #8787544 - Flags: review?(milan) → review+
(Assignee)

Comment 16

a year ago
The comment 14 is wrong.
The behavior of [1] in bug 1298501 is eventually the same as before. So, it's still a path go with an incompatible textureHost/compositor pair.

[1]
https://bug1298501.bmoattachments.org/attachment.cgi?id=8785454
(Assignee)

Comment 17

a year ago
Created attachment 8788081 [details] [diff] [review]
use different lock/unlock function for WebVR composition path. v3. r=milan

refine the comment and function naming for comment 15.
(Assignee)

Updated

a year ago
Attachment #8787544 - Attachment is obsolete: true
Attachment #8787544 - Flags: review?(nical.bugzilla)
Attachment #8787544 - Flags: review?(dvander)
(Assignee)

Comment 18

a year ago
Created attachment 8788090 [details] [diff] [review]
use different lock/unlock function for WebVR composition path. v4. r=milan

fix the typo of lock/unlock function prototypes in textureHost.h.
(Assignee)

Updated

a year ago
Attachment #8788081 - Attachment is obsolete: true
(Assignee)

Comment 19

a year ago
Please land the attachment 8788090 [details] [diff] [review] to m-c.

try:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=575b2a936380
Keywords: checkin-needed

Comment 20

a year ago
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/d64ac5fa8914
Use different lock/unlock function for WebVR composition path. r=milan
Keywords: checkin-needed
(Assignee)

Updated

a year ago
Blocks: 1300699

Comment 21

a year ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/d64ac5fa8914
Status: ASSIGNED → RESOLVED
Last Resolved: a year ago
status-firefox51: affected → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla51
Not a very big volume but we could take an uplift to 50. Thanks
status-firefox48: affected → wontfix
status-firefox49: affected → wontfix
status-firefox-esr45: affected → wontfix
Flags: needinfo?(hshih)
(Assignee)

Comment 23

a year ago
In comment 7, the change

https://hg.mozilla.org/mozilla-central/diff/7b1349cb7487/gfx/layers/d3d11/TextureD3D11.cpp#l736

is in firefox51. So, I think this patch doesn't need to uplift.

Sylvestre, do you see the similar drawQuad() crash report in ff50?
Flags: needinfo?(hshih) → needinfo?(sledru)
On:
https://crash-stats.mozilla.com/signature/?signature=mozilla%3A%3Alayers%3A%3ABasicCompositor%3A%3ADrawQuad
I can see 4 crashes over the last 7 days.

We see it on many different older version (44, 45, etc)
Maybe this is a different issue ?!
Flags: needinfo?(sledru)
(Assignee)

Comment 25

a year ago
ff50 crash:

https://crash-stats.mozilla.com/report/index/8f7630ca-37fe-4c31-b4c1-441372160908
https://crash-stats.mozilla.com/report/index/1d8c6d46-ed82-4d78-8c2f-8b9d52160907
https://crash-stats.mozilla.com/report/index/afd3d541-846b-49fc-b53d-83e572160907
These three are crashed at https://hg.mozilla.org/mozilla-unified/annotate/3a0fd1bb116a/gfx/layers/basic/BasicCompositor.cpp#l522 which is different from this bug.
Will have a bug to track this.

But this https://crash-stats.mozilla.com/report/index/01d71a7e-b9f5-40f4-ade6-7ddd22160905 is weird. I think we should not have this crashed path. I will check this, either.

For ff48, I have a quick look for these log.
https://crash-stats.mozilla.com/report/index/125e2231-d423-4295-a066-6d3462160908
https://crash-stats.mozilla.com/report/index/79564e0a-8a2f-4b82-8879-c27f92160906
https://crash-stats.mozilla.com/report/index/edfe96a9-468e-4413-9c64-05a082160905
are not related to this and the fix of bug 1187466.

Let me look more about these logs.
Flags: needinfo?(sledru)
OK, thanks. Are you going to report a new bug for the other? thanks
Flags: needinfo?(sledru)
50 is unaffected based on comment 25.
status-firefox50: affected → unaffected

Updated

a year ago
Blocks: 1297204
Crash volume for signature 'mozilla::layers::BasicCompositor::DrawQuad':
 - nightly (version 52): 2 crashes from 2016-09-19.
 - aurora  (version 51): 0 crashes from 2016-09-19.
 - beta    (version 50): 29 crashes from 2016-09-20.
 - release (version 49): 179 crashes from 2016-09-05.
 - esr     (version 45): 2546 crashes from 2016-07-25.

Crash volume on the last weeks (Week N is from 10-17 to 10-23):
            W. N-1  W. N-2  W. N-3  W. N-4
 - nightly       0       2       0       0
 - aurora        0       0       0       0
 - beta          2      10       9       3
 - release      66      57      36       7
 - esr         283     248     296     312

Affected platform: Windows

Crash rank on the last 7 days:
           Browser   Content     Plugin
 - nightly
 - aurora
 - beta    #2314     #4182
 - release #841
 - esr     #33
status-firefox52: --- → affected
Sounds like nothing useful is going to come from this bug for 52 at this point.
status-firefox52: affected → wontfix
You need to log in before you can comment on or make changes to this bug.