This bug was filed from the Socorro interface and is report bp-f12bfd98-be91-47c0-89ca-d15ef2160824. ============================================================= Follow up on bug 1187464. These crashes should only show up in nightly and aurora, but this particular crash has some information we didn't have last time - we are in fact in the basic compositor with DataTextureSourceD3D11, so something is out of sync after a device reset.
3 years ago
3 years ago
Depends on: 1187464
A quick look suggests that the media decoder is shipping D3D11 frames after a device reset. We're supposed to protect against this by destroying layers and barring old or incompatible textures from getting attached in CompositableTransactionParent. But something's not working right, the texture is still leaking into the D3D11 compositor.
Crash volume for signature 'mozilla::layers::BasicCompositor::DrawQuad': - nightly (version 51): 10 crashes from 2016-08-01. - aurora (version 50): 4 crashes from 2016-08-01. - beta (version 49): 21 crashes from 2016-08-02. - release (version 48): 25 crashes from 2016-07-25. - esr (version 45): 1975 crashes from 2016-05-02. Crash volume on the last weeks (Week N is from 08-22 to 08-28): W. N-1 W. N-2 W. N-3 - nightly 4 0 1 - aurora 1 1 1 - beta 8 3 3 - release 12 9 1 - esr 204 217 178 Affected platforms: Windows, Mac OS X, Linux Crash rank on the last 7 days: Browser Content Plugin - nightly #110 - aurora #1573 - beta #1506 - release #1812 - esr #37
status-firefox48: --- → affected
status-firefox49: --- → affected
status-firefox50: --- → affected
status-firefox-esr45: --- → affected
(In reply to David Anderson [:dvander] from comment #1) > A quick look suggests that the media decoder is shipping D3D11 frames after > a device reset. We're supposed to protect against this by destroying layers > and barring old or incompatible textures from getting attached in > CompositableTransactionParent. > > But something's not working right, the texture is still leaking into the > D3D11 compositor. To be clear, in CompositableTransactionParent, https://dxr.mozilla.org/mozilla-central/source/gfx/layers/ipc/CompositableTransactionParent.cpp#80, we don't need to worry about early return if compositor->GetCompositor() is null? Is it possible we have a valid compositor, but it's already the "new one"?
If the compositable hasn't been attached to any compositor yet, the compositor will be null. The attach happens in a separate operation, which we also try to guard against: https://dxr.mozilla.org/mozilla-central/rev/7963ebdd52b93f96b812eff2eab8d94097147b9c/gfx/layers/ipc/LayerTransactionParent.cpp#625 But maybe this is not robust enough. Possibly we should attach a device counter to all layer transactions so they can be verified against the compositor. (On a side note, wow that's a super confusing check - "IsValid" actually returns whether it's invalid. There must have been a refactoring gone wrong on my part... will fix.)
Assignee: nobody → hshih
Status: NEW → ASSIGNED
If we set the basicCompositor at this line: https://hg.mozilla.org/mozilla-central/annotate/1a5b53a831e5a6c20de1b081c774feb3ff76756c/gfx/layers/composite/ImageHost.cpp#l331 The textureHost::Lock() should failed at this line: https://hg.mozilla.org/mozilla-central/annotate/1a5b53a831e5a6c20de1b081c774feb3ff76756c/gfx/layers/composite/ImageHost.cpp#l335 So I still can't know why there is a path for BasicCompositor::DrawQuad with d3d11TextureHost. Let me digging more.
Oh. I found the problem! \o/ This change break the protection of driver removed. The dxr doesn't show the latest code before. https://hg.mozilla.org/mozilla-central/diff/7b1349cb7487/gfx/layers/d3d11/TextureD3D11.cpp#l736 Let me think about how to fix this.
Attachment #8786284 - Attachment is obsolete: true
Hi Kip, I still can't find the usage for the removing of compositor checking: https://hg.mozilla.org/mozilla-central/diff/7b1349cb7487/gfx/layers/d3d11/TextureD3D11.cpp#l736 I try to search the lock() call usage in that webvr implementations, but I can't see that. Could you please tell more about that? TextureHostD3D11 uses that checking to prevent the mismatched textureHost and compositor pair(e.g. d3d11Texture and basicCompositor). If we remove the checking, we might hit an assert like bp-f12bfd98-be91-47c0-89ca-d15ef2160824.
Given that we've had these crashes before and it's regressed, then it might be worth trying to think about a way to add tests for this. I guess we'd need to fake a error that causes us to recreate the compositor and fallback to Basic. Maybe we could add something to the PCompositor protocol that forces a compositor recreation and tries to composite with the current layer tree? Could be difficult, but seems like a nice thing to have if it's possible.
(In reply to Jerry Shih[:jerry] (UTC+8) from comment #8) > Hi Kip, > I still can't find the usage for the removing of compositor checking: > https://hg.mozilla.org/mozilla-central/diff/7b1349cb7487/gfx/layers/d3d11/ > TextureD3D11.cpp#l736 > I try to search the lock() call usage in that webvr implementations, but I > can't see that. > Could you please tell more about that? > > TextureHostD3D11 uses that checking to prevent the mismatched textureHost > and compositor pair(e.g. d3d11Texture and basicCompositor). If we remove the > checking, we might hit an assert like > bp-f12bfd98-be91-47c0-89ca-d15ef2160824. In the case of WebVR, we do not use a compositor object at all and VR is not supported without DX11 and a discrete GPU. In this case mCompositor will always be null. The VR equivalent currently uses DX11 calls only. Perhaps we could come up with a higher level interface that could be shared between the 2d compositor types and the VR compositing functions? Alternately, would it help to have a boolean set by the VR case to inform TextureHost that mCompositor is expected to be null?
If there is a special case for "mCompositor is null and that's OK", it sounds like a boolean to tag that case would be a way to go.
There are to cases for video update. a) If the image compositable hasn't been attached, the attachment will be done at: https://dxr.mozilla.org/mozilla-central/rev/7963ebdd52b93f96b812eff2eab8d94097147b9c/gfx/layers/ipc/LayerTransactionParent.cpp#625 and we have a guard checking for driver-reset. b) If the image compositable is already attached and chrome side just receives the new textureHosts, the textureHosts will be set at: https://dxr.mozilla.org/mozilla-central/rev/3ba5426a03b495b6417fffb872d42874edb80855/gfx/layers/ipc/CompositableTransactionParent.cpp#80 We also have a driver-reset guard checking here, too. For case b, bug 1298501 fix the typo for that invalid compositor checking. Before that change, I think that's the reason why we still have the combination of basicCompositor and d3d11TextureHost. I add some patch in textureHost::Lock() for the last-ditch driver-reset checking before. Maybe that changes could reverted after bug 1298501.
Depends on: 1298501
After bug 1298501, maybe there will be no incompatible textureHost/compositor pair later. However, I think we still could have a last-ditch checking before the DrawQuad() call. This patch is try to fix the checking which we had bfore in WebVR case.
Comment on attachment 8787544 [details] [diff] [review] use different lock/unlock function for WebVR composition path. v2 Review of attachment 8787544 [details] [diff] [review]: ----------------------------------------------------------------- Overall, this is fine. Minor changes to consider: Naming things - while the "I don't need the compositor" is currently (only) used by VR, it seems that at the level of TextureHost, the distinction is LockWithoutCompositor and UnlockWithoutCompositor or some such, rather than LockVR/UnlockVR. TextureHost doesn't and shouldn't know about VR, so polluting its function names may be better avoided. In other words, name things for what they do, not how they're currently used. The AutoLock... classes are small enough that it probably doesn't make sense to have a base and derived classes, but if we start adding more things in there, it would be something to consider.
Attachment #8787544 - Flags: review?(milan) → review+
The comment 14 is wrong. The behavior of  in bug 1298501 is eventually the same as before. So, it's still a path go with an incompatible textureHost/compositor pair.  https://bug1298501.bmoattachments.org/attachment.cgi?id=8785454
refine the comment and function naming for comment 15.
fix the typo of lock/unlock function prototypes in textureHost.h.
Attachment #8788081 - Attachment is obsolete: true
Please land the attachment 8788090 [details] [diff] [review] to m-c. try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=575b2a936380
Pushed by email@example.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/d64ac5fa8914 Use different lock/unlock function for WebVR composition path. r=milan
Status: ASSIGNED → RESOLVED
Last Resolved: 3 years ago
status-firefox51: affected → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla51
Not a very big volume but we could take an uplift to 50. Thanks
status-firefox48: affected → wontfix
status-firefox49: affected → wontfix
status-firefox-esr45: affected → wontfix
In comment 7, the change https://hg.mozilla.org/mozilla-central/diff/7b1349cb7487/gfx/layers/d3d11/TextureD3D11.cpp#l736 is in firefox51. So, I think this patch doesn't need to uplift. Sylvestre, do you see the similar drawQuad() crash report in ff50?
Flags: needinfo?(hshih) → needinfo?(sledru)
On: https://crash-stats.mozilla.com/signature/?signature=mozilla%3A%3Alayers%3A%3ABasicCompositor%3A%3ADrawQuad I can see 4 crashes over the last 7 days. We see it on many different older version (44, 45, etc) Maybe this is a different issue ?!
ff50 crash: https://crash-stats.mozilla.com/report/index/8f7630ca-37fe-4c31-b4c1-441372160908 https://crash-stats.mozilla.com/report/index/1d8c6d46-ed82-4d78-8c2f-8b9d52160907 https://crash-stats.mozilla.com/report/index/afd3d541-846b-49fc-b53d-83e572160907 These three are crashed at https://hg.mozilla.org/mozilla-unified/annotate/3a0fd1bb116a/gfx/layers/basic/BasicCompositor.cpp#l522 which is different from this bug. Will have a bug to track this. But this https://crash-stats.mozilla.com/report/index/01d71a7e-b9f5-40f4-ade6-7ddd22160905 is weird. I think we should not have this crashed path. I will check this, either. For ff48, I have a quick look for these log. https://crash-stats.mozilla.com/report/index/125e2231-d423-4295-a066-6d3462160908 https://crash-stats.mozilla.com/report/index/79564e0a-8a2f-4b82-8879-c27f92160906 https://crash-stats.mozilla.com/report/index/edfe96a9-468e-4413-9c64-05a082160905 are not related to this and the fix of bug 1187466. Let me look more about these logs.
OK, thanks. Are you going to report a new bug for the other? thanks
50 is unaffected based on comment 25.
status-firefox50: affected → unaffected
Crash volume for signature 'mozilla::layers::BasicCompositor::DrawQuad': - nightly (version 52): 2 crashes from 2016-09-19. - aurora (version 51): 0 crashes from 2016-09-19. - beta (version 50): 29 crashes from 2016-09-20. - release (version 49): 179 crashes from 2016-09-05. - esr (version 45): 2546 crashes from 2016-07-25. Crash volume on the last weeks (Week N is from 10-17 to 10-23): W. N-1 W. N-2 W. N-3 W. N-4 - nightly 0 2 0 0 - aurora 0 0 0 0 - beta 2 10 9 3 - release 66 57 36 7 - esr 283 248 296 312 Affected platform: Windows Crash rank on the last 7 days: Browser Content Plugin - nightly - aurora - beta #2314 #4182 - release #841 - esr #33
status-firefox52: --- → affected
Sounds like nothing useful is going to come from this bug for 52 at this point.
status-firefox52: affected → wontfix
You need to log in before you can comment on or make changes to this bug.