Closed Bug 1224199 Opened 4 years ago Closed 3 years ago

Crashes caused by SharedSurfaces outliving their GLContext.

Categories

(Core :: Graphics, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
Tracking Status
firefox45 --- wontfix
firefox46 --- wontfix
firefox47 + fixed
firefox48 + fixed
firefox-esr45 47+ fixed

People

(Reporter: nical, Assigned: jgilbert)

References

Details

(Whiteboard: [gfx-noted])

Attachments

(7 files, 1 obsolete file)

1.83 KB, patch
Details | Diff | Splinter Review
3.37 KB, patch
nical
: review+
Details | Diff | Splinter Review
1.88 KB, patch
nical
: review+
Details | Diff | Splinter Review
58 bytes, text/x-review-board-request
jrmuizel
: review+
Details
58 bytes, text/x-review-board-request
jrmuizel
: review+
Details
58 bytes, text/x-review-board-request
jrmuizel
: review+
Details
58 bytes, text/x-review-board-request
jrmuizel
: review+
Details
Destroying a GLContext causes the destruction of its ScreenBuffer, which causes the destruction of a SurfaceFactory object, which keeps some textures alive in its recycle pool. As the pool is destroyed, its textures are destroyed too in the process, and destroying these textures causes the destruction of SharedSurface objects. SharedSurface destruction involves some calls to the GLContext that is being destroyed (and is pretty much at the end of its destructor) which causes crashes a crash in MakeCurrent.

I ended up in this situation by clicking on the background webgl canvas with the dom inspector from the devtool in http://www.ro.me/ (not sure what is important and what isn't in these STR, but that's how I found this crash).
Whiteboard: [gfx-noted]
I've been taking a look at this crash on Fennec, and it looks to be canvas/surface recycling related after working back from the crash stacks.

The story seems to go:
 - A canvas is updated, and CanvasClientSharedSurface::Updated() is called. Before switching out the front TextureClient, it calls WaitForCompositorRecycle() on it. [1]

 - This causes the TextureChild to keep a reference to the TextureClient, and calls SendClientRecycle to the compositor. The compositor is expected to reply with SendCompositorRecycle once it is finished with the corresponding TextureHost, at which point the TextureChild derefs the TextureClient. [2]

 - Content finishes with the GLContext (the canvas is being destroyed or the context lost or...?) and it kills its GLScreenBuffer, which kills its SurfaceFactory. The SurfaceFactory *should* kill its entire pool of SharedSurfaceTextureClients by dereferencing them [3]. This is fine at this point because the GLContext hasn't fully died yet and we can still use its function table.

 - Some time later, RecvCompositorRecycle() is called on a TextureChild, and it clears the last reference to the TextureClient. At this point, the GLContext has already died so we crash.


[1] https://dxr.mozilla.org/mozilla-central/rev/4ea7408b3eef059aa248f4b00328f8fdb4475112/gfx/layers/client/CanvasClient.cpp#459
[2] https://dxr.mozilla.org/mozilla-central/rev/4ea7408b3eef059aa248f4b00328f8fdb4475112/gfx/layers/client/TextureClient.cpp#118
[3] https://dxr.mozilla.org/mozilla-central/rev/eb25b90a05c194bfd4f498ff3ffee7440f85f1cd/gfx/gl/SharedSurface.cpp#304
Assignee: nobody → edwin
Attached patch 1224199.patch (obsolete) — Splinter Review
Fix. This patch simply gets rid of the waiting TextureClient reference in TextureChild when SurfaceFactory is killing its SharedSurfaceTextureClient pool.
Attachment #8726312 - Flags: review?(nical.bugzilla)
Attached patch 1224199.patchSplinter Review
Fixes a UAF I noticed in the last patch just after uploading...
Attachment #8726312 - Attachment is obsolete: true
Attachment #8726312 - Flags: review?(nical.bugzilla)
Attachment #8726319 - Flags: review?(nical.bugzilla)
Get rid of some unused recycling code. Not sure if it's meant to be used, but it certainly isn't at the moment.
Attachment #8726323 - Flags: review?(nical.bugzilla)
Attachment #8726319 - Flags: review?(nical.bugzilla) → review+
Attachment #8726323 - Flags: review?(nical.bugzilla) → review+
AFAICT, the problem I described in the bug description still exists: Textures depend on the gl context, and the gl context frees textures from its destructor, wich means textures are themselfs calling into a half-dead gl context in their own destruction code. So marking leave-open, but it's cool that this patch improves on the current situation.
Keywords: leave-open
Carrying stats over from bug 1223810.
Jeff and Dan, I think this is your area. It's causing various crashes like bug 1223810 among other things. Is that something you can look into soon-ish?
Assignee: edwin → nobody
Flags: needinfo?(jgilbert)
Flags: needinfo?(dglastonbury)
Flags: needinfo?(milan)
Jeff should be able to take care of this, don't need Dan.
Flags: needinfo?(milan)
Flags: needinfo?(dglastonbury)
I'll take this.
Assignee: nobody → jgilbert
Flags: needinfo?(jgilbert)
From 320888354ebfbedc1f0509b5baf57861c66ab7ae Mon Sep 17 00:00:00 2001
---
 gfx/gl/GLContext.cpp | 12 ++----------
 gfx/gl/GLContext.h   |  2 --
 2 files changed, 2 insertions(+), 12 deletions(-)

Review commit: https://reviewboard.mozilla.org/r/46261/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/46261/
Attachment #8741181 - Flags: review?(jmuizelaar)
Attachment #8741182 - Flags: review?(jmuizelaar)
Attachment #8741183 - Flags: review?(jmuizelaar)
Attachment #8741184 - Flags: review?(jmuizelaar)
From 5ee22edfd3cc4364806d7d1cf82e651760dbcbb9 Mon Sep 17 00:00:00 2001
---
 gfx/gl/GLBlitHelper.cpp         | 3 +++
 gfx/gl/GLContext.cpp            | 7 ++++---
 gfx/gl/GLReadTexImageHelper.cpp | 3 +++
 3 files changed, 10 insertions(+), 3 deletions(-)

Review commit: https://reviewboard.mozilla.org/r/46263/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/46263/
From 5bbe6aa209356262374466fbe91629285ba1c637 Mon Sep 17 00:00:00 2001
---
 gfx/gl/TextureGarbageBin.cpp | 1 +
 1 file changed, 1 insertion(+)

Review commit: https://reviewboard.mozilla.org/r/46265/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/46265/
From 1f9516bc5b0a938fd26b32ae1892cd325e600752 Mon Sep 17 00:00:00 2001
---
 gfx/gl/SharedSurfaceANGLE.cpp        |  4 +++-
 gfx/gl/SharedSurfaceD3D11Interop.cpp |  7 +++++--
 gfx/gl/SharedSurfaceEGL.cpp          | 20 +++++++++++---------
 gfx/gl/SharedSurfaceGLX.cpp          |  8 ++++----
 gfx/gl/SharedSurfaceGralloc.cpp      |  6 +++---
 gfx/gl/SharedSurfaceIO.cpp           |  9 ++++-----
 6 files changed, 30 insertions(+), 24 deletions(-)

Review commit: https://reviewboard.mozilla.org/r/46267/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/46267/
Comment on attachment 8741181 [details]
MozReview Request: Bug 1224199 - Remove DestroyScreenBuffer. r?jrmuizel

https://reviewboard.mozilla.org/r/46261/#review43077
Attachment #8741181 - Flags: review?(jmuizelaar) → review+
Comment on attachment 8741182 [details]
MozReview Request: Null eagerly in MarkDestroyed. r?jrmuizel

https://reviewboard.mozilla.org/r/46263/#review43079
Attachment #8741182 - Flags: review?(jmuizelaar) → review+
Comment on attachment 8741183 [details]
MozReview Request: Assert IsCurrent in TextureGarbageBin::EmptyGarbage. r?jrmuizel

https://reviewboard.mozilla.org/r/46265/#review43081
Attachment #8741183 - Flags: review?(jmuizelaar) → review+
Comment on attachment 8741184 [details]
MozReview Request: Fortify SharedSurface dtors against MakeCurrent failure. r?jrmuizel

https://reviewboard.mozilla.org/r/46267/#review43083

It would be nice if had some way to cause MakeCurrent to fail from a test harness so that we could better test this code.
Attachment #8741184 - Flags: review?(jmuizelaar) → review+
Tracking this as it seems like a good candidate for uplift to 47 once it lands in m-c and has some verification in 48.
Status: NEW → RESOLVED
Closed: 3 years ago
Keywords: leave-open
Resolution: --- → FIXED
Comment on attachment 8741181 [details]
MozReview Request: Bug 1224199 - Remove DestroyScreenBuffer. r?jrmuizel

Approval Request Comment
[Feature/regressing bug #]: very old
[User impact if declined]: Crashes in some cases when using WebGL.
[Describe test coverage new/current, TreeHerder]: none
[Risks and why]: Low: This is very unlikely to make anything worse, but there's a chance it doesn't completely fix the issues here.
[String/UUID change made/needed]: none
Attachment #8741181 - Flags: approval-mozilla-aurora?
That approval request goes for all the patches that follow it as well.

The patches previous to mine are already in 47.
I'm going to leave this open until we're done with uplifts.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment on attachment 8741181 [details]
MozReview Request: Bug 1224199 - Remove DestroyScreenBuffer. r?jrmuizel

Crash fixes in WebGL, Aurora47+
Attachment #8741181 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Attachment #8741182 - Flags: approval-mozilla-aurora+
Attachment #8741183 - Flags: approval-mozilla-aurora+
Attachment #8741184 - Flags: approval-mozilla-aurora+
Comment on attachment 8741181 [details]
MozReview Request: Bug 1224199 - Remove DestroyScreenBuffer. r?jrmuizel

[Approval Request Comment]
If this is not a sec:{high,crit} bug, please state case for ESR consideration:
User impact if declined: Occasional crashes when WebGL is GC'd or otherwise destroyed.
Fix Landed on Version: 47+
Risk to taking this patch (and alternatives if risky): Low: This is very unlikely to make anything worse, but there's a chance it doesn't completely fix the issues here.
String or UUID changes made by this patch: None

See https://wiki.mozilla.org/Release_Management/ESR_Landing_Process for more info.
Attachment #8741181 - Flags: approval-mozilla-esr45?
This patch also seemed to have stopped a crash from bug 1272753.  20 or so crashes in the ESR versions last week.  400 crashes total between all versions, stopping with 46.0.1.
[Tracking Requested - why for this release]:
It's a crash, and needed for bug 1223810.
Comment on attachment 8741181 [details]
MozReview Request: Bug 1224199 - Remove DestroyScreenBuffer. r?jrmuizel

Crash fix, landed already in 47, ok to uplift to esr45. 
Jeff, double checking, only this patch to uplift to esr?
Flags: needinfo?(jgilbert)
Attachment #8741181 - Flags: approval-mozilla-esr45? → approval-mozilla-esr45+
(In reply to Liz Henry (:lizzard) (needinfo? me) from comment #31)
> Comment on attachment 8741181 [details]
> MozReview Request: Bug 1224199 - Remove DestroyScreenBuffer. r?jrmuizel
> 
> Crash fix, landed already in 47, ok to uplift to esr45. 
> Jeff, double checking, only this patch to uplift to esr?

All patches that were landed on Central should be landed on ESR45.
Flags: needinfo?(jgilbert)
has problems to apply to esr45:

grafting 339897:22f36f2a2e4c "Bug 1224199 - Destroy SharedSurfaces before ~GLContext(). - r=jrmuizel"
merging gfx/gl/GLBlitHelper.cpp
merging gfx/gl/GLContext.cpp
merging gfx/gl/GLContext.h
merging gfx/gl/GLReadTexImageHelper.cpp
merging gfx/gl/SharedSurfaceANGLE.cpp
merging gfx/gl/SharedSurfaceD3D11Interop.cpp
merging gfx/gl/SharedSurfaceEGL.cpp
merging gfx/gl/SharedSurfaceGLX.cpp
merging gfx/gl/SharedSurfaceGralloc.cpp
merging gfx/gl/SharedSurfaceIO.cpp
warning: conflicts while merging gfx/gl/SharedSurfaceD3D11Interop.cpp! (edit, then use 'hg resolve --mark')
abort: unresolved conflicts, can't continue
(use 'hg resolve' and 'hg graft --continue')
Tomcats-MacBook-Pro-2:mozilla-central Tomcat$
Flags: needinfo?(jgilbert)
milan can you help out here if Jeff isn't around? For this bug and bug 1223810.
Flags: needinfo?(milan)
I've got this.
Flags: needinfo?(milan)
Pulling in the fix from bug 1241484 seems important here, as part of rebasing.
Depends on: 1241484
Flags: needinfo?(jgilbert)
That should be it!
Status: REOPENED → RESOLVED
Closed: 3 years ago3 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.