Closed Bug 929506 Opened 11 years ago Closed 10 years ago

GLContext::MakeCurrent should check that the context is valid

Categories

(Core :: Graphics, defect)

x86_64
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla28
Tracking Status
firefox28 --- fixed

People

(Reporter: nical, Assigned: nical)

References

Details

Attachments

(4 files, 4 obsolete files)

      No description provided.
Assignee: nobody → nical.bugzilla
Attachment #8336230 - Flags: review?(bjacob)
Pretty straight forward. When we call MakeCurrent on the Compositor thread and it fails, return early instead of crashing somewhere in the driver.
Attachment #8336236 - Flags: review?(bjacob)
try push https://tbpl.mozilla.org/?tree=Try&rev=413cd21f7d79
Benoit, in case you are overwhelmed with top crashers and other koi goodness, this is one is low priority.
Blocks: 899044
I was going through open bugs to see what's still relevant and found bug 899044 which has the typical symptom this bug is trying to fix.

Turns out something looks very wrong in ~BasicTextureImage as we used to create a local variable ctx that would be equal to mGLContext or something else depending on the situation, then did something like:

if (ctx) {
  mGLContext->DoStuff()
}

This looks very much like it should have been ctx->DoStuff() and is the reason of the crash.

So, fixed that and added checks to teh return value of MakeCurrent() in BasicTextureImage.
Attachment #8336273 - Flags: review?(bjacob)
Attachment #8336230 - Flags: review?(bjacob) → review+
Comment on attachment 8336236 [details] [diff] [review]
Check the return value of MakeCurrent in TextureHostOGL and CompositorOGL

Review of attachment 8336236 [details] [diff] [review]:
-----------------------------------------------------------------

r- for breaking a counter-based memory reporter:

::: gfx/layers/opengl/TextureHostOGL.cpp
@@ +758,5 @@
>  {
>    if (mUploadTexture) {
>      MOZ_ASSERT(mGL);
> +    if (mGL->MakeCurrent()) {
> +      mGL->fDeleteTextures(1, &mUploadTexture);    

trailing \w

@@ +1016,4 @@
>  
>      gl::GfxTexturesReporter::UpdateAmount(gl::GfxTexturesReporter::MemoryFreed,
>                                            mGLFormat, GetTileType(),
>                                            TILEDLAYERBUFFER_TILE_SIZE);

R- ! You should UpdateAmount if and only if you effectively call fDeleteTextures.

This pattern of bugs is why counter-based memory reporters are generally too unreliable to be helpful in practice. We need to switch that to traversal-based, but that would rely on the "GLContext modularization"
Attachment #8336236 - Flags: review?(bjacob) → review-
Attachment #8336273 - Flags: review?(bjacob) → review+
(In reply to Benoit Jacob [:bjacob] from comment #5)
> 
> @@ +1016,4 @@
> >  
> >      gl::GfxTexturesReporter::UpdateAmount(gl::GfxTexturesReporter::MemoryFreed,
> >                                            mGLFormat, GetTileType(),
> >                                            TILEDLAYERBUFFER_TILE_SIZE);
> 
> R- ! You should UpdateAmount if and only if you effectively call
> fDeleteTextures.
> 
> This pattern of bugs is why counter-based memory reporters are generally too
> unreliable to be helpful in practice. We need to switch that to
> traversal-based, but that would rely on the "GLContext modularization"

My thinking was that when MakeCurrent() returns false, it usually means that the driver has already freed the memory under the hood because it's what happens when you destroy the OpenGL context.

One of the goals of this bug is to relax the constraint of deleting all the gl resources on the compositor side before ~nsBaseWidget returns. If we do relax that constraint but not always decrement the counter here, it will show up as a leak even though the driver has already cleaned up the memory, and will make that use case that I am trying to make valid look like a memory leak bug.
There is no guarantee that that's the case. For example, eglMakeCurrent can fail for a variety of other reasons such as EGL_BAD_SURFACE. In these cases, OpenGL objects are still around. There is no reason for counting an OpenGL object as freed before its GL context is actually destroyed or the right glDelete* function is called on it.
Ok, then I'll add a check to see if MakeCurrent returns false because the context is destroyed specifically (in which case the counter should still be decreased, but not in the other cases). Would that make sense?
Why? It seemed to me that all what you had to do, was update the counter in the same if {...} scope as you call glDelete*. Why would it be more complicated than that?
The recurring use case is: ~nsBaseWidget is kicked, this starts the destruction sequence of a lot of stuff, but because we do refcouning it's hard to ensure that all our TextureHost or TextureImage objects will be destroyed on the spot because anything could be keeping them alive.
If, say, a Tile is kept alive just long enough for it to outlive the nsBaseWidget, it means it is holding an already destroyed GLContext. In this particular case, the Tile's texture memory was already freed by the driver, and the tile did go through the logic of deleting the texture, so as far as the tile is concerned, it did not do anything wrong. It just so happens that there is this artificial limitation around widgets and gl contexts that provoked the tile's texture memory to be destroyed a few cicles before the tile object itself. This is not a memory leak, and it would be misleading to count it as such.

I don't want this to count as a leak, because 1) memory has not actually leaked, we just carried the action of deleting the texture a few cycles too late but it was already deleted, and 2) counting them as leaks means we will still have OMTC shutdown errors, but instead of showing up as crashes, they'll show up as leaks.
Worse, those leaks will be about stuff that has actually be freed already (but the driver doesn't decrement the counter), so not really leaks. Pretty hard to debug.

So, if MakeCurrent returns false but IsDestroyed returns true, it means that the texture memory this counter is tracking is already freed (by the driver instead of us).

It's similar to those IPC shutdown scenarios where IPDL deallocates shmems under our feet. In those cases the shmems are not deallocated by us but they are not leaked either.
What you're saying here pretty much means that a counter-based memory reporter in unsuitable here.

So I'll turn my r- into an r+ but please:
 - add a comment
 - file a bug about this memory reporter being broken and it needing to be replaced by a traversal-based memory reporter.
Comment on attachment 8336236 [details] [diff] [review]
Check the return value of MakeCurrent in TextureHostOGL and CompositorOGL

A reluctant r+ because the brokenness is preexisting.
Attachment #8336236 - Flags: review- → review+
Comment on attachment 8336393 [details] [diff] [review]
Check the return value of MakeCurrent in TextureHostOGL and CompositorOGL

Review of attachment 8336393 [details] [diff] [review]:
-----------------------------------------------------------------

OK, that's better.

::: gfx/layers/opengl/TextureHostOGL.cpp
@@ +758,5 @@
>  {
>    if (mUploadTexture) {
>      MOZ_ASSERT(mGL);
> +    if (mGL->MakeCurrent()) {
> +      mGL->fDeleteTextures(1, &mUploadTexture);    

still trailing \w.
Attachment #8336393 - Flags: review?(bjacob) → review+
Didn't notice that there was an earlier push also, so it's now backed out as well:  https://hg.mozilla.org/integration/mozilla-inbound/rev/7cfdbd447930
Turns out the initialization of the GLContext needs to call MakeCurrent before it has initialized its symbols (at least for X11 and fennec), so IsDestroyed() checking for the symbols is not correct until the context has been fully initialized once.

This patch just adds a boolean in GLContext to store whether or not MarkDestroyed has been called, rather than looking at the symbols.

try push: https://tbpl.mozilla.org/?tree=Try&rev=87cafe639213
Attachment #8336230 - Attachment is obsolete: true
Attachment #8337898 - Flags: review?(bjacob)
Comment on attachment 8337898 [details] [diff] [review]
Check that the context is not destroyed in MakeCurrent

Review of attachment 8337898 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/gl/GLContext.cpp
@@ +276,5 @@
>      mMaxTextureImageSize(0),
>      mMaxRenderbufferSize(0),
>      mNeedsTextureSizeChecks(false),
> +    mWorkAroundDriverBugs(true),
> +    mDestroyed(false)

This introduces a bug: as mDestroyed is initialized to false, IsDestroyed() on a newly-constructed GLContext will now return false, whereas it returned false before.

Maybe it's confusing to have a mDestroyed field and maybe what you want instead is "mInitialized" field.
Attachment #8337898 - Flags: review?(bjacob) → review-
... "whereas it returned TRUE before".
That's exactly what I mean to do: MakeCurrent on an uninitialized context needs to work somehow because the init uses it. If MakeCurrent checks for IsDestroyed then the initialization fails (that's what got my patch backed out). Maybe we need a distinction between IsDestroyed and IsValid?
In any case, MakeCurrent needs to check for something that returns false after the context was destroyed, but does not return false before init.
If IsDestroyed is also used to check that the context is valid regardless of whether it was created/destoyed, then it should have a different name.
Do you mean that the behavior change introduced by this patch was intentional?

Before this patch, it was safe to assume that if IsDestroyed() returns false, then it is safe to make GL calls (because entry points have been initialized).

After this patch, this assumption is not safe anymore.

If this is intentional then maybe at least it is worth pointing out explicitly!
Had a conversation with Nical. We agree that it is really worth investigating to the bottom the the root problem which led him to want to introduce this mDestroyed flag, which was a strange crash during GLContext initialization. I would really like us to understand this fully. It is also worth avoiding to introduce redundant state in GLContext (mDestroyed vs nullness of fUseProgram symbol)
This version moves MakeCurrent around so that it is not called before symbols are initialized. This way MakeCurrent can check for IsDestroyed().

try push: https://tbpl.mozilla.org/?tree=Try&rev=e76a97aa2565

As a followup I would like to rename IsDestroyed into IsValid.
Attachment #8337898 - Attachment is obsolete: true
Attachment #8341149 - Flags: review?(bjacob)
Comment on attachment 8341149 [details] [diff] [review]
Check that the context is not destroyed in MakeCurrent

Review of attachment 8341149 [details] [diff] [review]:
-----------------------------------------------------------------

\o/
Attachment #8341149 - Flags: review?(bjacob) → review+
Comment on attachment 8341149 [details] [diff] [review]
Check that the context is not destroyed in MakeCurrent


>diff --git a/gfx/gl/GLContext.cpp b/gfx/gl/GLContext.cpp
>--- a/gfx/gl/GLContext.cpp
>+++ b/gfx/gl/GLContext.cpp
>@@ -451,17 +451,17 @@ GLContext::InitWithPrefix(const char *pr
> 
>     mInitialized = LoadSymbols(&symbols[0], trygl, prefix);
>-
>+    MakeCurrent();
>     if (mInitialized) {
>diff --git a/gfx/gl/GLContextProviderWGL.cpp b/gfx/gl/GLContextProviderWGL.cpp
>--- a/gfx/gl/GLContextProviderWGL.cpp
>+++ b/gfx/gl/GLContextProviderWGL.cpp
>@@ -310,17 +310,16 @@ public:
>         return ContextTypeWGL;
>     }
> 
>     bool Init()
>     {
>         if (!mDC || !mContext)
>             return false;
> 
>-        MakeCurrent();
>         SetupLookupFunction();
>         if (!InitWithPrefix("gl", true))
>             return false;

Note that you subtly changed the order of MakeCurrent wrt LoadSymbols here.  Windows, in particular, can't load any GL symbols without a current context (wglGetProcAddress always returns null).  This is for a bunch of silly historical reasons, but it stems from Windows' ability to have multiple OpenGL ICDs loaded at once.  The symbols for one context are not necessarily valid for any other.

This broke GL context creation on Windows, unless you happen to somehow have a GL context already current.  There's something else broken as well though, so not quite sure what's going on yet.
The "something else" is that MakeCurrent now does nothing unless the symbols were already loaded.  Nuking that as well and GL works once again on Windows.  So this is bad times here.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
But the previous state was not good either, as we were using symbol nullness to indicate whether a context was destroyed, and so our MakeCurrent function was only working on not-yet-loaded-symbols contexts because it was unsafe on destroyed contexts, which was a cause of crashes, which was the motivation for Nical's changes here. Nical originally proposed introducing a new 'isdestroyed' flag decoupled from symbol nullness, but I turned that down because that would have been introducing redundant state in GLContext.

Since the only problem here is WGL-specific, I would rather go for a simple fix in GLContextProviderWGL, such as directly calling wglMakeCurrent in Init().
Attached patch Call wglMakeCurrent (obsolete) — Splinter Review
Attachment #8345836 - Flags: review?(vladimir)
Attachment #8345836 - Attachment is obsolete: true
Attachment #8345836 - Flags: review?(vladimir)
Attachment #8345838 - Flags: review?(vladimir)
Comment on attachment 8345838 [details] [diff] [review]
Call wglMakeCurrent

Sold!  Thanks :)
Attachment #8345838 - Flags: review?(vladimir) → review+
https://hg.mozilla.org/mozilla-central/rev/8450bc58c1a6
Status: REOPENED → RESOLVED
Closed: 10 years ago10 years ago
Resolution: --- → FIXED
Flags: in-testsuite?
You need to log in before you can comment on or make changes to this bug.