Closed Bug 715232 Opened 13 years ago Closed 12 years ago

Crash in glCopyTexImage2D on Gonk, when creating an intermediate container-layer fbo during compositing

Categories

(Core :: Graphics, defect)

Other Branch
ARM
Gonk (Firefox OS)
defect
Not set
major

Tracking

()

RESOLVED FIXED
mozilla12

People

(Reporter: fabrice, Assigned: cjones)

References

Details

(Whiteboard: [inbound])

Attachments

(1 file, 2 obsolete files)

STR:
- go to the homescreen
- hold the back button down

Expected:
- the list of running apps is displayed

Observed:
- gecko crashes

Relevant information from logcat is:
I/Gecko   (23022): ###!!! ABORT: Framebuffer not complete -- error 0x8cd6, mFBOTextureTarget 0xde1, aRect.width 161, aRect.height 267: file /home/fabrice/dev/B2G/gecko/gfx/layers/opengl/LayerManagerOGL.cpp, line 1178
The error appears to be "GL_FRAMEBUFFER_INCOMPLETE_ATTACHMENT_OES".

The relevant code is

void
LayerManagerOGL::CreateFBOWithTexture(const nsIntRect& aRect, InitMode aInit,
                                      GLuint *aFBO, GLuint *aTexture)
{

  // Making this call to fCheckFramebufferStatus prevents a crash on
  // PowerVR. See bug 695246.
  GLenum result = mGLContext->fCheckFramebufferStatus(LOCAL_GL_FRAMEBUFFER);
  if (result != LOCAL_GL_FRAMEBUFFER_COMPLETE) {
[snip]
    NS_RUNTIMEABORT(msg.get());
  }

Is it possible that we're not finishing previous operations on the framebuffer?  If this crash goes away with GL_DEBUG, would that indicate a sync problem?
Yes, if defining MOZ_GL_DEBUG makes this go away in debug builds, that would almost prove it's a sync problem.
I would try running with MOZ_GL_DEBUG_ABORT_ON_ERROR and MOZ_GL_DEBUG_VERBOSE defined (and log the output). Or is APItrace running on B2G? If yes, use that instead of MOZ_GL_DEBUG_VERBOSE, but keep MOZ_GL_DEBUG_ABORT_ON_ERROR.
We're also seeing sporadic reports of crashes caused by incomplete framebuffers in LayerManagerOGL::CreateFBOWithTexture on OS X 10.7 (Bug 705641). Though we don't have STR for that crash, in most of the reports the error is GL_FRAMEBUFFER_INCOMPLETE_ATTACHMENT as it is here.

One thing to try: comment LayerManagerOGL.cpp:1178, preventing the crash, and see if there are obvious visual artifacts. This might help to identify what's going bad here.
Blocks: 705641
Note to self: those need --enable-debug.

APITrace is in a state of partial integration, and I don't have it set up locally.  See bug 674753.
A little more info:

I/Gecko   ( 2603): GL ERROR: void mozilla::gl::GLContext::raw_fCopyTexImage2D(GLenum, GLint, GLenum, GLint, GLint, GLsizei, GLsizei, GLint) generated GL error GL_INVALID_OPERATION(0x0502)
I/Gecko   ( 2603): GL ERROR: void mozilla::gl::GLContext::fBindFramebuffer(GLenum, GLuint) generated GL error GL_INVALID_FRAMEBUFFER_OPERATION(0x0506)
I/Gecko   ( 2603): GL ERROR: void mozilla::gl::GLContext::fFramebufferTexture2D(GLenum, GLenum, GLenum, GLuint, GLint) generated GL error GL_INVALID_FRAMEBUFFER_OPERATION(0x0506)
I/Gecko   ( 2603): GL ERROR: GLenum mozilla::gl::GLContext::fCheckFramebufferStatus(GLenum) generated GL error GL_INVALID_FRAMEBUFFER_OPERATION(0x0506)
I/Gecko   ( 2603): ###!!! ABORT: Framebuffer not complete -- error 0x8cd6, mFBOTextureTarget 0xde1, aRect.width 415, aRect.height 691: file /home/cjones/mozilla/b2g/gecko/gfx/layers/opengl/LayerManagerOGL.cpp, line 1179
GL_INVALID_OPERATION is generated if glCopyTexImage2D is executed
            between the execution of glBegin and the corresponding
            execution of glEnd.
        
GL_INVALID_OPERATION is generated if internalformat is
            GL_DEPTH_COMPONENT, GL_DEPTH_COMPONENT16,
            GL_DEPTH_COMPONENT24, or GL_DEPTH_COMPONENT32 and there is no depth
            buffer.
Also seeing

I/Gecko   ( 2746): GL ERROR: void mozilla::gl::GLContext::fBindFramebuffer(GLenum, GLuint) generated GL error GL_INVALID_FRAMEBUFFER_OPERATION(0x0506)
I/Gecko   ( 2746): GL ERROR: void mozilla::gl::GLContext::fFramebufferTexture2D(GLenum, GLenum, GLenum, GLuint, GLint) generated GL error GL_INVALID_FRAMEBUFFER_OPERATION(0x0506)
I/Gecko   ( 2746): GL ERROR: GLenum mozilla::gl::GLContext::fCheckFramebufferStatus(GLenum) generated GL error GL_INVALID_FRAMEBUFFER_OPERATION(0x0506)
I/Gecko   ( 2746): GL ERROR: void mozilla::gl::GLContext::fEnable(GLenum) generated GL error GL_INVALID_FRAMEBUFFER_OPERATION(0x0506)
I/Gecko   ( 2746): GL ERROR: void mozilla::gl::GLContext::fEnable(GLenum) generated GL error GL_INVALID_FRAMEBUFFER_OPERATION(0x0506)
I/Gecko   ( 2746): GL ERROR: void mozilla::gl::GLContext::fDeleteTextures(GLsizei, GLuint*) generated GL error GL_INVALID_FRAMEBUFFER_OPERATION(0x0506)

in the logs when opening an app.  Maybe relevant but needs investigation anyway.
diff --git a/gfx/layers/opengl/ContainerLayerOGL.cpp b/gfx/layers/opengl/ContainerLayerOGL.cpp
index 642d785..30d56c1 100644
--- a/gfx/layers/opengl/ContainerLayerOGL.cpp
+++ b/gfx/layers/opengl/ContainerLayerOGL.cpp
@@ -193,7 +193,7 @@ ContainerRender(Container* aContainer,
       // not safe.
       if (HasOpaqueAncestorLayer(aContainer) &&
           transform3D.Is2D(&transform) && !transform.HasNonIntegerTranslation()) {
-        mode = LayerManagerOGL::InitModeCopy;
+//        mode = LayerManagerOGL::InitModeCopy;
         framebufferRect.x += transform.x0;
         framebufferRect.y += transform.y0;
         aContainer->mSupportsComponentAlphaChildren = true;

makes the crash go away, and no visual glitches.  Using a device with no gdb capabilities and limited logging, but investigating what's up.  Suspect we might be in a toplevel ContainerLayer.
Can't repro comment 9 after rebooting.  The container indeed does have a parent.  Still poking.  One more to-do:

WARNING: Unsupported filter type!: file /home/cjones/mozilla/b2g/gecko/gfx/gl/GLContext.cpp, line 644
The GL docs specify this behavior in the usual sparklingly clear way, but as far as I can guess, the Mali-400 MP driver doesn't want to glCopyTexImage before the target texture has been allocated with glTexImage2d.  This patch isn't minimal in that the mGLContext->fTexParameteri(...) rearrangement isn't needed to fix the crash here, but I think it's correct and harmless anyway.

There's a possible perf impact of this change with other drivers.  If there are known issues, please let me know.
Assignee: nobody → jones.chris.g
Attachment #586657 - Flags: review?(matt.woodrow)
Attachment #586657 - Flags: review?(joe)
(In reply to Chris Jones [:cjones] [:warhammer] from comment #7)
> A little more info:
> 
> I/Gecko   ( 2603): GL ERROR: void
> mozilla::gl::GLContext::raw_fCopyTexImage2D(GLenum, GLint, GLenum, GLint,
> GLint, GLsizei, GLsizei, GLint) generated GL error
> GL_INVALID_OPERATION(0x0502)

I wanted a stack trace to this first GL error! Which you'd have gotten with MOZ_GL_DEBUG_ABORT_ON_ERROR.

Such GL errors in GL calls setting up textures are exactly the place to look for an explanation for the subsequent incomplete framebuffer!
(In reply to Chris Jones [:cjones] [:warhammer] from comment #12)
> Created attachment 586657 [details] [diff] [review]
> Allocate the fbo's texture before copying to it
> 
> The GL docs specify this behavior in the usual sparklingly clear way,
Tip: Whenever you need fine details about GL, don't read the man pages, instead read the spec.

> far as I can guess, the Mali-400 MP driver doesn't want to glCopyTexImage
> before the target texture has been allocated with glTexImage2d.
Really?! This amounts to say that glCopyTexImage2D is useless on this driver, since NOT having to call glTexImage2D beforehand is the main reason to call glCopyTexImage2D instead of glCopyTexSubImage2D.

So if this is true, then we need to:
* either avoid glCopyTexImage2D altogether and replace by glTexImage2D + glCopyTexSubImage2D. We could even remove GLContext::fCopyTexImage2D altogether...!
* or blacklist this driver
(In reply to Benoit Jacob [:bjacob] from comment #13)
> (In reply to Chris Jones [:cjones] [:warhammer] from comment #7)
> > A little more info:
> > 
> > I/Gecko   ( 2603): GL ERROR: void
> > mozilla::gl::GLContext::raw_fCopyTexImage2D(GLenum, GLint, GLenum, GLint,
> > GLint, GLsizei, GLsizei, GLint) generated GL error
> > GL_INVALID_OPERATION(0x0502)
> 
> I wanted a stack trace to this first GL error! Which you'd have gotten with
> MOZ_GL_DEBUG_ABORT_ON_ERROR.
> 
> Such GL errors in GL calls setting up textures are exactly the place to look
> for an explanation for the subsequent incomplete framebuffer!

Me too!  Except that gdb is crashing with my builds!  So I had to debug this with printfs!  Why are we yelling! :)

I have a layer tree dump which can be used to reconstruct the C++ stack here.  When I have some more time I'll try fiddling with build options to see if I can make our shitty gdb happy.
(In reply to Chris Jones [:cjones] [:warhammer] from comment #15)
> (In reply to Benoit Jacob [:bjacob] from comment #13)
> > (In reply to Chris Jones [:cjones] [:warhammer] from comment #7)
> > > A little more info:
> > > 
> > > I/Gecko   ( 2603): GL ERROR: void
> > > mozilla::gl::GLContext::raw_fCopyTexImage2D(GLenum, GLint, GLenum, GLint,
> > > GLint, GLsizei, GLsizei, GLint) generated GL error
> > > GL_INVALID_OPERATION(0x0502)
> > 
> > I wanted a stack trace to this first GL error! Which you'd have gotten with
> > MOZ_GL_DEBUG_ABORT_ON_ERROR.
> > 
> > Such GL errors in GL calls setting up textures are exactly the place to look
> > for an explanation for the subsequent incomplete framebuffer!
> 
> Me too!  Except that gdb is crashing with my builds!  So I had to debug this
> with printfs!  Why are we yelling! :)

This sounds like the know GDB/Android issue that it crashes the debugged process when doing GL work without debug symbols for the GL libraries! Which is solved by flashing your device with a homemade Android build for which you have debug symbols and pointing GDB to it! See https://wiki.mozilla.org/User:Blassey/Notes/Android#Debugging ! Also, I like to yell! it keeps me in good shape!
(In reply to Benoit Jacob [:bjacob] from comment #14)
> (In reply to Chris Jones [:cjones] [:warhammer] from comment #12)
> > Created attachment 586657 [details] [diff] [review]
> > Allocate the fbo's texture before copying to it
> > 
> > The GL docs specify this behavior in the usual sparklingly clear way,
> Tip: Whenever you need fine details about GL, don't read the man pages,
> instead read the spec.
> 

Good call.  I downloaded it.

> > far as I can guess, the Mali-400 MP driver doesn't want to glCopyTexImage
> > before the target texture has been allocated with glTexImage2d.
> Really?! This amounts to say that glCopyTexImage2D is useless on this
> driver, since NOT having to call glTexImage2D beforehand is the main reason
> to call glCopyTexImage2D instead of glCopyTexSubImage2D.
> 
> So if this is true, then we need to:
> * either avoid glCopyTexImage2D altogether and replace by glTexImage2D +
> glCopyTexSubImage2D. We could even remove GLContext::fCopyTexImage2D
> altogether...!
> * or blacklist this driver

My thought was to add a quirks flag to emulate CopyTexImage2D, if this patch would be expected to have a noticeable perf impact.  But I agree that this needs more investigation.
(In reply to Benoit Jacob [:bjacob] from comment #16)
> (In reply to Chris Jones [:cjones] [:warhammer] from comment #15)
> > Me too!  Except that gdb is crashing with my builds!  So I had to debug this
> > with printfs!  Why are we yelling! :)
> 
> This sounds like the know GDB/Android issue that it crashes the debugged
> process when doing GL work without debug symbols for the GL libraries! Which
> is solved by flashing your device with a homemade Android build for which
> you have debug symbols and pointing GDB to it! See
> https://wiki.mozilla.org/User:Blassey/Notes/Android#Debugging ! Also, I like
> to yell! it keeps me in good shape!

No, this is a b2g build, built and flashed entirely from source (except the blobs).  gdb used to work in that setup so something is tickling a bug in the versions I've tried.  (I almost got the point of gdb'ing the gdb host.  Yo dawg.)
(In reply to Chris Jones [:cjones] [:warhammer] from comment #17)
> My thought was to add a quirks flag to emulate CopyTexImage2D, if this patch
> would be expected to have a noticeable perf impact.  But I agree that this
> needs more investigation.

Indeed that makes sense, as a single glCopyTexImage2D call is much more efficient than emulating it by glTexImage2D + glCopyTexSubImage2D (because glTexImage2D involved copying a CPU-side buffer and then copying it to GPU side, it's much slower than the Copy* variants).
Comment on attachment 586657 [details] [diff] [review]
Allocate the fbo's texture before copying to it

Clearing review pending investigation.  And oops, this should have used CopyTexSubImage.  (Patches at 0500...)
Attachment #586657 - Flags: review?(matt.woodrow)
Attachment #586657 - Flags: review?(joe)
OK!  We were unnecessarily pulling the vendor-neutral GL libraries as blobs, instead of using the versions we built.  Whew.

----- HI MY NAME IS `OGLContainerLayer (0x40c80ca0)`
OGLLayerManager (0x40297c40)
  OGLContainerLayer (0x40c81900) [visible=< (x=0, y=0, w=480, h=800); >] [opaqueContent] [metrics={ viewport=(x=0, y=0, w=480, h=800) viewportScroll=(x=0, y=0) displayport=(x=0, y=0, w=0, h=0) scrollId=0 }]
    OGLThebesLayer (0x40c81a60) [clip=(x=0, y=0, w=0, h=0)] [isFixedPosition]
    OGLColorLayer (0x40c6a940) [clip=(x=0, y=0, w=0, h=0)] [opaqueContent] [isFixedPosition] [color=rgba(0, 0, 0, 1)]
    OGLContainerLayer (0x40c82da0) [clip=(x=0, y=0, w=480, h=800)] [visible=< (x=0, y=0, w=480, h=800); >] [opaqueContent] [isFixedPosition]
      OGLThebesLayer (0x40c82f00) [visible=< (x=0, y=0, w=480, h=800); >] [opaqueContent] [isFixedPosition] [valid=< (x=0, y=0, w=480, h=800); >]
      OGLCanvasLayer (0x43df6f30) [clip=(x=0, y=0, w=480, h=800)] [transform=[ 1 0; 0 1; 0 24; ]] [visible=< (x=0, y=0, w=480, h=776); >] [isFixedPosition]
      OGLContainerLayer (0x40c809e0) [clip=(x=0, y=0, w=480, h=800)] [transform=[ 1 0; 0 1; 138.445 230.742; ]] [visible=< (x=0, y=0, w=204, h=339); >] [componentAlpha] [isFixedPosition]
        OGLContainerLayer (0x40c80ca0) [visible=< (x=0, y=0, w=204, h=339); >] [opacity=0.423146] [componentAlpha] [isFixedPosition] [usesTmpSurf]
          OGLThebesLayer (0x40c826c0) [visible=< (x=0, y=0, w=204, h=339); >] [isFixedPosition]
          OGLContainerLayer (0x40c82980) [clip=(x=0, y=0, w=203, h=339)] [transform=[ 1 0; 0 1; 50.7775 84.6291; ]] [visible=< (x=0, y=-7, w=109, h=190); >] [componentAlpha] [isFixedPosition]
            OGLThebesLayer (0x40c82ae0) [transform=[ 1 0; 0 1; -61 0; ]] [visible=< (x=61, y=0, w=102, h=170); (x=102, y=173, w=20, h=10); >] [componentAlpha] [isFixedPosition]
            OGLContainerLayer (0x40c82c40) [transform=[ 1 0; 0 1; -101.555 0; ]] [visible=< (x=196, y=-7, w=14, h=14); >] [isFixedPosition]
              OGLThebesLayer (0x40c831c0) [transform=[ 1 0; 0 1; -122 0; ]] [visible=< (x=318, y=-7, w=14, h=14); >] [isFixedPosition]


(gdb) bt
#0  mozilla::layers::LayerManagerOGL::CreateFBOWithTexture (this=<value optimized out>, aRect=..., aInit=<value optimized out>, aFBO=<value optimized out>, aTexture=0xbef835cc) at /home/cjones/mozilla/b2g/gecko/gfx/layers/opengl/LayerManagerOGL.cpp:1139
#1  0x82df9ecc in ContainerRender<mozilla::layers::ContainerLayerOGL> (this=0x40c80ca0, aPreviousFrameBuffer=<value optimized out>, aOffset=<value optimized out>) at /home/cjones/mozilla/b2g/gecko/gfx/layers/opengl/ContainerLayerOGL.cpp:211
#2  mozilla::layers::ContainerLayerOGL::RenderLayer (this=0x40c80ca0, aPreviousFrameBuffer=<value optimized out>, aOffset=<value optimized out>) at /home/cjones/mozilla/b2g/gecko/gfx/layers/opengl/ContainerLayerOGL.cpp:340
#3  0x82dfa0e2 in ContainerRender<mozilla::layers::ContainerLayerOGL> (this=0x40c809e0, aPreviousFrameBuffer=<value optimized out>, aOffset=<value optimized out>) at /home/cjones/mozilla/b2g/gecko/gfx/layers/opengl/ContainerLayerOGL.cpp:247
#4  mozilla::layers::ContainerLayerOGL::RenderLayer (this=0x40c809e0, aPreviousFrameBuffer=<value optimized out>, aOffset=<value optimized out>) at /home/cjones/mozilla/b2g/gecko/gfx/layers/opengl/ContainerLayerOGL.cpp:340
#5  0x82dfa0e2 in ContainerRender<mozilla::layers::ContainerLayerOGL> (this=0x40c82da0, aPreviousFrameBuffer=<value optimized out>, aOffset=<value optimized out>) at /home/cjones/mozilla/b2g/gecko/gfx/layers/opengl/ContainerLayerOGL.cpp:247
#6  mozilla::layers::ContainerLayerOGL::RenderLayer (this=0x40c82da0, aPreviousFrameBuffer=<value optimized out>, aOffset=<value optimized out>) at /home/cjones/mozilla/b2g/gecko/gfx/layers/opengl/ContainerLayerOGL.cpp:340
#7  0x82dfa0e2 in ContainerRender<mozilla::layers::ContainerLayerOGL> (this=0x40c81900, aPreviousFrameBuffer=<value optimized out>, aOffset=<value optimized out>) at /home/cjones/mozilla/b2g/gecko/gfx/layers/opengl/ContainerLayerOGL.cpp:247
#8  mozilla::layers::ContainerLayerOGL::RenderLayer (this=0x40c81900, aPreviousFrameBuffer=<value optimized out>, aOffset=<value optimized out>) at /home/cjones/mozilla/b2g/gecko/gfx/layers/opengl/ContainerLayerOGL.cpp:340
#9  0x82dff038 in mozilla::layers::LayerManagerOGL::Render (this=0x40297c40) at /home/cjones/mozilla/b2g/gecko/gfx/layers/opengl/LayerManagerOGL.cpp:806
#10 0x82dff46a in mozilla::layers::LayerManagerOGL::EndTransaction (this=0x40297c40, aCallback=0x8237cdd1 <mozilla::FrameLayerBuilder::DrawThebesLayer(mozilla::layers::ThebesLayer*, gfxContext*, nsIntRegion const&, nsIntRegion const&, void*)>, aCallbackData=0xbef83be4, aFlags=<value optimized out>) at /home/cjones/mozilla/b2g/gecko/gfx/layers/opengl/LayerManagerOGL.cpp:430
#11 0x823a5e30 in nsDisplayList::PaintForFrame (this=<value optimized out>, aBuilder=0xbef83be4, aCtx=<value optimized out>, aForFrame=0x43b14810, aFlags=5) at /home/cjones/mozilla/b2g/gecko/layout/base/nsDisplayList.cpp:637
#12 0x823a5eda in nsDisplayList::PaintRoot (this=0x0, aBuilder=<value optimized out>, aCtx=0x40000000, aFlags=<value optimized out>) at /home/cjones/mozilla/b2g/gecko/layout/base/nsDisplayList.cpp:542
#13 0x823bd17c in nsLayoutUtils::PaintFrame (aRenderingContext=<value optimized out>, aFrame=0x43b14810, aDirtyRegion=<value optimized out>, aBackstop=<value optimized out>, aFlags=260) at /home/cjones/mozilla/b2g/gecko/layout/base/nsLayoutUtils.cpp:1714
#14 0x823d2fa6 in PresShell::Paint (this=0x402d7ad0, aViewToPaint=0x439bc240, aWidgetToPaint=<value optimized out>, aDirtyRegion=..., aIntDirtyRegion=..., aPaintDefaultBackground=<value optimized out>, aWillSendDidPaint=false) at /home/cjones/mozilla/b2g/gecko/layout/base/nsPresShell.cpp:5492
#15 0x8273dd58 in nsViewManager::RenderViews (this=<value optimized out>, aView=0x439bc240, aWidget=0x43b06350, aRegion=..., aIntRegion=..., aPaintDefaultBackground=false, aWillSendDidPaint=false) at /home/cjones/mozilla/b2g/gecko/view/src/nsViewManager.cpp:418
#16 0x8273de66 in nsViewManager::Refresh (this=0x439f9380, aView=0x439bc240, aWidget=0x43b06350, aRegion=...) at /home/cjones/mozilla/b2g/gecko/view/src/nsViewManager.cpp:393
#17 0x8273f292 in nsViewManager::DispatchEvent (this=0x439f9380, aEvent=0xbef842e0, aView=<value optimized out>, aStatus=<value optimized out>) at /home/cjones/mozilla/b2g/gecko/view/src/nsViewManager.cpp:891
#18 0x8273bbd2 in HandleEvent (aEvent=0xbef842e0) at /home/cjones/mozilla/b2g/gecko/view/src/nsView.cpp:159
#19 0x82c29b04 in nsWindow::DoDraw () at /home/cjones/mozilla/b2g/gecko/widget/gonk/nsWindow.cpp:115
#20 0x82c2817e in nsAppShell::ProcessNextNativeEvent (this=0x402f4970, mayWait=<value optimized out>) at /home/cjones/mozilla/b2g/gecko/widget/gonk/nsAppShell.cpp:668
#21 0x82c2ab18 in nsBaseAppShell::DoProcessNextNativeEvent (this=0x0, mayWait=false) at /home/cjones/mozilla/b2g/gecko/widget/xpwidgets/nsBaseAppShell.cpp:171
#22 0x82c2ae02 in nsBaseAppShell::OnProcessNextEvent (this=0x402f4970, thr=0x40214180, mayWait=false, recursionDepth=<value optimized out>) at /home/cjones/mozilla/b2g/gecko/widget/xpwidgets/nsBaseAppShell.cpp:312
#23 0x82d8f694 in nsThread::ProcessNextEvent (this=0x40214180, mayWait=false, result=<value optimized out>) at /home/cjones/mozilla/b2g/gecko/xpcom/threads/nsThread.cpp:622
#24 0x82d596f2 in NS_ProcessNextEvent_P (thread=0x40214180, mayWait=false) at /home/cjones/mozilla/b2g/gecko/objdir-prof-gonk/xpcom/build/nsThreadUtils.cpp:245
#25 0x82cc956e in mozilla::ipc::MessagePump::Run (this=0x4020f1f0, aDelegate=0x402190e0) at /home/cjones/mozilla/b2g/gecko/ipc/glue/MessagePump.cpp:110
#26 0x82db695a in MessageLoop::RunInternal (this=0x402190e0) at /home/cjones/mozilla/b2g/gecko/ipc/chromium/src/base/message_loop.cc:208
#27 0x82db698e in RunHandler (this=0x0) at /home/cjones/mozilla/b2g/gecko/ipc/chromium/src/base/message_loop.cc:201
#28 MessageLoop::Run (this=0x0) at /home/cjones/mozilla/b2g/gecko/ipc/chromium/src/base/message_loop.cc:175
#29 0x82c2afa2 in nsBaseAppShell::Run (this=0x402f4970) at /home/cjones/mozilla/b2g/gecko/widget/xpwidgets/nsBaseAppShell.cpp:189
#30 0x82aff296 in nsAppStartup::Run (this=0x402867c0) at /home/cjones/mozilla/b2g/gecko/toolkit/components/startup/nsAppStartup.cpp:220
#31 0x8221ae30 in XRE_main (argc=<value optimized out>, argv=<value optimized out>, aAppData=<value optimized out>) at /home/cjones/mozilla/b2g/gecko/toolkit/xre/nsAppRunner.cpp:3537
#32 0x00008af0 in do_main (argc=1, argv=0xbef86c64) at /home/cjones/mozilla/b2g/gecko/b2g/app/nsBrowserApp.cpp:201
#33 main (argc=1, argv=0xbef86c64) at /home/cjones/mozilla/b2g/gecko/b2g/app/nsBrowserApp.cpp:287
Summary: moz-element() fails on gonk on device → Crash in glCopyTexImage2D on Mali-400 MP, when creating an intermediate container-layer fbo during compositing
Possibly relevant:

      OGLCanvasLayer (0x43df6f30) [clip=(x=0, y=0, w=480, h=800)] [transform=[ 1 0; 0 1; 0 24; ]] [visible=< (x=0, y=0, w=480, h=776); >] [isFixedPosition]
      OGLContainerLayer (0x40c809e0) [clip=(x=0, y=0, w=480, h=800)] [transform=[ 1 0; 0 1; 138.445 230.742; ]] [visible=< (x=0, y=0, w=204, h=339); >] [componentAlpha] [isFixedPosition]
        OGLContainerLayer (0x40c80ca0) [visible=< (x=0, y=0, w=204, h=339); >] [opacity=0.423146] [componentAlpha] [isFixedPosition] [usesTmpSurf]

The canvas layer is a WebGL canvas.  The [usesTmpSurf] container is what's trying to CopyTexImage2D, obviously.
This also makes the crash go away.  This means I have no idea why the previous workaround was actually working.

diff --git a/gfx/layers/opengl/LayerManagerOGL.cpp b/gfx/layers/opengl/LayerManagerOGL.cpp
index e70df50..c419132 100644
--- a/gfx/layers/opengl/LayerManagerOGL.cpp
+++ b/gfx/layers/opengl/LayerManagerOGL.cpp
@@ -1125,6 +1125,12 @@ LayerManagerOGL::CreateFBOWithTexture(const nsIntRect& aRect, InitMode aInit,
 {
   GLuint tex, fbo;
 
+
+
+  mGLContext->fBindFramebuffer(LOCAL_GL_FRAMEBUFFER, mBackBufferFBO);
+
+
+
   mGLContext->fActiveTexture(LOCAL_GL_TEXTURE0);

I'll try to decipher the webgl black magic messing things up here, but may need to give up and ask for help.
Shouldn't this function also call MakeCurrent() before doing any GL call?
It sort of makes sense anyway: forgetting to call MakeCurrent and/or BindFramebuffer here could give exactly these symptoms, IIUC.
MakeCurrent() alone doesn't make the crash go away.  However, I'm seeing something that doesn't make sense, tracking down.
Dead end.  Turns out we drag around mBackBufferFBO in LayerManagerOGL forever, even if the GL context is double-buffered and we don't need it.  Whups.  I went ahead and fixed that.

But this *does* mean we're trying to CopyTexImage2D from fb 0, which on b2g is the hardware framebuffer.
Not sure I understand comment 26 fully but the following may be useful info:
- GLContexts are normally backed by FBOs. This means that we never use their real "hardware framebuffer" for any rendering, we render everything to a FBO. Last I checked, there still exist alternative GLContext creation methods, using e.g. a PBuffer instead of a FBO, but they're deprecated and we should probably just remove them, they're just confusing things. I assume we're only using FBOs here.
- GLContext silently maps FBO 0 to its main FBO. This means that if you do gl->fBindFramebuffer(GL_FRAMEBUFFER, 0) it actually does glBindFramebuffer(GL_FRAMEBUFFER, n) where n is the name of the main FBO of this GLContext. This mapping should be transparent to you as a user of the GLContext class.
CC'ing jgilbert who knows these things better.
Oops, comment 27 only applies to offscreen GLContext's but I suppose that the LayerManager has a non-offscreen GLContext so I was off-topic here.
This is b2g on Gonk, remember.  Gecko owns the hardware framebuffer there.  Specifically, we create an EGL window [1] that's backed by buffers allocated from the graphics hal with access USAGE_HW_FB [2].  Then we create a GL context around that EGL window.

From what I can tell, this is a bug somewhere between the the android non-proprietary and vendor-proprietary EGL implementations.  Just "guessing" (not using any other means, *cough*), I would bet that the proprietary driver is trying to find an EGLSurface whose properties to allocate the new texture with, but failing to find one for the bound framebuffer.  I tried the same on another chipset, and hit exactly the same problem.  So that points to it being android-EGL not setting up the FramebufferNativeWindow properly.

ReadPixels works just fine on the framebuffer, so I don't think it's a permission/access problem.

Both those observations would explain why the TexImage2D workaround actually works.

If it's just an android-EGL bug, then luckily on Gonk we control the android-EGL code and can patch it directly :D.

I've spent way too much time on this but am looking into android-EGL.  If I can't find the right button to push, I'll probably just do a gonk-specific workaround as a band-aid.

(Also, there's no reason we should be using a non-pixel-aligned clip here.  That'll be another "fix" on the Gaia side of things.)

[1] http://mxr.mozilla.org/mozilla-central/source/widget/gonk/nsWindow.cpp#77
[2] https://github.com/cgjones/android-frameworks-base/blob/gingerbread-b2g/libs/ui/FramebufferNativeWindow.cpp#L98
er s/clip/transform/
Summary: Crash in glCopyTexImage2D on Mali-400 MP, when creating an intermediate container-layer fbo during compositing → Crash in glCopyTexImage2D on Gonk, when creating an intermediate container-layer fbo during compositing
What appears to be going on here is that the HW framebuffer we're trying to copy from has RGB format, but we're trying to copy it up into an RGBA texture.  As I recall from the GLES spec, it's totally allowed for the driver to vomit this back up at us.

So I think the closest to right fix here is query the format of mFBOTextureTarget, and if it's RGB, then go through a slow(er) copy-up path.
I looked into trying to actually read the internal format of the currently-bound framebuffer, but (1) I'm not even sure we can do this for the default target; (2) even if we can, there's a dizzyingly large set of states the fb can be configured in, reading the internal format would take a lot of code AFAICT.

If there's a magic param I can queried here, I would be more than happy to use it! :)
Attachment #590624 - Flags: review?(joe)
Attachment #586657 - Attachment is obsolete: true
Comment on attachment 590624 [details] [diff] [review]
Don't attempt to CopyTexImage from an RGB framebuffer to an RGBA texture

Review of attachment 590624 [details] [diff] [review]:
-----------------------------------------------------------------

Is it possible to create the framebuffer in RGBA format? Both Jeff and I would prefer that.

Failing that, the below comments apply.

::: gfx/layers/opengl/LayerManagerOGL.cpp
@@ +1136,5 @@
> +GetFrameBufferInternalFormat(GLContext* gl,
> +                             GLuint aCurrentFrameBuffer,
> +                             nsIWidget* aWidget)
> +{
> +  if (0 == aCurrentFrameBuffer) {

r-----------------------------------------------------------------

@@ +1145,5 @@
> +
> +static bool
> +AreFormatsCompatibleForCopyTexImage2D(GLenum aF1, GLenum aF2)
> +{
> +  return (aF1 == aF2);

This should be #ifdef GLES (or whatever the equivalent ifdef is).

@@ +1174,5 @@
> +                                  aRect.x, aRect.y,
> +                                  aRect.width, aRect.height,
> +                                  0);
> +    } else {
> +      // Curses, incompatible formats.  Take a slow path.

Is there evidence that this is slower? It would be nice to have only a single codepath.

@@ +1183,5 @@
> +                              0,
> +                              LOCAL_GL_RGBA,
> +                              LOCAL_GL_UNSIGNED_BYTE,
> +                              NULL);
> +      mGLContext->fCopyTexSubImage2D(mFBOTextureTarget,

My reading of the GLES spec says that CopyTexSubImage2D is also constrained by the requirement that the framebuffer be in RGBA format (i.e., the same format). Is this not the case in your experience?

::: widget/gonk/nsWindow.cpp
@@ +404,5 @@
>              LOG("Could not create OGL LayerManager");
>      } else {
>          MOZ_ASSERT(sFramebufferOpen);
>          mLayerManager = new BasicShadowLayerManager(this);
> +    } 

remove this hunk

@@ +456,5 @@
> +    // We directly map the hardware fb on Gonk.  The hardware fb has
> +    // RGB format.
> +    return (mLayerManager &&
> +            LayerManager::LAYERS_OPENGL == mLayerManager->GetBackendType()) ?
> +        LOCAL_GL_RGB : LOCAL_GL_NONE;

Make this an if for easier reading :)
Attachment #590624 - Flags: review?(joe) → review-
(In reply to Joe Drew (:JOEDREW!) from comment #33)
> Comment on attachment 590624 [details] [diff] [review]
> Don't attempt to CopyTexImage from an RGB framebuffer to an RGBA texture
> 
> Review of attachment 590624 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Is it possible to create the framebuffer in RGBA format? Both Jeff and I
> would prefer that.
> 

Nope.  These buffers are handed to us directly by the gfx driver (gralloc).  We don't have any say in their internal format.

> Failing that, the below comments apply.
> 
> ::: gfx/layers/opengl/LayerManagerOGL.cpp
> @@ +1136,5 @@
> > +GetFrameBufferInternalFormat(GLContext* gl,
> > +                             GLuint aCurrentFrameBuffer,
> > +                             nsIWidget* aWidget)
> > +{
> > +  if (0 == aCurrentFrameBuffer) {
> 
> r-----------------------------------------------------------------
> 

Fixed.

> @@ +1145,5 @@
> > +
> > +static bool
> > +AreFormatsCompatibleForCopyTexImage2D(GLenum aF1, GLenum aF2)
> > +{
> > +  return (aF1 == aF2);
> 
> This should be #ifdef GLES (or whatever the equivalent ifdef is).
> 

Done.

> @@ +1174,5 @@
> > +                                  aRect.x, aRect.y,
> > +                                  aRect.width, aRect.height,
> > +                                  0);
> > +    } else {
> > +      // Curses, incompatible formats.  Take a slow path.
> 
> Is there evidence that this is slower? It would be nice to have only a
> single codepath.
> 

I don't have any, but
 (i) I can imagine CopyTexImage2D being easier to optimize in the driver
 (ii) bjacob requested the two paths

> @@ +1183,5 @@
> > +                              0,
> > +                              LOCAL_GL_RGBA,
> > +                              LOCAL_GL_UNSIGNED_BYTE,
> > +                              NULL);
> > +      mGLContext->fCopyTexSubImage2D(mFBOTextureTarget,
> 
> My reading of the GLES spec says that CopyTexSubImage2D is also constrained
> by the requirement that the framebuffer be in RGBA format (i.e., the same
> format). Is this not the case in your experience?
> 

It WFM, with GL_DEBUG on too.  Per discussion, will wait and see if problems arise later on.

> ::: widget/gonk/nsWindow.cpp
> @@ +404,5 @@
> >              LOG("Could not create OGL LayerManager");
> >      } else {
> >          MOZ_ASSERT(sFramebufferOpen);
> >          mLayerManager = new BasicShadowLayerManager(this);
> > +    } 
> 
> remove this hunk
> 

Whups.

> @@ +456,5 @@
> > +    // We directly map the hardware fb on Gonk.  The hardware fb has
> > +    // RGB format.
> > +    return (mLayerManager &&
> > +            LayerManager::LAYERS_OPENGL == mLayerManager->GetBackendType()) ?
> > +        LOCAL_GL_RGB : LOCAL_GL_NONE;
> 
> Make this an if for easier reading :)

OK.
Comment on attachment 591671 [details] [diff] [review]
Don't attempt to CopyTexImage from an RGB framebuffer to an RGBA texture, v2

Review of attachment 591671 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/layers/opengl/LayerManagerOGL.cpp
@@ +1181,5 @@
> +                                  aRect.x, aRect.y,
> +                                  aRect.width, aRect.height,
> +                                  0);
> +    } else {
> +      // Curses, incompatible formats.  Take a slow path.

Add a comment here  saying that "Technically CopyTexSubImage2D also has the requirement of matching formats, but it doesn't seem to affect us in the real world."

::: widget/gonk/nsWindow.cpp
@@ +453,5 @@
> +PRUint32
> +nsWindow::GetGLFrameBufferFormat()
> +{
> +    if (mLayerManager &&
> +        LayerManager::LAYERS_OPENGL == mLayerManager->GetBackendType()) {

please fix rvalue on lhs here too

::: widget/xpwidgets/nsBaseWidget.cpp
@@ +1262,5 @@
> +PRUint32
> +nsBaseWidget::GetGLFrameBufferFormat()
> +{
> +  if (mLayerManager &&
> +      LayerManager::LAYERS_OPENGL == mLayerManager->GetBackendType()) {

please fix rvalue on lhs here too
Attachment #591671 - Flags: review?(joe) → review+
https://hg.mozilla.org/mozilla-central/rev/087c753880cb
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla12
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: