Closed Bug 837591 Opened 11 years ago Closed 11 years ago

Use GrallocBuffer as WebGL rendering target to boost performance

Categories

(Firefox OS Graveyard :: General, defect, P2)

ARM
Gonk (Firefox OS)
defect

Tracking

(blocking-b2g:-, b2g18+ affected, b2g18-v1.0.1 affected)

RESOLVED WORKSFORME
blocking-b2g -
Tracking Status
b2g18 + affected
b2g18-v1.0.1 --- affected

People

(Reporter: chiajung, Assigned: chiajung)

References

Details

(Keywords: perf, Whiteboard: [c= p= s=2013.11.08 u=] [tech-p1])

Attachments

(1 file, 7 obsolete files)

Attached patch WIP Patch V1 (obsolete) — Splinter Review
For better WebGL performance on FirefoxOS, we can use Gralloc buffers as WebGL rendering target and share the buffer between Content and Chrome process to avoid any means of memcpy.

I had wrote a small patch with the trick, which can boost the performance:

Crystal Skull:
Before: 5~9 FPS
After: 15~20 FPS

I will start to clean this patch up for review and merge.
Assignee: nobody → chung
Attached patch Patch V1 (obsolete) — Splinter Review
Here is the cleaned up patch without GraphicBuffer usage change for safety.

In this patch, the basic idea to speed things up is to render WebGL content to a shareable buffer directly and use that buffer for compositon directly. 

For GraphicBuffer usage part, it is nice to have but quiet risky at this time of point. Since gralloc can optimize the buffer based the usage flag.

I think we should fix that someday.
Attachment #709623 - Attachment is obsolete: true
Attachment #710026 - Flags: review?(vladimir)
Comment on attachment 710026 [details] [diff] [review]
Patch V1

Review of attachment 710026 [details] [diff] [review]:
-----------------------------------------------------------------

We need to assure that this behaves properly with WebGL's alpha, preserveDrawingBuffer, and resize mechanics. This would also break AA stuff if we actually implemented that on GLES, but this is probably acceptable for the short term.

::: gfx/gl/GLContextProviderEGL.cpp
@@ +409,4 @@
>          return true;
>      }
>  
> +    bool BindExternalBuffer(GLuint texture, void* buffer, GLenum target)

This function no longer necessarily binds an external buffer. I would advise renaming this to something like AttachNativeBufferToTexture.

@@ +2579,5 @@
>      nsRefPtr<GLContextEGL> glContext =
>          GLContextEGL::CreateEGLPBufferOffscreenContext(pbufferSize, aFormat, !usePBuffers);
>  
> +#ifdef MOZ_WIDGET_GONK
> +    sEGLLibrary.LoadConfigSensitiveSymbols();

Why is this needed here? This function should be being called at EGL library init time. Historically, we had this because this was where we loaded glEGLImageTargetTexture2D, but now that that's in GLContext instead, we should be loading EGLImage exts at the same time as other EGL exts.

::: gfx/layers/basic/BasicCanvasLayer.cpp
@@ +327,5 @@
> +      if (!mBackTexture)
> +        mGLContext->fGenTextures(1, &mBackTexture);
> +      mGLContext->BindExternalBuffer(mBackTexture, grfxBuffer->getNativeBuffer(), LOCAL_GL_TEXTURE_2D);
> +
> +      mGLContext->fBindFramebuffer(LOCAL_GL_FRAMEBUFFER, mGLContext->GetOffscreenFBO());

GetOffscreenFBO should always be 0. That function should be considered deprecated.

@@ +328,5 @@
> +        mGLContext->fGenTextures(1, &mBackTexture);
> +      mGLContext->BindExternalBuffer(mBackTexture, grfxBuffer->getNativeBuffer(), LOCAL_GL_TEXTURE_2D);
> +
> +      mGLContext->fBindFramebuffer(LOCAL_GL_FRAMEBUFFER, mGLContext->GetOffscreenFBO());
> +      mGLContext->fFramebufferTexture2D(LOCAL_GL_FRAMEBUFFER, LOCAL_GL_COLOR_ATTACHMENT0, LOCAL_GL_TEXTURE_2D, mBackTexture, 0);

We need to be sure this handles resizes correctly.
Also we probably want to leave a comment about how we're not actually trying to use FramebufferTexture2D on the default framebuffer. (framebuffer 0)

@@ +463,5 @@
> +#ifdef MOZ_WIDGET_GONK
> +  if (!IsDirty())
> +    return;
> +  Painted();
> +  mGLContext->fFlush();

I am not confident that Flush is sufficient in the general case.

::: gfx/layers/opengl/CanvasLayerOGL.cpp
@@ +530,5 @@
>                                                  true,
>                                                  GetMaskLayer() ? Mask2d : MaskNone);
>    } else {
> +#ifdef MOZ_WIDGET_GONK
> +    ShaderProgramType programType = RGBALayerProgramType;

Why do we set this separately for Gonk?
Attachment #710026 - Flags: review-
I would worry about not specifying HW_RENDER -- technically that could mean that we'd get back a buffer for which hardware rendering is not allowed.  (It might just happen to work on the QC chipset, but might not elsewhere.)

Where is the GraphicBuffer actually created?  (That is, who calls that Create function?)
(In reply to Jeff Gilbert [:jgilbert] from comment #2)
> Comment on attachment 710026 [details] [diff] [review]
> Patch V1
> 
> Review of attachment 710026 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> We need to assure that this behaves properly with WebGL's alpha,
> preserveDrawingBuffer, and resize mechanics. This would also break AA stuff
> if we actually implemented that on GLES, but this is probably acceptable for
> the short term.
> 
> ::: gfx/gl/GLContextProviderEGL.cpp
> @@ +409,4 @@
> >          return true;
> >      }
> >  
> > +    bool BindExternalBuffer(GLuint texture, void* buffer, GLenum target)
> 
> This function no longer necessarily binds an external buffer. I would advise
> renaming this to something like AttachNativeBufferToTexture.
Do you mean I rename it in this patch?
> 
> @@ +2579,5 @@
> >      nsRefPtr<GLContextEGL> glContext =
> >          GLContextEGL::CreateEGLPBufferOffscreenContext(pbufferSize, aFormat, !usePBuffers);
> >  
> > +#ifdef MOZ_WIDGET_GONK
> > +    sEGLLibrary.LoadConfigSensitiveSymbols();
> 
> Why is this needed here? This function should be being called at EGL library
> init time. Historically, we had this because this was where we loaded
> glEGLImageTargetTexture2D, but now that that's in GLContext instead, we
> should be loading EGLImage exts at the same time as other EGL exts.
Not exactly. If we create the GLContext *not* from a nsIWidget, those symbol are not loaded. Call those function in WebGL without this crashes the process when trying to call fCreateImage.

> 
> ::: gfx/layers/basic/BasicCanvasLayer.cpp
> @@ +327,5 @@
> > +      if (!mBackTexture)
> > +        mGLContext->fGenTextures(1, &mBackTexture);
> > +      mGLContext->BindExternalBuffer(mBackTexture, grfxBuffer->getNativeBuffer(), LOCAL_GL_TEXTURE_2D);
> > +
> > +      mGLContext->fBindFramebuffer(LOCAL_GL_FRAMEBUFFER, mGLContext->GetOffscreenFBO());
> 
> GetOffscreenFBO should always be 0. That function should be considered
> deprecated.
Yes, but I found others just use it this way, so I follow them :p
> 
> @@ +328,5 @@
> > +        mGLContext->fGenTextures(1, &mBackTexture);
> > +      mGLContext->BindExternalBuffer(mBackTexture, grfxBuffer->getNativeBuffer(), LOCAL_GL_TEXTURE_2D);
> > +
> > +      mGLContext->fBindFramebuffer(LOCAL_GL_FRAMEBUFFER, mGLContext->GetOffscreenFBO());
> > +      mGLContext->fFramebufferTexture2D(LOCAL_GL_FRAMEBUFFER, LOCAL_GL_COLOR_ATTACHMENT0, LOCAL_GL_TEXTURE_2D, mBackTexture, 0);
> 
> We need to be sure this handles resizes correctly.
> Also we probably want to leave a comment about how we're not actually trying
> to use FramebufferTexture2D on the default framebuffer. (framebuffer 0)
For resize staff, I think if we got a buffer with different size, we will kill the old one, and just use the new one in:
http://mxr.mozilla.org/mozilla-central/source/gfx/layers/opengl/CanvasLayerOGL.cpp#404

> 
> @@ +463,5 @@
> > +#ifdef MOZ_WIDGET_GONK
> > +  if (!IsDirty())
> > +    return;
> > +  Painted();
> > +  mGLContext->fFlush();
> 
> I am not confident that Flush is sufficient in the general case.
Yes, I think it maybe OK to add a glFinish. And that do not hurt much in my test case.
> 
> ::: gfx/layers/opengl/CanvasLayerOGL.cpp
> @@ +530,5 @@
> >                                                  true,
> >                                                  GetMaskLayer() ? Mask2d : MaskNone);
> >    } else {
> > +#ifdef MOZ_WIDGET_GONK
> > +    ShaderProgramType programType = RGBALayerProgramType;
> 
> Why do we set this separately for Gonk?
Basically, we should separate the render frontend for this buffer to make things right. And this remind me that this patch may kill 2D canvas! Let me fix them in next version!
(In reply to Vladimir Vukicevic [:vlad] [:vladv] from comment #3)
> I would worry about not specifying HW_RENDER -- technically that could mean
> that we'd get back a buffer for which hardware rendering is not allowed. 
> (It might just happen to work on the QC chipset, but might not elsewhere.)
> 
> Where is the GraphicBuffer actually created?  (That is, who calls that
> Create function?)

The call path is rather long...

here is the entry point:
http://mxr.mozilla.org/mozilla-central/source/gfx/layers/basic/BasicCanvasLayer.cpp#431

A very simple change to fix this without bother other staff may be "new GraphicBuffer" instead of AllocBuffer but I think that is depreciated. Other ways is: 1) to use other protocal to allocate the real buffer or 2) change the protocal to pass the flag. Anyway, it would be more risky and may affect other staffs.
(In reply to Chiajung Hung [:chiajung] from comment #5)
> here is the entry point:
> http://mxr.mozilla.org/mozilla-central/source/gfx/layers/basic/
> BasicCanvasLayer.cpp#431

Ah, I forgot to check basic layers.  So, why not call AllocBufferWithCaps, with introducing a new caps bit for HW texturing.  Then if that bit is present, ShadowLayerForwarder::PlatformAllocBuffer can use the second PGrallocBufferConstructor that already allows for explicit gralloc usage flags to be passed... the pixel format can be obtained from the content in the same way that the code there does.  That seems like it would let us get the right usage flags down.
(In reply to Vladimir Vukicevic [:vlad] [:vladv] from comment #6)
> (In reply to Chiajung Hung [:chiajung] from comment #5)
> > here is the entry point:
> > http://mxr.mozilla.org/mozilla-central/source/gfx/layers/basic/
> > BasicCanvasLayer.cpp#431
> 
> Ah, I forgot to check basic layers.  So, why not call AllocBufferWithCaps,
> with introducing a new caps bit for HW texturing.  Then if that bit is
> present, ShadowLayerForwarder::PlatformAllocBuffer can use the second
> PGrallocBufferConstructor that already allows for explicit gralloc usage
> flags to be passed... the pixel format can be obtained from the content in
> the same way that the code there does.  That seems like it would let us get
> the right usage flags down.
Since this one is *not* for PLayers. GrallocBuffer has 4 managers, and each of them use different protocal to create it. That one is used for PImageBridge/PImageContainer but not PLayers. That is I tried to hack around this morning :P
(In reply to Chiajung Hung [:chiajung] from comment #4)
> (In reply to Jeff Gilbert [:jgilbert] from comment #2)
> > Comment on attachment 710026 [details] [diff] [review]
> > Patch V1
> > 
> > Review of attachment 710026 [details] [diff] [review]:
> > -----------------------------------------------------------------
> > 
> > We need to assure that this behaves properly with WebGL's alpha,
> > preserveDrawingBuffer, and resize mechanics. This would also break AA stuff
> > if we actually implemented that on GLES, but this is probably acceptable for
> > the short term.
> > 
> > ::: gfx/gl/GLContextProviderEGL.cpp
> > @@ +409,4 @@
> > >          return true;
> > >      }
> > >  
> > > +    bool BindExternalBuffer(GLuint texture, void* buffer, GLenum target)
> > 
> > This function no longer necessarily binds an external buffer. I would advise
> > renaming this to something like AttachNativeBufferToTexture.
> Do you mean I rename it in this patch?
Since we are changing its behavior in this patch, yes.
> > 
> > @@ +2579,5 @@
> > >      nsRefPtr<GLContextEGL> glContext =
> > >          GLContextEGL::CreateEGLPBufferOffscreenContext(pbufferSize, aFormat, !usePBuffers);
> > >  
> > > +#ifdef MOZ_WIDGET_GONK
> > > +    sEGLLibrary.LoadConfigSensitiveSymbols();
> > 
> > Why is this needed here? This function should be being called at EGL library
> > init time. Historically, we had this because this was where we loaded
> > glEGLImageTargetTexture2D, but now that that's in GLContext instead, we
> > should be loading EGLImage exts at the same time as other EGL exts.
> Not exactly. If we create the GLContext *not* from a nsIWidget, those symbol
> are not loaded. Call those function in WebGL without this crashes the
> process when trying to call fCreateImage.
The initialization for these symbols should be changed to be loaded in all cases. I think I had a patch to do this at some point, but it got lost.
> 
> > 
> > @@ +328,5 @@
> > > +        mGLContext->fGenTextures(1, &mBackTexture);
> > > +      mGLContext->BindExternalBuffer(mBackTexture, grfxBuffer->getNativeBuffer(), LOCAL_GL_TEXTURE_2D);
> > > +
> > > +      mGLContext->fBindFramebuffer(LOCAL_GL_FRAMEBUFFER, mGLContext->GetOffscreenFBO());
> > > +      mGLContext->fFramebufferTexture2D(LOCAL_GL_FRAMEBUFFER, LOCAL_GL_COLOR_ATTACHMENT0, LOCAL_GL_TEXTURE_2D, mBackTexture, 0);
> > 
> > We need to be sure this handles resizes correctly.
> > Also we probably want to leave a comment about how we're not actually trying
> > to use FramebufferTexture2D on the default framebuffer. (framebuffer 0)
> For resize staff, I think if we got a buffer with different size, we will
> kill the old one, and just use the new one in:
> http://mxr.mozilla.org/mozilla-central/source/gfx/layers/opengl/
> CanvasLayerOGL.cpp#404
Before we attach a new texture as the FB0 backing, we need to copy the current contents of the buffer across. Just do a `BlitFramebufferToTexture(0, mBackBuffer)` before the FramebufferTexture2D.
(In reply to Jeff Gilbert [:jgilbert] from comment #8)
> (In reply to Chiajung Hung [:chiajung] from comment #4)
> > (In reply to Jeff Gilbert [:jgilbert] from comment #2)
> > > Comment on attachment 710026 [details] [diff] [review]
> > > Patch V1
> > > 
> > > Review of attachment 710026 [details] [diff] [review]:
> > > -----------------------------------------------------------------
> > > 
> > > We need to assure that this behaves properly with WebGL's alpha,
> > > preserveDrawingBuffer, and resize mechanics. This would also break AA stuff
> > > if we actually implemented that on GLES, but this is probably acceptable for
> > > the short term.
> > > 
> > > ::: gfx/gl/GLContextProviderEGL.cpp
> > > @@ +409,4 @@
> > > >          return true;
> > > >      }
> > > >  
> > > > +    bool BindExternalBuffer(GLuint texture, void* buffer, GLenum target)
> > > 
> > > This function no longer necessarily binds an external buffer. I would advise
> > > renaming this to something like AttachNativeBufferToTexture.
> > Do you mean I rename it in this patch?
> Since we are changing its behavior in this patch, yes.
Sure. Let me got it done in next patch, too.
> > > 
> > > @@ +2579,5 @@
> > > >      nsRefPtr<GLContextEGL> glContext =
> > > >          GLContextEGL::CreateEGLPBufferOffscreenContext(pbufferSize, aFormat, !usePBuffers);
> > > >  
> > > > +#ifdef MOZ_WIDGET_GONK
> > > > +    sEGLLibrary.LoadConfigSensitiveSymbols();
> > > 
> > > Why is this needed here? This function should be being called at EGL library
> > > init time. Historically, we had this because this was where we loaded
> > > glEGLImageTargetTexture2D, but now that that's in GLContext instead, we
> > > should be loading EGLImage exts at the same time as other EGL exts.
> > Not exactly. If we create the GLContext *not* from a nsIWidget, those symbol
> > are not loaded. Call those function in WebGL without this crashes the
> > process when trying to call fCreateImage.
> The initialization for these symbols should be changed to be loaded in all
> cases. I think I had a patch to do this at some point, but it got lost.
In fact I think we should load all available functions no matter we need it...
> > 
> > > 
> > > @@ +328,5 @@
> > > > +        mGLContext->fGenTextures(1, &mBackTexture);
> > > > +      mGLContext->BindExternalBuffer(mBackTexture, grfxBuffer->getNativeBuffer(), LOCAL_GL_TEXTURE_2D);
> > > > +
> > > > +      mGLContext->fBindFramebuffer(LOCAL_GL_FRAMEBUFFER, mGLContext->GetOffscreenFBO());
> > > > +      mGLContext->fFramebufferTexture2D(LOCAL_GL_FRAMEBUFFER, LOCAL_GL_COLOR_ATTACHMENT0, LOCAL_GL_TEXTURE_2D, mBackTexture, 0);
> > > 
> > > We need to be sure this handles resizes correctly.
> > > Also we probably want to leave a comment about how we're not actually trying
> > > to use FramebufferTexture2D on the default framebuffer. (framebuffer 0)
> > For resize staff, I think if we got a buffer with different size, we will
> > kill the old one, and just use the new one in:
> > http://mxr.mozilla.org/mozilla-central/source/gfx/layers/opengl/
> > CanvasLayerOGL.cpp#404
> Before we attach a new texture as the FB0 backing, we need to copy the
> current contents of the buffer across. Just do a
> `BlitFramebufferToTexture(0, mBackBuffer)` before the FramebufferTexture2D.
I am wondering why we need this? Since we just switch the buffer for Compositor, and Content side will always get the newest fresh buffer with correct size to be render. And in current sync transactions, we can not change dimension and composite at the same time.
(In reply to Vladimir Vukicevic [:vlad] [:vladv] from comment #3)
> I would worry about not specifying HW_RENDER -- technically that could mean
> that we'd get back a buffer for which hardware rendering is not allowed. 
> (It might just happen to work on the QC chipset, but might not elsewhere.)
> 
> Where is the GraphicBuffer actually created?  (That is, who calls that
> Create function?)

By the way, I think the HW flags should be rather safe in fact. Since those SW flags will prevent any possible optimization for HW access in gralloc_dev.

Basically, the SW flag *should* make the CPU related memory cache enabled, and when doing anything to alter the content, a full cache maintain should occurs to avoid artifact I think.

Well, that just some experience from my last job :)
And if you are concerning that part, I think we should default to SW_READ | SW_WRITE | HW_RENDER | HW_TEXTURE as the bad interface design make me hard to specify it myself in many case.
Yeah, I agree with that... I was just worried about a driver that checks for HW_RENDER and just disallows rendering.

However, can we default to that?  Does adding HW_RENDER change the bgr/rgb layout like you saw before, or does that only happen without the SW_* flags?  I'm just really worried about changing anything that affects non-canvas.  Doing that will make it much harder to get this in for 1.0.x.
(In reply to Vladimir Vukicevic [:vlad] [:vladv] from comment #11)
> Yeah, I agree with that... I was just worried about a driver that checks for
> HW_RENDER and just disallows rendering.
> 
> However, can we default to that?  Does adding HW_RENDER change the bgr/rgb
> layout like you saw before, or does that only happen without the SW_* flags?
> I'm just really worried about changing anything that affects non-canvas. 
> Doing that will make it much harder to get this in for 1.0.x.

The real magic which make the color space problem is Thebes related stuff I think, since we default gralloc textures to BGRA types in 
http://mxr.mozilla.org/mozilla-central/source/gfx/gl/GLContextProviderEGL.cpp#1273

The cause of the problem is because I tried to fix the program type based on the usage. But since PLayers::AllocGrallocBuffer use a version without usage, we can not handle that correct at all...As a result, I tried to alter the protocal and got things more complicated and error prone...
Attached patch WIP V2 (obsolete) — Splinter Review
This patch fixes the frontend related problem. As Layer staffs make things complicate, I tried many solution to make it work.

Some small note:
In fact we do not share SurfaceDescriptor inter-process but the GrallocBuffer. So it is not easy to add a frontend member into SurfaceDescriptor. As a result, I decide to describe that in out transaction operations.

In next version, I would like to fix all Jeff mentioned problems, and change the frontend from int type to a enumeration.
Attachment #710026 - Attachment is obsolete: true
Attachment #710026 - Flags: review?(vladimir)
Attached patch Patch V2 (obsolete) — Splinter Review
Here is a small changeset from V1:

- Fix Canvas2D
- Add frontend type for canvas transaction
- Add glFinish for safe
  - Note: This cost 2~3 fps in CrystalSkull and 2~5 fps in glClear test
- Change the function name for external texture and fix related code
Attachment #710088 - Attachment is obsolete: true
Attachment #711139 - Flags: review?(jgilbert)
Comment on attachment 711139 [details] [diff] [review]
Patch V2

Review of attachment 711139 [details] [diff] [review]:
-----------------------------------------------------------------

I still don't get it why we need to pass the frontend type around :-/

::: gfx/gl/GLContext.h
@@ +405,4 @@
>      void ApplyFilterToBoundTexture(GLuint aTarget,
>                                     gfxPattern::GraphicsFilter aFilter);
>  
> +    virtual bool BindExternalTexture(GLenum target, GLuint texture, void* buffer ) { return false; }

This method is no longer for only "External" buffer. I think something like jgilbert's AttachNativeBufferToTexture or BindNativeBufferToTexture is better.
(In reply to Kan-Ru Chen [:kanru] from comment #15)
> Comment on attachment 711139 [details] [diff] [review]
> Patch V2
> 
> Review of attachment 711139 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> I still don't get it why we need to pass the frontend type around :-/
Since output of 2D Canvas is BGRA now. We have to select the correct shader type at Compositor side to show the content correctly. 
(NOTE: If SkiaGL enabled, this would become redundant though...)

Or we can change the output format of 2D Canvas to RGBA. But I don't know how to do that now...:S

By the way, if we can change the output format of thebes from BGRA to RGBA simply, I think we should change ThebesLayer output format for B2G, too. Since we output RGBA from compositor, and a BGRA input need a swizzling operation for each fragment which *should* slightly expensive then assignment.
> 
> ::: gfx/gl/GLContext.h
> @@ +405,4 @@
> >      void ApplyFilterToBoundTexture(GLuint aTarget,
> >                                     gfxPattern::GraphicsFilter aFilter);
> >  
> > +    virtual bool BindExternalTexture(GLenum target, GLuint texture, void* buffer ) { return false; }
> 
> This method is no longer for only "External" buffer. I think something like
> jgilbert's AttachNativeBufferToTexture or BindNativeBufferToTexture is
> better.
Hmm...Since this function is used as a utility to create external image from native buffer and bind it into texture, I think the name is adequate. But I agree BindNativeBufferToTexture is as good, too.
Let me change to this name :P
Attached patch Patch V3 (obsolete) — Splinter Review
Fix the function name.
Attachment #711139 - Attachment is obsolete: true
Attachment #711139 - Flags: review?(jgilbert)
Attachment #711159 - Flags: review?(jgilbert)
This seems to lock up the device often on the 2nd webgl load, especially on a new page in the browser.  There's often console spam that looks like:

W/Adreno200-GSL(  430): <gsl_ldd_control:225>: ioctl code 0xc0140910 (IOCTL_KGSL_RINGBUFFER_ISSUEIBCMDS) failed: errno 16 Device or resource busy
W/Adreno200-GSL(  117): <gsl_ldd_control:225>: ioctl code 0xc0140910 (IOCTL_KGSL_RINGBUFFER_ISSUEIBCMDS) failed: errno 16 Device or resource busy
W/Adreno200-EGL(  117): <qeglDrvAPI_eglSwapBuffers:3451>: EGL_BAD_ALLOC
Rather no, that starts with:

W/Adreno200-GSL(  454): <gsl_ldd_control:225>: ioctl code 0x400c0907 (IOCTL_KGSL_DEVICE_WAITTIMESTAMP_CTXTID) failed: errno 110 Connection timed out
W/Adreno200-GSL(  454): <gsl_ldd_control:225>: ioctl code 0xc0140910 (IOCTL_KGSL_RINGBUFFER_ISSUEIBCMDS) failed: errno 16 Device or resource busy
W/Adreno200-ES20(  454): <gl2_surface_swap:43>: GL_OUT_OF_MEMORY
W/Adreno200-EGL(  454): <qeglDrvAPI_eglSwapBuffers:3451>: EGL_BAD_ALLOC
W/Adreno200-GSL(  594): <gsl_ldd_control:225>: ioctl code 0xc0140910 (IOCTL_KGSL_RINGBUFFER_ISSUEIBCMDS) failed: errno 16 Device or resource busy
W/Adreno200-ES20(  594): <finish_current_fbo_rendering:141>: GL_OUT_OF_MEMORY
W/Adreno200-GSL(  594): <gsl_ldd_control:225>: ioctl code 0xc0140910 (IOCTL_KGSL_RINGBUFFER_ISSUEIBCMDS) failed: errno 16 Device or resource busy
Diego, any idea what the ioctl errors above might mean?  the WAITTIMESTAMP_CTXID one is the first one that we see.
I think the real start is:

01-08 23:43:48.309   475   475 E libgenlock: perform_lock_unlock_operation: GENLOCK_IOC_LOCK failed (lockType0x1, err=Connection timed out fd=85)
01-08 23:43:48.309   475   475 E msm7627a.gralloc: gralloc_lock: genlock_lock_buffer (lockType=0x2) failed
01-08 23:43:48.309   475   475 W GraphicBufferMapper: lock(...) failed -22 (Invalid argument)
01-08 23:43:48.319   475   496 W Adreno200-GSL: <gsl_ldd_control:225>: ioctl code 0xc0140910 (IOCTL_KGSL_RINGBUFFER_ISSUEIBCMDS) failed: errno 16 Device or resource busy
01-08 23:43:48.319   475   496 W Adreno200-EGL: <qeglDrvAPI_eglSwapBuffers:3451>: EGL_BAD_ALLOC
01-08 23:43:48.979   612   612 E libgenlock: perform_lock_unlock_operation: GENLOCK_IOC_LOCK failed (lockType0x1, err=Connection timed out fd=28)
01-08 23:43:48.979   612   612 E msm7627a.gralloc: gralloc_lock: genlock_lock_buffer (lockType=0x2) failed
01-08 23:43:48.979   612   612 W GraphicBufferMapper: lock(...) failed -22 (Invalid argument)
01-08 23:43:49.019   475   496 W Adreno200-GSL: <gsl_ldd_control:225>: ioctl code 0xc0140910 (IOCTL_KGSL_RINGBUFFER_ISSUEIBCMDS) failed: errno 16 Device or resource busy

This is a common problem when using such streaming texture trick. I think I have to delete related textures when not using to workaround this.
And since the blocked thread is <475, 475>, the blocker function must be ShadowCanvasLayerOGL::Swap. Let me check that part.
Comment on attachment 711159 [details] [diff] [review]
Patch V3

Review of attachment 711159 [details] [diff] [review]:
-----------------------------------------------------------------

With this patch I clear improvement in webgl speed

* My glClear() test goes from 12 FPS to 60 FPS
* Crystal skull goes from 8 FPS to 17 FPS

There are some stability issues:

* Gecko runs out of pmem just by opening settings
* WebGL colors are wrong. Crystal skull looks blue

If we can figure the issues above we'll be in very good shape!
Blocks: 835058
> * WebGL colors are wrong. Crystal skull looks blue

BGRA vs RGBA?
Something like that. Could this patch be skipping some conversion or setting the wrong format?
I think GLContext::ChooseGLFormats is used for selecting the pixel format for FBO targets. And I think it currently selects BGRA if supported. You could try returning RGBA unconditionally from that function. And you could try removing all calls to SwapRAndBComponents from GLContext methods.
Could be related to hwc hacks.
(In reply to Chris Jones [:cjones] [:warhammer] from comment #27)
> Could be related to hwc hacks.

That is correct. [1] is probably the culprit. I'll update it if/when this lands

[1] https://www.codeaurora.org/gitweb/quic/b2g/?p=b2g/build.git;a=commit;h=088c3bea92c9b66b0086180e6be62fbda5ba5731
(In reply to Diego Wilson [:diego] from comment #23)
> Comment on attachment 711159 [details] [diff] [review]
> Patch V3
> 
> Review of attachment 711159 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> With this patch I clear improvement in webgl speed
> 
> * My glClear() test goes from 12 FPS to 60 FPS
> * Crystal skull goes from 8 FPS to 17 FPS
> 
> There are some stability issues:
> 
> * Gecko runs out of pmem just by opening settings
> * WebGL colors are wrong. Crystal skull looks blue
Ahh...why settings related to this patch? As I tried to limit the change to WebGL related only...

On my device, the color is correct in CrystalSkull but I noticed that the color is wrong in card view...:S
> 
> If we can figure the issues above we'll be in very good shape!

The most annoying problem is that mentioned by Vlad. I got a full log and I think it is de-init path related.

I will review when we will destroy the front buffer when de-init later.
(In reply to Vladimir Vukicevic [:vlad] [:vladv] from comment #19)
> Rather no, that starts with:
> 
> W/Adreno200-GSL(  454): <gsl_ldd_control:225>: ioctl code 0x400c0907
> (IOCTL_KGSL_DEVICE_WAITTIMESTAMP_CTXTID) failed: errno 110 Connection timed
> out
> W/Adreno200-GSL(  454): <gsl_ldd_control:225>: ioctl code 0xc0140910
> (IOCTL_KGSL_RINGBUFFER_ISSUEIBCMDS) failed: errno 16 Device or resource busy
> W/Adreno200-ES20(  454): <gl2_surface_swap:43>: GL_OUT_OF_MEMORY
> W/Adreno200-EGL(  454): <qeglDrvAPI_eglSwapBuffers:3451>: EGL_BAD_ALLOC
> W/Adreno200-GSL(  594): <gsl_ldd_control:225>: ioctl code 0xc0140910
> (IOCTL_KGSL_RINGBUFFER_ISSUEIBCMDS) failed: errno 16 Device or resource busy
> W/Adreno200-ES20(  594): <finish_current_fbo_rendering:141>: GL_OUT_OF_MEMORY
> W/Adreno200-GSL(  594): <gsl_ldd_control:225>: ioctl code 0xc0140910
> (IOCTL_KGSL_RINGBUFFER_ISSUEIBCMDS) failed: errno 16 Device or resource busy

As I am trying to make the patch slim base on Kanru's suggestion, I found I can reproduce this symptom on Unagi by kill the CanvasLayerOGL::mTexImage after RenderLayer. Which in fact release the reference of GraphicBuffer and introduce some flickering in CrystalSkull. And then you will got similar log from logcat.
Attached patch Patch V4 (obsolete) — Splinter Review
changeset:

- Clear out #ifdef MOZ_WIDGET_GONK
- Merge frontend type into SurfaceDescriptorGralloc 
- Keep user bounded FBOs when change FBO for backend control
Attachment #711159 - Attachment is obsolete: true
Attachment #711159 - Flags: review?(jgilbert)
(In reply to Diego Wilson [:diego] from comment #23)
> Comment on attachment 711159 [details] [diff] [review]
> Patch V3
> 
> Review of attachment 711159 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> With this patch I clear improvement in webgl speed
> 
> * My glClear() test goes from 12 FPS to 60 FPS
> * Crystal skull goes from 8 FPS to 17 FPS
> 
> There are some stability issues:
> 
> * Gecko runs out of pmem just by opening settings
This one should be 836614 (thx Kanru to point out)
> * WebGL colors are wrong. Crystal skull looks blue
> 
> If we can figure the issues above we'll be in very good shape!
After review the buffer path, I think the problem should independent of this patch...or I misunderstood something.

Here is my observation:
Thread#1 -- Compositor: this thread tries to call renderLayer anytime it needed
Thread#2 -- GL updater: this thread exist in Chrome side, when eglSwapBuffer occurred, this thread will queue the composition result to FramebufferNativeWindow
Thread#3 -- I/O thread: this thread calls CanvasLayerOGL::Swap to release old FrontBuffer and assign new one 

The problem here is that we have no synchronize mechanism to check that we can use the buffer safely (on Thread#1 and #2), and no check to make sure we can delete the texture (on Thread #3). 

So we can hit this when content side trigger Thread#3 to release old texture while Compositor still work on the texture.

If this observation is valid, we can
1) setup a flag to indicate renderLayer is still running
2) calls glFinish in renderLayer to make sure the GL will done with the texture when level renderLayer
3) check the flag in swap
> 2) calls glFinish in renderLayer to make sure the GL will done with the
> texture when level renderLayer
2) calls glFinish in renderLayer to make sure the GL drive is done with the texture when leave renderLayer.
> 3) check the flag in swap

Some typo :p
Attached patch Patch V4.1 (obsolete) — Splinter Review
To solve the BAD_ALLOC problem, I reviewed destroy path and made the mBackTexture sync its state with mBackBuffer to make sure the texture is accessed only when mBackBuffer is accessible.

Since I can not reproduce the problem on my Unagi, and I have no GeekPhone in hand, I will check whether I can got one in Taipei office for debug the problem in more detail later.
Attachment #711660 - Attachment is obsolete: true
Chiajung, there's no GeekPhone in Taipei.
I can try it for you, though unlikely to be able to do it before Monday/Tuesday.  However, we were seeing the problem on Unagi as well, just rarer..
(In reply to James Ho from comment #36)
> Chiajung, there's no GeekPhone in Taipei.

OK, let me try to reproduce this on Unagi...

(In reply to Vladimir Vukicevic [:vlad] [:vladv] from comment #37)
> I can try it for you, though unlikely to be able to do it before
> Monday/Tuesday.  However, we were seeing the problem on Unagi as well, just
> rarer..

Can you provide the test case/website for me? I tried to test it with the glClear test and can not reproduce it by refresh the page endlessly in several minutes. And since the CanvasLayer affect 2D Canvas as well, I tried GUIBenchmark but no lucky...
Attached patch Patch V4.2Splinter Review
Change log:
- Fix a GL_FRAMEBUFFER_INCOMPLETE bug when page reload
  - When reload a page with <canvas>:
    1) SetBackBuffer called with a new valid SurfaceDescriptor
    2) SetBackBuffer called with a invalid SurfaceDescriptor
    3) BasicCanvasLayer::Paint called (NOT BasicShadowableCanvasLayer) and cause glError 0x0506. 
  - Rollback to default render target if invalid back buffer set
Attachment #714253 - Attachment is obsolete: true
(In reply to Vladimir Vukicevic [:vlad] [:vladv] from comment #37)
> I can try it for you, though unlikely to be able to do it before
> Monday/Tuesday.  However, we were seeing the problem on Unagi as well, just
> rarer..

By the way, if you get time to test this patch, please help to log with

adb logcat -v threadtime 2>&1 | tee log.txt

This will log the tid at the same time to help clarify the problem.
(In reply to Chiajung Hung [:chiajung] from comment #21)
> I think the real start is:
> 
> 01-08 23:43:48.309   475   475 E libgenlock: perform_lock_unlock_operation:
> GENLOCK_IOC_LOCK failed (lockType0x1, err=Connection timed out fd=85)
> 01-08 23:43:48.309   475   475 E msm7627a.gralloc: gralloc_lock:
> genlock_lock_buffer (lockType=0x2) failed
> 01-08 23:43:48.309   475   475 W GraphicBufferMapper: lock(...) failed -22
> (Invalid argument)
> 01-08 23:43:48.319   475   496 W Adreno200-GSL: <gsl_ldd_control:225>: ioctl
> code 0xc0140910 (IOCTL_KGSL_RINGBUFFER_ISSUEIBCMDS) failed: errno 16 Device
> or resource busy
> 01-08 23:43:48.319   475   496 W Adreno200-EGL:
> <qeglDrvAPI_eglSwapBuffers:3451>: EGL_BAD_ALLOC
> 01-08 23:43:48.979   612   612 E libgenlock: perform_lock_unlock_operation:
> GENLOCK_IOC_LOCK failed (lockType0x1, err=Connection timed out fd=28)
> 01-08 23:43:48.979   612   612 E msm7627a.gralloc: gralloc_lock:
> genlock_lock_buffer (lockType=0x2) failed
> 01-08 23:43:48.979   612   612 W GraphicBufferMapper: lock(...) failed -22
> (Invalid argument)
> 01-08 23:43:49.019   475   496 W Adreno200-GSL: <gsl_ldd_control:225>: ioctl
> code 0xc0140910 (IOCTL_KGSL_RINGBUFFER_ISSUEIBCMDS) failed: errno 16 Device
> or resource busy
> 
> This is a common problem when using such streaming texture trick. I think I
> have to delete related textures when not using to workaround this.

The <pid,tid> = <612, 612> should be fixed by patch V4.2.

<475, 475> is b2g main thread, which I think is other problem but I will check detail later (since locktype 0x1 in libgenlock is WRITE if the vendor do not change it, the only write usage I can think out is DrawThebes in this case). <475, 496> is CompositorThread which should be affected by main thread.
Hi Vlad,

I confirmed that the buffer in ThebesLayer should independent to CanvasLayer change, the problem should be independent or affected by other process/thread.

If you got chance to retry this, can you please give me 2 full log:
1) adb logcat -v threadtime 2>&1 | tee log.txt
  please keep the log from startup
2) adb shell dmesg > klog.txt 
  please keep the log when problem occur

Thanks!!
WebGL currently is basically not usable for intended purposes (we do use it for applying image filters though), so even if this patch has risk for webgl, I would consider uplifting it.  It's one of those "fixes more problems than regressions that might caused" issues.

If this has risk to non-webgl stuff though, the calculus is different.
blocking-b2g: --- → leo?
Needed for competitive parity, a usable 3d drawing api.
Whiteboard: [tech-p1]
Blocks: 835058
Please request uplift when we have a better idea of the riskyness of this patch.
blocking-b2g: leo? → -
tracking-b2g18: --- → +
This issue was fixed at bug 843599
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Bug 843599 only landed in m-c.
Re-open this for b2g18.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(In reply to peter chang[:pchang] from comment #47)
> Bug 843599 only landed in m-c.
> Re-open this for b2g18.

Do we have any plans to even land that on b2g18? We're only able to land leo blockers on b2g18 at this point, so I don't think I see a patch as seen on bug 843599 landing on that branch.
Keywords: perf
Priority: -- → P2
Whiteboard: [tech-p1] → [c= p= s= u=] [tech-p1]
I don't think this can/should go into b2g18.  WebGL is just not going to be viable on that platform.  I'm not even sure if the RGB swap issue was fixed in the current patch, and we haven't done much (any) testing with it.
vlad,

The RGB swap issue is fixed with the patch in bug 886415. I have tested in m-c and it works as expected by honoring the flag added to Gecko in bug 881460. It's a gonk-side fix so it'll be part of an upcoming CAF release.
Is this still a valid bug or needs to be looked at, or can we close this bug?
Flags: needinfo?(milan)
I think we're already doing this.  Right Jeff?
Flags: needinfo?(milan) → needinfo?(jgilbert)
Yep, we do this.
Status: REOPENED → RESOLVED
Closed: 11 years ago11 years ago
Flags: needinfo?(jgilbert)
Resolution: --- → WORKSFORME
Whiteboard: [c= p= s= u=] [tech-p1] → [c= p= s=2013.11.08 u=] [tech-p1]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: