Closed Bug 912134 Opened 11 years ago Closed 11 years ago

Scroll FPS is low in FFOS1.2 than FFOS 1.1

Categories

(Firefox OS Graveyard :: General, defect, P1)

ARM
Gonk (Firefox OS)
defect

Tracking

(blocking-b2g:koi+, firefox26 fixed)

RESOLVED FIXED
blocking-b2g koi+
Tracking Status
firefox26 --- fixed

People

(Reporter: tkundu, Assigned: sotaro)

References

Details

(Keywords: perf, regression, Whiteboard: [c=handeye p= s= u=1.2])

Attachments

(2 files, 5 obsolete files)

Hi,

Gecko ICS Master Branch's LayerManagerComposite::Render() takes 2 to 3 times more time than Gecko ICS 1.1 LayerManagerOGL:Render(). 

It can be verified by putting simple log inside that function .

It causes low FPS in ICS Master branch. Can anyone fix this regression in ICS Master branch ?
Keywords: perf
Summary: Scroll FPS is low in Master than 1.1 → Scroll FPS is low in ICS Master than ICS 1.1
Tapas, can you please provides more detail on the type of visible UX degradation that this will regression will cause for existing v1.1 devices.
blocking-b2g: --- → koi?
In ICS 1.1 Contact Scroll, we see top FPS as ~55fps . But we are seeing 34fps as top FPS in ICS master Contact Scroll. 

Average GPU Composition delay in LayerManagerComposite::Render() is ~18ms in "ICS master" 

Average GPU Composition delay in  LayerManagerOGL:Render() is ~9ms in "ICS 1.1".






(In reply to Michael Vines [:m1] [:evilmachines] from comment #1)
> Tapas, can you please provides more detail on the type of visible UX
> degradation that this will regression will cause for existing v1.1 devices.
What device are you running on?  I've had issues with gralloc lock timeouts during contacts scroll on the buri/hamachi platform.

And does "ICS master" mean mozilla-central and gaia/master?
(In reply to Ben Kelly [:bkelly] from comment #3)
> What device are you running on?  I've had issues with gralloc lock timeouts
> during contacts scroll on the buri/hamachi platform.
> 

Does LayerManagerComposite::Render() function can cause delay from gralloc ?? I guess "NO" . 

> And does "ICS master" mean mozilla-central and gaia/master?

Yes
(In reply to Ben Kelly [:bkelly] from comment #3)
> What device are you running on?  I've had issues with gralloc lock timeouts
> during contacts scroll on the buri/hamachi platform.
> 
> And does "ICS master" mean mozilla-central and gaia/master?

I am running on QRD device. Can you please measure FPS or delay from above function in your device ?
Both ICS Master and ICS 1.1 uses BindAndDrawQuadWithTextureRect() to draw texture using glDrawArrays(). 

BindAndDrawQuadWithTextureRect accepts one parameter aTexCoorRect 

which is defined as below in source code :

"|aTexCoordRect| is the rectangle from the texture that we want to draw using the given program" . 

It means if size of aTexCoordRect is larger than BindAndDrawQuadWithTextureRect can cause long delay during calling glDrawArrays().  If we print size of aTexCoordRect from BindAndDrawQuadWithTextureRect  to logcat then we can see that ICS Master branch is trying to draw much larger texture than 1.1 for same device. 

My guess is that this is causing main delay in gldrawArray() in master.
(In reply to Tapas Kumar Kundu from comment #6)
> Both ICS Master and ICS 1.1 uses BindAndDrawQuadWithTextureRect() to draw
> texture using glDrawArrays(). 
> 
> BindAndDrawQuadWithTextureRect accepts one parameter aTexCoorRect 
> 
> which is defined as below in source code :
> 
> "|aTexCoordRect| is the rectangle from the texture that we want to draw
> using the given program" . 
> 
> It means if size of aTexCoordRect is larger than
> BindAndDrawQuadWithTextureRect can cause long delay during calling
> glDrawArrays().  If we print size of aTexCoordRect from
> BindAndDrawQuadWithTextureRect  to logcat then we can see that ICS Master
> branch is trying to draw much larger texture than 1.1 for same device. 
> 
> My guess is that this is causing main delay in gldrawArray() in master.


This observation is for Contact Scroll (with 500 contacts loaded) in both ICS master and ICS 1.1 running on same device. 

Please compare size of 2nd parameter passed to CompositorOGL::BindAndDrawQuadWithTextureRect (in ICS master branch FFOS 1.2) 
with LayerManagerOGL::BindAndDrawQuadWithTextureRect (in ICS FFOS 1.1).

This will show that we are drawing much bigger texture for each layer in FFOS 1.2  than  FFOS 1.1 during contact scroll. 

This may cause fDrawArrays() to take long time in FFOS 1.2 than FFOS 1.1 .  This seems to be regression.

Please confirm.
From the app perspective I believe the DOM structure is fairly similar in both v1.1 and v1.2.

Milan, does the situation in comment 6 and comment 7 sound gfx related?  Could it be related to the layers refactor in m-c?
Flags: needinfo?(milan)
Keywords: regression
Whiteboard: [perf-reviewed]
blocking-b2g: koi? → koi+
Whiteboard: [perf-reviewed] → [perf-reviewed][c=handeye p= s= u=]
Priority: -- → P1
BenWa, can you take a look?  I don't know if it's pan/zoom or any of the other things we're tracking already, or something new.
Flags: needinfo?(milan) → needinfo?(bgirard)
CompositorOGL::BindAndDrawQuadWithTextureRect is not getting hit for me on b2g master. Still looking.
Flags: needinfo?(bgirard)
(In reply to Benoit Girard (:BenWa) from comment #10)
> CompositorOGL::BindAndDrawQuadWithTextureRect is not getting hit for me on
> b2g master. Still looking.

Which device are you using ? Can you please force GPU Composition ?
Flags: needinfo?(bgirard)
I'm using the arm emulator. I am using OGL Composition.
Flags: needinfo?(bgirard)
Can you please use actual device ? This may help to reproduce this issue.
Flags: needinfo?(bgirard)
Summary: Scroll FPS is low in ICS Master than ICS 1.1 → Scroll FPS is low in FFOS1.2 than FFOS 1.1
We have likely at least two problems here, and they might be related.

Our chipset vendor friends found that we are compositing (via the GPU) larger textures since the refactoring, which is slower. They will post the differences between 1.1 and 1.2 they observed in a follow-up comment.

Vivien found that in settings but also contacts we are no longer using buffer rotation. Instead, we render with multiple constantly changing layers as we scroll. Repainting all those layers is likely causing part of the slowdown. Details to follow from Vivien and me.
About gralloc buffer handling, there might be 2 gralloc usage conflicts.
[1] conflict between OpenGL and HwComposer
[2] conflict between application and LayerManagerComposite.

It become clear that [2] is happening. By disabling HWComposer, scroll becomes smooth and gen lock failure becomes smooth. [1] still could happen, need to check about it.
(In reply to Sotaro Ikeda [:sotaro] from comment #15)
> About gralloc buffer handling, there might be 2 gralloc usage conflicts.
> [1] conflict between OpenGL and HwComposer
> [2] conflict between application and LayerManagerComposite.
> 
> It become clear that [2] is happening. By disabling HWComposer, scroll
> becomes smooth and gen lock failure becomes smooth. [1] still could happen,
> need to check about it.

Sorry, correction. [1] is happening. Need to check about [2].
Filed bug 914143 for the bad layer creation.
Depends on: 914143
No longer depends on: 914143
Depends on: 914143
(In reply to Sotaro Ikeda [:sotaro] from comment #15)
> About gralloc buffer handling, there might be 2 gralloc usage conflicts.
> [1] conflict between OpenGL and HwComposer
> [2] conflict between application and LayerManagerComposite.

When using gralloc buffers, the user of the buffers need to grab genlock. OpenGL rendering and HwCompser rendering could switch dynamically in each rendering. If HwComposer or OpenGL continue to keep gralloc buffer's genlock, another one can not get genlock and wait until got the genlock is freed. If the wait time is longer than genlock timeout, "genlock failure" log printed to logcat. I wrote this genlock owner ship competiion as "gralloc usage conflicts".

This "gralloc usage conflicts" has 2 patterns.
-(1) multiple user try to use gralloc buffers simultaneously.
-(2) genlock is not released after user.

In b2g18, Bug 879297, Bug 898919 are related to genlock failure. Both cases are related camera/video frame rendering and both are categorized to (2).

The following page explain details about gralloc buffers.
 https://wiki.mozilla.org/Platform/GFX/Gralloc
(In reply to Sotaro Ikeda [:sotaro] from comment #15)
> About gralloc buffer handling, there might be 2 gralloc usage conflicts.
> [1] conflict between OpenGL and HwComposer
> [2] conflict between application and LayerManagerComposite.

At first, we think about [1]'s possibility, because during web page scrolling, genlock failure had happened. But [1] seems not related to low fps on OpenGL rendering. I had a chat with codeaurora's engineer and their recognition was also same. Low fps happens even when HwComposer is disabled. I am going to think more about [2].
Yes, disabling HWC does not address the low FPS issue
I found a one candidate of the problem. In b2g18, OpenGL's texture binding happened on the following timing.
- (1) video/camera frame(gralloc yuv) rendering: Bind to texture just before the rendering.
- (2) other layers rendering: bind to texture during buffer swap

In master, all layers rendering were change to (1) in b2g. In b2g18, (1) made a lot of troubles about genlock failure.

In b2g18, (2) is done like following.
ShadowThebesLayerOGL::Swap()
 ->ShadowLayerManager::OpenDescriptorForDirectTexturing()
   ->GLContextEGL::CreateDirectTextureImage()
     ->DirectTextureImageEGL::BindTexture(LOCAL_GL_TEXTURE0);
Bug 908276 is a bug to change a timing of Texture binding from before rendering to swap. I am going to check it.
One observation: 
On 1.1, the calls to BindAndDrawQuadWithTextureRect()are pretty much back-back and the function takes about 3-6 ms on the average on each call.
On 1.2, the function itself seems to be executing much faster, but, there seems to be some processing in-between that is delaying the calls by 4-6 ms in between. Will try to track down what's happening in-between.
Stacktrace of GraphicBufferMapper::lock() failure.

(gdb) info stack
#0  0x4003d68a in android::GraphicBufferMapper::lock (this=0xffffffea, handle=<value optimized out>, usage=<value optimized out>, 
    bounds=<value optimized out>, vaddr=0xbed2c8b4) at frameworks/base/libs/ui/GraphicBufferMapper.cpp:80
#1  0x4003cbb2 in android::GraphicBuffer::lock (this=0x45c83600, usage=51, rect=..., vaddr=<value optimized out>)
    at frameworks/base/libs/ui/GraphicBuffer.cpp:175
#2  0x4003cbda in android::GraphicBuffer::lock (this=0x40143300, usage=320, vaddr=0xbed2c8b4) at frameworks/base/libs/ui/GraphicBuffer.cpp:162
#3  0x417822a2 in mozilla::layers::ShadowLayerForwarder::PlatformOpenDescriptor (aMode=<value optimized out>, aSurface=<value optimized out>)
    at /home/sotaro/b2g_master_hamachi/B2G/gecko/gfx/layers/ipc/ShadowLayerUtilsGralloc.cpp:483
#4  0x417aa124 in mozilla::layers::ShadowLayerForwarder::OpenDescriptor (aMode=1075065600, aSurface=...)
    at /home/sotaro/b2g_master_hamachi/B2G/gecko/gfx/layers/ipc/ShadowLayers.cpp:579
#5  0x417ace34 in mozilla::layers::DeprecatedTextureClientShmem::GetSurface (this=0x478dd200)
    at /home/sotaro/b2g_master_hamachi/B2G/gecko/gfx/layers/client/TextureClient.cpp:400
#6  0x417ace4a in mozilla::layers::DeprecatedTextureClientShmem::LockSurface (this=0x40143300)
    at ../../dist/include/mozilla/layers/TextureClient.h:469
#7  0x417afc14 in mozilla::layers::ThebesLayerBuffer::EnsureBuffer (this=<value optimized out>)
    at /home/sotaro/b2g_master_hamachi/B2G/gecko/gfx/layers/ThebesLayerBuffer.cpp:438
#8  0x417afc76 in mozilla::layers::ThebesLayerBuffer::GetContextForQuadrantUpdate (this=0x40143300, aBounds=..., aSource=480, 
    aTopLeft=0xbed2c8b4) at /home/sotaro/b2g_master_hamachi/B2G/gecko/gfx/layers/ThebesLayerBuffer.cpp:330
#9  0x41798680 in mozilla::layers::ContentClientDoubleBuffered::UpdateDestinationFrom (this=0x472f62c0, aSource=..., aUpdateRegion=...)
    at /home/sotaro/b2g_master_hamachi/B2G/gecko/gfx/layers/client/ContentClient.cpp:530
#10 0x417988f4 in mozilla::layers::ContentClientDoubleBuffered::SyncFrontBufferToBackBuffer (this=0x472f62c0)
    at /home/sotaro/b2g_master_hamachi/B2G/gecko/gfx/layers/client/ContentClient.cpp:518
#11 0x4178f172 in mozilla::layers::ClientThebesLayer::PaintThebes (this=0x47e8e800)
    at /home/sotaro/b2g_master_hamachi/B2G/gecko/gfx/layers/client/ClientThebesLayer.cpp:73
#12 0x4178f56e in mozilla::layers::ClientThebesLayer::RenderLayer (this=0x47e8e800)
---Type <return> to continue, or q <return> to quit---
    at /home/sotaro/b2g_master_hamachi/B2G/gecko/gfx/layers/client/ClientThebesLayer.cpp:139
#13 0x4178e04c in mozilla::layers::ClientContainerLayer::RenderLayer (this=<value optimized out>)
    at /home/sotaro/b2g_master_hamachi/B2G/gecko/gfx/layers/client/ClientContainerLayer.h:81
#14 0x4178e04c in mozilla::layers::ClientContainerLayer::RenderLayer (this=<value optimized out>)
    at /home/sotaro/b2g_master_hamachi/B2G/gecko/gfx/layers/client/ClientContainerLayer.h:81
#15 0x4178ebe6 in mozilla::layers::ClientLayerManager::EndTransactionInternal (this=0x49175cc0, aCallback=<value optimized out>, 
    aCallbackData=<value optimized out>) at /home/sotaro/b2g_master_hamachi/B2G/gecko/gfx/layers/client/ClientLayerManager.cpp:185
#16 0x4178ec82 in mozilla::layers::ClientLayerManager::EndTransaction (this=0x49175cc0, 
    aCallback=0x40ea3c31 <mozilla::FrameLayerBuilder::DrawThebesLayer(mozilla::layers::ThebesLayer*, gfxContext*, nsIntRegion const&, nsIntRegion const&, void*)>, aCallbackData=0xbed2cf40, aFlags=mozilla::layers::LayerManager::END_NO_COMPOSITE)
    at /home/sotaro/b2g_master_hamachi/B2G/gecko/gfx/layers/client/ClientLayerManager.cpp:208
#17 0x40ec45de in nsDisplayList::PaintForFrame (this=<value optimized out>, aBuilder=0xbed2cf40, aCtx=<value optimized out>, 
    aForFrame=<value optimized out>, aFlags=13) at /home/sotaro/b2g_master_hamachi/B2G/gecko/layout/base/nsDisplayList.cpp:1223
#18 0x40ec4738 in nsDisplayList::PaintRoot (this=0xbed2d2d8, aBuilder=0xbed2cf40, aCtx=0x0, aFlags=13)
    at /home/sotaro/b2g_master_hamachi/B2G/gecko/layout/base/nsDisplayList.cpp:1070
#19 0x40ed4cc4 in nsLayoutUtils::PaintFrame (aRenderingContext=<value optimized out>, aFrame=0x45a55298, aDirtyRegion=<value optimized out>, 
    aBackstop=<value optimized out>, aFlags=772) at /home/sotaro/b2g_master_hamachi/B2G/gecko/layout/base/nsLayoutUtils.cpp:2150
#20 0x40ee1fd6 in PresShell::Paint (this=0x40349d00, aViewToPaint=<value optimized out>, aDirtyRegion=<value optimized out>, aFlags=1)
    at /home/sotaro/b2g_master_hamachi/B2G/gecko/layout/base/nsPresShell.cpp:5616
#21 0x4114eb86 in nsViewManager::ProcessPendingUpdatesForView (this=0x45cfe4c0, aView=0x45b289c0, aFlushDirtyRegion=<value optimized out>)
    at /home/sotaro/b2g_master_hamachi/B2G/gecko/view/src/nsViewManager.cpp:411
#22 0x4114ec00 in nsViewManager::ProcessPendingUpdates (this=<value optimized out>)
    at /home/sotaro/b2g_master_hamachi/B2G/gecko/view/src/nsViewManager.cpp:1040
#23 0x40ee6c16 in nsRefreshDriver::Tick (this=<value optimized out>, aNowEpoch=1378820469939457, aNowTime=...)
---Type <return> to continue, or q <return> to quit---
Opening the back buffer for writing fails. Someone must be holding a lock on the back buffer still.
(In reply to Andreas Gal :gal from comment #25)
> Opening the back buffer for writing fails. Someone must be holding a lock on
> the back buffer still.

Have we already tried to go back to using b2g18's hack for unlocking gralloc buffers?

What I recently understood from the semantics of gralloc locking for compositing is that binding the gralloc buffer to a GL texture will lock it, and that it causes the previous gralloc buffer that was bound to this GL texture will be unlocked at the next buffer swap of the screen (I mean like EGLSwapBuffers or whatever it is called).

"The next buffer swap" is the important part here because we used to think it would be unlocked right away. So if we are trying to prepare the next frame in the back buffer while the current frame (the front buffer) hasn't hit the screen yet, opening the back buffer could fail the way you said (if my understanding is right).

I am not sure about what exactly happens in the case of unlocking with a null EGLImage, apart from that we used to do that on b2g18 (and that some drivers don't like it), but it could behave differently as far as this locking problem is concerned.
Sotaro is telling me we never did the null EGLImage hack for regular content, only for video images. The explanation above sounds sane though. We will try the EGLImage hack. I think we should be able to use one shared null image that we let GL ref count on, so we don't really care when that is released.
I seems to find why the patch in Bug 908276 made fps worse than normal. Current b2g master seems to consume more pmem than b2g18. When "out of mem" happens during layer buffer allocation, layer buffer fallback to Shmem(for content process) or MemoryImage(for b2g process).
http://mxr.mozilla.org/mozilla-central/source/gfx/layers/ipc/ISurfaceAllocator.cpp#69

Current gfx layers implementation expect, layer's both front and back buffer are gralloc buffers. If only one part of the buffer is gralloc buffer. GrallocDeprecatedTextureHostOGL continue to bind the gralloc buffer to texture. This might block from an application to get the buffer's genlock. Further, CreateDeprecatedTextureHostOGL() does not handle the fallback from gralloc to shmem/MemoryImage. The function create TextureImageDeprecatedTextureHostOGL in this case. It seems incorrect class be created.

http://mxr.mozilla.org/mozilla-central/source/gfx/layers/opengl/TextureHostOGL.cpp#41
I'm making some progress profiling this:
http://people.mozilla.org/~bgirard/cleopatra/#report=6334f7bd68d228e59af5761570168d4074d9b2b8

There's a surprising amount of time that we're not spending in compositing. It could be idle time or not, I'm trying to confirm that. We also spend over 60% of our compositing time binding the Gralloc texture.
Flags: needinfo?(bgirard)
If comment #28 is correct, the following seems necessary
[1] Bind dummy buffer after end of rendering.
[2] Reduce pmem usage by layers(much layering than b2g18?)
[3] Correctly implement fallback.
Why are we using more pmem?
Depends on: 914847
Here's a profile of the contact app:
http://people.mozilla.org/~bgirard/cleopatra/#report=70f79696ad4b58afe9d1ed57db3ac1b93f1deb08

We're bottlenecked on the content process
We are blocking 50% time in read?
(In reply to Andreas Gal :gal from comment #31)
> Why are we using more pmem?

It is just my feeling received from b2g master. Bug 913821 might affected to it some. nical, how do you think about it?
Flags: needinfo?(nical.bugzilla)
(In reply to Sotaro Ikeda [:sotaro] from comment #30)
> If comment #28 is correct, the following seems necessary
> [1] Bind dummy buffer after end of rendering.

[1] can be implemented by similar way as Bug 909851. It is not clear that it can be uses as a long term solution.
Depends on: 914854
This patch disables creating active layer. Everything gets merged with the root thebes layer. This causes a LOT of extra painting. This patch is to measure the benefit of our layer decisions vs. a naive don't layer anything.

From this on the hamachi with HWC disabled I can scroll Settings at 50 FPS and Contacts with med-workload at ~30FPS. This is with the extra rasterization pressure of not layering scrolling content.

Something is very wrong. I'm working on tracking down what in our layer decisions causes this.
I made basic misunderstanding to attachment 794124 [details] [diff] [review] in Bug 908276. The patch allocate Texture for each GrallocDeprecatedTextureHostOGL. It should be for each CompositableHost. And the patch do not delete OpenGL texture correctly.
Attachment #802644 - Attachment is patch: true
Summarize investigation infos.
- Using same temporary gl textures in CompositorOGL for different texture target
   for same texture unit(Bug 906671)
- Using same texture for different texture target might remain gralloc texture's genlock to be held.
- If same texture is reused among the layers, it might be affect to the performance.
- Layer's gralloc buffers are fallback to Shmem(for content process) or MemoryImage(for b2g process)
  In fallbacked cases, TextureImageDeprecatedTextureHostOGL() is created.
  It is not clear that using  TextureImageDeprecatedTextureHostOGL() in the fallback case is correct.
(In reply to Sotaro Ikeda [:sotaro] from comment #38)
> - If same texture is reused among the layers, it might be affect to the
> performance.

Same texture seems shared by among layers, because temporary texture is decided only by texture unit.
Could there be some issue with the way the following API is being used: 
GL_APICALL void GL_APIENTRY glEGLImageTargetTexture2DOES (GLenum target, GLeglImageOES image);
in that the EGL Image cannot be freed until the GPU h/w has finished with it?
In b2g18, thebes layer create a texture for each gralloc buffer swap.

ShadowThebesLayerOGL::Swap()
 ->ShadowLayerManager::OpenDescriptorForDirectTexturing()
   ->GLContextEGL::CreateDirectTextureImage()
     ->fGenTextures(1, &texture); //passed to DirectTextureImageEGL 
     ->new DirectTextureImageEGL
     ->DirectTextureImageEGL::BindTexture(LOCAL_GL_TEXTURE0);
(In reply to Sotaro Ikeda [:sotaro] from comment #39)
> (In reply to Sotaro Ikeda [:sotaro] from comment #38)
> > - If same texture is reused among the layers, it might be affect to the
> > performance.
> 
> Same texture seems shared by among layers, because temporary texture is
> decided only by texture unit.

Only LOCAL_GL_TEXTURE0(0x84C0) is used for rendering for all layers rendering if there is no camera preview nor video playback. Therefore only one Texture is used for all layers rendering.
(In reply to Sotaro Ikeda [:sotaro] from comment #34)
> (In reply to Andreas Gal :gal from comment #31)
> > Why are we using more pmem?
> 
> It is just my feeling received from b2g master. Bug 913821 might affected to
> it some. nical, how do you think about it?

Bug 913821 and the other bugs related to new textures only affect video and canvas, and in both these cases they don't change the allocation strategy. I don't think it changes anything for general scrolling performances.
If we are using more pmem it could be the differences in how we layerize. The buffer rotation stuff (which is the most exercised path when visiting regular pages, scrolling, etc.) is a bit out of my comfort zone so I don't think it has changed its allocation strategy but I am not 100% sure.
Flags: needinfo?(nical.bugzilla)
(In reply to Sotaro Ikeda [:sotaro] from comment #38)
> - Layer's gralloc buffers are fallback to Shmem(for content process) or
> MemoryImage(for b2g process)
>   In fallbacked cases, TextureImageDeprecatedTextureHostOGL() is created.
>   It is not clear that using  TextureImageDeprecatedTextureHostOGL() in the
> fallback case is correct.

If you fallback to shmem or ram, the only code path that lets you upload that to a GL texture on the layers side is TextureImageDeprecatedTextureHostOGL (or with new textures, the equivalent which is TextureImageTextureSourceOGL).
I don't see what is not correct about this. We could manually upload to a GLTexture but that would not buy us anything over using a TextureImage since TextureImage::DirectUpdate just does that.
In my leo device with latest m-c, I found the compositor took about 10~15 ms for LayerManagerComposite::Render() during home scrolling, it took about 0~4 ms on b2g18.

After digging more, I found it spent lots of time inside GrallocDeprecatedTextureHostOGL::BindTexture() calls, and I didn't see long delay on glDrawArray calls.
http://mxr.mozilla.org/mozilla-central/source/gfx/layers/opengl/TextureHostOGL.cpp#1091


BTW, I didn't see the pmem not enough in my leo.

#0  mozilla::layers::GrallocDeprecatedTextureHostOGL::BindTexture (this=0x46beb670, aTextureUnit=33984) at /media/ramdisk/b2g_central_new/gfx/layers/opengl/TextureHostOGL.cpp:1114
#1  0x415b6ed0 in mozilla::layers::AutoBindTexture::Bind (this=<value optimized out>, aTexture=0x46beb670, aTextureUnit=1) at /media/ramdisk/b2g_central_new/gfx/layers/opengl/CompositorOGL.cpp:993
#2  0x415b900a in AutoBindTexture (this=0x46dd0100, aRect=<value optimized out>, aClipRect=<value optimized out>, aEffectChain=<value optimized out>, aOpacity=1, aTransform=..., aOffset=...)
    at /media/ramdisk/b2g_central_new/gfx/layers/opengl/CompositorOGL.cpp:981
#3  mozilla::layers::CompositorOGL::DrawQuad (this=0x46dd0100, aRect=<value optimized out>, aClipRect=<value optimized out>, aEffectChain=<value optimized out>, aOpacity=1, aTransform=..., aOffset=...)
    at /media/ramdisk/b2g_central_new/gfx/layers/opengl/CompositorOGL.cpp:1116
#4  0x415bfc2e in mozilla::layers::ContentHostBase::Composite (this=0x4370b8b0, aEffectChain=<value optimized out>, aOpacity=<value optimized out>, aTransform=<value optimized out>, aOffset=..., 
    aFilter=@0x453ff760, aClipRect=..., aVisibleRegion=0x453ff710, aLayerProperties=0x0) at /media/ramdisk/b2g_central_new/gfx/layers/composite/ContentHost.cpp:191
#5  0x415d93ee in mozilla::layers::ThebesLayerComposite::RenderLayer (this=0x474a7800, aOffset=<value optimized out>, aClipRect=<value optimized out>)
    at /media/ramdisk/b2g_central_new/gfx/layers/composite/ThebesLayerComposite.cpp:146
#6  0x415bcdf8 in mozilla::layers::ContainerRender<mozilla::layers::ContainerLayerComposite> (aContainer=0x474a7400, aOffset=..., aManager=<value optimized out>, aClipRect=...)
    at /media/ramdisk/b2g_central_new/gfx/layers/composite/ContainerLayerComposite.cpp:240
#7  0x415bd0a4 in mozilla::layers::ContainerLayerComposite::RenderLayer (this=<value optimized out>, aOffset=..., aClipRect=...)
    at /media/ramdisk/b2g_central_new/gfx/layers/composite/ContainerLayerComposite.cpp:336
#8  0x415bcdf8 in mozilla::layers::ContainerRender<mozilla::layers::ContainerLayerComposite> (aContainer=0x474a6400, aOffset=..., aManager=<value optimized out>, aClipRect=...)
    at /media/ramdisk/b2g_central_new/gfx/layers/composite/ContainerLayerComposite.cpp:240
#9  0x415bd0a4 in mozilla::layers::ContainerLayerComposite::RenderLayer (this=<value optimized out>, aOffset=..., aClipRect=...)
Peter,  I think you are observing the same genlock timeout issue we are seeing. Sotaro has a plan to change the way we cycle front and back buffers to avoid this.
Home Scrolling uses HWC. Most other native app scroll use cases GPU Composition (Rotated buffers for scrolling).
BTW, can we make rotated buffers be usable for HWC?
From the driver side, on 1.2, I'm seeing the following observations:
- Within a frame, there are multiple GL_TEXTURE_2D texture calls being made (across layers as well)
- Every call is using the same texture unit number and the same texture as well
(In reply to peter chang[:pchang] from comment #45)
> In my leo device with latest m-c, I found the compositor took about 10~15 ms
> for LayerManagerComposite::Render() during home scrolling, it took about 0~4
> ms on b2g18.
> 
> After digging more, I found it spent lots of time inside
> GrallocDeprecatedTextureHostOGL::BindTexture() calls, and I didn't see long
> delay on glDrawArray calls.
> http://mxr.mozilla.org/mozilla-central/source/gfx/layers/opengl/
> TextureHostOGL.cpp#1091

Currently, master's gecko layer uses only one Texture. From Tapas analysis, using only one Texture seems to cause a perfomance issue. I am creating a patch using Texture for each TextureHost or CompositableHost. In the morning, it seems easy. But until now, the patch does not work correctly...
Assignee: nobody → bgirard
Assignee: bgirard → sotaro.ikeda.g
Mandyam, please look at bug 915098. That patch should round-robin texture units during composition, more evenly utilizing the hardware. Can you check whether that gains us anything?
Status: NEW → ASSIGNED
Whiteboard: [perf-reviewed][c=handeye p= s= u=] → [c=handeye p= s= u=1.2]
No longer blocks: GFXB2G1.2
(In reply to Mandyam Vikram from comment #48)
> From the driver side, on 1.2, I'm seeing the following observations:
> - Within a frame, there are multiple GL_TEXTURE_2D texture calls being made
> (across layers as well)
> - Every call is using the same texture unit number and the same texture as
> well

Mandyam, could you please elaborate on how the choice of OpenGL texture unit (as in glActiveTexture) has any incidence on performance or has otherwise any bearing on what happens at the hardware level?
Flags: needinfo?(mvikram)
This is a wip patch. The patch allocate texture per CompositableHost. Current master share one textutre among all TextureHosts.
The performance seems to become better than default b2g master.
(In reply to Sotaro Ikeda [:sotaro] from comment #52)
> Created attachment 803199 [details] [diff] [review]
> WIP patch - allocate texture per CompositableHost
> 
> This is a wip patch. The patch allocate texture per CompositableHost.
> Current master share one textutre among all TextureHosts.
> The performance seems to become better than default b2g master.

genlock failure in client side seems different problem than slow OpenGL rendering.
Sotaro can you expand on comment 53?
(In reply to Andreas Gal :gal from comment #54)
> Sotaro can you expand on comment 53?

By applying the patch, FPS of browser scrolling becomes better. But genlock failure are still on the logcat. FPS during scrolling seems good but sometimes UI of the scrolling seems blocked for a short time. It seems caused by the genlock failure.

attachment 803199 [details] [diff] [review] just change the way of sharing texture from all CompositableHosts to each CompositableHost. It does not change the way of gralloc buffers locking. If the genlock failure happens by the logical failure, attachment 803199 [details] [diff] [review] should not fix the genlock failure. And actually, the patch does not fix the genlock failure. Then I am tend to think the cause of the genlock failure is somewhere in the logic around gfx layers. I do not have clear evidence about it. Just my assumption. I am going to investigate about the genlock failure more.
From previous patch, this patch adds support of camera preview and video playback. camera preview and video playback needs Bug 913821 and Bug 901224 fixed.
Attachment #803199 - Attachment is obsolete: true
(In reply to Sotaro Ikeda [:sotaro] from comment #56)
> Created attachment 803362 [details] [diff] [review]
> patch - allocate texture per CompositableHost

As a bonus, the patch revives unagi phones' camera preview. This might be affected from lower drivers task or bypass OpenGL library's defect by using multiple texture.
Depends on: 913821, 901224
(In reply to Sotaro Ikeda [:sotaro] from comment #55)
> (In reply to Andreas Gal :gal from comment #54)
> > Sotaro can you expand on comment 53?
> 
> By applying the patch, FPS of browser scrolling becomes better. But genlock
> failure are still on the logcat. FPS during scrolling seems good but
> sometimes UI of the scrolling seems blocked for a short time. It seems
> caused by the genlock failure.
> 
> attachment 803199 [details] [diff] [review] just change the way of sharing
> texture from all CompositableHosts to each CompositableHost. It does not
> change the way of gralloc buffers locking. If the genlock failure happens by
> the logical failure, attachment 803199 [details] [diff] [review] should not
> fix the genlock failure. And actually, the patch does not fix the genlock
> failure. Then I am tend to think the cause of the genlock failure is
> somewhere in the logic around gfx layers. I do not have clear evidence about
> it. Just my assumption. I am going to investigate about the genlock failure
> more.

Can you please look at https://bugzilla.mozilla.org/show_bug.cgi?id=915078#c1 . I am also seeing that genlock is creating problem both during scroll and app launch. It causes very big stutter for PresShell:Paint() (~1sec durantion Paint !!) .
Depends on: 906715
Blocks: 915064
(In reply to Sotaro Ikeda [:sotaro] from comment #18)
> (In reply to Sotaro Ikeda [:sotaro] from comment #15)
> > About gralloc buffer handling, there might be 2 gralloc usage conflicts.
> > [1] conflict between OpenGL and HwComposer
> > [2] conflict between application and LayerManagerComposite.
> 
> When using gralloc buffers, the user of the buffers need to grab genlock.
> OpenGL rendering and HwCompser rendering could switch dynamically in each
> rendering. If HwComposer or OpenGL continue to keep gralloc buffer's
> genlock, another one can not get genlock and wait until got the genlock is
> freed. If the wait time is longer than genlock timeout, "genlock failure"
> log printed to logcat. I wrote this genlock owner ship competiion as
> "gralloc usage conflicts".
> 
> This "gralloc usage conflicts" has 2 patterns.
> -(1) multiple user try to use gralloc buffers simultaneously.
> -(2) genlock is not released after user.
> 
> In b2g18, Bug 879297, Bug 898919 are related to genlock failure. Both cases
> are related camera/video frame rendering and both are categorized to (2).
> 
> The following page explain details about gralloc buffers.
>  https://wiki.mozilla.org/Platform/GFX/Gralloc

Hi, sotaro.
Did you remember bug 905304, you added a compositionComplete function call in function GonkDisplayJB::SwapBuffers.
The function compositionComplete calls glFinish in gralloc HAL module eventually.
The glFinish function will wait until all GPU commands issued befored glFinish be executed by GPU, and make eglSwapBuffers function be called synchronously. 
This means GraphicBuffers will be unlocked by GPU driver after eglSwapBuffers being called. 
As a result, LayerManagerComposite::Render will not return until all the GPU operations are finished.
So we will see calling LayerManagerComposite::Render taking more time than before.
This also explans that "By disabling HWComposer, scroll becomes smooth and gen lock failure becomes smooth". Because GPU rendering is synchronous while hwcomposer compositing is synchronous or not(I dont know that on qcom platform).
If hwcomposer compositing is not synchronous , we should see GraphicBuffer locking by the app layer before hwcomposer unlocking them.
attachment 803362 [details] [diff] [review] changes the following 2. Both could improve the performance of the rendering.
- Allocate and use a texture for each CompositableHost
    + current master uses only one texture for all CompositableHosts(Layers)
- Create EGLImage during buffer swap
    + current master re-create EGLImage for every rendering
(In reply to Sotaro Ikeda [:sotaro] from comment #57)
> (In reply to Sotaro Ikeda [:sotaro] from comment #56)
> > Created attachment 803362 [details] [diff] [review]
> > patch - allocate texture per CompositableHost
> 
> As a bonus, the patch revives unagi phones' camera preview. This might be
> affected from lower drivers task or bypass OpenGL library's defect by using
> multiple texture.

That is a pretty cool bonus
(In reply to Sotaro Ikeda [:sotaro] from comment #42)
> 
> Only LOCAL_GL_TEXTURE0(0x84C0) is used for rendering for all layers
> rendering if there is no camera preview nor video playback. Therefore only
> one Texture is used for all layers rendering.

It becomes clear that "one Texture is used for all layers" is not correct. Actually one Texture is used for all layers that using gralloc buffers". When out of pmem happens, buffer usage fallbacks to Shmem(for content process) or MemoryImage(for b2g process). And  TextureImageDeprecatedTextureHostOGL is used as TextureHost. The TextureHost allocate TiledTextureImage for rendering. The TiledTextureImage allocate and use own texture. It does not use CompositorOGL's shared temporary Texture.

Then, there could be a case that when out of pmem happens, a previous bounded gralloc buffer does not unbounded by new buffer binding. Next TextureHost under same Compositor might use different texture, when out of pmem happens.
(In reply to peter chang[:pchang] from comment #45)
> In my leo device with latest m-c, I found the compositor took about 10~15 ms
> for LayerManagerComposite::Render() during home scrolling, it took about 0~4
> ms on b2g18.

I always using hamachi for debugging. The hw difference might make the difference to the remaining pmem amount.
When I see a genlock failure on my device with HWComposer enabled, previous HWCompositor's rendering always output "ThebesLayerComposite Layer doesn't have a gralloc buffer". 

---------------------------------------------------
09-12 11:03:07.649   144   265 D HWComposer: ThebesLayerComposite Layer doesn't have a gralloc buffer
09-12 11:03:07.649   144   265 D HWComposer: Render aborted. Nothing was drawn to the screen
09-12 11:03:07.949   539   539 E libgenlock: perform_lock_unlock_operation: GENLOCK_IOC_DREADLOCK failed (lockType0x1, err=Connection timed out fd=44)
09-12 11:03:07.949   539   539 E msm7627a.gralloc: gralloc_lock: genlock_lock_buffer (lockType=0x2) failed
09-12 11:03:07.949   539   539 W GraphicBufferMapper: lock(...) failed -22 (Invalid argument)
09-12 11:03:07.949   539   539 E GraphicBufferMapper: lock(...) failed
(In reply to Benoit Jacob [:bjacob] from comment #51)
> (In reply to Mandyam Vikram from comment #48)
> Mandyam, could you please elaborate on how the choice of OpenGL texture unit
> (as in glActiveTexture) has any incidence on performance or has otherwise
> any bearing on what happens at the hardware level?

Please refer to  Comment 62  (Sotaro Ikeda [:sotaro] 2013-09-12 03:22:53 PDT) also. When the same texture is re-used multiple times within the same frame the driver needs to wait until the HW is done with it.
Flags: needinfo?(mvikram)
(In reply to Mandyam Vikram from comment #65)
> (In reply to Benoit Jacob [:bjacob] from comment #51)
> > (In reply to Mandyam Vikram from comment #48)
> > Mandyam, could you please elaborate on how the choice of OpenGL texture unit
> > (as in glActiveTexture) has any incidence on performance or has otherwise
> > any bearing on what happens at the hardware level?
> 
> Please refer to  Comment 62  (Sotaro Ikeda [:sotaro] 2013-09-12 03:22:53
> PDT) also. When the same texture is re-used multiple times within the same
> frame the driver needs to wait until the HW is done with it.

Sure. I understand that there can be lock contention when multiple things try to lock the same texture. But comment 48 used the phrase "texture unit" so it sounded like it was about OpenGL texture units in the sense of glActiveTexture. Can you confirm that by "texture unit" you meant "texture object" i.e. the thing passed to glBindTexture, not the thing passed to glActiveTexture? Sorry if that sounds obvious, but it seems to have confused people into thinkinf that we should do more glActiveTexture calls to use more units...
Flags: needinfo?(mvikram)
From andreas's comment, I am going to land the only FPS improvement part in this bug. This performance improvement is very important for v1.2. But genlock failure bug is still present. The genlock failure problem is going to be handled in a different bug.
Fix nits and build problem in other platform.
Attachment #803362 - Attachment is obsolete: true
Previous patch was old patch. Replace it.
Attachment #803850 - Attachment is obsolete: true
Comment on attachment 803855 [details] [diff] [review]
patch v3 - allocate texture per CompositableHost

Jeff, can you review the patch soon? The patch is very hackey and created very quickly. For v1.2, OpenGL rendering performance is very low. To address this, we need to commit the fix as soon as possible.
Attachment #803855 - Flags: review?(jmuizelaar)
Attachment #803855 - Flags: review?(nical.bugzilla)
Comment on attachment 803855 [details] [diff] [review]
patch v3 - allocate texture per CompositableHost

Review of attachment 803855 [details] [diff] [review]:
-----------------------------------------------------------------

This patch is very important to review well, also flagging :nical and myself.
Attachment #803855 - Flags: review?(bjacob)
I updated the patch to fix the build failure on other platform. But the latest patch failed to build on more platforms :-(
- before
  https://tbpl.mozilla.org/?tree=Try&rev=41eb4599fefc
- after
  https://tbpl.mozilla.org/?tree=Try&rev=d08f677c843a
Comment on attachment 803855 [details] [diff] [review]
patch v3 - allocate texture per CompositableHost

Review of attachment 803855 [details] [diff] [review]:
-----------------------------------------------------------------

::: b2g/app/b2g.js
@@ +72,5 @@
>  pref("mozilla.widget.force-24bpp", true);
>  pref("mozilla.widget.use-buffer-pixmap", true);
>  pref("mozilla.widget.disable-native-theme", true);
>  pref("layout.reflow.synthMouseMove", false);
> +pref("layers.force-tiles", false);

This was probably unintentional.

::: gfx/layers/opengl/TextureHostOGL.cpp
@@ +191,5 @@
> +  if (!mTexture) {
> +    gl()->MakeCurrent();
> +    gl()->fGenTextures(1, &mTexture);
> +  }
> +  return mTexture;

I think this mTexture can just live in TextureHostOGL and then we won't need the CompositableQuirksGonkOGL. If it's easier we can avoid using the temporary texture on all platforms and not just GonkOGL.
(In reply to Jeff Muizelaar [:jrmuizel] from comment #73)
> I think this mTexture can just live in TextureHostOGL and then we won't need
> the CompositableQuirksGonkOGL. If it's easier we can avoid using the
> temporary texture on all platforms and not just GonkOGL.

This comment was based on a misunderstanding and can be ignored.
Comment on attachment 803855 [details] [diff] [review]
patch v3 - allocate texture per CompositableHost

Review of attachment 803855 [details] [diff] [review]:
-----------------------------------------------------------------

::: b2g/app/b2g.js
@@ +72,5 @@
>  pref("mozilla.widget.force-24bpp", true);
>  pref("mozilla.widget.use-buffer-pixmap", true);
>  pref("mozilla.widget.disable-native-theme", true);
>  pref("layout.reflow.synthMouseMove", false);
> +pref("layers.force-tiles", false);

This probably was unintentional.

::: gfx/layers/opengl/GrallocTextureHost.cpp
@@ +181,5 @@
> +  gl()->fBindTexture(textureTarget, tex);
> +  if (!mEGLImage) {
> +    mEGLImage = gl()->CreateEGLImageForNativeBuffer(mGraphicBuffer->getNativeBuffer());
> +  }
> +  gl()->fEGLImageTargetTexture2D(textureTarget, mEGLImage);

It's worth adding a comment about why we need to do this early. i.e. include what Codeaurora said about how the EGL stuff works.
Attachment #803855 - Flags: review?(jmuizelaar) → review+
Nical's review can happen after this lands if you're in a hurry.
Comment on attachment 803855 [details] [diff] [review]
patch v3 - allocate texture per CompositableHost

Review of attachment 803855 [details] [diff] [review]:
-----------------------------------------------------------------

A comment to maybe help other people reading this code, from talking with Sotaro and reading this code with Jeff:

The fundamental change of approach in this patch is that now the temporary texture is per-compositable.

Originally, the temporary texture was per-TextureHost. That was too many temporary textures.
With Nical's work in bug 875211, that switched to having a temporary texture per compositor only. That's what turned out to be too few.
Now we have one per compositable which is fewer than one per TextureHost, because e.g. a ContentHost, which is a single Compositable, may have 2 TextureHosts to implement double-buffering.
Attachment #803855 - Flags: review?(bjacob)
Attachment #803855 - Flags: review?
Attachment #803855 - Flags: review+
Attachment #803855 - Flags: review? → review+
Attachment #803930 - Flags: review?(sotaro.ikeda.g)
Comment on attachment 803930 [details] [diff] [review]
Fix the mac-build by moving things out of headers

Thanks!
Attachment #803930 - Flags: review?(sotaro.ikeda.g) → review+
Apply comments from jrmuizel and bjacob. Committable patch.
Attachment #803855 - Attachment is obsolete: true
Attachment #803930 - Attachment is obsolete: true
Attachment #803855 - Flags: review?(nical.bugzilla)
Attachment #804019 - Flags: review+
Comment on attachment 804019 [details] [diff] [review]
patch v4 - allocate texture per CompositableHost

:nical, can you review the patch as a post review?
Attachment #804019 - Flags: review?(nical.bugzilla)
I profiled with your patch Sotaro:
http://people.mozilla.org/~bgirard/cleopatra/#report=78c4868fb762827f587f9ae35db6e2ccb47ce9be

We're no longer blocking in:
Gralloc::BindTexture
CompositorOGL::DrawQuad
ThebesLayerComposite::RenderLayer
LayerManagerComposite::Render
CompositorParent::Composite

Compositor time is moved to 'LayerManagerComposite::EndFrame' which is much better. I'm running at 53 FPS solid with this patch, HWC off and tiling off.
By applying the patch, EGLImage creations are reduced from each rendering to each buffer swap. It is still redundant. In android, EGLImage is created only once during GraphicBuffer's lifetime. b2g should be same as android does, but it needs more simplification and code changes around gfx layers.
Comment on attachment 804019 [details] [diff] [review]
patch v4 - allocate texture per CompositableHost

Review of attachment 804019 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/layers/composite/CompositableHost.h
@@ +59,5 @@
>  
>  /**
> + * A base class for doing CompositableHost and platform dependent task on TextureHost.
> + */
> +class CompositableQuirks : public RefCounted<CompositableQuirks>

This class really needs a more explicit name but we can rename it later. Let's not spend time debating names now, this looks like an important patch.

::: gfx/layers/ipc/CompositableTransactionParent.cpp
@@ +242,5 @@
> +      // set CompositableQuirks
> +      // on gonk, create EGLImage if possible.
> +      // create EGLImage during buffer swap could reduce the graphic driver's task
> +      // during rendering.
> +      tex->SetCompositableQuirks(compositable->GetCompositableQuirks());

It looks like this call should be in AddTextureHost

::: gfx/layers/opengl/TextureHostOGL.cpp
@@ +1122,2 @@
>    DeleteTextures();
> +#if 1

Remove the useless #if 1 before landing

@@ +1156,5 @@
>    MOZ_ASSERT(gl());
>    gl()->MakeCurrent();
>  
> +  mQuirks->SetCompositor(mCompositor);
> +  GLuint tex = static_cast<CompositableQuirksGonkOGL*>(mQuirks.get())->GetTexture();

I see a pattern of doing the two lines above in many places. You should wrap this in a GetGLTexture() method in the Gralloc TextureHost classes.
Attachment #804019 - Flags: review?(nical.bugzilla) → review+
https://hg.mozilla.org/mozilla-central/rev/732bb76aad5a
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
nical, thanks for the post review. I created Bug 916116 as a follow up.
Bug 916264 is created for the genlock failure bug.
Flags: needinfo?(mvikram)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: