Closed Bug 925616 Opened 11 years ago Closed 11 years ago

Reduce gralloc buffer reallocation when scrolling homescreen

Categories

(Core :: Graphics, defect)

ARM
Gonk (Firefox OS)
defect
Not set
normal

Tracking

()

RESOLVED WONTFIX
blocking-b2g koi+

People

(Reporter: pchang, Assigned: pchang)

References

Details

Attachments

(4 files)

Per bug920921 comment 40, create this bug to check why gralloc buffer got reallocation when scrolling homescreen from page 1 to page 4.



https://bugzilla.mozilla.org/show_bug.cgi?id=920921#c40

[log during scrolling homescreen]
10-09 10:13:57.760  2388  2388 I Gecko   : time(406) takes 32.596ms w 768 h 1141
10-09 10:13:57.780  2388  2388 I Gecko   : buffer created 0x43e83880
10-09 10:13:57.780  2388  2388 I Gecko   : time(406) takes 17.213ms w 768 h 1141
10-09 10:13:58.260  2388  2388 I Gecko   : buffer created 0x43e83d00
10-09 10:13:58.260  2388  2388 I Gecko   : time(406) takes 33.359ms w 768 h 1141
10-09 10:13:58.280  2388  2388 I Gecko   : buffer created 0x43e83e00
10-09 10:13:58.280  2388  2388 I Gecko   : time(406) takes 16.481ms w 768 h 1141
10-09 10:13:58.771  2388  2388 I Gecko   : buffer created 0x43e83900
10-09 10:13:58.771  2388  2388 I Gecko   : time(406) takes 32.901ms w 768 h 1141
10-09 10:13:58.801  2388  2388 I Gecko   : buffer created 0x43e83980
10-09 10:13:58.801  2388  2388 I Gecko   : time(406) takes 18.892ms w 768 h 1141
10-09 10:13:59.261  2388  2388 I Gecko   : buffer created 0x43e83a00
10-09 10:13:59.261  2388  2388 I Gecko   : time(406) takes 36.563ms w 768 h 1141
10-09 10:13:59.281  2388  2388 I Gecko   : buffer created 0x43e83d00
10-09 10:13:59.281  2388  2388 I Gecko   : time(406) takes 15.626ms w 768 h 1141
10-09 10:13:59.762  2388  2388 I Gecko   : buffer created 0x43e83780
10-09 10:13:59.762  2388  2388 I Gecko   : time(406) takes 32.962ms w 768 h 1141
10-09 10:13:59.782  2388  2388 I Gecko   : buffer created 0x43e83880
10-09 10:13:59.782  2388  2388 I Gecko   : time(406) takes 12.544ms w 768 h 1141
Blocks: 920921
During the home scrolling (from page 1 to page 3), we had the following two problems.

a. Expensive cost for graphic buffer allocation
   This is the known issue that b2g shares the same thread for buffer allocation and compositon.
   Therefore, it is possible that buffer allocation has long latency.
   For example, it takes about 27.59ms for buffer allocation through IPC.

05:44:04.793  2558  2558 I Gecko   : ISurfaceAllocator::PlatformAllocSurfaceDescriptor start
05:44:04.793  2449  2476 I LM      :   ----- (beginning paint)
05:44:04.803  2449  2476 I Gecko   :     ContainerLayerComposite (0x47b85000) 
05:44:04.813  2449  2476 I Gecko   : GrallocBufferActor::Create start
05:44:04.823  2449  2476 I Gecko   : GrallocBufferActor::Create 0x45732580 buffer 0x44babd80 w768 h1140
05:44:04.823  2558  2558 I Gecko   : ISurfaceAllocator::PlatformAllocSurfaceDescriptor end
05:44:04.823  2558  2558 I Gecko   : time(444) takes 27.590ms w 768 h 1140

b. Requires new buffers(one or two fullscreen) when scroll homescreen to previous or next pages.
   And it destroys these buffers when homescreen is frozen on one page.

   Therefore, the compositor thread becomes very busy for buffer allocation and composition.

   You can see the following detail layer dump of homescreen.

[homescreen on one page]
#1 ContentHostDoubleBuffered (0x45f28550) [buffer-rect=(x=0, y=0, w=768, h=1140)] homescreen background

[scroll homescreen bwteen two pages ]
#1  ContentHostDoubleBuffered (0x45f28550) [buffer-rect=(x=0, y=0, w=768, h=1140)]  homescreen background
#2  ContentHostDoubleBuffered (0x45f27e90) [buffer-rect=(x=0, y=0, w=768, h=1141)]  Left pages
#3  ContentHostDoubleBuffered (0x4656c670) [buffer-rect=(x=0, y=0, w=768, h=1141)]  Right pages
#4  ContentHostDoubleBuffered (0x4656c820) [buffer-rect=(x=512, y=0, w=256, h=32)]  small Color index

[homescreen on one page]
05:44:00.468  2449  2476 I LM      :   ----- (beginning paint)
05:44:00.468  2449  2476 I Gecko   :         RefLayerComposite (0x4833ac00) [shadow-clip=(x=0, y=40, w=768, h=1140)] [shadow-transform=[ 1 0; 0 1; 0 40; ]] [shadow-visible=< (x=0, y=0, w=768, h=1140); >] [clip=(x=0, y=40, w=768, h=1140)] [transform=[ 1 0; 0 1; 0 40; ]] [visible=< (x=0, y=0, w=7      68, h=1140); >] [isFixedPosition anchor=0.000000,0.000000] [id=3]
05:44:00.468  2449  2476 I Gecko   :           ContainerLayerComposite (0x47bcac00) [shadow-visible=< (x=0, y=0, w=768, h=1140); >] [visible=< (x=0, y=0, w=768, h=1140); >] [componentAlpha] [metrics={ viewport=(x=0.000000, y=0.000000, w=384.000000, h=570.000000) viewportScroll=(x=0.000000, y=0.      000000) displayport=(x=0.000000, y=0.000000, w=0.000000, h=0.000000) scrollId=1 }]
05:44:00.468  2449  2476 I Gecko   :             ThebesLayerComposite (0x47bcb000) [shadow-visible=< (x=0, y=0, w=768, h=1140); >] [visible=< (x=0, y=0, w=768, h=1140); >] [componentAlpha] [isFixedPosition anchor=0.000000,0.000000] [valid=< (x=0, y=0, w=768, h=1140); >]
05:44:00.468  2449  2476 I Gecko   :               ContentHostDoubleBuffered (0x45f28550) [buffer-rect=(x=0, y=0, w=768, h=1140)] [buffer-rotation=(x=0, y=0)]
05:44:00.468  2449  2476 I Gecko   :                 GrallocDeprecatedTextureHostOGL (0x4712c300) [size=(width=768, height=1140)] [format=FORMAT_B8G8R8A8] [flags=TEXTURE_USE_NEAREST_FILTER]buffer=0x45f54980
05:44:00.468  2449  2476 I Gecko   :                 GrallocDeprecatedTextureHostOGL (0x47132f40) [size=(width=768, height=1140)] [format=FORMAT_B8G8R8A8] [flags=TEXTURE_USE_NEAREST_FILTER]buffer=0x47425880

[scroll homescreen bwteen two pages ]
05:44:04.042  2449  2476 I LM      :   ----- (beginning paint)
05:44:04.042  2449  2476 I Gecko   :         RefLayerComposite (0x4833ac00) [shadow-clip=(x=0, y=40, w=768, h=1140)] [shadow-transform=[ 1 0; 0 1; 0 40; ]] [shadow-visible=< (x=0, y=0, w=768, h=1140); >] [clip=(x=0, y=40, w=768, h=1140)] [transform=[ 1 0; 0 1; 0 40; ]] [visible=< (x=0, y=0, w=7      68, h=1140); >] [isFixedPosition anchor=0.000000,0.000000] [id=3]
05:44:04.042  2449  2476 I Gecko   :           ContainerLayerComposite (0x47bcac00) [shadow-visible=< (x=0, y=0, w=768, h=1140); >] [visible=< (x=0, y=0, w=768, h=1140); >] [componentAlpha] [metrics={ viewport=(x=0.000000, y=0.000000, w=384.000000, h=570.000000) viewportScroll=(x=0.000000, y=0.      000000) displayport=(x=0.000000, y=0.000000, w=0.000000, h=0.000000) scrollId=1 }]
05:44:04.042  2449  2476 I Gecko   :             ThebesLayerComposite (0x47bcb000) [shadow-visible=< (x=0, y=0, w=768, h=990); (x=0, y=1138, w=2, h=2); >] [visible=< (x=0, y=0, w=768, h=990); (x=0, y=1138, w=2, h=2); >] [isFixedPosition anchor=0.000000,0.000000] [valid=< (x=0, y=0, w=768, h=990      ); (x=0, y=1138, w=2, h=2); >]
05:44:04.042  2449  2476 I Gecko   :               ContentHostDoubleBuffered (0x45f28550) [buffer-rect=(x=0, y=0, w=768, h=1140)] [buffer-rotation=(x=0, y=0)]
05:44:04.042  2449  2476 I Gecko   :                 GrallocDeprecatedTextureHostOGL (0x47132f40) [size=(width=768, height=1140)] [format=FORMAT_B8G8R8A8] [flags=TEXTURE_USE_NEAREST_FILTER]buffer=0x47425880
05:44:04.042  2449  2476 I Gecko   :                 GrallocDeprecatedTextureHostOGL (0x4712c300) [size=(width=768, height=1140)] [format=FORMAT_B8G8R8A8] [flags=TEXTURE_USE_NEAREST_FILTER]buffer=0x45f54980
05:44:04.052  2449  2476 I Gecko   :             ContainerLayerComposite (0x44b2c800) [shadow-clip=(x=0, y=0, w=768, h=1140)] [shadow-transform=[ 1 0; 0 1; -736 0; ]] [shadow-visible=< (x=0, y=0, w=768, h=1141); >] [clip=(x=0, y=0, w=768, h=1140)] [transform=[ 1 0; 0 1; -736 0; ]] [visible=< (x      =0, y=0, w=768, h=1141); >] [componentAlpha] [isFixedPosition anchor=0.000000,0.000000]
 3523 10-11 05:44:04.052  2449  2476 I Gecko   :               ThebesLayerComposite (0x46d18000) [shadow-visible=< (x=0, y=0, w=768, h=1141); >] [visible=< (x=0, y=0, w=768, h=1141); >] [componentAlpha] [isFixedPosition anchor=0.000000,0.000000] [valid=< (x=0, y=0, w=768, h=1141); >]
05:44:04.052  2449  2476 I Gecko   :                 ContentHostDoubleBuffered (0x45f27e90) [buffer-rect=(x=0, y=0, w=768, h=1141)] [buffer-rotation=(x=0, y=0)] [paint-will-resample]
05:44:04.052  2449  2476 I Gecko   :                   GrallocDeprecatedTextureHostOGL (0x47187640) [size=(width=768, height=1141)] [format=FORMAT_B8G8R8A8] [flags=NoFlags]buffer=0x44baac80
05:44:04.052  2449  2476 I Gecko   :                   GrallocDeprecatedTextureHostOGL (0x4712ee80) [size=(width=0, height=0)] [format=FORMAT_UNKNOWN] [flags=NoFlags]buffer=0x44babe80
05:44:04.052  2449  2476 I Gecko   :             ContainerLayerComposite (0x46df6c00) [shadow-clip=(x=0, y=0, w=768, h=1140)] [shadow-transform=[ 1 0; 0 1; 32 0; ]] [shadow-visible=< (x=0, y=0, w=768, h=1141); >] [clip=(x=0, y=0, w=768, h=1140)] [transform=[ 1 0; 0 1; 32 0; ]] [visible=< (x=0,       y=0, w=768, h=1141); >] [componentAlpha] [isFixedPosition anchor=0.000000,0.000000]
 3528 10-11 05:44:04.052  2449  2476 I Gecko   :               ThebesLayerComposite (0x47172800) [shadow-visible=< (x=0, y=0, w=768, h=1141); >] [visible=< (x=0, y=0, w=768, h=1141); >] [componentAlpha] [isFixedPosition anchor=0.000000,0.000000] [valid=< (x=0, y=0, w=768, h=1141); >]
05:44:04.052  2449  2476 I Gecko   :                 ContentHostDoubleBuffered (0x4656c670) [buffer-rect=(x=0, y=0, w=768, h=1141)] [buffer-rotation=(x=0, y=0)] [paint-will-resample]
05:44:04.052  2449  2476 I Gecko   :                   GrallocDeprecatedTextureHostOGL (0x47421440) [size=(width=768, height=1141)] [format=FORMAT_B8G8R8A8] [flags=NoFlags]buffer=0x4572f700
05:44:04.052  2449  2476 I Gecko   :                   GrallocDeprecatedTextureHostOGL (0x471b3ac0) [size=(width=0, height=0)] [format=FORMAT_UNKNOWN] [flags=NoFlags]buffer=0x4572fe00
05:44:04.052  2449  2476 I Gecko   :             ThebesLayerComposite (0x4740e400) [shadow-visible=< (x=512, y=0, w=256, h=6); >] [visible=< (x=512, y=0, w=256, h=6); >] [opaqueContent] [isFixedPosition anchor=0.000000,0.000000] [valid=< (x=512, y=0, w=256, h=6); >]
05:44:04.052  2449  2476 I Gecko   :               ContentHostDoubleBuffered (0x4656c820) [buffer-rect=(x=512, y=0, w=256, h=32)] [buffer-rotation=(x=0, y=0)]
05:44:04.052  2449  2476 I Gecko   :                 GrallocDeprecatedTextureHostOGL (0x47421c80) [size=(width=256, height=32)] [format=FORMAT_B8G8R8X8] [flags=TEXTURE_USE_NEAREST_FILTER]buffer=0x45f55080
05:44:04.052  2449  2476 I Gecko   :                 GrallocDeprecatedTextureHostOGL (0x47421b40) [size=(width=0, height=0)] [format=FORMAT_UNKNOWN] [flags=TEXTURE_USE_NEAREST_FILTER]buffer=0x45fa9780
The long-term is to fixed problem a (Expensive cost for graphic buffer allocation) by using separate thread to handle buffer allocation.

If we didn't change the homescreen behavior, we may need to consider buffer recycle on content side, like cache non-used textureclient in clientlayermanager for next time buffer request.

But it is a kind of workaround of problem a to reduce the buffer IPC traffics.

If homescreen draws three pages by default, then we won't meet the buffer reallocation problem but it takes much more memory.
put detail log to pastebin for readability

http://www.pastebin.mozilla.org/3236645
(In reply to peter chang[:pchang] from comment #1)
> During the home scrolling (from page 1 to page 3), we had the following two
> problems.
> 
> a. Expensive cost for graphic buffer allocation
>    This is the known issue that b2g shares the same thread for buffer
> allocation and compositon.
>    Therefore, it is possible that buffer allocation has long latency.
>    For example, it takes about 27.59ms for buffer allocation through IPC.

When content process sends a buffer allocation ipc to b2g, we might see the log that b2g composes one or two frame before handing the allocation request.
The ipc call is not the first priority.
(In reply to peter chang[:pchang] from comment #2)
> The long-term is to fixed problem a (Expensive cost for graphic buffer
> allocation) by using separate thread to handle buffer allocation.
How long it takes to move buffer allocation to another shared-context thread? Please have an evaluation on that.
> If we didn't change the homescreen behavior, we may need to consider buffer
> recycle on content side, like cache non-used textureclient in
> clientlayermanager for next time buffer request.
If we can find more generic scenarios that need buffer cache mechanism in framework side, then, we should do it.
Cache is not hard to implement, what difficult is to find a correct time to release those cached buffers.
> 
> But it is a kind of workaround of problem a to reduce the buffer IPC
> traffics.
> 
> If homescreen draws three pages by default, then we won't meet the buffer
> reallocation problem but it takes much more memory.
I think we should
1. Fix expensive allocation time, caused by our threading modal(you and Jerry had point out the problem): create a WIP first.
2. Check with Gaia team on change page behaviour: always make three page visible to prevent two big buffer allocation while page swap begin.
Did we investigate why we were hitting buffer rotation? That should never ever be the case.

Caching buffers on the content side is probably doable. We can flush them when the app goes into the background. Also, we don't really need pmem buffers. If we use ashmem we might be able to flip on auto purging on memory pressure. If not we can use the low memory notification. However, I am not sure how well this works. It will work for re-allocation of the same buffer size (animation), but it might not always work.

I saw above that we are compositing twice before servicing the allocation. Thats really weird. Why is that the case? Should we maybe scan the event queue and prioritize allocations over the timer?
(In reply to Andreas Gal :gal from comment #6)
> Did we investigate why we were hitting buffer rotation? That should never
> ever be the case.
> 
Jerry is checking this problem and will post another bug for buffer rotation.

> Caching buffers on the content side is probably doable. We can flush them
> when the app goes into the background. Also, we don't really need pmem
> buffers. If we use ashmem we might be able to flip on auto purging on memory
> pressure. If not we can use the low memory notification. However, I am not
> sure how well this works. It will work for re-allocation of the same buffer
> size (animation), but it might not always work.

Good idea, I will consider to add flush cached buffer when app receives low memory 
notification or goes to background.

> 
> I saw above that we are compositing twice before servicing the allocation.
> Thats really weird. Why is that the case? Should we maybe scan the event
> queue and prioritize allocations over the timer?

In the following log, I only saw one time composition before buffer allocation.
During my testing, some times buffer allocation could be serviced asap but sometimes serviced after composition.

I can add log for compositor thread message handling to check the order of buffer allocation and composition request.
 
05:44:04.793  2558  2558 I Gecko   : ISurfaceAllocator::PlatformAllocSurfaceDescriptor start
05:44:04.793  2449  2476 I LM      :   ----- (beginning paint)
05:44:04.803  2449  2476 I Gecko   :     ContainerLayerComposite (0x47b85000) 
05:44:04.813  2449  2476 I Gecko   : GrallocBufferActor::Create start
05:44:04.823  2449  2476 I Gecko   : GrallocBufferActor::Create 0x45732580 buffer 0x44babd80 w768 h1140
05:44:04.823  2558  2558 I Gecko   : ISurfaceAllocator::PlatformAllocSurfaceDescriptor end
05:44:04.823  2558  2558 I Gecko   : time(444) takes 27.590ms w 768 h 1140
Assignee: nobody → pchang
(In reply to peter chang[:pchang] from comment #7)
> (In reply to Andreas Gal :gal from comment #6)
> > Did we investigate why we were hitting buffer rotation? That should never
> > ever be the case.
> > 
> Jerry is checking this problem and will post another bug for buffer rotation.
> 
I just updated the root cause of bufferrotation calls in below comment.

https://bugzilla.mozilla.org/show_bug.cgi?id=920921#c61
I'm able to reduce expensive gralloc buffer allocation during homescreen scrolling. But the scrolling performance is still not good. Will update gecko profiler soon.
Can you please explain comment #9 a bit more.
(In reply to peter chang[:pchang] from comment #9)
> I'm able to reduce expensive gralloc buffer allocation during homescreen
> scrolling. But the scrolling performance is still not good. Will update
> gecko profiler soon.

This optimization is supposed to be effective to all scenarios, right ?
There's another issue about Browser(Bug 925656) suffered by expensive gralloc buffer allocation.
I modified DrawFPS to output the average FPS and std value to profile the improvement.
http://mxr.mozilla.org/mozilla-central/source/gfx/layers/opengl/CompositorOGL.cpp#201

Please refer the FPS data with/without my local patch below.

It shows my patch could reduce the stutter during the homescreen scrolling (minfps/std value)
But it couldn't improve the overall FPS data.

a. With my patch
   no gralloc allocation when scroll page 1 to page 3

10-15 05:03:52.877  9166  9193 I Gecko   : FPS avgfps 36 std 8.288547 minfps 16
10-15 05:04:01.487  9166  9193 I Gecko   : FPS avgfps 35 std 8.567581 minfps 16
10-15 05:04:09.986  9166  9193 I Gecko   : FPS avgfps 36 std 8.682722 minfps 16

b. Without my patch
   lots of gralloc allocation when scroll page 1 to page 3
10-15 05:02:13.531  8771  8798 I Gecko   : FPS avgfps 35 std 12.630696 minfps 8
10-15 05:02:23.552  8771  8798 I Gecko   : FPS avgfps 35 std 12.292274 minfps 8
10-15 05:02:33.653  8771  8798 I Gecko   : FPS avgfps 35 std 12.586390 minfps 4
10-15 05:02:42.092  8771  8798 I Gecko   : FPS avgfps 39 std 12.636428 minfps 5

Now I saw some image garbage when I reuse previous gralloc. It may impact the profiling result.
I will try to fix it and provide gecko profiler soon.


> This optimization is supposed to be effective to all scenarios, right ?
> There's another issue about Browser(Bug 925656) suffered by expensive
> gralloc buffer allocation.

Do you have the buffer size for browser? Does the size change frequently? Because now I only cache gralloc close to fullscreen, otherwise we cache too much different size gralloc.
(In reply to peter chang[:pchang] from comment #12)
> > This optimization is supposed to be effective to all scenarios, right ?
> > There's another issue about Browser(Bug 925656) suffered by expensive
> > gralloc buffer allocation.
> 
> Do you have the buffer size for browser? Does the size change frequently?
> Because now I only cache gralloc close to fullscreen, otherwise we cache too
> much different size gralloc.

I just thought you were fixing item 2 of this comment(https://bugzilla.mozilla.org/show_bug.cgi?id=920921#c55). To reduce the time for allocating gralloc buffer.

It seems you were actually fixing item 3(reduce re-allocation).
> > 
> > Do you have the buffer size for browser? Does the size change frequently?
> > Because now I only cache gralloc close to fullscreen, otherwise we cache too
> > much different size gralloc.

I just check bug 925656. I saw browser tried to create a lots of bigger gralloc buffers with different size on Nexus 4 and it took long time to get new buffer. And I expect browser requests bigger gralloc especially you slide the screen faster.


10-15 22:59:07.109  1181  1181 I Gecko   : time(442) takes 25.210ms w 1150 h 1992
10-15 22:59:07.119   756   756 I Gecko   : time(442) takes 8.210ms w 768 h 1140
10-15 22:59:07.149   756   756 I Gecko   : time(442) takes 14.833ms w 768 h 1140
10-15 22:59:07.149  1181  1181 I Gecko   : time(442) takes 40.317ms w 1150 h 1992
10-15 22:59:07.189   756   756 I Gecko   : time(442) takes 9.004ms w 748 h 929
10-15 22:59:07.199   756   756 I Gecko   : time(442) takes 8.424ms w 748 h 929
10-15 22:59:07.319   756   756 I Gecko   : time(442) takes 1.648ms w 256 h 32
10-15 22:59:07.319   756   756 I Gecko   : time(442) takes 1.831ms w 256 h 32
10-15 22:59:07.369   756   756 I Gecko   : time(442) takes 16.176ms w 768 h 1140
10-15 22:59:07.379   756   756 I Gecko   : time(442) takes 9.461ms w 768 h 1140
10-15 22:59:11.934   166   166 I Gecko   : time(442) takes 1.160ms w 109 h 735
10-15 22:59:11.934   166   166 I Gecko   : time(442) takes 1.465ms w 109 h 735
10-15 22:59:11.944   166   166 I Gecko   : time(442) takes 1.831ms w 156 h 981
10-15 22:59:11.944   166   166 I Gecko   : time(442) takes 1.862ms w 156 h 981
10-15 22:59:12.034   820   820 I Gecko   : time(442) takes 38.395ms w 384 h 590
10-15 22:59:12.234   166   166 I Gecko   : time(442) takes 37.174ms w 637 h 981
10-15 22:59:12.254   166   166 I Gecko   : time(442) takes 19.350ms w 637 h 981
10-15 22:59:12.364   166   166 I Gecko   : time(442) takes 21.334ms w 478 h 735
10-15 22:59:12.384   166   166 I Gecko   : time(442) takes 16.237ms w 478 h 735
10-15 22:59:13.375  1181  1181 I Gecko   : time(442) takes 44.163ms w 1150 h 1992
10-15 22:59:13.425  1181  1181 I Gecko   : time(442) takes 45.048ms w 1150 h 1992
10-15 22:59:14.316  1181  1181 I Gecko   : time(442) takes 25.057ms w 1150 h 1994
10-15 22:59:14.346  1181  1181 I Gecko   : time(442) takes 29.391ms w 1150 h 1994
10-15 22:59:14.537  1181  1181 I Gecko   : time(442) takes 46.208ms w 1150 h 2708
10-15 22:59:14.567  1181  1181 I Gecko   : time(442) takes 32.931ms w 1150 h 2708
10-15 22:59:14.887  1181  1181 I Gecko   : time(442) takes 62.353ms w 989 h 2846
10-15 22:59:14.917  1181  1181 I Gecko   : time(442) takes 32.901ms w 989 h 2846
10-15 22:59:15.217  1181  1181 I Gecko   : time(442) takes 50.053ms w 958 h 2846
10-15 22:59:15.247  1181  1181 I Gecko   : time(442) takes 32.626ms w 958 h 2846


> 
> It seems you were actually fixing item 3(reduce re-allocation).

This bug is trying to fix 30 ms frequently gralloc buffer requests from homescreen (it is items 2 for sure).
Do I understand it correctly that allocating big gralloc buffers is slow, but small ones is fast? That changes our theory a bit. We thought its a latency problem. It seems this is really more a driver issue. Peter what do you think?
(In reply to Andreas Gal :gal from comment #15)
> Do I understand it correctly that allocating big gralloc buffers is slow,
> but small ones is fast? That changes our theory a bit. We thought its a
> latency problem. It seems this is really more a driver issue. Peter what do
> you think?

Yes, the cost of large gralloc creation is expensive than small grallc one. 
(GrallocBufferActor::Create before to after new GraphicBuffer)

But our IPC for gralloc allocation also have cost because we are sharing the same thread with compositor.
(20 ms cost between (10-16 00:00:14.002  7856) to (10-16 00:00:14.022  7236))

For browser case, could we allocate the largest buffer to fit different buffer size request? Otherwise, I guess gralloc driver keeps generating different size of buffer pool.

7236 b2g 
7357 homescreen
7856 broswer 

10-15 23:54:12.907  7357  7357 I Gecko   : ISurfaceAllocator::PlatformAllocSurfaceDescriptor(448) start
10-15 23:54:12.907  7236  7263 I Gecko   : GrallocBufferActor::Create(260) before new GraphicBuffer
10-15 23:54:12.907  7236  7263 I Gecko   : GrallocBufferActor::Create(265) after new GraphicBuffer
10-15 23:54:12.907  7236  7263 I Gecko   : GrallocBufferActor::Create 0x47221520 buffer 0x48ef9800 w256 h32 size 32768 finish
10-15 23:54:12.917  7357  7357 I Gecko   : ISurfaceAllocator::PlatformAllocSurfaceDescriptor(495) end
10-15 23:54:12.917  7357  7357 I Gecko   : time(444) takes 2.564ms w 256 h 32


10-16 00:00:14.002  7856  7856 I Gecko   : ISurfaceAllocator::PlatformAllocSurfaceDescriptor(448) start
10-16 00:00:14.022  7236  7263 I Gecko   : GrallocBufferActor::Create(260) before new GraphicBuffer
10-16 00:00:14.062  7236  7263 I Gecko   : GrallocBufferActor::Create(265) after new GraphicBuffer
10-16 00:00:14.062  7236  7263 I Gecko   : GrallocBufferActor::Create 0x4349b4c0 buffer 0x43498880 w768 h3985 size 12241920 finish
10-16 00:00:14.072  7856  7856 I Gecko   : ISurfaceAllocator::PlatformAllocSurfaceDescriptor(495) end
10-16 00:00:14.072  7856  7856 I Gecko   : time(444) takes 70.410ms w 768 h 3985
We could reuse buffers increase-only. If the next request is only slightly smaller than the existing buffer, we can use the existing buffer. If its lets say half as large, we get a new one.

Its critical to ensure that we release cached buffers when under memory pressure. Any patch we land to cache buffers must include an OOM hook.

Do we have a theory why allocating large buffers is so slow? Should we maybe not use gralloc for large buffers and try uploads?
(In reply to peter chang[:pchang] from comment #12)
> Now I saw some image garbage when I reuse previous gralloc. It may impact
> the profiling result.
> I will try to fix it and provide gecko profiler soon.

About the image garbage when I reuse previous gralloc, I think it was caused that the reused buffer was dirty and I didn't clear it.

> Do we have a theory why allocating large buffers is so slow? Should we maybe
> not use gralloc for large buffers and try uploads?

I'm going to check gralloc HAL for detail break down. I guess it may be related to the operation to clear whole buffer.
(In reply to peter chang[:pchang] from comment #18)
> (In reply to peter chang[:pchang] from comment #12)
> > Do we have a theory why allocating large buffers is so slow? Should we maybe
> > not use gralloc for large buffers and try uploads?
> 
> I'm going to check gralloc HAL for detail break down. I guess it may be
> related to the operation to clear whole buffer.

The gralloc HAL will wait to clear and flush buffer.

This is the flatfish gralloc repo(android-4.2.2_r1).

https://android.googlesource.com/platform/hardware/qcom/display/+/android-4.2.2_r1/libgralloc/ionalloc.cpp
Line 132 and 134.
Sorry for no direct line link.

This is the log when I scrolling homescreen.
The time used is including memset() and flush.

E/memalloc(  166): bignose time(162) takes 5.005ms
E/memalloc(  166): bignose ion: Allocated buffer memset size:3440640
E/memalloc(  166): bignose ion: Allocated buffer base:0x4f26f000 size:3440640 fd:99
E/memalloc(  166): bignose time(162) takes 3.296ms
E/memalloc(  166): bignose ion: Allocated buffer memset size:3342336
E/memalloc(  166): bignose ion: Allocated buffer base:0x4d4cf000 size:3342336 fd:16
E/memalloc(  166): bignose time(162) takes 3.723ms
E/memalloc(  166): bignose ion: Allocated buffer memset size:3342336
E/memalloc(  166): bignose ion: Allocated buffer base:0x4fa7e000 size:3342336 fd:94
E/memalloc(  166): bignose time(162) takes 4.029ms
E/memalloc(  166): bignose ion: Allocated buffer memset size:3440640
E/memalloc(  166): bignose ion: Allocated buffer base:0x4eefb000 size:3440640 fd:97

We will try bigger case.
Ok, this makes sense now. We clear the memory and we flush the cache so painting into the buffer is an uncached operation. That explains the bad times, especially for larger buffers. Lets retain the buffers and see how that goes. Keep in mind to a) flush the buffers when under memory pressure and b) re-use a larger buffer thats slightly to big, if we can.
Note that fixing bug 921212 will have us stop allocating gralloc for unrotate.
The patch in bug 921212 should be in a good state so feel free to apply it locally and see if it helps.
(In reply to Jerry Shih[:jerry] from comment #19)
> (In reply to peter chang[:pchang] from comment #18)
> > (In reply to peter chang[:pchang] from comment #12)
> > > Do we have a theory why allocating large buffers is so slow? Should we maybe
> > > not use gralloc for large buffers and try uploads?
> > 
> > I'm going to check gralloc HAL for detail break down. I guess it may be
> > related to the operation to clear whole buffer.
> 
> The gralloc HAL will wait to clear and flush buffer.
> 
> This is the flatfish gralloc repo(android-4.2.2_r1).
> 
> https://android.googlesource.com/platform/hardware/qcom/display/+/android-4.
> 2.2_r1/libgralloc/ionalloc.cpp
> Line 132 and 134.
> Sorry for no direct line link.
> 
> This is the log when I scrolling homescreen.
> The time used is including memset() and flush.
> 
> E/memalloc(  166): bignose time(162) takes 5.005ms
> E/memalloc(  166): bignose ion: Allocated buffer memset size:3440640
> E/memalloc(  166): bignose ion: Allocated buffer base:0x4f26f000
> size:3440640 fd:99
> E/memalloc(  166): bignose time(162) takes 3.296ms
> E/memalloc(  166): bignose ion: Allocated buffer memset size:3342336
> E/memalloc(  166): bignose ion: Allocated buffer base:0x4d4cf000
> size:3342336 fd:16
> E/memalloc(  166): bignose time(162) takes 3.723ms
> E/memalloc(  166): bignose ion: Allocated buffer memset size:3342336
> E/memalloc(  166): bignose ion: Allocated buffer base:0x4fa7e000
> size:3342336 fd:94
> E/memalloc(  166): bignose time(162) takes 4.029ms
> E/memalloc(  166): bignose ion: Allocated buffer memset size:3440640
> E/memalloc(  166): bignose ion: Allocated buffer base:0x4eefb000
> size:3440640 fd:97
> 
> We will try bigger case.

Well, I think it's N4's(Qcom's) GPU HAL, but not FlatFish.

It makes sense that GPU driver will flush memory if usage is SW_READ_ | SW_WRITE.

Here is a good and interesting document.
https://wiki.mozilla.org/Platform/GFX/Gralloc  (How Gralloc buffers are created and refcounted (non Mozilla-specific))
(In reply to Benoit Girard (:BenWa) from comment #22)
> The patch in bug 921212 should be in a good state so feel free to apply it
> locally and see if it helps.

I just applied it, but there's still buffer re-allocation while switches to next page of Homescreen. Can you provide further information about it ? What scenarios is it supposed to be helpful for ? I can look into it.
(In reply to Vincent Lin[:vincentlin] from comment #27)
> (In reply to Benoit Girard (:BenWa) from comment #22)
> > The patch in bug 921212 should be in a good state so feel free to apply it
> > locally and see if it helps.
> 
> I just applied it, but there's still buffer re-allocation while switches to
> next page of Homescreen. Can you provide further information about it ? What
> scenarios is it supposed to be helpful for ? I can look into it.

Yes, Homescreen still has lots of gralloc buffer reallocation request after applying patch of bug 921212.

Because the mBufferProvider in below line is still nullptr which means it is the first time to render the new layer(new deprecatedtextureclient). Therefore, canReuseBuffer flag is false at line 550.

But I saw it worked for setting app.

http://mxr.mozilla.org/mozilla-central/source/gfx/layers/ThebesLayerBuffer.cpp#481
http://mxr.mozilla.org/mozilla-central/source/gfx/layers/ThebesLayerBuffer.cpp#550
Please pull in some people from Toronto to comment on this.
Flags: needinfo?(milan)
Need Benwa's comment.
Flags: needinfo?(milan) → needinfo?(bgirard)
I was also seeing more buffer allocation after my patch as well but that one is because there is no buffer. I spend an hour or two debugging it but I wasn't able to understand why it was happening. Someone needs to debug this further. I saw that the main process active the lock screen which would send a message to the app to clear the layer tree which interfered with the debugging quite a bit as it was legitimately clearing the buffer. Wanting to look into bug 928123 first because it's a low hanging fruit for performance.
Flags: needinfo?(bgirard)
this issue blocks 920921, according to triage result, mark it koi+
blocking-b2g: --- → koi+
Clear gralloc buffer when gralloc recycle happened could solve the garbage problem.
And it only took about 2~3 ms to clear 720p area according to systrace result.

Since we have lots of gralloc requests with different size, we need a mechanism to decide which size/type we want to cache. Therefore, I'm thinking to have the gralloc request logger, then we could select which gralloc we need to cache based on the logger data.

(In reply to Andreas Gal :gal from comment #20)
> and b) re-use a larger buffer thats slightly to big, if we can.

I'm not sure we are able to reuse the buffer which is bigger than requested. It may generate additional cost because we are using bigger buffer. Checking....
This is the first draft patch which hard code the buffer size for recycling.
(In reply to Francis Lee [:frlee] from comment #32)
> this issue blocks 920921, according to triage result, mark it koi+

Bug 920921 is a Flatfish bug.  I didn't think Flatfish was getting koi+.  I think it's a mistake trying to force this into 1.2 given where we are in the development cycle.
Component: General → Graphics
Product: Firefox OS → Core
bug 930587 is trying to enable pre-render for homescreen and bug 929973 is going to improve the gralloc allocation behavior. Therefore, I think we don't need to put resource on this gralloc recycle fix.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: