WebGL game's FPS is low when HwComposer is used

RESOLVED WONTFIX

Status

()

Core
Graphics: Layers
P2
normal
RESOLVED WONTFIX
4 years ago
3 years ago

People

(Reporter: sotaro, Assigned: sotaro)

Tracking

({perf})

unspecified
ARM
Gonk (Firefox OS)
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(tracking-b2g:backlog)

Details

(Whiteboard: [c=progress p= s=2014.06.20.t u=])

Attachments

(3 attachments, 1 obsolete attachment)

(Assignee)

Description

4 years ago
When running WebGL game, HwComposer's composition is slower than OpenGL composition.
(Assignee)

Comment 1

4 years ago
I found the following problems.
-[1] EGLImage binding is done even when OpenGL composition is not used.
-[2] EGLImage binding is done when gralloc buffer is delivered to Compositor side.
   We need to defer EGLImage binding. Because Fence wait need to be done before EGLImage binding.
-[3] Flame's EGLSurface uses Surface::hook_dequeueBuffer_DEPRECATED().
     It is sync function and block Composer thread.
     On latest hw platform does not use this function.
(Assignee)

Updated

4 years ago
Blocks: 1022823
(Assignee)

Updated

4 years ago
Assignee: nobody → sotaro.ikeda.g

Comment 2

4 years ago
Can you dump and share the HWC layer list sent to HWC hal. Add this line at [1]:

LOGE("HWC layer[%d]::%s: dst=[%d %d %d %d] src=[%f %f %f %f] Tr=%x Blend=0x%x Flags=0x%x", mList->numHwLayers, aLayer->Name(),
hwcLayer.displayFrame.left, hwcLayer.displayFrame.top, hwcLayer.displayFrame.right, hwcLayer.displayFrame.bottom,
hwcLayer.sourceCropf.left, hwcLayer.sourceCropf.top, hwcLayer.sourceCropf.right, hwcLayer.sourceCropf.bottom,
hwcLayer.transform, hwcLayer.blending, hwcLayer.flags);

[1]: http://mxr.mozilla.org/mozilla-central/source/widget/gonk/HwcComposer2D.cpp#505
Grep for "HWC" in ADB logs.
Flags: needinfo?(sotaro.ikeda.g)

Comment 3

4 years ago
Also change LOGD to LOGE in HwcComposer2D::TryRender() to confirm GPU/HWC Composition. If possible, share Game app and steps to install it.
(Assignee)

Comment 4

4 years ago
I already confirmed that the majority of wait happened by Acquire fence wait and Surface::hook_dequeueBuffer_DEPRECATED().
Flags: needinfo?(sotaro.ikeda.g)
(Assignee)

Comment 5

4 years ago
(In reply to Sushil from comment #2)
> Can you dump and share the HWC layer list sent to HWC hal. Add this line at
> [1]:
> 

I am going to provide it.
(Assignee)

Comment 6

4 years ago
Sushil, do you know why the following code is necessary? If HwComposer do all layer's composition by blit, it seem not necessary except fence sync purpose.

----------------------
HwcComposer2D::TryHwComposition()
>>
        } else if (blitComposite) {
            // BLIT Composition, flip FB target
            GetGonkDisplay()->UpdateFBSurface(mDpy, mSur);
            FramebufferSurface* fbsurface = (FramebufferSurface*)(GetGonkDisplay()->GetFBSurface());
            if (!fbsurface) {
                LOGE("H/W Composition failed. NULL FBSurface.");
                return false;
            }
            mList->hwLayers[idx].handle = fbsurface->lastHandle;
            mList->hwLayers[idx].acquireFenceFd = fbsurface->GetPrevFBAcquireFd();
        }
Flags: needinfo?(sushilchauhan)

Comment 7

4 years ago
This is called HWC_BLIT Composition. The content of the frame, which has been composed by Copybit, gets drawn on FrameBuffer layer by driver. It avoids use of unnecessary intermediate full screen render buffers, which improves memory usage on device. Since the composed content is drawn on FB layer, that's why we need to call eglSwapBuffers() in HwcComposer2D to flip FB target before calling HWC set, even when the frame gets composed by HWC. See "HWC_BLIT" in HwcComposer2D.cpp. See the logic at:
[1]: http://mxr.mozilla.org/mozilla-central/source/widget/gonk/HwcComposer2D.cpp#546
[2]: http://mxr.mozilla.org/mozilla-central/source/widget/gonk/HwcComposer2D.cpp#560
[3]: http://mxr.mozilla.org/mozilla-central/source/widget/gonk/HwcComposer2D.cpp#586

Since, you mentioned, majority of wait happens while waiting for the acquire fence, check the acquire fence fd code at: http://mxr.mozilla.org/mozilla-central/source/widget/gonk/HwcComposer2D.cpp#594, to make sure if this is a valid fence. I guess you added this code.
Flags: needinfo?(sushilchauhan)
(Assignee)

Comment 8

4 years ago
Created attachment 8438673 [details] [diff] [review]
patch - Deliver acquire fence to HwComposer
(Assignee)

Comment 9

4 years ago
(In reply to Sushil from comment #7)
> 
> Since, you mentioned, majority of wait happens while waiting for the acquire
> fence, check the acquire fence fd code at:
> http://mxr.mozilla.org/mozilla-central/source/widget/gonk/HwcComposer2D.
> cpp#594, to make sure if this is a valid fence. I guess you added this code.

I recognized that Current gecko does not deliver acquire fence to HwComposer. The aquireFence wait happen incorrectly at the following.
http://mxr.mozilla.org/mozilla-central/source/gfx/layers/opengl/GrallocTextureHost.cpp#448

attachment 8438673 [details] [diff] [review] fix this problem. But fps is still low. After applying patch, wait is moved to Release fence at Surface::hook_dequeueBuffer_DEPRECATED().
(Assignee)

Comment 10

4 years ago
(In reply to Sushil from comment #2)
> Can you dump and share the HWC layer list sent to HWC hal. Add this line at
> [1]:
> 
> LOGE("HWC layer[%d]::%s: dst=[%d %d %d %d] src=[%f %f %f %f] Tr=%x
> Blend=0x%x Flags=0x%x", mList->numHwLayers, aLayer->Name(),
> hwcLayer.displayFrame.left, hwcLayer.displayFrame.top,
> hwcLayer.displayFrame.right, hwcLayer.displayFrame.bottom,
> hwcLayer.sourceCropf.left, hwcLayer.sourceCropf.top,
> hwcLayer.sourceCropf.right, hwcLayer.sourceCropf.bottom,
> hwcLayer.transform, hwcLayer.blending, hwcLayer.flags);

Flame uses JB, sourceCropf seems not defined yet. I am going to remove sourceCropf locally.
(Assignee)

Comment 11

4 years ago
Created attachment 8438688 [details]
logcat during the app run

Logcat during the app running. Sorry, we can not share the app.
(Assignee)

Comment 12

4 years ago
http://people.mozilla.org/~bgirard/cleopatra/#report=8dec5906370d8ebb17fc5cde78be403dc0b1ec41

profile of the Compositor during the app running. Almost time is waiting Fence::waitForever() in Surface::hook_dequeueBuffer_DEPRECATED()
(Assignee)

Comment 13

4 years ago
There might be following way to the problem in Comment 12.
- [1] Increase Framebuffer number to 3.
- [2] Apply vsync implementation
- [3] Replace gonk to new one that does not use Surface::hook_dequeueBuffer_DEPRECATED().
(Assignee)

Comment 14

4 years ago
Easier solution is [1]. But it require more gralloc buffer. It seems not fit to b2g. When applying [1], FPS becomes good.

Comment 15

4 years ago
(In reply to Sotaro Ikeda [:sotaro] from comment #14)
> Easier solution is [1]. But it require more gralloc buffer. It seems not fit
> to b2g. When applying [1], FPS becomes good.

Yes, I was about to suggest this one. We also enable triple frame buffer via BoardConfig file of the device. See Bug: https://bugzilla.mozilla.org/show_bug.cgi?id=988160. If you have confirmed that [1] improved FPS, then please make this change in the board config file of the device. We should not change it in FrameBufferSurface.cpp since it might not be needed on higher end devices.

BTW, [2] will not improve the FPS, it will make it worse. I am not aware about [3], you can give it a try.

Comment 16

4 years ago
1 suggestion to reduce load on HWC. In below layer configuration, layer[1] is totally hidden under layer[2] so it should not be present in layer tree (similar to Bug 1022164). It will lead to unnecessary solid fill operation in driver. Just for a quick check, can you hack HwcComposer2D code and do not include layer[1] and see if FPS improves ?

E HWComposer: HWC layer[0]::ColorLayerComposite: dst=[0 0 480 854] Tr=ff000000 Blend=0x100 Flags=0x8
E HWComposer: HWC layer[1]::ColorLayerComposite: dst=[1 2 479 852] Tr=ff000000 Blend=0x100 Flags=0x8
E HWComposer: HWC layer[2]::CanvasLayerComposite: dst=[1 2 479 852] Tr=2 Blend=0x100 Flags=0x0
qdhwcomposer: canUseCopybitForRGB:renderArea 406300, fbArea 409920
HWComposer: Frame rendered
Flags: needinfo?(sotaro.ikeda.g)
(Assignee)

Comment 17

4 years ago
(In reply to Sushil from comment #15)
> (In reply to Sotaro Ikeda [:sotaro] from comment #14)
> > Easier solution is [1]. But it require more gralloc buffer. It seems not fit
> > to b2g. When applying [1], FPS becomes good.
> 
> Yes, I was about to suggest this one. We also enable triple frame buffer via
> BoardConfig file of the device. See Bug:
> https://bugzilla.mozilla.org/show_bug.cgi?id=988160. If you have confirmed
> that [1] improved FPS, then please make this change in the board config file
> of the device. We should not change it in FrameBufferSurface.cpp since it
> might not be needed on higher end devices.
> 
> BTW, [2] will not improve the FPS, it will make it worse. I am not aware
> about [3], you can give it a try.

[1] makes a good result. The profile is the following. [2][3] are just possible workaournd. They are not realistic as b2g-v1.4.

http://people.mozilla.org/~bgirard/cleopatra/#report=8dec5906370d8ebb17fc5cde78be403dc0b1ec41
Flags: needinfo?(sotaro.ikeda.g)

Comment 18

4 years ago
Cool. You can go ahead with [1] then. Please do a sanity on device to make sure it does not break anything else. Also in layer config of frame, HWC layer[1] is unnecessary, can you check by hacking and dropping this layer in HwcComposer to check if it improves FPS (with & without triple FrameBuffer).
(Assignee)

Comment 19

4 years ago
(In reply to Sushil from comment #16)
> 1 suggestion to reduce load on HWC. In below layer configuration, layer[1]
> is totally hidden under layer[2] so it should not be present in layer tree
> (similar to Bug 1022164). It will lead to unnecessary solid fill operation
> in driver. Just for a quick check, can you hack HwcComposer2D code and do
> not include layer[1] and see if FPS improves ?

From the profile in Comment 17, we do not need to handle about it in this bug, I think.
(Assignee)

Comment 20

4 years ago
(In reply to Sushil from comment #18)
> Also in layer config of frame, HWC
> layer[1] is unnecessary, can you check by hacking and dropping this layer in
> HwcComposer to check if it improves FPS (with & without triple FrameBuffer).

It is better to handle as a different problem as a different bug to keep a bug simple.
(Assignee)

Updated

4 years ago
Depends on: 1024144
(Assignee)

Comment 21

4 years ago
Comment on attachment 8438673 [details] [diff] [review]
patch - Deliver acquire fence to HwComposer

The patch moved to Bug 1024144.
Attachment #8438673 - Attachment is obsolete: true
(Assignee)

Comment 22

4 years ago
(In reply to Sotaro Ikeda [:sotaro] from comment #17)
> 
> [1] makes a good result. The profile is the following. [2][3] are just
> possible workaournd. They are not realistic as b2g-v1.4.
> 
> http://people.mozilla.org/~bgirard/cleopatra/
> #report=8dec5906370d8ebb17fc5cde78be403dc0b1ec41

Sorry, I pasted wrong profile data. It is same one to comment 12.
(Assignee)

Comment 23

4 years ago
The following is the correct profile when [1] is applied.
http://people.mozilla.org/~bgirard/cleopatra/#report=5abf4e60431b2439498a3e95ec1b6ac36cb5bdea

> - [1] Increase Framebuffer number to 3.
(Assignee)

Comment 24

4 years ago
Flame(msm8610) seems not have an enough graphics handling power. The profile in Comment 12, Compositor thread comsume 80% of time at eglSwapBuffers(). It is not an acceptable value as a product. But [1] mean that the device consumes more graphic buffer steadily than before. It is very bad thing for b2g.

I am going to increase the frame buffer count only on flame device. And after Bug 987529 is fixed, I am going to check if it is possible to reduce frame buffer number to 2.

Comment 25

4 years ago
VSYNC implementation will not improve the FPS, when GPU itself it taking too much time in eglSwapBuffers(). You can verify with a quick patch by adding vsync hook in HwcComposer2D in Flame code.
(Assignee)

Updated

4 years ago
Status: NEW → ASSIGNED
(Assignee)

Comment 26

4 years ago
Android has a mechanism to modify a number of frame buffer. It uses ifdef in FramebufferSurface.cpp. If we add "NUM_FRAMEBUFFER_SURFACE_BUFFERS := 3" to BoardConfig.mk. The config is delivered to FramebufferSurface.cpp in android.

http://androidxref.com/4.4.3_r1.1/xref/frameworks/native/services/surfaceflinger/DisplayHardware/FramebufferSurface.cpp#39

But FramebufferSurface.cpp is build as part of gecko now. IIRC, if we want to add device specific ifdef to gecko build. It seems to need to change "configure.in". It is not desirable. Instead, use system property is easier solution.
(Assignee)

Comment 27

4 years ago
(In reply to Sushil from comment #25)
> VSYNC implementation will not improve the FPS, when GPU itself it taking too
> much time in eglSwapBuffers(). You can verify with a quick patch by adding
> vsync hook in HwcComposer2D in Flame code.

Possibility of reduction is not important here. Do evaluation after Bug 987529 fix is important. So, there is no necessity to check it urgently.
(Assignee)

Updated

4 years ago
Blocks: 1014011
(Assignee)

Comment 28

4 years ago
Nominate to "b2g-v1.4". This bug affect to WebGL game's performance and blocking 1.4+ bug.
blocking-b2g: --- → 1.4?
(Assignee)

Comment 29

4 years ago
Created attachment 8439485 [details] [diff] [review]
patch part 1 - Number of Framebuffer be changeable
(Assignee)

Comment 30

4 years ago
Created attachment 8439486 [details] [diff] [review]
patch part 2 - Add ro.moz.framebuffer.buffers_num
(Assignee)

Comment 31

4 years ago
I confirmed that the patches change the frame buffer surface count to 3 on master flame.
(Assignee)

Comment 32

4 years ago
Comment on attachment 8439485 [details] [diff] [review]
patch part 1 - Number of Framebuffer be changeable

mwu, can you review the patch? If there are a better way than this, can you give me an advice?
Attachment #8439485 - Flags: review?(mwu)
(Assignee)

Updated

4 years ago
Attachment #8439486 - Flags: review?(mwu)
triage team, see comment 28 for why this is a blocker.
NI :elan, if we still need given we passed partner certification?
Flags: needinfo?(elancaster)
Investigating, see bug 1022823.
(Assignee)

Updated

4 years ago
No longer blocks: 1022823
Whiteboard: [c=progress p= s= u=]
Priority: -- → P2
blocking for dependent bug being blocker.
blocking-b2g: 1.4? → 1.4+
Upon further review I'd like this in backlog.
Blocks: 1014011
blocking-b2g: 1.4+ → backlog

Updated

4 years ago
No longer blocks: 1014011

Updated

4 years ago
No longer blocks: 1014011
(Assignee)

Comment 38

4 years ago
When flame is on 256MB config, "frame buffer surface count to 3" could cause oom problem. It seems better not to change the count in 256MB config.
(Assignee)

Comment 39

4 years ago
On b2g v1.4, I got the following fps. I checked performance down in HwComposition on master flame. gfx layer's difference might cause it.
- b2g-v1.4, OpenGL composition: 41fps
- b2g-v1.4, HwComposer composition: 42fps
(Assignee)

Updated

4 years ago
Attachment #8439485 - Flags: review?(mwu)
(Assignee)

Updated

4 years ago
Attachment #8439486 - Flags: review?(mwu)
(Assignee)

Comment 40

4 years ago
It might better to re-think about to increase frame buffer surface count. An effect of changing it is very big.
(Assignee)

Comment 41

4 years ago
(In reply to Sotaro Ikeda [:sotaro] from comment #40)
> It might better to re-think about to increase frame buffer surface count. An
> effect of changing it is very big.

The main problem comes from flame's egl surface's implementation. It uses  Surface::hook_dequeueBuffer_DEPRECATED()v like in Comment 12. egl surface should not use the function. And newer platform does not use the function.
(Assignee)

Comment 42

4 years ago
(In reply to Sotaro Ikeda [:sotaro] from comment #39)
> On b2g v1.4, I got the following fps. I checked performance down in
> HwComposition on master flame. gfx layer's difference might cause it.
> - b2g-v1.4, OpenGL composition: 41fps
> - b2g-v1.4, HwComposer composition: 42fps

Bug 1022164 is fixing some gfx layer's problems.
(Assignee)

Comment 43

4 years ago
With On b2g v1.4 + latest app, the app's rendering seems very efficient. HwComposer compose only Canvas layer. In this case, flame's hw composer seems efficient.

> qdhwcomposer: canUseCopybitForRGB:renderArea 409920, fbArea 409920.
(Assignee)

Comment 44

4 years ago
From Comment 41 and Comment 43, it seems better close this bug.
(Assignee)

Updated

4 years ago
Status: ASSIGNED → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → WONTFIX

Comment 45

4 years ago
Sotaro,

So you are not increasing FB surfaces to 3. What is the conclusion of this Bug? Are you planning to fix Flame's egl surface's implementation ? Or Flame is being moved to b2g v1.4 ?

Updated

4 years ago
Flags: needinfo?(sotaro.ikeda.g)

Updated

4 years ago
Whiteboard: [c=progress p= s= u=] → [c=progress p= s=2014.06.20.t u=]
(Assignee)

Comment 46

4 years ago
(In reply to Sushil from comment #45)
> Sotaro,
> 
> So you are not increasing FB surfaces to 3. What is the conclusion of this
> Bug? Are you planning to fix Flame's egl surface's implementation ? Or Flame
> is being moved to b2g v1.4 ?

On b2g-v1.4, mozilla will fix only very critical bugs now. "Fullscreen web gl game" use case, is not a critical problem anymore. Therefore I do not make a change to b2g-v1.4 about this problem.

To fix Surface::hook_dequeueBuffer_DEPRECATED() problem, we need to replace prebuilt binary to latest codeaurora's one. In my understanding, mozilla side does not have a right to do it. It is device vendor side's problem. The vendor already chose old JB platform as gonk.
Flags: needinfo?(sotaro.ikeda.g)

Updated

3 years ago
Flags: needinfo?(elancaster)
blocking-b2g: backlog → ---
tracking-b2g: --- → backlog
You need to log in before you can comment on or make changes to this bug.