Closed
Bug 1024025
Opened 10 years ago
Closed 10 years ago
WebGL game's FPS is low when HwComposer is used
Categories
(Core :: Graphics: Layers, defect, P2)
Tracking
()
RESOLVED
WONTFIX
tracking-b2g | backlog |
People
(Reporter: sotaro, Assigned: sotaro)
References
Details
(Keywords: perf, Whiteboard: [c=progress p= s=2014.06.20.t u=])
Attachments
(3 files, 1 obsolete file)
4.17 KB,
text/plain
|
Details | |
1.60 KB,
patch
|
Details | Diff | Splinter Review | |
593 bytes,
patch
|
Details | Diff | Splinter Review |
When running WebGL game, HwComposer's composition is slower than OpenGL composition.
Assignee | ||
Comment 1•10 years ago
|
||
I found the following problems. -[1] EGLImage binding is done even when OpenGL composition is not used. -[2] EGLImage binding is done when gralloc buffer is delivered to Compositor side. We need to defer EGLImage binding. Because Fence wait need to be done before EGLImage binding. -[3] Flame's EGLSurface uses Surface::hook_dequeueBuffer_DEPRECATED(). It is sync function and block Composer thread. On latest hw platform does not use this function.
Assignee | ||
Updated•10 years ago
|
Assignee: nobody → sotaro.ikeda.g
Can you dump and share the HWC layer list sent to HWC hal. Add this line at [1]: LOGE("HWC layer[%d]::%s: dst=[%d %d %d %d] src=[%f %f %f %f] Tr=%x Blend=0x%x Flags=0x%x", mList->numHwLayers, aLayer->Name(), hwcLayer.displayFrame.left, hwcLayer.displayFrame.top, hwcLayer.displayFrame.right, hwcLayer.displayFrame.bottom, hwcLayer.sourceCropf.left, hwcLayer.sourceCropf.top, hwcLayer.sourceCropf.right, hwcLayer.sourceCropf.bottom, hwcLayer.transform, hwcLayer.blending, hwcLayer.flags); [1]: http://mxr.mozilla.org/mozilla-central/source/widget/gonk/HwcComposer2D.cpp#505 Grep for "HWC" in ADB logs.
Flags: needinfo?(sotaro.ikeda.g)
Also change LOGD to LOGE in HwcComposer2D::TryRender() to confirm GPU/HWC Composition. If possible, share Game app and steps to install it.
Assignee | ||
Comment 4•10 years ago
|
||
I already confirmed that the majority of wait happened by Acquire fence wait and Surface::hook_dequeueBuffer_DEPRECATED().
Flags: needinfo?(sotaro.ikeda.g)
Assignee | ||
Comment 5•10 years ago
|
||
(In reply to Sushil from comment #2) > Can you dump and share the HWC layer list sent to HWC hal. Add this line at > [1]: > I am going to provide it.
Assignee | ||
Comment 6•10 years ago
|
||
Sushil, do you know why the following code is necessary? If HwComposer do all layer's composition by blit, it seem not necessary except fence sync purpose.
----------------------
HwcComposer2D::TryHwComposition()
>>
} else if (blitComposite) {
// BLIT Composition, flip FB target
GetGonkDisplay()->UpdateFBSurface(mDpy, mSur);
FramebufferSurface* fbsurface = (FramebufferSurface*)(GetGonkDisplay()->GetFBSurface());
if (!fbsurface) {
LOGE("H/W Composition failed. NULL FBSurface.");
return false;
}
mList->hwLayers[idx].handle = fbsurface->lastHandle;
mList->hwLayers[idx].acquireFenceFd = fbsurface->GetPrevFBAcquireFd();
}
Flags: needinfo?(sushilchauhan)
This is called HWC_BLIT Composition. The content of the frame, which has been composed by Copybit, gets drawn on FrameBuffer layer by driver. It avoids use of unnecessary intermediate full screen render buffers, which improves memory usage on device. Since the composed content is drawn on FB layer, that's why we need to call eglSwapBuffers() in HwcComposer2D to flip FB target before calling HWC set, even when the frame gets composed by HWC. See "HWC_BLIT" in HwcComposer2D.cpp. See the logic at: [1]: http://mxr.mozilla.org/mozilla-central/source/widget/gonk/HwcComposer2D.cpp#546 [2]: http://mxr.mozilla.org/mozilla-central/source/widget/gonk/HwcComposer2D.cpp#560 [3]: http://mxr.mozilla.org/mozilla-central/source/widget/gonk/HwcComposer2D.cpp#586 Since, you mentioned, majority of wait happens while waiting for the acquire fence, check the acquire fence fd code at: http://mxr.mozilla.org/mozilla-central/source/widget/gonk/HwcComposer2D.cpp#594, to make sure if this is a valid fence. I guess you added this code.
Flags: needinfo?(sushilchauhan)
Assignee | ||
Comment 8•10 years ago
|
||
Assignee | ||
Comment 9•10 years ago
|
||
(In reply to Sushil from comment #7) > > Since, you mentioned, majority of wait happens while waiting for the acquire > fence, check the acquire fence fd code at: > http://mxr.mozilla.org/mozilla-central/source/widget/gonk/HwcComposer2D. > cpp#594, to make sure if this is a valid fence. I guess you added this code. I recognized that Current gecko does not deliver acquire fence to HwComposer. The aquireFence wait happen incorrectly at the following. http://mxr.mozilla.org/mozilla-central/source/gfx/layers/opengl/GrallocTextureHost.cpp#448 attachment 8438673 [details] [diff] [review] fix this problem. But fps is still low. After applying patch, wait is moved to Release fence at Surface::hook_dequeueBuffer_DEPRECATED().
Assignee | ||
Comment 10•10 years ago
|
||
(In reply to Sushil from comment #2) > Can you dump and share the HWC layer list sent to HWC hal. Add this line at > [1]: > > LOGE("HWC layer[%d]::%s: dst=[%d %d %d %d] src=[%f %f %f %f] Tr=%x > Blend=0x%x Flags=0x%x", mList->numHwLayers, aLayer->Name(), > hwcLayer.displayFrame.left, hwcLayer.displayFrame.top, > hwcLayer.displayFrame.right, hwcLayer.displayFrame.bottom, > hwcLayer.sourceCropf.left, hwcLayer.sourceCropf.top, > hwcLayer.sourceCropf.right, hwcLayer.sourceCropf.bottom, > hwcLayer.transform, hwcLayer.blending, hwcLayer.flags); Flame uses JB, sourceCropf seems not defined yet. I am going to remove sourceCropf locally.
Assignee | ||
Comment 11•10 years ago
|
||
Logcat during the app running. Sorry, we can not share the app.
Assignee | ||
Comment 12•10 years ago
|
||
http://people.mozilla.org/~bgirard/cleopatra/#report=8dec5906370d8ebb17fc5cde78be403dc0b1ec41 profile of the Compositor during the app running. Almost time is waiting Fence::waitForever() in Surface::hook_dequeueBuffer_DEPRECATED()
Assignee | ||
Comment 13•10 years ago
|
||
There might be following way to the problem in Comment 12. - [1] Increase Framebuffer number to 3. - [2] Apply vsync implementation - [3] Replace gonk to new one that does not use Surface::hook_dequeueBuffer_DEPRECATED().
Assignee | ||
Comment 14•10 years ago
|
||
Easier solution is [1]. But it require more gralloc buffer. It seems not fit to b2g. When applying [1], FPS becomes good.
Comment 15•10 years ago
|
||
(In reply to Sotaro Ikeda [:sotaro] from comment #14) > Easier solution is [1]. But it require more gralloc buffer. It seems not fit > to b2g. When applying [1], FPS becomes good. Yes, I was about to suggest this one. We also enable triple frame buffer via BoardConfig file of the device. See Bug: https://bugzilla.mozilla.org/show_bug.cgi?id=988160. If you have confirmed that [1] improved FPS, then please make this change in the board config file of the device. We should not change it in FrameBufferSurface.cpp since it might not be needed on higher end devices. BTW, [2] will not improve the FPS, it will make it worse. I am not aware about [3], you can give it a try.
Comment 16•10 years ago
|
||
1 suggestion to reduce load on HWC. In below layer configuration, layer[1] is totally hidden under layer[2] so it should not be present in layer tree (similar to Bug 1022164). It will lead to unnecessary solid fill operation in driver. Just for a quick check, can you hack HwcComposer2D code and do not include layer[1] and see if FPS improves ? E HWComposer: HWC layer[0]::ColorLayerComposite: dst=[0 0 480 854] Tr=ff000000 Blend=0x100 Flags=0x8 E HWComposer: HWC layer[1]::ColorLayerComposite: dst=[1 2 479 852] Tr=ff000000 Blend=0x100 Flags=0x8 E HWComposer: HWC layer[2]::CanvasLayerComposite: dst=[1 2 479 852] Tr=2 Blend=0x100 Flags=0x0 qdhwcomposer: canUseCopybitForRGB:renderArea 406300, fbArea 409920 HWComposer: Frame rendered
Flags: needinfo?(sotaro.ikeda.g)
Assignee | ||
Comment 17•10 years ago
|
||
(In reply to Sushil from comment #15) > (In reply to Sotaro Ikeda [:sotaro] from comment #14) > > Easier solution is [1]. But it require more gralloc buffer. It seems not fit > > to b2g. When applying [1], FPS becomes good. > > Yes, I was about to suggest this one. We also enable triple frame buffer via > BoardConfig file of the device. See Bug: > https://bugzilla.mozilla.org/show_bug.cgi?id=988160. If you have confirmed > that [1] improved FPS, then please make this change in the board config file > of the device. We should not change it in FrameBufferSurface.cpp since it > might not be needed on higher end devices. > > BTW, [2] will not improve the FPS, it will make it worse. I am not aware > about [3], you can give it a try. [1] makes a good result. The profile is the following. [2][3] are just possible workaournd. They are not realistic as b2g-v1.4. http://people.mozilla.org/~bgirard/cleopatra/#report=8dec5906370d8ebb17fc5cde78be403dc0b1ec41
Flags: needinfo?(sotaro.ikeda.g)
Comment 18•10 years ago
|
||
Cool. You can go ahead with [1] then. Please do a sanity on device to make sure it does not break anything else. Also in layer config of frame, HWC layer[1] is unnecessary, can you check by hacking and dropping this layer in HwcComposer to check if it improves FPS (with & without triple FrameBuffer).
Assignee | ||
Comment 19•10 years ago
|
||
(In reply to Sushil from comment #16) > 1 suggestion to reduce load on HWC. In below layer configuration, layer[1] > is totally hidden under layer[2] so it should not be present in layer tree > (similar to Bug 1022164). It will lead to unnecessary solid fill operation > in driver. Just for a quick check, can you hack HwcComposer2D code and do > not include layer[1] and see if FPS improves ? From the profile in Comment 17, we do not need to handle about it in this bug, I think.
Assignee | ||
Comment 20•10 years ago
|
||
(In reply to Sushil from comment #18) > Also in layer config of frame, HWC > layer[1] is unnecessary, can you check by hacking and dropping this layer in > HwcComposer to check if it improves FPS (with & without triple FrameBuffer). It is better to handle as a different problem as a different bug to keep a bug simple.
Assignee | ||
Comment 21•10 years ago
|
||
Comment on attachment 8438673 [details] [diff] [review] patch - Deliver acquire fence to HwComposer The patch moved to Bug 1024144.
Attachment #8438673 -
Attachment is obsolete: true
Assignee | ||
Comment 22•10 years ago
|
||
(In reply to Sotaro Ikeda [:sotaro] from comment #17) > > [1] makes a good result. The profile is the following. [2][3] are just > possible workaournd. They are not realistic as b2g-v1.4. > > http://people.mozilla.org/~bgirard/cleopatra/ > #report=8dec5906370d8ebb17fc5cde78be403dc0b1ec41 Sorry, I pasted wrong profile data. It is same one to comment 12.
Assignee | ||
Comment 23•10 years ago
|
||
The following is the correct profile when [1] is applied. http://people.mozilla.org/~bgirard/cleopatra/#report=5abf4e60431b2439498a3e95ec1b6ac36cb5bdea > - [1] Increase Framebuffer number to 3.
Assignee | ||
Comment 24•10 years ago
|
||
Flame(msm8610) seems not have an enough graphics handling power. The profile in Comment 12, Compositor thread comsume 80% of time at eglSwapBuffers(). It is not an acceptable value as a product. But [1] mean that the device consumes more graphic buffer steadily than before. It is very bad thing for b2g. I am going to increase the frame buffer count only on flame device. And after Bug 987529 is fixed, I am going to check if it is possible to reduce frame buffer number to 2.
Comment 25•10 years ago
|
||
VSYNC implementation will not improve the FPS, when GPU itself it taking too much time in eglSwapBuffers(). You can verify with a quick patch by adding vsync hook in HwcComposer2D in Flame code.
Assignee | ||
Updated•10 years ago
|
Status: NEW → ASSIGNED
Assignee | ||
Comment 26•10 years ago
|
||
Android has a mechanism to modify a number of frame buffer. It uses ifdef in FramebufferSurface.cpp. If we add "NUM_FRAMEBUFFER_SURFACE_BUFFERS := 3" to BoardConfig.mk. The config is delivered to FramebufferSurface.cpp in android. http://androidxref.com/4.4.3_r1.1/xref/frameworks/native/services/surfaceflinger/DisplayHardware/FramebufferSurface.cpp#39 But FramebufferSurface.cpp is build as part of gecko now. IIRC, if we want to add device specific ifdef to gecko build. It seems to need to change "configure.in". It is not desirable. Instead, use system property is easier solution.
Assignee | ||
Comment 27•10 years ago
|
||
(In reply to Sushil from comment #25) > VSYNC implementation will not improve the FPS, when GPU itself it taking too > much time in eglSwapBuffers(). You can verify with a quick patch by adding > vsync hook in HwcComposer2D in Flame code. Possibility of reduction is not important here. Do evaluation after Bug 987529 fix is important. So, there is no necessity to check it urgently.
Assignee | ||
Comment 28•10 years ago
|
||
Nominate to "b2g-v1.4". This bug affect to WebGL game's performance and blocking 1.4+ bug.
blocking-b2g: --- → 1.4?
Assignee | ||
Comment 29•10 years ago
|
||
Assignee | ||
Comment 30•10 years ago
|
||
Assignee | ||
Comment 31•10 years ago
|
||
I confirmed that the patches change the frame buffer surface count to 3 on master flame.
Assignee | ||
Comment 32•10 years ago
|
||
Comment on attachment 8439485 [details] [diff] [review] patch part 1 - Number of Framebuffer be changeable mwu, can you review the patch? If there are a better way than this, can you give me an advice?
Attachment #8439485 -
Flags: review?(mwu)
Assignee | ||
Updated•10 years ago
|
Attachment #8439486 -
Flags: review?(mwu)
Comment 33•10 years ago
|
||
triage team, see comment 28 for why this is a blocker.
Comment 34•10 years ago
|
||
NI :elan, if we still need given we passed partner certification?
Flags: needinfo?(elancaster)
Comment 35•10 years ago
|
||
Investigating, see bug 1022823.
Updated•10 years ago
|
Whiteboard: [c=progress p= s= u=]
Updated•10 years ago
|
Priority: -- → P2
Comment 37•10 years ago
|
||
Upon further review I'd like this in backlog.
Blocks: 1014011
blocking-b2g: 1.4+ → backlog
Assignee | ||
Comment 38•10 years ago
|
||
When flame is on 256MB config, "frame buffer surface count to 3" could cause oom problem. It seems better not to change the count in 256MB config.
Assignee | ||
Comment 39•10 years ago
|
||
On b2g v1.4, I got the following fps. I checked performance down in HwComposition on master flame. gfx layer's difference might cause it. - b2g-v1.4, OpenGL composition: 41fps - b2g-v1.4, HwComposer composition: 42fps
Assignee | ||
Updated•10 years ago
|
Attachment #8439485 -
Flags: review?(mwu)
Assignee | ||
Updated•10 years ago
|
Attachment #8439486 -
Flags: review?(mwu)
Assignee | ||
Comment 40•10 years ago
|
||
It might better to re-think about to increase frame buffer surface count. An effect of changing it is very big.
Assignee | ||
Comment 41•10 years ago
|
||
(In reply to Sotaro Ikeda [:sotaro] from comment #40) > It might better to re-think about to increase frame buffer surface count. An > effect of changing it is very big. The main problem comes from flame's egl surface's implementation. It uses Surface::hook_dequeueBuffer_DEPRECATED()v like in Comment 12. egl surface should not use the function. And newer platform does not use the function.
Assignee | ||
Comment 42•10 years ago
|
||
(In reply to Sotaro Ikeda [:sotaro] from comment #39) > On b2g v1.4, I got the following fps. I checked performance down in > HwComposition on master flame. gfx layer's difference might cause it. > - b2g-v1.4, OpenGL composition: 41fps > - b2g-v1.4, HwComposer composition: 42fps Bug 1022164 is fixing some gfx layer's problems.
Assignee | ||
Comment 43•10 years ago
|
||
With On b2g v1.4 + latest app, the app's rendering seems very efficient. HwComposer compose only Canvas layer. In this case, flame's hw composer seems efficient.
> qdhwcomposer: canUseCopybitForRGB:renderArea 409920, fbArea 409920.
Assignee | ||
Comment 44•10 years ago
|
||
From Comment 41 and Comment 43, it seems better close this bug.
Assignee | ||
Updated•10 years ago
|
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → WONTFIX
Comment 45•10 years ago
|
||
Sotaro, So you are not increasing FB surfaces to 3. What is the conclusion of this Bug? Are you planning to fix Flame's egl surface's implementation ? Or Flame is being moved to b2g v1.4 ?
Updated•10 years ago
|
Whiteboard: [c=progress p= s= u=] → [c=progress p= s=2014.06.20.t u=]
Assignee | ||
Comment 46•10 years ago
|
||
(In reply to Sushil from comment #45) > Sotaro, > > So you are not increasing FB surfaces to 3. What is the conclusion of this > Bug? Are you planning to fix Flame's egl surface's implementation ? Or Flame > is being moved to b2g v1.4 ? On b2g-v1.4, mozilla will fix only very critical bugs now. "Fullscreen web gl game" use case, is not a critical problem anymore. Therefore I do not make a change to b2g-v1.4 about this problem. To fix Surface::hook_dequeueBuffer_DEPRECATED() problem, we need to replace prebuilt binary to latest codeaurora's one. In my understanding, mozilla side does not have a right to do it. It is device vendor side's problem. The vendor already chose old JB platform as gonk.
Flags: needinfo?(sotaro.ikeda.g)
Updated•10 years ago
|
Flags: needinfo?(elancaster)
Updated•10 years ago
|
blocking-b2g: backlog → ---
tracking-b2g:
--- → backlog
You need to log in
before you can comment on or make changes to this bug.
Description
•