Closed
Bug 905589
Opened 11 years ago
Closed 11 years ago
use VBOs instead of client-side rendering
Categories
(Firefox OS Graveyard :: General, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: vlin, Assigned: gal)
Details
Attachments
(1 file, 10 obsolete files)
16.05 KB,
patch
|
milan
:
review+
|
Details | Diff | Splinter Review |
Display resolution is 1280x800 Freescale Quad-core CPU, 1.4GHz with Vivante GPU inside The FPS is low(under 30) in Homescreen sliding. First time we found that the bottleneck is caused by composition in B2G process. The root cause is glDrawArray takes around 10ms. After GPU vendor updating library, glDrawArray's execution time lowered down to ~1ms. Then the FPS increases from 15 to 25, the bottleneck is now in content process. The Homescreen layers are all drawing by software rendering. Peter Chang found that CPU resource is still sufficient. Viral Wang's experiment proofed CPU resource allocation is fine. More profiling and analyzing jobs have to do. This Bug is to track the status of performance issue of tablet project.
Comment 1•11 years ago
|
||
Add Andreas in cc list. -- Keven
Updated•11 years ago
|
Blocks: koi-devices
Assignee | ||
Comment 2•11 years ago
|
||
Ok, lets attach some profiles here. I would like to see a profile of the compositor thread and a profile of the main thread. There is no way we are gated on the compositor here. Something else is wrong.
(In reply to Andreas Gal :gal from comment #2) > Ok, lets attach some profiles here. I would like to see a profile of the > compositor thread and a profile of the main thread. There is no way we are > gated on the compositor here. Something else is wrong. Can you please profile gl()->fEGLImageTargetTexture2D() at http://dxr.mozilla.org/mozilla-central/source/gfx/layers/opengl/TextureHostOGL.cpp#l1073
Assignee | ||
Comment 4•11 years ago
|
||
Ok, the team did a really clever hack here to prove that the Freescale driver is just totally broken. They disabled in an Android build the HWC and frame rate of the Android homescreen drops below 30fps. So HWC is hiding the fact that Freescale's driver is fantastically slow. It seems that issuing a draw call can block >10ms. Since Android is broken too, I don't think we have much of a chance of fixing this on our end. Its up to the vendor to ship a working GPU driver. We can continue with the HWC work, but even with that this would bite us elsewhere, so until we have better drivers, we are stuck with this particular chipset. Since we have alternatives, lets explore those and we can continue here once we get vendor help.
Assignee | ||
Comment 5•11 years ago
|
||
On a related note, it seems that the issue is around locking. With gralloc disabled framerate doubles to 25 or so (using shmem). When uploading shmem is faster than gralloc, its sadness all around.
No longer blocks: flatfish, koi-devices
Summary: Performance: Graphic performance issues on tablet project → Vivante GPU (Freescale) can't reach 60fps on homescreen
Whiteboard: needs vendor fix
Comment 6•11 years ago
|
||
I'm going to CC Sotaro on this; I don't think it's related to bug 905304/bug 898919, but just in case.
Assignee | ||
Comment 7•11 years ago
|
||
I have an idea here. The Vivante driver might have a slow path when using client-stored vertices. We should try using a VBO instead. I will see if my GL is good enough to hack this up so we can test it.
Assignee | ||
Comment 8•11 years ago
|
||
In particular, a really dumb driver might use internally glBufferSubData on a single VBO to handle vertex pointers, so the 2nd glDrawArrays would then use the same VBO, and as a result glBufferSubData would have to wait until the GL pipeline has drained before it can proceed.
Assignee | ||
Comment 9•11 years ago
|
||
Here goes nothing. This switches the BindAndDrawQuadWithTextureRect path over to using VBOs. It turns out BindAndDrawQuad already uses an VBO (mQuadVBO). As bonus I also switched the FPS counter drawing over. I have no idea whether this actually fixes the bug here, but its probably a good idea in general to avoid client-stored vertex data, so we might want to take this patch even if it doesn't help this broken driver. This patch was tested only on MacOSX.
Assignee: nobody → gal
Assignee | ||
Comment 10•11 years ago
|
||
Attachment #791675 -
Attachment is obsolete: true
Assignee | ||
Comment 11•11 years ago
|
||
Don't leak VBOs when drawing the FPS counter.
Attachment #791676 -
Attachment is obsolete: true
Assignee | ||
Comment 12•11 years ago
|
||
Remove unintentional change to JS::Value.
Attachment #791679 -
Attachment is obsolete: true
Assignee | ||
Comment 13•11 years ago
|
||
Delete texture on shutdown (typo).
Attachment #791680 -
Attachment is obsolete: true
Assignee | ||
Updated•11 years ago
|
Attachment #791685 -
Flags: feedback?(vlin)
Comment 14•11 years ago
|
||
Before I found the glDrawArray was very slow from DrawQuad calls in Freescale paltform. http://mxr.mozilla.org/mozilla-central/source/gfx/layers/opengl/CompositorOGL.cpp#620 Using VBO didn't help the performance but it is good to have VBO patch landed as gal mentioned. Only reducing the texture size to one small 2x2 black texture for the background layer could improve the FPS. Therefore, our partner releases a gralloc fix for us. Later Jerry will provide the profiling data after applying gralloc partner fix.
Assignee | ||
Comment 15•11 years ago
|
||
Freescale fixed the slowness in glDrawArray. The previous version of the driver was slow. The current version is fast even without VBOs.
Comment 16•11 years ago
|
||
The tablet screen resolution is 1280x800. We set cpu governor to performance mode. The cpu freq is fixed at 1G. (adb shell "echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor") And we use the the latest gralloc library(fix glDrawArray performance problem) from partner. When we scroll homescreen between two pages(no repainting on content process), it spends about 30~40 ms for SwapBuffers(). http://mxr.mozilla.org/mozilla-central/source/gfx/layers/opengl/CompositorOGL.cpp#1291 We make an experiment that calls glFinish before SwapBuffers. Then the glFinish becomes the bottleneck. mGLContext->fFinish(); //30~40ms mGLContext->SwapBuffers(); //0~1ms
Assignee | ||
Comment 17•11 years ago
|
||
Nice work Jerry. Lets analyze this further. I will come upstairs to chat.
Assignee | ||
Comment 18•11 years ago
|
||
Jerry, can you please get the current values in BindAndDrawQuadWithTextureRect for: mGLContext->CanUploadNonPowerOfTwo() aTexture->GetSize().width aTexture->GetSize().height wrapMode (our assumption right now is that this is not GL_REPEAT)
Assignee | ||
Comment 19•11 years ago
|
||
We currently do not support tiled thebes for B2G. We only support them on Android and via texture upload. We can probably make that path work for B2G, but it will be some work. We also don't support tiled images layers at all. So if we are slow rendering a very large image layer here, that would suck.
Comment 20•11 years ago
|
||
(In reply to Andreas Gal :gal from comment #18) > Jerry, can you please get the current values in > BindAndDrawQuadWithTextureRect for: > > mGLContext->CanUploadNonPowerOfTwo() true > aTexture->GetSize().width > aTexture->GetSize().height There are many texute,such as 1280*800, 1260*800, 1185*800 in homescreen > wrapMode (our assumption right now is that this is not GL_REPEAT) LOCAL_GL_CLAMP_TO_EDGE
Assignee | ||
Comment 21•11 years ago
|
||
After some back and forth, we are at this point not 100% confident that the GPU has even enough memory bandwidth to composite a large texture to the framebuffer. The team will try to measure at what texture size things fall apart. We did try to disable CanUploadNonPowerOfTwo, and it didn't help. In addition rendering also was wrong. At the time being this is a dead end. We should feed this back to Freescale, and line up work for tiled layers. However, even that may not solve the problem if this is indeed memory bandwidth.
Assignee | ||
Comment 22•11 years ago
|
||
Actually, one more idea. Can you please remove the homescreen from gaia and just have it black or any color? Lets see whether that impacts frame rate.
Comment 23•11 years ago
|
||
Instead of chnage texture size and guess the limitation of this platform, why don't we just create an simple app and reveal this problem to platfrom vendor? Mimic the layout of launcher(2 full screen layers and one lauch bar which stay at the bottom of screen), keep compositor compose these three layers(with some hack). If we can prove that all the other platforms can survive(reach 60FPS) in this scenario, ask chip set vendor to have a deeper look into driver side, it might just a cache-line bug, or some trivial problems, and easy for them to fix. But we need to have some evidance in a simple app(scenario) which can tell them the story easily.
Comment 24•11 years ago
|
||
Morris, please update profile data after HWC enable.
Flags: needinfo?(mtseng)
Comment 25•11 years ago
|
||
We spend a lot of time, up and down, and eventually know the problem is in B2G process, instead of content process, and it's sort of GPU bound instead of CPU bound. I file another issue(Bug 907055) to construct a better/ more effecient way to identify GLCompositor GPU bound problem. With that feature, gfx guys in AE or System team can address this kind of problem while porting.
Comment 26•11 years ago
|
||
This is the performance profile sheet when using dummy texture for homescreen. We create a static dummy texture, and use this texture for all composition operation. https://docs.google.com/spreadsheet/ccc?key=0AhJcEt1NXQbqdGVDWWNFN09KRWpvbXJGSUh6QW5YWWc&usp=sharing
Assignee | ||
Updated•11 years ago
|
Attachment #791685 -
Flags: feedback?(vlin) → feedback?(bas)
Assignee | ||
Updated•11 years ago
|
Attachment #791685 -
Flags: feedback?(bas) → review?(bas)
Comment 27•11 years ago
|
||
Comment on attachment 791685 [details] [diff] [review] patch Review of attachment 791685 [details] [diff] [review]: ----------------------------------------------------------------- ::: gfx/gl/VBOArena.h @@ +21,5 @@ > + > + // Throw away all allocated VBOs. > + void Flush(GLContext *aGLContext); > +private: > + nsTArray<GLuint> mAllocatedVBOs; nit: in the whole 'which way do we go discussion', maybe these should be std::vector's? I have a mild preference for using STL where we can. I don't feel strongly though. ::: gfx/layers/opengl/CompositorOGL.cpp @@ +55,5 @@ > + size_t length = aElements; > + if (aMode == LOCAL_GL_TRIANGLES) > + length *= 3; > + else > + length *= 2; I believe there's an error here, this is all really confusing in our current implementation so I don't blame you :P. aElements is actually the amount of vertices you're drawing, not the amount of primitives, so 'length' in what you're currently doing as far as I understand it should just be aElements. In our case, the stride of both our arrays should always be 2 * sizeof(GLfloat) and thus the size 2 * aElements * sizeof(GLfloat). Currently I'd say from looking at it your code is reading random additional memory off the stack when in LOCAL_GL_TRIANGLES mode, this won't cause visual problems since GL will only use the valid portions when drawing. also a nit: I'd like to see some verticle whitespace in this function to improve readability.
Attachment #791685 -
Flags: review?(bas) → review-
Comment 28•11 years ago
|
||
Other than the buffer overflow that I believe is occurring this patch looks fine to me btw.
Assignee | ||
Comment 29•11 years ago
|
||
Attachment #791685 -
Attachment is obsolete: true
Assignee | ||
Comment 30•11 years ago
|
||
Attachment #793559 -
Attachment is obsolete: true
Assignee | ||
Comment 31•11 years ago
|
||
Comment on attachment 793560 [details] [diff] [review] patch Nice catch with *2/*3. Thanks. I absolutely hate nsTArray, so +1 from me for using std.
Attachment #793560 -
Flags: review?(bas)
Comment 32•11 years ago
|
||
Comment on attachment 793560 [details] [diff] [review] patch Review of attachment 793560 [details] [diff] [review]: ----------------------------------------------------------------- Strictly an improvement in my mind. ::: gfx/layers/opengl/CompositorOGL.cpp @@ +49,5 @@ > +static void > +DrawWithVertexBuffer2(GLContext *aGLContext, VBOArena &aVBOs, > + GLenum aMode, GLsizei aElements, > + GLint aAttr1, GLfloat *aArray1, > + GLint aAttr2, GLfloat *aArray2) One last nit: We should probably add a tiny comment that this function expects arrays with a stride of 2 * sizeof(GLfloat).
Attachment #793560 -
Flags: review?(bas) → review+
Assignee | ||
Comment 33•11 years ago
|
||
Attachment #793560 -
Attachment is obsolete: true
Assignee | ||
Updated•11 years ago
|
Attachment #793560 -
Attachment is obsolete: false
Assignee | ||
Updated•11 years ago
|
Flags: needinfo?(mtseng)
Keywords: checkin-needed
Summary: Vivante GPU (Freescale) can't reach 60fps on homescreen → use VBOs instead of client-side rendering
Updated•11 years ago
|
Attachment #793560 -
Attachment is obsolete: true
Comment 34•11 years ago
|
||
Doesn't apply cleanly, please rebase. Also, please make sure that you have hg configured to generate patches with all the necessary commit information, including a commit message. https://developer.mozilla.org/en-US/docs/Mercurial_FAQ#How_can_I_generate_a_patch_for_somebody_else_to_check-in_for_me.3F
Keywords: checkin-needed
Whiteboard: needs vendor fix
Comment 35•11 years ago
|
||
Andreas, let me know if it helps for us to rebase this patch instead.
Flags: needinfo?(gal)
Comment 36•11 years ago
|
||
Carry over r+ from :bas for the rebase.
Attachment #793669 -
Attachment is obsolete: true
Attachment #794071 -
Flags: review+
Flags: needinfo?(gal)
Updated•11 years ago
|
Keywords: checkin-needed
Updated•11 years ago
|
Keywords: checkin-needed
Comment 37•11 years ago
|
||
Carry over r+ from :bas for the rebase.
Attachment #794071 -
Attachment is obsolete: true
Attachment #794109 -
Flags: review+
Updated•11 years ago
|
Keywords: checkin-needed
Comment 38•11 years ago
|
||
Thanks :) https://hg.mozilla.org/integration/b2g-inbound/rev/230120afcbbf
Keywords: checkin-needed
Comment 39•11 years ago
|
||
Backed out for Android and B2G build bustage. https://hg.mozilla.org/integration/b2g-inbound/rev/39599246f51b https://tbpl.mozilla.org/php/getParsedLog.php?id=26879354&tree=B2g-Inbound
Comment 40•11 years ago
|
||
std::vector::data is new in C++11, so looks like some compilers didn't support it before.
Comment 41•11 years ago
|
||
Carry over r+ from :bas for the rebase. Use &vector[0] instead of vector.data() as not all compilers support std::vector::data() method.
Attachment #794109 -
Attachment is obsolete: true
Attachment #794185 -
Flags: review+
Updated•11 years ago
|
Keywords: checkin-needed
Comment 42•11 years ago
|
||
(In reply to Andreas Gal :gal from comment #7) > I have an idea here. The Vivante driver might have a slow path when using > client-stored vertices. We should try using a VBO instead. I will see if my > GL is good enough to hack this up so we can test it. When using VBO, we also need to upload vertex and texture coordinate for each different quad. Does use VBO better than use vertex data directly?
Flags: needinfo?(gal)
Comment 43•11 years ago
|
||
https://hg.mozilla.org/integration/b2g-inbound/rev/d128d47bfe47
Keywords: checkin-needed
Comment 44•11 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/d128d47bfe47
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Assignee | ||
Updated•10 years ago
|
Flags: needinfo?(gal)
You need to log in
before you can comment on or make changes to this bug.
Description
•