Closed
Bug 869696
Opened 12 years ago
Closed 12 years ago
Using Gralloc causes bad performance on the Geeksphone Peak
Categories
(Core :: General, defect)
Tracking
()
RESOLVED
FIXED
mozilla25
Tracking | Status | |
---|---|---|
b2g-v1.2 | --- | fixed |
People
(Reporter: cwiiis, Assigned: cwiiis)
References
Details
(Keywords: regression)
Attachments
(5 files, 1 obsolete file)
4.95 KB,
patch
|
Details | Diff | Splinter Review | |
1.26 KB,
patch
|
bjacob
:
review+
|
Details | Diff | Splinter Review |
1.48 KB,
patch
|
bjacob
:
review+
|
Details | Diff | Splinter Review |
2.09 KB,
patch
|
jrmuizel
:
review+
|
Details | Diff | Splinter Review |
676 bytes,
patch
|
bjacob
:
review+
bajaj
:
approval-mozilla-b2g26+
|
Details | Diff | Splinter Review |
Building B2G from master has serious performance issues (wrt framerate) on the Geeksphone Peak (and I've heard on the Keon, but I've not tested it) - however, building B2G with master Gaia but the gecko-18 branch of Gecko has very respectable performance, suggesting a gecko issue.
I've not debugged to see where this issue lies yet.
Assignee | ||
Updated•12 years ago
|
blocking-b2g: --- → koi?
Assignee | ||
Updated•12 years ago
|
Keywords: regression
![]() |
||
Updated•12 years ago
|
Keywords: regressionwindow-wanted
Comment 1•12 years ago
|
||
We haven't really started triaging koi yet. This isn't actionable in itself yet to be nomed yet (no clear scope associated with the bug). Pulling the koi? nom since this doesn't end up in our high priority QA bucket and doesn't have clear scope.
In order to get a regression window here, we're going to need accurate STR, expected results, and actual results that can be used to diagnose the issue to get a window on.
Chris - Can you provide a clear STR to use for the regression window?
blocking-b2g: koi? → ---
Flags: needinfo?(chrislord.net)
Assignee | ||
Comment 2•12 years ago
|
||
(In reply to Jason Smith [:jsmith] from comment #1)
> We haven't really started triaging koi yet. This isn't actionable in itself
> yet to be nomed yet (no clear scope associated with the bug). Pulling the
> koi? nom since this doesn't end up in our high priority QA bucket and
> doesn't have clear scope.
>
> In order to get a regression window here, we're going to need accurate STR,
> expected results, and actual results that can be used to diagnose the issue
> to get a window on.
>
> Chris - Can you provide a clear STR to use for the regression window?
STR:
1. Let phone boot
2. Swipe to everything.me
3. Swipe back to home
If you do this with the gecko-18 branch, performance is pretty good - on master, performance is *dramatically* worse. It should be pretty self-evident, especially if you can compare them side-by-side.
Note, this applies to just about every operation that results in an animation, but this is probaby the quickest and easiest one you can do.
Flags: needinfo?(chrislord.net)
Assignee | ||
Comment 3•12 years ago
|
||
Issue doesn't exist in the Gecko aurora branch.
Assignee | ||
Comment 4•12 years ago
|
||
This is bug 868556. Seems this hasn't hit the git mirror yet(?) but it's on hg m-c, so I assume it's coming.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → DUPLICATE
Updated•12 years ago
|
Keywords: regressionwindow-wanted
Assignee | ||
Comment 5•12 years ago
|
||
I had this wrong - it seems I get terrible performance on the Geeksphone Peak *unless* I revert bug 868556...
This seems pretty unintuitive to me, perhaps the path is broken on that device? I'll do some cursory investigation, any tips welcome.
Blocks: 868556
Status: RESOLVED → REOPENED
Resolution: DUPLICATE → ---
Summary: Performance regression in Gecko affecting Firefox OS (B2G) → Using Gralloc causes bad performance on the Geeksphone Peak
Assignee | ||
Comment 6•12 years ago
|
||
I guess this is a driver issue, but this seems as good a way as any that achieves the same effect but doesn't cause slowness.
Assignee: nobody → chrislord.net
Status: REOPENED → ASSIGNED
Attachment #749423 -
Flags: review?(bjacob)
Comment 7•12 years ago
|
||
This is terrifying.
This means that we're creating and destroying GL textures everytime we composite. It's interesting to hear that this is faster on one device, but I have to wonder how it perform on others before I can r+ this.
Could you try a few tweaks to the current code instead. This code just binds the EGLImage returned by GetNullEGLImage. This is defined in GLContextProviderEGL.cpp:
EGLImage GetNullEGLImage() MOZ_OVERRIDE
{
if (!mNullGraphicBuffer.get()) {
mNullGraphicBuffer
= new android::GraphicBuffer(
1, 1,
PIXEL_FORMAT_RGB_565,
GRALLOC_USAGE_SW_READ_NEVER | GRALLOC_USAGE_SW_WRITE_NEVER);
EGLint attrs[] = {
LOCAL_EGL_NONE, LOCAL_EGL_NONE
};
mNullEGLImage = sEGLLibrary.fCreateImage(EGL_DISPLAY(),
EGL_NO_CONTEXT,
LOCAL_EGL_NATIVE_BUFFER_ANDROID,
mNullGraphicBuffer->getNativeBuffer(),
attrs);
}
return mNullEGLImage;
}
Could you check that the fCreateImage path is only hit once? Could you check if changing GRALLOC_USAGE_SW_READ_NEVER to GRALLOC_USAGE_SW_READ_OFTEN makes it better? Could you check if changing the (1, 1) size to (64, 64) makes it better?
Alternatively, if we're really going down the route of your patch:
- could you check at least another device to see how it's affected? And/or consider doing this only on this particular driver (as identified, maybe, by GL strings)?
- could you get feedback directly from specialists, such as Michael Vines or Diego Wilson, so we get to understand why this performs better?
Comment 8•12 years ago
|
||
Comment on attachment 749423 [details] [diff] [review]
Use an alternative method to unlock gralloc textures
Cancelling review for now; need more investigation.
Attachment #749423 -
Flags: review?(bjacob)
Assignee | ||
Comment 9•12 years ago
|
||
(In reply to Benoit Jacob [:bjacob] from comment #7)
> This is terrifying.
>
> This means that we're creating and destroying GL textures everytime we
> composite. It's interesting to hear that this is faster on one device, but I
> have to wonder how it perform on others before I can r+ this.
Fair comment. I don't see it as any more terrifying than targeting and untargeting the eglimage on every frame though - textures ought to be pretty lightweight until they have resources associated with them, and given we retarget every frame anyway...
I'll get results for the Keon as well - these are the only two devices I have at the moment.
> Could you try a few tweaks to the current code instead. This code just binds
> the EGLImage returned by GetNullEGLImage. This is defined in
> GLContextProviderEGL.cpp:
>
> EGLImage GetNullEGLImage() MOZ_OVERRIDE
> {
> if (!mNullGraphicBuffer.get()) {
> mNullGraphicBuffer
> = new android::GraphicBuffer(
> 1, 1,
> PIXEL_FORMAT_RGB_565,
> GRALLOC_USAGE_SW_READ_NEVER |
> GRALLOC_USAGE_SW_WRITE_NEVER);
> EGLint attrs[] = {
> LOCAL_EGL_NONE, LOCAL_EGL_NONE
> };
> mNullEGLImage = sEGLLibrary.fCreateImage(EGL_DISPLAY(),
> EGL_NO_CONTEXT,
>
> LOCAL_EGL_NATIVE_BUFFER_ANDROID,
>
> mNullGraphicBuffer->getNativeBuffer(),
> attrs);
> }
> return mNullEGLImage;
> }
>
> Could you check that the fCreateImage path is only hit once? Could you check
> if changing GRALLOC_USAGE_SW_READ_NEVER to GRALLOC_USAGE_SW_READ_OFTEN makes
> it better? Could you check if changing the (1, 1) size to (64, 64) makes it
> better?
So I tried every combination of things here - changing size, format, flags, separately and together, it made no difference. I can confirm the path only gets hit once.
> Alternatively, if we're really going down the route of your patch:
> - could you check at least another device to see how it's affected? And/or
> consider doing this only on this particular driver (as identified, maybe, by
> GL strings)?
This sounds like a reasonable thing to do - will look at that.
> - could you get feedback directly from specialists, such as Michael Vines
> or Diego Wilson, so we get to understand why this performs better?
I don't know who these people are, but I'll go about finding out...
Comment 10•12 years ago
|
||
I'm told that passing the bit GraphicBuffer::USAGE_HW_TEXTURE in the GraphicBuffer constructor would be a good idea too, since we are binding this buffer to a OpenGL texture.
Assignee | ||
Comment 11•12 years ago
|
||
(In reply to Benoit Jacob [:bjacob] from comment #10)
> I'm told that passing the bit GraphicBuffer::USAGE_HW_TEXTURE in the
> GraphicBuffer constructor would be a good idea too, since we are binding
> this buffer to a OpenGL texture.
Tried this, no difference - in the Android code, unless the hardware doesn't support cached buffers (whatever they are), all buffers effectively have this flag: http://gitorious.org/0xdroid/frameworks_base/blobs/36088840d796fa745be14384e0fa96443fbc2363/libs/surfaceflinger/Layer.cpp#line374 http://gitorious.org/0xdroid/frameworks_base/blobs/36088840d796fa745be14384e0fa96443fbc2363/libs/surfaceflinger/Layer.cpp#line120
Comment 12•12 years ago
|
||
So in fact, it seems that what we're doing in Unlock() might be completely useless. Can you try just making Unlock() return; without doing anything? Check adb logcat for genlock failures (also try a debug build, it will soon assert if locking fails). If you don't get any genlock failure and rendering looks correct... then Unlock() was a waste of time all along!
Here's the theory behind this. The GL gets a read lock on the gralloc buffer not exactly in Lock(), but when it subsequently starts drawing with it. Likewise, it would release that read lock when it's actually finished drawing with it, regardless of Unlock(). The reason why we'd hopefully not get a genlock failure the next time that we try to lock the gralloc buffer for write to draw content into it, is that the handle that we have to the gralloc buffer is different from the one that the GL internally has to it, so when we lock it, it actually waits for the GL's lock to be released (by contrast, if we try to lock it for write while _we_ have a read lock on it, with the same handle, that fails immediately).
Comment 13•12 years ago
|
||
Does this work?
Assignee | ||
Comment 14•12 years ago
|
||
(In reply to Benoit Jacob [:bjacob] from comment #13)
> Created attachment 750431 [details] [diff] [review]
> Try this!
>
> Does this work?
Unfortunately not - just returning in ::Unlock results in large pauses, occasional corruption and this on logcat:
E/libgenlock( 1263): perform_lock_unlock_operation: GENLOCK_IOC_LOCK failed (lockType0x1, err=Connection timed out fd=87)
E/msm7627a.gralloc( 1263): gralloc_lock: genlock_lock_buffer (lockType=0x2) failed
W/GraphicBufferMapper( 1263): lock(...) failed -22 (Invalid argument)
Assignee | ||
Comment 15•12 years ago
|
||
bjacob sent me a patch to instrument libgenlock.so that aborts on failed locks. This is a stack trace for the first abort: http://www.pastebin.mozilla.org/2410230
I can confirm that the abort only happens when ::Unlock doesn't do anything - deleting the texture or retargeting the eglimage are both fine.
Comment 16•12 years ago
|
||
Oh, right --- I think I remember now, that in addition to the read lock that the GL itself places on the gralloc texture it's sampling from, what we do in Lock() has by itself the effect of placing a read lock on the gralloc buffer; that would be why Unlock() is actually needed.
Assignee | ||
Comment 17•12 years ago
|
||
This adds "Adreno (TM) 205" to the renderers, as this is the renderer string of the Peak, and I assume other affected devices. I don't have an "Adreno 205" device to see if that also needs the work-around.
Attachment #750923 -
Flags: review?(bjacob)
Assignee | ||
Comment 18•12 years ago
|
||
The same patch, altered so it only affects AdrenoTM205.
I tried some other methods of unlocking the texture, but this was the only one that was fast. It'd be great to know a method of unlocking without retargeting/deleting anything, I suspect there are tangible gains to be had here.
Attachment #749423 -
Attachment is obsolete: true
Attachment #750924 -
Flags: review?(bjacob)
Assignee | ||
Comment 19•12 years ago
|
||
needinfo'ing mvines and dwilson - do you guys know of any faster way to get the GL driver to unlock the gralloc buffer besides retargeting the eglimage or deleting the texture?
If we could do it without retargeting or deleting, ::Lock could pretty much turn into a nullop, which would be great :)
Flags: needinfo?(mvines)
Flags: needinfo?(dwilson)
Comment 20•12 years ago
|
||
I think it would be premature to generalize problems seen on the Peak to all A205 devices. I suggest you try this on a Leo, Buri, or Ikura device first. Unagi or Otoro could be used in a pinch but would not be ideal.
Flags: needinfo?(mvines)
Assignee | ||
Comment 21•12 years ago
|
||
(In reply to Michael Vines [:m1] [:evilmachines] from comment #20)
> I think it would be premature to generalize problems seen on the Peak to all
> A205 devices. I suggest you try this on a Leo, Buri, or Ikura device first.
> Unagi or Otoro could be used in a pinch but would not be ideal.
The only other device I have is a Keon, can someone help with this testing? Also, is this method of unlocking really any worse than retargeting the eglimage?
Updated•12 years ago
|
Flags: needinfo?(dwilson)
Updated•12 years ago
|
Attachment #750923 -
Flags: review?(bjacob) → review+
Comment 22•12 years ago
|
||
Comment on attachment 750924 [details] [diff] [review]
Part 2 - Use an alternative method to unlock gralloc textures on AdrenoTM205
Review of attachment 750924 [details] [diff] [review]:
-----------------------------------------------------------------
::: gfx/layers/opengl/TextureHostOGL.cpp
@@ +839,4 @@
> * i.e. before the next time that we will try to acquire a write lock on the same buffer,
> * because read and write locks on gralloc buffers are mutually exclusive.
> */
> + if (mGL->Renderer() == GLContext::RendererAdrenoTM205) {
Eh. I'll be honest, I would much much rather have us figure out what is actually slow with the current code on this driver... but alright, go for it.
Please at least add a comment.
Attachment #750924 -
Flags: review?(bjacob) → review+
Assignee | ||
Comment 23•12 years ago
|
||
Comment added, pushed to inbound:
remote: https://hg.mozilla.org/integration/mozilla-inbound/rev/1caf7322f918
remote: https://hg.mozilla.org/integration/mozilla-inbound/rev/e0e9b99639f8
Another out-of-the-blue hypothesis: Perhaps retargeting causes some kind of flush? I wonder if deferring unlocking until after swap would make any difference?
Comment 24•12 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/1caf7322f918
https://hg.mozilla.org/mozilla-central/rev/e0e9b99639f8
Status: ASSIGNED → RESOLVED
Closed: 12 years ago → 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla24
Comment 25•12 years ago
|
||
So here in the Taipei work week, people said exactly what Michael said in comment 20: that the Peak and Keon devices specifically are known to have special, bad drivers even though the GL strings are the same. So my r+'ing here was premature. Could you at least do the followup work of doing this for the Peak device specifically instead of basing this on GL strings? (Even better, of course, would be to do the follow-up work of going to the bottom of why this was slow on Peak).
Comment 26•12 years ago
|
||
(In reply to Chris Lord [:cwiiis] from comment #23)
> Another out-of-the-blue hypothesis: Perhaps retargeting causes some kind of
> flush? I wonder if deferring unlocking until after swap would make any
> difference?
That's worth looking into.
Assignee | ||
Comment 27•12 years ago
|
||
Looks like this work-around was removed and gralloc is back to performing really terribly on the Peak. Let's just disable it and rely on the tiles backend work making it fast again.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Assignee | ||
Comment 28•12 years ago
|
||
Attachment #777066 -
Flags: review?(jmuizelaar)
Comment 29•12 years ago
|
||
Comment on attachment 777066 [details] [diff] [review]
Disable gralloc on peak
It be nice to have this work if the peak gets updated drivers, but I'm guessing we don't have a way of getting reliable driver versions from the driver?
Attachment #777066 -
Flags: review?(jmuizelaar) → review+
Assignee | ||
Comment 30•12 years ago
|
||
Pushed to inbound: https://hg.mozilla.org/integration/mozilla-inbound/rev/55768d72cb11
We don't, as far as I know, have a reliable way of getting driver versions from the driver and I'm not confident that we have enough employees testing on this device to stop it from breaking frequently anyway.
Given the decision is to go with the tiles layer backend long-term, I think this is a reasonable thing to do for now to give users better performance.
Comment 31•12 years ago
|
||
Status: REOPENED → RESOLVED
Closed: 12 years ago → 12 years ago
Resolution: --- → FIXED
Target Milestone: mozilla24 → mozilla25
Comment 32•11 years ago
|
||
Disable gralloc makes that on 1.2 version says that the browser not support WebGL.
We tested enabling gralloc and seems to work fine without black screen on browser issue and performing very well for example on cubevid.
Regards.
Comment 33•11 years ago
|
||
Should we reenable gralloc on keon/peak ? This causes issues like bug 928398...
Comment 34•11 years ago
|
||
No, this issue is because it's NOT enabled. See comment 3 in this bug.
Yes, you need to reenable gralloc, it's fixed here: https://bugzilla.mozilla.org/show_bug.cgi?id=912134
Comment 35•11 years ago
|
||
Geeksphone: yep, sorry, that's what I meant :) "this" was "gralloc not enabled" ;)
Comment 36•11 years ago
|
||
Yes, that's correct.
We enabled it in internal builds and works flawlessly. It's time to reenable it.
Comment 37•11 years ago
|
||
Alright, Chris, do you want to take care of that?
Flags: needinfo?(chrislord.net)
Assignee | ||
Comment 38•11 years ago
|
||
I don't have a convenient B2G tree at the moment, but seeing as I'm blocking on tests, I'll take a look.
Flags: needinfo?(chrislord.net)
Assignee | ||
Comment 39•11 years ago
|
||
Tested it, on a very quick test, nothing appears to break horribly.
Attachment #8333911 -
Flags: review?(bjacob)
Updated•11 years ago
|
Attachment #8333911 -
Flags: review?(bjacob) → review+
Comment 40•11 years ago
|
||
Asking koi for the reenable fix, I think we can reenable it in 1.2 too, as the fixing bug (bug 912134) was koi+.
(triagers: the status is resolved/fixed because of the first patch that was to disable gralloc on peak)
blocking-b2g: --- → koi?
Assignee | ||
Comment 41•11 years ago
|
||
Pushed to inbound: https://hg.mozilla.org/integration/mozilla-inbound/rev/dc6a1e1f686d
Comment 42•11 years ago
|
||
Please request the b2g26-approval for the same.
Comment 43•11 years ago
|
||
Assignee | ||
Comment 44•11 years ago
|
||
Comment on attachment 8333911 [details] [diff] [review]
Enable gralloc on peak
Requesting b2g26 approval, as requested.
NOTE: This flag is now for security issues only. Please see https://wiki.mozilla.org/Release_Management/B2G_Landing to better understand the B2G approval process and landings.
[Approval Request Comment]
Bug caused by (feature/regressing bug #):
User impact if declined: Possibly worse performance on Geeksphone Peak
Testing completed: Tested locally, Geeksphone have commented that it works fine
Risk to taking this patch (and alternatives if risky): Risk of bad behaviour on Geeksphone Peak due to flaky drivers/presumptuous code
String or UUID changes made by this patch: None
Attachment #8333911 -
Flags: approval-mozilla-b2g26?
Comment 45•11 years ago
|
||
(In reply to Julien Wajsberg [:julienw] from comment #40)
> Asking koi for the reenable fix, I think we can reenable it in 1.2 too, as
> the fixing bug (bug 912134) was koi+.
>
> (triagers: the status is resolved/fixed because of the first patch that was
> to disable gralloc on peak)
Forgot to add the blocking decision from triage here - this isn't a blocker, as this doesn't target a production device. We might consider taking this on an approval, however.
blocking-b2g: koi? → -
Comment 46•11 years ago
|
||
Jason, I thought it was too late for approval for 1.2?
Comment 47•11 years ago
|
||
(In reply to Chris Lord [:cwiiis] from comment #44)
>
> [Approval Request Comment]
> Bug caused by (feature/regressing bug #):
> User impact if declined: Possibly worse performance on Geeksphone Peak
User impact if declined: Picking an image from the Gallery doesn't work on Peak.
> Testing completed: Tested locally, Geeksphone have commented that it works
> fine
> Risk to taking this patch (and alternatives if risky): Risk of bad behaviour
> on Geeksphone Peak due to flaky drivers/presumptuous code
> String or UUID changes made by this patch: None
Updated•11 years ago
|
blocking-b2g: - → 1.3?
Comment 48•11 years ago
|
||
Too late for 1.2 so yes moving to 1.3
Comment 49•11 years ago
|
||
(In reply to Preeti Raghunath(:Preeti) from comment #48)
> Too late for 1.2 so yes moving to 1.3
Looks like this has already landed on m-c, so removing the blocking 1.3 flag here.
blocking-b2g: 1.3? → ---
Comment 50•11 years ago
|
||
(In reply to Julien Wajsberg [:julienw] from comment #46)
> Jason, I thought it was too late for approval for 1.2?
Correct, we have crossed the phase of uplifting patches that are not blockers.
Comment 51•11 years ago
|
||
Bhavana, Preeti,
this patch is important for a basic functionality on Geeksphone Peak (see comment 47). As you can easily see on the patch, the patch will have zero implication on other phones. Therefore, how can we move forward on having this on the 1.2 branch?
Thanks,
Flags: needinfo?(praghunath)
Flags: needinfo?(bbajaj)
Comment 52•11 years ago
|
||
Or should we say to Peak users: "please don't use the 1.2 branch on Peak, it won't work correctly"?
Comment 53•11 years ago
|
||
Firstly apologies on late response here, have been out.
Making an "exception" here on granting approval here which we typically don't do at this point in the cycle as this is a)impacts peak only b)regression causing breakage on an important basic functionality.
Flags: needinfo?(bbajaj)
Updated•11 years ago
|
Attachment #8333911 -
Flags: approval-mozilla-b2g26? → approval-mozilla-b2g26+
Comment 54•11 years ago
|
||
thanks!
Flagging Ryan for uplifting then!
Flags: needinfo?(praghunath) → needinfo?(ryanvm)
Keywords: checkin-needed
Comment 55•11 years ago
|
||
You need to log in
before you can comment on or make changes to this bug.
Description
•