Using Gralloc causes bad performance on the Geeksphone Peak

RESOLVED FIXED in Firefox OS v1.2

Status

()

Core
General
RESOLVED FIXED
5 years ago
4 years ago

People

(Reporter: cwiiis, Assigned: cwiiis)

Tracking

({regression})

Trunk
mozilla25
ARM
Gonk (Firefox OS)
regression
Points:
---

Firefox Tracking Flags

(b2g-v1.2 fixed)

Details

Attachments

(5 attachments, 1 obsolete attachment)

(Assignee)

Description

5 years ago
Building B2G from master has serious performance issues (wrt framerate) on the Geeksphone Peak (and I've heard on the Keon, but I've not tested it) - however, building B2G with master Gaia but the gecko-18 branch of Gecko has very respectable performance, suggesting a gecko issue.

I've not debugged to see where this issue lies yet.
(Assignee)

Updated

5 years ago
blocking-b2g: --- → koi?
(Assignee)

Updated

5 years ago
Keywords: regression
Keywords: regressionwindow-wanted
We haven't really started triaging koi yet. This isn't actionable in itself yet to be nomed yet (no clear scope associated with the bug). Pulling the koi? nom since this doesn't end up in our high priority QA bucket and doesn't have clear scope.

In order to get a regression window here, we're going to need accurate STR, expected results, and actual results that can be used to diagnose the issue to get a window on.

Chris - Can you provide a clear STR to use for the regression window?
blocking-b2g: koi? → ---
Flags: needinfo?(chrislord.net)
(Assignee)

Comment 2

5 years ago
(In reply to Jason Smith [:jsmith] from comment #1)
> We haven't really started triaging koi yet. This isn't actionable in itself
> yet to be nomed yet (no clear scope associated with the bug). Pulling the
> koi? nom since this doesn't end up in our high priority QA bucket and
> doesn't have clear scope.
> 
> In order to get a regression window here, we're going to need accurate STR,
> expected results, and actual results that can be used to diagnose the issue
> to get a window on.
> 
> Chris - Can you provide a clear STR to use for the regression window?

STR:

1. Let phone boot
2. Swipe to everything.me
3. Swipe back to home

If you do this with the gecko-18 branch, performance is pretty good - on master, performance is *dramatically* worse. It should be pretty self-evident, especially if you can compare them side-by-side.

Note, this applies to just about every operation that results in an animation, but this is probaby the quickest and easiest one you can do.
Flags: needinfo?(chrislord.net)
(Assignee)

Comment 3

5 years ago
Issue doesn't exist in the Gecko aurora branch.
(Assignee)

Comment 4

5 years ago
This is bug 868556. Seems this hasn't hit the git mirror yet(?) but it's on hg m-c, so I assume it's coming.
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → DUPLICATE
Duplicate of bug: 868556

Updated

5 years ago
Keywords: regressionwindow-wanted
(Assignee)

Comment 5

5 years ago
I had this wrong - it seems I get terrible performance on the Geeksphone Peak *unless* I revert bug 868556...

This seems pretty unintuitive to me, perhaps the path is broken on that device? I'll do some cursory investigation, any tips welcome.
Blocks: 868556
Status: RESOLVED → REOPENED
Resolution: DUPLICATE → ---
Summary: Performance regression in Gecko affecting Firefox OS (B2G) → Using Gralloc causes bad performance on the Geeksphone Peak
(Assignee)

Comment 6

5 years ago
Created attachment 749423 [details] [diff] [review]
Use an alternative method to unlock gralloc textures

I guess this is a driver issue, but this seems as good a way as any that achieves the same effect but doesn't cause slowness.
Assignee: nobody → chrislord.net
Status: REOPENED → ASSIGNED
Attachment #749423 - Flags: review?(bjacob)
This is terrifying.

This means that we're creating and destroying GL textures everytime we composite. It's interesting to hear that this is faster on one device, but I have to wonder how it perform on others before I can r+ this.

Could you try a few tweaks to the current code instead. This code just binds the EGLImage returned by GetNullEGLImage. This is defined in GLContextProviderEGL.cpp:

    EGLImage GetNullEGLImage() MOZ_OVERRIDE
    {
        if (!mNullGraphicBuffer.get()) {
            mNullGraphicBuffer
              = new android::GraphicBuffer(
                  1, 1,
                  PIXEL_FORMAT_RGB_565,
                  GRALLOC_USAGE_SW_READ_NEVER | GRALLOC_USAGE_SW_WRITE_NEVER);
            EGLint attrs[] = {
                LOCAL_EGL_NONE, LOCAL_EGL_NONE
            };
            mNullEGLImage = sEGLLibrary.fCreateImage(EGL_DISPLAY(),
                                                     EGL_NO_CONTEXT,
                                                     LOCAL_EGL_NATIVE_BUFFER_ANDROID,
                                                     mNullGraphicBuffer->getNativeBuffer(),
                                                     attrs);
        }
        return mNullEGLImage;
    }

Could you check that the fCreateImage path is only hit once? Could you check if changing GRALLOC_USAGE_SW_READ_NEVER to GRALLOC_USAGE_SW_READ_OFTEN makes it better? Could you check if changing the (1, 1) size to (64, 64) makes it better?

Alternatively, if we're really going down the route of your patch:
 - could you check at least another device to see how it's affected? And/or consider doing this only on this particular driver (as identified, maybe, by GL strings)?
 - could you get feedback directly from specialists, such as Michael Vines or Diego Wilson, so we get to understand why this performs better?
Comment on attachment 749423 [details] [diff] [review]
Use an alternative method to unlock gralloc textures

Cancelling review for now; need more investigation.
Attachment #749423 - Flags: review?(bjacob)
(Assignee)

Comment 9

5 years ago
(In reply to Benoit Jacob [:bjacob] from comment #7)
> This is terrifying.
> 
> This means that we're creating and destroying GL textures everytime we
> composite. It's interesting to hear that this is faster on one device, but I
> have to wonder how it perform on others before I can r+ this.

Fair comment. I don't see it as any more terrifying than targeting and untargeting the eglimage on every frame though - textures ought to be pretty lightweight until they have resources associated with them, and given we retarget every frame anyway...

I'll get results for the Keon as well - these are the only two devices I have at the moment.

> Could you try a few tweaks to the current code instead. This code just binds
> the EGLImage returned by GetNullEGLImage. This is defined in
> GLContextProviderEGL.cpp:
> 
>     EGLImage GetNullEGLImage() MOZ_OVERRIDE
>     {
>         if (!mNullGraphicBuffer.get()) {
>             mNullGraphicBuffer
>               = new android::GraphicBuffer(
>                   1, 1,
>                   PIXEL_FORMAT_RGB_565,
>                   GRALLOC_USAGE_SW_READ_NEVER |
> GRALLOC_USAGE_SW_WRITE_NEVER);
>             EGLint attrs[] = {
>                 LOCAL_EGL_NONE, LOCAL_EGL_NONE
>             };
>             mNullEGLImage = sEGLLibrary.fCreateImage(EGL_DISPLAY(),
>                                                      EGL_NO_CONTEXT,
>                                                     
> LOCAL_EGL_NATIVE_BUFFER_ANDROID,
>                                                     
> mNullGraphicBuffer->getNativeBuffer(),
>                                                      attrs);
>         }
>         return mNullEGLImage;
>     }
> 
> Could you check that the fCreateImage path is only hit once? Could you check
> if changing GRALLOC_USAGE_SW_READ_NEVER to GRALLOC_USAGE_SW_READ_OFTEN makes
> it better? Could you check if changing the (1, 1) size to (64, 64) makes it
> better?

So I tried every combination of things here - changing size, format, flags, separately and together, it made no difference. I can confirm the path only gets hit once.

> Alternatively, if we're really going down the route of your patch:
>  - could you check at least another device to see how it's affected? And/or
> consider doing this only on this particular driver (as identified, maybe, by
> GL strings)?

This sounds like a reasonable thing to do - will look at that.

>  - could you get feedback directly from specialists, such as Michael Vines
> or Diego Wilson, so we get to understand why this performs better?

I don't know who these people are, but I'll go about finding out...
I'm told that passing the bit GraphicBuffer::USAGE_HW_TEXTURE in the GraphicBuffer constructor would be a good idea too, since we are binding this buffer to a OpenGL texture.
(Assignee)

Comment 11

5 years ago
(In reply to Benoit Jacob [:bjacob] from comment #10)
> I'm told that passing the bit GraphicBuffer::USAGE_HW_TEXTURE in the
> GraphicBuffer constructor would be a good idea too, since we are binding
> this buffer to a OpenGL texture.

Tried this, no difference - in the Android code, unless the hardware doesn't support cached buffers (whatever they are), all buffers effectively have this flag: http://gitorious.org/0xdroid/frameworks_base/blobs/36088840d796fa745be14384e0fa96443fbc2363/libs/surfaceflinger/Layer.cpp#line374 http://gitorious.org/0xdroid/frameworks_base/blobs/36088840d796fa745be14384e0fa96443fbc2363/libs/surfaceflinger/Layer.cpp#line120
So in fact, it seems that what we're doing in Unlock() might be completely useless. Can you try just making Unlock() return; without doing anything? Check adb logcat for genlock failures (also try a debug build, it will soon assert if locking fails). If you don't get any genlock failure and rendering looks correct... then Unlock() was a waste of time all along!

Here's the theory behind this. The GL gets a read lock on the gralloc buffer not exactly in Lock(), but when it subsequently starts drawing with it. Likewise, it would release that read lock when it's actually finished drawing with it, regardless of Unlock(). The reason why we'd hopefully not get a genlock failure the next time that we try to lock the gralloc buffer for write to draw content into it, is that the handle that we have to the gralloc buffer is different from the one that the GL internally has to it, so when we lock it, it actually waits for the GL's lock to be released (by contrast, if we try to lock it for write while _we_ have a read lock on it, with the same handle, that fails immediately).
(Assignee)

Comment 14

5 years ago
(In reply to Benoit Jacob [:bjacob] from comment #13)
> Created attachment 750431 [details] [diff] [review]
> Try this!
> 
> Does this work?

Unfortunately not - just returning in ::Unlock results in large pauses, occasional corruption and this on logcat:

E/libgenlock( 1263): perform_lock_unlock_operation: GENLOCK_IOC_LOCK failed (lockType0x1, err=Connection timed out fd=87)
E/msm7627a.gralloc( 1263): gralloc_lock: genlock_lock_buffer (lockType=0x2) failed
W/GraphicBufferMapper( 1263): lock(...) failed -22 (Invalid argument)
(Assignee)

Comment 15

5 years ago
bjacob sent me a patch to instrument libgenlock.so that aborts on failed locks. This is a stack trace for the first abort: http://www.pastebin.mozilla.org/2410230

I can confirm that the abort only happens when ::Unlock doesn't do anything - deleting the texture or retargeting the eglimage are both fine.
Oh, right --- I think I remember now, that in addition to the read lock that the GL itself places on the gralloc texture it's sampling from, what we do in Lock() has by itself the effect of placing a read lock on the gralloc buffer; that would be why Unlock() is actually needed.
(Assignee)

Comment 17

5 years ago
Created attachment 750923 [details] [diff] [review]
Part 1 - Add Adreno (TM) 205 to renderers

This adds "Adreno (TM) 205" to the renderers, as this is the renderer string of the Peak, and I assume other affected devices. I don't have an "Adreno 205" device to see if that also needs the work-around.
Attachment #750923 - Flags: review?(bjacob)
(Assignee)

Comment 18

5 years ago
Created attachment 750924 [details] [diff] [review]
Part 2 - Use an alternative method to unlock gralloc textures on AdrenoTM205

The same patch, altered so it only affects AdrenoTM205.

I tried some other methods of unlocking the texture, but this was the only one that was fast. It'd be great to know a method of unlocking without retargeting/deleting anything, I suspect there are tangible gains to be had here.
Attachment #749423 - Attachment is obsolete: true
Attachment #750924 - Flags: review?(bjacob)
(Assignee)

Comment 19

5 years ago
needinfo'ing mvines and dwilson - do you guys know of any faster way to get the GL driver to unlock the gralloc buffer besides retargeting the eglimage or deleting the texture?

If we could do it without retargeting or deleting, ::Lock could pretty much turn into a nullop, which would be great :)
Flags: needinfo?(mvines)
Flags: needinfo?(dwilson)
I think it would be premature to generalize problems seen on the Peak to all A205 devices.  I suggest you try this on a Leo, Buri, or Ikura device first.  Unagi or Otoro could be used in a pinch but would not be ideal.
Flags: needinfo?(mvines)
(Assignee)

Comment 21

5 years ago
(In reply to Michael Vines [:m1] [:evilmachines] from comment #20)
> I think it would be premature to generalize problems seen on the Peak to all
> A205 devices.  I suggest you try this on a Leo, Buri, or Ikura device first.
> Unagi or Otoro could be used in a pinch but would not be ideal.

The only other device I have is a Keon, can someone help with this testing? Also, is this method of unlocking really any worse than retargeting the eglimage?

Updated

5 years ago
Flags: needinfo?(dwilson)
Attachment #750923 - Flags: review?(bjacob) → review+
Comment on attachment 750924 [details] [diff] [review]
Part 2 - Use an alternative method to unlock gralloc textures on AdrenoTM205

Review of attachment 750924 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/layers/opengl/TextureHostOGL.cpp
@@ +839,4 @@
>     * i.e. before the next time that we will try to acquire a write lock on the same buffer,
>     * because read and write locks on gralloc buffers are mutually exclusive.
>     */
> +  if (mGL->Renderer() == GLContext::RendererAdrenoTM205) {

Eh. I'll be honest, I would much much rather have us figure out what is actually slow with the current code on this driver... but alright, go for it.

Please at least add a comment.
Attachment #750924 - Flags: review?(bjacob) → review+
(Assignee)

Comment 23

5 years ago
Comment added, pushed to inbound:

remote:   https://hg.mozilla.org/integration/mozilla-inbound/rev/1caf7322f918
remote:   https://hg.mozilla.org/integration/mozilla-inbound/rev/e0e9b99639f8


Another out-of-the-blue hypothesis: Perhaps retargeting causes some kind of flush? I wonder if deferring unlocking until after swap would make any difference?
https://hg.mozilla.org/mozilla-central/rev/1caf7322f918
https://hg.mozilla.org/mozilla-central/rev/e0e9b99639f8
Status: ASSIGNED → RESOLVED
Last Resolved: 5 years ago5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla24
So here in the Taipei work week, people said exactly what Michael said in comment 20: that the Peak and Keon devices specifically are known to have special, bad drivers even though the GL strings are the same. So my r+'ing here was premature. Could you at least do the followup work of doing this for the Peak device specifically instead of basing this on GL strings? (Even better, of course, would be to do the follow-up work of going to the bottom of why this was slow on Peak).
(In reply to Chris Lord [:cwiiis] from comment #23)
> Another out-of-the-blue hypothesis: Perhaps retargeting causes some kind of
> flush? I wonder if deferring unlocking until after swap would make any
> difference?

That's worth looking into.
(Assignee)

Comment 27

5 years ago
Looks like this work-around was removed and gralloc is back to performing really terribly on the Peak. Let's just disable it and rely on the tiles backend work making it fast again.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(Assignee)

Comment 28

5 years ago
Created attachment 777066 [details] [diff] [review]
Disable gralloc on peak
Attachment #777066 - Flags: review?(jmuizelaar)
Comment on attachment 777066 [details] [diff] [review]
Disable gralloc on peak

It be nice to have this work if the peak gets updated drivers, but I'm guessing we don't have a way of getting reliable driver versions from the driver?
Attachment #777066 - Flags: review?(jmuizelaar) → review+
(Assignee)

Comment 30

5 years ago
Pushed to inbound: https://hg.mozilla.org/integration/mozilla-inbound/rev/55768d72cb11

We don't, as far as I know, have a reliable way of getting driver versions from the driver and I'm not confident that we have enough employees testing on this device to stop it from breaking frequently anyway.

Given the decision is to go with the tiles layer backend long-term, I think this is a reasonable thing to do for now to give users better performance.
https://hg.mozilla.org/mozilla-central/rev/55768d72cb11
Status: REOPENED → RESOLVED
Last Resolved: 5 years ago5 years ago
Resolution: --- → FIXED
Target Milestone: mozilla24 → mozilla25

Comment 32

5 years ago
Disable gralloc makes that on 1.2 version says that the browser not support WebGL.

We tested enabling gralloc and seems to work fine without black screen on browser issue and performing very well for example on cubevid.

Regards.
Should we reenable gralloc on keon/peak ? This causes issues like bug 928398...

Comment 34

4 years ago
No, this issue is because it's NOT enabled. See comment 3 in this bug.

Yes, you need to reenable gralloc, it's fixed here: https://bugzilla.mozilla.org/show_bug.cgi?id=912134
Geeksphone: yep, sorry, that's what I meant :) "this" was "gralloc not enabled" ;)

Comment 36

4 years ago
Yes, that's correct.

We enabled it in internal builds and works flawlessly. It's time to reenable it.
Alright, Chris, do you want to take care of that?
Flags: needinfo?(chrislord.net)
(Assignee)

Comment 38

4 years ago
I don't have a convenient B2G tree at the moment, but seeing as I'm blocking on tests, I'll take a look.
Flags: needinfo?(chrislord.net)
(Assignee)

Comment 39

4 years ago
Created attachment 8333911 [details] [diff] [review]
Enable gralloc on peak

Tested it, on a very quick test, nothing appears to break horribly.
Attachment #8333911 - Flags: review?(bjacob)
Attachment #8333911 - Flags: review?(bjacob) → review+
Asking koi for the reenable fix, I think we can reenable it in 1.2 too, as the fixing bug (bug 912134) was koi+.

(triagers: the status is resolved/fixed because of the first patch that was to disable gralloc on peak)
blocking-b2g: --- → koi?
Please request the b2g26-approval for the same.
(Assignee)

Comment 44

4 years ago
Comment on attachment 8333911 [details] [diff] [review]
Enable gralloc on peak

Requesting b2g26 approval, as requested.

NOTE: This flag is now for security issues only. Please see https://wiki.mozilla.org/Release_Management/B2G_Landing to better understand the B2G approval process and landings.

[Approval Request Comment]
Bug caused by (feature/regressing bug #): 
User impact if declined: Possibly worse performance on Geeksphone Peak
Testing completed: Tested locally, Geeksphone have commented that it works fine
Risk to taking this patch (and alternatives if risky): Risk of bad behaviour on Geeksphone Peak due to flaky drivers/presumptuous code
String or UUID changes made by this patch: None
Attachment #8333911 - Flags: approval-mozilla-b2g26?
(In reply to Julien Wajsberg [:julienw] from comment #40)
> Asking koi for the reenable fix, I think we can reenable it in 1.2 too, as
> the fixing bug (bug 912134) was koi+.
> 
> (triagers: the status is resolved/fixed because of the first patch that was
> to disable gralloc on peak)

Forgot to add the blocking decision from triage here - this isn't a blocker, as this doesn't target a production device. We might consider taking this on an approval, however.
blocking-b2g: koi? → -
Jason, I thought it was too late for approval for 1.2?
(In reply to Chris Lord [:cwiiis] from comment #44)
> 
> [Approval Request Comment]
> Bug caused by (feature/regressing bug #): 
> User impact if declined: Possibly worse performance on Geeksphone Peak

User impact if declined: Picking an image from the Gallery doesn't work on Peak.

> Testing completed: Tested locally, Geeksphone have commented that it works
> fine
> Risk to taking this patch (and alternatives if risky): Risk of bad behaviour
> on Geeksphone Peak due to flaky drivers/presumptuous code
> String or UUID changes made by this patch: None
blocking-b2g: - → 1.3?
Too late for 1.2 so yes moving to 1.3
(In reply to Preeti Raghunath(:Preeti) from comment #48)
> Too late for 1.2 so yes moving to 1.3

Looks like this has already landed on m-c, so removing the blocking 1.3 flag here.
blocking-b2g: 1.3? → ---
(In reply to Julien Wajsberg [:julienw] from comment #46)
> Jason, I thought it was too late for approval for 1.2?


Correct, we have crossed the phase of uplifting patches that are not blockers.
Bhavana, Preeti,

this patch is important for a basic functionality on Geeksphone Peak (see comment 47). As you can easily see on the patch, the patch will have zero implication on other phones. Therefore, how can we move forward on having this on the 1.2 branch?

Thanks,
Flags: needinfo?(praghunath)
Flags: needinfo?(bbajaj)
Or should we say to Peak users: "please don't use the 1.2 branch on Peak, it won't work correctly"?
Firstly apologies on late response here, have been out.

Making an "exception" here on granting approval here which we typically don't do at this point in the cycle as this is a)impacts peak only b)regression causing breakage on an important basic functionality.
Flags: needinfo?(bbajaj)

Updated

4 years ago
Attachment #8333911 - Flags: approval-mozilla-b2g26? → approval-mozilla-b2g26+
thanks!

Flagging Ryan for uplifting then!
Flags: needinfo?(praghunath) → needinfo?(ryanvm)
Keywords: checkin-needed
https://hg.mozilla.org/releases/mozilla-b2g26_v1_2/rev/ed5db1fd8220
status-b2g-v1.2: --- → fixed
Flags: needinfo?(ryanvm)
Keywords: checkin-needed

Updated

4 years ago
Depends on: 951782
No longer depends on: 951782
You need to log in before you can comment on or make changes to this bug.