Closed Bug 1077301 Opened 5 years ago Closed 5 years ago

Cleanup gralloc textures code.

Categories

(Core :: Graphics: Layers, defect)

x86_64
Gonk (Firefox OS)
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla36

People

(Reporter: nical, Assigned: nical)

References

Details

Attachments

(2 files, 14 obsolete files)

83.57 KB, patch
Details | Diff | Splinter Review
912 bytes, patch
aosmond
: review+
Details | Diff | Splinter Review
Our gralloc code has progressively evolved into becoming complex and hard to understand. Now is a good time to simplify it and factor the solutions to the different problems we have solved in the past two years into something simpler that takes these problems into account.

The solution Sotaro and I discussed is consists into splitting the TextureHost and TextureSource implementation for the gralloc backend so that the analogy "TextureHost/TextureSource" maps directly to "GraphicBuffer/GLTextureHandle".
We are already close to this except that currently TextureSource is strictly oned by TextureHost and to reach our goal we need to relax this constraint a bit.

The first step is to replace the method
TextureSource* TextureHost::GetTextureSources()
by
bool TextureHost::BindTextureSource(TextureSourceRef& aTextureSource)

which is more flexible (gives the TextureHost the oportunity to attach to an already existing TextureSource if present, and lets us remove helper classes such as CompositableBackendSpecificData which contribute to the complexity of the current code.

An additional method TextureHost::UnbindTexture() is added to let notify the TextureHost when the current TextureSource it is bound to, is being bound to another TextureHost.

For non-gralloc textures, the implementation of BindTextureSources can simply remain something like:
aTextureSource = this->GetTextureSources();

For gralloc, however, we can centralise most of the tricky logic in this function and have the opportunity to do the various optimisations we want to do.

More detailed explanation in this pad: https://etherpad.mozilla.org/texturesource

This is purely a maintainability issue. The goal of this bug is that after this we make roughly the same gralloc/gl calls that we currently do but with simpler code, and be able to build optimizations on top of this as followups.
Attached patch 1) Introduce BindTextureSource (obsolete) — Splinter Review
Prototype put together in the train, needs cleanups and testing, seems to work on linux, but doesn't implement the important part (gralloc stuff).
Attached patch 2) Prototype gralloc implem (obsolete) — Splinter Review
Also made in the train, sort of works but not really (I see things on the screen but get a lot of black flashes), but gives an idea of what the code should be like.
Attached patch 2) Prototype gralloc implem (obsolete) — Splinter Review
The problem was caused by a silly mistake. This patch seems to work (still not implemented for tiling though) on the flame (with my very limited testing).
Attachment #8499415 - Attachment is obsolete: true
OS: Linux → Gonk (Firefox OS)
Attached patch 1) Introduce BindTextureSource (obsolete) — Splinter Review
I had to write a bunch of code so that the LayerScope code paths also use the new api, and also had to copy-paste ContentHostBase::Composite into ContentHostIncremental::Composite so that the specifics of incremental content host get out of the way (this is okay because incremental content host should be removed soon-ish). As a result the patch is a bit bigger than I hoped but the changes are fairly mechanical. What's important is that this introduces the BindTextureSource method which, if not overridden, defaults to calling GetTextureSources internally.
Assignee: nobody → nical.bugzilla
Attachment #8499410 - Attachment is obsolete: true
Attachment #8501355 - Flags: review?(sotaro.ikeda.g)
The interesting part: all of the gralloc code goes into one function and the function is actually quite simple in comparison with our current gralloc logic. This has worked well for me on the flame so far but I haven't run the test suite on it yet. Tiling doesn't use this code yet but all of the other layer types do.
Attachment #8499421 - Attachment is obsolete: true
Attachment #8501359 - Flags: review?(sotaro.ikeda.g)
This is the reward: removing the code that is made obsolete by the previous two patches.
Attachment #8501361 - Flags: review?(sotaro.ikeda.g)
Comment on attachment 8501359 [details] [diff] [review]
2) Implement BindTextureSource for gralloc

Turning review into feedback request because there are some issues.
Attachment #8501359 - Flags: review?(sotaro.ikeda.g) → feedback?(sotaro.ikeda.g)
Attachment #8501361 - Flags: review?(sotaro.ikeda.g)
So I think that I am close but there is something I am missing.

A) If I have the TextureSource keep the EGLImage, it means that in the common case there is one EGLImage+gl handle pair per compositable, and successions of GraphicBuffers get attached to them one frame after the other.
B) If I store the EGLImage in the TextureHost and keep the gl handle in the TextureSource, in the common case each GraphicBuffer has its own EGLImage and the pairs GraphicBuffer+EGLImage get attached to the gl texture handle that is used by the compositable.

I first tried A) and there are some rendering issues, then I tried B) and almost nothing works. Interestingly the way we currently do tiling looks more like B so there must be a little somthing I missed.

 - in the case of A)
1st frame
    glGenTextures
    glActiveTexture
    glBindTexture
    glTexParameteri
    glTexParameteri
    gl->EGLImageTargetTexture2D;
2nd frame, flipping back and front buffers 
    glActiveTexture
    glBindTexture // with the previous gl handle
    gl->EGLImageTargetTexture2D;
3rd frame, flipping back and front buffers 
    glActiveTexture
    glBindTexture // with the previous gl handle
    gl->EGLImageTargetTexture2D;
etc...
when destroying the layer
    glDeleteTextures
    destroy the egl image
Attached patch 1) Introduce BindTextureSource (obsolete) — Splinter Review
Attachment #8501355 - Attachment is obsolete: true
Attachment #8501359 - Attachment is obsolete: true
Attachment #8501355 - Flags: review?(sotaro.ikeda.g)
Attachment #8501359 - Flags: feedback?(sotaro.ikeda.g)
Attachment #8501361 - Attachment is obsolete: true
(In reply to Nicolas Silva [:nical] from comment #8)
> So I think that I am close but there is something I am missing.
> 
> A) If I have the TextureSource keep the EGLImage, it means that in the
> common case there is one EGLImage+gl handle pair per compositable, and
> successions of GraphicBuffers get attached to them one frame after the other.
> B) If I store the EGLImage in the TextureHost and keep the gl handle in the
> TextureSource, in the common case each GraphicBuffer has its own EGLImage
> and the pairs GraphicBuffer+EGLImage get attached to the gl texture handle
> that is used by the compositable.
> 
> I first tried A) and there are some rendering issues, then I tried B) and
> almost nothing works. Interestingly the way we currently do tiling looks
> more like B so there must be a little somthing I missed.

Current gralloc implementation works as B). It's implementation mimics anroid's BufferQueue implementation. I think that it is important to prevent unexpected problems.

In andorid, gralloc buffer and EGLImage are handle as a pair. EGL Image is created when gralloc buffer is bound to OpneGL texture, and the EGLImage is deleted when gralloc buffer is released or the BufferQueue is detached from a OpenGL context.
From the patches, I am not sure about the following.
-  how one OpenGL texture is shared among TextureHosts that are managed under one CompositableHost.
-  how TextureHost is going to own OpenGL texture when it detect that is owned by multiple ImageHosts.
UseTextureHost() calls UnbindTextureSource() at first. It seems to break current "OpenGL binding replacement by new Texture Host".
(In reply to Sotaro Ikeda [:sotaro] from comment #12)
>
> In andorid, gralloc buffer and EGLImage are handle as a pair. EGL Image is
> created when gralloc buffer is bound to OpneGL texture, and the EGLImage is
> deleted when gralloc buffer is released or the BufferQueue is detached from
> a OpenGL context.

EGLImage is bound to OpenGL texture, when the corresponding gralloc buffer is updated and rendered.
(In reply to Sotaro Ikeda [:sotaro] from comment #14)
> UseTextureHost() calls UnbindTextureSource() at first. It seems to break
> current "OpenGL binding replacement by new Texture Host".

UnbindTextureSource currently only clears the mEGLTextureSource reference in GrallocTextureHost, so that the latter knows that it will need to rebind next time it is rendered. It doesn't actually do any unlocking. Unlocking will happen either when another TextureHost is bound to the TextureSource, or when the TextureSource's ref count goes to zero and the gl texture is deleted.
\o/ The reason B) didn't work is that I stupidly forgot to initialize a variable in a constructor.

Patches coming soon!
(In reply to Sotaro Ikeda [:sotaro] from comment #13)
> From the patches, I am not sure about the following.
> -  how one OpenGL texture is shared among TextureHosts that are managed
> under one CompositableHost.
> -  how TextureHost is going to own OpenGL texture when it detect that is
> owned by multiple ImageHosts.

From the patches, it seems to be handled by GrallocTextureHostOGL::BindTextureSource().
This version works great, and is optimal as far as I understand how gralloc works (we only ever bind when we need to).

The nice part of going for solution B is that we already had GLTextureSource so I didn't even need to create a new TextureSource class. I ended up folding the patches because of the way fixes ended up mixed around, let me now if you would like me to split the new code away from the removal of the old code for the review.

Sotaro, I added some comments in the code that try to answer your questions. As you guessed, all of the interesting logic is in BindTextureSource
Attachment #8502548 - Attachment is obsolete: true
Attachment #8502549 - Attachment is obsolete: true
Attachment #8503162 - Flags: feedback?(sotaro.ikeda.g)
With the hwcomposer part fixed.
Attachment #8503162 - Attachment is obsolete: true
Attachment #8503162 - Flags: feedback?(sotaro.ikeda.g)
Attachment #8503169 - Flags: review?(sotaro.ikeda.g)
Attachment #8502546 - Flags: review?(sotaro.ikeda.g)
I have two comment to the current patches.

- In current gecko, CompositableHost::UseTextureHost() replace OpenGL texture's EGLImage binding.
  But the patch seems not update the binding.
  Then, gralloc buffers could return to client side without unbinding it.
  It breaks how adnroid's gralloc buffer works and it cause to hamachi device genlock failures.

- BindTextureSource() could be called several times for same TextureSource and TextureHost pairs.
  If it is already bound, it should skip the binding.
  calling fEGLImageTargetTexture2D() is relatively heavy.
Attachment #8502546 - Flags: review?(sotaro.ikeda.g)
Attachment #8503169 - Flags: review?(sotaro.ikeda.g)
(In reply to Sotaro Ikeda [:sotaro] from comment #21)
> I have two comment to the current patches.
> 
> - In current gecko, CompositableHost::UseTextureHost() replace OpenGL
> texture's EGLImage binding.
>   But the patch seems not update the binding.
>   Then, gralloc buffers could return to client side without unbinding it.
>   It breaks how adnroid's gralloc buffer works and it cause to hamachi
> device genlock failures.

It's the BindTextureSource of TextureHost N+1 that unlocks TextureHost N, or when the TextureSource is destroyed. If TextureHost N returns to the client side before Texture N+1 is used for compositing you are in trouble anyway, because the GPU is still using TextureHost N. If you forcefully unbind TextureHost N without binding TextureHost N+1, then you are also in trouble because slow things happen in the driver.

Moving the binding from UseTextureHost to BindTextureSource (which happens during composition) should not make a difference because, as far as understand, the effect of fEGLImageTargetTexture2D on genlock is not synchronous, and genlock only unlocks TextureHost N when TextureHost N+1 has been composited.

So I think that this patch is still correct for genlock (unless I misunderstood your comment).

> 
> - BindTextureSource() could be called several times for same TextureSource
> and TextureHost pairs.
>   If it is already bound, it should skip the binding.
>   calling fEGLImageTargetTexture2D() is relatively heavy.

If BindTextureSource is called twice in a frame, it will skip the binding the second time, because it sets the GrallocTextureHostOGL::mGLTextureSource member the first time, and then we early return at the beginning of BindTextureSource the second time because mGLTextureSource is not null.
Actually genlock failure happens on hamachi device. It say that a part of the code does not work as expected.
In current gecko code, "binding/unbinding to OpenGL texture" and "OpenGL texture handling" is more explicit. By the patch, some important property becomes implicit. It seems to make more debugging difficult when the problem happens like the current patch's genlock problem.
The patches is bitrotted in latest master.
BindTextureSource seems to be called only around composing, but not around layer transaction. There is a case that a new swapped TextureHost is not used for composition, because of layer state. In this case, BindTextureSource seems not called as expected.
> 
> > 
> > - BindTextureSource() could be called several times for same TextureSource
> > and TextureHost pairs.
> >   If it is already bound, it should skip the binding.
> >   calling fEGLImageTargetTexture2D() is relatively heavy.
> 
> If BindTextureSource is called twice in a frame, it will skip the binding
> the second time, because it sets the GrallocTextureHostOGL::mGLTextureSource
> member the first time, and then we early return at the beginning of
> BindTextureSource the second time because mGLTextureSource is not null.

I confirmed the above part. Thanks.
(In reply to Sotaro Ikeda [:sotaro] from comment #24)
> In current gecko code, "binding/unbinding to OpenGL texture" and "OpenGL
> texture handling" is more explicit. By the patch, some important property
> becomes implicit. It seems to make more debugging difficult when the problem
> happens like the current patch's genlock problem.

Sorry, it depends on point of view.
(In reply to Sotaro Ikeda [:sotaro] from comment #28)
> Sorry, it depends on point of view.

No problem, I do think the new system is simpler and therefore will be easier to debug, but I may of course have gotten some stuff wrong, we'll find out and fix the mistakes if any.
I am rebasing the patches and i'll try on hamachi today
(In reply to Sotaro Ikeda [:sotaro] from comment #26)
> BindTextureSource seems to be called only around composing, but not around
> layer transaction. There is a case that a new swapped TextureHost is not
> used for composition, because of layer state. In this case,
> BindTextureSource seems not called as expected.

Ah interesting! when is that happening, when the layer is outside the screen? When it's hidden? I guess we could move the BindTextureSource call into UseTextureHost instead of during compositing, but it I need to check how that interacts with lock/unlock
I faced it in the past, therefore current GrallocTextureHost update OpenGL texture binding during layer transaction. ContainerLayerComposite seems to skip non-shown layer at ContainerPrepare().
http://dxr.mozilla.org/mozilla-central/source/gfx/layers/composite/ContainerLayerComposite.cpp#183
(In reply to Sotaro Ikeda [:sotaro] from comment #32)
> I faced it in the past, therefore current GrallocTextureHost update OpenGL
> texture binding during layer transaction. ContainerLayerComposite seems to
> skip non-shown layer at ContainerPrepare().
> http://dxr.mozilla.org/mozilla-central/source/gfx/layers/composite/
> ContainerLayerComposite.cpp#183

Ok, we could add a method like "TextureHost::PrepareTextureSource(CompositableTextureSourceRef&)"
which is called in UseTextureHost in the transaction, and move the code from GrallocTextureHost::BindTextureSource to there. I'd need to still have something for render time because of the D3D11 backend. How does this sound ?
That sounds good, gralloc also need another method for this, we need to defer  WaitAcquireFenceSyncComplete() call as long as possible.
Attached patch 1) Introduce BindTextureSource (obsolete) — Splinter Review
rebased patch.
Attachment #8502546 - Attachment is obsolete: true
Attachment #8505526 - Flags: review?(sotaro.ikeda.g)
Attached patch 2) gralloc texture (obsolete) — Splinter Review
rebased, and added the PrepareTextureSource method. Sotaro you were right about the genlock issue when a layer is updated but not composited. This patch fixes the problem and I don't see any genlock failure anymore with hamachi (when trying the camera, some canvas games and scrollong around the web).
Attachment #8503169 - Attachment is obsolete: true
Comment on attachment 8505529 [details] [diff] [review]
2) gralloc texture

I left the printfs in the patch so that if you want to apply it locally you can see what's happening and in what order in the logcat, (disregard them for the review).
Attachment #8505529 - Flags: review?(sotaro.ikeda.g)
Attachment #8505526 - Flags: review?(sotaro.ikeda.g) → review+
Comment on attachment 8505529 [details] [diff] [review]
2) gralloc texture

Review of attachment 8505529 [details] [diff] [review]:
-----------------------------------------------------------------

Good! I also confirmed that genlock failure did not happen on hamachi.
And transPerfMod2 in Bug 1015984 could have similar performance to current master.
Attachment #8505529 - Flags: review?(sotaro.ikeda.g) → review+
Attachment #8505526 - Attachment is obsolete: true
Attachment #8505529 - Attachment is obsolete: true
I'm getting rendering problems with this patch.  Regression checking led me here. 

Mozilla/5.0 (Windows NT 6.3; WOW64; rv:36.0) Gecko/20100101 Firefox/36.0

GOOD
----
20141017045355
https://hg.mozilla.org/integration/mozilla-inbound/rev/1d96887e4712

BAD
---
20141017053755
https://hg.mozilla.org/integration/mozilla-inbound/rev/197317c196cf 

STR:

1. Have this patch on.
2. Go to http://www.weather.com/weather/map/interactive/07746:4?interactiveMapLayer=radar&baseMap=r&zoom=7
3. Move the pointer around and watch the screen or portions of it turn black.

Also, if you hide the menu toolbar and then press the Alt key to show it you will also get the 'blackness'.
Depends on: 1085038
Given that we already have two windows regressions filed and this hasn't even merged to central yet, I tend to think this should probably get backed out before it merges to central.  Does that make sense to you?
Flags: needinfo?(nical.bugzilla)
(My suspicion is that it broke component alpha on unaccelerated Windows, or something like that.)
I decided to go for backing this out now since otherwise there's a danger it'll get merged to central and in a nightly before you see the above:
https://hg.mozilla.org/integration/mozilla-inbound/rev/bcb2a8673c20

(Relanding is easy, and better safe than sorry, I think.)
(In reply to David Baron [:dbaron] (UTC-4) (needinfo? for questions) (away/busy Oct. 24-31) from comment #47)
> (My suspicion is that it broke component alpha on unaccelerated Windows, or
> something like that.)

Component alpha was indeed broken because of a copy-paste mistake.

fixed try push https://tbpl.mozilla.org/?tree=Try&rev=fb0d72b9e67b
Flags: needinfo?(nical.bugzilla)
https://hg.mozilla.org/mozilla-central/rev/5ff4c7dded93
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla36
Blocks: 998019
nical, are you going to keep the fix landed? It affects to bug 998019 land.
Flags: needinfo?(nical.bugzilla)
(In reply to Sotaro Ikeda [:sotaro] from comment #52)
> nical, are you going to keep the fix landed? It affects to bug 998019 land.

Let's keep it landed.
Flags: needinfo?(nical.bugzilla)
(In reply to Nicolas Silva [:nical] from comment #53)
> (In reply to Sotaro Ikeda [:sotaro] from comment #52)
> > nical, are you going to keep the fix landed? It affects to bug 998019 land.
> 
> Let's keep it landed.

Thanks, I just want to confirm it.
Backed out for the same WinXP reftest near perma-fail it was backed out for last time.

https://hg.mozilla.org/integration/mozilla-inbound/rev/ca03ec269799
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Target Milestone: mozilla36 → ---
Depends on: 1088769
Got it. There was another issue related to component alpha where if a layer went from being component alpha to not not being component alpha, a pointer wasn't cleared and the layer be considered as component alpha by the rendering code. It also explains why the problem was intermittent: the bug depended on what was rendered before rather than on what was being rendered at the time of the test.

try push with half a gazillion of xp reftest retriggers:
https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=f340788a936e
https://hg.mozilla.org/mozilla-central/rev/703cf7b92df4
Status: REOPENED → RESOLVED
Closed: 5 years ago5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla36
This is causing the regression in camera described in bug 1088808. From the regression window, I backed out only the above change and it is working as expected now. From the about:memory reports, it looks like the kgsl-memory/b2g/egl_image allocation keeps going up every time we take a picture.
Blocks: 1088808
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
No longer blocks: 1088808
Blocks: 1088808
Attached patch Fix EGL image leak, v1 (obsolete) — Splinter Review
nical: Based on my earlier comments there appearing to be a egl image leak, I tested this patch with camera and the problem seems to be resolved. Of course, I'm not sure if there are other side effects since my GL knowledge is a bit weak :). At least this should help point you in the right area if it isn't enough on its own.
Attachment #8517478 - Flags: review?(nical.bugzilla)
Comment on attachment 8517478 [details] [diff] [review]
Fix EGL image leak, v1

Review of attachment 8517478 [details] [diff] [review]:
-----------------------------------------------------------------

Ouch, that's something I forgot when I moved the EGLImage form TextureSource to TextureHost. Good catch.
Attachment #8517478 - Flags: review?(nical.bugzilla) → review+
Adding proper header to patch file for submit.
Attachment #8517478 - Attachment is obsolete: true
Attachment #8517513 - Flags: review+
Status: REOPENED → ASSIGNED
https://hg.mozilla.org/mozilla-central/rev/66be6ec6d514
Status: ASSIGNED → RESOLVED
Closed: 5 years ago5 years ago
Resolution: --- → FIXED
Depends on: 1098000
See Also: → 1124907
You need to log in before you can comment on or make changes to this bug.