Closed Bug 824666 Opened 12 years ago Closed 10 years ago

the Deconstruct function of GraphicBuffer is called disorder

Categories

(Core :: Graphics, defect)

ARM
Gonk (Firefox OS)
defect
Not set
normal

Tracking

()

RESOLVED DUPLICATE of bug 959089
blocking-b2g -

People

(Reporter: xuhui7148, Unassigned)

References

Details

Attachments

(1 file)

Hello :
In my work on B2G, I found that if I leave from LockScreen to HomeScreen there is something wrong to call the deconstruct function.
By this cause, the B2G process acts both as CompositorParent and CompositorChild(just in my opinion). It providers GraphicBuffer (through flatten by one thread and unflatten by another thread).
Then when the GraphicBuffer is deleted, from the Parent opinion, the member of GraphicBuffer's entity is ownData and from the Child opinion , the value is ownHandle. The base code like this:

void GraphicBuffer::free_handle()
{
    if (mOwner == ownHandle) {
	mBufferMapper.unregisterBuffer(handle);
        native_handle_close(handle);
        native_handle_delete(const_cast<native_handle*>(handle));
    } else if (mOwner == ownData) {
        GraphicBufferAllocator& allocator(GraphicBufferAllocator::get());
        allocator.free(handle);
    }
    mWrappedBuffer = 0;
}

Sometimes it will first go Parent's flow and then go Child's flow , which means  the gralloc buffer has been freed at first and then unregistered.

In some devices, it would cause memory leak(by the implement of Gralloc.so)
Just to confirm this disorder of calls is a real issue. 

It is easily seen on Crespo (nexus S) and PandaBoard (PowerVR GPU), where gralloc issues a warning on unregister, because a buffer has already been deleted:


> W/GraphicBufferMapper( 1604): unregisterBuffer(0x473e6c00) failed -22 (Invalid argument)
> W/GraphicBufferMapper( 1604): unregisterBuffer(0x4809e340) failed -22 (Invalid argument)
> W/GraphicBufferMapper( 1604): unregisterBuffer(0x480ac1c0) failed -22 (Invalid argument)
> W/GraphicBufferMapper( 1604): unregisterBuffer(0x481cc040) failed -22 (Invalid argument)

My opinion is that this happens on every (or almost every) buffer unregistration, as these messages appear quite often on video playback, app start/close, homescreen...

If more feedback is required I can try to capture more detailed logcat, which prove the issue.
There was some discussion about the issue here: Bug 871624, but it was not resolved then.
It seems I do not have the right (or knowledge) to set this bug as a blocker for Bug 901322. Can somebody do this? I think :gerard-majax  would like to track it.
(In reply to Nikola from comment #2)
> It seems I do not have the right (or knowledge) to set this bug as a blocker
> for Bug 901322. Can somebody do this? I think :gerard-majax  would like to
> track it.

Done, thanks for confirming this issue.
Blocks: b2g-nexuss
Status: UNCONFIRMED → NEW
Ever confirmed: true
(In reply to Nikola from comment #2)
> It seems I do not have the right (or knowledge) to set this bug as a blocker
> for Bug 901322. Can somebody do this? I think :gerard-majax  would like to
> track it.

Awesome. I've been seeing and trying to find this one for months. We are also seeing the Warning on other PowerVR devices (Flatfish).
Attached is a GDB backtrace when the unregisterBuffer() hits -EINVAL.

One can notice two things:

#2  0x4011fdd6 in ~GraphicBuffer (this=0x463b4680, __in_chrg=<value optimized out>) at frameworks/base/libs/ui/GraphicBuffer.cpp:96
#3  0x4011fdf8 in ~GraphicBuffer (this=0x43552910, __in_chrg=<value optimized out>) at frameworks/base/libs/ui/GraphicBuffer.cpp:98

and

#7  0x40f0855a in ~GrallocBufferActor (this=0x478bd560, __in_chrg=<value optimized out>) at /home/alex/codaz/Mozilla/b2g/devices/NexusS/B2G/gecko/gfx/layers/ipc/ShadowLayerUtilsGralloc.cpp:271
#8  0x40f08584 in ~GrallocBufferActor (this=0x43552910, __in_chrg=<value optimized out>) at /home/alex/codaz/Mozilla/b2g/devices/NexusS/B2G/gecko/gfx/layers/ipc/ShadowLayerUtilsGralloc.cpp:271
I'm wondering if I am not observing this on Tarako also:
E/[Gralloc-ERROR](  826): int gralloc_unregister_buffer(const gralloc_module_t*, const native_handle_t*):274 Trying to unregister buffer 0x4592f3d0 from process 826 that was not created in current process: 826
This is the order of calls I got on V1.3 from mid december (unfortunately I don't have any new logs): 
1. alloc      - address 0x3160f8c0
2. unflatten  - address 0x30415b40
3. free       - address 3160f8c0
4. unregister - unregisterBuffer(0x30415b40) failed -22 (Invalid argument)

From what I remember about the issue back then, problem was that IPDL generated code did not call destructor of Parent and Child classes in a correct oreder. But I am not sure if this is correct, and I cannot find the source.

I do remember this part of the code being mentioned as a suspect:
objdir-gecko/ipc/ipdl/PLayerTransactionChild.cpp
 485     (actor)->DestroySubtree(Deletion);
 486     (actor)->DeallocSubtree();
 487     ((actor)->mManager)->RemoveManagee(PLayerTransactionMsgStart, actor);

Hope this helps.
(In reply to Nikola from comment #7)
> This is the order of calls I got on V1.3 from mid december (unfortunately I
> don't have any new logs): 
> 1. alloc      - address 0x3160f8c0
> 2. unflatten  - address 0x30415b40
> 3. free       - address 3160f8c0
> 4. unregister - unregisterBuffer(0x30415b40) failed -22 (Invalid argument)
> 
> From what I remember about the issue back then, problem was that IPDL
> generated code did not call destructor of Parent and Child classes in a
> correct oreder. But I am not sure if this is correct, and I cannot find the
> source.
> 
> I do remember this part of the code being mentioned as a suspect:
> objdir-gecko/ipc/ipdl/PLayerTransactionChild.cpp
>  485     (actor)->DestroySubtree(Deletion);
>  486     (actor)->DeallocSubtree();
>  487     ((actor)->mManager)->RemoveManagee(PLayerTransactionMsgStart,
> actor);
> 
> Hope this helps.

I do find traces of this in the current IPDL generated code. Can you link any discussion related to this fact?
This was not easy to find :)

Bug 862324 - Use after free of PGrallocBufferParent when the child process is killed 
https://bugzilla.mozilla.org/show_bug.cgi?id=862324#c7

I will see later today if I can find more.
Maybe Ben can help us for IPDL stuff :)
blocking-b2g: --- → 1.3T?
Component: General → Graphics
Product: Firefox OS → Core
Gregor, can you provide more justification for blocking? 
does not seem to be an obvious blocker for tarako. thanks
It has a performance impact on scrolling for the whole system.
Hi! CJ,

Need comment form your team. Thanks.

--
Keven
Flags: needinfo?(cku)
qawanted: can we have QA to test overall scrolling performance on the device after turning off APZC? Thanks
Keywords: qawanted
One good way to reproduce this, on both Nexus S and Tarako, is to grab the status bar and pull down the utility tray. You will get the logcat message, and you should experience some bad lag doing so.
(In reply to Joe Cheng [:jcheng] from comment #14)
> qawanted: can we have QA to test overall scrolling performance on the device
> after turning off APZC? Thanks

Brian's team already wrote up a detailed report analyzing this.
Keywords: qawanted
Chiajung, graloc buffer relative, please comment
Flags: needinfo?(cku) → needinfo?(chung)
(In reply to Alexandre LISSY :gerard-majax from comment #6)
> I'm wondering if I am not observing this on Tarako also:
> E/[Gralloc-ERROR](  826): int gralloc_unregister_buffer(const
> gralloc_module_t*, const native_handle_t*):274 Trying to unregister buffer
> 0x4592f3d0 from process 826 that was not created in current process: 826
From tarako's code base, I think it is somewhat a false-alarm.

(In reply to C.J. Ku[:CJKu] from comment #17)
> Chiajung, graloc buffer relative, please comment

I remember that gralloc deallocation is always fired from Parent side from 
http://dxr.mozilla.org/mozilla-central/source/gfx/layers/opengl/GrallocTextureHost.cpp#351

And the generated code is:

    bool __sendok = ((actor)->mChannel)->Send(__msg);

    (actor)->DestroySubtree(Deletion);
    (actor)->DeallocSubtree();
    ((actor)->mManager)->RemoveManagee(PGrallocBufferMsgStart, actor);

Since the delete message is async, the child side may not have chance to unregister first. A very simple workaround should be make PGrallocBuffer sync protocol and make __delete__ sync, too. Another way to fix this is to make child side issues the deallocation. (These 2 solutions conflict, and both of them must make sure the __delete__ message is in fact single side callable)
Flags: needinfo?(chung) → needinfo?(james.zhang)
Loop Ying
Flags: needinfo?(james.zhang) → needinfo?(ying.xu)
I believe deallocation would not work on the client side, as it must be done by the process which allocated the buffer (as it is the only one which has the native handle), and this is parent (Main) process. However, I did not test deallocation with child handle, maybe it can work. 

Using sync messaging with dealloc on parent side sounds OK.
(In reply to Nikola from comment #20)
> Using sync messaging with dealloc on parent side sounds OK.

Sync messages every time we dealloc a gralloc buffer is going to hurt performance. If we need to message the parent for this, it has to be async.
Note that we already PTexture::RemoveTexture which goes from child to parent when we destroy the texture so we can hook some logic in there if needed (without having to add a new ipdl message).
triage: add to backlog, current scrolling performance numbers look ok
blocking-b2g: 1.3T? → backlog
How did you collect those scrolling performance numbers? I documented one way to reproduce on comment 15, I should add a few things:
 - when it reproduces, you clearly feel a bad lag on the UI
 - general scrolling is not affected
Flags: needinfo?(jcheng)
(In reply to Alexandre LISSY :gerard-majax from comment #6)
> I'm wondering if I am not observing this on Tarako also:
> E/[Gralloc-ERROR](  826): int gralloc_unregister_buffer(const
> gralloc_module_t*, const native_handle_t*):274 Trying to unregister buffer
> 0x4592f3d0 from process 826 that was not created in current process: 826

on tarako with mali gpu.
graphic buffer should not be allocated and registered in the same process.
If it happened as this bug descripted , a log would be printed.

And we haven't find any memory lead yet.
Flags: needinfo?(ying.xu)
(In reply to ying.xu from comment #24)
> (In reply to Alexandre LISSY :gerard-majax from comment #6)
> > I'm wondering if I am not observing this on Tarako also:
> > E/[Gralloc-ERROR](  826): int gralloc_unregister_buffer(const
> > gralloc_module_t*, const native_handle_t*):274 Trying to unregister buffer
> > 0x4592f3d0 from process 826 that was not created in current process: 826
> 
> on tarako with mali gpu.
> graphic buffer should not be allocated and registered in the same process.
> If it happened as this bug descripted , a log would be printed.
> 
> And we haven't find any memory lead yet.

Ying xu, is it true for all mali gpus? Even latest one?
Flags: needinfo?(ying.xu)
In current b2g, android::GraphicBuffers are instantiated for Parent side and Child side. In normal application, parent and child are in different processes. But system app is in b2g process. In this case, parent and child are in same process. It is unavoidable in current architecture.
Most easy way for workaround to "graphic buffer should not be allocated and registered in the same process" problem seems Bug 959089 fix + "some workaround".
(In reply to Sotaro Ikeda [:sotaro] from comment #25)
> (In reply to ying.xu from comment #24)
> Ying xu, is it true for all mali gpus? Even latest one?

No.

I dont think it's the requirement of mali GPU but android system.

Our ffos code base came from android system directly.

I will check gralloc hal of android4.4 .
(In reply to ying.xu from comment #28)
> (In reply to Sotaro Ikeda [:sotaro] from comment #25)
> > (In reply to ying.xu from comment #24)
> > Ying xu, is it true for all mali gpus? Even latest one?
> 
> No.
> 
> I dont think it's the requirement of mali GPU but android system.

Really? I do not think android have such limitation. The following code is android ICS's generic gralloc code. It just avoid unmap if gralloc handle is created in same process. The code actually permit "graphic buffer should not be allocated and registered in the same process".

--------------------------------------

int gralloc_unregister_buffer(gralloc_module_t const* module,
        buffer_handle_t handle)
{
    if (private_handle_t::validate(handle) < 0)
        return -EINVAL;

    // never unmap buffers that were created in this process
    private_handle_t* hnd = (private_handle_t*)handle;
    if (hnd->pid != getpid()) {
        if (hnd->base) {
            gralloc_unmap(module, handle);
        }
    }
    return 0;
}
So, I also suspicious mali has such limitation.
(In reply to Sotaro Ikeda [:sotaro] from comment #30)
> So, I also suspicious mali has such limitation.

Because, tarako actually render correctly system app UI.
The following are map function from android ICS. IF gralloc buffer handle is in same process to original gralloc, it does not do map. So, creating multiple anroid::GraphicBuffers from handle in same process is valid. gpu does not need to handle multiple mapping if original gralloc buffer is in same process.

-----------------------------------

int gralloc_register_buffer(gralloc_module_t const* module,
        buffer_handle_t handle)
{
    if (private_handle_t::validate(handle) < 0)
        return -EINVAL;

    // if this handle was created in this process, then we keep it as is.
    int err = 0;
    private_handle_t* hnd = (private_handle_t*)handle;
    if (hnd->pid != getpid()) {
        void *vaddr;
        err = gralloc_map(module, handle, &vaddr);
    }
    return err;
}
I also checked JB and KK. Since JB, gralloc_register_buffer() implementation becomes different. Even when gralloc buffer handle is created in same process to original gralloc, new mapping is created. Android expect gpu supports multiple mapping. But it might cause "memory ordering problems". It is written in the code.

http://androidxref.com/4.4.2_r2/xref/hardware/libhardware/modules/gralloc/mapper.cpp#89

http://androidxref.com/4.3_r2.1/xref/hardware/libhardware/modules/gralloc/mapper.cpp#89
From comment 32, comment 33, andorid framework point of view, create multiple android::GraphicBuffer from original android::GraphicBuffer is basically supported, but could cause "memory ordering problems". It seems better to create one android::GraphicBuffer per one process.
(In reply to Sotaro Ikeda [:sotaro] from comment #34)
> From comment 32, comment 33, andorid framework point of view, create
> multiple android::GraphicBuffer from original android::GraphicBuffer is
> basically supported, but could cause "memory ordering problems". It seems
> better to create one android::GraphicBuffer per one process.

In b2g, gralloc buffer usage is controlled correctly between CompositableHost and CompositableClient. So, "memory ordering problems" might not be a problem.
(In reply to ying.xu from comment #24)
> (In reply to Alexandre LISSY :gerard-majax from comment #6)
> > I'm wondering if I am not observing this on Tarako also:
> > E/[Gralloc-ERROR](  826): int gralloc_unregister_buffer(const
> > gralloc_module_t*, const native_handle_t*):274 Trying to unregister buffer
> > 0x4592f3d0 from process 826 that was not created in current process: 826
> 
> on tarako with mali gpu.
> graphic buffer should not be allocated and registered in the same process.
> If it happened as this bug descripted , a log would be printed.
> 
> And we haven't find any memory lead yet.

If this lead to any problem, it is definitely avoidable.
The register call from main thread(or ImageBridgeChild thread) in b2g proc is caused by Deserializer stuff of MagicGralloccBufferHandle, where we can sandbox the AllocSurfacfeDescriptorGralloc call to avoid serialize/deserialize in b2g process. Though the change is going to quiet big.
(In reply to Nicolas Silva [:nical] from comment #21)
> (In reply to Nikola from comment #20)
> > Using sync messaging with dealloc on parent side sounds OK.
> 
> Sync messages every time we dealloc a gralloc buffer is going to hurt
> performance. If we need to message the parent for this, it has to be async.
> Note that we already PTexture::RemoveTexture which goes from child to parent
> when we destroy the texture so we can hook some logic in there if needed
> (without having to add a new ipdl message).

Agree this, another way to avoid this is to add an async Message for child:
    async WillDelete()
And make Parent side call SendWillDelete and let child side call Send__delete__ after it unregister the buffer(when it RecvWillDelete) to keep the unregister and deallocation sequence. This would need some very hacky android sp operations.

Sync for delete message may ruins performance in current architecture, since the Gralloc things is running on CompositorThread in Parent side, which we should avoid to lock it anyway.
(In reply to Sotaro Ikeda [:sotaro] from comment #27)
> Most easy way for workaround to "graphic buffer should not be allocated and
> registered in the same process" problem seems Bug 959089 fix + "some
> workaround".

Since Bug 959089 just look up for the original instance. For the deletion order problem, we can make the Drop call in that new protocol to be sync.(Note: since the new protocol has its own thread, it should not hurt performance)
(In reply to Sotaro Ikeda [:sotaro] from comment #30)
> So, I also suspicious mali has such limitation.

I checked the code of KK , still with mali gpu, the register/unregister code are as below

static int gralloc_register_buffer(gralloc_module_t const *module, buffer_handle_t handle)
{
        // if this handle was created in this process, then we keep it as is.
        private_handle_t *hnd = (private_handle_t *)handle;

        int retval = -EINVAL;

        pthread_mutex_lock(&s_map_lock);

        hnd->pid = getpid();
.....

static int gralloc_unregister_buffer(gralloc_module_t const *module, buffer_handle_t handle)
{
        private_handle_t *hnd = (private_handle_t *)handle;

        if (hnd->flags & private_handle_t::PRIV_FLAGS_FRAMEBUFFER)
        {
                AERR("Can't unregister buffer 0x%x as it is a framebuffer", (unsigned int)handle);
        }
        else if (hnd->pid == getpid()) // never unmap buffers that were not registered in this process
        {
....
Flags: needinfo?(ying.xu)
FYI, the other day, nical gave me a small patch to test that was making sure we would not call unregisterBuffer from the same process. This did not helped.
I hacked a bit, and I noticed that the -EINVAL were only triggered on main process requests. Also, I think some of the first unregisterBuffer() do not fail, but I'm not 100% sure.
I applied the patches of bug 959089 on my Nexus S, and I cannot see anymore any unregisterBuffer() error in logcat !
Flags: needinfo?(jcheng)
I'm marking this as a duplicate of bug 959089 since I cannot reproduce the unregisterBuffer() returning -EINVAL anymore on my device, and bug 959089 is fixed.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → DUPLICATE
blocking-b2g: backlog → -
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: