Closed Bug 911941 Opened 6 years ago Closed 6 years ago

Crash in test_stream_autoplay with new gralloc textures

Categories

(Core :: Graphics: Layers, defect)

ARM
Gonk (Firefox OS)
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla26

People

(Reporter: nical, Assigned: nical)

References

Details

Attachments

(1 file)

STR:
* enable new textures on B2G
* go to people.mozilla.org/~nsilva/mediastream_autoplay.html
The tab is killed by SIGTERM because of an error while deserializing a shmem
This tests set the source of one video to be the content of the other:
v2.mozSrcObject = v1.mozCaptureStream();

In the media code this translates to two ImageContainers receiving the same SharedPlanarYCbCrImage. In term of layers this means the same TextureClient is added to two compositables. Our TextureClient/Host code doesn't handle that case yet (we need bug 897452 for that).

The solution for now will be to make sure we don't share the same texture twice (Copying it in ImageClient::UpdateImage if the texture is already shared with the compositor).

I am kinda surprised (scared) that it actually worked with the deprecated texture client.
Depends on: 912040
With this patch, if a texture client is passed to two compositables, the second compositable will ignore it.

The result isn't great because the two layers are racing to get the texture first, so sometimes one will be updated, sometimes it is the other one. It doesn't crash anymore though.

If we take a fix like this one, it means we break the feature of being able to do:
video2.mozSrcObject = video1.mozCaptureStream();

So we want to be careful about that and weight the wins and losses of being able to enable new textures on B2G.
Note that on mozilla-central with deprecated textures, the test case crashes with h264 (-> when using gralloc) if you try to reload the tab (both while or after the video is playing).
And the frame rate is terrible.

It is important to try with h264 videos because other formats use different code paths and the problem that is hardest to solve is with gralloc (as always)

webm video: http://people.mozilla.org/~nsilva/mediastream_autoplay.html?0
h264 video: http://people.mozilla.org/~nsilva/mediastream_autoplay.html?1
ogv video:  http://people.mozilla.org/~nsilva/mediastream_autoplay.html?2
a try push with this hacky fix: https://tbpl.mozilla.org/?tree=Try&rev=7d0729d2a985
I just tried with a b2g18 device and the test case crashed right away without even displaying a frame.
D'oh, I still have a crash when reloading the page in this test case with gralloc (+ new textures). It crashes in libstagefright_omx.so.

Sotaro, it very much looks like this thing you told me about the other day, about having to be sure all the gralloc buffers have returned to the GonkNativeWindow before a specific call to ImageContainer. I don't remember the details, can you tell me more about that?
Flags: needinfo?(sotaro.ikeda.g)
Right now, I am writing a comment of GraphicBufferLockedWrapper, does the following help?

/**
 * The helper class to call GraphicBufferLocked::Unlock() in the destructor.
 * GraphicBufferLocked::Unlock() returns the gralloc buffer to GonkNativeWindow.
 * Actual GraphicBufferLocked::Unlock()'s implementation is provided
 * by derived classes. GraphicBufferLocked's destructor can not call the function.
 * In b2g18, Image class's lifetime is same to the buffer return to GonkNativeWindow.
 * GraphicBufferLocked::Unlock() is called in Image class's destructor.
 * But current gecko, lifetime of Image and GraphicBufferLocked is different.
 * Then Image class can not be used for it.
 * This class is created that mimic as if GraphicBufferLocked class returns
 * the gralloc buffer in the GraphicBufferLocked's destructor.
 */
Flags: needinfo?(sotaro.ikeda.g)
In b2g18, Image class's lifetime in content process is same as buffer's usage duration in content process. I feel current gfx layers, do not have such class. By this even when one Image object is used by multiple ImageContainer, the rendering works correctly.
(In reply to Sotaro Ikeda [:sotaro] from comment #9)
> In b2g18, Image class's lifetime in content process is same as buffer's
> usage duration in content process. I feel current gfx layers, do not have
> such class. By this even when one Image object is used by multiple
> ImageContainer, the rendering works correctly.

In current layers with new textures the lifetime of a gralloc buffer in the case of the camera equal to:
[A: Lifetime of TextureClient] + [B: the time it takes for the compositor to send back ReplyTextureRemoved]

[A] is rather easy. In order to take [B] into account we need to store the GrallocBufferLocked somewhere in a map with an ID and make it so OpRemoveTexture and ReplyTextureRemoved carry the ID. this way when we receive ReplyTextureRemoved we can safely return the gralloc buffer to the GonkNativeWindow.
With b2g18 we had a bunch of problems related to the lifetime of the Image not always taking the compositor side into account because we more or less made it through trial and errors. Basically the Image code paths have not been designed to take into account the compositor.

But that's not what I was referring to here. This documentation comment is good for the other bug where I asked for more precisions about the wrapper. Here I am interested in the shutdown part of the ImageContainer (I think?) where there is a call that must be made after all the gralloc buffers have returned, otherwise it crashes in libstagefright. It's a bit fuzzy in my memory though.
(In reply to Nicolas Silva [:nical] from comment #5)
> a try push with this hacky fix:
> https://tbpl.mozilla.org/?tree=Try&rev=7d0729d2a985

M3 is green on that last try push
(In reply to Nicolas Silva [:nical] from comment #10)
> [A] is rather easy.

How do you think about if GrallocBufferLocked is used by multiple ImageContainer? Which way is better to controll the time to return the gralloc buffer to GonkNativeWindow?

 In order to take [B] into account we need to store the
> GrallocBufferLocked somewhere in a map with an ID and make it so
> OpRemoveTexture and ReplyTextureRemoved carry the ID. this way when we
> receive ReplyTextureRemoved we can safely return the gralloc buffer to the
> GonkNativeWindow.

It seems like if OpRemoveTexture is done synchronously, there seems no need to wait ReplyTextureRemoved. It it correct?

> With b2g18 we had a bunch of problems related to the lifetime of the Image
> not always taking the compositor side into account because we more or less
> made it through trial and errors. Basically the Image code paths have not
> been designed to take into account the compositor.

In b2g18, PImageContainer::Flush() is sync. Isn't it enough just related to flush buffers in compositor side?
(In reply to Sotaro Ikeda [:sotaro] from comment #12)
> 
>  In order to take [B] into account we need to store the
> > GrallocBufferLocked somewhere in a map with an ID and make it so
> > OpRemoveTexture and ReplyTextureRemoved carry the ID. this way when we
> > receive ReplyTextureRemoved we can safely return the gralloc buffer to the
> > GonkNativeWindow.
> 
> It seems like if OpRemoveTexture is done synchronously, there seems no need
> to wait ReplyTextureRemoved. It it correct?

The problem is that the reference count of TextureClient can reach zero on any thread, and then OpRemoveTexture is sent at the next transaction.
So even though the layer transaction is synchronous, we have no guarantee that the lifetime of TextureClient ends inside a transaction.
> The problem is that the reference count of TextureClient can reach zero on
> any thread, and then OpRemoveTexture is sent at the next transaction.
> So even though the layer transaction is synchronous, we have no guarantee
> that the lifetime of TextureClient ends inside a transaction.

What is the problem here? If it happens during video playback, TextureClient's destructor calls RemoveTexture(). And in this function, the GraphicBufferLocked can be holded by ImageBridgeChild until next transaction complete. The gralloc buffers could be returned to GonkNativeWindow after next transaction complete.
(In reply to Sotaro Ikeda [:sotaro] from comment #14)
> What is the problem here?

The "problem" (I used the wrong word, because it is not really hard to solve) is that it is not synchronous. I am just saying that we can't only wait for the TextureClient's destructor to happen ([A]) and that we need some mechanism that waits for the ReplyRemovedTexture message that will be received at the end of the next transaction ([B]).

> If it happens during video playback,
> TextureClient's destructor calls RemoveTexture(). And in this function, the
> GraphicBufferLocked can be holded by ImageBridgeChild until next transaction
> complete. The gralloc buffers could be returned to GonkNativeWindow after
> next transaction complete.

Yes, that's more or less what I suggested in comment #9 when referring to [B].
Attachment #799545 - Flags: review?(bas)
Comment on attachment 799545 [details] [diff] [review]
Brutally disable sharing textures between two layers

Review of attachment 799545 [details] [diff] [review]:
-----------------------------------------------------------------

Fix typo and add a bug reference.
Attachment #799545 - Flags: review?(bas) → review+
https://hg.mozilla.org/mozilla-central/rev/949ea52cb8ed
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla26
You need to log in before you can comment on or make changes to this bug.