Video canvas can not be used as a texture in WebGL on Android with e10s enabled

RESOLVED FIXED in Firefox 66

Status

()

defect
P1
critical
RESOLVED FIXED
11 months ago
4 months ago

People

(Reporter: rbarker, Assigned: jhlin)

Tracking

Trunk
mozilla65
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(geckoview62 wontfix, geckoview64 wontfix, geckoview65 wontfix, firefox-esr52 wontfix, firefox-esr60 wontfix, firefox61 wontfix, firefox62 wontfix, firefox63 wontfix, firefox64 wontfix, firefox65 wontfix, firefox66 fixed, firefox67 fixed)

Details

(Whiteboard: [geckoview:e10s] [geckoview:fxr:p1] [geckoview:fenix:p1])

Attachments

(3 attachments)

Web sites like https://www.panomoments.com/ that use a video canvas as a texture source in WebGL fail to render and display as black when e10s is enabled. This is likely causes by the fact that SurfaceTextures can only be consumed by a single OpenGL context. This probably works in the non-e10s case because the SurfaceTexture is attached to the WebGL context first.
Whiteboard: [geckoview:e10s] [geckoview:fxr]
Randall is FXR going with e10s for release and does this block?
Flags: needinfo?(rbarker)
We have to ship with e10s, so yes, it is blocking. I'm supposed to work on it, just haven't had time yet.
Flags: needinfo?(rbarker)
Marking P1 and assigning to Randall because he said he will work on it when he has time.

We won't need to uplift to GV 62.
Assignee: nobody → rbarker
Priority: -- → P1
Whiteboard: [geckoview:e10s] [geckoview:fxr] → [geckoview:e10s] [geckoview:fxr:p1]
After investigating the issue, it turns out to be more complicated than described in the initial comment. Video decoding works as follows on Android:

1) The process containing the compositor creates a GeckoSurfaceTexture and initialized a GeckoSurface with the GeckoSurfaceTexture.
2) The GeckoSurface is sent to the media decode process over the binder.
3) The media decode process writes to the Surface
4) The compositor consumes the output via the GeckoSurfaceTexture created in the first step.

When WebGL uses video elements as textures it tries to consumes the GeckoSurfaceTexture created by the compositor. This works when WebGL and the compositor exist in the same process. However, when e10s is enabled, they do not live in the same process because WebGL is part of the content process and does not have access to the GeckoSurfaceTexture created but the compositor.

:snorp's proposed solution to this problem is detect when a video element is create with document.createElement('video') and have the content process create the GeckoSurfaceTexture and send the GeckoSurface to the media decode process instead of having the compositor create one.
Jeff Gilbert might have good ideas here. Jeff this blocks e10s for Firefox Reality. Any thoughts on comment 4?
Flags: needinfo?(jgilbert)
Is there aaaaany chance we can make these decoded SurfaceTextures immutable and sharable, getting a new one out of the decoder each time? If we can't do this universally, we might want to do this ourselves if we find ourselves with multiple consumers, even if it costs us an extra copy. (Does Android allow cross-process EGLImage/EGLSurfaces?)

:snorp's approach is ok, though lazily choosing a decode destination process means we have weird behavior if both processes want it. This would be bad if you were watching a video in-page, and then wanted to later use that same video element in webgl. (or vice-versa)
Flags: needinfo?(jgilbert)
Flags: needinfo?(snorp)
Severity: normal → critical
Whiteboard: [geckoview:e10s] [geckoview:fxr:p1] → [geckoview:e10s] [geckoview:fxr:p1][geckoview]
Randall isn't active on this, unassigning. James will brain dump some thoughts soon.
Assignee: rbarker → nobody
Some background on why we have this bug:

Video decoding APIs for Android require a Surface[0] as a sink for the decoded data. A Surface is a write-only interface. It can be sent to other processes using the Android Binder IPC system.

The way we get the contents of a Surface on the screen is by creating a SurfaceTexture against a Surface. The SurfaceTexture can be used as an external texture in OpenGL. It cannot be sent cross-process. It can only be attached to a single OpenGL context at a time.

Currently, to create a Surface for the decoder, we ask the parent process to create the Surface for us. This ensures that the accompanying SurfaceTexture is in the right process for the Compositor. This is problematic for the WebGL case, however, since we now want the SurfaceTexture to live in the content process and be attached to the GL context used by WebGL.
Flags: needinfo?(snorp)
It just occurred to me that we may be able to create two SurfaceTextures against one Surface. It doesn't seem like it should work, but if it does that would be the easiest way to solve this. Otherwise I think we're going to end up copying it around a bunch...
Today's meeting thinking was: maybe but a long shot...

Jamie is this something you could help with? Jeff do you have thoughts on recent comments?
Flags: needinfo?(jnicol)
Flags: needinfo?(jgilbert)
I don't think it's possible to map 1 Surface to 2 SurfaceTextures.

I've done a wee bit of googling, and I think it *might* be possible on Android O to achieve what we want. Rather than using SurfaceTextures, we can use HardwareBuffers. These seem to be a buffer for a single image frame, as opposed to SurfaceTexture which handles a stream.

HardwareBuffers can be obtained from Images, which I think can be obtained from the MediaCodec directly, or from an ImageReader (which looks similar to SurfaceTexture in that it consumes a stream of images, and gives you a Surface to hook up to the producer side).

HardwareBuffers look like they can be sent across processes, and it is possible to obtain EGL images from them.

These are the pages I've been reading, and I unfortunately don't know any more than is detailed in them!:
https://developer.android.com/ndk/guides/stable_apis for HardwareBuffer docs (if you can call them that)
https://developer.android.com/reference/android/media/Image
https://developer.android.com/reference/android/media/ImageReader
https://developer.android.com/reference/android/media/MediaCodec
Flags: needinfo?(jnicol)
I agree that HardwareBuffer looks nice, but I just saw the following[0]

> the HardwareBuffer associated with this Image or null if this Image doesn't support this feature. (Unsupported use cases
> include Image instances obtained through MediaCodec, and on versions prior to Android P, ImageWriter). 

Not sure it's going to be helpful here.

[0] https://developer.android.com/reference/android/media/Image.html#getHardwareBuffer()
(In reply to Jeff Gilbert [:jgilbert] from comment #7)
> Is there aaaaany chance we can make these decoded SurfaceTextures immutable
> and sharable, getting a new one out of the decoder each time?

Nope.

>  If we can't do
> this universally, we might want to do this ourselves if we find ourselves
> with multiple consumers, even if it costs us an extra copy. (Does Android
> allow cross-process EGLImage/EGLSurfaces?)

Yeah, might be the way to go. There isn't any cross-process texture, just the SurfaceTexture/Surface combination. What we could do is immediately blit from an in-process SurfaceTexture to an EGLImage on each frame, which can be consumed via WebGL. If we need to send the frame to the compositor, we'd do an additional copy to a in-compositor Surface. This would mean in the "normal" case of a 2d web page we'd have 2 copies per video frame, which is...pretty crappy.
Yeah, I looked through the SurfaceTexture source and the idea of taking ownership of a buffer from a queue is pretty central to the implementation. There's no concept of a SurfaceTexture that doesn't own its surface. That class is just a wrapper around GLConsumer, which pulls surfaces from an IGraphicBufferConsumer. I don't think we'll be able to make a shared SurfaceTexture.
(In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #14)
> (In reply to Jeff Gilbert [:jgilbert] from comment #7)
> >  If we can't do
> > this universally, we might want to do this ourselves if we find ourselves
> > with multiple consumers, even if it costs us an extra copy. (Does Android
> > allow cross-process EGLImage/EGLSurfaces?)
> 
> Yeah, might be the way to go. There isn't any cross-process texture, just
> the SurfaceTexture/Surface combination. What we could do is immediately blit
> from an in-process SurfaceTexture to an EGLImage on each frame, which can be
> consumed via WebGL. If we need to send the frame to the compositor, we'd do
> an additional copy to a in-compositor Surface. This would mean in the
> "normal" case of a 2d web page we'd have 2 copies per video frame, which
> is...pretty crappy.

Why do we have to blit twice? Why not just use the path we have today (in which the compositor handles creation and consumption of all surfaces), but insert an EGLImage copy in the middle of the decoding path to hand to content if it wants to use it for WebGL? This would only involve one blit, not two…
(In reply to Patrick Walton (:pcwalton) from comment #16)
> 
> Why do we have to blit twice? Why not just use the path we have today (in
> which the compositor handles creation and consumption of all surfaces), but
> insert an EGLImage copy in the middle of the decoding path to hand to
> content if it wants to use it for WebGL? This would only involve one blit,
> not two…

You explained it yourself with "the compositor handles creation and consumption of all surfaces". With the pipeline we have now, we cannot consume the surface from the content process at all.
(In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #17)
> You explained it yourself with "the compositor handles creation and
> consumption of all surfaces". With the pipeline we have now, we cannot
> consume the surface from the content process at all.

I understand that, but why two blits? The content process needs one blit, but who needs the second one? The compositor should already have the source SurfaceTexture that it used as the source for the blit, unless I'm misunderstanding something…
(In reply to Patrick Walton (:pcwalton) from comment #18)
> (In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #17)
> > You explained it yourself with "the compositor handles creation and
> > consumption of all surfaces". With the pipeline we have now, we cannot
> > consume the surface from the content process at all.
> 
> I understand that, but why two blits? The content process needs one blit,

Oh, I think we probably want to do an initial copy from the ST/Surface that the decoder uses into an EGLImage. This would allow easy consumption from whatever WebGL or 2d canvas context. The second copy would be from that EGLImage into the Compositor-side Surface.
(In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #19)
> (In reply to Patrick Walton (:pcwalton) from comment #18)
> > (In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #17)
> > > You explained it yourself with "the compositor handles creation and
> > > consumption of all surfaces". With the pipeline we have now, we cannot
> > > consume the surface from the content process at all.
> > 
> > I understand that, but why two blits? The content process needs one blit,
> 
> Oh, I think we probably want to do an initial copy from the ST/Surface that
> the decoder uses into an EGLImage. This would allow easy consumption from
> whatever WebGL or 2d canvas context. The second copy would be from that
> EGLImage into the Compositor-side Surface.

To elaborate a little further: you can attach a SurfaceTexture to one GL context at a time, but are allowed to detach/attach at will. When I originally wrote the SurfaceTexture Host/Client pair I wanted to use this so the Compositor was only attached while it was actively compositing the SurfaceTexture, and detached otherwise. This resulted in a lot of corrupted images making it onto the screen on some devices. It seems like there might be fencing issues with this type of usage. I don't think the attach/detach functionality is expected to be used for multiplexing the consumption of the SurfaceTexture. I think it's likely meant to be used to recover from context loss.

Given the above, I don't think we want to try the attach/detach dance with canvas contexts and whatever context we use for copying to the Compositor. It will be far more reliable if we just copy it from the get go (and a lot slower).
John, do you have any interest in tackling this? Or do you have any other ideas for fixing it?
Flags: needinfo?(jolin)
How about MediaCodec.setOutputSurface[1]? It's been added since API 23 to support dynamic output surface switch. Maybe we could change the GeckoSurface to be able to have more than one Android surfaces and have direct access to the MediaCodec, so it can change the output surface when either compositor or canvas needs the output, and only make copies when both.


[1] https://developer.android.com/reference/android/media/MediaCodec.html#setOutputSurface(android.view.Surface)
Flags: needinfo?(jolin)
Randall says he will look at this bug next week.
Assignee: nobody → rbarker
We need media team expertise here. Nils, this is a hairy bug but it blocks e10s for GeckoView powered FxR.
Assignee: rbarker → nobody
Flags: needinfo?(drno)
John, could you please try to help the GeckoView folks on this one?
Assignee: nobody → jolin
Flags: needinfo?(drno) → needinfo?(jolin)
(In reply to Nils Ohlmeier [:drno] from comment #26)
> John, could you please try to help the GeckoView folks on this one?

I will try the MediaCodec.setOutputSurface() approach as mentioned in comment 22. However, the distribution dashboard [1] says there are ~70% devices running API 23 or later. So that solution, if working, won't cover the other 30%.

[1] https://developer.android.com/about/dashboards
Flags: needinfo?(jolin)
(In reply to John Lin [:jhlin][:jolin] from comment #27)
> I will try the MediaCodec.setOutputSurface() approach as mentioned in
> comment 22. However, the distribution dashboard [1] says there are ~70%
> devices running API 23 or later. So that solution, if working, won't cover
> the other 30%.

If you need any help testing patches or anything else, please let me know.
Updating flags, we want FxR to be unblocked though.
63=wontfix because FxR 1.1 will ship GV 64.
John, do you have any updates on this e10s canvas bug for FxR?
Flags: needinfo?(jlin)
Flags: needinfo?(jlin) → needinfo?(jolin)
(In reply to Chris Peterson [:cpeterson] from comment #31)
> John, do you have any updates on this e10s canvas bug for FxR?

Turns out my plan to switch output surfaces using setOutputSurface() to avoid image copy doesn't work as I expected. After studying how and when the compositor and WebGL get video contents, I found it impossible to find the right time to switch between consumers/surfaces.

IIUC, the current implementation performs one blitting[1] to copy data from SurfaceTextureImage to WebGL frame buffer, and the problem is that when e10s is enabled, this operation in the content process cannot directly access the SurfaceTexture created in the parent process. It seems to me that for WebGL in the content process to see the data, an extra copy that can be shared across processes is inevitable. And as James said in his comments, the performance will not be good.

James, what do you think? Should I proceed with the implementation of shared buffers? The rough idea is creating another Surface/Texture in the content process for WebGL and passing it to the parent process for the compositor to render frames into.

[1] https://searchfox.org/mozilla-central/source/dom/canvas/TexUnpackBlob.cpp#698
Flags: needinfo?(jolin) → needinfo?(snorp)
Yeah, I don't see a way of solving this without introducing an additional copy for the common case (compositor consumption of the video).
Flags: needinfo?(snorp)
James, is there existing code that exposes native mozilla::gfx::GLContext functionalities to our Java module? I'm working on buffer copying by rendering the texture from MediaCodec output surface, and it seems that GLContext has everything I need. However, if it's not available to Java, perhaps I can just call the Java EGL/GL API directly?
Flags: needinfo?(snorp)
(In reply to John Lin [:jhlin][:jolin] from comment #34)
> James, is there existing code that exposes native mozilla::gfx::GLContext
> functionalities to our Java module? I'm working on buffer copying by
> rendering the texture from MediaCodec output surface, and it seems that
> GLContext has everything I need. However, if it's not available to Java,
> perhaps I can just call the Java EGL/GL API directly?

There isn't any good way currently to use the C++ GLContext from Java. I'd recommend doing the GL bits in C++ if you can, but otherwise yeah it should work from Java using the EGL/GL stuff there. The amount of code to do the blit is kinda ridiculous, though.
Flags: needinfo?(snorp)
I'll go a step further and say that anything that changes the GLContext's gl state machine *must* be in c++. Splitting logic across languages is not something we want.
Flags: needinfo?(jgilbert)
Duplicate of this bug: 1481406
64=wontfix because FxR 1.1 is using GV 65 and this issue doesn't block Focus 8.0 from using GV 64.
Child processes cannot access textures allocated in the parent process,
which is needed by the compositor to render video elements efficiently.
Unfortunately, Android doesn't expose Sufrace buffers (sharable across
processes) in the SDK/NDK as other platforms, so we need to generate
extra texture/surface in the child process and update texture images
through the surface, which is passed to the parent process for the remote
texture to copy its contents into.

Depends on D11938
Pushed by jolin@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/b0fdcebfd20d
p1: conform SharedMemory.describeContents() to Android API spec. r=snorp
https://hg.mozilla.org/integration/autoland/rev/5d897c0c7a25
p2: expose native GL blitter to Java. r=snorp
https://hg.mozilla.org/integration/autoland/rev/9c2834ca8823
p3: copy texture contents for remote allocated Surface. r=snorp
When playing a youtube video with e10s enable in GeckoView. There is no image only sound. The video will play when e10s is disabled.

This error in the log looks relevant:

11-27 13:59:55.668  2852  3074 E GLConsumer: [SurfaceTexture-0-2852-27] attachToContext: GLConsumer is already attached to a context
11-27 13:59:55.668  2852  3074 W System.err: java.lang.RuntimeException: Error during attachToGLContext (see logcat for details)
11-27 13:59:55.669  2852  3074 W System.err: 	at android.graphics.SurfaceTexture.attachToGLContext(SurfaceTexture.java:290)
11-27 13:59:55.669  2852  3074 W System.err: 	at org.mozilla.gecko.gfx.GeckoSurfaceTexture.attachToGLContext(GeckoSurfaceTexture.java:88)
John, Youtube seems to be broken when e10s in enable in current GeckoView nightly witch contains this fix. See my previous comment.
Flags: needinfo?(jolin)
I think we need to back this out, video playback with e10s seems to be pretty busted.
No longer blocks: 1510464
Depends on: 1510464
Backed out 3 changesets (Bug 1486659) as requested by jhlin on irc for breaking video playback on Android and causing regression Bug 1510464.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Backout by cbrindusan@mozilla.com:
https://hg.mozilla.org/mozilla-central/rev/4a3f88e36298
Backed out 3 changesets as requested by jhlin on irc for breaking video playback on Android and causing regression Bug 1510464. a=backout
Attachment #9025164 - Attachment description: Bug 1486659 - p1: conform SharedMemory.describeContents() to Android API spec. r?snorp! → Bug 1486659 - p1: conform SharedMemory.describeContents() to Android API spec. r=snorp
Attachment #9025166 - Attachment description: Bug 1486659 - p2: expose native GL blitter to Java. r?snorp! → Bug 1486659 - p2: expose native GL blitter to Java. r=snorp
Attachment #9025167 - Attachment description: Bug 1486659 - p3: copy texture contents for remote allocated Surface. r?snorp! → Bug 1486659 - p3: copy texture contents for remote allocated Surface. r=snorp
Patch reopened for review on Phabricator.
Flags: needinfo?(jolin)
Pushed by jolin@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/f491ebcda076
p1: conform SharedMemory.describeContents() to Android API spec. r=snorp
https://hg.mozilla.org/integration/autoland/rev/6c89aae076dd
p2: expose native GL blitter to Java. r=snorp
https://hg.mozilla.org/integration/autoland/rev/62064231dfbc
p3: copy texture contents for remote allocated Surface. r=snorp
Not sure if GV65 wants this or not, but I'm not inclined to uplift it to Beta anyway.
Product: Firefox for Android → GeckoView
Target Milestone: Firefox 65 → mozilla65
(In reply to Ryan VanderMeulen [:RyanVM] from comment #54)
> Not sure if GV65 wants this or not, but I'm not inclined to uplift it to
> Beta anyway.

65=wontfix. We don't need to uplift this fix to GV 65. Focus will start testing GV 66 Nightly soon.
Whiteboard: [geckoview:e10s] [geckoview:fxr:p1][geckoview] → [geckoview:e10s] [geckoview:fxr:p1]

I think this still has problems. The example here hangs Gecko on the Reference Browser (GeckoView Nightly 66).

https://developer.mozilla.org/en-US/docs/Web/API/WebGL_API/Tutorial/Animating_textures_in_WebGL

Status: RESOLVED → REOPENED
Resolution: FIXED → ---

(In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #56)

I think this still has problems. The example here hangs Gecko on the Reference Browser (GeckoView Nightly 66).

https://developer.mozilla.org/en-US/docs/Web/API/WebGL_API/Tutorial/Animating_textures_in_WebGL

It seems fine in my local (central) build on Pixel 3/Pie. Could you please share which device and GeckoView version (armeabi-v7a-66.0.2019...) you're using? Thanks!

Flags: needinfo?(snorp)
Whiteboard: [geckoview:e10s] [geckoview:fxr:p1] → [geckoview:e10s] [geckoview:fxr:p1] [geckoview:fenix:p1]

(In reply to John Lin [:jhlin][:jolin] from comment #57)

(In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #56)

I think this still has problems. The example here hangs Gecko on the Reference Browser (GeckoView Nightly 66).

https://developer.mozilla.org/en-US/docs/Web/API/WebGL_API/Tutorial/Animating_textures_in_WebGL

It seems fine in my local (central) build on Pixel 3/Pie. Could you please share which device and GeckoView version (armeabi-v7a-66.0.2019...) you're using? Thanks!

I'm using a Pixel 2 with armv7.

Flags: needinfo?(snorp)

(In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #58)

It seems fine in my local (central) build on Pixel 3/Pie. Could you please share which device and GeckoView version (armeabi-v7a-66.0.2019...) you're using? Thanks!

I'm using a Pixel 2 with armv7.

John, have you had a chance to test a Pixel 2 with an ARMv7 GV build yet? If you need a Pixel 2 device, you can order one on Service Now.

Flags: needinfo?(jolin)

(In reply to Chris Peterson [:cpeterson] from comment #59)

John, have you had a chance to test a Pixel 2 with an ARMv7 GV build yet? If you need a Pixel 2 device, you can order one on Service Now.

Sorry for the late reply. No, I haven't been able to reproduce it on other devices. I have ordered a Pixel 2 and will test again once I get it.

Flags: needinfo?(jolin)

Just received a Pixel 2 and tested the video texture example with latest reference browser (1.0.1907, Build#10461217, Gecko 67.0a1-20190214044118), but could not reproduce the issue. James, could you still reproduce it with the latest version? If so, what Android version are you on? Thanks!

Flags: needinfo?(snorp)

Hmm, that example does seem to be working for me now on the Pixel 2. I wonder if I was getting bit by the autoplay blocking stuff which rbarker recently fixed.

Flags: needinfo?(snorp)

(In reply to John Lin [:jhlin][:jolin] from comment #27)

I will try the MediaCodec.setOutputSurface() approach as mentioned in
comment 22. However, the distribution dashboard [1] says there are ~70%
devices running API 23 or later. So that solution, if working, won't cover
the other 30%.

[1] https://developer.android.com/about/dashboards

If it helps anything, minimum target API for supported FxR platform is API 24 (HTC WaveVR & Google Daydream).

I think this is fixed for the most part. We seem to have one case that fails and might not even be related. I filed this: https://bugzilla.mozilla.org/show_bug.cgi?id=1529812

This should be closed as fixed and the remaining issue needs to be triages and cause identified.

(In reply to Randall Barker [:rbarker] from comment #64)

I think this is fixed for the most part. We seem to have one case that fails and might not even be related. I filed this: https://bugzilla.mozilla.org/show_bug.cgi?id=1529812

This should be closed as fixed and the remaining issue needs to be triages and cause identified.

OK. I will close this bug.

Moving to the "Core::Audio/Video: Playback" component because this is a core Gecko issue, not a GeckoView issue.

Status: REOPENED → RESOLVED
Closed: 7 months ago5 months ago
Component: General → Audio/Video: Playback
Product: GeckoView → Core
Resolution: --- → FIXED
No longer blocks: 1528330
You need to log in before you can comment on or make changes to this bug.