Open Bug 1237489 Opened 8 years ago Updated 2 years ago

[webvr] Add basic MOZ_texture_from_element to enable creating a texture from an arbitrary element

Categories

(Core :: Graphics: CanvasWebGL, enhancement, P3)

People

(Reporter: jgilbert, Unassigned)

Details

(Keywords: feature, Whiteboard: [webvr])

Creating textures from arbitrary elements isn't something we can expose to unprivileged code, but we should add this functionality to allow the WebVR folks to do some prototyping.

My WIP:
https://github.com/jdashg/gecko-dev/tree/tex-from-elem

Unfortunately, it looks like nsLayoutUtils::SurfaceFromElement is only meant to work with image/video/canvas, not general elements. Maybe we can use the screenshotting APIs? (ni=mattwoodrow)
Flags: needinfo?(matt.woodrow)
'WIP' is generous. I've done the boilerplate for adding the extension and IDL, but just have MOZ_CRASH() as an implementation.
Thanks Jeff!  I have added a [webvr] tag.
Summary: Add basic MOZ_texture_from_element to enable creating a texture from an arbitrary element → [webvr] Add basic MOZ_texture_from_element to enable creating a texture from an arbitrary element
(Copying post from relevant conversation on dev-platform)

On Thursday, January 7, 2016 at 1:32:59 AM UTC-8, Robert O'Callahan wrote:
> On Thu, Jan 7, 2016 at 8:46 PM, Anne van Kesteren <annevk@annevk.nl> wrote:
>
> > At least enforcing CORS-same-origin would be somewhat trivial from a
> > specification perspective since all fetches go through Fetch. Limiting
> > plugins and other affected features would be some added conditionals
> > here and there. I don't see how content changes would have an impact
> > since you can only change the policy through navigation at which point
> > you'd have a new global and such anyway.
> >
>
> Some of the things that would need to be handled:
> -- <input type="file"> controls need to not expose sensitive data about
> file paths
> -- For SVG images we disable native themes to avoid those being inspectable
> by the Web site
> -- Non-origin-clean canvas images, <video> frames and MediaStream frames
> would have to be suppressed
> -- Non-same origin content (<img>, <iframe>, etc) would have to be blocked.
> This isn't as simple as a change to Fetch, since a site could create an
> element and load its contents in an unrestricted browsing context and move
> it into a different document with different rules.
> -- :visited
>
> Rob

There are two use cases for this functionality needed by the WebVR team.

The one needed earliest is to implement HUD interfaces and dialogues.  In this case, we will only be using this API from chrome privileged code.  jgilbert has started on a mechanism we can use for now for this case as a chrome-only API (Bug 1237489).

For the second case, which could benefit everyone, we would like to use DOM elements in any WebGL scene.  The intent would be to deprecate the API added in Bug 1237489 once the Khronos specification for WEBGL_dynamic_texture is more mature and can be used in unprivileged content.  jgilbert mentions that the WEBGL_dynamic_texture specification is due for a pruning refactor that should simplify things a bit.

Would it help to reduce the scope and brittleness of the security model by whitelisting the elements that are allowed in non-privileged content?  The original WEBGL_dynamic_texture proposal specifically names HTMLVideoElement, HTMLCanvasElement and HTMLImageElement.  Perhaps these would be a good start, with an intent to expand later?
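
To make the whitelist idea concrete, here is a rough sketch of the check. The function and constant names are hypothetical and do not exist in Gecko or in any WebGL extension; the three tag names simply correspond to the element types named by the WEBGL_dynamic_texture proposal.

```javascript
// Hypothetical policy sketch; none of these names exist in Gecko or WebGL.
// Tag names for HTMLVideoElement, HTMLCanvasElement and HTMLImageElement.
const UNPRIVILEGED_ALLOWLIST = new Set(["video", "canvas", "img"]);

/**
 * Decide whether an element may be used as a texture source.
 * @param {string} tagName - element tag name, e.g. "IMG" or "iframe"
 * @param {boolean} isPrivileged - true for chrome-privileged callers
 * @returns {boolean}
 */
function canUseAsTextureSource(tagName, isPrivileged) {
  if (isPrivileged) {
    // Chrome code may capture arbitrary elements.
    return true;
  }
  // Unprivileged content is limited to the whitelisted element types.
  return UNPRIVILEGED_ALLOWLIST.has(tagName.toLowerCase());
}

console.log(canUseAsTextureSource("IMG", false));    // true
console.log(canUseAsTextureSource("iframe", false)); // false
console.log(canUseAsTextureSource("iframe", true));  // true
```

A real implementation would still need the CORS and origin-clean checks discussed above; this only covers the element-type gate.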

If we wish to style content in a way that it gets sanitized of security-sensitive content, perhaps we could roll it up in a convenient CSS property applied to the element we are capturing as a texture, while also ensuring that the element generates a layer and is taken out of flow from the 2d document.

How would you feel about this CSS:

    position: embed

It would work like "position: absolute" in that it takes the element out of flow; however, the element would not be positioned or rendered in the 2d layout.  It would also ensure that the element generates a layer, similar to "will-change: transform".  When called from non-privileged content, it would enforce the whitelist, CORS, and sanitize any security sensitive output such as those that Roc has identified earlier.

A chrome-only version with a different name could be used from privileged code to implement backwards compatibility of 2d web pages in a VR based browser:

    position: embed-unsanitized

(I'd love to hear other attribute naming ideas)

For the case of 2d web pages in a VR based browser, we would need to include things such as iframes, cross-origin content, and :visited link styles.  These could be restricted to privileged code only with "position: embed-unsanitized".
Can't you just stuff the chrome HUD content in an <iframe> and use the canvas.drawWindow to render it to a canvas, then with chrome privileges use that with WebGL?
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #4)
> Can't you just stuff the chrome HUD content in an <iframe> and use the
> canvas.drawWindow to render it to a canvas, then with chrome privileges use
> that with WebGL?

While I think this is particularly unsatisfying from a performance standpoint, it should be sufficient for prototyping functionality.

How much of the needed functionality does this cover?
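
For reference, the drawWindow route might look roughly like the sketch below. This is only an illustration under some assumptions: captureWindowToTexture is a made-up helper name, the caller is chrome-privileged script (drawWindow is not available to content), `canvas` is a separate 2D canvas rather than the WebGL canvas, and `win` is something like the HUD iframe's contentWindow.

```javascript
// Sketch only: captureWindowToTexture is a hypothetical helper, not a Gecko API.
// `gl` is a WebGL context, `canvas` a scratch 2D canvas, `win` the window to capture.
function captureWindowToTexture(gl, canvas, win, width, height) {
  canvas.width = width;
  canvas.height = height;

  // Render the window's contents into the 2D canvas (chrome-only call).
  const ctx = canvas.getContext("2d");
  ctx.drawWindow(win, 0, 0, width, height, "rgb(255,255,255)");

  // Upload the canvas pixels as a WebGL texture.
  const texture = gl.createTexture();
  gl.bindTexture(gl.TEXTURE_2D, texture);
  gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, gl.RGBA, gl.UNSIGNED_BYTE, canvas);

  // Non-power-of-two sizes need clamped wrapping and no mipmaps in WebGL 1.
  gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, gl.LINEAR);
  gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_S, gl.CLAMP_TO_EDGE);
  gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_T, gl.CLAMP_TO_EDGE);
  return texture;
}
```

Every update repeats the full render and upload, which is the performance concern mentioned above.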
Flags: needinfo?(kgilbert)
For our immediate needs, we will only need to display dialogs and menus consisting of some static text and images.  These would all be implemented from privileged chrome code.  (We could implement transitions and such effects with WebGL)

Our near-term needs after that would be to capture elements within unprivileged content.  This would be used most often for rendering HTML content in libraries like AFrame and three.js.  Performance would be more important at this point, as the captured elements may contain animations or video; however, we could focus on specialized solutions for use cases such as video.

Further out, we would be using it for capturing entire web page tabs for backwards compatibility in the VR browser.  These can be any web site, which may include CSS animations, video, scrolling frames, iframes, canvas elements, etc...

I'm ni'ing Casey, who may have some more input on requirements for Horizon and the timelines involved.
Flags: needinfo?(kgilbert) → needinfo?(caseyyee.ca)
From a product standpoint, I think we are most blocked on being able to import entire web page layouts and windows for viewing in VR.  This would give us the backwards compatibility for viewing 'classic'/2D web content that you would expect from a browser product.

Although it's a bit of a hassle making simple 2D layouts (menus, buttons, dialogues) in VR, we're not really blocked.  Just a bit of an inconvenience really.

What we are aiming for is an effect similar to this:  http://learningthreejs.com/blog/2013/04/30/closing-the-gap-between-html-and-webgl/ , but stereo rendered and distorted for VR.

I quite like the idea of position: embed.  This would give us a nice way to export the visual portions of an element while retaining the function of the element.  canvas.drawWindow works, but it won't let us do this with fragments, only entire windows.

Another nice-to-have would be the ability to export requestFullscreen in a similar manner.  I imagine being able to view a YouTube page and click the fullscreen button to display the content on a theatre or IMAX sized screen.
Flags: needinfo?(caseyyee.ca)
I'm not sure that drawWindow will work with iframes, but I will give that a shot over the weekend and report back.   It could be enough for prototype purposes.
(In reply to Casey Yee from comment #7)
> From a product standpoint, I think we are most blocked on being able to
> import entire web page layouts and windows for viewing in VR.  This would
> give us the backwards compatibility for viewing 'classic'/2D web content
> that you would expect from a browser product.
> 
> Although it's a bit of a hassle making simple 2D layouts (menus, buttons,
> dialogues) in VR, we're not really blocked.  Just a bit of an inconvenience
> really.
> 
> What we are aiming for is an effect similar to this:
> http://learningthreejs.com/blog/2013/04/30/closing-the-gap-between-html-and-webgl/ ,
> but stereo rendered and distorted for VR.
> 
> I quite like the idea of position: embed.  This would give us a nice way to
> export the visual portions of an element while retaining the function of the
> element.  canvas.drawWindow works, but it won't let us do this with
> fragments, only entire windows.
> 
I spoke to Casey and reviewed the solution Roc has proposed.  Casey will post back here after doing a test with canvas.drawWindow to see if it is enough for our immediate needs in chrome-only code.  I've NI'ed Casey to post his findings here after the experiment.

> Another nice-to-have would be the ability to export requestFullscreen in a
> similar manner.  I imagine being able to view a YouTube page and click the
> fullscreen button to display the content on a theatre or IMAX sized screen.
This is an intriguing idea.  Perhaps this behavior doesn't need to be a web standard, but rather that our VR browser implementation could interpret the existing web standards in a way that provides the best ergonomics for VR.  This may be similar to how font-enlargement works for mobile browsers on non-mobile content.
Flags: needinfo?(caseyyee.ca)
Whiteboard: [webvr]
> This is an intriguing idea.  Perhaps this behavior doesn't need to be a web standard, but rather that our VR browser implementation could interpret the existing web standards in a way that provides the best ergonomics for VR.  This may be similar to how font-enlargement works for mobile browsers on non-mobile content.

That was always my thought: that once we decoupled WebVR API from Fullscreen API, we'd present Fullscreen API calls with "home theatre"-style user experiences when using VR user agents.
Hi guys,

I am doing some experiments based on Jeff's repo, and I currently have some good results to share. (https://github.com/daoshengmu/gecko-dev/blob/VRFullScreen/dom/canvas/WebGLExtensionTextureFromElement.cpp)

First of all, we need to disable the cross-origin check. Then we have two options for implementing this bug. The first, as Roc mentioned, is to use RenderDocument, which renders the document to a surface. I am not sure it would work for every DOM element, but I have checked that it works for iframe. Alternatively, we can use GetPrimaryFrame to render the whole page and clip to the rect that we want, but that is not a good approach because it spends a lot of memory rendering a big image only to clip it.
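
To give a sense of the memory cost of rendering the whole page and then clipping, here is a back-of-the-envelope calculation. The page and element sizes are made up for illustration, and it assumes a plain RGBA8 surface at 4 bytes per pixel with no row padding.

```javascript
// Bytes needed for an RGBA8 surface (4 bytes per pixel, no padding assumed).
function surfaceBytes(width, height) {
  return width * height * 4;
}

// Hypothetical sizes, chosen only for illustration.
const page = { width: 1920, height: 10000 }; // a long scrolling page
const element = { width: 512, height: 256 }; // the region we actually want

const wholePage = surfaceBytes(page.width, page.height);
const elementOnly = surfaceBytes(element.width, element.height);

console.log(wholePage);                           // 76800000 bytes (about 73 MiB)
console.log(elementOnly);                         // 524288 bytes (512 KiB)
console.log(Math.round(wholePage / elementOnly)); // 146
```

With these numbers, the whole-page surface is roughly 146 times larger than what the texture needs.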

The other option is to make WebGLTexture::TexOrSubImage support more types of DOM element. Currently, it supports <img>, <canvas>, and <video> elements. I spent a couple of days making it support <iframe>, although we still need to find a way to handle coordinate mapping and detect document changes. But I think this is a good start.
It sounds like either of these approaches will work for our primary use case of capturing iframe contents.  WebGLTexture::TexOrSubImage seems to be more flexible, though, and opens up more possibilities down the road.  Definitely worth pursuing in my opinion.
Flags: needinfo?(caseyyee.ca)
Hi Jeff,

Currently, I am working on nsLayoutUtils::SurfaceFromElement for HTMLIFrameElement
https://github.com/daoshengmu/gecko-dev/commit/297418814ac924f7737c0389e67d2ef001c93eca#diff-81c0a9db69d21b1dc0081929f57f4f49R7052. I know many different kinds of elements will have to be implemented, but I think this is the same as implementing another screenshotting API for elements. This might be a good way to follow up.
(In reply to Daosheng Mu[:daoshengmu] from comment #13)
> Hi Jeff,
> 
> Currently, I am working on nsLayoutUtils::SurfaceFromElement for
> HTMLIFrameElement
> https://github.com/daoshengmu/gecko-dev/commit/297418814ac924f7737c0389e67d2ef001c93eca#diff-81c0a9db69d21b1dc0081929f57f4f49R7052.
> I know many different kinds of elements will have to be implemented, but I
> think this is the same as implementing another screenshotting API for
> elements. This might be a good way to follow up.

I'm not actually sure this will be any more efficient than just using drawWindow, though it's simpler for the caller.

I don't understand the 'if (drawDT) {' block in your code. Looks like you're taking the contents of drawDT, and then drawing it back to itself? With a readback to CPU memory in the middle for GPU accelerated DT types.

You're also not initializing result to anything.
Flags: needinfo?(matt.woodrow)
(In reply to Matt Woodrow (:mattwoodrow) from comment #14)
> (In reply to Daosheng Mu[:daoshengmu] from comment #13)
> > Hi Jeff,
> > 
> > Currently, I am working on nsLayoutUtils::SurfaceFromElement for
> > HTMLIFrameElement
> > https://github.com/daoshengmu/gecko-dev/commit/297418814ac924f7737c0389e67d2ef001c93eca#diff-81c0a9db69d21b1dc0081929f57f4f49R7052.
> > I know many different kinds of elements will have to be implemented, but I
> > think this is the same as implementing another screenshotting API for
> > elements. This might be a good way to follow up.
> 
> I'm not actually sure this will be any more efficient than just using
> drawWindow, though it's simpler for the caller.
> 

Thanks for the reminder. I noticed it and have removed it now.

https://github.com/daoshengmu/gecko-dev/blob/VRFullScreen/dom/canvas/WebGLExtensionTextureFromElement.cpp#L46

> I don't understand the 'if (drawDT) {' block in your code. Looks like you're
> taking the contents of drawDT, and then drawing it back to itself? With a
> readback to CPU memory in the middle for GPU accelerated DT types.
> 
> You're also not initializing result to anything.

Oops, my bad. I put it in another commit.

Currently, the only open question for me is how to use drawWindow to support all types of dom::Element. DrawWindow only supports Canvas. If I choose to use GetComposedDoc, it will render the whole web page. So I think that might be the reason why SurfaceFromElement has to be implemented separately for each element type.
Keywords: feature
Type: defect → enhancement
Priority: -- → P3
Assignee: jgilbert → nobody
Severity: normal → S3