Closed Bug 653656 Opened 13 years ago Closed 11 years ago

Write specification for MOZ_window_region_texture WebGL extension

Categories

(Core :: Graphics: CanvasWebGL, defect, P2)

Tracking

RESOLVED WONTFIX

People

(Reporter: cedricv, Assigned: cedricv)


At DevTools, we'd like a WebGL-optimized equivalent to privileged canvas.drawWindow in order to render DOM elements into WebGL textures.

The extension would be listed by getSupportedExtensions from privileged/chrome JavaScript only (similarly to canvas.drawWindow).

This will be initially used for Tilt (https://wiki.mozilla.org/Tilt_Project_Page) but could be used by other new extensions.

This will make it easier for extensions to build 3D representations of the DOM, and should perform much better than canvas.drawWindow (in theory we could share the underlying 'layer' textures and/or reduce copies to a minimum).
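
For reference, the drawWindow path this would replace looks roughly like the following (a minimal sketch of privileged chrome code; gl and contentWindow stand for a WebGL context and the window to capture):

// Rasterize the window into a 2D canvas (chrome-only drawWindow), then
// upload the result into a WebGL texture. Every update pays for a software
// rasterization plus a full copy into GL memory.
var w = contentWindow.innerWidth, h = contentWindow.innerHeight;
var canvas2d = document.createElementNS("http://www.w3.org/1999/xhtml", "canvas");
canvas2d.width = w;
canvas2d.height = h;
var ctx2d = canvas2d.getContext("2d");
ctx2d.drawWindow(contentWindow, 0, 0, w, h, "rgb(255,255,255)");

gl.bindTexture(gl.TEXTURE_2D, gl.createTexture());
gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, gl.RGBA, gl.UNSIGNED_BYTE, canvas2d);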
Assignee: nobody → cedricv
Status: NEW → ASSIGNED
Summary: Write draft specification for MOZ_dom_element_texture extension → Write draft specification for MOZ_dom_element_texture WebGL extension
Blocks: 653658
Blocks: 653657
We're getting close to the wire on this bug. Any progress?
Nope.

The design decision for the current JavaScript implementation in Tilt (#653658) is interesting and looks more like what a MOZ_dom_window_texture extension would look like.

This design makes sense for Tilt's use case, so we should probably spec towards it (grabbing a texture for the whole window rather than one texture per element), which is also MUCH easier to spec out.

Spec'ing from an existing prototype is usually better than spec'ing upfront; the system works :p
So we need to agree on the overall design. I can see two directions:

A) The extension allows retrieving a texture for every DOM element individually.
   This can be hard to spec/implement in many cases (e.g. elements that are 'hidden' and/or do not have their own layer), and it can dramatically increase the number of state changes necessary to draw a scene.
   IMHO we can forget this design.


B) The extension allows retrieving a texture for the whole window.
   1) Enabling the extension allows passing a DOMWindow as tex(Sub)?Image2D's data argument.
   It is the user's responsibility to update the texture with subsequent texture definition calls.
   This API might hinder full reusability (it requires a texture allocation and 'copy').

   2) The extension object exposes a getWindowTexture() method returning a read-only WebGLTexture that always contains the latest available composition.
   Attempting to attach this texture to a FBO or using (copy)tex(Sub)Image2D generates an INVALID_OPERATION.
   This API might allow full reusability of underlying graphic objects when possible.
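
For illustration, caller code under each option might look like this sketch (ext stands for the hypothetical extension object returned by getExtension; neither API exists yet):

// Option B1: a DOMWindow becomes a valid texImage2D source; the application
// owns the texture and must re-upload it whenever the content changes.
gl.bindTexture(gl.TEXTURE_2D, myTexture);
gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, gl.RGBA, gl.UNSIGNED_BYTE, contentWindow);

// Option B2: the extension hands back a read-only texture that always
// reflects the latest composition; no per-frame upload is needed.
gl.bindTexture(gl.TEXTURE_2D, ext.getWindowTexture());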


Thoughts?
A)
This is really not needed, and shouldn't be needed, because it's a lot easier and more efficient to just calculate texture coordinates for a DOM element instead of using multiple images. Especially with Tilt, too many textures is a bad idea, as the number of DOM nodes drawn can be very high. Too many textures will also require multiple meshes -> multiple draw calls -> bad.

B)
1. Creating a texture each frame (or at least when the webpage content changes, which is hard to handle perfectly; think videos, for example) still involves duplicating and allocating memory. It might cripple the functionality and UX.

2. A read-only texture seems perfect, and it involves no memory duplication. I don't see any context in which modifying it directly is necessary/useful/safe. Basically, just the ability to attach the texture to a sampler in a pixel shader is enough.
Common problems for both approaches:
- how to expose content larger than MAX_TEXTURE_SIZE?
- is the window texture always RGB or always RGBA? Or is it up to the implementation (possibly depending on the page)?
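
On the first question, an application can at least detect the oversized case today (sketch; the resolution strategy is exactly what the spec would have to define):

var maxSize = gl.getParameter(gl.MAX_TEXTURE_SIZE);
if (contentWindow.innerWidth > maxSize || contentWindow.innerHeight > maxSize) {
  // The window cannot fit in a single texture; the spec must say whether
  // the implementation tiles it into several regions, downscales, or fails.
}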


Issue for B2:
- returning a WebGLTexture adds one more reference to track and requires defining special behavior for it (read-only, cannot delete, cannot change parameters?)


Inspired by GL_NV_video_capture [1], here is a new API proposal that resolves these issues:

void bindWindowTextureRegion(GLenum target, WebGLWindowTextureRegion region);

Bind a window texture region to a texturing target.
When the call is successful, the currently bound WebGLTexture is unbound. {NOTE: therefore the read-only behavior is given 'for free' by the current WebGL spec [2]}

If the region is not valid, an INVALID_OPERATION error is generated.


long getWindowTextureRegionCount();
Returns the number of available window texture regions.


WebGLWindowTextureRegion getWindowTextureRegion(GLint index);
Creates a WebGLWindowTextureRegion for the region at index, where index ranges from 0 (inclusive) to the value returned by getWindowTextureRegionCount() (exclusive).

If there is no region at index, an INVALID_VALUE error is generated.

The returned object is valid for binding with bindWindowTextureRegion for as long as the region's properties do not change.

Regions can overlap.
{{NOTE: this is to allow that one region internally references a video/image/canvas independent layer directly}}



interface WebGLWindowTextureRegion {
  GLint x;
  GLint y;
  GLint width;
  GLint height;
  GLenum format;
}
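
A minimal usage sketch under this proposal (ext stands for the hypothetical extension object):

var count = ext.getWindowTextureRegionCount();
var regions = [];
for (var i = 0; i < count; ++i)
  regions.push(ext.getWindowTextureRegion(i));

// Binding a region implicitly unbinds the current WebGLTexture, so the
// region texture is read-only per the WebGL spec excerpt in [2].
ext.bindWindowTextureRegion(gl.TEXTURE_2D, regions[0]);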



[1]: http://www.opengl.org/registry/specs/NV/video_capture.txt

[2]: 5.13.8 Texture objects
Texture objects provide storage and state for texturing operations. If no WebGLTexture is bound (e.g., passing null or 0 to bindTexture) then attempts to modify or query the texture object shall generate an INVALID_OPERATION error.
In interface WebGLWindowTextureRegion:
width => screenWidth
height => screenHeight

This clarifies that region sizes are defined by their 'onscreen' dimensions; the actual underlying texture width/height is not exposed.
For instance, if the page is 1920x1080 black with one 1024x768 video, there *could* optimally be only two regions, with a 1x1 black texture for the former.


Adding "bool isWindowTextureRegion(WindowTextureRegion region)" to check the validity of a region is probably a good idea; it simplifies validity checks when the application needs to adapt its geometry/uniforms/etc. as region properties change.
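
Per-frame validity handling could then look like this (sketch; rebuildRegionGeometry is a hypothetical application helper):

for (var i = 0; i < regions.length; ++i) {
  if (!ext.isWindowTextureRegion(regions[i])) {
    // The region's properties changed: re-fetch it and rebuild any
    // dependent geometry/uniforms before drawing.
    regions[i] = ext.getWindowTextureRegion(i);
    rebuildRegionGeometry(i);
  }
}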
(In reply to comment #6)
This sounds nice. The only (probably unlikely) difficulty may arise when the number of texture regions is greater than GL_MAX_COMBINED_TEXTURE_IMAGE_UNITS. That would require multiple draw calls and splitting the geometry into total_regions/max_texture_units batches. Is that a possibility?
Simply doing one draw call per region doesn't seem unreasonable, considering the number of regions is likely to be small for all but contrived pages.

In practice the number of regions should relate to the number of layers Gecko is using to composite the page.


A clever application could indeed batch regions by the number of texture units, but IMHO the complexity is probably not worth it.
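
The straightforward per-region loop is then just (sketch, assuming one quad per region in a shared vertex buffer):

for (var i = 0; i < regions.length; ++i) {
  ext.bindWindowTextureRegion(gl.TEXTURE_2D, regions[i]);
  gl.drawArrays(gl.TRIANGLE_FAN, i * 4, 4); // one draw call per region
}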
A screen plane (x,y,width,height) is probably not flexible enough to efficiently expose textures for layers transformed on the GPU (e.g. CSS 2D/3D transforms).

It might make sense to rather expose a 2D quad in viewport px coordinates.

interface WebGLWindowTextureRegion {
  GLenum format;
  GLint x0;
  GLint y0;
  GLint x1;
  GLint y1;
  GLint x2;
  GLint y2;
  GLint x3;
  GLint y3;
}

The quad is in CCW order.
Texture UV coordinates of a window region quad are always {x0,y0}=>0,0 ; {x1,y1}=>0,1 ; {x2,y2}=>1,1 ; {x3,y3}=>1,0
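
With this fixed mapping, filling an interleaved (x, y, u, v) buffer for one region quad is mechanical (sketch; r is a WebGLWindowTextureRegion as defined above):

var quad = new Float32Array([
  r.x0, r.y0, 0, 0,
  r.x1, r.y1, 0, 1,
  r.x2, r.y2, 1, 1,
  r.x3, r.y3, 1, 0
]);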
I don't think it's a good idea to use an already-3D-transformed quad as the source for a texture: that loses quality and could result in artifacts at the quad's edges. Rather, if interop with CSS 3D transforms is a concern, I'd talk directly with Matt Woodrow, who's working on CSS 3D transforms and, I believe, is considering layers acceleration for them, so he must be thinking about the problems you need to solve.
Interesting. Actually, 'interop' with any kind of GPU-accelerated transform is important; CSS 2D transforms must be 'exposed' as well.

How would it lose quality? I thought of anisotropic filtering, but that does not require depth AFAIK.

Besides the coordinates issue, what are your thoughts on the overall design?
Right, we should probably keep Z in the projected coordinates in order to solve z-index and/or overlapping 3D transforms.

Updated the API to avoid region object allocations and to use a typed array, allowing direct usage of region coordinates for draw calls (possibly batched by using views from the same ArrayBuffer):

void readWindowTextureRegionCoords(GLint index, WebGLFloatArray quad);

Fills an array with vec3 projected coordinates of the region in viewport pixel coordinates.

The quad array must have a length of at least 12 elements, or an INVALID_OPERATION error is generated.

If there is no region at index, an INVALID_VALUE error is generated.


any getWindowTextureRegionParameter(GLint index, GLenum pname);

With pname MOZ_WINDOW_REGION_TEXTURE_FORMAT_WEBGL, returns the texture format of the region (one of the PixelFormat enumeration, i.e. RGB or RGBA).

If there is no region at index, an INVALID_VALUE error is generated.
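
Usage sketch of this revision, reading all region quads into views of one shared ArrayBuffer so they can be batched into a single vertex buffer (names as above; Float32Array stands in for WebGLFloatArray):

var count = ext.getWindowTextureRegionCount();
var buf = new ArrayBuffer(count * 12 * 4); // 12 floats (4 x vec3) per region
for (var i = 0; i < count; ++i) {
  var quad = new Float32Array(buf, i * 12 * 4, 12);
  ext.readWindowTextureRegionCoords(i, quad);
  // RGB or RGBA:
  var format = ext.getWindowTextureRegionParameter(i, ext.MOZ_WINDOW_REGION_TEXTURE_FORMAT_WEBGL);
}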
Summary: Write draft specification for MOZ_dom_element_texture WebGL extension → Write specification for MOZ_window_region_texture WebGL extension
Small typo fixes, same URL.
Two issues:
[1] how to specify the depth component and its range?
[2] should arrays filled with readWindowRegionCoords contain interleaved texture coordinates? Should different formats be available through a format parameter?
(In reply to comment #15)
> Small typo fixes same URL.
> Two issues:
> [1] how to specify the depth component and its range?
> [2] should arrays filled with readWindowRegionCoords contain interleaved
> texture coordinates? Should different formats be available through a format
> parameter?

[2] We can follow the glVertexAttribPointer style, and add some extra params: readWindowRegionCoords(GLint region, FloatArray coords, GLsizei stride, GLint positionPointer, GLint uvPointer), where 
 * stride is the byte offset between consecutive vertices
 * positionPointer is the index of the position elements for a vertex
 * uvPointer is the index of the texture coordinates in the array

Default param values:
* stride defaults to Float32Array.BYTES_PER_ELEMENT * 2
* positionPointer defaults to 0
* uvPointer defaults to -1 (which means no UV coordinates are interleaved in the array)

The simplest case would be
[x, y, ...]
with only 4 vertices and no UVs, so the explicit call is:
readWindowRegionCoords(..., Float32Array.BYTES_PER_ELEMENT * 2, 0, -1);

For example, for a structure like
[e0, e1, x, y, z, e5, e6, e7, u, v, ...]
a vertex may have any number of elements; in this case the call is:
readWindowRegionCoords(..., Float32Array.BYTES_PER_ELEMENT * 10, 2, 8);

Note that the z value in the array is left alone (only x, y, u and v are changed).
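
Paired with the matching attribute setup, the second layout would look like this (sketch; ext, coords, vbo and the attribute locations are illustrative):

var F = Float32Array.BYTES_PER_ELEMENT;
// [e0, e1, x, y, z, e5, e6, e7, u, v] -- 10 floats per vertex:
ext.readWindowRegionCoords(region, coords, F * 10, 2, 8);

gl.bindBuffer(gl.ARRAY_BUFFER, vbo);
gl.bufferData(gl.ARRAY_BUFFER, coords, gl.DYNAMIC_DRAW);
gl.vertexAttribPointer(positionLoc, 3, gl.FLOAT, false, F * 10, F * 2); // x, y, z
gl.vertexAttribPointer(uvLoc, 2, gl.FLOAT, false, F * 10, F * 8);       // u, v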

[1] I'm not quite sure what you mean by depth component. If you're referring to the z-index of a region, to solve the overlapping problem, we could
a). sort the regions by their depth internally, so that region == depth for the bindWindowRegionTexture(target, region) call
b). use getWindowRegionParameter(region, pname) for each region, and add a WINDOW_REGION_TEXTURE_DEPTH or WINDOW_REGION_TEXTURE_ZINDEX param
[2] Adding stride is a great idea. I'll add this.
    This nicely resolves the problem of interleaving the vertex coordinates with other information (UV, but also possibly which sampler to use in case one wants to batch draw commands for multiple region textures).
    IMHO 'stride' alone makes the 'uvPointer' argument unnecessary, as the vertex order is specified and the UVs can be filled in the array at the appropriate strides.
    I'm not sure it makes sense to have an offset argument (here 'positionPointer'), as this capability is already provided by typed arrays.


[1] By depth component I mean 'z' in the vec3 vertex for each coordinate.
    Solving z-index is only one part of the problem, though arguably the most important. Sorting or having a WINDOW_REGION_TEXTURE_DEPTH parameter would not work, as a region is not always a plane in viewport space (e.g. CSS 3D transforms), so we need a depth component on each vertex of the quad.
    I'm not sure how we can specify the range, near plane and far plane for this component.
(In reply to comment #17)
> [2] Adding stride is a great idea. I'll add this.
>     This nicely resolves the problem of interleaving the vertex coordinates
> with other information (UV, but also possibly which sampler to use in case
> one wants to batch draw commands for multiple region textures).
>     IMHO 'stride' alone makes the 'uvPointer' argument unnecessary, as the
> vertex order is specified and the UVs can be filled in the array at the
> appropriate strides.
>     I'm not sure it makes sense to have an offset argument (here
> 'positionPointer'), as this capability is already provided by typed arrays.
> 
> 
> [1] By depth component I mean 'z' in the vec3 vertex for each coordinate.
>     Solving z-index is only one part of the problem, though arguably the
> most important. Sorting or having a WINDOW_REGION_TEXTURE_DEPTH parameter
> would not work, as a region is not always a plane in viewport space (e.g.
> CSS 3D transforms), so we need a depth component on each vertex of the quad.
>     I'm not sure how we can specify the range, near plane and far plane for
> this component.

[2] I agree.

[1]
a). One idea would be unprojecting the four vertices by calculating the inverse MVP matrix of the CSS transform. This would give 4 2D points in screen space (thus with depth = 0), but it would rule out more interesting implementations (in Tilt, CSS 3D transforms could perhaps be represented as actual tilted stacks, rotated or transformed based on the CSS transformation).

b). A better idea: the position of each vertex of the quad can be multiplied by the modelview transformation matrix defining the CSS transform (or any other transformation available at readWindowRegionCoords call time), and this will directly give a corresponding z value for each vertex position.

In both cases, the question is: can a transformation matrix be accessed (calculated?) from a CSS transform?
Thought some more about [1], and it seems a). from my previous reply is too much of a hassle; ignore it.

For b), getComputedStyle, for both 2D and 3D transforms*, gives a transformation matrix that is the result of applying the individual functions listed in the transform property. If we multiply the position of each vertex of the (untransformed) quad by the matrix, shouldn't this inherently give the correct 3D coordinates (including the z depth)?

* http://www.w3.org/TR/css3-3d-transforms/
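
A sketch of doing exactly that (in Gecko at the time the computed property would be MozTransform; the parsing is simplified and assumes a matrix3d() value, which is column-major):

var t = window.getComputedStyle(element, null).MozTransform;
var m = t.match(/matrix3d\(([^)]+)\)/)[1].split(",").map(parseFloat);

// Multiply the untransformed vertex [x, y, 0, 1] by the 4x4 matrix;
// the third component is the z depth we are after.
function transformVertex(x, y) {
  return [m[0] * x + m[4] * y + m[12],
          m[1] * x + m[5] * y + m[13],
          m[2] * x + m[6] * y + m[14]];
}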
Added stride parameter to readWindowRegionCoords.
Replaced getWindowRegionCount with getWindowParameter MAX_WINDOW_REGION; this is more in line with the general GL API conventions and a natural extension point for new window parameters.
Fixed misc typos.

=> http://neonux.com/webgl/MOZ_window_region_texture.html
New update at same URL.

- readWindowRegionCoords now fills non-transformed vec2 coordinates in viewport pixel space.
- transformations are to be read (if/as needed) with readWindowRegionTransform; this is more flexible (allows transformations and tweening to happen fully in shaders) and maps well to CSS 2D/3D transforms.
- misc typo fixes and clarifications
New update at same URL.

- Added introduction to window region concept in Overview.
- Overlapping regions are ordered back-to-front.
- Fixed misc typos.
Added WINDOW_REGION_ALPHA parameter for region-wide opacity.
Had a phone call with Cedric this morning; no big objection. The main conclusion we drew was that we needed to clarify when the window region capture would occur; we agreed that the spec should say something like "any time between the bindWindow call and the next HTML compositing" and not try to define it more precisely than that. For the rest we mostly discussed details; I trust that Cedric noted those down.
We also noted that there still were some question marks around Layout:
 - how to correctly account for CSS 3D transforms when determining the Z-ordering of regions
 - is there a need to handle arbitrary texcoords (probably yes)
For these questions what's needed is to get in touch with Matt Woodrow and/or Roc.
Thanks Benoit.

Updated the spec at http://neonux.com/webgl/MOZ_window_region_texture.html according to our call:
- clarified the sentence about the compositing event wrt execution context
- MAX_WINDOW_REGION => MAX_WINDOW_REGIONS
- FloatArray => Float32Array
- added readWindowRegionTexCoords, so that a rendering engine using texture "clipping" internally does not have to copy a texture to expose it to WRT; it can just expose new UVs
- fixed misc typos


Unresolved issues :
- maybe we could find a better name for WINDOW_REGION_ALPHA ?
- is ordering regions back-to-front enough? (needs testing and/or a talk with mwoodrow)
CSS 3D transforms (bug 505115) just landed.
Priority: -- → P2
What's the advantage of doing this instead of using drawWindow and uploading?  Rather, what will this extension enable us to do differently?  Short of having content be able to render directly via GL, it seems like no matter what we'd be uploading into a texture...
(In reply to Vladimir Vukicevic [:vlad] [:vladv] from comment #28)
> What's the advantage of doing this instead of using drawWindow and
> uploading?  Rather, what will this extension enable us to do differently?
> Short of having content be able to render directly via GL, it seems like
> no matter what we'd be uploading into a texture...

IIRC, the point was to access the texture data directly, without a subsequent texImage2D or texSubImage2D on the client side (or any kind of copy/duplication, for that matter). This way, the window region textures would simply be bound to a uniform location and used directly, whereas the current approach requires an intermediate step.
Sure, but I don't see where those "window region textures" come from.  We don't have such a thing -- the closest that we have is some layers on very specific GL-layers-backend-using platforms... but there, the actual GL textures are owned by the compositor which is on a different thread or different process.  Even if it was in the same one, WebGL does not have access to those textures because they will always be in an entirely different GL context (and we don't do context sharing for various reasons, including performance and security).  There are ways to share just a texture across contexts/processes, but they are generally fragile and highly platform specific.
(In reply to Vladimir Vukicevic [:vlad] [:vladv] from comment #30)
> Sure, but I don't see where those "window region textures" come from.  

Yeah, that was the hardest problem to solve. The (apparently wrong) assumption was that layers may provide such data efficiently or without too much hassle.
Unfortunately no :(  Even if you were on a GL-layers-backend using system (Android, OSX, B2G) and you were in some way able to access that texture from your process space (Android, OSX, B2G with difficulty), those content layers are either flattened (multiple chunks of content rendered to one texture, so you can't get a texture specifically for just one element) or multiple textures are involved (in case we have some elements that are animating, for example), so you'd have to track that.

The amount of effort involved in bookkeeping all of that, even assuming you could get to it (and note that Windows is not even on the first list), I think makes it not worth it, especially for a developer tool. Sorry! :(
(In reply to Vladimir Vukicevic [:vlad] [:vladv] from comment #32)
> Unfortunately no :(  Even if you were on a GL-layers-backend using system
> (Android, OSX, B2G) and you were in some way able to access that texture
> from your process space (Android, OSX, B2G with difficulty), those content
> layers are either flattened (multiple chunks of content rendered to one
> texture, so you can't get a texture specifically for just one element) or
> multiple textures are involved (in case we have some elements that are
> animating, for example), so you'd have to track that.
> 

Having one or more different textures for certain areas of a webpage with little or no apparent relation to each other was never really an issue.

The current implementation itself uses a single texture for the entire visualization mesh, not separate textures for individual elements. The idea of multiple textures per region wouldn't conflict too much with the current scenario, because several meshes could have been formed. In fact, a handful of cute optimizations would have been possible (like a single mesh, with or without degenerate triangles, with attributes specifying which texture to sample from, etc.).

> The amount of effort involved in bookkeeping all of that, even assuming you
> could get to it (and note that Windows is not even on the first list), I
> think makes it not worth it, especially for a developer tool. Sorry! :(

Indeed, that makes total sense. Feel free to close this and bug 653657.
Marking WONTFIX based on comments 32 and 33.
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → WONTFIX