Closed Bug 814524 Opened 7 years ago Closed 6 years ago

Make WebGLContext::TexImage2D avoid readback for video elements

Categories

(Core :: Canvas: WebGL, defect)

18 Branch
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla33

People

(Reporter: roc, Assigned: chiajung)

References

Details

(Whiteboard: webgl-perf)

Attachments

(4 files, 20 obsolete files)

6.39 KB, text/plain
Details
49.57 KB, patch
chiajung
: review+
Details | Diff | Splinter Review
3.61 KB, patch
chiajung
: review+
Details | Diff | Splinter Review
1.64 KB, patch
Details | Diff | Splinter Review
Currently TexImage2D always calls SurfaceFromElement, which will usually/often do a readback, especially for esoteric video frame Image types.

I suggest defining nsLayoutUtils::ImageFromElement(nsHTMLVideoElement* aElement), returning a layers Image object along with mCORSUsed and mPrincipal. Then we'll need some code in GLContext, preferably shared with ImageLayerOGL, to support copying from various kinds of Images to a texture.
Or in the case of YCbCr gralloc image, turn it to a texture directly. We would need to create a RGB texture if the WebGL content want to write into the texture.
Assignee: nobody → chung
I think the simplest way to enhance the performance here is simply do color convertion in GPU than return the texture id directly. Though it is possible to direct bind the GraphicBuffer to texture on B2G, but to use image_external extension, the shader code must change. Then the best way to overcome such problem becomes 

http://www.khronos.org/registry/webgl/extensions/proposals/WEBGL_dynamic_texture/

To be generic, the first step should still enhance the GetAsSurface implementation for YUV format image since it helps to use video element with 2D Canvas case, too.

The basic idea is do color convertion in GPU and read it back or map it back to gfxASurface based on the backend OpenGL version and platform. For B2G, it is possible to render to GraphicBuffer directly and wrap the GraphicBuffer to gfxASurface. For other platform do not support OpenGL4.0, render to renderbuffer and readback is needed. If the GPU support OpenGL4.0, we can just memory map the VRAM. However, for platform other than B2G, it does not guarantee performance boost. As this cost an extra readback/mmap operation than CPU conversion code. (And note that upload texture with the mmap'ed pointer is an error)

For WebGL, we may define GetAsTexture and convert the frame in GPU, which should definitely faster since all operation is occurred in GPU side, and no readback or extra texture upload is needed.
We should be able to pass a YUV frame through a YUV->RGB shader, and render the output into a texture. Then, we just have WebGL give the user back that texture.
Rather, have SurfaceFromElement support giving us a YUV frame, which webgl code can then be responsible for color-converting itself.

Eventually, ideally, we'd skip uploading the YUV frame, since we *should* have at least *some* decoded video frame sitting around already on the GPU as YUV.
(In reply to Jeff Gilbert [:jgilbert] from comment #3)
> We should be able to pass a YUV frame through a YUV->RGB shader, and render
> the output into a texture. Then, we just have WebGL give the user back that
> texture.

This is what I think. 

But for 2D canvas now, we will have to convert-and-readback to produce a RGB gfxASurface, where GPU convertion is faster than CPU on B2G but maybe not on desktop.
Duplicate of this bug: 679366
Depends on: 880114
(In reply to Chiajung Hung [:chiajung] from comment #5)
> But for 2D canvas now, we will have to convert-and-readback to produce a RGB
> gfxASurface, where GPU convertion is faster than CPU on B2G but maybe not on
> desktop.

Conversion is fill-rate limited, and should always[1] be faster on the GPU. The exception is if we're talking CPU-convert vs upload+GPU-Convert+readback, which will be slower. However, I don't think there's a situation where we want to make CPU-local conversions fast, but where a GPU-side conversion would not be better. The only one I can think of is for the CPU canvas2d case, but we should just not bother with this optimization for that. (It's going to be slow anyways)

The scope for this bug should only be optimizing texImage2D in WebGL.


That said, it might actually be easier to tackle accelerated texImage2D from <canvas>s first, though, since that's smaller in scope.
(In reply to Jeff Gilbert [:jgilbert] from comment #7)
> (In reply to Chiajung Hung [:chiajung] from comment #5)
> > But for 2D canvas now, we will have to convert-and-readback to produce a RGB
> > gfxASurface, where GPU convertion is faster than CPU on B2G but maybe not on
> > desktop.
> 
> Conversion is fill-rate limited, and should always[1] be faster on the GPU.
> The exception is if we're talking CPU-convert vs
> upload+GPU-Convert+readback, which will be slower. However, I don't think
> there's a situation where we want to make CPU-local conversions fast, but
> where a GPU-side conversion would not be better. The only one I can think of
> is for the CPU canvas2d case, but we should just not bother with this
> optimization for that. (It's going to be slow anyways)
> 
> The scope for this bug should only be optimizing texImage2D in WebGL.
> 
> 
> That said, it might actually be easier to tackle accelerated texImage2D from
> <canvas>s first, though, since that's smaller in scope.

Yes, I created another bug for video-to-canvas2d case. 

I think this bug should specific to video-to-webgl case. Howevet, I am not sure I understand why canvas-to-canvas case need any optimization. I will check the code later...

For webgl-to-webgl, I think its very simple if the GLContexts are shared context. If they are not, and on B2G with gralloc SurfaceStream, we may transfer the GraphicBuffer directly. On other platform, readback maybe inevitable.

For canvas2D-to-webgl, if skiaGL becomes the default backend, then the case is similar to webgl-to-webgl case. If software renderer is used, no color space conversion is needed at all, but texture upload is needed.
Whiteboard: webgl-perf
Attached patch WIP (obsolete) — Splinter Review
This patch tested on B2G devices thus GrallocImage path only.
Next version will contain the ability to handle Y-flip case, and test both normal YCbCrImage and GrallocImage on device.
Attachment #8393874 - Flags: feedback?(jgilbert)
Added detailed performance analysis of WebGL video to texture upload with theoretical on bug 880114. Should I quote it on this ticket too?
(In reply to Florian Bösch from comment #10)
> Added detailed performance analysis of WebGL video to texture upload with
> theoretical on bug 880114. Should I quote it on this ticket too?
It's OK for me, but to make things clear I think we should keep further video-to-webGL discussion here and video-to-skiaGL there.
Attached patch V1 (obsolete) — Splinter Review
Implement a fast path for video texture uploading for both GrallocImage and PlanarYCbCrImage with flip support.
Attachment #8393874 - Attachment is obsolete: true
Attachment #8393874 - Flags: feedback?(jgilbert)
Attachment #8394478 - Flags: review?(jgilbert)
Tested Firefox 29.0b1

Machine: OS: Ubuntu 13.04 kernel 3.8.0-35, Driver: Nvidia driver 331.20, GPU: GTX-780, CPU: Intel Core i7 4770S 3.1ghz

Full Test
=========

Program: http://codeflow.org/issues/slow_video_to_texture/

Stats: attached to the test page in historical

Result: no significant change to Firefox 28.0

Simple Test
===========

Program: http://codeflow.org/issues/slow_video_to_texture/simple.html

Stats:
  FPS: 45.24
  Upload: 9.11ms

Result: no significant change to Firefox 28.0

Conclusion
==========

Firefox still around 10x slower than theoretical best performance on this machine (as measured by http://codeflow.org/issues/slow_video_to_texture/video-test.zip)
(In reply to Chiajung Hung [:chiajung] from comment #12)
> Created attachment 8394478 [details] [diff] [review]
> V1
> 
> Implement a fast path for video texture uploading for both GrallocImage and
> PlanarYCbCrImage with flip support.

If I understand the patch correctly here's what it does each time webgl calls texImage2D with a video in planar yuv format:

1) create the required resources (a new program, framebuffer object, vertex buffer and 3 textures)
2) allocate the 3 textures and fill them with texImage2D (and allocate if required the destTex
3) attach the destTex to the framebuffer object
4) bind the fbo use the program and raw the vertices
5) destroy all the resources again

If so, this would be suboptimal because:

* createTexture/deleteTexture 3x each frame would be expensive, it's better to use texSubImage2D against established textures (this also avoids triggering allocations on the GPU as opposed to texImage2D on a persistent texture)
* compiling/linking a program each frame is expensive, it's better to use a persistent program and just call useProgram at the appropriate time
* creating/destroying a vertex buffer object with the same data each frame is expensive, it's better to use a persistent vertex buffer object and just call the appropriate bind/vertexAttribPointers (or VAO bind)
* creating/deleting a framebuffer object each time is expensive, it's better to use a persistent framebuffer object
* modifying a persistent framebuffer object each frame would be expensive, it's better to leave a once created framebuffer object unmodified and just switch between FBOs.


At a rough estimate I'd say the overhead created by doing the extra work of creating/setting up/destroying the resources each frame eclipses any gains made by the method (mileage may differ with different GPUs/drivers).

The difficulty introduced by keeping a persistent transfer state around would be how to know when to set it up and discard. I believe that could be solved however. A given <video> on a page will only require one transfer state (no matter how many contexts wish to read from it). Upon the first texImage2D(video) an appropriate transfer object could be created and subsequent calls utilize this objects setup resources to quickly perform the transfer. Each time a transfer object is used, it gets a timestamp of last use, if the client fails to use it for a set amount of time, it gets discarded.
(In reply to Florian Bösch from comment #14)
> (In reply to Chiajung Hung [:chiajung] from comment #12)
> > Created attachment 8394478 [details] [diff] [review]
> > V1
> > 
> > Implement a fast path for video texture uploading for both GrallocImage and
> > PlanarYCbCrImage with flip support.
> 
> If I understand the patch correctly here's what it does each time webgl
> calls texImage2D with a video in planar yuv format:
> 
> 1) create the required resources (a new program, framebuffer object, vertex
> buffer and 3 textures)
> 2) allocate the 3 textures and fill them with texImage2D (and allocate if
> required the destTex
> 3) attach the destTex to the framebuffer object
> 4) bind the fbo use the program and raw the vertices
> 5) destroy all the resources again
> 
> If so, this would be suboptimal because:
> 
> * createTexture/deleteTexture 3x each frame would be expensive, it's better
> to use texSubImage2D against established textures (this also avoids
> triggering allocations on the GPU as opposed to texImage2D on a persistent
> texture)
If this does matter, I think I can cache them or add a preference for this case, this is faster for firefoxOS devices (the GRALLOC_PLANAR_YCBCR path), which need only 1 texture object allocation(glGenTextures) and no texture allocation/upload.
> * compiling/linking a program each frame is expensive, it's better to use a
> persistent program and just call useProgram at the appropriate time
No, its cached.
Check InitTexQuadProgram, there is a early return which avoid VBO/program/attribute/uniform update. 
> * creating/destroying a vertex buffer object with the same data each frame
> is expensive, it's better to use a persistent vertex buffer object and just
> call the appropriate bind/vertexAttribPointers (or VAO bind)
They are cached, too.
> * creating/deleting a framebuffer object each time is expensive, it's better
> to use a persistent framebuffer object
> * modifying a persistent framebuffer object each frame would be expensive,
> it's better to leave a once created framebuffer object unmodified and just
> switch between FBOs.
Thanks for your detailed comment, I would like to do some real test on desktop firefox later to decide how to make it better. Because firefoxOS devices get some other trouble that I can not complete your benchmark on them. 
> 
> 
> At a rough estimate I'd say the overhead created by doing the extra work of
> creating/setting up/destroying the resources each frame eclipses any gains
> made by the method (mileage may differ with different GPUs/drivers).
> 
> The difficulty introduced by keeping a persistent transfer state around
> would be how to know when to set it up and discard. I believe that could be
> solved however. A given <video> on a page will only require one transfer
> state (no matter how many contexts wish to read from it). Upon the first
> texImage2D(video) an appropriate transfer object could be created and
> subsequent calls utilize this objects setup resources to quickly perform the
> transfer. Each time a transfer object is used, it gets a timestamp of last
> use, if the client fails to use it for a set amount of time, it gets
> discarded.
With this patch it in fact get faster even compared with Chrome(for high resolution(over 240p)), however, there is some render error I have to handle before add those GL object cache and detailed test.
It seems the rendering error comes from stride in the yuv frame, I will create a new version by following the code of Compositor:

http://dxr.mozilla.org/mozilla-central/source/gfx/layers/composite/TextureHost.cpp#551

I think the resultant performance may not that good since it may take time padding the input data for TexImage call.
Afaik planar yuv 420 uses a byte each for Y, U and V. I'm assuming the buffers the video decoder produced would be densely packed (what other information would there be?). So what's the stride?

Also if you've got trouble with some non dense/padded formats that would be slow to convert on the CPU, you can use a buffer texture with the format GL_R8 and then index the bytes in the buffer in a shader like so:

  uniform samplerBuffer source;
  ...
  float Y = texelFetch(source, offset)-0.0625;
  
Where offset is an integer.
Attachment #8394478 - Flags: review?(jgilbert) → feedback?(jgilbert)
Attached patch WIP V2 (obsolete) — Splinter Review
Fix stride problem, however, it makes the test: 

http://codeflow.org/issues/slow_video_to_texture/

black screen for second run. And a interesting thing is that the overhead makes 240p case slower. This patch do not contain hack for TexSubImage, which I will add later once I fix the problem on black screen in the test.
Attachment #8394478 - Attachment is obsolete: true
Attachment #8394478 - Flags: feedback?(jgilbert)
Attachment #8396830 - Flags: feedback?(jgilbert)
The result after fix black screen:
My Test build@GPU: nVidia Geforce 630GT CPU: Intel Core i7-3770
                         VP8 
          240p    480p    720p    1080p
texImage: *2.91*  3.35    3.90    6.13
SubImage: *0.18*  6.48    11.97   23.02

Where texImage is affected by the hack, SubImage is not. I think the first time shader compilation/link/VBO manipulation make the 240p average bad. Binary shader may help but I don't think it is a good idea.
(In reply to Chiajung Hung [:chiajung] from comment #20)
> I think the first
> time shader compilation/link/VBO manipulation make the 240p average bad.
The averages are computed from all frames. The framerate is mostly around 60fps, the video length is 10 seconds. Thus around 600 frames are considered for the average. If the first call to compile the shader is the problem, then that adds 2.73ms to the average. If the other 599 timings where 0.18ms, it would mean that the first one would have to be 1638ms. That seems unlikely.

Also test with the simple version http://codeflow.org/issues/slow_video_to_texture/simple.html
(In reply to Chiajung Hung [:chiajung] from comment #20)
>           240p    480p    720p    1080p
> texImage: *2.91*  3.35    3.90    6.13
> SubImage: *0.18*  6.48    11.97   23.02

At a guess these timings are still somewhere around 5-6x slower than required. It's likely most machines will manage a 1080p yuv420p upload at around 1ms (it's around 3mb).
(In reply to Florian Bösch from comment #22)
> (In reply to Chiajung Hung [:chiajung] from comment #20)
> >           240p    480p    720p    1080p
> > texImage: *2.91*  3.35    3.90    6.13
> > SubImage: *0.18*  6.48    11.97   23.02
> 
> At a guess these timings are still somewhere around 5-6x slower than
> required. It's likely most machines will manage a 1080p yuv420p upload at
> around 1ms (it's around 3mb).

1080p data upload in 1ms? I will log the time it costs for upload the padded YUV data later. And I will try to see whether GL objects cache helps.

Basically, 1080p data size in RGBA8 is 1920*1080*4~=8mb, for non-padded YUV it is 1920*1080*3/2~=3mb, however, the YUV data is 128 padded on my test build, so the real size should be 1920*1080+1024*1080*2~=4mb, and if we take 240p cost as the minimal cost incurred by those GL overheads(FBO manipulation, GL state manipulation, color convert rendering), it seems to me that to achieve 1080p at near 1ms quiet impossible.
Attached patch WIP V3 (obsolete) — Splinter Review
It seems GLobject cache helps the performace quiet much, the new result is quiet good. The only concern here is that if WebApp uses a large amount of FBO/texture such cache may frustrate it. Suppose a web app need a lot of VRAM, this patch will make it easier to hit OOM. Origin code can fall back to CPU version once GenTexture/GenFramebuffer/TexImage2D failed, but for now, I will have to check the OOM each time the web app calls create[Something], and tear down the cache when OOM occured then retry.

Test result with cache: 

                         Theora 
          240p    480p    720p    1080p
texImage: 0.33    0.49    0.79    1.56
SubImage: 0.57    2.77    6.00    10.90

For the 1080p case, the texImage2D for YUV textures cost me ~1.5ms:

frame size:(1920,1088): 2.99mb, upload size:(1952,1088): 3.04mb, push state: 0.021033, upload: 1.488709, restore: 0.063564, full: 1.691770 ms

where framesize is the video size, upload size is the frame size with padding, push state is the time used to store GL states, upload is the total time to upload yuv textures, restore is the time used to restore GL states, full means the total time of my function.

The conversion overhead is only near 0.2 ms for 1080p.
Attachment #8396830 - Attachment is obsolete: true
Attachment #8396830 - Flags: feedback?(jgilbert)
(In reply to Chiajung Hung [:chiajung] from comment #23)
> 1080p data upload in 1ms? I will log the time it costs for upload the padded
> YUV data later. And I will try to see whether GL objects cache helps.
> 
> Basically, 1080p data size in RGBA8 is 1920*1080*4~=8mb, for non-padded YUV
> it is 1920*1080*3/2~=3mb, however, the YUV data is 128 padded on my test
> build, so the real size should be 1920*1080+1024*1080*2~=4mb, and if we take
> 240p cost as the minimal cost incurred by those GL overheads(FBO
> manipulation, GL state manipulation, color convert rendering), it seems to
> me that to achieve 1080p at near 1ms quiet impossible.

I've ran upload tests for RGBA previously on my machine. The tests quoted on this comment https://code.google.com/p/chromium/issues/detail?id=91208#c48 .

Those are the results:

texSubImage2D RGBA 
	1080p: 2.43ms
	720p: 0.76ms
	480p: 0.32ms

double buffered PBO and memcpy RGBA
	1080p: 1.94ms
	720p: 0.86ms
	480p: 0.40ms

double buffered PBO and memcpy RGB
	1080p: 1.45ms (54% better)
	720p: 0.65ms (41% better)
	480p: 0.22ms (50% better)

Note that an RGBA frame in 1080p is 8mb and a 720p frame is 3.6mb. Every method tested managed to upload the 720p frame below 1ms. So I think that 3mb or 4mb of YUV data below 1ms is quite possible. On my machine it should be possible at around 0.85ms for a 4mb frame.
(In reply to Chiajung Hung [:chiajung] from comment #24)
> Created attachment 8397587 [details] [diff] [review]
> WIP V3

I think you're using texImage2D to do the upload. If that hits straight to the driver (and isn't transmoglified inbetween somewhere) it's quite possible that this is slower than texSubImage2D. A lot of drivers/GPUs will deallocate the previously existing texture and reallocate it upon texImage2D. For that reason you should only use texImage2D for setting up the size of a texture, and subsequent calls to fill it should use texSubImage2D. On more modern variants of GL you'd use the call glBufferStorage instead of texImage2D.
(In reply to Chiajung Hung [:chiajung] from comment #24)
> The only concern here is that if WebApp uses a large amount of
> FBO/texture such cache may frustrate it. Suppose a web app need a lot of
> VRAM, this patch will make it easier to hit OOM.

Something I just thought of, long as you use texImage2D/texSubImage2D you only need one transfer state per video size, for the entire browser. texImage2D is a render command queue blocking command, all previous commands will complete before this one is started, so that the driver can pick up the bytes from the client. That means you can service many WebGL contexts uploading from many say 1080p videos with the same allocated transfer state.

Note that the same would not be true of a double-buffered PBO approach, because that's explicitely trying to avoid blocking the render command queue (and has a 1-frame lag for that reason). a PBO would need one transfer state per video.
(In reply to Florian Bösch from comment #26)
> I think you're using texImage2D to do the upload. If that hits straight to
> the driver (and isn't transmoglified inbetween somewhere) it's quite
> possible that this is slower than texSubImage2D. A lot of drivers/GPUs will
> deallocate the previously existing texture and reallocate it upon
> texImage2D. For that reason you should only use texImage2D for setting up
> the size of a texture, and subsequent calls to fill it should use
> texSubImage2D. On more modern variants of GL you'd use the call
> glBufferStorage instead of texImage2D.

To use any function not exported by GLContext[1] requires huge amount of work which maybe not worthy to do.

To use PBO to avoid glTexImage blocking, it will need a huge amount of change in video related code, too. Since video tag are not always used as texture in the same process, I don't think it is a reasonable change to just enhance minor performance. (Though GrallocImage path is very similar to use PBO. However, it uses a external image upon a SharedMemory-like buffer)

[1]http://dxr.mozilla.org/mozilla-central/source/gfx/gl/GLContext.h#155
(In reply to Florian Bösch from comment #27)
> Something I just thought of, long as you use texImage2D/texSubImage2D you
> only need one transfer state per video size, for the entire browser.
> texImage2D is a render command queue blocking command, all previous commands
> will complete before this one is started, so that the driver can pick up the
> bytes from the client. That means you can service many WebGL contexts
> uploading from many say 1080p videos with the same allocated transfer state.

I will try to use glTexSubImage2D and see whether it helps. Basically I think the driver should make sure the existing buffer doesn't fit before re-allocation, but if it is not the case, checking that in my part is quiet OK for me. Thanks for the information.
Attached patch WIP V4 (obsolete) — Splinter Review
Fix a crash case:
http://people.mozilla.org/~bjacob/mdn_samples_webgl_sample8/index.html

Some GL driver crash on glDrawxxxxx calls, if you enable a vertex attribute but not use them in the shader program(even if the attribute bounded buffer object still valid)

This version uses glTexSubImage once the texture allocated(average 0.02 ms faster for 1080p case)
Make WebGLContext::TexSubImage2D go to GPU case, too.
Attachment #8397587 - Attachment is obsolete: true
Attached patch V1 (obsolete) — Splinter Review
Tested on leo devices, some note:

sw decoder: CPU path is the fastest, Gralloc/PlanarYCbCr path is very slow.
hw decoder: CPU/PlanarYCbCr path is very slow. Gralloc path is the best.

It seems the cache maintain cost incurred by GraphicBuffer::unlock is deffered until first reference and make EGLImageTargetTexture2D calls very slow for sw decoder path. And access hw decoded buffer by CPU is very slow, too. As a result, I decided to reject sw decoded buffer by check GraphicBuffer::usage. Since B2G devices I have can not play 720p mp4, I can not finish the benchmark.
Attachment #8398370 - Attachment is obsolete: true
Attachment #8402519 - Flags: review?(jgilbert)
No longer depends on: 880114
Blocks: 880114
Comment on attachment 8402519 [details] [diff] [review]
V1

Review of attachment 8402519 [details] [diff] [review]:
-----------------------------------------------------------------

Good first run, but let's clean this up a bit.
s/GLState/ScopedGLDrawState/, and move it to ScopedGLHelpers.h
Combine GLBlitImageHelper with GLBlitHelper::BlitTextureToFramebuffer, and name it GLDrawBlitHelper.

::: content/canvas/src/WebGLContext.cpp
@@ +1388,5 @@
> +{
> +    uint16_t readyState;
> +    if (NS_SUCCEEDED(video->GetReadyState(&readyState)) &&
> +        (readyState == nsIDOMHTMLMediaElement::HAVE_NOTHING ||
> +         readyState == nsIDOMHTMLMediaElement::HAVE_METADATA)) {

'{' on next line for multi-line conditionals.

@@ +1398,5 @@
> +    if (!principal) {
> +        return false;
> +    }
> +
> +    mozilla::layers::ImageContainer *container = video->GetImageContainer();

'*' to the left.

@@ +1406,5 @@
> +
> +    nsRefPtr<mozilla::layers::Image> srcImage = container->LockCurrentImage();
> +    WebGLTexture* tex = activeBoundTextureForTarget(target);
> +    GLuint destTex = tex->GLName();
> +    bool needAllocation = true;

`needsAllocation`

@@ +1408,5 @@
> +    WebGLTexture* tex = activeBoundTextureForTarget(target);
> +    GLuint destTex = tex->GLName();
> +    bool needAllocation = true;
> +    if (tex->HasImageInfoAt(target, 0)) {
> +        WebGLTexture::ImageInfo info = tex->ImageInfoAt(target, 0);

`const WebGLTexture::ImageInfo& info`

@@ +1409,5 @@
> +    GLuint destTex = tex->GLName();
> +    bool needAllocation = true;
> +    if (tex->HasImageInfoAt(target, 0)) {
> +        WebGLTexture::ImageInfo info = tex->ImageInfoAt(target, 0);
> +        needAllocation = info.Width() != srcImage->GetSize().width || info.Height() != srcImage->GetSize().height;

Put the 2nd half of this conditional on the next line.
foo = bar ||
      qux;

@@ +1414,5 @@
> +    }
> +    if (needAllocation) {
> +        gl->fTexImage2D(target, level, internalformat, srcImage->GetSize().width, srcImage->GetSize().height, 0, format, type, nullptr);
> +    }
> +    bool result = gl->BlitImageHelper()->BlitImageToTexture(srcImage.get(), srcImage->GetSize(), destTex, target, mPixelStoreFlipY);

This also needs to support these:
UNPACK_PREMULTIPLY_ALPHA_WEBGL
UNPACK_COLORSPACE_CONVERSION_WEBGL

For now, just bail if PREMULT_ALPHA is `false` or `COLORSPACE` is `LOCAL_GL_NONE`.

Also, use `ok` or `uploaded` instead of `result`.

@@ +1419,5 @@
> +    if (result) {
> +        TimeStamp st = TimeStamp::Now();
> +        tex->SetImageInfo(target, level, srcImage->GetSize().width, srcImage->GetSize().height, format, type, WebGLImageDataStatus::InitializedImageData);
> +        tex->Bind(target);
> +        printf_stderr("time: %.6f ms", (TimeStamp::Now()-st).ToMilliseconds());

Remove this debugging code. Also, this code doesn't appear to measure anything useful. Did you mean to put the `st` line before `if (needAllocation)`?

::: content/canvas/src/WebGLContext.h
@@ +428,5 @@
> +      if (IsContextLost())
> +          return;
> +
> +      mozilla::dom::HTMLVideoElement* video = mozilla::dom::HTMLVideoElement::FromContentOrNull(&elt);
> +      if (TexImageFromVideoElement(target, level, internalformat, format, type, video)) { // special case

Calling this func will crash if `video` is null.

@@ +475,5 @@
>          if (IsContextLost())
>              return;
> +
> +        mozilla::dom::HTMLVideoElement* video = mozilla::dom::HTMLVideoElement::FromContentOrNull(&elt);
> +        if (TexImageFromVideoElement(target, level, format, format, type, video)) { // special case

Calling this func will crash if `video` is null.

::: gfx/gl/GLBlitImageHelper.cpp
@@ +69,5 @@
> +                    colorMask[3]);
> +
> +    for (int i = 0; i < maxAttrib; i++) {
> +        if (attrib_enabled[i])
> +            mGL->fEnableVertexAttribArray(i);

Needs an else-DisableVertAttr(i) branch.

@@ +130,5 @@
> +
> +// Allowed to be destructive of state we restore in functions below.
> +bool
> +GLBlitImageHelper::InitTexQuadProgram(ImageFormat format)
> +{

This is a huge amount of duplicate code. We should do all this in one place.

I'm thinking that we should pull the draw-blit code out of GLBlitHelper, into a GLDrawBlitHelper, which should support the superset of GLBlitHelper's current formats, plus these new formats we want to add.
GLBlitHelper would then use GLDrawBlitHelper for the texture->FB path.

@@ +132,5 @@
> +bool
> +GLBlitImageHelper::InitTexQuadProgram(ImageFormat format)
> +{
> +    const char kTexBlit_VertShaderSource[] = "\
> +    attribute vec2 aPosition;                   \n\

Indent this by one:
const char foo[] = "\
  bar  \n\
  bat  \n\
";

@@ +135,5 @@
> +    const char kTexBlit_VertShaderSource[] = "\
> +    attribute vec2 aPosition;                   \n\
> +                                                \n\
> +    varying vec2 vTexCoord;                     \n\
> +    uniform float uYflip;                       \n\

Varyings should go after attributes and uniforms.

@@ +137,5 @@
> +                                                \n\
> +    varying vec2 vTexCoord;                     \n\
> +    uniform float uYflip;                       \n\
> +                                                \n\
> +    void main(void) {                           \n\

Base-scope function definitions should have { on its own line.

@@ +139,5 @@
> +    uniform float uYflip;                       \n\
> +                                                \n\
> +    void main(void) {                           \n\
> +        vTexCoord = aPosition;                  \n\
> +        vTexCoord.y = abs(vTexCoord.y - uYflip);\n\

We could also do this as:
vTexCoord.y = uYFlip + vTexCoord.y*(1 - 2*uYFlip);
Using `abs` is a cute way to do it. It might use a branch, but this is a vert shader, so this shouldn't be a problem.

@@ +148,5 @@
> +
> +    const char kTexExternalBlit_FragShaderSource[] = "\
> +    #extension GL_OES_EGL_image_external : require      \n\
> +    #ifdef GL_FRAGMENT_PRECISION_HIGH                   \n\
> +        precision highp float;                          \n\

highp isn't useful here, just use mediump unconditionally.

@@ +163,5 @@
> +    ";
> +
> +    const char kTexYUVPlanarBlit_FragShaderSource[] = "\
> +    #ifdef GL_FRAGMENT_PRECISION_HIGH                                   \n\
> +        precision highp float;                                          \n\

I don't think we need highp floats for these calculations. I believe mediump should be enough.

@@ +197,5 @@
> +    bool success = false;
> +
> +    GLuint *programPtr;
> +    GLuint *fragShaderPtr;
> +    const char* fragShaderSource;

Star to the left.

@@ +206,5 @@
> +        fragShaderPtr = &mTexExternalBlit_FragShader;
> +        fragShaderSource = kTexExternalBlit_FragShaderSource;
> +    } else
> +#endif
> +    if (format == ImageFormat::PLANAR_YCBCR){

Splitting `else if` across lines is really gross.
This should just be a `switch (format)`, with the GRALLOC case ifdef'd out.

@@ +226,5 @@
> +            break;
> +        }
> +
> +        if (!mTexBlit_Buffer) {
> +

Unnecessary newline.

@@ +250,5 @@
> +            mGL->fBufferData(LOCAL_GL_ARRAY_BUFFER, vertsSize, verts, LOCAL_GL_STATIC_DRAW);
> +        }
> +
> +        if (!mTexBlit_VertShader) {
> +

Unnecessary newline.

@@ +418,5 @@
> +    if (srcImage->GetFormat() == ImageFormat::GRALLOC_PLANAR_YCBCR) {
> +        GrallocImage* grImg = static_cast<GrallocImage*>(srcImage);
> +        android::sp<GraphicBuffer> buf = new GraphicBuffer((ANativeWindowBuffer*)grImg->GetNativeBuffer(), false);
> +        printf_stderr("buffer (%d(%d),%d):0x%x, usage: 0x%x", buf->width, buf->stride, buf->height, buf->format, buf->usage);
> +        if (buf->usage < GraphicBuffer::USAGE_HW_MASK) {

Less-than doesn't make sense with bitmasks.

@@ +421,5 @@
> +        printf_stderr("buffer (%d(%d),%d):0x%x, usage: 0x%x", buf->width, buf->stride, buf->height, buf->format, buf->usage);
> +        if (buf->usage < GraphicBuffer::USAGE_HW_MASK) {
> +            RestoreState();
> +            return false;
> +      }

Mis-indented }.

@@ +428,5 @@
> +
> +
> +    if (!mFBO)
> +        mGL->fGenFramebuffers(1, &mFBO);
> +    ScopedBindFramebuffer boundFB(mGL, mFBO);

Newline after non-bracketed ifs.

@@ +446,5 @@
> +
> +        EGLint attrs[] = {
> +                LOCAL_EGL_IMAGE_PRESERVED, LOCAL_EGL_TRUE,
> +                LOCAL_EGL_NONE, LOCAL_EGL_NONE
> +            };

This is indented too far.

FooT bar[] = {
  1,
  2
};

@@ +447,5 @@
> +        EGLint attrs[] = {
> +                LOCAL_EGL_IMAGE_PRESERVED, LOCAL_EGL_TRUE,
> +                LOCAL_EGL_NONE, LOCAL_EGL_NONE
> +            };
> +        EGLImage image = sEGLLibrary.fCreateImage(sEGLLibrary.Display(),

Can't `image` be null here?

@@ +454,5 @@
> +                                                  grallocImage->GetNativeBuffer(), attrs);
> +        if (!mSrcTex[0])
> +            mGL->fGenTextures(1, &mSrcTex[0]);
> +        // Since EXTERNAL_OES not exposed to canvas, no need to store origin value.
> +        mGL->fBindTexture(LOCAL_GL_TEXTURE_EXTERNAL_OES, mSrcTex[0]);

If we previously bound this texture to TEXTURE_2D, trying to bind it to TEXTURE_EXTERNAL will generate an INVALID_OPERATION.

@@ +458,5 @@
> +        mGL->fBindTexture(LOCAL_GL_TEXTURE_EXTERNAL_OES, mSrcTex[0]);
> +        mGL->fEGLImageTargetTexture2D(LOCAL_GL_TEXTURE_EXTERNAL_OES, image);
> +        mGL->fTexParameteri(LOCAL_GL_TEXTURE_EXTERNAL_OES, LOCAL_GL_TEXTURE_WRAP_S, LOCAL_GL_CLAMP_TO_EDGE);
> +        mGL->fTexParameteri(LOCAL_GL_TEXTURE_EXTERNAL_OES, LOCAL_GL_TEXTURE_WRAP_T, LOCAL_GL_CLAMP_TO_EDGE);
> +        mGL->fTexParameteri(LOCAL_GL_TEXTURE_EXTERNAL_OES, LOCAL_GL_TEXTURE_MAG_FILTER, LOCAL_GL_LINEAR);

If our blit is 1-1, don't we want to use GL_NEAREST?

@@ +460,5 @@
> +        mGL->fTexParameteri(LOCAL_GL_TEXTURE_EXTERNAL_OES, LOCAL_GL_TEXTURE_WRAP_S, LOCAL_GL_CLAMP_TO_EDGE);
> +        mGL->fTexParameteri(LOCAL_GL_TEXTURE_EXTERNAL_OES, LOCAL_GL_TEXTURE_WRAP_T, LOCAL_GL_CLAMP_TO_EDGE);
> +        mGL->fTexParameteri(LOCAL_GL_TEXTURE_EXTERNAL_OES, LOCAL_GL_TEXTURE_MAG_FILTER, LOCAL_GL_LINEAR);
> +        mGL->fTexParameteri(LOCAL_GL_TEXTURE_EXTERNAL_OES, LOCAL_GL_TEXTURE_MIN_FILTER, LOCAL_GL_LINEAR);
> +        mGL->fUniform1f(mGL->fGetUniformLocation(mTexExternalBlit_Program, "uYflip"), (float)yFlip);

Cache the location.

@@ +485,5 @@
> +        }
> +        int width[3] = {yuvData->mYStride, yuvData->mCbCrStride, yuvData->mCbCrStride};
> +        int height[3] = {yuvData->mYSize.height, yuvData->mCbCrSize.height, yuvData->mCbCrSize.height};
> +        uint8_t *data[3] = {yuvData->mYChannel, yuvData->mCbChannel, yuvData->mCrChannel};
> +        for (int i = 0; i < 3; i++) {

Pull this loop out into a static helper func. Then we can do:
void Foo(texUnit, width, height, data);
Foo(0, yuvData->mYStride, yuvData->mYSize.height, yuvData->mYChannel);
Foo(1, yuvData->mCbCrStride, yuvData->mCbCrSize.height, yuvData->mCbChannel);
Foo(2, yuvData->mCbCrStride, yuvData->mCbCrSize.height, yuvData->mCrChannel);

@@ +489,5 @@
> +        for (int i = 0; i < 3; i++) {
> +            mGL->fActiveTexture(LOCAL_GL_TEXTURE0 + i);
> +
> +            mGL->fGetIntegerv(LOCAL_GL_TEXTURE_BINDING_2D, &oldTex[i]);
> +            mGL->fGetTexParameteriv(LOCAL_GL_TEXTURE_2D, LOCAL_GL_TEXTURE_WRAP_S, &oldTexParams[i][0]);

TexParameters should be operating on the GL Texture, not the GL Texture Binding Point, so these values are save restored as part of rebinding the old texture.

@@ +521,5 @@
> +            mGL->fTexParameteri(LOCAL_GL_TEXTURE_2D, LOCAL_GL_TEXTURE_WRAP_T, LOCAL_GL_CLAMP_TO_EDGE);
> +            mGL->fTexParameteri(LOCAL_GL_TEXTURE_2D, LOCAL_GL_TEXTURE_MAG_FILTER, LOCAL_GL_LINEAR);
> +            mGL->fTexParameteri(LOCAL_GL_TEXTURE_2D, LOCAL_GL_TEXTURE_MIN_FILTER, LOCAL_GL_LINEAR);
> +        }
> +        mGL->fUniform1f(mGL->fGetUniformLocation(mTexYUVPlanarBlit_Program, "uYflip"), (float)yFlip);

Cache uniform locations.

::: gfx/gl/GLBlitImageHelper.h
@@ +24,5 @@
> +namespace gl {
> +
> +class GLContext;
> +
> +struct GLState {

`ScopedGLDrawState`
Also, this should either be moved to ScopedGLHelpers.h or into this header's CPP file.

@@ +25,5 @@
> +
> +class GLContext;
> +
> +struct GLState {
> +    GLState(GLContext *gl);

Star to the left.

@@ +43,5 @@
> +    ScopedGLState scissor;
> +    ScopedGLState stencil;
> +
> +    GLuint maxAttrib;
> +    GLint* attrib_enabled;

ScopedDeleteArray<GLint>

@@ +54,5 @@
> +
> +    realGLboolean colorMask[4];
> +    GLint viewport[4];
> +    GLint scissorBox[4];
> +    GLContext *mGL;

GLContext* const

@@ +60,5 @@
> +
> +class GLBlitImageHelper {
> +public:
> +    GLBlitImageHelper(GLContext* gl);
> +    virtual ~GLBlitImageHelper();

I don't think this needs a virtual dtor.

@@ +67,5 @@
> +        GLuint destTex, GLenum destTarget, bool yFlip = 0, GLuint xoffset = 0,
> +        GLuint yoffset = 0, GLuint width = 0, GLuint height = 0);
> +
> +protected:
> +    GLState* mStoredState;

ScopedDeletePtr<GLState>

@@ +68,5 @@
> +        GLuint yoffset = 0, GLuint width = 0, GLuint height = 0);
> +
> +protected:
> +    GLState* mStoredState;
> +    GLContext* mGL;

`GLContext* const`

@@ +77,5 @@
> +    GLuint mTexExternalBlit_Program;
> +    GLuint mTexYUVPlanarBlit_Program;
> +
> +    GLuint mFBO;
> +    GLuint mSrcTex[3];

Name these texture explicitly instead of having an array.

@@ +81,5 @@
> +    GLuint mSrcTex[3];
> +    int mTexWidth;
> +    int mTexHeight;
> +
> +    void PushState();

'SaveState', since we're not Pushing onto a stack.

@@ +82,5 @@
> +    int mTexWidth;
> +    int mTexHeight;
> +
> +    void PushState();
> +    void RestoreState();

It looks like these two funcs are only used internally to Blit(). Blit should just RAII-create a GLState manually.

::: gfx/gl/GLContext.cpp
@@ +1191,4 @@
>      const bool firstRun = false;
>  #endif
>  
> +    InitializeExtensionsBitSet(mAvailableExtensions, extensions, sExtensionNames, 1);//firstRun && DebugMode());

This looks like a development-only change, right?
Attachment #8402519 - Flags: review?(jgilbert) → review-
No longer blocks: 880114
Comment on attachment 8402519 [details] [diff] [review]
V1

Review of attachment 8402519 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for your detailed review! I fixed all of them and will upload v2 later.

::: content/canvas/src/WebGLContext.cpp
@@ +1388,5 @@
> +{
> +    uint16_t readyState;
> +    if (NS_SUCCEEDED(video->GetReadyState(&readyState)) &&
> +        (readyState == nsIDOMHTMLMediaElement::HAVE_NOTHING ||
> +         readyState == nsIDOMHTMLMediaElement::HAVE_METADATA)) {

Done.

@@ +1398,5 @@
> +    if (!principal) {
> +        return false;
> +    }
> +
> +    mozilla::layers::ImageContainer *container = video->GetImageContainer();

Done.

@@ +1406,5 @@
> +
> +    nsRefPtr<mozilla::layers::Image> srcImage = container->LockCurrentImage();
> +    WebGLTexture* tex = activeBoundTextureForTarget(target);
> +    GLuint destTex = tex->GLName();
> +    bool needAllocation = true;

Done.

@@ +1408,5 @@
> +    WebGLTexture* tex = activeBoundTextureForTarget(target);
> +    GLuint destTex = tex->GLName();
> +    bool needAllocation = true;
> +    if (tex->HasImageInfoAt(target, 0)) {
> +        WebGLTexture::ImageInfo info = tex->ImageInfoAt(target, 0);

Done.

@@ +1409,5 @@
> +    GLuint destTex = tex->GLName();
> +    bool needAllocation = true;
> +    if (tex->HasImageInfoAt(target, 0)) {
> +        WebGLTexture::ImageInfo info = tex->ImageInfoAt(target, 0);
> +        needAllocation = info.Width() != srcImage->GetSize().width || info.Height() != srcImage->GetSize().height;

Done.

@@ +1414,5 @@
> +    }
> +    if (needAllocation) {
> +        gl->fTexImage2D(target, level, internalformat, srcImage->GetSize().width, srcImage->GetSize().height, 0, format, type, nullptr);
> +    }
> +    bool result = gl->BlitImageHelper()->BlitImageToTexture(srcImage.get(), srcImage->GetSize(), destTex, target, mPixelStoreFlipY);

YUV content is opaque(No matter premultiply or not, the result is the same), and video tag should always provide YUV layers::Image(Even if they are not, I will return ok = false), so I think PREMULT_ALPHA is useless here. Please correct me, if I am wrong.

By spec(https://www.khronos.org/registry/webgl/specs/1.0/#PIXEL_STORAGE_PARAMETERS), COLOR_SPACE_CONVERSION affect ImageElement only, right?

And while call TexImage2D with video, the js side always provide a format enum which can not be YUV.

@@ +1419,5 @@
> +    if (result) {
> +        TimeStamp st = TimeStamp::Now();
> +        tex->SetImageInfo(target, level, srcImage->GetSize().width, srcImage->GetSize().height, format, type, WebGLImageDataStatus::InitializedImageData);
> +        tex->Bind(target);
> +        printf_stderr("time: %.6f ms", (TimeStamp::Now()-st).ToMilliseconds());

Done. I put it here intented, since I think deferred GPU may cause implicit glFlush and glFinish and ruin the performance here.

::: content/canvas/src/WebGLContext.h
@@ +428,5 @@
> +      if (IsContextLost())
> +          return;
> +
> +      mozilla::dom::HTMLVideoElement* video = mozilla::dom::HTMLVideoElement::FromContentOrNull(&elt);
> +      if (TexImageFromVideoElement(target, level, internalformat, format, type, video)) { // special case

Fixed in cpp part to reject nullptr.

@@ +475,5 @@
>          if (IsContextLost())
>              return;
> +
> +        mozilla::dom::HTMLVideoElement* video = mozilla::dom::HTMLVideoElement::FromContentOrNull(&elt);
> +        if (TexImageFromVideoElement(target, level, format, format, type, video)) { // special case

Fixed in cpp part.

::: gfx/gl/GLBlitImageHelper.cpp
@@ +69,5 @@
> +                    colorMask[3]);
> +
> +    for (int i = 0; i < maxAttrib; i++) {
> +        if (attrib_enabled[i])
> +            mGL->fEnableVertexAttribArray(i);

Done.

@@ +130,5 @@
> +
> +// Allowed to be destructive of state we restore in functions below.
> +bool
> +GLBlitImageHelper::InitTexQuadProgram(ImageFormat format)
> +{

Merged into single GLBlitHelper. Done.

@@ +132,5 @@
> +bool
> +GLBlitImageHelper::InitTexQuadProgram(ImageFormat format)
> +{
> +    const char kTexBlit_VertShaderSource[] = "\
> +    attribute vec2 aPosition;                   \n\

Done.

@@ +135,5 @@
> +    const char kTexBlit_VertShaderSource[] = "\
> +    attribute vec2 aPosition;                   \n\
> +                                                \n\
> +    varying vec2 vTexCoord;                     \n\
> +    uniform float uYflip;                       \n\

Done.

@@ +137,5 @@
> +                                                \n\
> +    varying vec2 vTexCoord;                     \n\
> +    uniform float uYflip;                       \n\
> +                                                \n\
> +    void main(void) {                           \n\

Done.

@@ +139,5 @@
> +    uniform float uYflip;                       \n\
> +                                                \n\
> +    void main(void) {                           \n\
> +        vTexCoord = aPosition;                  \n\
> +        vTexCoord.y = abs(vTexCoord.y - uYflip);\n\

To use this, the texture wrap mode must be GL_REAPEAT(if uYflip is 1.0, the range of vTexCoord.y becomes [-1, 0]) which eglImage do not support.

@@ +148,5 @@
> +
> +    const char kTexExternalBlit_FragShaderSource[] = "\
> +    #extension GL_OES_EGL_image_external : require      \n\
> +    #ifdef GL_FRAGMENT_PRECISION_HIGH                   \n\
> +        precision highp float;                          \n\

Done.

@@ +163,5 @@
> +    ";
> +
> +    const char kTexYUVPlanarBlit_FragShaderSource[] = "\
> +    #ifdef GL_FRAGMENT_PRECISION_HIGH                                   \n\
> +        precision highp float;                                          \n\

Done.

@@ +197,5 @@
> +    bool success = false;
> +
> +    GLuint *programPtr;
> +    GLuint *fragShaderPtr;
> +    const char* fragShaderSource;

Done.

@@ +206,5 @@
> +        fragShaderPtr = &mTexExternalBlit_FragShader;
> +        fragShaderSource = kTexExternalBlit_FragShaderSource;
> +    } else
> +#endif
> +    if (format == ImageFormat::PLANAR_YCBCR){

Done.

@@ +226,5 @@
> +            break;
> +        }
> +
> +        if (!mTexBlit_Buffer) {
> +

Done.

@@ +250,5 @@
> +            mGL->fBufferData(LOCAL_GL_ARRAY_BUFFER, vertsSize, verts, LOCAL_GL_STATIC_DRAW);
> +        }
> +
> +        if (!mTexBlit_VertShader) {
> +

Done.

@@ +418,5 @@
> +    if (srcImage->GetFormat() == ImageFormat::GRALLOC_PLANAR_YCBCR) {
> +        GrallocImage* grImg = static_cast<GrallocImage*>(srcImage);
> +        android::sp<GraphicBuffer> buf = new GraphicBuffer((ANativeWindowBuffer*)grImg->GetNativeBuffer(), false);
> +        printf_stderr("buffer (%d(%d),%d):0x%x, usage: 0x%x", buf->width, buf->stride, buf->height, buf->format, buf->usage);
> +        if (buf->usage < GraphicBuffer::USAGE_HW_MASK) {

Since sw-codec uses GrallocImages now on B2G, I have to find a way filter out slow path, this is the simplest I can found.
Will find a better statement here later.
Fixed.

@@ +421,5 @@
> +        printf_stderr("buffer (%d(%d),%d):0x%x, usage: 0x%x", buf->width, buf->stride, buf->height, buf->format, buf->usage);
> +        if (buf->usage < GraphicBuffer::USAGE_HW_MASK) {
> +            RestoreState();
> +            return false;
> +      }

Done.

@@ +428,5 @@
> +
> +
> +    if (!mFBO)
> +        mGL->fGenFramebuffers(1, &mFBO);
> +    ScopedBindFramebuffer boundFB(mGL, mFBO);

Done.

@@ +446,5 @@
> +
> +        EGLint attrs[] = {
> +                LOCAL_EGL_IMAGE_PRESERVED, LOCAL_EGL_TRUE,
> +                LOCAL_EGL_NONE, LOCAL_EGL_NONE
> +            };

Done.

@@ +447,5 @@
> +        EGLint attrs[] = {
> +                LOCAL_EGL_IMAGE_PRESERVED, LOCAL_EGL_TRUE,
> +                LOCAL_EGL_NONE, LOCAL_EGL_NONE
> +            };
> +        EGLImage image = sEGLLibrary.fCreateImage(sEGLLibrary.Display(),

Done.

@@ +454,5 @@
> +                                                  grallocImage->GetNativeBuffer(), attrs);
> +        if (!mSrcTex[0])
> +            mGL->fGenTextures(1, &mSrcTex[0]);
> +        // Since EXTERNAL_OES not exposed to canvas, no need to store origin value.
> +        mGL->fBindTexture(LOCAL_GL_TEXTURE_EXTERNAL_OES, mSrcTex[0]);

Now all of them not share texture names. No worry about this.

@@ +458,5 @@
> +        mGL->fBindTexture(LOCAL_GL_TEXTURE_EXTERNAL_OES, mSrcTex[0]);
> +        mGL->fEGLImageTargetTexture2D(LOCAL_GL_TEXTURE_EXTERNAL_OES, image);
> +        mGL->fTexParameteri(LOCAL_GL_TEXTURE_EXTERNAL_OES, LOCAL_GL_TEXTURE_WRAP_S, LOCAL_GL_CLAMP_TO_EDGE);
> +        mGL->fTexParameteri(LOCAL_GL_TEXTURE_EXTERNAL_OES, LOCAL_GL_TEXTURE_WRAP_T, LOCAL_GL_CLAMP_TO_EDGE);
> +        mGL->fTexParameteri(LOCAL_GL_TEXTURE_EXTERNAL_OES, LOCAL_GL_TEXTURE_MAG_FILTER, LOCAL_GL_LINEAR);

Done. Originally I want to extend this function to support crop/scale/rotate case for Bug 880114. However, I gave up.

@@ +460,5 @@
> +        mGL->fTexParameteri(LOCAL_GL_TEXTURE_EXTERNAL_OES, LOCAL_GL_TEXTURE_WRAP_S, LOCAL_GL_CLAMP_TO_EDGE);
> +        mGL->fTexParameteri(LOCAL_GL_TEXTURE_EXTERNAL_OES, LOCAL_GL_TEXTURE_WRAP_T, LOCAL_GL_CLAMP_TO_EDGE);
> +        mGL->fTexParameteri(LOCAL_GL_TEXTURE_EXTERNAL_OES, LOCAL_GL_TEXTURE_MAG_FILTER, LOCAL_GL_LINEAR);
> +        mGL->fTexParameteri(LOCAL_GL_TEXTURE_EXTERNAL_OES, LOCAL_GL_TEXTURE_MIN_FILTER, LOCAL_GL_LINEAR);
> +        mGL->fUniform1f(mGL->fGetUniformLocation(mTexExternalBlit_Program, "uYflip"), (float)yFlip);

Done.

@@ +485,5 @@
> +        }
> +        int width[3] = {yuvData->mYStride, yuvData->mCbCrStride, yuvData->mCbCrStride};
> +        int height[3] = {yuvData->mYSize.height, yuvData->mCbCrSize.height, yuvData->mCbCrSize.height};
> +        uint8_t *data[3] = {yuvData->mYChannel, yuvData->mCbChannel, yuvData->mCrChannel};
> +        for (int i = 0; i < 3; i++) {

Done.

@@ +489,5 @@
> +        for (int i = 0; i < 3; i++) {
> +            mGL->fActiveTexture(LOCAL_GL_TEXTURE0 + i);
> +
> +            mGL->fGetIntegerv(LOCAL_GL_TEXTURE_BINDING_2D, &oldTex[i]);
> +            mGL->fGetTexParameteriv(LOCAL_GL_TEXTURE_2D, LOCAL_GL_TEXTURE_WRAP_S, &oldTexParams[i][0]);

Done.

@@ +521,5 @@
> +            mGL->fTexParameteri(LOCAL_GL_TEXTURE_2D, LOCAL_GL_TEXTURE_WRAP_T, LOCAL_GL_CLAMP_TO_EDGE);
> +            mGL->fTexParameteri(LOCAL_GL_TEXTURE_2D, LOCAL_GL_TEXTURE_MAG_FILTER, LOCAL_GL_LINEAR);
> +            mGL->fTexParameteri(LOCAL_GL_TEXTURE_2D, LOCAL_GL_TEXTURE_MIN_FILTER, LOCAL_GL_LINEAR);
> +        }
> +        mGL->fUniform1f(mGL->fGetUniformLocation(mTexYUVPlanarBlit_Program, "uYflip"), (float)yFlip);

Done.

::: gfx/gl/GLBlitImageHelper.h
@@ +24,5 @@
> +namespace gl {
> +
> +class GLContext;
> +
> +struct GLState {

Moved to ScopedGLHelpers.h

@@ +25,5 @@
> +
> +class GLContext;
> +
> +struct GLState {
> +    GLState(GLContext *gl);

Done.

@@ +43,5 @@
> +    ScopedGLState scissor;
> +    ScopedGLState stencil;
> +
> +    GLuint maxAttrib;
> +    GLint* attrib_enabled;

Done.

@@ +54,5 @@
> +
> +    realGLboolean colorMask[4];
> +    GLint viewport[4];
> +    GLint scissorBox[4];
> +    GLContext *mGL;

Done.

@@ +60,5 @@
> +
> +class GLBlitImageHelper {
> +public:
> +    GLBlitImageHelper(GLContext* gl);
> +    virtual ~GLBlitImageHelper();

Done.

@@ +67,5 @@
> +        GLuint destTex, GLenum destTarget, bool yFlip = 0, GLuint xoffset = 0,
> +        GLuint yoffset = 0, GLuint width = 0, GLuint height = 0);
> +
> +protected:
> +    GLState* mStoredState;

Removed.

@@ +68,5 @@
> +        GLuint yoffset = 0, GLuint width = 0, GLuint height = 0);
> +
> +protected:
> +    GLState* mStoredState;
> +    GLContext* mGL;

Removed.

@@ +77,5 @@
> +    GLuint mTexExternalBlit_Program;
> +    GLuint mTexYUVPlanarBlit_Program;
> +
> +    GLuint mFBO;
> +    GLuint mSrcTex[3];

Done.

@@ +81,5 @@
> +    GLuint mSrcTex[3];
> +    int mTexWidth;
> +    int mTexHeight;
> +
> +    void PushState();

Removed.

@@ +82,5 @@
> +    int mTexWidth;
> +    int mTexHeight;
> +
> +    void PushState();
> +    void RestoreState();

Removed.

::: gfx/gl/GLContext.cpp
@@ +1191,4 @@
>      const bool firstRun = false;
>  #endif
>  
> +    InitializeExtensionsBitSet(mAvailableExtensions, extensions, sExtensionNames, 1);//firstRun && DebugMode());

Reverted.
Attached patch V2 (obsolete) — Splinter Review
Patch v2 based on last review.
Slightly refactoring GLBlitHelper, too.
Attachment #8402519 - Attachment is obsolete: true
Attachment #8408788 - Flags: feedback?(pchang)
Attached patch V2 (obsolete) — Splinter Review
Wrong file uploaded... sorry
Attachment #8408788 - Attachment is obsolete: true
Attachment #8408788 - Flags: feedback?(pchang)
Attachment #8408814 - Flags: feedback?(pchang)
Comment on attachment 8408814 [details] [diff] [review]
V2

Review of attachment 8408814 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/canvas/src/WebGLContext.cpp
@@ +1386,5 @@
> +                              GLenum internalformat, GLenum format, GLenum type,
> +                              mozilla::dom::HTMLVideoElement* video)
> +{
> +    uint16_t readyState;
> +    if (!video)

I think you can check video element with the following way, then no need to include HTMLVideoElement in WebGLContext.h
http://dxr.mozilla.org/mozilla-central/source/content/canvas/src/CanvasRenderingContext2D.cpp#3229

::: content/canvas/src/WebGLContext.h
@@ +18,5 @@
>  #include "nsIDOMWebGLRenderingContext.h"
>  #include "nsICanvasRenderingContextInternal.h"
>  #include "mozilla/dom/HTMLCanvasElement.h"
> +#include "mozilla/dom/HTMLVideoElement.h"
> +#include "ImageContainer.h"

no need to include ImageContainer.h here

@@ +430,5 @@
> +
> +      mozilla::dom::HTMLVideoElement* video = mozilla::dom::HTMLVideoElement::FromContentOrNull(&elt);
> +      if (TexImageFromVideoElement(target, level, internalformat, format, type, video)) { // special case
> +          return;
> +      }

Please pass element directly to TexImageFromVideoElement, then we don't need to include HTMLVideoElement.h here.
And you also need to change the function name.

::: content/canvas/src/WebGLContextGL.cpp
@@ +281,4 @@
>      }
>  
>      *currentTexPtr = newTex;
> +    mCurrentTextureTarget = target;

Why to have mCurrentTextureTarget?

::: gfx/gl/GLBlitHelper.cpp
@@ +11,5 @@
> +#include "ImageContainer.h"
> +#ifdef MOZ_WIDGET_GONK
> +#include "GrallocImages.h"
> +#include "GLLibraryEGL.h"
> +#include <cutils/properties.h>

no need for properties.h

@@ +43,4 @@
>  
>  GLuint
>  CreateTexture(GLContext* aGL, GLenum aInternalFormat, GLenum aFormat,
> +              GLenum aType, const gfx::IntSize& aSize, bool filter)

nit: change filter to useNearestFilter

@@ +411,5 @@
> +            case BlitTex2D:
> +            case BlitTexRect:
> +            case ConvertGralloc:
> +            {
> +                MOZ_ASSERT(mGL->fGetAttribLocation(program, "aPosition") == 0);

no need for ConvertPlanarYCbCr?

@@ +569,5 @@
>  
>  void
> +GLBlitHelper::BindAndUploadYUVTexture(Channel which, uint32_t width, uint32_t height, void* data, bool allocation)
> +{
> +    GLuint* SrcTex[3] = {&mSrcTexY, &mSrcTexCb, &mSrcTexCr};

Add ASSERT when which is bigger than 3

@@ +572,5 @@
> +{
> +    GLuint* SrcTex[3] = {&mSrcTexY, &mSrcTexCb, &mSrcTexCr};
> +    if (!*SrcTex[which]) {
> +        *SrcTex[which] = CreateTexture(mGL, LOCAL_GL_LUMINANCE, LOCAL_GL_LUMINANCE, LOCAL_GL_UNSIGNED_BYTE,
> +                                       gfx::IntSize(width, height), false);

Where did you delete the texture?
And I didn't see the reason to keep mSrcTexY/mSrcTexCb/mSrcTexCr.
You can just call

GLuint tex = CreateTexture(...)
mGL->fActiveTexture(LOCAL_GL_TEXTURE0 + which);
mGL->fBindTexture(LOCAL_GL_TEXTURE_2D, tex);

@@ +629,5 @@
> +    if (srcImage->GetFormat() == ImageFormat::GRALLOC_PLANAR_YCBCR) {
> +        GrallocImage* grImg = static_cast<GrallocImage*>(srcImage);
> +        android::sp<GraphicBuffer> buf = new GraphicBuffer((ANativeWindowBuffer*)grImg->GetNativeBuffer(), false);
> +        // Filter out software decode frames
> +        int swUsageBit = GraphicBuffer::USAGE_SOFTWARE_MASK | GRALLOC_USAGE_HW_TEXTURE;

I think you can get the usage flag from GrallocTexutreClient by adding getUsage func.
Then you don't need to expose GraphicBuffer to this helper.

http://dxr.mozilla.org/mozilla-central/source/gfx/layers/opengl/GrallocTextureClient.cpp#302

For flag checking, use the following one is more clear.
(bug->usage && USAGE_SOFTWARE_MASK || buf->usage && USAGE_HW_TEXTURE)

BTW, why you need to filter USAGE_HW_TEXTURE one?

@@ +668,5 @@
> +            return false;
> +        }
> +
> +        if (!mSrcTexEGL)
> +            mGL->fGenTextures(1, &mSrcTexEGL);

When to free mSrcTexEGL?

::: gfx/gl/GLBlitHelper.h
@@ +79,5 @@
> +    {
> +        BlitTex2D,
> +        BlitTexRect,
> +        ConvertGralloc,
> +        ConvertPlanarYCbCr,

Please add comment for each blit type.
Status: NEW → ASSIGNED
Comment on attachment 8408814 [details] [diff] [review]
V2

Review of attachment 8408814 [details] [diff] [review]:
-----------------------------------------------------------------

I will update new version later.

::: content/canvas/src/WebGLContext.cpp
@@ +1386,5 @@
> +                              GLenum internalformat, GLenum format, GLenum type,
> +                              mozilla::dom::HTMLVideoElement* video)
> +{
> +    uint16_t readyState;
> +    if (!video)

That's not feasible here, since we use template here rather than an union class

::: content/canvas/src/WebGLContext.h
@@ +18,5 @@
>  #include "nsIDOMWebGLRenderingContext.h"
>  #include "nsICanvasRenderingContextInternal.h"
>  #include "mozilla/dom/HTMLCanvasElement.h"
> +#include "mozilla/dom/HTMLVideoElement.h"
> +#include "ImageContainer.h"

Done.

@@ +430,5 @@
> +
> +      mozilla::dom::HTMLVideoElement* video = mozilla::dom::HTMLVideoElement::FromContentOrNull(&elt);
> +      if (TexImageFromVideoElement(target, level, internalformat, format, type, video)) { // special case
> +          return;
> +      }

Done.

::: content/canvas/src/WebGLContextGL.cpp
@@ +281,4 @@
>      }
>  
>      *currentTexPtr = newTex;
> +    mCurrentTextureTarget = target;

Removed.

::: gfx/gl/GLBlitHelper.cpp
@@ +11,5 @@
> +#include "ImageContainer.h"
> +#ifdef MOZ_WIDGET_GONK
> +#include "GrallocImages.h"
> +#include "GLLibraryEGL.h"
> +#include <cutils/properties.h>

Removed.

@@ +43,4 @@
>  
>  GLuint
>  CreateTexture(GLContext* aGL, GLenum aInternalFormat, GLenum aFormat,
> +              GLenum aType, const gfx::IntSize& aSize, bool filter)

Done.

@@ +411,5 @@
> +            case BlitTex2D:
> +            case BlitTexRect:
> +            case ConvertGralloc:
> +            {
> +                MOZ_ASSERT(mGL->fGetAttribLocation(program, "aPosition") == 0);

Fixed.

@@ +569,5 @@
>  
>  void
> +GLBlitHelper::BindAndUploadYUVTexture(Channel which, uint32_t width, uint32_t height, void* data, bool allocation)
> +{
> +    GLuint* SrcTex[3] = {&mSrcTexY, &mSrcTexCb, &mSrcTexCr};

Trying to do that generate compiler error, since Channel has only 3 valid values.

@@ +572,5 @@
> +{
> +    GLuint* SrcTex[3] = {&mSrcTexY, &mSrcTexCb, &mSrcTexCr};
> +    if (!*SrcTex[which]) {
> +        *SrcTex[which] = CreateTexture(mGL, LOCAL_GL_LUMINANCE, LOCAL_GL_LUMINANCE, LOCAL_GL_UNSIGNED_BYTE,
> +                                       gfx::IntSize(width, height), false);

Leak fixed.
For better explain which channel it operates.
And I think this is better than 

switch (which)
{
    case Channel_Y:
    {
        mGL->fBindTexture(LOCAL_GL_TEXTURE_2D, mSrcTexY);
        mSrcTexY = CreateTexture(...);
        break;
    }
    case Channel_Cb:
    {
        mGL->fBindTexture(LOCAL_GL_TEXTURE_2D, mSrcTexCb};
        mSrcTexCr = CreateTexture(...);
        break;
    }
    ......
}

And it is better than
GLuint* tex;
switch (which)
{
    case Channel_Y;
        tex = &mSrcTexY;
        break;
    ......
}

@@ +629,5 @@
> +    if (srcImage->GetFormat() == ImageFormat::GRALLOC_PLANAR_YCBCR) {
> +        GrallocImage* grImg = static_cast<GrallocImage*>(srcImage);
> +        android::sp<GraphicBuffer> buf = new GraphicBuffer((ANativeWindowBuffer*)grImg->GetNativeBuffer(), false);
> +        // Filter out software decode frames
> +        int swUsageBit = GraphicBuffer::USAGE_SOFTWARE_MASK | GRALLOC_USAGE_HW_TEXTURE;

Done.

@@ +668,5 @@
> +            return false;
> +        }
> +
> +        if (!mSrcTexEGL)
> +            mGL->fGenTextures(1, &mSrcTexEGL);

Fixed.

::: gfx/gl/GLBlitHelper.h
@@ +79,5 @@
> +    {
> +        BlitTex2D,
> +        BlitTexRect,
> +        ConvertGralloc,
> +        ConvertPlanarYCbCr,

Done.
Attachment #8408814 - Flags: feedback?(pchang)
Attached patch V3 (obsolete) — Splinter Review
Attachment #8408814 - Attachment is obsolete: true
Attachment #8409996 - Flags: feedback?(pchang)
Comment on attachment 8409996 [details] [diff] [review]
V3

Review of attachment 8409996 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/canvas/src/WebGLContext.h
@@ +426,5 @@
>      {
> +      if (IsContextLost())
> +          return;
> +
> +      // special case, fast way

please add more clear comment to describe 'special case'

@@ +473,5 @@
>      {
>          if (IsContextLost())
>              return;
> +
> +        if (TexImageFromVideoElement(target, level, format, format, type, &elt)) { // special case

please add more clear comment to describe 'special case'

::: gfx/gl/GLBlitHelper.cpp
@@ +8,4 @@
>  #include "GLContext.h"
>  #include "ScopedGLHelpers.h"
>  #include "mozilla/Preferences.h"
> +#include "ImageContainer.h"

no need

@@ +11,5 @@
> +#include "ImageContainer.h"
> +#ifdef MOZ_WIDGET_GONK
> +#include "GrallocImages.h"
> +#include "GLLibraryEGL.h"
> +#include "ui/GraphicBuffer.h"

no need

@@ +13,5 @@
> +#include "GrallocImages.h"
> +#include "GLLibraryEGL.h"
> +#include "ui/GraphicBuffer.h"
> +using mozilla::layers::GrallocImage;
> +using android::GraphicBuffer;

no need

@@ +443,5 @@
> +                mYTexScaleLoc = mGL->fGetUniformLocation(program, "uYTexScale");
> +                mCbCrTexScaleLoc= mGL->fGetUniformLocation(program, "uCbCrTexScale");
> +                MOZ_ASSERT(texY != -1 && texU != -1 && texV != -1, "uniforms not found");
> +
> +                mGL->fUseProgram(program);

duplicated call

@@ +683,5 @@
> +        if (image == EGL_NO_IMAGE) {
> +            return false;
> +        }
> +
> +        if (!mSrcTexEGL)

missing braces

@@ +692,5 @@
> +        mGL->fTexParameteri(LOCAL_GL_TEXTURE_EXTERNAL_OES, LOCAL_GL_TEXTURE_WRAP_S, LOCAL_GL_CLAMP_TO_EDGE);
> +        mGL->fTexParameteri(LOCAL_GL_TEXTURE_EXTERNAL_OES, LOCAL_GL_TEXTURE_WRAP_T, LOCAL_GL_CLAMP_TO_EDGE);
> +        mGL->fTexParameteri(LOCAL_GL_TEXTURE_EXTERNAL_OES, LOCAL_GL_TEXTURE_MAG_FILTER, LOCAL_GL_LINEAR);
> +        mGL->fTexParameteri(LOCAL_GL_TEXTURE_EXTERNAL_OES, LOCAL_GL_TEXTURE_MIN_FILTER, LOCAL_GL_LINEAR);
> +

For the TexParameter calls, you can configure them during texture generation only.
Attachment #8409996 - Flags: feedback?(pchang) → feedback+
Comment on attachment 8409996 [details] [diff] [review]
V3

Review of attachment 8409996 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/canvas/src/WebGLContext.h
@@ +426,5 @@
>      {
> +      if (IsContextLost())
> +          return;
> +
> +      // special case, fast way

Done.

@@ +473,5 @@
>      {
>          if (IsContextLost())
>              return;
> +
> +        if (TexImageFromVideoElement(target, level, format, format, type, &elt)) { // special case

Done.

::: gfx/gl/GLBlitHelper.cpp
@@ +8,4 @@
>  #include "GLContext.h"
>  #include "ScopedGLHelpers.h"
>  #include "mozilla/Preferences.h"
> +#include "ImageContainer.h"

Done.

@@ +11,5 @@
> +#include "ImageContainer.h"
> +#ifdef MOZ_WIDGET_GONK
> +#include "GrallocImages.h"
> +#include "GLLibraryEGL.h"
> +#include "ui/GraphicBuffer.h"

Done.

@@ +13,5 @@
> +#include "GrallocImages.h"
> +#include "GLLibraryEGL.h"
> +#include "ui/GraphicBuffer.h"
> +using mozilla::layers::GrallocImage;
> +using android::GraphicBuffer;

Done.

@@ +443,5 @@
> +                mYTexScaleLoc = mGL->fGetUniformLocation(program, "uYTexScale");
> +                mCbCrTexScaleLoc= mGL->fGetUniformLocation(program, "uCbCrTexScale");
> +                MOZ_ASSERT(texY != -1 && texU != -1 && texV != -1, "uniforms not found");
> +
> +                mGL->fUseProgram(program);

Done.

@@ +683,5 @@
> +        if (image == EGL_NO_IMAGE) {
> +            return false;
> +        }
> +
> +        if (!mSrcTexEGL)

Done.

@@ +692,5 @@
> +        mGL->fTexParameteri(LOCAL_GL_TEXTURE_EXTERNAL_OES, LOCAL_GL_TEXTURE_WRAP_S, LOCAL_GL_CLAMP_TO_EDGE);
> +        mGL->fTexParameteri(LOCAL_GL_TEXTURE_EXTERNAL_OES, LOCAL_GL_TEXTURE_WRAP_T, LOCAL_GL_CLAMP_TO_EDGE);
> +        mGL->fTexParameteri(LOCAL_GL_TEXTURE_EXTERNAL_OES, LOCAL_GL_TEXTURE_MAG_FILTER, LOCAL_GL_LINEAR);
> +        mGL->fTexParameteri(LOCAL_GL_TEXTURE_EXTERNAL_OES, LOCAL_GL_TEXTURE_MIN_FILTER, LOCAL_GL_LINEAR);
> +

Done.
Attached patch V3 (obsolete) — Splinter Review
Attachment #8409996 - Attachment is obsolete: true
Attachment #8410894 - Flags: review?(jgilbert)
Attachment #8410894 - Flags: feedback?(pchang)
Attachment #8410894 - Flags: feedback?(pchang) → feedback+
Attachment #8410894 - Flags: review?(jgilbert)
I have some new findings base on various test online:
video is near 24 fps naturally, and WebGL FPS is 60 fps(constrained by RefreshDriver), however, some demo just register call back of animation frame or use the legacy setInterval to upload texture. Which cause most frames converted twice and waste time.
I will upload a new version later to check the input frame, if they are the same frame, then just convert and skip upload.
The only upload when a new frame was decoded is a valid optimization. However there's a problem with it.

The problem is that it makes frame-timing unpredictable from a developers point of view. So while it will help the average upload time, it'll lead to situations where constant jutter is introduced as every 2 or 3rd frame is dropped, leading to frame timings like 16 16 30 16 30 16 30 16 16 30 16 16 30 etc.
To clarify, from a developer/user perspective it is better to degrade to 32ms (30fps) consistently if the video upload would lead to a dropped frame, than trying to deliver 1-2 frames at 16ms (60fps) and every second or third at 32ms.

Unless you can guarantee that a video upload will never take up a sizable portion of 16ms (say **always** below 2ms).
Attached patch WIP V4 (obsolete) — Splinter Review
Rebased to newer version, and fix various small problem, however, I found it make B2G hw-codec path not work on recent code base.

Though my code path is the same for both sw/hw codec now, and the code is almost the same with Compositor, but it just not work. And I think the problem is the texture of hw-decoded frame is not recognized in shader somehow without any gl error and genlock deadlock.

Any idea, Jeff?
Attachment #8410894 - Attachment is obsolete: true
Flags: needinfo?(jgilbert)
Is it possible that B2G does not support luminance textures well? You could substitute luminance for GL_ALPHA and interprete the alpha channel as color.

Is it possible that B2G does not support multitexturing well? You could concatenate the data from the 3 clipping planes to one texture that'd work out to a size of width * height*2.5, and then perform 3 texture lookups to the appropriate coordinates into the one texture.
Comment on attachment 8419158 [details] [diff] [review]
WIP V4

Review of attachment 8419158 [details] [diff] [review]:
-----------------------------------------------------------------

This generally seems like it should work. I don't have any ideas why it wouldn't. KHR_debug support just landed, so maybe try running with DebugMode() and check the spew.

::: gfx/gl/GLBlitHelper.cpp
@@ +40,4 @@
>  
>  GLuint
>  CreateTexture(GLContext* aGL, GLenum aInternalFormat, GLenum aFormat,
> +              GLenum aType, const gfx::IntSize& aSize, bool useSmoothFilter)

Call it linear not smooth.

@@ +585,5 @@
>  void
> +GLBlitHelper::BindAndUploadYUVTexture(Channel which, uint32_t width, uint32_t height, void* data, bool allocation)
> +{
> +    GLuint* SrcTex[3] = {&mSrcTexY, &mSrcTexCb, &mSrcTexCr};
> +    if (!*SrcTex[which]) {

Assert which < 3.

@@ +651,5 @@
> +        ScopedBindTextureUnit boundTU(mGL, LOCAL_GL_TEXTURE0);
> +
> +        mGL->fColorMask(LOCAL_GL_TRUE, LOCAL_GL_TRUE, LOCAL_GL_TRUE, LOCAL_GL_TRUE);
> +        mGL->fViewport(0, 0, destSize.width, destSize.height);
> +        mGL->fClear(LOCAL_GL_COLOR_BUFFER_BIT|LOCAL_GL_DEPTH_BUFFER_BIT|LOCAL_GL_STENCIL_BUFFER_BIT);

Don't clear depth and stencil.

@@ +682,5 @@
> +        mGL->fBindTexture(LOCAL_GL_TEXTURE_EXTERNAL_OES, mSrcTexEGL);
> +        mGL->fEGLImageTargetTexture2D(LOCAL_GL_TEXTURE_EXTERNAL_OES, image);
> +
> +        if (mCurrFlip != yFlip) {
> +            mGL->fUniform1f(mYFlipLoc, (float)yFlip);

Do this explicitly. `yFlip ? 1.0 : 0.0`

@@ +688,5 @@
> +
> +        mGL->fDrawArrays(LOCAL_GL_TRIANGLE_STRIP, 0, 4);
> +
> +        sEGLLibrary.fDestroyImage(sEGLLibrary.Display(), image);
> +        mGL->fBindTexture(LOCAL_GL_TEXTURE_EXTERNAL_OES, 0);

Rebind this to previous.

@@ +734,5 @@
> +            mGL->fBindTexture(LOCAL_GL_TEXTURE_2D, oldTex[i]);
> +        }
> +    }
> +
> +    mGL->fDisableVertexAttribArray(0);

Don't disable this for no reason.

::: gfx/gl/ScopedGLHelpers.h
@@ +282,5 @@
> +    ~ScopedGLDrawState();
> +
> +    GLuint boundProgram;
> +    GLuint boundBuffer;
> +    GLuint boundElement;

Not used?
Comment on attachment 8419158 [details] [diff] [review]
WIP V4

Review of attachment 8419158 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for the detailed review, I will try DebugMode and see whether it find something before upload next version.

::: gfx/gl/GLBlitHelper.cpp
@@ +40,4 @@
>  
>  GLuint
>  CreateTexture(GLContext* aGL, GLenum aInternalFormat, GLenum aFormat,
> +              GLenum aType, const gfx::IntSize& aSize, bool useSmoothFilter)

Done.

@@ +585,5 @@
>  void
> +GLBlitHelper::BindAndUploadYUVTexture(Channel which, uint32_t width, uint32_t height, void* data, bool allocation)
> +{
> +    GLuint* SrcTex[3] = {&mSrcTexY, &mSrcTexCb, &mSrcTexCr};
> +    if (!*SrcTex[which]) {

I will do that as you and Peter just point out this same part.

However, I tried pass 3 into which(by literal or by variable), and it generates a compiler error, maybe there is some method to fool around compiler that I don't know.

@@ +651,5 @@
> +        ScopedBindTextureUnit boundTU(mGL, LOCAL_GL_TEXTURE0);
> +
> +        mGL->fColorMask(LOCAL_GL_TRUE, LOCAL_GL_TRUE, LOCAL_GL_TRUE, LOCAL_GL_TRUE);
> +        mGL->fViewport(0, 0, destSize.width, destSize.height);
> +        mGL->fClear(LOCAL_GL_COLOR_BUFFER_BIT|LOCAL_GL_DEPTH_BUFFER_BIT|LOCAL_GL_STENCIL_BUFFER_BIT);

This clear generally redundant at all.
I do it here since I read some articles that indicate such glClear would be a optimization for mobile GPU.(mobileGPU on mediawiki and OpenGL Insight)

@@ +682,5 @@
> +        mGL->fBindTexture(LOCAL_GL_TEXTURE_EXTERNAL_OES, mSrcTexEGL);
> +        mGL->fEGLImageTargetTexture2D(LOCAL_GL_TEXTURE_EXTERNAL_OES, image);
> +
> +        if (mCurrFlip != yFlip) {
> +            mGL->fUniform1f(mYFlipLoc, (float)yFlip);

Done.

@@ +688,5 @@
> +
> +        mGL->fDrawArrays(LOCAL_GL_TRIANGLE_STRIP, 0, 4);
> +
> +        sEGLLibrary.fDestroyImage(sEGLLibrary.Display(), image);
> +        mGL->fBindTexture(LOCAL_GL_TEXTURE_EXTERNAL_OES, 0);

Done.

@@ +734,5 @@
> +            mGL->fBindTexture(LOCAL_GL_TEXTURE_2D, oldTex[i]);
> +        }
> +    }
> +
> +    mGL->fDisableVertexAttribArray(0);

Done.

I do it here as a common GL practice: we enable 0 in InitTexQuadProgram, so generally we should disable it before leaving. Though the ScopedGLDrawState recovers those setting and make this redundant.

::: gfx/gl/ScopedGLHelpers.h
@@ +282,5 @@
> +    ~ScopedGLDrawState();
> +
> +    GLuint boundProgram;
> +    GLuint boundBuffer;
> +    GLuint boundElement;

Since we not use element array in current case, I will remove it.
Attached patch V4 (obsolete) — Splinter Review
DebugMode does not help. I think we can bypass the problematic cases here later if it really cause problem on B2G devices.
Attachment #8419158 - Attachment is obsolete: true
Attachment #8420728 - Flags: review?(jgilbert)
Flags: needinfo?(jgilbert)
In fact, I found the video is black while play it in browser when warm reboot, however, after you kill the browser from cardview and relaunch the video plays as normal.

Video address:
http://codeflow.org/issues/slow_video_to_texture/mp4/240p.mp4

I will ask our media team for help or file another bug for that.
(In reply to Chiajung Hung [:chiajung] from comment #49)
> In fact, I found the video is black while play it in browser when warm
> reboot, however, after you kill the browser from cardview and relaunch the
> video plays as normal.
> 
> Video address:
> http://codeflow.org/issues/slow_video_to_texture/mp4/240p.mp4
> 
> I will ask our media team for help or file another bug for that.

Confirmed the bug is not related to my patch. 
I can reproduce the bug in browser app without WebGL involved.
Comment on attachment 8420728 [details] [diff] [review]
V4

Review of attachment 8420728 [details] [diff] [review]:
-----------------------------------------------------------------

Cool stuff. Let's do another round. I'll be quicker about it this time!

::: content/canvas/src/WebGLContext.cpp
@@ +1403,5 @@
> +    }
> +
> +    if (NS_SUCCEEDED(video->GetReadyState(&readyState)) &&
> +        (readyState == nsIDOMHTMLMediaElement::HAVE_NOTHING ||
> +         readyState == nsIDOMHTMLMediaElement::HAVE_METADATA))

Can't we check that readyState is what-we-want, instead of checking if it's what-we-don't-want?

@@ +1423,5 @@
> +    nsRefPtr<mozilla::layers::Image> srcImage = container->LockCurrentImage();
> +    WebGLTexture* tex = activeBoundTextureForTarget(target);
> +
> +    if (!tex) {
> +        ErrorInvalidOperation("texImage2D: no texture is bound to this target");

I don't love adding another path that has to make this check. Is there any way we can delay this upload/copy until later in the cycle? After we've done all the validation?

::: content/canvas/src/WebGLContext.h
@@ +419,5 @@
>      void TexImage2D(GLenum target, GLint level,
>                      GLenum internalformat, GLenum format, GLenum type,
>                      ElementType& elt, ErrorResult& rv)
>      {
> +      if (IsContextLost())

This function should stay 4-space.

::: gfx/gl/GLBlitHelper.cpp
@@ +8,5 @@
>  #include "GLContext.h"
>  #include "ScopedGLHelpers.h"
>  #include "mozilla/Preferences.h"
> +#include "ImageContainer.h"
> +#ifdef MOZ_WIDGET_GONK

I've found it's cleaner if you leave a blank newline before #ifdef blocks for includes.

@@ +173,5 @@
> +    }
> +
> +    if (mSrcTexEGL) {
> +        mGL->fDeleteTextures(1, &mSrcTexEGL);
> +    }

This could probably be written as:
GLuint delArr[] = {
  mSrcTexY,
  mSrcTexCb,
  mSrcTexCr,
  mSrcTexEGL
};
mGL->fDeleteTextures(WE_HAVE_A_MACRO_FOR_ARRAY_LENGTH(delArr), delArr);

@@ +237,5 @@
> +    ";
> +
> +    const char kTexExternalBlit_FragShaderSource[] = "\
> +        #extension GL_OES_EGL_image_external : require      \n\
> +        precision mediump float;                            \n\

We actually probably want highp floats if available, so just pull the #ifdef blocks from the other sources.
mediump only guarantees precision of 2^-10, so if we're blitting anything bigger than 1024^2, we might need more precision.

@@ +426,5 @@
> +        switch (target) {
> +            case BlitTex2D:
> +            case BlitTexRect:
> +            case ConvertGralloc:
> +            {

{ on previous line.

@@ +439,5 @@
> +                GLint texCb = mGL->fGetUniformLocation(program, "uCbTexture");
> +                GLint texCr = mGL->fGetUniformLocation(program, "uCrTexture");
> +                mYTexScaleLoc = mGL->fGetUniformLocation(program, "uYTexScale");
> +                mCbCrTexScaleLoc= mGL->fGetUniformLocation(program, "uCbCrTexScale");
> +                MOZ_ASSERT(texY != -1 && texU != -1 && texV != -1, "uniforms not found");

Don't forget to check the scale uniforms, too!

@@ +471,4 @@
>                                false,
>                                0,
>                                nullptr);
> +    mGL->fUniform1f(mYFlipLoc, mCurrFlip);

Don't bother caching this, since we should always just explicitly set it closer to DrawArrays. Remove this line.

@@ +584,5 @@
>  
>  void
> +GLBlitHelper::BindAndUploadYUVTexture(Channel which, uint32_t width, uint32_t height, void* data, bool allocation)
> +{
> +    GLuint* SrcTex[3] = {&mSrcTexY, &mSrcTexCb, &mSrcTexCr};

s/SrcTex/srcTexArr/

@@ +585,5 @@
>  void
> +GLBlitHelper::BindAndUploadYUVTexture(Channel which, uint32_t width, uint32_t height, void* data, bool allocation)
> +{
> +    GLuint* SrcTex[3] = {&mSrcTexY, &mSrcTexCb, &mSrcTexCr};
> +    if (!*SrcTex[which]) {

Assert `which < Channel::Max`
So the ideal way to do this is with a const reference:
`const GLuint& tex = *srcTex[which]`

This makes things cleaner:
if (!tex) {
  tex = CreateTex(...)
}

@@ +593,5 @@
> +    mGL->fActiveTexture(LOCAL_GL_TEXTURE0 + which);
> +
> +    mGL->fBindTexture(LOCAL_GL_TEXTURE_2D, *SrcTex[which]);
> +
> +    if (!allocation) {

I don't think there should be a time we'd want to call this with allocation=true, since we already allocated the texture either earlier in this function with CreateTexture, or in the past.

@@ +680,5 @@
> +            mGL->fTexParameteri(LOCAL_GL_TEXTURE_EXTERNAL_OES, LOCAL_GL_TEXTURE_WRAP_T, LOCAL_GL_CLAMP_TO_EDGE);
> +            mGL->fTexParameteri(LOCAL_GL_TEXTURE_EXTERNAL_OES, LOCAL_GL_TEXTURE_MAG_FILTER, LOCAL_GL_NEAREST);
> +            mGL->fTexParameteri(LOCAL_GL_TEXTURE_EXTERNAL_OES, LOCAL_GL_TEXTURE_MIN_FILTER, LOCAL_GL_NEAREST);
> +        }
> +        // Since EXTERNAL_OES not exposed to canvas, no need to store origin value.

No need to do it yet, but we're likely adding it, so we want to save and restore it so we don't get bitten when we do implement it for WebGL.

@@ +686,5 @@
> +        mGL->fEGLImageTargetTexture2D(LOCAL_GL_TEXTURE_EXTERNAL_OES, image);
> +
> +        float yFlipf = yFlip ? 1.0f : 0.0f;
> +        if (mCurrFlip != yFlipf) {
> +            mCurrFlip = yFlipf;

As above, there's no reason to cache this, so just drop the cache. Thanks for doing this bool->float conversion explicitly, for clarity.

@@ +773,5 @@
> +        type = BlitTexRect;
> +        break;
> +    default:
> +        printf_stderr("[%s:%d] Fatal Error: Failed to prepare to blit texture->framebuffer.\n",
> +                      __FILE__, __LINE__);

I'm pretty sure MOZ_CRASH gives us the __FILE__/__LINE__ info, so we can just skip that in our printf here.

::: gfx/gl/GLBlitHelper.h
@@ +122,5 @@
> +    int mTexWidth;
> +    int mTexHeight;
> +
> +    // Cache some uniform values
> +    int mCurrFlip;

Shouldn't this be either bool or float?
Also, generally 'Cur' has one 'r'.
Attachment #8420728 - Flags: review?(jgilbert) → review-
(In reply to Jeff Gilbert [:jgilbert] from comment #51)
> Comment on attachment 8420728 [details] [diff] [review]
> Shouldn't this be either bool or float?
> Also, generally 'Cur' has one 'r'.

Really? I always use curr(ent). I'll have to remember that one.
Comment on attachment 8420728 [details] [diff] [review]
V4

Review of attachment 8420728 [details] [diff] [review]:
-----------------------------------------------------------------

I will upload next version soon after I slightly refactor the code sequence of texImage to avoid tex validation in 2 places.

::: content/canvas/src/WebGLContext.cpp
@@ +1403,5 @@
> +    }
> +
> +    if (NS_SUCCEEDED(video->GetReadyState(&readyState)) &&
> +        (readyState == nsIDOMHTMLMediaElement::HAVE_NOTHING ||
> +         readyState == nsIDOMHTMLMediaElement::HAVE_METADATA))

The opposite side have 3 conditions(HAVE_CURRENT_DATA, HAVE_FUTURE_DATA, HAVE_ENOUGH_DATA) to check, and involves more indent here.
Since each of these states contains the previous states, so we can write readState > HAVE_META_DATA or readyState >= HAVE_CURRENT_DATA to indicate that we need at least 1 frame to upload.

@@ +1423,5 @@
> +    nsRefPtr<mozilla::layers::Image> srcImage = container->LockCurrentImage();
> +    WebGLTexture* tex = activeBoundTextureForTarget(target);
> +
> +    if (!tex) {
> +        ErrorInvalidOperation("texImage2D: no texture is bound to this target");

Currently it is:
(1) Color conversion: nsLayoutUtils::SurfaceFromElement(video state validation here)
(2) state validaion + Tex Upload (in TexImage2D_base)

Since all original validation are right before uploading to GPU and after color conversion. We can avoid this only if we refactor texImage2D into:
(1) Tex state validation
(2) Color conversion + Tex Upload

I will make it this way in next version.

::: content/canvas/src/WebGLContext.h
@@ +419,5 @@
>      void TexImage2D(GLenum target, GLint level,
>                      GLenum internalformat, GLenum format, GLenum type,
>                      ElementType& elt, ErrorResult& rv)
>      {
> +      if (IsContextLost())

Done.

::: gfx/gl/GLBlitHelper.cpp
@@ +8,5 @@
>  #include "GLContext.h"
>  #include "ScopedGLHelpers.h"
>  #include "mozilla/Preferences.h"
> +#include "ImageContainer.h"
> +#ifdef MOZ_WIDGET_GONK

Done.

@@ +173,5 @@
> +    }
> +
> +    if (mSrcTexEGL) {
> +        mGL->fDeleteTextures(1, &mSrcTexEGL);
> +    }

Done.

@@ +237,5 @@
> +    ";
> +
> +    const char kTexExternalBlit_FragShaderSource[] = "\
> +        #extension GL_OES_EGL_image_external : require      \n\
> +        precision mediump float;                            \n\

Done.

@@ +426,5 @@
> +        switch (target) {
> +            case BlitTex2D:
> +            case BlitTexRect:
> +            case ConvertGralloc:
> +            {

Done.

@@ +439,5 @@
> +                GLint texCb = mGL->fGetUniformLocation(program, "uCbTexture");
> +                GLint texCr = mGL->fGetUniformLocation(program, "uCrTexture");
> +                mYTexScaleLoc = mGL->fGetUniformLocation(program, "uYTexScale");
> +                mCbCrTexScaleLoc= mGL->fGetUniformLocation(program, "uCbCrTexScale");
> +                MOZ_ASSERT(texY != -1 && texU != -1 && texV != -1, "uniforms not found");

Done.

@@ +471,4 @@
>                                false,
>                                0,
>                                nullptr);
> +    mGL->fUniform1f(mYFlipLoc, mCurrFlip);

Done.

@@ +584,5 @@
>  
>  void
> +GLBlitHelper::BindAndUploadYUVTexture(Channel which, uint32_t width, uint32_t height, void* data, bool allocation)
> +{
> +    GLuint* SrcTex[3] = {&mSrcTexY, &mSrcTexCb, &mSrcTexCr};

Done.

@@ +585,5 @@
>  void
> +GLBlitHelper::BindAndUploadYUVTexture(Channel which, uint32_t width, uint32_t height, void* data, bool allocation)
> +{
> +    GLuint* SrcTex[3] = {&mSrcTexY, &mSrcTexCb, &mSrcTexCr};
> +    if (!*SrcTex[which]) {

Done.

@@ +593,5 @@
> +    mGL->fActiveTexture(LOCAL_GL_TEXTURE0 + which);
> +
> +    mGL->fBindTexture(LOCAL_GL_TEXTURE_2D, *SrcTex[which]);
> +
> +    if (!allocation) {

We have to do this while video size change:
If the WebApp loads several different sizes video with the same WebGLContext, we need reallocation the buffer. 
Since GLBlitHelper is cached once used, therefore texture name availability does not reveal buffer state while video change.

@@ +680,5 @@
> +            mGL->fTexParameteri(LOCAL_GL_TEXTURE_EXTERNAL_OES, LOCAL_GL_TEXTURE_WRAP_T, LOCAL_GL_CLAMP_TO_EDGE);
> +            mGL->fTexParameteri(LOCAL_GL_TEXTURE_EXTERNAL_OES, LOCAL_GL_TEXTURE_MAG_FILTER, LOCAL_GL_NEAREST);
> +            mGL->fTexParameteri(LOCAL_GL_TEXTURE_EXTERNAL_OES, LOCAL_GL_TEXTURE_MIN_FILTER, LOCAL_GL_NEAREST);
> +        }
> +        // Since EXTERNAL_OES not exposed to canvas, no need to store origin value.

Yep, I am working on it :p

@@ +686,5 @@
> +        mGL->fEGLImageTargetTexture2D(LOCAL_GL_TEXTURE_EXTERNAL_OES, image);
> +
> +        float yFlipf = yFlip ? 1.0f : 0.0f;
> +        if (mCurrFlip != yFlipf) {
> +            mCurrFlip = yFlipf;

Done.

@@ +773,5 @@
> +        type = BlitTexRect;
> +        break;
> +    default:
> +        printf_stderr("[%s:%d] Fatal Error: Failed to prepare to blit texture->framebuffer.\n",
> +                      __FILE__, __LINE__);

Done.

::: gfx/gl/GLBlitHelper.h
@@ +122,5 @@
> +    int mTexWidth;
> +    int mTexHeight;
> +
> +    // Cache some uniform values
> +    int mCurrFlip;

Removed, and change all follow Curr to Cur.
Attached patch V5 (obsolete) — Splinter Review
Attachment #8424645 - Flags: review?(jgilbert)
Attachment #8424645 - Flags: feedback?(pchang)
(In reply to Chiajung Hung [:chiajung] from comment #53)
> We have to do this while video size change:
> If the WebApp loads several different sizes video with the same
> WebGLContext, we need reallocation the buffer. 
> Since GLBlitHelper is cached once used, therefore texture name availability
> does not reveal buffer state while video change.

Note that using texImage2D to reallocate a previously different sized texture can incur significant overhead over using texSubImage2D to update a texture whose size hasn't changed (mileage varies between OS, driver and GPU).

It's also the case that if you reuse one texture for different videos, that you will not later on be able to convert this usage to asynchronous uploads if you wish to do so.
(In reply to Florian Bösch from comment #55)
> (In reply to Chiajung Hung [:chiajung] from comment #53)
> > We have to do this while video size change:
> > If the WebApp loads several different sizes video with the same
> > WebGLContext, we need reallocation the buffer. 
> > Since GLBlitHelper is cached once used, therefore texture name availability
> > does not reveal buffer state while video change.
> 
> Note that using texImage2D to reallocate a previously different sized
> texture can incur significant overhead over using texSubImage2D to update a
> texture whose size hasn't changed (mileage varies between OS, driver and
> GPU).
Sure, but it still slightly better than deleteTextures + texImage2D than texSubImage2D. The video frame size can not change during playback, so I think it is OK. 
> 
> It's also the case that if you reuse one texture for different videos, that
> you will not later on be able to convert this usage to asynchronous uploads
> if you wish to do so.
We will create another bug for async upload. This patch builds all basic constructions(shader + flow), and we can later leverage these constructions in async upload version. If there's problem for PBO version, we can fix them later in that bug :p.
Comment on attachment 8424645 [details] [diff] [review]
V5

Review of attachment 8424645 [details] [diff] [review]:
-----------------------------------------------------------------

some minor suggestions while study your patch

::: gfx/gl/GLBlitHelper.cpp
@@ +169,5 @@
> +    };
> +
> +    mSrcTexY = mSrcTexCb = mSrcTexEGL = 0;
> +    mGL->fDeleteTextures(sizeof(tex)/sizeof(GLuint), tex);
> +

mGL->fDeleteTextures(ArrayLength(tex), tex);

@@ +292,5 @@
>          programPtr = &mTex2DRectBlit_Program;
>          fragShaderPtr = &mTex2DRectBlit_FragShader;
>          fragShaderSource = kTex2DRectBlit_FragShaderSource;
> +        break;
> +#ifdef MOZ_WIDGET_GONK:

remove :

@@ +428,5 @@
>          mGL->fUseProgram(program);
> +        switch (target) {
> +            case BlitTex2D:
> +            case BlitTexRect:
> +            case ConvertGralloc: {

#ifdef MOZ_WIDGET_GONK
case ConvertGralloc:
#endif
{

@@ +589,5 @@
> +    GLuint* srcTexArr[3] = {&mSrcTexY, &mSrcTexCb, &mSrcTexCr};
> +    GLuint& tex = *srcTexArr[which];
> +    if (!tex) {
> +        tex = CreateTexture(mGL, LOCAL_GL_LUMINANCE, LOCAL_GL_LUMINANCE, LOCAL_GL_UNSIGNED_BYTE,
> +                            gfx::IntSize(width, height), false);

if tex == nullptr, allocation must be true.
MOZ_ASSERT(allocation);

@@ +621,5 @@
> +
> +bool
> +GLBlitHelper::BlitImageToTexture(layers::Image* srcImage, const gfx::IntSize& destSize, GLuint destTex, GLenum destTarget, bool yFlip, GLuint xoffset, GLuint yoffset, GLuint cropWidth, GLuint cropHeight)
> +{
> +    mGL->MakeCurrent();

All the other blit functions do not MakeCurrent. Is this call really needed?

@@ +639,5 @@
> +    }
> +
> +    if (!InitTexQuadProgram(type))
> +        return false;
> +

{}

@@ +647,5 @@
> +
> +    ScopedBindFramebuffer boundFB(mGL, mFBO);
> +    mGL->fFramebufferTexture2D(LOCAL_GL_FRAMEBUFFER, LOCAL_GL_COLOR_ATTACHMENT0, destTarget, destTex, 0);
> +#ifdef MOZ_WIDGET_GONK
> +    if (srcImage->GetFormat() == ImageFormat::GRALLOC_PLANAR_YCBCR) {

if (type == ConvertGralloc)

Suggest move this if statement into another private member function
BlitImageToTexture(GrallocImage *aImage ....)

@@ +681,5 @@
> +            mGL->fTexParameteri(LOCAL_GL_TEXTURE_EXTERNAL_OES, LOCAL_GL_TEXTURE_WRAP_S, LOCAL_GL_CLAMP_TO_EDGE);
> +            mGL->fTexParameteri(LOCAL_GL_TEXTURE_EXTERNAL_OES, LOCAL_GL_TEXTURE_WRAP_T, LOCAL_GL_CLAMP_TO_EDGE);
> +            mGL->fTexParameteri(LOCAL_GL_TEXTURE_EXTERNAL_OES, LOCAL_GL_TEXTURE_MAG_FILTER, LOCAL_GL_NEAREST);
> +            mGL->fTexParameteri(LOCAL_GL_TEXTURE_EXTERNAL_OES, LOCAL_GL_TEXTURE_MIN_FILTER, LOCAL_GL_NEAREST);
> +        }

Suggest move #678 ~ #685 statement into another private function GetSrcTexEGL

And change code here to
GLuint texEGL = GetSrcTexEGL();

@@ +682,5 @@
> +            mGL->fTexParameteri(LOCAL_GL_TEXTURE_EXTERNAL_OES, LOCAL_GL_TEXTURE_WRAP_T, LOCAL_GL_CLAMP_TO_EDGE);
> +            mGL->fTexParameteri(LOCAL_GL_TEXTURE_EXTERNAL_OES, LOCAL_GL_TEXTURE_MAG_FILTER, LOCAL_GL_NEAREST);
> +            mGL->fTexParameteri(LOCAL_GL_TEXTURE_EXTERNAL_OES, LOCAL_GL_TEXTURE_MIN_FILTER, LOCAL_GL_NEAREST);
> +        }
> +        mGL->fBindTexture(LOCAL_GL_TEXTURE_EXTERNAL_OES, mSrcTexEGL);

if (!mSrcTexEGL) {
  // ...
}
else {
  mGL->fBindTexture(LOCAL_GL_TEXTURE_EXTERNAL_OES, mSrcTexEGL);
}

To prevent a dup gl call while mSrcTexEGL is null... although not quite important

@@ +687,5 @@
> +        mGL->fEGLImageTargetTexture2D(LOCAL_GL_TEXTURE_EXTERNAL_OES, image);
> +
> +        float yFlipf = yFlip ? 1.0f : 0.0f;
> +        mGL->fUniform1f(mYFlipLoc, (float)yFlipf);
> +

yFlipf is used only once, suggest remove this local variable.
mGL->fUniform1f(mYFlipLoc, yFlip ? 1.0f : 0.0f);

@@ +694,5 @@
> +        sEGLLibrary.fDestroyImage(sEGLLibrary.Display(), image);
> +        mGL->fBindTexture(LOCAL_GL_TEXTURE_EXTERNAL_OES, oldBinding);
> +    }
> +#endif
> +    if (srcImage->GetFormat() == ImageFormat::PLANAR_YCBCR) {

if (type == ConvertPlanarYCbCr)

Suggest move this if statement into another private member function
BlitImageToTexture(PlanarYCbCrImage *aImage ....)
(In reply to Chiajung Hung [:chiajung] from comment #56)
> (In reply to Florian Bösch from comment #55)
> > (In reply to Chiajung Hung [:chiajung] from comment #53)
> > > We have to do this while video size change:
> > > If the WebApp loads several different sizes video with the same
> > > WebGLContext, we need reallocation the buffer. 
> > > Since GLBlitHelper is cached once used, therefore texture name availability
> > > does not reveal buffer state while video change.
> > 
> > Note that using texImage2D to reallocate a previously different sized
> > texture can incur significant overhead over using texSubImage2D to update a
> > texture whose size hasn't changed (mileage varies between OS, driver and
> > GPU).
> Sure, but it still slightly better than deleteTextures + texImage2D than
> texSubImage2D. The video frame size can not change during playback, so I
> think it is OK. 
Unfortunately, In WebRTC, it might happen with screen rotation.(bug 984215 + bug 976397)
(In reply to C.J. Ku[:cjku] from comment #58)
> (In reply to Chiajung Hung [:chiajung] from comment #56)
> > (In reply to Florian Bösch from comment #55)
> > > (In reply to Chiajung Hung [:chiajung] from comment #53)
> > > > We have to do this while video size change:
> > > > If the WebApp loads several different sizes video with the same
> > > > WebGLContext, we need reallocation the buffer. 
> > > > Since GLBlitHelper is cached once used, therefore texture name availability
> > > > does not reveal buffer state while video change.
> > > 
> > > Note that using texImage2D to reallocate a previously different sized
> > > texture can incur significant overhead over using texSubImage2D to update a
> > > texture whose size hasn't changed (mileage varies between OS, driver and
> > > GPU).
> > Sure, but it still slightly better than deleteTextures + texImage2D than
> > texSubImage2D. The video frame size can not change during playback, so I
> > think it is OK. 
> Unfortunately, In WebRTC, it might happen with screen rotation.(bug 984215 +
> bug 976397)
For that case, something may different based on the implementation:
(1) If the video frame is rotated before send into the ImageContainer, than nothing to do here. (same as video size change, reallocation is needed)
(2) If the video frame is rotated in Compositor, we have to determine what is expected: If we choose to ignore the metadata of orientation, than nothing to do.

I don't think texImage2D aware EXIF(for <img>) or something similar currently. As a result, I think it is still OK.

As a result, I think the only case the patch can not fulfill is: texImage2D must respond to metadata. (It would be a large change itself, so do not bother that here, let's file a new bug for that :p)
Comment on attachment 8424645 [details] [diff] [review]
V5

Review of attachment 8424645 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/canvas/src/WebGLContext.cpp
@@ +1401,5 @@
> +    if (!video) {
> +        return false;
> +    }
> +
> +    if (NS_SUCCEEDED(video->GetReadyState(&readyState)) &&

Need error handling when GetReadyState fail

@@ +1403,5 @@
> +    }
> +
> +    if (NS_SUCCEEDED(video->GetReadyState(&readyState)) &&
> +        (readyState == nsIDOMHTMLMediaElement::HAVE_NOTHING ||
> +         readyState == nsIDOMHTMLMediaElement::HAVE_METADATA))

you can check readyState <= nsIDOMHTMLMiediaElement::HAVE_METADATA

@@ +1428,5 @@
> +    if (tex->HasImageInfoAt(target, 0)) {
> +        const WebGLTexture::ImageInfo& info = tex->ImageInfoAt(target, 0);
> +        needsAllocation = info.Width() != srcImage->GetSize().width ||
> +                          info.Height() != srcImage->GetSize().height;
> +    }

What happen if HasImageInfoAt return false?
Then TexImage2D will be always invoked.

@@ +1432,5 @@
> +    }
> +    if (needsAllocation) {
> +        gl->fTexImage2D(target, level, internalformat, srcImage->GetSize().width, srcImage->GetSize().height, 0, format, type, nullptr);
> +    }
> +    bool ok = gl->BlitHelper()->BlitImageToTexture(srcImage.get(), srcImage->GetSize(), destTex, target, mPixelStoreFlipY);

you can just call tex->GLName() instead of destTex

::: gfx/gl/GLBlitHelper.h
@@ +78,5 @@
> +
> +    /**
> +     * BlitTex2D is used to copy blit the content of a GL_TEXTURE_2D object,
> +     * BlitTexRect is used to copy blit the content of a GL_TEXTURE_RECT object,
> +     * The difference of BlitTex2D and BlitTexRect is the texture type, which affect

The difference between..., which affects
Attachment #8424645 - Flags: feedback?(pchang) → feedback+
Comment on attachment 8424645 [details] [diff] [review]
V5

Review of attachment 8424645 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for review. I will fix the patch based on your comment.

::: content/canvas/src/WebGLContext.cpp
@@ +1401,5 @@
> +    if (!video) {
> +        return false;
> +    }
> +
> +    if (NS_SUCCEEDED(video->GetReadyState(&readyState)) &&

http://dxr.mozilla.org/mozilla-central/source/content/html/content/src/HTMLMediaElement.cpp#1249

it always success.

@@ +1403,5 @@
> +    }
> +
> +    if (NS_SUCCEEDED(video->GetReadyState(&readyState)) &&
> +        (readyState == nsIDOMHTMLMediaElement::HAVE_NOTHING ||
> +         readyState == nsIDOMHTMLMediaElement::HAVE_METADATA))

Done.

@@ +1428,5 @@
> +    if (tex->HasImageInfoAt(target, 0)) {
> +        const WebGLTexture::ImageInfo& info = tex->ImageInfoAt(target, 0);
> +        needsAllocation = info.Width() != srcImage->GetSize().width ||
> +                          info.Height() != srcImage->GetSize().height;
> +    }

HasImageInfo is a hint about texture allocation state, once TexImage series function called against the texture, it should not return false.

@@ +1432,5 @@
> +    }
> +    if (needsAllocation) {
> +        gl->fTexImage2D(target, level, internalformat, srcImage->GetSize().width, srcImage->GetSize().height, 0, format, type, nullptr);
> +    }
> +    bool ok = gl->BlitHelper()->BlitImageToTexture(srcImage.get(), srcImage->GetSize(), destTex, target, mPixelStoreFlipY);

Done.

::: gfx/gl/GLBlitHelper.cpp
@@ +169,5 @@
> +    };
> +
> +    mSrcTexY = mSrcTexCb = mSrcTexEGL = 0;
> +    mGL->fDeleteTextures(sizeof(tex)/sizeof(GLuint), tex);
> +

Done.

@@ +292,5 @@
>          programPtr = &mTex2DRectBlit_Program;
>          fragShaderPtr = &mTex2DRectBlit_FragShader;
>          fragShaderSource = kTex2DRectBlit_FragShaderSource;
> +        break;
> +#ifdef MOZ_WIDGET_GONK:

Done.

@@ +428,5 @@
>          mGL->fUseProgram(program);
> +        switch (target) {
> +            case BlitTex2D:
> +            case BlitTexRect:
> +            case ConvertGralloc: {

This enum defined for all platform in the header, why we ifdef it here?

@@ +589,5 @@
> +    GLuint* srcTexArr[3] = {&mSrcTexY, &mSrcTexCb, &mSrcTexCr};
> +    GLuint& tex = *srcTexArr[which];
> +    if (!tex) {
> +        tex = CreateTexture(mGL, LOCAL_GL_LUMINANCE, LOCAL_GL_LUMINANCE, LOCAL_GL_UNSIGNED_BYTE,
> +                            gfx::IntSize(width, height), false);

Done.

@@ +621,5 @@
> +
> +bool
> +GLBlitHelper::BlitImageToTexture(layers::Image* srcImage, const gfx::IntSize& destSize, GLuint destTex, GLenum destTarget, bool yFlip, GLuint xoffset, GLuint yoffset, GLuint cropWidth, GLuint cropHeight)
> +{
> +    mGL->MakeCurrent();

Since I did not MakeCurrent all along the path, other blit usually MakeCurrent somewhere else.
Move MakeCurrent to WebGL::TexImageFromVideoElement.

@@ +639,5 @@
> +    }
> +
> +    if (!InitTexQuadProgram(type))
> +        return false;
> +

Done.

@@ +647,5 @@
> +
> +    ScopedBindFramebuffer boundFB(mGL, mFBO);
> +    mGL->fFramebufferTexture2D(LOCAL_GL_FRAMEBUFFER, LOCAL_GL_COLOR_ATTACHMENT0, destTarget, destTex, 0);
> +#ifdef MOZ_WIDGET_GONK
> +    if (srcImage->GetFormat() == ImageFormat::GRALLOC_PLANAR_YCBCR) {

Such change somehow make these gl operations fragmentary and maybe harder to trace.
Anyway this class has many fragmentary gl code segment.
I will refactor this part slightly in next patch.

@@ +681,5 @@
> +            mGL->fTexParameteri(LOCAL_GL_TEXTURE_EXTERNAL_OES, LOCAL_GL_TEXTURE_WRAP_S, LOCAL_GL_CLAMP_TO_EDGE);
> +            mGL->fTexParameteri(LOCAL_GL_TEXTURE_EXTERNAL_OES, LOCAL_GL_TEXTURE_WRAP_T, LOCAL_GL_CLAMP_TO_EDGE);
> +            mGL->fTexParameteri(LOCAL_GL_TEXTURE_EXTERNAL_OES, LOCAL_GL_TEXTURE_MAG_FILTER, LOCAL_GL_NEAREST);
> +            mGL->fTexParameteri(LOCAL_GL_TEXTURE_EXTERNAL_OES, LOCAL_GL_TEXTURE_MIN_FILTER, LOCAL_GL_NEAREST);
> +        }

I can not get the point here: if GetSrcTexEGL() just returns mSrcTexEGL, why we need this?
Maybe we can make another BindAndUpload function for external image.

::: gfx/gl/GLBlitHelper.h
@@ +78,5 @@
> +
> +    /**
> +     * BlitTex2D is used to copy blit the content of a GL_TEXTURE_2D object,
> +     * BlitTexRect is used to copy blit the content of a GL_TEXTURE_RECT object,
> +     * The difference of BlitTex2D and BlitTexRect is the texture type, which affect

Done.
Retested with Firefox 29.0.1 on Ubuntu and OSX.

Difference to measurements on Ubuntu March 20. on Firefox 28.0: negible.
Difference to measurements on OSX March 3. on Firefox 27.0.1 below:

							FPS	UPms	FPS	UPms
Firefox 29.0.1:	texImage2D	1080p	N/A	N/A	34.06	22.98	34.55	22.79
Firefox 27.0.1:	texImage2D	1080p	N/A	N/A	49.98	14.16	51.41	13.97
Change:							-15.92	+7.98	-16.86	+8.82
Change%:						-31%	+56%	-32%	+63%

Performance for video upload between Firefox 27.0.1 and 29.0.1 got around 30% worse in terms of FPS and around 60% worse in terms of upload time.

All numbers are in the historical section of the test page: http://codeflow.org/issues/slow_video_to_texture/
Attached patch V6 (obsolete) — Splinter Review
Slightly refactored based on earlier reviews.
Attachment #8420728 - Attachment is obsolete: true
Attachment #8424645 - Attachment is obsolete: true
Attachment #8424645 - Flags: review?(jgilbert)
Attachment #8430448 - Flags: review?(jgilbert)
Comment on attachment 8430448 [details] [diff] [review]
V6

Review of attachment 8430448 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/canvas/src/WebGLContext.cpp
@@ +1395,5 @@
> +bool WebGLContext::TexImageFromVideoElement(GLenum target, GLint level,
> +                              GLenum internalformat, GLenum format, GLenum type,
> +                              mozilla::dom::Element* elt)
> +{
> +    uint16_t readyState;

Declare this next to its use, not this early.

@@ +1421,5 @@
> +    }
> +
> +    gl->MakeCurrent();
> +    nsRefPtr<mozilla::layers::Image> srcImage = container->LockCurrentImage();
> +    WebGLTexture* tex = activeBoundTextureForTarget(target);

I think this needs to be INVALID_OP, if tex is null.

@@ +1423,5 @@
> +    gl->MakeCurrent();
> +    nsRefPtr<mozilla::layers::Image> srcImage = container->LockCurrentImage();
> +    WebGLTexture* tex = activeBoundTextureForTarget(target);
> +
> +    bool needsAllocation = true;

I would prefer `bool dimensionsMatch = false`
and
dimensionsMatch = width == width &&
                  height == height;

if (!dimensionsMatch) {
  // We need to allocate.

::: content/canvas/src/WebGLContext.h
@@ +412,5 @@
>      // Allow whatever element types the bindings are willing to pass
>      // us in TexImage2D
> +    bool TexImageFromVideoElement(GLenum target, GLint level,
> +                                  GLenum internalformat, GLenum format, GLenum type,
> +                                  mozilla::dom::Element* image);

Surely this should be `const mozilla::dom::Element& image`, since this shouldn't nullable.

::: content/canvas/src/WebGLContextGL.cpp
@@ -3608,5 @@
>  
>      WebGLTexture *tex = activeBoundTextureForTarget(target);
>  
> -    if (!tex)
> -        return ErrorInvalidOperation("texImage2D: no texture is bound to this target");

Why do you remove this? You should at least leave a MOZ_ASSERT.

::: gfx/gl/GLBlitHelper.cpp
@@ +196,5 @@
>      ";
>  
>      const char kTex2DBlit_FragShaderSource[] = "\
> +        #ifdef GL_FRAGMENT_PRECISION_HIGH                   \n\
> +           precision highp float;                           \n\

Bad indent (3-space?)

@@ +207,5 @@
> +        varying vec2 vTexCoord;                             \n\
> +                                                            \n\
> +        void main(void)                                     \n\
> +        {                                                   \n\
> +           gl_FragColor = texture2D(uTexUnit, vTexCoord);   \n\

3-space

@@ +225,5 @@
> +        varying vec2 vTexCoord;                                       \n\
> +                                                                      \n\
> +        void main(void)                                               \n\
> +        {                                                             \n\
> +           gl_FragColor = texture2DRect(uTexUnit,                     \n\

What's with all the 3-space?

@@ +260,5 @@
> +        uniform vec2 uYTexScale;                                            \n\
> +        uniform vec2 uCbCrTexScale;                                         \n\
> +        void main()                                                         \n\
> +        {                                                                   \n\
> +           vec4 color;                                                      \n\

No point in this color intermediary. Why not just use gl_FragColor directly?

@@ +442,5 @@
> +                GLint texCb = mGL->fGetUniformLocation(program, "uCbTexture");
> +                GLint texCr = mGL->fGetUniformLocation(program, "uCrTexture");
> +                mYTexScaleLoc = mGL->fGetUniformLocation(program, "uYTexScale");
> +                mCbCrTexScaleLoc= mGL->fGetUniformLocation(program, "uCbCrTexScale");
> +                MOZ_ASSERT(texY != -1 && texU != -1 && texV != -1 && mYTexScaleLoc != -1 && mCbCrTexScaleLoc != -1, "uniforms not found");

MOZ_ASSERT((foo &&
            bar &&
            bat)
           "Qux");

Or use:
DebugOnly<bool> hasUniformLocations = foo &&
                                      bar;
MOZ_ASSERT(hasUniformLocation, "qux");

@@ +585,4 @@
>  }
>  
>  void
> +GLBlitHelper::BindAndUploadYUVTexture(Channel which, uint32_t width, uint32_t height, void* data, bool allocation)

`bool needsAllocation`

@@ +625,5 @@
> +#ifdef MOZ_WIDGET_GONK
> +bool
> +GLBlitHelper::BindAndUploadExternalTexture(EGLImage image) {
> +    if (image == EGL_NO_IMAGE) {
> +        return false;

If we already can know the return value based on our inputs, why bother returning bool? We should just MOZ_ASSERT(image != EGL_NO_IMAGE), disallowing that as an input, and returning void.

@@ +636,5 @@
> +        mGL->fTexParameteri(LOCAL_GL_TEXTURE_EXTERNAL_OES, LOCAL_GL_TEXTURE_WRAP_T, LOCAL_GL_CLAMP_TO_EDGE);
> +        mGL->fTexParameteri(LOCAL_GL_TEXTURE_EXTERNAL_OES, LOCAL_GL_TEXTURE_MAG_FILTER, LOCAL_GL_NEAREST);
> +        mGL->fTexParameteri(LOCAL_GL_TEXTURE_EXTERNAL_OES, LOCAL_GL_TEXTURE_MIN_FILTER, LOCAL_GL_NEAREST);
> +    }
> +    mGL->fBindTexture(LOCAL_GL_TEXTURE_EXTERNAL_OES, mSrcTexEGL);

Don't bind this twice.

@@ +653,5 @@
> +    };
> +    EGLImage image = sEGLLibrary.fCreateImage(sEGLLibrary.Display(),
> +                                              EGL_NO_CONTEXT,
> +                                              LOCAL_EGL_NATIVE_BUFFER_ANDROID,
> +                                              grallocImage->GetNativeBuffer(), attrs);

Check this for failure here, instead of waiting until BindAndUploadExternalTexture.

@@ +656,5 @@
> +                                              LOCAL_EGL_NATIVE_BUFFER_ANDROID,
> +                                              grallocImage->GetNativeBuffer(), attrs);
> +
> +    int oldBinding = 0;
> +    mGL->fGetIntegerv(LOCAL_GL_TEXTURE_BINDING_EXTERNAL_OES, &oldBinding);

Don't we have a Scoped type for this, that accepts different TARGETs?

@@ +678,5 @@
> +{
> +    ScopedBindTextureUnit boundTU(mGL, LOCAL_GL_TEXTURE0);
> +    const PlanarYCbCrData* yuvData = yuvImage->GetData();
> +
> +    GLint oldTex[3];

Put this closer to its use.

@@ +727,5 @@
> +        default:
> +            return false;
> +    }
> +
> +    if (!InitTexQuadProgram(type)) {

Shouldn't we just assert this?

::: gfx/gl/GLBlitHelper.h
@@ +171,5 @@
>                                const gfx::IntSize& destSize,
>                                GLenum srcTarget = LOCAL_GL_TEXTURE_2D,
>                                GLenum destTarget = LOCAL_GL_TEXTURE_2D);
> +    bool BlitImageToTexture(layers::Image* srcImage, const gfx::IntSize& destSize,
> +                            GLuint destTex, GLenum destTarget, bool yFlip = 0, GLuint xoffset = 0,

`bool` should take `false` not `0`.
Attachment #8430448 - Flags: review?(jgilbert) → review-
Comment on attachment 8430448 [details] [diff] [review]
V6

Review of attachment 8430448 [details] [diff] [review]:
-----------------------------------------------------------------

I will upload the reviewed version soon, thanks.

::: content/canvas/src/WebGLContext.cpp
@@ +1395,5 @@
> +bool WebGLContext::TexImageFromVideoElement(GLenum target, GLint level,
> +                              GLenum internalformat, GLenum format, GLenum type,
> +                              mozilla::dom::Element* elt)
> +{
> +    uint16_t readyState;

Done.

@@ +1421,5 @@
> +    }
> +
> +    gl->MakeCurrent();
> +    nsRefPtr<mozilla::layers::Image> srcImage = container->LockCurrentImage();
> +    WebGLTexture* tex = activeBoundTextureForTarget(target);

I moved the check to TexImage2D, before any heavy lifting works start.

@@ +1423,5 @@
> +    gl->MakeCurrent();
> +    nsRefPtr<mozilla::layers::Image> srcImage = container->LockCurrentImage();
> +    WebGLTexture* tex = activeBoundTextureForTarget(target);
> +
> +    bool needsAllocation = true;

Done.

::: content/canvas/src/WebGLContextGL.cpp
@@ -3608,5 @@
>  
>      WebGLTexture *tex = activeBoundTextureForTarget(target);
>  
> -    if (!tex)
> -        return ErrorInvalidOperation("texImage2D: no texture is bound to this target");

I just move it to WebGLContext::TexImage2D

::: gfx/gl/GLBlitHelper.cpp
@@ +196,5 @@
>      ";
>  
>      const char kTex2DBlit_FragShaderSource[] = "\
> +        #ifdef GL_FRAGMENT_PRECISION_HIGH                   \n\
> +           precision highp float;                           \n\

This is 4-space indent.

@@ +207,5 @@
> +        varying vec2 vTexCoord;                             \n\
> +                                                            \n\
> +        void main(void)                                     \n\
> +        {                                                   \n\
> +           gl_FragColor = texture2D(uTexUnit, vTexCoord);   \n\

Done.(4-space now)

@@ +225,5 @@
> +        varying vec2 vTexCoord;                                       \n\
> +                                                                      \n\
> +        void main(void)                                               \n\
> +        {                                                             \n\
> +           gl_FragColor = texture2DRect(uTexUnit,                     \n\

They are copied from our old CompositorOGL.
And the indent is messed Eclipse.

@@ +260,5 @@
> +        uniform vec2 uYTexScale;                                            \n\
> +        uniform vec2 uCbCrTexScale;                                         \n\
> +        void main()                                                         \n\
> +        {                                                                   \n\
> +           vec4 color;                                                      \n\

Done. (They are copied from old code, too :p)

@@ +442,5 @@
> +                GLint texCb = mGL->fGetUniformLocation(program, "uCbTexture");
> +                GLint texCr = mGL->fGetUniformLocation(program, "uCrTexture");
> +                mYTexScaleLoc = mGL->fGetUniformLocation(program, "uYTexScale");
> +                mCbCrTexScaleLoc= mGL->fGetUniformLocation(program, "uCbCrTexScale");
> +                MOZ_ASSERT(texY != -1 && texU != -1 && texV != -1 && mYTexScaleLoc != -1 && mCbCrTexScaleLoc != -1, "uniforms not found");

Done.

@@ +585,4 @@
>  }
>  
>  void
> +GLBlitHelper::BindAndUploadYUVTexture(Channel which, uint32_t width, uint32_t height, void* data, bool allocation)

Done.

@@ +625,5 @@
> +#ifdef MOZ_WIDGET_GONK
> +bool
> +GLBlitHelper::BindAndUploadExternalTexture(EGLImage image) {
> +    if (image == EGL_NO_IMAGE) {
> +        return false;

Done.

@@ +636,5 @@
> +        mGL->fTexParameteri(LOCAL_GL_TEXTURE_EXTERNAL_OES, LOCAL_GL_TEXTURE_WRAP_T, LOCAL_GL_CLAMP_TO_EDGE);
> +        mGL->fTexParameteri(LOCAL_GL_TEXTURE_EXTERNAL_OES, LOCAL_GL_TEXTURE_MAG_FILTER, LOCAL_GL_NEAREST);
> +        mGL->fTexParameteri(LOCAL_GL_TEXTURE_EXTERNAL_OES, LOCAL_GL_TEXTURE_MIN_FILTER, LOCAL_GL_NEAREST);
> +    }
> +    mGL->fBindTexture(LOCAL_GL_TEXTURE_EXTERNAL_OES, mSrcTexEGL);

Done.

@@ +653,5 @@
> +    };
> +    EGLImage image = sEGLLibrary.fCreateImage(sEGLLibrary.Display(),
> +                                              EGL_NO_CONTEXT,
> +                                              LOCAL_EGL_NATIVE_BUFFER_ANDROID,
> +                                              grallocImage->GetNativeBuffer(), attrs);

Done.

@@ +656,5 @@
> +                                              LOCAL_EGL_NATIVE_BUFFER_ANDROID,
> +                                              grallocImage->GetNativeBuffer(), attrs);
> +
> +    int oldBinding = 0;
> +    mGL->fGetIntegerv(LOCAL_GL_TEXTURE_BINDING_EXTERNAL_OES, &oldBinding);

Use ScopedBindTexture would do some extra bind here, since the bindTexture call is inside BindAndUploadExternalTexture now.

@@ +678,5 @@
> +{
> +    ScopedBindTextureUnit boundTU(mGL, LOCAL_GL_TEXTURE0);
> +    const PlanarYCbCrData* yuvData = yuvImage->GetData();
> +
> +    GLint oldTex[3];

Done.

@@ +727,5 @@
> +        default:
> +            return false;
> +    }
> +
> +    if (!InitTexQuadProgram(type)) {

Done.

::: gfx/gl/GLBlitHelper.h
@@ +171,5 @@
>                                const gfx::IntSize& destSize,
>                                GLenum srcTarget = LOCAL_GL_TEXTURE_2D,
>                                GLenum destTarget = LOCAL_GL_TEXTURE_2D);
> +    bool BlitImageToTexture(layers::Image* srcImage, const gfx::IntSize& destSize,
> +                            GLuint destTex, GLenum destTarget, bool yFlip = 0, GLuint xoffset = 0,

Done.
Attached patch V7 (obsolete) — Splinter Review
Attachment #8430448 - Attachment is obsolete: true
Attachment #8432912 - Flags: review?(jgilbert)
I'm very glad you guys are actively working to address this issue. I just built and tested the V7 patch against mozilla-central checked out earlier today. On a late 2013 Retina MacBook Pro with an up to 2.9GHz Core i5 (dual core, 4 threads) and Intel Iris 5000 graphics, on the http://codeflow.org/issues/slow_video_to_texture/ page I see:

upload	resolution	frame/s	upload time (ms)	frame/s	upload time (ms)	frame/s	upload time (ms)
	H.264	VP8	Theora
texImage2D	240p	N/A	N/A	56.78	1.55	59.18	1.57
texImage2D	480p	N/A	N/A	59.73	1.72	56.16	1.78
texImage2D	720p	N/A	N/A	59.11	2.53	58.72	2.18
texImage2D	1080p	N/A	N/A	59.38	3.95	59.36	3.65
texSubImage2D	240p	N/A	N/A	59.58	1.27	59.27	1.06
texSubImage2D	480p	N/A	N/A	59.89	1.46	59.79	1.80
texSubImage2D	720p	N/A	N/A	59.05	1.79	58.95	1.80
texSubImage2D	1080p	N/A	N/A	56.79	4.00	52.29	3.68
texImage2D	240p	N/A	N/A	58.01	1.11	59.37	1.14
texImage2D	480p	N/A	N/A	56.78	1.61	59.58	1.65
texImage2D	720p	N/A	N/A	58.82	2.46	59.73	2.12
texImage2D	1080p	N/A	N/A	58.79	3.49	58.65	3.47
texSubImage2D	240p	N/A	N/A	59.13	1.16	56.30	1.46
texSubImage2D	480p	N/A	N/A	59.57	1.51	59.59	1.48
texSubImage2D	720p	N/A	N/A	60.05	1.88	60.06	1.68
texSubImage2D	1080p	N/A	N/A	59.20	3.48	60.03	3.06

There is one caveat though - the canvas is initially grey and when I hit run, it is always black. I saw that this was noted above in comment 19 and those following but the workaround did not work for me (start firefox, have just the card view empty tab open, quit, start firefox again, load and run test.)

I can test things or provide debug logs on request.
(In reply to robert.swain from comment #67)
> I'm very glad you guys are actively working to address this issue. I just
> built and tested the V7 patch against mozilla-central checked out earlier
> today. On a late 2013 Retina MacBook Pro with an up to 2.9GHz Core i5 (dual
> core, 4 threads) and Intel Iris 5000 graphics, on the
> http://codeflow.org/issues/slow_video_to_texture/ page I see:
> 
> upload	resolution	frame/s	upload time (ms)	frame/s	upload time (ms)	frame/s
> upload time (ms)
> 	H.264	VP8	Theora
> texImage2D	240p	N/A	N/A	56.78	1.55	59.18	1.57
> texImage2D	480p	N/A	N/A	59.73	1.72	56.16	1.78
> texImage2D	720p	N/A	N/A	59.11	2.53	58.72	2.18
> texImage2D	1080p	N/A	N/A	59.38	3.95	59.36	3.65
> texSubImage2D	240p	N/A	N/A	59.58	1.27	59.27	1.06
> texSubImage2D	480p	N/A	N/A	59.89	1.46	59.79	1.80
> texSubImage2D	720p	N/A	N/A	59.05	1.79	58.95	1.80
> texSubImage2D	1080p	N/A	N/A	56.79	4.00	52.29	3.68
> texImage2D	240p	N/A	N/A	58.01	1.11	59.37	1.14
> texImage2D	480p	N/A	N/A	56.78	1.61	59.58	1.65
> texImage2D	720p	N/A	N/A	58.82	2.46	59.73	2.12
> texImage2D	1080p	N/A	N/A	58.79	3.49	58.65	3.47
> texSubImage2D	240p	N/A	N/A	59.13	1.16	56.30	1.46
> texSubImage2D	480p	N/A	N/A	59.57	1.51	59.59	1.48
> texSubImage2D	720p	N/A	N/A	60.05	1.88	60.06	1.68
> texSubImage2D	1080p	N/A	N/A	59.20	3.48	60.03	3.06
> 
> There is one caveat though - the canvas is initially grey and when I hit
> run, it is always black. I saw that this was noted above in comment 19 and
> those following but the workaround did not work for me (start firefox, have
> just the card view empty tab open, quit, start firefox again, load and run
> test.)
> 
> I can test things or provide debug logs on request.
The problem mentioned there was a FirefoxOS device only bug, and it should always work on desktop. I do not own any Mac device, and I will try to find one to debug with.

Thanks for test and report!!!
(In reply to robert.swain from comment #67)
> I'm very glad you guys are actively working to address this issue. I just
> built and tested the V7 patch against mozilla-central checked out earlier
> today. On a late 2013 Retina MacBook Pro with an up to 2.9GHz Core i5 (dual
> core, 4 threads) and Intel Iris 5000 graphics, on the
> http://codeflow.org/issues/slow_video_to_texture/ page I see:
> 
> upload	resolution	frame/s	upload time (ms)	frame/s	upload time (ms)	frame/s
> upload time (ms)
> 	H.264	VP8	Theora
> texImage2D	240p	N/A	N/A	56.78	1.55	59.18	1.57
> texImage2D	480p	N/A	N/A	59.73	1.72	56.16	1.78
> texImage2D	720p	N/A	N/A	59.11	2.53	58.72	2.18
> texImage2D	1080p	N/A	N/A	59.38	3.95	59.36	3.65
> texSubImage2D	240p	N/A	N/A	59.58	1.27	59.27	1.06
> texSubImage2D	480p	N/A	N/A	59.89	1.46	59.79	1.80
> texSubImage2D	720p	N/A	N/A	59.05	1.79	58.95	1.80
> texSubImage2D	1080p	N/A	N/A	56.79	4.00	52.29	3.68
> texImage2D	240p	N/A	N/A	58.01	1.11	59.37	1.14
> texImage2D	480p	N/A	N/A	56.78	1.61	59.58	1.65
> texImage2D	720p	N/A	N/A	58.82	2.46	59.73	2.12
> texImage2D	1080p	N/A	N/A	58.79	3.49	58.65	3.47
> texSubImage2D	240p	N/A	N/A	59.13	1.16	56.30	1.46
> texSubImage2D	480p	N/A	N/A	59.57	1.51	59.59	1.48
> texSubImage2D	720p	N/A	N/A	60.05	1.88	60.06	1.68
> texSubImage2D	1080p	N/A	N/A	59.20	3.48	60.03	3.06
> 
> There is one caveat though - the canvas is initially grey and when I hit
> run, it is always black. I saw that this was noted above in comment 19 and
> those following but the workaround did not work for me (start firefox, have
> just the card view empty tab open, quit, start firefox again, load and run
> test.)
> 
> I can test things or provide debug logs on request.
It seems I can not find environment soon, can you make sure the site works as expected without this patch?
I found some strange Shader compilation error on Mac, I will try to fix it and upload a new version.
Attached patch V7 + Mac Fix (obsolete) — Splinter Review
I test on MacMini and fixed the problem: Mac does not allow "precision high float" and "highp vec4" so I just remove them.
And I found sometimes, it fail to switch to correct path, and fix a small texture uploading problem: If the video is not padded, and it's width is not multiple of 4, I have to change the UNPACK_ALIGNMENT before uploading.

And here is the result on Mac mini:
	                   H.264	     VP8	    Theora
texImage2D	240p	N/A	N/A	59.29	0.15	59.91	0.15
texImage2D	480p	N/A	N/A	59.74	0.25	59.78	0.23
texImage2D	720p	N/A	N/A	59.84	0.37	59.96	0.36
texImage2D	1080p	N/A	N/A	59.79	0.78	59.92	0.74
texSubImage2D	240p	N/A	N/A	59.66	0.13	58.94	0.13
texSubImage2D	480p	N/A	N/A	60.05	0.25	59.50	0.23
texSubImage2D	720p	N/A	N/A	59.90	0.38	59.83	0.39
texSubImage2D	1080p	N/A	N/A	60.09	0.74	59.59	0.81
texImage2D	240p	N/A	N/A	59.72	0.14	59.92	0.14
texImage2D	480p	N/A	N/A	59.71	0.24	59.95	0.23
texImage2D	720p	N/A	N/A	59.95	0.38	60.06	0.35
texImage2D	1080p	N/A	N/A	60.07	0.76	60.06	0.78
texSubImage2D	240p	N/A	N/A	59.53	0.14	59.86	0.11
texSubImage2D	480p	N/A	N/A	59.62	0.24	59.93	0.24
texSubImage2D	720p	N/A	N/A	59.77	0.37	60.08	0.34
texSubImage2D	1080p	N/A	N/A	59.26	0.87	59.84	0.77
Attachment #8432912 - Attachment is obsolete: true
Attachment #8432912 - Flags: review?(jgilbert)
Attachment #8438197 - Flags: review?(jgilbert)
With the V7 + Mac Fix patch, everything is working perfectly. :) Well done!

Results from a Late 2013 13" Retina MacBook Pro (Core i5 up to 2.9GHz, Intel Iris 5000 graphics using shared DDR3 1600MHz memory):

upload	resolution	frame/s	upload time (ms)	frame/s	upload time (ms)	frame/s	upload time (ms)
	H.264	VP8	Theora
texImage2D	240p	N/A	N/A	59.08	0.27	59.89	0.23
texImage2D	480p	N/A	N/A	59.92	0.44	59.88	0.41
texImage2D	720p	N/A	N/A	59.70	0.74	60.02	0.54
texImage2D	1080p	N/A	N/A	59.86	1.66	59.99	1.18
texSubImage2D	240p	N/A	N/A	59.87	0.20	59.22	0.29
texSubImage2D	480p	N/A	N/A	60.00	0.76	60.05	0.32
texSubImage2D	720p	N/A	N/A	59.95	0.77	59.56	0.76
texSubImage2D	1080p	N/A	N/A	59.92	1.49	59.91	1.45
texImage2D	240p	N/A	N/A	59.20	0.28	60.04	0.20
texImage2D	480p	N/A	N/A	59.66	0.64	59.48	0.40
texImage2D	720p	N/A	N/A	59.73	0.57	60.02	0.46
texImage2D	1080p	N/A	N/A	58.82	1.73	59.89	1.30
texSubImage2D	240p	N/A	N/A	59.57	0.25	59.71	0.23
texSubImage2D	480p	N/A	N/A	59.68	0.55	59.76	0.55
texSubImage2D	720p	N/A	N/A	59.98	0.57	59.40	0.74
texSubImage2D	1080p	N/A	N/A	58.54	2.20	58.34	2.28
Comment on attachment 8438197 [details] [diff] [review]
V7 +  Mac Fix

Review of attachment 8438197 [details] [diff] [review]:
-----------------------------------------------------------------

TexImage2D_base has a bunch of callers, so if you're removing a check from _base, you need to go through all the callers and add the check there.

::: content/canvas/src/WebGLContextGL.cpp
@@ +3801,2 @@
>  
>      WebGLTexture *tex = activeBoundTextureForTarget(target);

Assert this is non-null.

@@ -3801,5 @@
>  
>      WebGLTexture *tex = activeBoundTextureForTarget(target);
>  
> -    if (!tex)
> -        return ErrorInvalidOperation("texImage2D: no texture is bound to this target");

If you remove this from here, it needs to go in the callers of this function, which are also in this file. I don't see the addition of this code there.

::: gfx/gl/GLBlitHelper.cpp
@@ +12,5 @@
> +
> +#ifdef MOZ_WIDGET_GONK
> +#include "GrallocImages.h"
> +#include "GLLibraryEGL.h"
> +using mozilla::layers::GrallocImage;

Please don't use usings anymore, particularly in ifdefs.

@@ +167,5 @@
> +        mSrcTexCr,
> +        mSrcTexEGL,
> +    };
> +
> +    mSrcTexY = mSrcTexCb = mSrcTexCr = mSrcTexEGL = 0;

Why do you zero these but not mFBO?
Attachment #8438197 - Flags: review?(jgilbert) → review-
Comment on attachment 8438197 [details] [diff] [review]
V7 +  Mac Fix

Review of attachment 8438197 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/canvas/src/WebGLContextGL.cpp
@@ +3801,2 @@
>  
>      WebGLTexture *tex = activeBoundTextureForTarget(target);

This part reverted

@@ -3801,5 @@
>  
>      WebGLTexture *tex = activeBoundTextureForTarget(target);
>  
> -    if (!tex)
> -        return ErrorInvalidOperation("texImage2D: no texture is bound to this target");

Revert this part, because only my case need early rejection

::: gfx/gl/GLBlitHelper.cpp
@@ +12,5 @@
> +
> +#ifdef MOZ_WIDGET_GONK
> +#include "GrallocImages.h"
> +#include "GLLibraryEGL.h"
> +using mozilla::layers::GrallocImage;

Since the class is defined only if MOZ_WIDGET_GONK is defined...I will try to clean it up

@@ +167,5 @@
> +        mSrcTexCr,
> +        mSrcTexEGL,
> +    };
> +
> +    mSrcTexY = mSrcTexCb = mSrcTexCr = mSrcTexEGL = 0;

I just forget it...
Attached patch V8 (obsolete) — Splinter Review
Fix review comment...
Attachment #8438197 - Attachment is obsolete: true
Attachment #8443349 - Flags: review?(jgilbert)
Attached patch V8 (obsolete) — Splinter Review
Missing 1-line...
Attachment #8443349 - Attachment is obsolete: true
Attachment #8443349 - Flags: review?(jgilbert)
Attachment #8443351 - Flags: review?(jgilbert)
It would be great if this landed sooner rather than later so that it didn't miss a release window and get delayed coming out to the masses for some more weeks. :)
Comment on attachment 8443351 [details] [diff] [review]
V8

Review of attachment 8443351 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/canvas/src/WebGLContext.h
@@ +450,5 @@
>      {
>          if (IsContextLost())
>              return;
> +
> +        WebGLTexture *tex = activeBoundTextureForTarget(target);

Star-to-the-left.
Attachment #8443351 - Flags: review?(jgilbert) → review+
I pushed to try:
https://tbpl.mozilla.org/?tree=Try&rev=a59479bbbda9

@Jeff,
Any idea about the Android 4.0 webgl comformant test fail?
I was thinking it a precision problem, but use highp does not help.
Any suggestion?

I can reproduce it on Nexus 5, and most of them says:
should be 0,255,0 was 1,255,2

By the way, I found some test in the page fail on Firefox 28, too.
Flags: needinfo?(jgilbert)
I just tested this revision (a59479bbbda9) on Linux using eleVR-Web-Player [1][2].  It works, and texture upload is over an order of magnitude faster on my laptop with Sandy Bridge IGP graphics.  Thanks!

[1] https://github.com/hawksley/eleVR-Web-Player
[2] player.elevr.com
(In reply to Chiajung Hung [:chiajung] from comment #79)
> I pushed to try:
> https://tbpl.mozilla.org/?tree=Try&rev=a59479bbbda9
> 
> @Jeff,
> Any idea about the Android 4.0 webgl comformant test fail?
> I was thinking it a precision problem, but use highp does not help.
> Any suggestion?
> 
> I can reproduce it on Nexus 5, and most of them says:
> should be 0,255,0 was 1,255,2
> 
> By the way, I found some test in the page fail on Firefox 28, too.

Not sure. Is this int the YCbCr? Maybe try checking the math manually? I really assumed that your YCbCr conversion shader was correct, but maybe it has a mistake.
Flags: needinfo?(jgilbert)
I found the formula is basically correct, but some device just have very bad precision.
The cause of it is: RGB(0, 255, 0) is YCbCr(144.52, 53.795, 34.16) and stored as (145, 54, 34). while converting back to RGB the value in double is (0.132, 255.903, -1.194) and is convert to (0, 255, 0). while some device get strange value like (1, 255, 2) while read back, I found it can be avoided by manipulate the shader output by
gl_FragColor.r = floor(r * 255.0) / 255.0;

This makes (255, 0, 0) case happy, but (0, 255, 0) case is (0, 255, 1) now. And by tweak the Cb offset from 0.5 to 0.504(from 128 to 129) makes them happy. (since Cb rounding makes the problem)

Basically this seems impossible to overcome while ESSL 1.0 not support double in general. And I think such workaround is not harmful, so I think this part is cleared now. And I have more cases to fix...

New try ticket:
https://tbpl.mozilla.org/?tree=Try&rev=5de7e84ec86b
(In reply to Chiajung Hung [:chiajung] from comment #82)
> I found the formula is basically correct, but some device just have very bad
> precision.
> The cause of it is: RGB(0, 255, 0) is YCbCr(144.52, 53.795, 34.16) and
> stored as (145, 54, 34). while converting back to RGB the value in double is
> (0.132, 255.903, -1.194) and is convert to (0, 255, 0). while some device
> get strange value like (1, 255, 2) while read back, I found it can be
> avoided by manipulate the shader output by
> gl_FragColor.r = floor(r * 255.0) / 255.0;
> 
> This makes (255, 0, 0) case happy, but (0, 255, 0) case is (0, 255, 1) now.
> And by tweak the Cb offset from 0.5 to 0.504(from 128 to 129) makes them
> happy. (since Cb rounding makes the problem)
> 
> Basically this seems impossible to overcome while ESSL 1.0 not support
> double in general. And I think such workaround is not harmful, so I think
> this part is cleared now. And I have more cases to fix...
> 
> New try ticket:
> https://tbpl.mozilla.org/?tree=Try&rev=5de7e84ec86b

Is this happening on the Nexus 5?
I don't think we should be being limited by precision here, but I'll take another look.
Also, have you considered that Pure Green [0 255 0] is an illegal colour? (ie. The test is wrong). Video is notorious for having illegal colours, eg. NTSC doesn't allow full red.

Do we know what the source of the YCbCr is? Does it conform to Rec. 601? Studio sources never use the full range of values because of this issue. (Rec. 601 specifies black at value 16 and white at 235).

http://www.poynton.com/notes/colour_and_gamma/ColorFAQ.html#RTFToC29
Yes, it is Nexus 5. I think we have no way to check the "real" video color space here. I think we may not able to know that even in decoder part.

I read a similar topic reveal some similar problem here:
http://www.fourcc.org/fccyvrgb.php

The test case is in fact WebGL conformance test, I don't know whether we can fix conformance test or not.
If yes, I think we should make it tolerant for small error.
(In reply to Dan Glastonbury :djg :kamidphish from comment #85)
> Also, have you considered that Pure Green [0 255 0] is an illegal colour?
> (ie. The test is wrong). Video is notorious for having illegal colours, eg.
> NTSC doesn't allow full red.
> 
> Do we know what the source of the YCbCr is? Does it conform to Rec. 601?
> Studio sources never use the full range of values because of this issue.
> (Rec. 601 specifies black at value 16 and white at 235).
> 
> http://www.poynton.com/notes/colour_and_gamma/ColorFAQ.html#RTFToC29

Generally SD content has been Rec. 601 and HD or newer content has been Rec. 709 as far as I am aware.
Attached file rec601.py (obsolete) —
After much spec reading and Python doodling, I have some data.

We assume Rec601 in our code, though we make some mistakes with our coefficients. Correcting these mistakes reduces our error, but according to Chiajung, we still are getting 0,255,1 on android.

Rec709 is dramatically different, so we'd know if we were getting that instead.

Here's the python script I threw together to do conversions and derivations of terms/matrices.

This just does everything with doubles, so there's no attempt to try to emulate the reduced precision in the shaders.
I talked with :derf, and he confirmed that we shouldn't expect stability here, since YCbCr and RGB are different color spaces. (also YCbCr uses less than [0,255])

I'm still surprised that it doesn't work on the device in question for the given values, but we might be able to blame this on the decoder.
We should do due diligence and dump the YCbCr values the decoder is giving us, to check if we can entirely blame the decoder.

We should probably allow +/-1 fuzziness for these tests, but let's try to check ourselves one last time here.
Attached file rec601.py
Oops, that had errors.
Attachment #8448979 - Attachment is obsolete: true
(In reply to Jeff Gilbert [:jgilbert] from comment #89)
> I talked with :derf, and he confirmed that we shouldn't expect stability
> here, since YCbCr and RGB are different color spaces. (also YCbCr uses less
> than [0,255])
> 
> I'm still surprised that it doesn't work on the device in question for the
> given values, but we might be able to blame this on the decoder.
> We should do due diligence and dump the YCbCr values the decoder is giving
> us, to check if we can entirely blame the decoder.
> 
> We should probably allow +/-1 fuzziness for these tests, but let's try to
> check ourselves one last time here.
Or let's land this first and file some follow ups for:
(1) PBO support
(2) conformance test
(3) Other tests

I think (1)(2) could be in follow ups, but (3) may need fixed to prevent affecting others.
I'll focus on (3) now.
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/f82a8a7c0cb9
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla33
Please fix the coefficients as per bug 1033124.
Status: RESOLVED → REOPENED
Flags: needinfo?(chung)
Resolution: FIXED → ---
(In reply to Jeff Gilbert [:jgilbert] from comment #95)
> Please fix the coefficients as per bug 1033124.

OK, I will create another patch to fix them.
Status: REOPENED → ASSIGNED
Flags: needinfo?(chung)
Target Milestone: mozilla33 → ---
Attached patch Part2, fix yuv to rgb formula (obsolete) — Splinter Review
Change YUV=>RGB formula to a more precise version.
Attachment #8454337 - Flags: review?(jgilbert)
Comment on attachment 8454337 [details] [diff] [review]
Part2, fix yuv to rgb formula

Review of attachment 8454337 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/gl/GLBlitHelper.cpp
@@ +258,4 @@
>              float y = texture2D(uYTexture, vTexCoord * uYTexScale).r;       \n\
>              float cb = texture2D(uCbTexture, vTexCoord * uCbCrTexScale).r;  \n\
>              float cr = texture2D(uCrTexture, vTexCoord * uCbCrTexScale).r;  \n\
> +            y = (y - 0.06275) * 1.16438;                                    \n\

Good, but please include the comment block from bug 1033124 near here.

@@ +262,5 @@
> +            cb = cb - 0.50196;                                              \n\
> +            cr = cr - 0.50196;                                              \n\
> +            gl_FragColor.r = y + cr * 1.59603;                              \n\
> +            gl_FragColor.g = y - 0.81297 * cr - 0.39176 * cb;               \n\
> +            gl_FragColor.b = y + cb * 2.01727;                              \n\

This should be 2.01723 not 2.01727.
Attachment #8454337 - Flags: review?(jgilbert) → review+
See Also: → 1033124
OS: Windows 7 → All
Hardware: x86_64 → All
Attached patch Part2Splinter Review
Comment added. Carry r+.
Try ticket:
https://tbpl.mozilla.org/?tree=Try&rev=18ec98ae9a4c
Attachment #8454337 - Attachment is obsolete: true
Attachment #8455863 - Flags: review+
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/bec2e24b4af8
Status: ASSIGNED → RESOLVED
Closed: 6 years ago6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla33
Retested with Chrome 36.0.1985.125, Firefox Aurora 33.0a2, Safari 6.1.4 and Internet Explorer 11

TL;DR Firefox now outperforms Chrome by around 600% on OSX and Linux.

Summary of result tables, for full results please see: http://codeflow.org/issues/slow_video_to_texture/

Ubuntu:
			FPS	UPms	FPS	UPms	FPS	UPms
Chrome 36	1080p	46.42	8.08	46.22	7.97	46.72	7.80
Chrome 35	1080p	45.28	8.02	45.10	8.03	45.39	7.98
Firefox Aurora	1080p	60.04	1.00	60.04	1.14	60.02	1.18

OSX:
			FPS	UPms	FPS	UPms	FPS	UPms
Chrome 36	1080p	59.56	9.91	59.72	9.96	59.78	9.91
Chrome 35	1080p	59.77	13.47	58.88	14.51	59.85	13.69
Firefox Aurora	1080p	N/A	N/A	59.50	1.54	59.07	1.50
Safari 6.1.4	1080p	20.07	34.94	N/A	N/A	N/A	N/A

Windows 8.1:

			FPS	UPms	FPS	UPms	FPS	UPms
Chrome 36	1080p	51.22	11.53	59.53	7.16	59.23	7.23
Firefox Aurora	1080p			57.14	9.55	56.89	9.41
IE11		video not supported for upload

Conclusion: Firefox has made good efforts in its Aurora build (scheduled to be beta early September, and release early October) on OSX and Linux, these now outperform Chrome by a substantial margin (by ~600%) and pushed upload times into the range of 1-2ms. Chrome 36 has also made some gains on Chrome 35 on OSX (by ~45%). Firefox has yet to implement their improvements on Windows and has also got to deal with bugs in their h.264 support on Windows.

Firefox issues:

 - The optimization from this ticket has not made it to the Firefox Aurora windows build. Is this a new error? Should I open a ticket for it? Or is this tracked by this ticket and it should be reopened?
 - The H.264 implementation of Firefox is buggy. It's quite jittery and often crashes (I have not looked for a ticket for this yet, is this known?)
I found the blit shader init failed on Windows, so we fallback to CPU path. I am checking why it fails.
Attached patch part 3Splinter Review
Fix the Windows part.
The problem there is ANGLE complains about no precision qualifier for float, which was removed due to MacOS can not recognize 'precision' qualifier.

The solution here is using #ifdef GL_ES to comfort both ANGLE and MacOS.
Attachment #8480438 - Flags: review?(jgilbert)
Probably worth creating a separate bug for this and the patch?  Otherwise, chances are that different patches on this bug would end up in different releases and that would just be weird.
(In reply to Chiajung Hung [:chiajung] from comment #104)
> Created attachment 8480438 [details] [diff] [review]
> part 3
> 
> Fix the Windows part.
> The problem there is ANGLE complains about no precision qualifier for float,
> which was removed due to MacOS can not recognize 'precision' qualifier.
> 
> The solution here is using #ifdef GL_ES to comfort both ANGLE and MacOS.

Precision qualifiers are only effective for GLSL ES. Desktop GLSL either doesn't support them, or leaves them as no-ops.
TIL:

> Precision qualifiers are added for code portability with OpenGL ES, not for functionality. They have the
> same syntax as in OpenGL ES, as described below, but they have no semantic meaning, which includes no
> effect on the precision used to store or operate on variables.
Comment on attachment 8480438 [details] [diff] [review]
part 3

Review of attachment 8480438 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks, but do this in its own bug.
Attachment #8480438 - Flags: review?(jgilbert)
(In reply to Milan Sreckovic [:milan] from comment #105)
> Probably worth creating a separate bug for this and the patch?  Otherwise,
> chances are that different patches on this bug would end up in different
> releases and that would just be weird.

I agree.
Blocks: 1060121
You need to log in before you can comment on or make changes to this bug.