Open Bug 999700 Opened 10 years ago Updated 2 years ago

gl.texImage2D and gl.texSubImage2D not taking fast upload path when dstStride != srcStride.

Categories

(Core :: Graphics: CanvasWebGL, enhancement, P3)

27 Branch
x86
macOS
enhancement

Tracking

()

UNCONFIRMED

People

(Reporter: alec, Unassigned)

Details

(Whiteboard: webgl-perf)

User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/34.0.1847.116 Safari/537.36

Steps to reproduce:

WebGLContext.cpp tests whether srcStride and dstStride are equal in both gl.texImage2D and gl.texSubImage2D, and when they differ it skips the fast path (passing the source pointer straight through to the GL call). Instead it repacks the data into a temporary buffer padded out to dstStride. This copy is unnecessary and slows uploads. One good example is copying small tiles into a larger texture, which is a fairly common operation.
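To make the stride mismatch concrete, here is a small sketch (helper names are illustrative, not Gecko's) of how the source and destination strides diverge when uploading a narrow region out of a wider buffer:

```javascript
// Sketch of the stride computation. srcStride is the bytes per row in the
// caller's buffer; dstStride is the tightly packed bytes per row of the
// region actually being uploaded. Both are rounded up to UNPACK_ALIGNMENT.
function strides(bufferWidth, uploadWidth, bytesPerPixel, alignment = 4) {
  const align = (n) => Math.ceil(n / alignment) * alignment;
  return {
    srcStride: align(bufferWidth * bytesPerPixel),
    dstStride: align(uploadWidth * bytesPerPixel),
  };
}

// Uploading a 1-pixel-wide column out of a 2049-pixel-wide RGBA buffer:
const s = strides(2049, 1, 4);
// s.srcStride === 8196, s.dstStride === 4 -> strides differ, slow path taken
```

Whenever the strides differ, the code above the fast-path check falls through to ConvertImage and an extra allocation plus copy.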



Actual results:

void
WebGLContext::TexImage2D_base(GLenum target, GLint level, GLenum internalformat,
                              GLsizei width, GLsizei height, GLsizei srcStrideOrZero,
                              GLint border,
                              GLenum format, GLenum type,
                              void *data, uint32_t byteLength,
                              int jsArrayType, // a TypedArray format enum, or -1 if not relevant
                              WebGLTexelFormat srcFormat, bool srcPremultiplied)
{

 if (actualSrcFormat == dstFormat &&
            srcPremultiplied == mPixelStorePremultiplyAlpha &&
            srcStride == dstStride && // <-- this should not be tested
            !mPixelStoreFlipY)
        {
            // no conversion, no flipping, so we avoid copying anything and just pass the source pointer
            error = CheckedTexImage2D(target, level, internalformat,
                                      width, height, border, format, type, data);
        }
        else
        {
            size_t convertedDataSize = height * dstStride;
            nsAutoArrayPtr<uint8_t> convertedData(new uint8_t[convertedDataSize]);
            ConvertImage(width, height, srcStride, dstStride,
                        static_cast<uint8_t*>(data), convertedData,
                        actualSrcFormat, srcPremultiplied,
                        dstFormat, mPixelStorePremultiplyAlpha, dstTexelSize);
            error = CheckedTexImage2D(target, level, internalformat,
                                      width, height, border, format, type, convertedData);
        }



void
WebGLContext::TexSubImage2D_base(GLenum target, GLint level,
                                 GLint xoffset, GLint yoffset,
                                 GLsizei width, GLsizei height, GLsizei srcStrideOrZero,
                                 GLenum format, GLenum type,
                                 void *pixels, uint32_t byteLength,
                                 int jsArrayType,
                                 WebGLTexelFormat srcFormat, bool srcPremultiplied)
{ 
...

if (actualSrcFormat == dstFormat &&
        srcPremultiplied == mPixelStorePremultiplyAlpha &&
        srcStride == dstStride &&  // <-- this should not be tested
        !mPixelStoreFlipY)
    {
        // no conversion, no flipping, so we avoid copying anything and just pass the source pointer
        gl->fTexSubImage2D(target, level, xoffset, yoffset, width, height, format, realType, pixels);
    }
    else
    {
        size_t convertedDataSize = height * dstStride;
        nsAutoArrayPtr<uint8_t> convertedData(new uint8_t[convertedDataSize]);
        ConvertImage(width, height, srcStride, dstStride,
                    static_cast<const uint8_t*>(pixels), convertedData,
                    actualSrcFormat, srcPremultiplied,
                    dstFormat, mPixelStorePremultiplyAlpha, dstTexelSize);

        gl->fTexSubImage2D(target, level, xoffset, yoffset, width, height, format, realType, convertedData);
    }



Expected results:

Uploads should always take the fast path. texSubImage2D can handle mismatched-stride uploads, and it is the only way to copy smaller images into larger ones.
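Until the fast path covers mismatched strides, a caller-side workaround is to repack the subrect into a tightly packed buffer before calling texSubImage2D, so the source and destination strides match. A hedged sketch (the function name is hypothetical):

```javascript
// Copy a subrect out of a larger buffer into a tightly packed buffer, so the
// upload's source stride equals its destination stride and the browser's
// fast path applies.
function extractSubrect(src, srcWidth, x, y, w, h, bpp = 4) {
  const out = new Uint8Array(w * h * bpp);
  for (let row = 0; row < h; row++) {
    const srcOff = ((y + row) * srcWidth + x) * bpp;
    out.set(src.subarray(srcOff, srcOff + w * bpp), row * w * bpp);
  }
  return out;
}

// usage sketch:
// gl.texSubImage2D(gl.TEXTURE_2D, 0, x, y, w, h, gl.RGBA, gl.UNSIGNED_BYTE,
//                  extractSubrect(atlasData, atlasWidth, x, y, w, h));
```

This trades one extra copy in JavaScript for avoiding the browser-side conversion path; whether it wins depends on the sizes involved.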
Whiteboard: webgl-perf
I can't get the slow path to be taken. Here's my small test app. My builds of Firefox don't report variables at breakpoints, so while I can see which code paths are taken, I can't inspect them to see why. This may not be a bug after all, but I thought these were the cases that would fail.


<!DOCTYPE html>
<html>
<head>
	<meta charset="utf-8" />
	<title>Example</title>
</head>
<body>
<script type="text/javascript">

var firstTime = true;

function draw() {
	try {
		var gl = document.getElementById("webgl")
			.getContext("experimental-webgl");
		if (!gl) { throw "x"; }
	} catch (err) {
		throw "Your web browser does not support WebGL!";
	}
	
	if (!firstTime)
		return;
	
	console.log("test upload begin");
	
	firstTime = false; // run the test only once
	
	// allocate a buffer large enough for the full texture upload
	var width = 2049;
	var height = 2048; // height must be < width for uploads below
	var data = new Uint8Array(width * height * 4);
	
	var id = gl.createTexture();
	gl.bindTexture(gl.TEXTURE_2D, id);
	
	gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MAG_FILTER, gl.LINEAR);
	gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, gl.LINEAR);
	gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_S, gl.CLAMP_TO_EDGE);
	gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_T, gl.CLAMP_TO_EDGE);
   
	// create a buffer and upload to it, note that data will be padded if not same size
	gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, width, height, 0, gl.RGBA, gl.UNSIGNED_BYTE, data);
	
	// this doesn't take the slow path, but why?
	gl.texSubImage2D(gl.TEXTURE_2D, 0, 0,0, 1, height, gl.RGBA, gl.UNSIGNED_BYTE, data);
	
	gl.texSubImage2D(gl.TEXTURE_2D, 0, 0,0, 1, 1, gl.RGBA, gl.UNSIGNED_BYTE, data);

	gl.texSubImage2D(gl.TEXTURE_2D, 0, 0,0, height, 1, gl.RGBA, gl.UNSIGNED_BYTE, data);
	
	gl.bindTexture(gl.TEXTURE_2D, null);
	
	console.log("test upload end");
	
}

function init() {
	try {
		draw();
	} catch (e) {
		alert("Error: "+e);
	}
}
setTimeout(init, 100);

</script>
<canvas id="webgl" width="640" height="480"></canvas>
</body>
</html>
Component: Untriaged → Canvas: WebGL
Product: Firefox → Core
Type: defect → enhancement
Priority: -- → P3

I would like to get input from some experts, because our app/library seems to be severely affected by the bottleneck described here. It's difficult for me to make sense of or debug the current C++ sources, but here's the situation:

We are dynamically populating a large texture atlas via several texSubImage2D calls. This works very well, but it is dramatically (an order of magnitude) slower in Firefox than in all the other WebGL2-capable browsers we tested. In fact, the speed seems to be the same as when we upload the whole texture atlas image every time, even though the atlas is often tens of thousands of times larger than the small subimages we create and copy incrementally. We therefore believe we are hitting a case like the one above, where the strides mismatch and a very slow path is chosen instead of the actual "subtexcopy".

What is the best way in Firefox to modify a large texture incrementally? Isn't texSubImage2D the way to go?
What would you say is the current state of the implementation? Is copying a large canvas really the same cost as copying only a tiny fraction of it? This is marked as an enhancement, but considering the huge performance difference from other implementations, if this can be fixed I would rather see it marked as a defect than as a possible enhancement.
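To illustrate the scale of the difference, a quick back-of-the-envelope calculation (sizes are illustrative, not measurements from our app) of the bytes moved by incremental subimage updates versus re-uploading the whole atlas:

```javascript
// Bytes uploaded for N small tile updates vs. re-uploading the whole atlas.
// All sizes are illustrative placeholders.
function uploadBytes(atlasW, atlasH, tileW, tileH, tiles, bpp = 4) {
  const full = atlasW * atlasH * bpp;          // one full-atlas upload
  const incremental = tiles * tileW * tileH * bpp; // N small texSubImage2D calls
  return { full, incremental, ratio: full / incremental };
}

const r = uploadBytes(4096, 4096, 32, 32, 16);
// r.ratio === 1024: the full re-upload moves ~1000x more data than the
// incremental updates, so the two should not cost the same
```

If the incremental path is falling back to a conversion copy, that factor is exactly the performance being left on the table.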

Here's our demo (no sources for now, sorry): https://live.yworks.com/demos/view/rendering-optimizations/index.html (enable "node labels" and then switch to the large "balloon" example in the drop-down in the top right corner). This takes about 10 times as long (20 seconds) in Firefox as it does in Edgium on Windows on my machine.

As a side note: Could this be related to the following warnings we get (once) on the console?

WebGL warning: texSubImage: Texture has not been initialized prior to a partial upload, forcing the browser to clear it. This may be slow.
WebGL warning: texSubImage: Tex image TEXTURE_2D level 0 is incurring lazy initialization.
WebGL warning: copyTexSubImage: Texture has not been initialized prior to a partial upload, forcing the browser to clear it. This may be slow.
WebGL warning: copyTexSubImage: Tex image TEXTURE_2D level 0 is incurring lazy initialization.

We found that by actually initializing the large texture we get rid of the warning, but ironically the initialization is far more expensive than "forcing the browser to clear it", so ignoring the warning seems to be the best alternative.

Thanks!

TexSubImage is probably what you want, yes. What are your actual calls and state? What does "copying a large canvas" mean? Copying from canvases is generally slower in Firefox, particularly if you're copying from another WebGL canvas. You would need to describe more about what you're trying to do and how you're doing it.

The initialization warnings are pushing you toward doing a glClear before using only part of a resource. The clearing code is more complicated than you would expect, so sometimes our internal lazy clear doesn't hit the glClear/glClearBuffer fast path and instead callocs and uploads zeros. The warnings are nudging you to "please clear efficiently".

These sound like multiple bugs, and I'm not sure how connected they are to this bug. Please file new bugs for each of the issues you are running into.

I filed a bug (https://bugzilla.mozilla.org/show_bug.cgi?id=1719154) on the texSubImage2D performance issue, which occurs independently of any initialization warnings. See https://jsfiddle.net/gnfu7tyo/1/

Severity: normal → S3