Closed
Bug 743585
Opened 13 years ago
Closed 11 years ago
Webgl hits the slow path for glReadPixels with LLVMpipe
Categories
(Core :: Graphics: CanvasWebGL, defect)
RESOLVED
WORKSFORME
People
(Reporter: cosinusoidaly, Unassigned)
References
Details
(Whiteboard: webgl-driver)
Attachments
(2 files)
I've been using LLVMpipe (Mesa 8.0.2, LLVM 3.0) under Ubuntu 10.10 to render webgl content. I've noticed that Firefox seems to be hitting the slow path in Mesa readpixels code (Mesa-8.0.2/src/mesa/main/readpix.c). If I modify mesa's code I can get a decent speed up in Firefox webgl rendering (if the particular demo is not too shader heavy). I think Firefox may be attempting to read RGBA data, which is then causing it to go through the fallback, though from Benoit Jacob's reply elsewhere (731836), that may not be the case.
I originally posted the following on another bug:
I've also hacked my version of Mesa to fast path glReadPixels. With Mesa 8.0.2 the developers significantly improved the performance of glReadPixels. Problem is, the performance boost for llvmpipe is for BGRA data only. Firefox seems to read RGBA data, so it doesn't benefit at all. I bodged the code to force a fast path for RGBA data. My patch regresses BGRA performance (and rendering) but boosts RGBA performance (on my machine) by up to 850%.
Before (stats from mesa-demos-8.0.1/src/perf/readpixels):
glReadPixels(10 x 10, RGBA/ubyte): 12002.9 images/sec, 4.6 Mpixels/sec
glReadPixels(100 x 100, RGBA/ubyte): 2229.7 images/sec, 85.1 Mpixels/sec
glReadPixels(500 x 500, RGBA/ubyte): 109.8 images/sec, 104.7 Mpixels/sec
glReadPixels(1000 x 1000, RGBA/ubyte): 27.8 images/sec, 106.2 Mpixels/sec
glReadPixels(10 x 10, BGRA/ubyte): 12760.1 images/sec, 4.9 Mpixels/sec
glReadPixels(100 x 100, BGRA/ubyte): 11959.1 images/sec, 456.2 Mpixels/sec
glReadPixels(500 x 500, BGRA/ubyte): 5736.7 images/sec, 5470.9 Mpixels/sec
glReadPixels(1000 x 1000, BGRA/ubyte): 567.0 images/sec, 2162.9 Mpixels/sec
After:
glReadPixels(10 x 10, RGBA/ubyte): 12226.9 images/sec, 4.7 Mpixels/sec
glReadPixels(100 x 100, RGBA/ubyte): 8047.2 images/sec, 307.0 Mpixels/sec
glReadPixels(500 x 500, RGBA/ubyte): 973.4 images/sec, 928.3 Mpixels/sec
glReadPixels(1000 x 1000, RGBA/ubyte): 250.5 images/sec, 955.5 Mpixels/sec
glReadPixels(10 x 10, BGRA/ubyte): 12291.1 images/sec, 4.7 Mpixels/sec
glReadPixels(100 x 100, BGRA/ubyte): 8287.3 images/sec, 316.1 Mpixels/sec
glReadPixels(500 x 500, BGRA/ubyte): 944.6 images/sec, 900.9 Mpixels/sec
glReadPixels(1000 x 1000, BGRA/ubyte): 239.9 images/sec, 915.2 Mpixels/sec
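For anyone wanting to turn the images/sec figures above into ratios or percentages, here is a minimal sketch. The function names are made up for this example; nothing here comes from mesa-demos.

```c
#include <assert.h>

/* Ratio of two images/sec figures; above 1.0 means "after" is faster. */
double readpixels_speedup(double before, double after)
{
    assert(before > 0.0);
    return after / before;
}

/* The same comparison expressed as a percentage increase over "before". */
double readpixels_percent_increase(double before, double after)
{
    return (readpixels_speedup(before, after) - 1.0) * 100.0;
}
```

For example, readpixels_speedup(109.8, 973.4) covers the 500 x 500 RGBA case, and readpixels_speedup(5736.7, 944.6) covers the 500 x 500 BGRA case, which comes out below 1.0 because the patch regresses BGRA.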
Patch (a horrible hack, but it works for firefox):
--- src/mesa/main/readpix.c 2012-04-08 18:14:38.263151001 +0100
+++ src/mesa/main/readpix.c 2012-04-08 19:30:21.811151008 +0100
@@ -209,8 +209,6 @@
GLubyte *dst, *map;
int dstStride, stride, j, texelBytes;
- if (!_mesa_format_matches_format_and_type(rb->Format, format, type))
- return GL_FALSE;
/* check for things we can't handle here */
if (packing->SwapBytes ||
@@ -240,10 +238,19 @@
}
texelBytes = _mesa_get_format_bytes(rb->Format);
+
+ uint32_t dst_off=0,map_off=0;
+ uint32_t k;
+
for (j = 0; j < height; j++) {
- memcpy(dst, map, width * texelBytes);
- dst += dstStride;
- map += stride;
+ for(k=0;k<width * texelBytes;k=k+4){
+ dst[dst_off+k+2]=map[map_off+k+0];
+ dst[dst_off+k+1]=map[map_off+k+1];
+ dst[dst_off+k+0]=map[map_off+k+2];
+ dst[dst_off+k+3]=map[map_off+k+3];
+ }
+ dst_off += dstStride;
+ map_off += stride;
}
ctx->Driver.UnmapRenderbuffer(ctx, rb);
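As a standalone illustration of what the inner loop in the hack above does, here is a minimal sketch of a row-by-row copy that swaps bytes 0 and 2 of each 4-byte pixel. The name copy_swap_rb and its parameters are hypothetical and not part of Mesa. Unlike the patch, it keeps the strides signed and works with pointer offsets rather than unsigned running offsets; whether a negative stride can actually occur on this code path is not something checked here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy a width x height image of 4-byte pixels from src to dst, swapping
 * bytes 0 and 2 of every pixel (RGBA <-> BGRA). Strides are in bytes and
 * may be larger than width * 4 when rows are padded.
 * Hypothetical helper for illustration, not Mesa code. */
void copy_swap_rb(uint8_t *dst, ptrdiff_t dst_stride,
                  const uint8_t *src, ptrdiff_t src_stride,
                  size_t width, size_t height)
{
    assert(dst != NULL && src != NULL);

    for (size_t row = 0; row < height; row++) {
        const uint8_t *s = src + (ptrdiff_t)row * src_stride;
        uint8_t *d = dst + (ptrdiff_t)row * dst_stride;

        for (size_t col = 0; col < width; col++) {
            d[4 * col + 0] = s[4 * col + 2]; /* byte 2 -> byte 0 */
            d[4 * col + 1] = s[4 * col + 1]; /* unchanged */
            d[4 * col + 2] = s[4 * col + 0]; /* byte 0 -> byte 2 */
            d[4 * col + 3] = s[4 * col + 3]; /* unchanged */
        }
    }
}
```

Applying it twice gives back the original data, which is a quick way to sanity check it.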
Comment 1•13 years ago
First of all let's check that you don't have layers acceleration enabled (it's not on by default on linux). The about:support page should say: "GPU Accelerated Windows: 0". Non-accelerated layers is what we want to optimize for here, since people who get layers acceleration almost certainly also get hardware-accelerated WebGL, hence don't need LLVMpipe.
Your findings surprise me, because _in theory_ we should be reading back the WebGL canvas into BGRA format, not RGBA. The reading back occurs in GLContext::ReadPixelsIntoImageSurface. This calls GLContext::GetOptimalReadFormats to determine which format to read in. This is in gfx/gl/GLContext.cpp. Useful code search tools include http://mxr.mozilla.org/mozilla-central/ and http://dxr.mozilla.org/ .
I tried here (desktop linux, default config) and it does read back in BGRA format. Checked by setting a breakpoint in GLContext::ReadPixelsIntoImageSurface.
What you could do is open gfx/gl/GLContext.h, search for fReadPixels, set a breakpoint there (or add a printf) and check what formats are really used. That would be the definitive experiment.
Also, it would be very useful to profile this, to understand the speed difference. Build firefox with --enable-profiling, and use perf (the linux profiler).
Reporter
Comment 2•13 years ago
I had a bit more of a dig around in Mesa. I think you are right that Firefox *is* attempting to read back in BGRA format. I still don't understand why if I force the fast path by doing:
@@ -209,8 +209,6 @@
GLubyte *dst, *map;
int dstStride, stride, j, texelBytes;
- if (!_mesa_format_matches_format_and_type(rb->Format, format, type))
- return GL_FALSE;
I then get a performance boost, but on screen the R and B channels are flipped.
I think this might be a Mesa bug, or failing that they may be able to understand what's going on better than I can. I'll file a bug report over there and see what they say.
Reporter
Comment 3•13 years ago
From about:support
Adapter Description: VMware, Inc. -- Gallium 0.4 on llvmpipe (LLVM 0x300)
Driver Version: 2.1 Mesa 8.0.2
WebGL Renderer: VMware, Inc. -- Gallium 0.4 on llvmpipe (LLVM 0x300) -- 2.1 Mesa 8.0.2
GPU Accelerated Windows: 0
Reporter
Comment 4•13 years ago
I'm very short of disk space at the moment so I am unable to build firefox. I have managed to build a debug build of mesa and attached gdb to a running firefox instance.
I think the description below is a bit convoluted. tl;dr version: Firefox creates an RGBA GLX pixmap, draws to it, then reads it back with glReadPixels. The format requested in glReadPixels is BGRA. Because the pixmap format and the glReadPixels format differ, you hit the slow path.
Longer version:
I set a break point on glXCreateNewContext. From the backtrace, glXCreateNewContext was called with the following arguments:
#1 0xa9290bc7 in glXCreateNewContext (dpy=0xb757d000, config=0xa274e790,
renderType=32788, shareCtx=0xaa9857c0, direct=1) at glx_api.c:2118
so renderType is 32788. From GL/glxext.h that is GLX_RGBA_TYPE (0x8014). So I assume you are creating an RGBA context to draw all your GL content to.
I don't have a debug build of firefox but from mxr I think the relevant call is
http://mxr.mozilla.org/mozilla-central/source/gfx/gl/GLContextProviderGLX.cpp#710
ie
context = sGLXLibrary.xCreateNewContext(display,
cfg,
GLX_RGBA_TYPE,
shareContext ? shareContext->mContext : NULL,
True);
I then added a breakpoint to fast_read_rgba_pixels_memcpy (which will be called when you call glReadPixels). I get the following:
Breakpoint 5, fast_read_rgba_pixels_memcpy (ctx=0xa11e8000, x=0, y=0,
width=500, height=500, format=32993, type=33639, pixels=0x9f401000,
packing=0xbfb7a2e0, transferOps=2048) at main/readpix.c:208
208 struct gl_renderbuffer *rb = ctx->ReadBuffer->_ColorReadBuffer;
(gdb) step
212 if (!_mesa_format_matches_format_and_type(rb->Format, format, type))
(gdb) step
_mesa_format_matches_format_and_type (gl_format=MESA_FORMAT_RGBA8888_REV,
format=32993, type=33639) at main/formats.c:2528
2528 const GLboolean littleEndian = _mesa_little_endian();
Which confirms that your render buffer is MESA_FORMAT_RGBA8888_REV, i.e. an RGBA render buffer. It also confirms that you are attempting to read BGRA (format 32993 which, from GL/gl.h, is GL_BGRA (0x80E1)).
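To make the decimal values in the trace easier to read, here is a small sketch that mirrors the single comparison gdb shows at formats.c:2552 for a MESA_FORMAT_RGBA8888_REV render buffer. The enum values are the standard OpenGL ones; the SKETCH_ names and both functions are made up for this example, and the real Mesa function handles many more formats than this one case.

```c
#include <assert.h>

/* Values from the standard OpenGL headers. The SKETCH_ prefix is only
 * there to avoid clashing with the real macros. */
enum {
    SKETCH_GL_RGBA                     = 0x1908, /* 6408  */
    SKETCH_GL_BGRA                     = 0x80E1, /* 32993 */
    SKETCH_GL_UNSIGNED_INT_8_8_8_8_REV = 0x8367  /* 33639 */
};

/* Mirrors the one comparison seen in the trace for a
 * MESA_FORMAT_RGBA8888_REV render buffer. Not Mesa's real function. */
int matches_rgba8888_rev(unsigned format, unsigned type)
{
    return format == (unsigned)SKETCH_GL_RGBA &&
           type == (unsigned)SKETCH_GL_UNSIGNED_INT_8_8_8_8_REV;
}

/* Replays the request seen in the trace above. */
void check_traced_request(void)
{
    /* format=32993 is GL_BGRA: no match, so the memcpy path returns GL_FALSE. */
    assert(!matches_rgba8888_rev(32993, 33639));
    /* The same type with GL_RGBA (6408) would have matched. */
    assert(matches_rgba8888_rev(6408, 33639));
}
```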
If I then step a couple more times Firefox then falls off the fast path as your requested format is BGRA but the source format is RGBA:
2528 const GLboolean littleEndian = _mesa_little_endian();
(gdb) step
_mesa_little_endian () at main/imports.h:522
522 const GLuint ui = 1; /* intentionally not static */
(gdb) step
523 return *((const GLubyte *) &ui);
(gdb) step
524 }
(gdb) step
_mesa_format_matches_format_and_type (gl_format=MESA_FORMAT_RGBA8888_REV,
format=32993, type=33639) at main/formats.c:2539
2539 switch (gl_format) {
(gdb) step
2552 return ((format == GL_RGBA && type == GL_UNSIGNED_INT_8_8_8_8_REV));
(gdb) step
2841 }
(gdb) step
fast_read_rgba_pixels_memcpy (ctx=0xa11e8000, x=0, y=0, width=500, height=500,
format=32993, type=33639, pixels=0x9f401000, packing=0xbfb7a2e0,
transferOps=2048) at main/readpix.c:213
213 return GL_FALSE;
(gdb) step
252 }
(gdb) step
read_rgba_pixels (ctx=0xa11e8000, x=0, y=0, width=500, height=500,
format=32993, type=33639, pixels=0x9f401000, packing=0xbfb7a2e0)
at main/readpix.c:337
337 slow_read_rgba_pixels(ctx, x, y, width, height,
(gdb) step
slow_read_rgba_pixels (ctx=0xa11e8000, x=0, y=0, width=500, height=500,
format=32993, type=33639, pixels=0x9f401000, packing=0xbfb7a2e0,
transferOps=2048) at main/readpix.c:263
263 struct gl_renderbuffer *rb = ctx->ReadBuffer->_ColorReadBuffer;
Comment 5•13 years ago
So this looks like an effective dupe of bug 729726. Most implementations tend to use BGRA internally, so BGRA readback is fastest. (It's pretty easy to see ANGLE doing this in its code) We should, however, deliberately request BGRA where available, and not rely on what is 'common' on platforms.
It's relevant to note that we never draw to the pixmap/pbuffer (I forget which) we get from GLX, and instead only render to/from framebuffer objects backed by textures for sharing. Currently, we just request RGBA textures, but we should really try to request a backing texture with BGRA if we're going to be reading back as BGRA.
Not marking as dupe yet, in case Mesa does not expose that it can back with BGRA textures.
Can you dump the list of GL extensions llvmpipe supplies and post it here?
Status: UNCONFIRMED → NEW
Ever confirmed: true
Reporter
Comment 6•13 years ago
Comment 7•13 years ago
Comment on attachment 613469 [details]
glx output
Great, we have GL_EXT_bgra, which should allow us to create a BGRA framebuffer backing.
Comment 8•13 years ago
(In reply to Jeff Gilbert [:jgilbert] from comment #7)
> Comment on attachment 613469 [details]
> glx output
>
> Great, we have GL_EXT_bgra, which should allow us to create a BGRA
> framebuffer backing.
It turns out this extension doesn't supply this functionality. More on this in bug 729726. It looks like there is little we can do within GL, but we *can* blocklist on LLVMPipe to disable BGRA-readback. However, what should likely be done is an upstream fix of their RGBA=>BGRA slow path. It should be much faster than it is.
It is also clear we should have a hidden pref to disable BGRA-readback for performance debugging.
PS: Haha, it looks like they're using a quite-slow unpack+pack two step conversion, which we know to be much slower than array look-ups, which is slower still than bit-shift swizzling. (We use a similar method at the moment for texture format conversions, and it's coming out soon in favor of a much faster lib I'm working on)
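For reference, the bit-shift swizzling mentioned above can be sketched roughly as follows. This is only an illustration with made-up function names, not code from Mesa or from the library mentioned in this comment. It assumes 32-bit pixels on a little-endian machine, where bytes 0 and 2 in memory are bits 0-7 and 16-23 of the word.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Swap bits 0-7 with bits 16-23 of a packed 32-bit pixel and leave the
 * other two bytes in place. On little-endian this turns RGBA bytes in
 * memory into BGRA bytes and vice versa. */
uint32_t swap_rb_u32(uint32_t p)
{
    return (p & 0xFF00FF00u)          /* bytes 1 and 3 stay where they are */
         | ((p & 0x00FF0000u) >> 16)  /* byte 2 moves down to byte 0 */
         | ((p & 0x000000FFu) << 16); /* byte 0 moves up to byte 2 */
}

/* Convert a row of count pixels; dst and src may be the same buffer. */
void swap_rb_row(uint32_t *dst, const uint32_t *src, size_t count)
{
    assert(dst != NULL && src != NULL);

    for (size_t i = 0; i < count; i++)
        dst[i] = swap_rb_u32(src[i]);
}
```

The whole conversion is three masks, two shifts and two ORs per pixel, with no intermediate unpacked representation.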
Reporter
Comment 9•13 years ago
Shall I go ahead and file a bug with mesa?
Just a clarification on the stats above. With the readpixels benchmark (see below) the backing store is MESA_FORMAT_ARGB8888 (which I assume is BGRA, little endian). So, the fast path is hit for BGRA. With FF the backing store is RGBA, so it hits the fast path for RGBA.
I think strictly my description is still subtly wrong. And the description I am about to offer is probably subtly wrong too, but in a subtly different way :P
fast_read_rgba_pixels_memcpy calls ctx->Driver.MapRenderbuffer, which causes llvmpipe to rasterize its internal backing store (I think it's some kind of tiled render buffer) into a temporary, linear frame buffer. Anyway, the point I'm trying to make is that the "backing store" I refer to above is actually created on the fly when you call glReadPixels, so the actual pixel order llvmpipe renders to internally may even be different; it may just swizzle the pixels when you do ctx->Driver.MapRenderbuffer.
I hope that kind of makes sense. I'd write a more succinct explanation, but I'm half asleep atm.
From readpixels (on a different machine)
Breakpoint 1, glReadPixels () at glapi_x86-64.S:9163
9163 pushq %rdi
(gdb) break fast_read_rgba_pixels_memcpy
Breakpoint 2 at 0x7ffff66bf581: file main/readpix.c, line 208.
(gdb) continue
Continuing.
Breakpoint 2, fast_read_rgba_pixels_memcpy (ctx=0x6b9780, x=567, y=475,
width=10, height=10, format=6408, type=5121, pixels=0x78cfd0,
packing=0x7fffffffda00, transferOps=2048) at main/readpix.c:208
208 struct gl_renderbuffer *rb = ctx->ReadBuffer->_ColorReadBuffer;
(gdb) step
212 if (!_mesa_format_matches_format_and_type(rb->Format, format, type))
(gdb) step
_mesa_format_matches_format_and_type (gl_format=MESA_FORMAT_ARGB8888,
format=6408, type=5121) at main/formats.c:2528
2528 const GLboolean littleEndian = _mesa_little_endian();
Comment 10•13 years ago
(In reply to Liam Wilson from comment #9)
> Shall I go ahead and file a bug with mesa?
>
> Just a clarification on the stats above. With the readpixels benchmark (see
> below) the backing store is MESA_FORMAT_ARGB8888 (which I assume is BGRA,
> little endian). So, the fast path is hit for BGRA. With FF the backing store
> is RGBA, so it hits the fast path for RGBA.
Interesting. How do they get a BGRA renderbuffer for this? Do they store RGBA RBs as BGRA, but RGBA textures as RGBA?
Reporter
Comment 11•13 years ago
(In reply to Jeff Gilbert [:jgilbert] from comment #10)
> (In reply to Liam Wilson from comment #9)
> > Shall I go ahead and file a bug with mesa?
> >
> > Just a clarification on the stats above. With the readpixels benchmark (see
> > below) the backing store is MESA_FORMAT_ARGB8888 (which I assume is BGRA,
> > little endian). So, the fast path is hit for BGRA. With FF the backing store
> > is RGBA, so it hits the fast path for RGBA.
>
> Interesting. How do they get a BGRA renderbuffer for this? Do they store
> RGBA RBs as BGRA, but RGBA textures as RGBA?
I've no idea, sorry. I don't know that much about the internals of LLVMpipe. I've only just started digging around in its source.
I don't think I've really been getting the whole picture with gdb. When I get a chance I'm probably going to have a play with José Fonseca's ApiTrace. At least then I should be able to get my head around exactly how Firefox initializes, paints to, and reads back from buffers.
I've reported a bug to mesa (https://bugs.freedesktop.org/show_bug.cgi?id=48545).
Reporter
Comment 12•13 years ago
I knocked together a simple benchmark that does:
gl.clearColor(t, t, t, 1.0); // where t cycles from 0 to 1 over the course of a second
gl.clear(gl.COLOR_BUFFER_BIT);
as many times per second as mozRequestAnimationFrame will allow. This is with a 1280x720 context. The code is based on one of the learningwebgl.com lessons.
With a standard mesa master from git I get 34fps. With the patch proposed on https://bugs.freedesktop.org/show_bug.cgi?id=48545 I get 64 fps. Problem is, that patch only fast paths MESA_FORMAT_RGBA8888_REV . If I modify the test to do canvas.getContext("experimental-webgl",{ alpha: false }) I instead get a MESA_FORMAT_XRGB8888 render buffer, and I'm back to 34fps.
I've noticed that, even with the patch, my CPU usage is still very high (Firefox consumes a whole core, and Xorg is consuming about a quarter of a core). I'll try and do some profiling, but I assume this could be due to how you paint the buffer to the screen using cairo and X?
Comment 13•13 years ago
Yes, it's a known problem that Cairo X11 makes us use a lot of CPU for basic compositing, in a way we can't control. The main bug to track in this area is bug 720523 depending on bug 738937.
Comment 14•11 years ago
The bug we opened with Mesa was fixed a while ago. I'm going to mark this WFM unless we still aren't getting the fastpath on recent Mesa drivers.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → WORKSFORME
Whiteboard: webgl-driver