Closed Bug 743585 Opened 13 years ago Closed 11 years ago

Webgl hits the slow path for glReadPixels with LLVMpipe

Categories: Core :: Graphics: CanvasWebGL, defect
Version: 11 Branch
Platform: x86 Linux
Type: defect
Priority: Not set
Severity: normal

Tracking


RESOLVED WORKSFORME

People

(Reporter: cosinusoidaly, Unassigned)

References

Details

(Whiteboard: webgl-driver)

Attachments

(2 files)

I've been using LLVMpipe (Mesa 8.0.2, LLVM 3.0) under Ubuntu 10.10 to render WebGL content. I've noticed that Firefox seems to hit the slow path in Mesa's readpixels code (Mesa-8.0.2/src/mesa/main/readpix.c). If I modify Mesa's code I get a decent speed-up in Firefox WebGL rendering (if the particular demo is not too shader-heavy). I think Firefox may be attempting to read RGBA data, which then causes it to go through the fallback, though from Benoit Jacob's reply elsewhere (bug 731836), that may not be the case.

I originally posted the following on another bug:

I've also hacked my version of Mesa to fast-path glReadPixels. With Mesa 8.0.2 the developers significantly improved the performance of glReadPixels. The problem is, the performance boost for llvmpipe is for BGRA data only. Firefox seems to read RGBA data, so it doesn't benefit at all. I bodged the code to force a fast path for RGBA data. My patch regresses BGRA performance (and rendering) but boosts RGBA performance (on my machine) by up to 850%.
Before (stats from mesa-demos-8.0.1/src/perf/readpixels):

glReadPixels(10 x 10, RGBA/ubyte): 12002.9 images/sec, 4.6 Mpixels/sec
glReadPixels(100 x 100, RGBA/ubyte): 2229.7 images/sec, 85.1 Mpixels/sec
glReadPixels(500 x 500, RGBA/ubyte): 109.8 images/sec, 104.7 Mpixels/sec
glReadPixels(1000 x 1000, RGBA/ubyte): 27.8 images/sec, 106.2 Mpixels/sec
glReadPixels(10 x 10, BGRA/ubyte): 12760.1 images/sec, 4.9 Mpixels/sec
glReadPixels(100 x 100, BGRA/ubyte): 11959.1 images/sec, 456.2 Mpixels/sec
glReadPixels(500 x 500, BGRA/ubyte): 5736.7 images/sec, 5470.9 Mpixels/sec
glReadPixels(1000 x 1000, BGRA/ubyte): 567.0 images/sec, 2162.9 Mpixels/sec

After:

glReadPixels(10 x 10, RGBA/ubyte): 12226.9 images/sec, 4.7 Mpixels/sec
glReadPixels(100 x 100, RGBA/ubyte): 8047.2 images/sec, 307.0 Mpixels/sec
glReadPixels(500 x 500, RGBA/ubyte): 973.4 images/sec, 928.3 Mpixels/sec
glReadPixels(1000 x 1000, RGBA/ubyte): 250.5 images/sec, 955.5 Mpixels/sec
glReadPixels(10 x 10, BGRA/ubyte): 12291.1 images/sec, 4.7 Mpixels/sec
glReadPixels(100 x 100, BGRA/ubyte): 8287.3 images/sec, 316.1 Mpixels/sec
glReadPixels(500 x 500, BGRA/ubyte): 944.6 images/sec, 900.9 Mpixels/sec
glReadPixels(1000 x 1000, BGRA/ubyte): 239.9 images/sec, 915.2 Mpixels/sec

Patch (a horrible hack, but it works for Firefox):

--- src/mesa/main/readpix.c	2012-04-08 18:14:38.263151001 +0100
+++ src/mesa/main/readpix.c	2012-04-08 19:30:21.811151008 +0100
@@ -209,8 +209,6 @@
    GLubyte *dst, *map;
    int dstStride, stride, j, texelBytes;
 
-   if (!_mesa_format_matches_format_and_type(rb->Format, format, type))
-      return GL_FALSE;
 
    /* check for things we can't handle here */
    if (packing->SwapBytes ||
@@ -240,10 +238,19 @@
    }
 
    texelBytes = _mesa_get_format_bytes(rb->Format);
+
+   uint32_t dst_off = 0, map_off = 0;
+   uint32_t k;
+
    for (j = 0; j < height; j++) {
-      memcpy(dst, map, width * texelBytes);
-      dst += dstStride;
-      map += stride;
+      for (k = 0; k < width * texelBytes; k = k + 4) {
+         dst[dst_off + k + 2] = map[map_off + k + 0];
+         dst[dst_off + k + 1] = map[map_off + k + 1];
+         dst[dst_off + k + 0] = map[map_off + k + 2];
+         dst[dst_off + k + 3] = map[map_off + k + 3];
+      }
+      dst_off += dstStride;
+      map_off += stride;
    }
 
    ctx->Driver.UnmapRenderbuffer(ctx, rb);
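The inner loop of the patch is a per-row copy that swaps the R and B bytes of each 32-bit pixel. As a self-contained sketch (function and parameter names are illustrative, not Mesa's; it assumes tightly packed 4-byte pixels):

```c
#include <stdint.h>
#include <stddef.h>

/* Copy `height` rows of `width` 32-bit pixels from src to dst,
 * swapping the first and third byte of every pixel (R <-> B),
 * honoring independent source and destination strides. */
static void copy_rows_swap_rb(uint8_t *dst, size_t dst_stride,
                              const uint8_t *src, size_t src_stride,
                              int width, int height)
{
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width * 4; x += 4) {
            dst[x + 0] = src[x + 2];  /* R <-> B */
            dst[x + 1] = src[x + 1];  /* G unchanged */
            dst[x + 2] = src[x + 0];  /* B <-> R */
            dst[x + 3] = src[x + 3];  /* A unchanged */
        }
        dst += dst_stride;
        src += src_stride;
    }
}
```

This is why the patch regresses the BGRA case: the unconditional byte shuffle replaces the straight memcpy that the matching-format path used to take.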
First of all, let's check that you don't have layers acceleration enabled (it's not on by default on Linux). The about:support page should say: "GPU Accelerated Windows: 0". Non-accelerated layers are what we want to optimize for here, since people who get layers acceleration almost certainly also get hardware-accelerated WebGL, and hence don't need LLVMpipe.

Your findings surprise me, because _in theory_ we should be reading back the WebGL canvas in BGRA format, not RGBA. The readback occurs in GLContext::ReadPixelsIntoImageSurface, which calls GLContext::GetOptimalReadFormats to determine which format to read in. This is in gfx/gl/GLContext.cpp. Useful code search tools include http://mxr.mozilla.org/mozilla-central/ and http://dxr.mozilla.org/ .

I tried here (desktop Linux, default config) and it does read back in BGRA format. I checked by setting a breakpoint in GLContext::ReadPixelsIntoImageSurface.

What you could do is open gfx/gl/GLContext.h, search for fReadPixels, set a breakpoint there (or add a printf), and check what formats are really used. That would be the definitive experiment.

Also, it would be very useful to profile this, to understand the speed difference. Build Firefox with --enable-profiling, and use perf (the Linux profiler).
I had a bit more of a dig around in Mesa. I think you are right that Firefox *is* attempting to read back in BGRA format. I still don't understand why, if I force the fast path by doing:

@@ -209,8 +209,6 @@
    GLubyte *dst, *map;
    int dstStride, stride, j, texelBytes;
 
-   if (!_mesa_format_matches_format_and_type(rb->Format, format, type))
-      return GL_FALSE;

I then get a performance boost, but on screen the R and B channels are flipped. I think this might be a Mesa bug, or failing that they may be able to understand what's going on better than I can. I'll file a bug report over there and see what they say.
From about:support:

Adapter Description: VMware, Inc. -- Gallium 0.4 on llvmpipe (LLVM 0x300)
Driver Version: 2.1 Mesa 8.0.2
WebGL Renderer: VMware, Inc. -- Gallium 0.4 on llvmpipe (LLVM 0x300) -- 2.1 Mesa 8.0.2
GPU Accelerated Windows: 0
I'm very short of disk space at the moment, so I am unable to build Firefox. I have managed to build a debug build of Mesa and attach gdb to a running Firefox instance.

I think the description below is a bit convoluted. tl;dr version: Firefox creates an RGBA GLX pixmap, draws to it, then reads it back with glReadPixels. The requested format in glReadPixels is BGRA. Because the pixmap format and the glReadPixels format differ, you hit the slow path.

Longer version: I set a breakpoint on glXCreateNewContext. From the backtrace, glXCreateNewContext was called with the following arguments:

#1  0xa9290bc7 in glXCreateNewContext (dpy=0xb757d000, config=0xa274e790, renderType=32788, shareCtx=0xaa9857c0, direct=1) at glx_api.c:2118

so renderType is 32788. From GL/glxext.h that is GLX_RGBA_TYPE (0x8014). So I assume you are creating an RGBA context to draw all your GL content to. I don't have a debug build of Firefox, but from mxr I think the relevant call is http://mxr.mozilla.org/mozilla-central/source/gfx/gl/GLContextProviderGLX.cpp#710 , i.e.

context = sGLXLibrary.xCreateNewContext(display, cfg, GLX_RGBA_TYPE,
                                        shareContext ? shareContext->mContext : NULL,
                                        True);

I then added a breakpoint on fast_read_rgba_pixels_memcpy (which will be called when you call glReadPixels). I get the following:

Breakpoint 5, fast_read_rgba_pixels_memcpy (ctx=0xa11e8000, x=0, y=0, width=500, height=500, format=32993, type=33639, pixels=0x9f401000, packing=0xbfb7a2e0, transferOps=2048) at main/readpix.c:208
208       struct gl_renderbuffer *rb = ctx->ReadBuffer->_ColorReadBuffer;
(gdb) step
212       if (!_mesa_format_matches_format_and_type(rb->Format, format, type))
(gdb) step
_mesa_format_matches_format_and_type (gl_format=MESA_FORMAT_RGBA8888_REV, format=32993, type=33639) at main/formats.c:2528
2528      const GLboolean littleEndian = _mesa_little_endian();

Which confirms that your render buffer is MESA_FORMAT_RGBA8888_REV, i.e. an RGBA render buffer.
It also confirms that you are attempting to read BGRA (format 32993, which, from GL/gl.h, is GL_BGRA (0x80E1)). If I then step a couple more times, Firefox falls off the fast path, as the requested format is BGRA but the source format is RGBA:

2528      const GLboolean littleEndian = _mesa_little_endian();
(gdb) step
_mesa_little_endian () at main/imports.h:522
522       const GLuint ui = 1; /* intentionally not static */
(gdb) step
523       return *((const GLubyte *) &ui);
(gdb) step
524    }
(gdb) step
_mesa_format_matches_format_and_type (gl_format=MESA_FORMAT_RGBA8888_REV, format=32993, type=33639) at main/formats.c:2539
2539      switch (gl_format) {
(gdb) step
2552      return ((format == GL_RGBA && type == GL_UNSIGNED_INT_8_8_8_8_REV));
(gdb) step
2841   }
(gdb) step
fast_read_rgba_pixels_memcpy (ctx=0xa11e8000, x=0, y=0, width=500, height=500, format=32993, type=33639, pixels=0x9f401000, packing=0xbfb7a2e0, transferOps=2048) at main/readpix.c:213
213       return GL_FALSE;
(gdb) step
252    }
(gdb) step
read_rgba_pixels (ctx=0xa11e8000, x=0, y=0, width=500, height=500, format=32993, type=33639, pixels=0x9f401000, packing=0xbfb7a2e0) at main/readpix.c:337
337       slow_read_rgba_pixels(ctx, x, y, width, height,
(gdb) step
slow_read_rgba_pixels (ctx=0xa11e8000, x=0, y=0, width=500, height=500, format=32993, type=33639, pixels=0x9f401000, packing=0xbfb7a2e0, transferOps=2048) at main/readpix.c:263
263       struct gl_renderbuffer *rb = ctx->ReadBuffer->_ColorReadBuffer;
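The check the debugger stepped through above can be condensed into a few lines. This is a hypothetical simplification, not Mesa's actual code, but it captures why the fast path is rejected: an RGBA8888_REV renderbuffer only matches a GL_RGBA + GL_UNSIGNED_INT_8_8_8_8_REV read request, so Firefox's GL_BGRA (0x80E1 = 32993) request fails the test and glReadPixels falls through to slow_read_rgba_pixels:

```c
#include <stdbool.h>
#include <stdint.h>

/* GL enum values as seen in the gdb session above
 * (format=32993 is GL_BGRA, type=33639 is GL_UNSIGNED_INT_8_8_8_8_REV). */
#define GL_RGBA                      0x1908
#define GL_BGRA                      0x80E1
#define GL_UNSIGNED_INT_8_8_8_8_REV  0x8367

/* Simplified stand-in for the MESA_FORMAT_RGBA8888_REV case of
 * _mesa_format_matches_format_and_type on a little-endian host. */
static bool rgba8888_rev_matches(uint32_t format, uint32_t type)
{
    return format == GL_RGBA && type == GL_UNSIGNED_INT_8_8_8_8_REV;
}
```

With these values, the BGRA request from Firefox returns false, which is exactly the `return GL_FALSE;` seen at readpix.c:213 in the transcript.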
So this looks like an effective dupe of bug 729726. Most implementations tend to use BGRA internally, so BGRA readback is fastest. (It's pretty easy to see ANGLE doing this in its code.) We should, however, deliberately request BGRA where available, and not rely on what is 'common' on platforms.

It's relevant to note that we never draw to the pixmap/pbuffer (I forget which) we get from GLX, and instead only render to/from framebuffer objects backed by textures for sharing. Currently, we just request RGBA textures, but we should really try to request a backing texture with BGRA if we're going to be reading back as BGRA.

Not marking as a dupe yet, in case Mesa does not expose that it can back with BGRA textures. Can you dump the list of GL extensions llvmpipe supplies and post it here?
Status: UNCONFIRMED → NEW
Ever confirmed: true
Depends on: 729726
Attached file glx output
Comment on attachment 613469 [details]
glx output

Great, we have GL_EXT_bgra, which should allow us to create a BGRA framebuffer backing.
(In reply to Jeff Gilbert [:jgilbert] from comment #7)
> Comment on attachment 613469 [details]
> glx output
>
> Great, we have GL_EXT_bgra, which should allow us to create a BGRA
> framebuffer backing.

It turns out this extension doesn't supply this functionality. More on this in bug 729726. It looks like there is little we can do within GL, but we *can* blocklist on LLVMpipe to disable BGRA readback. However, what should likely be done is an upstream fix of their RGBA=>BGRA slow path; it should be much faster than it is. It is also clear we should have a hidden pref to disable BGRA readback for performance debugging.

PS: Haha, it looks like they're using a quite slow unpack+pack two-step conversion, which we know to be much slower than array look-ups, which are in turn slower than bit-shift swizzling. (We use a similar method at the moment for texture format conversions, and it's going away soon in favor of a much faster lib I'm working on.)
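To illustrate the "bit-shift swizzling" mentioned above: the R/B swap of a packed 32-bit pixel can be done with three masks and two shifts, with no per-byte loads, stores, or lookup tables. A minimal sketch (the function name and channel layout are illustrative; it assumes A and G occupy the odd bytes of the word, as in packed RGBA/BGRA formats):

```c
#include <stdint.h>

/* Swap the two color channels in the even byte positions of a packed
 * 32-bit pixel (e.g. R and B in RGBA <-> BGRA), leaving the bytes in
 * the odd positions (e.g. G and A) untouched. */
static uint32_t swap_rb_u32(uint32_t p)
{
    return (p & 0xFF00FF00u)            /* keep the A and G bytes in place */
         | ((p & 0x00FF0000u) >> 16)    /* move one color channel down */
         | ((p & 0x000000FFu) << 16);   /* and the other one up */
}
```

Applying the function twice returns the original pixel, which is why the same routine converts in both directions.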
Shall I go ahead and file a bug with Mesa?

Just a clarification on the stats above. With the readpixels benchmark (see below) the backing store is MESA_FORMAT_ARGB8888 (which I assume is BGRA, little endian). So the fast path is hit for BGRA. With Firefox the backing store is RGBA, so it hits the fast path for RGBA.

I think strictly my description is still subtly wrong. And the description I am about to offer is probably subtly wrong too, but in a subtly different way :P

fast_read_rgba_pixels_memcpy calls ctx->Driver.MapRenderbuffer, which causes llvmpipe to rasterize its internal backing store (I think it's some kind of tiled render buffer) into a temporary, linear frame buffer. Anyway, the point I'm trying to make is that the "backing store" I refer to above is actually created on the fly when you call glReadPixels, so the actual pixel order llvmpipe renders to internally may even be different; it may just swizzle the pixels when you do ctx->Driver.MapRenderbuffer. I hope that kind of makes sense. I'd write a more succinct explanation, but I'm half asleep atm.

From readpixels (on a different machine):

Breakpoint 1, glReadPixels () at glapi_x86-64.S:9163
9163      pushq %rdi
(gdb) break fast_read_rgba_pixels_memcpy
Breakpoint 2 at 0x7ffff66bf581: file main/readpix.c, line 208.
(gdb) continue
Continuing.

Breakpoint 2, fast_read_rgba_pixels_memcpy (ctx=0x6b9780, x=567, y=475, width=10, height=10, format=6408, type=5121, pixels=0x78cfd0, packing=0x7fffffffda00, transferOps=2048) at main/readpix.c:208
208       struct gl_renderbuffer *rb = ctx->ReadBuffer->_ColorReadBuffer;
(gdb) step
212       if (!_mesa_format_matches_format_and_type(rb->Format, format, type))
(gdb) step
_mesa_format_matches_format_and_type (gl_format=MESA_FORMAT_ARGB8888, format=6408, type=5121) at main/formats.c:2528
2528      const GLboolean littleEndian = _mesa_little_endian();
(In reply to Liam Wilson from comment #9)
> Shall I go ahead and file a bug with mesa?
>
> Just a clarification on the stats above. With the readpixels benchmark (see
> below) the backing store is MESA_FORMAT_ARGB8888 (which I assume is BGRA,
> little endian). So, the fast path is hit for BGRA. With FF the backing store
> is RGBA, so it hits the fast path for RGBA.

Interesting. How do they get a BGRA renderbuffer for this? Do they store RGBA RBs as BGRA, but RGBA textures as RGBA?
(In reply to Jeff Gilbert [:jgilbert] from comment #10)
> Interesting. How do they get a BGRA renderbuffer for this? Do they store
> RGBA RBs as BGRA, but RGBA textures as RGBA?

I've no idea, sorry. I don't know that much about the internals of LLVMpipe; I've only just started digging around in its source. I don't think I've really been getting the whole picture with gdb. When I get a chance I'm probably going to have a play with José Fonseca's ApiTrace. At least then I should be able to get my head around exactly how Firefox initializes, paints to, and reads back from buffers.

I've reported a bug to Mesa (https://bugs.freedesktop.org/show_bug.cgi?id=48545 ).
Attached file Simple webgl benchmark
I knocked together a simple benchmark that does:

gl.clearColor(t, t, t, 1.0); // where t cycles from 0 to 1 over the course of a second
gl.clear(gl.COLOR_BUFFER_BIT);

as many times per second as mozRequestAnimationFrame will allow. This is with a 1280x720 context. The code is based on one of the learningwebgl.com lessons.

With a standard Mesa master from git I get 34 fps. With the patch proposed on https://bugs.freedesktop.org/show_bug.cgi?id=48545 I get 64 fps. The problem is, that patch only fast-paths MESA_FORMAT_RGBA8888_REV. If I modify the test to do

canvas.getContext("experimental-webgl", { alpha: false })

I instead get a MESA_FORMAT_XRGB8888 render buffer, and I'm back to 34 fps.

I've noticed that, even with the patch, my CPU usage is still very high (Firefox consumes a whole core, and Xorg consumes about a quarter of a core). I'll try to do some profiling, but I assume this could be due to how you paint the buffer to the screen using Cairo and X?
Yes, it's a known problem that Cairo X11 makes us use a lot of CPU for basic compositing, in a way we can't control. The main bug to track in this area is bug 720523, which depends on bug 738937.
The bug we opened with Mesa was fixed a while ago. I'm going to mark this WFM unless we still aren't getting the fastpath on recent Mesa drivers.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → WORKSFORME
Whiteboard: webgl-driver