Closed Bug 616416 Opened 14 years ago Closed 13 years ago

Crash from jQuery with WebGL enabled when it creates a WebGLContext (!) on economist.com [@gl::GLContextGLX::CreateGLContext]

Categories

(Core :: Graphics: CanvasWebGL, defect)

All
Linux
defect
Not set
critical

Tracking

()

RESOLVED DUPLICATE of bug 659842
Tracking Status
blocking2.0 --- -
status2.0 --- ?

People

(Reporter: octoploid, Assigned: bjacob)


Attachments

(8 files, 1 obsolete file)

User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:2.0b8pre) Gecko/20100101 Firefox/4.0b8pre
Build Identifier: Mozilla/5.0 (X11; Linux x86_64; rv:2.0b8pre) Gecko/20100101 Firefox/4.0b8pre

Whenever I visit http://www.economist.com/printedition/ and open a few links on that site, firefox crashes:

...
[New Thread 0x7fffd0442710 (LWP 12319)]
[New Thread 0x7fffcfc41710 (LWP 12320)]
failed to create drawable
failed to create drawable
[New Thread 0x7fffbecff710 (LWP 12321)]
[New Thread 0x7fffbd8ff710 (LWP 12322)]
failed to create drawable
Program received signal SIGSEGV, Segmentation fault.
0x00007fffee9abad1 in ?? () from //usr/lib64/opengl/xorg-x11/lib/libGL.so.1
(gdb) bt
#0 0x00007fffee9abad1 in ?? () from //usr/lib64/opengl/xorg-x11/lib/libGL.so.1
#1 0x00007fffd15d5aaf in ?? () from /usr/lib64/dri/swrast_dri.so
#2 0x00007fffd15d60b3 in ?? () from /usr/lib64/dri/swrast_dri.so
#3 0x00007fffd15d6f50 in ?? () from /usr/lib64/dri/swrast_dri.so
#4 0x00007fffd15d632c in ?? () from /usr/lib64/dri/swrast_dri.so
#5 0x00007fffd15d32f5 in ?? () from /usr/lib64/dri/swrast_dri.so
#6 0x00007fffee9abbf9 in ?? () from //usr/lib64/opengl/xorg-x11/lib/libGL.so.1
#7 0x00007fffee989c67 in glXMakeCurrentReadSGI () from //usr/lib64/opengl/xorg-x11/lib/libGL.so.1
#8 0x00007ffff6f3c4c3 in ?? () from /usr/lib64/firefox-4.0b8pre/libxul.so
#9 0x00007ffff6f3c765 in ?? () from /usr/lib64/firefox-4.0b8pre/libxul.so
#10 0x00007ffff6f3bd7d in ?? () from /usr/lib64/firefox-4.0b8pre/libxul.so
#11 0x00007ffff6f3be4e in mozilla::gl::GLContextProviderGLX::CreateOffscreen(gfxIntSize const&, mozilla::gl::ContextFormat const&) () from /usr/lib64/firefox-4.0b8pre/libxul.so
#12 0x00007ffff67fcb64 in ?? () from /usr/lib64/firefox-4.0b8pre/libxul.so
#13 0x00007ffff68434a8 in ?? () from /usr/lib64/firefox-4.0b8pre/libxul.so
#14 0x00007ffff6843d15 in ?? () from /usr/lib64/firefox-4.0b8pre/libxul.so
#15 0x00007ffff6ba7386 in ?? () from /usr/lib64/firefox-4.0b8pre/libxul.so
#16 0x00007ffff715bee5 in ?? () from /usr/lib64/firefox-4.0b8pre/libxul.so
#17 0x00007ffff71518a3 in ?? () from /usr/lib64/firefox-4.0b8pre/libxul.so
#18 0x00007fffbdb7869e in ?? ()
#19 0x0000000100000000 in ?? ()
#20 0x0000000000000001 in ?? ()
#21 0x0000000000000000 in ?? ()

I have no problems whatsoever on all other sites that I visit regularly.

Reproducible: Always
This must be related to Bug 589546. On my system it crashes regardless of what mesa renderer is used: OpenGL vendor string: Advanced Micro Devices, Inc. OpenGL renderer string: Mesa DRI R600 (RS780 9614) 20090101 TCL DRI2 OpenGL version string: 1.4 (2.1 Mesa 7.9) OpenGL vendor string: X.Org OpenGL renderer string: Gallium 0.4 on R600 (HD2XXX,HD3XXX) OpenGL version string: 1.4 (2.1 Mesa 7.9) OpenGL extensions: and even the software renderer crashes. When I disable webgl.enabled_for_all_sites firefox no longer crashes.
This is a backtrace running Gallium 0.4: (gdb) bt #0 0x00007fffed8fbac0 in xcb_glx_get_string_string_length () from /usr/lib/libxcb-glx.so.0 #1 0x00007fffeebac9b5 in ?? () from //usr/lib64/opengl/xorg-x11/lib/libGL.so.1 #2 0x00007fffeebaa2c4 in ?? () from //usr/lib64/opengl/xorg-x11/lib/libGL.so.1 #3 0x00007fffeeb8d6a2 in ?? () from //usr/lib64/opengl/xorg-x11/lib/libGL.so.1 #4 0x00007fffeeb8acdb in glXMakeCurrentReadSGI () from //usr/lib64/opengl/xorg-x11/lib/libGL.so.1 #5 0x00007ffff6f3c4c3 in ?? () from /usr/lib64/firefox-4.0b8pre/libxul.so #6 0x00007ffff6f3c765 in ?? () from /usr/lib64/firefox-4.0b8pre/libxul.so #7 0x00007ffff6f3bd7d in ?? () from /usr/lib64/firefox-4.0b8pre/libxul.so #8 0x00007ffff6f3be4e in mozilla::gl::GLContextProviderGLX::CreateOffscreen(gfxIntSize const&, mozilla::gl::ContextFormat const&) () from /usr/lib64/firefox-4.0b8pre/libxul.so #9 0x00007ffff67fcb64 in ?? () from /usr/lib64/firefox-4.0b8pre/libxul.so #10 0x00007ffff68434a8 in ?? () from /usr/lib64/firefox-4.0b8pre/libxul.so #11 0x00007ffff6843d15 in ?? () from /usr/lib64/firefox-4.0b8pre/libxul.so #12 0x00007ffff6ba7386 in ?? () from /usr/lib64/firefox-4.0b8pre/libxul.so #13 0x00007ffff715bee5 in ?? () from /usr/lib64/firefox-4.0b8pre/libxul.so #14 0x00007ffff71518a3 in ?? () from /usr/lib64/firefox-4.0b8pre/libxul.so #15 0x00007fffd264069e in ?? () #16 0x0000000100000000 in ?? () #17 0x00007fffed5db110 in ?? () #18 0x0000000000000000 in ?? ()
Most probably a duplicate of bug 613079 which is about to get resolved. Once that bug is marked as resolved, can you please try the nightly build of the next day.
Depends on: 613079
Just did that and unfortunately the bug is still there. Exactly the same trace as above. I noticed that webgl.enabled_for_all_sites is now disabled by default. As I said above the crash only happens when the preference is enabled.
So, this is really weird, because economist.com is obviously not doing WebGL. Here I don't get the crash (Linux x86-64, NVIDIA proprietary driver). There are two things that you can do to help me.
First, can you please try with beta7? If it doesn't crash, then this is a recent regression, so please try nightlies to bisect and see when it started crashing. Former nightly builds are available at: https://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/ (use the mozilla-central builds).
Second, can you run firefox with the environment variables MOZ_X_SYNC=1 and MOZ_GL_DEBUG=1 set, for example:
MOZ_X_SYNC=1 MOZ_GL_DEBUG=1 /path/to/firefox
The backtrace should then be more interesting. Even better: after it has crashed, rerun firefox, go to about:crashes, and give me the link that you get.
beta7 also crashes: http://crash-stats.mozilla.com/report/index/bp-15211fdd-446c-4c34-84dd-077652101208 Setting these environment variables doesn't change the backtrace.
It crashes at the same point down to beta2. beta1 doesn't crash at first, but X11 always crashes when I finally close the firefox window. Backtrace: [376133.558] 0: /usr/bin/X (xorg_backtrace+0x28) [0x49c708] [376133.558] 1: /usr/bin/X (0x400000+0x5e309) [0x45e309] [376133.558] 2: /lib/libpthread.so.0 (0x7f0865250000+0xf490) [0x7f086525f490] [376133.558] 3: /usr/lib64/dri/r600_dri.so (0x7f0862f2d000+0x686d7) [0x7f0862f956d7] [376133.558] 4: /usr/lib64/dri/r600_dri.so (0x7f0862f2d000+0x65856) [0x7f0862f92856] [376133.558] 5: /usr/lib64/dri/r600_dri.so (0x7f0862f2d000+0x27030) [0x7f0862f54030] [376133.558] 6: /usr/lib64/xorg/modules/extensions/libglx.so (0x7f0865e27000+0x426d9) [0x7f0865e696d9] [376133.558] 7: /usr/lib64/xorg/modules/extensions/libglx.so (0x7f0865e27000+0x37b41) [0x7f0865e5eb41] [376133.558] 8: /usr/lib64/xorg/modules/extensions/libglx.so (0x7f0865e27000+0x37bd3) [0x7f0865e5ebd3] [376133.559] 9: /usr/bin/X (FreeResourceByType+0x117) [0x447d97] [376133.559] 10: /usr/lib64/xorg/modules/extensions/libglx.so (0x7f0865e27000+0x340a4) [0x7f0865e5b0a4] [376133.559] 11: /usr/lib64/xorg/modules/extensions/libglx.so (0x7f0865e27000+0x37d6d) [0x7f0865e5ed6d] [376133.559] 12: /usr/bin/X (0x400000+0x2b829) [0x42b829] [376133.559] 13: /usr/bin/X (0x400000+0x1fd2d) [0x41fd2d] [376133.559] 14: /lib/libc.so.6 (__libc_start_main+0xfd) [0x7f086442fcdd] [376133.559] 15: /usr/bin/X (0x400000+0x1f889) [0x41f889] [376133.559] Segmentation fault at address 0x10 [376133.559] Fatal server error: [376133.559] Caught signal 11 (Segmentation fault). Server aborting
(In reply to comment #6) > beta7 also crashes: > http://crash-stats.mozilla.com/report/index/bp-15211fdd-446c-4c34-84dd-077652101208 > > Setting these environment variables doesn't change > the backtrace. Excellent, thanks a lot. In this link I get a stack trace with debug info, very useful.
So it's crashing with a segfault deep inside glXMakeCurrent. This can't be anything but a driver bug. The best we could do here is to blacklist your driver (it's buggy), but it's not certain we'll have time to do this for Firefox 4. What I would really like to understand is how economist.com can lead to this. The stack trace clearly shows that it tried creating a WebGLContext (frame 10 is in WebGLContext::SetDimensions). Are you absolutely certain that you don't have another tab open with another website? Do you have extensions installed that might be playing with WebGL? Can you try creating a new profile (in a terminal, run firefox -P), enable WebGL, and check whether the problem persists?
Summary: Firefox/4.0b8pre (latest hg) always crashes on economist.com site. → Firefox/4.0b8pre (latest hg) always crashes on economist.com site. [@gl::GLContextGLX::CreateGLContext]
OK, I created a new test profile, started it, enabled webgl and opened http://www.economist.com/printedition/. On the web page I middle-clicked three to four links and firefox crashed: http://crash-stats.mozilla.com/report/index/28d8d9d5-e5fc-4066-8eba-24dd72101208 (trace looks identical to the one posted above).
Indeed, this is incredible: I can reproduce the fact that The Economist tries to create a WebGLContext. Not on the front page itself, but for example this article: http://www.economist.com/node/17629709?Story_ID=17629709 This needs to be investigated very closely. I can't think of a legitimate reason why this journal would try to create a WebGL context, when 99% of today's users use browsers that don't support that anyway. Here's a JS stack: (gdb) print DumpJSStack() [Thread 0x7fffd2bfd710 (LWP 17572) exited] 0 anonymous() ["http://www.economist.com/sites/default/files/js/js_098f42439bafb9ba30019ba6d4863a7f.js":4528] a = [object HTMLCanvasElement @ 0x7fffb4366e18 (native @ 0xe58b80)] this = [object Object] 1 anonymous(u = undefined, e = [object HTMLDocument @ 0x365d370 (native @ 0x17943a0)], i = [object Window @ 0x1643900 (native @ 0x2007e48)]) ["http://www.economist.com/sites/default/files/js/js_098f42439bafb9ba30019ba6d4863a7f.js":4537] H = "webgl" R = [function] G = [function] o = [function] Q = [function] w = "webgl" P = flexbox,canvas,canvastext L = [object Object] N = [object Object] d = [object Object] v = [object Object] F = Webkit,Moz,O,ms,Khtml q = ,-webkit-,-moz-,-o-,-ms-,-khtml-, O = [function] M = ":)" h = [object HTMLInputElement @ 0x7fffb435a948 (native @ 0x2210b70)] j = [object CSSStyleDeclaration @ 0x7fffb435a840 (native @ 0xf9f420)] E = [object HTMLUnknownElement @ 0x7fffb435a7e8 (native @ 0x1ae2240)] l = [object HTMLHtmlElement @ 0x7fffd017c160 (native @ 0x369d3b0)] f = [object Object] S = [function] n = [function] D = [function] s = [function] this = [object Window @ 0x1643900 (native @ 0x2007e48)] 2 <TOP LEVEL> ["http://www.economist.com/sites/default/files/js/js_098f42439bafb9ba30019ba6d4863a7f.js":4540] this = [object Window @ 0x1643900 (native @ 0x2007e48)] $1 = void
Assignee: nobody → bjacob
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Summary: Firefox/4.0b8pre (latest hg) always crashes on economist.com site. [@gl::GLContextGLX::CreateGLContext] → Firefox/4.0b8pre (latest hg) always crashes as economist.com creates a WebGLContext (!). [@gl::GLContextGLX::CreateGLContext]
Summary: Firefox/4.0b8pre (latest hg) always crashes as economist.com creates a WebGLContext (!). [@gl::GLContextGLX::CreateGLContext] → Crash on economist.com when it creates a WebGLContext (!). [@gl::GLContextGLX::CreateGLContext]
Do other WebGL demos work for you, or do they all crash? For example: http://spidergl.org/example.php?id=1
Looking at the .js file above, it looks like http://www.modernizr.com/ is causing this. (It just checks for webgl functionality) http://spidergl.org/example.php?id=1 does not crash, but all other examples on that site do. http://www.html5test.com/ also crashes.
This is a copy of the JS file referenced from the above JS stack. A search for 'webgl' through it does find occurrences. A comment at the top claims it to be jQuery, but I checked in jQuery, both 1.4.4 and git, and there's no occurrence of 'webgl' in it. I didn't know about 'modernizr' and need to read up on it.
It's on line 4528 of that js file: l.removeChild(a);return c};d.canvas=function(){var a=e.createElement("canvas");return!!(a.getContext&&a.getContext("2d"))};d.canvastext=function(){return!!(f.canvas&&typeof e.createElement("canvas").getContext("2d").fillText=="function")};d.webgl=function(){var a=e.createElement("canvas");try{if(a.getContext("webgl"))return true}catch(b){}try{if(a.getContext("experimental-webgl"))return true}catch(c){}return false};d.touch=function(){return"ontouchstart"in i||Q("@media ("+q.join("touch-enabled),(")+
And in readable form:

    tests['webgl'] = function(){
        var elem = doc.createElement( 'canvas' );
        try {
            if (elem.getContext('webgl')){ return true; }
        } catch(e){ }
        try {
            if (elem.getContext('experimental-webgl')){ return true; }
        } catch(e){ }
        return false;
    };
So, this is indeed just creating a WebGL context, and seemingly not using it at all. I checked that at least WebGLContext::getParameter and WebGLContext::drawArrays are never called. Someone with a JS debugger may be able to confirm more decisively that they don't try to do anything with this WebGLContext. (Note: in your case, using OSMesa should allow you to run WebGL in software without crashes. Install OSMesa and set webgl.osmesalib=libOSMesa.so.6 and webgl.force_osmesa=true.)
Confirmed: I set a breakpoint in WebGLContext::MakeContextCurrent(), which is called by every WebGL function, and it's only ever reached during WebGL context initialization. So what's this modernizr thing, is it a tool to make web sites slower to load by making them create WebGL contexts that they don't need?
Anyway, can you please paste the output of glxinfo | egrep -i vendor\|renderer\|version So I can try to implement a blacklist, so that people can still use firefox with this driver after we enable WebGL by default on linux.
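As a rough illustration of what such a blacklist check could look like: this is not Firefox code, and the struct and function names are made up for this sketch. It assumes the three strings printed by glxinfo (vendor, renderer, version) are available and matches them against a list of known-bad substring patterns.

```cpp
#include <string>
#include <vector>

// Hypothetical description of a GL driver, filled from the same three
// strings that glxinfo prints (vendor, renderer, version).
struct DriverInfo {
    std::string vendor;
    std::string renderer;
    std::string version;
};

// Hypothetical blacklist entry: every non-empty field must occur as a
// substring of the corresponding driver string for the entry to match.
struct BlacklistEntry {
    std::string vendorPart;
    std::string rendererPart;
    std::string versionPart;
};

// An empty needle means "don't care" and always matches.
static bool Contains(const std::string& haystack, const std::string& needle) {
    return needle.empty() || haystack.find(needle) != std::string::npos;
}

// Returns true if any entry matches all three driver strings.
bool IsBlacklisted(const DriverInfo& info,
                   const std::vector<BlacklistEntry>& list) {
    for (const BlacklistEntry& e : list) {
        if (Contains(info.vendor, e.vendorPart) &&
            Contains(info.renderer, e.rendererPart) &&
            Contains(info.version, e.versionPart)) {
            return true;
        }
    }
    return false;
}
```

With the strings from comment 1, an entry such as {"", "R600", "Mesa 7.9"} would match both the classic and the Gallium R600 renderer, since both contain "R600" and report "Mesa 7.9". Whether substring matching is precise enough for a real blacklist is an open question.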
These "failed to create drawable" errors are still puzzling me. I wish I knew what's causing this.
Please see my Comment 1 above for the renderer versions.
Sorry. OK, I would like this to be debugged a bit before we resort to blacklisting. Filing a bug @ freedesktop.org. Could you please install as many debug symbols as you can for the Mesa, X and Radeon packages from your Linux distro? Hopefully you'll get a better backtrace with them.
OK, I will try this later. Please post a link to the freedesktop bug you've just filed.
Here's what I have found out:

1) with software renderer:

Program received signal SIGABRT, Aborted.
0x00007ffff35ec7c5 in raise () from /lib/libc.so.6
(gdb) bt
#0 0x00007ffff35ec7c5 in raise () from /lib/libc.so.6
#1 0x00007ffff35edc46 in abort () from /lib/libc.so.6
#2 0x00007ffff35e5345 in __assert_fail () from /lib/libc.so.6
#3 0x00007fffd7186f71 in _mesa_update_draw_buffer_bounds (ctx=0x7fffc8ada000) at main/framebuffer.c:507
#4 0x00007fffd71d159c in _mesa_update_state_locked (ctx=0x7fffc8ada000) at main/state.c:596
#5 0x00007fffd71d172c in _mesa_update_state (ctx=0x7fffc8ada000) at main/state.c:676
#6 0x00007fffd7188eb7 in check_extra (ctx=0x7fffc8ada000, func=0x7fffd73e8644 "glGetIntegerv", d=0x7fffd7503c78) at main/get.c:1698
#7 0x00007fffd7189196 in find_value (func=0x7fffd73e8644 "glGetIntegerv", pname=3410, p=0x7fffffff9d28, v=0x7fffffff9d30) at main/get.c:1785
#8 0x00007fffd71899df in _mesa_GetIntegerv (pname=3410, params=0x7fffffff9f20) at main/get.c:1999
#9 0x00007ffff6f2abe1 in ?? () from /usr/lib64/firefox-4.0b8pre/libxul.so
#10 0x00007ffff6f2ce4d in ?? () from /usr/lib64/firefox-4.0b8pre/libxul.so
#11 0x00007ffff6f38684 in ?? () from /usr/lib64/firefox-4.0b8pre/libxul.so
#12 0x00007ffff6f38988 in ?? () from /usr/lib64/firefox-4.0b8pre/libxul.so
#13 0x00007ffff6f37efd in ?? () from /usr/lib64/firefox-4.0b8pre/libxul.so
#14 0x00007ffff6f37c70 in mozilla::gl::GLContextProviderGLX::GetGlobalContext() () from /usr/lib64/firefox-4.0b8pre/libxul.so
#15 0x00007ffff6f37ec7 in ?? () from /usr/lib64/firefox-4.0b8pre/libxul.so
#16 0x00007ffff6f37ff2 in mozilla::gl::GLContextProviderGLX::CreateOffscreen(gfxIntSize const&, mozilla::gl::ContextFormat const&) () from /usr/lib64/firefox-4.0b8pre/libxul.so
#17 0x00007ffff67f6185 in ?? () from /usr/lib64/firefox-4.0b8pre/libxul.so
#18 0x00007ffff683cfb0 in ?? () from /usr/lib64/firefox-4.0b8pre/libxul.so
#19 0x00007ffff683d81d in ?? () from /usr/lib64/firefox-4.0b8pre/libxul.so
#20 0x00007ffff6ba1c24 in ?? () from /usr/lib64/firefox-4.0b8pre/libxul.so
#21 0x00007fffda252ee9 in ?? ()
#22 0x00007fffe670dab8 in ?? ()
#23 0x00007fffee91c040 in ?? ()
#24 0x0000000000000001 in ?? ()
#25 0x0000000000000000 in ?? ()
(gdb) q

2) with Gallium or Mesa DRI R600:

Program received signal SIGSEGV, Segmentation fault.
0x00007fffeec3cac0 in xcb_glx_get_string_string_length () from /usr/lib/libxcb-glx.so.0
(gdb) bt
#0 0x00007fffeec3cac0 in xcb_glx_get_string_string_length () from /usr/lib/libxcb-glx.so.0
#1 0x00007ffff0342dbb in __glXGetString (dpy=0x7fffee83e000, opcode=153, contextTag=10027013, name=7939) at glx_query.c:82
#2 0x00007ffff033ef6c in __indirect_glGetString (name=7939) at single2.c:686
#3 0x00007ffff031006e in indirect_bind_context (gc=0x7fffd9ffa1e0, old=0x7fffd9ff9880, draw=10486546, read=10486546) at indirect_glx.c:156
#4 0x00007ffff030cbb9 in MakeContextCurrent (dpy=0x7fffee83e000, draw=10486546, read=10486546, gc_user=0x7fffd9ffa1e0) at glxcurrent.c:263
#5 0x00007ffff030cc6a in glXMakeCurrent (dpy=0x7fffee83e000, draw=10486546, gc=0x7fffd9ffa1e0) at glxcurrent.c:287
#6 0x00007ffff6f38667 in ?? () from /usr/lib64/firefox-4.0b8pre/libxul.so
#7 0x00007ffff6f38988 in ?? () from /usr/lib64/firefox-4.0b8pre/libxul.so
#8 0x00007ffff6f37efd in ?? () from /usr/lib64/firefox-4.0b8pre/libxul.so
#9 0x00007ffff6f37ff2 in mozilla::gl::GLContextProviderGLX::CreateOffscreen(gfxIntSize const&, mozilla::gl::ContextFormat const&) () from /usr/lib64/firefox-4.0b8pre/libxul.so
#10 0x00007ffff67f6185 in ?? () from /usr/lib64/firefox-4.0b8pre/libxul.so
#11 0x00007ffff683cfb0 in ?? () from /usr/lib64/firefox-4.0b8pre/libxul.so
#12 0x00007ffff683d81d in ?? () from /usr/lib64/firefox-4.0b8pre/libxul.so
#13 0x00007ffff6ba1c24 in ?? () from /usr/lib64/firefox-4.0b8pre/libxul.so
#14 0x00007fffdd522ee9 in ?? ()
#15 0x00007fffe6bf6ab8 in ?? ()
#16 0x00007fffee91c040 in ?? ()
#17 0x0000000000000001 in ?? ()
#18 0x0000000000000000 in ?? ()
(gdb) q
Attached file 2 traces
Benoit, I have found the root of the problem. I normally use my monitor in portrait mode: DVI-0 connected 1050x1680+0+0 left When I rotate it back to its default position and run "xrandr --output DVI-0 --rotate normal" there are no crashes anymore and webgl is working fine. (i.e. the examples on http://spidergl.org/ all render fine) This should be the reason for: firefox-bin: main/framebuffer.c:507: _mesa_update_draw_buffer_bounds: Assertion `buffer->_Ymin <= buffer->_Ymax' failed. (because 1680>1050)
Great, can you update the relevant above Mesa bug(s)? (From what you say I gather that it's https://bugs.freedesktop.org/show_bug.cgi?id=32243 that you're talking about).
Done. But this must be a bug in firefox. Because other gl applications are running just fine here (stellarium). What should the gl stack do but crash, if you send wrong parameters to it?
There can be both a driver bug and a bug in firefox. The segfault you get with gallium is clearly a driver bug, the other is less clear, and at the same time, yes there can be a firefox bug making us feed your GL library with bad stuff, I'll look into this.
Indeed, when I change line 489 in content/canvas/src/WebGLContext.cpp to: gl = gl::GLContextProviderOSMesa::CreateOffscreen(gfxIntSize(1050, 1680), format); firefox no longer crashes and renders basic webgl demos fine... So it seems that gl::GLContextProviderOSMesa::CreateOffscreen(gfxIntSize(width, height), format); is being called with the wrong width and height parameters in the rotated monitor case.
This is getting crazy. I cannot reproduce the crash anymore. Even vanilla beta7 works fine now. What the heck is going on? It's driving me nuts.
It seems as though the RS780 microcode didn't get loaded at boot time on the kernel I was using during the tests:

Dec 04 06:41:20 [kernel] Linux agpgart interface v0.103
Dec 04 06:41:20 [kernel] [drm] Initialized drm 1.1.0 20060810
Dec 04 06:41:20 [kernel] [drm] radeon defaulting to kernel modesetting.
Dec 04 06:41:20 [kernel] [drm] radeon kernel modesetting enabled.
Dec 04 06:41:20 [kernel] radeon 0000:01:05.0: PCI INT A -> GSI 18 (level, low) -> IRQ 18
Dec 04 06:41:20 [kernel] radeon 0000:01:05.0: setting latency timer to 64
Dec 04 06:41:20 [kernel] [drm] initializing kernel modesetting (RS780 0x1002:0x9614).
Dec 04 06:41:20 [kernel] [drm] register mmio base: 0xFBEE0000
Dec 04 06:41:20 [kernel] [drm] register mmio size: 65536
Dec 04 06:41:20 [kernel] ATOM BIOS: 113
Dec 04 06:41:20 [kernel] radeon 0000:01:05.0: VRAM: 128M 0xC0000000 - 0xC7FFFFFF (128M used)
Dec 04 06:41:20 [kernel] radeon 0000:01:05.0: GTT: 512M 0xA0000000 - 0xBFFFFFFF
Dec 04 06:41:20 [kernel] [drm] Detected VRAM RAM=128M, BAR=128M
Dec 04 06:41:20 [kernel] [drm] RAM width 32bits DDR
Dec 04 06:41:20 [kernel] [TTM] Zone kernel: Available graphics memory: 2026154 kiB.
Dec 04 06:41:20 [kernel] [TTM] Initializing pool allocator.
Dec 04 06:41:20 [kernel] [drm] radeon: 128M of VRAM memory ready
Dec 04 06:41:20 [kernel] [drm] radeon: 512M of GTT memory ready.
Dec 04 06:41:20 [kernel] [drm] radeon: irq initialized.
Dec 04 06:41:20 [kernel] [drm] GART: num cpu pages 131072, num gpu pages 131072
Dec 04 06:41:20 [kernel] radeon 0000:01:05.0: WB enabled
Dec 04 06:41:20 [kernel] [drm] ring test succeeded in 1 usecs
Dec 04 06:41:20 [kernel] [drm] radeon: ib pool ready.
Dec 04 06:41:20 [kernel] [drm] ib test succeeded in 0 usecs
Dec 04 06:41:20 [kernel] [drm] Enabling audio support
Dec 04 06:41:20 [kernel] [drm] Radeon Display Connectors
Dec 04 06:41:20 [kernel] [drm] Connector 0:
Dec 04 06:41:20 [kernel] [drm] VGA
Dec 04 06:41:20 [kernel] [drm] DDC: 0x7e40 0x7e40 0x7e44 0x7e44 0x7e48 0x7e48 0x7e4c 0x7e4c
Dec 04 06:41:20 [kernel] [drm] Encoders:
Dec 04 06:41:20 [kernel] [drm] CRT1: INTERNAL_KLDSCP_DAC1
Dec 04 06:41:20 [kernel] [drm] Connector 1:
Dec 04 06:41:20 [kernel] [drm] DVI-D
Dec 04 06:41:20 [kernel] [drm] HPD3
Dec 04 06:41:20 [kernel] [drm] DDC: 0x7e50 0x7e50 0x7e54 0x7e54 0x7e58 0x7e58 0x7e5c 0x7e5c
Dec 04 06:41:20 [kernel] [drm] Encoders:
Dec 04 06:41:20 [kernel] [drm] DFP3: INTERNAL_KLDSCP_LVTMA
Dec 04 06:41:20 [kernel] [drm] radeon: power management initialized
Dec 04 06:41:20 [kernel] [drm] fb mappable at 0xF0141000
Dec 04 06:41:20 [kernel] [drm] vram apper at 0xF0000000
Dec 04 06:41:20 [kernel] [drm] size 7258112
Dec 04 06:41:20 [kernel] [drm] fb depth is 24
Dec 04 06:41:20 [kernel] [drm] pitch is 6912
Dec 04 06:41:20 [kernel] Console: switching to colour frame buffer device 131x105
Dec 04 06:41:20 [kernel] fb0: radeondrmfb frame buffer device
Dec 04 06:41:20 [kernel] drm: registered panic notifier
Dec 04 06:41:20 [kernel] [drm] Initialized radeon 2.7.0 20080528 for 0000:01:05.0 on minor 0

Notice that the line:
[drm] Loading RS780 Microcode
is missing. I've checked my kernel logs and this was the only time it didn't get loaded during boot. (I don't use modules, everything is built in.) I don't know, could this be the reason for the bug I was observing?
I don't know whether or not the missing line is significant. If the microcode isn't loaded, then I'd expect a fallback to software rendering. You can force that with LIBGL_ALWAYS_SOFTWARE=1 in the environment.
Are you doing suspend to RAM or to disk? Could it be that you did something like this:
1. boot your computer with a given screen orientation
2. start firefox with a maximized window, go to the crashy economist.com page
3. suspend to RAM or to disk (whatever you usually do)
4. change screen orientation
5. resume
Does that reproduce the crash? Do you think you've ever gotten the crash without any suspend involved?
No, I don't use suspend to RAM or disk. I normally use kexec to reboot, but in this case it was a cold-booted kernel that ran during the tests. (Another symptom during the tests was that I couldn't switch my DRI library on the fly. /usr/lib/dri/r600_dri.so is a symlink to either ../mesa/r600g_dri.so or ../mesa/r600_dri.so. Normally, when I change this symlink, the renderer switches instantaneously, but during my tests I had to restart X to change the renderer.) Let's just assume that a cosmic ray hit my GPU and maybe close this bug.
So, I think there was a real bug in my code. Although it's not my fault that your system crashed, I think I wasn't helping. What the code was doing was: glXCreateNewContext, then glXMakeCurrent, then some GL initialization (load symbols, check for extensions), and only then check for an X error. This patch changes it to: glXCreateNewContext, then immediately check for an X error, then go on with glXMakeCurrent etc.
Attachment #496589 - Flags: review?(karlt)
Karl: I still agree that initially I should only catch BadAllocs and forward other X errors to the preexisting error handler, but that was relatively tricky to implement in ScopedXErrorHandler which must be stackable. Instead, let's say that it's the responsibility of the user of ScopedXErrorHandler to properly act upon errors, which is what this patch fixes here.
There seems little point in the "goto TRY_AGAIN_NO_SHARING" here http://hg.mozilla.org/mozilla-central/annotate/f971ad6ed5a5/gfx/thebes/GLContextProviderGLX.cpp#l277 as the new context will be discarded here: http://hg.mozilla.org/mozilla-central/annotate/f971ad6ed5a5/gfx/thebes/GLContextProviderGLX.cpp#l284 Should error be reset to false on TRY_AGAIN?
Comment on attachment 496589 [details] [diff] [review] check X error immediately after glxCreateNewContext > if (shareContext) { >- if (error || xErrorHandler.SyncAndGetError(display)) { >+ if (error) { > shareContext = nsnull; > goto TRY_AGAIN_NO_SHARING; > } > } > > // at this point, if shareContext != null, we know there's no error. > // it's important to minimize the number of XSyncs for startup performance. > if (!shareContext) { With this change, we no longer know that glXMakeCurrent has succeeded, so this block would need to be executed even if shareContext. And if there has been an error, then it may be worth considering TRY_AGAIN_NO_SHARING. > What it was doing was: glxCreateNewContext, then glxMakeCurrent, then do some > GL initialization (load symbols, check for extensions) then finally check for X > error. I'm not aware of any problem with this approach, and fewer Syncs seems better.
Attachment #496589 - Flags: review?(karlt) → review-
(In reply to comment #38) > Karl: I still agree that initially I should only catch BadAllocs and forward > other X errors to the preexisting error handler, but that was relatively > tricky to implement in ScopedXErrorHandler which must be stackable. Instead, > let's say that it's the responsibility of the user of ScopedXErrorHandler to > properly act upon errors The API with SyncAndGetError returning an XErrorEvent is not right because there may be more than one error. Currently it returns the last error, which is probably the least useful. Returning the first error and ignoring the rest would be a big improvement because, if that first error was the error that the ScopedXErrorHandler was being used to catch, then subsequent errors may simply be results of the first error and so should be ignored. However, the user of ScopedXErrorHandler does not know the old error handler and so cannot chain up unexpected errors to nested handlers. I think a full ScopedXErrorHandler implementation would either take a callback argument to the constructor, or have a virtual function that derived classes can override and that would be used as the callback on error. The callback would return a boolean to indicate whether the error (and all subsequent errors) should be ignored or whether the error should be chained up to the old error handler.
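A minimal sketch of the "keep the first error, ignore the rest" behaviour described above. It deliberately avoids Xlib: FakeXError and FirstErrorRecorder are hypothetical stand-ins, not the actual ScopedXErrorHandler or XErrorEvent types.

```cpp
// Hypothetical stand-in for the fields of an X error event that matter here.
// This is NOT Xlib's XErrorEvent.
struct FakeXError {
    unsigned long serial;  // request number that triggered the error
    int errorCode;         // e.g. 11 is BadAlloc in the core X protocol
};

// Records only the first error it is told about. Later errors are assumed
// to be consequences of the first one and are dropped.
class FirstErrorRecorder {
public:
    void OnError(const FakeXError& err) {
        if (!mHasError) {
            mFirst = err;
            mHasError = true;
        }
    }
    bool HasError() const { return mHasError; }
    const FakeXError& FirstError() const { return mFirst; }

private:
    FakeXError mFirst{0, 0};
    bool mHasError = false;
};
```

The trade-off is the one noted in the comment: a caller that only sees the first error cannot tell whether later errors were really follow-on effects or something unrelated.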
(In reply to comment #41) > I think a full ScopedXErrorHandler implementation would either take a callback > argument to the constructor, or have a virtual function that derived classes > can override and would be used as the callback on error. But that may be a bit much effort to use. I'm thinking it might be easiest to forget about detecting particular errors and instead improve things by collecting only errors from particular scopes. That improvement could be made by recording NextRequest(dpy) in the ScopedXErrorHandler constructor and sending any error events with earlier serial numbers up the chain.
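The serial-number idea can be sketched as follows, again without real Xlib. ScopedHandlerSketch and SerialError are hypothetical names; the constructor argument plays the role of NextRequest(dpy), and "parent" stands for whatever handler was installed before this one.

```cpp
#include <vector>

// Hypothetical stand-in for an X error: only the request serial and code.
struct SerialError {
    unsigned long serial;
    int errorCode;
};

// One level of a stackable error handler. Errors caused by requests issued
// before this scope began are passed up to the enclosing handler.
class ScopedHandlerSketch {
public:
    ScopedHandlerSketch(unsigned long nextRequest, ScopedHandlerSketch* parent)
        : mFirstSerial(nextRequest), mParent(parent) {}

    void OnError(const SerialError& err) {
        if (err.serial < mFirstSerial) {
            // Not ours: the failing request predates this scope.
            // Real code would call the previously installed handler here;
            // in this sketch an error with no parent is simply dropped.
            if (mParent) {
                mParent->OnError(err);
            }
            return;
        }
        mErrors.push_back(err);
    }

    const std::vector<SerialError>& Errors() const { return mErrors; }

private:
    unsigned long mFirstSerial;
    ScopedHandlerSketch* mParent;
    std::vector<SerialError> mErrors;
};
```

For example, with an outer scope starting at serial 10 and an inner scope starting at serial 20, an error with serial 15 delivered to the inner handler ends up recorded by the outer one, while an error with serial 25 stays with the inner handler.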
(In reply to comment #39) > There seems little point in the "goto TRY_AGAIN_NO_SHARING" here > http://hg.mozilla.org/mozilla-central/annotate/f971ad6ed5a5/gfx/thebes/GLContextProviderGLX.cpp#l277 > as the new context will be discarded here: > http://hg.mozilla.org/mozilla-central/annotate/f971ad6ed5a5/gfx/thebes/GLContextProviderGLX.cpp#l284 > > Should error be reset to false on TRY_AGAIN? Yes it should! Thanks for catching that.
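To make the point about resetting the error flag concrete, here is a small sketch of a retry-without-sharing loop. It is not the code in GLContextProviderGLX.cpp; CreateFn, CreateResult and CreateWithFallback are invented for illustration, and the goto is replaced by a loop.

```cpp
// Hypothetical context-creation attempt: returns true on success.
// `shared` says whether the new context should share with a global context.
using CreateFn = bool (*)(bool shared);

struct CreateResult {
    bool ok;           // did any attempt succeed?
    bool usedSharing;  // was the successful attempt a sharing one?
    int attempts;      // how many attempts were made
};

CreateResult CreateWithFallback(CreateFn tryCreate, bool wantSharing) {
    bool shared = wantSharing;
    int attempts = 0;
    for (;;) {
        // Reset on every attempt. If this lived outside the loop, a failure
        // of the sharing attempt would also condemn the non-sharing retry.
        bool error = false;
        ++attempts;
        if (!tryCreate(shared)) {
            error = true;
        }
        if (!error) {
            return {true, shared, attempts};
        }
        if (!shared) {
            return {false, false, attempts};  // nothing left to fall back to
        }
        shared = false;  // retry once without sharing
    }
}
```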
> > What it was doing was: glxCreateNewContext, then glxMakeCurrent, then do some > > GL initialization (load symbols, check for extensions) then finally check for X > > error. > > I'm not aware of any problem with this approach, and fewer Syncs seems better. The problem is that if glxCreateNewContext failed, i.e. if the created GL context is bad, then glxMakeCurrent may crash, depending on buggy GL implementations/drivers. I realize that it's not our fault, but we have no choice but to go extra lengths to prevent such crashes.
Blocks: 622294
Blocks: 600079
No longer blocks: 600079
(In reply to comment #40)
> With this change, we no longer know that glXMakeCurrent has succeeded, so this
> block would need to be executed even if shareContext.
> And if there has been an error, then it may be worth considering
> TRY_AGAIN_NO_SHARING.
>
> > What it was doing was: glXCreateNewContext, then glXMakeCurrent, then do some
> > GL initialization (load symbols, check for extensions) then finally check for X
> > error.
>
> I'm not aware of any problem with this approach, and fewer Syncs seems better.

OK, let me re-explain this, as this is really the central issue. Our current code does:

glXCreateNewContext()
glXMakeCurrent()
XSync() and check X error

I am proposing to move up the XSync and error checking so that it looks like this instead:

glXCreateNewContext()
XSync() and check X error
glXMakeCurrent()

The rationale for this is as follows. It often happens that glXCreateNewContext fails and causes an X error, and when this is the case, with buggy drivers and some bad luck, the next glXMakeCurrent call can crash. This is what's happening here. So we need to do X error checking between the glXCreateNewContext and the glXMakeCurrent.

On the other hand, if glXCreateNewContext() didn't fail, then glXMakeCurrent() probably won't fail; at least I've never seen it fail in that case. Moreover, calling glXMakeCurrent is something we do literally all the time, so we couldn't afford to do an XSync and check for X errors every time we call it. Doing it here would therefore be a little futile, unless there is a particular reason to believe that the first glXMakeCurrent call is especially likely to fail even if glXCreateNewContext succeeded, but I'm not aware of anything pointing in that direction (?).
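The two orderings can be compared with a fake GLX layer. Everything below is hypothetical: FakeGlx is not a real API, and the "crash" of a buggy driver is modelled as a thrown exception so that it can be observed.

```cpp
#include <stdexcept>

// Hypothetical fake of a GLX implementation, used only to show ordering.
struct FakeGlx {
    bool createFails = false;   // simulate context creation raising an X error
    bool pendingError = false;  // an X error that has not been observed yet

    // Returns 0 for an unusable context, 1 for a good one.
    int CreateNewContext() {
        if (createFails) {
            pendingError = true;
            return 0;
        }
        return 1;
    }

    // Plays the role of XSync() followed by reading the error flag.
    bool SyncAndGetError() {
        bool e = pendingError;
        pendingError = false;
        return e;
    }

    // A buggy driver may crash when handed a bad context; the fake throws.
    void MakeCurrent(int ctx) {
        if (ctx == 0) {
            throw std::runtime_error("driver crash in MakeCurrent");
        }
    }
};

// Proposed order: check for errors before the context is made current.
bool CreateCheckedFirst(FakeGlx& glx) {
    int ctx = glx.CreateNewContext();
    if (glx.SyncAndGetError()) {
        return false;  // bail out, MakeCurrent is never reached
    }
    glx.MakeCurrent(ctx);
    return true;
}

// Original order: the error check only happens after MakeCurrent.
bool CreateCheckedLast(FakeGlx& glx) {
    int ctx = glx.CreateNewContext();
    glx.MakeCurrent(ctx);  // may "crash" before the error is ever seen
    return !glx.SyncAndGetError();
}
```

When creation fails, CreateCheckedFirst returns false cleanly, whereas CreateCheckedLast reaches MakeCurrent with the bad context. The sketch says nothing about the case raised later in the thread, where glXMakeCurrent fails even though creation reported no error.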
Here's the new patch, incorporating your comments:
* reset error = false
* record only the first X error (make the error handler ignore subsequent errors), as I don't have time now to implement a fully satisfactory solution such as you describe.
Attachment #496589 - Attachment is obsolete: true
Attachment #501126 - Flags: review?(karlt)
Comment on attachment 501126 [details] [diff] [review] check X error immediately after glxCreateNewContext, updated (In reply to comment #45) > It often happens that > glxCreateNewContext fails and causes a X error, and when this is the case, with > buggy drivers and some bad luck, the next glxMakeCurrent call can crash. Can you be more specific about the buggy drivers and crash here, so we know what we are working around please? Which drivers and which crash stack? > > On the other hand, if glxCreateNewContext() didn't fail, then glxMakeCurrent() > probably won't fail, In attachment 492774 [details], a GLXBadDrawable error occurs during GLXMakeCurrent. In that case, were there other (hidden) errors that occurred during glxCreateNewContext that would interrupt the process before this happens? If so, then the comment here needs updating: http://hg.mozilla.org/mozilla-central/annotate/7f2b60765d01/gfx/thebes/GLContextProviderGLX.cpp#l288 This patch is changing the behavior to glxCreateNewContext() XSync() and check X error glxMakeCurrent() if (!shareContext) { XSync() and check X error } I assume the different behavior depending on shareContext is not intentional. Bug 621699 (which is a dupe of bug 589546 AFAIK) is another case where a (different) error happens during glxMakeCurrent(). I'm guessing that is an unrelated bug. Do you know what's going on there? > ... unless there is a particular reason to > believe that the first glxMakeCurrent call is especially likely to fail even if > glxCreateNewContext succeeded, This documentation may indicate such a reason (though I don't know whether it exists in practice): "BadAlloc may be generated if the server has delayed allocation of ancillary buffers until glXMakeCurrent is called, only to find that it has insufficient resources to complete the allocation." r- is because this patch doesn't do what was proposed in comment 45, though I'm interested in more information as to why the proposed approach is appropriate.
Attachment #501126 - Flags: review?(karlt)
Attachment #501126 - Flags: review-
(In reply to comment #47)
> Comment on attachment 501126 [details] [diff] [review]
> check X error immediately after glxCreateNewContext, updated
>
> (In reply to comment #45)
> > It often happens that
> > glxCreateNewContext fails and causes a X error, and when this is the case, with
> > buggy drivers and some bad luck, the next glxMakeCurrent call can crash.
>
> Can you be more specific about the buggy drivers and crash here, so we know
> what we are working around please? Which drivers and which crash stack?

The first stack trace in this bug, in comment 0, is crashing in glxMakeCurrent, and comment 1 says that it's with the free ATI driver on Linux.

>
> >
> > On the other hand, if glxCreateNewContext() didn't fail, then glxMakeCurrent()
> > probably won't fail,
>
> In attachment 492774 [details], a GLXBadDrawable error occurs during GLXMakeCurrent. In
> that case, were there other (hidden) errors that occurred during
> glxCreateNewContext that would interrupt the process before this happens?

That was recorded before the graceful X error handler was added, so IIUC any earlier error would have interrupted the process.

So that's confusing --- I had forgotten about that. I had been working under the assumption that glxMakeCurrent only crashes when earlier X errors exist, when actually it seems that it can crash even without a prior X error existing. That means that the only defense against that would be driver blacklisting, which I need to implement anyway.

>
> If so, then the comment here needs updating:
> http://hg.mozilla.org/mozilla-central/annotate/7f2b60765d01/gfx/thebes/GLContextProviderGLX.cpp#l288
>
> This patch is changing the behavior to
>
> glxCreateNewContext()
> XSync() and check X error
> glxMakeCurrent()
> if (!shareContext) { XSync() and check X error }
>
> I assume the different behavior depending on shareContext is not intentional.

Indeed, sorry.

>
> Bug 621699 (which is a dupe of bug 589546 AFAIK) is another case where a
> (different) error happens during glxMakeCurrent(). I'm guessing that is an
> unrelated bug. Do you know what's going on there?

My working hypothesis was that glxCreateNewContext fails, and a buggy implementation of glxMakeCurrent then crashes. I was hoping that an X error was happening during glxCreateNewContext, was unfortunately hidden by my 'graceful' X error handler, and that the present patch, moving up the X error check, would catch it.

>
> > ... unless there is a particular reason to
> > believe that the first glxMakeCurrent call is especially likely to fail even if
> > glxCreateNewContext succeeded,
>
> This documentation may indicate such a reason (though I don't know whether it
> exists in practice):
>
> "BadAlloc may be generated if the server has delayed allocation of ancillary
> buffers until glXMakeCurrent is called, only to find that it has insufficient
> resources to complete the allocation."

Ah, indeed.

>
> r- is because this patch doesn't do what was proposed in comment 45, though I'm
> interested in more information as to why the proposed approach is appropriate.
(In reply to comment #48)
> > Can you be more specific about the buggy drivers and crash here, so we know
> > what we are working around please? Which drivers and which crash stack?
>
> The first stack trace in this bug, in comment 0, is crashing in glxMakeCurrent,
> and comment 1 says that it's with the free ATI driver on linux.

Thanks.

Hmm. That doesn't seem to match up with either of the stacks in attachment 496167 [details].
(In reply to comment #49)
> Thanks.
>
> Hmm. That doesn't seem to match up with either of the stacks in attachment
> 496167 [details].

Well, it probably was without MOZ_X_SYNC, so perhaps it can be ignored.
Apparently these builds were not properly uploaded, please try again with these builds, which also have a greater chance of working (I added more error checking): http://ftp.mozilla.org/pub/mozilla.org/firefox/tryserver-builds/bjacob@mozilla.com-1adffb67558a/
blocking2.0: --- → ?
Component: General → Canvas: WebGL
Product: Firefox → Core
QA Contact: general → canvas.webgl
Hardware: x86_64 → All
Summary: Crash on economist.com when it creates a WebGLContext (!). [@gl::GLContextGLX::CreateGLContext] → Crash from jQuery with WebGL enabled when it creates a WebGLContext (!) (on support.mozilla.com or economist.com) [@gl::GLContextGLX::CreateGLContext] [@ linux-gate.so@0x422 ]
Version: unspecified → Trunk
See bug 622294 comment 15 for piles of error spew that may be useful here.
Summary: Crash from jQuery with WebGL enabled when it creates a WebGLContext (!) (on support.mozilla.com or economist.com) [@gl::GLContextGLX::CreateGLContext] [@ linux-gate.so@0x422 ] → Crash from jQuery with WebGL enabled when it creates a WebGLContext (!) on economist.com [@gl::GLContextGLX::CreateGLContext]
Given that this only seems to happen with the free ATI driver, this is only a soft blocker, but it is a blocker nonetheless.
blocking2.0: ? → betaN+
Whiteboard: [soft blocker]
Actually, this hasn't shown up on crash-stats at all in the past week. I'd love a fix, but I don't think this blocks after all.
blocking2.0: betaN+ → -
Whiteboard: [soft blocker]
Blocker tease :p We don't use the status2.0:wanted+ flag enough. It's wanted, so requesting.
status2.0: --- → ?
I see a stack similar to comment 0 with 0474f6b72e6e and LIBGL_ALWAYS_SOFTWARE=1

OpenGL vendor string: VMware, Inc.
OpenGL renderer string: Gallium 0.4 on llvmpipe
OpenGL version string: 2.1 Mesa 7.9.1
OpenGL shading language version string: 1.20

Found extension GL_ARB_framebuffer_object
Found extension GL_ARB_pixel_buffer_object
Found extension GL_ARB_texture_non_power_of_two
Found extension GL_ARB_texture_rectangle
Found extension GL_EXT_bgra
Found extension GL_EXT_framebuffer_object
OpenGL vendor ('VMware, Inc.') unrecognized
WARNING: Failed to create GLXContext!: file /home/karl/moz/dev/gfx/thebes/GLContextProviderGLX.cpp, line 287
[GLX] FBConfig is not double-buffered
Program /home/karl/moz/dev/obj/dist/bin/firefox-bin (pid = 9335) received signal 11.

#4 <signal handler called>
#5 0x00007f5a755c3114 in swrastPutImage (draw=0x7f5a27623550, op=3, x=0, y=0, w=32768, h=801837056, data=0x7f59f3900000 "\245\245\245\245\245\245\245\245\245\245\245\245\245\245\245\245\245\245\245\245\245\245\245\245\245\245\245"..., loaderPrivate=0x7f5a2762b6a0) at drisw_glx.c:171
#6 0x00007f5a265709ef in put_image (drawable=<value optimized out>, data=<value optimized out>, width=32768, height=801837056) at drisw.c:68
#7 drisw_put_image (drawable=<value optimized out>, data=<value optimized out>, width=32768, height=801837056) at drisw.c:87
#8 0x00007f5a26571011 in drisw_present_texture (drawable=<value optimized out>, statt=<value optimized out>) at drisw.c:123
#9 drisw_copy_to_front (drawable=<value optimized out>, statt=<value optimized out>) at drisw.c:143
#10 drisw_flush_frontbuffer (drawable=<value optimized out>, statt=<value optimized out>) at drisw.c:184
#11 0x00007f5a26571f10 in dri_st_framebuffer_flush_front (stfbi=<value optimized out>, statt=ST_ATTACHMENT_BACK_RIGHT) at dri_drawable.c:104
#12 0x00007f5a265710f4 in dri_unbind_context (cPriv=<value optimized out>) at dri_context.c:148
#13 0x00007f5a2656e19d in driUnbindContext (pcp=0x7f5a27629cd0) at ../common/drisw_util.c:177
#14 0x00007f5a755c3231 in drisw_unbind_context (context=0x7f5a27baa980, new=0x3) at drisw_glx.c:285
#15 0x00007f5a755a1449 in MakeContextCurrent (dpy=0x7f5a7383c000, draw=27266401, read=27266401, gc_user=<value optimized out>) at glxcurrent.c:250
#16 0x00007f5a7fbe997c in mozilla::gl::GLContextGLX::MakeCurrentImpl (this=0x7f5a27636000, aForce=0) at /home/karl/moz/dev/gfx/thebes/GLContextProviderGLX.cpp:333
#17 0x00007f5a7e9d34f7 in mozilla::gl::GLContext::MakeCurrent (this=0x7f5a27636000, aForce=0) at ../../../dist/include/GLContext.h:454
#18 0x00007f5a7fbe98b1 in mozilla::gl::GLContextGLX::Init (this=0x7f5a27636000) at /home/karl/moz/dev/gfx/thebes/GLContextProviderGLX.cpp:313
#19 0x00007f5a7fbe95ff in mozilla::gl::GLContextGLX::CreateGLContext(const mozilla::gl::ContextFormat &, Display *, GLXDrawable, GLXFBConfig, struct {...} *, mozilla::gl::GLContextGLX *, PRBool, gfxXlibSurface *) (format=..., display=0x7f5a7383c000, drawable=27266401, cfg=0x7f5a274274a0, vinfo=0x7f5a27410400, shareContext=0x0, deleteDrawable=1, pixmap=0x7f5a278c5300) at /home/karl/moz/dev/gfx/thebes/GLContextProviderGLX.cpp:268
#20 0x00007f5a7fbe8dd2 in mozilla::gl::CreateOffscreenPixmapContext (aSize=..., aFormat=..., aShare=1) at /home/karl/moz/dev/gfx/thebes/GLContextProviderGLX.cpp:650
#21 0x00007f5a7fbe8e83 in mozilla::gl::GLContextProviderGLX::CreateOffscreen (aSize=..., aFormat=...) at /home/karl/moz/dev/gfx/thebes/GLContextProviderGLX.cpp:662
#22 0x00007f5a7e9cf5a2 in mozilla::WebGLContext::SetDimensions (this=0x7f5a274b3400, width=300, height=150) at /home/karl/moz/dev/content/canvas/src/WebGLContext.cpp:480
#23 0x00007f5a7ea80506 in nsHTMLCanvasElement::UpdateContext (this=0x7f5a27836420, aNewContextOptions=0x0) at /home/karl/moz/dev/content/html/content/src/nsHTMLCanvasElement.cpp:611
#24 0x00007f5a7ea800ae in nsHTMLCanvasElement::GetContext (this=0x7f5a27836420, aContextId=..., aContextOptions=..., aContext=0x7fff2fcb1c00) at /home/karl/moz/dev/content/html/content/src/nsHTMLCanvasElement.cpp:534
#25 0x00007f5a7f2ae90d in nsIDOMHTMLCanvasElement_GetContext (cx=0x7f5a4b164400, argc=1, vp=0x7f5a59bfe270) at dom_quickstubs.cpp:20391

h=801837056 doesn't look right.
Pretty sure you can't trust any of the values there, unless you have a non-optimized build of DRI/X/etc. All the moz code is doing here is calling glxMakeCurrent() with the drawables and context that we created in CreateGLContext. Everything below GLContextGLX::MakeCurrentImpl is not our code. Our blocking of non-nvidia drivers should catch this, shouldn't it?
(In reply to comment #59) > Our blocking of non-nvidia drivers should catch this, shouldn't it? It does --- I believe Karl is defining MOZ_GLX_IGNORE_BLACKLIST.
Blocks: 624593
Here's my WIP patch, I've made a tryserver build: http://ftp.mozilla.org/pub/mozilla.org/firefox/tryserver-builds/bjacob@mozilla.com-55faeffbd9db/ To everybody who got crashes here, can you please run this, and in case of crashes, give me your whole terminal (standard error) output? You can log it into a file by doing e.g. ./firefox 2>&1 | tee logfile.txt
Attached file crash logfile
OK. Here's a crash logfile, that was produced by running the WebGL conformance suite.

glxinfo |grep -i opengl
OpenGL vendor string: X.Org
OpenGL renderer string: Gallium 0.4 on AMD RS780
OpenGL version string: 2.1 Mesa 7.10
OpenGL shading language version string: 1.20
OpenGL extensions:
Attached file crash logfile2
And a short one, also produced while running the test suite on:

OpenGL vendor string: Advanced Micro Devices, Inc.
OpenGL renderer string: Mesa DRI R600 (RS780 9614) 20090101 TCL DRI2
OpenGL version string: 2.1 Mesa 7.10
OpenGL shading language version string: 1.20
OpenGL extensions:
(In reply to comment #62)
> Created attachment 506236 [details]
> crash logfile
>
> OK. Here's a crash logfile, that was produced by running the
> WebGL conformance suite.
>
> glxinfo |grep -i opengl
> OpenGL vendor string: X.Org
> OpenGL renderer string: Gallium 0.4 on AMD RS780
> OpenGL version string: 2.1 Mesa 7.10
> OpenGL shading language version string: 1.20
> OpenGL extensions:

Hm, no X error, so this really isn't the same bug as the one originally discussed on this bug. This means that there are driver crashes here that we can't seem to avoid just by doing proper X error handling. It's quite pleasing though that it only crashes so late through the WebGL test suite.
(In reply to comment #63)
> Created attachment 506237 [details]
> crash logfile2
>
> And a short one, also produced while running the test suite on:
> OpenGL vendor string: Advanced Micro Devices, Inc.
> OpenGL renderer string: Mesa DRI R600 (RS780 9614) 20090101 TCL DRI2
> OpenGL version string: 2.1 Mesa 7.10
> OpenGL shading language version string: 1.20
> OpenGL extensions:

Just before crashing, it says:

radeonSetSpanFunctions: bad format: 0x0002
radeonSetSpanFunctions: bad format: 0x0002

Maybe that would be a useful clue to driver developers.
Can you still reproduce the original crash (economist.com), with this new tryserver build?
No, at least economist.com is rock stable now (tested with Gallium).
Karl, can you review this? It applies your earlier review comments, and simplifies the CreateGLContext code (fewer lines of code). I think that only the first SyncAndGetError() is useful (the one after glxCreateNewContext). The second SyncAndGetError, after Init(), should be removed in my opinion: that Init() function is just doing glXMakeCurrent() and GL calls, which we're later on going to do without checking X errors all the time, so I wonder about the usefulness of checking X errors this time.
Attachment #506414 - Flags: review?(karlt)
Comment on attachment 506414 [details] [diff] [review]
fix X error handling

I like the change to ScopedXErrorHandler and the resetting of error in CreateGLContext, but they should be done in another bug.

I don't like calling XSync twice, though. I don't know of anything that it improves, and it adds a cost.

This bug, as reported in comment 0, is about working around
https://bugs.freedesktop.org/show_bug.cgi?id=33203

This is already worked around through blacklisting the driver, and I don't think we should add further workarounds, at least unless they mean we can actually use the driver.

FWIW, the patch does not work around the mesa bug.

The first X error is a BadDrawable from CreateGC (i.e. the same as bug 589546, which is https://bugzilla.redhat.com/show_bug.cgi?id=575825).

That occurs during MakeCurrent here:

#0 _XError (dpy=0x7fffea13c000, rep=0x7fff99ff1430) at XlibInt.c:1554
#1 0x00007ffff0600cbf in handle_error (dpy=0x7fffea13c000, err=0x7fff99ff1430, in_XReply=1) at xcb_io.c:166
#2 0x00007ffff0600d06 in handle_response (dpy=0x7fffea13c000, response=0x7fff99ff1430, in_XReply=1) at xcb_io.c:266
#3 0x00007ffff06012b0 in _XReply (dpy=0x7fffea13c000, rep=<value optimized out>, extra=<value optimized out>, discard=<value optimized out>) at xcb_io.c:555
#4 0x00007ffff05fc9b9 in XSync (dpy=0x7fffea13c000, discard=0) at Sync.c:44
#5 0x00007ffff05fcb7b in _XSyncFunction (dpy=0x7fffea13c000) at Synchro.c:35
#6 0x00007ffff0603847 in _XPrivSyncFunction (dpy=0x7fffea13c000) at XlibInt.c:251
#7 0x00007ffff05dc87a in XCreateGC (dpy=0x7fffea13c000, d=0, valuemask=140737120661504, values=0x0) at CrGC.c:100
#8 0x00007fffebe15fd2 in XCreateDrawable (base=0x7fff99c65160, xDrawable=50335183, drawable=<value optimized out>, modes=0x7fff9a07ede0) at drisw_glx.c:78
#9 driCreateDrawable (base=0x7fff99c65160, xDrawable=50335183, drawable=<value optimized out>, modes=0x7fff9a07ede0) at drisw_glx.c:377
#10 0x00007fffebe16337 in driFetchDrawable (gc=0x7fff99eef600, glxDrawable=50335183) at dri_common.c:373
#11 0x00007fffebe15afa in drisw_bind_context (context=0x7fffea13c000, old=<value optimized out>, draw=1, read=33106) at drisw_glx.c:266
#12 0x00007fffebdf3d11 in MakeContextCurrent (dpy=0x7fffea13c000, draw=50335183, read=50335183, gc_user=<value optimized out>) at glxcurrent.c:263
#13 0x00007ffff6465be8 in mozilla::gl::GLContextGLX::MakeCurrentImpl (this=0x7fff99f91000, aForce=0) at /home/karl/moz/dev/gfx/thebes/GLContextProviderGLX.cpp:336
#14 0x00007ffff523d42b in mozilla::gl::GLContext::MakeCurrent (this=0x7fff99f91000, aForce=0) at ../../../dist/include/GLContext.h:455
#15 0x00007ffff6465b1d in mozilla::gl::GLContextGLX::Init (this=0x7fff99f91000) at /home/karl/moz/dev/gfx/thebes/GLContextProviderGLX.cpp:316
#16 0x00007ffff64658ad in mozilla::gl::GLContextGLX::CreateGLContext(const mozilla::gl::ContextFormat &, Display *, GLXDrawable, GLXFBConfig, struct {...} *, mozilla::gl::GLContextGLX *, PRBool, gfxXlibSurface *) (format=..., display=0x7fffea13c000, drawable=50335183, cfg=0x7fff9a07ede0, vinfo=0x7fff9a0a9c00, shareContext=0x0, deleteDrawable=1, pixmap=0x7fff9e5ec300) at /home/karl/moz/dev/gfx/thebes/GLContextProviderGLX.cpp:282

The crash is also in MakeCurrent (comment 75), before the error is checked.
Attachment #506414 - Flags: review?(karlt) → review-
Depends on: 632969
I'm the guy who wrote the Modernizr library and included those tests. I've spoken with the Chrome GPU guys and decided to change the WebGL feature detect to just

!!window.WebGLRenderingContext

to avoid crashes like these. But hey, spinning up WebGL contexts everywhere did a nice job of rounding up the troublesome graphics drivers for y'all. :)

Anyway, I believe this new test will false positive in FF in a number of cases, but AFAICT a reliable feature test for WebGL isn't quite possible without spinning up a new context.
My instance of Firefox 4.0 on Linux always crashes upon trying to visit economist.com. I reported several crashes, the cleanest one being this one (see link), where I started Firefox 4.0 with a clean profile: https://crash-stats.mozilla.com/report/index/bp-e7f7c556-a0f4-4afa-bcd9-cc97c2110324
Otto, do you have MOZ_GLX_IGNORE_BLACKLIST defined in the environment? If not, can you attach the output you get from glxinfo, please?
MOZ_GLX_IGNORE_BLACKLIST is not set in my environment.
I have a second site that also always crashes, apparently in the same way: drumbeat.org (a Mozilla site)

Clean profile crash reports:
https://crash-stats.mozilla.com/report/index/bp-b3996d44-3488-4a79-8153-19eb32110324
https://crash-stats.mozilla.com/report/index/bp-29a18506-0512-4ce0-b590-e81232110324

Additionally, I noticed these error messages in my terminal, which appeared at the moment Firefox 4 crashed:

[otto@zaphod ~]$ firefox -ProfileManager
failed to create drawable
failed to create drawable
###!!! ABORT: X_FreeGC: BadGC (invalid GC parameter); 3 requests ago: file nsX11ErrorHandler.cpp, line 190
###!!! ABORT: X_FreeGC: BadGC (invalid GC parameter); 3 requests ago: file nsX11ErrorHandler.cpp, line 190

I hope that might be of some help.
Toggled webgl.disable to true in about:config and both sites no longer crash. No surprise of course, but an easy workaround if you want to actually visit those sites...
Weird: your crash report says 'GLContext+' which really means that Firefox created a GL context, but it shouldn't have as your driver is blacklisted. Are you 100% sure that you don't have MOZ_GLX_IGNORE_BLACKLIST defined somewhere? Can you try running with the environment variable MOZ_X_SYNC=1? That should give a better crash report to understand these X errors.
Where else should I look for MOZ_GLX_IGNORE_BLACKLIST? It's not in env, and I certainly never added this anywhere. Maybe Firefox loads something else somewhere that includes it; how would I know?

I ran it with MOZ_X_SYNC=1 and it didn't crash!
It did produce these error messages in the terminal when visiting economist.com (2 per page visited):

[test@zaphod ~]$ export MOZ_X_SYNC=1
[test@zaphod ~]$ firefox
failed to create drawable
failed to create drawable
failed to create drawable
failed to create drawable
NOTE: child process received `Goodbye', closing down
[test@zaphod ~]$

I ran it as a different user on my system, and webgl.disable is false. Also, prior to trying with MOZ_X_SYNC, I ran Firefox and tried to visit economist.com, and it did crash. This is just to exclude differences between my normal user and my test user.
I'd like to note that Modernizr 1.7 (unfortunately The Economist is still on 1.6) removes the webgl crash by not attempting to create a GL context, and instead relying on the following:

http://www.modernizr.com/downloadfulljs/

// This WebGL test false positives in FF depending on graphics hardware. But really it's quite impossible to know
// whether webgl will succeed until after you create the context. You might have hardware that can support
// a 100x100 webgl canvas, but will not support a 1000x1000 webgl canvas. So this feature inference is weak,
// but intentionally so.
tests['webgl'] = function(){
    return !!window.WebGLRenderingContext;
};

Obviously the work bjacob is doing in adding a little gl context creator for linux to FF5 should fix this across the board, but I'm wondering if this is kind of an evang issue, where The Economist should be notified of the Modernizr fix which was released back in February.
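For readers comparing the two detection strategies discussed in this bug, here is a minimal sketch of the trade-off. It is not Modernizr's implementation beyond the one-line test quoted above: the helper names hasWebGLInterface and canCreateWebGLContext and the win parameter are hypothetical, introduced only so the logic can be exercised outside a browser, and the 'experimental-webgl' context name is assumed to be what browsers of that era accepted alongside 'webgl'.

```javascript
// Weak inference: only checks that the WebGL interface object is exposed.
// No GL context is created, so no driver code runs.
// `win` is a hypothetical parameter standing in for the global window.
function hasWebGLInterface(win) {
  return !!win.WebGLRenderingContext;
}

// Strong test: actually asks the browser for a context of a given size.
// This is the kind of call that reached the driver in this bug.
function canCreateWebGLContext(win, width, height) {
  if (!hasWebGLInterface(win)) {
    return false;
  }
  var canvas = win.document.createElement('canvas');
  canvas.width = width;
  canvas.height = height;
  try {
    // getContext returns null when the context cannot be created.
    return !!(canvas.getContext('webgl') ||
              canvas.getContext('experimental-webgl'));
  } catch (e) {
    // Catches script-level exceptions only.
    return false;
  }
}
```

Note that the try/catch only guards against JavaScript exceptions. It cannot protect a page from a native crash inside the GL driver, which is why the weaker interface check was chosen despite its false positives.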
(In reply to comment #79)
> Obviously the work bjacob is doing in adding a little gl context creator for
> linux to FF5 should fix this across the board, but wondering if this is kind
> of an evang issue, where The Economist should be notified of the Modernizr fix
> which was released back in February.

Evangelism --> CC'ing Paul.
The crash in comment 0 is resolved by attachment 539555 [details] [diff] [review], so that seems the same issue as bug 659842. I now see bug 589546 with "Gallium 0.4 on llvmpipe".
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → DUPLICATE