Bug 659842 - X_GLXMakeCurrent: GLXBadContextTag SIGABRT with indirect Mesa at closedown and on about:support
Status: VERIFIED FIXED
: crash, regression, topcrash
Product: Core
Classification: Components
Component: Graphics
: Trunk
: All Linux
: -- critical with 1 vote
: mozilla7
Assigned To: Benoit Jacob [:bjacob] (mostly away)
:
Mentors:
: 616416
Depends on:
Blocks: 624935 645407
 
Reported: 2011-05-25 19:08 PDT by Tony Mechelynck [:tonymec]
Modified: 2011-06-29 14:48 PDT
9 users
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
?
affected
+
fixed


Attachments
check if webgl is disabled (1.79 KB, patch)
2011-05-30 09:40 PDT, Benoit Jacob [:bjacob] (mostly away)
no flags
kill SafeToCreateCanvas3DContex, check webgl.disabled in SetDimensions (71 bytes, patch)
2011-06-08 08:14 PDT, Benoit Jacob [:bjacob] (mostly away)
no flags
log from "verbose" run of Nightly, see comment #29 and #34 (34.10 KB, text/plain)
2011-06-08 16:47 PDT, Tony Mechelynck [:tonymec]
no flags
kill SafeToCreateCanvas3DContex, check webgl.disabled in SetDimensions (9.45 KB, patch)
2011-06-08 19:12 PDT, Benoit Jacob [:bjacob] (mostly away)
no flags
kill SafeToCreateCanvas3DContex, check webgl.disabled in SetDimensions (UPDATED) (10.64 KB, patch)
2011-06-09 19:30 PDT, Benoit Jacob [:bjacob] (mostly away)
karlt: review+
sweep X error under the carpet (717 bytes, patch)
2011-06-09 19:50 PDT, Benoit Jacob [:bjacob] (mostly away)
jacob.benoit.1: review-
bzipped log from try-build run, see comment #51 (15.79 KB, application/octet-stream)
2011-06-11 06:28 PDT, Tony Mechelynck [:tonymec]
no flags
check that MakeCurrent succeeds after destroying the drawable underlying the previous context (4.68 KB, patch)
2011-06-12 17:55 PDT, Karl Tomlinson (:karlt)
no flags
bzipped log, see comment #58 to #60 (11.47 KB, application/octet-stream)
2011-06-13 12:29 PDT, Tony Mechelynck [:tonymec]
no flags
bzipped log see comment #61 and #62 (13.05 KB, application/octet-stream)
2011-06-13 13:17 PDT, Tony Mechelynck [:tonymec]
no flags
block swrast (1.33 KB, patch)
2011-06-14 12:10 PDT, Benoit Jacob [:bjacob] (mostly away)
karlt: review+
bzipped log, see comment #71 sqq. (6.05 KB, application/octet-stream)
2011-06-15 05:02 PDT, Tony Mechelynck [:tonymec]
no flags
release GL context before destroying it (855 bytes, patch)
2011-06-15 09:09 PDT, Benoit Jacob [:bjacob] (mostly away)
karlt: review+
bugzilla: approval‑mozilla‑aurora+

Description Tony Mechelynck [:tonymec] 2011-05-25 19:08:18 PDT
This bug was filed from the Socorro interface and is 
report bp-db62e4c1-c47f-40ff-8501-748932110525 .
============================================================= 
«at closedown»

Mozilla/5.0 (X11; Linux x86_64; rv:7.0a1) Gecko/20110525 Firefox/7.0a1 SeaMonkey/2.2a1pre
ID:20110525003002

also:
bp-3a718eb4-8f48-427c-b637-5aca92110525 «after Ctrl+Q in Safe Mode»
bp-8b03e8d0-b942-4ecb-a2bf-1993e2110525 «after clicking "Restart with Add-ons Disabled" then okaying the popups»

Since installing this build, I get this crash at every closedown of SeaMonkey, even in Safe Mode.

NB: CPU is "amd64" according to Breakpad/Socorro (as shown below) but "GenuineIntel" (model name: Intel(R) Pentium(R) 4 CPU 2.80GHz) according to openSUSE YaST.

Here comes the crash report. Sorry, this nightly was apparently built without symbols.

Signature	libc-2.11.3.so@0x32ab5
UUID	db62e4c1-c47f-40ff-8501-748932110525
Uptime	35.6 minutes
Last Crash	2833 seconds (47.2 minutes) before submission
Install Age	15300 seconds (4.2 hours) since version was first installed.
Install Time	2011-05-25 21:07:23
Product	SeaMonkey
Version	2.2a1pre
Build ID	20110525003002
Release Channel	nightly
Branch	2.2
OS	Linux
OS Version	0.0.0 Linux 2.6.37.6-0.5-desktop #1 SMP PREEMPT 2011-04-25 21:48:33 +0200 x86_64
CPU	amd64
CPU Info	family 15 model 4 stepping 1
Crash Reason	SIGABRT
Crash Address	0x363e
User Comments	at closedown
App Notes 	OpenGL: Mesa Project -- Software Rasterizer -- 1.4 (2.1 Mesa 7.10.2)
WebGL? libGL.so.1? libGL.so.1+
GL Context? GL Context+
WebGL-
X_GLXMakeCurrent: GLXBadContextTag
xpcom_runtime_abort(###!!! ABORT: X_GLXMakeCurrent: GLXBadContextTag: file /builds/slave/comm-cen-trunk-lnx64-ntly/build/mozilla/toolkit/xre/nsX11ErrorHandler.cpp, line 199)
Processor Notes 	
EMCheckCompatibility	False
Winsock LSP	
Adapter Vendor ID	
Adapter Device ID	
Crashing Thread
Frame 	Module 	Signature 	Source
0 	libc-2.11.3.so 	libc-2.11.3.so@0x32ab5 	
1 	libc-2.11.3.so 	libc-2.11.3.so@0x33fb5 	
2 	libxul.so 	libxul.so@0x1e7e78f 	
3 	libxul.so 	libxul.so@0x20563c5 	
4 	libxul.so 	libxul.so@0x2056fbf 	
5 	libc-2.11.3.so 	libc-2.11.3.so@0x6b1f 	
6 	libmozalloc.so 	libmozalloc.so@0x517 	
7 	libxul.so 	libxul.so@0x29e022f 	
8 	libxul.so 	libxul.so@0x1e7e177 	
9 	libplds4.so 	libplds4.so@0x202fff 	
10 	ld-2.11.3.so 	ld-2.11.3.so@0xcb10 	
11 	libmozalloc.so 	libmozalloc.so@0x1091 	
12 	libxul.so 	libxul.so@0x16bf038 	
13 	libxul.so 	libxul.so@0x16d7052 	
14 	libGL.so.1.2 	libGL.so.1.2@0xd52d 	
15 	libxul.so 	libxul.so@0x6d6def 	
16 	libxul.so 	libxul.so@0x6d6ea0 	
17 	libxul.so 	libxul.so@0x16d423e 	
18 	libxul.so 	libxul.so@0x20563c5 	
19 	ld-2.11.3.so 	ld-2.11.3.so@0x9225 	
20 	libxul.so 	libxul.so@0x1095f 	
21 	libxul.so 	libxul.so@0x2b80f 	
22 	libxul.so 	libxul.so@0x16678c7 	
23 	libxul.so 	libxul.so@0x43b19 	
24 	libxul.so 	libxul.so@0x4365f 	
25 	libxul.so 	libxul.so@0x2b80f 	
26 	ld-2.11.3.so 	ld-2.11.3.so@0x9531 	
27 	libxul.so 	libxul.so@0x1e7e66c 	
28 	libxul.so 	libxul.so@0x20563c5 	
29 	libxul.so 	libxul.so@0x43b19 	
30 	libxul.so 	libxul.so@0x1e7e78f 	
31 	libxul.so 	libxul.so@0x20563c5 	
32 	libxul.so 	libxul.so@0x2056fbf 	
33 	libxul.so 	libxul.so@0x29e022f 	
34 	libxul.so 	libxul.so@0x16d423e 	
35 	libxul.so 	libxul.so@0x2b80f 	
36 	libxul.so 	libxul.so@0x29e022f 	
37 	libxpcom.so 	libxpcom.so@0x203fff 	
38 	ld-2.11.3.so 	ld-2.11.3.so@0xcb10 	
39 	libxul.so 	libxul.so@0x2b80f 	
40 	libxul.so 	libxul.so@0x6d1e79
Comment 1 Tony Mechelynck [:tonymec] 2011-05-25 19:15:30 PDT
The same crash happened when clicking "Reload" on about:support at first startup of this build, after noticing that no extensions were listed (see also bug 659772): bp-a058f651-88d6-401b-854a-b38cc2110525
Comment 2 Karl Tomlinson (:karlt) 2011-05-25 20:19:24 PDT
(In reply to comment #0)
> 1.4 (2.1 Mesa 7.10.2)

This is what is normally displayed with indirect rendering.
Can you check with "LIBGL_DEBUG=verbose glxinfo", please, that it says "direct rendering: No" and see what it says about why it is indirect?

> X_GLXMakeCurrent: GLXBadContextTag

Same issue was reported in bug 645407 comment 10 and 17.

http://www.opengl.org/documentation/specs/glx/glx1.4.pdf says:

"The following error codes may be generated by a faulty GLX implementation,
but would not normally be visible to clients:
GLXBadContextTag A rendering request contains an invalid context tag.
(Context tags are used to identify contexts in the protocol.)"

In bug 429604 comment 0 it was happening when trying to MakeCurrent a context whose drawable had already been destroyed.

I guess this is a regression from bug 645407 because before then Mesa libGL was blacklisted.
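To make the mechanism concrete, here is a toy C++ model of the failure described in bug 429604 comment 0. It is not the X server's or Mesa's GLX code: `ToyGlxServer`, its integer drawables and its tag bookkeeping are hypothetical, and the model simply assumes that destroying a drawable invalidates the tag of a context still current on it, so the next MakeCurrent request that presents that old tag is rejected with GLXBadContextTag.

```cpp
#include <cassert>
#include <map>

// Toy model for illustration only; NOT the real GLX protocol handling.
// Assumption: destroying a drawable invalidates the tag of any context
// that is current on it.
enum class GlxResult { Success, BadContextTag };

class ToyGlxServer {
 public:
  // Shaped loosely like an X_GLXMakeCurrent request: the client sends the
  // tag it believes is current (0 = none) and the new drawable
  // (0 stands for None, i.e. "just release the current context").
  GlxResult MakeCurrent(int oldTag, int drawable, int* newTag) {
    if (oldTag != 0 && tagToDrawable_.count(oldTag) == 0) {
      return GlxResult::BadContextTag;  // stale tag: the error in the crash
    }
    tagToDrawable_.erase(oldTag);
    *newTag = 0;
    if (drawable != 0) {
      *newTag = nextTag_++;
      tagToDrawable_[*newTag] = drawable;
    }
    return GlxResult::Success;
  }

  // Destroying a drawable silently invalidates every tag bound to it.
  void DestroyDrawable(int drawable) {
    for (auto it = tagToDrawable_.begin(); it != tagToDrawable_.end();) {
      if (it->second == drawable) {
        it = tagToDrawable_.erase(it);
      } else {
        ++it;
      }
    }
  }

 private:
  std::map<int, int> tagToDrawable_;  // valid context tag -> drawable
  int nextTag_ = 1;
};

// The order seen in the crash: the drawable goes away while its context is
// still current, then the context is released (or replaced).
GlxResult DestroyDrawableThenRelease() {
  ToyGlxServer server;
  int tag = 0;
  server.MakeCurrent(0, /*drawable=*/42, &tag);
  server.DestroyDrawable(42);
  int ignored = 0;
  return server.MakeCurrent(tag, /*drawable=*/0, &ignored);
}
```

In the real crash, Mozilla's X error handler turns this protocol error into an abort (the nsX11ErrorHandler.cpp line in the crash report above).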
Comment 3 Tony Mechelynck [:tonymec] 2011-05-25 21:23:09 PDT
In reply to comment #2

linux:~ # LIBGL_DEBUG=verbose glxinfo
name of display: :0
display: :0  screen: 0
direct rendering: No
server glx vendor string: SGI
server glx version string: 1.4
server glx extensions:
    GLX_ARB_multisample, GLX_EXT_visual_info, GLX_EXT_visual_rating, 
    GLX_EXT_import_context, GLX_EXT_texture_from_pixmap, GLX_OML_swap_method, 
    GLX_SGI_make_current_read, GLX_SGIS_multisample, GLX_SGIX_hyperpipe, 
    GLX_SGIX_swap_barrier, GLX_SGIX_fbconfig, GLX_SGIX_pbuffer, 
    GLX_MESA_copy_sub_buffer, GLX_INTEL_swap_event
client glx vendor string: Mesa Project and SGI
client glx version string: 1.4
client glx extensions:
    GLX_ARB_get_proc_address, GLX_ARB_multisample, GLX_EXT_import_context, 
    GLX_EXT_visual_info, GLX_EXT_visual_rating, GLX_MESA_copy_sub_buffer, 
    GLX_MESA_swap_control, GLX_OML_swap_method, GLX_OML_sync_control, 
    GLX_SGI_make_current_read, GLX_SGI_swap_control, GLX_SGI_video_sync, 
    GLX_SGIS_multisample, GLX_SGIX_fbconfig, GLX_SGIX_pbuffer, 
    GLX_SGIX_visual_select_group, GLX_EXT_texture_from_pixmap, 
    GLX_INTEL_swap_event
GLX extensions:
    GLX_ARB_get_proc_address, GLX_ARB_multisample, GLX_EXT_import_context, 
    GLX_EXT_visual_info, GLX_EXT_visual_rating, GLX_MESA_copy_sub_buffer, 
    GLX_OML_swap_method, GLX_SGI_make_current_read, GLX_SGIS_multisample, 
    GLX_SGIX_fbconfig, GLX_SGIX_pbuffer, GLX_EXT_texture_from_pixmap, 
    GLX_INTEL_swap_event
OpenGL vendor string: Mesa Project
OpenGL renderer string: Software Rasterizer
OpenGL version string: 1.4 (2.1 Mesa 7.10.2)
OpenGL extensions:
    GL_ARB_depth_texture, GL_ARB_draw_buffers, GL_ARB_fragment_program, 
    GL_ARB_fragment_program_shadow, GL_ARB_multisample, GL_ARB_multitexture, 
    GL_ARB_occlusion_query, GL_ARB_point_parameters, GL_ARB_point_sprite, 
    GL_ARB_shadow, GL_ARB_shadow_ambient, GL_ARB_texture_border_clamp, 
    GL_ARB_texture_compression, GL_ARB_texture_cube_map, 
    GL_ARB_texture_env_add, GL_ARB_texture_env_combine, 
    GL_ARB_texture_env_crossbar, GL_ARB_texture_env_dot3, 
    GL_ARB_texture_mirrored_repeat, GL_ARB_texture_non_power_of_two, 
    GL_ARB_texture_rectangle, GL_ARB_transpose_matrix, GL_ARB_vertex_program, 
    GL_ARB_window_pos, GL_EXT_abgr, GL_EXT_bgra, GL_EXT_blend_color, 
    GL_EXT_blend_equation_separate, GL_EXT_blend_func_separate, 
    GL_EXT_blend_logic_op, GL_EXT_blend_minmax, GL_EXT_blend_subtract, 
    GL_EXT_copy_texture, GL_EXT_draw_range_elements, GL_EXT_fog_coord, 
    GL_EXT_framebuffer_object, GL_EXT_multi_draw_arrays, GL_EXT_packed_pixels, 
    GL_EXT_paletted_texture, GL_EXT_point_parameters, GL_EXT_polygon_offset, 
    GL_EXT_rescale_normal, GL_EXT_secondary_color, 
    GL_EXT_separate_specular_color, GL_EXT_shadow_funcs, 
    GL_EXT_shared_texture_palette, GL_EXT_stencil_two_side, 
    GL_EXT_stencil_wrap, GL_EXT_subtexture, GL_EXT_texture, GL_EXT_texture3D, 
    GL_EXT_texture_edge_clamp, GL_EXT_texture_env_add, 
    GL_EXT_texture_env_combine, GL_EXT_texture_env_dot3, 
    GL_EXT_texture_lod_bias, GL_EXT_texture_mirror_clamp, 
    GL_EXT_texture_object, GL_EXT_texture_rectangle, GL_EXT_vertex_array, 
    GL_3DFX_texture_compression_FXT1, GL_APPLE_packed_pixels, 
    GL_ATI_draw_buffers, GL_ATI_texture_env_combine3, 
    GL_ATI_texture_mirror_once, GL_ATIX_texture_env_combine3, 
    GL_IBM_texture_mirrored_repeat, GL_INGR_blend_func_separate, 
    GL_MESA_pack_invert, GL_MESA_ycbcr_texture, GL_NV_blend_square, 
    GL_NV_depth_clamp, GL_NV_fragment_program, GL_NV_fragment_program_option, 
    GL_NV_light_max_exponent, GL_NV_point_sprite, GL_NV_texgen_reflection, 
    GL_NV_texture_env_combine4, GL_NV_texture_rectangle, GL_NV_vertex_program, 
    GL_NV_vertex_program1_1, GL_SGIS_generate_mipmap, 
    GL_SGIS_texture_border_clamp, GL_SGIS_texture_edge_clamp, 
    GL_SGIS_texture_lod, GL_SGIX_shadow_ambient, GL_SUN_multi_draw_arrays
glu version: 1.3
glu extensions:
    GLU_EXT_nurbs_tessellator, GLU_EXT_object_space_tess

***
*** WARNING: Direct Rendering is NOT enabled
***


   visual  x  bf lv rg d st colorbuffer ax dp st accumbuffer  ms  cav
 id dep cl sp sz l  ci b ro  r  g  b  a bf th cl  r  g  b  a ns b eat
----------------------------------------------------------------------
0x21 24 tc  0 32  0 r  y  .  8  8  8  8  0 24  8  0  0  0  0  0 0 None
0xc2 24 tc  0 24  0 r  .  .  8  8  8  0  0  0  0  0  0  0  0  0 0 None
0xc3 24 tc  0 24  0 r  .  .  8  8  8  0  0  0  0 16 16 16  0  0 0 Slow
0xc4 24 tc  0 24  0 r  y  .  8  8  8  0  0  0  0  0  0  0  0  0 0 None
0xc5 24 tc  0 24  0 r  y  .  8  8  8  0  0  0  0 16 16 16  0  0 0 Slow
0xc6 24 tc  0 24  0 r  .  .  8  8  8  0  0  0  8  0  0  0  0  0 0 None
0xc7 24 tc  0 24  0 r  .  .  8  8  8  0  0  0  8 16 16 16  0  0 0 Slow
0xc8 24 tc  0 24  0 r  y  .  8  8  8  0  0  0  8  0  0  0  0  0 0 None
0xc9 24 tc  0 24  0 r  y  .  8  8  8  0  0  0  8 16 16 16  0  0 0 Slow
0xca 24 tc  0 24  0 r  .  .  8  8  8  0  0 24  0  0  0  0  0  0 0 None
0xcb 24 tc  0 24  0 r  .  .  8  8  8  0  0 24  0 16 16 16  0  0 0 Slow
0xcc 24 tc  0 24  0 r  y  .  8  8  8  0  0 24  0  0  0  0  0  0 0 None
0xcd 24 tc  0 24  0 r  y  .  8  8  8  0  0 24  0 16 16 16  0  0 0 Slow
0xce 24 tc  0 24  0 r  .  .  8  8  8  0  0 24  8  0  0  0  0  0 0 None
0xcf 24 tc  0 24  0 r  .  .  8  8  8  0  0 24  8 16 16 16  0  0 0 Slow
0xd0 24 tc  0 24  0 r  y  .  8  8  8  0  0 24  8  0  0  0  0  0 0 None
0xd1 24 tc  0 24  0 r  y  .  8  8  8  0  0 24  8 16 16 16  0  0 0 Slow
0xd2 24 tc  0 32  0 r  .  .  8  8  8  8  0  0  0  0  0  0  0  0 0 None
0xd3 24 tc  0 32  0 r  .  .  8  8  8  8  0  0  0 16 16 16 16  0 0 Slow
0xd4 24 tc  0 32  0 r  y  .  8  8  8  8  0  0  0  0  0  0  0  0 0 None
0xd5 24 tc  0 32  0 r  y  .  8  8  8  8  0  0  0 16 16 16 16  0 0 Slow
0xd6 24 tc  0 32  0 r  .  .  8  8  8  8  0  0  8  0  0  0  0  0 0 None
0xd7 24 tc  0 32  0 r  .  .  8  8  8  8  0  0  8 16 16 16 16  0 0 Slow
0xd8 24 tc  0 32  0 r  y  .  8  8  8  8  0  0  8  0  0  0  0  0 0 None
0xd9 24 tc  0 32  0 r  y  .  8  8  8  8  0  0  8 16 16 16 16  0 0 Slow
0xda 24 tc  0 32  0 r  .  .  8  8  8  8  0 24  0  0  0  0  0  0 0 None
0xdb 24 tc  0 32  0 r  .  .  8  8  8  8  0 24  0 16 16 16 16  0 0 Slow
0xdc 24 tc  0 32  0 r  y  .  8  8  8  8  0 24  0 16 16 16 16  0 0 Slow
0xdd 24 tc  0 32  0 r  .  .  8  8  8  8  0 24  8  0  0  0  0  0 0 None
0xde 24 tc  0 32  0 r  .  .  8  8  8  8  0 24  8 16 16 16 16  0 0 Slow
0xdf 24 tc  0 32  0 r  y  .  8  8  8  8  0 24  8 16 16 16 16  0 0 Slow
0xe0 24 dc  0 24  0 r  .  .  8  8  8  0  0  0  0  0  0  0  0  0 0 None
0xe1 24 dc  0 24  0 r  .  .  8  8  8  0  0  0  0 16 16 16  0  0 0 Slow
0xe2 24 dc  0 24  0 r  y  .  8  8  8  0  0  0  0  0  0  0  0  0 0 None
0xe3 24 dc  0 24  0 r  y  .  8  8  8  0  0  0  0 16 16 16  0  0 0 Slow
0xe4 24 dc  0 24  0 r  .  .  8  8  8  0  0  0  8  0  0  0  0  0 0 None
0xe5 24 dc  0 24  0 r  .  .  8  8  8  0  0  0  8 16 16 16  0  0 0 Slow
0xe6 24 dc  0 24  0 r  y  .  8  8  8  0  0  0  8  0  0  0  0  0 0 None
0xe7 24 dc  0 24  0 r  y  .  8  8  8  0  0  0  8 16 16 16  0  0 0 Slow
0xe8 24 dc  0 24  0 r  .  .  8  8  8  0  0 24  0  0  0  0  0  0 0 None
0xe9 24 dc  0 24  0 r  .  .  8  8  8  0  0 24  0 16 16 16  0  0 0 Slow
0xea 24 dc  0 24  0 r  y  .  8  8  8  0  0 24  0  0  0  0  0  0 0 None
0xeb 24 dc  0 24  0 r  y  .  8  8  8  0  0 24  0 16 16 16  0  0 0 Slow
0xec 24 dc  0 24  0 r  .  .  8  8  8  0  0 24  8  0  0  0  0  0 0 None
0xed 24 dc  0 24  0 r  .  .  8  8  8  0  0 24  8 16 16 16  0  0 0 Slow
0xee 24 dc  0 24  0 r  y  .  8  8  8  0  0 24  8  0  0  0  0  0 0 None
0xef 24 dc  0 24  0 r  y  .  8  8  8  0  0 24  8 16 16 16  0  0 0 Slow
0xf0 24 dc  0 32  0 r  .  .  8  8  8  8  0  0  0  0  0  0  0  0 0 None
0xf1 24 dc  0 32  0 r  .  .  8  8  8  8  0  0  0 16 16 16 16  0 0 Slow
0xf2 24 dc  0 32  0 r  y  .  8  8  8  8  0  0  0  0  0  0  0  0 0 None
0xf3 24 dc  0 32  0 r  y  .  8  8  8  8  0  0  0 16 16 16 16  0 0 Slow
0xf4 24 dc  0 32  0 r  .  .  8  8  8  8  0  0  8  0  0  0  0  0 0 None
0xf5 24 dc  0 32  0 r  .  .  8  8  8  8  0  0  8 16 16 16 16  0 0 Slow
0xf6 24 dc  0 32  0 r  y  .  8  8  8  8  0  0  8  0  0  0  0  0 0 None
0xf7 24 dc  0 32  0 r  y  .  8  8  8  8  0  0  8 16 16 16 16  0 0 Slow
0xf8 24 dc  0 32  0 r  .  .  8  8  8  8  0 24  0  0  0  0  0  0 0 None
0xf9 24 dc  0 32  0 r  .  .  8  8  8  8  0 24  0 16 16 16 16  0 0 Slow
0xfa 24 dc  0 32  0 r  y  .  8  8  8  8  0 24  0  0  0  0  0  0 0 None
0xfb 24 dc  0 32  0 r  y  .  8  8  8  8  0 24  0 16 16 16 16  0 0 Slow
0xfc 24 dc  0 32  0 r  .  .  8  8  8  8  0 24  8  0  0  0  0  0 0 None
0xfd 24 dc  0 32  0 r  .  .  8  8  8  8  0 24  8 16 16 16 16  0 0 Slow
0xfe 24 dc  0 32  0 r  y  .  8  8  8  8  0 24  8  0  0  0  0  0 0 None
0xff 24 dc  0 32  0 r  y  .  8  8  8  8  0 24  8 16 16 16 16  0 0 Slow
0x41 32 tc  0 32  0 r  y  .  8  8  8  8  0 24  0  0  0  0  0  0 0 None
Comment 4 Tony Mechelynck [:tonymec] 2011-05-25 21:29:47 PDT
Karl: do you think I could (and could you tell me how to) disable high-speed graphics for the time being? (Until yesterday about:support said: "Graphics: 0/2")
Comment 5 Karl Tomlinson (:karlt) 2011-05-25 21:59:18 PDT
I thought this was just related to about:support and possibly WebGL.

I thought about:support should still be saying "Graphics: 0/2".  OpenGL (low-speed) composition landed briefly but was backed out.

If you are seeing "Graphics: 2/2", then try disabling Preferences -> Advanced -> General -> "Use hardware acceleration where available".
(It's a bad name.  What it really means is "Prefer OpenGL over RENDER for layers composition if deemed appropriate".)  I don't know that this is related to your issue though.

I think what you want to do is disable use of OpenGL in about:support, but I don't know how to do that.  Benoit or Matt might know?
Comment 6 Benoit Jacob [:bjacob] (mostly away) 2011-05-25 22:03:01 PDT
In about:config, set webgl.disabled and layers.acceleration.disabled.
Comment 7 Karl Tomlinson (:karlt) 2011-05-25 22:22:29 PDT
about:support still "Created offscreen FBO" with webgl.disabled and layers.acceleration.disabled both set to true in a build based on f9a070327df8.
Comment 8 Tony Mechelynck [:tonymec] 2011-05-25 22:27:22 PDT
In reply to comment #5:
I have no such "Use hardware acceleration where available" checkbox in SeaMonkey.

What I'm now seeing at the bottom of about:support is:
Graphics
    Adapter Description        Mesa Project -- Software Rasterizer
    Driver Version             1.4 (2.1 Mesa 7.10.2)
    WebGL Renderer             false

and in addition, when sending about:support to the clipboard, it adds the following, which I don't see when viewing the page:
    GPU Accelerated Windows    0/2

Until yesterday (in a build from 17 April) I didn't see the first three lines but I saw the fourth one.

(In reply to comment #6)
> In about:config, set webgl.disabled and layers.acceleration.disabled.
thanks, I'll try that, let's hope it masks these crashes.
Comment 9 Benoit Jacob [:bjacob] (mostly away) 2011-05-25 22:58:32 PDT
(In reply to comment #7)
> about:support still "Created offscreen FBO" with webgl.disabled and
> layers.acceleration.disabled both set to true in a build based on
> f9a070327df8.

Then that's a bug we should fix!
Comment 10 Karl Tomlinson (:karlt) 2011-05-29 13:07:17 PDT
Similar crash report on shutdown on a different system: bp-f328d261-dd63-4b80-8c8e-9b37e2110528
Comment 11 Benoit Jacob [:bjacob] (mostly away) 2011-05-29 16:22:58 PDT
I can't reproduce this in Firefox. As soon as I set webgl.disabled, about:support ceases trying to create GL contexts. Specifically, about:support calls GfxInfoWebGL::GetWebGLParameter() which calls WebGLContext::SetDimensions which checks if WebGL is disabled or blacklisted before calling GLContextProvider::CreateOffscreen().

The original bug report here is about SeaMonkey, not Firefox. Could this be a SeaMonkey-specific bug, or a bug in an older version of this code that SeaMonkey is still using?
Comment 12 Benoit Jacob [:bjacob] (mostly away) 2011-05-29 16:24:18 PDT
(This bug report is getting confused between two different things, the crash as in comment 10 on the one hand, and the fact that SeaMonkey's about:support does GL stuff even when WebGL is disabled)
Comment 13 Tony Mechelynck [:tonymec] 2011-05-29 23:55:57 PDT
I'm still getting this crash at every closedown of SeaMonkey, and Socorro dutifully mentions this bug when displaying them, but no symbols so far. If someone could tell me how to get a SeaMonkey linux-x86_64 trunk build with symbols without compiling it myself, I would gladly do so, just so I could link a stack trace with symbols from this bug.

In reply to comment #12:
I have webgl.disabled = true in about:config, all other webgl prefs defaulted (.force-enabled = false, .force_osmesa = false, .osmesalib = "", .prefer-native-gl = false, .shader_validator = true, .verbose = false). Under layers.acceleration I also have .disabled (user set) = true and .force-enabled (default) = false. At the bottom of about:support I see (ATM, in this build):

Adapter Description      Mesa Project -- Software Rasterizer
Driver Version           1.4 (2.1 Mesa 7.10.2)
WebGL Renderer           false
GPU Accelerated Windows  0/3

If this build does _not_ crash when I close it down later today ("today" for my CEST timezone) to install the next nightly, I'll mention it here.

Mozilla/5.0 (X11; Linux x86_64; rv:7.0a1) Gecko/20110529 Firefox/7.0a1 SeaMonkey/2.2a1pre ID:20110529003002
Comment 14 Tony Mechelynck [:tonymec] 2011-05-30 00:20:12 PDT
Adding topcrash keyword: this crash is ATM #2 topcrasher for SeaMonkey 2.2a1pre for all three of the 7-, 14- and 28-day stats periods (and #1 on Linux, since #1 all over is a Windows-only crash).
Comment 15 Karl Tomlinson (:karlt) 2011-05-30 03:30:34 PDT
(In reply to comment #12)
> (This bug report is getting confused between two different things, the crash
> as in comment 10 on the one hand,

That is this bug.
Note that the crash in comment 10 is from Firefox, the same crash as initially reported in SeaMonkey.

> and the fact that SeaMonkey's
> about:support does GL stuff even when WebGL is disabled)

That was discovered in trying to use the suggested workaround for this bug.

Firefox's about:support also does GL stuff when WebGL is disabled.

(In reply to comment #11)
> Specifically,
> about:support calls GfxInfoWebGL::GetWebGLParameter() which calls
> WebGLContext::SetDimensions which checks if WebGL is disabled or blacklisted
> before calling GLContextProvider::CreateOffscreen().

I see the blacklist check at
http://hg.mozilla.org/mozilla-central/annotate/d105fc895d91/content/canvas/src/WebGLContext.cpp#l458

but where is SafeToCreateCanvas3DContext() meant to be called or webgl.disabled checked elsewhere?
Comment 16 Tony Mechelynck [:tonymec] 2011-05-30 03:44:55 PDT
(In reply to comment #15)
> (In reply to comment #12)
> > (This bug report is getting confused between two different things, the crash
> > as in comment 10 on the one hand,
> 
> That is this bug.
> Note that the crash in comment 10 is from Firefox, the same crash as
> initially reported in SeaMonkey.

[...]

If it's the same crash (but with symbols in Firefox :-) ) then the bug's Summary doesn't need to say "[SeaMonkey]".
Comment 17 Benoit Jacob [:bjacob] (mostly away) 2011-05-30 09:23:33 PDT
(In reply to comment #15)
> (In reply to comment #11)
> > Specifically,
> > about:support calls GfxInfoWebGL::GetWebGLParameter() which calls
> > WebGLContext::SetDimensions which checks if WebGL is disabled or blacklisted
> > before calling GLContextProvider::CreateOffscreen().
> 
> I see the blacklist check at
> http://hg.mozilla.org/mozilla-central/annotate/d105fc895d91/content/canvas/
> src/WebGLContext.cpp#l458
> 
> but where is SafeToCreateCanvas3DContext() meant to be called or
> webgl.disabled checked elsewhere?

Ah! Sorry about that. I had the blacklisting in mind and forgot to check where we were checking for webgl.disabled. That is indeed in SafeToCreateCanvas3DContext() which is called from GetContext(). The bug here is that about:support bypasses GetContext, so it doesn't honor webgl.disabled.
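A simplified sketch of the bug as described here: webgl.disabled is honored only on the GetContext() path (via SafeToCreateCanvas3DContext(), which per comment 21 whitelists chrome entirely), while about:support reaches SetDimensions() directly. The function names mirror the ones in this thread, but the bodies and the `Prefs` struct are hypothetical stand-ins, not the real WebGLContext code.

```cpp
#include <cassert>

// Hypothetical stand-in for the relevant pref and blacklist state.
struct Prefs {
  bool webglDisabled = false;  // webgl.disabled
  bool blacklisted = false;    // driver is on the blacklist
};

// Stand-in for WebGLContext::SetDimensions() before the fix: only the
// blacklist is checked before an offscreen GL context would be created.
bool SetDimensionsBeforeFix(const Prefs& prefs) {
  if (prefs.blacklisted) return false;
  return true;  // real code would go on to GLContextProvider::CreateOffscreen()
}

// Stand-in for GetContext(): the SafeToCreateCanvas3DContext() check
// honors webgl.disabled, but only for non-chrome callers.
bool GetContextBeforeFix(const Prefs& prefs, bool callerIsChrome) {
  if (!callerIsChrome && prefs.webglDisabled) return false;
  return SetDimensionsBeforeFix(prefs);
}

// Stand-in for the about:support probe, which skips GetContext() entirely.
bool AboutSupportProbeBeforeFix(const Prefs& prefs) {
  return SetDimensionsBeforeFix(prefs);
}
```

With webgl.disabled set, a web page is refused, but the about:support probe still goes on to create a GL context, which matches the "Created offscreen FBO" observation in comment 7.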
Comment 18 Benoit Jacob [:bjacob] (mostly away) 2011-05-30 09:40:10 PDT
Created attachment 536116
check if webgl is disabled

Here's a patch that should work (didn't try yet).

Also the reason why I failed to reproduce is probably that I did something wrong on my end (changed not only disabled but also force-enabled which I need to bypass the blacklist).
Comment 19 Benoit Jacob [:bjacob] (mostly away) 2011-05-30 12:14:57 PDT
(In reply to comment #18)
> Created attachment 536116
> check if webgl is disabled
> 
> Here's a patch that should work (didn't try yet).

I tried it, it works here.
Comment 20 Asa Dotzler [:asa] 2011-05-31 14:48:34 PDT
when you get reviews, please ask for approval on the patch for aurora for 6. thanks.
Comment 21 Karl Tomlinson (:karlt) 2011-05-31 20:44:01 PDT
Comment on attachment 536116
check if webgl is disabled

>+  // we need to check for webgl.disabled here, because otherwise it's only checked in GetContext() which
>+  // we're bypassing here. We can't call SafeToCreateCanvas3DContext here because it's whitelisting chrome altogether,
>+  // and here we're chrome and we don't want to be whitelisted.

This feels to me like giving different meanings to webgl.disabled in different places.

If it is not safe to use WebGL in about:support, then in which part of chrome would it be safe to use?

It is time to change the meaning of webgl.disabled, so that it disables webgl everywhere?
Or do we need separate prefs?  Perhaps a global disable-opengl pref?
Comment 22 Karl Tomlinson (:karlt) 2011-05-31 20:45:09 PDT
(In reply to comment #21)
> It is time to change the meaning of webgl.disabled,

Sorry, I meant "Is it time to ..."
Comment 23 Benoit Jacob [:bjacob] (mostly away) 2011-06-01 06:14:59 PDT
(In reply to comment #22)
> (In reply to comment #21)
> > It is time to change the meaning of webgl.disabled,
> 
> Sorry, I meant "Is it time to ..."

I think I agree with you. I don't see why code was written in such a way that webgl.disabled is not honored in chrome.
Comment 24 Tony Mechelynck [:tonymec] 2011-06-04 10:10:55 PDT
Mozilla/5.0 (X11; Linux x86_64; rv:7.0a1) Gecko/20110604 Firefox/7.0a1 SeaMonkey/2.2a1pre ID:20110604003003

Got the "about:support reload" variant of this crash for the second time (the first was comment #1): bp-b373949c-f5cc-4e84-a8cf-a05de2110604; on restart, all extensions were enabled, see bug 659772 comment #24

I'm still getting the other variant of this bug at every shutdown.
Comment 25 Tony Mechelynck [:tonymec] 2011-06-07 18:37:43 PDT
Mozilla/5.0 (X11; Linux x86_64; rv:7.0a1) Gecko/20110607 Firefox/7.0a1 SeaMonkey/2.4a1 ID:20110607003003

My latest closedown had no crash, but it was after very little activity: after a series of startup crashes, I invoked "seamonkey -P default -url about:addons", uninstalled one extension (QuickFolders 2.6), shut down (with crash), restarted the same, checked that the addon was gone, shut down (no crash), and restarted again (with my usual multitab homepage). If the next closedown doesn't crash I'll comment again.

(Don't rush and deduce that it was all the addon's fault: in the past I've had this crash even in Safe Mode.)
Comment 26 Benoit Jacob [:bjacob] (mostly away) 2011-06-07 18:41:29 PDT
Tony, we've not checked in any patch yet here, so don't expect anything to be fixed yet. I'll try to implement the idea in comment 21 ASAP.
Comment 27 Tony Mechelynck [:tonymec] 2011-06-08 06:46:54 PDT
(In reply to comment #26)
> Tony, we've not checked in any patch yet here, so don't expect anything to
> be fixed yet. I'll try to implement the idea in comment 21 ASAP.

No prob; just telling what I'm seeing, since AFAICT from Socorro reports most or all of the reports making this the #1 Linux topcrash are from me (and BTW, I don't crash for the pleasure of crashing ;-), it's just that as a nightly tester I close down SeaMonkey at least once a day, often more than that).
Comment 28 Benoit Jacob [:bjacob] (mostly away) 2011-06-08 06:58:37 PDT
Bug 645407 comment 17 says he got this crash also with r300g, so it's not specific to software Mesa.

Maybe we're really doing something wrong here. Will make you a tryserver build that prints more debugging info.
Comment 29 Benoit Jacob [:bjacob] (mostly away) 2011-06-08 07:47:46 PDT
Tryserver build: http://tbpl.mozilla.org/?tree=Try&rev=caf49abe7cdd

Once the builds are done, they will be available at:
https://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/bjacob@mozilla.com-caf49abe7cdd

Please run like this:

MOZ_X_SYNC=1 ./firefox -P -no-remote 2>&1 | tee logfile.txt

The MOZ_X_SYNC ensures that the X error occurs synchronously wrt debug output. The 2>&1 is because I print the debug info to stderr. Please then (compress and) attach logfile.txt.
Comment 30 Benoit Jacob [:bjacob] (mostly away) 2011-06-08 08:14:33 PDT
Created attachment 538022
kill SafeToCreateCanvas3DContex, check webgl.disabled in SetDimensions

This SafeToCreateCanvas3DContext method was apparently a remnant of the days when WebGL was called Canvas3D. This patch removes it; instead, the webgl.disabled pref is checked in SetDimensions, which is where the other prefs are checked and where we potentially create GL contexts. The prefs-checking code is moved up a bit so that it runs before we do anything about GL ContextFormats. Chrome canvases are no longer special-cased. The about:support code no longer does anything special: it just creates a WebGL canvas and checks whether that succeeds.
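The intent of this patch can be sketched with the same kind of hypothetical stand-ins (this is not the patch itself): every pref check lives in one place, SetDimensions(), and runs before any GL work, with no exemption for chrome callers. Letting webgl.disabled win over force-enabled is an assumption of this sketch; the thread only says that force-enabled is needed to bypass the blacklist.

```cpp
#include <cassert>

// Hypothetical stand-in for the relevant pref and blacklist state.
struct Prefs {
  bool webglDisabled = false;  // webgl.disabled
  bool forceEnabled = false;   // webgl.force-enabled (bypasses the blacklist)
  bool blacklisted = false;    // driver is on the blacklist
};

// Stand-in for SetDimensions() after the fix: all checks happen here,
// before any GL ContextFormat setup or context creation.
bool SetDimensionsAfterFix(const Prefs& prefs) {
  if (prefs.webglDisabled) return false;  // assumed to override force-enabled
  if (prefs.blacklisted && !prefs.forceEnabled) return false;
  // ContextFormat setup and GLContextProvider::CreateOffscreen() would follow.
  return true;
}

// Stand-in for GetContext(): chrome callers are no longer special-cased.
bool GetContextAfterFix(const Prefs& prefs, bool /*callerIsChrome*/) {
  return SetDimensionsAfterFix(prefs);
}

// about:support just creates a WebGL canvas (as chrome) and checks success.
bool AboutSupportProbeAfterFix(const Prefs& prefs) {
  return GetContextAfterFix(prefs, /*callerIsChrome=*/true);
}
```

Because content and chrome now share one gate, about:support can no longer create a GL context that a web page would have been refused.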
Comment 31 Benoit Jacob [:bjacob] (mostly away) 2011-06-08 10:38:57 PDT
The build from comment 29 is ready at:
https://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/bjacob@mozilla.com-caf49abe7cdd/
Comment 32 Karl Tomlinson (:karlt) 2011-06-08 15:37:03 PDT
(In reply to comment #28)
> Bug 645407 comment 17 says he got this crash also with r300g, so it's not
> specific to software Mesa.

Yes, but FYI that was also indirect, so it would be the same client-side code running.  (The device-dependent part would be on the server.)

> Maybe we're really doing something wrong here.

I wonder whether the context might need to be cleared (or something) when the drawable is destroyed.  Do we do that?
Comment 33 Karl Tomlinson (:karlt) 2011-06-08 15:42:50 PDT
Comment on attachment 538022
kill SafeToCreateCanvas3DContex, check webgl.disabled in SetDimensions

Empty patch :)
Comment 34 Tony Mechelynck [:tonymec] 2011-06-08 16:47:35 PDT
Created attachment 538148
log from "verbose" run of Nightly, see comment #29 and #34

(In reply to comment #29)
> Tryserver build: http://tbpl.mozilla.org/?tree=Try&rev=caf49abe7cdd
> 
> Once the builds are done, they will be available at:
> https://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/bjacob@mozilla.
> com-caf49abe7cdd
> 
> Please run like this:
> 
> MOZ_X_SYNC=1 ./firefox -P -no-remote 2>&1 | tee logfile.txt
> 
> The MOZ_X_SYNC ensures that the X error occurs synchronously wrt debug
> output. The 2>&1 is because I print the debug info to stderr. Please then
> (compress and) attach logfile.txt.

Is that instead of, or in addition to, the --sync command-line option to "make X calls synchronous"? I assumed "in addition to".

The crash report is at bp-733dd75f-a0cb-487e-b02f-8fc622110608

I usually experience this crash with SeaMonkey; this time I used your try-build of Firefox in an ad-hoc profile, viewed some pages including about:, about:support and about:addons, and got the expected closedown crash.
Comment 35 Benoit Jacob [:bjacob] (mostly away) 2011-06-08 19:12:58 PDT
Created attachment 538161
kill SafeToCreateCanvas3DContex, check webgl.disabled in SetDimensions

Sorry about the empty patch, here is the real patch.
Comment 36 Benoit Jacob [:bjacob] (mostly away) 2011-06-08 19:26:51 PDT
(In reply to comment #34)
> > The MOZ_X_SYNC ensures that the X error occurs synchronously wrt debug
> > output. The 2>&1 is because I print the debug info to stderr. Please then
> > (compress and) attach logfile.txt.
> 
> Is that instead of, or in addition to, the --sync command-line option to
> "make X calls synchronous"? I assumed "in addition to".

I didn't know about --sync. The description sounds exactly like what MOZ_X_SYNC does, but I don't know how/where it's implemented, so I don't even know if it works. MOZ_X_SYNC at least works for sure.

> 
> The crash report is at bp-733dd75f-a0cb-487e-b02f-8fc622110608
> 
> I usually experience this crash with SeaMonkey; this time I used your
> try-build of Firefox in an ad-hoc profile, viewed some pages including
> about:, about:support and about:addons, and got the expected closedown crash.

Thanks for the log. It tells us that a WebGL context was created but failed to initialize (WebGL was unavailable), then the WebGL context was successfully destroyed, but then on shutdown you got a crash in glXMakeCurrent() called from GLContext::MarkDestroyed() called from GLContextProviderGLX::Shutdown().

(In reply to comment #32)
> I wonder whether the context might need to be cleared (or something) when
> the drawable is destroyed.  Do we do that?

I don't think we do: I don't even know how we would find out when the drawable is destroyed. But since we are ultimately the ones destroying the drawable, this should be a matter of doing things in the right order.
Comment 37 Benoit Jacob [:bjacob] (mostly away) 2011-06-08 19:34:52 PDT
Notice how GLContextProviderGLX::Shutdown() is just doing:

  gGlobalContext = nsnull;

I.e. it destroys gGlobalContext. This happens during XPCOM shutdown, which seems to be too late: perhaps later than when the drawable is destroyed.

Whatever the benefits of having this "global context" are, I'm not sure they offset the cost...
Comment 38 Karl Tomlinson (:karlt) 2011-06-08 19:42:09 PDT
http://www.opengl.org/sdk/docs/man/xhtml/glXMakeCurrent.xml
"To release the current context without assigning a new one, call glXMakeCurrent with drawable set to None and ctx set to NULL."

If the current context depends on a drawable, then I expect this needs to be done before the drawable is destroyed.
Comment 39 Tony Mechelynck [:tonymec] 2011-06-08 20:40:47 PDT
(In reply to comment #36)
[...]
> I didn't know about --sync. The description sounds exactly like MOZ_X_SYNC
> does, but I don't know how/where it's implemented so I don't even know if it
> works. MOZ_X_SYNC at least works for sure.
[...]
It's listed under "X11 options" by seamonkey -h or firefox -h, and it is not specific to Mozilla applications: e.g. gvim has it too (at least when built with GTK2 GUI). I think it is a GTK setting (the Vim help says "Look in the GTK documentation for how they are used", talking of that option and several others). It is also mentioned by some X error messages, which tell you to set that option and then run the app under gdb with a breakpoint at (some particular symbol) if you want to get a stack trace or examine some variables at the moment the error is triggered.
-- Well, I used both.
Comment 40 Karl Tomlinson (:karlt) 2011-06-08 21:06:42 PDT
Comment on attachment 538161 [details] [diff] [review]
kill SafeToCreateCanvas3DContex, check webgl.disabled in SetDimensions

(In reply to comment #30)
> This patch removes it, instead the
> webgl.disabled pref is checked in SetDimensions which is where the other
> prefs are checked and where we potentially create GL contexts.

Looks good.

> The prefs-checking code is moved up a bit to occur before we do anything
> about GL ContextFormats.

If WebGLContext::SetDimensions() is going to fail, then wouldn't it make sense
to still DestroyResourcesAndContext() to clear old resources?
i.e. shouldn't the check be after DestroyResourcesAndContext()?

> Chrome canvases are no longer special-cases.

Makes sense to me.

> The about:support code no longer does anything special, it just creates a
> WebGL canvas and check if that succeeds.

I didn't see this change.  Was that meant to be included in this patch, or is
this a statement about what has already happened?
Or was the "special" thing about about:support that you refer to, the fact
that a WebGL context was created even when WebGL is disabled?
Comment 41 Karl Tomlinson (:karlt) 2011-06-08 21:15:00 PDT
(In reply to comment #39)
> -- Well, I used both.

Yes, both is fine.  --sync is implemented in GDK; MOZ_X_SYNC is implemented in Gecko.  An advantage of MOZ_X_SYNC is that it works in child (plugin and, in the future, content) processes, though that is not relevant here.
Comment 42 Benoit Jacob [:bjacob] (mostly away) 2011-06-09 19:30:40 PDT
Created attachment 538421 [details] [diff] [review]
kill SafeToCreateCanvas3DContex, check webgl.disabled in SetDimensions (UPDATED)

(In reply to comment #40)
> Comment on attachment 538161 [details] [diff] [review]
> > The prefs-checking code is moved up a bit to occur before we do anything
> > about GL ContextFormats.
> 
> If WebGLContext::SetDimensions() is going to fail, then wouldn't it make
> sense to still DestroyResourcesAndContext() to clear old resources?
> i.e. shouldn't the check be after DestroyResourcesAndContext()?

You're entirely right, thanks for spotting this! New patch calls DestroyResourcesAndContext() as soon as we're done with the early success cases.


> > The about:support code no longer does anything special, it just creates a
> > WebGL canvas and check if that succeeds.
> 
> I didn't see this change.  Was that meant to be included in this patch, or is
> this a statement about what has already happened?
> Or was the "special" thing about about:support that you refer to, the fact
> that a WebGL context was created even when WebGL is disabled?

Sorry, what I wrote here was totally confused. Just ignore it :-)
Comment 43 Benoit Jacob [:bjacob] (mostly away) 2011-06-09 19:50:52 PDT
Created attachment 538423 [details] [diff] [review]
sweep X error under the carpet

This 1-line patch should silence the X error. The question is, do we want to do that?

My understanding is that we're calling GLContext::MarkDestroyed() during XPCOM shutdown, which is a very late time to do such a thing: by the time we reach this point the GL context has gone bad, probably because the underlying surface has already been destroyed.

So if we don't do this, then what are our alternatives?
 - probably the cleanest fix would be to find where exactly we're destroying our X resources and to destroy this global GL context right before that.
 - alternatively we could decide that the idea of a "global GL context" was a bad idea?
Comment 44 Benoit Jacob [:bjacob] (mostly away) 2011-06-09 20:03:49 PDT
(In reply to comment #43)
> So if we don't do this, then what are our alternatives?
>  - probably the cleanest fix would be to find where exactly we're destroying

Perhaps a real fix on the horizon:

* X display closedown is done by MOZ_gdk_display_close, called at 2 places in nsAppRunner.cpp:
  http://mxr.mozilla.org/mozilla-central/ident?i=MOZ_gdk_display_close

* XPCOM shutdown is done by NS_ShutdownXPCOM, also called in nsAppRunner.cpp:
  http://mxr.mozilla.org/mozilla-central/ident?i=NS_ShutdownXPCOM&tree=mozilla-central&filter=

Patch coming...
Comment 45 Benoit Jacob [:bjacob] (mostly away) 2011-06-09 20:04:49 PDT
...though one could also decide that it's not worth complicating nsAppRunner.cpp just to avoid a stupid X error! So please consider the 1-line patch.
Comment 46 Karl Tomlinson (:karlt) 2011-06-10 01:50:56 PDT
The X display is closed after the ScopedXPCOMStartup is destroyed in XRE_main, so that is not causing the drawable to get destroyed.  Even if it were, the lack of the display would cause different problems; the GLXBadContextTag error would not even be received.  Also remember this happens on reload of about:support (before shutdown).
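A toy model of the ordering Karl describes may help: if the XPCOM scope object is destroyed at the end of an inner block and the display is closed only afterwards, then anything torn down during XPCOM shutdown (such as the global GL context) is gone before the display close. The names below (`ScopedXPCOMStartupModel`, `XREMainModel`, `CloseDisplayModel`) are hypothetical stand-ins, not Gecko code; only the RAII scoping pattern is meant to mirror XRE_main.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Records the order in which shutdown steps run (toy model, not Gecko code).
std::vector<std::string> gShutdownLog;

// Hypothetical stand-in for ScopedXPCOMStartup: its destructor is where XPCOM
// shutdown (and thus the release of the global GL context) would happen.
struct ScopedXPCOMStartupModel {
  ~ScopedXPCOMStartupModel() { gShutdownLog.push_back("xpcom-shutdown"); }
};

// Hypothetical stand-in for closing the X display.
void CloseDisplayModel() { gShutdownLog.push_back("display-close"); }

void XREMainModel() {
  {
    ScopedXPCOMStartupModel xpcom;  // lives until the end of this inner scope
    gShutdownLog.push_back("run-app");
  }                     // ~ScopedXPCOMStartupModel runs here
  CloseDisplayModel();  // runs only after XPCOM has shut down
}
```

Under this ordering, a still-open display at XPCOM shutdown is consistent with Karl's point that display closedown cannot be what destroys the drawable.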
Comment 47 Karl Tomlinson (:karlt) 2011-06-10 01:54:09 PDT
Most crashes in Mesa's glXMakeCurrent seem to involve unbinding the old current context rather than making new context current.

I'm suspicious that the drawable of the old current context may have been destroyed.

What ensures that a context is not current when its drawable gets destroyed?
Comment 48 Vladimir Vukicevic [:vlad] [:vladv] 2011-06-10 07:53:32 PDT
(In reply to comment #47)
> What ensures that a context is not current when its drawable gets destroyed?

The GLX docs state:

If /draw/ is destroyed after glXMakeContextCurrent is called, then subsequent rendering commands will be processed and the context state will be updated, but the frame buffer state becomes undefined. If /read/ is destroyed after glXMakeContextCurrent then pixel values read from the framebuffer (e.g., as result of calling glReadPixels, glCopyPixels or glCopyColorTable) are undefined. If the X Window underlying the GLXWindow /draw/ or /read/ drawable is destroyed, rendering and readback are handled as above.

So it should be valid to have a current GL context whose GLX or X drawables have been destroyed.  This could well be a mesa bug.
Comment 49 Tony Mechelynck [:tonymec] 2011-06-10 09:56:47 PDT
Mozilla/5.0 (X11; Linux x86_64; rv:7.0a1) Gecko/20110610 Firefox/7.0a1 SeaMonkey/2.4a1 ID:20110610003052

The latest nightly where I crashed at every closedown @ libc-2.11.3.so@0x32ab5 was dated 2011-06-08; but see bp-4158b2a9-c75e-40c5-8ec1-eab7f2110610, which also happened at closedown (it might be unrelated though). I had a few non-crashing closedowns with yesterday's nightly (June 9). I don't remember any specific X package updates, but I wouldn't swear that there haven't been any: I apply all patches from the openSUSE 11.4 "Update-Test" repository as soon as they are published.
Comment 50 Benoit Jacob [:bjacob] (mostly away) 2011-06-10 10:11:35 PDT
New try build with a LOT more debugging output based on above discussion:
http://tbpl.mozilla.org/?tree=Try&rev=8b7d6ea3affb

Once the builds are done, they will be available at:
https://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/bjacob@mozilla.com-8b7d6ea3affb

Run with:

MOZ_X_SYNC=1 ./firefox -P -no-remote 2>&1 | tee logfile.txt

I'm interested in both a shutdown crash, and a non-shutdown crash.

Note that this time logfile.txt might get really big, so perhaps compress it before attaching (or just pipe it through xz on the command line).
Comment 51 Benoit Jacob [:bjacob] (mostly away) 2011-06-10 10:52:28 PDT
Sorry, previous build failed, this one should work:
http://tbpl.mozilla.org/?tree=Try&rev=bbb566273556

Once the builds are done, they will be available at:
https://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/bjacob@mozilla.com-bbb566273556
Comment 52 Benoit Jacob [:bjacob] (mostly away) 2011-06-10 13:47:19 PDT
Landed the patch making about:support honor webgl.disabled:
http://hg.mozilla.org/mozilla-central/rev/cacfd85ffb49
Comment 53 Tony Mechelynck [:tonymec] 2011-06-11 06:28:30 PDT
Created attachment 538690 [details]
bzipped log from try-build run, see comment #51

This file (minefield.log.bz2) is a compressed log from a short run of the try-build from comment #51 in an ad-hoc "almost fresh" profile used only for tests of new Firefox builds (and not very often at that).

The closedown crash dump is available at bp-16a69069-72ca-493a-90d9-72b342110611
Comment 54 Benoit Jacob [:bjacob] (mostly away) 2011-06-11 08:37:00 PDT
Thanks a lot. Still processing the log; note that I made a mistake here,

gfxXlibSurface::~gfxXlibSurface()
{
    FUNCTION_DEBUG_HELPER
    printf_stderr("XXXXX this = %p, drawable = %d\n", this, mDrawable);
    if (mPixmapTaken) {
        XFreePixmap (mDisplay, mDrawable);
        printf_stderr("XXXXX freed drawable = %d\n", this, mDrawable);
    }
}

So the "freed drawable" lines contain wrong information, but it doesn't matter, as the "XXXXX this = %p, drawable = %d\n" line above gives the same information anyway.
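For reference, a minimal sketch of what the corrected debug line would look like: exactly one argument per conversion specifier. It assumes the drawable is an Xlib `Drawable`, which is an XID (an `unsigned long`), so `%lu` is the matching specifier; the helper `FormatFreedDrawable` and the alias `DrawableId` are hypothetical. In the buggy line, `%d` consumed `this` (a pointer), which is undefined behavior and explains the wrong values.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Stand-in for Xlib's Drawable (an XID, i.e. an unsigned long), so the sketch
// compiles without X headers.
typedef unsigned long DrawableId;

// Hypothetical helper: formats the "freed drawable" line with one argument for
// its single conversion specifier. The buggy version also passed `this`.
std::string FormatFreedDrawable(DrawableId drawable) {
  char buf[64];
  std::snprintf(buf, sizeof buf, "XXXXX freed drawable = %lu\n", drawable);
  return buf;
}
```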
Comment 55 Benoit Jacob [:bjacob] (mostly away) 2011-06-11 10:13:24 PDT
The place where it starts going wrong is "Error resizing offscreen framebuffer -- framebuffer not complete". Still working on the log...
Comment 56 Karl Tomlinson (:karlt) 2011-06-12 17:55:40 PDT
Created attachment 538794 [details] [diff] [review]
check that MakeCurrent succeeds after destroying the drawable underlying the previous context

(In reply to comment #48)
> If the X Window underlying the GLXWindow /draw/ or /read/
> drawable is destroyed, rendering and readback are handled as above.
> 
> So it should be valid to have a current GL context whose GLX or X drawables
> have been destroyed.  This could well be a mesa bug.

Thanks, Vlad.  I have a little trouble reconciling that with "If the previous
context of the calling thread has unflushed commands, and the previous
drawable is no longer valid, GLXBadCurrentDrawable is generated" but at least
BadContextTag is not appropriate for such a situation.

I tried modifying glxtest (as attached) to check that the GLX implementation
handled this situation, but it did not detect any problem. 

Here I get a SIGSEGV, which is a bit different to the error reported in this
bug, but happens in the same situations:

OpenGL vendor string: Advanced Micro Devices, Inc.
OpenGL renderer string: Mesa DRI R600 (JUNIPER 68A0) 20090101  TCL DRI2
OpenGL version string: 1.4 (2.1 Mesa 7.10.2)

#4  <signal handler called>
#5  0x00007f428dd54ed0 in xcb_glx_get_string_string_length () from /usr/lib64/libxcb-glx.so.0
#6  0x00007f429003b987 in __glXGetString (dpy=<value optimized out>, opcode=<value optimized out>, contextTag=9961477, name=7939) at glx_query.c:82
#7  0x00007f4290038436 in __indirect_glGetString (name=7939) at single2.c:685
#8  0x00007f429001cb0a in indirect_bind_context (gc=0x7f422ee6eda0, old=0x7f422ee6d180, draw=62915964, read=62915964) at indirect_glx.c:156
#9  0x00007f4290019f01 in MakeContextCurrent (dpy=0x7f428da40000, draw=62915964, read=62915964, gc_user=<value optimized out>) at glxcurrent.c:263
#10 0x00007f4299c85750 in mozilla::gl::GLContextGLX::MakeCurrentImpl (this=0x7f426df53800, aForce=0) at /home/karl/moz/dev/gfx/thebes/GLContextProviderGLX.cpp:439
Comment 57 Benoit Jacob [:bjacob] (mostly away) 2011-06-13 09:48:49 PDT
I think we have two separate problems here:
 1. That we get these incomplete-framebuffer errors
 2. That we crash after this error

I made a new tryserver build to understand these problems.
http://tbpl.mozilla.org/?tree=Try&rev=260494d04975

When the builds are ready they will be at
https://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/bjacob@mozilla.com-260494d04975

This time, please run with this additional environment variable:
MOZ_GL_DEBUG_VERBOSE=1 MOZ_X_SYNC=1 ./firefox -P -no-remote 2>&1 | tee logfile.txt

This will record all GL calls and GL errors, which is likely to help understand the incomplete framebuffer errors. Also this new build has the patch from bug 654424 applied, which gives a lot more information about that.
Comment 58 Benoit Jacob [:bjacob] (mostly away) 2011-06-13 10:05:00 PDT
Sorry, previous build was bad, use this one:
http://tbpl.mozilla.org/?tree=Try&rev=12883056ea23

When the builds are ready they will be at
https://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/bjacob@mozilla.com-12883056ea23
Comment 59 Benoit Jacob [:bjacob] (mostly away) 2011-06-13 10:08:53 PDT
Also, notice that about:support should now honor webgl.disabled. If you want to reproduce the crash, either go to some other WebGL page, or don't disable WebGL.
Comment 60 Tony Mechelynck [:tonymec] 2011-06-13 12:29:38 PDT
Created attachment 538977 [details]
bzipped log, see comment #58 to #60

(In reply to comment #59)
> Also, notice that about:support should now honor webgl.disabled. if you want
> to reproduce the crash, either go to some other webgl page, or don't disable.

Mozilla/5.0 (X11; Linux x86_64; rv:7.0a1) Gecko/20110613 Firefox/7.0a1

This is a test: I'm running this build of Nightly (big N), which is not a nightly (small n) but a try-build, in a test profile which differs very little from the default; in particular, webgl.disabled is false. (I see a pref named webgl.verbose which is also false by default, if you want it true next time, tell me).

Crash dump bp-31e79929-8c55-45dc-a987-e0bd72110613
Comment 61 Benoit Jacob [:bjacob] (mostly away) 2011-06-13 12:41:30 PDT
This unfortunately doesn't have MOZ_GL_DEBUG_VERBOSE=1, see comment 58, can you please retry with it? It's really useful here.

No need for webgl.verbose, that is only to help JS developers fix their code.
Comment 62 Tony Mechelynck [:tonymec] 2011-06-13 13:17:25 PDT
Created attachment 538986 [details]
bzipped log see comment #61 and #62

(In reply to comment #61)
> This unfortunately doesn't have MOZ_GL_DEBUG_VERBOSE=1, see comment 58, can
> you please retry with it? It's really useful here.

Oops sorry; here it is.
Comment 63 Tony Mechelynck [:tonymec] 2011-06-13 13:19:30 PDT
P.S. bp-d853c549-de05-4a73-80aa-593412110613
Comment 64 Benoit Jacob [:bjacob] (mostly away) 2011-06-13 14:31:34 PDT
Thanks a lot. This part of the log already shows a clear Mesa bug here:

[gl:0x2a41c00] > void mozilla::gl::GLContext::fFramebufferRenderbuffer(GLenum, GLenum, GLenum, GLuint)
parameters:
  attachmentPoint = 0x8d00
  renderbuffer = 16
[gl:0x2a41c00] < void mozilla::gl::GLContext::fFramebufferRenderbuffer(GLenum, GLenum, GLenum, GLuint) [0x0000]
[gl:0x2a41c00] > GLenum mozilla::gl::GLContext::fCheckFramebufferStatus(GLenum)
[gl:0x2a41c00] < GLenum mozilla::gl::GLContext::fCheckFramebufferStatus(GLenum) [0x0000]
framebuffer info:
  default framebuffer. No FBO is currently bound.

Here, FramebufferRenderbuffer succeeds (the [0x0000] means no GL error), which implies that an FBO is currently bound (and indeed, just above in the log, we called BindFramebuffer)... but actually no FBO is bound (the code behind that message calls GetIntegerv(FRAMEBUFFER_BINDING) and gets the value 0, meaning no FBO).

--> for sure we have to blacklist the (current versions of the) swrast driver.

That leaves open 2 questions:
 1) didn't we have similar bugs on other drivers too? Karl?
 2) we should recover more gracefully from that: the X errors we're currently getting suggest that perhaps we don't clean up properly after that error.
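The contradiction in the log can be phrased as a small sanity check. This is a sketch assuming a conforming driver raises `GL_INVALID_OPERATION` from `glFramebufferRenderbuffer` when framebuffer 0 is bound (as the framebuffer-object spec requires); the helper name is hypothetical, and the GL types and constants are local stand-ins (with the same values as the GL headers) so it compiles without GL.

```cpp
#include <cassert>

// Local stand-ins for GL types and enum values (same values as in gl.h).
typedef unsigned int GLenum;
typedef int GLint;
const GLenum kGL_NO_ERROR = 0;
const GLenum kGL_INVALID_OPERATION = 0x0502;

// Hypothetical diagnostic: attaching a renderbuffer while the default
// framebuffer (name 0) is bound must fail, so "no error" together with a
// queried FRAMEBUFFER_BINDING of 0 points at a driver bug.
bool FramebufferStateIsInconsistent(GLenum attachError, GLint queriedBinding) {
  return attachError == kGL_NO_ERROR && queriedBinding == 0;
}
```

The log above corresponds to `FramebufferStateIsInconsistent(kGL_NO_ERROR, 0)`, which this check flags.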
Comment 65 Tony Mechelynck [:tonymec] 2011-06-13 17:18:58 PDT
(In reply to comment #64)
> Thank a lot. This part of the log already shows a clear Mesa bug here:
[...]

If you want to report the bug upstream (e.g. at bugzilla.novell.com), I have the following installed:

OS: openSUSE Linux 11.4 (version: Final, architecture: x86_64)

Software packages (among others, of course):
xorg-x11-driver-video 7.6-53.58.1 ("intel" driver in service)
Mesa 7.10.2-7.3.1
DirectFB-Mesa 1.4.5-14.2

Hardware devices (among others, of course):
Motherboard: Intel/Fujitsu Scenic W620 (handling display, network, PCI)
Framebuffer Device: Intel(r)915G/915GV/910GL Graphics Controller
Comment 66 Karl Tomlinson (:karlt) 2011-06-13 20:32:41 PDT
I expect there may well be similar bugs with all drivers and indirect mesa because the client-side code is the same.  But (at least part of) the server-side code is different, so I'm not sure.

Bug 664066 is making it hard for me to test the Try builds, but I seem to have GL_FRAMEBUFFER_COMPLETE after CheckFramebufferStatus.

Knowing how to recover is tricky if the library doesn't follow defined behaviour, so I'm not sure we should try too hard.

However, I did see X_GLXMakeCurrent: GLXBadCurrentWindow suggesting that we didn't flush the previous context before its underlying drawable was destroyed.

I can try Mesa 7.10.3 with r600 when bug 664066 is fixed.
Comment 67 Benoit Jacob [:bjacob] (mostly away) 2011-06-14 12:10:21 PDT
Created attachment 539276 [details] [diff] [review]
block swrast
Comment 68 Benoit Jacob [:bjacob] (mostly away) 2011-06-14 12:11:37 PDT
Note, I've been told by a developer that swrast is deprecated anyway and the new supported Mesa software renderers are llvmpipe and softpipe. They will be unblocked as soon as we unblock Gallium.
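A sketch of the kind of renderer-string check such a block could use. It assumes swrast identifies itself through `GL_RENDERER` as "Software Rasterizer"; `IsRendererBlocked` is a hypothetical name, not the function in the attached patch, and it deliberately covers only swrast, leaving llvmpipe and softpipe to the separate Gallium decision.

```cpp
#include <cassert>
#include <string>

// Hypothetical blocklist check (not the attached patch). Assumption: Mesa's
// legacy swrast driver reports "Software Rasterizer" as its GL_RENDERER string.
bool IsRendererBlocked(const std::string& renderer) {
  return renderer.find("Software Rasterizer") != std::string::npos;
}
```

A substring match rather than equality keeps the check working if a driver decorates the renderer string with extra details.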
Comment 69 Benoit Jacob [:bjacob] (mostly away) 2011-06-14 12:34:46 PDT
Filed Mesa bug: https://bugs.freedesktop.org/show_bug.cgi?id=38312
Comment 70 Benoit Jacob [:bjacob] (mostly away) 2011-06-14 12:39:49 PDT
I edited the code to simulate that bug here, in the hope of reproducing the X error, but I didn't manage to reproduce it or any crash (NVIDIA driver here).
Comment 71 Benoit Jacob [:bjacob] (mostly away) 2011-06-14 13:26:52 PDT
A Mesa developer has replied,
https://bugs.freedesktop.org/show_bug.cgi?id=38312#c3

and asked for more information, which can be obtained by running this build
https://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/bjacob@mozilla.com-49c288664dcd

with
MOZ_GL_DEBUG_VERBOSE=1 MOZ_X_SYNC=1 ./firefox -P -no-remote 2>&1 | tee logfile.txt
Comment 72 Tony Mechelynck [:tonymec] 2011-06-14 17:48:43 PDT
(In reply to comment #71)
> A Mesa developer has replied,
> https://bugs.freedesktop.org/show_bug.cgi?id=38312#c3

Good, I've CCed myself to that bug.

> 
> and requires more information that can be obtained by running this build
> https://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/bjacob@mozilla.com-49c288664dcd
> 
> with
> MOZ_GL_DEBUG_VERBOSE=1 MOZ_X_SYNC=1 ./firefox -P -no-remote 2>&1 | tee logfile.txt

You'll have it tomorrow (in 12 hours or so), I have to sleep and I have an appointment with the doctor in the morning.
Comment 73 Karl Tomlinson (:karlt) 2011-06-14 23:02:06 PDT
Comment on attachment 539276 [details] [diff] [review]
block swrast

It's hard to tell whether this issue is specific to the Software Rasterizer when indirect or whether it is a bug with the Software Rasterizer in general because we don't even get this far when direct.  But that means there's no harm in blocking it in general.
Comment 74 Tony Mechelynck [:tonymec] 2011-06-15 05:02:29 PDT
Created attachment 539478 [details]
bzipped log, see comment #71 sqq.

Here it is, the minefield.log.bz2 from
  (download the linux64 try build linked in comment #71)
  rm -Rvf firefox
  tar -jxvf firefox-7.0a1.en-US.linux-x86_64.tar.bz2
  MOZ_GL_DEBUG_VERBOSE=1 MOZ_X_SYNC=1 ./firefox/firefox --sync -P virgin -no-remote 2>&1|tee minefield.log
  bzip2 -kvf minefield.log

The corresponding closedown crash dump is bp-1b0d2c69-bb93-4f9c-a6f2-9d7992110615
Comment 75 Benoit Jacob [:bjacob] (mostly away) 2011-06-15 05:38:24 PDT
Thanks. That shows that GetIntegerv does not generate an error, and explicitly sets the result to 0 (does not just leave it uninitialized). Replying to the Mesa bug.
Comment 76 Benoit Jacob [:bjacob] (mostly away) 2011-06-15 09:09:06 PDT
Created attachment 539555 [details] [diff] [review]
release GL context before destroying it

Jose's comments 21 and 24 on https://bugs.freedesktop.org/show_bug.cgi?id=38312 suggest that if we don't release the GL context before destroying it, then its destruction is postponed to a later point in time. But we destroy its drawable immediately afterwards, and indeed we must do that, otherwise we'd potentially be leaking drawables. So, in order to have well-defined behavior, this patch releases the GL context and checks with an NS_ABORT_IF_FALSE that this indeed happened (see the glXMakeCurrent man page). This hopefully fixes the X errors we're getting here.
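The teardown order described above (release the context, verify that it is no longer current, destroy it, and only then destroy the drawable) can be sketched against a fake GLX layer that just records calls. `FakeGLX` and its methods are hypothetical stand-ins, and no real GLX or Xlib calls are made; `MakeCurrent(0)` plays the role of `glXMakeCurrent(dpy, None, NULL)` and the `assert` plays the role of `NS_ABORT_IF_FALSE`.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical fake GLX/Xlib layer: it only tracks the current context and
// records the order of calls. No real GLX or Xlib functions are invoked.
struct FakeGLX {
  int current = 0;                 // 0 stands for "no current context"
  std::vector<std::string> calls;  // order of operations, for inspection

  void MakeCurrent(int ctx) {
    current = ctx;
    calls.push_back("MakeCurrent(" + std::to_string(ctx) + ")");
  }
  int GetCurrentContext() const { return current; }
  void DestroyContext(int ctx) {
    calls.push_back("DestroyContext(" + std::to_string(ctx) + ")");
  }
  void DestroyDrawable(int drawable) {
    calls.push_back("DestroyDrawable(" + std::to_string(drawable) + ")");
  }
};

// Release the context, check that it really is no longer current, destroy it,
// and only then destroy the drawable it was rendering to.
void DestroyContextAndDrawable(FakeGLX& glx, int ctx, int drawable) {
  glx.MakeCurrent(0);  // plays the role of glXMakeCurrent(dpy, None, NULL)
  assert(glx.GetCurrentContext() == 0 && "GL context must be released first");
  glx.DestroyContext(ctx);        // not current, so GLX would not defer this
  glx.DestroyDrawable(drawable);  // the context is gone before its drawable
}
```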
Comment 77 Benoit Jacob [:bjacob] (mostly away) 2011-06-15 14:17:55 PDT
https://bugs.freedesktop.org/show_bug.cgi?id=38312#c27 confirms that the 'release GL context' patch fixes the crash. Thanks Tony!
Comment 78 Karl Tomlinson (:karlt) 2011-06-15 18:37:45 PDT
Comment on attachment 539555 [details] [diff] [review]
release GL context before destroying it

I don't understand well enough to know whether this is resolving a bug on our side or working around a bug in Mesa, but if it fixes the problem, great.

I notice that MarkDestroyed already makes the context current (to free resources, I assume), so there is no reason to skip this if another context were current.
Comment 79 Benoit Jacob [:bjacob] (mostly away) 2011-06-15 18:47:41 PDT
(In reply to comment #78)
> Comment on attachment 539555 [details] [diff] [review]
> release GL context before destroying it
> 
> I don't understand well enough to know whether this is resolving a bug on
> our side or working around a bug in Mesa, but if it fixes the problem, great.

Bug on our side, we were relying on undefined behavior, were lucky with the NVIDIA driver and sometimes unlucky with Mesa. See comment 76. See the glXDestroyContext man page, it says:

  If GLX rendering context ctx is not current to any thread,
  glXDestroyContext destroys it immediately. Otherwise, ctx is destroyed
  when it becomes not current to any thread. In either case, the resource ID
  referenced by ctx is freed immediately.

In other words, if we want glXDestroyContext to have the well-defined behavior of destroying the context before future X commands take effect, we must first release the GL context before calling it. We were failing to do that, but we were destroying the drawable immediately after that call, and as a result, the context was outliving its underlying drawable.
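To make the quoted rule concrete, here is a toy single-threaded model of it (not Mesa or GLX code): destroying a context that is still current only marks it, and the real destruction happens when it stops being current. It models only the deferral; the rule that the resource ID is freed immediately is not represented. `ContextTable` is a hypothetical name.

```cpp
#include <cassert>
#include <set>

// Toy model of the glXDestroyContext deferral rule (single thread only).
struct ContextTable {
  std::set<int> alive;           // contexts whose resources still exist
  std::set<int> pendingDestroy;  // destroyed while current, not yet freed
  int current = 0;               // 0 means no context is current

  void Create(int ctx) { alive.insert(ctx); }

  void Destroy(int ctx) {
    if (current == ctx) {
      pendingDestroy.insert(ctx);  // deferred until it stops being current
    } else {
      alive.erase(ctx);            // not current: destroyed immediately
    }
  }

  void MakeCurrent(int ctx) {
    int old = current;
    current = ctx;
    // If the previous context was waiting to be destroyed, do it now.
    if (old != 0 && old != ctx && pendingDestroy.erase(old) > 0) {
      alive.erase(old);
    }
  }
};
```

In the "destroy while current" case the context survives the Destroy call, which is exactly the window in which its drawable may already have been freed.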

> I notice that MarkDestroyed already makes the context current (to free
> resources i assume), so there is no reason to skip this if another context
> were current.

But precisely, we want the GL context to NOT be current by the time glXDestroyContext is called on it.
Comment 80 Benoit Jacob [:bjacob] (mostly away) 2011-06-15 18:51:07 PDT
Comment on attachment 539555 [details] [diff] [review]
release GL context before destroying it

Please approve this bug for Firefox 6. While we've had this crash since Firefox 4 (see original report) it's getting a lot worse in Firefox 6 as many Mesa setups got whitelisted. It's risk-free and only 3 lines.
Comment 81 Tony Mechelynck [:tonymec] 2011-06-16 09:00:40 PDT
(In reply to comment #77)
> https://bugs.freedesktop.org/show_bug.cgi?id=38312#c27 confirms that the
> 'release GL context' patch fixes the crash. Thanks Tony!

I just reported a crash I saw, and then ran builds of which you sent me the links. Nothing hard in that. You were the one who tracked down the error, discussed it with the Mesa people, wrote a patch, and (IIUC) are now pushing at the wheel to get the fix into both trunk and aurora repositories. In a few days (hopefully), the bug will be FIXED and then I'll be entitled to say: Merci Benoît!
Comment 82 Johnny Stenback (:jst, jst@mozilla.com) 2011-06-16 14:51:19 PDT
Comment on attachment 539555 [details] [diff] [review]
release GL context before destroying it

Clearing approval request since this doesn't seem to be landed on mozilla-central yet. Once it's landed, and has baked for a bit, please re-request approval.
Comment 83 Benoit Jacob [:bjacob] (mostly away) 2011-06-17 09:06:55 PDT
Landed:
http://hg.mozilla.org/mozilla-central/rev/60182c83c925

Tony, your work capturing logs and running apitrace was crucial, so really thanks.
Comment 84 :Ms2ger 2011-06-17 10:05:52 PDT
Please fix the build warnings you caused in opt builds.
Comment 85 Benoit Jacob [:bjacob] (mostly away) 2011-06-17 10:52:55 PDT
What warnings?
There are warnings in webgl code,
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1308326872.1308330305.12966.gz&fulltext=1
but I don't see new ones.
Comment 86 Karl Tomlinson (:karlt) 2011-06-17 16:15:23 PDT
>-  glXDestroyContext(dpy, context);

>+  glXMakeCurrent(dpy, None, NULL); // must release the GL context before destroying it

I think you still want to destroy the context in glxtest.

(In reply to comment #84)
> Please fix the build warnings you caused in opt builds.

Ms2ger: a more specific comment would be more helpful.

I suspect this is "success" set but not used.
It could be silenced by putting its declaration in "ifdef DEBUG".
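A minimal sketch of that suggestion, using the standard `NDEBUG`/`assert` pair as a portable stand-in for Gecko's `DEBUG`/`NS_ABORT_IF_FALSE` (an assumption, not the landed code): the variable is declared only when assertions are compiled in, while the call itself runs in every build. `ReleaseCurrentContext` and `gReleaseCalls` are hypothetical.

```cpp
#include <cassert>

int gReleaseCalls = 0;  // counts calls so the side effect can be checked

// Hypothetical stand-in for a call such as glXMakeCurrent that reports success.
bool ReleaseCurrentContext() {
  ++gReleaseCalls;
  return true;
}

void Teardown() {
#ifndef NDEBUG
  // Declared only when assertions are enabled, so release builds have no
  // "set but not used" variable.
  bool success =
#endif
      ReleaseCurrentContext();  // executed in both debug and release builds
  assert(success && "failed to release the GL context");
}
```

The important property is that the call is outside the conditional: only the variable that captures its result disappears in opt builds.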
Comment 87 Benoit Jacob [:bjacob] (mostly away) 2011-06-17 18:45:01 PDT
Thanks!
http://hg.mozilla.org/mozilla-central/rev/5b56da7babb9
Comment 88 Tony Mechelynck [:tonymec] 2011-06-17 20:04:40 PDT
I stopped having this crash in SeaMonkey some time ago, see comment #49.

The current Firefox nightly, built before the fix was landed, still has the bug:
Mozilla/5.0 (X11; Linux x86_64; rv:7.0a1) Gecko/20110617 Firefox/7.0a1 ID:20110617030741 crash: bp-a66426e7-b788-49b8-ab78-d8a252110617 and bp-5ad044e9-d0e8-4a20-b315-26f022110617

Now let's find a Firefox "Nightly" tinderbox-build whose source was pulled later than comment #87... there is none yet... wait until there is one available...
Mozilla/5.0 (X11; Linux x86_64; rv:7.0a1) Gecko/20110617 Firefox/7.0a1 ID:20110617184644 (Built from http://hg.mozilla.org/mozilla-central/rev/5b56da7babb9)... no crash.

I'm setting this bug VERIFIED on the assumption that the fix applies across all applications and platforms. People, if you want to REOPEN, first make sure that you observe the crash in a build whose 14-digit timestamp ("Build ID" as shown in the crash report) is later than comment #87, and then paste your bp-something crash ID in your reopening comment.
Comment 89 Tony Mechelynck [:tonymec] 2011-06-17 20:20:18 PDT
Thanks Benoît! But your job isn't finished yet. I'm setting Fx6-affected on the basis of comment #80; once the fix has baked a little on trunk, the objection in comment #82 won't apply anymore.
Comment 90 Benoit Jacob [:bjacob] (mostly away) 2011-06-20 15:07:13 PDT
Comment on attachment 539555 [details] [diff] [review]
release GL context before destroying it

Requesting aurora approval for this patch + the followup fix http://hg.mozilla.org/mozilla-central/rev/5b56da7babb9
Comment 91 Karl Tomlinson (:karlt) 2011-06-21 17:40:33 PDT
*** Bug 616416 has been marked as a duplicate of this bug. ***
Comment 92 Benoit Jacob [:bjacob] (mostly away) 2011-06-27 10:57:25 PDT
Landed on Aurora:
http://hg.mozilla.org/releases/mozilla-aurora/rev/e89f192dbd8c
Comment 93 Tony Mechelynck [:tonymec] 2011-06-28 10:16:24 PDT
Fx6-fixed according to comment #92

According to Socorro this signature is seen:
- on Fx 6.0a2 dated 2011-06-23 to 2011-06-25 with comment "html5test"
- on Fx 5.0 dated 2011-06-15 (the latest beta?)
- on Fx4 before that.

Only on Linux64 but I suppose on i686 the offset in libxul would be different.

I suppose Fx4 is at EOL. Is this crash fix worth porting to Fx5, or will the release-branch fix have to wait for Fx6 release in 6 weeks or so?
Comment 94 Karl Tomlinson (:karlt) 2011-06-28 14:21:13 PDT
On Firefox 5, this should only be affecting users with indirect Mesa libGL and either connecting to a display with NVIDIA GLX or forcing webgl on for their OpenGL.  That's probably a small enough proportion of situations that it is not worth the risk of making a change to Firefox 5.
Comment 95 Tony Mechelynck [:tonymec] 2011-06-29 13:29:35 PDT
(In reply to comment #94)
> On Firefox 5, this should only be affecting users with indirect Mesa libGL
> and either connecting to a display with NVIDIA GLX or forcing webgl on for
> their OpenGL.  That's probably a small enough proportion of situations that
> it is not worth the risk of making a change to Firefox 5.

https://crash-stats.mozilla.com/report/list?product=Firefox&version=Firefox%3A5.0&platform=linux&query_search=signature&query_type=contains&reason_type=contains&date=06%2F29%2F2011%2012%3A53%3A15&range_value=4&range_unit=weeks&hang_type=any&process_type=any&do_query=1&signature=libc-2.11.3.so%400x32ab5

Twelve crashes in four weeks (of which six less than a week old), with five different Build IDs, and each a different install time. Crash #60 to #79 (ex-aequo) on page 1 of 13 for Fx5 on Linux this week. This said... Linux crashes (and especially linux-x86_64 crashes) are of course nowhere near the number of Windows crashes; maybe too small a sample to draw statistically valid conclusions.
Comment 96 Karl Tomlinson (:karlt) 2011-06-29 14:48:20 PDT
The same signature in this case doesn't mean anything more than that abort was called with the same libc. There are a few different crashes there, but none mention X_GLXMakeCurrent: GLXBadContextTag. The most common looks like Bug 640908.
