Closed Bug 1287444 Opened 8 years ago Closed 8 years ago

Crash in mozilla::gl::GLContext::MakeCurrent

Categories

(Core :: Graphics, defect)

47 Branch
Unspecified
All
defect
Not set
critical

Tracking

()

RESOLVED FIXED
Tracking Status
firefox47 --- affected
firefox48 --- wontfix
firefox49 + wontfix
firefox-esr45 --- affected
firefox50 --- fixed
firefox51 --- verified

People

(Reporter: marco, Assigned: mtseng)

Details

(4 keywords, Whiteboard: [gfx-noted])

Crash Data

This bug was filed from the Socorro interface and is 
report bp-370f66d9-e0db-40bf-bd64-33afc2160716.
=============================================================

This is a top crasher (#26) on 47.0.1.

Most crashes with this signature are occurring on Windows 7.

A lot of crashes (more than 30%) with this signature are occurring on Intel graphics cards with device ID = 0x0046.

A lot of crashes (between 20% and 30%) with this signature are occurring on Google Maps.
Morris, please have a look at this. 
Looks like the GLContext was gone when the problem happened.

[crash stack for FF 47]
I didn't see this problem in FF 48.
0 	xul.dll 	mozilla::gl::GLContext::MakeCurrent(bool) 	gfx/gl/GLContext.h:3217
1 	xul.dll 	mozilla::gl::SharedSurface_ANGLEShareHandle::~SharedSurface_ANGLEShareHandle() 	gfx/gl/SharedSurfaceANGLE.cpp:120
2 	xul.dll 	mozilla::gl::SharedSurface_ANGLEShareHandle::`scalar deleting destructor'(unsigned int) 	

[crash stack for FF50] I only see this stack on FF50 and I believe this was another problem.

0 	xul.dll 	mozilla::gl::GLContext::MakeCurrent(bool) 	gfx/gl/GLContext.h:3210
1 	xul.dll 	mozilla::WebGLVertexArrayGL::DeleteImpl() 	dom/canvas/WebGLVertexArrayGL.cpp:28
2 	xul.dll 	mozilla::WebGLVertexArray::Delete() 	dom/canvas/WebGLVertexArray.cpp:45
3 	xul.dll 	mozilla::WebGLRefCountedObject<mozilla::WebGLVertexArray>::DeleteOnce() 	dom/canvas/WebGLObjectModel.h:142
4 	xul.dll 	mozilla::WebGLVertexArrayGL::~WebGLVertexArrayGL()
Assignee: nobody → mtseng
Whiteboard: [gfx-noted]
Version: Trunk → 47 Branch
The second crash might be related to bug 1286459?
I see this signature at high volume on both Windows and Android. 
* On Windows this is at #25 with 603 reports in Firefox 47.0.1 (0.4%). 
* On Android this is at #4 with 14383 reports in Fennec 47.0 (2.42%).
OS: Windows → All
(In reply to Anthony Hughes (:ashughes) [GFX][QA][Mentor] from comment #3)
> I see this signature at high volume on both Windows and Android. 
> * On Windows this is at #25 with 603 reports in Firefox 47.0.1 (0.4%). 
> * On Android this is at #4 with 14383 reports in Fennec 47.0 (2.42%).

Also, FWIW I see crashes going back to Firefox/Fennec 41.
The version 47 crash should be fixed by bug 1224199, which landed in version 47.0a2.

The version 50 crash should be handled by bug 1286459.

I'll keep tracking those crashes.
Blocks: 1286459, 1224199
This is the #5 Windows topcrash in Nightly 20160722030235, with 28 occurrences.
Hello!

Is this spiking?

https://crash-stats.mozilla.com/signature/?product=Firefox&signature=mozilla%3A%3Agl%3A%3AGLContext%3A%3AMakeCurrent&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&_sort=-date&page=1#summary shows 3658 crashes.

Mine are:

https://crash-stats.mozilla.com/report/index/bp-4eae80d3-9569-460e-8f24-d7a3f2160725

STR:
- Load https://maps.google.com

Result:
Nightly crashes

Expected result:
No crash

Graphics:
Graphics
Features
Compositing	Basic
Asynchronous Pan/Zoom	none
WebGL Renderer	Google Inc. -- ANGLE (Software Adapter Direct3D11 vs_5_0 ps_5_0)
WebGL2 Renderer	WebGL creation failed: * Refused to create native OpenGL context because of blacklist entry: FEATURE_FAILURE_TEST * Exhausted GL driver options.
Hardware H264 Decoding	No; Hardware video decoding disabled or blacklisted
Direct2D	Blocked for your graphics card because of unresolved driver issues.
DirectWrite	false (6.2.9200.17568)
GPU #1
Active	Yes
Description	Citrix Systems Inc. Display Driver
Vendor ID	0x000c
Device ID	0x000c
Drivers	vdtw30
Subsys ID	0000000c
RAM	Unknown
Diagnostics
AzureCanvasAccelerated	0
AzureCanvasBackend	skia
AzureContentBackend	cairo
AzureFallbackCanvasBackend	cairo
Decision Log
D3D11_COMPOSITING	
Blocklisted; failure code BLOCKLIST_FEATURE_FAILURE_TEST
D3D9_COMPOSITING	
Blocklisted; failure code BLOCKLIST_FEATURE_FAILURE_TEST
DIRECT2D	
unavailable by default: Direct2D requires Direct3D 11 compositing
D3D11_HW_ANGLE	
unavailable by default: D3D11 compositing is disabled
disabled by env: D3D11 compositing is disabled

Thanks!
Alex
Is the about:support from a session that crashes, or just from the follow-up session after the crash?
> Description	Citrix Systems Inc. Display Driver

This should be blacklisted for all features, including WebGL. The user might have forced WebGL on?
I'm fairly sure this is a use-after-free. We're crashing on a virtual call. We don't hold a strong reference to mGL.

We have a dupe somewhere but I couldn't find it.

A few things we could do, but since jgilbert knows these lifetimes better I'll let him pick:
1) Strong reference to mGL.
2) Weak reference to mGL. If it's gone then we don't need to release our resource.
3) Make sure we release before mGL goes away (if the lifetimes make this possible).
Flags: needinfo?(jgilbert)
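
A minimal sketch of option 2 above, assuming standard-library smart pointers in place of Gecko's own reference-counting types. `FakeGLContext`, `FakeSharedSurface`, and `WeakRefDemo` are hypothetical names used only for illustration; this is not the actual Gecko code or any landed patch:

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for mozilla::gl::GLContext; not the real class.
struct FakeGLContext {
  int makeCurrentCalls = 0;
  int deletedHandles = 0;
  bool MakeCurrent() { ++makeCurrentCalls; return true; }
  void DeleteHandle() { ++deletedHandles; }
};

// Hypothetical stand-in for a surface that owns a GL-side resource.
class FakeSharedSurface {
 public:
  explicit FakeSharedSurface(const std::shared_ptr<FakeGLContext>& gl)
      : mGL(gl) {}

  ~FakeSharedSurface() {
    // Promote the weak reference. If the context is already gone, its
    // resources went with it, so there is nothing left to release.
    if (std::shared_ptr<FakeGLContext> gl = mGL.lock()) {
      if (gl->MakeCurrent()) gl->DeleteHandle();
    }
  }

 private:
  std::weak_ptr<FakeGLContext> mGL;  // does not extend the context lifetime
};

// Returns true if a surface destroyed while the context is alive releases
// its resource exactly once. Also destroys a second surface after the
// context is gone, which must not touch the dead context.
bool WeakRefDemo() {
  auto gl = std::make_shared<FakeGLContext>();
  {
    FakeSharedSurface early(gl);
  }  // context alive: resource released through it
  bool releasedOnce = gl->deletedHandles == 1;

  auto late = std::make_unique<FakeSharedSurface>(gl);
  gl.reset();    // context destroyed first
  late.reset();  // destructor sees an expired weak_ptr and does nothing
  return releasedOnce;
}
```

With a raw, non-owning `mGL` pointer, the second half of `WeakRefDemo` would be the use-after-free described above. The weak reference turns it into a no-op without keeping the context alive.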
(In reply to Benoit Girard (:BenWa) from comment #10)
> I'm fairly sure this is a use-after-free. We're crashing on a virtual call.
> We don't hold a strong reference to mGL.
> 
> We have a dupe somewhere but I couldn't find it.
> 
> A few things we could do, but since jgilbert knows these lifetimes better
> I'll let him pick:
> 1) Strong reference to mGL.
> 2) Weak reference to mGL. If it's gone then we don't need to release our
> resource.
> 3) Make sure we release before mGL goes away (if the lifetimes make this
> possible).

Most simply, these would ideally be strong refs. However, we don't want to allow the number of GLContexts to grow unbounded, which is generally why we try to bound the lifetimes of these objects to within the GLContext.

Long term, we should make these objects detachable from the GLContext, and have them be automatically detached when we want to destroy the GLContext.

Short term, these should be weak-refs.
Flags: needinfo?(jgilbert)
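
One way to picture the long-term idea, as a sketch under stated assumptions: the context tracks the objects attached to it and detaches them all when it is destroyed. `FakeContext`, `FakeAttached`, and `DetachDemo` are hypothetical names, the code is single-threaded, and none of it is taken from Gecko:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <unordered_set>

class FakeContext;

// Hypothetical GL-backed object that can be detached from its context.
class FakeAttached {
 public:
  explicit FakeAttached(FakeContext* gl);
  ~FakeAttached();
  FakeAttached(const FakeAttached&) = delete;
  FakeAttached& operator=(const FakeAttached&) = delete;

  void Detach() { mGL = nullptr; }
  bool IsAttached() const { return mGL != nullptr; }

 private:
  FakeContext* mGL;  // non-owning; cleared by the context on destruction
};

// Hypothetical context that detaches every attached object before it dies.
class FakeContext {
 public:
  FakeContext() = default;
  FakeContext(const FakeContext&) = delete;
  FakeContext& operator=(const FakeContext&) = delete;

  ~FakeContext() {
    for (FakeAttached* obj : mAttached) obj->Detach();
  }

  void Register(FakeAttached* obj) { mAttached.insert(obj); }
  void Unregister(FakeAttached* obj) { mAttached.erase(obj); }
  std::size_t AttachedCount() const { return mAttached.size(); }

 private:
  std::unordered_set<FakeAttached*> mAttached;
};

FakeAttached::FakeAttached(FakeContext* gl) : mGL(gl) { mGL->Register(this); }

FakeAttached::~FakeAttached() {
  // Only talk to the context if it has not already detached us.
  if (mGL) mGL->Unregister(this);
}

// Returns true if destroying the context first leaves the object detached.
bool DetachDemo() {
  auto gl = std::make_unique<FakeContext>();
  FakeAttached obj(gl.get());
  bool attachedBefore = obj.IsAttached() && gl->AttachedCount() == 1;
  gl.reset();  // context goes away first and detaches obj
  return attachedBefore && !obj.IsAttached();
}  // obj is destroyed here without touching the dead context
```

Here the context's lifetime stays bounded by its owner rather than by however many objects still point at it, which is the concern raised about strong references.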
A crash is pretty much the worst case, so just about anything other than what we're doing now is better here.
Flags: needinfo?(bgirard)
Looks like the faulting code was removed in bug 1285044.

This might be WorksForMe now.
Flags: needinfo?(bgirard)
Hello Benoit!

FWIW Mozilla/5.0 (Windows NT 6.1; WOW64; rv:51.0) Gecko/20100101 Firefox/51.0 ID:20160802030437 CSet: ffac2798999c5b84f1b4605a1280994bb665a406 no longer crashes as described on https://bugzilla.mozilla.org/show_bug.cgi?id=1287444#c7

Thanks!
Alex
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Hello Alex,
could you use mozregression [1] to find what fixed this bug?

Simply download and install mozregression from https://github.com/mozilla/mozregression/releases, launch it, select "File -> Run a new bisection", leave the default options in the first two dialogs, in the third dialog write "2016-08-02" in "Last known good build", "2016-07-25" in "Last known bad build", select the "(Search for a bug fix instead of a regression)" option and then click "Finish".
mozregression will automatically download and run Firefox builds for you so you can test whether each one crashes. If a build crashes, click on "good"; if it doesn't crash, close it and click on "bad". Once you're finished, copy the log here.

[1]: http://mozilla.github.io/mozregression/
Flags: needinfo?(alex_mayorga)
Hello Marco!

I downloaded mozregression-gui.exe from the page given and installed it but now get this crash every time it starts:

platform: Windows-7-6.1.7601-SP1
python: 2.7.11 FROZEN (32bit)
mozregui: 0.9.2
mozregression: 2.3.2
message: ConnectionError: ('Connection aborted.', gaierror(11004, 'getaddrinfo failed'))
traceback:   File ".\mozregui\check_release.py", line 20, in run
  File "..\mozregression\network.py", line 27, in retry_get
  File "C:\Python27\lib\site-packages\redo\__init__.py", line 152, in retry
  File "C:\Python27\lib\site-packages\requests\api.py", line 65, in get
  File "C:\Python27\lib\site-packages\requests\api.py", line 49, in request
  File "C:\Python27\lib\site-packages\requests\sessions.py", line 461, in request
  File "C:\Python27\lib\site-packages\requests\sessions.py", line 573, in send
  File "C:\Python27\lib\site-packages\requests\adapters.py", line 415, in send

Am I doing something wrong?

Thanks!
Alex
Flags: needinfo?(alex_mayorga) → needinfo?(mcastelluccio)
(In reply to alex_mayorga from comment #19)
> Hello Marco!
> 
> I downloaded mozregression-gui.exe from the page given and installed it but
> now get this crash every time it starts:
> 
> platform: Windows-7-6.1.7601-SP1
> python: 2.7.11 FROZEN (32bit)
> mozregui: 0.9.2
> mozregression: 2.3.2
> message: ConnectionError: ('Connection aborted.', gaierror(11004,
> 'getaddrinfo failed'))
> traceback:   File ".\mozregui\check_release.py", line 20, in run
>   File "..\mozregression\network.py", line 27, in retry_get
>   File "C:\Python27\lib\site-packages\redo\__init__.py", line 152, in retry
>   File "C:\Python27\lib\site-packages\requests\api.py", line 65, in get
>   File "C:\Python27\lib\site-packages\requests\api.py", line 49, in request
>   File "C:\Python27\lib\site-packages\requests\sessions.py", line 461, in
> request
>   File "C:\Python27\lib\site-packages\requests\sessions.py", line 573, in
> send
>   File "C:\Python27\lib\site-packages\requests\adapters.py", line 415, in
> send
> 
> Am I doing something wrong?
> 
> Thanks!
> Alex

Hello Alex,
this looks like a temporary network problem, could you try again?

Feel free to ping me on IRC if mozregression is still not working (I'm 'marco').
Flags: needinfo?(mcastelluccio)
I can't repro this problem either, so I'm hoping it's just a temporary network issue, as Marco suggests.
There are some Fennec crashes in 49 beta 1. If we can figure out what fixed this, it might be a good thing to uplift to beta. Jeff, do you think this may be related to bug 1286459, as Marco suggests?
Flags: needinfo?(jgilbert)
Crash volume for signature 'mozilla::gl::GLContext::MakeCurrent':
 - nightly (version 51): 0 crashes from 2016-08-01.
 - aurora  (version 50): 0 crashes from 2016-08-01.
 - beta    (version 49): 24 crashes from 2016-08-02.
 - release (version 48): 448 crashes from 2016-07-25.
 - esr     (version 45): 1630 crashes from 2016-05-02.

Crash volume on the last weeks (Week N is from 08-22 to 08-28):
            W. N-1  W. N-2  W. N-3
 - nightly       0       0       0
 - aurora        0       0       0
 - beta          7       8       4
 - release     149     125      53
 - esr         113     105     124

Affected platform: Windows

Crash rank on the last 7 days:
           Browser   Content     Plugin
 - nightly
 - aurora
 - beta    #1648
 - release #131      #140
 - esr     #82
Morris, what do you think, do you want to request beta uplift? There are currently only a few (under 10) crashes on recent betas, for desktop and Fennec, but that may become worse on release (as it seemed to with 47).
Flags: needinfo?(jgilbert) → needinfo?(mtseng)
Do you mean uplifting bug 1286459 to beta? I think that is OK, but I'll defer this request to jgilbert since he is the patch author.
Flags: needinfo?(mtseng) → needinfo?(jgilbert)
Flags: needinfo?(jgilbert)