Closed Bug 1176404 Opened 9 years ago Closed 7 months ago

Crashes at __pthread_kill | ... gpusGenerateCrashLog ... | gpusSubmitDataBuffers, mostly on OS X 10.10 and 10.11

Categories

(Core :: Graphics, defect, P3)

All
macOS
defect

Tracking

()

RESOLVED WORKSFORME
Tracking Status
firefox48 --- wontfix
firefox49 --- wontfix
firefox-esr45 --- affected
firefox50 --- affected
firefox51 --- affected
firefox52 --- wontfix

People

(Reporter: smichaud, Unassigned)

References

Details

(Keywords: crash, steps-wanted, Whiteboard: [gfx-noted])

Crash Data

Attachments

(2 files)

These are crashes in low-level OS X OpenGL graphics code -- at or near the driver level.  They occur in fairly large numbers back to FF 38.0.5, but aren't Mac topcrashers except on the 41 branch.

They occur mostly on Intel HD Graphics 4000 hardware (device id 0x0166, available for example on 2012 Retina MacBook Pros) and Intel HD Graphics 3000 hardware (device id 0x0126, available for example on 2011 MBPs) -- these together represent well over 50% of the crashes.  But they aren't limited to this kind of graphics hardware.

We probably won't be able to do anything about these until and unless we're able to reproduce them.  And as these happen disproportionately on OS X 10.10.3, they're probably at least partly a bug in OS X.
Be aware that this crash "signature" is shared by lots of totally unrelated bugs.  It just translates to __pthread_kill -- typically used by OS code to kill a misbehaving process.
Crash Signature: [@ libsystem_kernel.dylib@0x16286 ]
Top of typical crash stack (bp-84aff656-5fd1-4f8e-b98d-5a1812150614),
translated using atos:

0  libsystem_kernel.dylib           libsystem_kernel.dylib@0x16286
1  libsystem_c.dylib                libsystem_c.dylib@0x5db52
2  libGPUSupportMercury.dylib       libGPUSupportMercury.dylib@0x1b80
3  AppleIntelHD4000GraphicsGLDriver AppleIntelHD4000GraphicsGLDriver@0x369bc4
4  libGPUSupportMercury.dylib       libGPUSupportMercury.dylib@0x3067
5  AppleIntelHD4000GraphicsGLDriver AppleIntelHD4000GraphicsGLDriver@0x37063f
6  AppleIntelHD4000GraphicsGLDriver AppleIntelHD4000GraphicsGLDriver@0x3704bc
7  AppleIntelHD4000GraphicsGLDriver AppleIntelHD4000GraphicsGLDriver@0x371931
8  AppleIntelHD4000GraphicsGLDriver AppleIntelHD4000GraphicsGLDriver@0x371c28
9  GLEngine                         GLEngine@0x1d130
10 OpenGL                           OpenGL@0xcfb9
11 AppKit                           AppKit@0x332f3a
...

0  __pthread_kill (in libsystem_kernel.dylib) + 10
1  abort (in libsystem_c.dylib) + 128
2  gpusGenerateCrashLog (in libGPUSupportMercury.dylib) + 172
3  gpusKillClient (in AppleIntelHD4000GraphicsGLDriver) + 8
4  gpusSubmitDataBuffers (in libGPUSupportMercury.dylib) + 499
5  IntelCommandBuffer::getNew(GLDContextRec*) (in AppleIntelHD4000GraphicsGLDriver)
6  intelSubmitCommands (in AppleIntelHD4000GraphicsGLDriver) + 201
7  SwapFlush(GLDContextRec*, unsigned int) (in AppleIntelHD4000GraphicsGLDriver) + 18
8  gldPresentFramebufferData (in AppleIntelHD4000GraphicsGLDriver) + 137
9  glSwap_Exec (in GLEngine) + 96
10 CGLFlushDrawable (in OpenGL) + 65
11 -[NSOpenGLContext flushBuffer] (in AppKit) + 26
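
Since the raw signature only names __pthread_kill, the symbolicated frames are what actually distinguish this crash from unrelated ones. The following is a minimal sketch, in Python, of parsing frames in the atos-style format shown above and joining the first few non-generic symbols into a more specific signature. The skip list (GENERIC) and the helper names are illustrative assumptions, not the algorithm crash-stats actually uses.

```python
import re

# Top frames copied from the symbolicated stack above.
FRAMES = """\
0  __pthread_kill (in libsystem_kernel.dylib) + 10
1  abort (in libsystem_c.dylib) + 128
2  gpusGenerateCrashLog (in libGPUSupportMercury.dylib) + 172
3  gpusKillClient (in AppleIntelHD4000GraphicsGLDriver) + 8
4  gpusSubmitDataBuffers (in libGPUSupportMercury.dylib) + 499
"""

# "<index>  <symbol> (in <library>) [+ <offset>]"
FRAME_RE = re.compile(r"^\s*(\d+)\s+(.+?) \(in ([^)]+)\)(?: \+ (\d+))?\s*$")

# Hypothetical skip list: frames that show up in any abort() and so
# say nothing about which bug was hit.
GENERIC = {"__pthread_kill", "abort"}


def parse_frames(text):
    """Return a list of (symbol, library) tuples, one per stack frame."""
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.match(line)
        if match:
            frames.append((match.group(2), match.group(3)))
    return frames


def signature(frames, depth=3):
    """Join the first `depth` non-generic symbols with ' | '."""
    specific = [symbol for symbol, _library in frames if symbol not in GENERIC]
    return " | ".join(specific[:depth])


if __name__ == "__main__":
    print(signature(parse_frames(FRAMES)))
```

Skipping the generic frames is the point of the sketch: any process killed via abort() shares the first two frames, so only the frames below them can tell these graphics-driver crashes apart from the others filed under the same signature.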
A very similar crash stack was reported here for an entirely different app:

https://github.com/tomaka/glutin/issues/129
The following link may not be directly relevant.  But I'd bet it describes something like what's going on here:

http://hacksoflife.blogspot.com/2013/08/what-does-gpusreturnguiltyforhardwarere.html
Keywords: crash
> (bp-84aff656-5fd1-4f8e-b98d-5a1812150614)

Oops, wrong crash id.  The one that corresponds to the stack in comment #2 is bp-4e894f09-dd87-46fe-adcf-353c72150618.
Summary: Crashes at __pthread_kill | ... gpusGenerateCrashLog ... | -[NSOpenGLContext flushBuffer], mostly on OS X 10.10.3 → Crashes at __pthread_kill | ... gpusGenerateCrashLog ... | gpusSubmitDataBuffers, mostly on OS X 10.10.3
I got this crash twice today:

bp-18109f17-0fd0-430d-8743-9dd292150630
bp-e86ddb62-8427-4b35-9b3b-cc72b2150630

In both cases I observed broken rendering in the window prior to the crash. The second time I was able to grab a screenshot which I'm attaching.
Interesting.

I saw something like your screenshot on OS X 10.10.3 just today, with yesterday's m-c nightly.  I didn't crash though.  In case it's relevant, I had e10s off (I was testing for another bug).
Markus, I haven't been able to reconstruct what I was doing just before I saw the tiling bug.  Do you remember what you were doing?
> just today, with yesterday's m-c nightly

just yesterday, with yesterday's m-c nightly
No, unfortunately not.
These crashes in Firefox 39 seem similar but with Nvidia hardware (user posted them to SuMo this evening):

https://crash-stats.mozilla.com/report/index/2901b52e-31ad-4be8-8b4f-73a472150714
https://crash-stats.mozilla.com/report/index/1bb6974b-513b-4ffe-b2ce-202932150713 - YouTube
https://crash-stats.mozilla.com/report/index/b05dab73-ddbc-43a0-a785-2466b2150713 - Twitter

GLEngine -> GeForceGLDriver -> libGPUSupportMercury.dylib ...
As bug 1176316 indicates this is the main bug for these problems:

I just encountered 
https://crash-stats.mozilla.com/report/index/46444a51-e188-4c25-bfbd-ae4002150817

on Mac OSX 10.10.5 with Fx 41.0b1, but surprisingly in the GeForce driver.
Crash Signature: [@ libsystem_kernel.dylib@0x16286 ] → [@ libsystem_kernel.dylib@0x16286 ] [@ libsystem_kernel.dylib@0x170ae ]
Summary: Crashes at __pthread_kill | ... gpusGenerateCrashLog ... | gpusSubmitDataBuffers, mostly on OS X 10.10.3 → Crashes at __pthread_kill | ... gpusGenerateCrashLog ... | gpusSubmitDataBuffers, mostly on OS X 10.10 and 10.11
As soon as I turned on e10s on 43.0a2 (2015-09-27) I encountered this twice within a couple of minutes:
https://crash-stats.mozilla.com/report/index/ba8c6306-4cec-4f02-8b8b-bad8a2150928
https://crash-stats.mozilla.com/report/index/5b79ff67-e395-4c29-8eda-56dbe2150928

This bug is so bad for me on 10.10.5 that it makes it really hard to tolerate e10s.
I hit the following on 42.0b2 not long after upgrading to Mac OS X 10.11 El Capitan (that was just released today):

https://crash-stats.mozilla.com/report/index/c6fba1e0-094e-4bf8-bae5-3c8582150930
Like Gary, I hit this crash the first time I ran Nightly 44 after upgrading from OS X 10.10 Yosemite to OS X 10.11 El Capitan:

bp-b99d903d-8f45-4741-83cf-6d5aa2151001
The signature [@ libsystem_kernel.dylib@0x16286 ] has been a top-3 crash on OS X. I reported a probable duplicate, bug 1218070, but that crash seems to happen near the NVIDIA GeForce driver, so I have no idea whether it is a duplicate.
Keywords: topcrash-mac
See Also: → 1218070
(In reply to Chris Peterson [:cpeterson] from comment #18)
> Like Gary, I hit this crash the first time I ran Nightly 44 after upgrading
> from OS X 10.10 Yosemite to OS X 10.11 El Capitan:
> 
> bp-b99d903d-8f45-4741-83cf-6d5aa2151001

Chris, just that one crash?  Or does it happen on every new nightly?
Flags: needinfo?(cpeterson)
(In reply to Milan Sreckovic [:milan] from comment #20)
> (In reply to Chris Peterson [:cpeterson] from comment #18)
> > Like Gary, I hit this crash the first time I ran Nightly 44 after upgrading
> > from OS X 10.10 Yosemite to OS X 10.11 El Capitan:
> > 
> > bp-b99d903d-8f45-4741-83cf-6d5aa2151001
> 
> Chris, just that one crash?  Or does it happen on every new nightly?

I only saw the crash once.
Flags: needinfo?(cpeterson)
I'm not sure if this crash is related to this particular issue, but I received a similar crash on the latest beta fx43.0b1 [BuildID: 20151103023037 Changeset: 955ece9be4f2]:

* bp-48e8362d-f20b-41cd-b938-3dd082151109

Here's the STR that I used when I got the crash. I couldn't reproduce it, so I'm assuming it's perhaps timing related? I've run into several crashes in the past few months following this exact flow.

- opened fx43.0b1 and quickly typed in about:support
- scrolled all the way down to the "Experimental Features" section
- glanced at the information and quickly closed fx which resulted in an immediate crash
In my case it was frequent when doing any action (like a tab switch) on JS/Flash-based sites; I never had problems with plain HTML pages. Firefox seems to freeze for an instant before the crash, and it was so annoying that I couldn't use Firefox.

https://crash-stats.mozilla.com/report/index/9a15aba9-a84b-4f83-ad6b-cb3f32151020
(In reply to info from comment #23)
> In my case it was frequent when doing any action (like a tab switch) on JS/Flash-based sites

How frequently does this reproduce on your system?
Whiteboard: [gfx-noted]
It's variable, from a couple of times in a day to multiple crashes in an hour; sadly I never managed to correlate it to a specific website or usage behaviour
(In reply to info from comment #25)
> It's variable, from a couple of times in a day to multiple crashes in an hour;
> sadly I never managed to correlate it to a specific website or usage behaviour

Do you recall with which version of Firefox this started? If not, would you be willing to test some older releases to figure that out?
(In reply to Anthony Hughes, QA Mentor (:ashughes) from comment #26)
> (In reply to info from comment #25)
> > It's variable, from a couple of times in a day to multiple crashes in an hour;
> > sadly I never managed to correlate it to a specific website or usage behaviour
> 
> Do you recall with which version of Firefox this started? If not, would you
> be willing to test some older releases to figure that out?

What I can tell you is that the problem has been annoying since 29/08/2015 (looking at about:crashes); the number of crashes increased after 40.0.3, build 20150826023504, and I stopped using Firefox when I saw that 41.0.1 didn't fix the issue.

Regarding testing, drop me an email.
(In reply to info from comment #27)
> (In reply to Anthony Hughes, QA Mentor (:ashughes) from comment #26)
> > (In reply to info from comment #25)
> > > It's variable, from a couple of times in a day to multiple crashes in an hour;
> > > sadly I never managed to correlate it to a specific website or usage behaviour
> > 
> > Do you recall with which version of Firefox this started? If not, would you
> > be willing to test some older releases to figure that out?
> 
> What I can tell you is that the problem has been annoying since
> 29/08/2015 (looking at about:crashes); the number of crashes increased after
> 40.0.3, build 20150826023504, and I stopped using Firefox when I saw that
> 41.0.1 didn't fix the issue.

Okay, thanks. I took a look at the crash data and these signatures go back to at least Firefox 31esr, so if it's a regression it probably isn't recent, and it's unlikely that we'd find a regression range through testing.

Milan, is there anything we can have this person do or any information they can provide to move this bug forward?
Flags: needinfo?(milan)
I expect some of these crashes to go down once the fix for bug 1219230 lands.
Flags: needinfo?(milan)
Kamil, any reproducible workflow for this?
> Kamil, any reproducible workflow for this?

I usually get the crash when I quickly load about:support and start scrolling through the page before it completes loading. Sometimes (very rarely), it crashes instantly when I load about:support. I tried reproducing the crash and ran into the following: (not sure if this is even helpful but figured I would add it)

Process 45549 stopped
* thread #30: tid = 0x158e36, 0x00007fff8c1af0ae libsystem_kernel.dylib`__pthread_kill + 10, name = 'Compositor', stop reason = signal SIGABRT
    frame #0: 0x00007fff8c1af0ae libsystem_kernel.dylib`__pthread_kill + 10
libsystem_kernel.dylib`__pthread_kill:
->  0x7fff8c1af0ae <+10>: jae    0x7fff8c1af0b8            ; <+20>
    0x7fff8c1af0b0 <+12>: movq   %rax, %rdi
    0x7fff8c1af0b3 <+15>: jmp    0x7fff8c1aa3ef            ; cerror_nocancel
    0x7fff8c1af0b8 <+20>: retq

I started running into the crash when I was installing a telemetry experiment (comment #22) and wanted to check about:support to see if the experiment information was correctly being populated under "Experimental Features". Maybe it's because there's new information that's being added into about:support and I'm trying to scroll to the information too quickly? I'll try reproducing the crash via "./mach run --debug" and see if I can get more information.
I'd be interested to know if bug 1219230 that just landed today reduces the chance of this crash.
Did not get fixed by a patch from bug 1219230, but I don't know if it slowed it down.
Doesn't look like this has slowed down at all:
> 2015-11-06 to 2015-11-12: 6 crashes reported against 45.0a1
> 2015-11-13 to 2015-11-19: 18 crashes reported against 45.0a1
> 2015-11-20 to 2015-11-26: 15 crashes reported against 45.0a1
> 2015-11-27 to 2015-12-04: 22 crashes reported against 45.0a1
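
The four windows above are not all the same length (the last one spans eight days, not seven), so a per-day rate makes the comparison a little fairer. A minimal sketch in Python, assuming each window is inclusive of both its start and end date; the function name is illustrative only:

```python
from datetime import date

# Windows and counts copied from the list above (45.0a1 only).
WEEKLY = [
    ("2015-11-06", "2015-11-12", 6),
    ("2015-11-13", "2015-11-19", 18),
    ("2015-11-20", "2015-11-26", 15),
    ("2015-11-27", "2015-12-04", 22),
]


def crashes_per_day(start, end, count):
    """Average crashes per day over a date window (both ends inclusive)."""
    days = (date.fromisoformat(end) - date.fromisoformat(start)).days + 1
    return count / days


rates = [crashes_per_day(*window) for window in WEEKLY]

if __name__ == "__main__":
    for (start, end, count), rate in zip(WEEKLY, rates):
        print(f"{start} to {end}: {count:2d} crashes, {rate:.2f}/day")
```

Even after normalizing, the latest window has the highest daily rate. These are small counts on a single channel, so this says nothing about whether any particular patch helped.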
Flags: needinfo?(howareyou322)
This is currently the #8 topcrash for Firefox 48 in MacOS at 2.19%. Based on the hardware correlations I would suspect trying to reproduce this on a 13" Macbook Pro would probably be the best course to take. However I would note that some users reporting crashes have provided email addresses; one user has reported this crash 17 times in the last week. I might suggest we reach out to these people directly. 

Breakdowns for the last month:
==============================
Platform
> 90.93% MacOS 10.10 vs 9.05% MacOS 10.11

GPU Vendor
> 89.87% Intel vs 7.27% NVIDIA vs 2.72% AMD

Top AMD GPUs
> 55.85% Southern Islands chipsets, 76% of which are on AMD Radeon HD 8770M GPUs
> 21.43% Evergreen chipsets, 64% on AMD Radeon HD 6650M/6770 GPUs

Top NVIDIA GPUs
> 73.39% Kepler chipsets, 91% of which are using Geforce GT 650M/750M GPUs
> 19.38% Tesla chipsets, 30% of which are Geforce GT 330M GPUs

Top Intel GPUs
> 38.32% Haswell chipsets, 96% of which are using Intel HD 5000 GPUs
> 26.66% Ivybridge chipsets, 100% of which are using Intel HD 4000 GPUs
> 21.99% Sandybridge chipsets, 66% of which are using Intel HD 3000 GPUs
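
The breakdown above is nested, which makes it hard to see how much of the total crash volume a single GPU model accounts for. A rough sketch in Python follows. It assumes (this is not stated above) that each chipset percentage is a share of that vendor's crashes and each GPU percentage is a share of that chipset's crashes. The function and variable names are illustrative only.

```python
# Vendor share of all crashes, copied from the "GPU Vendor" line.
INTEL_SHARE = 89.87

# (chipset, % of Intel crashes, GPU model, % of that chipset's crashes)
TOP_INTEL = [
    ("Haswell", 38.32, "Intel HD 5000", 96),
    ("Ivybridge", 26.66, "Intel HD 4000", 100),
    ("Sandybridge", 21.99, "Intel HD 3000", 66),
]


def overall_share(vendor_pct, chipset_pct, gpu_pct):
    """Chain three nested percentages into a percentage of all crashes."""
    return vendor_pct * chipset_pct * gpu_pct / 10000.0


# Assumed interpretation: chipset % is within vendor, GPU % is within chipset.
overall = {
    gpu: overall_share(INTEL_SHARE, chipset_pct, gpu_pct)
    for _chipset, chipset_pct, gpu, gpu_pct in TOP_INTEL
}

if __name__ == "__main__":
    for gpu, share in overall.items():
        print(f"{gpu}: {share:.2f}% of all crashes")
```

Under that assumption, Intel HD 5000 alone would account for roughly a third of all reports, with HD 4000 and HD 3000 together adding a bit more than another third.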
Crash Signature: [@ libsystem_kernel.dylib@0x16286 ] [@ libsystem_kernel.dylib@0x170ae ] → [@ libsystem_kernel.dylib@0x16286 ] [@ libsystem_kernel.dylib@0x170ae ] [@ hang | libsystem_kernel.dylib@0x16286 ]
Keywords: steps-wanted
These catch all sorts of different bugs.  For example, bug 1218070 is one of these, and is being tracked separately.  We really need to get the symbols for these libraries on all the versions of OS X.  I opened bug 1296728 for that.
Depends on: 1296728
Crash volume for signature 'libsystem_kernel.dylib@0x170ae':
 - nightly (version 51): 0 crashes from 2016-08-01.
 - aurora  (version 50): 0 crashes from 2016-08-01.
 - beta    (version 49): 1 crash from 2016-08-02.
 - release (version 48): 40 crashes from 2016-07-25.
 - esr     (version 45): 241 crashes from 2016-05-02.

Crash volume on the last weeks (Week N is from 08-22 to 08-28):
            W. N-1  W. N-2  W. N-3
 - nightly       0       0       0
 - aurora        0       0       0
 - beta          1       1       0
 - release       9      12       7
 - esr          19      25      24

Affected platform: Mac OS X

Crash rank on the last 7 days:
           Browser     Content   Plugin
 - nightly
 - aurora
 - beta    #12432
 - release #1104
 - esr     #356
Crashes reported in the last week:
> 49.*:  14 [0.98% Overall]
> 50.*: 142 [1.04% Overall]
> 51.*:  22 [1.29% Overall]
> 52.*:   1 [0.16% Overall]
> 53.*:   0

While this is no longer a topcrash it is still being reported in current branches.
Flags: needinfo?(howareyou322)
Too late for firefox 52, mass-wontfix.
See Also: → 1535120

Like bug 1535120 and bug 1576767, these crashes (the current ones at least) are all (ultimately) in mozilla::gl::GLContextCGL::SwapBuffers().

Mass-removing myself from cc; search for 12b9dfe4-ece3-40dc-8d23-60e179f64ac1 or any reasonable part thereof, to mass-delete these notifications (and sorry!)

QA Whiteboard: qa-not-actionable
Severity: critical → S2

Since the crash volume is low (less than 15 per week), the severity is downgraded to S3. Feel free to change it back if you think the bug is still critical.

For more information, please visit auto_nag documentation.

Severity: S2 → S3

Closing because no crashes reported for 12 weeks.

Status: NEW → RESOLVED
Closed: 7 months ago
Resolution: --- → WORKSFORME