Closed Bug 1201401 Opened 4 years ago Closed 3 months ago

crash in CVCGDisplayLink::getDisplayTimes Mac coming out of sleep (waking) with external monitor

Categories

(Core :: Widget: Cocoa, defect, P2, critical)

Unspecified
macOS
defect

Tracking

()

RESOLVED FIXED
mozilla71
Tracking Status
firefox47 --- wontfix
firefox48 --- wontfix
firefox-esr52 --- wontfix
firefox-esr60 --- wontfix
firefox-esr68 70+ fixed
firefox67 --- wontfix
firefox68 --- wontfix
firefox69 --- wontfix
firefox70 + verified
firefox71 + verified

People

(Reporter: masayuki, Assigned: smichaud)

Details

(Keywords: crash, topcrash-mac, topcrash-thunderbird, Whiteboard: [gfx-noted][tbird topcrash][tpi:+])

Attachments

(8 files, 5 obsolete files)

This bug was filed from the Socorro interface and is 
report bp-71e78d27-f874-4476-bd9f-8d1d92150903.
=============================================================
> 0 	CoreVideo 	CVCGDisplayLink::getDisplayTimes(unsigned long long*, unsigned long long*, unsigned long long*) 	
> 1 	CoreVideo 	CVHWTime::update(double, bool*, bool*) 	
> 2 	CoreVideo 	CVXTime::update() 	
> 3 	CoreVideo 	CVDisplayLink::runIOThread() 	
> 4 	CoreVideo 	startIOThread(void*) 	
> Ø 5 	libsystem_pthread.dylib 	libsystem_pthread.dylib@0x4059 	
> Ø 6 	libsystem_pthread.dylib 	libsystem_pthread.dylib@0x3fd6 	
> Ø 7 	libsystem_pthread.dylib 	libsystem_pthread.dylib@0x13ec 	
> 8 	CoreVideo 	CVDisplayLink::isRunning()

According to the user comments and the summary of these reports, this may be reproducible when the user has an external display connected on OS X 10.10.

Looks like we've hit a bug in OS X 10.10. It would be great if we could avoid the bug on our side.
Or is Graphics a better component?
> Or is Graphics a better component?

I think so :-)
Component: Widget: Cocoa → Graphics
Looks like the mac vsync stuff.
Flags: needinfo?(mchang)
Whiteboard: gfx-noted
I took a look at the crash report and minidump. The main thread isn't even in XUL code, it's in some random place in CoreGraphics, with the top of the stack being AppKit. From the crashed thread, it should happen when we enable/disable vsync, but that isn't happening here as the main thread isn't in gfxPlatformMac. I'm not really sure what we can do here since it looks like Yosemite is doing something to start up a display link that vsync didn't actually request...and the uptimes are super long. And they all happen only on release builds....

https://crash-stats.mozilla.com/report/list?product=Firefox&range_unit=days&version=Firefox%3A40.0.3&signature=CVCGDisplayLink%3A%3AgetDisplayTimes%28unsigned+long+long*%2C+unsigned+long+long*%2C+unsigned+long+long*%29&date=2015-09-03&range_value=28#tab-reports
Flags: needinfo?(mchang)
Markus, I know this isn't much to go by, but does this crash stack make any sense to you on the main thread? Where would we do something like this?
Flags: needinfo?(mstange)
Do you know which version of OS X this is on? Is it 10.10.5? I attempted symbolicating it using Benoit's 10.10.4 libraries, but the CoreGraphics symbols don't make any sense, so it looks like we have a different version.

The CoreFoundation frames are __CFRUNLOOP_IS_SERVICING_THE_MAIN_DISPATCH_QUEUE__, __CFRunLoopRun and CFRunLoopRunSpecific, and then it calls into libdispatch (which is where GCD is implemented), so it seems it's executing some kind of completion runnable of an off-main-thread action.

Here are the commands we ran:
> /usr/bin/atos -arch x86_64  -l 0x0 -o '/System/Library/Frameworks/CoreGraphics.framework/CoreGraphics' 0x2ebf 0x2dc3 0x2ce8 0x89aa 0xe8e64 0x61441 0x7f4d6 0x5ef1e 0x5de64 0x5de2a 0x9c7f6
> /usr/bin/atos -arch x86_64  -l 0x0 -o '/System/Library/Frameworks/AppKit.framework/AppKit' 0x918ab 0x90e58 0x15edde 0x15d231
> /usr/bin/atos -arch x86_64  -l 0x0 -o '/System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation' 0xb73f9 0x7268f 0x71bd8 
> /usr/bin/atos -arch x86_64h  -l 0x0 -o '/usr/lib/system/libdispatch.dylib' 0x6323 0x1c13 0xdcbf
Flags: needinfo?(mstange)
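As an aside, the atos invocations above all follow one pattern (architecture, load address 0x0, framework binary, then a list of frame offsets), so they can be assembled programmatically. The following is only a minimal sketch of that pattern; the helper names `build_atos_command` and `format_command` are hypothetical and not part of any Mozilla tooling.

```python
import shlex


def build_atos_command(binary_path, offsets, arch="x86_64", load_address="0x0"):
    """Return the argv list for an atos call that resolves `offsets` in `binary_path`.

    Hypothetical helper: it only mirrors the flags used in the commands
    quoted in this bug (-arch, -l, -o) and does not run atos itself.
    """
    argv = ["/usr/bin/atos", "-arch", arch, "-l", load_address, "-o", binary_path]
    argv.extend(offsets)
    return argv


def format_command(argv):
    """Join argv into a shell-safe, copy-pastable command line."""
    return " ".join(shlex.quote(arg) for arg in argv)


if __name__ == "__main__":
    # Offsets taken from the libdispatch frames quoted above.
    command = build_atos_command(
        "/usr/lib/system/libdispatch.dylib",
        ["0x6323", "0x1c13", "0xdcbf"],
        arch="x86_64h",
    )
    print(format_command(command))
```

The symbols only make sense when the binary on disk matches the OS version of the crash report, which is why the 10.10.4 libraries gave nonsense for a 10.10.5 crash.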
Attached file 10.10.4 crash stack
The previous one was on 10.10.5. This attached main thread crash stack is from 10.10.4 via this crash report:

https://crash-stats.mozilla.com/report/index/6e1df635-7d38-4074-abb6-cacfc2150827

On a 10.10.5 machine, I'm getting these symbols for CoreGraphics:

/usr/bin/atos -arch x86_64  -l 0x0 -o '/System/Library/Frameworks/CoreGraphics.framework/CoreGraphics' 0x2ebf 0x2dc3 0x2ce8 0x89aa 0xe8e64 0x61441 0x7f4d6 0x5ef1e 0x5de64 0x5de2a 0x9c7f6

CGSDisplaySystemStateCreateFromSerialization (in CoreGraphics) + 169
_CGSGetDisplaySystemState (in CoreGraphics) + 205
CGSGetDisplaySystemState (in CoreGraphics) + 192
_CGSGetDockRectWithReason (in CoreGraphics) + 274
W8_image_mark (in CoreGraphics) + 6680
CGSDatagramReadStream::dispatch_next_datagram() (in CoreGraphics) + 3475
_CGSAddStructuralRegionOfType (in CoreGraphics) + 120
create_display_mode (in CoreGraphics) + 840
__initDisplays_block_invoke (in CoreGraphics) + 64
__initDisplays_block_invoke (in CoreGraphics) + 6
Full stack here:

// CoreGraphics
CGSDisplaySystemStateCreateFromSerialization (in CoreGraphics) + 169
_CGSGetDisplaySystemState (in CoreGraphics) + 205
CGSGetDisplaySystemState (in CoreGraphics) + 192
_CGSGetDockRectWithReason (in CoreGraphics) + 274
W8_image_mark (in CoreGraphics) + 6680
CGSDatagramReadStream::dispatch_next_datagram() (in CoreGraphics) + 3475
_CGSAddStructuralRegionOfType (in CoreGraphics) + 120
create_display_mode (in CoreGraphics) + 840
__initDisplays_block_invoke (in CoreGraphics) + 64
__initDisplays_block_invoke (in CoreGraphics) + 6

// libdispatch
_dispatch_call_block_and_release (in libdispatch.dylib) + 12
_dispatch_client_callout (in libdispatch.dylib) + 8
_dispatch_main_queue_callback_4CF (in libdispatch.dylib) + 861

// Core Foundation
__CFRUNLOOP_IS_SERVICING_THE_MAIN_DISPATCH_QUEUE__ (in CoreFoundation) + 9
__CFRunLoopRun (in CoreFoundation) + 2159
CFRunLoopRunSpecific (in CoreFoundation) + 296

// AppKit
_DPSNextEvent (in AppKit) + 978
-[NSApplication nextEventMatchingMask:untilDate:inMode:dequeue:] (in AppKit) + 346
-[NSEvent window] (in AppKit) + 86
-[NSApplication sendEvent:] (in AppKit) + 2535

Looks like the display is getting attached / detached, and CoreGraphics is crashing for some reason while trying to create the display state.
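For anyone comparing several of these symbolicated stacks, the "symbol (in Module) + offset" lines that atos prints can be grouped by module with a few lines of Python. This is a rough sketch that assumes the line format shown in the stack above; `parse_frame` and `group_by_module` are hypothetical helper names.

```python
import re

# Matches lines such as "__CFRunLoopRun (in CoreFoundation) + 2159".
FRAME_RE = re.compile(r"^(?P<symbol>.+) \(in (?P<module>[^)]+)\) \+ (?P<offset>\d+)$")


def parse_frame(line):
    """Return (symbol, module, offset) for one atos output line, or None."""
    match = FRAME_RE.match(line.strip())
    if match is None:
        return None
    return match["symbol"], match["module"], int(match["offset"])


def group_by_module(lines):
    """Group symbols by module, keeping the order in which modules first appear."""
    groups = {}
    for line in lines:
        frame = parse_frame(line)
        if frame is None:
            continue  # skip comment lines such as "// CoreGraphics"
        symbol, module, _offset = frame
        groups.setdefault(module, []).append(symbol)
    return groups


if __name__ == "__main__":
    stack = [
        "CGSGetDisplaySystemState (in CoreGraphics) + 192",
        "_dispatch_client_callout (in libdispatch.dylib) + 8",
        "__CFRunLoopRun (in CoreFoundation) + 2159",
        "-[NSApplication sendEvent:] (in AppKit) + 2535",
    ]
    for module, symbols in group_by_module(stack).items():
        print(module, symbols)
```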
I've tried a couple of things to no avail. I created a small program that just constantly runs a CVDisplayLink outputting vsync. I attached an external monitor a couple of times to each of the two display ports on a retina macbook pro. I also forced the machine to sleep / wake up and put the computer in clamshell mode as well. Vsync kept going.

I did the same for nightly and release, but firefox did not crash. I wonder if it's specific to a kind of monitor.
I don't think this has anything to do with the vsync code -- that's not yet in a release, is it?

https://crash-stats.mozilla.com/search/?signature=CVCGDisplayLink&date=%3E%3D2015-01-01&_facets=signature&_facets=platform_version&_facets=version&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-version

There have been lots of these over time, but their distribution is bizarre and very clumpy.  Note the huge number of crashes on FF 39.  I suspect that there are a small number of people who see *lots* of these crashes.

Note also that all the crashes happen on OS X 10.10.X.  So this seems to be some combination of a hardware problem and an OS bug.  I doubt there's anything we can do about it.
> but their distribution is bizarre and very clumpy.  Note the huge number of crashes on FF 39.

Actually, it may just be that FF 39 has had more exposure to users than FF 40 has yet had.

I don't know what it means that these crashes seem to have started with FF 39.
> I don't think this has anything to do with the vsync code -- that's not yet in a release, is it?

I take this back.  Mac vsync support was turned on in FF 39 (bug 1144321).  So *that*'s why these crashes started in FF 39.

I still doubt there's much we can do about them, though.
Mason, I suggest you try testing with the crappiest HDMI cable you can find :-)
I can contact a reporter of the crash reports. Does it help you if I do so? If so, what should I ask him/her?
> If so, what should I ask him/her?

Right now I'd say the following:

1) Is your monitor external?  What kind of connection do you have to it?

2) Do the crashes only happen when you disconnect/reconnect the monitor?  When you wiggle the connecting cable (presuming it's external)?  Or just randomly?
(Following up comment #16)

Oops, one more question:

3) Does your monitor have visible problems when the crashes happen?  (Like the picture being replaced by snow, temporarily or permanently.)
(Following up comment #16)

And yet one more:

4) Do the crashes also happen in Safari?
(In reply to Masayuki Nakano (:masayuki) (Mozilla Japan) from comment #15)
> I can contact a reporter of the crash reports. Does it help you if I do so?
> If so, what should I ask him/her?

The questions Steven asked all sound good to me.
This is getting ahead of the game, but ...

If we find one or more people for whom their connecting cable (presumably an HDMI cable) is the problem, Mozilla should trade them a fancy new one for the old crappy one, so we can test with it/them.
I've now asked him/her. However, as far as I know, the crash frequency isn't very high, so I'm not sure he/she will answer soon.
I got a lot of information from him!

1) Is your monitor external?  What kind of connection do you have to it?

Yes. He uses a third-party HDMI connector (it looks like the PLANEX PL-MDPHD02 <https://www.planex.co.jp/product/av/pl-mdphd02/>, but he is not sure because the setup is at his office; it is already midnight in Japan, so he cannot confirm the exact product). The display is probably an iiyama 24-inch monitor (perhaps with a resolution of 1920x1080).

2) Do the crashes only happen when you disconnect/reconnect the monitor?  When you wiggle the connecting cable (presuming it's external)?  Or just randomly?

He is not sure, but he probably didn't do that. However, at least one of the crashes occurred when he woke his MBA from sleep. He usually uses his MBA closed, with external input devices and the external monitor (mirroring mode). He also uses the hot corner feature (I'm not familiar with this feature, sorry). He said that the crash might have occurred while using a hot corner.

3) Does your monitor have visible problems when the crashes happen?  (Like the picture being replaced by snow, temporarily or permanently.)

Probably nothing occurred.

4) Do the crashes also happen in Safari?

He wasn't using Safari at the time of the crash. He was using Chrome, but Chrome didn't crash. Interestingly, he had launched 3 Firefox instances, all of them maximized. In the latest crash, only 2 of the instances crashed. The remaining one's window was the size of the MBA's screen; he said the window had perhaps been maximized before the external display was connected.
> He said that the crash might have occurred while using a hot corner.

Sorry, this is wrong. He said that he put his MBA to sleep with a hot corner; after that, when he tried to wake the MBA, 2 Firefox instances crashed.
FYI: "maximized" here does not mean the full screen mode of OS X. I meant the legacy green button's behavior.
Here's another question for your contact, Masayuki:

5) How long is his HDMI cable?

I understand that the performance of HDMI cables can degrade with length.

I'd also like to know if he crashes in Safari, and if wiggling the HDMI cable can trigger crashes (and display problems).
(In reply to Steven Michaud [:smichaud] from comment #25)
> Here's another question for your contact, Masayuki:
> 
> 5) How long is his HDMI cable?

1.5m (5 feet).

> I'd also like to know if he crashes in Safari, and if wiggling the HDMI
> cable can trigger crashes (and display problems).

He said he will keep Safari running until he reproduces the crash. However, the crash frequency isn't very high in his environment, so we probably need to wait a few days.
I'm particularly interested to know if he can reproduce the crashes by wiggling the HDMI cable, in Firefox or Safari.  If he can, it probably indicates some flaw in the cable.
Crash Signature: [@ CVCGDisplayLink::getDisplayTimes(unsigned long long*, unsigned long long*, unsigned long long*)] → [@ CVCGDisplayLink::getDisplayTimes(unsigned long long*, unsigned long long*, unsigned long long*)] [@ CVCGDisplayLink::getDisplayTimes]
Crash volume for signature 'CVCGDisplayLink::getDisplayTimes':
 - nightly (version 50): 2 crashes from 2016-06-06.
 - aurora  (version 49): 9 crashes from 2016-06-07.
 - beta    (version 48): 18 crashes from 2016-06-06.
 - release (version 47): 372 crashes from 2016-05-31.
 - esr     (version 45): 176 crashes from 2016-04-07.

Crash volume on the last weeks:
             Week N-1   Week N-2   Week N-3   Week N-4   Week N-5   Week N-6   Week N-7
 - nightly          0          1          0          1          0          0          0
 - aurora           1          1          3          1          0          0          0
 - beta             6          3          2          0          2          3          0
 - release         53         71         68         57         50         42         11
 - esr             24         18         17         22         16         21          8

Affected platform: Mac OS X
Crash volume for signature 'CVCGDisplayLink::getDisplayTimes':
 - nightly (version 52): 3 crashes from 2016-09-19.
 - aurora  (version 51): 6 crashes from 2016-09-19.
 - beta    (version 50): 5 crashes from 2016-09-20.
 - release (version 49): 62 crashes from 2016-09-05.
 - esr     (version 45): 408 crashes from 2016-06-01.

Crash volume on the last weeks (Week N is from 10-03 to 10-09):
            W. N-1  W. N-2
 - nightly       3       0
 - aurora        6       0
 - beta          5       0
 - release      53       9
 - esr          55      45

Affected platform: Mac OS X

Crash rank on the last 7 days:
           Browser     Content   Plugin
 - nightly #247
 - aurora  #182
 - beta    #1863
 - release #820
 - esr     #193
On September 8 the crash rate tripled, roughly the time of the release of Firefox 45.4.0esr (buildid 20160905130425).
Whiteboard: gfx-noted → [gfx-noted][tbird crash]
I'm not sure whether it's helpful data at all, but it's possible to get this crash stack in Thunderbird 45 as well: bp-592be2aa-9bca-4dd0-8f68-025d22170302
Too late for firefox 52, mass-wontfix.
I just hit this by:

- Computer is asleep (over weekend)
- Pull out mini-displayport from laptop (connected to 4k 24" Dell monitor)
- Opened laptop maybe 30 seconds later

And Nightly crashed.

I'm running Nightly from a few days ago (already updated so I don't know which) on macOS 10.12.5. I can PM you my crash report if you like.

---

fwiw, I frequently get Nightly crashes when waking my computer from sleep and I perform some monitor plug/unplug incantations but, looking at about:crashes, these all seem to be other issues.
In my case I resumed the laptop by opening the lid (I'm always connected to my monitor) and saw that Nightly had crashed:
https://crash-stats.mozilla.com/report/index/44e8ae52-8103-4765-8e5c-460710170726
(Follow-up to comment 33) fwiw, I probably crash at least once daily at work with this STR (I walk away from the computer once an hour throughout the work day and wake up my computer an equal number of times) which can be pretty annoying (especially because I'm a sucker and enter a comment every time).
Component: Graphics → Widget: Cocoa
Priority: P3 → P2
Whiteboard: [gfx-noted][tbird crash] → [gfx-noted][tbird crash][tpi:+]
I get this every other day or so, here's the most recent: https://crash-stats.mozilla.com/report/index/fff1e21a-cecb-4fc7-9e97-10c590171103

My normal workflow which reproduces: 
Be connected to 2 DELL U2413 displays via USB-C to HDMI dongles
Let the computer go to sleep overnight
Mash keys on my external BT keyboard the next morning to wake it up
(In reply to Michael Comella (:mcomella) from comment #33)
> I just hit this by:
> 
> - Computer is asleep (over weekend)
> - Pull out mini-displayport from laptop (connected to 4k 24" Dell monitor)
> - Opened laptop maybe 30 seconds later
> 
> And Nightly crashed.

Note: I no longer get this with the older, non-4K 24" Dell (U2410?) over mini-DisplayPort.
I'm seeing this crash on Firefox almost every day when my display is back from sleep, and today Thunderbird also crashed at the same time.

Firefox 57: bp-ae2625f0-9400-486b-a1ea-f79401171126
Thunderbird 52: bp-8095f8ba-fb5d-455b-a90b-1252a0171126
#5 crash for mac on tb52.4.0.

There is a history of mac sleep crash issues: bug 1320048, Bug 1112741, Bug 1168227, Bug 1406245
Whiteboard: [gfx-noted][tbird crash][tpi:+] → [gfx-noted][tbird topcrash][tpi:+]
Somehow Firefox 57.0.1 and 57.0.2 weren't crash-y but I'm often seeing this crash after updating to 57.0.3. The latest ID is bp-c5e54d6f-e92d-486a-9f08-19dc31180102. My MacBook Pro (Retina, 15-inch, Mid 2014) is connected to Samsung 28" 4K UHD monitor, and IIRC another Firefox instance on MBP's built-in display has never crashed, so I'm sure this bug is related to external monitors.
This reporter sees it in both Firefox and Thunderbird.  bp-ce6ae4a6-fc16-4671-b196-4eada0171229 Firefox 57.0.1
 0 	CoreVideo	CVCGDisplayLink::getDisplayTimes(unsigned long long*, unsigned long long*, unsigned long long*)	
1 	CoreVideo	CVHWTime::update(double, bool*, bool*)	
2 	CoreVideo	CVXTime::update()	
3 	CoreVideo	CVDisplayLink::runIOThread()	
4 	libsystem_pthread.dylib	_pthread_body	
5 	libsystem_pthread.dylib	_pthread_start	
6 	libsystem_pthread.dylib	thread_start	
7 	CoreVideo	CVDisplayLink::start()	

(In reply to Kohei Yoshino [:kohei] from comment #41)
> ...  The latest ID is bp-c5e54d6f-e92d-486a-9f08-19dc31180102. 
Your stack is the same
Summary: crash in CVCGDisplayLink::getDisplayTimes(unsigned long long*, unsigned long long*, unsigned long long*) → crash in CVCGDisplayLink::getDisplayTimes(unsigned long long*, unsigned long long*, unsigned long long*) Mac coming out of sleep
This is the #2 Mac topcrash on Nightly 20180209102946.
I just came across this article[1], which referenced CVDisplayLinkCreateWithActiveCGDisplays. A quick search in dxr revealed a comment[2] that seems to be discussing a situation that could cause problems here. Personally, I've observed this crash after setting my MacBook in hibernation, attaching external monitors with different display rates than the internal monitor and then waking the MacBook. Markus, I see that you helped review this code way back in bug 552020. Do you have any thoughts here?

[1] http://thume.ca/2017/12/09/cvdisplaylink-doesnt-link-to-your-display/
[2] https://dxr.mozilla.org/mozilla-central/source/gfx/thebes/gfxPlatformMac.cpp#435-439
Flags: needinfo?(mstange)
(In reply to Nicholas Nethercote [:njn] from comment #43)
> This is the #2 Mac topcrash on Nightly 20180209102946.

Looking at the crash signature graph at the top of the bug, crashes with
this signature appear to have started again in the last week of 2017
and continue to happen.  Curiously the crash rates seem to have a
very marked weekly cycle, judging from the graph.
The graph only starts at the end of December because in bug 1427111 we purged all reports recorded before that date.
This crash signature has been around for longer than that.
#5 Firefox crash for Mac.
Still #2 crash for Thunderbird
Summary: crash in CVCGDisplayLink::getDisplayTimes(unsigned long long*, unsigned long long*, unsigned long long*) Mac coming out of sleep → crash in CVCGDisplayLink::getDisplayTimes Mac coming out of sleep (waking)
This is next on my list of things to tackle/investigate after landing bug 860493.
This crash happens to me often, I use Firefox on a MacBook Pro that is attached to 2 external monitors and I put the MBP to sleep & detach it from the monitors every evening. If I can help with info or testing please let me know.
(In reply to Kohei Yoshino [:kohei] from comment #41)
> Somehow Firefox 57.0.1 and 57.0.2 weren't crash-y but I'm often seeing this
> crash after updating to 57.0.3. The latest ID is
> bp-c5e54d6f-e92d-486a-9f08-19dc31180102. My MacBook Pro (Retina, 15-inch,
> Mid 2014) is connected to Samsung 28" 4K UHD monitor, and IIRC another
> Firefox instance on MBP's built-in display has never crashed, so I'm sure
> this bug is related to external monitors.

Firefox is not crashy these days but instead Thunderbird on my *built-in* display starts crashing after upgrading to 52.8.0. The latest report is bp-a09d691e-5ff1-42a1-a82c-f56720180603.
(In reply to Kohei Yoshino [:kohei] from comment #50)
> (In reply to Kohei Yoshino [:kohei] from comment #41)
> > Somehow Firefox 57.0.1 and 57.0.2 weren't crash-y but I'm often seeing this
> > crash after updating to 57.0.3. The latest ID is
> > bp-c5e54d6f-e92d-486a-9f08-19dc31180102. My MacBook Pro (Retina, 15-inch,
> > Mid 2014) is connected to Samsung 28" 4K UHD monitor, and IIRC another
> > Firefox instance on MBP's built-in display has never crashed, so I'm sure
> > this bug is related to external monitors.
> 
> Firefox is not crashy these days 

Just to be clear, Firefox does have crashes - bp-5ba8b407-a8f1-4d69-ba4e-856ed0180602 62.0a1 


> instead Thunderbird on my *built-in* display starts crashing after upgrading to 52.8.0. The latest report is
> bp-a09d691e-5ff1-42a1-a82c-f56720180603.

So you have no external monitors, nor monitor changes, in the wake/sleep period?

Gotta wonder why Thunderbird has higher crash rate.  FWIW, Thunderbird does have HWA disabled by default.
Flags: needinfo?(kohei.yoshino)
Summary: crash in CVCGDisplayLink::getDisplayTimes Mac coming out of sleep (waking) → crash in CVCGDisplayLink::getDisplayTimes Mac coming out of sleep (waking) with external monitor
I do have a Samsung 28" 4K UHD monitor as said before. Firefox is on the external monitor, while Thunderbird is on the built-in monitor. I was surprised to see the crash trend changed in my setting.
Flags: needinfo?(kohei.yoshino)
As mentioned in comment 44, I'm able to reproduce reliably and I think I have a lead. Investigating.
Flags: needinfo?(mstange)
I encounter this bug daily when I bring my MacBook Pro out of sleep. It is connected to an external monitor via mini DisplayLink.
(In reply to Dave from comment #54)
> I encounter this bug daily when I bring my MacBook Pro out of sleep. It is
> connected to an external monitor via mini DisplayLink.

Mini DisplayPort, that is.
(In reply to Stephen A Pohl [:spohl] from comment #48)
> This is next on my list of things to tackle/investigate after landing bug 860493.

good news?
Flags: needinfo?(spohl.mozilla.bugs)
(In reply to Wayne Mery (:wsmwk) from comment #56)
> (In reply to Stephen A Pohl [:spohl] from comment #48)
> > This is next on my list of things to tackle/investigate after landing bug 860493.
> 
> good news?

I don't have any news to report yet as I've been sidetracked with a few other bugs. I'm hoping to get back to this soon.
Flags: needinfo?(spohl.mozilla.bugs)
This is currently the #7 top browser crash on nightly, although it only has 65 crashes in the last 7 days.
This bug is overall the #25 top crasher on nightly, and almost consistently the top Mac crash on beta and release. All the comments are pretty consistent, and the crash seems to happen most often when waking from sleep.
I am suffering from this issue. On my two Macbooks (both 2017) Firefox crashes multiple times a day and is completely unstable. My use-case is that I use my laptop in clamshell mode as a desktop at my desk and then open as a normal laptop when I'm away, so I'm connecting/disconnecting an external display several times a day.

Unfortunately I will need to cease using Firefox for now due to its instability, which is a shame as it's my favorite browser. I will follow this thread to help test potential fixes to hopefully return to using Firefox when this bug is resolved.
Duplicate of this bug: 1493534
I think my crash matches this:

crash bp-5e470d5a-a6eb-4cef-b791-7b3440181025
Firefox: 63.0
MacOS: 10.13.6
Duplicate of this bug: 1513257

Having this issue with 64.0.2 on my work Mac. I have one external monitor (Dell U2414H) connected over DP <-> USB-C. Firefox crashes a few times a day after I resume the laptop from sleep (when back from lunch). The lid is always closed.

Going to swap the display and see if the problem keeps happening (Dell U2718Q).

I have the same use-case as you and same symptoms. I’ve tried multiple displays and it doesn’t matter. Firefox is just completely unreliable on macOS. It’s a shame this bug has existed for over three years with no fix.

Can reproduce with 64.0.2 on a 2017 MacBook Pro with two external Dell P2715Q monitors connected via USB-C -> DisplayPort. Firefox crashes when I unlock the machine after the displays have been off (the computer itself doesn't need to have been asleep to reproduce this, just the displays.)

I love Firefox, but this one bug is making my daily experience with it extremely frustrating.

(In reply to Josh Dick from comment #67)

> Can reproduce with 64.0.2 on a 2017 MacBook Pro with two external Dell P2715Q monitors connected via USB-C -> DisplayPort. Firefox crashes when I unlock the machine after the displays have been off (the computer itself doesn't need to have been asleep to reproduce this, just the displays.)

This is definitely new information. Could you provide the exact steps to reproduce? It would be great if your steps included starting Firefox, turning off displays, locking/unlocking the machine and anything else that is necessary to reproduce the issue reliably from start to finish. For example, does it matter what screen Firefox is on to reproduce? Thank you!

Flags: needinfo?(josh)

(In reply to Stephen A Pohl [:spohl] from comment #68)

> (In reply to Josh Dick from comment #67)
> > Can reproduce with 64.0.2 on a 2017 MacBook Pro with two external Dell P2715Q monitors connected via USB-C -> DisplayPort. Firefox crashes when I unlock the machine after the displays have been off (the computer itself doesn't need to have been asleep to reproduce this, just the displays.)
> 
> This is definitely new information. Could you provide the exact steps to reproduce? It would be great if your steps included starting Firefox, turning off displays, locking/unlocking the machine and anything else that is necessary to reproduce the issue reliably from start to finish. For example, does it matter what screen Firefox is on to reproduce? Thank you!

Unfortunately I can't reproduce the issue 100% reliably, but I can reproduce it for roughly 2 out of every 3 screen wakes. I'll document some steps to the best of my ability. This is all on a 2017 15-inch MacBook Pro running macOS Mojave 10.14.3 and Firefox 64.0.2, but I previously had the same issue with a 2015 MacBook Pro. As I said before, I have two external Dell P2715Q monitors connected via USB-C -> DisplayPort, using the two USB-C ports on the left side of the computer (when looking at its built-in display.)

These steps assume the following:

a) The computer is running in clamshell mode/lid closed, using only the external displays, with external power connected. I normally keep Firefox on the display that is configured as secondary since only one display can be primary, but I doubt that which of the two displays Firefox is shown on makes any difference.

b) The computer is configured in System Preferences -> Energy Saver to "Prevent computer from sleeping automatically when the display is off" while connected to external power.

c) You have some way to sleep the displays without sleeping the computer. I normally do this via a Hot Corner, configured in System Preferences -> Desktop & Screen Saver -> Screen Saver tab -> Hot Corners... and configuring and using a "Put Display to Sleep" hot corner. Using pmset displaysleepnow in the terminal should be exactly the same, but I reproduce this daily using a hot corner.

Finally, here are the steps:

  1. Open Firefox on the secondary display (the one that doesn't have the Dock.) Use it for normal browsing for an hour or so.

  2. Sleep the displays as described above, without sleeping the computer. Right before lunch is a great time for this. :)

  3. Wait 30-60 minutes. The computer should remain awake and idle, and the displays should remain asleep.

  4. Wake up the computer by pressing a keyboard key and log in.

  5. Firefox will have crashed and be showing a crash report window.

I hope this information helps in further investigating this issue.
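For anyone who wants to script step 2 of the steps above, here is a minimal sketch that puts the displays to sleep with `pmset displaysleepnow`, the terminal command mentioned in (c). It is an assumption that this behaves the same as the hot corner (the steps above were only exercised with a hot corner), and the function name `sleep_displays` is hypothetical. Waking the machine and logging in remain manual.

```python
import subprocess
import sys

# Command named in the steps above; it sleeps the displays, not the computer.
DISPLAY_SLEEP_COMMAND = ["pmset", "displaysleepnow"]


def sleep_displays(dry_run=False):
    """Sleep the displays on macOS and return the argv that was (or would be) run.

    With dry_run=True, or on a non-macOS platform, nothing is executed.
    """
    if not dry_run and sys.platform == "darwin":
        subprocess.run(DISPLAY_SLEEP_COMMAND, check=True)
    return list(DISPLAY_SLEEP_COMMAND)


if __name__ == "__main__":
    argv = sleep_displays()
    print("Ran (or would run):", " ".join(argv))
    print("Now wait 30-60 minutes, wake the displays with a key press, and log in.")
```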

Flags: needinfo?(josh)

(In reply to Stephen A Pohl [:spohl] from comment #68)

> (In reply to Josh Dick from comment #67)
> > Can reproduce with 64.0.2 on a 2017 MacBook Pro with two external Dell P2715Q monitors connected via USB-C -> DisplayPort. Firefox crashes when I unlock the machine after the displays have been off (the computer itself doesn't need to have been asleep to reproduce this, just the displays.)
> 
> This is definitely new information. Could you provide the exact steps to reproduce? It would be great if your steps included starting Firefox, turning off displays, locking/unlocking the machine and anything else that is necessary to reproduce the issue reliably from start to finish. For example, does it matter what screen Firefox is on to reproduce? Thank you!

More info that might help: The two monitors I'm using, Dell P2715Q monitors connected via USB-C -> DisplayPort, are both 4K monitors, and I run both at the same scaled resolution ("Looks like 2560 by 1440".)

(In reply to Michael S from comment #65)

been running this experiment for 2 weeks now. Not a single crash. I do notice that the current display fails to resume from sleep some times and I need to turn it off and on again but Firefox itself never crashed. I'm going to run with this display for another week and switch back to my old display and see if problem returned.

(In reply to Michael S from comment #71)

> (In reply to Michael S from comment #65)
> 
> been running this experiment for 2 weeks now. Not a single crash. I do notice that the current display fails to resume from sleep some times and I need to turn it off and on again but Firefox itself never crashed. I'm going to run with this display for another week and switch back to my old display and see if problem returned.

Do you have the ability to use two displays at once? That seems to be a factor.

I do, but in my case, the issue happens with a single display.

For me, this bug happens at least every other day, or more frequently.

I am using the Mac in clamshell mode, plugged into a Thunderbolt dock. I have one display plugged into the thunderbolt dock, a 4K ultrawide display.

This bug seems to happen most often when I remove it from the dock and go back to using just the laptop screen or switching back to display plugged into the dock. It's almost like Firefox can't handle transitioning from one display to the other.

Thunderbird has this happen too.

I've also tried USB-C hubs, same problem.

No longer a topcrash for Thunderbird - ranks #46 for 60.4.0.

Whiteboard: [gfx-noted][tbird topcrash][tpi:+] → [gfx-noted][tbird crash][tpi:+]

> No longer a topcrash for Thunderbird - ranks #46 for 60.4.0.

But it does rank #3 for Mac Thunderbird.

And along the lines of comment 59, for Firefox 64.0.2 it is #1 Mac crash

Keywords: topcrash-mac

Adding 67 as affected. This continues to be the top Mac crash in most releases.

An additional datapoint.

Using the same setup I described previously (two Dell P2715Q 4K monitors, both running at the same scaled resolution ("Looks like 2560 by 1440")), I switched from using USB-C -> DisplayPort cables that I believe contain no active electronics, to using USB-C -> HDMI cables that do seem to contain active electronics on the HDMI end, and the crashes have completely stopped for me. I don't think the previous DisplayPort cables were defective, since everything else on the Mac worked fine when using those cables.

I just recently switched from USB-C to Display Port cables to now using standard HDMI (single monitor) and I still get the crashes unfortunately. Both Firefox and Thunderbird crash every day for me, and are completely unstable. On my Ubuntu machine, same dock, same cables, same setup, same monitors, I have no issue whatsoever. Basically, I have one dock I plug in between my Ubuntu laptop and Macbook. I unplug one laptop from the dock and switch to the other periodically throughout the day. Ubuntu never has a single issue with this.

Adding the stalled keyword to this bug. On nightly 68 this is the #23 overall top crash.

Keywords: stalled

Person on Thunderbird support forum posted issue with Thunderbird crashing upon waking computer.
https://support.mozilla.org/en-US/questions/1259347

Submitted crash report:
https://crash-stats.mozilla.com/report/index/d4a7b177-0abd-45df-a4a5-8a9e90190518#tab-bugzilla

OS X 10.14
TB version 60.6.1
Crash Reason EXC_BAD_ACCESS / KERN_INVALID_ADDRESS

https://support.mozilla.org/en-US/questions/1260299
Product Firefox
Release Channel release
Version 67.0
Build ID 20190516215225 (2019-05-16)
OS OS X 10.14
OS Version 10.14.5 18F132

bp-6ec4ea83-017d-4df4-88fe-c47c50190527
Signature: CVCGDisplayLink::getDisplayTimes

Crash Reason EXC_BAD_ACCESS / KERN_INVALID_ADDRESS

Just to add some more info, I have this issue as well. I have my MBP (Mojave 10.14.5) open as a side monitor, using an external ASUS VE278 (27", 1920 x 1080) with a USB-C connector to a 3 ft long HDMI cable as my main display. I notice that after I sleep the machine for an extended period of time (overnight, or leaving for a few hours) and then come back and wake it up, the only thing that has closed unexpectedly is Firefox (v 67.0.4). Safari, Chrome, and all other apps remain open and unaffected.

https://crash-stats.mozilla.org/report/index/4c586e72-4ad6-4b1c-a339-617c00190703

(In reply to denis.kosovich from comment #86)

This is just unbelievable! This issue was opened four years ago!!!
:( :( :(

https://crash-stats.mozilla.org/report/index/b973c380-1642-48ca-9011-68fd60190819

This is now the #6 crash for Nightly (Firefox).

Stephen, is this something you might look into again? Since the problem is getting worse maybe something critical has changed. Following up in email.

Some information I haven't seen on this bug yet is if those who experience the crash actually use the integrated or discrete GPU. Maybe this is related to the discrete GPU, and only visible on MBP starting from 15". With the 13" MBP, which only has the integrated GPU I never had this crash, and I'm using an external monitor each day.

So if you experience this crash please have a look at the following page, and check which GPU is used for Firefox:
https://support.apple.com/en-us/HT202053

Based on crash-stats 40% of all the crashes happen with the following graphic adapter:

Baffin [Radeon RX 460/560D / Pro 450/455/460/555/555X/560/560X] (0x67ef)

So if anyone notices that the discrete GPU is in use please try to disable it, and force the Mac to use the internal GPU. Try and check if that maybe fixes the crash after sleep.

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+2] from comment #89)

> Some information I haven't seen on this bug yet is if those who experience the crash actually use the integrated or discrete GPU. Maybe this is related to the discrete GPU, and only visible on MBP starting from 15".

I have a 15" MBP with integrated Intel Iris Pro only, and I used to run into this crash quite regularly before switching to Chrome about a year ago (not because of this issue but for performance / GPU temp reasons).

Crash stats shows AMD Baffin [Radeon RX 460/560D / Pro 450/455/460/555/555X/560/560X] (0x67ef) is the most common Graphics Adapter affected, with over 40% of the crashes coming from that version. Various Intel adapters such as Crystal Well Integrated Graphics Controller (0x0d26) only account for about 8% of the crashes over a 6 month span.

Hi Marcia, Henrik and Liz. This has long bugged me, though I've never been able to reproduce it (I failed again just now). But I might be able to get somewhere by using my HookCase (https://github.com/steven-michaud/HookCase) to learn more about how CVDisplayLink::start() and CVCGDisplayLink::getDisplayTimes() are supposed to work.

This is a very complex problem, and I won't be working on it full time (since I'm now retired). So don't expect me to come up with a solution quickly. But this kind of problem is just the thing HookCase is best at, so it seems a shame not to try it out here.

I have this problem on a 13” MacBook Pro. It doesn’t happen as often but it does happen. It seems to happen the most often when connecting or disconnecting a thunderbolt dock. The thunderbolt dock doesn’t have a GPU. So it’s definitely not specific to the 15” model.

I have this problem (last happened 4 days ago) on a Mid 2015 MacBook Pro 15" which only has an integrated GPU.

By assigning this bug to myself, I don't mean to stop other people from working on it. I do mean to show that I'll be spending serious time on it over the next few weeks.

Assignee: nobody → smichaud

(In reply to Steven Michaud [:smichaud] (Retired) from comment #93)

> Hi Marcia, Henrik and Liz. This has long bugged me, though I've never been able to reproduce it (I failed again just now). But I might be able to get somewhere by using my HookCase (https://github.com/steven-michaud/HookCase) to learn more about how CVDisplayLink::start() and CVCGDisplayLink::getDisplayTimes() are supposed to work.

> This is a very complex problem, and I won't be working on it full time (since I'm now retired). So don't expect me to come up with a solution quickly. But this kind of problem is just the thing HookCase is best at, so it seems a shame not to try it out here.

Thanks so much Steven for taking time to look into this! We really appreciate your efforts.

I've discovered that the CVDisplayLink methods work a little strangely (and inefficiently) in Firefox as compared with Safari and Chrome. So it's possible that cleaning this up a bit may make this bug go away. In the next day or two I'll come up with a patch to do this, and (if I still have access to it) do a tryserver build. Once I get a tryserver build I'll ask people who can reproduce this bug to test it for a while, to see if my patch "fixes" this bug (i.e. works around it, since it's almost certainly an Apple bug).

But tryserver builds are made on the trunk (on the mozilla-central branch). So first I need to get people to test an unaltered mozilla-central nightly build. Here's a link to today's mozilla-central nightly:

http://ftp.mozilla.org/pub/firefox/nightly/2019/08/2019-08-20-09-38-33-mozilla-central/firefox-70.0a1.en-US.mac.dmg

Whoever can reproduce this bug at all reliably, please download this build and try it out for a day or two. Post your results here. If Josh Dick is still around (and still sees this bug), I'd particularly like to hear from him.

I hope and expect that you will still see this bug using the mozilla-central nightly I linked above. Otherwise there isn't much point to my doing a tryserver build.

You're most welcome, Marcia :-) Mozilla bugs are more fun now that I don't have to live with and breathe them 24-hours a day. It's also nice to get a chance to put my HookCase debugging tool through the paces.

I think I've figured out the proximate cause of this bug's crashes:

They happen dereferencing a variable at offset 0x570 (on macOS 10.13.6) in a CVCGDisplayLink object. This variable is set in CVCGDisplayLink::setCurrentDisplay(CGDirectDisplayID displayID). So if CVCGDisplayLink::getDisplayTimes() is called on an object before CVCGDisplayLink::setCurrentDisplay() is called on it, its value is still NULL, and you get a crash trying to access data at offset 0x40.
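A minimal sketch of this failure mode, in portable C++. The class, its member names and the DisplayState struct are hypothetical stand-ins, not Apple's actual layout; the only thing taken from the description above is that one method sets a pointer and another reads through it. The guard in getDisplayTimes() is what the real code appears to lack.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical stand-in for whatever the internal pointer refers to.
struct DisplayState {
  uint64_t refreshPeriod = 0;
};

// Hypothetical model of the object described above.
class FakeDisplayLink {
 public:
  // Models setCurrentDisplay(): the only place the pointer gets set.
  void setCurrentDisplay(DisplayState* state) {
    assert(state != nullptr);
    mState = state;
  }

  // Models getDisplayTimes() with a guard added. Without the check,
  // reading mState->refreshPeriod on a fresh object dereferences NULL.
  std::optional<uint64_t> getDisplayTimes() const {
    if (!mState) {
      return std::nullopt;
    }
    return mState->refreshPeriod;
  }

 private:
  DisplayState* mState = nullptr;  // Stays NULL until setCurrentDisplay().
};
```

Calling getDisplayTimes() on a freshly constructed FakeDisplayLink returns an empty optional here, where the unguarded equivalent would crash.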

Of course, knowing this doesn't (by itself) tell me how to work around Apple's bug. But at least it's an important clue.

Steven,

I'm indeed still around, but unfortunately can't reproduce this anymore, since I no longer have the 2017 15" MacBook Pro that I was previously able to somewhat-reliably reproduce this with.

FWIW: That computer was owned by my employer and has since been swapped out for a 2018 15" MacBook Pro, which I have yet to see this crash on, though it is now connected to lower-resolution displays (1440p, 2560 × 1440, instead of 4K). I also have a 2018 Mac Mini connected to the aforementioned 4K monitors via USB-C -> DisplayPort cables, and have seen this crash happen maybe once ever on it.

In any case, thanks for taking the time and effort to investigate this once again!

Status: NEW → ASSIGNED

I've come up with some very limited changes that might make a difference here. But when I pushed them to try I got the following error message:

remote: smichaud@pobox.com@hg.mozilla.org: Permission denied (publickey).

Presumably that means I no longer have permission to use the tryserver. Anyone know who I should contact about this? I'm hoping that you'll know, Liz, or know who to ask.

Flags: needinfo?(lhenry)

(In reply to Steven Michaud [:smichaud] (Retired) from comment #101)

> I've come up with some very limited changes that might make a difference here. But when I pushed them to try I got the following error message:

> remote: smichaud@pobox.com@hg.mozilla.org: Permission denied (publickey).

> Presumably that means I no longer have permission to use the tryserver. Anyone know who I should contact about this? I'm hoping that you'll know, Liz, or know who to ask.

Steven: I filed Bug 1576632 to have Infra help with this.

Thanks, MattN, for pointing me in the right general direction. And thanks, Marcia for finding the right people to ask. I've been able to push my test patch to the tryserver, and the build has completed.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=4b8b27e028fb521121d5a961e2eeb6c9ff8f6c58

But now I don't know where to look for the build to test with. It seems tryserver builds are no longer being copied to https://ftp.mozilla.org/pub/firefox/try-builds/, and I can't find any Mozilla documentation that tells me the new location. Nor can I find any link on the above page that points to it.

If you click on the B for the build you want, then Job Details, then look for target.dmg.

Thanks, Timothy!

So here's the optimized build made with my patch (optimized as opposed to debug):
https://queue.taskcluster.net/v1/task/WMRaE5GdS0yW_Ig6oTlW9A/runs/0/artifacts/public/build/target.dmg

Do you have any idea how long these builds stay available in this location? Is it a few hours, a few days, or a few weeks?

I'm not sure, it's at least a few days, probably a few weeks.

(In reply to Timothy Nikkel (:tnikkel) from comment #107)

> I'm not sure, it's at least a few days, probably a few weeks.

Thanks! I just found an extant target.dmg from a tryserver build made on 2019-08-08, which seems to point in the direction of "a few weeks".

So ...

I have a request for whoever still sees this bug at all regularly (preferably several times a day). Please download the following build and test with it for at least a few days. Do with it whatever you normally do, and report back with your results. With luck it fixes this bug.

https://queue.taskcluster.net/v1/task/WMRaE5GdS0yW_Ig6oTlW9A/runs/0/artifacts/public/build/target.dmg

Above (in comment #98) I asked you to test first with a current mozilla-central nightly (the build to which my patch, which may fix this bug, was added). I'd assumed that this bug might not necessarily happen in (unaltered) mozilla-central nightly builds. But that turns out to be wrong. https://crash-stats.mozilla.com shows that a lot of this bug's crashes happen in mozilla-central nightlies. So please just test target.dmg.

https://crash-stats.mozilla.com/signature/?product=Firefox&version=70.0a1&signature=CVCGDisplayLink%3A%3AgetDisplayTimes&date=%3E%3D2019-08-19T23%3A07%3A00.000Z&date=%3C2019-08-26T23%3A07%3A00.000Z&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&_columns=install_time&_columns=startup_crash&_columns=platform_version&_sort=-date&page=1

ni on Michael and Jay since they said they could reproduce it. Please see Comment 109 for instructions. Thanks!

Flags: needinfo?(reg1)
Flags: needinfo?(michael.weibel)

I downloaded the mentioned build and am running it now side-by-side with my normal instance for the next few days 👍

Flags: needinfo?(michael.weibel)

On my end it happens when I connect/disconnect a USB-C or Thunderbolt dock with a monitor attached, so I will try to see if there's a way to reproduce it.

Flags: needinfo?(reg1)

I've been trying to use HookCase to trigger these crashes. The basic idea is to hook a system call and make it behave incorrectly. So far it hasn't worked, but it might help if I knew on which URLs these crashes were most likely to happen. That is if there actually is any pattern to these crashes' URLs. But I don't have permission to view the crash URLs in https://crash-stats.mozilla.com/, or even to view the comments.

Marcia, I believe you have these permissions. At least you used to. Could you look at the URLs and see if there's some pattern to them. And if so, could you list 10 or so of the most common? Thanks in advance!

Flags: needinfo?(mozillamarcia.knous)

There's no particular pattern in the URLs that get recorded with the crashes - they are just some popular sites you'd expect to show up frequently:

1 https://mail.google.com/mail/u/0/#inbox 805 2.92 %
2 about:home 472 1.71 %
3 about:newtab 414 1.50 %
4 https://www.facebook.com/ 149 0.54 %
5 about:blank 104 0.38 %
6 about:sessionrestore 103 0.37 %
7 https://web.whatsapp.com/ 92 0.33 %
8 https://mail.google.com/mail/u/1/#inbox 85 0.31 %
9 https://inside.amazon.com/en/Pages/default.aspx 78 0.28 %
10 https://www.youtube.com/ 62 0.22 %

Some of the recent comments:

  • Plugged in my mpb into a dock
  • Waking my computer up from sleep and the error appeared as soon as the screen came on.
  • I was away from my MacBook laptop Pro when this happened. Computer was asleep or in lock screen mode. This has happened before.
  • Crashed after unlocking the screen
  • The browser crashes as soon as I unplug my external screens.
  • Turned external monitor off and on again
  • MacBook went to sleep (?) and found Firefox crashed after unlocking the screen
  • this is the second time this has happened in the same day where after my MacBook pro 2017 in clam shell mode using a 40” 4k monitor was
  • From what I can tell, I disconnected my MacBook Pro running MacOS Mojave 10.14.6 from an external monitor and unplugged the power cable (it was 100% charged). After a couple of hours, I connected to power and a different monitor, and Firefox had crashed.
Flags: needinfo?(mozillamarcia.knous)

Thanks, phillipp. As you suspected, the information I asked for isn't very helpful.

But I'm slowly making progress. I've now discovered that making the following internal API return NULL on every tenth call is a thoroughly reliable way of triggering this bug's crashes -- and only this bug's crashes. Doing that also doesn't seem to bother Safari or Google Chrome (it doesn't cause crashes in either).

__ZL32get_current_display_system_statev
get_current_display_system_state()

This method is called often, and in many different contexts. It's also carefully written to minimize the possibility that it will return NULL. You'd think that messing with it like I've done would cause a much larger variety of crashes, in all applications. But very mysteriously it doesn't.
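For readers unfamiliar with this kind of fault injection, here is a minimal sketch of the "return NULL on every tenth call" idea in portable C++. It is not HookCase code and does not touch the real get_current_display_system_state(); FaultInjector and DisplaySystemState are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the object the real function returns.
struct DisplaySystemState {
  int activeDisplays = 1;
};

// Wraps a lookup and makes every Nth call fail by returning nullptr.
class FaultInjector {
 public:
  explicit FaultInjector(std::size_t failEvery) : mFailEvery(failEvery) {
    assert(failEvery > 0);
  }

  DisplaySystemState* get() {
    ++mCalls;
    if (mCalls % mFailEvery == 0) {
      return nullptr;  // Injected failure on calls N, 2N, 3N, ...
    }
    return &mState;    // Normal path.
  }

  std::size_t calls() const { return mCalls; }

 private:
  std::size_t mFailEvery;
  std::size_t mCalls = 0;
  DisplaySystemState mState;
};
```

With FaultInjector(10), calls 10, 20, 30 and so on return nullptr and all others return a valid pointer, which makes any caller that forgets to check the result fail on a predictable schedule.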

Unfortunately, my patch doesn't stop these crashes. My altered target.dmg build crashes just as much as do unaltered mozilla-central nightlies, at least in my tests. So I suspect that the people testing my patch will still see them. Do keep testing, though, and let us know your results.

I think I've demonstrated that problems with get_current_display_system_state() are very likely at the root of this bug's crashes. Now I need to figure out how to insulate Firefox from these problems.

The get_current_display_system_state() method is in the SkyLight private framework on Mojave and HighSierra. So far I've only tested on Mojave.

Same results on High Sierra. I did see one "problem with tab" message in Safari, but it didn't bring down the whole app. That's presumably because, in Safari, graphics rendering takes place in a separate process.

I think I've found a real fix for this bug. As best I can tell, I've found Apple's bug and have a well-targeted workaround for it. I need to do a bit more testing, though. I expect I'll post it sometime tomorrow, together with another tryserver build.

I've started a tryserver build. I expect it will run for at least several hours, and possibly overnight.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=8cc8ebc9191252cfeb5231570ab019fc3acab79f

(In reply to Steven Michaud [:smichaud] (Retired) from comment #118)

> I've started a tryserver build. I expect it will run for at least several hours, and possibly overnight.

> https://treeherder.mozilla.org/#/jobs?repo=try&revision=8cc8ebc9191252cfeb5231570ab019fc3acab79f

Build link https://queue.taskcluster.net/v1/task/Br60USjLShqQD5zNk_BDCw/runs/0/artifacts/public/build/target.dmg

Thanks, Timothy. But I'd prefer if people waited a bit to test with it. I'm having trouble devising a good test (via HookCase) of whether or not it works properly. With luck I'll have figured that out by tomorrow. Then I can explain what I've done.

I just discovered that Firefox Nightly.app from the tryserver builds' target.dmg won't run when you double-click on it. It hasn't been signed :-(

Up till now I've been running them from the command line, which gets around the problem. I wonder why our testers haven't mentioned that.

You can also open them by context clicking and choosing open from the context menu.

Here's my description of Apple's bug, and how my latest patch works around it:

As I mentioned above (in comment #99), these crashes happen when a CVCGDisplayLink object's pointer variable is dereferenced, in CVCGDisplayLink::getDisplayTimes(), before it's had a chance to be initialized (while it's still NULL). This variable is set in CVCGDisplayLink::setCurrentDisplay(CGDirectDisplayID displayID), and at first I thought that this method was being called out of order, or being skipped entirely. That's not the case. In fact setCurrentDisplay() returns early when its call to CGDisplayIDToOpenGLDisplayMask() fails (returns 0), skipping over the code that sets the pointer that's later dereferenced in getDisplayTimes().

setCurrentDisplay() does return an error ( kCVReturnInvalidDisplay) when its call to CGDisplayIDToOpenGLDisplayMask() fails. The bug is that its caller, CVCGDisplayLink::initWithCGDisplays(), doesn't itself return an error when this happens. initWithCGDisplays() succeeds, as does the method that calls it (CVDisplayLinkCreateWithCGDisplays(), in turn called by CVDisplayLinkCreateWithActiveCGDisplays()). Apple's code then blithely continues to use the CVCGDisplayLink object until getDisplayTimes() tries to dereference its uninitialized (NULL) pointer.

CGDisplayIDToOpenGLDisplayMask() fails (returns 0) when its call to get_current_display_system_state() returns NULL (an error condition). As I mentioned above in comment #115, get_current_display_system_state() is carefully written to minimize the possibility that it will return NULL. That it can still do so appears to be at the root of this bug's crashes. But I wasn't able to figure out why it can sometimes return NULL, and so I'm not able to work around this bug by somehow guaranteeing that this can't happen. Instead I've decided to call CGDisplayIDToOpenGLDisplayMask() directly in XUL's OSXVsyncSource::OSXDisplay::EnableVsync(), and do an early return whenever CGDisplayIDToOpenGLDisplayMask() returns 0. This avoids calling CVDisplayLinkCreateWithActiveCGDisplays() or CVDisplayLinkCreateWithCGDisplays() in conditions that will inevitably lead to crashes in CVCGDisplayLink::getDisplayTimes().
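The chain described above can be modelled in a few lines of portable C++. This is only a sketch under stated assumptions: FakeSystem, FakeLink and the free functions are hypothetical stand-ins named after the roles of the real functions, and none of it calls CoreVideo or CoreGraphics. It shows the error being swallowed by the caller, and the pre-check that the workaround adds in front of link creation.

```cpp
#include <cassert>
#include <cstdint>

using DisplayID = uint32_t;
using DisplayMask = uint32_t;

// Hypothetical model of the system state lookup. Setting stateAvailable to
// false simulates get_current_display_system_state() returning NULL.
struct FakeSystem {
  bool stateAvailable = true;

  // Plays the role of CGDisplayIDToOpenGLDisplayMask(): 0 means failure.
  DisplayMask displayMaskFor(DisplayID id) const {
    return stateAvailable ? (1u << (id % 32)) : 0u;
  }
};

struct FakeLink {
  const int* timing = nullptr;  // The pointer that is dereferenced later.
};

static const int kTiming = 60;

// Plays the role of setCurrentDisplay(): reports the error, but leaves the
// pointer NULL when the mask lookup fails.
bool setCurrentDisplay(const FakeSystem& sys, FakeLink& link, DisplayID id) {
  if (sys.displayMaskFor(id) == 0) {
    return false;
  }
  link.timing = &kTiming;
  return true;
}

// Plays the role of the buggy caller: drops the error and reports success.
bool createLink(const FakeSystem& sys, FakeLink& link, DisplayID id) {
  (void)setCurrentDisplay(sys, link, id);
  return true;
}

// Plays the role of getDisplayTimes(); the assert marks where the real code
// would crash on a NULL pointer.
int getDisplayTimes(const FakeLink& link) {
  assert(link.timing != nullptr);
  return *link.timing;
}

// Plays the role of the patched EnableVsync(): probe the mask first and
// return early on 0, so a broken link is never created.
bool enableVsync(const FakeSystem& sys, FakeLink& link, DisplayID id) {
  if (sys.displayMaskFor(id) == 0) {
    return false;
  }
  return createLink(sys, link, id);
}
```

In this model createLink() returns true even when the lookup fails, leaving a link whose timing pointer is NULL, while enableVsync() refuses to create the link in the same conditions.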

Attached file Fix bug 1201401 (obsolete) —

For reference, here's my patch from the latest tryserver build:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=8cc8ebc9191252cfeb5231570ab019fc3acab79f

Attached file bug1201401-hook-library.diff (obsolete) —

I couldn't have figured all this out without HookCase.

Here's the hook library that I used for general exploration of how the CVDisplayLink functions work, and of what causes this bug's crashes. To make it easier to read, it's formatted as a patch on the original HookLibraryTemplate/hook.mm in the HookCase distro.

Attached file bug1201401-testhook-library.diff (obsolete) —

Here's the hook library I used to test that my patch works as expected. Since I couldn't figure out this bug's underlying cause (why get_current_display_system_state() sometimes returns NULL), it's a bit preconceived -- it's closely tailored to the sequence of events that leads up to this bug's crashes, and to my workaround for them. But at least it shows that my patch works as it claims to. With luck it will also stop this bug's crashes :-)

When the time comes, I plan to submit two patches for review:

  1. The patch in comment #124 (https://bugzilla.mozilla.org/attachment.cgi?id=9089552).

  2. A patch that cleans up existing CVDisplayLink code, in a small way. For example, I don't think we need to register a callback if the call to CVDisplayLinkCreateWithCGDisplays()/CVDisplayLinkCreateWithActiveCGDisplays() in OSXVsyncSource::OSXDisplay::EnableVsync() fails. I suspect it's just fine to return early.

It should, I hope, be possible to land the first patch quickly. It almost certainly does no harm, and it might do a lot of good. The second patch may be more controversial, but it's also a lot less urgent. Before I request any reviews, though, I need to refamiliarize myself with the Bugzilla reviewing infrastructure (and make sure I still have permission to use it). I'll save that for next week, probably starting Tuesday (since Monday is Labor Day).

Something I forgot to mention earlier:

The CVDisplayLink functions (like CVDisplayLinkCreateWithActiveCGDisplays() and CVDisplayLinkSetOutputCallback()) are called a lot more often in Firefox than they are in Safari or Chrome. This isn't really a problem, and seems to be down to design differences with regard to vsync. But it does explain why this bug's crashes happen much more often in Firefox than they do in Safari or Chrome -- more frequent successful calls to CVDisplayLinkCreateWithActiveCGDisplays() mean more chances that get_current_display_system_state() will return NULL and trigger the crashes.
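To put a rough number on "more chances": if each call independently hit the NULL return with some small probability p (an assumption made purely for illustration; the real failures are unlikely to be independent and p is unknown), the chance of at least one failure in n calls would be 1 - (1 - p)^n. A small C++ helper for that formula:

```cpp
#include <cassert>
#include <cmath>

// Probability of at least one failure in `calls` independent calls, each
// failing with probability `p`. Independence is an illustrative assumption.
double probAtLeastOneFailure(double p, unsigned calls) {
  assert(p >= 0.0 && p <= 1.0);
  return 1.0 - std::pow(1.0 - p, static_cast<double>(calls));
}
```

The value grows with the number of calls for any p strictly between 0 and 1, which matches the observation that an application making more of these calls sees more crashes.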

My first hook library, posted in comment #125 (https://bugzilla.mozilla.org/attachment.cgi?id=9089557), contains two tests of get_current_display_system_state(): One (mentioned in comment #115) that just makes it return NULL on one in every ten calls, and another that's tailored much more specifically to the conditions of this bug. The second test reliably crashes all three browsers in CVCGDisplayLink::getDisplayTimes().

Attached file Bug 1201401 - Update patch (obsolete) —

Depends on D44525

I appear to have messed up with Phabricator. I will try to figure out how to fix things. Any suggestions will be appreciated :-)

Try folding the two patches into one locally, and then using moz-phab submit . . to update the first patch. You can then mark the second patch as abandoned using the action dropdown at the very end of the phabricator page.

Attachment #9090177 - Attachment is obsolete: true

Thanks, Markus! I followed your suggestion and it seems to have worked.

Possibly related?

  • Bug 1381485 - Hangs frequently while sending imap mail while copying message to imap Sent folder on Mac. - displaying the progress bar. No problem if Sent is set to local folder. Deadlock on CGLClearDrawable
  • Bug 1398807 - Crash in nsMsgCompose::CloseWindow
Flags: needinfo?(smichaud)

(In reply to Wayne Mery (:wsmwk) from comment #134)

> Possibly related?
>
>   • Bug 1381485 - Hangs frequently while sending imap mail while copying message to imap Sent folder on Mac. - displaying the progress bar. No problem if Sent is set to local folder. Deadlock on CGLClearDrawable
>   • Bug 1398807 - Crash in nsMsgCompose::CloseWindow

I think it's very unlikely.

I expect that bug 1201401 (this bug) always leads to a crash in CVCGDisplayLink::getDisplayTimes(). Hangs, or crashes in other locations, are almost certainly unrelated. Only crashes on macOS 10.13 (High Sierra) and 10.14 (Mojave) are symbolicated in crash reports, though. On earlier versions of macOS they'll show up as crashes at an address in the CoreVideo framework.

For example:

macOS 10.12: CoreVideo@0xba47
OS X 10.11: CoreVideo@0xc13d
OS X 10.10: CoreVideo@0x2955

Flags: needinfo?(smichaud)
Crash Signature: [@ CVCGDisplayLink::getDisplayTimes(unsigned long long*, unsigned long long*, unsigned long long*)] [@ CVCGDisplayLink::getDisplayTimes] → [@ CVCGDisplayLink::getDisplayTimes(unsigned long long*, unsigned long long*, unsigned long long*)] [@ CVCGDisplayLink::getDisplayTimes] [@ CoreVideo@0xba47] [@ CoreVideo@0xc13d] [@ CoreVideo@0x2955]

These crashes still happen on macOS 10.15, and crash reports containing them aren't symbolicated.

Crash Signature: [@ CVCGDisplayLink::getDisplayTimes(unsigned long long*, unsigned long long*, unsigned long long*)] [@ CVCGDisplayLink::getDisplayTimes] [@ CoreVideo@0xba47] [@ CoreVideo@0xc13d] [@ CoreVideo@0x2955] → [@ CVCGDisplayLink::getDisplayTimes(unsigned long long*, unsigned long long*, unsigned long long*)] [@ CVCGDisplayLink::getDisplayTimes] [@ CoreVideo@0xba47] [@ CoreVideo@0xc13d] [@ CoreVideo@0x2955] [@ CoreVideo@0x1724d]

Markus, I'd like to respond to your comment in Phabricator, but I don't know how. Thanks in advance for your help!

Flags: needinfo?(mstange)

If you're logged in on Phabricator (which you probably are, given that it let you mark a patch as abandoned earlier), you should see reply buttons in the top right corners of the boxes that contain my inline comments. Those buttons are a curved arrow between the text "Not Done" and the cross icon that collapses the comment. The reply buttons will let you enter a reply to an inline comment, but that comment will not be submitted even if you save it. To actually submit your comments, you need to click the "Submit" button at the very end of the page. The end of the page is also where the textbox for general comments is located.

Flags: needinfo?(mstange)

Thanks. I'm definitely logged in. Your description of how to reply seems only to apply to inline comments. As best I can tell, I can only reply to your general comment in the window at the bottom of the page. I'll do my best, and hopefully not mess things up too badly.

> As best I can tell, I can only reply to your general comment in the window at the bottom of the page.

Ah, yes. That's the correct way to do that.

Markus, I've responded to your general comment. But I made a serious mistake, so I revised my response. This is just to let you know to refresh your screen :-)

Attachment #9090152 - Attachment description: Fix Bug 1201401 → Bug 1201401 - Work around a crash in CVCGDisplayLink::getDisplayTimes. r=mstange

My workaround has made the task less urgent, but we need to get Apple to fix this bug. In the past I've reported similar bugs to Apple on my own authority, in the hope that Apple would pay attention and fix them. Sometimes they did. But, unlike this bug, they were all reproducible. And I'm no longer working for Mozilla.

Recently, in bug 1570451, Mozilla reported a serious Catalina bug to Apple and they fixed it. I wonder if we should use the same channels here as were used in that bug.

Flags: needinfo?(mozillamarcia.knous)
Flags: needinfo?(haftandilian)

Amazing work, Steven! I've forwarded this issue to our Apple contact referencing this bug and your explanation in comment 123. I'll update the bug if there is any progress.

Flags: needinfo?(haftandilian)
Flags: needinfo?(mozillamarcia.knous)

Thanks Steven, the patch is clear to land. I'll let you press the buttons on https://lando.services.mozilla.com/D44525/ :)

Pushed by smichaud@pobox.com:
https://hg.mozilla.org/integration/autoland/rev/77662a255e78
Work around a crash in CVCGDisplayLink::getDisplayTimes. r=mstange
Status: ASSIGNED → RESOLVED
Closed: 3 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla71

As the stats table (under Crash Data above) shows, the latest mozilla-central nightly (with build id 20190906094324) is the first one with this bug's patch. I'm going to be keeping an eye on the number of crashes recorded there. I'm quite confident that this bug is fixed, and that the number will continue to be '0'. But you never know for sure.

By the way, I love the stats table. BMO didn't have it back in my day.

Awesome seeing this longstanding bug fixed, welcome back Steven! That said, I think we should let this fix ship with Fx70/68.2esr so it gets some bake time on Beta rather than trying to uplift into an Fx69 dot release before that.

(In reply to Ryan VanderMeulen [:RyanVM] from comment #148)

> Awesome seeing this longstanding bug fixed, welcome back Steven! That said, I think we should let this fix ship with Fx70/68.2esr so it gets some bake time on Beta rather than trying to uplift into an Fx69 dot release before that.

Are you thinking of uplifting this bug's patch to the 70 branch (the current beta branch)? Not a bad idea. But let's wait a few days to make sure the number of crashes stays at '0'. We are, after all, dealing with a non-reproducible bug.

(In reply to Steven Michaud [:smichaud] (Retired) from comment #149)

> Are you thinking of uplifting this bug's patch to the 70 branch (the current beta branch)? Not a bad idea. But let's wait a few days to make sure the number of crashes stays at '0'. We are, after all, dealing with a non-reproducible bug.

Yeah, there's no rush here (we just started the new cycle). But yeah, I think this would be great to uplift to Beta & ESR68 once we're confident the fix is working on Nightly without new regressions.

I just noticed a very bad sign. There's been one CVCGDisplayLink::getDisplayTimes() crash in the 20190906094324 mozilla-central nightly:

https://crash-stats.mozilla.com/report/index/0ec9eda7-3df8-4c50-aca2-e179c0190906

I'll wait a few days to see if a pattern emerges -- for example if there now appear to be fewer crashes than previously. But clearly my patch doesn't work around all of these crashes. One possibility is that there's a timing problem -- that my patch's call(s) to CGDisplayIDToOpenGLDisplayMask() can fail to return '0' even when EnableVsync()'s subsequent call to CGDisplayIDToOpenGLDisplayMask() (via CVDisplayLinkCreateWithCGDisplays()) does return '0' (and triggers the crash).
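The suspected timing problem is a check-then-use race: the probe and the framework's own lookup are two separate calls, so they can disagree. A deterministic sketch in portable C++, where ScriptedMask is a hypothetical stand-in that replays a scripted sequence of mask results (0 meaning failure) instead of calling the real CGDisplayIDToOpenGLDisplayMask():

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for the mask lookup: each call consumes the next
// scripted result, so a test can make consecutive calls disagree.
class ScriptedMask {
 public:
  explicit ScriptedMask(std::vector<uint32_t> results)
      : mResults(std::move(results)) {}

  uint32_t next() {
    assert(mIndex < mResults.size());
    return mResults[mIndex++];
  }

 private:
  std::vector<uint32_t> mResults;
  std::size_t mIndex = 0;
};

enum class Outcome { SkippedByPrecheck, CreatedValid, CreatedBroken };

// Models "probe first, then create". The second next() call stands for the
// lookup the framework performs internally during link creation.
Outcome precheckThenCreate(ScriptedMask& mask) {
  if (mask.next() == 0) {
    return Outcome::SkippedByPrecheck;
  }
  return mask.next() == 0 ? Outcome::CreatedBroken : Outcome::CreatedValid;
}
```

With the scripted sequence {1, 0} the probe passes and the link is still created broken, which is the case a pre-check cannot catch.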

It's possible that I won't be able to find a true fix/workaround for this bug until I figure out why get_current_display_system_state() sometimes returns NULL. That will be a very tall order.

Status: RESOLVED → REOPENED
Resolution: FIXED → ---

I expect to spend at least several days trying to figure out why get_current_display_system_state() sometimes returns NULL, starting next week. Until I've learned more about that, I probably won't have much to say.

One or two of those crash reports must be from me. I installed nightly build, and it crashes consistently when I unplug the external monitors with laptop lid closed, then reconnect them.

(In reply to Haitao Li from comment #155)

> One or two of those crash reports must be from me. I installed nightly build, and it crashes consistently when I unplug the external monitors with laptop lid closed, then reconnect them.

Could you write up detailed steps to reproduce? Given past experience, it's very unlikely they'll work for me. But they might, and it'd be good to have your STR on record. Be sure to include what kind of equipment you have -- the computer, the external display, and the cable used to connect them. Thanks in advance!

(Following up comment #154)

In the meantime I don't think we should back out my current patch. It's very unlikely to cause harm, and we might learn something from it -- for example whether or not it reduces the crashes' frequency.

I'm giving this another try, with another patch.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=2daf38e8178fadd14981ac441a73570fc4025837

This time, rather than trying to anticipate when this bug is going to happen, I detect it after it has happened. Rather than trying to figure out when CVCGDisplayLink::setCurrentDisplay() is going to leave an internal pointer uninitialized (nulled), I check for it after the fact. I can't access the null pointer directly. But fortunately, when CVCGDisplayLink::setCurrentDisplay() triggers this bug (thanks to get_current_display_system_state() returning null and CGDisplayIDToOpenGLDisplayMask() returning 0), it also leaves the "current display" internal variable uninitialized (zeroed). This I can access, using the documented CVDisplayLinkGetCurrentCGDisplay() function.

As best I can tell, the reason my previous patch worked so poorly is that the condition(s) that cause get_current_display_system_state() to return null can change very quickly, from one call to the next. Given two consecutive calls to this function, there seems to be about a 50/50 chance that the first will return null and the second not, or vice versa (that is if the general conditions for this bug prevail, whatever they are).
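The difference between the two approaches can be sketched in portable C++. This is only an illustration under stated assumptions, not the actual patch and not CoreVideo code: `FakeDisplaySystem`, `FakeLink`, `CreateLink()`, `PreCheckThenCreate()` and `CreateThenPostCheck()` are all hypothetical names. The fake system state fails on every second query, which is a deterministic simplification of the roughly 50/50 behavior described above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the flaky system state: every second query fails,
// standing in for get_current_display_system_state() returning null.
struct FakeDisplaySystem {
  int calls = 0;
  // Returns a nonzero mask on success and 0 in the failing case.
  uint32_t QueryMask() { return (calls++ % 2 == 0) ? 1u : 0u; }
};

// Hypothetical stand-in for a display link object.
struct FakeLink {
  uint32_t currentDisplay = 0;  // stays 0 when creation hit the failing case
};

// Models link creation. It performs its own internal query, so its outcome
// does not depend on the result of any earlier query.
FakeLink CreateLink(FakeDisplaySystem& sys) {
  FakeLink link;
  if (sys.QueryMask() != 0) {
    link.currentDisplay = 42;  // arbitrary nonzero display ID
  }
  return link;
}

// Approach 1 (check before): query first, then create. A passing pre-check
// says nothing about the query that creation performs internally.
bool PreCheckThenCreate(FakeDisplaySystem& sys, FakeLink& out) {
  if (sys.QueryMask() == 0) {
    return false;
  }
  out = CreateLink(sys);
  return true;  // may be wrong: the state can differ on the second query
}

// Approach 2 (check after): create first, then inspect what was created.
bool CreateThenPostCheck(FakeDisplaySystem& sys, FakeLink& out) {
  out = CreateLink(sys);
  return out.currentDisplay != 0;
}
```

With a fresh `FakeDisplaySystem`, `PreCheckThenCreate()` reports success even though the resulting link has a zeroed `currentDisplay`, while `CreateThenPostCheck()` only reports success when the link it just created is actually usable.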

I'm going to wait for my try build and its tests to finish. Then tomorrow I'll ask Stephen Pohl for a review. I'd ask Markus Stange again, but I see from bug 1578075 comment #12 that he's going on PTO tomorrow.

The crash is not as consistent as I first thought. After I installed the nightly build, I got 2 crashes in a row just by unplugging the USB-C display cables and plugging them back in. But after that it didn't happen very often. I did get another crash today. I have a MacBook Pro 2019, and two external LG 4K monitors connected with USB-C cables. But I believe I had the same crash with just one external monitor. I don't use the laptop monitor and an external monitor at the same time. I always plug/unplug monitors when the lid is closed.

Attached file Fix bug 1201401

Here's my latest patch, for reference.

See comment #123 above for a detailed description of Apple's bug, and comment #158 for an explanation of the new approach I take in this patch (fixing the bug after it happens, rather than trying to avoid it before it happens).

There's yet one more thing that's new about this patch. Above, in comment #127, I said I wanted (at some point in the future) to get rid of the RetryEnableVsync() callback. As best I could tell, in my tests, it didn't serve any useful purpose. I've now changed my mind. While running an earlier version of my latest patch on top of a HookCase hook library that emulates this bug's crashes, I noticed one case of the display not getting updated until I manually refreshed the browser window. This happened when a CMD-TAB command coincided with an early return caused by my workaround for this bug's crashes. I now think it's best that we trigger the callback when my workaround causes an early return. Given that Apple's bug is so intermittent, I think this is unlikely to cause trouble. And it will presumably avoid the display failing to refresh when my workaround is exercised.
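The "retry when the workaround returns early" idea can be sketched as follows. This is a minimal model under assumptions, not the code from the patch: `TaskQueue` is a hypothetical stand-in for whatever schedules the retry callback, `VsyncSource` and its members are invented for illustration, and `failuresLeft` simply simulates how many attempts hit Apple's bug.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical minimal FIFO task queue standing in for the event loop.
struct TaskQueue {
  std::deque<std::function<void()>> pending;

  void Post(std::function<void()> task) { pending.push_back(std::move(task)); }

  // Runs tasks in order, including any tasks posted while running.
  void RunAll() {
    while (!pending.empty()) {
      std::function<void()> task = std::move(pending.front());
      pending.pop_front();
      task();
    }
  }
};

// Hypothetical vsync source whose enable step can hit the simulated bug.
struct VsyncSource {
  TaskQueue& queue;
  int failuresLeft;  // number of attempts that will hit the simulated bug
  bool enabled = false;
  int attempts = 0;

  VsyncSource(TaskQueue& q, int failures) : queue(q), failuresLeft(failures) {}

  void EnableVsync() {
    ++attempts;
    if (failuresLeft > 0) {
      --failuresLeft;
      // The workaround bails out early here. Instead of silently giving up
      // (which could leave the display unrefreshed), schedule another try.
      queue.Post([this] { EnableVsync(); });
      return;
    }
    enabled = true;
  }
};
```

In this model, a source constructed with two simulated failures is not enabled after the first `EnableVsync()` call, but becomes enabled once the queue has run, after three attempts in total. The sketch retries without a limit; whether a real implementation should cap or delay retries is a separate design question that this thread does not settle.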

Attachment #9089552 - Attachment is obsolete: true
Attached file bug1201401-hook-library (obsolete)

Here's the latest version of my HookCase hook library for testing. It can be used both for general exploration of how the CVDisplayLink functions work in different browsers (when you undefine BUG_1201401_CRASHTEST), and for emulating this bug's crashes. It can also be used to test my latest patch.

Attachment #9089557 - Attachment is obsolete: true
Attachment #9089562 - Attachment is obsolete: true

Haitao Li, you might want to try out this test build made with my latest patch:

https://queue.taskcluster.net/v1/task/ZkLEmxKUQwKz8RNN4QIH-A/runs/0/artifacts/public/build/target.dmg

Steven, I'm running your test build now. Will report back tomorrow.

This fixes a small bug in my testing hook library.

Attachment #9091865 - Attachment is obsolete: true
Target Milestone: mozilla71 → ---
Pushed by smichaud@pobox.com:
https://hg.mozilla.org/integration/autoland/rev/782ce43922df
Work around a crash in CVCGDisplayLink::getDisplayTimes. r=spohl
Status: REOPENED → RESOLVED
Closed: 3 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla71

Here's the first mozilla-central nightly with my second patch (build id 20190911215306):

http://ftp.mozilla.org/pub/firefox/nightly/2019/09/2019-09-11-21-53-06-mozilla-central/firefox-71.0a1.en-US.mac.dmg

Once again I'm going to keep an eye on the number of crashes in this and subsequent builds. If they stay at zero for a few days, I think we can consider this bug truly fixed.

Whiteboard: [gfx-noted][tbird crash][tpi:+] → [gfx-noted][tbird topcrash][tpi:+]

I haven't got a crash with the new fix so far. It's looking good.

(In reply to Haitao Li from comment #170)

I haven't got a crash with the new fix so far. It's looking good.

I'm glad to hear it. And the stats table (under Crash Data above) still shows zero crashes with my new patch. But I want to wait a few more days before requesting uplift to the Beta (70) and ESR68 branches, as per comment #150 above.

Comment on attachment 9093104 [details] [diff] [review]
bug1201401 patch for beta (70) branch

Beta/Release Uplift Approval Request

  • User impact if declined: Mac topcrasher will remain unfixed
  • Is this code covered by automated tests?: No
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): The crash workaround makes very reasonable assumptions
  • String changes made/needed:
Attachment #9093104 - Flags: approval-mozilla-beta?

Comment on attachment 9093105 [details] [diff] [review]
bug1201401 patch for esr68 branch

ESR Uplift Approval Request

  • If this is not a sec:{high,crit} bug, please state case for ESR consideration: This patch fixes a Mac topcrasher
  • User impact if declined: A Mac topcrasher will remain unfixed
  • Fix Landed on Version: 71
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): The crash workaround makes very reasonable assumptions
  • String or UUID changes made by this patch:
Attachment #9093105 - Flags: approval-mozilla-esr68?
Comment on attachment 9093104 [details] [diff] [review]
bug1201401 patch for beta (70) branch

Fixes a longstanding macOS topcrash. Approved for 70.0b8. Thanks for getting to the bottom of this, Steven!
Attachment #9093104 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
QA Whiteboard: [qa-triaged]
Comment on attachment 9093105 [details] [diff] [review]
bug1201401 patch for esr68 branch

Looking good on Beta too. Approved for 68.2esr.
Attachment #9093105 - Flags: approval-mozilla-esr68? → approval-mozilla-esr68+

Hello,

I tried to reproduce this issue and to verify the fix, but unfortunately due to technical limitations I was not able to do so. If anybody else can confirm that this issue is fixed, please feel free to do so.

Flags: qe-verify+

Haitao Li, can you confirm that this bug is fixed on your system?

Daniel, in general this bug is not reproducible. Though a few people see (or have seen) this bug consistently and have reported it here, they've often found that replacing some part of their equipment (say their monitor or their computer) makes the problem go away. Haitao Li is the most recent reporter, and I hope he can confirm that he no longer sees this bug (in current mozilla-central nightlies or Firefox 70 Beta8). But those of us who can't reproduce the bug (including me) have been relying on the stats table above (under Crash Data) to confirm that the bug is fixed -- since no crashes have occurred in builds with my latest patch.

Flags: needinfo?(lht1999)

I have been running nightly builds since your fix was merged and there hasn't been a crash so far. Before the fix I got one every day. So I'm quite confident it solved my issue at least.

Flags: needinfo?(lht1999)

On the strength of Haitao Li's comment, I'm marking this bug verified on the 70 and 71 branches.
