Closed Bug 854142 Opened 12 years ago Closed 7 years ago

[10.8.3] startup crash in libclh

Categories

(Core :: General, defect)

x86_64
macOS
defect
Not set
critical

Tracking

()

RESOLVED WONTFIX
Tracking Status
firefox19 --- wontfix
firefox20 --- wontfix
firefox21 --- wontfix
firefox22 --- wontfix
firefox-esr17 --- wontfix

People

(Reporter: scoobidiver, Unassigned)

Details

(Keywords: crash, Whiteboard: [startupcrash])

Crash Data

Attachments

(2 files, 1 obsolete file)

It started showing up on March 8 and spiked on March 15, when OS X 10.8.3 was released, across all Firefox and TB versions. Is Firefox compatible with OS X 10.8.3? I am requesting tracking for that.

It's still low: #30 top browser crasher in 19.0.2, #58 in 20.0b5 on Mac OS X with duplicates.

Signature: libclh.dylib@0x112939
UUID: 858671e3-2df9-4a28-987b-f11cd2130323
Date Processed: 2013-03-23 18:30:07
Uptime: 5
Last Crash: 8.3 hours before submission
Install Age: 1.1 days since version was first installed.
Install Time: 2013-03-22 17:07:41
Product: Firefox
Version: 20.0
Build ID: 20130320062118
Release Channel: beta
OS: Mac OS X
OS Version: 10.8.3 12D78
Build Architecture: amd64
Build Architecture Info: family 6 model 23 stepping 10
Crash Reason: EXC_BAD_ACCESS / KERN_INVALID_ADDRESS
Crash Address: 0x7
App Notes: AdapterVendorID: 0x10de, AdapterDeviceID: 0x 8a0
Processor Notes: sp-processor05.phx1.mozilla.com_7968:2008; exploitablity tool: ERROR: unable to analyze dump
EMCheckCompatibility: True
Adapter Vendor ID: 0x10de
Adapter Device ID: 0x 8a0

Frame  Module            Signature                   Source
0                        @0x76200000007
1      libclh.dylib      libclh.dylib@0x112939
2      GeForceGLDriver   GeForceGLDriver@0x2f9ef9
3      DiskImages        DiskImages@0xb60
4      DiskImages        DiskImages@0xb60
5      DiskImages        DiskImages@0xb60
6      DiskImages        DiskImages@0xb60
7      DiskImages        DiskImages@0xb60
8      DiskImages        DiskImages@0xb60
9      DiskImages        DiskImages@0xb60
10     DiskImages        DiskImages@0xb60
11     DiskImages        DiskImages@0xb60

More reports at:
https://crash-stats.mozilla.com/report/list?signature=libclh.dylib%400x112939
https://crash-stats.mozilla.com/report/list?signature=libclh.dylib%400x112933
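The two report links differ only in the signature, which is percent-encoded (the "@" becomes "%40"). A minimal Python sketch of how such a link could be built follows. The helper name report_list_url is made up for illustration, and the base URL is simply copied from the links in this comment.

```python
from urllib.parse import quote

# Base URL copied from the report links in the comment above.
BASE = "https://crash-stats.mozilla.com/report/list"


def report_list_url(signature: str) -> str:
    """Return a report-list URL for a crash signature (hypothetical helper)."""
    # safe="" forces "@" to be encoded as %40; "." is left untouched.
    return f"{BASE}?signature={quote(signature, safe='')}"


for sig in ("libclh.dylib@0x112939", "libclh.dylib@0x112933"):
    print(report_list_url(sig))
```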
> Is Firefox compatible with OS X 10.8.3?

If it weren't, this bug (and/or its friends) would be far worse -- together they'd easily be the #1 topcrash on all platforms. I'm currently running on 10.8.3 with no problems.

That said, this may be a 10.8.3-specific (and possibly hardware-specific) bug.
libclh.dylib is part of the Apple-provided (OS-provided) GeForceGLDriver, and has been so since at least OS X 10.7.5.
> 2 GeForceGLDriver GeForceGLDriver@0x2f9ef9

This is just after a call to clhDeviceDestroy(void *) -- an undocumented, public (exported) method in libclh.dylib.

> [@ libclh.dylib@0x112939]
> [@ libclh.dylib@0x112933]

These are both inside clhDeviceDestroy(void *).

So it looks like clhDeviceDestroy(void *) is being called with an invalid pointer -- possibly to a deleted object.

This is almost certainly an OS bug.
I was running 10.8.3 all along the dev path and never hit this crash. I have two machines (retina mac and older intel mac) set up in the office. When I get back I can try to reproduce it on those two machines.
Attached file "Emulated" crash stack
I've been able to "emulate" one of these crashes, using an interpose library. Here's the output (one call to clhDeviceGet() followed by one call to clhDeviceDestroy(), where the crash happens). My "emulated" crash, like the real ones, happens on startup. But in order to get these crashes to happen at address 0x7 (as they all appear to do), you need to pass a corrupt (nonsensical) pointer to clhDeviceDestroy(). So these crashes *don't* happen on accessing a deleted object. This stack was made using today's mozilla-central nightly. (These nightlies don't have their symbols stripped.)
Here's the Breakpad crash report corresponding to my "emulated" stack: bp-c6444ee0-e9f2-4e8a-b62e-0c9be2130326
Here's my interpose library. To use it:

1) Download it and decompress it.
2) Run "make" on it (you'll need to have at least a partial build environment).
3) Set the DYLD_INSERT_LIBRARIES environment variable as follows at a Terminal prompt:
   export DYLD_INSERT_LIBRARIES=[/full/path/to/]interpose.dylib
4) Run Firefox (or some other program) in the same Terminal prompt.

In order to make Firefox crash, you'll need to uncomment "#define CRASH 1" in the interpose library's source code before running "make".
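Steps 3 and 4 could also be scripted rather than typed at a Terminal prompt. The Python sketch below only builds the environment. The helper name interpose_env is made up for illustration, and the Firefox path in the commented launch example is an assumption, not something stated in this bug.

```python
import os


def interpose_env(dylib_path: str, base_env=None) -> dict:
    """Copy an environment and point DYLD_INSERT_LIBRARIES at the library."""
    env = dict(os.environ if base_env is None else base_env)
    # Use the full path so the setting does not depend on the working directory.
    env["DYLD_INSERT_LIBRARIES"] = os.path.abspath(dylib_path)
    return env


# Example launch on macOS (the Firefox path is an assumption; adjust as needed):
#   import subprocess
#   subprocess.run(["/Applications/Firefox.app/Contents/MacOS/firefox"],
#                  env=interpose_env("interpose.dylib"))
```

Passing the environment to a single child process keeps the setting from leaking into other programs started from the same shell.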
Marcia, please do try to reproduce these crashes. But don't spend too much time, because even you will have to be incredibly lucky to manage it :-)
One thing I forgot to mention: In order for the calls to clhDeviceGet() and clhDeviceDestroy() to happen, you need to have appropriate video hardware -- though that hardware *doesn't* need to currently be the default hardware (on machines that, like most laptops, have more than one kind of video hardware). I find that the Retina MacBook Pro does have it: an NVIDIA GeForce GT 650M (alongside an Intel HD Graphics 4000).
Neither Safari nor Chrome calls glcDeviceDestroy() on startup. But Opera does. So we should expect that these crashes also happen in Opera. Though since Opera doesn't have an open bugbase, we may never hear about them.
> Neither Safari nor Chrome calls glcDeviceDestroy() on startup.

clhDeviceDestroy(), of course.
(Following up comment #5) Note that both clhDeviceGet() and clhDeviceDestroy() are called from "(GeForceGLDriver) _gldGetDevicePartitionInfo", which is actually a non-exported method in GeForceGLDriver that (for convenience) I'll call sub_2002f979f(). As best I can tell, the only way these crashes can happen is if the call to clhDeviceGet() fails, and therefore fails to initialize the "device pointer" pointed to by its first parameter. This "device pointer" is what gets passed to clhDeviceDestroy() at the end of sub_2002f979f(), whether or not clhDeviceGet() succeeded (it indicates success by returning '0'). So we should be able to reproduce this bug if we can figure out why clhDeviceGet() might fail. That's not something I've managed to figure out yet.
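The suspected control flow can be modeled in a few lines. This is only a Python model of the pattern described above, not the driver's actual code: fake_device_get, fake_device_destroy, the two caller functions, and the GARBAGE value are all hypothetical stand-ins, and an exception takes the place of the real EXC_BAD_ACCESS.

```python
class SimulatedCrash(Exception):
    """Stands in for EXC_BAD_ACCESS in this model."""


GARBAGE = 0xDEADBEEF   # arbitrary stale value in the uninitialized slot (assumption)
LIVE_DEVICES = set()   # handles the fake "driver" currently considers valid


def fake_device_get(slot: list, fail: bool) -> int:
    """Model of clhDeviceGet(): returns 0 on success; on failure the slot is untouched."""
    if fail:
        return 1
    handle = 0x1000 + len(LIVE_DEVICES)
    LIVE_DEVICES.add(handle)
    slot[0] = handle
    return 0


def fake_device_destroy(handle: int) -> None:
    """Model of clhDeviceDestroy(): using a bad handle 'crashes'."""
    if handle not in LIVE_DEVICES:
        raise SimulatedCrash(f"bad device pointer {handle:#x}")
    LIVE_DEVICES.remove(handle)


def partition_info_unguarded(fail: bool) -> None:
    """Pattern described above: destroy runs whether or not get succeeded."""
    slot = [GARBAGE]
    fake_device_get(slot, fail)      # return code ignored
    fake_device_destroy(slot[0])


def partition_info_guarded(fail: bool) -> None:
    """Same flow, but destroy only runs after a successful get."""
    slot = [GARBAGE]
    if fake_device_get(slot, fail) == 0:
        fake_device_destroy(slot[0])
```

In this model partition_info_unguarded(True) raises SimulatedCrash, while partition_info_guarded(True) returns normally. That is the difference a return-code check would make if the real caller behaves as described.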
Version: 19 Branch → Trunk
> As best I can tell, the only way these crashes can happen is if the
> call to clhDeviceGet() fails, and therefore fails to initialize the
> "device pointer" pointed to by its first parameter.

I've confirmed this. And I've found two methods called (indirectly) from clhDeviceGet(), failure in either of which causes a crash identical to the ones reported here:

IOConnectMapMemory()
IOConnectCallMethod()

But I haven't been able to find out why one or both of these methods is failing, or any reasonable way to trigger failures in them. Maybe someone who knows more about the IOKit than I do, or the Mach port infrastructure it uses, would be able to guess. But before I'll be able to, I'll need to learn a lot more about both of these things.

So I'm going to have to put off this bug until I have more time to devote to it (or until it becomes more urgent, if it ever does). In the meantime let's hope somebody stumbles upon a way to reproduce these crashes.
Attachment #729817 - Attachment is obsolete: true
Just checked https://crash-stats.mozilla.com/ again and found an interesting comment in bp-33721a87-f412-4c36-b9e4-f56a62130327: "Just starting it up caused it to crash -- during an Apple Update procedure" So I suppose it'd be worthwhile to reinstall the OS X 10.8.3 Combo update (downloadable from Apple at http://support.apple.com/kb/DL1640), and try running Firefox while the updater is still running. Of course you'd need to do this on a Mac that has the "right" hardware, like a Retina MacBook Pro.
> and try running Firefox while the updater is still running

Or just after it finishes, but before restarting.
Another interesting comment (from bp-a6eac830-ef00-44c3-846d-bee8a2130327), this time very puzzling: "I was sending an email on gmail and I tried to attach a file" The report is of a startup crash ("Uptime 0"). Clearly the commenter is leaving stuff out. But he might have crashed attaching a file (thanks to one of the Apple filepicker bugs that we've suffered with for years), and then had Firefox crash again on startup.
QA Contact: mozillamarcia.knous
I tried running Firefox several times while the OS X 10.8.3 Combo updater was running, and then again after it had finished but before restarting. No crashes. I also tried running Firefox with the profile manager (which restarts Firefox whenever you choose a profile from the list). No crashes.
Just for the record: Many (if not all) of the crash reports indicate (in App Notes) that hardware acceleration isn't turned on (there's no "GL Context+" or "GL Layers+"). But in fact these crashes can only happen with hardware acceleration turned on. I assume the reason this doesn't show up in App Notes is because the crashes happen so early in the startup process.
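A report's App Notes can be checked for these markers mechanically. The Python sketch below assumes the markers appear literally as "GL Context+" and "GL Layers+" (as quoted above); the function name is made up for illustration. For the reason given above, a negative result for a startup crash does not show that acceleration was off.

```python
# Markers quoted in the comment above; their exact App Notes form is assumed.
GL_MARKERS = ("GL Context+", "GL Layers+")


def app_notes_show_acceleration(app_notes: str) -> bool:
    """Return True if App Notes mention either hardware-acceleration marker."""
    return any(marker in app_notes for marker in GL_MARKERS)


# App Notes from the report quoted at the top of this bug.
notes = "AdapterVendorID: 0x10de, AdapterDeviceID: 0x 8a0"
print(app_notes_show_acceleration(notes))  # prints False
```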
When/if we start getting user reports of these crashes, the questions we ask them should include:

1) Do the crashes also happen with Opera?
2) What (if any) messages appear in the System Console?
And yes, my "emulated" crashes also happen in Opera, on startup.
Closing because no crashes have been reported for 12 weeks.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → WONTFIX