Seen while reviewing Mac crash stats. https://crash-stats.mozilla.com/report/list?signature=OSServices@0x56cc7 links to the crashes, which are all on 10.7 and are seen across all versions. Comments mention uploading and downloading files, and crashing while saving an image. The signature is not of much help, but I'm filing this so we can keep an eye on 10.7-specific issues. https://crash-stats.mozilla.com/report/index/7e801c9e-be35-4744-9a8c-c08ed2110707
Frame Module Signature
  0 OSServices OSServices@0x56cc7
  1 OSServices OSServices@0x632fc
  2 DesktopServicesPriv DesktopServicesPriv@0x3c562
  3 libsystem_c.dylib libsystem_c.dylib@0x4d1bb
  4 DesktopServicesPriv DesktopServicesPriv@0x7af4c
  5 DesktopServicesPriv DesktopServicesPriv@0x7d1b7
  6 DesktopServicesPriv DesktopServicesPriv@0x7ae9f
  7 DesktopServicesPriv DesktopServicesPriv@0x7aea5
  8 libdispatch.dylib libdispatch.dylib@0x1909
  9 libdispatch.dylib libdispatch.dylib@0x3159
  10 libdispatch.dylib libdispatch.dylib@0x2fb5
  11 libdispatch.dylib libdispatch.dylib@0x27af
  12 libsystem_c.dylib libsystem_c.dylib@0x503d9
  13 libsystem_c.dylib libsystem_c.dylib@0x51b84
  14 libdispatch.dylib libdispatch.dylib@0x26e9
It is the #31 (resp. #6) top browser crasher on Mac OS X in 5.0.1 (resp. 6.0).
Some of the comments mention uploading images. There are also a number of other similar stacks that are all 10.7 only showing up in crash stats in varying volume: OSServices@0x56cc7 OSServices@0x56f93 OSServices@0x56a7f OSServices@0x5f67f OSServices@0x2c49 OSServices@0x5e91b OSServices@0x5739b OSServices@0x39cc OSServices@0xba02 OSServices@0x2ca6 OSServices@0x19d9e OSServices@0x2d8d OSServices@0x57117 OSServices@0x5f3e2
Odd that these are all on OS X 10.7. The OSServices framework has been around a long time (http://www.cocoadev.com/index.pl?OSServices). I suppose this means we're dealing with yet another OS bug :-( It'd be *really* nice to have STR for one of these crashes ... but it's probably not realistic to expect that anytime soon.
As stated in comment 2, some comments say: "saving an image" "I was uploading a photo and apparently it doesn't like that too much."
This is a bit higher volume. Pre-10.7 release, we have 244 of these over the past 2 weeks for all versions. We should track closely as more users migrate to 10.7.
Seeing this rise significantly yesterday in 5* stats seems to indicate that people are hitting this significantly more now that Lion is out.
It is the #2 top browser crasher on Mac OS X in 5.0.1 or 6.0 over the last 2 days. According to comments, it happens while downloading or uploading images, or while exporting or restoring bookmarks. One comment describes a STR: "When browsing pictures through google search, closing the image preview to view the original page caused firefox to crash."
> One comment describes a STR:
>
> "When browsing pictures through google search, closing the image
> preview to view the original page caused firefox to crash."
This does look promising (none of the others you quoted above are even close to being detailed enough). But I just tried it, and didn't see any crashes. Note that "closing the image preview to view the original page" is ambiguous: In the page of images you get after doing a search, you can cause a slightly larger "preview" to appear by mousing over an image. So does "closing the image preview" mean pressing the ESC key, or moving the mouse cursor off the original image? But if you click on one of the images in the first page, another page shows up where a different style of "preview" appears above the page where the image was originally embedded (where Google's image search found it). This can be closed (and the original page displayed) by clicking on the X icon in the "preview"'s upper right. I tried both alternatives many times, but didn't crash.
I think the running of an OS X service in background (which setting?) is required to make the above STR crash Firefox.
For what it's worth, I've used "nm -nam" on OS X 10.7 Build 11A511 (the GM and release build) to translate the numbers from Marcia's crash stack in comment #0 to symbols:
  0 OSServices OSServices@0x56cc7 NetworkBrowser::closeNode(__NWNode*)
  1 OSServices OSServices@0x632fc NWBrowserCloseNode
  2 DesktopServicesPriv DesktopServicesPriv@0x3c562 TNode::ExternalUnRegistrationProper(__NWBrowser*, __NWNode*)
  3 libsystem_c.dylib libsystem_c.dylib@0x4d1bb pthread_mutex_lock
  4 DesktopServicesPriv DesktopServicesPriv@0x7af4c FinalizeNetworkNode_block_invoke_069
  5 DesktopServicesPriv DesktopServicesPriv@0x7d1b7 ExceptionSafeBlock(void ( block_pointer)())
  6 DesktopServicesPriv DesktopServicesPriv@0x7ae9f FinalizeNetworkNode_block_invoke_0
  7 DesktopServicesPriv DesktopServicesPriv@0x7aea5 FinalizeNetworkNode_block_invoke_069
  8 libdispatch.dylib libdispatch.dylib@0x1909 _dispatch_call_block_and_release
  9 libdispatch.dylib libdispatch.dylib@0x3159 _dispatch_queue_drain
  10 libdispatch.dylib libdispatch.dylib@0x2fb5 _dispatch_queue_invoke
  11 libdispatch.dylib libdispatch.dylib@0x27af _dispatch_worker_thread2
  12 libsystem_c.dylib libsystem_c.dylib@0x503d9 _pthread_wqthread
  13 libsystem_c.dylib libsystem_c.dylib@0x51b84 start_wqthread
  14 libdispatch.dylib libdispatch.dylib@0x26e9 _dispatch_root_queues_init
And again in a more readable format:
  0 OSServices NetworkBrowser::closeNode(__NWNode*)
  1 OSServices NWBrowserCloseNode
  2 DesktopServicesPriv TNode::ExternalUnRegistrationProper(__NWBrowser*, __NWNode*)
  3 libsystem_c.dylib pthread_mutex_lock
  4 DesktopServicesPriv FinalizeNetworkNode_block_invoke_069
  5 DesktopServicesPriv ExceptionSafeBlock(void ( block_pointer)())
  6 DesktopServicesPriv FinalizeNetworkNode_block_invoke_0
  7 DesktopServicesPriv FinalizeNetworkNode_block_invoke_069
  8 libdispatch.dylib _dispatch_call_block_and_release
  9 libdispatch.dylib _dispatch_queue_drain
  10 libdispatch.dylib _dispatch_queue_invoke
  11 libdispatch.dylib _dispatch_worker_thread2
  12 libsystem_c.dylib _pthread_wqthread
  13 libsystem_c.dylib start_wqthread
  14 libdispatch.dylib _dispatch_root_queues_init
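The offset-to-symbol translation above presumably works by sorting the module's symbols by address (which is what "nm -n" does) and taking the last symbol that starts at or before the crash offset. A minimal Python sketch of that lookup follows. It is only an illustration: the start offsets in the table are made up, not read from the real OSServices binary, and symbolize is a hypothetical helper name.

```python
import bisect

# Hypothetical excerpt of address-sorted symbols: (start offset, name).
# The start offsets are invented for illustration; only the closeNode
# name comes from the stack in this bug.
SYMBOLS = [
    (0x56000, "placeholder_symbol_before"),
    (0x56c00, "NetworkBrowser::closeNode(__NWNode*)"),
    (0x57000, "placeholder_symbol_after"),
]


def symbolize(offset, symbols=SYMBOLS):
    """Return (name, bytes past the symbol start), or None if offset is too low."""
    starts = [start for start, _ in symbols]
    # Index of the last symbol whose start is <= offset.
    i = bisect.bisect_right(starts, offset) - 1
    if i < 0:
        return None
    start, name = symbols[i]
    return name, offset - start


if __name__ == "__main__":
    print(symbolize(0x56cc7))
```

With a real table, the returned delta is a useful sanity check: a very large distance from the symbol start suggests the offset actually falls in a function that is missing from the table.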
But when I ran FF (a custom non-opt build with debug symbols) in gdb and broke on NWBrowserCloseNode, then tried the (supposed) STR from comment #7 and comment #8, this breakpoint wasn't hit. I'm not sure why. Possibly because my translation isn't as accurate as I'd hoped. Or possibly because it's unusual to hit that breakpoint.
I do break on NWBrowserCloseNode, though, when I choose Print, then Save As PDF (though not when I choose Open PDF In Preview):
  Breakpoint 1, 0x00007fff9402f2b8 in NWBrowserCloseNode ()
  (gdb) bt
  #0 0x00007fff9402f2b8 in NWBrowserCloseNode ()
  #1 0x00007fff91344563 in TNode::ExternalUnRegistrationProper ()
  #2 0x00007fff91344966 in TNode::DoExternalUnRegistration ()
  #3 0x00007fff9134c65e in TNode::HandleNodeRequest ()
  #4 0x00007fff9137195a in __PostNodeTaskRequest_block_invoke_08 ()
  #5 0x00007fff913851b8 in ExceptionSafeBlock ()
  #6 0x00007fff91371902 in __PostNodeTaskRequest_block_invoke_0 ()
  #7 0x00007fff915ab90a in _dispatch_call_block_and_release ()
  #8 0x00007fff915ad15a in _dispatch_queue_drain ()
  #9 0x00007fff915acfb6 in _dispatch_queue_invoke ()
  #10 0x00007fff915ac7b0 in _dispatch_worker_thread2 ()
  #11 0x00007fff8d68e3da in _pthread_wqthread ()
  #12 0x00007fff8d68fb85 in start_wqthread ()
This is (of course) on a secondary thread. One of the comments on this bug's stack signature says "trying to Save As PDF".
That's all for now, folks. I'll be on vacation (and mostly away from the Internet) for all of next week.
One final note: Saving as PDF is one way to reproduce bug 671064 -- which is very easy to reproduce on all versions of OS X, and is due to a bug in Cairo 1.10 (to which we recently upgraded on the trunk). So it's probably best to try to repro this bug on builds that don't yet have Cairo 1.10 (that predate 2011-05-28).
Here's a gdb stack trace of what is probably this bug's crash. I was just playing around in gdb, so I can't yet reproduce.
  Program received signal EXC_BAD_ACCESS, Could not access memory.
  Reason: 13 at address: 0x0000000000000000
  0x00007fff84c86e90 in objc_msgSend ()
  (gdb) bt
  #0 0x00007fff84c86e90 in objc_msgSend ()
  #1 0x00007fff87957c00 in __PRETTY_FUNCTION__.174748 ()
  #2 0x00007fff8e833cd3 in NetworkBrowser::closeNode ()
  #3 0x00007fff8e8402fd in NWBrowserCloseNode ()
  #4 0x00007fff8bb55563 in TNode::ExternalUnRegistrationProper ()
  #5 0x00007fff8bb93f4d in __FinalizeNetworkNode_block_invoke_069 ()
  #6 0x00007fff8bb961b8 in ExceptionSafeBlock ()
  #7 0x00007fff8bb93ea0 in __FinalizeNetworkNode_block_invoke_0 ()
  #8 0x00007fff8bdbc90a in _dispatch_call_block_and_release ()
  #9 0x00007fff8bdbe15a in _dispatch_queue_drain ()
  #10 0x00007fff8bdbdfb6 in _dispatch_queue_invoke ()
  #11 0x00007fff8bdbd7b0 in _dispatch_worker_thread2 ()
  #12 0x00007fff87e9f3da in _pthread_wqthread ()
  #13 0x00007fff87ea0b85 in start_wqthread ()
NWBrowserCloseNode is called whenever a file picker is closed (whether by cancellation or not), usually twice, on different threads. The same thing happens in Safari -- but I think we can count on Apple having made sure that Safari doesn't crash.
https://crash-analysis.mozilla.com/chofmann/20110724/top-mac-crashes-all.txt indicates this is the top crash on Mac 10.7 at the moment.
This has increased in volume since the Lion release - over 1660 in a week across all versions.
Steven, we're coming up fast on the final beta. Any chance you've got more insight here?
I'm just back from vacation, and have started working again on this bug. It'll probably be my #1 priority until I can figure out something to do about it. That said, I can't make any progress until I can reproduce this bug, and none of the comments on this crash at crash-stats.mozilla.com are really any help. So this is *not* going to be an easy bug to fix or work around. I expect to spend the rest of this week digging around for any clues I can find. And it'll probably be several weeks before we have anything approaching a fix.
Just FYI, we're having a similar bug on Chromium: http://crbug.com/90716 Unfortunately, we're having a similar lack of success in reproducing it.
Thanks, Scott. While you're here, I'll grasp at a straw :-) I'm pretty sure Safari uses Objective-C 2.0 garbage collection. Mozilla browsers don't, and this seems to be a memory-management issue (i.e. the crashes seem to be the result of accessing a deleted object). Does Chrome use Objective-C 2.0 garbage collection?
Chrome uses manual garbage collection.
> Chrome uses manual garbage collection.
You mean "retain" and "release"? Then it's like Mozilla browsers.
Apologies, yes, -retain, -release, -autorelease, and some C++ scoper templates to automate it in many cases. I agree with you that it sounds like the background thread is hitting something that was freed by something else (the save panel?).
Given that both Firefox and Chrome see this, I have logged Bug ID# 9889921 with Apple. I'll update if they tell me anything useful.
We're not going to track this specifically for 6. If there is a workaround we'd be interested in taking it. I'll poke Apple; maybe we can get it fixed in a point release, as it is affecting both Firefox and Chrome (and is Firefox's #1 crasher on 10.7 I believe).
https://crash-analysis.mozilla.com/chofmann/20110802/top-mac-crashes-all.txt indicates this is the top crash on 10.7 at the moment.
> (and is Firefox's #1 crasher on 10.7 I believe)
This is Firefox's #1 crasher *on OS X* (all versions), with almost twice as many crashes as the next one in the list. https://crash-stats.mozilla.com/query/query?product=Firefox&version=ALL%3AALL&platform=mac&range_value=1&range_unit=weeks&date=08%2F03%2F2011+12%3A06%3A43&query_search=signature&query_type=contains&query=&reason=&build_id=&process_type=any&hang_type=any&do_query=1
Steven: Not sure what it means, but the correlation report for this stack shows:
  441% (150/34) vs. 143% (1174/820) cl_kernels
  438% (315/72) vs. 143% (1174/820) cl_kernels
Not sure if it will provide any value but thought I would add it.
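One thing that can be checked from the numbers alone: each percentage in the correlation report appears to be simply the ratio in the parentheses after it, which is why values above 100% show up. A quick Python check follows; as_percent is a hypothetical helper name, and what the two counts in each ratio actually represent is not stated in the report, so no interpretation is assumed here.

```python
# (reported percentage, numerator, denominator), copied from the report above.
REPORTED = [
    (441, 150, 34),
    (438, 315, 72),
    (143, 1174, 820),
]


def as_percent(numerator, denominator):
    """Ratio expressed as a whole-number percentage, rounding halves up."""
    return int(100 * numerator / denominator + 0.5)


if __name__ == "__main__":
    for reported, num, den in REPORTED:
        print(f"{num}/{den} -> {as_percent(num, den)}% (report says {reported}%)")
```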
I don't know what "cl_kernels" means, so I can't say :-)
AFAICT, every Chrome crash from 10.7 contains the cl_kernels module. I suspect it's an OS implementation detail.
BTW, per a comment on the Chrome bug I mentioned above, I can repro this class of crash by mounting a network volume, then setting the open panel to the column mode, and repeatedly doing Command-O ESC to open and close the panel until a crash.
Thanks, Scott, for the info. I, too, have STR for this bug's crash. But it's very lame (complicated and with a low rate of "success") -- which is why I haven't mentioned it earlier. I've also made some progress. But I've had much less time to spend on this bug than I expected, and I'd prefer not to say much until I have more and better data. Suffice it to say that I'm currently hoping this is a simple Apple reference-counting bug (like bug 396680), and that I can work around it in the same general way. Here's my STR, such as it is:
  1) Run Firefox in gdb.
  2) In gdb, set a breakpoint on NWBrowserCloseNode.
  3) Do something to open a file picker (e.g. File : Open), then close the file picker (by pressing ESC, clicking OK or clicking Cancel).
  4) gdb will break on NWBrowserCloseNode on a secondary thread. While this thread is "frozen" (while gdb is waiting for a "continue" command), move the mouse around above the browser window and wait 30 seconds. Then enter "continue" at the gdb prompt.
  5) gdb will break again on NWBrowserCloseNode, on a different secondary thread. Wait for another 30 seconds and enter "continue" at the gdb prompt.
  6) At this point there's a 10%-20% chance that you'll get this bug's crash. If not, enter "continue" again at the gdb prompt and keep trying.
It may help to first close the current browser window and open another one. It may help to ssh in from another computer and use gdb to attach to a running firefox-bin process.
I hit this crash again yesterday when I had the file picker open in FF 6, but as Steven notes, it is quite difficult to reliably reproduce the issue. Sigh.
Apple's response to my bug report indicates that it is known and being investigated. I take this to mean that it's on their end. Traditionally I've considered their bug tracker to be write-only, so I doubt I'll see more.
Greetings, I've been experiencing this bug repeatedly on OS X 10.7 with FF 6.0, and have seen it before on previous versions of FF and OS X 10.6, but the frequency has increased dramatically in 10.7. I've been monitoring this thread, and it sounds like you're getting close. If it's any help, in an evening of saving PDF copies of medical literature from the uptodate.com site, I can usually count on it happening every 2-3 saves.
This bug isn't fixed in OS X 10.7.1, but the signature is slightly different: https://crash-stats.mozilla.com/report/list?product=Firefox&platform=mac&query_search=signature&query_type=contains&reason_type=contains&date=08%2F22%2F2011%2008%3A11%3A26&range_value=1&range_unit=weeks&hang_type=any&process_type=any&do_query=1&signature=OSServices%400x56c9f
Isn't fixed, or is fixed? The "isn't fixed... but signature is slightly different" didn't parse.
It isn't fixed in OS X 10.7.1. But the crashes happen at a different address in OS X 10.7.1 -- in other words, "the signature is slightly different".
Steven: Now that we have symbols we are getting this crash stack - [@ NetworkBrowser::closeNode(__NWNode*) ]. All 28 comments mention uploaded so I assume this stack is the same bug as this one.
> Now that we have symbols
Excellent news! When did that start? And where is it being tracked (which bug)?
> we are getting this crash stack - [@
> NetworkBrowser::closeNode(__NWNode*) ]. All 28 comments mention
> uploaded so I assume this stack is the same bug as this one.
Yes, I'm sure it is.
I've made some progress on this bug. In particular I found a way to make my STR from comment #36 *much* more effective. For a while I even thought I had a fix ... but my new STR shows it doesn't work. Interestingly, my new STR also "works" in Safari, which shows that two of my previous hunches are incorrect:
  1) It's not true that this bug doesn't affect Safari.
  2) It's not true that this bug has anything to do with Objective-C garbage collection (which I'm pretty sure Safari uses, but which I know Firefox and Chrome don't use).
Here's my "new" STR:
  1) Run Firefox in gdb, or run Firefox and use gdb to attach to the browser process.
  2) In gdb, set a breakpoint at NWBrowserCloseNode.
  3) Do something to open a file picker (e.g. File : Open), then close the file picker (by pressing ESC, clicking OK or clicking Cancel).
  4) As mentioned in comment #17, NWBrowserCloseNode is now usually called twice, each time on a different thread. On the second call to NWBrowserCloseNode:
     a) If need be bring the browser to the foreground.
     b) Click on the browser window's close button (even if the cursor has already turned into the spinning wait cursor).
     c) Press Command-o to (later) reopen the file picker.
  5) In gdb (at the second call to NWBrowserCloseNode) enter "continue". The browser window should close and then the file picker should reopen.
  6) Mess around in the file picker, clicking on things and changing the window size.
  7) Close the file picker (by clicking Cancel or pressing ESC). The browser will now sometimes crash immediately. But more likely gdb will break again on NWBrowserCloseNode. Enter "continue" (at the gdb prompt) whenever this happens.
  8) If you don't crash, click on the browser's Dock icon to display a new window and start over at step 3. Sometimes clicking on the Dock icon will itself trigger a crash.
This STR doesn't "work" (doesn't cause crashes) on OS X 10.6.8, where NWBrowserCloseNode is only called once when a file picker is closed.
[@ objc_msgSend | __PRETTY_FUNCTION__.174748 ] is another signature showing up now that we have symbols, and all the comments mention uploading or downloading something, so I am assuming it is this bug. Adding the signature to the crash field.
(In reply to comment #35)
> BTW, per a comment on the Chrome bug I mentioned above, I can repro
> this class of crash by mounting a network volume, then setting the
> open panel to the column mode, and repeatedly doing Command-O ESC to
> open and close the panel until a crash.
This STR doesn't work for me at all in Firefox -- I never crash.
I think nsCOMPtr_base::assign_with_AddRef | nsTimerImpl::nsTimerImpl is related as well. It shows up as a small-volume, 10.7-only crash: https://crash-stats.mozilla.com/report/list?signature=nsCOMPtr_base::assign_with_AddRef%20|%20nsTimerImpl::nsTimerImpl One comment mentions uploaded and there are a few URLs which involve uploading.
Did we file a radar on this yet?
> Did we file a radar on this yet?
We haven't. And there really isn't much to report. But Google did (see comment #28).
Ah, thanks. Knew I saw it somewhere. I'll poke Apple.
Good news! This bug appears to be fixed in Apple's current developer preview of OS X 10.7.2 (build 11C37). So it should presumably be fixed in the 10.7.2 release. I tested with my STR from comment #46 on current nightlies. I'm no longer able to make the browser crash. But even more encouraging (and telling) is that NWBrowserCloseNode() is no longer called twice every time you dismiss a file picker (as mentioned in comment #17 and comment #46) -- just once, as on OS X 10.6.X. I was very glad to find this out, because the only workarounds I've been able to discover are quite leaky.
I suspect this bug was caused by accessing an object that had already been deleted. NWBrowserCloseNode() appears to take two parameters. So its definition should probably be:
  NWBrowserCloseNode(__NWBrowser*, __NWNode*);
I never saw it called twice with exactly the same parameters, even just before a crash. But the fact that it's only called once on OS X 10.6.X and 10.7.2 seems to indicate that there was something wrong with its being called twice.
What follows is information I dug up while trying to find a workaround for this bug. It's for my own future reference, or perhaps for those who want to debug other problems with the Mac file picker.
The key to this bug is two methods of the FIFinderViewGutsController class in FinderKit framework in /System/Library/PrivateFrameworks:
  +[FIFinderViewGutsController initializeCounted]
  +[FIFinderViewGutsController finalizeCounted]
The first call to 'initializeCounted' seems to initialize a number of system resources used by the file picker. Subsequent calls increment a reference count for these resources. Calls to 'finalizeCounted' decrement this reference count, and the last call tears down all the resources. Each call to 'initializeCounted' needs to be matched by a call to 'finalizeCounted'.
There's also a FIFinderViewGutsController object, with instance variables and methods, which holds references to a number of objects used by the file picker. Its lifetime roughly matches that of the file picker, and 'finalizeCounted' is called from the destructors of many of the objects it holds references to.
As I mentioned the last call to +[FIFinderViewGutsController finalizeCounted] tears down many of the resources used by the file picker.
But (on OS X 10.7 and 10.7.1) two of the (static) methods it calls initiate calls to NWBrowserCloseNode() (on different threads, via calls on the main thread to "TNode::PostNodeTaskRequest(TCountedPtr<TNodeTask> const&) const" and TVolumeSyncThread::PostNodeTaskRequest(TNodeTask*)): TSidebarController::Finalize() TFENode::Finalize() In TFENode::Finalize(), the call to NWBrowserCloseNode() is triggered from a call to NodeContextClose(). On OS X 10.7.2 only TSidebarController::Finalize() triggers a call to NWBrowserCloseNode(). By the way, my workarounds involved leaking the "last" object to which the FIFinderViewGutsController holds a reference (a FILocationPopUp object), and thereby preventing the "last" call to +[FIFinderViewGutsController finalizeCounted] (called from -[FILocationPopUp dealloc]). This in turn leaks all the resources normally torn down by the last call to finalizeCounted.
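The counted initialize/finalize pattern described above, and the reason the leak-based workaround prevents the teardown, can be modeled in a few lines of Python. This is a toy sketch under stated assumptions, not Apple's code: CountedResources, run_picker and the "alive" flag are hypothetical names, and the real FinderKit resources are reduced to a single boolean.

```python
class CountedResources:
    """Toy model of a counted initialize/finalize pair."""

    def __init__(self):
        self.count = 0       # outstanding initialize_counted() calls
        self.alive = False   # True while the shared resources exist

    def initialize_counted(self):
        if self.count == 0:
            self.alive = True    # first call sets the resources up
        self.count += 1

    def finalize_counted(self):
        if self.count == 0:
            raise RuntimeError("finalize_counted() without matching initialize_counted()")
        self.count -= 1
        if self.count == 0:
            self.alive = False   # last call tears everything down


def run_picker(holders, leak_last):
    """Simulate `holders` objects each taking and then dropping a count.

    With leak_last=True the final holder is never released, which models
    the workaround of leaking the last object so teardown never runs.
    """
    res = CountedResources()
    for _ in range(holders):
        res.initialize_counted()
    releases = holders - 1 if leak_last else holders
    for _ in range(releases):
        res.finalize_counted()
    return res.count, res.alive


if __name__ == "__main__":
    print(run_picker(3, leak_last=False))  # balanced: resources torn down
    print(run_picker(3, leak_last=True))   # leaked holder: resources stay alive
```

The second case shows the cost of the workaround: the count never returns to zero, so everything the last finalize would have released stays allocated.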
> I suspect this bug was caused by accessing an object that had already
> been deleted.
I did wonder about that. I tried to repro on 10.7.1 using Valgrind but didn't see any use-after-free reports.
>> I suspect this bug was caused by accessing an object that had
>> already been deleted.
>
> I did wonder about that. I tried to repro on 10.7.1 using Valgrind
> but didn't see any use-after-free reports.
Or maybe most of the crashes are simple null-dereferences (that's certainly what they look like on the face of it). But I'm pretty sure the larger story is contention (between threads) to tear down a single resource (probably a network resource, from the method names).
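The contention hypothesis above (two threads both trying to tear down one resource) can be illustrated with a small Python model. It is only a sketch of the general hazard and of one common guard against it, assuming nothing about how OSServices is actually implemented: Node, close and close_from_two_threads are hypothetical names.

```python
import threading


class Node:
    """Stand-in for a shared resource that two threads both try to close."""

    def __init__(self):
        self._lock = threading.Lock()
        self._handle = object()   # placeholder for the underlying resource
        self.teardowns = 0        # how many times teardown really ran

    def close(self):
        """Tear down once; later callers see None and back off."""
        with self._lock:
            if self._handle is None:
                return False      # already closed by another thread
            self._handle = None
            self.teardowns += 1
            return True


def close_from_two_threads():
    node = Node()
    results = []

    def worker():
        results.append(node.close())

    threads = [threading.Thread(target=worker) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return node.teardowns, sorted(results)


if __name__ == "__main__":
    print(close_from_two_threads())
```

Without the lock and the None check, the second caller would operate on a handle the first caller had already released, which would be consistent with both the null-dereference look of the crashes and the deleted-object suspicion.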
There's one crash in Mac OS X 10.7.2 11C26: bp-fd8018fe-ecf7-4186-aa6f-ca0062110930
(In reply to Scoobidiver from comment #60)
> There's one crash in Mac OS X 10.7.2 11C26:
> bp-fd8018fe-ecf7-4186-aa6f-ca0062110930
I've seen this crash happening to Chrome in 11C26, but not on later builds of 10.7.2. Other possibly-related crashes continued to later builds, but seem resolved as of build 11C62. Knock on wood!
OS X 10.7.2 has been released, and is available via Software Update or at http://support.apple.com/kb/DL1459. I'm downloading it now and will test it later.
Different file picker crashes seem to still be happening on OS X 10.7.2. See bug 695524.
I no longer see occurrences of those crashes on Mac OS X 10.7.2 build 11C62 in the past four weeks.