Closed Bug 738640 Opened 9 years ago Closed 9 years ago
.7] crash in -[Base Window dealloc], caused by bug in Robo Form extension
It's #8 top crasher in 12.0b1 on Mac OS X. It first appeared in 12.0a2/20120206 but is discontinuous across builds. Signature AppKit@0xd7851 More Reports Search UUID 5f6b4db5-606b-4649-9466-07ed12120323 Date Processed 2012-03-23 12:02:06 Uptime 4074 Last Crash 1.5 days before submission Install Age 10.6 hours since version was first installed. Install Time 2012-03-23 01:09:33 Product Firefox Version 12.0 Build ID 20120321033733 Release Channel beta OS Mac OS X OS Version 10.7.3 11D50 Build Architecture amd64 Build Architecture Info family 6 model 37 stepping 2 Crash Reason EXC_BAD_ACCESS / KERN_INVALID_ADDRESS Crash Address 0x9d02867 App Notes AdapterVendorID: 0x10de, AdapterDeviceID: 0x a29GL Context? GL Context+ GL Layers? GL Layers+ EMCheckCompatibility True Frame Module Signature Source 0 @0x109d02867 1 AppKit AppKit@0xd7851 2 AppKit AppKit@0xd0abd 3 AppKit AppKit@0xd080c 4 libmozglue.dylib je_calloc _string.h:80 5 libobjc.A.dylib libobjc.A.dylib@0x5b0a 6 libobjc.A.dylib libobjc.A.dylib@0x5fe2 7 libobjc.A.dylib libobjc.A.dylib@0x5b0a 8 libobjc.A.dylib libobjc.A.dylib@0xf507 9 libobjc.A.dylib libobjc.A.dylib@0xbe79 10 CoreFoundation CoreFoundation@0x71357 11 AppKit AppKit@0x11c19a 12 libsystem_c.dylib libsystem_c.dylib@0x4d46f 13 libsystem_c.dylib libsystem_c.dylib@0x4d46f 14 libsystem_c.dylib libsystem_c.dylib@0x4d6aa 15 CoreFoundation CoreFoundation@0x93a83 16 AppKit AppKit@0x472e0 17 AppKit AppKit@0x4d86f 18 AppKit AppKit@0x241b56 19 XUL -[BaseWindow dealloc] widget/cocoa/nsCocoaWindow.mm:2073 20 AppKit AppKit@0x42144 21 libobjc.A.dylib libobjc.A.dylib@0xed52 22 CoreFoundation CoreFoundation@0x31275 23 libobjc.A.dylib libobjc.A.dylib@0xf03b 24 CoreFoundation CoreFoundation@0x16d73b 25 CoreFoundation CoreFoundation@0x389a0 26 CoreFoundation CoreFoundation@0xf7e2 27 libsystem_c.dylib libsystem_c.dylib@0x4e371 28 libsystem_c.dylib libsystem_c.dylib@0x6a710 29 libsystem_c.dylib libsystem_c.dylib@0x4d46f 30 libsystem_c.dylib libsystem_c.dylib@0x4d6aa 31 AppKit AppKit@0x982c8a 32 AppKit AppKit@0x982c8a 33 libobjc.A.dylib libobjc.A.dylib@0xcd08 34 AppKit AppKit@0x982c8a 35 libobjc.A.dylib libobjc.A.dylib@0xce75 36 libobjc.A.dylib libobjc.A.dylib@0xc417 37 libsystem_c.dylib libsystem_c.dylib@0x4d46f 38 libsystem_c.dylib libsystem_c.dylib@0x4d6aa 39 CoreFoundation CoreFoundation@0x182cb3 40 CoreFoundation CoreFoundation@0x182cb3 41 libmozglue.dylib je_free memory/jemalloc/jemalloc.c:1662 42 XUL nsVoidArray::operator= obj-firefox/x86_64/xpcom/build/nsVoidArray.cpp:366 43 @0x100200043 44 libmozglue.dylib je_free memory/jemalloc/jemalloc.c:1662 45 @0x7fff5fbff17f 46 CoreFoundation CoreFoundation@0x31b04 47 AppKit AppKit@0x888adf 48 Foundation Foundation@0x951c 49 XUL ScopedXPCOMStartup::~ScopedXPCOMStartup toolkit/xre/nsAppRunner.cpp:1113 50 XUL XRE_main toolkit/xre/nsAppRunner.cpp:1115 51 firefox main browser/app/nsBrowserApp.cpp:205 52 firefox firefox@0x1433 More reports at: https://crash-stats.mozilla.com/report/list?signature=AppKit%400xd7851
More reports with a better stack also at: https://crash-stats.mozilla.com/report/list?signature=libobjc.A.dylib%400xb350
Crash Signature: [@ AppKit@0xd7851] → [@ AppKit@0xd7851] [@ libobjc.A.dylib@0xb350]
I have been able to reproduce this crash once, albeit with a slight different stack using 12.0b2: https://crash-stats.mozilla.com/report/index/bp-ae3793c9-15ca-4d8d-b17c-a51ba2120323 STR: 1. Go to https://cloudmagic.com and create an account. (Note: You may be able to skip this and just try https://addons.mozilla.org/en-US/firefox/addon/cloudmagic/eula/147475?src=dp-btn-primary but I did not follow that path) 2. After account has been established, click the button to install the Firefox extension (it will be at the top of the page) 3. I forced the extension to install even though it was not compatible. 4. Restart Firefox I crashed after restarting. I had some other extensions installed as well, but cloudmagic was found among the extensions in the crash reports. I will see if I can reproduce after lunch with a fresh profile and if so will add the keyword.
It seems the key to reproducing this bug actually involves more than one extension - here is my extension set: CloudMagic - Exchange, Gmail and Twitter Search 1.1.6 true cloudmagic@cloudmagic RoboForm "1-00-002" true firstname.lastname@example.org Test Pilot 1.2.1 true email@example.com So in addition to install Cloudmagic as described above, you should also: 1. Install http://www.roboform.com/platforms/browsers/safari. 2. Once the browser restarts, allow the installation of the Roboform extension which is designated as "1-00-002" 3. At this point you can then install cloud magic, or uninstall if it is already installed. 4. After installing or uninstalling the cloud magic addon, I get this crash - https://crash-stats.mozilla.com/report/index/bp-92f31cc6-9607-4744-b185-6c80c2120323
Crash Signature: [@ AppKit@0xd7851] [@ libobjc.A.dylib@0xb350] → [@ AppKit@0xd7851] [@ AppKit@0xd6799] [@ libobjc.A.dylib@0xb350]
Summary: [10.7] crash in -[BaseWindow dealloc] @ AppKit@0xd7851 → [10.7] crash in -[BaseWindow dealloc]
I'm unable to reproduce Marcia's STR. I tested with today's mozilla-central nightly on OS X 10.7.3. Here's what I did: 1) Once I'd downloaded roboform-mac.dmg from http://www.roboform.com/platforms/browsers/safari to my Desktop, I copied the RoboForm app to my Desktop and ran it. I didn't create an account or activate RoboForm. 2) Started today's mozilla-central nightly. I was prompted to install the RoboForm "1-00-02" extension, which I allowed. Then I clicked the "Restart Nightly" button. 3) Installed the CloudMagic extension without creating an account, by visiting https://addons.mozilla.org/en-US/firefox/addon/cloudmagic/eula/147475?src=dp-btn-primary and clicking the "Restart Now" button. My other extensions are Test Pilot 1.2.1 and PDF Viewer 0.2.414. I've left the PDF Viewer disabled. Marcia: Did you restart right after installing the RoboForm extension? Can you consistently reproduce the crash just by installing or uninstalling the CloudMagic extension, then restarting? (If you have the other extensions installed that you listed.)
RoboForm is a pain to uninstall completely. Here are the files/directories created by RoboForm that I've been able to find, which I've been deleting: ~/.goodsync ~/.roboform ~/Documents/Logins/Contact Info.rfp ~/Documents/Logins/cache.rfo ~/Documents/Logins/mru.rfo ~/Documents/Logins/options.rfo ~/Library/Application Support/RoboForm ~/Library/Caches/com.SilberSystems.RoboForm ~/Library/Internet Plug-Ins/rf-firefox-plugin.plugin ~/Library/Internet Plug-Ins/rf-safari-plugin.webplugin ~/Library/Preferences/com.SiberSystems.RoboForm.plist ~/Library/Preferences/com.SiberSystems.RoboForm.plist.lockfile ~/Library/Preferences/com.SiberSystems.RoboFormIcon.plist ~/Library/Preferences/com.SiberSystems.RoboFormIcon.plist.lockfile ~/Library/Saved Application State/com.SiberSystems.RoboForm.savedState
(Following up comment #5) Another thing about the RoboForm app -- it seems to be impossible to quit :-( So besides deleting the above-mentioned files and directories, you also need to log out or restart.
Interestingly, I *can* reproduce with FF 12.0b2 (the latest beta), though I can't with today's Aurora nightly. I suspect this has something to do with the FF release branding. I'll do a custom trunk build with the release branding, and see what happens.
Steven: I have only tested this with 12.0b2, and not with nightly. And with that build I can reproduce consistently either installing or uninstalling Cloud Magic. (In reply to Steven Michaud from comment #7) > Interestingly, I *can* reproduce with FF 12.0b2 (the latest beta), though I > can't with today's Aurora nightly. > > I suspect this has something to do with the FF release branding. I'll do a > custom trunk build with the release branding, and see what happens.
> I'll do a custom trunk build with the release branding, and see what > happens. This wasn't enough to trigger the crashes. Neither was forcing the version number to "12.0b2". So this bug may be "fixed" in current code. Next I'll try reproducing it in code corresponding exactly to the (real) 12.0b2 release.
Here's a gdb stack trace of this crash, obtained using a custom (non-opt, non-debug) build made from current beta-branch code. If I force the version number to "12.0", I can reproduce these crashes on the beta branch as far back as I've looked (to the 10 branch). So (for the moment at least) I've given up trying to find a regression range. Despite all the tricks I've played, I haven't been able to reproduce these crashes on the trunk or the aurora branch. But I suspect this is just an accident -- that the bug (whatever it is) also effects those branches.
Crash Signature: [@ AppKit@0xd7851] [@ AppKit@0xd6799] [@ libobjc.A.dylib@0xb350] → [@ AppKit@0xd7851] [@ AppKit@0xd6799] [@ libobjc.A.dylib@0xb350] [@ -[NSView(NSInternal) _uninstallTrackingArea:]]
Steven, do we have any ideas about a fix here? The combined signature gives us about 50 crashes a week on Mac which puts it in the top 5.
(In reply to comment #12) I've been working on it for the past few days, and have made some progress -- for example I've discovered that this is at least partly a RoboForm bug. But I haven't yet finished my analysis. What's good is that I can reproduce the bug consistently, and it appears to be in Objective-C code (which is relatively easy to debug). So it may take me another few days, but I'm reasonably certain I'll be able to find a fix/workaround for the bug (or bugs) here.
Here's a patch that fixes these crashes in my tests. Turns out I'd made a mistake translating from my debugging patch to the final patch. So no, I didn't encounter a third kind of crash that also needs fixing (it was one of the two I'd previously discovered). To recap, when the RoboForm extension unloads the RoboForm plugin (as the browser quits), it leaves a bunch of dangling pointers. These cause two different kinds of crashes, which my patch works around: 1) A crash accessing the IMP object for one of the methods that the RoboForm plugin swizzles (-[NSWindow isKeyWindow]). The stack in attachment 610165 [details], though incomplete at the top, represents this kind of crash. I work around this (as the browser quits) by unswizzling all the methods that the RoboForm plugin has swizzled. (More in a later comment about how I found out which methods these are.) I might have gotten away with only unswizzling the -[NSWindow isKeyWindow] method. But it's probably safer to do them all, and this isn't difficult. 2) A crash accessing the Class object for one of the custom classes created by the RoboForm plugin -- RFChildWindow. I work around this (as the browser quits) by detaching any RFChildWindow objects still in NSApplication's window list. The RoboForm plugin also creates many other kinds of custom classes, and instances thereof. But it only creates one RFChildWindow, which it makes a child of one of the browser windows. And (as best I can tell) it doesn't attach any of its other custom objects directly to a non-RoboForm native object. If I'm right about this our problem is solved. But if I'm wrong we'll keep seeing crashes on quit that are 100% correlated with the RoboForm extension. We'll need to keep our eyes open for this. Note that the RoboForm plugin (rf-firefox-plugin) *won't* appear in the Breakpad list of modules, since it's been unloaded by the time any of these crashes happen. I'm attempting to do a beta-branch tryserver build of my patch. If that works out, it should be available in a few hours.
Here's how I found out which methods the RoboForm plugin swizzles: First I ran class-dump and "nm -pam" on the plugin (rf-firefox-plugin in ~/Library/Internet Plug-Ins/). class-dump gives you header information on all the plugin's custom Objective-C classes. "nm -pam" gives you all the plugin's symbols (exported or not). One of the symbols is: __Z7swizzleP10objc_classP13objc_selectorS1_ c++filt translates this as follows: swizzle(objc_class*, objc_selector*, objc_selector) You can break on this in gdb. The first parameter is the Class object, one of whose methods you want to swizzle. The second is the selector (SEL object) for the method you want to swizzle. The third is the selector for the method you want to replace it with. Both the second and the third parameters can be treated as SEL objects (as pointers to objc_selector) -- for example you can call NSStringFromSelector() on both of them.
(Following up comment #15) > The RoboForm plugin also creates many other kinds of custom > classes, and instances thereof. But it only creates one > RFChildWindow, which it makes a child of one of the browser > windows. And (as best I can tell) it doesn't attach any of its > other custom objects directly to a non-RoboForm native object. Oops, this is partially incorrect. The RoboForm plugin creates one RFChildWindow object for every browser window (every ToolbarWindow object), and attaches it to the browser window as a child window (using -[NSWindow addChildWindow:]).
I just realized there's a much simpler way to fix this bug -- just do an extra call to dlopen() on the plugin! PR_UnloadLibrary() just calls dlclose(). But dlclose() uses reference counting (it only unloads a library when its reference count has reached zero). So an extra call to dlopen() will increase the reference count, and prevent it from reaching zero. It makes no sense for the RoboForm extension to unload its plugin as the browser quits -- the plugin will get unloaded very soon anyway. New patch coming up.
Attachment #612193 - Attachment description: Patch (works around dangling pointers) → Patch (work around dangling pointers)
Including dmandelin to see if he can help out with the JS questions above. Have we figured out the regressing bug in FF12 to consider backing out ahead of our next beta?
> Including dmandelin to see if he can help out with the JS questions > above. Thanks. > Have we figured out the regressing bug in FF12 to consider backing > out ahead of our next beta? There isn't one. This is entirely a RoboForm bug. I still don't know why it's only reproducible on the beta branch, with the FF version forced to 12. But it's possible that FF's versioning changes RoboForm's behavior. And I suspect the problem only started with the most recent version of the RoboForm extension (0-9-90, released on 2011-12-29).
Here's a tryserver build made with my original patch (attachment 612193 [details] [diff] [review] from comment #15): http://firstname.lastname@example.org/try-macosx64/firefox-12.0.en-US.mac.dmg In some ways it's not as good as my second patch (attachment 612365 [details] [diff] [review] from comment #20). But we may end up having to fall back to it. So please test it out!
Note that the tryserver build linked in comment #23 is a beta-branch build. I've started another one for my alternate patch, from comment #20.
Summary: [10.7] crash in -[BaseWindow dealloc] → [10.7] crash in -[BaseWindow dealloc], caused by bug in RoboForm extension
RoboForm does have a (well-hidden) way to report bugs (http://www.roboform.com/support/preticket_faq, bottom of the page). But it'd be nice to find someone from RoboForm we can CC on this bug. Can anyone help with this? Thanks in advance!
Here's a tryserver build (beta-branch) made from my alternate patch (attachment 612365 [details] [diff] [review]) from comment #20: http://email@example.com/try-macosx64/firefox-12.0.en-US.mac.dmg
As far as I know, the recommended way to deal with crashes caused by extensions is to report the problem to the extension author, and then optionally blacklist unfixed versions of the extension either after a fixed version has been released or after a grace period has run out. We really shouldn't need to add workarounds for specific extension to our code.
CC'ing two Roboform contacts (found in bug 698018). @Vadim: Description of the Roboform bug is in comment 14 and comment 15, suggested fix is to not unload the library on shutdown.
(Comment #28) Yes, of course. But I know that we often *do* include workarounds for OS bugs and plugin bugs. The reason (of course) is that it usually takes them a lot longer to fix the problem than it does for us to work around it.
And this bug is a lot more serious than bug 698018 -- it's a topcrasher. We can't afford to wait a long time to get it fixed. In fact we can't afford to let the effects of this bug get into a release (FF 12).
> @Vadim: Description of the Roboform bug is in comment 14 and comment 15 Also see comment 19 and comment 20.
To the RoboForm people: I particularly want to know if my "alternate patch" (attachment 612365 [details] [diff] [review], comment #20) might open a security hole in RoboForm. Or to put it another way, will it be OK (securitywise) to stop the rf-firefox-plugin module from being unloaded until all the other Firefox modules are unloaded (by the OS, as the Firefox process dies).
The version of RoboForm that I've been testing with is 1.0.2. But the current version is 1.0.7, so I've now started testing with that version, too. It still has the bug, and the symptoms are exactly the same. I've also found out that the bug doesn't happen (and RoboForm extension doesn't load its binary plugin) if you disable the RoboForm extension.
so this is RoboForm for Mac, not RoboForm for Windows. will send to developers.
> so this is RoboForm for Mac, not RoboForm for Windows. Correct. > will send to developers. Thanks!
(In reply to Steven Michaud from comment #22) > There isn't one. This is entirely a RoboForm bug. > > I still don't know why it's only reproducible on the beta branch, with > the FF version forced to 12. But it's possible that FF's versioning > changes RoboForm's behavior. And I suspect the problem only started > with the most recent version of the RoboForm extension (0-9-90, > released on 2011-12-29). Does the issue reproduce if the Roboform extension is disabled? If not, then we have a Plan B of blocklisting affected Roboform versions.
> Does the issue reproduce if the Roboform extension is disabled? No (see comment #34). > If not, then we have a Plan B of blocklisting affected Roboform versions. Correct.
so. I easily reproduce the problem on ff12b4. I will came up with the solution on monday. However nightly build work fine.
we did investigation and find out: 1. In ff12 during window initialization you recreate roboform toolbar view (NSView) twice. In nightly build you do not do this. 2. This view double creation did broke some of our logic and this crash is just consequences of double creation. So there is no sense to fix this crash directly. 3. So actually you can create and recreate NSView in toolbar as many as you want so we fixed the problem in our code assuming you may do this. On April 11 I will release new rf 1.1 with fixes included.
> you recreate roboform toolbar view (NSView) twice "You" isn't Firefox. Is it RoboForm? Is it the OS? Is the "roboform toolbar view" a subclass of NSView? If so which one? > On April 11 I will release new rf 1.1 with fixes included. I will try it as soon as it becomes available. Please let us know here when that happens.
> On April 11 I will release new rf 1.1 with fixes included. How's the RF 1.1 release going, Vlad? The most recent version at http://www.roboform.com/download is still 1.0.7.
Alex, a decision point for this bug is coming up soon -- the code freeze for FF 12 is tomorrow (2012-04-13). As I see it, our current choices are: 1) Take one of my patches on the beta branch. 2) Blocklist all Mac versions of RoboForm, including the current one. Hopefully our range of choices will broaden once RF 1.1 comes out. But I'll need to first check it thoroughly in my debugging-patched version of FF.
Release will come out later today
Comment on attachment 612193 [details] [diff] [review] Patch (work around dangling pointers) With the FF 12 code freeze tomorrow, things are getting down to the wire. Markus, I know you don't like working around extension bugs in FF code. But (as I've already mentioned) we've done this many times for plugin and OS bugs, so it's a legitimate option. And RoboForm does have a large number of users. Let's hope RF 1.1 fixes the bug. But in case it doesn't, I'd like to have both of these patches ready to go into the tree at a moment's notice. If you don't have time for the reviews, or if you'd rather not do them, please let me know as soon as possible. I'll find someone else.
> Release will come out later today Thanks! Please comment here as soon as it does. And also let us know if we need to get it somewhere else than http://www.roboform.com/download.
As for the issue I raise in comment #33 (and elsewhere in this bug), nobody's addressed it here, and the one person I asked about it offline hasn't gotten back to me. If I had to decide now which of my patches to choose, based on the information I currently have, I'd choose the "alternate" one (the one that prevents the RoboForm plugin from being unloaded "prematurely"). I think the chances that doing this would open a security hole in RoboForm are very small, so I'd be willing to take the risk. I'm quite sure that you need to do more than explicitly unload rf-firefox-plugin to prevent it from leaking sensitive information.
so we just release the fixed version. http://www.roboform.com/dist/roboform-mac.dmg
RoboForm Ver 1.1.1 has been released. It fixes this problem.
I downloaded RoboForm 1.1.1 from http://www.roboform.com/download and have been testing with it. I can confirm that it does fix this bug. Apparently the fix has been accomplished by no longer explicitly unloading the pr-firefox-plugin plugin from the extension. Now it's (safely) unloaded as all other browser modules are unloaded, under mozJSComponentLoader::UnloadModules(), called (indirectly) from the ScopedXPCCOMStartup destructor.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → WORKSFORME
We'll still need to blocklist all earlier versions of the Mac RoboForm plugin -- all those before version 1.1.1.
> Now it's (safely) unloaded as all other browser modules are > unloaded, under mozJSComponentLoader::UnloadModules(), > called (indirectly) from the ScopedXPCCOMStartup destructor. I should have said "Now it's (safely) unloaded as all other JS components are unloaded ...". Most browser modules are unloaded by the OS.
Thanks, Vlad and Vadim, for your work on this.
(In reply to Steven Michaud from comment #51) > We'll still need to blocklist all earlier versions of the Mac RoboForm > plugin -- all those before version 1.1.1. We should look at the new version uptake an how that reflects on crash stats before deciding if this is necessary, and when if necessary.
(In reply to Jorge Villalobos [:jorgev] from comment #54) > (In reply to Steven Michaud from comment #51) > > We'll still need to blocklist all earlier versions of the Mac RoboForm > > plugin -- all those before version 1.1.1. > > We should look at the new version uptake an how that reflects on crash stats > before deciding if this is necessary, and when if necessary. We'll continue to track this bug for FF12, but we don't plan to take an in-product fix. We may still want to blocklist earlier versions.
What are the current stats for this crash? We need to determine if we want to blocklist the old version.
These seem to have fallen off to almost nothing on the 12 branch: https://crash-stats.mozilla.com/report/list?product=Firefox&version=Firefox%3A12.0&platform=mac&query_search=signature&query_type=contains&reason_type=contains&date=04%2F26%2F2012%2017%3A50%3A12&range_value=1&range_unit=weeks&hang_type=any&process_type=any&do_query=1&signature=libobjc.A.dylib%400xb350 https://crash-stats.mozilla.com/report/list?product=Firefox&version=Firefox%3A12.0&platform=mac&query_search=signature&query_type=contains&query=AppKit&reason_type=contains&date=04%2F26%2F2012%2018%3A01%3A46&range_value=1&range_unit=weeks&hang_type=any&process_type=any&do_query=1&signature=AppKit%400xd7851 I really wish we could search on Socorro by "-[BaseWindow dealloc]". But since that's never at the top of any of this bug's stacks, we can't -- which is the subject for another bug (soon to be opened).
It sounds like it's not worth blocklisting, but we can softblock if someone feels this is necessary.
You need to log in before you can comment on or make changes to this bug.