15.21 KB, patch
|Details | Diff | Splinter Review|
56.62 KB, text/plain
6.07 KB, patch
Josh Aas: review+
|Details | Diff | Splinter Review|
6.81 KB, patch
Josh Aas: review+
|Details | Diff | Splinter Review|
User-Agent: Opera/9.61 (Macintosh; PPC Mac OS X; U; en) Presto/2.1.1 Build Identifier: Mozilla/5.0 (Macintosh; U; PPC Mac OS X 10.5; en-US; rv:126.96.36.199) Gecko/2009060214 Firefox/3.0.11 Mac OS X has an accessibility API that allows other software to get information about and trigger actions for various UI elements on screen, in any application, including menus in the menu bar. When attempting to use this API to get information about Firefox's "History" menu before clicking and interacting with that menu normally, a crash will occur. Reproducible: Always Steps to Reproduce: 1. Download UIElementInspector.app from Apple. It can be found at the following URL: http://developer.apple.com/samplecode/UIElementInspector/index.html 2. Launch UIElementInspector.app 3. Launch Firefox.app 4. Be sure not to click and open any of Firefox's menus from this point on 5. Mouse over the "History" menu without clicking it, so that UIElementInspector shows UI information regarding that menu item 6. Press command-F7, the hotkey that locks UIElementInspector to the currently displayed UI element Actual Results: Firefox crashes Expected Results: UIElementInspector should have successfully retrieved more information about the History menu, and Firefox should have continued running without any interruption. There are other ways to reproduce this bug other than using UIElementInspector in case you have difficulty reproducing this crash. It was originally reported to me by a user of my software, xGestures, as xGestures makes use of this API to trigger menus in other software. Many users have had problems using Firefox in conjunction with xGestures because of this crash. Other accessibility-enabled applications can potentially trigger this crash as well.
Accessibility is not enabled by default on Mac OS X yet, because the API ssupport still has several problems, see bug 454202 as dependent ones, for example. The action you, Brian, are performing is therefore not supported by Firefox on the Mac, and as such kind of expected behavior. Closing this bug as invalid.
Marco, if firefox is crashed then bug can't be invalid. Possibly Brian has own build with --enable-accessibility option. Could we get stack trace?
Mac OS X's accessibility isn't on by default as a security measure, since it could potentially be used for controlling a computer entirely, not because it's buggy. The Accessibility API has been present in Mac OS X since 10.2. Furthermore, the aspect of the API that's being used when Firefox crashes is standard, and could be used by potentially any application, especially ones meant to aid disabled users. While I can't speak for other software that uses Mac OS X's accessibility API, I can attest that I've many complaints from many of my software's users regarding Firefox crashing ever since the release Firefox 3. I can say with a great deal of certainty that this crash is the result of a bug in Firefox, and shouldn't be dismissed so easily.
brian: we aren't talking about OS X's apis, we're talking about Mozilla's bridge to them. the people who work on the mozilla accessibility code have every right to dismiss your bug, since it can't possibly be their fault as their code isn't built into firefox's we ship to end users (and you oddly enough resemble an end user). now, if you don't want to be dismissed by everyone else, please follow the instructions outlined here: https://developer.mozilla.org/en/How_to_get_a_stacktrace_for_a_bug_report
Signature libobjc.A.dylib@0x15688 | imgRequestProxy::OnStopFrame(gfxIImageFrame*) UUID 8389e720-bde2-4377-943a-2daa82090622 Time 2009-06-22 01:18:06.240402 Uptime 60 Last Crash 1507309 seconds before submission Product Firefox Version 3.0.11 Build ID 2009060214 Branch 1.9 OS Mac OS X OS Version 10.5.7 9J61 CPU x86 CPU Info GenuineIntel family 6 model 7 stepping 6 Crash Reason EXC_BAD_ACCESS / KERN_INVALID_ADDRESS Crash Address 0x905aa688 User Comments Processor Notes Crashing Thread Frame Module Signature [Expand] Source 0 libobjc.A.dylib libobjc.A.dylib@0x15688 1 XUL imgRequestProxy::OnStopFrame mozilla/modules/libpr0n/src/imgRequestProxy.cpp:452 2 XUL imgRequest::OnStopFrame mozilla/modules/libpr0n/src/imgRequest.cpp:539 3 XUL nsPNGDecoder::EndImageFrame mozilla/modules/libpr0n/decoders/png/nsPNGDecoder.cpp:198 4 XUL end_callback mozilla/modules/libpr0n/decoders/png/nsPNGDecoder.cpp:852 5 XUL MOZ_PNG_push_read_chunk mozilla/modules/libimg/png/pngpread.c:348 6 XUL MOZ_PNG_process_data mozilla/modules/libimg/png/pngpread.c:35 7 XUL ReadDataOut mozilla/modules/libpr0n/decoders/png/nsPNGDecoder.cpp:335 8 XUL nsPipeInputStream::ReadSegments mozilla/xpcom/io/nsPipe3.cpp:799 9 XUL nsPNGDecoder::WriteFrom mozilla/modules/libpr0n/decoders/png/nsPNGDecoder.cpp:365 10 XUL imgRequest::OnDataAvailable mozilla/modules/libpr0n/src/imgRequest.cpp:893 11 XUL nsBaseChannel::OnDataAvailable mozilla/netwerk/base/src/nsBaseChannel.cpp:706 12 XUL nsInputStreamPump::OnStateTransfer mozilla/netwerk/base/src/nsInputStreamPump.cpp:508 13 XUL nsInputStreamPump::OnInputStreamReady mozilla/netwerk/base/src/nsInputStreamPump.cpp:398 14 XUL nsInputStreamReadyEvent::Run mozilla/gfx/cairo/libpixman/src/pixman-access-accessors.c:107 15 XUL nsThread::ProcessNextEvent mozilla/xpcom/threads/nsThread.cpp:510 16 XUL NS_ProcessPendingEvents_P nsThreadUtils.cpp:180 17 XUL nsBaseAppShell::NativeEventCallback mozilla/widget/src/xpwidgets/nsBaseAppShell.cpp:121 18 XUL nsAppShell::ProcessGeckoEvents mozilla/widget/src/cocoa/nsAppShell.mm:309 19 CoreFoundation CFRunLoopRunSpecific 20 CoreFoundation CFRunLoopRunInMode 21 HIToolbox RunCurrentEventLoopInMode 22 HIToolbox ReceiveNextEventCommon 23 HIToolbox BlockUntilNextEventMatchingListInMode 24 AppKit _DPSNextEvent 25 AppKit -[NSApplication nextEventMatchingMask:untilDate:inMode:dequeue:] 26 AppKit -[NSApplication run] 27 XUL nsAppShell::Run mozilla/widget/src/cocoa/nsAppShell.mm:598 28 XUL nsAppStartup::Run mozilla/toolkit/components/startup/src/nsAppStartup.cpp:181 29 XUL XRE_main mozilla/toolkit/xre/nsAppRunner.cpp:3196 30 firefox-bin main mozilla/browser/app/nsBrowserApp.cpp:158 31 firefox-bin start crt.c:272 32 firefox-bin start 33 @0x1
For convenience (since this bug has such good STR) I'm going to undup this bug, and make others depend on it.
Created attachment 385135 [details] [diff] [review] Debugging patch (*not* a fix) Here's a debugging patch that can be used to show the proximate cause of this bug -- an nsMenuItemX's mIcon object (class nsMenuItemIconX) can sometimes survive longer than the nsMenuItemX object. The nsMenuItemX object that owns (and has created) an nsMenuItemIconX object also owns the latter's mNativeMenuItem (an NSMenuItem object). Because of this (i.e. because nsMenuItemIconX::mNativeMenuItem is "weak"), you often (always?) get a crash in nsMenuItemIconX::OnStopFrame() when an nsMenuItemX's mIcon object outlives it. Presumably the same is true of an nsMenuX's mIcon object. In my next comment I'll post a crash log (made using this patch) that demonstrates what I've just said.
Created attachment 385142 [details] Crash log made with debugging patch Here's a crash log made with my debugging patch from comment #7. Just before the crash, my patch logs the addresses of the mMenuObject (class nsMenuObjectX) and mNativeMenuItem (class NSMenuItem) of the object in whose nsMenuItemIconX::OnStopFrame() method the crash takes place: nsMenuItemIconX::OnStopFrame(): mMenuObject 0x17D0FC80, mNativeMenuItem 0x17D0E320 Search on "17D0E320" (the address of the mNativeMenuItem) in the log and you'll see that its nsMenuItemIconX's parent (mMenuObject 0x17D0FC80) gets destroyed (it's destructor gets called) without the destructor for nsMenuItemIcon getting called. This is presumably because code called from nsMenuItemIconX::LoadIcon() addrefs the nsMenuItemIconX object, which would (normally) only get released sometime after nsMenuItemIconX::StopFrame() is called on it. You'll then see where mNativeMenuItem is destroyed (in a call to [NSMenuItem dealloc]), followed by several calls to nsMenuItemIconX::On...() methods, ending with the final (fatal) call to nsMenuItemIconX::OnStopFrame(). All of these happen after mNativeMenuItem has been destroyed, and its pointer has become invalid.
Created attachment 385200 [details] [diff] [review] Fix Here's a patch that fixes this bug (and those that depend on it). Since the underlying problem is that an nsMenuItemIconX object can outlive the nsMenuObjectX object that created and owns it, we might just make nsMenuItemIconX::mNativeMenuItem a strong pointer (as I did in my patches for bug 477475). But then what do we do with nsMenuItemIconX::mMenuObject? If we also made it a strong pointer, an nsMenuItemIconX object and the nsMenuObjectX object that created it would own each other, and leaks would result. Better to do what my current patch does -- add an nsMenuItemIconX::Disable() method that can be called from its creator's destructor, and which nulls out nsMenuItemIconX member variables that are about to become invalid. Here's a tryserver build made with this patch: https://firstname.lastname@example.org/bugzilla499600-macosx.dmg
Comment on attachment 385200 [details] [diff] [review] Fix In general I like this patch. We finally know what is causing the icon to outlive its creating menu/menuitem - the outstanding img request. This was the missing piece of the lifetime puzzle before. Lets rename the "Disable" method to "Destroy" and cancel the image request in addition to nulling out the pointers. There is no reason to continue with the request once the creating menu/menuitem has been killed off. Since it is the request that is holding a reference to the icon object, canceling the request should kill the icon. If that doesn't happen it is a bug, the result of another reference we haven't considered yet. Also I think most of the new comments need to be changed or reverted. As you rewrote them it makes the lifetime sound ambiguous and subject to bugs we don't understand, which is no longer the case. If we cancel the image request in "Destroy" then the invariant of icons not outliving their creating menus/items remains intact. So long as that is true there is no need to reference a bug number or say things like "We may outlive the nsMenuObjectX that created us" or "The icon object may outlive us. Prevent this from causing trouble." Thanks for figuring this out!
Created attachment 387039 [details] [diff] [review] Fix rev1 -- call imgIRequest::CancelAndForgetObserver() Josh, I've found that calling imgIRequest::CancelAndForgetObserver() from nsMenuItemIconX::Destroy/Disable does fix the crash. It releases nsMenuItemIconX::mIconRequest and (most importantly) seems to stop any further calls to nsMenuItemIconX's imgIDecoderObserver methods. So here's a new patch that, in addition to nulling out mMenuObject and mNativeMenuItem, calls CancelAndForgetObserver(). I've revised my comments to reflect the fact that CancelAndForgetObserver() seems to work properly (to have immediate effect). A tryserver build should follow in a few hours.
> It releases nsMenuItemIconX::mIconRequest Oops. It releases the nsMenuItemIconX object to which mIconRequest belongs (it releases imgRequestProxy::mListener).
Comment on attachment 387039 [details] [diff] [review] Fix rev1 -- call imgIRequest::CancelAndForgetObserver() Perfect, thanks!
Comment on attachment 387039 [details] [diff] [review] Fix rev1 -- call imgIRequest::CancelAndForgetObserver() Landed on trunk: http://hg.mozilla.org/mozilla-central/rev/0161c74dc486
Comment on attachment 387039 [details] [diff] [review] Fix rev1 -- call imgIRequest::CancelAndForgetObserver() This is a topcrasher on each branch (1.9.0, 1.9.1) separately and on both of them together. So I'm requesting approval for both branches.
Comment on attachment 387039 [details] [diff] [review] Fix rev1 -- call imgIRequest::CancelAndForgetObserver() Josh says he'd also like this to land on both branches.
Josh says porting my patch to the 1.9.0 branch might not be straightforward -- there the menu implementation hasn't yet been deCOMtaminated, and lifetimes might be different. So I'll need to test it (or some variant of it) on the 1.9.0 branch separately. I'll do this in the next few days.
For what it's worth, here's a tryserver build made with my rev1 patch: https://email@example.com/bugzilla499600-rev1-macosx.dmg The patch will also be in tomorrow's (and subsequent) Minefield nightlies.
Steven, can we call it fixed?
Thanks Steven. I'll try the todays nightly later with the attached application. If it works we should also resolve the dependent bugs. Would it be possible to get an automated test? I suspect so, or?
> Would it be possible to get an automated test? It wouldn't be easy. I looked through the UIElementInspector source to find what triggered the crash, and I didn't see anything obvious. In any case the AXUIElement... API is native, and can't be used in a mochitest.
Verified fixed with Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.2a1pre) Gecko/20090707 Minefield/3.6a1pre ID:20090707031658
Created attachment 387216 [details] [diff] [review] Fix rev1 (backport to 1.9.0 branch) It was pretty easy to backport my rev1 patch to the 1.9.0 branch. I've tested with this bug's STR (i.e. with the UIElementInspector), and saw no crashes.
Here's a tryserver build made with my rev1 patch backport from comment #24: https://firstname.lastname@example.orgemail@example.com
Comment on attachment 387216 [details] [diff] [review] Fix rev1 (backport to 1.9.0 branch) + mMenuRef = nil; Carbon references should use NULL, not nil.
Comment on attachment 387216 [details] [diff] [review] Fix rev1 (backport to 1.9.0 branch) > + mMenuRef = nil; > > Carbon references should use NULL, not nil. I'll fix this on landing.
Did this turn out to fix the topcrash too, or was the topcrash just happening to hit the same signature? need to check with crash-stats.
So far this has only been landed on the trunk. Of the 9 trunk crash stacks that crash-stats has for this signature over the last 4 weeks, none are from builds made after this patch was landed. Sounds like the problem's been fixed to me :-)
Turns out I miscounted. And since my patch was landed less than a week ago, I needn't have searched over more than one week. Crash-stats finds no crashes using those criteria. Still fixed.
Comment on attachment 387216 [details] [diff] [review] Fix rev1 (backport to 1.9.0 branch) Approved for 188.8.131.52, a=dveditz for release-drivers
Comment on attachment 387216 [details] [diff] [review] Fix rev1 (backport to 1.9.0 branch) Landed on the 1.9.0 branch: Checking in widget/src/cocoa/nsMenuItemIconX.h; /cvsroot/mozilla/widget/src/cocoa/nsMenuItemIconX.h,v <-- nsMenuItemIconX.h new revision: 1.4; previous revision: 1.3 done Checking in widget/src/cocoa/nsMenuItemIconX.mm; /cvsroot/mozilla/widget/src/cocoa/nsMenuItemIconX.mm,v <-- nsMenuItemIconX.mm new revision: 1.9; previous revision: 1.8 done Checking in widget/src/cocoa/nsMenuItemX.mm; /cvsroot/mozilla/widget/src/cocoa/nsMenuItemX.mm,v <-- nsMenuItemX.mm new revision: 1.41; previous revision: 1.40 done Checking in widget/src/cocoa/nsMenuX.mm; /cvsroot/mozilla/widget/src/cocoa/nsMenuX.mm,v <-- nsMenuX.mm new revision: 1.73; previous revision: 1.72 done
I followed the steps in comment 0 with my OS X 10.5.7 machine. With 184.108.40.206, it crashes and with last night's build (Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:220.127.116.11pre) Gecko/2009072704 GranParadiso/3.0.13pre), it does not. So this is verified18.104.22.168. Strangely, when using debug builds, I had no crash in either instance.
Steven: I hit this while running the 3.5.2 candidate on my 10.6 machine - http://crash-stats.mozilla.com/report/index/84b66520-6581-4d5b-b7b0-3f7272090731. Is there any reason we did not check it into that branch?
(In reply to comment #34) > Steven: I hit this while running the 3.5.2 candidate on my 10.6 machine - > http://crash-stats.mozilla.com/report/index/84b66520-6581-4d5b-b7b0-3f7272090731. > Is there any reason we did not check it into that branch? I feel that it is still waiting for beltzners approval. I can't see a comment that something is missing to get checked-in on 1.9.1.
We opted not to take it for 3.5.2 and instead take it in 3.5.3 since that should be in sync with 3.0.14.
Comment on attachment 387039 [details] [diff] [review] Fix rev1 -- call imgIRequest::CancelAndForgetObserver() Approved for 22.214.171.124, a=dveditz for release-drivers
Comment on attachment 387039 [details] [diff] [review] Fix rev1 -- call imgIRequest::CancelAndForgetObserver() Landed on the 1.9.1 branch: http://hg.mozilla.org/releases/mozilla-1.9.1/rev/6e8c082677a8
Verified fixed with Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:126.96.36.199pre) Gecko/20090810 Shiretoko/3.5.3pre ID:20090810040503 (In reply to comment #22) > I looked through the UIElementInspector source to find what triggered > the crash, and I didn't see anything obvious. In any case the > AXUIElement... API is native, and can't be used in a mochitest. The crash happens because of the lock on the menu element? If yes, can't we simulate a similar behavior in the test even when it is native?
> The crash happens because of the lock on the menu element? This can't be the only factor -- otherwise this crash would be much more common.
>> The crash happens because of the lock on the menu element? > > This can't be the only factor -- otherwise this crash would be much > more common. And easier to reproduce: Crashes would happen every time you "locked on" to a menu element ... which isn't true.
Means it could be another line of code in the elements inspector application which is related to our crash? After you lock to that menu which elements/attributes are accessed by the testing application? Have you tried to remove some functions from the inspector app to check? I'm not familiar with ObjectC and cannot help. :/
As I've said, I've already looked at the code and didn't see anything obvious. And yes, I did try commenting out some code ... but this didn't reveal much. Writing a test for this would require getting a reduced testcase. On current evidence this wouldn't be easy, and might not be possible.
Just so that people know: This fix isn't in Firefox 3.5.2. But it will be in FF 3.5.3, of which release candidates are available at ftp://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/3.5.3-candidates/.
Steven: I noticed when looking at crash-stats today that there a few crashes in this stack on 3.6 - http://tinyurl.com/l8qpjs
Actually I think this is a different and unrelated crash. Notice that there's no mention of objc_msgSend. This bug's crash took place at the second of the following two lines (after mNativeMenuItem had been deleted and was no longer valid): if (!mNativeMenuItem) return NS_ERROR_FAILURE; [mNativeMenuItem setImage:newImage]; The crashes you report appear to take place at the third of the following three lines: nsRefPtr<gfxImageSurface> image; imageContainer->CopyCurrentFrame(getter_AddRefs(image)); PRInt32 height = image->Height(); I *think* the fact that the crash doesn't take place on the second line means that 'image' is NULL ... but I'm not completely sure. (Breakpad currently doesn't give us the address of the bad access -- which would unambiguously tell us whether or not it was a NULL dereference. It only gives us the (useless) address in the instruction pointer when the crash took place (the "crash address").)
I wonder how CopyCurrentFrame fails (if indeed image is NULL). Bug filed?
> I wonder how CopyCurrentFrame fails (if indeed image is NULL). Don't know. > Bug filed? Not yet. I'll try to find time tomorrow to look into this further. Unless you want to :-)