Closed Bug 430219 Opened 17 years ago Closed 16 years ago

Crash when FF3 on OS X is Quit if page contains multiple plugins

Categories

(Core Graveyard :: Plug-ins, defect)

PowerPC
macOS
defect
Not set
critical

Tracking

(Not tracked)

RESOLVED FIXED
mozilla1.9.1a1

People

(Reporter: vlad.alexander, Assigned: kinetik)

References

()

Details

(Keywords: crash, regression)

Attachments

(1 file, 2 obsolete files)

User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.4; en-US; rv:1.9pre) Gecko/2008042104 Minefield/3.0pre Build Identifier: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.4; en-US; rv:1.9pre) Gecko/2008042104 Minefield/3.0pre When there are multiple instances of our plug-in on a Web page, when you Quit FF3 on OS X, there is a crash. No crash with FF2 and Safari. We ran our plug-in through debugger and we are pretty sure the problem is not with our plug-in. The crash is reproducible about 50% of the time. You can download our plug-in from: http://misc.xstandard.com/mozilla/XStandard.plugin.zip Test page here: http://misc.xstandard.com/mozilla/multiple.htm Closing a tab or just closing the browse windows does not cause a crash. The crash occurs only when FF is Quit. Reproducible: Sometimes Steps to Reproduce: 1. Install XStandard plug-in 2. Go to: http://misc.xstandard.com/mozilla/multiple.htm 3. "Quit" FF. 4. If FF does not crash, please repeat steps 1 to 3 several more times. Actual Results: Crash Expected Results: No crash
Keywords: crash, regression
A stacktrace would be appreciated.
Component: General → Plug-ins
Keywords: stackwanted
Product: Firefox → Core
QA Contact: general → plugins
Version: unspecified → Trunk
Signature QD@0x9a8c UUID 9e8c0d26-10a8-11dd-8c6a-001cc4e2bf68 Time 2008-04-22 13:13:09-07:00 Uptime 31 Product Firefox Version 3.0pre Build ID 2008042104 OS Mac OS X OS Version 10.4.9 8P2137 CPU x86 CPU Info GenuineIntel family 6 model 14 stepping 8 Crash Reason EXC_BAD_ACCESS / KERN_INVALID_ADDRESS Crash Address 0x91717a8c Comments Crashing Thread Frame Module Signature Source 0 QD QD@0x9a8c 1 XStandard XStandard@0xad291 2 HIToolbox HIToolbox@0x41a89 3 CoreFoundation CoreFoundation@0x227e1 4 CoreFoundation CoreFoundation@0x21acd 5 HIToolbox HIToolbox@0x98d7 6 HIToolbox HIToolbox@0x8f18 7 HIToolbox HIToolbox@0x8e38 8 AppKit AppKit@0x17464 9 AppKit AppKit@0x17055 10 AppKit AppKit@0x10dda 11 XUL nsAppShell::Run mozilla/widget/src/cocoa/nsAppShell.mm:591 12 XUL nsAppStartup::Run mozilla/toolkit/components/startup/src/nsAppStartup.cpp:181 13 XUL XRE_main mozilla/toolkit/xre/nsAppRunner.cpp:3170 14 firefox-bin main mozilla/browser/app/nsBrowserApp.cpp:158 15 firefox-bin start crt.c:272 16 firefox-bin start 17 @0x1 So, we're not really in the stack.... Are you saying that we didn't shut down enough things properly? Can you at least list the sequence/order/count of calls into your NPP_ methods for shutdown for ff2/ff3/safari?
What timeless is saying is that all Gecko is doing here is delivering an event to your plugin. Then your plugin does something that calls into "QD" (Quickdraw?) and crashes. You know better than us what your plugin is doing there, could you tell us? Then timeless asked you to give us a log of the NPP_ calls from Gecko into your plugin, for each of FF2, FF3 and Safari. Does that make sense? CCing Matthew Gregan, who might have some insight here. I have a vague suspicion that we might be having trouble due to unexpected sharing of Quickdraw state --- something that Matthew has been working on in another context.
> timeless asked you to give us a log of the NPP_ calls from > Gecko into your plugin, for each of FF2, FF3 and Safari. How do I do this? I imagine I need to run our plug-in in Xcode debugger? What are the steps in order to get this log? Also, are you saying that it is one of the plug-in instances on the page that are crashing which is then causing FF to crash?
if you attach a debugger, either Xcode or gdb directly, then proceed to reproduce the crash, the debugger will be able to give you a more detailed stack trace ('where' in gdb, not sure about Xcode). I presume you have a debug build of your plugin so you can see detailed stack frames for your plugin code. (In reply to comment #5) > Also, are you saying that it is one of the plug-in instances on the page that > are crashing which is then causing FF to crash? I wouldn't say that. All we know right now is that it's some combination of Firefox, your plugin and Quickdraw.
Took a few tries, but I got it to crash in GDB. Last few stack frames are: #0 0x94787d21 in SetOrigin () #1 0x1d6ad2b2 in MacTimer () #2 0x93f6ab5e in CFRunLoopRunSpecific () #3 0x93f6ad18 in CFRunLoopRunInMode () #4 0x903806a0 in RunCurrentEventLoopInMode () #5 0x903803f2 in ReceiveNextEventCommon () #6 0x9038032d in BlockUntilNextEventMatchingListInMode () #7 0x969e67d9 in _DPSNextEvent () #8 0x969e608e in -[NSApplication nextEventMatchingMask:untilDate:inMode:dequeue:] () #9 0x12ae95a0 in nsAppShell::ProcessNextNativeEvent (this=0x6300a0, aMayWait=0) at /Users/kinetik/work/mozilla-cvs/mozilla/widget/src/cocoa/nsAppShell.mm:501 While trying to invoke a crash, I noticed a couple of problems with the plugin when using malloc debugging (MallocScribble=1 MallocGuardEdges=1 MallocPreScribble=1). During plugin initialization, I got this a couple of times per plugin: -1608712288[609cf0]: ns4xPluginInstance::SetWindow (about to call it) this=602280 firefox-bin(62868,0xa01cffa0) malloc: *** error for object 0xaaaaaaaa: Non-aligned pointer being freed *** set a breakpoint in malloc_error_break to debug Breakpoint 1, 0x911bf661 in malloc_error_break () #0 0x911bf661 in malloc_error_break () #1 0x911ba64f in szone_error () #2 0x910dec83 in szone_free () #3 0x910deaed in free () #4 0x1d708c54 in CBelusXEditorCtrl::LoadDefaultXSLT () at nsTSubstring.h:177 #5 0x1d70a331 in CBelusXEditorCtrl::CBelusXEditorCtrl () at nsTSubstring.h:177 #6 0x1d66630b in CPlugin::Init () at nsTSubstring.h:177 #7 0x1d6671a0 in NPP_SetWindow () at nsTSubstring.h:177 #8 0x12a16ee3 in ns4xPluginInstance::SetWindow (this=0x602280, window=0x1c71d2c4) at /Users/kinetik/work/mozilla-cvs/mozilla/modules/plugin/base/src/ns4xPluginInstance.cpp:1173 #9 0x12a48d0d in nsPluginNativeWindow::CallSetWindow (this=0x1c71d2c0, aPluginInstance=@0xbfffdb98) at nsPluginNativeWindow.h:93 And this during plugin shutdown: -1608712288[609cf0]: ns4xPluginInstance::Stop this=602280 Program received signal EXC_BAD_ACCESS, Could not access memory. Reason: KERN_INVALID_ADDRESS at address: 0xaaaaaaaf 0x93f6c5c4 in CFRelease () (gdb) bt #0 0x93f6c5c4 in CFRelease () #1 0x1d75233c in CCustToolbar::~CCustToolbar () at nsTSubstring.h:177 #2 0x1d7101a7 in CBelusXEditorCtrl::~CBelusXEditorCtrl () at nsTSubstring.h:177 #3 0x1d6667f0 in CPlugin::shut () at nsTSubstring.h:177 #4 0x1d6669b4 in NPP_Destroy () at nsTSubstring.h:177 #5 0x12a19261 in ns4xPluginInstance::Stop (this=0x602280) at /Users/kinetik/work/mozilla-cvs/mozilla/modules/plugin/base/src/ns4xPluginInstance.cpp:955 #6 0x1138a2f1 in DoStopPlugin (aInstanceOwner=0x1d5031d0, aDelayedStop=0) at /Users/kinetik/work/mozilla-cvs/mozilla/layout/generic/nsObjectFrame.cpp:1819
Thank you Matthew. Are these problems related to the crash or are they separate issues? Any idea why it is crashing in FF3 and not in FF2 and Safari? Here is a debug build of our plug-in: http://misc.xstandard.com/mozilla/debug/XStandard.plugin.zip Are you able to get the stack track with our debug build so that we can get more info as to where the problems may be?
I think they're unrelated, I was using the malloc debugging configuration to try to make the crash easier to reproduce or crash earlier. Since I ran into those, I thought I'd mention them so someone working on the plugin can take a look. Thanks for the debug build, I'll try that in a moment. I've done some more digging with the regular plugin and I'll include my findings so far below. I don't yet know why the crash occurs inside the plugin with Fx3 but not with Fx2 or Safari. I'll keep digging. It looks like what's happening is that a CoreFoundation (or some variant, I didn't check the actual timer type) timer event is firing and calling MacTimer in the plugin code, which calls BackupOldPort (also in the plugin code), which calls (via a tail call) SetOrigin (a QD call), where the code crashes trying to use a graphics port that is no longer valid. When setting up the plugin, we pass the graphics port 0x1ba8d0a0 via SetWindow: NPP SetWindow called: this=0x602280, [x=8,y=224,w=660,h=200], win[0x1ba8d0a0, -8, -224] clip[t=224,b=424,l=8,r=668], return=0 When quiting, the nsChildView widget associated with the plugin is destroyed: widget 0x1db1a620/0x1db11860 destroyed win 0x1ba7fcf0 port 0x1ba8d0a0 plugin port 0x1ba8d0a0 The underlying NSView is destroyed shortly afterwards (we override NSView's dealloc so we can set the current graphics port to NULL via SetPort): widget dealloced 0x1db11860, port cleared Even later in shutdown, we crash with a stack like the one I pasted above (MacTimer->BackupOldPort->SetOrigin), where SetOrigin is trying to access the PixMap associated with the port, but the port is well dead by this point. Ah, I managed to reproduce this with NPAPI logging enabled now. You can see the widgets and nsObjectFrames being destroyed, but I don't see any call to stop or destroy the plugins: -1608712288[609cf0]: NPP HandleEvent called: this=602280, npp=60229c, event=-1073750824, return=1 -1608712288[609cf0]: NPP HandleEvent called: this=1d50aa10, npp=1d50aa2c, event=-1073750824, return=1 24 NS_KEY_DOWN widget=0x1bad4ad0 name=something id=0x0 refpt=0,0 25 NS_KEY_PRESS widget=0x1bad4ad0 name=something id=0x0 refpt=0,0 26 NS_KEY_UP widget=0x1bad4ad0 name=something id=0x0 refpt=0,0 27 UNKNOWN: 108 widget=0x1bad4ad0 name=something id=0x0 refpt=0,0 28 NS_LOSTFOCUS widget=0x1bad4ad0 name=something id=0x0 refpt=0,0 --WEBSHELL 0x1baf2ad0 == 5 widget 0x605530/0x605600 destroyed win 0x1ba7fca0 port 0x1ba8cfc0 plugin port 0x1ba8cfc0 29 NS_DESTROY widget=0x605530 name=something id=0x0 refpt=0,0 -1608712288[609cf0]: nsObjectFrame c9c774 deleted widget 0x6016c0/0x1db10f90 destroyed win 0x1ba7fca0 port 0x1ba8cfc0 plugin port 0x1ba8cfc0 30 NS_DESTROY widget=0x6016c0 name=something id=0x0 refpt=0,0 -1608712288[609cf0]: nsObjectFrame c9ca08 deleted widget 0x1d519be0/0x1d519c80 destroyed win 0x1ba7fca0 port 0x1ba8cfc0 31 NS_DESTROY widget=0x1d519be0 name=something id=0x0 refpt=0,0 widget 0x1d50a5f0/0x1d50a6c0 destroyed win 0x1ba7fca0 port 0x1ba8cfc0 32 NS_DESTROY widget=0x1d50a5f0 name=something id=0x0 refpt=0,0 --WEBSHELL 0x1baf4c90 == 4 widget 0x1c722310/0x1c7223b0 destroyed win 0x1ba7fca0 port 0x1ba8cfc0 33 NS_DESTROY widget=0x1c722310 name=something id=0x0 refpt=0,0 widget 0x1c718660/0x1c718700 destroyed win 0x1ba7fca0 port 0x1ba8cfc0 34 NS_DESTROY widget=0x1c718660 name=something id=0x0 refpt=0,0 widget 0x1c716bd0/0x1c716c70 destroyed win 0x1ba7fca0 port 0x1ba8cfc0 35 NS_DESTROY widget=0x1c716bd0 name=something id=0x0 refpt=0,0 widget 0x1bad4ad0/0x1bad4b70 destroyed win 0x1ba7fca0 port 0x1ba8cfc0 36 NS_DESTROY widget=0x1bad4ad0 name=something id=0x0 refpt=0,0 -1608712288[609cf0]: WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80520012: file /Users/kinetik/work/mozilla-cvs/mozilla/xpcom/io/nsLocalFileOSX.mm, line 482 WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80520012: file /Users/kinetik/work/mozilla-cvs/mozilla/xpcom/io/nsLocalFileOSX.mm, line 482 Program received signal EXC_BAD_ACCESS, Could not access memory. Reason: KERN_INVALID_ADDRESS at address: 0x6c7bc000 0x94787d21 in SetOrigin ()
The reason there's a crash when quitting but not when closing tabs is that the toplevel window is destroyed when quitting, and the lifetime of the graphics port passed to the plugin is tied to this. Indeed, I've just confirmed that I can reproduce the same crash by using two browser windows--when closing the window containing the plugins (and leaving the other window open), we crash with the same stack. It looks like the plugin destroy path is racing with code the plugin is running via timers. Once the nsObjectFrame is destroyed, we're doing a delayed plugin stop, which puts a runnable into the our event queue to complete the plugin stop "later" when we process events. The problem is that we can end up destroying the top level window before this delayed plugin stop code runs--if the plugin manages to run code after the toplevel window is destroyed but before the delayed plugin stop code runs, we can crash.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Attached patch patch v1 (obsolete) — Splinter Review
Add the XStandard plugin to the list of plugins we must stop immediately. This is an extension of the wallpaper fix from bug 426524.
Assignee: nobody → kinetik
Status: NEW → ASSIGNED
Attachment #318061 - Flags: superreview?(jst)
Attachment #318061 - Flags: review?(jst)
It's easier to reproduce this crash (and thus test the fix) by opening the testcase URL in multiple windows. I found that 10 windows were enough to make the bug occur almost every time I quit the browser.
>Add the XStandard plugin to the list of plugins we must stop immediately. Matthew, what does this statement mean?
Plugin destruction was changed last year (see bug 347743) to avoid destroying the plugin when destroying the associated frame--instead, we post an event to our event loop to destroy the plugin next time we return to the event loop. It turns out that this causes problems with a number of plugins (QuickTime, Flip4Mac, and XStandard) because they can run code between the time frame destruction (and associated required resources, in this case the graphics port associated with the top level window) occurs and when the delayed destruction event tells the plugin to stop. The patch I posted is working around this problem by forcing the named plugins to be destroyed immediately rather than via the delayed destruction mechanism. We need to fix this properly at some point (as I don't believe the plugins are doing anything wrong), but it is very late in the release cycle for Firefox 3, so working around the problem where necessary is safer than more invasive fixes.
Blocks: 347743
Keywords: stackwanted
(In reply to comment #14) > We need to fix this properly at some point (as I don't believe the plugins are > doing anything wrong), but it is very late in the release cycle for Firefox 3, > so working around the problem where necessary is safer than more invasive > fixes. The plugins are told about this by us calling setWindow(null), so the plugins know that the view port they're drawing to went away. Seems to me that if a plugin depends on that being there after getting that call it's a plugin bug, not a bug in our code. We obviously changed how our code works, so some of the assumptions that plugin vendors have made in the past and gotten away with changed, but that doesn't mean it's something we can't do. There's reasons for us doing delayed plugin destruction as well, but it appears we need to fix the issues on both sides of this fence one a case by case basis :(
Attachment #318061 - Flags: superreview?(jst)
Attachment #318061 - Flags: superreview+
Attachment #318061 - Flags: review?(jst)
Attachment #318061 - Flags: review+
(In reply to comment #15) > The plugins are told about this by us calling setWindow(null), so the plugins > know that the view port they're drawing to went away. Seems to me that if a > plugin depends on that being there after getting that call it's a plugin bug, > not a bug in our code. Ah, but the plugin never sees that a SetWindow(NULL) in this case, and (as far as I can tell) ever, due to the conditional at http://mxr.mozilla.org/firefox/source/modules/plugin/base/src/ns4xPluginInstance.cpp#1148 There is no SetWindow call at all (NULL or not) between when I quit and when we start tearing things down--see the log output in comment #9. Is this the core of the problem? I tried changing ns4xPluginInstance::SetWindow to allow a NULL window to be passed to the plugin, but (in this case, with the XStandard plugin), it crashes during a SetWindow call trying to dereference the window pointer. From http://developer.mozilla.org/en/docs/NPP_SetWindow I get the impression that the NPWindow pointer should be valid, but the void* window handle (the nsPluginPort in Gecko) should be NULL. I tried changing ns4xPluginInstance::SetWindow to pass a zeroed out NPWindow (so it had a null nsPluginPort pointer), which didn't cause a crash during the SetWindow call, but also wasn't sufficient to shut the plugin down and avoid the crash on quit. This might be the case you're referring to where the plugin is making assumptions or not handling this case correctly (i.e. not treating SetWindow with a NULL window handle as a destruction request). Thanks for the review!
Comment on attachment 318061 [details] [diff] [review] patch v1 Requesting approval. Wallpaper over a problem where we can crash in plugin code during shutdown after attempting a delay destruction of the plugin. Simple changes, very low chance of regressions. This only affects plugin destruction behaviour for the XStandard plugin and is extending an existing fix working around the same issue with QuickTime and Flip4Mac plugins.
Attachment #318061 - Flags: approval1.9?
erm. this sucks. r- please provide a pref which is read at runtime which enables people to add other plug-ins to this category. and while you're there, please add zinc, since afaict bug 429604 is their report of the same problem.
Depends on: 429604
Our plugin (zinc, linux, bug 429604) encountered this problem only with this commit:http://bonsai.mozilla.org/cvslog.cgi?file=mozilla/layout/generic/nsObjectFrame.cpp&rev=1.640 13-03-2008. Before that Firefox3 Beta1 to Beta4 were fine.
Comment on attachment 318061 [details] [diff] [review] patch v1 I'm not sure I understand what timeless means in comment 18. r- ? Re-request approval when this is clarified.
Attachment #318061 - Flags: approval1.9? → approval1.9-
While in general most plugins don't care about the shutdown order it seems reasonable that some may require notification before the window is invalid to enable them to disable interaction with the plugin window (such as timed redraws), or release resources before it is invalid (both of which our zinc plugin tries to do). It seems that it isn't just a fix to work around assumptions made by certain plugins, a new plugin may want this ability too, I would happily do whatever I can to change the extension/plugin but I don't know of any way to get these notifications as it seems that the ns4xPluginInstance does not allow the nsPluginInstanceVariable_CallSetWindowAfterDestroyBool variable to be set which would avoid the event based destruction and as discussed above the SetWindow(NULL) call doesn't propagate and if it was changed to do so then many more plugins might break. Currently I misname our plugin, but this seems unfortunate, and so if the only way to work around this is to get your plugin specially hardcoded, bug 429604 requests that our plugin name is added to the list. This change was added quite recently as far as I can tell, 13 March 2008, for reasons I am unable to see, rather than last year sometime as implied above, certainly that is the nightly version that broke for us, so I'm not just raising at this late stage something that has been around for some time.
Bug 416953 is the one mentioned in the commit I linked to in comment 19.
Additionally I note that the 2 already included and 1 proposed named exceptions all claim to be Registered Trade Marks if I look on their respective websites, making it less tenable to just misname your plugin to get the required behaviour. Maybe it would be better to include an exception for some generic name (as well as the known plugins), if this is the only way for a plugin developer to disable this behaviour.
Attached patch patch v2 (obsolete) — Splinter Review
Requesting approval. Wallpaper over a problem where we can crash in plugin code during shutdown after attempting a delay destruction of the plugin. Simple changes, very low chance of regressions. This only affects plugin destruction behaviour for the XStandard plugin and CMISS Zinc Plugin and is extending an existing fix working around the same issue with QuickTime and Flip4Mac plugins.
Attachment #320854 - Flags: review+
The patch v2 does not attempt to address the problem in general but it is difficult to know how to address this when I'm unable to see the closed bug which changed all plugins to delay shutdown. I could work on a patch to expose the nsPluginInstanceVariable_CallSetWindowAfterDestroyBool variable in NSAPI but is that what is needed?
Attachment #320854 - Flags: superreview?(jst)
Attachment #320854 - Flags: review?(jst)
Attachment #320854 - Flags: review+
Blocks: 416953
Comment on attachment 320854 [details] [diff] [review] patch v2 This looks good for now. r+sr=jst The problem here is really what you pointed out earlier, that SetWindow(NULL) doesn't reach the plugin, and that plugins don't expect to get a null plugin passed to them. For the next big release we could attempt to change that, make it so that plugins do get a SetWindow(NULL) call so that they can properly deal with this situation. On Win32 it's not a big deal as there we reparent the plugin window and delay the destruction of it, but we're not able to do that on any other platform (that I know of), and that's why we have this problem. Given that, and given that this code started as a fix for bug 426524, we should probably make this code MAC/Linux only, or not Win32 rather. So please wrap the appropriate #ifdef's around the checks in this code before landing. I'm happy to look over a patch that does that if you want... Thanks for working on this problem!
Attachment #320854 - Flags: superreview?(jst)
Attachment #320854 - Flags: superreview+
Attachment #320854 - Flags: review?(jst)
Attachment #320854 - Flags: review+
Could someone please set "checkin-needed" in the keywords for this bug, I believe that is the next step. I don't seem to have the privileges to do that myself here. Thanks.
Attached patch what I landedSplinter Review
This is what I landed. jst, can you confirm this is what you wanted in the end?
Attachment #318061 - Attachment is obsolete: true
Attachment #320854 - Attachment is obsolete: true
Attachment #329209 - Flags: review?(jst)
Pushed in 15868:4cdf6817a49c.
Status: ASSIGNED → RESOLVED
Closed: 16 years ago
Keywords: checkin-needed
Resolution: --- → FIXED
Target Milestone: --- → mozilla1.9.1a1
Comment on attachment 329209 [details] [diff] [review] what I landed Yeah, looks good.
Attachment #329209 - Flags: review?(jst) → review+
Blocks: 429604
No longer depends on: 429604
Product: Core → Core Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: