Closed Bug 76936 Opened 24 years ago Closed 24 years ago

Leaving an applet page crashed browser - Trunk & M09 [@ nsPluginInstanceOwner::~nsPluginInstanceOwner]

Categories

(Core Graveyard :: Java: OJI, defect)

PowerPC
Mac System 9.x
defect
Not set
critical

Tracking

(Not tracked)

VERIFIED FIXED
mozilla0.9

People

(Reporter: joe.chou, Assigned: blizzard)

References

()

Details

(Keywords: crash, topcrash, Whiteboard: [critical to mozilla 0.9] [seeking reviews])

Crash Data

Attachments

(6 files)

Running a recent trunk build (at least after April 13th) on Linux and Solaris, after loading a page with applets (i.e., www.java.sun.com), then either hit the Back button or modify thr URL to go to another page, the browser will crash.
*** This bug has been marked as a duplicate of 75070 ***
Status: UNCONFIRMED → RESOLVED
Closed: 24 years ago
Resolution: --- → DUPLICATE
------- Additional Comments From edburns@acm.org 2001-04-12 15:21 ------- I'm finding that it's segfaulting when NS_RELEASING the nsIPluginInstance that is the java plugin. This is here: nsObjectFrame.cpp:1492 if (nsnull != mInstance) { if (mPluginHost) mPluginHost->StopPluginInstance(mInstance); NS_RELEASE(mInstance); } This translates to the release method in /net/jano/export/disk02/deployment/ws/ladybird/ext/plugin/oji-plugin/src/motif/navig5/JavaPluginInstance5.cpp I believe. I'm re-assinging this to Jim so his team can look at it. Ed ------- Additional Comments From edburns@acm.org 2001-04-12 15:44 ------- *** Bug 75515 has been marked as a duplicate of this bug. *** ------- Additional Comments From edburns@acm.org 2001-04-12 15:47 ------- It turns out that in the case of a page with more than one applet, only the NS_RELEASE(mInstance) call on the last applet will crash. ------- Additional Comments From Robert Kaiser 2001-04-13 07:28 ------- This seems to be happening for everyone on *ix Systems when exiting a page with a Java Plugin, interpreting what I read in bug 75515. Should we nominate this for 0.9? ------- Additional Comments From edburns@acm.org 2001-04-13 14:41 ------- change subject to be more accurate. ------- Additional Comments From edburns@acm.org 2001-04-13 14:42 ------- *** Bug 75862 has been marked as a duplicate of this bug. *** ------- Additional Comments From edburns@acm.org 2001-04-13 14:45 ------- Jim, can you please have someone from your team investigate this bug? Thanks, Ed ------- Additional Comments From edburns@acm.org 2001-04-13 14:51 ------- *** Bug 73541 has been marked as a duplicate of this bug. *** ------- Additional Comments From edburns@acm.org 2001-04-16 22:51 ------- *** Bug 75565 has been marked as a duplicate of this bug. *** ------- Additional Comments From edburns@acm.org 2001-04-16 22:52 ------- There are at least four, and probably more bugs that are marked as dup of this bug. This bug is real important. Ed ------- Additional Comments From jpatel@netscape.com 2001-04-17 14:21 ------- Adding crash, topcrash keywords, Trunk and [@ libgklayout.so] to summary. This crash has been on the Talkback topcrash list under that stack signature for a few days now. The libgklayout.so crash is very easily reproducible, just go to this url: http://www.elendor.net Here are a few Talkback entries for that crash: libgklayout.so + 0x5991d (0x4048891d) ecd5fd41 line Build: 2001041106 CrashDate: 2001-04-12 UptimeMinutes: 87 Total: 425 OS: Linux 2.4.1 Detailed : http://climate/reports/incidenttemplate.cfm?bbid=29004983 StackTrace: http://climate/reports/stackcommentemail.cfm?dynamicBBID=29004983 (29004983) URL: www.elendor.net (29004983) Comments: switched from www.elendor.net to cnn.com... This has been happening for several days and involves going to any java applet holding page and then leaving it. The applets run fine libgklayout.so + 0x599cd (0x4103c9cd) 95475b04 line Build: 2001041208 CrashDate: 2001-04-12 UptimeMinutes: 1 Total: 48 OS: Linux 2.2.14-5.0smp Detailed : http://climate/reports/incidenttemplate.cfm?bbid=29021948 StackTrace: http://climate/reports/stackcommentemail.cfm?dynamicBBID=29021948 (29021948) URL: clicked roofing materials link (29021948) Comments: http://www.reroofingshowroom.com/ ------- Additional Comments From edburns@acm.org 2001-04-18 13:16 ------- This is happening on Solaris with today's trunk. ------- Additional Comments From Joe Chou 2001-04-18 18:32 ------- This problem happened on trunk build on Unix, but not on OEM build. Based on the crash stack, the problem of TalkBack reports, going to "http://www.elendor.net" then hit Back button and crashed, has the identical stack (see below). Also the crash happens as long as you are leaving an applet page (to a non-applet page or another applet page or any page), and you don't have to click Back button, you may change the URL and return to crash it. I have not tried on Waterfall yet, since my build still has compiling problems, and I tried to launch George's build without success. Here is the top of the crash stack: 0xfb9a7eb8 in nsPluginInstanceOwner::~nsPluginInstanceOwner (this=0x753668, __in_chrg=3) at nsObjectFrame.cpp:1506 1506 NS_RELEASE(mInstance); (gdb) where #0 0xfb9a7eb8 in nsPluginInstanceOwner::~nsPluginInstanceOwner ( this=0x753668, __in_chrg=3) at nsObjectFrame.cpp:1506 #1 0xfb9a8474 in nsPluginInstanceOwner::Release (this=0x753668) at nsObjectFrame.cpp:1583 #2 0xfb9a2df4 in nsObjectFrame::~nsObjectFrame (this=0x7ca4dc, __in_chrg=3) at nsObjectFrame.cpp:267 #3 0xfb960a98 in nsFrame::Destroy (this=0x7ca4dc, aPresContext=0x725cc0) at nsFrame.cpp:427 #4 0xfb9594f4 in nsContainerFrame::Destroy (this=0x7ca4dc, aPresContext=0x725cc0) at nsContainerFrame.cpp:98 #5 0xfb9a3b54 in nsObjectFrame::Destroy (this=0x7ca4dc, aPresContext=0x725cc0) at nsObjectFrame.cpp:471 #6 0xfb998368 in nsLineBox::DeleteLineList (aPresContext=0x725cc0, aLine=0x7cab80) at nsLineBox.cpp:251 #7 0xfb942738 in nsBlockFrame::Destroy (this=0x7ca378, aPresContext=0x725cc0) at nsBlockFrame.cpp:1240 #8 0xfbb3dd6c in nsFrameList::DestroyFrames (this=0x7ca350, aPresContext=0x725cc0) at nsFrameList.cpp:41 #9 0xfb9594e4 in nsContainerFrame::Destroy (this=0x7ca31c, aPresContext=0x725cc0) at nsContainerFrame.cpp:95 #10 0xfbb3dd6c in nsFrameList::DestroyFrames (this=0x80afa0, aPresContext=0x725cc0) at nsFrameList.cpp:41 ------- Additional Comments From Brendan Eich 2001-04-19 11:05 ------- What's the diagnosis? Memory corruption? Ref-count underflow (one too many RELEASES on what mInstance pointed to)? /be ------- Additional Comments From Joe Chou 2001-04-19 11:18 ------- Based on the preliminary trace, in both a crashing case and a non-crash case, the destructor was invoked twice, and in both times the pointer of minstance seemed valid (not null nor strange value), but it crashed in the second time of calling the release in the crashing case. It seems a memory corruption, but need further investigation.
Status: RESOLVED → NEW
Ever confirmed: true
Status: NEW → ASSIGNED
Resolution: DUPLICATE → ---
This bug is created since the original problem of 75070 is a different problem, which is a duplicate of 74531. This bug is to address a more serious problem that the browser crashes when leaving any applet page.
Whiteboard: critical to mozilla 0.9
adding keywords from bug 75070, adding mozilla0.9 to reflect status whiteboard comment. summary of 75070 was "Trunk crash [@ libgklayout.so](SEGV) upon leaving a java-enabled page (was: Segmentation fault (libgklayout.so), debugfriendly build?)"
*** Bug 77163 has been marked as a duplicate of this bug. ***
Joe, I think the reason that ~nsPluginInstanceOwner is called twice is: there are two applets in the page. I think the crash happens when the refcount goes to zero.
*** Bug 77220 has been marked as a duplicate of this bug. ***
Adding Trunk [@ nsPluginInstanceOwner::~nsPluginInstanceOwner] to summary for tracking. Here is some info about the crash from today's Talkback topcrash report: nsPluginInstanceOwner::~nsPluginInstanceOwner 24 First BBID :29071286 Last BBID :29481330 Min Runtime :43 Max Runtime :25086 First Appearance Date : 2001-04-13 Last Appearance Date : 2001-04-22 First BuildID : 2001041306 Last BuildID : 2001042207 Some user commments and URLs: (29481330) URL: www.rgbgallery.com (29455441) URL: www.space.com (29455441) Comments: Pulling up space.com homepage after failing to get NASA TV streaming via the Real plugin at http://www.unitedspacealliance.com (29455158) URL: www.answermed.com (29291627) URL: http://www-4.ibm.com/software/events/keynotes/lvg/lvg_100k2.html (29291627) Comments: leaving the site after playing by clicking on a link (29290566) URL: http://www-4.ibm.com/software/events/keynotes/lvg/lvg_100k1.html (29290566) Comments: this time it was already stopped (29290519) URL: http://www-4.ibm.com/software/events/keynotes/lvg/lvg_100k1.html (29290519) Comments: While the video is running (29290159) Comments: starting a realplayer in a window (29289963) Comments: sending a mail when switching to another mozilla window with a realplayer in it (stopped) (29283816) Comments: launcing a video from yahoo crashes. (29085103) URL: news/foo.com (crash on startup) (29071286) URL: www.vw.com (29071286) Comments: I was closing the real player plugin powered RadioVW when it crashed the browser
Summary: Leaving an applet page crashed browser → Leaving an applet page crashed browser - Trunk [@ nsPluginInstanceOwner::~nsPluginInstanceOwner]
we gotta get this fixed for 0.9... anyone know what's going on yet, or what the patch might be?
Target Milestone: --- → mozilla0.9
cc'ing Peter Lubczynski for possible plugin involvement.
Eww...thanks for the cc:...It could be one of my checkins to nsObjectFrame.cpp but since it's been happening since the 12th and Ed Burns says: >>------- Additional Comments From edburns@acm.org 2001-04-12 15:47 ------- >> >>It turns out that in the case of a page with more than one applet, only the >>NS_RELEASE(mInstance) call on the last applet will crash. It could also possibly be some other checkins to plugins which shutdown plugins when the last instance is destroyed or because of recent fixed to when Java starts (not till needed). cc:ing Andrei and Sean Gussing for a fix would be to make sure that mPluginHost->StopPluginInstance(mInstance) isn't called for the last instance of Java??? Taking a look.....does someone have java and a fresh build on their Linux machine? Does this happen on Windows or Mac too?
This crashing problem does not happen on Windows.
nsPluginHostImpl.cpp changed on Apr 11 for bug 74485. Part of the patch affected nsPluginHostImpl::StopPluginInstance() which is called from nsObjectFrame.cpp right before the NS_RELEASE of the plugin instance. If the crash is actually occurring in the call to nsIPluginInstance->Release then it seems like there's a ref cnt problem and the mInstance value, which was once valid, is no longer valid. Joe: can you confirm this? What does the java plugin instance return for nsIPluginInstance->GetValue (nsPluginInstanceVariable_DoCacheBool, ...)?
I'll give it a shot (removing the patch of 74485) on Solaris today, ans see if the problem would go away.
Happens to me on a fresh pull on Linux today
Sean, I put some printf's in on someone machine and I confrim that the Java plugin is NOT being cached. I bet expanding the check to Java would solve this crash. I'll see if I can create a patch but what's the status of mozilla0.9?
Sean, I see why you are asking that. If it returns "don't cache", and I beleive it does, then StopPluginInstance will remove it from the active instance list, and if this is a last instance of the plugin we also fire TryUnloadPlugin which calls nsIPlugin::Shutdown and releases nsIPlugin. And then we come back to the nsPluginInstanceOwner::~nsPluginInstanceOwner and do this NS_RELEASE(mInstance). Could this be a problem? I would expect no instances existing by this point.
Yes Andrei, that's right. I'm trying to see how to fix this from the plugins side but I think it looks messy. I don't recall an easy way to check if a plugin instance is java without checking it's mime-type. Perhaps either the plugin can be modified to set the cache variable or perhaps when we create the Java instance, we set the variable.
Some of the talkbacks mention RealPlayer - so I don't think a Java specific solution is right - unless those urls that mention Real are also using Java. Talking out loud: Seems like there's a refcnt problem with plugins on unix. But also seems like NS_RELEASE(mInstance) shouldn't happen after calling host->StopPluginInstance(mInstance); since if a plugin isn't cached, we try to unload the module (in which case the refcnt problem is on the non-unix platforms?).
OK, I think we may finally have some scant cycles to help with this bug. What would be most helpful from the Java Plug-in team at this point? I do not have Mozilla code expertise on my team, so we could mostly help from the plugin side of things. Suggestions? Requests? - Jim Melvin
On Linux and Solaris, are PR_UnloadLibrary() (like that in nsPluginTag::TryUnloadPlugin) calls protected by OS refcnts like they are on Windows? If they aren't, then that could be a problem that wouldn't be manifested on Windows. I think that the NS_RELEASE(mInstance) calls in both nsObjectFrame.cpp and nsPluginViewer.cpp should be moved up one line before the StopPluginInstance calls: if (mInstance) mInstance->Release; if (mPluginHost) // the plugin host still has a reference to the instance mPluginHost->StopPluginInstance(mInstance); mInstance = nsnull;
Peter, I don't understand how Java is different from other plugins in this respect. And why this is specific to Unix. Looking at the code I cannot see any mismatched addref/release's. In fact, release in the nsPluginInstanceOwner destructor makes sense since we addref it in nsPluginInstanceOwner::SetInstance.
Sean, I think PR_UnloadLibrary should be refcounted by NSPR. But I think your suggestion might work. Can somebody try it, please?
Andrei, yep NSPR handles the PR_UnloadLibrary nicely.
I've tried removing the patch of 74485, and it did not help, still crashed at the same spot. I am going to tried Sean's suggestion next.
Tried Sean's suggestion, releasing mInstance before calling StopPluginInstance(), still crash with the following stack (see below after the code change): in nsObjectFrame.cpp: ... nsPluginInstanceOwner::~nsPluginInstanceOwner() { PRInt32 cnt; // shut off the timer. if (mPluginTimer != nsnull) { mPluginTimer->Cancel(); } if (mInstance) mInstance->Release(); if (mPluginHost) mPluginHost->StopPluginInstance(mInstance); mInstance = nsnull; ... and in nsPluginViewer.cpp: ... if (nsnull != mInstance) { PRBool doCache = PR_TRUE; // determine if the plugin wants to be cached mInstance->GetValue(nsPluginInstanceVariable_DoCacheBool, (void *) &doCache); mInstance->Stop(); if (!doCache) { // if not, destroy the instance mInstance->Destroy(); } mInstance->Release(); if (doCache) { nsCOMPtr<nsIPluginHost> host; host = do_GetService(kCPluginManagerCID); if(host) host->StopPluginInstance(mInstance); } mInstance = nsnull; } ... Program received signal SIGSEGV, Segmentation fault. [Switching to LWP 4] 0xfc1af9f0 in ?? () (gdb) where #0 0xfc1af9f0 in ?? () #1 0xfd953c0c in g_io_unix_dispatch (source_data=0x756520, current_time=0xffbee670, user_data=0x635020) at giounix.c:135 #2 0xfd9558e8 in g_main_dispatch (current_time=0xffbee670) at gmain.c:656 #3 0xfd956170 in g_main_iterate (block=-40390908, dispatch=1) at gmain.c:874 #4 0xfd956384 in g_main_run (loop=0x19e730) at gmain.c:932 #5 0xfda3f9ac in gtk_main () at gtkmain.c:476 #6 0xfdbbfb84 in nsAppShell::Run (this=0x7d308) at nsAppShell.cpp:360 #7 0xfdcc66a4 in nsAppShellService::Run (this=0xab800) at nsAppShellService.cpp:407 #8 0x230e8 in main1 (argc=1, argv=0xffbeeafc, nativeApp=0x0) at nsAppRunner.cpp:1005 #9 0x24440 in main (argc=1, argv=0xffbeeafc) at nsAppRunner.cpp:1300
So the crash moved? Any idea what was at 0xfc1af9f0 - part of the Java plugin? an nsIPluginInstance, nsIPlugin, nsIModule? If you step though the StopPluginInstance call, does the pluginHost call PR_UnloadLibrary on the Java plugin? Try a breakpoint in nsPluginTag::TryUnloadPlugin (nsPluginHostImpl.cpp).
What about using the NS_IF_RELEASE macro?
If the problem was that mInstance was no longer valid, then the NS_IF_RELEASE macro offers no protection since it just checks to see that the argument is non-null, not that it's pointing to valid memory. It could be non-null and invalid if the plugin library was unloaded or the plugin decided to delete the plugin instance even though the instance had outstanding references.
The call to mInstance->Release() before calling host->StopPluginInstance(mInstance) in that last code fragment is bogus. As far as your code is concerned, mInstance should be considered dead after calling Release(), as you may have decremented the refCount to 0 at that point.
Yeah it's bogus. This means that the plugin host can't unload plugin libraries since it doesn't have any means of determining whether there are outstanding references to plugin instances. An alternative is to make the pluginInstanceOwner's mInstance a weak reference that doesn't get addref'd or released (the plugin host already has a strong reference to the instance).
I am just wondering why this showed up only a week ago? The biggest suspect of course would be my fix to 74484 which happened to be not the case. What else?
bug 77319 looks similar - WinNT / default plugin.
Blocks: 77319
To bring us back on track.....Sean I think you were most correct. I set a breakpoint (in Windows) at nsPluginTag::TryUnloadPlugin and sure enough it gets called along with PR_UnloadLibrary. I think (please correct me if I'm wrong) but the Java plugin must not be unloaded once it is loaded. This is probably causing the crash as I looked at the refcount it looks correct (at least on Windows). Since I can't reproduce this, let me try creating an experimental patch for someone to try for TryUnloadPlugin to NOT return if the tag is Java. Comments Andrei/Sean? Oh, and I couldn't reproduce the stack in bug 77319 on Windows. I got another stack for which I attached a patch.
Sounds good. You mentioned that there was a code change recently regarding when java gets started up. Where does that code live?
What do you mean by 'TryUnloadPlugin to NOT return if the tag is Java'? To not unload the library? When is it going to be unloaded then? We already have a mechanism to postpone unloading libraries, but this one look a special case, it needs to not be unloaded till the application exits, right?
Ok, my dirty patch should treat Java like a cached plugin in StopPluginInstance() so it shouldn't try to unload the library untill we shut down. Now testing on Linux....(please try as well) Andrei, TryUnloadPlugin() was the wrong place to check. I think this patch may prevent or delay the crash.
Whoops - guess I wasn't following correctly. I think that the java plugin instances should not be cached but rather the plugin module should not be unloaded when the instance count reaches zero. As I recall, the nsPluginInstanceVariable_DoCacheBool variable was created specifically to address problems with caching of applets at Sun's request. Applying the patch will tell us if we are barking up the right tree though.
I tried Peter's patch, and seemed got the same result as Sean's: Program received signal SIGSEGV, Segmentation fault. 0xfcaaf9f0 in ?? () (gdb) where #0 0xfcaaf9f0 in ?? () #1 0xfe933c0c in g_io_unix_dispatch (source_data=0x72f0a8, current_time=0xffbee670, user_data=0x631098) at giounix.c:135 #2 0xfe9358e8 in g_main_dispatch (current_time=0xffbee670) at gmain.c:656 #3 0xfe936170 in g_main_iterate (block=-23744764, dispatch=1) at gmain.c:874 #4 0xfe936384 in g_main_run (loop=0xc2fa0) at gmain.c:932 #5 0xfdebf9ac in gtk_main () at gtkmain.c:476 #6 0xfd6bfb84 in nsAppShell::Run (this=0x7d308) at nsAppShell.cpp:360 #7 0xfd7c66a4 in nsAppShellService::Run (this=0xaba08) at nsAppShellService.cpp:407 #8 0x230e8 in main1 (argc=1, argv=0xffbeeafc, nativeApp=0x0) at nsAppRunner.cpp:1005 #9 0x24440 in main (argc=1, argv=0xffbeeafc) at nsAppRunner.cpp:1300
Joe: Any idea what was at 0xfcaaf9f0 - part of the Java plugin, the objectFrame, an applet? Similar offset as the last crash you reported. Can you confirm that the java plugin is still loaded at the time of the crash?
*** Bug 77957 has been marked as a duplicate of this bug. ***
*** Bug 77875 has been marked as a duplicate of this bug. ***
Ahh...good news. After spending hours on the phone with Joe Chou from Sun, it turns out that my patch does solve the problem of caching the Java plugin but the string I was comparing to was incorrect (Should have been "Java" instead of "Java Plug-in"). Making that change to the patch prevents the crash. However....I don't think this is the way we should solve this problem. The "correct" way would be to have the Java plugin return PR_TRUE for nsPluginInstanceVariable_DoCacheBool for GetValue. Stanley, that's where you come in as I understand you make changes to the Java Plugin but only in regular intervals.
Comments? Should we check-in my patch temporary till the Java plugin can fixed AND released....or create a better way of detecting Java?
Talk to Ed Burns and Stanley Ho. They added the nsPluginInstanceVariable_DoCacheBool on Sept 14 because of problems starting up cached applets - Stanley <REALLY> didn't want java instances cached, but maybe that has changed.
They didn't want it so bad that they wanted to remove plugin caching completely. That nsPluginInstanceVariable_DoCacheBool variable was the compromise that left caching intact.
Ahh...I see....Well, I don't know how to fix this crash without caching the Java plugin. What bug # was that? Can someone try with my [modified] patch? What to do?
One could also use QueryInterface() to determine if you're dealing with the Java plugin. Java plugins must implement the nsIJVMPlugin interface.
That nsPluginInstanceVariable_DoCacheBool bug is: http://bugzilla.mozilla.org/show_bug.cgi?id=50547 Does anyone know what is actually being accessed that has already been deleted? Is it a plugin, a plugin instance or the plugin module? Is this a refcnt bug?
I tested Peter's patch and it does prevent the crash, however after leaving the page with java, I can no longer get any Java to run on any subsequent page. So, it is good that the crash is gone, it is bad that Java stops working. BTW: beard's idea of the QI instead of the string search is great: much more reliable and less vulnerable to change.
Beard good idea! From talking with Marc, looks like even though the crash is fixed, there are still huge problems. I'll attempt to create another patch, this time to GetValue when doCacheBool is the case. If the plugin returns FALSE, I'll do the QI to check for Java and if that works, then return TRUE reguardless of what the Java plugin wants until such time it's fixed on the other side. Ed, you worked on the other bug, what are your feelings about taking this approach?
I have a hunch that this problem would be solved by re-compiling the java plugin on unix with current mozilla headers.
I've tried Peter's second patch, and the result is the same as what Marc saw: no crash, but Java stopped working after that. By the way, I also tried mozilla08.1 binary. Hitting the back button after loading an applet page did not crash the browser, but then hitting the forward button (to go back to the applet page) did.
I also tried mozilla08, and the problem was not there. I could toggled Back and Forward buttons without errors. Therefore, the problem seems to be introduced between mozilla08 and 08.1.
So, Joe, to be clear, what you are saying is that my second patch fixes THIS bug but there is still an underlying bug about going Back and Forward that was introduced between Mozilla 0.8 and Mozilla 0.8.1. Is there a bug already on THAT issue? If not, I suggest getting reviews and approval to check this patch in and open another bug on that problem. We probably shouldn't ship 0.9 with that not working either, but that's not my decision. My guess is that there is a problem in the caching logic but I'm not sure.
Joe, then I'd suspect that the root of the problem is the fix for bug 45009 which made it into 0.8.1. Is the java plugin doing anything in response to nsIPlugin::Shutdown()? Feels like I'm beating a deadhorse but we really shouldn't cache applet instances. I think we just need to prevent calling TryUnloadPlugin() on the java library once the instance count reaches zero. Has that at least been attempted?
Peter: Actually your second patch (same as the first patch) does stops the crashing, but the browser also stops working with Java (can't load any Java applets afterwords). Sean: can you elaborate on your suggestion a little more please (prevent calling TryUnloadPlugin() on the java library once the instance count reaches zero)? If you put up a patch, I'll give it a try.
Joe: for a quick test, just comment out the implementation of nsPluginTag::TryUnloadPlugin in nsPluginHostImpl.cpp and try a simple test that loads an applet, go to a page without any applets, go back and then exit.
Sean: it worked! I put in the change, and I could toggled between Back and Forward button without problems now. No crash, and Java still worked. So your assumsion is correct. What is next? (I got to go now, but I'll check it online from home later).
TryUnloadPlugin does essentially two things: shutdown/release and unload the lib itself. Which one should be skipped to fix the crash? If this unloading lib from memory than we can try to employ the mechanism to postpone this which mechanism already exists. To quickly test this you can do the following in nsActivePluginList::remove implementation: // xpconnected plugins from the old world should postpone unloading library // to avoid crash check, if so add library to the list of unloaded libraries - if(pluginTag->mXPConnected && (pluginTag->mFlags & NS_PLUGIN_FLAG_OLDSCHOOL)) + if(1) { pluginTag->mCanUnloadLibrary = PR_FALSE; if(aUnloadLibraryLater) *aUnloadLibraryLater = PR_TRUE; }
Just yesterday I was talking to someone who anecdotally said something to the effect that every bug had 2 causes. 1) must ensure that the Java plugin is not shutdown/released/unloaded (yet to be determined whether shutdown/release needs to be avoided) by the plugin host even though the plugin instance count hits 0. This alone will prevent applet crashes in ~nsObjectFrame. 2) nsObjectFrame (nsObjectFrame.cpp) and pluginInstanceOwner (nsPluginViewer.cpp) should switch from holding a plugin instance strong ref to a weak ref if the plugin host is to continue shutdown/release/unload of plugins when their instance counts reach 0. This should prevent the crashes in ~nsObjectFrame with Real player as reported in the talkbacks listed on the 23rd. I couldn't repro those on Win2K, so that's just another hunch.
I like your idea of switching to weak reference in the pluginInstanceOwner. This looks less hackery to me. From the other hand, there is still no guarantee that somebody else in future will not hold it strong, and we leave the potential possibility for this kind of crash to repeat. Should we vote I would cast my voice for the number two.
I'd go for the safer thing too - remove the call to TryUnloadPlugin from nsActivePluginList::remove and leave the one in nsPluginTag::~nsPluginTag as the only one. Also remove all the *unusedLibrary* stuff? Would need to see how javascript:navigator.plugins.refresh behaves.
That would be risky. I introduced it first time specifically to solve problems with this javascript command. Refreshing after _updating_ a plugin dll to a newer version, not just adding it. Also, this mechanism proved to be useful in another case -- our new way to make old style plugins scriptable. It will crash upon leaving the page without this unusedlibrary stuff.
And we shall not remove TryUnloadPlugin because the spec says very clearly that we shutdown the plugin after the last instance of it is gone.
Is there a way to find out the plugin instance count before calling TryUnloadPlugin()? If so, then we will only call TryUnloadPlugin() when the count is > 0. Would that be good enough?
*** Bug 78052 has been marked as a duplicate of this bug. ***
Andrei, I misread what your vote for number 2 actually meant. In number 2 there are 2 options. Remove the strong ref or stop the host from unloading plugin libraries until shutdown. If you remove the strong ref, then you still need a hack to keep the Java library loaded. If you go the route of stopping the call to TryUnloadPlugin from within nsActivePluginList::remove then no Java specific hack is required. But as you noted, removing the strong ref from nsObjectFrame is no guarantee that similar crashes won't occur when other things that hold instance refs call release. Andrei wrote: >our new way to make old style plugins scriptable. It will crash >upon leaving the page without this unusedlibrary stuff. I don't think that's the case if you don't call TryUnloadPlugin until shutdown (solely from the nsPluginTag dtor). nsPluginTags aren't destroyed until xpcom- shutdown, therefore the libraries that plugin instances come from can still handle release calls made on the instances. Just because the plugin host calls nsIPluginInstance::Destroy doesn't mean the plugin instance has been removed from memory. >And we shall not remove TryUnloadPlugin because the spec says very clearly >that we shutdown the plugin after the last instance of it is gone. The problem is that the plugin host does NOT know when the last instance is gone. Only when the last instance has been stopped. The whole problem here is that difference - we shouldn't unload the plugin library until the last reference to the instance has been *released*. It doesn't matter that the last instance has been *stopped* if there are still outstanding references to it (things that hang onto plugin instances are nsObjectFrame, nsPluginViewer, xpcWrappedNative, stuff in the DOM). Joe wrote: >If so, then we will only call TryUnloadPlugin() when the count is > 0. That's sure to cause crashes whenever a user leaves a page that has any plugin, no?
Sean: Now I am confused. (Sean wrote: "...I think we just need to prevent calling TryUnloadPlugin() on the java library once the instance count reaches zero...")
Joe, perhaps, your "when the count is > 0" should read "when the count is = 0"?
Sean, as I understand it the presense of outstanding references to the plugin instances is quite orthogonal to the number of running instances, and what you are saying (correct me if I am wrong) is that we should not unload the dll from memory untill the last reference is released (which is not the same thing as last instance is stopped and destroyed). This basically means that we confess that plugin dlls may stay resident indefinitely. The only 100% guarantee they are unloaded is exiting application. I can live with that, although those who think footprint may not agree. What I am mostly concerned about is nsIPlugin::Shutdown call which according to the spec we are obliged to fire upon the desruction of the last instance of the plugin. Maybe TryUnloadPlugin should become TryShutdownPluginObject or something like that. If we take this, I agree, we can get rid of all the 'unusedlibrary' stuff which is kind of alien.
Joe: As things are now, TryUnloadPlugin is called when the plugin instance count reaches zero and at xpcom-shutdown (the call at shutdown won't do anything if the first call was made). By "prevent calling TryUnloadPlugin() on the java library once the instance count reaches zero" I meant don't make that first call to TryUnloadPlugin - I didn't mean to imply that we should start calling it when there's a non-zero instance count.
Andrei, calling Shutdown when the plugin host's reckoning of the instance count reaches 0 might be ok. It would be good to know what call in TryUnloadPlugin is causing problems for Java (Shutdown, Release or PR_UnloadLibrary). Peter mentioned that there's special loader code for Java. I wasn't able to find it. If it exists, then it likely does a PR_LoadLibrary on the Java plugin. If that's the case, then it's unlikely that the PR_UnloadLibrary in TryUnloadPlugin is the problem since NSPR refcnts the module (that is, the plugin host has 'unloaded' the module but it is still loaded due to the code that explicitly started up Java much in the same way that when the plugin host calls PR_UnloadLibrary on my xpcom module, it is not unloaded since there is the outstanding load done by the component manager). That would leave either nsIPlugin::Shutdown or nsIPlugin::Release as the cause of java applet crashes. I don't know how the Java plugin is implemented or how the authors interpreted the nsIPlugin::Shutdown documentation. My interpretation of the doc at http://www.mozilla.org/docs/plugin.html places more relevance on the qualification made at the end of the second sentence than on the other part which you place more importance on: *Navigator calls this function* once after the last instance of your plug-in is destroyed, *before unloading the plug-in library itself.* Note also that the documentation for nsIPlugin::Initialize says: After the last instance of the plug-in has been deleted, Navigator calls Shutdown. That means that Shutdown shouldn't be called until all instance references have been released and all instances deleted. I suppose part of the problem here is the ambiguity of this phrase in Shutdown: once after the last instance of your plug-in is destroyed Is that destroyed as in the plugin instance destructor (as stated in the Initialize documentation) or destroyed as in nsIPluginInstance::Destroy()? My interpretation is that there will be no more instance calls after Shutdown. In my plugin, in order to prevent crashes at shutdown due to numerous plugin instance leaks in N6 and 6.01, my nsIPlugin::Shutdown cleans up all outstanding objects. If Java does the same, then calling Shutdown before all objects are not released is not safe since Release calls on applet instances will deref freed memory.
Sean, I see what you mean. Just because the instances go away, doesn't mean the references (like from the DOM) have gone away as with Java and possibly other scriptable plugins. Someone correct me if I'm wrong, but the OJI plugin once started isn't supposed to be shutdown until app shutdown? This was the case with Java in 4.x but that's completely different. Calling shutdown when the refcount goes to zero sounds like a good idea.
And what to do with the obligation to "call shutdown after the last instance is destroyed"? I still don't see a way to solve this other than to use barometer length to measure the height of the building. Change spec? Forbid using strong references?
*** Bug 78164 has been marked as a duplicate of this bug. ***
To answer Sean's earlier question (which of the call in TryUnloadPlugin that caused the crash: Shutdown(), Release() or PR_UnloadLibrary()?) I did some more trace, and found out that it was the call to PR_UnloadLibrary() causing the crash. Inside PR_UnloadLibrary(), the Java plugin lib (libjavaplugin_oji.so) was unloaded. Before unloaded, lib->refCount was 1. In the test case, there was one applet.
I don't think we can forbid the use of strong refs on plugin instances throughout mozilla. The spec does need to be clarified because as it is now, it's ambiguous and depending on the interpretation can't be implemented at this point. AFAIK XPCOM module unloading is not supported at this time in mozilla due to strong ref problems like this. The host needs to be able to find out from the plugin if it is safe to call Shutdown (the plugin will need to account for all of the objects that are held by mozilla), but there's no infrastructure for this and should be explored. As far as calling shutdown after the last instance is destroyed, I think calling nsIPlugin::Shutdown at xpcom-shutdown better meets this requirement. I don't see where the spec necessitates that shutdown be called as soon as the last instance is destroyed, only that there is a sequence of events and the Shutdown call occurs between 2 events: after the last instance is destroyed and before the module is unloaded. By waiting until xpcom-shutdown to make the Shutdown call, you are more likely to have fulfilled the requirement that the last instance is destroyed than if Shutdown is called when the last instance is stopped (but still no guarantee). It would be nice to have some unbiased third party opinion on these larger issues. What's the immediate goal? Resolve all the issues or get a bandaid together for 0.9 (a patch that prevents the unloadLibrary call for Java and that turns the nsObjectFrame instance reference from strong to weak)?
Joe, just to clarify, if you comment out the PR_UnloadLibrary, then Java is happy (back/forward testcase)?
Yep. If the call to PR_UnloadLibrary(mLibrary) is commented out, there is no problem to go back and forth among applet and non-applet pages. How about getting a patch to fix the crash problem here, and opening another bug to clean up the whole issue?
cc:ing some XPCOM folks.
Any comments on what to do about this bug next?
Joe, does this patch work? http://bugzilla.mozilla.org/showattachment.cgi?attach_id=32513 What needs to be done to fix this bug and others like it?
Re-assigning bug to myself. This bug needs to be fixed soon so 0.9 can go out the door. What is actually needed to be done here? Joe says: >>Yep. If the call to PR_UnloadLibrary(mLibrary) is commented out, there is no >> problem to go back and forth among applet and non-applet pages. So what if there was an additional check if for the NSPR refcnts on the module before it gets unloaded?
Assignee: James.Melvin → peterlubczynski
Status: ASSIGNED → NEW
For the patch of the crash, correct me if I am wrong here. In TryUnloadLibrary(), if it is Java plugin, then do not call PR_UnloadLibrary(). (What about Shutdown() and Release() there, should those calls be avoided too?) About changing nsObjectFrame from strong instance to weak instance, can we leave it to the complete cleanup (in a seperate bug)? Peter, whatever the outcome of the discussion of the patch, when you come up with a patch, I'll test it. For the complete cleanup of the issue that have been broght up here, can Sean or Andrei write a new bug, please?
I've tried Peter's last patch, and it seems working fine.
Java Plug-in cannot be unloaded once it is started, until the process exits. This is because the VM as well as the JDK libraries will have multiple threads running, and unloading the VM in the middle of browser session will be very tricky to do without crashing or deadlock. I think not calling TryToUnloadPlugin will be the proper workaround.
To get this off the radar, can I get reviews for patch #32513 unless someone else has input on how to better fix this for mozilla 0.9?
Status: NEW → ASSIGNED
Keywords: patch
Whiteboard: critical to mozilla 0.9 → [critical to mozilla 0.9] [seeking reviews]
Testing the MIME type string is fragile. I'd prefer we actually test the nsIPlugin itself and see if it implements nsIJVMPlugin. You can do this test once up front. Why are we eagerly unloading plugins anyway? Shouldn't they only be unloaded if the window is closed, and we are tossing history?
Oops, my mistake. The line under // XXX should read: nsCOMPtr<nsIJVMPlugin> isJava = do_QueryInterface(mEntryPoint); Joe, can you verify that this doesn't cause a crash on shutdown or the service manager not being held past XPCOM shutdown?
With that change, r=beard, assuming it fixes the problem.
*spam* adding myself to the cc: list... i'd like to see a solution to this as well...
I replaced the line of code (plus added a line to include the header), and I was able to go back and forth among applet pages (one applet or two applets pages),and non-applet pages. Any other test cases I should try? Peter, you seemed to have some test cases in mind when you said, "verify that this doesn't cause a crash on shutdown or the service manager not being held past XPCOM shutdown", right?
By the way, if you have access to a linux trunk build, you can apply the patch, then build an optimized build, and try your own test cases. To build an optimized build, simply add a file, .mozconfig to your mozilla directory. The file should contain the following lines: ac_add_options --disable-tests ac_add_options --disable-debug ac_add_options --enable-optimize #ac_add_options --enable-mathml #uncomment to enable mathml Basically the file will turn off the debug mode for the build.
Joe, trying this late last night on Windows, I was seeing refcount problems with the plugin host that Sean and Andrei describe. Basically, if Java is installed on Windows, the plugin host has an extra reference to it and therefore does not get destroyed at XPCOM shutdown (and is leaked) even though Shutdown() and Release are called. I guess you aren't seeing this and neither are other people, so perhaps it's something lurking in my tree. I had problems updating with CVS too last night, will try again today. Actually, this was happening with and without this patch so I think it's something else but related. So I assume you r=? If nobody objects, I'll try to get a super review and check this in.
*** Bug 78547 has been marked as a duplicate of this bug. ***
sr=waterson
a= asa@mozilla.org for checkin to the MOZILLA_0_9_BRANCH This just needs a r= right? Can we get this in as soon as the r+ becomes available. Thanks.
This bug is a topcrasher for M09 on Windows Changed OS to All. Here is a recent stack trace from a windows crash: Incident ID 29798225 nsPluginInstanceOwner::~nsPluginInstanceOwner [d:\builds\seamonkey\mozilla\layout\html\base\src\nsObjectFrame.cpp, line 1520] nsPluginInstanceOwner::Release [d:\builds\seamonkey\mozilla\layout\html\base\src\nsObjectFrame.cpp] nsObjectFrame::~nsObjectFrame [d:\builds\seamonkey\mozilla\layout\html\base\src\nsObjectFrame.cpp, line 267] nsObjectFrame::`scalar deleting destructor' nsFrame::Destroy [d:\builds\seamonkey\mozilla\layout\html\base\src\nsFrame.cpp, line 432] GKLAYOUT.DLL + 0xa9cb4 (0x603b9cb4) nsPluginInstanceOwner::AddRef [d:\builds\seamonkey\mozilla\layout\html\base\src\nsObjectFrame.cpp, line 1593]
OS: Linux → All
Summary: Leaving an applet page crashed browser - Trunk [@ nsPluginInstanceOwner::~nsPluginInstanceOwner] → Leaving an applet page crashed browser - Trunk & M09 [@ nsPluginInstanceOwner::~nsPluginInstanceOwner]
Can someone try the patch on Windows, and see if it fixes the crash that janc saw?
I'll try my most recent patch on Win32.
This patch will probably only fix Linux Java crashing. janc, how did you get that stack trace? Is there a testcase for Windows? What plugins? We need more information. Bug 77319 has the same trace but is a bug on Real. I just got the new Real bits an hour go, will try. This bug NEVER crashed on Windows or Mac in the first place, that's why it was so hard to fix.
I just tried my most recent patch with a debug build from 01 May 2001 trunk. Applets worked fine, modulo bug 76921.
Depends on: 76921
Peter is correct. The attached patch will only fix ~nsObjectFrame crashes when an applet is involved. Given that the talkback info posted on the 23rd shows a few Real Player descriptions, the java-specific patch won't address all the nsObjectFrame crashes. See the last comment I posted on the 27th.
Peter please note that nsPluginHostImpl.cpp has changed. I've attached a patch that reflects this change.
edburns, will we need to get the fix for 76921 on the branch for this bug's fix to work on the branch? Please excuse my ignorance. I just want to make sure we get the 0.9 branch working.
Asa, no they are orthogonal. I include the patch as a convenience. I have already requested in n.p.m.builds to have bug 76921 approved for checkin on the branch, as I feel it is important functionality for plugins. If you grant permission for me to check in 76921 on the branch I would greatly appreciate it.
yes please check into the MOZILLA_0_9_BRANCH a= asa@mozilla.org (on behalf of drivers
Asa, do you mean for 76921?
Fine by me, r=peterl
a= asa@mozilla.org for checkin to both this bug and 76921
I see from the comments that this bug is solving the problem only for Linux. However, but 77857 which was reported by me and was marked as a duplicate of this one is talking about Windows 98. So either the problem occurs on Win98 as well or it's not a dup and should be reopened.
checking into the TRUNK: /cvsroot/mozilla/modules/plugin/nglsrc/nsPluginHostImpl.cpp,v <-- nsPluginHostImpl.cpp new revision: 1.233.4.5; previous revision: 1.233.4.4 done Checking in nsPluginHostImpl.h; /cvsroot/mozilla/modules/plugin/nglsrc/nsPluginHostImpl.h,v <-- nsPluginHostImpl.h
Patch checked into the TRUNK. Marking FIXED. /cvsroot/mozilla/modules/plugin/nglsrc/nsPluginHostImpl.cpp,v <-- nsPluginHostImpl.cpp new revision: 1.242; previous revision: 1.241 done Checking in nsPluginHostImpl.h; /cvsroot/mozilla/modules/plugin/nglsrc/nsPluginHostImpl.h,v <-- nsPluginHostImpl.h new revision: 1.53; previous revision: 1.52
Status: ASSIGNED → RESOLVED
Closed: 24 years ago24 years ago
Resolution: --- → FIXED
oops, meant BRANCH on that last comment.
i don't see this problem anymore on the trunk on windows /linux (0503). VERIFIED
Status: RESOLVED → VERIFIED
removing invalid dependency (bug 76921)
No longer depends on: 76921
Reopening since this is still crashing, just somewhere else now. Here's my work around for the new crash on shutdown: Index: nsPluginHostImpl.cpp =================================================================== RCS file: /cvsroot/mozilla/modules/plugin/nglsrc/nsPluginHostImpl.cpp,v retrieving revision 1.233.4.6 diff -u -r1.233.4.6 nsPluginHostImpl.cpp --- nsPluginHostImpl.cpp 2001/05/03 21:06:35 1.233.4.6 +++ nsPluginHostImpl.cpp 2001/05/04 22:50:26 @@ -806,7 +806,9 @@ void nsPluginTag::TryUnloadPlugin(PRBool aForceShutdown) { // XXX This is a hack to keep Java around, see bug 76936 - nsCOMPtr<nsIJVMPlugin> isJava = do_QueryInterface(mEntryPoint); + PRBool isJava = PR_FALSE; + if (mDescription && !strcmp(mDescription, "Java(TM) Plug-in")) + isJava = PR_TRUE; if (isJava && !aForceShutdown) return;
Status: VERIFIED → REOPENED
Resolution: FIXED → ---
How about changing: nsCOMPtr<nsIJVMPlugin> isJava = do_QueryInterface(mEntryPoint); to: nsCOMPtr<nsIJVMPlugin> isJava; if (mEntryPoint) isJava = do_QueryInterface(mEntryPoint);
Checking the string desciption vs. the interface makes a difference? I like Sean's idea better but I thought do_QueryInterface already catches null pointers?
Christopher: How to reproduce your crash to try the new patches?
Working for me with build 2001050408, Linux 2.4.4 i686, RedHat 6.1
I had a look and there are several variants in nsCOMPtr.h/cpp. Apparently, the version that got picked was a version in which no error checkin is made.
I would imagine that to repro, you need the same plugins that he has installed. One of them must not implement nsIPlugin.
If Sean's approach works, I like it better than checking the description. Checking the description is almost guaranteed to break non-sun java plugins.
Leaving aside nsCOMPtr<> for a moment, you could just do: PRBool isJava = PR_FALSE; if (mEntryPoint) { nsIJVMPlugin* jvmPlugin; isJava = NS_SUCCEEDED(mEntryPoint->QueryInterface(NS_GET_IID(nsIJVMPlugin), (void**)&jvmPlugin)); }
+ the NS_IF_RELEASE, of course...
Is the real problem the version of nsCOMPtr.h/cpp? LXR only reveals one set. Like I said, I thought do_QueryInterface is safe to call on null pointers? If not, I like Sean's approach. Is there a testcase to reproduce the crash? I thought do_QI and nsCOMPtr saved you from doing all that as the guide says: http://www.mozilla.org/projects/xpcom/nsCOMPtr.html
Status: REOPENED → ASSIGNED
To reproduce the crash just have the java plugin installed, start the browser, and exit the browser.
I am observing that it isn't safe the optional 'rv' isn't specified. I tried putting an assertion in a random place in the code, say: NS_IMETHODIMP nsDocumentOpenInfo::OnStartRequest(nsIRequest *request, nsISupports * aCtxt) { nsresult rv = NS_OK; ... NS_ASSERTION(0,"Break"); // nsCOMPtr<nsIHTTPChannel> httpChannel(do_QueryInterface(request, &rv)); nsCOMPtr<nsIHTTPChannel> httpChannel(do_QueryInterface(NULL)); ... } and tracing this didn't gave a null httpChannel (I am reading httpChannel.mRawPtr = 0x01abfef4 on the trace :-) But it was safe when I tried: nsCOMPtr<nsIHTTPChannel> httpChannel(do_QueryInterface(NULL, &rv)); It gave null, and a failure error code for rv.
With build 2001050408, everything works fine, except that the browser crashes on about:plugins if the Java plugin is installed; the bottom of the stack trace looks like: #0 0x400df6a4 in nsCOMPtr_base::~nsCOMPtr_base () at eval.c:41 #1 0x40ed7c8d in nsPluginTag::TryUnloadPlugin () from mozilla/components/libgkplugin.so #2 0x40ed7a5f in nsPluginTag::~nsPluginTag () from mozilla/components/libgkplugin.so #3 0x40eda017 in nsPluginHostImpl::ReloadPlugins () from mozilla/components/libgkplugin.so #4 0x404b4c24 in PluginArrayImpl::Refresh () from mozilla/libjsdom.so #5 0x404b2901 in PluginArrayRefresh () from mozilla/libjsdom.so #6 0x4014e0c6 in js_Invoke () at eval.c:41 #7 0x40155005 in js_Interpret () at eval.c:41 #8 0x4014e50f in js_Execute () at eval.c:41 #9 0x40132106 in JS_EvaluateUCScriptForPrincipals () at eval.c:41 #10 0x4048a36d in nsJSContext::EvaluateString () from mozilla/libjsdom.so #11 0x40ca2a43 in HTMLContentSink::EvaluateScript () Is the same bug causing this, or should I open up a new bug?
Sorry peterl, I spoke too quickly. It is (should be) safe. My tests were wrong.
Okay, now this makes a little more sense to me seeing that stack. That stack is for bug 77319 which is highly related to this bug. That stack shows the problem with the nsCOMPtr. For whoever can reproduce this bug try doing a null pointer check as Sean suggests and see if the crash goes away. Also, can someone explain this piece of code in ScanPlugins: // load the plugin's library so we can ask it some questions, but not for Windows #ifndef XP_WIN if (pluginFile.LoadPlugin(pluginLibrary) != NS_OK || pluginLibrary == nsnull) continue; #endif Is this causing some of the PR_[Un]LoadLibrary() problems? Ed, on Win32, I'm getting the service manager held past XPCOM shutdown doing what you describe. Even exit when choose profile causes this. I will the debug bits I got.
Okay, I just got bittin by this bug, but in a different way while debuging another. I see what Sean means if the plugin is 4.x. A QI won't work for a 4.x plugin. That's probably why the string comparision works the best.
Try this and see if it clears up THIS crash: // XXX This is a hack to keep Java around, see bug 76936 PRBool isJava = PR_FALSE; if (mEntryPoint) { nsIJVMPlugin* jvmPlugin; isJava = NS_SUCCEEDED(mEntryPoint->QueryInterface(NS_GET_IID(nsIJVMPlugin), (void**)&jvmPlugin)); if (isJava) NS_IF_RELEASE(jvmPlugin); }
Knowing not to unload Java in a navigator.plugins.refresh() situation is going to be a lot harder to fix. Use bug 77319 for that tracking. That WILL cause a crash because we are very deep in plugin destruction and it's hard to tell that sitation apart from when XPCOM gets shutdown. I think this solution may be too hacky to solve that crash. We must go back to making sure we don't call Shutdown() on the plugin unless it's refcount is down to one. I don't think doing the call to Shutdown() in TryUnload is a good idea anyway. Here's the top of the stack for the reload case: nsPluginTag::TryUnloadPlugin(int 1) line 807 nsPluginTag::~nsPluginTag() line 763 DOMPluginImpl::~DOMPluginImpl() line 3092 + 11 bytes DOMPluginImpl::`scalar deleting destructor'(unsigned int 1) + 15 bytes DOMPluginImpl::Release(DOMPluginImpl * const 0x03b4b530) line 3139 + 154 bytes PluginElementImpl::~PluginElementImpl() line 270 + 27 bytes PluginElementImpl::`scalar deleting destructor'(unsigned int 1) + 15 bytes PluginElementImpl::Release(PluginElementImpl * const 0x03b48a70) line 280 + 154 bytes PluginArrayImpl::Refresh(PluginArrayImpl * const 0x03b4bfc4, int 0) line 202 + 45 bytes PluginArrayRefresh(JSContext * 0x03b26c50, JSObject * 0x02ca4778, unsigned int 1, long * 0x02d2ec98, long * 0x0012e2cc) line 333 + 16 bytes js_Invoke(JSContext * 0x03b26c50, unsigned int 1, unsigned int 0) line 813 + 23 bytes
Nit: some little changes: // XXX This is a hack to keep Java around, see bug 76936 PRBool isJava = PR_FALSE; if (mEntryPoint) { nsIJVMPlugin* jvmPlugin; if (NS_SUCCEEDED(mEntryPoint->QueryInterface(NS_GET_IID(nsIJVMPlugin), (void**)&jvmPlugin))) { NS_RELEASE(jvmPlugin); isJava = PR_TRUE; } }
I tried Peter's last patch, and it seemed working OK (I tried the "about:plugins" and exit rightaway cases). I also tried Christopher's patch, and it seemed working fine, too. So somehow, nsComptr seemed to cause some problem. One thing though, in the exit-right-away case, there were two assertions (for both patches) at the end: nsPluginHostImpl::Observe "xpcom-shutdown" ### beging nsCacheService::Shutdown() ### starting ~nsMemoryCacheDevice() ###!!! ASSERTION: Service Manager being held past XPCOM shutdown.: 'cnt == 0', file nsServiceManager.cpp, line 545 ###!!! Break: at file nsServiceManager.cpp, line 545 CanUnload_enumerate: skipping native GC Cache: hits: 395 203 72 11 31 9 0 1 2 2 hits: 726, misses: 244, hit percent: 74.845360% ###!!! ASSERTION: Component Manager being held past XPCOM shutdown.: 'cnt == 0', file nsXPComInit.cpp, line 505 ###!!! Break: at file nsXPComInit.cpp, line 505
Joe, can you try another plugin. Throw in Flash the Default Plugin or any other 4.x plugin. Acrobat PDF is broken (Bug 78741) on all builds from 4-24-01. Be sure when you are testing the closing case and about:plugins case that you FIRST visit a page with the plugin in question (namely Java or Flash or lack there of to trigger the Default). If that works, then I think rbs' second patch should go in. Yes, there is the problem of XPCOM shutdown I see on Win32. Do you see on Linux? Can you check the refcount of the Java OJI Plugin when ::Observe is called?
Please keep in mind that the mEntryPoint isn't NULL but when you call QueryInterface() on it it will fall over. So you can't use that test. Due to changing symbol sizes and changing structure sizes I suspect that comparing the char * description is the safest bet, assming that the description is the same everywhere.
Correct me if I'm wrong, but don't we wrap 4.x plugins in an XPCOM wrapper anyway (ns4xPlugin*)? Since mEntryPoint is of type nsIPlugin and hence derives from nsIFactory which derives from nsISupports, it should be safe to assume there will be a QI or am I missing something here?
I saw this function in the file: static PRBool isJavaPlugin(nsPluginTag * tag) Is it also a possible alternative here? ============================== No test against nsIJVMPlugin ? nsresult ns4xPlugin::QueryInterface(const nsIID& iid, void** instance) { if (instance == NULL) return NS_ERROR_NULL_POINTER; if (iid.Equals(kIPluginIID)) { *instance = (void *)(nsIPlugin *)this; AddRef(); return NS_OK; } if (iid.Equals(kIFactoryIID)) { *instance = (void *)(nsIFactory *)this; AddRef(); return NS_OK; } if (iid.Equals(kISupportsIID)) { *instance = (void *)(nsISupports *)this; AddRef(); return NS_OK; } return NS_NOINTERFACE; }
No need to test ns4xPlugin for nsIJVMPlugin in QI because it doesn't implement that interface. It's not Java. Take a look at MRJPlugin.h and notice it implements nsIPlugin and the QI. [static PRBool isJavaPlugin(nsPluginTag * tag)] is used to check for the old 4.x npjava plugin. Loading this causes crashes.
Chris wrote: >Please keep in mind that the mEntryPoint isn't NULL but when you call >QueryInterface() on it it will fall over. That's bad. mEntryPoint is supposed to either be NULL or an nsIPlugin*. If it's not a valid nsIPlugin*, then after this crash is addressed, there will be a crash in the Shutdown call which is protected by if (mEntryPoint).
OK peterl, saw it, QI isn't the cause then. sean/blizzard, it is declared as "nsIPlugin *mEntryPoint" so assumptions aren't broken.
Cc'ing scc, a bit late. Are we sure the nsCOMPtr (vs. a raw strong-ref ptr) was a necessary condition for the problem? /be
May be I have found the problem... Preparing a patch so that other could try for confirmation.
This is interesting, but there is no path through |do_QueryInterface| that blindly calls through |NULL|. It all must pass through http://lxr.mozilla.org/seamonkey/source/xpcom/base/nsCOMPtr.cpp#30 From your description of the behavior, it sounds like |mEntryPoint| isn't really a pointer to something that inherits unambiguously from |nsISupports|. Does stepping into the QI jump into limbo? Does the intermediate step look like a vtable? That would imply that it is an object, but the wrong kind. If the intermediate step doesn't even look like a vtable, that would imply the pointer is garbage but non-|NULL|. Here's one possibility: could it be that what |mEntryPoint| points to has already been unloaded? That would explain a lot.
Nope, it eliminates the crash but it *isn't* the correct fix: NS_IMETHODIMP nsPluginHostImpl::Observe(nsISupports *aSubject, const PRUnichar *aTopic, const PRUnichar *someData) { #ifdef NS_DEBUG nsAutoString topic(aTopic); char * newString = topic.ToNewCString(); printf("nsPluginHostImpl::Observe \"%s\"\n", newString ? newString : ""); if (newString) nsCRT::free(newString); #endif // if (NS_LITERAL_STRING(NS_XPCOM_SHUTDOWN_OBSERVER_ID) == aTopic || // NS_LITERAL_STRING("quit-application") == aTopic) { Destroy(); } NS_IF_RELEASE(serviceManager); < where serviceManager is the global SM in return NS_OK; < the code (it is explicitly clear in the < code that there is *no* addref of the GSM) } < nope, this is the wrong fix...
Note this interesting comment from bug #77319 (claimed to be related to this bug) >The crash happens on NS_ADDREF(mInstance) and taking a look at that interface >pointer shows it's vtable to be invalid so no wonder it crashes. What is going >on with this member variable on this page.
Scott, In the case of the Java plugin, at least on Win32, mEntryPoint shows up as deriving from nsIFactory, nsISupports, and there's a vtable. I realized that's all the info the debugger probably can tell as we have no symbols for those optimized bits. Ed sent me the debug bits but they don't even seem to work, probably because of the reason I talk about in bug 78150.
The debugger is showing you the static type of the variable, so it's expected that it _claims_ the inheritance is valid. When you actually look into the vtable and follow the appropriate entry, it should lead to a correct and in-memory implementation of |QueryInterface|. If it doesn't, then you will at least understand the exact mechanism of the crash, though not the reason behind it.
I tried this with a few different plugins in different senerios and anytime that mEntryPoint is not null, there looks to be a valid vtable with pointers. In the 4.x case it's ns4xPlugin and in the Java case, like I thought, it's just 3 pointers to memory because of no symbols. This is on Win32 though, Linux is another story. Can someone step through and examine the vtable just before the crash? Scott, if the Java plugin used an older nsCOMPtr version to build with, could it be possible that NULL gets through? What about this debug vs. optimized in terms of zeroing out pointers?
(a) no. no version in more than a year could fail for |NULL|, but (b) you're calling _your_ |do_QueryInterface|, not one in the plugin, right? This code is in the body of Mozilla somewhere ... not the plugin, and that's the |do_QueryInterface| (or really |nsQueryInterface::operator()|) we're calling, which is just a helper routine that calls QI. If you step into that call, what happens? See in particular, code around the URL I cited above. Just after that is where it actually queries your pointer: http://lxr.mozilla.org/seamonkey/source/xpcom/base/nsCOMPtr.cpp#32 Step into the query... do you get somewhere? Or does just stepping into the query crash you? If you got somewhere, examine the code ... is it really an implementation of |QueryInterface| (compare with dissasembly of a known entry point if symbols are not available). If this is not really an implementation of |QueryInterface|, then that is strong evidence in favor of unloaded code or a trashed |mEntryPoint|.
I hope no one here is under the impression that code responsible for loading or unloading plugins respects strong references to objects whose implementations are provided by that plugin. _That's_ certainly not the case by default. I'll wager that the |nsCOMPtr| is pointing to an object whose implementation has been unloaded. Then, any work done through that pointer, of course, leads to a jump into limbo. And work is guaranteed, since the |nsCOMPtr|s destructor will try to call |Release|.
Here are some data that might perhaps prove helpful. I expanded the ADDREF/RELEASE macros in nsServiceManager.cpp in order to see exactly who was initiating the addref & release. Then, I started SeaMonkey and exited immediately. As far I can tell, it seems the bug is in NPOJI600, for it appears to be doing a QI without a matching |release|: Here is my console output (the source changes to nsServiceManager.cpp are below) ###!!! ASSERTION: ADDREF: '0', file C:\Mozilla\src-m0.9\mozilla\xpcom\components \nsServiceManager.cpp, line 315 Type Manifest File: C:\Mozilla\src-m0.9\mozilla\dist\WIN32_D.OBJ\bin\components\ xpti.dat nsNativeComponentLoader: autoregistering begins. nsNativeComponentLoader: autoregistering succeeded nNCL: registering deferred (0) Initialized app shell component {18c2f989-b09f-11d2-bcde-00805f0e1353}, rv=0x000 00000 WEBSHELL+ = 1 nsPluginHostImpl ctor plugins at: C:\Mozilla\src-m0.9\mozilla\dist\WIN32_D.OBJ\bin\plugins plugins at: C:\Program Files\Netscape\Communicator\Program\Plugins For application/x-java-vm found plugin C:\Mozilla\src-m0.9\mozilla\dist\WIN32_D. OBJ\bin\plugins\NPOJI600.dll ###!!! ASSERTION: ADDREF: '0', file C:\Mozilla\src-m0.9\mozilla\xpcom\components \nsServiceManager.cpp, line 315 WEBSHELL+ = 2 Disabling View Source StyleSheet Enabling Quirk StyleSheet Note: verifyreflow is disabled Style Data Sharing is Enabled :) Note: styleverifytree is disabled Note: frameverifytree is disabled ###!!! ASSERTION: New URI failed: 'Error', file C:\Mozilla\src-m0.9\mozilla\netw erk\base\src\nsProtocolProxyService.cpp, line 302 Disabling View Source StyleSheet Start reading in bookmarks.html Finished reading in bookmarks.html (21000 microseconds) WEBSHELL+ = 3 Disabling View Source StyleSheet Enabling Quirk StyleSheet has multiple monitor apis is 1 blank WEBSHELL- = 2 /content/navigator.xul blank WEBSHELL- = 1 WEBSHELL- = 0 Shut down app shell component {18c2f989-b09f-11d2-bcde-00805f0e1353}, rv=0x00000 000 nsPluginHostImpl::Observe "xpcom-shutdown" ### beging nsCacheService::Shutdown() ### starting ~nsMemoryCacheDevice() ###!!! ASSERTION: RELEASE: '0', file C:\Mozilla\src-m0.9\mozilla\xpcom\component s\nsServiceManager.cpp, line 299 And this is where we get the Service Manager being held pass down msg... Changes to nsServiceManager.cpp (could someone try to reproduce as I am going off-line now) ===================================================== //NS_IMPL_ISUPPORTS1(nsServiceManagerImpl, nsIServiceManager) NS_IMPL_QUERY_INTERFACE1(nsServiceManagerImpl, nsIServiceManager) //NS_IMPL_RELEASE(nsServiceManagerImpl) NS_IMETHODIMP_(nsrefcnt) nsServiceManagerImpl::Release(void) { NS_ASSERTION(0,"RELEASE"); NS_PRECONDITION(0 != mRefCnt, "dup release"); NS_ASSERT_OWNINGTHREAD(nsServiceManagerImpl); --mRefCnt; NS_LOG_RELEASE(this, mRefCnt, "nsServiceManagerImpl"); if (mRefCnt == 0) { mRefCnt = 1; /* stabilize */ NS_DELETEXPCOM(this); return 0; } return mRefCnt; } //#define NS_IMPL_ADDREF(nsServiceManagerImpl) NS_IMETHODIMP_(nsrefcnt) nsServiceManagerImpl::AddRef(void) { NS_ASSERTION(0,"ADDREF"); NS_PRECONDITION(PRInt32(mRefCnt) >= 0, "illegal refcnt"); NS_ASSERT_OWNINGTHREAD(nsServiceManagerImpl); ++mRefCnt; NS_LOG_ADDREF(this, mRefCnt, "nsServiceManagerImpl", sizeof(*this)); return mRefCnt; } The stack trace at the time of the QI by NPOJI600 (I didn't capture the head) ============================================================== NTDLL! 77f9eea9() nsDebug::Assertion(const char * 0x100ecf24, const char * 0x100ecf20, const char * 0x100ecedc, int 315) line 286 + 13 bytes nsServiceManagerImpl::AddRef(nsServiceManagerImpl * const 0x004840d0) line 315 + 34 bytes nsServiceManagerImpl::QueryInterface(nsServiceManagerImpl * const 0x004840d0, const nsID & {...}, void * * 0x00338140) line 294 + 139 bytes JPINS32! 502e5e40() JPINS32! 502e5d41() NPOJI600! 5039104a() nsJVMManager::StartupJVM() line 599 + 32 bytes nsJVMManager::MaybeStartupLiveConnect() line 780 + 20 bytes nsJVMManager::StartupLiveConnect(nsJVMManager * const 0x01580448, JSRuntime * 0x00e16df0, int & 0) line 128 + 11 bytes nsJSEnvironment::nsJSEnvironment() line 1505 + 49 bytes nsJSEnvironment::GetScriptingEnvironment() line 1446 + 27 bytes NS_CreateScriptContext(nsIScriptGlobalObject * 0x0152b6d0, nsIScriptContext * * 0x0155cc10) line 1545 + 5 bytes nsDocShell::EnsureScriptEnvironment(nsDocShell * const 0x0155cb60) line 4656 + 50 bytes nsWebShell::GetInterface(nsWebShell * const 0x0155cb84, const nsID & {...}, void * * 0x0012fbd4) line 329 + 19 bytes nsGetInterface::operator()(const nsID & {...}, void * * 0x0012fbd4) line 37 + 31 bytes nsCOMPtr<nsIDOMWindowInternal>::assign_from_helper(const nsCOMPtr_helper & {...}, const nsID & {...}) line 974 + 18 bytes nsCOMPtr<nsIDOMWindowInternal>::nsCOMPtr<nsIDOMWindowInternal>(const nsCOMPtr_helper & {...}) line 556 nsAppShellService::GetHiddenWindowAndJSContext(nsAppShellService * const 0x00543630, nsIDOMWindowInternal * * 0x0012fc7c, JSContext * * 0x0012fc80) line 713 + 32 bytes nsAppShellService::SetXPConnectSafeContext() line 177 + 40 bytes nsAppShellService::CreateHiddenWindow(nsAppShellService * const 0x00543630) line 248 main1(int 1, char * * 0x00484460, nsISupports * 0x00000000) line 988 main(int 1, char * * 0x00484460) line 1309 + 37 bytes mainCRTStartup() line 338 + 17 bytes KERNEL32! 77e992a6()
Changes in diff -u format: Index: nglsrc/nsPluginHostImpl.cpp =================================================================== RCS file: /cvsroot/mozilla/modules/plugin/nglsrc/nsPluginHostImpl.cpp,v retrieving revision 1.233.4.6 diff -u -r1.233.4.6 nsPluginHostImpl.cpp --- nsPluginHostImpl.cpp 2001/05/03 21:06:35 1.233.4.6 +++ nsPluginHostImpl.cpp 2001/05/05 07:34:55 @@ -806,7 +806,15 @@ void nsPluginTag::TryUnloadPlugin(PRBool aForceShutdown) { // XXX This is a hack to keep Java around, see bug 76936 - nsCOMPtr<nsIJVMPlugin> isJava = do_QueryInterface(mEntryPoint); + PRBool isJava = PR_FALSE; + if (mEntryPoint) { + nsIJVMPlugin* jvmPlugin; + if (NS_SUCCEEDED(mEntryPoint->QueryInterface(NS_GET_IID(nsIJVMPlugin), + (void**)&jvmPlugin))) { + NS_RELEASE(jvmPlugin); + isJava = PR_TRUE; + } + } if (isJava && !aForceShutdown) return; Index: nsServiceManager.cpp =================================================================== RCS file: /cvsroot/mozilla/xpcom/components/nsServiceManager.cpp,v retrieving revision 3.48 diff -u -r3.48 nsServiceManager.cpp --- nsServiceManager.cpp 2000/09/13 23:55:57 3.48 +++ nsServiceManager.cpp 2001/05/05 07:40:10 @@ -289,7 +289,35 @@ } } -NS_IMPL_ISUPPORTS1(nsServiceManagerImpl, nsIServiceManager) +//NS_IMPL_ISUPPORTS1(nsServiceManagerImpl, nsIServiceManager) +NS_IMPL_QUERY_INTERFACE1(nsServiceManagerImpl, nsIServiceManager) + +//NS_IMPL_RELEASE(nsServiceManagerImpl) +NS_IMETHODIMP_(nsrefcnt) nsServiceManagerImpl::Release(void) +{ +NS_ASSERTION(0,"RELEASE"); + NS_PRECONDITION(0 != mRefCnt, "dup release"); + NS_ASSERT_OWNINGTHREAD(nsServiceManagerImpl); + --mRefCnt; + NS_LOG_RELEASE(this, mRefCnt, "nsServiceManagerImpl"); + if (mRefCnt == 0) { + mRefCnt = 1; /* stabilize */ + NS_DELETEXPCOM(this); + return 0; + } + return mRefCnt; +} + +//#define NS_IMPL_ADDREF(nsServiceManagerImpl) +NS_IMETHODIMP_(nsrefcnt) nsServiceManagerImpl::AddRef(void) +{ +NS_ASSERTION(0,"ADDREF"); + NS_PRECONDITION(PRInt32(mRefCnt) >= 0, "illegal refcnt"); + NS_ASSERT_OWNINGTHREAD(nsServiceManagerImpl); + ++mRefCnt; + NS_LOG_ADDREF(this, mRefCnt, "nsServiceManagerImpl", sizeof(*this)); + return mRefCnt; +} NS_IMETHODIMP nsServiceManagerImpl::GetService(const nsCID& aClass, const nsIID& aIID,
*** Bug 79019 has been marked as a duplicate of this bug. ***
OK, guys. I'm about to make all of us feel really dumb. Here's the correct fix: void nsPluginTag::TryUnloadPlugin(PRBool aForceShutdown) { // XXX This is a hack to keep Java around, see bug 76936 PRBool isJava = PR_FALSE; nsCOMPtr<nsIJVMPlugin> jvm; if (mEntryPoint) jvm = do_QueryInterface(mEntryPoint); if (jvm) { isJava = PR_TRUE; // release before we shutdown below jvm = nsnull; } if (isJava && !aForceShutdown) return; if (mEntryPoint) { mEntryPoint->Shutdown(); mEntryPoint->Release(); mEntryPoint = nsnull; } // before we unload check if we are allowed to, see bug #61388 if (mLibrary && mCanUnloadLibrary) PR_UnloadLibrary(mLibrary); // we should zero it anyway, it is going to be unloaded by // CleanUnsedLibraries before we need to call the library // again so the calling code should not be fooled and reload // the library fresh mLibrary = nsnull; } The problem is that the nsCOMPtr destructor is trying to call Release() on a pointer that was in a library that was unloaded by PR_LoadLibrary(). I didn't notice this until I used step instruction through the entire function. I think that when looking at the stack traces everyone including myselft thought it was when we called _into_ the function since the nsCOMPtr::~nsCOMPtr call would always show up at the beginning of the function since that's where the destructor is located as far as the debugger is concerned. So much for stack traces. Looking for real reviews here and testing. I think this is right, though.
Do we really believe that only JVMs will want to use threads, or might be bothered by abrupt unloading? I'm really nervous about this approach. I'd prefer that we either never unload plugins, or that we _always_ unload plugins, and require all plugins that want to avoid that behaviour to use a new- style plugin which returns the right thing from nsIModule::CanUnload. (We currently don't ever unload components, even when they report that they can be unloaded, because lots of our components suck and lie to us.) What does 4.x do here? How does the Sun Java(TM) Plugin avoid crashing there?
Checking the string on the name of the plugin doesn't crash because it doesn't try to release after the module has been unloaded.
>Do we really believe that only JVMs will want to use threads, or might be >bothered by abrupt unloading? No. There are MANY issues related to this bug and bug 77319 that need to be worked out yet. I'd prefer seeing all loaded plugin modules remain loaded until xpcom-shutdown (see various posts throughout this bug and 77319); temporarily until a CanUnload solution is impl'ed/tested. >Checking the string on the name of the plugin doesn't crash because it doesn't >try to release after the module has been unloaded. yep. Also explains why the patch from rbs (which used mEntryPoint but not via nsComPtr) worked. Sorry for throwing everyone off track with my assumption that the QI was failing due to a NULL ptr. Now: Why didn't the patch cause a shutdown crash before it was checked into the 0.9 branch?
blizzard: other than what I'm assuming is a small typo in your final comment (CleanUnsedLibraries) it looks good to me. [s]r=scc (on blizzard's implentation provided in his comment at 2001-05-05 10:43), whichever you need.
I still crash on exit after visiting java.sun.com with a CVS build which had blizzard's latest patch in it. Info on my system: o Debian GNU/Linux 2.2 (Sid) o glibc 2.2.2 o gcc-2.95.3 o Sun's Java SDK, Standard Edition. Version 1.3.0_02 I crash with the latest pre-built 0.9 release candidate too.
blizzard's nsCOMPtr-friendly patch is equivalent to mine, except that he is explicitly destroying the plugin (i.e., mEntryPoint). Another way to write his patch is: PRBool isJava = PR_FALSE; if (mEntryPoint) { nsCOMPtr<nsIJVMPlugin> jvm; jvm = do_QueryInterface(mEntryPoint); if (jvm) { isJava = PR_TRUE; // release before we shutdown below jvm = nsnull; } } I was under the impression that the particularity with the Java plugin was that it needed to be kept around until shut-down? It would be great if the plugin could indeed be destroyed at once... In that case, one would also need to do: jvm = nsnull; + mEntryPoint = nsnull; //mEntryPoint isn't declared as a nsCOMPtr This way, the following "if (mEntryPoint)" will always fail and there is never going to be an attempt at double destruction if the code ever reaches there. And speaking of this, is setting "jvm = nsnull" the cleaner way to destroy the plugin. Will the dtor do the right thing here? It seems one has to call Shutdown() and then Release()? ==== As for the crash at exit, has anybody took a look at the service manager stuff that I described above?
rbs, I only changed it to use an nsCOMPtr. The release of mEntryPoint is only done if PR_TRUE is passed into TryUnloadPlugin which is only done at shutdown so the jvm still isn't being released until then. The QI that do_QueryInterface() does does an addref and I do the release which means that I don't even unload early at least as I understand it.
OK, got it. And with the fact that nsCOMPtr<> will auto-release when it goes out of scope, the version of blizzard's nsCOMPtr patch that I would prefer would be: PRBool isJava = PR_FALSE; if (mEntryPoint) { nsCOMPtr<nsIJVMPlugin> jvm jvm = do_QueryInterface(mEntryPoint); if (jvm) { isJava = PR_TRUE; } } Some will even prefer nsCOMPtr<nsIJVMPlugin> jvm(do_QueryInterface(mEntryPoint)) To summarize: the mistake/lesson from earlier attempts was to use the nsCOMPtr 'jvm' as a flag; i.e., hold an extra ref to an object that we are trying to destroy. Very bad or "really dumb" to quote blizzard. But seeing the large number of people cc:ed, let's give to that the benefit of the sheer heat of the last m0.9 bug :-) ==== [Repeating] As for the crash at exit, has anybody took a look at the service manager stuff that I described above?
rbs, could you attach a patch so andre can try it out on his system as he seems to be the only one stil seeing a problem. thanks
Actually, I am not sure what andre's problem is. To be more precise, what I am referring to is an assertion: "service manager held pass down". And this assertion arises when the crash in nsPluginTag::TryUnloadPlugin() in fixed (either with string compare, raw QI, nsCOMPtr QI). In other words, the assertion won't show up in release builds. The assertion I am referring to is also mentioned in the comments from "Joe Chou 2001-05-04 19:54" and "Peter Lubczynski 2001-05-04 20:08" who added that it seemed to be win32-specific. Because it is just an assertion in debug builds, there shouldn't be more issues for m0.9 as far as release is concerned, but it shows that there is a leak of the service manager. So if there is an easy fix that the Java people know about, then that will be fine to get it too. Otherwise, it is fine too :) that wouldn't be the first and last leak anyway :) The reason I suggested for the leak is in my comments at "rbs@maths.uq.edu.au 2001-05-05 00:24" and the patch that will help to trace it is the one that changes nsServiceManager.cpp which I added in my following post -- it just expands ADDREF/RELEASE in nsServiceManager.cpp and shows that there is a pending release that should have been fired by NPOJI600 (it may be too late to investigate for m0.9, but could serve as a starting point).
Attached below is a backtrace that I finally managed to get with a cvs snapshot of gdb. This backtrace was done right after mozilla crashed on exit after visiting java.sun.com.
Same bug with releasing, just keeps moving around. That's the trace I got in bug 77319. Looks like we ADDREF in GetInstance which I think may be causing the crash then try to release after Shutdown.
Andre, the patch hasn't been applied to the m0.9 branch. Now wonder that the crash is still there. Folks, could I checkin this one that I have in my tree? C:\Mozilla\src-m0.9\mozilla\modules\plugin\nglsrc>cvs diff cvs server: Diffing . Index: nsPluginHostImpl.cpp =================================================================== RCS file: /cvsroot/mozilla/modules/plugin/nglsrc/nsPluginHostImpl.cpp,v retrieving revision 1.233.4.6 diff -r1.233.4.6 nsPluginHostImpl.cpp 809c809,817 < nsCOMPtr<nsIJVMPlugin> isJava = do_QueryInterface(mEntryPoint); --- > PRBool isJava = PR_FALSE; > if (mEntryPoint) { > nsIJVMPlugin* jvmPlugin; > if (NS_SUCCEEDED(mEntryPoint->QueryInterface(NS_GET_IID(nsIJVMPlugin), > (void**)&jvmPlugin))) { > NS_RELEASE(jvmPlugin); > isJava = PR_TRUE; > } > }
Could someone _please_ tell me why we want this fix at all, rather than just turning off all plugin unloading until we can find a system (like nsIModule has) that won't require us to put special hacks in the code for every plugin that wants to use threads? Why isn't this as simple as just removing the PR_UnloadLibrary call?
I don't know. Drivers?
Actually, this could be the orriginal reason for unloading: http://bugzilla.mozilla.org/show_bug.cgi?id=58128 4.x style plugins need to be "shutdown" when the last instance goes away. Isn't OJI treated as a 4.x style plugin but it can't be unloaded. On the flip side, correct me if i'm wrong, XPCOM stay loaded once loaded. And then, to complicate matters, calling nav.plugins.refresh() needs to try to shutdown plugins (all,I think) that are running because their libraries may have been replaced by another updated version or removed.
The last peterl's comment is a real threat. We will definitely crash on javascript:navigator.plugins.refresh(1) after updating the plugin if the plugin dll is not unloaded. And this particular functionality has been wanted for a long time.
Sure, being able to unload and reload plugins is a great thing to have, but we _don't_ have it if we don't have it safely. Why not ask the plugin if it's safe to unload? nsIModule::canUnload gives you that, as something to follow. (We don't currently unload C++ modules, because many of our modules are crap and lie to us when we ask them if it's safe, but don't get me started on that. It's not a flaw in the architecture, just a flaw in our general coding practices.) When you put this in, did you really think that it would be safe to just unload 3rd-party software like that? That no plugins would have timers or other threads to worry about? (From reading 58218, it looks like we have to call NPP_Shutdown on 4.x plugins, but I can't find anything that says we have to unload them. 73071 is analogous, but for new-style plugins. And, for new- style plugins, peterl says in January that we _shouldn't_ call NPP_Shutdown for XPCOM plugins. Very confusing.) "Just like 4.x but..." bothers me a lot: we have a new plugin API, lots of which was "designed" to meet the needs of Sun's JVM plugin, and yet we have fragile stuff like this, special cased to avoid one specific piece of fragility. Why isn't the OJI plugin using the New Plugin API, and being kept in by returning false from its module's canUnload hook? This plan, and the stuff that led up to it, just reeks of bandaid, to me. (I'm more than willing to walk away from navigator.plugins.refresh(1), or have it only pick up new plugins that aren't already known, if we absolutely can't find a way to ask a plugin if it's ready to be unloaded. But it's not like this is rocket science, IMO.)
rbs, please file a new bug about the service manager leak wrt java. It's a seperate issue than what's being dealt with in this bug. shaver, open a new bug if you want to talk about the architecture. Yes, there are problems but there's so much noise here it's difficult to talk about it. I want to see it fixed the right way as much as the next guy but let's deal with it seperately. I think we're shipping as is for now. It crashes less anyway. edburns, you need to look at andre's stack trace since he's reporting there's still a problem and deal with it here. Or open a new bug since this one is so noisy and link the two with each other with a comment.
I'll file a bug later, but I have precisely zero confidence that we'll ever do the right thing here once the 0.9-topcrash heat is taken off. Another special- case hack for Sun's JVM in our plugin code, instead of making Sun respect the semantics of the API they chose to use, whee.
> ----- Additional Comments From Peter Lubczynski 2001-05-04 23:19 ------- [...] > optimized bits. Ed sent me the debug bits but they don't even seem to work, > probably because of the reason I talk about in bug 78150. Peter, do you have JDK1.3.1 rc1 and only JDK1.3.1 rc1 installed on your system? Those plugin bits only work with jdk1.3.1 rc1. Please de-install all other Javas on your system and go to <http://developer.java.sun.com/servlet/SessionServlet? url=http://developer.java.sun.com/developer/earlyAccess/j2sdk131/> and get 1.3.1 rc2. Unfortunately 1.3.1 rc1 is no longer offered.
Mike Shaver wrote: > Why isn't the OJI plugin using the New Plugin API, and being kept > in by returning false from its module's canUnload hook? This plan, and the > stuff that led up to it, just reeks of bandaid, to me. Doch, Sun's java plugin does use the new interface: // CNetscapePlugin.h by Stanley Man-Kit Ho // [...] #ifndef CNETSCAPEPLUGIN_H #define CNETSCAPEPLUGIN_H class CPluginView; class CNetscapePlugin : public nsIPluginInstance { // [...] }; // CPluginModule.h by Stanley Man-Kit Ho // [...] #ifndef __PLUGINMODULE_H_ #define __PLUGINMODULE_H_ #include "resource.h" // main symbols #include "oji_clsid.h" ///////////////////////////////////////////////////////////////////////////// // CPluginModule class ATL_NO_VTABLE CPluginModule : // [...] super secret inheritance deleted...shhhh public IPluginModule { public: CPluginModule() {} // [...] public: // IPluginModule STDMETHOD_(nsresult, NSGetFactory) (const nsCID &aClass, nsISupports *aSupport, nsIFactory **aFactory); STDMETHOD_(PRBool, NSCanUnload) (void); STDMETHOD_(nsresult, NSRegisterSelf) (const char* path); STDMETHOD_(nsresult, NSUnregisterSelf) (const char* path); }; // JavaPluginFactory.h by Stanley Man-Kit Ho // [...] class CJavaPluginFactory : public nsIJVMPlugin, public nsIPlugin { // [...] }; But it doesn't implement nsIModule.
If it's using the XPCOM plugin interface, then it must be an XPCOM component, right? Which means that something must implement nsIModule (or NSGetFactory, but I told you to get rid of that many milestones ago, and it's going to break hard in 0.9.1, so I'll ignore than possibility), right? And we don't unload any modules once we've created them, and we never _will_ until shutdown, if they return false from canUnload. So, from the fact that this bug is happening at all, I think it's pretty clear that either: - the Sun OJI plugin doesn't really use the new-style plugin interface, or - the new-style plugin interface is savagely broken in ways I didn't previously suspect -- and my suspicions on that front are quite well-developed -- and actually _does_ unload XPCOM plugins. The plot thickens.
Just to reassure Mike: XPCOM plugins do not actually get unloaded when the plugin host thinks it is unloading them (nsPluginTag::TryUnloadPlugin) because the component manager continues to hold a ref to the nsIModule and nsIPlugin.
Sean: that's what I expected, but thanks for the reassurance. So, if we're seeing this crash, and Sun's plugin is using the XPCOM plugin interfaces, what the heck is going on? I'll be quite surprised, and even more frightened, if it's an XPCOM bug.
Reference has been made to special code that preloads Java. Maybe need to look at that (if it exists)?
Wow, if there's still a reference in the component manager then we probably shouldn't be calling PR_UnloadLibrary() on any XPCOM-based plugin modules. Aeii!
The plugin host also does a PR_LoadLibrary. So as long as the refcnt code on libraries is working, then the plugin host's call to PR_UnloadLibrary *should* be ok, right?
Please see: http://bugzilla.mozilla.org/show_bug.cgi?id=79196 about the long term unloading issue. This bug is just too noisy.
I did some more tracing on Peter's checked in patch: ... nsCOMPtr<nsIJVMPlugin> isJava; if (mEntryPoint) { isJava = do_QueryInterface(mEntryPoint); } //if (isJava && !aForceShutdown) return; if (isJava) return; ... and found out that the flag, aForceShutdown, played a decisive role: a)In the case of "about:plugins", isJava was non-null, but was 1, therefore skipped : if (isJava && !aForceShutdown) return; and went to unload and crashed. b) In the case of leaving an applet page (and Back/Forward with other pages), isJava was non-null, and aForceShutdown was 0, therefore returned at: if (isJava && !aForceShutdown) return; and no crash. Therefore, I took out the checking for aForceShutdown, and it worked in both cases, plus the case of "launch and quite". No crashes. Did that mean anything?
If someone wants to try the Linux debug build of Mozilla with the debug build of Java plugin, there two things you may need to know: 1) Get a recent trunk build. My build was checked out on last Friday, at it was the first time that a debug mozilla ever worked with a debug Java plugin for me. The same Java plugin did not work with the earlier builds of mozilla. 2) Expect longer startup time of mozilla with a debugger. In my case (using gdb), it took longer than half an hour (no kidding!) to just bring up the browser. It used around 99% cpu and most of the memory most of the startup time. I have a pretty fast pentium with 256 mb of memory. After the browser came up, speed went back to normal, but most of the memory was still occupied.
Sun's plugin implements these interfaces to be an XPCOM component: STDMETHOD_(nsresult, NSGetFactory) (const nsCID &aClass, nsISupports *aSupport, nsIFactory **aFactory); STDMETHOD_(PRBool, NSCanUnload) (void); STDMETHOD_(nsresult, NSRegisterSelf) (const char* path); STDMETHOD_(nsresult, NSUnregisterSelf) (const char* path); It doesn't implement nsIModule. I'll ask Stanley to explain why they don't implement nsIModule.
Joe, this is what I was worred about with aForceShutdown. I didn't realize that ReloadPlugin takes almost the same code path as actually shutting down so aForceShutdown being true sill force the Java XPCOM plugin to shutdown and unload pre-maturaly. You can't remove it all together because we do need to shutdown for 4.x but not for XPCOM. Hm...I have another idea, do the reverse.... try doing a modification of the QI for Java. Instead of bailing when we have a nsIJVMPlugin, do the oppposite: do a QI for a ns4xPlugin and if that fails, bail, because we are an XPCOM plugin and service manager should shut us down. What about that approach to solve this? Skips all this unloading for XPCOM plugins but keeps parity with 4.x.
I also tried to just take out the "!": ... if (isJava && aForceShutdown) return; ... and it also seemed to work OK, passed all three test cases.
Joe, try something like: What if we did something similar to: PRBool isXPCOM = PR_FALSE; if (mEntryPoint) { nsCOMPtr<nsI4xPlugin> the4xPlugin the4xPlugin = do_QueryInterface(mEntryPoint); if (!the4xPlugin) { isXPCOM = PR_TRUE; } } if (isXPCOM) return; // bail if XPCOM My need to call Release() first, I forget what the refcount is of an XPCOM plugin at this point in execution. Re-assigning to blizzard@mozilla.org to get it off my plate. Manager says I need to concentrate to fix Windows bugs for 0.9.1 embedding and this is bug is Linux (although VERY important). Please re-assign if you feel otherwise. BTW, what is the current status of patches check-in to the trunk/branch on this bug?
Assignee: peterlubczynski → blizzard
Status: ASSIGNED → NEW
Peter: I tried the last patch (using 4x plugin), and it crashed when leaving an applet. Taking out only "!" from your checkined fix seemed working for all three cases (leaving an applet page, launch and exit, go to about:plugins), as the following: old: if (isJava && !aForceShutdown) return; new: if (isJava && aForceShutdown) return;
Joe, my bad. There is no interface nsI4xPlugin, but rather ns4xPlugin implements nsIPlugin which is probably why that patch failed. In fact, what did you try since nsI4xPlugin won't compile? It makes sense that removing the bang from aForceShutdown fixes the problem as the code below is NEVER executed for the Java plugin. It prevents the crash...but only for Java. Other XPCOM plugins (like Real) will surely crash here. Hm....I'm going on a hunch here that bailing for any XPCOM plugin is the right thing to do. Can anyone back me up? Joe, can you try modifying my last patch. Instead of the doing the QI, check to see if mEntryPoints can be cast to an ns4xPlugin. If you are able to get a ns4xPlugin out of mEntryPoints then we have a 4.x plugin for sure and that rest of the code needs to be called. I'll do some more testing when I get home.
Assuming you mean dynamic_cast<>, you can't use that in Mozilla: not all of our compilers support it, and, in fact, we turn off RTTI on Linux explicitly. (I don't think static_cast<> or reinterpret_cast<> will tell you anything useful.) Maybe we do need a (dummy) nsI4xPlugin interface, perhaps with an Unload/Reload method where all of this logic could live?
>I'm going on a hunch here that bailing for any XPCOM plugin is the right >thing to do. Can anyone back me up? Bailing during the ReloadPlugins sequence? yes. Bailing at xpcom-shutdown for properly implemented xpcom plugins? No way. If you do that, then just get rid of nsIPlugin::Initialize and nsIPlugin::Shutdown altogether. They weren't even being called until a couple days before 0.8.1 was branched (bug 45009).
Wow. Just read back over bug 45009 and this bug now makes a whole lot more sense. Isn't that the bug that introduced holding the service manager past XPCOM shutdown as Andrei says that nsPluginHostImpl's dtor isn't called and nsIObserver was added. Ed, looks like you started with the patches, what's your take? cc:ing Waterson on the hook for the sr= Question for the experts: When is the correct time to call nsIPlugin->Shutdown()? Shouldn't refcouting handle when shutdown gets called in XPCOM land? Should we call nsIPlugin->Shutdown before the RELEASE? For 4.x plugins it is a different case as nsIPlugin->Shutdown can be called not only at shutdown, but during ReloadPlugins and when the last instance gets Destroyed. Sean says that "PR_FindSymbol(pluginTag->mLibrary, "NSGetFactory"); is not satisfactory for determining if a plugin module is an xpcom plugin or a 4x plugin". Why? Will that not easily distinguish XPCOM plugins from 4.x at the library level? I will try creating another band-aid patch to test. I'm almost all out of band-aids, though. :-)
>Sean says that "PR_FindSymbol(pluginTag->mLibrary, "NSGetFactory"); >is not satisfactory for determining if a plugin module is an xpcom plugin or a >4x plugin". Why? Will that not easily distinguish XPCOM plugins from 4.x at the >library level? NSGetFactory was deprecated a long time ago and is going away in 0.9.1. XPCOM components don't use NSGetFactory, they use NSGetModule. >I will try creating another band-aid patch to test. I'm almost all out of >band-aids, though. :-) 0.9 was released today so I don't think we need another bandaid on account of it.
Peter: re bug 45009, I recommend talking to Andrei about it, since the fix that was checked in differed substantially from the patch I started with. My interest in bug 45009 was to un-block bug 49510.
SPAM: mozilla 0.9 (and M1, and 0.8.1, etc.) has left the building. please update the target milestone so we can get a good idea of what's left for 0.9.1.
Patches look to be checked in and I think this is working. Marking FIXED. Please re-open if you think otherwise. Note: bug 76936 will now deal with the crash in @nsPluginInstanceOwner::GetInstance
Status: NEW → RESOLVED
Closed: 24 years ago24 years ago
Resolution: --- → FIXED
Oops. I mean bug 80105.
works fine on linux/win 20010517 trunk builds. Junruh, pls verify on mac. Thx!
OS: All → Mac System 9.x
QA Contact: shrir → junruh
Hardware: PC → Macintosh
Verified on Mac netscape 6 2001051708
Status: RESOLVED → VERIFIED
Please reopen : this is a regression about:plugin crashes Mozilla Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.3; MultiZilla v1.4.0.1) Gecko/20030317
Yves Lambert, <about:plugins> is not "an applet page" and not even a page with plugins, it just lists them, it doesn't use plugins. Please file a new bug (even if it had the same symptoms, but it doesn't).
Product: Core → Core Graveyard
Crash Signature: [@ nsPluginInstanceOwner::~nsPluginInstanceOwner]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: