Closed Bug 76936 Opened 23 years ago Closed 23 years ago

Leaving an applet page crashed browser - Trunk & M09 [@ nsPluginInstanceOwner::~nsPluginInstanceOwner]

Categories

(Core Graveyard :: Java: OJI, defect)

PowerPC
Mac System 9.x
defect
Not set
critical

Tracking

(Not tracked)

VERIFIED FIXED
mozilla0.9

People

(Reporter: joe.chou, Assigned: blizzard)

References

()

Details

(Keywords: crash, topcrash, Whiteboard: [critical to mozilla 0.9] [seeking reviews])

Crash Data

Attachments

(6 files)

Running a recent trunk build (at least after April 13th) on Linux and Solaris,
after loading a page with applets (i.e., www.java.sun.com), then either hit the
Back button or modify thr URL to go to another page, the browser will crash.

*** This bug has been marked as a duplicate of 75070 ***
Status: UNCONFIRMED → RESOLVED
Closed: 23 years ago
Resolution: --- → DUPLICATE
------- Additional Comments From edburns@acm.org 2001-04-12 15:21 -------

I'm finding that it's segfaulting when NS_RELEASING the nsIPluginInstance that
is the java plugin.  This is here:

nsObjectFrame.cpp:1492

  if (nsnull != mInstance)
  {
    if (mPluginHost)
      mPluginHost->StopPluginInstance(mInstance);
    NS_RELEASE(mInstance);
  }

This translates to the release method in 

/net/jano/export/disk02/deployment/ws/ladybird/ext/plugin/oji-plugin/src/motif/navig5/JavaPluginInstance5.cpp

I believe.  I'm re-assinging this to Jim so his team can look at it.

Ed





------- Additional Comments From edburns@acm.org 2001-04-12 15:44 -------

*** Bug 75515 has been marked as a duplicate of this bug. ***



------- Additional Comments From edburns@acm.org 2001-04-12 15:47 -------

It turns out that in the case of a page with more than one applet, only the
NS_RELEASE(mInstance) call on the last applet will crash.



------- Additional Comments From Robert Kaiser 2001-04-13 07:28 -------

This seems to be happening for everyone on *ix Systems when exiting a page with
a Java Plugin, interpreting what I read in bug 75515.
Should we nominate this for 0.9?



------- Additional Comments From edburns@acm.org 2001-04-13 14:41 -------

change subject to be more accurate.



------- Additional Comments From edburns@acm.org 2001-04-13 14:42 -------

*** Bug 75862 has been marked as a duplicate of this bug. ***



------- Additional Comments From edburns@acm.org 2001-04-13 14:45 -------

Jim, can you please have someone from your team investigate this bug?

Thanks,

Ed



------- Additional Comments From edburns@acm.org 2001-04-13 14:51 -------

*** Bug 73541 has been marked as a duplicate of this bug. ***



------- Additional Comments From edburns@acm.org 2001-04-16 22:51 -------

*** Bug 75565 has been marked as a duplicate of this bug. ***



------- Additional Comments From edburns@acm.org 2001-04-16 22:52 -------

There are at least four, and probably more bugs that are marked as dup of this 
bug.  This bug is real important.

Ed

------- Additional Comments From jpatel@netscape.com 2001-04-17 14:21 -------

Adding crash, topcrash keywords, Trunk and [@ libgklayout.so] to summary.  This 
crash has been on the Talkback topcrash list under that stack signature for a 
few days now.

The libgklayout.so crash is very easily reproducible, just go to this url:
http://www.elendor.net

Here are a few Talkback entries for that crash:

libgklayout.so + 0x5991d (0x4048891d) ecd5fd41
         line 
        Build: 2001041106 CrashDate: 2001-04-12 UptimeMinutes: 87  Total: 425 
        OS: Linux 2.4.1
         Detailed : http://climate/reports/incidenttemplate.cfm?bbid=29004983
         StackTrace: 
http://climate/reports/stackcommentemail.cfm?dynamicBBID=29004983
     (29004983) URL: www.elendor.net
     (29004983) Comments: switched from www.elendor.net to cnn.com... This has 
been happening for several days and involves going to any java applet holding 
page and then leaving it. The applets run fine

libgklayout.so + 0x599cd (0x4103c9cd) 95475b04
         line 
        Build: 2001041208 CrashDate: 2001-04-12 UptimeMinutes: 1  Total: 48 
        OS: Linux 2.2.14-5.0smp
         Detailed : http://climate/reports/incidenttemplate.cfm?bbid=29021948
         StackTrace: 
http://climate/reports/stackcommentemail.cfm?dynamicBBID=29021948
     (29021948) URL: clicked roofing materials link
     (29021948) Comments:  http://www.reroofingshowroom.com/




------- Additional Comments From edburns@acm.org 2001-04-18 13:16 -------

This is happening on Solaris with today's trunk.





------- Additional Comments From Joe Chou 2001-04-18 18:32 -------

This problem happened on trunk build on Unix, but not on OEM build. Based on the
crash stack, the problem of TalkBack reports, going to "http://www.elendor.net"
then hit Back button and crashed, has the identical stack (see below). Also the
crash happens as long as you are leaving an applet page (to a non-applet page or
another applet page or any page), and you don't have to click Back button, you
may change the URL and return to crash it. I have not tried on Waterfall yet,
since my build still has compiling problems, and I tried to launch George's
build without success.

Here is the top of the crash stack:
0xfb9a7eb8 in nsPluginInstanceOwner::~nsPluginInstanceOwner (this=0x753668, 
    __in_chrg=3) at nsObjectFrame.cpp:1506
1506        NS_RELEASE(mInstance);
(gdb) where
#0  0xfb9a7eb8 in nsPluginInstanceOwner::~nsPluginInstanceOwner (
    this=0x753668, __in_chrg=3) at nsObjectFrame.cpp:1506
#1  0xfb9a8474 in nsPluginInstanceOwner::Release (this=0x753668)
    at nsObjectFrame.cpp:1583
#2  0xfb9a2df4 in nsObjectFrame::~nsObjectFrame (this=0x7ca4dc, __in_chrg=3)
    at nsObjectFrame.cpp:267
#3  0xfb960a98 in nsFrame::Destroy (this=0x7ca4dc, aPresContext=0x725cc0)
    at nsFrame.cpp:427
#4  0xfb9594f4 in nsContainerFrame::Destroy (this=0x7ca4dc, 
    aPresContext=0x725cc0) at nsContainerFrame.cpp:98
#5  0xfb9a3b54 in nsObjectFrame::Destroy (this=0x7ca4dc, aPresContext=0x725cc0)
    at nsObjectFrame.cpp:471
#6  0xfb998368 in nsLineBox::DeleteLineList (aPresContext=0x725cc0, 
    aLine=0x7cab80) at nsLineBox.cpp:251
#7  0xfb942738 in nsBlockFrame::Destroy (this=0x7ca378, aPresContext=0x725cc0)
    at nsBlockFrame.cpp:1240
#8  0xfbb3dd6c in nsFrameList::DestroyFrames (this=0x7ca350, 
    aPresContext=0x725cc0) at nsFrameList.cpp:41
 #9  0xfb9594e4 in nsContainerFrame::Destroy (this=0x7ca31c, 
    aPresContext=0x725cc0) at nsContainerFrame.cpp:95
#10 0xfbb3dd6c in nsFrameList::DestroyFrames (this=0x80afa0, 
    aPresContext=0x725cc0) at nsFrameList.cpp:41



------- Additional Comments From Brendan Eich 2001-04-19 11:05 -------

What's the diagnosis?  Memory corruption?  Ref-count underflow (one too many
RELEASES on what mInstance pointed to)?

/be



------- Additional Comments From Joe Chou 2001-04-19 11:18 -------

Based on the preliminary trace, in both a crashing case and a non-crash case,
the destructor was invoked twice, and in both times the pointer of minstance
seemed valid (not null nor strange value), but it crashed in the second time of
calling the release in the crashing case. It seems a memory corruption, but need
further investigation. 

Status: RESOLVED → NEW
Ever confirmed: true
Status: NEW → ASSIGNED
Resolution: DUPLICATE → ---
This bug is created since the original problem of 75070 is a different problem,
which is a duplicate of 74531. This bug is to address a more serious problem
that the browser crashes when leaving any applet page.
Whiteboard: critical to mozilla 0.9
adding keywords from bug 75070, adding mozilla0.9 to reflect status whiteboard
comment.
summary of 75070 was "Trunk crash [@ libgklayout.so](SEGV) upon leaving a
java-enabled page (was: Segmentation fault (libgklayout.so), debugfriendly build?)"
*** Bug 77163 has been marked as a duplicate of this bug. ***
Joe, I think the reason that ~nsPluginInstanceOwner is called twice is: there 
are two applets in the page.  I think the crash happens when the refcount goes 
to zero.
*** Bug 77220 has been marked as a duplicate of this bug. ***
Adding Trunk [@ nsPluginInstanceOwner::~nsPluginInstanceOwner] to summary for 
tracking.  Here is some info about the crash from today's Talkback topcrash 
report:

nsPluginInstanceOwner::~nsPluginInstanceOwner   24 
     First BBID :29071286
     Last BBID  :29481330
     Min Runtime :43
     Max Runtime :25086
     First Appearance Date : 2001-04-13
     Last Appearance Date : 2001-04-22
     First BuildID : 2001041306
     Last BuildID : 2001042207

Some user commments and URLs:
     (29481330) URL: www.rgbgallery.com
     (29455441) URL: www.space.com
     (29455441) Comments: Pulling up space.com homepage after failing to get 
NASA TV streaming via the Real plugin at http://www.unitedspacealliance.com
     (29455158) URL: www.answermed.com
     (29291627) URL: 
http://www-4.ibm.com/software/events/keynotes/lvg/lvg_100k2.html
     (29291627) Comments: leaving the site after playing by clicking on a link
     (29290566) URL: 
http://www-4.ibm.com/software/events/keynotes/lvg/lvg_100k1.html
     (29290566) Comments: this time it was already stopped
     (29290519) URL: 
http://www-4.ibm.com/software/events/keynotes/lvg/lvg_100k1.html
     (29290519) Comments: While the video is running
     (29290159) Comments: starting a realplayer in a window
     (29289963) Comments: sending a mail when switching to another mozilla 
window with a realplayer in it (stopped)
     (29283816) Comments: launcing a video from yahoo crashes.
     (29085103) URL: news/foo.com (crash on startup)
     (29071286) URL: www.vw.com
     (29071286) Comments: I was closing the real player plugin powered RadioVW 
when it crashed the browser

Summary: Leaving an applet page crashed browser → Leaving an applet page crashed browser - Trunk [@ nsPluginInstanceOwner::~nsPluginInstanceOwner]
we gotta get this fixed for 0.9...  anyone know what's going on yet, 
or what the patch might be?
Target Milestone: --- → mozilla0.9
cc'ing Peter Lubczynski for possible plugin involvement.
Eww...thanks for the cc:...It could be one of my checkins to nsObjectFrame.cpp 
but since it's been happening since the 12th and Ed Burns says:

>>------- Additional Comments From edburns@acm.org 2001-04-12 15:47 -------
>>
>>It turns out that in the case of a page with more than one applet, only the
>>NS_RELEASE(mInstance) call on the last applet will crash.

It could also possibly be some other checkins to plugins which shutdown plugins 
when the last instance is destroyed or because of recent fixed to when Java 
starts (not till needed). cc:ing Andrei and Sean
Gussing for a fix would be to make sure that 
mPluginHost->StopPluginInstance(mInstance) isn't called for the last instance of 
Java???

Taking a look.....does someone have java and a fresh build on their Linux 
machine? Does this happen on Windows or Mac too?
This crashing problem does not happen on Windows.
nsPluginHostImpl.cpp changed on Apr 11 for bug 74485.  Part of the patch 
affected nsPluginHostImpl::StopPluginInstance() which is called from 
nsObjectFrame.cpp right before the NS_RELEASE of the plugin instance.

If the crash is actually occurring in the call to nsIPluginInstance->Release 
then it seems like there's a ref cnt problem and the mInstance value, which was 
once valid, is no longer valid.  Joe: can you confirm this?

What does the java plugin instance return for nsIPluginInstance->GetValue
(nsPluginInstanceVariable_DoCacheBool, ...)?
I'll give it a shot (removing the patch of 74485) on Solaris today, ans see if
the problem would go away.
Happens to me on a fresh pull on Linux today
Sean, I put some printf's in on someone machine and I confrim that the Java 
plugin is NOT being cached. I bet expanding the check to Java would solve this 
crash. I'll see if I can create a patch but what's the status of mozilla0.9?
Sean, I see why you are asking that. If it returns "don't cache", and I beleive 
it does, then StopPluginInstance will remove it from the active instance list, 
and if this is a last instance of the plugin we also fire TryUnloadPlugin which 
calls nsIPlugin::Shutdown and releases nsIPlugin. And then we come back to the 
nsPluginInstanceOwner::~nsPluginInstanceOwner and do this NS_RELEASE(mInstance). 
Could this be a problem? I would expect no instances existing by this point.
Yes Andrei, that's right. I'm trying to see how to fix this from the plugins 
side but I think it looks messy. I don't recall an easy way to check if a plugin 
instance is java without checking it's mime-type. Perhaps either the plugin can 
be modified to set the cache variable or perhaps when we create the Java 
instance, we set the variable.
Some of the talkbacks mention RealPlayer - so I don't think a Java specific 
solution is right - unless those urls that mention Real are also using Java.

Talking out loud: Seems like there's a refcnt problem with plugins on unix.  
But also seems like 
  NS_RELEASE(mInstance) 
shouldn't happen after calling 
  host->StopPluginInstance(mInstance); 
since if a plugin isn't cached, we try to unload the module (in which case the 
refcnt problem is on the non-unix platforms?).
OK, I think we may finally have some scant cycles to help with this bug.
What would be most helpful from the Java Plug-in team at this point?
I do not have Mozilla code expertise on my team, so we could mostly help
from the plugin side of things.  Suggestions?  Requests?

- Jim Melvin
On Linux and Solaris, are PR_UnloadLibrary() (like that in 
nsPluginTag::TryUnloadPlugin) calls protected by OS refcnts like they are on 
Windows?  If they aren't, then that could be a problem that wouldn't be 
manifested on Windows.

I think that the NS_RELEASE(mInstance) calls in both nsObjectFrame.cpp and 
nsPluginViewer.cpp should be moved up one line before the StopPluginInstance 
calls:
  if (mInstance)
    mInstance->Release;
  if (mPluginHost) // the plugin host still has a reference to the instance
    mPluginHost->StopPluginInstance(mInstance);
  mInstance = nsnull;
Peter, I don't understand how Java is different from other plugins in this 
respect. And why this is specific to Unix. Looking at the code I cannot see any 
mismatched addref/release's. In fact, release in the nsPluginInstanceOwner 
destructor makes sense since we addref it in nsPluginInstanceOwner::SetInstance.
Sean, I think PR_UnloadLibrary should be refcounted by NSPR. But I think your 
suggestion might work. Can somebody try it, please?
Andrei, yep NSPR handles the PR_UnloadLibrary nicely.
I've tried removing the patch of 74485, and it did not help, still crashed at
the same spot.

I am going to tried Sean's suggestion next.
Tried Sean's suggestion, releasing mInstance before calling
StopPluginInstance(), still crash with the following stack (see below after the
code change):

in nsObjectFrame.cpp:
...
nsPluginInstanceOwner::~nsPluginInstanceOwner()
{
  PRInt32 cnt;

  // shut off the timer.
  if (mPluginTimer != nsnull) {
    mPluginTimer->Cancel();
  }

  if (mInstance)
      mInstance->Release();
  if (mPluginHost)
      mPluginHost->StopPluginInstance(mInstance);
  mInstance = nsnull;
...

and in nsPluginViewer.cpp:
...
  if (nsnull != mInstance)
  {
    PRBool doCache = PR_TRUE;

    // determine if the plugin wants to be cached
    mInstance->GetValue(nsPluginInstanceVariable_DoCacheBool,
                        (void *) &doCache);
    mInstance->Stop();
    if (!doCache) {
      // if not, destroy the instance
      mInstance->Destroy();
    }
    mInstance->Release();
    if (doCache) {
      nsCOMPtr<nsIPluginHost> host;
      host = do_GetService(kCPluginManagerCID);
      if(host)
        host->StopPluginInstance(mInstance);
    }
    mInstance = nsnull;
  }
...

Program received signal SIGSEGV, Segmentation fault.
[Switching to LWP 4]
0xfc1af9f0 in ?? ()
(gdb) where
#0  0xfc1af9f0 in ?? ()
#1  0xfd953c0c in g_io_unix_dispatch (source_data=0x756520, 
    current_time=0xffbee670, user_data=0x635020) at giounix.c:135
#2  0xfd9558e8 in g_main_dispatch (current_time=0xffbee670) at gmain.c:656
#3  0xfd956170 in g_main_iterate (block=-40390908, dispatch=1) at gmain.c:874
#4  0xfd956384 in g_main_run (loop=0x19e730) at gmain.c:932
#5  0xfda3f9ac in gtk_main () at gtkmain.c:476
#6  0xfdbbfb84 in nsAppShell::Run (this=0x7d308) at nsAppShell.cpp:360
#7  0xfdcc66a4 in nsAppShellService::Run (this=0xab800)
    at nsAppShellService.cpp:407
#8  0x230e8 in main1 (argc=1, argv=0xffbeeafc, nativeApp=0x0)
    at nsAppRunner.cpp:1005
#9  0x24440 in main (argc=1, argv=0xffbeeafc) at nsAppRunner.cpp:1300
So the crash moved?  Any idea what was at 0xfc1af9f0 - part of the Java plugin? 
an nsIPluginInstance, nsIPlugin, nsIModule?  If you step though the 
StopPluginInstance call, does the pluginHost call PR_UnloadLibrary on the Java 
plugin?  Try a breakpoint in nsPluginTag::TryUnloadPlugin 
(nsPluginHostImpl.cpp).
What about using the NS_IF_RELEASE macro?
If the problem was that mInstance was no longer valid, then the NS_IF_RELEASE
macro offers no protection since it just checks to see that the argument is
non-null, not that it's pointing to valid memory.  It could be non-null and
invalid if the plugin library was unloaded or the plugin decided to delete the
plugin instance even though the instance had outstanding references.
The call to mInstance->Release() before calling
host->StopPluginInstance(mInstance) in that last code fragment is bogus. As far 
as your code is concerned, mInstance should be considered dead after calling 
Release(), as you may have decremented the refCount to 0 at that point.
Yeah it's bogus.  This means that the plugin host can't unload plugin libraries
since it doesn't have any means of determining whether there are outstanding
references to plugin instances.  An alternative is to make the
pluginInstanceOwner's mInstance a weak reference that doesn't get addref'd or
released (the plugin host already has a strong reference to the instance).
I am just wondering why this showed up only a week ago? The biggest suspect of 
course would be my fix to 74484 which happened to be not the case. What else?
bug 77319 looks similar - WinNT / default plugin.
Blocks: 77319
To bring us back on track.....Sean I think you were most correct. I set a 
breakpoint (in Windows) at nsPluginTag::TryUnloadPlugin and sure enough it gets 
called along with PR_UnloadLibrary.

I think (please correct me if I'm wrong) but the Java plugin must not be 
unloaded once it is loaded. This is probably causing the crash as I looked at 
the refcount it looks correct (at least on Windows).

Since I can't reproduce this, let me try creating an experimental patch for 
someone to try for TryUnloadPlugin to NOT return if the tag is Java. Comments 
Andrei/Sean?

Oh, and I couldn't reproduce the stack in bug 77319 on Windows. I got another 
stack for which I attached a patch.
Sounds good.  You mentioned that there was a code change recently regarding 
when java gets started up.  Where does that code live?
What do you mean by 'TryUnloadPlugin to NOT return if the tag is Java'? To not 
unload the library? When is it going to be unloaded then? We already have a 
mechanism to postpone unloading libraries, but this one look a special case, it 
needs to not be unloaded till the application exits, right?
Ok, my dirty patch should treat Java like a cached plugin in 
StopPluginInstance() so it shouldn't try to unload the library untill we shut 
down.

Now testing on Linux....(please try as well)

Andrei, TryUnloadPlugin() was the wrong place to check. I think this patch may 
prevent or delay the crash.
Whoops - guess I wasn't following correctly.  I think that the java plugin 
instances should not be cached but rather the plugin module should not be 
unloaded when the instance count reaches zero.

As I recall, the nsPluginInstanceVariable_DoCacheBool variable was created 
specifically to address problems with caching of applets at Sun's request.

Applying the patch will tell us if we are barking up the right tree though.
I tried Peter's patch, and seemed got the same result as Sean's:

Program received signal SIGSEGV, Segmentation fault.
0xfcaaf9f0 in ?? ()
(gdb) where
#0  0xfcaaf9f0 in ?? ()
#1  0xfe933c0c in g_io_unix_dispatch (source_data=0x72f0a8, 
    current_time=0xffbee670, user_data=0x631098) at giounix.c:135
#2  0xfe9358e8 in g_main_dispatch (current_time=0xffbee670) at gmain.c:656
#3  0xfe936170 in g_main_iterate (block=-23744764, dispatch=1) at gmain.c:874
#4  0xfe936384 in g_main_run (loop=0xc2fa0) at gmain.c:932
#5  0xfdebf9ac in gtk_main () at gtkmain.c:476
#6  0xfd6bfb84 in nsAppShell::Run (this=0x7d308) at nsAppShell.cpp:360
#7  0xfd7c66a4 in nsAppShellService::Run (this=0xaba08)
    at nsAppShellService.cpp:407
#8  0x230e8 in main1 (argc=1, argv=0xffbeeafc, nativeApp=0x0)
    at nsAppRunner.cpp:1005
#9  0x24440 in main (argc=1, argv=0xffbeeafc) at nsAppRunner.cpp:1300
Joe: Any idea what was at 0xfcaaf9f0 - part of the Java plugin, the 
objectFrame, an applet?  Similar offset as the last crash you reported.  Can 
you confirm that the java plugin is still loaded at the time of the crash?
*** Bug 77957 has been marked as a duplicate of this bug. ***
*** Bug 77875 has been marked as a duplicate of this bug. ***
Ahh...good news. After spending hours on the phone with Joe Chou from Sun, it 
turns out that my patch does solve the problem of caching the Java plugin but 
the string I was comparing to was incorrect (Should have been "Java" instead of 
"Java Plug-in"). Making that change to the patch prevents the crash.

However....I don't think this is the way we should solve this problem. The 
"correct" way would be to have the Java plugin return PR_TRUE for 
nsPluginInstanceVariable_DoCacheBool for GetValue. Stanley, that's where you 
come in as I understand you make changes to the Java Plugin but only in regular 
intervals. 
Comments?

Should we check-in my patch temporary till the Java plugin can fixed AND 
released....or create a better way of detecting Java?
Talk to Ed Burns and Stanley Ho.  They added the 
nsPluginInstanceVariable_DoCacheBool on Sept 14 because of problems starting up 
cached applets - Stanley <REALLY> didn't want java instances cached, but maybe 
that has changed.
They didn't want it so bad that they wanted to remove plugin caching 
completely.  That nsPluginInstanceVariable_DoCacheBool variable was the 
compromise that left caching intact.
Ahh...I see....Well, I don't know how to fix this crash without caching the 
Java plugin. What bug # was that? Can someone try with my [modified] patch?

What to do?
One could also use QueryInterface() to determine if you're dealing with the Java 
plugin. Java plugins must implement the nsIJVMPlugin interface.
That nsPluginInstanceVariable_DoCacheBool bug is:
http://bugzilla.mozilla.org/show_bug.cgi?id=50547

Does anyone know what is actually being accessed that has already been 
deleted?  Is it a plugin, a plugin instance or the plugin module?

Is this a refcnt bug?
I tested Peter's patch and it does prevent the crash, however after leaving the
page with java, I can no longer get any Java to run on any subsequent page. So,
it is good that the crash is gone, it is bad that Java stops working. BTW:
beard's idea of the QI instead of the string search is great: much more reliable
and less vulnerable to change.
Beard good idea! 

From talking with Marc, looks like even though the crash is fixed, there are 
still huge problems. I'll attempt to create another patch, this time to GetValue 
when doCacheBool is the case. If the plugin returns FALSE, I'll do the QI to 
check for Java and if that works, then return TRUE reguardless of what the Java 
plugin wants until such time it's fixed on the other side. 

Ed, you worked on the other bug, what are your feelings about taking this 
approach?
I have a hunch that this problem would be solved by re-compiling the java 
plugin on unix with current mozilla headers.
I've tried Peter's second patch, and the result is the same as what Marc saw:
no crash, but Java stopped working after that.
By the way, I also tried mozilla08.1 binary. Hitting the back button after
loading an applet page did not crash the browser, but then hitting the forward
button (to go back to the applet page) did.
I also tried mozilla08, and the problem was not there. I could toggled Back and
Forward buttons without errors. 
Therefore, the problem seems to be introduced between mozilla08 and 08.1.
So, Joe, to be clear, what you are saying is that my second patch fixes THIS bug 
but there is still an underlying bug about going Back and Forward that was 
introduced between Mozilla 0.8 and Mozilla 0.8.1.

Is there a bug already on THAT issue? If not, I suggest getting reviews and 
approval to check this patch in and open another bug on that problem. We 
probably shouldn't ship 0.9 with that not working either, but that's not my 
decision. My guess is that there is a problem in the caching logic but I'm not 
sure.
Joe, then I'd suspect that the root of the problem is the fix for bug 45009 
which made it into 0.8.1.  Is the java plugin doing anything in response to 
nsIPlugin::Shutdown()?  

Feels like I'm beating a deadhorse but we really shouldn't cache applet 
instances.  I think we just need to prevent calling TryUnloadPlugin() on the 
java library once the instance count reaches zero.  Has that at least been 
attempted?
Peter: Actually your second patch (same as the first patch) does stops the
crashing, but the browser also stops working with Java (can't load any Java
applets afterwords).

Sean: can you elaborate on your suggestion a little more please (prevent calling
TryUnloadPlugin() on the java library once the instance count reaches zero)? If
you put up a patch, I'll give it a try.
Joe: for a quick test, just comment out the implementation of 
nsPluginTag::TryUnloadPlugin in nsPluginHostImpl.cpp and try a simple test that 
loads an applet, go to a page without any applets, go back and then exit.
Sean: it worked! I put in the change, and I could toggled between Back and
Forward button without problems now. No crash, and Java still worked. So your
assumsion is correct. What is next? (I got to go now, but I'll check it online
from home later). 
TryUnloadPlugin does essentially two things: shutdown/release and unload the lib  
itself. Which one should be skipped to fix the crash? If this unloading lib from 
memory than we can try to employ the mechanism to postpone this which mechanism 
already exists. To quickly test this you can do the following in 
nsActivePluginList::remove implementation:

  // xpconnected plugins from the old world should postpone unloading library 
  // to avoid crash check, if so add library to the list of unloaded libraries
- if(pluginTag->mXPConnected && (pluginTag->mFlags & NS_PLUGIN_FLAG_OLDSCHOOL))
+ if(1) 
  {
    pluginTag->mCanUnloadLibrary = PR_FALSE;
 
    if(aUnloadLibraryLater)
      *aUnloadLibraryLater = PR_TRUE;
  }
Just yesterday I was talking to someone who anecdotally said something to the 
effect that every bug had 2 causes.

1) must ensure that the Java plugin is not shutdown/released/unloaded (yet to 
be determined whether shutdown/release needs to be avoided) by the plugin host 
even though the plugin instance count hits 0.  This alone will prevent applet 
crashes in ~nsObjectFrame.

2) nsObjectFrame (nsObjectFrame.cpp) and pluginInstanceOwner 
(nsPluginViewer.cpp) should switch from holding a plugin instance strong ref to 
a weak ref if the plugin host is to continue shutdown/release/unload of plugins 
when their instance counts reach 0.  This should prevent the crashes in 
~nsObjectFrame with Real player as reported in the talkbacks listed on the 
23rd.  I couldn't repro those on Win2K, so that's just another hunch.
I like your idea of switching to weak reference in the pluginInstanceOwner. This 
looks less hackery to me. From the other hand, there is still no guarantee that 
somebody else in future will not hold it strong, and we leave the potential 
possibility for this kind of crash to repeat. Should we vote I would cast my 
voice for the number two.
I'd go for the safer thing too - remove the call to TryUnloadPlugin from 
nsActivePluginList::remove and leave the one in nsPluginTag::~nsPluginTag as 
the only one.  Also remove all the *unusedLibrary* stuff?  Would need to see 
how javascript:navigator.plugins.refresh behaves.
That would be risky. I introduced it first time specifically to solve problems 
with this javascript command. Refreshing after _updating_ a plugin dll to a 
newer version, not just adding it. Also, this mechanism proved to be useful in 
another case -- our new way to make old style plugins scriptable. It will crash 
upon leaving the page without this unusedlibrary stuff.
And we shall not remove TryUnloadPlugin because the spec says very clearly that 
we shutdown the plugin after the last instance of it is gone.
Is there a way to find out the plugin instance count before calling 
TryUnloadPlugin()? If so, then we will only call TryUnloadPlugin() when the 
count is > 0. Would that be good enough?
*** Bug 78052 has been marked as a duplicate of this bug. ***
Andrei, I misread what your vote for number 2 actually meant.  In number 2 
there are 2 options.  Remove the strong ref or stop the host from unloading 
plugin libraries until shutdown.  If you remove the strong ref, then you still 
need a hack to keep the Java library loaded.  If you go the route of stopping 
the call to TryUnloadPlugin from within nsActivePluginList::remove then no Java 
specific hack is required.  But as you noted, removing the strong ref from 
nsObjectFrame is no guarantee that similar crashes won't occur when other 
things that hold instance refs call release.

Andrei wrote:
>our new way to make old style plugins scriptable. It will crash 
>upon leaving the page without this unusedlibrary stuff.

I don't think that's the case if you don't call TryUnloadPlugin until shutdown 
(solely from the nsPluginTag dtor).  nsPluginTags aren't destroyed until xpcom-
shutdown, therefore the libraries that plugin instances come from can still 
handle release calls made on the instances.  Just because the plugin host calls 
nsIPluginInstance::Destroy doesn't mean the plugin instance has been removed 
from memory.


>And we shall not remove TryUnloadPlugin because the spec says very clearly 
>that we shutdown the plugin after the last instance of it is gone.

The problem is that the plugin host does NOT know when the last instance is 
gone.  Only when the last instance has been stopped.  The whole problem here is 
that difference - we shouldn't unload the plugin library until the last 
reference to the instance has been *released*.  It doesn't matter that the last 
instance has been *stopped* if there are still outstanding references to it 
(things that hang onto plugin instances are nsObjectFrame, nsPluginViewer, 
xpcWrappedNative, stuff in the DOM).


Joe wrote:
>If so, then we will only call TryUnloadPlugin() when the count is > 0.

That's sure to cause crashes whenever a user leaves a page that has any plugin, 
no?
Sean: Now I am confused. (Sean wrote: "...I think we just need to prevent 
calling TryUnloadPlugin() on the java library once the instance count reaches 
zero...")  
Joe, perhaps, your "when the count is > 0" should read "when the count is = 0"?
Sean, as I understand it the presense of outstanding references to the plugin 
instances is quite orthogonal to the number of running instances, and what you 
are saying (correct me if I am wrong) is that we should not unload the dll from 
memory untill the last reference is released (which is not the same thing as 
last instance is stopped and destroyed). This basically means that we confess 
that plugin dlls may stay resident indefinitely. The only 100% guarantee they 
are unloaded is exiting application. I can live with that, although those who 
think footprint may not agree. What I am mostly concerned about is 
nsIPlugin::Shutdown call which according to the spec we are obliged to fire upon 
the desruction of the last instance of the plugin. Maybe TryUnloadPlugin should 
become TryShutdownPluginObject or something like that. If we take this, I agree, 
we can get rid of all the 'unusedlibrary' stuff which is kind of alien.
Joe: As things are now, TryUnloadPlugin is called when the plugin instance 
count reaches zero and at xpcom-shutdown (the call at shutdown won't do 
anything if the first call was made).  By "prevent calling TryUnloadPlugin() on 
the java library once the instance count reaches zero" I meant don't make that 
first call to TryUnloadPlugin - I didn't mean to imply that we should start 
calling it when there's a non-zero instance count.
Andrei, calling Shutdown when the plugin host's reckoning of the instance count 
reaches 0 might be ok.  It would be good to know what call in TryUnloadPlugin 
is causing problems for Java (Shutdown, Release or PR_UnloadLibrary).  Peter 
mentioned that there's special loader code for Java.  I wasn't able to find 
it.  If it exists, then it likely does a PR_LoadLibrary on the Java plugin.  If 
that's the case, then it's unlikely that the PR_UnloadLibrary in 
TryUnloadPlugin is the problem since NSPR refcnts the module (that is, the 
plugin host has 'unloaded' the module but it is still loaded due to the code 
that explicitly started up Java much in the same way that when the plugin host 
calls PR_UnloadLibrary on my xpcom module, it is not unloaded since there is 
the outstanding load done by the component manager).  That would leave either 
nsIPlugin::Shutdown or nsIPlugin::Release as the cause of java applet crashes.

I don't know how the Java plugin is implemented or how the authors interpreted 
the nsIPlugin::Shutdown documentation.  My interpretation of the doc at 
http://www.mozilla.org/docs/plugin.html places more relevance on the 
qualification made at the end of the second sentence than on the other part 
which you place more importance on:
*Navigator calls this function* once after the last instance of your plug-in is 
destroyed, *before unloading the plug-in library itself.*

Note also that the documentation for nsIPlugin::Initialize says:
After the last instance of the plug-in has been deleted, Navigator calls 
Shutdown.

That means that Shutdown shouldn't be called until all instance references have 
been released and all instances deleted.

I suppose part of the problem here is the ambiguity of this phrase in Shutdown:
once after the last instance of your plug-in is destroyed

Is that destroyed as in the plugin instance destructor (as stated in the 
Initialize documentation) or destroyed as in nsIPluginInstance::Destroy()?

My interpretation is that there will be no more instance calls after Shutdown.  
In my plugin, in order to prevent crashes at shutdown due to numerous plugin 
instance leaks in N6 and 6.01, my nsIPlugin::Shutdown cleans up all outstanding 
objects.  If Java does the same, then calling Shutdown before all objects are 
not released is not safe since Release calls on applet instances will deref 
freed memory.
Sean, I see what you mean. Just because the instances go away, doesn't mean the 
references (like from the DOM) have gone away as with Java and possibly other 
scriptable plugins.

Someone correct me if I'm wrong, but the OJI plugin once started isn't supposed 
to be shutdown until app shutdown? This was the case with Java in 4.x but that's 
completely different.

Calling shutdown when the refcount goes to zero sounds like a good idea.
And what to do with the obligation to "call shutdown after the last instance is 
destroyed"? I still don't see a way to solve this other than to use barometer 
length to measure the height of the building. Change spec? Forbid using strong 
references?
*** Bug 78164 has been marked as a duplicate of this bug. ***
To answer Sean's earlier question (which of the call in TryUnloadPlugin that
caused the crash: Shutdown(), Release() or PR_UnloadLibrary()?) I did some more
trace, and found out that it was the call to PR_UnloadLibrary() causing the
crash. Inside PR_UnloadLibrary(), the Java plugin lib (libjavaplugin_oji.so) was
unloaded. Before unloaded, lib->refCount was 1. In the test case, there was one
applet.
I don't think we can forbid the use of strong refs on plugin instances 
throughout mozilla.  The spec does need to be clarified because as it is now, 
it's ambiguous and depending on the interpretation can't be implemented at this 
point.  AFAIK XPCOM module unloading is not supported at this time in mozilla 
due to strong ref problems like this.  The host needs to be able to find out 
from the plugin if it is safe to call Shutdown (the plugin will need to account 
for all of the objects that are held by mozilla), but there's no infrastructure 
for this and should be explored.

As far as calling shutdown after the last instance is destroyed, I think 
calling nsIPlugin::Shutdown at xpcom-shutdown better meets this requirement.  I 
don't see where the spec necessitates that shutdown be called as soon as the 
last instance is destroyed, only that there is a sequence of events and the 
Shutdown call occurs between 2 events: after the last instance is destroyed and 
before the module is unloaded.  By waiting until xpcom-shutdown to make the 
Shutdown call, you are more likely to have fulfilled the requirement that the 
last instance is destroyed than if Shutdown is called when the last instance is 
stopped (but still no guarantee).

It would be nice to have some unbiased third party opinion on these larger 
issues.  What's the immediate goal? Resolve all the issues or get a bandaid 
together for 0.9 (a patch that prevents the unloadLibrary call for Java and 
that turns the nsObjectFrame instance reference from strong to weak)?
Joe, just to clarify, if you comment out the PR_UnloadLibrary, then Java is 
happy (back/forward testcase)?
Yep. If the call to PR_UnloadLibrary(mLibrary) is commented out, there is no
problem to go back and forth among applet and non-applet pages.

How about getting a patch to fix the crash problem here, and opening another bug
to clean up the whole issue?
cc:ing some XPCOM folks.
Any comments on what to do about this bug next?
Joe, does this patch work?
http://bugzilla.mozilla.org/showattachment.cgi?attach_id=32513

What needs to be done to fix this bug and others like it?
Re-assigning bug to myself. This bug needs to be fixed soon so 0.9 can go out 
the door. What is actually needed to be done here? Joe says:

>>Yep. If the call to PR_UnloadLibrary(mLibrary) is commented out, there is no
>> problem to go back and forth among applet and non-applet pages.

So what if there was an additional check if for the NSPR refcnts on the module 
before it gets unloaded?
Assignee: James.Melvin → peterlubczynski
Status: ASSIGNED → NEW
For the patch of the crash, correct me if I am wrong here. In
TryUnloadLibrary(), if it is Java plugin, then do not call PR_UnloadLibrary().
(What about Shutdown() and Release() there, should those calls be avoided too?)
About changing nsObjectFrame from strong instance to weak instance, can we leave
it to the complete cleanup (in a seperate bug)? Peter, whatever the outcome of
the discussion of the patch, when you come up with a patch, I'll test it.

For the complete cleanup of the issue that have been broght up here, can Sean or
Andrei write a new bug, please?
I've tried Peter's last patch, and it seems working fine. 
Java Plug-in cannot be unloaded once it is started, until the process exits. 
This is because the VM as well as the JDK libraries will have multiple threads 
running, and unloading the VM in the middle of browser session will be very 
tricky to do without crashing or deadlock. I think not calling TryToUnloadPlugin 
will be the proper workaround.
To get this off the radar, can I get reviews for patch #32513 unless someone 
else has input on how to better fix this for mozilla 0.9?
Status: NEW → ASSIGNED
Keywords: patch
Whiteboard: critical to mozilla 0.9 → [critical to mozilla 0.9] [seeking reviews]
Testing the MIME type string is fragile. I'd prefer we actually test the 
nsIPlugin itself and see if it implements nsIJVMPlugin. You can do this test once 
up front. Why are we eagerly unloading plugins anyway? Shouldn't they only be 
unloaded if the window is closed, and we are tossing history?
Oops, my mistake. The line under // XXX should read:

nsCOMPtr<nsIJVMPlugin> isJava = do_QueryInterface(mEntryPoint);

Joe, can you verify that this doesn't cause a crash on shutdown or the service 
manager not being held past XPCOM shutdown?
With that change, r=beard, assuming it fixes the problem.
*spam*

adding myself to the cc: list... i'd like to see a solution to this as well...
I replaced the line of code (plus added a line to include the header), and I was
able to go back and forth among applet pages (one applet or two applets
pages),and non-applet pages.
 
Any other test cases I should try? 

Peter, you seemed to have some test cases in mind when you said, "verify that
this doesn't cause a crash on shutdown or the service manager not being held
past XPCOM shutdown", right?

By the way, if you have access to a linux trunk build, you can apply the patch,
then build an optimized build, and try your own test cases.
To build an optimized build, simply add a file, .mozconfig to your mozilla
directory. The file should contain the following lines:

ac_add_options --disable-tests
ac_add_options --disable-debug
ac_add_options --enable-optimize
#ac_add_options --enable-mathml #uncomment to enable mathml 

Basically the file will turn off the debug mode for the build.
Joe, trying this late last night on Windows, I was seeing refcount problems with 
the plugin host that Sean and Andrei describe. Basically, if Java is installed 
on Windows, the plugin host has an extra reference to it and therefore does not 
get destroyed at XPCOM shutdown (and is leaked) even though Shutdown() and 
Release are called.

I guess you aren't seeing this and neither are other people, so perhaps it's 
something lurking in my tree. I had problems updating with CVS too last night, 
will try again today.

Actually, this was happening with and without this patch so I think it's 
something else but related.

So I assume you r=? If nobody objects, I'll try to get a super review and check 
this in.
*** Bug 78547 has been marked as a duplicate of this bug. ***
sr=waterson 
a= asa@mozilla.org for checkin to the MOZILLA_0_9_BRANCH
This just needs a r= right? Can we get this in as soon as the r+ becomes
available.  Thanks.
This bug is a topcrasher for M09 on Windows
Changed OS to All.

Here is a recent stack trace from a windows crash:

Incident ID 29798225 
nsPluginInstanceOwner::~nsPluginInstanceOwner 
[d:\builds\seamonkey\mozilla\layout\html\base\src\nsObjectFrame.cpp, line 1520] 
nsPluginInstanceOwner::Release 
[d:\builds\seamonkey\mozilla\layout\html\base\src\nsObjectFrame.cpp] 
nsObjectFrame::~nsObjectFrame 
[d:\builds\seamonkey\mozilla\layout\html\base\src\nsObjectFrame.cpp, line 267] 
nsObjectFrame::`scalar deleting destructor' 
nsFrame::Destroy [d:\builds\seamonkey\mozilla\layout\html\base\src\nsFrame.cpp, 
line 432] 
GKLAYOUT.DLL + 0xa9cb4 (0x603b9cb4) 
nsPluginInstanceOwner::AddRef 
[d:\builds\seamonkey\mozilla\layout\html\base\src\nsObjectFrame.cpp, line 1593] 
OS: Linux → All
Summary: Leaving an applet page crashed browser - Trunk [@ nsPluginInstanceOwner::~nsPluginInstanceOwner] → Leaving an applet page crashed browser - Trunk & M09 [@ nsPluginInstanceOwner::~nsPluginInstanceOwner]
Can someone try the patch on Windows, and see if it fixes the crash that janc
saw?
I'll try my most recent patch on Win32.
This patch will probably only fix Linux Java crashing. janc, how did you get 
that stack trace? Is there a testcase for Windows? What plugins? We need more 
information. Bug 77319 has the same trace but is a bug on Real. I just got the 
new Real bits an hour go, will try.

This bug NEVER crashed on Windows or Mac in the first place, that's why it was 
so hard to fix.
I just tried my most recent patch with a debug build from 01 May 2001 trunk.  
Applets worked fine, modulo bug 76921.

Depends on: 76921
Peter is correct.  The attached patch will only fix ~nsObjectFrame crashes when 
an applet is involved.

Given that the talkback info posted on the 23rd shows a few Real Player 
descriptions, the java-specific patch won't address all the nsObjectFrame 
crashes.

See the last comment I posted on the 27th.
Peter please note that nsPluginHostImpl.cpp has changed.  I've attached a patch
that reflects this change.
edburns, will we need to get the fix for 76921 on the branch for this bug's fix
to work on the branch?  Please excuse my ignorance.  I just want to make sure we
get the 0.9 branch working.
Asa, no they are orthogonal.  I include the patch as a convenience. I have 
already requested in n.p.m.builds to have bug 76921 approved for checkin on the 
branch, as I feel it is important functionality for plugins.  If you grant 
permission for me to check in 76921 on the branch I would greatly appreciate it.
yes please check into the MOZILLA_0_9_BRANCH
a= asa@mozilla.org (on behalf of drivers
Asa, do you mean for 76921?
Fine by me, r=peterl
a= asa@mozilla.org for checkin to both this bug and 76921
I see from the comments that this bug is solving the problem only for Linux.
However, but 77857 which was reported by me and was marked as a duplicate of
this one is talking about Windows 98. So either the problem occurs on Win98 as
well or it's not a dup and should be reopened.
checking into the TRUNK:

/cvsroot/mozilla/modules/plugin/nglsrc/nsPluginHostImpl.cpp,v  <--  
nsPluginHostImpl.cpp
new revision: 1.233.4.5; previous revision: 1.233.4.4
done
Checking in nsPluginHostImpl.h;
/cvsroot/mozilla/modules/plugin/nglsrc/nsPluginHostImpl.h,v  <--  
nsPluginHostImpl.h
Patch checked into the TRUNK. Marking FIXED.

/cvsroot/mozilla/modules/plugin/nglsrc/nsPluginHostImpl.cpp,v  <--  
nsPluginHostImpl.cpp
new revision: 1.242; previous revision: 1.241
done
Checking in nsPluginHostImpl.h;
/cvsroot/mozilla/modules/plugin/nglsrc/nsPluginHostImpl.h,v  <--  
nsPluginHostImpl.h
new revision: 1.53; previous revision: 1.52
Status: ASSIGNED → RESOLVED
Closed: 23 years ago23 years ago
Resolution: --- → FIXED
oops, meant BRANCH on that last comment.
i don't see this problem anymore on the trunk on windows /linux (0503). VERIFIED
Status: RESOLVED → VERIFIED
removing invalid dependency (bug 76921)
No longer depends on: 76921
Reopening since this is still crashing, just somewhere else now.

Here's my work around for the new crash on shutdown:

Index: nsPluginHostImpl.cpp
===================================================================
RCS file: /cvsroot/mozilla/modules/plugin/nglsrc/nsPluginHostImpl.cpp,v
retrieving revision 1.233.4.6
diff -u -r1.233.4.6 nsPluginHostImpl.cpp
--- nsPluginHostImpl.cpp        2001/05/03 21:06:35     1.233.4.6
+++ nsPluginHostImpl.cpp        2001/05/04 22:50:26
@@ -806,7 +806,9 @@
 void nsPluginTag::TryUnloadPlugin(PRBool aForceShutdown)
 {
   // XXX This is a hack to keep Java around, see bug 76936
-  nsCOMPtr<nsIJVMPlugin> isJava = do_QueryInterface(mEntryPoint);
+  PRBool isJava = PR_FALSE;
+  if (mDescription && !strcmp(mDescription, "Java(TM) Plug-in"))
+    isJava = PR_TRUE;
 
   if (isJava && !aForceShutdown) return;
 
Status: VERIFIED → REOPENED
Resolution: FIXED → ---
How about changing:

  nsCOMPtr<nsIJVMPlugin> isJava = do_QueryInterface(mEntryPoint);

to:

  nsCOMPtr<nsIJVMPlugin> isJava;
  if (mEntryPoint)
    isJava = do_QueryInterface(mEntryPoint);
Checking the string desciption vs. the interface makes a difference? 

I like Sean's idea better but I thought do_QueryInterface already catches null 
pointers?
Christopher: How to reproduce your crash to try the new patches?
Working for me with build 2001050408, Linux 2.4.4 i686, RedHat 6.1
I had a look and there are several variants in nsCOMPtr.h/cpp. Apparently, the 
version that got picked was a version in which no error checkin is made.
I would imagine that to repro, you need the same plugins that he has 
installed.  One of them must not implement nsIPlugin.
If Sean's approach works, I like it better than checking the description.
Checking the description is almost guaranteed to break non-sun java plugins.
Leaving aside nsCOMPtr<> for a moment, you could just do:

PRBool isJava = PR_FALSE;
if (mEntryPoint) {
  nsIJVMPlugin* jvmPlugin;
  isJava = NS_SUCCEEDED(mEntryPoint->QueryInterface(NS_GET_IID(nsIJVMPlugin),    
                       (void**)&jvmPlugin));
}
+ the NS_IF_RELEASE, of course...
Is the real problem the version of nsCOMPtr.h/cpp? LXR only reveals one set.

Like I said, I thought do_QueryInterface is safe to call on null pointers? If 
not, I like Sean's approach.

Is there a testcase to reproduce the crash?

I thought do_QI and nsCOMPtr saved you from doing all that as the guide says:
http://www.mozilla.org/projects/xpcom/nsCOMPtr.html
Status: REOPENED → ASSIGNED
To reproduce the crash just have the java plugin installed, start the browser,
and exit the browser.
I am observing that it isn't safe the optional 'rv' isn't specified.

I tried putting an assertion in a random place in the code, say:

NS_IMETHODIMP nsDocumentOpenInfo::OnStartRequest(nsIRequest *request, 
nsISupports * aCtxt)
{
  nsresult rv = NS_OK;
...
NS_ASSERTION(0,"Break");
//  nsCOMPtr<nsIHTTPChannel> httpChannel(do_QueryInterface(request, &rv));
  nsCOMPtr<nsIHTTPChannel> httpChannel(do_QueryInterface(NULL));
...
}

and tracing this didn't gave a null httpChannel (I am reading
httpChannel.mRawPtr = 0x01abfef4 on the trace :-)

But it was safe when I tried:

 nsCOMPtr<nsIHTTPChannel> httpChannel(do_QueryInterface(NULL, &rv));
It gave null, and a failure error code for rv. 
With build 2001050408, everything works fine, except that the browser
crashes on about:plugins if the Java plugin is installed; the bottom of
the stack trace looks like:

#0  0x400df6a4 in nsCOMPtr_base::~nsCOMPtr_base () at eval.c:41
#1  0x40ed7c8d in nsPluginTag::TryUnloadPlugin ()
   from mozilla/components/libgkplugin.so
#2  0x40ed7a5f in nsPluginTag::~nsPluginTag ()
   from mozilla/components/libgkplugin.so
#3  0x40eda017 in nsPluginHostImpl::ReloadPlugins ()
   from mozilla/components/libgkplugin.so
#4  0x404b4c24 in PluginArrayImpl::Refresh ()
   from mozilla/libjsdom.so
#5  0x404b2901 in PluginArrayRefresh ()
   from mozilla/libjsdom.so
#6  0x4014e0c6 in js_Invoke () at eval.c:41
#7  0x40155005 in js_Interpret () at eval.c:41
#8  0x4014e50f in js_Execute () at eval.c:41
#9  0x40132106 in JS_EvaluateUCScriptForPrincipals () at eval.c:41
#10 0x4048a36d in nsJSContext::EvaluateString ()
   from mozilla/libjsdom.so
#11 0x40ca2a43 in HTMLContentSink::EvaluateScript ()

Is the same bug causing this, or should I open up a new bug?
Sorry peterl, I spoke too quickly. It is (should be) safe. My tests were wrong.
Okay, now this makes a little more sense to me seeing that stack. That stack is 
for bug 77319 which is highly related to this bug. That stack shows the problem 
with the nsCOMPtr. 

For whoever can reproduce this bug try doing a null pointer check as Sean 
suggests and see if the crash goes away.

Also, can someone explain this piece of code in ScanPlugins:
// load the plugin's library so we can ask it some questions, but not for 
Windows
#ifndef XP_WIN
      if (pluginFile.LoadPlugin(pluginLibrary) != NS_OK || pluginLibrary == 
nsnull)
        continue;
#endif

Is this causing some of the PR_[Un]LoadLibrary() problems?

Ed, on Win32, I'm getting the service manager held past XPCOM shutdown doing 
what you describe. Even exit when choose profile causes this. I will the debug 
bits I got.
Okay, I just got bittin by this bug, but in a different way while debuging 
another. I see what Sean means if the plugin is 4.x. A QI won't work for a 4.x 
plugin. That's probably why the string comparision works the best.
Try this and see if it clears up THIS crash:

  // XXX This is a hack to keep Java around, see bug 76936
  PRBool isJava = PR_FALSE;
  if (mEntryPoint)
  {
    nsIJVMPlugin* jvmPlugin;
    isJava = NS_SUCCEEDED(mEntryPoint->QueryInterface(NS_GET_IID(nsIJVMPlugin),  
  
                       (void**)&jvmPlugin));
    if (isJava) NS_IF_RELEASE(jvmPlugin);
  }
Knowing not to unload Java in a navigator.plugins.refresh() situation is going 
to be a lot harder to fix. Use bug 77319 for that tracking. That WILL cause a 
crash because we are very deep in plugin destruction and it's hard to tell that 
sitation apart from when XPCOM gets shutdown. I think this solution may be too 
hacky to solve that crash. We must go back to making sure we don't call 
Shutdown() on the plugin unless it's refcount is down to one. I don't think 
doing the call to Shutdown() in TryUnload is a good idea anyway. Here's the top 
of the stack for the reload case:

nsPluginTag::TryUnloadPlugin(int 1) line 807
nsPluginTag::~nsPluginTag() line 763
DOMPluginImpl::~DOMPluginImpl() line 3092 + 11 bytes
DOMPluginImpl::`scalar deleting destructor'(unsigned int 1) + 15 bytes
DOMPluginImpl::Release(DOMPluginImpl * const 0x03b4b530) line 3139 + 154 bytes
PluginElementImpl::~PluginElementImpl() line 270 + 27 bytes
PluginElementImpl::`scalar deleting destructor'(unsigned int 1) + 15 bytes
PluginElementImpl::Release(PluginElementImpl * const 0x03b48a70) line 280 + 154 
bytes
PluginArrayImpl::Refresh(PluginArrayImpl * const 0x03b4bfc4, int 0) line 202 + 
45 bytes
PluginArrayRefresh(JSContext * 0x03b26c50, JSObject * 0x02ca4778, unsigned int 
1, long * 0x02d2ec98, long * 0x0012e2cc) line 333 + 16 bytes
js_Invoke(JSContext * 0x03b26c50, unsigned int 1, unsigned int 0) line 813 + 23 
bytes
Nit: some little changes:

  // XXX This is a hack to keep Java around, see bug 76936
  PRBool isJava = PR_FALSE;
  if (mEntryPoint) {
    nsIJVMPlugin* jvmPlugin;
    if (NS_SUCCEEDED(mEntryPoint->QueryInterface(NS_GET_IID(nsIJVMPlugin),  
                    (void**)&jvmPlugin))) {
      NS_RELEASE(jvmPlugin);
      isJava = PR_TRUE;
    }
  }
I tried Peter's last patch, and it seemed working OK (I tried the
"about:plugins" and exit rightaway cases). I also tried Christopher's patch, and
it seemed working fine, too. So somehow, nsComptr seemed to cause some problem.

One thing though, in the exit-right-away case, there were two assertions (for
both patches) at the end: 

nsPluginHostImpl::Observe "xpcom-shutdown"
### beging nsCacheService::Shutdown()
### starting ~nsMemoryCacheDevice()
###!!! ASSERTION: Service Manager being held past XPCOM shutdown.: 'cnt == 0',
file nsServiceManager.cpp, line 545
###!!! Break: at file nsServiceManager.cpp, line 545
CanUnload_enumerate: skipping native
GC Cache:
        hits:  395  203   72   11   31    9    0    1    2    2
        hits: 726, misses: 244, hit percent: 74.845360%
###!!! ASSERTION: Component Manager being held past XPCOM shutdown.: 'cnt == 0',
file nsXPComInit.cpp, line 505
###!!! Break: at file nsXPComInit.cpp, line 505
Joe, can you try another plugin. Throw in Flash the Default Plugin or any other 
4.x plugin. Acrobat PDF is broken (Bug 78741) on all builds from 4-24-01. Be 
sure when you are testing the closing case and about:plugins case that you FIRST 
visit a page with the plugin in question (namely Java or Flash or lack there 
of to trigger the Default). If that works, then I think rbs' second patch should 
go in.

Yes, there is the problem of XPCOM shutdown I see on Win32. Do you see on Linux? 
Can you check the refcount of the Java OJI Plugin when ::Observe is called?
Please keep in mind that the mEntryPoint isn't NULL but when you call
QueryInterface() on it it will fall over.  So you can't use that test.  Due to
changing symbol sizes and changing structure sizes I suspect that comparing the
char * description is the safest bet, assming that the description is the same
everywhere.
Correct me if I'm wrong, but don't we wrap 4.x plugins in an XPCOM wrapper 
anyway (ns4xPlugin*)? Since mEntryPoint is of type nsIPlugin and hence derives 
from nsIFactory which derives from nsISupports, it should be safe to assume 
there will be a QI or am I missing something here? 
I saw this function in the file: static PRBool isJavaPlugin(nsPluginTag * tag)
Is it also a possible alternative here?

==============================
No test against nsIJVMPlugin ?
nsresult
ns4xPlugin::QueryInterface(const nsIID& iid, void** instance)
{
    if (instance == NULL)
        return NS_ERROR_NULL_POINTER;

    if (iid.Equals(kIPluginIID))
    {
        *instance = (void *)(nsIPlugin *)this;
        AddRef();
        return NS_OK;
    }

    if (iid.Equals(kIFactoryIID))
    {
        *instance = (void *)(nsIFactory *)this;
        AddRef();
        return NS_OK;
    }

    if (iid.Equals(kISupportsIID))
    {
        *instance = (void *)(nsISupports *)this;
        AddRef();
        return NS_OK;
    }

    return NS_NOINTERFACE;
}
No need to test ns4xPlugin for nsIJVMPlugin in QI because it doesn't implement 
that interface. It's not Java. Take a look at MRJPlugin.h and notice it 
implements nsIPlugin and the QI. 

[static PRBool isJavaPlugin(nsPluginTag * tag)] is used to check for the old 4.x 
npjava plugin. Loading this causes crashes.
Chris wrote:
>Please keep in mind that the mEntryPoint isn't NULL but when you call
>QueryInterface() on it it will fall over.

That's bad.  mEntryPoint is supposed to either be NULL or an nsIPlugin*.  If
it's not a valid nsIPlugin*, then after this crash is addressed, there will be a
crash in the Shutdown call which is protected by if (mEntryPoint).


OK peterl, saw it, QI isn't the cause then.

sean/blizzard, it is declared as "nsIPlugin *mEntryPoint" so assumptions aren't 
broken.
Cc'ing scc, a bit late.  Are we sure the nsCOMPtr (vs. a raw strong-ref ptr) was
a necessary condition for the problem?

/be
May be I have found the problem... Preparing a patch so that other could
try for confirmation.
This is interesting, but there is no path through |do_QueryInterface| that
blindly calls through |NULL|.  It all must pass through

  http://lxr.mozilla.org/seamonkey/source/xpcom/base/nsCOMPtr.cpp#30

From your description of the behavior, it sounds like |mEntryPoint| isn't really
a pointer to something that inherits unambiguously from |nsISupports|.  Does
stepping into the QI jump into limbo?  Does the intermediate step look like a
vtable?  That would imply that it is an object, but the wrong kind.  If the
intermediate step doesn't even look like a vtable, that would imply the pointer
is garbage but non-|NULL|.

Here's one possibility: could it be that what |mEntryPoint| points to has
already been unloaded?  That would explain a lot.
Nope, it eliminates the crash but it *isn't* the correct fix:

NS_IMETHODIMP nsPluginHostImpl::Observe(nsISupports *aSubject,
                                        const PRUnichar *aTopic,
                                        const PRUnichar *someData)
{
#ifdef NS_DEBUG
  nsAutoString topic(aTopic);
  char * newString = topic.ToNewCString();
  printf("nsPluginHostImpl::Observe \"%s\"\n", newString ? newString : "");
  if (newString)
    nsCRT::free(newString);
#endif
//  if (NS_LITERAL_STRING(NS_XPCOM_SHUTDOWN_OBSERVER_ID) == aTopic || 
//      NS_LITERAL_STRING("quit-application") == aTopic)
  {
    Destroy();
  }
   
  NS_IF_RELEASE(serviceManager);  < where serviceManager is the global SM in
  return NS_OK;                   < the code (it is explicitly clear in the
                                  < code that there is *no* addref of the GSM)
}                                 < nope, this is the wrong fix...
Note this interesting comment from bug #77319 (claimed to be related to this bug)

  >The crash happens on NS_ADDREF(mInstance) and taking a look at that interface 
  >pointer shows it's vtable to be invalid so no wonder it crashes. What is going 
  >on with this member variable on this page.

Scott,

In the case of the Java plugin, at least on Win32, mEntryPoint shows up as 
deriving from nsIFactory, nsISupports, and there's a vtable. I realized that's 
all the info the debugger probably can tell as we have no symbols for those 
optimized bits.  Ed sent me the debug bits but they don't even seem to work, 
probably because of the reason I talk about in bug 78150.
The debugger is showing you the static type of the variable, so it's expected
that it _claims_ the inheritance is valid.  When you actually look into the
vtable and follow the appropriate entry, it should lead to a correct and
in-memory implementation of |QueryInterface|.  If it doesn't, then you will at
least understand the exact mechanism of the crash, though not the reason behind it.
I tried this with a few different plugins in different senerios and anytime that 
mEntryPoint is not null, there looks to be a valid vtable with pointers. In the 
4.x case it's ns4xPlugin and in the Java case, like I thought, it's just 3 
pointers to memory because of no symbols. This is on Win32 though, Linux is 
another story. Can someone step through and examine the vtable just before the 
crash?

Scott, if the Java plugin used an older nsCOMPtr version to build with, could it 
be possible that NULL gets through? What about this debug vs. optimized in terms 
of zeroing out pointers?
(a) no.  no version in more than a year could fail for |NULL|, but
(b) you're calling _your_ |do_QueryInterface|, not one in the plugin, right?
This code is in the body of Mozilla somewhere ... not the plugin, and that's the
|do_QueryInterface| (or really |nsQueryInterface::operator()|) we're calling,
which is just a helper routine that calls QI.  If you step into that call, what
happens?  See in particular, code around the URL I cited above.  Just after that
is where it actually queries your pointer:

  http://lxr.mozilla.org/seamonkey/source/xpcom/base/nsCOMPtr.cpp#32

Step into the query... do you get somewhere?  Or does just stepping into the
query crash you?  If you got somewhere, examine the code ... is it really an
implementation of |QueryInterface| (compare with dissasembly of a known entry
point if symbols are not available).  If this is not really an implementation of
|QueryInterface|, then that is strong evidence in favor of unloaded code or a
trashed |mEntryPoint|.
I hope no one here is under the impression that code responsible for loading or
unloading plugins respects strong references to objects whose implementations
are provided by that plugin.  _That's_ certainly not the case by default.

I'll wager that the |nsCOMPtr| is pointing to an object whose implementation has
been unloaded.  Then, any work done through that pointer, of course, leads to a
jump into limbo.  And work is guaranteed, since the |nsCOMPtr|s destructor will
try to call |Release|.
Here are some data that might perhaps prove helpful. I expanded the 
ADDREF/RELEASE macros in nsServiceManager.cpp in order to see exactly who was 
initiating the addref & release. Then, I started SeaMonkey and exited 
immediately. As far I can tell, it seems the bug is in NPOJI600, for it appears
to be doing a QI without a matching |release|:

Here is my console output (the source changes to nsServiceManager.cpp are below)

###!!! ASSERTION: ADDREF: '0', file C:\Mozilla\src-m0.9\mozilla\xpcom\components
\nsServiceManager.cpp, line 315
Type Manifest File: C:\Mozilla\src-m0.9\mozilla\dist\WIN32_D.OBJ\bin\components\
xpti.dat
nsNativeComponentLoader: autoregistering begins.
nsNativeComponentLoader: autoregistering succeeded
nNCL: registering deferred (0)
Initialized app shell component {18c2f989-b09f-11d2-bcde-00805f0e1353}, rv=0x000
00000
WEBSHELL+ = 1
nsPluginHostImpl ctor
plugins at: C:\Mozilla\src-m0.9\mozilla\dist\WIN32_D.OBJ\bin\plugins
plugins at: C:\Program Files\Netscape\Communicator\Program\Plugins
For application/x-java-vm found plugin C:\Mozilla\src-m0.9\mozilla\dist\WIN32_D.
OBJ\bin\plugins\NPOJI600.dll
###!!! ASSERTION: ADDREF: '0', file C:\Mozilla\src-m0.9\mozilla\xpcom\components
\nsServiceManager.cpp, line 315
WEBSHELL+ = 2
Disabling View Source StyleSheet
Enabling Quirk StyleSheet
Note: verifyreflow is disabled
Style Data Sharing is Enabled :)
Note: styleverifytree is disabled
Note: frameverifytree is disabled
###!!! ASSERTION: New URI failed: 'Error', file C:\Mozilla\src-m0.9\mozilla\netw
erk\base\src\nsProtocolProxyService.cpp, line 302
Disabling View Source StyleSheet
Start reading in bookmarks.html
Finished reading in bookmarks.html  (21000 microseconds)
WEBSHELL+ = 3
Disabling View Source StyleSheet
Enabling Quirk StyleSheet
has multiple monitor apis is 1
blank
WEBSHELL- = 2
/content/navigator.xul
blank
WEBSHELL- = 1
WEBSHELL- = 0
Shut down app shell component {18c2f989-b09f-11d2-bcde-00805f0e1353}, rv=0x00000
000
nsPluginHostImpl::Observe "xpcom-shutdown"
### beging nsCacheService::Shutdown()
### starting ~nsMemoryCacheDevice()
###!!! ASSERTION: RELEASE: '0', file C:\Mozilla\src-m0.9\mozilla\xpcom\component
s\nsServiceManager.cpp, line 299

And this is where we get the Service Manager being held pass down msg...


Changes to nsServiceManager.cpp (could someone try to reproduce as I am
going off-line now)
=====================================================

//NS_IMPL_ISUPPORTS1(nsServiceManagerImpl, nsIServiceManager)
NS_IMPL_QUERY_INTERFACE1(nsServiceManagerImpl, nsIServiceManager)

//NS_IMPL_RELEASE(nsServiceManagerImpl) 
NS_IMETHODIMP_(nsrefcnt) nsServiceManagerImpl::Release(void) 
{                                              
NS_ASSERTION(0,"RELEASE");
  NS_PRECONDITION(0 != mRefCnt, "dup release");
  NS_ASSERT_OWNINGTHREAD(nsServiceManagerImpl);              
  --mRefCnt;                                   
  NS_LOG_RELEASE(this, mRefCnt, "nsServiceManagerImpl");      
  if (mRefCnt == 0) {                          
    mRefCnt = 1; /* stabilize */               
    NS_DELETEXPCOM(this);                                  
    return 0;                                  
  }                                            
  return mRefCnt;                              
}

//#define NS_IMPL_ADDREF(nsServiceManagerImpl)
NS_IMETHODIMP_(nsrefcnt) nsServiceManagerImpl::AddRef(void)
{        
NS_ASSERTION(0,"ADDREF");
  NS_PRECONDITION(PRInt32(mRefCnt) >= 0, "illegal refcnt");
  NS_ASSERT_OWNINGTHREAD(nsServiceManagerImpl);  
  ++mRefCnt;  
  NS_LOG_ADDREF(this, mRefCnt, "nsServiceManagerImpl", sizeof(*this)); 
  return mRefCnt;
}




The stack trace at the time of the QI by NPOJI600 (I didn't capture the head)
==============================================================

NTDLL! 77f9eea9()
nsDebug::Assertion(const char * 0x100ecf24, const char * 0x100ecf20, const char 
* 0x100ecedc, int 315) line 286 + 13 bytes
nsServiceManagerImpl::AddRef(nsServiceManagerImpl * const 0x004840d0) line 315 + 
34 bytes
nsServiceManagerImpl::QueryInterface(nsServiceManagerImpl * const 0x004840d0, 
const nsID & {...}, void * * 0x00338140) line 294 + 139 bytes
JPINS32! 502e5e40()
JPINS32! 502e5d41()
NPOJI600! 5039104a()
nsJVMManager::StartupJVM() line 599 + 32 bytes
nsJVMManager::MaybeStartupLiveConnect() line 780 + 20 bytes
nsJVMManager::StartupLiveConnect(nsJVMManager * const 0x01580448, JSRuntime * 
0x00e16df0, int & 0) line 128 + 11 bytes
nsJSEnvironment::nsJSEnvironment() line 1505 + 49 bytes
nsJSEnvironment::GetScriptingEnvironment() line 1446 + 27 bytes
NS_CreateScriptContext(nsIScriptGlobalObject * 0x0152b6d0, nsIScriptContext * * 
0x0155cc10) line 1545 + 5 bytes
nsDocShell::EnsureScriptEnvironment(nsDocShell * const 0x0155cb60) line 4656 + 
50 bytes
nsWebShell::GetInterface(nsWebShell * const 0x0155cb84, const nsID & {...}, void 
* * 0x0012fbd4) line 329 + 19 bytes
nsGetInterface::operator()(const nsID & {...}, void * * 0x0012fbd4) line 37 + 31 
bytes
nsCOMPtr<nsIDOMWindowInternal>::assign_from_helper(const nsCOMPtr_helper & 
{...}, const nsID & {...}) line 974 + 18 bytes
nsCOMPtr<nsIDOMWindowInternal>::nsCOMPtr<nsIDOMWindowInternal>(const 
nsCOMPtr_helper & {...}) line 556
nsAppShellService::GetHiddenWindowAndJSContext(nsAppShellService * const 
0x00543630, nsIDOMWindowInternal * * 0x0012fc7c, JSContext * * 0x0012fc80) line 
713 + 32 bytes
nsAppShellService::SetXPConnectSafeContext() line 177 + 40 bytes
nsAppShellService::CreateHiddenWindow(nsAppShellService * const 0x00543630) line 
248
main1(int 1, char * * 0x00484460, nsISupports * 0x00000000) line 988
main(int 1, char * * 0x00484460) line 1309 + 37 bytes
mainCRTStartup() line 338 + 17 bytes
KERNEL32! 77e992a6()
Changes in diff -u format:

Index: nglsrc/nsPluginHostImpl.cpp
===================================================================
RCS file: /cvsroot/mozilla/modules/plugin/nglsrc/nsPluginHostImpl.cpp,v
retrieving revision 1.233.4.6
diff -u -r1.233.4.6 nsPluginHostImpl.cpp
--- nsPluginHostImpl.cpp        2001/05/03 21:06:35     1.233.4.6
+++ nsPluginHostImpl.cpp        2001/05/05 07:34:55
@@ -806,7 +806,15 @@
 void nsPluginTag::TryUnloadPlugin(PRBool aForceShutdown)
 {
   // XXX This is a hack to keep Java around, see bug 76936
-  nsCOMPtr<nsIJVMPlugin> isJava = do_QueryInterface(mEntryPoint);
+  PRBool isJava = PR_FALSE;
+  if (mEntryPoint) {
+    nsIJVMPlugin* jvmPlugin;
+    if (NS_SUCCEEDED(mEntryPoint->QueryInterface(NS_GET_IID(nsIJVMPlugin),
+                    (void**)&jvmPlugin))) {
+      NS_RELEASE(jvmPlugin);
+      isJava = PR_TRUE;
+    }
+  }

   if (isJava && !aForceShutdown) return;



Index: nsServiceManager.cpp
===================================================================
RCS file: /cvsroot/mozilla/xpcom/components/nsServiceManager.cpp,v
retrieving revision 3.48
diff -u -r3.48 nsServiceManager.cpp
--- nsServiceManager.cpp        2000/09/13 23:55:57     3.48
+++ nsServiceManager.cpp        2001/05/05 07:40:10
@@ -289,7 +289,35 @@
     }
 }

-NS_IMPL_ISUPPORTS1(nsServiceManagerImpl, nsIServiceManager)
+//NS_IMPL_ISUPPORTS1(nsServiceManagerImpl, nsIServiceManager)
+NS_IMPL_QUERY_INTERFACE1(nsServiceManagerImpl, nsIServiceManager)
+
+//NS_IMPL_RELEASE(nsServiceManagerImpl)
+NS_IMETHODIMP_(nsrefcnt) nsServiceManagerImpl::Release(void)
+{
+NS_ASSERTION(0,"RELEASE");
+  NS_PRECONDITION(0 != mRefCnt, "dup release");
+  NS_ASSERT_OWNINGTHREAD(nsServiceManagerImpl);
+  --mRefCnt;
+  NS_LOG_RELEASE(this, mRefCnt, "nsServiceManagerImpl");
+  if (mRefCnt == 0) {
+    mRefCnt = 1; /* stabilize */
+    NS_DELETEXPCOM(this);
+    return 0;
+  }
+  return mRefCnt;
+}
+
+//#define NS_IMPL_ADDREF(nsServiceManagerImpl)
+NS_IMETHODIMP_(nsrefcnt) nsServiceManagerImpl::AddRef(void)
+{
+NS_ASSERTION(0,"ADDREF");
+  NS_PRECONDITION(PRInt32(mRefCnt) >= 0, "illegal refcnt");
+  NS_ASSERT_OWNINGTHREAD(nsServiceManagerImpl);
+  ++mRefCnt;
+  NS_LOG_ADDREF(this, mRefCnt, "nsServiceManagerImpl", sizeof(*this));
+  return mRefCnt;
+}

 NS_IMETHODIMP
 nsServiceManagerImpl::GetService(const nsCID& aClass, const nsIID& aIID,
*** Bug 79019 has been marked as a duplicate of this bug. ***
OK, guys.  I'm about to make all of us feel really dumb.  Here's the correct fix:


void nsPluginTag::TryUnloadPlugin(PRBool aForceShutdown)
{
  // XXX This is a hack to keep Java around, see bug 76936
  PRBool isJava = PR_FALSE;
  nsCOMPtr<nsIJVMPlugin> jvm;

  if (mEntryPoint)
    jvm = do_QueryInterface(mEntryPoint);

  if (jvm) {
    isJava = PR_TRUE;
    // release before we shutdown below
    jvm = nsnull;
  }

  if (isJava && !aForceShutdown) return;

  if (mEntryPoint)
  {
    mEntryPoint->Shutdown();
    mEntryPoint->Release();
    mEntryPoint = nsnull;
  }

  // before we unload check if we are allowed to, see bug #61388
  if (mLibrary && mCanUnloadLibrary)
    PR_UnloadLibrary(mLibrary);

  // we should zero it anyway, it is going to be unloaded by 
  // CleanUnsedLibraries before we need to call the library 
  // again so the calling code should not be fooled and reload 
  // the library fresh
  mLibrary = nsnull;
}

The problem is that the nsCOMPtr destructor is trying to call Release() on a
pointer that was in a library that was unloaded by PR_LoadLibrary().  I didn't
notice this until I used step instruction through the entire function.

I think that when looking at the stack traces everyone including myselft thought
it was when we called _into_ the function since the nsCOMPtr::~nsCOMPtr call
would always show up at the beginning of the function since that's where the
destructor is located as far as the debugger is concerned.  So much for stack
traces.

Looking for real reviews here and testing.  I think this is right, though.
Do we really believe that only JVMs will want to use threads, or might be 
bothered by abrupt unloading?  I'm really nervous about this approach.

I'd prefer that we either never unload plugins, or that we _always_ unload 
plugins, and require all plugins that want to avoid that behaviour to use a new-
style plugin which returns the right thing from nsIModule::CanUnload.  (We 
currently don't ever unload components, even when they report that they can be 
unloaded, because lots of our components suck and lie to us.)

What does 4.x do here?  How does the Sun Java(TM) Plugin avoid crashing there?
Checking the string on the name of the plugin doesn't crash because it doesn't
try to release after the module has been unloaded.
>Do we really believe that only JVMs will want to use threads, or might be 
>bothered by abrupt unloading?

No.  There are MANY issues related to this bug and bug 77319 that need to be
worked out yet.  I'd prefer seeing all loaded plugin modules remain loaded until
xpcom-shutdown (see various posts throughout this bug and 77319); temporarily
until a CanUnload solution is impl'ed/tested.

>Checking the string on the name of the plugin doesn't crash because it doesn't
>try to release after the module has been unloaded.

yep.  Also explains why the patch from rbs (which used mEntryPoint but not via
nsComPtr) worked.  Sorry for throwing everyone off track with my assumption that
the QI was failing due to a NULL ptr.

Now: Why didn't the patch cause a shutdown crash before it was checked into the
0.9 branch?

blizzard: other than what I'm assuming is a small typo in your final comment
(CleanUnsedLibraries) it looks good to me.  [s]r=scc (on blizzard's implentation
provided in his comment at 2001-05-05 10:43), whichever you need.
I still crash on exit after visiting java.sun.com with a CVS build which had
blizzard's latest patch in it. Info on my system:

o Debian GNU/Linux 2.2 (Sid)
o glibc 2.2.2
o gcc-2.95.3
o Sun's Java SDK, Standard Edition. Version 1.3.0_02

I crash with the latest pre-built 0.9 release candidate too.
blizzard's nsCOMPtr-friendly patch is equivalent to mine, except that
he is explicitly destroying the plugin (i.e., mEntryPoint).

Another way to write his patch is:

  PRBool isJava = PR_FALSE;
  if (mEntryPoint) {
    nsCOMPtr<nsIJVMPlugin> jvm;
    jvm = do_QueryInterface(mEntryPoint);
    if (jvm) {
      isJava = PR_TRUE;
      // release before we shutdown below
      jvm = nsnull;
    }
  }

I was under the impression that the particularity with the Java plugin 
was that it needed to be kept around until shut-down? It would be great
if the plugin could indeed be destroyed at once... In that case, one would
also need to do:
      jvm = nsnull;
+     mEntryPoint = nsnull; //mEntryPoint isn't declared as a nsCOMPtr
This way, the following "if (mEntryPoint)" will always fail and there is 
never going to be an attempt at double destruction if the code ever reaches
there. And speaking of this, is setting "jvm = nsnull" the cleaner way to
destroy the plugin. Will the dtor do the right thing here? It seems one has
to call Shutdown() and then Release()?
 
====
As for the crash at exit, has anybody took a look at the service manager stuff 
that I described above?
rbs, I only changed it to use an nsCOMPtr.  The release of mEntryPoint is only
done if PR_TRUE is passed into TryUnloadPlugin which is only done at shutdown so
the jvm still isn't being released until then.  The QI that do_QueryInterface()
does does an addref and I do the release which means that I don't even unload
early at least as I understand it.
OK, got it. 

And with the fact that nsCOMPtr<> will auto-release when it goes out of scope,
the version of blizzard's nsCOMPtr patch that I would prefer would be:

  PRBool isJava = PR_FALSE;
  if (mEntryPoint) {
    nsCOMPtr<nsIJVMPlugin> jvm
    jvm = do_QueryInterface(mEntryPoint);
    if (jvm) {
      isJava = PR_TRUE;
    }
  }

Some will even prefer nsCOMPtr<nsIJVMPlugin> jvm(do_QueryInterface(mEntryPoint)) 
To summarize: the mistake/lesson from earlier attempts was to use the nsCOMPtr 
'jvm' as a flag; i.e., hold an extra ref to an object that we are trying to 
destroy. Very bad or "really dumb" to quote blizzard. But seeing the large 
number of people cc:ed, let's give to that the benefit of the sheer heat of
the last m0.9 bug :-)

====
[Repeating]
As for the crash at exit, has anybody took a look at the service manager stuff 
that I described above?
rbs, could you attach a patch so andre can try it out on his system as he
seems to be the only one stil seeing a problem. thanks
Actually, I am not sure what andre's problem is. To be more precise, what I am 
referring to is an assertion: "service manager held pass down". And this 
assertion arises when the crash in nsPluginTag::TryUnloadPlugin() in fixed 
(either with string compare, raw QI, nsCOMPtr QI). In other words, the assertion 
won't show up in release builds.

The assertion I am referring to is also mentioned in the comments from "Joe Chou 
2001-05-04 19:54" and "Peter Lubczynski 2001-05-04 20:08" who added that it 
seemed to be win32-specific. Because it is just an assertion in debug builds, 
there shouldn't be more issues for m0.9 as far as release is concerned, but 
it shows that there is a leak of the service manager. So if there is an easy 
fix that the Java people know about, then that will be fine to get it too. 
Otherwise, it is fine too :) that wouldn't be the first and last leak anyway :)

The reason I suggested for the leak is in my comments at "rbs@maths.uq.edu.au 
2001-05-05 00:24" and the patch that will help to trace it is the one that 
changes nsServiceManager.cpp which I added in my following post -- it just 
expands ADDREF/RELEASE in nsServiceManager.cpp and shows that there is a pending 
release that should have been fired by NPOJI600 (it may be too late to 
investigate for m0.9, but could serve as a starting point).
Attached below is a backtrace that I finally managed to get with a cvs snapshot
of gdb. This backtrace was done right after mozilla crashed on exit after
visiting java.sun.com.
Same bug with releasing, just keeps moving around. That's the trace I got 
in bug 77319. Looks like we ADDREF in GetInstance which I think may be causing 
the crash then try to release after Shutdown.
Andre, the patch hasn't been applied to the m0.9 branch. Now wonder that the
crash is still there. Folks, could I checkin this one that I have in my tree?

C:\Mozilla\src-m0.9\mozilla\modules\plugin\nglsrc>cvs diff
cvs server: Diffing .
Index: nsPluginHostImpl.cpp
===================================================================
RCS file: /cvsroot/mozilla/modules/plugin/nglsrc/nsPluginHostImpl.cpp,v
retrieving revision 1.233.4.6
diff -r1.233.4.6 nsPluginHostImpl.cpp
809c809,817
<   nsCOMPtr<nsIJVMPlugin> isJava = do_QueryInterface(mEntryPoint);
---
>   PRBool isJava = PR_FALSE;
>   if (mEntryPoint) {
>     nsIJVMPlugin* jvmPlugin;
>     if (NS_SUCCEEDED(mEntryPoint->QueryInterface(NS_GET_IID(nsIJVMPlugin),
>                     (void**)&jvmPlugin))) {
>       NS_RELEASE(jvmPlugin);
>       isJava = PR_TRUE;
>     }
>   }
Could someone _please_ tell me why we want this fix at all, rather than just 
turning off all plugin unloading until we can find a system (like nsIModule 
has) that won't require us to put special hacks in the code for every plugin 
that wants to use threads?

Why isn't this as simple as just removing the PR_UnloadLibrary call?
I don't know. Drivers?
Actually, this could be the orriginal reason for unloading:

http://bugzilla.mozilla.org/show_bug.cgi?id=58128

4.x style plugins need to be "shutdown" when the last instance goes away. 
Isn't OJI treated as a 4.x style plugin but it can't be unloaded. On the flip 
side, correct me if i'm wrong, XPCOM stay loaded once loaded.

And then, to complicate matters, calling nav.plugins.refresh() needs to 
try to shutdown plugins (all,I think) that are running because their libraries 
may have been replaced by another updated version or removed.
The last peterl's comment is a real threat. We will definitely crash on 
javascript:navigator.plugins.refresh(1) after updating the plugin if the plugin 
dll is not unloaded. And this particular functionality has been wanted for a 
long time.
Sure, being able to unload and reload plugins is a great thing to have, but we 
_don't_ have it if we don't have it safely.  Why not ask the plugin if it's 
safe to unload?  nsIModule::canUnload gives you that, as something to follow.  
(We don't currently unload C++ modules, because many of our modules are crap 
and lie to us when we ask them if it's safe, but don't get me started on that.  
It's not a flaw in the architecture, just a flaw in our general coding 
practices.)  When you put this in, did you really think that it would be safe 
to just unload 3rd-party software like that?  That no plugins would have timers 
or other threads to worry about?  (From reading 58218, it looks like we have to 
call NPP_Shutdown on 4.x plugins, but I can't find anything that says we have 
to unload them.  73071 is analogous, but for new-style plugins.  And, for new-
style plugins, peterl says in January that we _shouldn't_ call NPP_Shutdown for 
XPCOM plugins.  Very confusing.)

"Just like 4.x but..." bothers me a lot: we have a new plugin API, lots of 
which was "designed" to meet the needs of Sun's JVM plugin, and yet we have 
fragile stuff like this, special cased to avoid one specific piece of 
fragility.  Why isn't the OJI plugin using the New Plugin API, and being kept 
in by returning false from its module's canUnload hook?  This plan, and the 
stuff that led up to it, just reeks of bandaid, to me.

(I'm more than willing to walk away from navigator.plugins.refresh(1), or have 
it only pick up new plugins that aren't already known, if we absolutely can't 
find a way to ask a plugin if it's ready to be unloaded.  But it's not like 
this is rocket science, IMO.)
rbs, please file a new bug about the service manager leak wrt java.  It's a
seperate issue than what's being dealt with in this bug.

shaver, open a new bug if you want to talk about the architecture.  Yes, there
are problems but there's so much noise here it's difficult to talk about it.  I
want to see it fixed the right way as much as the next guy but let's deal with
it seperately.

I think we're shipping as is for now.  It crashes less anyway.

edburns, you need to look at andre's stack trace since he's reporting there's
still a problem and deal with it here.  Or open a new bug since this one is so
noisy and link the two with each other with a comment.
I'll file a bug later, but I have precisely zero confidence that we'll ever do 
the right thing here once the 0.9-topcrash heat is taken off.  Another special-
case hack for Sun's JVM in our plugin code, instead of making Sun respect the 
semantics of the API they chose to use, whee.
> ----- Additional Comments From Peter Lubczynski 2001-05-04 23:19 -------
[...]
> optimized bits.  Ed sent me the debug bits but they don't even seem to work, 
> probably because of the reason I talk about in bug 78150.

Peter, do you have JDK1.3.1 rc1 and only JDK1.3.1 rc1 installed on your 
system?  Those plugin bits only work with jdk1.3.1 rc1.  Please de-install all 
other Javas on your system and go to 
<http://developer.java.sun.com/servlet/SessionServlet?
url=http://developer.java.sun.com/developer/earlyAccess/j2sdk131/> and get 
1.3.1 rc2.  Unfortunately 1.3.1 rc1 is no longer offered.
Mike Shaver wrote:

> Why isn't the OJI plugin using the New Plugin API, and being kept 
> in by returning false from its module's canUnload hook?  This plan, and the 
> stuff that led up to it, just reeks of bandaid, to me.

Doch, Sun's java plugin does use the new interface:

// CNetscapePlugin.h  by Stanley Man-Kit Ho
// [...]

#ifndef CNETSCAPEPLUGIN_H
#define CNETSCAPEPLUGIN_H

class CPluginView;

class CNetscapePlugin : public nsIPluginInstance 
{

// [...]
};

// CPluginModule.h  by Stanley Man-Kit Ho
// [...]

#ifndef __PLUGINMODULE_H_
#define __PLUGINMODULE_H_

#include "resource.h"       // main symbols

#include "oji_clsid.h"

/////////////////////////////////////////////////////////////////////////////
// CPluginModule
class ATL_NO_VTABLE CPluginModule : 
// [...] super secret inheritance deleted...shhhh
	public IPluginModule
{
public:
    CPluginModule() {}
// [...]
public:

    // IPluginModule
    STDMETHOD_(nsresult, NSGetFactory) (const nsCID &aClass,
					nsISupports *aSupport,
                                        nsIFactory **aFactory);
    STDMETHOD_(PRBool, NSCanUnload) (void);
    STDMETHOD_(nsresult, NSRegisterSelf) (const char* path);
    STDMETHOD_(nsresult, NSUnregisterSelf) (const char* path);

};

// JavaPluginFactory.h  by Stanley Man-Kit Ho
// [...]
class CJavaPluginFactory : public nsIJVMPlugin, public nsIPlugin
{
// [...]
};


But it doesn't implement nsIModule. 
If it's using the XPCOM plugin interface, then it must be an XPCOM component,
right?  Which means that something must implement nsIModule (or NSGetFactory,
but I told you to get rid of that many milestones ago, and it's going to break
hard in 0.9.1, so I'll ignore than possibility), right?  And we don't unload any
modules once we've created them, and we never _will_ until shutdown, if they
return false from canUnload.

So, from the fact that this bug is happening at all, I think it's pretty clear
that either: 
 - the Sun OJI plugin doesn't really use the new-style plugin interface, or
 - the new-style plugin interface is savagely broken in ways I didn't previously
   suspect -- and my suspicions on that front are quite well-developed -- and
   actually _does_ unload XPCOM plugins.

The plot thickens.
Just to reassure Mike:  XPCOM plugins do not actually get unloaded when the 
plugin host thinks it is unloading them (nsPluginTag::TryUnloadPlugin) because 
the component manager continues to hold a ref to the nsIModule and nsIPlugin.
Sean: that's what I expected, but thanks for the reassurance.  So, if we're
seeing this crash, and Sun's plugin is using the XPCOM plugin interfaces, what
the heck is going on?  I'll be quite surprised, and even more frightened, if
it's an XPCOM bug.
Reference has been made to special code that preloads Java.  Maybe need to look 
at that (if it exists)?
Wow, if there's still a reference in the component manager then we probably
shouldn't be calling PR_UnloadLibrary() on any XPCOM-based plugin modules. Aeii!
The plugin host also does a PR_LoadLibrary.  So as long as the refcnt code on 
libraries is working, then the plugin host's call to PR_UnloadLibrary *should* 
be ok, right?
Please see:

http://bugzilla.mozilla.org/show_bug.cgi?id=79196

about the long term unloading issue.  This bug is just too noisy.
I did some more tracing on Peter's checked in patch:
...
  nsCOMPtr<nsIJVMPlugin> isJava;
  if (mEntryPoint) {
    isJava = do_QueryInterface(mEntryPoint);
  }

  //if (isJava && !aForceShutdown) return;
  if (isJava) return;
...


and found out that the flag, aForceShutdown, played a decisive role:

a)In the case of "about:plugins", isJava was non-null, but  was 1, therefore
skipped :
	if (isJava && !aForceShutdown) return;
and went to unload and crashed.

b) In the case of leaving an applet page (and Back/Forward with other pages),
isJava was non-null, and aForceShutdown was 0, therefore returned at: 
	if (isJava && !aForceShutdown) return;
and no crash.

Therefore, I took out the checking for aForceShutdown, and it worked in both
cases, plus the case of "launch and quite". No crashes. Did that mean anything?

If someone wants to try the Linux debug build of Mozilla with the debug build of
Java plugin, there two things you may need to know:
1) Get a recent trunk build. My build was checked out on last Friday, at it was
the first time that a debug mozilla ever worked with a debug Java plugin for me.
The same Java plugin did not work with the earlier builds of mozilla.
2) Expect longer startup time of mozilla with a debugger. In my case (using
gdb), it took longer than half an hour (no kidding!) to just bring up the
browser. It used around 99% cpu and most of the memory most of the startup time.
I have a pretty fast pentium with 256 mb of memory. After the browser came up,
speed went back to normal, but most of the memory was still occupied.
Sun's plugin implements these interfaces to be an XPCOM component:

    STDMETHOD_(nsresult, NSGetFactory) (const nsCID &aClass,
					nsISupports *aSupport,
                                        nsIFactory **aFactory);
    STDMETHOD_(PRBool, NSCanUnload) (void);
    STDMETHOD_(nsresult, NSRegisterSelf) (const char* path);
    STDMETHOD_(nsresult, NSUnregisterSelf) (const char* path);

It doesn't implement nsIModule.

I'll ask Stanley to explain why they don't implement nsIModule.

Joe, this is what I was worred about with aForceShutdown. I didn't realize that 
ReloadPlugin takes almost the same code path as actually shutting down so 
aForceShutdown being true sill force the Java XPCOM plugin to shutdown and 
unload pre-maturaly. You can't remove it all together because we do need to 
shutdown for 4.x but not for XPCOM.

Hm...I have another idea, do the reverse....

try doing a modification of the QI for Java. Instead of bailing when we have a 
nsIJVMPlugin, do the oppposite: do a QI for a ns4xPlugin and if that fails, 
bail, because we are an XPCOM plugin and service manager should shut us down.

What about that approach to solve this? Skips all this unloading for XPCOM 
plugins but keeps parity with 4.x.
I also tried to just take out the "!":
	...
 	if (isJava && aForceShutdown) return;
	...
and it also seemed to work OK, passed all three test cases. 
Joe, try something like:

What if we did something similar to:

  PRBool isXPCOM = PR_FALSE;
  if (mEntryPoint) {
    nsCOMPtr<nsI4xPlugin> the4xPlugin
    the4xPlugin = do_QueryInterface(mEntryPoint);
    if (!the4xPlugin) {
      isXPCOM = PR_TRUE;
    }
  }
  if (isXPCOM) return; // bail if XPCOM

My need to call Release() first, I forget what the refcount is of an XPCOM 
plugin at this point in execution.

Re-assigning to blizzard@mozilla.org to get it off my plate. Manager says I need 
to concentrate to fix Windows bugs for 0.9.1 embedding and this is bug is Linux 
(although VERY important). Please re-assign if you feel otherwise.

BTW, what is the current status of patches check-in to the trunk/branch on this 
bug?
Assignee: peterlubczynski → blizzard
Status: ASSIGNED → NEW
Peter: I tried the last patch (using 4x plugin), and it crashed when leaving an
applet. Taking out only "!" from your checkined fix seemed working for all three
cases (leaving an applet page, launch and exit, go to about:plugins), as the
following:
	old: if (isJava && !aForceShutdown) return;
	new: if (isJava && aForceShutdown) return;
Joe, my bad. There is no interface nsI4xPlugin, but rather ns4xPlugin implements 
nsIPlugin which is probably why that patch failed. In fact, what did you try 
since nsI4xPlugin won't compile? It makes sense that removing the bang from 
aForceShutdown fixes the problem as the code below is NEVER executed for the 
Java plugin. It prevents the crash...but only for Java. Other XPCOM plugins 
(like Real) will surely crash here.

Hm....I'm going on a hunch here that bailing for any XPCOM plugin is the right 
thing to do. Can anyone back me up?

Joe, can you try modifying my last patch. Instead of the doing the QI, check to 
see if mEntryPoints can be cast to an ns4xPlugin. If you are able to get a 
ns4xPlugin out of mEntryPoints then we have a 4.x plugin for sure and that rest 
of the code needs to be called.

I'll do some more testing when I get home.
Assuming you mean dynamic_cast<>, you can't use that in Mozilla: not all of our
compilers support it, and, in fact, we turn off RTTI on Linux explicitly.  (I
don't think static_cast<> or reinterpret_cast<> will tell you anything useful.)

Maybe we do need a (dummy) nsI4xPlugin interface, perhaps with an Unload/Reload
method where all of this logic could live?
>I'm going on a hunch here that bailing for any XPCOM plugin is the right 
>thing to do. Can anyone back me up?

Bailing during the ReloadPlugins sequence? yes.
Bailing at xpcom-shutdown for properly implemented xpcom plugins?  No way.  If
you do that, then just get rid of nsIPlugin::Initialize and nsIPlugin::Shutdown
altogether.  They weren't even being called until a couple days before 0.8.1 was
branched (bug 45009).


Wow. Just read back over bug 45009 and this bug now makes a whole lot more 
sense. Isn't that the bug that introduced holding the service manager past XPCOM 
shutdown as Andrei says that nsPluginHostImpl's dtor isn't 
called and nsIObserver was added. Ed, looks like you started with the patches, 
what's your take? cc:ing Waterson on the hook for the sr=

Question for the experts: When is the correct time to call 
nsIPlugin->Shutdown()? Shouldn't refcouting handle when shutdown gets called in 
XPCOM land? Should we call nsIPlugin->Shutdown before the RELEASE? For 4.x 
plugins it is a different case as nsIPlugin->Shutdown can be called not only at 
shutdown, but during ReloadPlugins and when the last instance gets Destroyed.

Sean says that "PR_FindSymbol(pluginTag->mLibrary, "NSGetFactory");
is not satisfactory for determining if a plugin module is an xpcom plugin or a 
4x plugin". Why? Will that not easily distinguish XPCOM plugins from 4.x at the 
library level?

I will try creating another band-aid patch to test. I'm almost all out of 
band-aids, though. :-)
>Sean says that "PR_FindSymbol(pluginTag->mLibrary, "NSGetFactory");
>is not satisfactory for determining if a plugin module is an xpcom plugin or a 
>4x plugin". Why? Will that not easily distinguish XPCOM plugins from 4.x at the
>library level?

NSGetFactory was deprecated a long time ago and is going away in 0.9.1.  XPCOM
components don't use NSGetFactory, they use NSGetModule.


>I will try creating another band-aid patch to test. I'm almost all out of 
>band-aids, though. :-)

0.9 was released today so I don't think we need another bandaid on account of it.

Peter: re bug 45009, I recommend talking to Andrei about it, since the fix that 
was checked in differed substantially from the patch I started with.  My 
interest in bug 45009 was to un-block bug 49510.
SPAM: mozilla 0.9 (and M1, and 0.8.1, etc.) has left the building.  please
update the target milestone so we can get a good idea of what's left for 0.9.1.
Patches look to be checked in and I think this is working. Marking FIXED. Please 
re-open if you think otherwise.

Note: bug 76936 will now deal with the crash in 
@nsPluginInstanceOwner::GetInstance
Status: NEW → RESOLVED
Closed: 23 years ago23 years ago
Resolution: --- → FIXED
Oops. I mean bug 80105.
works fine on linux/win 20010517 trunk builds. Junruh, pls verify on mac. Thx!
OS: All → Mac System 9.x
QA Contact: shrir → junruh
Hardware: PC → Macintosh
Verified on Mac netscape 6 2001051708
Status: RESOLVED → VERIFIED
Please reopen : this is a regression 
about:plugin crashes Mozilla

Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.3; MultiZilla v1.4.0.1)
Gecko/20030317
Yves Lambert, <about:plugins> is not "an applet page" and not even a page with
plugins, it just lists them, it doesn't use plugins. Please file a new bug (even
if it had the same symptoms, but it doesn't).
Product: Core → Core Graveyard
Crash Signature: [@ nsPluginInstanceOwner::~nsPluginInstanceOwner]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: