Closed Bug 144838 Opened 22 years ago Closed 22 years ago

[viewpoint] Crashed destroying ?scriptable interface? after reload

Categories

(Core Graveyard :: Plug-ins, defect, P1)

x86
Windows 2000
defect

Tracking

(Not tracked)

VERIFIED DUPLICATE of bug 148889
mozilla1.0

People

(Reporter: aberger, Assigned: peterl-bugs)

References

()

Details

(Keywords: crash)

From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)
BuildID:    1.0 branch

When our plugin tries to load content and is told that it is too old, it begins 
an autoupdate process.  It brings down a new dll and xpt (which we use for 
sending javascript in and out of the plugin).
After we have the new dll and xpt in the correct folders, we GetURL
("javascript:navigator.plugins.refresh(true)").  This causes the page to get 
reloaded, and our new plugin is loaded and starts to show content.
A few seconds later, Mozilla crashes.  I think that it is the old scriptable 
interface being destroyed (garbage collection?), but I could be wrong. Here is 
the call stack:
nsCOMPtr<nsIClassInfo>::~nsCOMPtr<nsIClassInfo>() line 490 + 13 bytes
XPCWrappedNativeProto::~XPCWrappedNativeProto() line 83 + 11 bytes
XPCWrappedNativeProto::`scalar deleting destructor'(unsigned int 1) + 15 bytes
DyingProtoKiller(JSDHashTable * 0x010fc8c8, JSDHashEntryHdr * 0x03950000, 
unsigned long 10, void * 0x00000000) line 200 + 28 bytes
JS_DHashTableEnumerate(JSDHashTable * 0x010fc8c8, int (JSDHashTable *, 
JSDHashEntryHdr *, unsigned long, void *)* 0x014e9e50 DyingProtoKiller
(JSDHashTable *, JSDHashEntryHdr *, unsigned long, void *), void * 0x00000000) 
line 599 + 34 bytes
XPCWrappedNativeProtoMap::Enumerate(JSDHashOperator (JSDHashTable *, 
JSDHashEntryHdr *, unsigned long, void *)* 0x014e9e50 DyingProtoKiller
(JSDHashTable *, JSDHashEntryHdr *, unsigned long, void *), void * 0x00000000) 
line 639 + 27 bytes
XPCJSRuntime::GCCallback(JSContext * 0x0352b428, JSGCStatus JSGC_FINALIZE_END) 
line 485
DOMGCCallback(JSContext * 0x0352b428, JSGCStatus JSGC_FINALIZE_END) line 1622 + 
23 bytes
js_GC(JSContext * 0x0352b428, unsigned int 0) line 1345 + 12 bytes
js_ForceGC(JSContext * 0x0352b428, unsigned int 0) line 979 + 13 bytes
JS_GC(JSContext * 0x0352b428) line 1644 + 11 bytes
nsJSContext::Notify(nsJSContext * const 0x0352b248, nsITimer * 0x03859340) line 
1569 + 13 bytes
nsTimerImpl::Fire() line 348
nsTimerManager::FireNextIdleTimer(nsTimerManager * const 0x01792f48) line 591
nsAppShell::Run(nsAppShell * const 0x016d0b10) line 134
nsAppShellService::Run(nsAppShellService * const 0x016d0568) line 451
main1(int 1, char * * 0x00307c48, nsISupports * 0x00000000) line 1431 + 32 bytes
main(int 1, char * * 0x00307c48) line 1779 + 37 bytes
mainCRTStartup() line 338 + 17 bytes
KERNEL32! 77e97d08()

In the NSCOMPTR destructor:
          if ( mRawPtr )
            NSCAP_RELEASE(this, mRawPtr);
        }
mRawPtr is not NULL, but in the debugger it looks like it is probably not valid.

Looks to me like some timer is going off, which causes this object to get 
destroyed, but it is already gone (because the page was reloaded already? 
Because we replaced it with a new xpt?)


Reproducible: Sometimes
Steps to Reproduce:
1.Giving steps to reproduce this will be difficult.  If somebody needs to try 
to reproduce it, they can contact me.  If there are any suggestions as to how I 
could debug it myself and what to look for, I can work with someone.
assigning to peterl
Assignee: beppe → peterl
Priority: -- → P1
Whiteboard: [ViewPoint]
Target Milestone: --- → mozilla1.0
....wonder if its something with garbage collection????

The javascript:navigator.plugins.refresh(1) command causes all running plugins
to be shutdown, even in other windows.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Keywords: crash
Summary: Crashed destroying ?scriptable interface? after reload → [viewpoint] Crashed destroying ?scriptable interface? after reload
Whiteboard: [ViewPoint]
I would be very suprised if xpconnect is doing the refcounting wrong. I would
bet that there is a refcount imbalance either in the plugin subsystem or in the
plugin itself. If the plugin is unloading itself (and in effect deleting the
objects it has created - well maybe not the object itself if you are using the
shared heap functions, but certainly the vtbls and method code) while there are
still valid references held to those objects then you are going to crash like
this. We saw something like this once before where a dll component was
explicitly deleting objects and unloading its dll at app shutdown time with no
regard to the objects' ref counts and the normal course of the release calls as
other code released references during the shutdown sequence. That is certainly bad.

I'm guessing that you are using the nsIClassInfo mixin code in your plugin. It
is probably that classinfo that is being released here. JS GC happens when it
happens. Objects need to respect the references held and not delete or unload
prematurely.
Ari, do you see someone calling ~nsIClassInfo first?

I believe we should be keeping all scriptable plugins in memory and no unloading
them but that could be the bug here, especially with a TRUE argument to
navigator.plugins.refresh(1) which is supposed to shutdown and unload.

Hm...this maybe a long shot, but I wonder if calling
NPN_SetValue(NPPVpluginKeepLibraryInMemory, TRUE) helps?
Of course, while I could repeat this crash easily yesterday, and QA can still 
repeat it, I am having trouble trying to repeat it today.  However, I do have 
this information to offer:
1) We never explicitly destroy our scriptable class (derived off of 
classInfoMixin).  We merely call Release.  Normally, this causes it to get 
destroyed on exit:
CPluginInstanceScriptable::~CPluginInstanceScriptable() line 4142
CPluginInstanceScriptable::`scalar deleting destructor'(unsigned int 1) + 15 
bytes
CPluginInstanceScriptable::Release(CPluginInstanceScriptable * const 
0x037f8d48) line 4158 + 68 bytes
CPluginInstance::Destroy() line 736
NPP_Destroy(_NPP * 0x03a57400, _NPSavedData * * 0x0012e2c0) line 381
ns4xPluginInstance::Stop(ns4xPluginInstance * const 0x03a573e8) line 710 + 68 
bytes
nsActivePluginList::stopRunning() line 629
nsPluginHostImpl::ReloadPlugins(nsPluginHostImpl * const 0x0182a278, int 1) 
line 2751
PluginArrayImpl::Refresh(PluginArrayImpl * const 0x038b5300, int 1) line 204 + 
27 bytes

I do NOT think that the object is being deleted twice when it crashes.  I think 
what happens is that you try to delete the object (through the callstack that I 
posted yesterday) but the dll has already been unloaded!
The way that I see this is that if I add the code that Peter suggested trying 
(NPN_SetValue(NPPVpluginKeepLibraryInMemory, TRUE)) 
Then the destructor is NOT called on page refresh, and I get the same crash as 
yesterday (the crash is in trying to destroy the nsiclassinfo).  When I look at 
the modules currently loaded in memory, our new plugin (different filename) is 
currently running, but the OLD plugin (whose nsiclassinfo is being deleted) is 
not in memory!

So i think that whoever is deciding to not destroy the scriptable until garbage 
collection time is still allowing the library to actually get unloaded.
Sounds like an apt assessment to me. Unfortunately, (as I understand things) the
mechanisms of JS GC and xpcom refcounting are quite independent of the plugin
manager's mechanisms for dll unloading. I don't have a handy solution to offer.
Any one-off solutions that might be suggested for this specific case regarding
xpconnect would be, I think, off the mark. Once plugins are doing xpcom, then
*any* code might be holding valid references to objects created by the plugin.
I'd say the plugin manager has a real problem on its hands here in deciding (in
concert with the plugin itself probably) whether of not it is truly safe to
unload a given plugin dll at a given time. This is an issue me decided was too
thorny to handle for generic xpcom component dlls - for which we opted to simply
not implement dll unloading. Perhaps that is not an option here. But, I can't
say what the best solution will be.
FWIW, your stack trace doesn't look like one that is caused by an access
violation from a DLL that was unloaded. If the DLL was unloaded you should have
something like 12345678() as the last entry in the stack (was that the case and
it was just omitted?). The jump will actually be made, and the violation occurs
at the former location of the DLL.
good point re: access violation. But since I don't have nay better ideas at the 
moment, I will try calling loadLibrary on ourselves to avoid getting unloaded 
until the process goes away, and see if QA sees any difference.
Any other thoughts?
I think there may be a mess-up in the refcount. This is starting to show up in a
few places. Possible dups:
bug 148889
bug 144838
bug 109281
So bug 150764 also has a similar stack. In that bug, it appears that if the
plugin asks the host not to unload the library, we don't crash.

Since all these stacks looks like they are going through garbage collection,
would it be possible to fix this crash by forcing a GC to happen before
unloading the library?
> ...would it be possible to fix this crash by forcing a GC... ?

No, that would be a very bad bet. It assumes that doing a gc will for sure
release all references. But if the JS code happens to hold a reference then that
assumption will be incorrect. e.g...

var foo = some_plugin_interface;

// do the stuff that unloads with a forced gc. xpconnect does not release 
// references into plugin here because the JS code holds a ref. 

var foo = null;

// gc happens sometime later and xpconnect does try to release. bad things happen

The above noted bugs are clearly dups - and should be marked as such. 

Either there is a refcounting error - some code is releasing refs held by other
code - or you are unloading a dll for which there are still live referenced
objects. You need to figure out how to fix the real problem. I don't buy the
idea that the occurance of gc is the source of the problem or that invoking fgc
at a particular time is a viable solution.
Well...yeah...I think we are running into the problem where we still have a ref
to one of the plugin's objects and we are unloading their DLL. In bug 148889,
Jeff from EA say if he tells the us not to unload his plugin (through a special
NPAPI call), the crash does not happen. Serge attached a patch that also fixes
this problem by basically automatically doing the same thing if the plugin tells
us it's scriptable.  How can we tell if all references to the plugin have been
released? It looks like we may just have to leak the library if no other
solution can be found  but it could break plugin upgrade procedures where they
depend that a plugins.refresh(1) unloads the DLL.

Since bug 148889 has a patch and most of the same comments, I'm gona make that
the bug others are dupped against.

*** This bug has been marked as a duplicate of 148889 ***
Status: NEW → RESOLVED
Closed: 22 years ago
Resolution: --- → DUPLICATE
mass duplicate verifications . For filtering purposes, pls use keywd
"massdupverification"

Status: RESOLVED → VERIFIED
Product: Core → Core Graveyard
You need to log in before you can comment on or make changes to this bug.