144838 - [viewpoint] Crashed destroying ?scriptable interface? after reload

Reporter

Description

•

22 years ago

From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)
BuildID:    1.0 branch

When our plugin tries to load content and is told that it is too old, it begins 
an autoupdate process.  It brings down a new dll and xpt (which we use for 
sending javascript in and out of the plugin).
After we have the new dll and xpt in the correct folders, we GetURL
("javascript:navigator.plugins.refresh(true)").  This causes the page to get 
reloaded, and our new plugin is loaded and starts to show content.
A few seconds later, Mozilla crashes.  I think that it is the old scriptable 
interface being destroyed (garbage collection?), but I could be wrong. Here is 
the call stack:
nsCOMPtr<nsIClassInfo>::~nsCOMPtr<nsIClassInfo>() line 490 + 13 bytes
XPCWrappedNativeProto::~XPCWrappedNativeProto() line 83 + 11 bytes
XPCWrappedNativeProto::`scalar deleting destructor'(unsigned int 1) + 15 bytes
DyingProtoKiller(JSDHashTable * 0x010fc8c8, JSDHashEntryHdr * 0x03950000, 
unsigned long 10, void * 0x00000000) line 200 + 28 bytes
JS_DHashTableEnumerate(JSDHashTable * 0x010fc8c8, int (JSDHashTable *, 
JSDHashEntryHdr *, unsigned long, void *)* 0x014e9e50 DyingProtoKiller
(JSDHashTable *, JSDHashEntryHdr *, unsigned long, void *), void * 0x00000000) 
line 599 + 34 bytes
XPCWrappedNativeProtoMap::Enumerate(JSDHashOperator (JSDHashTable *, 
JSDHashEntryHdr *, unsigned long, void *)* 0x014e9e50 DyingProtoKiller
(JSDHashTable *, JSDHashEntryHdr *, unsigned long, void *), void * 0x00000000) 
line 639 + 27 bytes
XPCJSRuntime::GCCallback(JSContext * 0x0352b428, JSGCStatus JSGC_FINALIZE_END) 
line 485
DOMGCCallback(JSContext * 0x0352b428, JSGCStatus JSGC_FINALIZE_END) line 1622 + 
23 bytes
js_GC(JSContext * 0x0352b428, unsigned int 0) line 1345 + 12 bytes
js_ForceGC(JSContext * 0x0352b428, unsigned int 0) line 979 + 13 bytes
JS_GC(JSContext * 0x0352b428) line 1644 + 11 bytes
nsJSContext::Notify(nsJSContext * const 0x0352b248, nsITimer * 0x03859340) line 
1569 + 13 bytes
nsTimerImpl::Fire() line 348
nsTimerManager::FireNextIdleTimer(nsTimerManager * const 0x01792f48) line 591
nsAppShell::Run(nsAppShell * const 0x016d0b10) line 134
nsAppShellService::Run(nsAppShellService * const 0x016d0568) line 451
main1(int 1, char * * 0x00307c48, nsISupports * 0x00000000) line 1431 + 32 bytes
main(int 1, char * * 0x00307c48) line 1779 + 37 bytes
mainCRTStartup() line 338 + 17 bytes
KERNEL32! 77e97d08()

In the NSCOMPTR destructor:
          if ( mRawPtr )
            NSCAP_RELEASE(this, mRawPtr);
        }
mRawPtr is not NULL, but in the debugger it looks like it is probably not valid.

Looks to me like some timer is going off, which causes this object to get 
destroyed, but it is already gone (because the page was reloaded already? 
Because we replaced it with a new xpt?)


Reproducible: Sometimes
Steps to Reproduce:
1.Giving steps to reproduce this will be difficult.  If somebody needs to try 
to reproduce it, they can contact me.  If there are any suggestions as to how I 
could debug it myself and what to look for, I can work with someone.

rubydoo123

Comment 1

•

22 years ago

assigning to peterl

Assignee: beppe → peterl

Priority: -- → P1

Whiteboard: [ViewPoint]

Target Milestone: --- → mozilla1.0

Peter Lubczynski

Comment 2

•

22 years ago

....wonder if its something with garbage collection????

The javascript:navigator.plugins.refresh(1) command causes all running plugins
to be shutdown, even in other windows.

Status: UNCONFIRMED → NEW

Ever confirmed: true

Keywords: crash

Summary: Crashed destroying ?scriptable interface? after reload → [viewpoint] Crashed destroying ?scriptable interface? after reload

Whiteboard: [ViewPoint]

John Bandhauer

Comment 3

•

22 years ago

I would be very suprised if xpconnect is doing the refcounting wrong. I would
bet that there is a refcount imbalance either in the plugin subsystem or in the
plugin itself. If the plugin is unloading itself (and in effect deleting the
objects it has created - well maybe not the object itself if you are using the
shared heap functions, but certainly the vtbls and method code) while there are
still valid references held to those objects then you are going to crash like
this. We saw something like this once before where a dll component was
explicitly deleting objects and unloading its dll at app shutdown time with no
regard to the objects' ref counts and the normal course of the release calls as
other code released references during the shutdown sequence. That is certainly bad.

I'm guessing that you are using the nsIClassInfo mixin code in your plugin. It
is probably that classinfo that is being released here. JS GC happens when it
happens. Objects need to respect the references held and not delete or unload
prematurely.

Peter Lubczynski

Comment 4

•

22 years ago

Ari, do you see someone calling ~nsIClassInfo first?

I believe we should be keeping all scriptable plugins in memory and no unloading
them but that could be the bug here, especially with a TRUE argument to
navigator.plugins.refresh(1) which is supposed to shutdown and unload.

Hm...this maybe a long shot, but I wonder if calling
NPN_SetValue(NPPVpluginKeepLibraryInMemory, TRUE) helps?

Ari Berger

Reporter

Comment 5

•

22 years ago

Of course, while I could repeat this crash easily yesterday, and QA can still 
repeat it, I am having trouble trying to repeat it today.  However, I do have 
this information to offer:
1) We never explicitly destroy our scriptable class (derived off of 
classInfoMixin).  We merely call Release.  Normally, this causes it to get 
destroyed on exit:
CPluginInstanceScriptable::~CPluginInstanceScriptable() line 4142
CPluginInstanceScriptable::`scalar deleting destructor'(unsigned int 1) + 15 
bytes
CPluginInstanceScriptable::Release(CPluginInstanceScriptable * const 
0x037f8d48) line 4158 + 68 bytes
CPluginInstance::Destroy() line 736
NPP_Destroy(_NPP * 0x03a57400, _NPSavedData * * 0x0012e2c0) line 381
ns4xPluginInstance::Stop(ns4xPluginInstance * const 0x03a573e8) line 710 + 68 
bytes
nsActivePluginList::stopRunning() line 629
nsPluginHostImpl::ReloadPlugins(nsPluginHostImpl * const 0x0182a278, int 1) 
line 2751
PluginArrayImpl::Refresh(PluginArrayImpl * const 0x038b5300, int 1) line 204 + 
27 bytes

I do NOT think that the object is being deleted twice when it crashes.  I think 
what happens is that you try to delete the object (through the callstack that I 
posted yesterday) but the dll has already been unloaded!
The way that I see this is that if I add the code that Peter suggested trying 
(NPN_SetValue(NPPVpluginKeepLibraryInMemory, TRUE)) 
Then the destructor is NOT called on page refresh, and I get the same crash as 
yesterday (the crash is in trying to destroy the nsiclassinfo).  When I look at 
the modules currently loaded in memory, our new plugin (different filename) is 
currently running, but the OLD plugin (whose nsiclassinfo is being deleted) is 
not in memory!

So i think that whoever is deciding to not destroy the scriptable until garbage 
collection time is still allowing the library to actually get unloaded.

John Bandhauer

Comment 6

•

22 years ago

Sounds like an apt assessment to me. Unfortunately, (as I understand things) the
mechanisms of JS GC and xpcom refcounting are quite independent of the plugin
manager's mechanisms for dll unloading. I don't have a handy solution to offer.
Any one-off solutions that might be suggested for this specific case regarding
xpconnect would be, I think, off the mark. Once plugins are doing xpcom, then
*any* code might be holding valid references to objects created by the plugin.
I'd say the plugin manager has a real problem on its hands here in deciding (in
concert with the plugin itself probably) whether of not it is truly safe to
unload a given plugin dll at a given time. This is an issue me decided was too
thorny to handle for generic xpcom component dlls - for which we opted to simply
not implement dll unloading. Perhaps that is not an option here. But, I can't
say what the best solution will be.

David Bradley

Comment 7

•

22 years ago

FWIW, your stack trace doesn't look like one that is caused by an access
violation from a DLL that was unloaded. If the DLL was unloaded you should have
something like 12345678() as the last entry in the stack (was that the case and
it was just omitted?). The jump will actually be made, and the violation occurs
at the former location of the DLL.

Ari Berger

Reporter

Comment 8

•

22 years ago

good point re: access violation. But since I don't have nay better ideas at the 
moment, I will try calling loadLibrary on ourselves to avoid getting unloaded 
until the process goes away, and see if QA sees any difference.
Any other thoughts?

Peter Lubczynski

Comment 9

•

22 years ago

I think there may be a mess-up in the refcount. This is starting to show up in a
few places. Possible dups:
bug 148889
bug 144838
bug 109281

Peter Lubczynski

Comment 10

•

22 years ago

So bug 150764 also has a similar stack. In that bug, it appears that if the
plugin asks the host not to unload the library, we don't crash.

Since all these stacks looks like they are going through garbage collection,
would it be possible to fix this crash by forcing a GC to happen before
unloading the library?

John Bandhauer

Comment 11

•

22 years ago

> ...would it be possible to fix this crash by forcing a GC... ?

No, that would be a very bad bet. It assumes that doing a gc will for sure
release all references. But if the JS code happens to hold a reference then that
assumption will be incorrect. e.g...

var foo = some_plugin_interface;

// do the stuff that unloads with a forced gc. xpconnect does not release 
// references into plugin here because the JS code holds a ref. 

var foo = null;

// gc happens sometime later and xpconnect does try to release. bad things happen

The above noted bugs are clearly dups - and should be marked as such. 

Either there is a refcounting error - some code is releasing refs held by other
code - or you are unloading a dll for which there are still live referenced
objects. You need to figure out how to fix the real problem. I don't buy the
idea that the occurance of gc is the source of the problem or that invoking fgc
at a particular time is a viable solution.

Peter Lubczynski

Comment 12

•

22 years ago

Well...yeah...I think we are running into the problem where we still have a ref
to one of the plugin's objects and we are unloading their DLL. In bug 148889,
Jeff from EA say if he tells the us not to unload his plugin (through a special
NPAPI call), the crash does not happen. Serge attached a patch that also fixes
this problem by basically automatically doing the same thing if the plugin tells
us it's scriptable.  How can we tell if all references to the plugin have been
released? It looks like we may just have to leak the library if no other
solution can be found  but it could break plugin upgrade procedures where they
depend that a plugins.refresh(1) unloads the DLL.

Since bug 148889 has a patch and most of the same comments, I'm gona make that
the bug others are dupped against.

*** This bug has been marked as a duplicate of 148889 ***

Status: NEW → RESOLVED

Closed: 22 years ago

Resolution: --- → DUPLICATE

shrirang khanzode

Comment 13

•

22 years ago

mass duplicate verifications . For filtering purposes, pls use keywd
"massdupverification"

Status: RESOLVED → VERIFIED

BMO Automation

Updated

•

2 years ago

Product: Core → Core Graveyard

Bugzilla

Quick Search

[viewpoint] Crashed destroying ?scriptable interface? after reload

Categories

(Core Graveyard :: Plug-ins, defect, P1)

Tracking

(Not tracked)

People

(Reporter: aberger, Assigned: peterl-bugs)

References

(
URL
)

Details

(Keywords: crash)

Crash Data

Security

(public)

User Story

Description

Comment 1

Comment 2

Comment 3

Comment 4

Comment 5

Comment 6

Comment 7

Comment 8

Comment 9

Comment 10

Comment 11

Comment 12

Comment 13

Updated