Closed Bug 421217 Opened 16 years ago Closed 16 years ago

blocklist old Silverlight 2.0 beta versions - : crashes when closing tab/window- NPRuntime object reference counting is violated by NPObjWrapperPluginDestroyedCallback [@ JS_SetPrivate - NPObjWrapperPluginDestroyedCallback]

Categories

(Toolkit :: Blocklist Policy Requests, defect)

defect
Not set
blocker

Tracking

()

VERIFIED INVALID

People

(Reporter: phiw2, Unassigned)

References

()

Details

(Keywords: crash, topcrash)

Crash Data

Attachments

(4 files)

Attached file camino crash log
view some Silverlight 2.0 pages
(sample linked form http://silverlight.net/
close tab, crash.

Bothe with Minefield and Camino trunk

http://crash-stats.mozilla.com/report/index/85b42d75-eb26-11dc-9101-001a4bd43ef6
Signature	JS_SetPrivate
UUID	85b42d75-eb26-11dc-9101-001a4bd43ef6
Time	2008-03-05 18:38:54-08:00
Uptime	0
Product	Firefox
Version	3.0b5pre
Build ID	2008030504
OS	Mac OS X
OS Version	10.5.2 9C31
CPU	x86
CPU Info	GenuineIntel family 6 model 15 stepping 11
Crash Reason	EXC_BAD_ACCESS / KERN_PROTECTION_FAILURE
Crash Address	0x9fe3d
Comments	More silverlight 2 beta crash (when closing the window)
Crashing Thread
Frame 	Signature 	Source
0 	JS_SetPrivate 	mozilla/js/src/jsapi.c:2818
1 	NPObjWrapperPluginDestroyedCallback(PLDHashTable*, PLDHashEntryHdr*, unsigned int, void*) 	mozilla/modules/plugin/base/src/nsJSNPRuntime.cpp:1774
2 	PL_DHashTableEnumerate 	pldhash.c:724
3 	nsJSNPRuntime::OnPluginDestroy(_NPP*) 	mozilla/modules/plugin/base/src/nsJSNPRuntime.cpp:1815
4 	ns4xPluginInstance::Stop() 	mozilla/modules/plugin/base/src/ns4xPluginInstance.cpp:960
5 	DoStopPlugin(nsPluginInstanceOwner*) 	mozilla/layout/generic/nsObjectFrame.cpp:1663
6 	nsObjectFrame::StopPluginInternal(int) 	mozilla/layout/generic/nsObjectFrame.cpp:1723
7 	nsObjectFrame::Destroy() 	mozilla/layout/generic/nsObjectFrame.cpp:549
8 	nsFrameList::DestroyFrames() 	mozilla/layout/generic/nsFrameFrame.cpp:67
9 	nsContainerFrame::Destroy() 	mozilla/layout/generic/nsContainerFrame.cpp:257
10 	nsLineBox::DeleteLineList(nsPresContext*, nsLineList&) 	mozilla/layout/generic/nsLineBox.cpp:340
11 	nsBlockFrame::Destroy() 	mozilla/layout/generic/nsBlockFrame.cpp:300
12 	nsFrameList::DestroyFrames() 	mozilla/layout/generic/nsFrameFrame.cpp:67
13 	nsBlockFrame::Destroy() 	mozilla/layout/generic/nsBlockFrame.cpp:288
14 	nsLineBox::DeleteLineList(nsPresContext*, nsLineList&) 	mozilla/layout/generic/nsLineBox.cpp:340
15 	nsBlockFrame::Destroy() 	mozilla/layout/generic/nsBlockFrame.cpp:300
16 	nsLineBox::DeleteLineList(nsPresContext*, nsLineList&) 	mozilla/layout/generic/nsLineBox.cpp:340
17 	nsBlockFrame::Destroy() 	mozilla/layout/generic/nsBlockFrame.cpp:300
18 	nsLineBox::DeleteLineList(nsPresContext*, nsLineList&) 	mozilla/layout/generic/nsLineBox.cpp:340
19 	nsBlockFrame::Destroy() 	mozilla/layout/generic/nsBlockFrame.cpp:300
20 	nsFrameList::DestroyFrames() 	mozilla/layout/generic/nsFrameFrame.cpp:67
21 	nsContainerFrame::Destroy() 	mozilla/layout/generic/nsContainerFrame.cpp:257
22 	CanvasFrame::Destroy() 	mozilla/layout/generic/nsHTMLFrame.cpp:206
23 	nsFrameList::DestroyFrames() 	mozilla/layout/generic/nsFrameFrame.cpp:67
24 	nsContainerFrame::Destroy() 	mozilla/layout/generic/nsContainerFrame.cpp:257
25 	nsFrameList::DestroyFrames() 	mozilla/layout/generic/nsFrameFrame.cpp:67
26 	nsContainerFrame::Destroy() 	mozilla/layout/generic/nsContainerFrame.cpp:257
27 	nsFrameManager::Destroy() 	mozilla/layout/base/nsFrameManager.cpp:283
28 	PresShell::Destroy() 	mozilla/layout/base/nsPresShell.cpp:1677
29 	DocumentViewerImpl::Hide() 	mozilla/layout/base/nsDocumentViewer.cpp:1981
30 	nsDocShell::SetVisibility(int) 	mozilla/docshell/base/nsDocShell.cpp:3904
31 	nsSubDocumentFrame::Destroy() 	mozilla/layout/generic/nsFrameFrame.cpp:770
32 	nsFrameList::DestroyFrames() 	mozilla/layout/generic/nsFrameFrame.cpp:67
33 	nsContainerFrame::Destroy() 	mozilla/layout/generic/nsContainerFrame.cpp:257
34 	nsFrameList::DestroyFrames() 	mozilla/layout/generic/nsFrameFrame.cpp:67
35 	nsContainerFrame::Destroy() 	mozilla/layout/generic/nsContainerFrame.cpp:257
36 	nsBoxFrame::RemoveFrame(nsIAtom*, nsIFrame*) 	mozilla/layout/xul/base/src/nsBoxFrame.cpp:1024
37 	nsFrameManager::RemoveFrame(nsIFrame*, nsIAtom*, nsIFrame*) 	mozilla/layout/base/nsFrameManager.cpp:694
38 	nsCSSFrameConstructor::ContentRemoved(nsIContent*, nsIContent*, int, int*) 	mozilla/layout/base/nsCSSFrameConstructor.cpp:9649
39 	PresShell::ContentRemoved(nsIDocument*, nsIContent*, nsIContent*, int) 	mozilla/layout/base/nsPresShell.cpp:4739
40 	nsNodeUtils::ContentRemoved(nsINode*, nsIContent*, int) 	mozilla/content/base/src/nsNodeUtils.cpp:166
41 	nsGenericElement::doRemoveChildAt(unsigned int, int, nsIContent*, nsIContent*, nsIDocument*, nsAttrAndChildArray&) 	mozilla/content/base/src/nsGenericElement.cpp:2842
42 	nsGenericElement::RemoveChildAt(unsigned int, int) 	mozilla/content/base/src/nsGenericElement.cpp:2775
43 	nsXULElement::RemoveChildAt(unsigned int, int) 	mozilla/layout/xul/base/src/grid/nsGridRowGroupFrame.cpp:938
44 	nsGenericElement::doRemoveChild(nsIDOMNode*, nsIContent*, nsIDocument*, nsIDOMNode**) 	mozilla/content/base/src/nsGenericElement.cpp:3425
45 	nsGenericElement::RemoveChild(nsIDOMNode*, nsIDOMNode**) 	mozilla/content/base/src/nsGenericElement.cpp:2991
46 	NS_InvokeByIndex_P 	mozilla/xpcom/reflect/xptcall/src/md/unix/xptcinvoke_unixish_x86.cpp:179
47 	XPCWrappedNative::CallMethod(XPCCallContext&, XPCWrappedNative::CallMode) 	mozilla/js/src/xpconnect/src/xpcwrappednative.cpp:2369
48 	XPC_WN_CallMethod(JSContext*, JSObject*, unsigned int, long*, long*) 	mozilla/js/src/xpconnect/src/xpcwrappednativejsops.cpp:1470
49 	js_Invoke 	mozilla/js/src/jsinvoke.c:1265
50 	js_Interpret 	mozilla/js/src/jsinterp.c:4823
51 	js_Invoke 	mozilla/js/src/jsinvoke.c:1281
52 	js_InternalInvoke 	mozilla/js/src/jsinvoke.c:1337
53 	JS_CallFunctionValue 	mozilla/js/src/jsapi.c:4982
54 	nsJSContext::CallEventHandler(nsISupports*, void*, void*, nsIArray*, nsIVariant**) 	mozilla/dom/src/base/nsJSEnvironment.cpp:1961
55 	nsJSEventListener::HandleEvent(nsIDOMEvent*) 	mozilla/dom/src/events/nsJSEventListener.cpp:248
Severity: major → critical
Summary: Silverlight 2.0 beta: crashes when closing tab/window → Silverlight 2.0 beta: crashes when closing tab/window [@ JS_SetPrivate - NPObjWrapperPluginDestroyedCallback]
There are known issues with Silverlight 2 in Firefox trunk.  

--
>>>>>> Will this upcoming Silverlight Beta have support for Firefox 3.0 Beta 3?

FireFox 3.0 will be supported by SL 2 when it ships.  They recently changed how rendering works for plugins with their recent beta, and we know we have some issues with that with Beta1.  We'll be fixing them in Beta2 though.

Hope this helps,

Scott
--
That's in the comments
http://weblogs.asp.net/scottgu/archive/2008/02/22/first-look-at-silverlight-2.aspx

I'm just pointing that out.  I'm not saying this can't be our problem.  
Also happening on windows:
http://crash-stats.mozilla.com/report/index/4fd222f0-eebf-11dc-a09c-001a4bd43ed6
0  	JS_SetPrivate  	 mozilla/js/src/jsapi.c:2843
1 	NPObjWrapperPluginDestroyedCallback 	mozilla/modules/plugin/base/src/nsJSNPRuntime.cpp:1774
2 	xul.dll@0x8afd27 	
The person, who got this crash mentioned he was visiting msn.com.
OS: Mac OS X → All
Is this just Silverlight 2.0 beta, or does it also happen with Silverlight 1.0?
Nominating for blocking1.9? since this is currently the #6 topcrash for Firefox 3.0b4.
Flags: blocking1.9?
Keywords: topcrash
(In reply to comment #6)
> Is this just Silverlight 2.0 beta, or does it also happen with Silverlight 1.0?
> 
As far I can tell (OS X side of things - Minefield and Camino trunk), Silverlight 1.0 doesn't crash.

So when I load http://www.msn.com/ with Silverlight 2.0 beta 1 installed, I get an assertion at this stack.

Then when the control goes back out to NPObjWrapperPluginDestroyedCallback, we crash when calling JS_SetPrivate a few lines after our call into the plugin.
I thought this might fix it (and we might want this patch anyway).  It fixes the assertion, but not the crash.
Comment on attachment 308911 [details]
stack showing something pretty related to the problem

So the reason the above isn't sufficient is, I think, because:

> 	gkplugin.dll!nsNPObjWrapper::OnDestroy(NPObject * npobj=0x00189090)  Line 1632	C++
> 	gkplugin.dll!_releaseobject(NPObject * npobj=0x00189090)  Line 1619 + 0x9 bytes	C++
> 	npctrl.dll!CNPBrowser::ReleaseObject()  + 0x30 bytes	
> 	npctrl.dll!PluginObject::pluginDeallocate()  + 0x1b bytes	
> 	gkplugin.dll!NPObjWrapperPluginDestroyedCallback(PLDHashTable * table=0x05a17434, PLDHashEntryHdr * hdr=0x06156798, unsigned int number=0, void * arg=0x0012f898)  Line 1769 + 0xe bytes	C++

the top of the frames above does a PL_DHashTableRawRemove on the entry that is being enumerated in the bottom frame, so by the time we get to the JS_SetPrivate call in the bottom frame, entry->mJSObj is null because the entry has been removed.

I guess the question is whether the above stack is legal use of the npruntime API.
Comment on attachment 308911 [details]
stack showing something pretty related to the problem

> 	gkplugin.dll!_releaseobject(NPObject * npobj=0x00189090)  Line 1619 + 0x9 bytes	C++
> 	npctrl.dll!CNPBrowser::ReleaseObject()  + 0x30 bytes	
> 	npctrl.dll!PluginObject::pluginDeallocate()  + 0x1b bytes	
> 	gkplugin.dll!NPObjWrapperPluginDestroyedCallback(PLDHashTable * table=0x05a17434, PLDHashEntryHdr * hdr=0x06156798, unsigned int number=0, void * arg=0x0012f898)  Line 1769 + 0xe bytes	C++

The pointer we're passing to their deallocate callback is the same one they're giving back to us in _releaseobject.

As jst mentioned, they shouldn't be doing this, since we've already called their invalidate callback (if present) on this object, and per http://developer.mozilla.org/en/docs/NPClass they shouldn't further use any object on which the invalidate callback was used.
That said, our code would crash if they released an object that we hadn't yet invalidated, which seems to be legal per the API description.  Maybe we should handle this by doing separate hash table enumerations?
Anyway, I think this is clearly a bug in the Silverlight plugin.  The deallocate callback should not call NPN_ReleaseObject.  (Usually, except for this plugin teardown case, the opposite is true; this is the only case where the deallocate callback is not called *from* ReleaseObject.)
Not blocking on this, most likely a plugin bug.
Flags: blocking1.9? → blocking1.9-
schrep, any contacts on the silverlight team that might be able to investigate further?
Akhil Kaza <akaza@windows.microsoft.com> would probably be a good fitst point of contact.
I pinged our friends at Microsoft (Peter Puszkiewicz)... and I think comment 2 is our answer. The product team is aware of the issue and plans on fixing it in the upcoming release.

Is there any other help you need from their team?
Based on David Baron's comments in this thread it looks like the problem is that we're calling ReleaseObject from a call to DeallocateObject (which is called by FF). Apparently FF3's ReleaseObject implementation crashes when an object is passed in that FF3 believes to be dead.

The reason why we do this is related to how reference counting works in FF2. In our experience, FF2 calls DeallocateObject on our NPObjects when *it* no longer references that NPObject. It ignores the reference count. This means that if we – or a different plugin such as a Java applet (should it support that) – still holds on to this object, then there is no way to tell whether this object is actually still alive at any given moment. Take this scenario:

                var plugin = …; // Get a reference to the plugin.
                var customObject = plugin.getCustomObject(); // getCustomObject() is a plugin function which returns a custom NPObject under the hood.
                javaApplet.setCustomObject(customObject); // Pass a reference to the Java applet.
                customObject = null; // Assume FF has no references left to the custom object.

                // Assume the following destroys the entire plugin.
                ...removeChild(plugin);
                plugin = null;
                javaApplet.invokeCustomObject(); // Let the Java applet do something with the custom object.

This will result in unexpected behavior as the Java applet doesn't know the object was actually freed from memory by the Silverlight plugin, because FF invoked DeallocateObject on the "custom object" NPObject when there was still a reference to it (from the Java applet). We dealt with this problem by implementing all of our DeallocateObject functions as if they were "Release" functions: we'd simply call ReleaseObject if the ref. count was > 1 or free the object when the ref. count <= 1. This fixed the scenario above for us and it didn't introduce any other problems.

In order to proceed, we would like you to consider one of two things:
1. Call ReleaseObject when FF no longer references an NPObject, instead of calling DeallocateObject. (We suspect FF does this to avoid a memory leak in cases where someone did indeed forget to release a reference to an NPObject. Does that sound right?) The NPAPI spec states that DeallocateObject is called when the reference count hits zero, so technically this is a violation of the NPAPI. From http://developer.mozilla.org/en/docs/NPClass:
    "Called by NPN_ReleaseObject() when an object's reference count reaches zero. If this field is NULL, free() is called instead."
2. Expect ReleaseObject to be called from DeallocateObject. Nothing in the NPAPI spec states that this is illegal. The object we pass to ReleaseObject has always a reference count of > 0 and is still in memory.

Ideally we would proceed with #2, as that would be easiest for us to get things working in both FF2 and FF3. If however anyone has a different suggestion or recommendation, we'd be happy to hear about those.

- Wilco Bauwer (MSFT)
But I'd note that http://developer.mozilla.org/en/docs/NPClass also says:

# invalidate
#     Called on live objects that belong to a plugin instance that is being
# destroyed. This call is always followed by a call to the deallocate function,
# or free(). Any attempt to use an invalidated object will result in undefined
# behavior. 

The case we're dealing with is exactly this case:  live objects that belong to a plugin instance that is being destroyed.
Today we do call ReleaseObject on objects that we know have been invalidated. We can and will fix this. This fix should address the specific crash in this bug report.

However, how does this address the scenario where a different plugin has a reference to an invalidated NPObject? How does it know the object has been invalidated? This may be a flaw in the NPAPI, but hopefully browsers will consider following the reference counting rules a higher priority. One approach you could take is not to use invalidate at all, in which case you could argue you're still implementing the NPAPI correctly, while also following the general reference counting rules. FWIW, last time I checked, WebKit doesn't call invalidate either.

For us it's important that the reference counting rules are followed strictly for our NPObjects, because we allow you to pass them from one plugin to another. Unless we'd rip out our entire HTML/JS bridge, this would be an easy way for developers to write apps that crash a browser or cause other type of unexpected behavior.
Could anyone comment on whether there are any plans to address the problem I mentioned in my last reply? Without a fix in FF3 for this we'll likely be unable to prevent people from crashing FF3 via Silverlight, or cause other type of unexpected behavior (e.g. read random memory).
shouldn't the other plugin have called:
http://developer.mozilla.org/en/docs/NPN_RetainObject

and shouldn't that have prevented us from calling
http://developer.mozilla.org/en/docs/NPN_ReleaseObject
?
We do call RetainObject on it when we hold on to the object, and call ReleaseObject on it when we let go of it. We try to strictly follow the reference counting rules all the time. Unfortunately FF doesn't seem to follow it strictly (it calls pluginDeallocate even when we called RetainObject on it from a different plugin instance and thus the reference count > 1), breaking the reference counting rules. We were able to work around this in FF2 by implementing pluginDeallocate as if it was a ReleaseObject call. This no longer works in FF3 though because FF3 no longer supports ReleaseObject calls on NPObjects originating from a plugin which FF3 called pluginDeallocate on - regardless of its reference count.

Please let me know if my information/explanation is obscure or lacking in any way. I'd be happy to elaborate and help trying to get to the bottom of this issue before FF3 RTMs.
Original comment for wilco:
http://developer.mozilla.org/en/docs/How_to_get_a_stacktrace_with_WinDbg

can you possibly show the reference counting being violated?

this should help you collect evidence:
bp xul!_retainobject "kp; g"; bp xul!_releaseobject "kp; g";

however, afaict he's right, we're forcefully violating the refcounting API:
http://bonsai.mozilla.org/cvsblame.cgi?file=/mozilla/modules/plugin/base/src/nsJSNPRuntime.cpp&rev=1.40&mark=1815,1846,1784,1797-1800#1780
Assignee: nobody → jst
Severity: critical → blocker
Flags: blocking1.9- → blocking1.9?
Summary: Silverlight 2.0 beta: crashes when closing tab/window [@ JS_SetPrivate - NPObjWrapperPluginDestroyedCallback] → NPRuntime object reference counting is violated by NPObjWrapperPluginDestroyedCallback causing Silverlight 2.0 beta: crashes when closing tab/window [@ JS_SetPrivate - NPObjWrapperPluginDestroyedCallback]
so, the problem is that we need to stop all the plugins, let them all drop all references to anything they have, and then tell them that their objects are dead.

This might not be a problem for us to implement, as in theory we still have a reference to the plugins until we call NP_Shutdown
http://developer.mozilla.org/en/docs/NP_Shutdown

fwiw, shaver has volunteered to write plugins to test this stuff.

I'm not going to, but I will provide a patch that might do something. (like blow up).
mDead is me trying to account for this bug.

npjsobj->mJSObj is me being paranoid. it's not really related to this bug, but the bookkeeping of invalidate and friends scares me.

malloc/memset=>calloc is just some habit of mine
"the problem is that we need to stop all the plugins, let them all drop all
references to anything they have, and then tell them that their objects are
dead."

I'm not sure I see how this would work. If you keep in mind that a plugin itself may still be alive and all of its *own* objects are too, then how would you tell this plugin that its references to objects from a different plugin are no longer valid?

Unfortunately I don't think I'm allowed to look at any FF source code. I may be missing something here, but is there any reason FF couldn't just let go of objects only when ReleaseObject is called on them *and* the reference count is 1? That way NPObjects could safely cross between plugins (even when the plugin an object originates from is removed from the DOM), etc. That way the lifetime of NPObjects would be based entirely on the reference count, rather than either the reference count or the time a plugin has been in the DOM, whichever is shorter.
+'ing this until a real decision is made here.  Johnny, can you please take a look at this and make a call one way or another?
Flags: blocking1.9? → blocking1.9+
afaiu, the problem is that we do want the plugin instance to go away when the page goes away or the object is removed from the dom and gc'd.

of course, the plugin library needs to not be unloaded at this point (until all references the plugin library has go away), but we want to tell the plugin that its page and instance need to die.

consider:
browser 0
page A
 plugin 1
 plugin 2

plugin 1 gives X to plugin 2
plugin 2 gives V to plugin 1
plugin 1 gives I to page A
plugin 1 gives Q (non ref counted) to page A
plugin 2 gives R (non ref counted) to page A

use loads page B

obviously X held by plugin 2 would normally keep plugin 1 alive, and V held by plugin 1 would keep plugin 2 alive. obviously we want all of them to die because the user replaced page A with page B. We do that by trying to tell each plugin instance in page A to die. And for objects which don't interact with reference counting, we can go off and kill them ourselves (which this patch does).

strange notation follows, sorry.

 caller| callee:method(object)

ideally w/ this patch, we(0) loop through our list:
0| 1:invalidate(Q)
0| 1:destroy(Q)
0| 1:invalidate(X)
 1| 0:release(X)
0| 1:invalidate(V)
 1| 0:release(V)
0| 1:release(I)
0| 1:invalidate(I)
 1| 0:release(I)
0| 1:deallocate(I)

0| 2:invalidate(R)
0| 2:destroy(R)
0| 2:invalidate(X)
 2| 0:release(X)
0| 1:deallocate(X)
0| 2:invalidate(V)
 2| 0:release(V)
0| 2:deallocate(V)

Sorry about the notation, I really can't think of a better way of describing it.
A simpler scenario would be:

Page A
  plugin 1
    object X
  plugin 2

- plugin 2 gets a reference to object X (from plugin 1)
- someone unloads plugin 1 (e.g. some JavaScript removes it from the DOM)
- plugin 2 releases its reference to object X at some point because it lets go of it (for example because someone null'ed the field that held the reference).

This scenario currently seems to translate to (roughly) the following events:
- retainObject(objectX) [by plugin 2]
- invalidate(plugin1) [by FF]
- deallocate(plugin1) [by FF]
- invalidate(objectX) [by FF]
- deallocate(objectX) [by FF]
- releaseObject(objectX) [by plugin 2]

The last event is where things go wrong because FF no longer expects objectX to be referenced by anyone. Unfortunately there is no way for plugin 2 to know that object X is dead. The desired sequence of events would be (from our point of view):
- retainObject(objectX) [by plugin 2]
[
- releaseObject(plugin1) [by FF]
OR (because FF sees plugin1.referenceCount == 1 and thus could short-circuit the call to invalidate/deallocate)
- invalidate(plugin1) [by FF]
- deallocate(plugin1) [by FF] // Assume object X doesn't depend on plugin 1 and 
can live by itself.
]
- releaseObject(objectX) [by plugin 2]
- invalidate(objectX) [by FF because objectX.referenceCount == 1 at the time plugin 2 called releaseObject]
- deallocate(objectX) [by FF]

Hope this makes sense. I can attach a very basic Silverlight repro later today that exploits the bug we're talking about.
I haven't followed all the discussion above; there's an awful lot of it.  But it seems like the documentation is pretty clear that everything is going to get destroyed when the plugin instance is destroyed, nonzero reference counts notwithstanding.

Shouldn't objects passed from one plugin instance to another end up double-wrapped, with the two wrappers invalidated at different times?  Or do you mean that your plugin is passing them between instances rather than allowing other code to do so?
"But it seems like the documentation is pretty clear that everything is going to get destroyed when the plugin instance is destroyed, nonzero reference counts
notwithstanding."

Would you mind pointing out where this is documented? The best I could find is http://developer.mozilla.org/en/docs/NPP_Destroy which mentions "NPP_Destroy releases the instance data and resources associated with a plug-in." It doesn't mention it disregards the reference count. Furthermore http://developer.mozilla.org/en/docs/NPN_ReleaseObject mentions that the browser should only delete an object when the reference count hits 0.

I don't understand how double-wrapping would solve anything. What happens if a plugin invokes a "double-wrapped" object? How can a wrapper tell whether it can safely access the underlying object?

Also, please take into account that the behavior in Safari/Opera is what we expect. The difference in behavior makes it difficult to write robust, "memory-leakless" code. (E.g. if a wrapper is supposed to not call release object on the object it wraps, then that might work in FF3, but would cause a leak in Safari.)
(In reply to comment #35)
> "But it seems like the documentation is pretty clear that everything is going
> to get destroyed when the plugin instance is destroyed, nonzero reference
> counts
> notwithstanding."
> 
> Would you mind pointing out where this is documented? The best I could find is
> http://developer.mozilla.org/en/docs/NPP_Destroy which mentions "NPP_Destroy
> releases the instance data and resources associated with a plug-in." It doesn't
> mention it disregards the reference count. Furthermore
> http://developer.mozilla.org/en/docs/NPN_ReleaseObject mentions that the
> browser should only delete an object when the reference count hits 0.

The NPRuntime API was never designed with plugins sharing NPObjects in mind which brings on this whole problem. What this really comes down to is that NPRuntime objects are tied to the instance they came from. The reason Firefox destroys all NPObjects that are associated with a particular plugin instance when that instance is stopped is that once the plugin is stopped, the plugin could get unloaded etc, which means that the code in the NPObject (i.e. the functions pointed to by its NPClass etc) could get unloaded (i.e. the dll might be unloaded) and thus it's not safe to leave such objects around. Destroying a plugin instance's NPObjects also makes sure that any reference counting leaks in plugin code doesn't cause random NPObjects to stay alive forever etc too.

> I don't understand how double-wrapping would solve anything. What happens if a
> plugin invokes a "double-wrapped" object? How can a wrapper tell whether it can
> safely access the underlying object?

In such a setup, the objects you'd be holding reference to would not be directly tied to another plugin instance, but rather to an intermediate object that could outlive the other plugin instance and make it so that calls on a double wrapped object whose real NPObject has been deleted would be safe since the reference from the wrapper object to the real object would be cleared when the NPObject is torn down.

> Also, please take into account that the behavior in Safari/Opera is what we
> expect. The difference in behavior makes it difficult to write robust,
> "memory-leakless" code. (E.g. if a wrapper is supposed to not call release
> object on the object it wraps, then that might work in FF3, but would cause a
> leak in Safari.)

Point noted, but this is unfortunately not something we'll be fixing in Firefox, at least not for Firefox 3, we're way too late into the release to be changing anything like this.

If you guys depend on inter-plugin communication through NPObjects or simply just want to support passing of NPObjects from one plugin to another, then you guys need to make it safe to do so. And not by leaking everything, but by setting up the appropriate communication between the instances that share NPObjects and once one instance is torn down (you'll know from getting the invalidate and deallocate calls on your NPObjects) you'll need to notify the other plugin and tell it that the instance is going away. Then it's up to the other instance to go through all references it has to NPObjects from other instances and dropping those references (i.e. setting them to NULL or what not).

Removing blocking1.9+ flag as this isn't something we'll be fixing for Firefox 3, if ever.
Flags: blocking1.9+ → blocking1.9-
> > [...]
> The NPRuntime API was never designed with plugins sharing NPObjects in mind
> which brings on this whole problem. What this really comes down to is that
> NPRuntime objects are tied to the instance they came from.

The NPRuntime allows passing NPObjects between plugins and browsers. Doesn't this mean that by definition one can pass objects between plugins? If a browser doesn't expect this, shouldn't *it* double-wrap such objects and make sure such wrappers are invalidated when the underlying object's plugin is gone? The browser has the ability to do this; I don't see how a plugin could reliably do this.

Also, keep in mind that technically there's no way a plug-in can tell an NPObject is owned by a different plugin (instance or even type) or if its owned by the browser. From a plugin's point of view they're all the same.

Furthermore, I can't find your statement anywhere in the documentation, and other browsers (including FF 2) seem to handle this just fine. I'd love to hear how we could've prepared for this rather surprising break, such that we can avoid this kind of situation as we move forward.

> The reason Firefox destroys all NPObjects that are associated with a 
> particular plugin instance when that instance is stopped is that once the 
> plugin is stopped, the plugin could get unloaded etc, which means that the 
> code in the NPObject (i.e. the functions pointed to by its NPClass etc) could 
> get unloaded (i.e. the dll might be unloaded) and thus it's not safe to leave 
> such objects around.

Isn't this what NPPVpluginKeepLibraryInMemory is for?

> Destroying a plugin instance's NPObjects also makes sure that any reference 
> counting leaks in plugin code doesn't cause random NPObjects to stay alive 
> forever etc too.

True. But this is at the expense of violating a very generally accepted technique for managing object life-time (reference counting), as well as violating the NPAPI (see the documentation for ReleaseObject).

> > [...]
> In such a setup, the objects you'd be holding reference to would not be
> directly tied to another plugin instance, but rather to an intermediate object
> that could outlive the other plugin instance and make it so that calls on a
> double wrapped object whose real NPObject has been deleted would be safe since
> the reference from the wrapper object to the real object would be cleared when
> the NPObject is torn down.

Our plugin would be able to safely invoke the wrapper ("intermediate object"), but how does this wrapper know whether the object it wraps is still alive? At some point we need to call into an object from a different plugin, and I don't see how we would be able to tell whether that object is alive.

> > [...]
> Point noted, but this is unfortunately not something we'll be fixing in
> Firefox, at least not for Firefox 3, we're way too late into the release to be
> changing anything like this.
> If you guys depend on inter-plugin communication through NPObjects or simply
> just want to support passing of NPObjects from one plugin to another, then you
> guys need to make it safe to do so. And not by leaking everything, but by
> setting up the appropriate communication between the instances that share
> NPObjects and once one instance is torn down (you'll know from getting the
> invalidate and deallocate calls on your NPObjects) you'll need to notify the
> other plugin and tell it that the instance is going away. Then it's up to the
> other instance to go through all references it has to NPObjects from other
> instances and dropping those references (i.e. setting them to NULL or what
> not).
> Removing blocking1.9+ flag as this isn't something we'll be fixing for Firefox
> 3, if ever.

We'd be happy to hear about any suggestions to work around this in a cross-browser way.
(In reply to comment #39)
> The NPRuntime allows passing NPObjects between plugins and browsers. Doesn't
> this mean that by definition one can pass objects between plugins? If a browser

One can, but that was never the intent, documented or not.

> doesn't expect this, shouldn't *it* double-wrap such objects and make sure such
> wrappers are invalidated when the underlying object's plugin is gone? The

Sure, it could, and a future version of Firefox might do that, but not Firefox 3.

> Also, keep in mind that technically there's no way a plug-in can tell an
> NPObject is owned by a different plugin (instance or even type) or if its owned
> by the browser. From a plugin's point of view they're all the same.

They all have an NPClass, if the class is one you recognize, or something in the class is something you recognize, you know it's a silverlight NPObject, once you know that you can cast the NPObject to a known type and that type could carry an instance identifier etc that you could use to determine if it's from a different instance or not, etc etc.

> Furthermore, I can't find your statement anywhere in the documentation, and
> other browsers (including FF 2) seem to handle this just fine. I'd love to hear
> how we could've prepared for this rather surprising break, such that we can
> avoid this kind of situation as we move forward.

Uh, this hasn't changed in our code since Firefox 2 that I know (or Firefox 1.0 for that matter, IIRC). What exactly are you claiming is working differently? I just diffed NPObject destruction the code between Firefox 2 and Firefox 3, and nothing significant has changed there.

[...]
> > Destroying a plugin instance's NPObjects also makes sure that any reference 
> > counting leaks in plugin code doesn't cause random NPObjects to stay alive 
> > forever etc too.
> 
> True. But this is at the expense of violating a very generally accepted
> technique for managing object life-time (reference counting), as well as
> violating the NPAPI (see the documentation for ReleaseObject).

Depends on where you draw the line here, at some point, objects whose reference count has yet to hit zero will be deleted too, i.e. the process going away or what not. The plugin is being stopped here, it's done, we're forcing it to release all the objects that it hasn't released yet itself. And again, this is nothing new in Firefox 3.

> > In such a setup, the objects you'd be holding reference to would not be
> > directly tied to another plugin instance, but rather to an intermediate object
> > that could outlive the other plugin instance and make it so that calls on a
> > double wrapped object whose real NPObject has been deleted would be safe since
> > the reference from the wrapper object to the real object would be cleared when
> > the NPObject is torn down.
> 
> Our plugin would be able to safely invoke the wrapper ("intermediate object"),
> but how does this wrapper know whether the object it wraps is still alive? At
> some point we need to call into an object from a different plugin, and I don't
> see how we would be able to tell whether that object is alive.

That's a problem for the browser to solve, not the plugin. The plugin would have a reference to an NPObject wrapper, i.e. another NPOjbect, and it would invoke a method on it. Whether that succeeds or not depends on whether the actual object is still alive etc, which is up to the browser to figure out. We already do this for JS->NPObject calls, and it works... Once the NPObject goes away, any calls from JS on the object that JS sees simply just fail.

[...]
> We'd be happy to hear about any suggestions to work around this in a
> cross-browser way.

See above.
> They all have an NPClass, if the class is one you recognize, or something in
> the class is something you recognize, you know it's a silverlight NPObject,
> once you know that you can cast the NPObject to a known type and that type
> could carry an instance identifier etc that you could use to determine if it's
> from a different instance or not, etc etc.

This sounds like a really bad idea. What if an NPObject is coming from a different plugin, e.g. Flash or a Java applet? No matter what way you put this, we'd be relying on implementation details of either the browser or another plugin, which isn't someting we want to do.

> That's a problem for the browser to solve, not the plugin. The plugin would
> have a reference to an NPObject wrapper, i.e. another NPOjbect, and it would
> invoke a method on it. Whether that succeeds or not depends on whether the
> actual object is still alive etc, which is up to the browser to figure out. We
> already do this for JS->NPObject calls, and it works... Once the NPObject goes
> away, any calls from JS on the object that JS sees simply just fail.

I am probably missing something here. Imagine the following scenario:

* Plugin A
  + Object X (custom NPObject)
* Plugin B

Now, some JavaScript mediates between plugin A and plugin B. It gets a reference to object X (a custom NPObject) and passes it on to plugin B. It then removes plugin A from the DOM. Assume that at this point the browser destroys plugin A. When plugin B wants to invoke object X, things will gracefully fail. When we pass it to the browser's ReleaseObject the browser will crash though.

In this scenario, where and how do you propose we wrap object X? We could wrap it inside plugin B, but technically that shouldn't make any difference. Whether plugin B directly invokes object X or does it via a wrapper, it'd still be accessing object X which may or may not be alive.
(In reply to comment #41)
> > They all have an NPClass, if the class is one you recognize, or something in
> > the class is something you recognize, you know it's a silverlight NPObject,
> > once you know that you can cast the NPObject to a known type and that type
> > could carry an instance identifier etc that you could use to determine if it's
> > from a different instance or not, etc etc.
> 
> This sounds like a really bad idea. What if an NPObject is coming from a
> different plugin, e.g. Flash or a Java applet? No matter what way you put this,
> we'd be relying on implementation details of either the browser or another
> plugin, which isn't someting we want to do.

If you're holding on to NPObjects from other plugins, then yes, but I would imagine you'd only be interested in holding on to other plugins from another silverlight plugin instance. So if an object is from anything you don't recognize, don't hold on to it...

[...]
> I am probably missing something here. Imagine the following scenario:
> 
> * Plugin A
>   + Object X (custom NPObject)
> * Plugin B
> 
> Now, some JavaScript mediates between plugin A and plugin B. It gets a
> reference to object X (a custom NPObject) and passes it on to plugin B. It then
> removes plugin A from the DOM. Assume that at this point the browser destroys
> plugin A. When plugin B wants to invoke object X, things will gracefully fail.
> When we pass it to the browser's ReleaseObject the browser will crash though.

No, releasing the object in this case would of course only release the wrapper object, which is still alive and well. The real object has already been deleted, and the wrapper knows that and no longer even has a reference to it.

But most importantly, what are you claiming has changed here since Firefox 2?
It seems what has changed is that this does not crash in Firefox 2, but does cause a crash in Firefox 3.  Thus something, presumably, has changed, somewhere.  Which presumably could be fixed.  Especially since this is causing crashes with javascript close even on pages that aren't Silverlight enabled.  
> > [...]
> If you're holding on to NPObjects from other plugins, then yes, but I would
> imagine you'd only be interested in holding on to other plugins from another
> silverlight plugin instance. So if an object is from anything you don't
> recognize, don't hold on to it...

We allow developers to pass objects back and forth between the browser and Silverlight. We don't want to put restrictions on the type of objects people pass back and forth - it should just work. If they can program against an arbitrary object from JavaScript, they should be able to do so from Silverlight as well.

> > [...]
> No, releasing the object in this case would of course only release the wrapper
> object, which is still alive and well. The real object has already been
> deleted, and the wrapper knows that and no longer even has a reference to it.

How exactly would the wrapper know when its underlying object is deleted? Unless the wrapper is a JavaScript object (which I hope isn't what you are suggesting), I don't see how this could possibly work, given that 1) two plugins have a completely separate lifetime 2) reference counting rules are not strictly followed, meaning the lifetime of an arbitrary object (owned by the browser or a different plugin) is non-deterministic and 3) an object doesn't get notified when a different object is killed regardless of its reference count.

> But most importantly, what are you claiming has changed here since Firefox 2?

Unfortunately the best I can do is point you to this thread and say "this used to work, but *something* has changed such that it suddenly breaks in FF3". If I were allowed to look at the FF source code, I would've been more than happy to try and find out the root cause of this change in behavior.
Can you find a regression range, by checking the alphas and betas on ftp.mozilla.org (betas in ftp://ftp.mozilla.org/pub/firefox/releases/, alphas in ftp://ftp.mozilla.org/pub/firefox/releases/granparadiso/).  That would help a bunch.

Can you produce a small test case that shows the behaviour?  Preferably using something with at least private symbol server coverage, if not a simple plugin with source.
So far I've tried looking for a regression range with the Silverlight 1 plugin.  I hope that's somewhat relevant here.  I used this page as a test.  
http://www.microsoft.com/silverlight/partners/default.aspx

I suspected bug 1156 so I tried trunk builds from 09/21/2005 and 09/22/2005 and indeed the plugin works before that change but doesn't after.
Silerlight 2 Beta 1 shows the same range.  At least I'm pretty sure that's the right plugin.  I uninstalled version 1.  Then installed 2b1.  About:plugins shows version 1.0.30226.2 which is the reason for my doubt.  But there is the added MIME type: application/x-silverlight-2-b1.
I apologize.  The regression range I found before was for when that site stopped working.  But that's separate from what this bug is about.  The regression range for this crash is:
2007 08 09 ok
2007 08 10 crash
http://bonsai.mozilla.org/cvsquery.cgi?treeid=default&module=PhoenixTinderbox&branch=HEAD&branchtype=match&dir=&file=&filetype=match&who=&whotype=match&sortby=Date&hours=2&date=explicit&mindate=2007-08-09+04%3A00%3A00&maxdate=2007-08-10+06%3A00%3A00&cvsroot=%2Fcvsroot

I used this site to test the crash with: 
http://www.iagserver.org/
No longer blocks: 1156
I suppose the change in the regression range that matters here is bug 383553.  Is that correct?  
Blocks: 383553
(In reply to comment #31)
> +'ing this until a real decision is made here.  Johnny, can you please take a
> look at this and make a call one way or another?
> 

FWIW, This crash made it to #3 position on the topcrash list for RC1.

http://crash-stats.mozilla.com/topcrasher/byversion/Firefox/3.0


The same crash I'm able to reproduce when Silverlight 2.0 beta is installed in different ways:

1. Open: https://www.godaddy.com/gdshop/secure_transfer_domains.asp?ci=8987
2. Open Addons Manager and try to disable the Silverlight plugin

Crash ids: bp-bf95fe61-28c4-11dd-9466-001321b13766 and bp-bf95fe61-28c4-11dd-9466-001321b13766

Above tests with latest Silverlight 1.0.30401.0 plugin doesn't crash the browser for me on Windows.
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9) Gecko/2008052906 Firefox/3.0

This is still an error in RC2.
(In reply to comment #50)
> I suppose the change in the regression range that matters here is bug 383553. 
> Is that correct?

Bug 354380 seems far more likely.
This occurs for me in RC2 as well.

I have Silverlight 1.0.30226.2

I am using Windows XP Home SP3.
Due to this is a bug with the beta version of Silverlight 2.0, shouldn't it be marked as invalid? 
Hardware: PC → All
(In reply to comment #64)
> Due to this is a bug with the beta version of Silverlight 2.0, shouldn't it be
> marked as invalid? 

We're using it to track the progress of getting the plug-in updated.

Speaking of which, beta 2 of Silverlight 2 is out. I'm fairly sure this is fixed in that beta. Can someone who sees this confirm?
This should be fixed in Silverlight 2 beta2.  Please confirm that it is and let me know if it isn't.

Thanks!
I still have version check problems with Beta 2.
(In reply to comment #65)
> Speaking of which, beta 2 of Silverlight 2 is out. I'm fairly sure this is
> fixed in that beta. Can someone who sees this confirm?
>
Testing with Camino trunk (2.0a1pre) and Minefield nightly (3.1a1pre), OS X 10.5.2. And after some crazy gimnastics to find the plugin - I _had_ to use Safari 3.1.1 release build to download it - browser detection is so last century)

Starting with a couple of samples @ http://silverlight.net/.
Most of them do some stupid browser detection and ask me to download the plugin (including http://www.microsoft.com/Silverlight/ !!).

I did find one site that actually showed Silverlight content
http://memorabilia.hardrock.com/
Closing the window or tab did *NOT* crash the browser (that was my original STR in comment 0). It looks like the bug could be fixed, hard to say based on the one site.

About:plugins tells me that the installed version is SilverLight 2.0.30523.6.

I just updated to Silverlight 2 Beta 2 (2.0.30523.6)

Plugin status against different websites on FF3 RC2 as follows:
1. Silverlight.net - The silverlight based menu works without crashing
2. asp.net/learn videos - These videos are silverlight based and work without crashing (used to crash with Silverlight 2.0 B1 and earlier versions)
3. Channel8.msdn.com and Channel9.msdn.com videos - Although I can see the Silverlight 2.0 B2 plugin installed and enabled in the 'addons' window, the individual video viewports display the standard overlay prompting me to install Silverlight under FF3 RC2. Don't know why.
4. On all sites - A faint dotted box appears when mouse click focuses within the silverlight enabled area on the page.
Point 3, Kulin, when SLBeta2 views a site designed for SLBeta1, it prompts the user to install, since SLBeta2 is not backwards compatible. See here:

http://weblogs.asp.net/scottgu/archive/2008/06/06/silverlight-2-beta2-released.aspx

(Last item above summary: "Understanding Compatibility with Silverlight 1.0 and Silverlight 2 Beta 1").
Tested given URLs with Firefox 3 RC2 under Windows Vista and Mac OS X. Both don't crash anymore with Silverlight 2.0.30523.6 installed. Now it's even possible to enable/disable the plugin within the Add-ons Manager without a crash.

Looks like we can really close this bug as invalid. Microsoft has fixed the crashes in their own code. We don't need the tracker any longer.
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → INVALID
could someone from microsoft explain what they did in case other plugin vendors encounter this?
Originally our custom NPObject implementations implemented NPDeallocate like this:

    if (object->referenceCount > 0) {
        // Make up for the browser's reference count violation.
        browser->releaseObject(object);
        return;
    }

    // Actual deallocation code goes here...
    delete object;

The problem here is that FF3 doesn't like it when we call releaseObject on an object with a reference count of 0. To deal with this, we changed the first line above to:

    if (object->referenceCount > 1) {

This means that when the reference count is <= 1, we directly run our deallocation code, rather than telling the browser that we'd like to release the last reference to our object and expecting it to call NPDeallocate on us again (and run any internal cleanup code that it might have) - like it did in FF2.
we should consider blocking the old versions prior to  2.0.30523.6 on firefox 3.
Status: RESOLVED → REOPENED
Resolution: INVALID → ---
Assignee: jst → nobody
Status: REOPENED → NEW
Component: Plug-ins → Blocklisting
Flags: blocking1.9-
Product: Core → addons.mozilla.org
QA Contact: plugins → blocklisting
Version: Trunk → unspecified
Summary: NPRuntime object reference counting is violated by NPObjWrapperPluginDestroyedCallback causing Silverlight 2.0 beta: crashes when closing tab/window [@ JS_SetPrivate - NPObjWrapperPluginDestroyedCallback] → blocklist old Silverlight 2.0 beta versions - : crashes when closing tab/window- NPRuntime object reference counting is violated by NPObjWrapperPluginDestroyedCallback [@ JS_SetPrivate - NPObjWrapperPluginDestroyedCallback]
You could reopen Bug 431335 for that purpose.
No, I don't want to have the blocklist service covering for beta versions of plugins that people knowingly install and use.

We need to use blocklisting judiciously, and this is a case where it's overkill.  Here is the current policy:
http://wiki.mozilla.org/Blocklisting

So just like bug 431335, not sure if it's the best use of the service to blocklist beta versions of plugins.
We do meet most of the criteria on the policy page

    *  Block

          o with versions with known vulnerabilities or major user-facing issues
          o with version ranges with known vulnerabilities
          o with fatal bugs (client crashes on startup or something causing an endless loop of unusability) 

>>> Its currently the #2 topcrash issue facing Firefox RC1 and RC2 users.  That meets 2 out of three Blocking criteria. <<<<

    * Don't block:
          o before we work with author/vendor to send out an update, so don't block the most recent version of a major addon/plugin

 >>> we have done that and they have provided a fixed version that we should be encouraging upgrade to...

          o for minor bugs for non-popular addons/plugins (crashes on event calls, messed up UI, etc.)

>>> not minor #2 top crash

          o if user has disabled compatibility checking and add-on causes problems/crashes

>>>  A few user comments associate the crash with the silverlight plugin, but most are not making the connection.  The just think Firefox is unstable when visiting particualar sites...

          o plugins that are pre-release, alpha or beta 


Its not a security issue, and it is pre-release, but other than that it matches the other 5 criteria for blocking.  It could also very well remove some of the noise in the reporter data for crashes like this, and help us to focus on the next critical issue.
I don't think that's what the blocklist service should be used for.  We should ship this crash info to vendors and/or make them aware of it, but I don't think that we should aggressively blocklist things like this just to save crashes.

Crashes caused by pre-release plugins, even if they are popular and sometimes fatal, are arguably an important part of the development process for anybody testing on our platform.  So we're walking a fine -- what I don't want to do is aggressively blocklist and prevent necessary crashes from happening.
Agreed with morgamic.  We have a policy that excludes beta/pre-release plugins for a reason, and this bug is not the place to advocate for a change to that policy.  I think we also don't want our blocklist to become the way that plugin authors get their users to update.
Its not making much sense to me...

> We should ship this crash info to vendors and/or make them aware of it, but I don't think that we should aggressively blocklist things like this just to save crashes.

We did "ship the crash info to the vendor" in this case, and it resulted in them fixing the incompatibility.

> Crashes caused by pre-release plugins, even if they are popular and sometimes
fatal, are arguably an important part of the development process for anybody
testing on our platform. 

I agree, it *was* an important part of the developement process.  It no longer is, since the bug has been fixed and there is a new version available.  No one is interested in these crashes, and it continues to lead to a bad experience for firefox users

>  We have a policy that excludes beta/pre-release plugins for a reason, and this bug is not the place to advocate for a change to that policy.  I think we also don't want our blocklist to become the way that plugin authors get their users to update.

Ok, I'll stop.  Returning this bug to invalid.  We ought to add the reasons for blocking beta versions of the plugins (including when fixes are available) to the policy.  It also seems like blocking *will* lead to firefox users updating to more secure and more stable versions of plugins, and that is a good thing, but I guess I'm confused on this.
Status: NEW → RESOLVED
Closed: 16 years ago16 years ago
Resolution: --- → INVALID
Status: RESOLVED → VERIFIED
Crash Signature: [@ JS_SetPrivate - NPObjWrapperPluginDestroyedCallback]
Product: addons.mozilla.org → Toolkit
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: