Closed Bug 686861 Opened 13 years ago Closed 7 years ago

Deadlock in silverlight plugin

Categories

(Core Graveyard :: Plug-ins, defect)

6 Branch
x86_64
Windows 7
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: ulrich.kroemer, Unassigned)

Details

User Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/13.0.782.220 Safari/535.1

Steps to reproduce:

hi guys,

ff4, 5 and 6 consistently hang if a silverlight application is unloaded and, for some reason, accessibility has been loaded into the silverlight process. the stack of the main thread of plugin-container.exe always looks like the callstack below. i have also added a comment to issue 558986, because it seems that exact scenario was encountered there as well, but that one is already marked as fixed and i still get the same issue. the systems where this happens are win7 and winvista.

unfortunately the issue is not reliably reproducible: it seems to be timing sensitive, depending on whether the firefox parent process issues a request against the plugin container at the exact moment the plugin container tries to clean up the accessibility objects.


>	ntdll.dll!_NtWaitForMultipleObjects@20()  + 0x15 bytes	
 	ntdll.dll!_NtWaitForMultipleObjects@20()  + 0x15 bytes	
 	kernel32.dll!_WaitForMultipleObjectsExImplementation@20()  + 0x8e bytes	
 	user32.dll!_RealMsgWaitForMultipleObjectsEx@20()  + 0xe2 bytes	
 	ole32.dll!CCliModalLoop::BlockFn()  + 0x96 bytes	
 	ole32.dll!ModalLoop()  + 0x52 bytes	
 	ole32.dll!ThreadSendReceive()  + 0x8a bytes	
 	ole32.dll!CRpcChannelBuffer::SwitchAptAndDispatchCall()  + 0xb5 bytes	
 	ole32.dll!CRpcChannelBuffer::SendReceive2()  + 0xa6 bytes	
 	ole32.dll!CCliModalLoop::SendReceive()  + 0x1e bytes	
 	ole32.dll!CAptRpcChnl::SendReceive()  + 0x72 bytes	
 	ole32.dll!CCtxComChnl::SendReceive()  + 0x47 bytes	
 	ole32.dll!NdrExtpProxySendReceive()  + 0x43 bytes	
 	rpcrt4.dll!@NdrpProxySendReceive@4()  + 0xe bytes	
 	rpcrt4.dll!_NdrClientCall2()  + 0x144 bytes	
 	ole32.dll!_ObjectStublessClient@8()  + 0x7a bytes	
 	ole32.dll!_ObjectStubless@0()  + 0xf bytes	
 	ole32.dll!RemoteReleaseRifRefHelper()  + 0x3c bytes	
 	ole32.dll!RemoteReleaseRifRef()  + 0xa5 bytes	
 	ole32.dll!CStdMarshal::DisconnectCliIPIDs()  - 0xea bytes	
 	ole32.dll!CStdMarshal::Disconnect()  + 0x160 bytes	
 	ole32.dll!CStdIdentity::~CStdIdentity()  + 0x7d bytes	
 	ole32.dll!CStdIdentity::`vector deleting destructor'()  + 0xd bytes	
 	ole32.dll!CStdIdentity::CInternalUnk::Release()  - 0x1d726 bytes	
 	ole32.dll!_IUnknown_Release_Proxy@4()  + 0x11 bytes	
 	oleacc.dll!AccWrap_Base::~AccWrap_Base()  + 0x3d bytes	
 	oleacc.dll!AccWrap_Annotate::`scalar deleting destructor'()  + 0xd bytes	
 	oleacc.dll!AccWrap_Base::Release()  + 0x257 bytes	
 	mscorwks.dll!ReleaseTransitionHelper()  + 0x5f bytes	
 	mscorwks.dll!SafeReleaseHelper()  + 0x70 bytes	
 	mscorwks.dll!SafeRelease()  + 0x2f bytes	
 	mscorwks.dll!IUnkEntry::Free()  + 0x50 bytes	
 	mscorwks.dll!RCW::ReleaseAllInterfaces()  + 0x18 bytes	
 	mscorwks.dll!RCW::ReleaseAllInterfacesCallBack()  + 0x2f bytes	
 	mscorwks.dll!RCW::Cleanup()  + 0x22 bytes	
 	mscorwks.dll!RCWCleanupList::ReleaseRCWListRaw()  + 0x16 bytes	
 	mscorwks.dll!RCWCleanupList::ReleaseRCWListInCorrectCtx()  + 0x7c bytes	
 	mscorwks.dll!CtxEntry::EnterContextCallback()  + 0x8c bytes	
 	ole32.dll!CRemoteUnknown::DoCallback()  + 0x3b bytes	
 	rpcrt4.dll!_Invoke@12()  + 0x2a bytes	
 	rpcrt4.dll!_NdrStubCall2@16()  + 0x256 bytes	
 	ole32.dll!_CStdStubBuffer_Invoke@12()  + 0x70 bytes	
 	ole32.dll!SyncStubInvoke()  + 0x34 bytes	
 	ole32.dll!StubInvoke()  + 0x7b bytes	
 	ole32.dll!CCtxComChnl::ContextInvoke()  + 0xe6 bytes	
 	ole32.dll!MTAInvoke()  + 0x1a bytes	
 	ole32.dll!STAInvoke()  + 0x4a bytes	
 	ole32.dll!AppInvoke()  + 0x92 bytes	
 	ole32.dll!ComInvokeWithLockAndIPID()  + 0x27c bytes	
 	ole32.dll!ComInvoke()  + 0x71 bytes	
 	ole32.dll!ThreadDispatch()  + 0x1a bytes	
 	ole32.dll!ThreadWndProc()  + 0xa0 bytes	
 	user32.dll!_InternalCallWinProc@20()  + 0x23 bytes	
 	user32.dll!_UserCallWinProcCheckWow@32()  + 0xb7 bytes	
 	user32.dll!_DispatchMessageWorker@8()  + 0xed bytes	
 	user32.dll!_DispatchMessageW@4()  + 0xf bytes	
 	xul.dll!base::MessagePumpForUI::ProcessMessageHelper(msg)  Line 368	C++
 	xul.dll!base::MessagePumpForUI::ProcessNextWindowsMessage()  Line 340 + 0xc bytes	C++
 	xul.dll!base::MessagePumpForUI::DoRunLoop()  Line 209 + 0x6 bytes	C++
 	xul.dll!base::MessagePumpWin::RunWithDispatcher(delegate, dispatcher)  Line 54	C++
 	xul.dll!base::MessagePumpWin::Run(delegate)  Line 78 + 0xc bytes	C++
 	xul.dll!MessageLoop::RunInternal()  Line 219	C++
 	xul.dll!MessageLoop::RunHandler()  + 0x2081f6 bytes	C++
 	xul.dll!MessageLoop::Run()  Line 177	C++
 	xul.dll!XRE_InitChildProcess(aArgc, aArgv, aProcess)  Line 514	C++


Actual results:

the silverlight plugin deadlocked


Expected results:

it should not have deadlocked
the problem seems to be related to garbage collection.

if for some reason .net objects are allocated in the silverlight plugin that themselves wrap accessible objects pointing into the firefox.exe process, then those accessible objects will be freed when .net garbage collection is done.

if the .net garbage collector then happens to run and free those objects (which issues a cross-process call to firefox.exe) at the same time that the firefox process executes a request to the plugin process, the plugin will deadlock.

that is also the reason why it is so non-deterministic to reproduce the problem.
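
The mutual wait described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not Firefox or COM code: the two processes are modeled as Python threads, `Endpoint`, `call_peer`, `serve_one` and `simulate` are hypothetical names, and a timeout stands in for the hang so that the example terminates. It also assumes that a side blocked in an outgoing call does not service incoming requests.

```python
import queue
import threading


class Endpoint:
    """Toy stand-in for one process that handles messages on a single thread."""

    def __init__(self, name):
        self.name = name
        self.inbox = queue.Queue()    # requests sent to us by the peer
        self.replies = queue.Queue()  # replies to our own outgoing calls
        self.peer = None

    def call_peer(self, timeout):
        """Send a request, then block for the reply without serving our inbox."""
        self.peer.inbox.put(self.name)
        try:
            self.replies.get(timeout=timeout)
            return True
        except queue.Empty:
            return False  # the modeled processes have no timeout and would hang

    def serve_one(self, timeout):
        """Answer one pending request from the peer, if one arrives in time."""
        try:
            self.inbox.get(timeout=timeout)
        except queue.Empty:
            return False
        self.peer.replies.put("ok")
        return True


def simulate(child_calls_out, timeout=0.5):
    """Return whether each side's work completed before the timeout."""
    parent = Endpoint("firefox.exe")
    child = Endpoint("plugin-container.exe")
    parent.peer, child.peer = child, parent
    done = {}

    def parent_main():
        # The parent issues a blocking request to the plugin container.
        done["parent"] = parent.call_peer(timeout)

    def child_main():
        if child_calls_out:
            # Assumed trigger: a release caused by garbage collection.
            done["child"] = child.call_peer(timeout)
        else:
            done["child"] = child.serve_one(timeout)

    threads = [threading.Thread(target=parent_main),
               threading.Thread(target=child_main)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return done
```

With `simulate(child_calls_out=False)` the child answers the parent and both sides complete. With `simulate(child_calls_out=True)` each side waits for a reply that the other never sends, so neither completes; without the timeout both would wait forever.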
Component: General → Plug-ins
Product: Firefox → Core
QA Contact: general → plugins
dbolter, this is the sort of thing I fear most about a11y for content processes, although the directions would have to be reversed (a11y object in chrome process, blocking call from content process) since chrome is never supposed to block on content. Anyway, Jim can you maybe take a look at this since we've been looking for a reproducible testcase for this crash that we've seen on crash-stats?

Ulrich, it doesn't have to be 100% reproducible, but what are the specific steps? Which a11y software are you using, and what page are you visiting?
hi,

i'm using the MSUIA interface to provide automation support for Silverlight in Firefox. The problem seems to relate to the fact that I'm using the managed .Net MSUIA interface and not the COM interface, because this brings the GarbageCollector into the game. What happens is this:


1.)	Through the MSUIA some .net objects get instantiated in the plugin-container.exe
2.)	Some of those objects are wrappers around accessibility objects and they point to objects in the parent firefox.exe process (because the MSUIA internally somehow wraps MSAA)
3.)	At some point those objects are not needed anymore (but I can’t control that point because they’re internal to the MSUIA)
4.)	At some later point the garbage collector tries to free the unused objects. The accessibility wrappers (COM objects) now try to call into the parent firefox.exe process to free the COM object there
5.)	If by chance this happens at a moment where the parent firefox.exe process tries to request anything from the child plugin-container.exe process it will now deadlock. The parent process will wait indefinitely, and the child process will do the same.

My problem is that the MSUIA only exposes a “Global Interface” which encapsulates process boundaries. I would not need the objects from the firefox.exe process but I can’t prevent them from getting instantiated as soon as I access objects in the plugin-container.exe

I do think however that the main problem is still in Firefox itself, because it so happens that as soon as the child process references a COM object in the parent process and calls that COM object at the same time that the parent process calls the plugin-container.exe process it will deadlock. And because the Garbage collector runs at any time I can't control when those COM objects are freed.
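
The timing problem can be sketched in isolation. The following is a rough analogy only, using Python's cycle collector in place of the .NET garbage collector, which behaves differently in detail; `RemoteWrapper` and `drop_then_collect` are hypothetical names. It shows that a release placed in a finalizer fires when the collector runs, not when the program stops using the object.

```python
import gc


class RemoteWrapper:
    """Toy stand-in for a managed wrapper around an object in another process."""

    def __init__(self, name, log):
        self.name = name
        self.log = log
        self.me = self  # reference cycle: only the cycle collector can free this

    def __del__(self):
        # In the scenario above, the cross-process release call would happen here.
        self.log.append("release " + self.name)


def drop_then_collect():
    """Drop a wrapper, then snapshot the log before and after a collection."""
    log = []
    was_enabled = gc.isenabled()
    gc.disable()  # keep automatic collections out of the demonstration
    try:
        wrapper = RemoteWrapper("accessible", log)
        del wrapper          # the program is done with the object here...
        before = list(log)   # ...but nothing has been released yet
        gc.collect()         # the release only happens when the collector runs
        after = list(log)
    finally:
        if was_enabled:
            gc.enable()
    return before, after
```

Calling `drop_then_collect()` returns an empty log for the moment right after the wrapper is dropped and a log containing the release only after the collection. In a real runtime the collection point is chosen by the collector, which is why the release can coincide with an incoming request.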

I should also add that the problem happens from time to time on my test machines here. Some of them seem to have enough memory available to run Garbage Collection always at the right time and there are never any problems while some of the less capable boxes seem to constantly run into exactly this problem.
hi again,

i also managed to create other scenarios where the plugin-container deadlocks, which seem to be completely unrelated to the test automation interface:

it seems firefox also deadlocks in case the silverlight plugin calls to the firefox process (you can't get symbols for npctrl and coreclr dlls):


>	ntdll.dll!_KiFastSystemCallRet@0() 	
 	ntdll.dll!_ZwWaitForMultipleObjects@20()  + 0xc bytes	
 	KernelBase.dll!_WaitForMultipleObjectsEx@20()  - 0x54 bytes	
 	kernel32.dll!_WaitForMultipleObjectsExImplementation@20()  + 0x8e bytes	
 	user32.dll!_RealMsgWaitForMultipleObjectsEx@20()  + 0xd7 bytes	
 	user32.dll!_MsgWaitForMultipleObjects@20()  + 0x1f bytes	
 	xul.dll!mozilla::ipc::RPCChannel::WaitForNotify()  Line 894	C++
 	xul.dll!mozilla::ipc::RPCChannel::Call(msg, reply)  Line 201 + 0x6 bytes	C++
 	xul.dll!mozilla::plugins::PPluginScriptableObjectChild::CallInvoke(aId, aArgs, aResult, aSuccess)  Line 213	C++
 	xul.dll!mozilla::plugins::PluginScriptableObjectChild::ScriptableInvoke(aObject, aName, aArgs, aArgCount, aResult)  Line 160	C++
 	xul.dll!mozilla::plugins::child::_invoke(aNPP, aNPObj, aMethod, aArgs, aArgCount, aResult)  Line 1282 + 0xf bytes	C++
 	npctrl.dll!613affc2() 	
 	[Frames below may be incorrect and/or missing, no symbols loaded for npctrl.dll]	
 	xul.dll!mozilla::plugins::child::_invoke(aNPP, aNPObj, aMethod, aArgs, aArgCount, aResult)  Line 1282 + 0xf bytes	C++
 	npctrl.dll!613ad078() 	
 	npctrl.dll!613ad0e4() 	
 	coreclr.dll!5f8c270e() 	
 	coreclr.dll!5f8c1e7e() 	
 	coreclr.dll!5f8c1aa5() 	
 	coreclr.dll!5f8a473a() 	
 	coreclr.dll!5f8a9892() 	
 	coreclr.dll!5f8a9917() 	

while firefox tries to operate on the plugin container window and simply updates it:


 	ntdll.dll!_KiFastSystemCallRet@0() 	
 	user32.dll!_NtUserCallHwndLock@8()  + 0xc bytes	
>	user32.dll!_UpdateWindow@4()  + 0x25 bytes	
 	xul.dll!nsWindow::Invalidate(aRect, aIsSynchronous)  + 0x2ef924 bytes	C++
 	xul.dll!nsRunnableMethodImpl<void (__thiscall nsMediaChannelStream::*)(void),0>::Run()  Line 346	C++
 	xul.dll!nsThread::ProcessNextEvent(mayWait, result)  Line 639	C++
 	xul.dll!mozilla::ipc::MessagePump::Run(aDelegate)  Line 110 + 0x29 bytes	C++
 	xul.dll!MessageLoop::RunInternal()  Line 219 + 0x9 bytes	C++

unfortunately i also cannot reliably reproduce this behavior but it definitely hung in that scenario as well
(In reply to ulrich krömer from comment #4)
> ...
> unfortunately i also cannot reliably reproduce this behavior but it
> definitely hung in that scenario as well

Do you still see this?
And is bug 654649 the same issue?
Flags: needinfo?(ulrich.kroemer)
hi,

we have identified that one deadlock was on our side: we had loaded a separate .net framework into the plugin, which caused deadlocks in the garbage collector.

however there's still another problem as far as i can see: when the MSUIA is active (used by programs such as UISpy, Inspect, ...) then the automation peers for the silverlight objects will make calls from inside the plugin that may deadlock as seen in https://bugzilla.mozilla.org/show_bug.cgi?id=654649 

i think there are problems for the plugin container when outgoing calls are made that are crossing process boundaries.

i still don't have a reproducible sample though, as the problem occurs only from time to time
Flags: needinfo?(ulrich.kroemer)
I'm marking this bug as WONTFIX per bug #1269807.

For more information see - https://blog.mozilla.org/futurereleases/2015/10/08/npapi-plugins-in-firefox/
Status: UNCONFIRMED → RESOLVED
Closed: 7 years ago
Resolution: --- → WONTFIX
Product: Core → Core Graveyard