Closed Bug 1428759 Opened 6 years ago Closed 6 years ago

Hang after inserting a multiline message into Slack and sending it

Categories

(Core :: IPC: MSCOM, defect)

59 Branch
x86
Windows 10
defect
Not set
major

Tracking

()

RESOLVED FIXED
mozilla60
Tracking Status
firefox59 --- fixed
firefox60 --- verified

People

(Reporter: MarcoZ, Assigned: bugzilla)

References

Details

(Keywords: hang)

Attachments

(1 file, 1 obsolete file)

I got a hang this morning with Firefox freezing on me completely with NVDA. Here's what I did:

1. Entered into our a11y-standup channel in Slack.
2. Read previous messages.
3. Wrote a message, inserting new lines via shift+enter.
4. At the end, hit Enter to send it.

Result: NVDA froze, like the braille display would stop blinking for over two minutes, then blink furiously for a few seconds, then stop blinking again. The last typed text remained. Firefox became totally unresponsive, Escape also didn't take me out of focus into browse mode.

I pressed my shortcut to restart NVDA, which worked, until focus went back into the Firefox window. Then it froze again.

I then shut down Windows, which was being blocked. I force-quit whichever was causing the blocking, and when I restarted Firefox after restarting Windows, I was notified of an unsent crash report. I submitted it. The ID is bp-7fb923e3-52a4-4b55-ae39-f14760180108.

Jamie, have you ever seen this in Slack? Can you make anything of this crash dump?
Flags: needinfo?(jteh)
I occasionally see a similar freeze in Gmail when entering text. Unfortunately, I suspect it's related to bug 1419362. Debugging it is tricky because it's not easy to reproduce and because I'm pretty sure it's an infinite loop of some sort, so the stack keeps moving.

That crash report is weird. I'd expect to see some accessibility stuff in the stacks, but I don't. I'm going to ignore it for now and try to repro the crash.
Assignee: nobody → jteh
Flags: needinfo?(jteh)
I haven't been able to reproduce this, but I did manage to capture some dumps when I saw this a while ago. At the time, I didn't have enough cycles to dig into them.

Here's what appears to be happening:

1. In the parent process, NVDA requests text.
2. The handler calls HandlerProvider::get_AllTextInfo via COM, which runs in the content process MTA.
3. HandlerProvider::get_AllTextInfo dispatches the real work to the main thread (HandlerProvider::GetAllTextInfoMainThread).
4. GetAllTextInfoMainThread starts doing its stuff, including retrieving hyperlinks.
5. Meanwhile, the parent process is blocked in the COM call from 2), but it's an STA, so it allows incoming COM calls to execute.
6. A client calls IAccessible::get_accChild to get an accessible. This is probably in response to an event fired previously. In this dump, it's UI Automation, but note that this isn't specific to UI Automation; it could just as well be any other client handling events async (including NVDA itself). In other words, if it weren't UIA, it'd be something else.
7. accChild makes a COM call to the content process, which then has to marshal the accessible.
8. In the content process, the marshaler calls QueryInterface; specifically, mscom::WeakReferenceSupport::QueryInterface.
9. That acquires a critical section (lock), then calls mscom::Interceptor::ThreadSafeQueryInterface.
10. That calls mscom::Interceptor::QueryInterfaceTarget, which dispatches work to the content main thread and waits for it to complete.
11. Meanwhile, the content main thread is still executing GetAllTextInfoMainThread from 4).
12. After it retrieves a hyperlink, it needs to wrap it with an interceptor and indirectly calls mozilla::mscom::Interceptor::Create.
13. This calls mscom::WeakReferenceSupport::QueryInterface, which attempts to acquire the same lock as in 9).
14. Tragic deadlock ensues.

Aaron, can you explain why the mCSForQI critical section is necessary? Any ideas on how I can get around this? It seems this might happen if ever something in the main thread is trying to create an interceptor while an interceptor is also being created in an MTA thread.

For reference, here are the relevant stack traces with irrelevant stuff removed for clarity.

The parent process main thread (see point 6 above):
---start---
0:073> ~0 kp
 # Child-SP          RetAddr           Call Site
00 00000014`c67f1048 00007ffb`91903f5d win32u!NtUserMsgWaitForMultipleObjectsEx+0x14
01 00000014`c67f1050 00007ffb`92bfaeeb user32!RealMsgWaitForMultipleObjectsEx+0x1d
02 00000014`c67f1090 00007ffb`92bfa416 combase!CCliModalLoop::BlockFn(void ** ahEvent = 0x00000014`c67f1120, unsigned long cEvents = 2, unsigned long * lpdwSignaled = 0x00000014`c67f1160)+0x11f [onecore\com\combase\dcomrem\callctrl.cxx @ 2351] 
... More COM stuff ...
10 00000014`c67f2200 00007ffb`4cc0dbde AccessibleHandler!mozilla::a11y::AccessibleHandler::get_accChild(struct tagVARIANT * varChild = 0x00000014`c67f2280 I4 = -33579625, struct IDispatch ** ppdispChild = 0x00000014`c67f2270)+0x81 [z:\build\build\src\accessible\ipc\win\handler\accessiblehandler.cpp @ 638] 
11 00000014`c67f2250 00007ffb`4cc0dcc9 xul!GetProxiedAccessibleInSubtree(class mozilla::a11y::DocAccessibleParent * aDoc = <Value unavailable error>, struct tagVARIANT * aVarChild = 0x00000014`c67f2338 I4 = -33579625)+0x12e [z:\build\build\src\accessible\windows\msaa\accessiblewrap.cpp @ 1432] 
12 00000014`c67f22c0 00007ffb`4cc0d6af xul!mozilla::a11y::AccessibleWrap::GetRemoteIAccessibleFor(struct tagVARIANT * aVarChild = 0x00000014`c67f2338 I4 = -33579625)+0xa9 [z:\build\build\src\accessible\windows\msaa\accessiblewrap.cpp @ 1589] 
13 00000014`c67f2310 00007ffb`4cc10138 xul!mozilla::a11y::AccessibleWrap::GetIAccessibleFor(struct tagVARIANT * aVarChild = <Value unavailable error>, bool * aIsDefunct = 0x00000014`c67f23b8)+0x12f [z:\build\build\src\accessible\windows\msaa\accessiblewrap.cpp @ 1491] 
14 00000014`c67f2380 00007ffb`4cc119b9 xul!mozilla::a11y::AccessibleWrap::ResolveChild(struct tagVARIANT * aVarChild = <Value unavailable error>, struct IAccessible ** aOutInterface = 0x00000014`c67f23d0)+0x48 [z:\build\build\src\accessible\windows\msaa\accessiblewrap.cpp @ 318] 
15 00000014`c67f23b0 00007ffb`4cc0af4e xul!mozilla::a11y::AccessibleWrap::get_accRole(struct tagVARIANT * varChild = 0x00000014`c67f25b0 I4 = -33579625, struct tagVARIANT * pvarRole = 0x00000014`c67f2c80 Empty)+0x69 [z:\build\build\src\accessible\windows\msaa\accessiblewrap.cpp @ 440] 
16 00000014`c67f2590 00007ffb`92af0b73 xul!mozilla::a11y::LazyInstantiator::get_accRole(struct tagVARIANT * varChild = 0x0000012d`b0c4ad70, struct tagVARIANT * pvarRole = 0x00000014`c67f2c80 Empty)+0x52 [z:\build\build\src\accessible\windows\msaa\lazyinstantiator.cpp @ 774] 
17 00000014`c67f25e0 00007ffb`92aca045 rpcrt4!Invoke+0x73
... More COM stuff ...
40 00000014`c67f4da0 00007ffb`4cc0dbde AccessibleHandler!mozilla::a11y::AccessibleHandler::get_accChild(struct tagVARIANT * varChild = 0x00000014`c67f4e20 I4 = -33579625, struct IDispatch ** ppdispChild = 0x00000014`c67f4e10)+0x81 [z:\build\build\src\accessible\ipc\win\handler\accessiblehandler.cpp @ 638] 
41 00000014`c67f4df0 00007ffb`4cc0dcc9 xul!GetProxiedAccessibleInSubtree(class mozilla::a11y::DocAccessibleParent * aDoc = <Value unavailable error>, struct tagVARIANT * aVarChild = 0x00000014`c67f4ed8 I4 = -33579625)+0x12e [z:\build\build\src\accessible\windows\msaa\accessiblewrap.cpp @ 1432] 
42 00000014`c67f4e60 00007ffb`4cc0d6af xul!mozilla::a11y::AccessibleWrap::GetRemoteIAccessibleFor(struct tagVARIANT * aVarChild = 0x00000014`c67f4ed8 I4 = -33579625)+0xa9 [z:\build\build\src\accessible\windows\msaa\accessiblewrap.cpp @ 1589] 
43 00000014`c67f4eb0 00007ffb`4cc111cd xul!mozilla::a11y::AccessibleWrap::GetIAccessibleFor(struct tagVARIANT * aVarChild = <Value unavailable error>, bool * aIsDefunct = 0x00000014`c67f4f60)+0x12f [z:\build\build\src\accessible\windows\msaa\accessiblewrap.cpp @ 1491] 
44 00000014`c67f4f20 00007ffb`4cc0ac9e xul!mozilla::a11y::AccessibleWrap::get_accChild(struct tagVARIANT * varChild = <Value unavailable error>, struct IDispatch ** ppdispChild = 0x0000012d`a9bb73e0)+0x41 [z:\build\build\src\accessible\windows\msaa\accessiblewrap.cpp @ 261] 
45 00000014`c67f4f50 00007ffb`92af0b73 xul!mozilla::a11y::LazyInstantiator::get_accChild(struct tagVARIANT * varChild = 0x0000012d`b0c4a570, struct IDispatch ** ppdispChild = 0x0000012d`a9bb73e0)+0x52 [z:\build\build\src\accessible\windows\msaa\lazyinstantiator.cpp @ 730] 
46 00000014`c67f4fa0 00007ffb`92aca045 rpcrt4!Invoke+0x73
... The same pattern repeats quite a few times ...
9e 00000014`c67fa120 00007ffb`4cc0dbde AccessibleHandler!mozilla::a11y::AccessibleHandler::get_accChild(struct tagVARIANT * varChild = 0x00000014`c67fa1a0 I4 = -33577248, struct IDispatch ** ppdispChild = 0x00000014`c67fa190)+0x81 [z:\build\build\src\accessible\ipc\win\handler\accessiblehandler.cpp @ 638] 
9f 00000014`c67fa170 00007ffb`4cc0dcc9 xul!GetProxiedAccessibleInSubtree(class mozilla::a11y::DocAccessibleParent * aDoc = <Value unavailable error>, struct tagVARIANT * aVarChild = 0x00000014`c67fa258 I4 = -33577248)+0x12e [z:\build\build\src\accessible\windows\msaa\accessiblewrap.cpp @ 1432] 
a0 00000014`c67fa1e0 00007ffb`4cc0d6af xul!mozilla::a11y::AccessibleWrap::GetRemoteIAccessibleFor(struct tagVARIANT * aVarChild = 0x00000014`c67fa258 I4 = -33577248)+0xa9 [z:\build\build\src\accessible\windows\msaa\accessiblewrap.cpp @ 1589] 
a1 00000014`c67fa230 00007ffb`4cc111cd xul!mozilla::a11y::AccessibleWrap::GetIAccessibleFor(struct tagVARIANT * aVarChild = <Value unavailable error>, bool * aIsDefunct = 0x00000014`c67fa2e0)+0x12f [z:\build\build\src\accessible\windows\msaa\accessiblewrap.cpp @ 1491] 
a2 00000014`c67fa2a0 00007ffb`4cc0ac9e xul!mozilla::a11y::AccessibleWrap::get_accChild(struct tagVARIANT * varChild = <Value unavailable error>, struct IDispatch ** ppdispChild = 0x00000014`c67fa420)+0x41 [z:\build\build\src\accessible\windows\msaa\accessiblewrap.cpp @ 261] 
a3 00000014`c67fa2d0 00007ffb`85d4ef49 xul!mozilla::a11y::LazyInstantiator::get_accChild(struct tagVARIANT * varChild = 0x00000014`c67fa340 I4 = -33577248, struct IDispatch ** ppdispChild = 0x00000014`c67fa420)+0x52 [z:\build\build\src\accessible\windows\msaa\lazyinstantiator.cpp @ 730] 
a4 00000014`c67fa320 00007ffb`6d770a6d oleacc!AccWrap_Base::get_accChild+0x59
a5 00000014`c67fa370 00007ffb`6d794c14 uiautomationcore!AccUtils::get_accChild+0x6d
a6 00000014`c67fa3d0 00007ffb`6d72b8ee uiautomationcore!AccessibleProxy::Release+0x134b4
a7 00000014`c67fa490 00007ffb`6d72c5e8 uiautomationcore!CreateInProcMsaaProxy+0x1e
a8 00000014`c67fa4e0 00007ffb`6d72bd99 uiautomationcore!HookBasedServerConnection::ClaimPendingObject+0xa8
a9 00000014`c67fa530 00007ffb`6d73673a uiautomationcore!GetNodeFromProviderEntryPoint+0x499
aa 00000014`c67fa670 00007ffb`6d7347ed uiautomationcore!ProcessIncomingRequest+0x38a
... More UIA stuff ...
b3 00000014`c67fbc10 00007ffb`92bfa8d9 combase!CCliModalLoop::MyPeekMessage(struct tagMSG * pMsg = 0x00000014`c67fbcb0 {msg=0x92ab3f88 wp=0x14c67fbed0 lp=0x0}, struct HWND__ * hwnd = 0x00000000`00050672, unsigned int min = 0x400, unsigned int max = 0x7fff, unsigned short wFlag = 3)+0x52 [onecore\com\combase\dcomrem\callctrl.cxx @ 3090] 
... More COM modal loop stuff ...
c5 (Inline Function) --------`-------- AccessibleHandler!mozilla::a11y::AccessibleHandler::GetAllTextInfo+0x40 [z:\build\build\src\accessible\ipc\win\handler\accessiblehandler.cpp @ 222] 
c6 00000014`c67fd150 00007ffb`6cfeb5a6 AccessibleHandler!mozilla::a11y::AccessibleHandler::get_text(long startOffset = 0n0, long endOffset = 0n-1, wchar_t ** text = 0x00000014`c67fd280)+0x85 [z:\build\build\src\accessible\ipc\win\handler\accessiblehandler.cpp @ 1720] 
c7 00000014`c67fd190 00007ffb`6cfedcac VBufBackend_gecko_ia2!GeckoVBufBackend_t::fillVBuf(struct IAccessible2 * pacc = 0x0000012d`a0a92e08, class VBufStorage_buffer_t * buffer = 0x0000012d`b0c53330, class VBufStorage_controlFieldNode_t * parentNode = 0x0000012d`aa1d3970, class VBufStorage_fieldNode_t * previousNode = 0x00000000`00000000, struct IAccessibleTable * paccTable = 0x00000000`00000000, struct IAccessibleTable2 * paccTable2 = 0x00000000`00000000, long tableID = 0n0, wchar_t * parentPresentationalRowNumber = 0x00000000`00000000 "", bool ignoreInteractiveUnlabelledGraphics = false)+0x1446 [c:\projects\nvda\build\x86_64\vbufbackends\gecko_ia2\gecko_ia2.cpp @ 571] 
...
---end---

The content process MTA thread handling the marshal for accChild (see points 7-10 above):
---start---
0:086> ~67 kp
 # Child-SP          RetAddr           Call Site
00 (Inline Function) --------`-------- xul!mozilla::mscom::SpinEvent::Wait+0x4b [z:\build\build\src\ipc\mscom\spinevent.cpp @ 62] 
01 (Inline Function) --------`-------- xul!`anonymous-namespace'::SyncRunnable::WaitUntilComplete+0x4b [z:\build\build\src\ipc\mscom\mainthreadinvoker.cpp @ 59] 
02 000000ff`81dfd3b0 00007ffb`4b5ca72f xul!mozilla::mscom::MainThreadInvoker::Invoke(struct already_AddRefed<nsIRunnable> * aRunnable = <Value unavailable error>)+0x14e [z:\build\build\src\ipc\mscom\mainthreadinvoker.cpp @ 136] 
03 000000ff`81dfd400 00007ffb`4b5c9c34 xul!mozilla::mscom::Interceptor::QueryInterfaceTarget(struct _GUID * aIid = 0x000002dd`2f696af0 --- memory read error at address 0x000002dd`2f696af0 ---, void ** aOutput = 0x000000ff`81dfd4b8, class mozilla::BaseTimeDuration<mozilla::TimeDurationValueCalculator> * aOutDuration = 0x000000ff`81dfd4f0)+0x63 [z:\build\build\src\ipc\mscom\interceptor.cpp @ 735] 
04 000000ff`81dfd470 00007ffb`4b5cabb1 xul!mozilla::mscom::Interceptor::GetInterceptorForIID(struct _GUID * aIid = <Value unavailable error>, void ** aOutInterceptor = 0x000000ff`81dfd6d0)+0x1dc [z:\build\build\src\ipc\mscom\interceptor.cpp @ 664] 
05 000000ff`81dfd5b0 00007ffb`4b5cc380 xul!mozilla::mscom::Interceptor::ThreadSafeQueryInterface(struct _GUID * aIid = <Value unavailable error>, struct IUnknown ** aOutInterface = 0x000000ff`81dfd6d0)+0x361 [z:\build\build\src\ipc\mscom\interceptor.cpp @ 824] 
06 000000ff`81dfd680 00007ffb`92c5f8a9 xul!mozilla::mscom::WeakReferenceSupport::QueryInterface(struct _GUID * riid = 0x000002dd`2f696af0 --- memory read error at address 0x000002dd`2f696af0 ---, void ** ppv = 0x000000ff`81dfd740)+0xa8 [z:\build\build\src\ipc\mscom\weakref.cpp @ 140] 
07 000000ff`81dfd6c0 00007ffb`92c66482 combase!CStdMarshal::CreateStub(struct tagIPIDEntry * pEntry = 0x000002dd`2f696ab0, struct IRpcStubBuffer ** ppStub = 0x000000ff`81dfd9f8, void ** ppv = 0x000000ff`81dfda00, int * pfNonNDR = 0x000000ff`81dfd9ac, struct IUnknown * pUnkUseInner = 0x00000000`00000000)+0xe9 [onecore\com\combase\dcomrem\marshal.cxx @ 6521] 
08 (Inline Function) --------`-------- combase!CStdMarshal::ConnectSrvIPIDEntry+0x2d [onecore\com\combase\dcomrem\marshal.cxx @ 2627] 
09 (Inline Function) --------`-------- combase!CStdMarshal::MarshalServerIPID+0x32e [onecore\com\combase\dcomrem\marshal.cxx @ 1581] 
0a (Inline Function) --------`-------- combase!CStdMarshal::MarshalIPID+0x36a [onecore\com\combase\dcomrem\marshal.cxx @ 1433] 
0b 000000ff`81dfd910 00007ffb`92c64f57 combase!CStdMarshal::MarshalObjRef(struct tagOBJREF * objref = 0x000000ff`81dfdb50, struct _GUID * riid = 0x000002dd`10c62c00 --- memory read error at address 0x000002dd`10c62c00 ---, unsigned long mshlflags = 0, unsigned long dwDestCtx = 0, void * pvDestCtx = 0x00000000`00000000, class PreventRundownBiasContainer * pBiasContainer = 0x00000000`00000000, bool bApplyDirectMarshalingMitigation = false)+0x992 [onecore\com\combase\dcomrem\marshal.cxx @ 1120] 
0c 000000ff`81dfdae0 00007ffb`4b5d2341 combase!CStdMarshal::MarshalInterface(struct IStream * pStm = 0x000000ff`81dfdd68, struct _GUID * riid = 0x000002dd`10c62c00 --- memory read error at address 0x000002dd`10c62c00 ---, void * pv = 0x000002dd`059863a0, unsigned long dwDestCtx = 0, void * pvDestCtx = 0x00000000`00000000, unsigned long mshlflags = 4)+0x117 [onecore\com\combase\dcomrem\marshal.cxx @ 1010] 
0d 000000ff`81dfdc10 00007ffb`4b5ca3c0 xul!mozilla::mscom::FastMarshaler::MarshalInterface(struct IStream * pStm = 0x000000ff`81dfdd68, struct _GUID * riid = 0x000002dd`10c62c00 --- memory read error at address 0x000002dd`10c62c00 ---, void * pv = 0x000002dd`059863a0, unsigned long dwDestContext = 0, void * pvDestContext = 0x00000000`00000000, unsigned long mshlflags = 0)+0x71 [z:\build\build\src\ipc\mscom\fastmarshaler.cpp @ 157] 
0e 000000ff`81dfdc70 00007ffb`92c68995 xul!mozilla::mscom::Interceptor::MarshalInterface(struct IStream * pStm = 0x000000ff`81dfdd68, struct _GUID * riid = 0x000002dd`10c62c00 --- memory read error at address 0x000002dd`10c62c00 ---, void * pv = 0x000002dd`059863a0, unsigned long dwDestContext = 0, void * pvDestContext = 0x00000000`00000000, unsigned long mshlflags = 0)+0x9c [z:\build\build\src\ipc\mscom\interceptor.cpp @ 392] 
0f 000000ff`81dfdd00 00007ffb`92af0b73 combase!CRemoteUnknown::RemQueryInterface2(struct _GUID * ripid = <Value unavailable error>, unsigned short cIids = 1, struct _GUID * iids = 0x000002dd`10c62c00 --- memory read error at address 0x000002dd`10c62c00 ---, HRESULT * phr = 0x000002dd`10c62a50, struct tagMInterfacePointer ** ppMIFs = 0x000002dd`10c62f90)+0x1c5 [onecore\com\combase\dcomrem\remoteu.cxx @ 1330] 
10 000000ff`81dfdde0 00007ffb`92aca045 rpcrt4!Invoke+0x73
...
---end---

The content process main thread running GetAllTextInfoMainThread and trying to wrap a hyperlink in an interceptor (see points 11-13 above):
---start---
0:086> ~0 kp
 # Child-SP          RetAddr           Call Site
00 000000ff`ea9fe5e8 00007ffb`9435de7b ntdll!NtWaitForAlertByThreadId+0x14
01 000000ff`ea9fe5f0 00007ffb`9435dd92 ntdll!RtlpWaitOnAddressWithTimeout+0x43
02 000000ff`ea9fe620 00007ffb`9435dc17 ntdll!RtlpWaitOnAddress+0xb2
03 000000ff`ea9fe690 00007ffb`943759f9 ntdll!RtlpWaitOnCriticalSection+0xd7
04 000000ff`ea9fe700 00007ffb`94375910 ntdll!RtlpEnterCriticalSectionContended+0xd9
05 000000ff`ea9fe730 00007ffb`4b5cc359 ntdll!RtlEnterCriticalSection+0x40
06 (Inline Function) --------`-------- xul!AutoCriticalSection::{ctor}+0x9 [z:\build\build\src\xpcom\base\nswindowshelpers.h @ 25] 
07 000000ff`ea9fe760 00007ffb`4b5c9317 xul!mozilla::mscom::WeakReferenceSupport::QueryInterface(struct _GUID * riid = 0x00007ffb`4d9a2b20 {01C20F2B-3DD2-400F-949F-AD00BDAB1D41}, void ** ppv = 0x000002dd`057e97d8)+0x81 [z:\build\build\src\ipc\mscom\weakref.cpp @ 140] 
08 000000ff`ea9fe7a0 00007ffb`4cc55e60 xul!mozilla::mscom::Interceptor::Create(class mozilla::UniquePtr<IUnknown,mozilla::mscom::detail::MainThreadRelease<IUnknown> > * aTarget = 0x000000ff`ea9fe830, struct mozilla::mscom::IInterceptorSink * aSink = 0x000002dd`14f75ee0, struct _GUID * aInitialIid = 0x00007ffb`4d9a2b20 {01C20F2B-3DD2-400F-949F-AD00BDAB1D41}, void ** aOutInterface = 0x000002dd`057e97d8)+0xb3 [z:\build\build\src\ipc\mscom\interceptor.cpp @ 266] 
09 000000ff`ea9fe800 00007ffb`4cc563d6 xul!mozilla::mscom::CreateInterceptor<IAccessibleHyperlink>(class mozilla::UniquePtr<IAccessibleHyperlink,mozilla::mscom::detail::MainThreadRelease<IAccessibleHyperlink> > * aTargetInterface = 0x000000ff`ea9fe850, struct mozilla::mscom::IInterceptorSink * aEventSink = <Value unavailable error>, struct IAccessibleHyperlink ** aOutInterface = <Value unavailable error>)+0x3c [z:\build\build\src\obj-firefox\dist\include\mozilla\mscom\interceptor.h @ 162] 
0a 000000ff`ea9fe830 00007ffb`4cc56333 xul!mozilla::mscom::MainThreadHandoff::WrapInterface<IAccessibleHyperlink>(class mozilla::UniquePtr<IAccessibleHyperlink,mozilla::mscom::detail::MainThreadRelease<IAccessibleHyperlink> > * aTargetInterface = 0x000000ff`ea9fe8c0, struct mozilla::mscom::IHandlerProvider * aHandlerProvider = <Value unavailable error>, struct IAccessibleHyperlink ** aOutInterface = 0x000002dd`057e97d8)+0x76 [z:\build\build\src\obj-firefox\dist\include\mozilla\mscom\mainthreadhandoff.h @ 50] 
0b 000000ff`ea9fe870 00007ffb`4cc56cca xul!mozilla::a11y::HandlerProvider::ToWrappedObject<IAccessibleHyperlink>(struct IAccessibleHyperlink ** aObj = 0x000002dd`057e97d8)+0x83 [z:\build\build\src\accessible\ipc\win\handlerprovider.cpp @ 554] 
0c 000000ff`ea9fe8b0 00007ffb`4cc574ec xul!mozilla::a11y::HandlerProvider::GetAllTextInfoMainThread(wchar_t ** aText = 0x000000ff`f783e5e0, struct IAccessibleHyperlink *** aHyperlinks = 0x000000ff`f783e5f0, long * aNHyperlinks = 0x000000ff`f783e600, struct IA2TextSegment ** aAttribRuns = 0x000000ff`f783e610, long * aNAttribRuns = 0x000000ff`f783e620, HRESULT * result = 0x000000ff`f783e0d0)+0xc2 [z:\build\build\src\accessible\ipc\win\handlerprovider.cpp @ 609] 
---end---
Flags: needinfo?(aklotz)
I haven't been able to reproduce this in Slack, nor could I reproduce it using an HTML test case. I eventually found a very obscure way to reproduce the deadlock, but it requires quite a few things. I'm noting this here mostly so I don't forget how to manage it. :)

First, you need to modify Gecko to give you sufficient time to reproduce it. The following patch causes a sleep of 5 seconds when retrieving text for a node with 97 hyperlinks. The arbitrary number is so I can use the browser enough to test this without things falling over.

---start---
diff --git a/accessible/ipc/win/HandlerProvider.cpp b/accessible/ipc/win/HandlerProvider.cpp
index b27ecd08f625..cfa7d446a2d0 100644
--- a/accessible/ipc/win/HandlerProvider.cpp
+++ b/accessible/ipc/win/HandlerProvider.cpp
@@ -604,6 +604,8 @@ HandlerProvider::GetAllTextInfoMainThread(BSTR* aText,
     // -1 signals to the handler that it should call hyperlinks itself.
     *aNHyperlinks = -1;
   }
+  if (*aNHyperlinks == 97)
+    Sleep(5000); // jtd
   // We must wrap these hyperlinks in an interceptor.
   for (long index = 0; index < *aNHyperlinks; ++index) {
     ToWrappedObject(&(*aHyperlinks)[index]);
---end---

Then you need this test case, which has 97 hyperlinks inside the content div:

---start---
<div id="content"></div>
<script>
var content = document.getElementById("content");
var out = "";
for (var i = 0; i < 97; ++i) {
	out += "<p>test</p>";
}
content.innerHTML = out;
</script>
---

Now, with NVDA:
1. Start the modified Firefox.
2. Load the test case.
3. After NVDA finishes loading the buffer (which will take 5 seconds), use object navigation to get to the first child of the document (a section, which is the content div).
4. Open the NVDA Python Console with NVDA+control+z.
5. Enter the following command:
l = nav.iaHypertext.hyperlink(0)
6. Now, enter the following command but do *not* execute it yet:
import IAccessibleHandler as iah; l.QueryInterface(iah.IAccessibleApplication)
7. Alt+tab back to Firefox.
8. NVDA+f5 to refresh the buffer.
9. Quickly (before 5 seconds), alt+tab back to the Python console.
10. Press enter.
11. Deadlock!

The key here is that we need to make an interceptor query for an interface that isn't cached (so it has to ask the main thread), but have GetAllTextInfoMainThread try to query an interface from the *same* interceptor. That's why we need the hyperlink (step 5), rather than being able to just navigate directly to the first child of the div, which would use a different interceptor (different IUnknown).

I suspect this would be less fiddly if we were doing this in-process because of the handler payload, but doing it in-proc is much harder itself, so this will do. It's definitely the same deadlock, just not the same path to obtaining it as in the real world.
Spoke to Aaron at some length about this yesterday, so cancelling NI. In short, the current lock is probably protecting more than it needs to, but figuring out exactly what needs protecting and when is a bit tricky.
Flags: needinfo?(aklotz)
Updating component.
Component: Disability Access APIs → IPC: MSCOM
Upon further examination, I have reason to believe that WeakReferenceSupport::mCSForQI is mostly vestegial with respect to the way that mscom weak refs were initially implemented (obviously something like thread-safe COM weak refs is complex enough to require a few iterations to get right ;-) ).

OTOH, because that CS was there, any code paths stemming from ThreadSafeQueryInterface are assuming thread safety. Based on this assessment, I think that:

1. We cam remove mCSForQI entirely and rename ThreadSafeQueryInterface to something else, however:
2. We need to review Interceptor and MainThreadHandoff code and implement mutual exclusion in those areas as necessary.

Jamie, what do you think about this assessment?
Flags: needinfo?(jteh)
Attached patch aklotz's POC (obsolete) — Splinter Review
Here's my current POC. I haven't completely analyzed the whole thing for correctness.
Attachment #8942339 - Flags: feedback?(jteh)
(In reply to Aaron Klotz [:aklotz] from comment #6)
> 1. We cam remove mCSForQI entirely and rename ThreadSafeQueryInterface to
> something else, however:
> 2. We need to review Interceptor and MainThreadHandoff code and implement
> mutual exclusion in those areas as necessary.
> 
> Jamie, what do you think about this assessment?

This makes sense to me, and is where my thinking was heading (albeit not quite as clearly or concisely as yours :) ). I'm not worried about QI/destruction races or the like because there's a stabilisation reference in WeakReferenceSupport::QueryInterface.
Flags: needinfo?(jteh)
Comment on attachment 8942339 [details] [diff] [review]
aklotz's POC

This looks good to me. I also tested the patch with the repro steps I outlined in comment 3 and couldn't reproduce the deadlock.

One question: You have a mutex around all uses of mStdMarshal. I totally understand why you need that for QI; two threads could race querying for and thus creating mStdMarshal. However, is this actually necessary outside of QI?
Other code which uses mStdmarshal should only be reachable after QI has been called. Therefore, mStdMarshal can never be null in such code.
QI will only ever create mStdMarshal once.
Code other than QI only calls mStdMarshal; it doesn't assign to it.
Therefore, after the first QI for IMarshal, mStdMarshal will never change.

Do we have reason to believe the standard marshaler isn't thread-safe? Or is there another reason? or is this just a matter of "we're not sure, better to be safe than sorry"? I don't think we were previously guarding mStdMarshal for calls like GetUnmarshalClass, GetMarshalSizeMax, etc.
Attachment #8942339 - Flags: feedback?(jteh) → feedback+
FWIW, I tested quickly removing the mStdMarshal mutex outside of QI and everything seemed to behave correctly. Obviously, that doesn't "prove" anything; it could just be that I didn't hit the relevant race. Still, I thought it worth testing and noting.
Attached patch PatchSplinter Review
Revised patch. We don't need to lock inside the IMarshal method implementations.
Assignee: jteh → aklotz
Attachment #8942339 - Attachment is obsolete: true
Status: NEW → ASSIGNED
Attachment #8944940 - Flags: review?(jteh)
Comment on attachment 8944940 [details] [diff] [review]
Patch

Looks great. Thank you sir!
Attachment #8944940 - Flags: review?(jteh) → review+
https://hg.mozilla.org/mozilla-central/rev/06e5dd66fb82
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla60
Depends on: 1433046
Comment on attachment 8944940 [details] [diff] [review]
Patch

Approval Request Comment
[Feature/Bug causing the regression]: Exposed by bug 1419362, but may have occurred under other circumstances as well.
[User impact if declined]: Intermittent complete hangs for screen reader users requiring the user to manually kill the Firefox processes.
[Is this code covered by automated tests?]: No, because we don't have a mechanism for testing platform accessibility.
[Has the fix been verified in Nightly?]: Yes.
[Needs manual test from QE? If yes, steps to reproduce]: No.
[List of other uplifts needed for the feature/fix]: The patches from bug 1433046.
[Is the change risky?]: Low risk.
[Why is the change risky/not risky?]: While the patch did introduce some early crashes, these have since been fixed and we've given this quite some time to bake to ensure no further problems. In addition, this only affects accessibility users.
[String changes made/needed]: None.
Attachment #8944940 - Flags: approval-mozilla-beta?
Verified fixed in nightly by both Marco and I using the STR from comment 3. Also, neither of us have seen this in our daily usage since the patch landed.
Comment on attachment 8944940 [details] [diff] [review]
Patch

Fix for a new issue exposed in 59 with a well-baked and verified patch. Approved for Fx59rc1.
Attachment #8944940 - Flags: approval-mozilla-beta? → approval-mozilla-release+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: