Closed Bug 1325834 Opened 3 years ago Closed 3 years ago

Crash in mozalloc_abort | NS_DebugBreak | mozilla::ipc::FatalError | mozilla::a11y::PDocAccessibleChild::OnMessageReceived

Categories

(Core :: Disability Access APIs, defect, critical)

Unspecified
Windows 10
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla55
Tracking Status
firefox52 --- disabled
firefox-esr52 --- disabled
firefox53 --- disabled
firefox54 --- disabled
firefox55 --- fixed

People

(Reporter: njn, Assigned: aklotz)

References

Details

(Keywords: crash, Whiteboard: aes+)

Crash Data

Attachments

(4 files, 1 obsolete file)

This bug was filed from the Socorro interface and is 
report bp-7aa74c38-d302-4440-8f7c-c6ea62161225.
=============================================================

135 occurrences of this crash on Nightly and Aurora over the past week, but from only 6 installations.

First Nightly occurrence was in 20161217030205, first Aurora occurrence was in 20161222004019, but the crashes don't show up in every day's build so the relevant code may have landed earlier than those dates.

I don't think this crash has shown up before so I suspect it's fallout from one of the a11y fixes that landed/uplifted recently.

tbsaunde, any ideas?
Flags: needinfo?(tbsaunde+mozbugs)
The fatal error message is "Error deserializing 'IAccessibleHolder'".
Crash Signature: [@ mozalloc_abort | NS_DebugBreak | mozilla::ipc::FatalError | mozilla::a11y::PDocAccessibleChild::OnMessageReceived] → [@ mozalloc_abort | NS_DebugBreak | mozilla::ipc::FatalError | mozilla::a11y::PDocAccessibleChild::OnMessageReceived] [@ mozalloc_abort | NS_DebugBreak | mozilla::ipc::FatalError | mozilla::ipc::IProtocol::HandleFatalError | mozilla::ipc::IProtocol::Fata…
Whiteboard: aes?
Adding it to the pile.
Flags: needinfo?(tbsaunde+mozbugs)
Whiteboard: aes? → aes+
Trevor is back and agreed to take a look along with the other top e10s a11y stability issues.
Assignee: nobody → tbsaunde+mozbugs
Assignee: tbsaunde+mozbugs → aklotz
also, no 3rd party module correlations, just our shim dll. It's odd this shows up so high on the top crash list (#6 on aurora), I wonder if this is being pushed up by desktop helper apps like remote desktop that use our a11y apis?

(100.0% in signature vs 08.10% overall) moz_crash_reason = MOZ_CRASH()
(100.0% in signature vs 11.98% overall) Module "ia2marshal.dll" = true
(100.0% in signature vs 15.72% overall) accessibility = Active
--
(82.28% in signature vs 01.70% overall) Module "msiltcfg.dll" = true
(82.28% in signature vs 02.70% overall) Module "msi.dll" = true
(99.40% in signature vs 28.50% overall) Module "samlib.dll" = true
See Also: → 1329816
Keywords: leave-open
I want to add some diagnostic asserts to COM proxy deserialization code so that we can narrow this down a bit.
Attachment #8825535 - Flags: review?(jmathies)
Attachment #8825535 - Flags: review?(jmathies) → review+
https://hg.mozilla.org/integration/mozilla-inbound/rev/8da0d71a754c6dc949b65c50bfea0ed6eee1f8ee
Bug 1325834: Add MOZ_DIAGNOSTIC_ASSERTs to mscom proxy deserialization code; r=jimm
Those failures are actually useful to this cause.

After a few try builds to properly massage the failures so that they are surfaced in our CI crash dumps, I see that CreateStreamOnHGlobal is failing with E_INVALIDARG. If that is the case, then we would never have sent a valid stream to begin with.

But I have no idea why that would be failing. The parameters are trivial!
Flags: needinfo?(aklotz)
Crashes with this signature:

> mozalloc_abort | NS_DebugBreak | mozilla::ipc::FatalError | mozilla::ipc::IProtocol::HandleFatalError | mozilla::ipc::IProtocol::FatalError | mozilla::a11y::PDocAccessibleChild::OnMessageReceived

are #6 topcrash in Nightly 20170113030227.
Looks like I was getting bad data. A few more try pushes later, I see that we're trying to serialize a null object. This works, but isn't really supposed to be happening.

I don't think that I can move further on this until bug 1329977 lands.
Status: NEW → ASSIGNED
Depends on: 1329977
It seems likely that some documents are created in content processes without
  a DocAccessibleChild actor because there is no docshell or tabchild
  associated with the document.  However DocAccessible::DoInitialUpdate()
  already calls functions that assume the document is associated with a
  docshell.  So hopefully trying to create the child actor there will mean it
  is more successful.
Attachment #8827429 - Flags: review?(dbolter)
Comment on attachment 8827429 [details] [diff] [review]
create the DocAccessibleChild in DocAccessible::DoInitialUpdate()

Review of attachment 8827429 [details] [diff] [review]:
-----------------------------------------------------------------

::: accessible/generic/DocAccessible.cpp
@@ +1466,5 @@
> +    if (IPCAccessibilityActive()) {
> +      nsIDocShell* docShell = mDocumentNode->GetDocShell();
> +      if (RefPtr<dom::TabChild> tabChild = dom::TabChild::GetFrom(docShell)) {
> +        DocAccessibleChild* ipcDoc = new DocAccessibleChild(this);
> +        SetIPCDoc(ipcDoc);

Notes to self: there is a case in NotificationController::WillRefresh where we set an IPCDoc if there isn't one, but we don't tell the parent (we don't SendBindChildDoc).
Attachment #8827429 - Flags: review?(dbolter) → review+
Pushed by tsaunders@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/cbce75114440
create the DocAccessibleChild in DocAccessible::DoInitialUpdate() r=davidb
Pushed by tsaunders@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/148ee5df4dd3
fixup windows bustage landed on a CLOSED TREE
I'm not sure we've seen this crash since build 20170116030326 ?
See Also: → 1333275
(In reply to David Bolter [:davidb] from comment #19)
> I'm not sure we've seen this crash since build 20170116030326 ?

I see them, latest reports are on 20170130030205. so still a valid bug.
Blocks: 1329816
No longer depends on: 1329977
Comment on attachment 8841679 [details]
Bug 1325834: Add crash reporter annotations to report COM marshaling failure codes;

https://reviewboard.mozilla.org/r/115834/#review117442
Attachment #8841679 - Flags: review?(jmathies) → review+
Pushed by aklotz@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/82b3706348ae
Add crash reporter annotations to report COM marshaling failure codes; r=jimm
Depends on: 1344187
No longer blocks: 1344905
Depends on: 1345550
Hi Aaron, where would I look for these annotations? Are they collected in a field on the crash report?
Flags: needinfo?(aklotz)
(In reply to David Bolter [:davidb] from comment #26)
> Hi Aaron, where would I look for these annotations? Are they collected in a
> field on the crash report?

Yes, in the metadata tab. For instance: bp-4c57c544-4220-43d4-883c-3ca452170310
Flags: needinfo?(aklotz)
Here are today's aggregations, with the HRESULT constant names substituted for the error codes:

CoGetInterfaceAndReleaseStream failures:

1 E_INVALIDARG 			23 	36.51 %
2 TYPE_E_CANTLOADLIBRARY	16 	25.40 %
3 TYPE_E_LIBNOTREGISTERED 	4 	6.35 %

CoMarshalInterface failures:
1 E_INVALIDARG 	 	 	43 	68.25 %
Here's a breakdown by OS version:

Windows 7
=========

CoGetInterfaceAndReleaseStream failures:
1 E_INVALIDARG 	 	 	23 	53.49 %
2 TYPE_E_CANTLOADLIBRARY 	8 	18.60 %

CoMarshalInterface failures:
1 E_INVALIDARG 	 	 	31 	72.09 %



Windows 10
==========

CoGetInterfaceAndReleaseStream failures:
1 TYPE_E_CANTLOADLIBRARY 	8 	40.00 %
2 TYPE_E_LIBNOTREGISTERED 	4 	20.00 %

It is interesting to note that there are no E_INVALIDARG errors from CoGetInterfaceAndReleaseStream on Windows 10.

CoMarshalInterface failures:
1 E_INVALIDARG 	 	 	12 	60.00 %
(In reply to Aaron Klotz [:aklotz] from comment #29)
> It is interesting to note that there are no E_INVALIDARG errors from
> CoGetInterfaceAndReleaseStream on Windows 10.

I say this because the IStream implementation that we use for that use case is from SHCreateMemStream, which was rewritten for Windows 8 with a more robust implementation. I am wondering whether the Windows 7 implementation is causing us problems in that case. I might try using a different stream implementation to see if that helps at all.
I also did some additional cleanup for robustness in the hope that maybe it will help deal with some of these other errors.
Attachment #8847365 - Flags: review?(jmathies)
This revision includes some more cleanup
Attachment #8847365 - Attachment is obsolete: true
Attachment #8847365 - Flags: review?(jmathies)
Attachment #8847380 - Flags: review?(jmathies)
Comment on attachment 8847380 [details] [diff] [review]
Change mscom::ProxyStream to use CreateStreamOnHGlobal instead of SHCreateMemStream on Windows 7 (r2)

Review of attachment 8847380 [details] [diff] [review]:
-----------------------------------------------------------------

::: ipc/mscom/COMPtrHolder.h
@@ +105,5 @@
>      int bufLen;
>      const BYTE* buf = proxyStream.GetBuffer(bufLen);
>      MOZ_ASSERT(buf || !bufLen);
>      aMsg->WriteInt(bufLen);
> +    if (bufLen) {

Curious, why add this check?

::: ipc/mscom/ProxyStream.cpp
@@ +99,5 @@
> +    if (!stream) {
> +      return nullptr;
> +    }
> +  } else {
> +    HGLOBAL hglobal = ::GlobalAlloc(GMEM_MOVEABLE, aInitBufSize);

we have a helper for HGLOBALs that might be useful -
http://searchfox.org/mozilla-central/source/xpcom/base/nsWindowsHelpers.h#220
Attachment #8847380 - Flags: review?(jmathies) → review+
(In reply to Jim Mathies [:jimm] from comment #33)
> Comment on attachment 8847380 [details] [diff] [review]
> Change mscom::ProxyStream to use CreateStreamOnHGlobal instead of
> SHCreateMemStream on Windows 7 (r2)
> 
> Review of attachment 8847380 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Curious, why add this check?
> 

ProxyStream is now more permissive toward marshaling null COM objects. Instead of returning a failure and crashing IPC, it will now output an empty buffer. In that case I wanted to skip the unnecessary call to WriteBytes. I do suspect that we may be seeing null objects being passed through a11y IPC. While this is not good, I think that crashing in IPC is the wrong place for those problems to be surfaced. If we're passing nulls, I'd rather that the failure be exposed in code that actually points to the real problem, i.e. when a11y actually tries to do something with that null.

> ::: ipc/mscom/ProxyStream.cpp
> @@ +99,5 @@
> > +    if (!stream) {
> > +      return nullptr;
> > +    }
> > +  } else {
> > +    HGLOBAL hglobal = ::GlobalAlloc(GMEM_MOVEABLE, aInitBufSize);
> 
> we have a helper for HGLOBALs that might be useful -
> http://searchfox.org/mozilla-central/source/xpcom/base/nsWindowsHelpers.h#220

I decided against using that because ownership of the HGLOBAL is immediately handed off to the stream in the call to CreateStreamFromHGlobal.
https://hg.mozilla.org/integration/mozilla-inbound/rev/2989d8d208f26e8f699e313fba1e45c8fc993381
Bug 1325834: Make mscom::ProxyStream use CreateStreamOnHGlobal instead of SHCreateMemStream on Windows 7; r=jimm
Depends on: 1348069
This hasn't been seen for quite some time. Resolving.
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Do we need to backport something to Aurora/Beta here still?
Flags: needinfo?(aklotz)
Target Milestone: --- → mozilla55
It's disabled on those channels.
Depends on: 1363127
Depends on: 1380905
Removing leave-open keyword from resolved bugs, per :sylvestre.
Keywords: leave-open
No longer depends on: 1380905
Regressions: 1380905
You need to log in before you can comment on or make changes to this bug.