Closed Bug 1383501 Opened 7 years ago Closed 6 years ago

PBrowserParent::RecvPDocAccessibleConstructor Constructing a top-level PDocAccessible with null COM

Categories

(Core :: Disability Access APIs, defect, P1)

56 Branch
defect

Tracking

()

RESOLVED FIXED
mozilla58
Tracking Status
firefox-esr52 --- unaffected
firefox56 --- unaffected
firefox57 + fixed
firefox58 --- fixed

People

(Reporter: fahimazulfath.a, Assigned: jimm)

References

Details

(Keywords: crash, crashreportid, topcrash, Whiteboard: aes+)

Crash Data

Attachments

(12 files, 1 obsolete file)

59.97 KB, image/png
Details
10.49 KB, patch
jimm
: review+
Details | Diff | Splinter Review
1.90 KB, patch
eeejay
: review+
Details | Diff | Splinter Review
12.55 KB, patch
jimm
: review+
Details | Diff | Splinter Review
4.22 KB, patch
jimm
: review+
Details | Diff | Splinter Review
2.27 KB, patch
handyman
: review+
Details | Diff | Splinter Review
2.13 KB, patch
jimm
: review+
Details | Diff | Splinter Review
1.18 KB, patch
yzen
: review+
Details | Diff | Splinter Review
2.20 KB, patch
eeejay
: review+
Details | Diff | Splinter Review
1.11 KB, patch
jimm
: review+
Details | Diff | Splinter Review
9.31 KB, patch
jimm
: review+
Details | Diff | Splinter Review
3.31 KB, patch
jimm
: review+
Details | Diff | Splinter Review
Attached image Testcase1.png
User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:56.0) Gecko/20100101 Firefox/56.0
Build ID: 20170722030204

Steps to reproduce:

Step to Reproduce:
1) Open any different site in 4 tabs,
2) In the 5th tab open http://www.daenotes.com/
3) Wait for secs to load.



Actual results:

5th tab crashed.


Expected results:

5th tab should load the site properly.
Thanks!
Crash Signature: [@ IPCError-browser | PBrowserParent::RecvPDocAccessibleConstructor Constructing a top-level PDocAccessible with null COM ]
Component: Untriaged → Disability Access APIs
Keywords: crash, crashreportid
Product: Firefox → Core
Aaron, is this one of our known crashes?
Flags: needinfo?(aklotz)
Priority: -- → P1
Whiteboard: aes+
Hi Fahima, your report indicates the accessibility service is active. Are you running any of the following?

1) screen readers or other assistive technology
2) accessing the machine through remote desktop
3) running any desktop utilities that might trigger this like form fillers or spyware protection

Also, would you mind posting your about:support text?

Thanks!
Flags: needinfo?(fahimazulfath.a)
Depends on: 1383260
Flags: needinfo?(aklotz)
Marco does the STR work for you?
Flags: needinfo?(mzehe)
I don't get a crash. I cannot reproduce this.
Flags: needinfo?(mzehe)
Any theories here? This is the #2 crash sig in current nightlies.
Flags: needinfo?(surkov.alexander)
Flags: needinfo?(eitan)
Flags: needinfo?(aklotz)
This happens when the content process count (4) is exhausted. Something about loading a second doc in the first process?
I am really annoyed that this is still happening. Maybe I'll try to MOZ_DIAGNOSTIC_ASSERT the shit out of it.
Flags: needinfo?(aklotz)
Flags: needinfo?(fahimazulfath.a)
Flags: needinfo?(eitan)
from crash stats:
- affects all win platforms
- predominately a startup crash
- UIA/UNKNOWN are the predominate clients
I wonder if I could repro this by re-enabling touchscreen a11y instantiation (it uses UIA)...
Jim can we facet crash data on out of process instantiator to help find potential STR?
Flags: needinfo?(jmathies)
I'm wondering if somebody is trying to use Narrator with Firefox. That uses UIA exclusively AFAIK.
Summary: A tab got crashed on opening a site → PBrowserParent::RecvPDocAccessibleConstructor Constructing a top-level PDocAccessible with null COM
(In reply to David Bolter [:davidb] (NeedInfo me for attention) from comment #13)
> Jim can we facet crash data on out of process instantiator to help find
> potential STR?

once bug 1388866 gets fixed.
Flags: needinfo?(jmathies)
Assignee: nobody → aklotz
Status: NEW → ASSIGNED
Flags: needinfo?(surkov.alexander)
Attachment #8895954 - Flags: review?(jmathies) → review+
This is the #4 windows topcrash in Nightly 20170818100226, with 76 occurrences so far.

It's still present but less frequent in Nightly 20170819100442 (6 occurrences so far) and Nightly 20170820100343 (2 occurrences so far).

aklotz, do those numbers seem reasonable to you?
Flags: needinfo?(aklotz)
(It looks like rscp_bg.exe is the only out-of-proc accessibility client that shows up in reports? (less than 25% though))
HKEY_CLASSES_ROOT presents a merged view of HKEY_LOCAL_MACHINE and HKEY_CURRENT_USER such that HKEY_CURRENT_USER overrides HKEY_LOCAL_MACHINE.

We should always be using the HKEY_LOCAL_MACHINE view, which is located at HKEY_LOCAL_MACHINE\SOFTWARE\Classes.
Attachment #8899564 - Flags: review?(eitan)
Two comment says searching in baidu.com crashes the browser.
Attachment #8899564 - Flags: review?(eitan) → review+
https://hg.mozilla.org/integration/mozilla-inbound/rev/63750be3d228a98cb05b0b569d1df81cdaf4bceb
Bug 1383501: Use HKEY_LOCAL_MACHINE\SOFTWARE\Classes for resolving COM registration instead of HKEY_CLASSES_ROOT; r=eeejay
The good news is that so far with today's Nightly, I haven't seen any new reports of the typical crash that we see with this signature.

The bad news is that there are two reports from today's Nightly build that are different from any that we've seen before:

https://crash-stats.mozilla.com/report/index/3e6b076a-1ba2-4909-a926-71b610170825
https://crash-stats.mozilla.com/report/index/9115ee82-6028-4d63-ac90-eafd80170825

(Same user, two different crashes)

Their browser is displaying the show script dialog. Pumping the main event loop is causing incoming IPC messages to be received. CoGetInterfaceAndReleaseStream has returned 0x8001010D.

Neither of these crashes have complete call stacks for their browser main threads, but what we must be seeing here is that, higher up this stack, we are already handling a sent message. Then we receive the IPC during the event loop spinning, triggering RecvPDocAccessibleConstructor and then failing (because on the COM STA we can't already be inside a SendMessage handler).

Note that this user has an UNKNOWN a11y instantiator (from the perspective of a11y::Compatibility, though the external instantiator is being shown as explorer.exe). We don't enable my compat hack for this for UNKNOWN clients.

I think we either need to figure out a way to fail gracefully when we hit this error code, or revisit that compat hack and consider enabling it for unknown a11y clients.
Flags: needinfo?(aklotz)
I guess a better question is, why are we seeing a modal slow-script dialog under e10s?
Hmmm... looks like addons.
This is the #3 Windows topcrash in Nightly 20170825100126, with 103 occurrences so far.
(In reply to Aaron Klotz [:aklotz] (a11y work receiving priority right now, please send interceptor reviews to dmajor or handyman) from comment #25)
 
> I think we either need to figure out a way to fail gracefully when we hit
> this error code, or revisit that compat hack and consider enabling it for
> unknown a11y clients.

There are a lot of unknowns that make it a tough decision. My thinking is that if we can increase stability by enabling the compat hack for unknowns then perhaps we should go that route so we can ship, then later revisit removal another time? As for failing gracefully, should we do that regardless?

The crash correlations tab is currently empty... not sure how to best tease out addon correlations. Jim got any tricks?
Flags: needinfo?(jmathies)
(In reply to David Bolter [:davidb] (NeedInfo me for attention) from comment #29)
> (In reply to Aaron Klotz [:aklotz] (a11y work receiving priority right now,
> please send interceptor reviews to dmajor or handyman) from comment #25)
>  
> > I think we either need to figure out a way to fail gracefully when we hit
> > this error code, or revisit that compat hack and consider enabling it for
> > unknown a11y clients.
> 
> There are a lot of unknowns that make it a tough decision. My thinking is
> that if we can increase stability by enabling the compat hack for unknowns
> then perhaps we should go that route so we can ship, then later revisit
> removal another time? As for failing gracefully, should we do that
> regardless?
> 
> The crash correlations tab is currently empty... not sure how to best tease
> out addon correlations. Jim got any tricks?

Not seeing much for correlations - 

https://crash-stats.mozilla.com/search/?signature=%3DIPCError-browser%20%7C%20PBrowserParent%3A%3ARecvPDocAccessibleConstructor%20Constructing%20a%20top-level%20PDocAccessible%20with%20null%20COM&product=Firefox&version=57.0a1&date=%3E%3D2017-08-25T14%3A02%3A00.000Z&date=%3C2017-08-28T14%3A02%3A00.000Z&_sort=-date&_facets=signature&_facets=accessibility_client&_facets=accessibility_in_proc_client&_facets=platform_version&_facets=uptime&_facets=addons&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-signature

the addons present are our system addons. Also various other facets here, not much to go on unfortunately.
Flags: needinfo?(jmathies)
Attached patch More annotationsSplinter Review
This patch adds a few new categories of crash annotations:

* The current activation context handle at the time of unmarshaling vs the xul.dll activation context handle.
* Recording of whether a11y thinks that the handler is enabled;
* Expected vs actual length of incoming marshaled proxy stream;
* Handler registration info.
Attachment #8901979 - Flags: review?(jmathies)
Attachment #8901979 - Flags: review?(jmathies) → review+
Blocks: 1394874
https://hg.mozilla.org/integration/mozilla-inbound/rev/7fd5b91b024dc5386be36c7457d30234b50b10fe
Bug 1383501: Restrict mscom::MainThreadRuntime crash annotations to chrome and content to avoid false-positive leakchecks in gpu process; r=bustage CLOSED TREE
HKEY_CLASSES_ROOT is a merged view of HKLM\SOFTWARE\Classes and HKCU\SOFTWARE\Classes. We should replace that with HKLM\SOFTWARE\Classes to remove ambiguity and because sandboxed content probably can't see HKCU properly.
Attachment #8902487 - Flags: review?(jmathies)
Depends on: 1395113
Attachment #8902487 - Flags: review?(jmathies) → review+
https://hg.mozilla.org/integration/mozilla-inbound/rev/10d42af6319d08daca29d5d163f301696b5ff1fd
Bug 1383501: Change remaining references to HKEY_CLASSES_ROOT to HKEY_LOCAL_MACHINE in mscom-related code; r=jimm
Attachment #8903278 - Flags: review?(davidp99) → review+
https://hg.mozilla.org/integration/mozilla-inbound/rev/eb75dcb49495d0aa4c4eac03193bf0f92fb6348e
Bug 1383501: Move a crash annotation so that it does not mask a previous annotation with the same key; r=handyman
Comment on attachment 8905119 [details] [diff] [review]
Add diagnostic assert for non-null com proxy

Review of attachment 8905119 [details] [diff] [review]:
-----------------------------------------------------------------

thanks
Attachment #8905119 - Flags: review?(yzenevich) → review+
Pushed by aklotz@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/31ba8f169c5d
Assert that com proxy is non-null before calling SendPDocAccessibleConstructor; r=yzen
Attachment #8905118 - Flags: review?(jmathies) → review+
https://hg.mozilla.org/integration/mozilla-inbound/rev/7922386758a48b3eceef1291222e17de610e610a
Bug 1383501: Move stream creation below buffer size check in mscom::ProxyStream::ProxyStream; r=jimm
Attached patch Add fences (obsolete) — Splinter Review
I noticed that after the diagnostic assert landed, we haven't seen any crashes for the past few nightly builds. This makes me wonder if perhaps the SetEvent/WaitForSingleObject synchronization is insufficient, resulting in data races. I'd like to add fences to this code to ensure that any data written by the closure is visible to the calling thread before EnsureMTA returns.
Attachment #8905779 - Flags: review?(jmathies)
There are 19 crashes (from 3 installs) in nightly 57 with buildid 20170907220212 (see [1]).
The crash reason is always MOZ_RELEASE_ASSERT(!holder.IsNull()) which has been added in [2].

[1] http://bit.ly/2xh9H6d
[2] https://hg.mozilla.org/mozilla-central/rev/31ba8f169c5d
Crash Signature: [@ IPCError-browser | PBrowserParent::RecvPDocAccessibleConstructor Constructing a top-level PDocAccessible with null COM ] → [@ IPCError-browser | PBrowserParent::RecvPDocAccessibleConstructor Constructing a top-level PDocAccessible with null COM ] [@ mozilla::a11y::DocAccessible::DoInitialUpdate]
Comment on attachment 8905779 [details] [diff] [review]
Add fences

Canceling. The assertions were showing up under a different signature than I was looking for.
Attachment #8905779 - Flags: review?(jmathies)
Spun off bug 1399557 for DoInitialUpdate needs.
Crash Signature: [@ IPCError-browser | PBrowserParent::RecvPDocAccessibleConstructor Constructing a top-level PDocAccessible with null COM ] [@ mozilla::a11y::DocAccessible::DoInitialUpdate] → [@ IPCError-browser | PBrowserParent::RecvPDocAccessibleConstructor Constructing a top-level PDocAccessible with null COM ]
In some dumps I found that the location of the DLL in the registry does not match the location of the remaining Firefox binaries. This may cause problems due to sandboxing (since our sandbox only recognized the current install's binaries), or perhaps outdated binaries in the "other" installation.

This could happen if somebody is concurrently running multiple installs of the same Firefox release channel. Or if one installation was done with the installer, while the other one was a simple unzip.

By verifying that the handler in the registry matches the binary path of our install, we can disable the handler in these situations.
Attachment #8905779 - Attachment is obsolete: true
Attachment #8908301 - Flags: review?(eitan)
(In reply to Aaron Klotz [:aklotz] (a11y work receiving priority right now, please send interceptor reviews to dmajor or handyman) from comment #53)

> By verifying that the handler in the registry matches the binary path of our
> install, we can disable the handler in these situations.

Assume you mean *does not* match the binary path.
Comment on attachment 8908301 [details] [diff] [review]
Add check for matching handler binary path to a11y::IsHandlerRegistered

Review of attachment 8908301 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good!
Attachment #8908301 - Flags: review?(eitan) → review+
(In reply to Eitan Isaacson [:eeejay] from comment #54)
> (In reply to Aaron Klotz [:aklotz] (a11y work receiving priority right now,
> please send interceptor reviews to dmajor or handyman) from comment #53)
> 
> > By verifying that the handler in the registry matches the binary path of our
> > install, we can disable the handler in these situations.
> 
> Assume you mean *does not* match the binary path.

Errr, yeah.
This crash signature is still present in Nightly 20170917100334, where it is the #7 Windows topcrash.
Judging by the remaining crash reports for this signature, this is the final class of failures that is resulting in crashes. Asserting in content will give us better data.
Attachment #8909374 - Flags: review?(jmathies)
Attachment #8909374 - Flags: review?(jmathies) → review+
There are 5 crashes from the same installation with signature "<T>::operator() | mozilla::detail::RunnableFunction<T>::Run".
The crash reason is "MOZ_RELEASE_ASSERT(marshalResult != ((HRESULT)0x80070057L))" (see [1])

[1] http://bit.ly/2wvOG88
Crash Signature: [@ IPCError-browser | PBrowserParent::RecvPDocAccessibleConstructor Constructing a top-level PDocAccessible with null COM ] → [@ IPCError-browser | PBrowserParent::RecvPDocAccessibleConstructor Constructing a top-level PDocAccessible with null COM]
Setting 57 status to fix-optional since we will be making a special case final call on windows e10s a11y support early in beta cycle.
Crash Signature: [@ IPCError-browser | PBrowserParent::RecvPDocAccessibleConstructor Constructing a top-level PDocAccessible with null COM] → [@ IPCError-browser | PBrowserParent::RecvPDocAccessibleConstructor Constructing a top-level PDocAccessible with null COM] [@ <T>::operator() | mozilla::detail::RunnableFunction<T>::Run ]
Attachment #8909972 - Flags: review?(jmathies) → review+
See Also: → 1403358
[Tracking Requested - why for this release]:
this signature accounts for a third of all content crashes in first data from the rollout of 57.0b3 to the general beta population.
Hi David, are we ready to make a decision now (from comment 66)? Philipp noted in the comment above that with a small beta rollout this signature accounts for a third of all content crashes. That is significant enough and a "fix-optional" for 57 doesn't seem like the right decision anymore. Thoughts?
Flags: needinfo?(dbolter)
Keywords: topcrash
Hi Ritu, you are right, we'll need to fix this before we can say yes to ship on 57. (NI jimm for awareness)
Flags: needinfo?(dbolter) → needinfo?(jmathies)
Flags: needinfo?(jmathies)
(In reply to David Bolter [:davidb] (NeedInfo me for attention) from comment #71)
> Hi Ritu, you are right, we'll need to fix this before we can say yes to ship
> on 57. (NI jimm for awareness)

:davidb, we just talked about this in channel and it would be good to get this fix in no later than GTB for Beta 8 which is next Thursday, Oct 12th so this needs to land by Oct 11th. Thank you!
From Aaron: Patches for that bug are landing under different bugs on nightly because the crash signatures are different on Nightly. Bug 1399557 should made a big difference here and I plan to request uplift once I see some confirmation on crash-stats.
The `<T>::operator() | mozilla::detail::RunnableFunction<T>::Run` signature is still occurring; it's the #2 Windows topcrash in Nightly 20171003100226, which is currently the most recent Nightly build.
(In reply to Nicholas Nethercote [:njn] from comment #75)
> The `<T>::operator() | mozilla::detail::RunnableFunction<T>::Run` signature
> is still occurring; it's the #2 Windows topcrash in Nightly 20171003100226,
> which is currently the most recent Nightly build.

Hoping this is fixed too... (currently no crash reports for builds more recent than 20171003220138)
Assignee: aklotz → jmathies
Attachment #8915221 - Flags: review?(jmathies) → review+
request is for path labeled 'Handle null COM proxies'
Keywords: checkin-needed
s/path/patch in previous comment.

We're going to request uplift on this so here's a beta try test run:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=e064193d4957d2b0112100e330b22aca0d538cab
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/e2a6e2ddfa18
Do not crash when TabParent::RecvPDocAccessibleConstructor receives a null COM proxy sent to the parent process. r=jimm
Keywords: checkin-needed
Jim, no rush but please fill the uplift request so that we can take it to beta. Merci!
Flags: needinfo?(jmathies)
https://hg.mozilla.org/mozilla-central/rev/e2a6e2ddfa18
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla58
This signature isn't fully addressed yet so lets keep this open a bit longer.
Status: RESOLVED → REOPENED
Keywords: leave-open
Resolution: FIXED → ---
Comment on attachment 8915221 [details] [diff] [review]
Handle null COM proxies

Approval Request Comment
[Feature/Bug causing the regression]:
e10s
[User impact if declined]:
crashy beta
[Is this code covered by automated tests?]:
no
[Has the fix been verified in Nightly?]:
yes
[Needs manual test from QE? If yes, steps to reproduce]: 
no
[List of other uplifts needed for the feature/fix]:
none
[Is the change risky?]:
not very
[Why is the change risky/not risky?]:
change is designed to improve stability
[String changes made/needed]:
none
Flags: needinfo?(jmathies)
Attachment #8915221 - Flags: approval-mozilla-beta?
Comment on attachment 8915221 [details] [diff] [review]
Handle null COM proxies

Fixes an e10s crash, Beta57+
Attachment #8915221 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
https://hg.mozilla.org/releases/mozilla-beta/rev/fce8e9263a01

Leaving 57 set to affected under the assumption that more patches are yet to come.
Target Milestone: mozilla58 → ---
Epic. Aaron, would you like to have a separate bug for the lower frequency signature?
Flags: needinfo?(aklotz)
Now that we have bug 1411685, I'm removing that signature from this one.
Crash Signature: [@ IPCError-browser | PBrowserParent::RecvPDocAccessibleConstructor Constructing a top-level PDocAccessible with null COM] [@ <T>::operator() | mozilla::detail::RunnableFunction<T>::Run ] → [@ IPCError-browser | PBrowserParent::RecvPDocAccessibleConstructor Constructing a top-level PDocAccessible with null COM]
Flags: needinfo?(aklotz)
Thanks (I meant to do that).

This bugs spike has been fixed. Aaron would you like to leave this open for the remaining crashes -- and lower the priority, or shall we close it out and open a follow up for the remaining? Leaving this open and P1 will be a false flag to triage programs.
Flags: needinfo?(jmathies)
Flags: needinfo?(aklotz)
Pretty sure the plan is to break remaining work out to other bugs.
Flags: needinfo?(jmathies)
Status: REOPENED → RESOLVED
Closed: 7 years ago6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla58
Flags: needinfo?(aklotz)
This signature is topcrash #7 in the Windows nightly 20180125104809,
with 42 crashes from 3 different installations.
Flags: needinfo?(jmathies)
This is Nightly-only.
Flags: needinfo?(jmathies)
I'll probably remove the signature(s) from this bug and shift them over to other ones that are still open with respect to the overarching issue.
Removing leave-open keyword from resolved bugs, per :sylvestre.
Keywords: leave-open
You need to log in before you can comment on or make changes to this bug.