Closed Bug 1309236 Opened 8 years ago Closed 8 years ago

Crash in mozilla::a11y::DocAccessibleParent::RecvMsaaID

Categories

(Core :: Disability Access APIs, defect)

Unspecified
Windows 10
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla52
Tracking Status
firefox51 + fixed
firefox52 --- fixed

People

(Reporter: marcia, Assigned: bugzilla)

References

(Blocks 1 open bug)

Details

(Keywords: crash, topcrash, Whiteboard: aes+)

Crash Data

Attachments

(1 file, 1 obsolete file)

This bug was filed from the Socorro interface and is 
report bp-d29228bf-c8a5-4dec-b5ec-4211a2161011.
=============================================================

Seen while looking at crash stats: http://bit.ly/2dGmch8. Regression that started in 20161006030208.

Possible regression from work in Bug 1304449? ni on Aaron Klotz.
Flags: needinfo?(aklotz)
I have not yet encountered this crash.
Blocks: 1258839
This is saying that a DocAccessibleParent has a null wrapper, which doesn't make much sense to me. Somehow the DocAccessibleParent would need to be Destroy()ed between its constructor and the call to SendMsaaID. AFAICT that would need to be initiated by content itself, which obviously we don't do.
Flags: needinfo?(aklotz)
(In reply to Aaron Klotz [:aklotz] from comment #2)
> This is saying that a DocAccessibleParent has a null wrapper, which doesn't
> make much sense to me. Somehow the DocAccessibleParent would need to be
> Destroy()ed between its constructor and the call to SendMsaaID. AFAICT that
> would need to be initiated by content itself, which obviously we don't do.

Trev, can you think of any kind of race that might be occurring such that we could reach this state?
Flags: needinfo?(tbsaunde+mozbugs)
(In reply to Aaron Klotz [:aklotz] from comment #3)
> (In reply to Aaron Klotz [:aklotz] from comment #2)
> > This is saying that a DocAccessibleParent has a null wrapper, which doesn't
> > make much sense to me. Somehow the DocAccessibleParent would need to be
> > Destroy()ed between its constructor and the call to SendMsaaID. AFAICT that
> > would need to be initiated by content itself, which obviously we don't do.

I agree it doesn't make a lot of sense.

The one theory I have and I'm skeptical of it is this.  Both MsaaID and PDocAccessibleConstructor are async, so it is possible the child sends both before the parent handles the constructor message.  If that happens, and the constructor message fails and returns false I'm not sure exactly what happens.  However it feels like the answer shouldn't be "haha methods get called on a half constructed actor" ;) so this seems unlikely.
Flags: needinfo?(tbsaunde+mozbugs)
100% of these crashes have AddonsShouldHaveBlockedE10S = 1. I am highly suspicious about that.
Whiteboard: aes+
I still don't understand how this particular crash signature can happen, but I have managed to reproduce a case where we try to operate on a proxied accessible whose MSAA ID hasn't been set yet.

A client could call into chrome and navigate to that proxied accessible and try to resolve it during the window of time between receipt of PDocAccessibleConstructor and RecvMsaaID. The accessible's uninitialized MSAA ID would be kNoID, which then triggers that same stack overflow from bug 1308397.

As annoying as it is to have dummy parameters on non-Windows, I think this is the more robust path.
Assignee: nobody → aklotz
Status: NEW → ASSIGNED
Attachment #8803018 - Flags: review?(tbsaunde+mozbugs)
(In reply to Aaron Klotz [:aklotz] from comment #6)
> Created attachment 8803018 [details] [diff] [review]
> Send MSAA ID along with PDocAccessibleConstructor
> 
> I still don't understand how this particular crash signature can happen, but
> I have managed to reproduce a case where we try to operate on a proxied
> accessible whose MSAA ID hasn't been set yet.
> 
> A client could call into chrome and navigate to that proxied accessible and
> try to resolve it during the window of time between receipt of
> PDocAccessibleConstructor and RecvMsaaID. The accessible's uninitialized
> MSAA ID would be kNoID, which then triggers that same stack overflow from
> bug 1308397.

makes sense, given the current setup can't clients also actually get there hands on a accessible for a document that doesn't yet have a proxy?  Because we fire events that let them see the accessible in RecvShowEvent(), but don't set the proxy up until RecvCOMProxy()?

> As annoying as it is to have dummy parameters on non-Windows, I think this
> is the more robust path.

annoying but seems reasonable to me.
Comment on attachment 8803018 [details] [diff] [review]
Send MSAA ID along with PDocAccessibleConstructor

>-          static_cast<TabChild*>(tabChild.get())->
>-            SendPDocAccessibleConstructor(ipcDoc, nullptr, 0);
> 
> #if defined(XP_WIN)
>+          static_cast<TabChild*>(tabChild.get())->
>+            SendPDocAccessibleConstructor(ipcDoc, nullptr, 0,
>+                                          AccessibleWrap::GetChildIDFor(docAcc));

the only part you actually need to ifdef is getting the id right? could you do that so we always compile as much as we can.

>           IAccessibleHolder holder(CreateHolderFromAccessible(docAcc));
>           ipcDoc->SendCOMProxy(holder);

see previous concern that events are fired on the target document before it gets a proxy.

>       if (tabChild) {
>-        static_cast<TabChild*>(tabChild.get())->
>-          SendPDocAccessibleConstructor(ipcDoc, parentIPCDoc, id);
> #if defined(XP_WIN)
>         MOZ_ASSERT(parentIPCDoc);

same comment on the ifdef stuff here.

>     /**
>      * Tell the parent process a new accessible document has been created.
>      * aParentDoc is the accessible document it was created in if any, and
>      * aParentAcc is the id of the accessible in that document the new document
>      * is a child of.

document the new arg?
Attachment #8803018 - Flags: review?(tbsaunde+mozbugs) → review+
Currently top crash in nightly.
Keywords: topcrash
Addressed comments, carrying forward r+.

Agreed on the SendCOMProxy point, however I think I'm going to need to put some more thought into that one.

Moving it into the constructor complicates things as on the surface it looks like we would need to change the constructor from async to sync, which is annoying to have to do even on platforms that don't need that.

The good news is that encountering a null COM proxy should be more robust (due to the lazy loading stuff), whereas the MSAA ID is critical.
Attachment #8803018 - Attachment is obsolete: true
Attachment #8805238 - Flags: review+
https://hg.mozilla.org/mozilla-central/rev/0694957ec094
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla52
Depends on: 1314310
Should we uplift this to 51?
Crash Signature: [@ mozilla::a11y::DocAccessibleParent::RecvMsaaID] → [@ mozilla::a11y::DocAccessibleParent::RecvMsaaID] [@ mozilla::a11y::ProxyAccessible::GetCOMInterface ] [@ mozilla::a11y::AccessibleWrap::GetIAccessibleFor ]
Flags: needinfo?(aklotz)
Aaron or Trevor, can you request uplift to aurora 51? Thanks.
Flags: needinfo?(tbsaunde+mozbugs)
Alright. FWIW this feature is disabled on 51 but it looks like there are enough people force enabling that it is an issue.
Flags: needinfo?(tbsaunde+mozbugs)
Flags: needinfo?(aklotz)
Comment on attachment 8805238 [details] [diff] [review]
Send MSAA ID along with PDocAccessibleConstructor (r2)

Approval Request Comment
[Feature/regressing bug #]: Bug 1304449
[User impact if declined]: Crashes on Windows with a11y or touchscreens, though this should only affect users who have force-enabled e10s.
[Describe test coverage new/current, TreeHerder]: Currently in 52, crashes are gone there.
[Risks and why]: Low, only affects users who have force-enabled e10s.
[String/UUID change made/needed]: None
Attachment #8805238 - Flags: approval-mozilla-aurora?
Comment on attachment 8805238 [details] [diff] [review]
Send MSAA ID along with PDocAccessibleConstructor (r2)

Fix a crash. Take it in 51 aurora.
Attachment #8805238 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
You need to log in before you can comment on or make changes to this bug.