Closed Bug 1324863 Opened 3 years ago Closed 3 years ago

Crash in mozilla::a11y::DocAccessibleParent::RemoveChildDoc

Categories

(Core :: Disability Access APIs, defect, critical)

Unspecified
Windows 7
defect
Not set
critical

Tracking

()

RESOLVED DUPLICATE of bug 1326084
mozilla54
Tracking Status
firefox50 --- unaffected
firefox51 --- unaffected
firefox52 --- affected
firefox53 --- affected
firefox54 --- fixed

People

(Reporter: marcia, Assigned: tbsaunde)

References

(Depends on 1 open bug)

Details

(4 keywords, Whiteboard: [adv-main54-] aes+)

Crash Data

Attachments

(1 file)

This bug was filed from the Socorro interface and is 
report bp-251dc21e-f125-4301-b257-b37392161219.
=============================================================

Seen while looking at trunk data. This one started using 20161217030205: http://bit.ly/2gZ0frO

Possible regression range: https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=63b447888a6469b9f6ae8f76ac5f0d7c6ea239da&tochange=34a1ab064cb5b868fa75cb74d052e978eb34d6c1

Bug 1321936 is in the regression range - ni on aklotz
Flags: needinfo?(aklotz)
Not related to 1321936. Could be fallout from bug 1314707 or bug 1321935. I don't think we should refer to this as a "regression," however, as cleaning up those bugs probably just means that whatever weird conditions were triggering those bugs are probably triggering this one too.

In particular it looks like the child doc being destroyed does not have a parent.
Depends on: 1258839
Flags: needinfo?(aklotz)
Crash Signature: [@ mozilla::a11y::DocAccessibleParent::RemoveChildDoc] → [@ mozilla::a11y::DocAccessibleParent::RemoveChildDoc] [@nsTArray_Impl<T>::IndexOf<T> | mozilla::a11y::DocAccessibleParent::RemoveChildDoc]
Is a null check the right fix?
That would help with wallpapering over the crash, for sure, however I am concerned about the fact that we even reach that state in the first place.

Trev, what are your thoughts about this crash?
Flags: needinfo?(tbsaunde+mozbugs)
Whiteboard: aes+
127 occurrences across Nightly and Aurora in the past 7 days.
(In reply to Nicholas Nethercote [:njn] (PTO until January 9th) from comment #4)
> 127 occurrences across Nightly and Aurora in the past 7 days.

#2 crash for nightly
Keywords: topcrash-win
#7 topcrash in Nightly 20161230030205. Still waiting for info from tbsaunde.
Trevor is back agreed to take a look at this.
Assignee: nobody → tbsaunde+mozbugs
> In particular it looks like the child doc being destroyed does not have a
> parent.

I see we have DocAccessibleParent::Destroy() calling DocAccessibleparent::Destroy() that suggests there is a document that has the child as its child, but the child doesn't know about the parent.  I'm tempted to go back to the checking patch in bug 1184217 to try and crash as soon as we corrupt the data structure.
Flags: needinfo?(tbsaunde+mozbugs)
<davidb> tbsaunde: are there any more findings we can add to bug 1324863?
<@tbsaunde> davidb: at the moment trying to fix other bugs first awith the hope its a dup of one of the others
#7 topcrash again in Nightly 20170113030227.
I think 100% of these crashes are amd64. Is that interesting?
Flags: needinfo?(tbsaunde+mozbugs)
Flags: needinfo?(tbsaunde+mozbugs)
(In reply to David Bolter [:davidb] from comment #11)
> I think 100% of these crashes are amd64. Is that interesting?

These are all happening in 64-bit builds where we still have a11y getting activated by touchscreens. Fixing that would lessen the severity but wouldn't fix the core problem.
Only about 30% are shutdown crashes, so this is something users see while surfing. user comments back this up.
I'm not really a fan of this, but what's the worst that can happen?
Attachment #8829627 - Flags: review?(dbolter)
Comment on attachment 8829627 [details] [diff] [review]
paper over null parent proxy accessible

Review of attachment 8829627 [details] [diff] [review]:
-----------------------------------------------------------------

Hhmmm okay... note this won't paper all the crashes.
Attachment #8829627 - Flags: review?(dbolter) → review+
(In reply to Jim Mathies [:jimm] from comment #13)
> (In reply to David Bolter [:davidb] from comment #11)
> > I think 100% of these crashes are amd64. Is that interesting?
> 
> These are all happening in 64-bit builds where we still have a11y getting
> activated by touchscreens. Fixing that would lessen the severity but
> wouldn't fix the core problem.

Yeah.

And it seems there is a signature "nsTArray_Impl<T>::IndexOf<T> | mozilla::a11y::DocAccessibleParent::RemoveChildDoc" crash-stats treats separately which shows os arch as x86. So we're not just talking about amd64. E.g. https://crash-stats.mozilla.com/report/index/2bcfd386-c4ca-463f-863b-713ca2170123
Pushed by tsaunders@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/0498497a3282
paper over null parent proxy accessible r=davidb
Don't know if this will help, but I just got a crash with this signature as I closed frozen Nightly from the task bar. Report is here:
https://crash-stats.mozilla.com/report/index/9cebbafb-ca75-4587-bcc7-1aa9b2170124
Marking as sec bug.  Note that the crash address is mostly non-0 and non-nullptr-plus-offset.  Also, there are clear UAF signatures, and this likely would be used as object pointer for a method call (i.e. the vptr table will be used for execution addresses).

The wallpaper that landed has no chance of fixing this.  The problem appears to be that nothing is holding a reference to the object returned by Parent() (I presume).
Group: core-security
Flags: needinfo?(tbsaunde+mozbugs)
https://hg.mozilla.org/mozilla-central/rev/0498497a3282
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla54
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(In reply to Randell Jesup [:jesup] from comment #20)
> Marking as sec bug.  Note that the crash address is mostly non-0 and
> non-nullptr-plus-offset.  Also, there are clear UAF signatures, and this
> likely would be used as object pointer for a method call (i.e. the vptr
> table will be used for execution addresses).
> 
> The wallpaper that landed has no chance of fixing this.  The problem appears

that is what was sort of obscurely said in earlier comments, but I was told it is 0 some of the time, I'm not sure if that is a different issue from the UAF or just random chance.  So obviously it doesn't fix things, but David seems to think avoiding the null crashes may be good, and I don't see it being any worse so *shrug*.

> to be that nothing is holding a reference to the object returned by Parent()
> (I presume).

well, its that something should be clearing the mParent pointer, but isn't.  Unfortunately tracking that down hasn't proved to be simple yet.
Flags: needinfo?(tbsaunde+mozbugs)
I think at this point we can say this is actually a dup
Status: REOPENED → RESOLVED
Closed: 3 years ago3 years ago
Resolution: --- → DUPLICATE
Duplicate of bug: 1326084
Comment on attachment 8829627 [details] [diff] [review]
paper over null parent proxy accessible

Review of attachment 8829627 [details] [diff] [review]:
-----------------------------------------------------------------

Parent is bad (and usually non-null) so this doesn't help us really.

::: accessible/ipc/DocAccessibleParent.h
@@ +113,5 @@
>    {
> +    ProxyAccessible* parent = aChildDoc->Parent();
> +    MOZ_ASSERT(parent);
> +    if (parent) {
> +      aChildDoc->Parent()->ClearChildDoc(aChildDoc);

Why not reuse the parent variable here? (Not that it makes a difference)
(In reply to David Bolter [:davidb] from comment #24)

> Parent is bad (and usually non-null) so this doesn't help us really.

Oh and comment 20. Thanks Randell.
Whiteboard: aes+ → [adv-main54-] aes+
Group: core-security
You need to log in before you can comment on or make changes to this bug.