Closed Bug 1240893 Opened 4 years ago Closed 3 years ago

crash in PLDHashTable::Remove | mozilla::a11y::AccessibleWrap::Shutdown

Categories

(Core :: Disability Access APIs, defect, critical)

Unspecified
Windows NT
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla53
Tracking Status
firefox-esr45 --- wontfix
firefox51 + wontfix
firefox52 + fixed
firefox53 + fixed

People

(Reporter: davidb, Assigned: tbsaunde)

Details

(4 keywords, Whiteboard: [post-critsmash-triage][adv-main52+] aes+ (esr-45 is only a null-deref))

Crash Data

Attachments

(2 files)

This bug was filed from the Socorro interface and is 
report bp-52b6bac0-ec3e-43d7-a740-bb1442160119.
=============================================================

There are reports for 45 and 46.

More reports here: https://crash-stats.mozilla.com/report/list?product=Firefox&signature=PLDHashTable%3A%3ARemove+%7C+mozilla%3A%3Aa11y%3A%3AAccessibleWrap%3A%3AShutdown

From Doug: AccessibleWrap::mDoc (which is a weak reference) is being released before AccessibleWrap::Shutdown is called.
Crash volume for signature 'PLDHashTable::Remove | mozilla::a11y::AccessibleWrap::Shutdown':
 - nightly (version 50): 0 crash from 2016-06-06.
 - aurora  (version 49): 0 crash from 2016-06-07.
 - beta    (version 48): 0 crash from 2016-06-06.
 - release (version 47): 0 crash from 2016-05-31.
 - esr     (version 45): 94 crashes from 2016-04-07.

Crash volume on the last weeks:
             Week N-1   Week N-2   Week N-3   Week N-4   Week N-5   Week N-6   Week N-7
 - nightly          0          0          0          0          0          0          0
 - aurora           0          0          0          0          0          0          0
 - beta             0          0          0          0          0          0          0
 - release          0          0          0          0          0          0          0
 - esr              3         20          5         17          4          8          0

Affected platform: Windows
should it be called wfm based on comment #1?
Whiteboard: aes+
Hey Trevor let me know if you need a crash dump.
Flags: needinfo?(tbsaunde+mozbugs)
This is #18 topcrash on Aurora in the past day, with 34 crashes across 23 installations.
Flags: needinfo?(aklotz)
#7 in Nightly 20161202030204.
#12 in Nightly 20161209030212.
This is still showing up in the latest Aurora builds, so it hasn't been affected by other recent a11y fixes.
This has been around for a while however a11y+e10s work appears to have regressed the rate. it's currently #8 on aurora.

https://crash-stats.mozilla.com/search/?product=Firefox&version=52.0a2&platform=Windows&date=%3E%3D2016-12-13T16%3A48%3A00.000Z&date=%3C2016-12-20T16%3A48%3A00.000Z&_sort=-date&_facets=signature&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-signature

Clearing old nis and adding an ni to davidb. David, does Trevor have cycles for this? I'd like to find an owner to investigate since I'm betting this is going to get flagged as a top crash once we hit beta.
Flags: needinfo?(tbsaunde+mozbugs)
Flags: needinfo?(dbolter)
Flags: needinfo?(aklotz)
Trevor said he'd take a look.
Assignee: nobody → tbsaunde+mozbugs
Flags: needinfo?(dbolter)
Crash volume for signature 'PLDHashTable::Remove | mozilla::a11y::AccessibleWrap::Shutdown':
 - nightly (version 53): 1316 crashes from 2016-11-14.
 - aurora  (version 52): 2176 crashes from 2016-11-14.
 - beta    (version 51): 581 crashes from 2016-11-14.
 - release (version 50): 0 crashes from 2016-11-01.
 - esr     (version 45): 589 crashes from 2016-07-06.

Crash volume on the last weeks (Week N is from 01-02 to 01-08):
            W. N-1  W. N-2  W. N-3  W. N-4  W. N-5  W. N-6  W. N-7
 - nightly     154     134     179     234     252     176     141
 - aurora      301     378     362     352     363     352       0
 - beta         56      69     103     124     117      59      39
 - release       0       0       0       0       0       0       0
 - esr          36      57      35      22      33      30      32

Affected platform: Windows

Crash rank on the last 7 days:
           Browser   Content   Plugin
 - nightly #255      #8
 - aurora  #312      #7
 - beta    #5199     #40
 - release
 - esr     #410
#6 topcrash in Nightly 20170106030204.
We need to make sure that the accessible lives longer than the document it is in, and if the cycle collector ends up breaking a cycle the accessible is shut down before it drops its ref to the document.  I'm not sure this is the cause here, but it seems possible.
Attachment #8825501 - Flags: review?(dbolter)
Comment on attachment 8825501 [details] [diff] [review]
ensure accessibles are shutdown before their document

Review of attachment 8825501 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks
Attachment #8825501 - Flags: review?(dbolter) → review+
This is... gross, but I don't see a better alternative :(
Attachment #8827668 - Flags: review?(bugs)
oh, the other patch made doc->doc cycle.
Comment on attachment 8827668 [details] [diff] [review]
fixup refcount logging

FWIW, DOM tree has a bit unique setup for similar tree structure.
Document doesn't own itself, but all the child nodes do own it (that happens via NodeInfos). NodeInfos aren't unlinked, but released only when nodes are deleted.
That way OwnerDoc() returns always non-null value.

But perhaps similar setup isn't needed here.
Attachment #8827668 - Flags: review?(bugs) → review+
(In reply to Olli Pettay [:smaug] (review request backlog because of a work week) from comment #16)
> oh, the other patch made doc->doc cycle.

yeah, but its also important it took a ref before the ctor got to the most derived class, I actually caused a similar problem a while back without the cycle.  It seems like it would be good to handle that better, but I don't have time to deal with it I'm afraid.

(In reply to Olli Pettay [:smaug] (review request backlog because of a work week) from comment #17)
> Comment on attachment 8827668 [details] [diff] [review]
> fixup refcount logging
> 
> FWIW, DOM tree has a bit unique setup for similar tree structure.
> Document doesn't own itself, but all the child nodes do own it (that happens
> via NodeInfos). NodeInfos aren't unlinked, but released only when nodes are
> deleted.
> That way OwnerDoc() returns always non-null value.
> 
> But perhaps similar setup isn't needed here.

yeah, there's no need at this point for the second object, so this is a little trickier I suspect.  I considered some special smart pointer thing that in this one case wouldn't hold a ref, but it just seemed like too much work, and not much better.

The other difference seems to be nsIDocument::nsIDocument() passes nullNodeInfo to the nsINode ctor and then presumably later the nodeinfo gets set up to point to the document?
Pushed by tsaunders@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/0fe2b3363c9e
ensure accessibles are shutdown before their document r=davidb, smaug
https://hg.mozilla.org/mozilla-central/rev/0fe2b3363c9e
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla53
Too late for 51.
Trevor, could you fill and uplift request to 52 (currently aurora) if it makes sense? Thanks
Flags: needinfo?(tbsaunde+mozbugs)
I'm not completely sure if this patch fixes the issue so lets give it a couple days to see?  If it does I guess we might as well.
Flags: needinfo?(tbsaunde+mozbugs)
Flags: needinfo?(tbsaunde+mozbugs)
I don't see any crash reports for build 20170120030214 so this looks good.
We'll need an uplift on this if the fix is still good; this is a sec bug (UAF).  It appears it once was a nullptr crash (in 45esr), but in all recent builds it's a UAF.  Since it appears to be a nullptr in 45, we can keep that as wontfix I think
Group: core-security
Moving 51 back to affected since we know this is a sec issue

Dan, Al - please vet the sec-high rating here, and we probably want to consider this for any point release or 51 respin.  Patch is already live in 53/54
Flags: needinfo?(dveditz)
Flags: needinfo?(abillings)
(In reply to Randell Jesup [:jesup] from comment #24)
> We'll need an uplift on this if the fix is still good; this is a sec bug
> (UAF).  It appears it once was a nullptr crash (in 45esr), but in all recent
> builds it's a UAF.  Since it appears to be a nullptr in 45, we can keep that
> as wontfix I think

see comment 22, though its been a couple days, so I guess it worked.
Flags: needinfo?(tbsaunde+mozbugs)
Comment on attachment 8827668 [details] [diff] [review]
fixup refcount logging

Approval Request Comment
[Feature/Bug causing the regression]:unknown
[User impact if declined]:exploitable crashes, though how to cause this is unknown.
[Is this code covered by automated tests?]:yes
[Has the fix been verified in Nightly?]:yes
[Needs manual test from QE? If yes, steps to reproduce]: no
[List of other uplifts needed for the feature/fix]:none
[Is the change risky?]:not really we just hold a reference we didn't before.
[Why is the change risky/not risky?]:see previous answer
[String changes made/needed]:none
Attachment #8827668 - Flags: approval-mozilla-beta?
[Tracking Requested - why for this release]:
This fixed has landed in mozilla-central already. If there's a 51 point release it could be a good candidate. In addition to being a security fix it also appears to be a Fx51b topcrash.
Flags: needinfo?(dveditz)
Flags: needinfo?(abillings)
Keywords: regression
Whiteboard: aes+ → aes+ (esr-45 is only a null-deref)
Comment on attachment 8827668 [details] [diff] [review]
fixup refcount logging

crash fix for beta52
Attachment #8827668 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Group: core-security → core-security-release
Track 51+ as sec-high.
Flags: qe-verify-
Whiteboard: aes+ (esr-45 is only a null-deref) → [post-critsmash-triage] aes+ (esr-45 is only a null-deref)
Whiteboard: [post-critsmash-triage] aes+ (esr-45 is only a null-deref) → [post-critsmash-triage][adv-main52+] aes+ (esr-45 is only a null-deref)
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.