Closed Bug 1376825 Opened 3 years ago Closed 3 years ago

Crash in mozilla::a11y::DocAccessible::GetAccessible

Categories

(Core :: Disability Access APIs, defect, P1, critical)

56 Branch
x86
Windows 7
defect

Tracking

()

RESOLVED FIXED
mozilla56
Tracking Status
firefox-esr52 --- fixed
firefox54 --- wontfix
firefox55 --- fixed
firefox56 --- verified

People

(Reporter: calixte, Assigned: eeejay)

References

()

Details

(Keywords: crash)

Crash Data

Attachments

(1 file)

This bug was filed from the Socorro interface and is 
report bp-b2b3d531-86ea-41de-b214-4465b0170628.
=============================================================

There are 400 crashes in beta 55 and is #34 in top crashers.
:surkov, could you investigate please ?
Flags: needinfo?(surkov.alexander)
possibly Document() returns null in [1], but it's not clear how it may happen, aContainer [2], we grab a document from, is in document.

we might want to add diagnostic assertions since the crash is visible on beta.

Eitan, thoughts?

[1] https://hg.mozilla.org/releases/mozilla-beta/annotate/af94af58d1f3/accessible/generic/DocAccessible.cpp#l1802
[2] https://hg.mozilla.org/releases/mozilla-beta/annotate/af94af58d1f3/accessible/generic/DocAccessible.cpp#l1851
Flags: needinfo?(surkov.alexander) → needinfo?(eitan)
See Also: → 1376754
Due to my testing for bug 1376754 I came across this one. I'm unable to reproduce this bug with 54. So I've narrowed down the regression range with MozRegression.

Steps to reproduce:

1. visit http://www.voetbalzone.nl/social_item.asp?uid=9412
2. accept cookies
3. wait until the page is fully loaded (if no crash occurs, then hard reload the page until a crash happens. Less than five hard reloads should be sufficient.)

Actual result: browser crashes

Expected result: no crash

According to MozRegression this is caused by Bug 1342433 - onclick changes shouldn't recreate the tree, r=yzen

It's probably not e10s related. See https://bugzilla.mozilla.org/show_bug.cgi?id=1376754#c3 and further

I won't change the tracking flag of firefox54 for now.
Blocks: 1342433
Has Regression Range: --- → yes
Has STR: --- → yes
(In reply to Arjen de Jager from comment #2)

> Steps to reproduce:
> 
> 1. visit http://www.voetbalzone.nl/social_item.asp?uid=9412
> 2. accept cookies
> 3. wait until the page is fully loaded (if no crash occurs, then hard reload
> the page until a crash happens. Less than five hard reloads should be
> sufficient.)
> 
> Actual result: browser crashes

I've got two crash signatures so far [1] (no bug) and bug 1380199 ('Event coalescence killed the accessible' assertion), and no crash signatures for this bug.

[1] https://crash-stats.mozilla.com/report/index/f4a82bbf-58d4-4dba-b03d-c3f050170717
I managed to capture this in rr, I'll work on it tomorrow.
I suspect this is a regression from bug 1363027.

In short, we shut down the accessible without uncaching it.

What happens is:
1. A parent is moved, so a hide event is queued.
2. Then a child is removed, DocAccessible::ContentRemoved is called
  a. Its hide event is coalesced with its parent's event.
  b. DropMutationEvent shuts down the child (Accessible::Shutdown), it goes to defunct state with no doc.
  c. ContentRemoved returns early and does not uncache the child and its subtree.
3. Child node is re-inserted
  a. GetAccessible gets a defunct accessible with a null doc.
  b. crash
Assignee: nobody → eitan
Flags: needinfo?(eitan)
(In reply to Eitan Isaacson [:eeejay] from comment #5)
> I suspect this is a regression from bug 1363027.
> 
> In short, we shut down the accessible without uncaching it.
> 
> What happens is:
> 1. A parent is moved, so a hide event is queued.
> 2. Then a child is removed, DocAccessible::ContentRemoved is called
>   a. Its hide event is coalesced with its parent's event.
>   b. DropMutationEvent shuts down the child (Accessible::Shutdown), it goes
> to defunct state with no doc.
>   c. ContentRemoved returns early and does not uncache the child and its
> subtree.

ShutdownChildrenInSubtree is supposed to uncache it anyway. It's not called by DropMutationEvent?
I first thought that this could be remedied by checking if given parent is a show event[1] target before checking for it being a hide event[2] target. That would have indicated a move.

The problem is, that at least in this case, the parent's show event is dropped in favor of a coalesced grandparent show event, so swapping those blocks wouldn't always help.

1. https://dxr.mozilla.org/mozilla-central/rev/1b065ffd8a535a0ad4c39a912af18e948e6a42c1/accessible/base/NotificationController.cpp#391
2. https://dxr.mozilla.org/mozilla-central/rev/1b065ffd8a535a0ad4c39a912af18e948e6a42c1/accessible/base/NotificationController.cpp#387
(In reply to alexander :surkov from comment #6)
> (In reply to Eitan Isaacson [:eeejay] from comment #5)
> > I suspect this is a regression from bug 1363027.
> > 
> > In short, we shut down the accessible without uncaching it.
> > 
> > What happens is:
> > 1. A parent is moved, so a hide event is queued.
> > 2. Then a child is removed, DocAccessible::ContentRemoved is called
> >   a. Its hide event is coalesced with its parent's event.
> >   b. DropMutationEvent shuts down the child (Accessible::Shutdown), it goes
> > to defunct state with no doc.
> >   c. ContentRemoved returns early and does not uncache the child and its
> > subtree.
> 
> ShutdownChildrenInSubtree is supposed to uncache it anyway. It's not called
> by DropMutationEvent?

Yeah, you are right. The accessible isn't cached anymore, but eIsNotInDocument state is not set.

eIsNotInDocument is set in UncacheChildrenInSubtreem but not in UnbindFromDocument, this seems like a bug.

So, this crash happens both because we wrongly(?) coalesce the hide event, and subsequently, we remove the accessible from the doc with the buggy UnbindFromDocument, instead of the usual path of UncacheChildrenInSubtreem.
(In reply to Eitan Isaacson [:eeejay] from comment #8)

> eIsNotInDocument is set in UncacheChildrenInSubtreem but not in
> UnbindFromDocument, this seems like a bug.
> 
> So, this crash happens both because we wrongly(?) coalesce the hide event,
> and subsequently, we remove the accessible from the doc with the buggy
> UnbindFromDocument, instead of the usual path of UncacheChildrenInSubtreem.

Correct. There's indeed something wrong with the coalescence system, and it makes sense to update eIsNotInDocument flag in this case. The latter one seems to be a sufficient fix for this bug as well.
I spent forever trying to reproduce this in a clean test with no luck. Figure my time is better spent elsewhere, and the fix looks straightforward.
Attachment #8888495 - Flags: review?(surkov.alexander)
Comment on attachment 8888495 [details] [diff] [review]
Set eIsNotInDocument to accessible's state when unbinding from doc. r?surkov

Review of attachment 8888495 [details] [diff] [review]:
-----------------------------------------------------------------

On a second glance it shouldn't fix the crash (this is a correct thing though I believe), because DocAccessible::GetAccessible doesn't rely on this flag.

I'm still not sure I have of parts of the puzzle. Is the problem here is we fail to remove the accessible from mNodeToAccessibleMap for some reason? If so, then do you know how it may happen?

::: accessible/generic/DocAccessible.cpp
@@ +1337,5 @@
>    if (aAccessible->IsNodeMapEntry() &&
>        mNodeToAccessibleMap.Get(aAccessible->GetNode()) == aAccessible)
>      mNodeToAccessibleMap.Remove(aAccessible->GetNode());
>  
> +  aAccessible->mStateFlags |= eIsNotInDocument;

Accessible::Shutdown probably would be a more appropriate place
Comment on attachment 8888495 [details] [diff] [review]
Set eIsNotInDocument to accessible's state when unbinding from doc. r?surkov

cancelling review (bumps in email queue), comment #11 needs to be addressed
Flags: needinfo?(eitan)
Attachment #8888495 - Flags: review?(surkov.alexander)
Priority: -- → P1
(In reply to alexander :surkov from comment #11)
> Comment on attachment 8888495 [details] [diff] [review]
> Set eIsNotInDocument to accessible's state when unbinding from doc. r?surkov
> 
> Review of attachment 8888495 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> On a second glance it shouldn't fix the crash (this is a correct thing
> though I believe), because DocAccessible::GetAccessible doesn't rely on this
> flag.
> 
> I'm still not sure I have of parts of the puzzle. Is the problem here is we
> fail to remove the accessible from mNodeToAccessibleMap for some reason? If
> so, then do you know how it may happen?
> 
> ::: accessible/generic/DocAccessible.cpp
> @@ +1337,5 @@
> >    if (aAccessible->IsNodeMapEntry() &&
> >        mNodeToAccessibleMap.Get(aAccessible->GetNode()) == aAccessible)
> >      mNodeToAccessibleMap.Remove(aAccessible->GetNode());
> >  
> > +  aAccessible->mStateFlags |= eIsNotInDocument;
> 
> Accessible::Shutdown probably would be a more appropriate place

ProcessContentInserted does. That is what catches us from processing a container that has been removed in the meantime.

The stack trace is misleading. The crash occurs because the document is null. Because the container returns a null document in ProcessContentInserted. Here stack trace with argument values:

#0  0x00007fffe6d4ae78 in mozilla::a11y::DocAccessible::GetAccessible(nsINode*) const (this=0x0, aNode=0x7fffcc5c0b00) at /home/eitan/Mozilla/gecko/accessible/generic/DocAccessible.h:237
#1  0x00007fffe6d8dbc0 in mozilla::a11y::DocAccessible::GetAccessibleOrContainer(nsINode*) const (this=0x0, aNode=0x7fffcc5c0b00) at /home/eitan/Mozilla/gecko/accessible/generic/DocAccessible.cpp:1253
#2  0x00007fffe6d6216d in mozilla::a11y::DocAccessible::AccessibleOrTrueContainer(nsINode*) const (this=0x0, aNode=0x7fffcc5c0b00) at /home/eitan/Mozilla/gecko/accessible/generic/DocAccessible-inl.h:30
#3  0x00007fffe6d9d8e2 in InsertIterator::Next() (this=0x7fffffffa340)
    at /home/eitan/Mozilla/gecko/accessible/generic/DocAccessible.cpp:1812
#4  0x00007fffe6d9db6b in mozilla::a11y::DocAccessible::ProcessContentInserted(mozilla::a11y::Accessible*, nsTArray<nsCOMPtr<nsIContent> > const*) (this=0x7fffa1708a60, aContainer=0x7fffac9665c0, aNodes=0x7fffccc4ce70) at /home/eitan/Mozilla/gecko/accessible/generic/DocAccessible.cpp:1871
#5  0x00007fffe6d5bbba in mozilla::a11y::NotificationController::WillRefresh(mozilla::TimeStamp) (this=0x7fffb6802040, aTime=...) at /home/eitan/Mozilla/gecko/accessible/base/NotificationController.cpp:727
#6  0x00007fffe4ddbb1d in nsRefreshDriver::Tick(long, mozilla::TimeStamp) (this=0x7fffacf0c400, aNowEpoch=1501005526235087, aNowTime=...) at /home/eitan/Mozilla/gecko/layout/base/nsRefreshDriver.cpp:1861
Flags: needinfo?(eitan)
Attachment #8888495 - Flags: review?(surkov.alexander)
Duplicate of this bug: 1380172
See Also: → 1380153
Comment on attachment 8888495 [details] [diff] [review]
Set eIsNotInDocument to accessible's state when unbinding from doc. r?surkov

Review of attachment 8888495 [details] [diff] [review]:
-----------------------------------------------------------------

r=me, thanks!

what do you think about moving this flag unsetting under Accessible::Shutdown() btw?
Attachment #8888495 - Flags: review?(surkov.alexander) → review+
(In reply to alexander :surkov from comment #15)
> Comment on attachment 8888495 [details] [diff] [review]
> Set eIsNotInDocument to accessible's state when unbinding from doc. r?surkov
> 
> Review of attachment 8888495 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> r=me, thanks!
> 
> what do you think about moving this flag unsetting under
> Accessible::Shutdown() btw?

I don't have my head completely around all the shutdown stages, why would you want to unbind the top node and not uncache all the children? When do you do the former without calling shutdown?

If I were to change this and make it a bit more foolproof, I would do:

bool IsInDocument() const { return mDoc != nullptr; }
(In reply to Eitan Isaacson [:eeejay] from comment #16)

> > what do you think about moving this flag unsetting under
> > Accessible::Shutdown() btw?
> 
> I don't have my head completely around all the shutdown stages, why would
> you want to unbind the top node and not uncache all the children? When do
> you do the former without calling shutdown?

Not sure I follow the questions. My suggestion was to move eIsInDocument flag unsetting from DocAccessible::ShutdownChildrenInSubtree into Accessible::Shutdown. ShutdownChildrenInSubtree calls Shutdown for each accessible it traverses, so it's basically same, expect it feels a bit weird that Shutdown doesn't make sure that all properties are unset properly.

> If I were to change this and make it a bit more foolproof, I would do:
> 
> bool IsInDocument() const { return mDoc != nullptr; }

I think we don't unset mDoc during unbinding stage and use a flag for that, because we need mDoc later. So this approach shouldn't work.
Blocks: 1380153
See Also: 1380153
Since this is a crasher, and not a big change I will nominate this for beta once this settles in on central.
Keywords: checkin-needed
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/63f14ebd2473
Set eIsNotInDocument to accessible's state when unbinding from doc. r=surkov
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/63f14ebd2473
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla56
FWIW, we built our final 55 beta today. The next build will be RC1 on Monday. Are you still aiming to get this uplifted into 55? If so, please nominate it ASAP :)
Flags: needinfo?(eitan)
Comment on attachment 8888495 [details] [diff] [review]
Set eIsNotInDocument to accessible's state when unbinding from doc. r?surkov

Approval Request Comment
[Feature/Bug causing the regression]: 
[User impact if declined]: Crashes on certain content when accessibility is enabled.
[Is this code covered by automated tests?]: Not yet, I'm working on simplifying a case where this can happen
[Has the fix been verified in Nightly?]: Yes.
[Needs manual test from QE? If yes, steps to reproduce]: Same steps as above, with accessibility enabled:
1. visit http://www.voetbalzone.nl/social_item.asp?uid=9412
2. accept cookies
3. wait until the page is fully loaded (if no crash occurs, then hard reload the page until a crash happens. Less than five hard reloads should be sufficient.)

[List of other uplifts needed for the feature/fix]: None
[Is the change risky?]: I don't think so.
[Why is the change risky/not risky?]: It is one line that was overlooked before.
[String changes made/needed]: None.
Flags: needinfo?(eitan)
Attachment #8888495 - Flags: approval-mozilla-beta?
Verified this bug is fixed on 56.0a1 20170729100254 with e10s turned off.
(In reply to Eitan Isaacson [:eeejay] from comment #22)
> Comment on attachment 8888495 [details] [diff] [review]
> Set eIsNotInDocument to accessible's state when unbinding from doc. r?surkov
> 
> Approval Request Comment
> [Feature/Bug causing the regression]: 
> [User impact if declined]: Crashes on certain content when accessibility is
> enabled.
> [Is this code covered by automated tests?]: Not yet, I'm working on
> simplifying a case where this can happen
> [Has the fix been verified in Nightly?]: Yes.
> [Needs manual test from QE? If yes, steps to reproduce]: Same steps as
> above, with accessibility enabled:
> 1. visit http://www.voetbalzone.nl/social_item.asp?uid=9412
> 2. accept cookies
> 3. wait until the page is fully loaded (if no crash occurs, then hard reload
> the page until a crash happens. Less than five hard reloads should be
> sufficient.)
> 
> [List of other uplifts needed for the feature/fix]: None
> [Is the change risky?]: I don't think so.
> [Why is the change risky/not risky?]: It is one line that was overlooked
> before.
> [String changes made/needed]: None.

It turned out the URL was a temporary one. Now it forwards to a permanent URL, which also did crash. Therefore changing the URL accordingly to http://www.voetbalzone.nl/social.asp.
See Also: 1376754
Comment on attachment 8888495 [details] [diff] [review]
Set eIsNotInDocument to accessible's state when unbinding from doc. r?surkov

a11y crash fix for 55.0 rc1
Attachment #8888495 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
the patch is landed, but it'd be curious to get comment #17 commented anyways (Eitan, that one might slept off the radar).
(In reply to alexander :surkov from comment #27)
> the patch is landed, but it'd be curious to get comment #17 commented
> anyways (Eitan, that one might slept off the radar).

In a new bug, of course :)
crash-stats is showing this signature happening a lot on ESR52 still. Please nominate the patch for approval when you get a chance.
Flags: needinfo?(eitan)
Comment on attachment 8888495 [details] [diff] [review]
Set eIsNotInDocument to accessible's state when unbinding from doc. r?surkov

[Approval Request Comment]
If this is not a sec:{high,crit} bug, please state case for ESR consideration:
User impact if declined: Crashes
Fix Landed on Version: 55
Risk to taking this patch (and alternatives if risky): This should not be risky. It has been in central and beta for a while and has not introduced regressions.
String or UUID changes made by this patch: 

See https://wiki.mozilla.org/Release_Management/ESR_Landing_Process for more info.
Flags: needinfo?(eitan)
Attachment #8888495 - Flags: approval-mozilla-esr52?
Comment on attachment 8888495 [details] [diff] [review]
Set eIsNotInDocument to accessible's state when unbinding from doc. r?surkov

high volume crash fix for esr52.4
Attachment #8888495 - Flags: approval-mozilla-esr52? → approval-mozilla-esr52+
You need to log in before you can comment on or make changes to this bug.