Crash in mozilla::a11y::TreeWalker::TreeWalker

RESOLVED FIXED in Firefox 39, Firefox OS v2.2

Status

()

RESOLVED FIXED
4 years ago
3 years ago

People

(Reporter: ntroast, Assigned: yzen)

Tracking

({crash})

unspecified
mozilla39
ARM
Gonk (Firefox OS)
crash
Points:
---

Firefox Tracking Flags

(blocking-b2g:2.2+, firefox37 wontfix, firefox38 wontfix, firefox39 fixed, b2g-v2.2 fixed, b2g-master fixed)

Details

(Whiteboard: [b2g-crash][caf priority: p2][CR 811496][caf-crash 584])

Attachments

(7 attachments, 1 obsolete attachment)

(Reporter)

Description

4 years ago
The following crash signature was observed. cafbot will upload the decoded minidump and extra file

[@ mozilla::a11y::TreeWalker::TreeWalker | nsAccessibilityService::ContentRemoved | mozilla::ElementRestyler::SendAccessibilityNotifications | mozilla::ElementRestyler::RestyleChildren ]

Related Crash: https://crash-stats.mozilla.com/report/index/dd9a84b3-488d-4957-b33c-5f6c22150315
Created attachment 8580820 [details]
EXTRA file attachment -
Created attachment 8580822 [details]
decoded minidump -

Comment 3

4 years ago
Hi David,

Please assign someone on your team to help to root-cause this crash or route this to the appropriate team.

Thanks!
Mike
Component: General → Disability Access APIs
Flags: needinfo?(dbolter)
Product: Firefox OS → Core

Comment 4

4 years ago
Additional info from CAF (Code Aurora Foundation) regarding urgency:


On Mar 20, 2015, at 2:53 PM, Kumar, Indrajeet <ikumar@codeaurora.org> wrote:

Hi Mike,
   Can you please find someone to take a look at https://bugzilla.mozilla.org/show_bug.cgi?id=1145724
As per the description Mozilla has also seen it earlier and the fix could be as simple as a null check.

This crash has happened 20 times already in our stability testing in just one run so we need to fix it asap.
Current MTBF is 4 and needs to reach 30+ in a week!

Thanks,
-Inder
(In reply to Nicholas Troast [:ntroast] from comment #0)

> Related Crash:
> https://crash-stats.mozilla.com/report/index/dd9a84b3-488d-4957-b33c-
> 5f6c22150315

Hmmm, this looks like a Firefox 23 crash on Windows NT and I couldn't find any similar reports. There is a PluginTimerCallBack::Notify call in the stack which I'm not familiar with...

Alex any ideas?
Flags: needinfo?(dbolter) → needinfo?(surkov.alexander)
My drive-by guess is we're getting a nullptr for aContext at https://dxr.mozilla.org/mozilla-central/source/accessible/base/TreeWalker.cpp#24

The segfault in attachment 8580822 [details] is quite close to 0 and the constructor's initialization list dereferences aContext->Document() without a NULL check.

nsAccessibilityService::ContentRemoved() isn't checking for NULL before creating a TreeWalker?

TreeWalker will assert a non-NULL aContext at https://dxr.mozilla.org/mozilla-central/source/accessible/base/TreeWalker.cpp#27, even though it'll never get a chance to make that assertion due to the potential nullptr dereference on line 24, which we have managed to hit?
Whiteboard: [caf-crash 584]
(In reply to Michael Vines [:m1] [:evilmachines] (PTO 3/21-4/4) from comment #6)
> My drive-by guess is we're getting a nullptr for aContext at
> https://dxr.mozilla.org/mozilla-central/source/accessible/base/TreeWalker.
> cpp#24
> 
> The segfault in attachment 8580822 [details] is quite close to 0 and the
> constructor's initialization list dereferences aContext->Document() without
> a NULL check.
> 
> nsAccessibilityService::ContentRemoved() isn't checking for NULL before
> creating a TreeWalker?

I believe this is correct, though I'm not sure if it should be possible for doc->GetAccessibleOrContainer() to return nullptr in this case.

I wish someone would debug why GetAccessibleOrContainer returns null, but I doubt I'm going to get to it so if you want to take a patch for release branches adding a null check I can live with that.
(In reply to Trevor Saunders (:tbsaunde) from comment #7)
> 
> I wish someone would debug why GetAccessibleOrContainer returns null, but I
> doubt I'm going to get to it so if you want to take a patch for release
> branches adding a null check I can live with that.

We've only been able to reproduce this in monkey testing I believe, so no better STR than run monkey for hours until it crashes.  However we can easily bring in debug patches that add more logging to gather additional data around the event.  Integrating bug 1043112 into our builds could really help here too, but we're not there yet.  So logcat tracing it is for now :(

Yes, a patch with moar null checking would be a big help for the b2g v2.2 release!  Then we can resume the hunt for other crashes that this one may be masking.

Updated

4 years ago
Whiteboard: [caf-crash 584] → [CR 811496][caf-crash 584]

Updated

4 years ago
Whiteboard: [CR 811496][caf-crash 584] → [caf priority: p2][CR 811496][caf-crash 584]

Updated

4 years ago
Whiteboard: [caf priority: p2][CR 811496][caf-crash 584] → [b2g-crash][caf priority: p2][CR 811496][caf-crash 584]

Updated

4 years ago
Keywords: crash
Created attachment 8581225 [details]
EXTRA file attachment - AU_LINUX_GECKO_LF.BR.1.2.3.00.00.00.000.106
Created attachment 8581226 [details]
decoded minidump - AU_LINUX_GECKO_LF.BR.1.2.3.00.00.00.000.106
(In reply to Michael Vines [:m1] [:evilmachines] (PTO 3/21-4/4) from comment #8)
> (In reply to Trevor Saunders (:tbsaunde) from comment #7)
> > 
> > I wish someone would debug why GetAccessibleOrContainer returns null, but I
> > doubt I'm going to get to it so if you want to take a patch for release
> > branches adding a null check I can live with that.
> 
> We've only been able to reproduce this in monkey testing I believe, so no
> better STR than run monkey for hours until it crashes.  However we can
> easily bring in debug patches that add more logging to gather additional
> data around the event. 

can you enable "tree" logging pls? see http://asurkov.blogspot.ca/2012/10/debugging-firefox-accessibility.html
Flags: needinfo?(surkov.alexander)

Comment 13

4 years ago
> can you enable "tree" logging pls? see
> http://asurkov.blogspot.ca/2012/10/debugging-firefox-accessibility.html

Hi Alex, how much extra logging will this generate?  I want to know if it's safe to turn this on in more than just an engineering build.
a lot of logging, every content insertion/deletion/some style changes will generate into it, so it'd be good to restrict that output as much as possible
blocking-b2g: 2.2? → 2.2+
Why is a11y enabled at all?
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #15)
> Why is a11y enabled at all?

I'm not sure what "monkey testing" involves, but maybe the screen reader got enabled?  but I'm not really sure and it is a good question.
I'm reverting this back to ARM/gonk and I think the stack in comment 0 is unrelated.
OS: Linux → Gonk (Firefox OS)
Hardware: x86_64 → ARM
Note I meant unrelated as in not the actual crash instances we're talking about here (since that crash was on windows). The fact that contentremoval is in (all?) the stack(s) may indeed be related.

Comment 19

4 years ago
Nicholas,

Please respond, or have someone at CAF who's able to, respond to Kyle's comment #15:

> Why is a11y enabled at all?


and Trevor's comment #16:

> I'm not sure what "monkey testing" involves, but maybe the screen reader got
> enabled?  but I'm not really sure and it is a good question.


Our Disability Access APIs team (:dbolter) believes a11y should only be enabled by users' specific action via the settings view; cc'ing :eeejay + :yzen for confirmation.

Thanks,
Mike
Flags: needinfo?(ntroast)

Comment 20

4 years ago
If a11y can be turned on through settings, then I think it can turned on by monkey, too.  Monkey is just pressing different areas of the screen at random.

However, I think this particular crash was found by stability test team (through scripting or manual test).  Whether they turned a11y on or not, I'm not sure.  I'll try to find out more.  Hopefully you can carry on with the analysis while I find out, thanks.
Flags: needinfo?(ntroast)
Another interesting question is whether this code works in the presence of web components.

Comment 22

4 years ago
add ni for myself for tracking.
Flags: needinfo?(sku)
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #21)
> Another interesting question is whether this code works in the presence of
> web components.

blaim says it comes from bug 1040735 so it should work.  Though clearly there exists some case in which it doesn't.

Comment 24

4 years ago
The tester that found the issue told me the following:

The accessibility option in settings shows only one menu “color Filter” which is disabled. Still we are able to reproduce the crash.
Below are the steps to reproduce this-
1. Dial a number in dial pad.
2. Long press the green icon used to make a call.
3. You will find phone just crashed pop-up.
Created attachment 8582241 [details]
EXTRA file attachment - AU_LINUX_GECKO_LF.BR.1.2.3.00.00.00.000.110
Created attachment 8582242 [details]
decoded minidump - AU_LINUX_GECKO_LF.BR.1.2.3.00.00.00.000.110

Comment 28

4 years ago
we can not repo. this issue on both ZTE open L and Nexus5-L per comment 24.

Comment 29

4 years ago
Hi David:
 Could you please also check this issue to see any light to shed?
Flags: needinfo?(dbaron)
What are the hg changesets (or git revisions) for the source corresponding to the minidumps in comment 2 and comment 27?  It's hard to analyze them with confidence without knowing that.

(Also, I'm in meetings all day today.)
Flags: needinfo?(dbaron)
Er, I guess comment 25 answers that question for comment 27's minidump (which I didn't look at as closely as comment 2's).
Flags: needinfo?(dbaron)
(Assignee)

Comment 32

4 years ago
(In reply to Greg Grisco from comment #24)
> The tester that found the issue told me the following:
> 
> The accessibility option in settings shows only one menu “color Filter”
> which is disabled. 

AFAIK, if the menu for screen reader is not visible, it has never been activated by the user. The screen reader menu item would remain visible forever if it was enabled at least once, or enabled via the Developer Settings.

> Still we are able to reproduce the crash.
> Below are the steps to reproduce this-
> 1. Dial a number in dial pad.
> 2. Long press the green icon used to make a call.
> 3. You will find phone just crashed pop-up.

Comment 33

4 years ago
(In reply to Yura Zenevich [:yzen] from comment #32)
> (In reply to Greg Grisco from comment #24)
> 
> AFAIK, if the menu for screen reader is not visible, it has never been
> activated by the user. The screen reader menu item would remain visible
> forever if it was enabled at least once, or enabled via the Developer
> Settings.

So you're saying that it seems inconsistent that with screen reader never being enabled that we're landing in a11y code?  Is there any explanation for this?

Comment 34

4 years ago
This one crash has been seen so many times that it could block our FC, we need to have some solution for this very soon.
Flags: needinfo?(mlee)

Comment 35

4 years ago
Yura, please respond to Greg's comment #33:

> So you're saying that it seems inconsistent that with screen reader never
> being enabled that we're landing in a11y code?  Is there any explanation for
> this?

As Greg mentions in comment 34, this is a critical crash that we need to resolve and understanding what's happening with a11y seems to be key to achieving that.
 
Thanks,
Mike
Flags: needinfo?(mlee) → needinfo?(yzenevich)

Comment 36

4 years ago
Revisiting comment 6, 7 and 8, should null checking be done?
(Assignee)

Comment 37

4 years ago
(In reply to Greg Grisco from comment #33)
> (In reply to Yura Zenevich [:yzen] from comment #32)
> > (In reply to Greg Grisco from comment #24)
> > 
> > AFAIK, if the menu for screen reader is not visible, it has never been
> > activated by the user. The screen reader menu item would remain visible
> > forever if it was enabled at least once, or enabled via the Developer
> > Settings.
> 
> So you're saying that it seems inconsistent that with screen reader never
> being enabled that we're landing in a11y code?  Is there any explanation for
> this?

So far the only explanation I have is based on comment 20:

> However, I think this particular crash was found by stability test team
> (through scripting or manual test).  Whether they turned a11y on or not, I'm
> not sure.  I'll try to find out more.  Hopefully you can carry on with the
> analysis while I find out, thanks.

If at any point an integration test was run, accessibility will be turned on.
Flags: needinfo?(yzenevich)
(Assignee)

Comment 38

4 years ago
Created attachment 8583346 [details] [diff] [review]
1145724 null check patch

Adding a null check patch in the meanwhile.
Attachment #8583346 - Flags: review?(surkov.alexander)

Comment 39

4 years ago
bug 1147646 is another a11y crash that was recently found by stress test.  Although, that crash has only been seen once so far.  Referencing it here in case it helps the analysis.

I'm going to apply the null-check patch, thanks.

Comment 40

4 years ago
Yura, could you help on this bug?
Assignee: nobody → yzenevich
Comment on attachment 8583346 [details] [diff] [review]
1145724 null check patch

Review of attachment 8583346 [details] [diff] [review]:
-----------------------------------------------------------------

r=me with comment fixed

::: accessible/base/nsAccessibilityService.cpp
@@ +535,5 @@
>      if (!child) {
> +      Accessible* container = document->GetContainerAccessible(aChildNode);
> +      if (container) {
> +        a11y::TreeWalker walker(container, aChildNode,
> +                              a11y::TreeWalker::eWalkCache);

container is not really used when you don't create the tree so you can use 'document' if container is null
nit: please make sure you lined up arguments
Attachment #8583346 - Flags: review?(surkov.alexander) → review+
(Assignee)

Comment 42

4 years ago
Created attachment 8583813 [details] [diff] [review]
1145724 null check patch v2

Addressed Alex's comments, carrying over r+
Attachment #8583813 - Flags: review+
(Assignee)

Updated

4 years ago
Attachment #8583346 - Attachment is obsolete: true

Updated

4 years ago
Whiteboard: [b2g-crash][caf priority: p2][CR 811496][caf-crash 584] → [b2g-crash][caf priority: p1][CR 811496][caf-crash 584]
(Assignee)

Comment 43

4 years ago
Ill land the null check patch once the tree is open.

Updated

4 years ago
Whiteboard: [b2g-crash][caf priority: p1][CR 811496][caf-crash 584] → [b2g-crash][caf priority: p2][CR 811496][caf-crash 584]

Updated

4 years ago
Flags: needinfo?(sku)
https://hg.mozilla.org/mozilla-central/rev/3f5c7e485c60
Status: NEW → RESOLVED
Last Resolved: 4 years ago
status-firefox39: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla39

Updated

4 years ago
Attachment #8583813 - Flags: approval-mozilla-b2g37?

Updated

4 years ago
Flags: needinfo?(bbajaj)
Comment on attachment 8582242 [details]
decoded minidump - AU_LINUX_GECKO_LF.BR.1.2.3.00.00.00.000.110

[Triage Comment]

Approving the low risk null check
Flags: needinfo?(bbajaj)
Attachment #8582242 - Flags: approval-mozilla-b2g37+

Updated

4 years ago
Attachment #8583813 - Flags: approval-mozilla-b2g37? → approval-mozilla-b2g37+

Updated

4 years ago
Attachment #8582242 - Flags: approval-mozilla-b2g37+
https://hg.mozilla.org/releases/mozilla-b2g37_v2_2/rev/4d973ff565c7
status-b2g-v2.2: --- → fixed
status-b2g-master: --- → fixed
status-firefox37: --- → wontfix
status-firefox38: --- → wontfix
Flags: needinfo?(dbaron)
You need to log in before you can comment on or make changes to this bug.