Closed
Bug 1145724
Opened 10 years ago
Closed 10 years ago
Crash in mozilla::a11y::TreeWalker::TreeWalker
Categories
(Core :: Disability Access APIs, defect)
Tracking
()
People
(Reporter: ntroast, Assigned: yzen)
References
Details
(Keywords: crash, Whiteboard: [b2g-crash][caf priority: p2][CR 811496][caf-crash 584])
Attachments
(7 files, 1 obsolete file)
The following crash signature was observed. cafbot will upload the decoded minidump and extra file
[@ mozilla::a11y::TreeWalker::TreeWalker | nsAccessibilityService::ContentRemoved | mozilla::ElementRestyler::SendAccessibilityNotifications | mozilla::ElementRestyler::RestyleChildren ]
Related Crash: https://crash-stats.mozilla.com/report/index/dd9a84b3-488d-4957-b33c-5f6c22150315
Comment 1•10 years ago
|
||
Comment 2•10 years ago
|
||
Comment 3•10 years ago
|
||
Hi David,
Please assign someone on your team to help to root-cause this crash or route this to the appropriate team.
Thanks!
Mike
Component: General → Disability Access APIs
Flags: needinfo?(dbolter)
Product: Firefox OS → Core
Comment 4•10 years ago
|
||
Additional info from CAF (Code Aurora Foundation) regarding urgency:
On Mar 20, 2015, at 2:53 PM, Kumar, Indrajeet <ikumar@codeaurora.org> wrote:
Hi Mike,
Can you please find someone to take a look at https://bugzilla.mozilla.org/show_bug.cgi?id=1145724
As per the description Mozilla has also seen it earlier and the fix could be as simple as a null check.
This crash has happened 20 times already in our stability testing in just one run so we need to fix it asap.
Current MTBF is 4 and needs to reach 30+ in a week!
Thanks,
-Inder
Comment 5•10 years ago
|
||
(In reply to Nicholas Troast [:ntroast] from comment #0)
> Related Crash:
> https://crash-stats.mozilla.com/report/index/dd9a84b3-488d-4957-b33c-
> 5f6c22150315
Hmmm, this looks like a Firefox 23 crash on Windows NT and I couldn't find any similar reports. There is a PluginTimerCallBack::Notify call in the stack which I'm not familiar with...
Alex any ideas?
Flags: needinfo?(dbolter) → needinfo?(surkov.alexander)
Comment 6•10 years ago
|
||
My drive-by guess is we're getting a nullptr for aContext at https://dxr.mozilla.org/mozilla-central/source/accessible/base/TreeWalker.cpp#24
The segfault in attachment 8580822 [details] is quite close to 0 and the constructor's initialization list dereferences aContext->Document() without a NULL check.
nsAccessibilityService::ContentRemoved() isn't checking for NULL before creating a TreeWalker?
TreeWalker will assert a non-NULL aContext at https://dxr.mozilla.org/mozilla-central/source/accessible/base/TreeWalker.cpp#27, even though it'll never get a chance to make that assertion due to the potential nullptr dereference on line 24, which we have managed to hit?
Whiteboard: [caf-crash 584]
Comment 7•10 years ago
|
||
(In reply to Michael Vines [:m1] [:evilmachines] (PTO 3/21-4/4) from comment #6)
> My drive-by guess is we're getting a nullptr for aContext at
> https://dxr.mozilla.org/mozilla-central/source/accessible/base/TreeWalker.
> cpp#24
>
> The segfault in attachment 8580822 [details] is quite close to 0 and the
> constructor's initialization list dereferences aContext->Document() without
> a NULL check.
>
> nsAccessibilityService::ContentRemoved() isn't checking for NULL before
> creating a TreeWalker?
I believe this is correct, though I'm not sure if it should be possible for doc->GetAccessibleOrContainer() to return nullptr in this case.
I wish someone would debug why GetAccessibleOrContainer returns null, but I doubt I'm going to get to it so if you want to take a patch for release branches adding a null check I can live with that.
Comment 8•10 years ago
|
||
(In reply to Trevor Saunders (:tbsaunde) from comment #7)
>
> I wish someone would debug why GetAccessibleOrContainer returns null, but I
> doubt I'm going to get to it so if you want to take a patch for release
> branches adding a null check I can live with that.
We've only been able to reproduce this in monkey testing I believe, so no better STR than run monkey for hours until it crashes. However we can easily bring in debug patches that add more logging to gather additional data around the event. Integrating bug 1043112 into our builds could really help here too, but we're not there yet. So logcat tracing it is for now :(
Yes, a patch with moar null checking would be a big help for the b2g v2.2 release! Then we can resume the hunt for other crashes that this one may be masking.
Updated•10 years ago
|
Whiteboard: [caf-crash 584] → [CR 811496][caf-crash 584]
Updated•10 years ago
|
Whiteboard: [CR 811496][caf-crash 584] → [caf priority: p2][CR 811496][caf-crash 584]
Updated•10 years ago
|
Whiteboard: [caf priority: p2][CR 811496][caf-crash 584] → [b2g-crash][caf priority: p2][CR 811496][caf-crash 584]
Comment 9•10 years ago
|
||
Observed on:
Device: msm8909
Gonk Version: AU_LINUX_GECKO_LF.BR.1.2.3.00.00.00.000.106
Moz BuildID: 20150317002504
Manifest: https://www.codeaurora.org/cgit/quic/lf/b2g/manifest/tree/caf_AU_LINUX_GECKO_LF.BR.1.2.3.00.00.00.000.106.xml?h=release
Gecko Version: 37.0
Gaia: http://git.mozilla.org/?p=releases/gaia.git;a=commit;h=d0e09d5e6367e558824f9cbf691da99cedf63037
Gecko: http://git.mozilla.org/?p=releases/gecko.git;a=commit;h=bf84ffe0f7b44cc5248a84d1c04ec61d38547df9
Patches: bug 1133147, bug 1139469, bug 1142770, bug 1053634
Comment 10•10 years ago
|
||
Comment 11•10 years ago
|
||
Comment 12•10 years ago
|
||
(In reply to Michael Vines [:m1] [:evilmachines] (PTO 3/21-4/4) from comment #8)
> (In reply to Trevor Saunders (:tbsaunde) from comment #7)
> >
> > I wish someone would debug why GetAccessibleOrContainer returns null, but I
> > doubt I'm going to get to it so if you want to take a patch for release
> > branches adding a null check I can live with that.
>
> We've only been able to reproduce this in monkey testing I believe, so no
> better STR than run monkey for hours until it crashes. However we can
> easily bring in debug patches that add more logging to gather additional
> data around the event.
can you enable "tree" logging pls? see http://asurkov.blogspot.ca/2012/10/debugging-firefox-accessibility.html
Flags: needinfo?(surkov.alexander)
Comment 13•10 years ago
|
||
> can you enable "tree" logging pls? see
> http://asurkov.blogspot.ca/2012/10/debugging-firefox-accessibility.html
Hi Alex, how much extra logging will this generate? I want to know if it's safe to turn this on in more than just an engineering build.
Comment 14•10 years ago
|
||
a lot of logging, every content insertion/deletion/some style changes will generate into it, so it'd be good to restrict that output as much as possible
Updated•10 years ago
|
blocking-b2g: 2.2? → 2.2+
Why is a11y enabled at all?
Comment 16•10 years ago
|
||
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #15)
> Why is a11y enabled at all?
I'm not sure what "monkey testing" involves, but maybe the screen reader got enabled? but I'm not really sure and it is a good question.
Comment 17•10 years ago
|
||
I'm reverting this back to ARM/gonk and I think the stack in comment 0 is unrelated.
OS: Linux → Gonk (Firefox OS)
Hardware: x86_64 → ARM
Comment 18•10 years ago
|
||
Note I meant unrelated as in not the actual crash instances we're talking about here (since that crash was on windows). The fact that contentremoval is in (all?) the stack(s) may indeed be related.
Comment 19•10 years ago
|
||
Nicholas,
Please respond, or have someone at CAF who's able to, respond to Kyle's comment #15:
> Why is a11y enabled at all?
and Trevor's comment #16:
> I'm not sure what "monkey testing" involves, but maybe the screen reader got
> enabled? but I'm not really sure and it is a good question.
Our Disability Access APIs team (:dbolter) believes a11y should only be enabled by users' specific action via the settings view; cc'ing :eeejay + :yzen for confirmation.
Thanks,
Mike
Flags: needinfo?(ntroast)
Comment 20•10 years ago
|
||
If a11y can be turned on through settings, then I think it can turned on by monkey, too. Monkey is just pressing different areas of the screen at random.
However, I think this particular crash was found by stability test team (through scripting or manual test). Whether they turned a11y on or not, I'm not sure. I'll try to find out more. Hopefully you can carry on with the analysis while I find out, thanks.
Flags: needinfo?(ntroast)
Another interesting question is whether this code works in the presence of web components.
Comment 23•10 years ago
|
||
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #21)
> Another interesting question is whether this code works in the presence of
> web components.
blaim says it comes from bug 1040735 so it should work. Though clearly there exists some case in which it doesn't.
Comment 24•10 years ago
|
||
The tester that found the issue told me the following:
The accessibility option in settings shows only one menu “color Filter” which is disabled. Still we are able to reproduce the crash.
Below are the steps to reproduce this-
1. Dial a number in dial pad.
2. Long press the green icon used to make a call.
3. You will find phone just crashed pop-up.
Comment 25•10 years ago
|
||
Observed on:
Device: msm8909
Gonk Version: AU_LINUX_GECKO_LF.BR.1.2.3.00.00.00.000.110
Moz BuildID: 20150322002503
Manifest: https://www.codeaurora.org/cgit/quic/lf/b2g/manifest/tree/caf_AU_LINUX_GECKO_LF.BR.1.2.3.00.00.00.000.110.xml?h=release
Gecko Version: 37.0
Gaia: http://git.mozilla.org/?p=releases/gaia.git;a=commit;h=44c62060581fde8de1e12e94cf55e9673b401a47
Gecko: http://git.mozilla.org/?p=releases/gecko.git;a=commit;h=c36e800ae5e5e7e790129e18edaef202d8d3dbff
Patches: bug 1133147, bug 1139469, bug 1053634, bug 1142770
Comment 26•10 years ago
|
||
Comment 27•10 years ago
|
||
Comment 28•10 years ago
|
||
we can not repo. this issue on both ZTE open L and Nexus5-L per comment 24.
Comment 29•10 years ago
|
||
Hi David:
Could you please also check this issue to see any light to shed?
Flags: needinfo?(dbaron)
What are the hg changesets (or git revisions) for the source corresponding to the minidumps in comment 2 and comment 27? It's hard to analyze them with confidence without knowing that.
(Also, I'm in meetings all day today.)
Flags: needinfo?(dbaron)
Er, I guess comment 25 answers that question for comment 27's minidump (which I didn't look at as closely as comment 2's).
Flags: needinfo?(dbaron)
Assignee | ||
Comment 32•10 years ago
|
||
(In reply to Greg Grisco from comment #24)
> The tester that found the issue told me the following:
>
> The accessibility option in settings shows only one menu “color Filter”
> which is disabled.
AFAIK, if the menu for screen reader is not visible, it has never been activated by the user. The screen reader menu item would remain visible forever if it was enabled at least once, or enabled via the Developer Settings.
> Still we are able to reproduce the crash.
> Below are the steps to reproduce this-
> 1. Dial a number in dial pad.
> 2. Long press the green icon used to make a call.
> 3. You will find phone just crashed pop-up.
Comment 33•10 years ago
|
||
(In reply to Yura Zenevich [:yzen] from comment #32)
> (In reply to Greg Grisco from comment #24)
>
> AFAIK, if the menu for screen reader is not visible, it has never been
> activated by the user. The screen reader menu item would remain visible
> forever if it was enabled at least once, or enabled via the Developer
> Settings.
So you're saying that it seems inconsistent that with screen reader never being enabled that we're landing in a11y code? Is there any explanation for this?
Comment 34•10 years ago
|
||
This one crash has been seen so many times that it could block our FC, we need to have some solution for this very soon.
Flags: needinfo?(mlee)
Comment 35•10 years ago
|
||
Yura, please respond to Greg's comment #33:
> So you're saying that it seems inconsistent that with screen reader never
> being enabled that we're landing in a11y code? Is there any explanation for
> this?
As Greg mentions in comment 34, this is a critical crash that we need to resolve and understanding what's happening with a11y seems to be key to achieving that.
Thanks,
Mike
Flags: needinfo?(mlee) → needinfo?(yzenevich)
Comment 36•10 years ago
|
||
Revisiting comment 6, 7 and 8, should null checking be done?
Assignee | ||
Comment 37•10 years ago
|
||
(In reply to Greg Grisco from comment #33)
> (In reply to Yura Zenevich [:yzen] from comment #32)
> > (In reply to Greg Grisco from comment #24)
> >
> > AFAIK, if the menu for screen reader is not visible, it has never been
> > activated by the user. The screen reader menu item would remain visible
> > forever if it was enabled at least once, or enabled via the Developer
> > Settings.
>
> So you're saying that it seems inconsistent that with screen reader never
> being enabled that we're landing in a11y code? Is there any explanation for
> this?
So far the only explanation I have is based on comment 20:
> However, I think this particular crash was found by stability test team
> (through scripting or manual test). Whether they turned a11y on or not, I'm
> not sure. I'll try to find out more. Hopefully you can carry on with the
> analysis while I find out, thanks.
If at any point an integration test was run, accessibility will be turned on.
Flags: needinfo?(yzenevich)
Assignee | ||
Comment 38•10 years ago
|
||
Adding a null check patch in the meanwhile.
Attachment #8583346 -
Flags: review?(surkov.alexander)
Comment 39•10 years ago
|
||
bug 1147646 is another a11y crash that was recently found by stress test. Although, that crash has only been seen once so far. Referencing it here in case it helps the analysis.
I'm going to apply the null-check patch, thanks.
Comment 41•10 years ago
|
||
Comment on attachment 8583346 [details] [diff] [review]
1145724 null check patch
Review of attachment 8583346 [details] [diff] [review]:
-----------------------------------------------------------------
r=me with comment fixed
::: accessible/base/nsAccessibilityService.cpp
@@ +535,5 @@
> if (!child) {
> + Accessible* container = document->GetContainerAccessible(aChildNode);
> + if (container) {
> + a11y::TreeWalker walker(container, aChildNode,
> + a11y::TreeWalker::eWalkCache);
container is not really used when you don't create the tree so you can use 'document' if container is null
nit: please make sure you lined up arguments
Attachment #8583346 -
Flags: review?(surkov.alexander) → review+
Assignee | ||
Comment 42•10 years ago
|
||
Addressed Alex's comments, carrying over r+
Attachment #8583813 -
Flags: review+
Assignee | ||
Updated•10 years ago
|
Attachment #8583346 -
Attachment is obsolete: true
Updated•10 years ago
|
Whiteboard: [b2g-crash][caf priority: p2][CR 811496][caf-crash 584] → [b2g-crash][caf priority: p1][CR 811496][caf-crash 584]
Assignee | ||
Comment 43•10 years ago
|
||
Ill land the null check patch once the tree is open.
Assignee | ||
Comment 44•10 years ago
|
||
Updated•10 years ago
|
Whiteboard: [b2g-crash][caf priority: p1][CR 811496][caf-crash 584] → [b2g-crash][caf priority: p2][CR 811496][caf-crash 584]
Updated•10 years ago
|
Flags: needinfo?(sku)
Comment 45•10 years ago
|
||
Status: NEW → RESOLVED
Closed: 10 years ago
status-firefox39:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla39
Attachment #8583813 -
Flags: approval-mozilla-b2g37?
Comment 46•10 years ago
|
||
Comment on attachment 8582242 [details]
decoded minidump - AU_LINUX_GECKO_LF.BR.1.2.3.00.00.00.000.110
[Triage Comment]
Approving the low risk null check
Flags: needinfo?(bbajaj)
Attachment #8582242 -
Flags: approval-mozilla-b2g37+
Updated•10 years ago
|
Attachment #8583813 -
Flags: approval-mozilla-b2g37? → approval-mozilla-b2g37+
Updated•10 years ago
|
Attachment #8582242 -
Flags: approval-mozilla-b2g37+
Comment 47•10 years ago
|
||
status-b2g-v2.2:
--- → fixed
status-b2g-master:
--- → fixed
status-firefox37:
--- → wontfix
status-firefox38:
--- → wontfix
Flags: needinfo?(dbaron)
You need to log in
before you can comment on or make changes to this bug.
Description
•