Closed Bug 1055004 Opened 10 years ago Closed 10 years ago

crash in mozilla::a11y::DocAccessible::UpdateTree(mozilla::a11y::Accessible*, nsIContent*, bool)

Categories

(Core :: Disability Access APIs, defect)

All
Android
defect
Not set
critical

Tracking

()

VERIFIED FIXED
mozilla34
blocking-b2g 2.1+
Tracking Status
b2g-v2.1 --- verified

People

(Reporter: Bebe, Assigned: eeejay)

References

Details

(4 keywords, Whiteboard: [xfail])

Crash Data

Attachments

(2 files, 2 obsolete files)

This bug was filed from the Socorro interface and is 
report bp-f3fae058-15cc-4f53-aad3-1d5c52140818.
=============================================================

Build ID:

Gaia      aa8aace12d65956dd9525da5dac66e0d3b28597f
Gecko     https://hg.mozilla.org/mozilla-central/rev/0aaa2d3d15cc
BuildID   20140818040201
Version   34.0a1
ro.build.version.incremental=110
ro.build.date=Fri Jun 27 15:57:58 CST 2014
B1TC00011230

STR:
1. Add a smart collection to the home screen ex: "Around Me"
2. Open the smart collection

Expected:
2. the smart collection should open

Actual:

2. The phone crashes


We can reproduce this manually and with automation

test to run to reproduce this:

https://github.com/mozilla-b2g/gaia/blob/master/tests/python/gaia-ui-tests/gaiatest/tests/functional/everythingme/test_everythingme_add_collection_save_bookmark.py
Component: Gaia::Everything.me → Disability Access APIs
Product: Firefox OS → Core
Version: unspecified → Trunk
Is this consistently reproducible (i.e. can you reproduce this with the same STR more than once)?
blocking-b2g: --- → backlog
Flags: needinfo?(florin.strugariu)
Keywords: qawanted, regression
I have been able to reproduce it every time while running that test locally.
Okay. Also that today's automation report shows this not happening on 2.0, so this is a regression from 2.0 --> 2.1.

[Blocking Requested - why for this release]:

QA blocking crash since it causes one of automated tests to permafail & a regression.
blocking-b2g: backlog → 2.1?
Flags: needinfo?(florin.strugariu)
Whiteboard: [xfail]
Keywords: smoketest
Keywords: smoketest
jsmith Yes I can reproduce the crash every time I launch that smart collection with both manual and automation.


From what I can see in the B2G-i builds:

Last good build:

application_buildid: 20140815024915
application_changeset: f0fa964f10f5
application_display_name: B2G
application_name: B2G
application_repository: https://hg.mozilla.org/integration/b2g-inbound
application_version: 34.0a1
build_changeset: 3aa6abd313f965a84aa86c6b213dc154e4875139
device_firmware_date: 1403855878
device_firmware_version_incremental: 110
device_firmware_version_release: 4.3
device_id: flame
gaia_changeset: 40a5e2589b7947543ca57f40393b0c9af5e5fd74
gaia_date: 1408095144
platform_buildid: 20140815024915
platform_changeset: f0fa964f10f5
platform_repository: https://hg.mozilla.org/integration/b2g-inbound


First bad build:

application_buildid: 20140815041216
application_changeset: 192a8a9862a2
application_display_name: B2G
application_name: B2G
application_repository: https://hg.mozilla.org/integration/b2g-inbound
application_version: 34.0a1
build_changeset: 3aa6abd313f965a84aa86c6b213dc154e4875139
device_firmware_date: 1406125546
device_firmware_version_incremental: eng.cltbld.20140723.102536
device_firmware_version_release: 4.3
device_id: flame
gaia_changeset: b7fa47012abbb9fa3df8fa370ce8e9dfc36f3267
gaia_date: 1408098423
platform_buildid: 20140815041216
platform_changeset: 192a8a9862a2
platform_repository: https://hg.mozilla.org/integration/b2g-inbound
Hmm...my guess is that we made a change in Gaia that exposed a gecko crash.

Maybe bug 1050404?

Wilson - What do you think?
Flags: needinfo?(wilsonpage)
Trev, can this bea regression from the recent treeupdate/doc lifecycle / etc. changes you and Alex were working on?
Flags: needinfo?(trev.saunders)
Keywords: qaurgent, smoketest
Keywords: qaurgent
I tried walking through the steps manually from the integration test, but all seems to work as expected. Not experiencing any crashes. Can I get some manual STR?
Flags: needinfo?(wilsonpage)
(In reply to Wilson Page [:wilsonpage] from comment #9)
> I tried walking through the steps manually from the integration test, but
> all seems to work as expected. Not experiencing any crashes. Can I get some
> manual STR?

Have you tried testing on device?
(In reply to Jason Smith [:jsmith] from comment #10)
> Have you tried testing on device?

I am running through steps manually on a flame.
QA Wanted - Can we get the STR clarified here?
Keywords: qaurgent, qawanted
QA Contact: ckreinbring
Unable to repro on today's Flame 2.1.  After adding a smart collection then selecting it, the collection loaded with no errors.

I tried the following tests: One preset collection, one custom collection, multiple preset collections, multiple preset collections with a custom collection, plugged/unplugged, with(out) SIM in the device.

BuildID: 20140818072913
Gaia: ba1992f2addc5a84afc2eab426f222a6bf2962ba
Gecko: bf27e27c994d
Platform Version: 34.0a1
Firmware Version: V123
User Agent: Mozilla/5.0 (Mobile; rv:33.0) Gecko/33.0 Firefox/33.0
QA Whiteboard: [QAnalyst-Triage?]
Flags: needinfo?(jmitchell)
Keywords: qaurgent, qawanted
Dropping flags since this actually is not manually reproducible. We need more details from Bebe on how to reproduce this bug.
blocking-b2g: 2.1? → ---
Flags: needinfo?(florin.strugariu)
Whiteboard: [xfail]
QA Whiteboard: [QAnalyst-Triage?] → [QAnalyst-Triage+]
Flags: needinfo?(jmitchell)
I also checked the Central Engineering build and couldn't reproduce the crash.  I tried both a WiFi Connection and Data Connection to complete the STR's.

Environmental Variables:
Device: Flame Master
Build ID: 20140818130516
Gaia: 778c39b5597ea424d6a75934221265423ab3c9e7
Gecko: cd7cbdacf9d8
Version: 34.0a1 (Master)
Firmware Version: v123
User Agent: Mozilla/5.0 (Mobile; rv:34.0) Gecko/34.0 Firefox/34.0
(In reply to Marco Zehe (:MarcoZ) from comment #8)
> Trev, can this bea regression from the recent treeupdate/doc lifecycle /
> etc. changes you and Alex were working on?

the only change to this code in many months was Eitan's change to GetAccessibleOrContainer to use GetCurrentCrossShadowDoc().
Flags: needinfo?(trev.saunders)
(In reply to Trevor Saunders (:tbsaunde) from comment #16)
> (In reply to Marco Zehe (:MarcoZ) from comment #8)
> > Trev, can this bea regression from the recent treeupdate/doc lifecycle /
> > etc. changes you and Alex were working on?
> 
> the only change to this code in many months was Eitan's change to
> GetAccessibleOrContainer to use GetCurrentCrossShadowDoc().

Cool thanks! Eitan, any ideas? I know it seems to be reproducible only in tests, but still...
Flags: needinfo?(eitan)
So, looking at the stack it seems pretty clear we're crashing because UpdateTree() is called with null aContainer.  The call stack is junk, but it seems very likely it is DocAccessible::ContentRemoved calling UpdateTree.  The only way for ContentRemoved to pass null as aContainer is for GetAccessibleOrContainer() to return null when it is passed a non null node.  It looks like The first way that can happen is for the node to not be in any document, but I don't think ContentRemoved can be called with a node not in a document.  So I believe the issue is when the node passed  to GetAccessibleOrContainer is in a shadow subtree, and so GetParentNode will never come up into the hosting content.
(In reply to Marco Zehe (:MarcoZ) from comment #17)
> (In reply to Trevor Saunders (:tbsaunde) from comment #16)
> > (In reply to Marco Zehe (:MarcoZ) from comment #8)
> > > Trev, can this bea regression from the recent treeupdate/doc lifecycle /
> > > etc. changes you and Alex were working on?
> > 
> > the only change to this code in many months was Eitan's change to
> > GetAccessibleOrContainer to use GetCurrentCrossShadowDoc().
> 
> Cool thanks! Eitan, any ideas? I know it seems to be reproducible only in
> tests, but still...

I'm pretty sure bug 1027315 should have changed GetParentNode() in GetAccessibleOrContainer to GetFlatteneedTreeParent(), but I'll leave verify that and the patch to Eitan
The only way I can replicate this manually is after having run the automated test on the profile. Thus it's a dirty profile and the bug can be replicated repeatedly.

On a fresh flash I can't replicate it manually.

It screams of a race condition but I sprayed some sleeps around to slow the test down and it didn't seem to help.
Zac is correct here.

I tried to replicate this on a fresh flash phone and I failed to reproduce the issue.

But after running the test I can replicate it every time.


Also this reproduces on the latest build:

Gaia      778c39b5597ea424d6a75934221265423ab3c9e7
Gecko     https://hg.mozilla.org/mozilla-central/rev/cd7cbdacf9d8
BuildID   20140818160207
Version   34.0a1
ro.build.version.incremental=110
ro.build.date=Fri Jun 27 15:57:58 CST 2014
B1TC00011230
Flags: needinfo?(florin.strugariu)
Attached file eme_crash.log
Here is a logcat of the automated test running and experiencing the crash.


Notable lines:
1662: Long press on homescreen's gaia-grid to open context menu
1847: Around Me collection added
1977: Marionette taps the collection icon (the step immediately preceding the crash)
2022: Report crash error


I can't really see anything there, I guess because this comes from higher up in Gecko?

We could ask the homescreen team if some improvements were made to homescreen that might have introduced this.
[Blocking Requested - why for this release]:

Okay - reflagging the nomination here since this is causing one of our tests to permafail, which makes this a qablocker.
blocking-b2g: --- → 2.1?
Whiteboard: [xfail]
Also flagging window wanted since we should be able to bisect this with the new information given.
(In reply to Jason Smith [:jsmith] from comment #24)
> Also flagging window wanted since we should be able to bisect this with the
> new information given.

Specifically looking for info on:

* A m-c tinderbox window
* Gaia/Gecko swap
* If the m-c tinderbox window does not point to b2g-inbound, then bisect on the other inbound branch
I could reproduce this with the test. I'll have a go at it, and have trevor review it, since he obviously has a much better understanding of what is going on..
Assignee: nobody → eitan
Flags: needinfo?(eitan)
We're getting similar failures on pending pull-requests related to bug 1005830:

https://tbpl.mozilla.org/?rev=ecb205a539ad792df27797ead41e7c7a8b2bcc67&tree=Gaia-Try

Is this the same issue?
Flags: needinfo?(zcampbell)
Trevor, do you have other ideas?
Attachment #8475429 - Flags: feedback?(trev.saunders)
Apparently this work, and fixes the crash. I have no idea if this is sane or not.
Attachment #8475429 - Attachment is obsolete: true
Attachment #8475429 - Flags: feedback?(trev.saunders)
Attachment #8475488 - Flags: review?(trev.saunders)
QA Whiteboard: [QAnalyst-Triage+] → [QAnalyst-Triage+][lead-review+]
Posting Regression Window to confirm patch fix above. 

B2G Inbound Regression Window:

Last Working

Device: Flame Master
BuildID: 20140815035215
Gaia: b7fa47012abbb9fa3df8fa370ce8e9dfc36f3267
Gecko: 70ce13bca890
Version: 34.0a1 (Master)
Firmware Version: v123
User Agent: Mozilla/5.0 (Mobile; rv:34.0) Gecko/34.0 Firefox/34.0

First Broken

Device: Flame Master
Build ID: 20140815035215
Gaia: b7fa47012abbb9fa3df8fa370ce8e9dfc36f3267
Gecko: 70ce13bca890
Version: 34.0a1 (Master)
Firmware Version: v123
User Agent: Mozilla/5.0 (Mobile; rv:34.0) Gecko/34.0 Firefox/34.0

Last Working Gaia and First Broken Gecko
Issue DOES NOT occur here.
Gaia: 40a5e2589b7947543ca57f40393b0c9af5e5fd74
Gecko: 70ce13bca890

Last Working Gecko and First Broken Gaia
Issue DOES occur here.
Gaia: b7fa47012abbb9fa3df8fa370ce8e9dfc36f3267
Gecko: f0fa964f10f5

Gaia Pushlog:
https://github.com/mozilla-b2g/gaia/compare/40a5e2589b7947543ca57f40393b0c9af5e5fd74...b7fa47012abbb9fa3df8fa370ce8e9dfc36f3267

Possible Causes:

Bug 1050404 - [Collections] Update to use gaia-header

Bug 1015294 - [Homescreen] Update to use gaia-header
QA Whiteboard: [QAnalyst-Triage+][lead-review+] → [QAnalyst-Triage?][lead-review+]
Flags: needinfo?(jmitchell)
QA Whiteboard: [QAnalyst-Triage?][lead-review+] → [QAnalyst-Triage+][lead-review+]
Flags: needinfo?(jmitchell)
(In reply to Wilson Page [:wilsonpage] from comment #27)
> We're getting similar failures on pending pull-requests related to bug
> 1005830:
> 
> https://tbpl.mozilla.org/
> ?rev=ecb205a539ad792df27797ead41e7c7a8b2bcc67&tree=Gaia-Try
> 
> Is this the same issue?

It looks very similar.

I can see it on several other Gaia-Try jobs too, it seems this has spread now.

That means it's more urgent to get an r? or a backout for this if we can!
Flags: needinfo?(zcampbell)
Blocks: 1056047
Comment on attachment 8475488 [details] [diff] [review]
Fix failure of GetAccessibleOrContainer() to return a parent past a shadow root.

I'm confused what the difference between these functions is, but in any case this needs a test.
Attachment #8475488 - Flags: review?(trev.saunders)
Blocks: 1011602
We need a fix for this landing today, it's blocking a load of patches.
Flags: needinfo?(trev.saunders)
Flags: needinfo?(eitan)
Blocks: gaia-header
See Also: → 1057304
OK, this patch is better documented and understood.

I have been trying for an entire day to reproduce this issue in a mochitest with no luck. When I add/remove content from a shadow dom I don't get ContentInserted/ContentRemoved. At least not for anything below the shadow root. This is very urgent for the FxOS folks, could we land this, and write tests later?
Attachment #8475488 - Attachment is obsolete: true
Attachment #8477434 - Flags: review?(trev.saunders)
Flags: needinfo?(eitan)
Comment on attachment 8477434 [details] [diff] [review]
Fix failure of GetAccessibleOrContainer() to return a parent past a shadow root.

Review of attachment 8477434 [details] [diff] [review]:
-----------------------------------------------------------------

r=me.

I managed to chat with Trevor (he is currently indisposed). Filing a follow up for tests should be fine.

::: accessible/generic/DocAccessible.cpp
@@ +1313,5 @@
> +    nsINode* parent = nullptr;
> +
> +    // If this is a content node, try to get a flattened parent content node.
> +    if (currNode->IsContent())
> +      parent = currNode->AsContent()->GetFlattenedTreeParent();

Note to self: ok should cross from shadow to host properly (where else might we need to make a change like this?)

The comment doesn't tell us anything the code doesn't ;) Maybe say something about the case this fixes?

@@ +1318,5 @@
> +
> +    // Fallback to just get parent node, in case there is no parent content
> +    // node. Or current node is not a content node.
> +    if (!parent)
> +      parent = currNode->GetParentNode();

Note to self: ... and this is what we had already.
Attachment #8477434 - Flags: review+
(In reply to David Bolter [:davidb] from comment #36)
> Comment on attachment 8477434 [details] [diff] [review]
> Fix failure of GetAccessibleOrContainer() to return a parent past a shadow
> root.
> 
> Review of attachment 8477434 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> r=me.
> 
> I managed to chat with Trevor (he is currently indisposed). Filing a follow
> up for tests should be fine.
> 
> ::: accessible/generic/DocAccessible.cpp
> @@ +1313,5 @@
> > +    nsINode* parent = nullptr;
> > +
> > +    // If this is a content node, try to get a flattened parent content node.
> > +    if (currNode->IsContent())
> > +      parent = currNode->AsContent()->GetFlattenedTreeParent();
> 
> Note to self: ok should cross from shadow to host properly (where else might
> we need to make a change like this?)
> 

I think we already made a bunch of changes, and I guess missed a few :P

> The comment doesn't tell us anything the code doesn't ;) Maybe say something
> about the case this fixes?
> 

Fixed that.

> @@ +1318,5 @@
> > +
> > +    // Fallback to just get parent node, in case there is no parent content
> > +    // node. Or current node is not a content node.
> > +    if (!parent)
> > +      parent = currNode->GetParentNode();
> 
> Note to self: ... and this is what we had already.

Try looks green enough.. landing
https://hg.mozilla.org/mozilla-central/rev/3488976ecf0f
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla34
Attachment #8477434 - Flags: review?(trev.saunders)
Flags: needinfo?(trev.saunders)
> OK, this patch is better documented and understood.
> 
> I have been trying for an entire day to reproduce this issue in a mochitest
> with no luck. When I add/remove content from a shadow dom I don't get
> ContentInserted/ContentRemoved. At least not for 

Well then your clearly doing something wrong, because this crash can only happen if it is called.

anything below the shadow
> root. This is very urgent for the FxOS folks, could we land this, and write
> tests later?


I'm dubious it couldn't wait, but what's done is done.
GetFlattenedParent() returns GetParent() by default. However you call GetParent() if GetFlattenedParent() failed. Are you trying to fix some broken bindings by that or what is it?
would be good to get comment #41 answered :) thanks
Flags: needinfo?(trev.saunders)
(In reply to alexander :surkov from comment #42)
> would be good to get comment #41 answered :) thanks

I have no clue its not my patch.
Flags: needinfo?(trev.saunders)
(In reply to alexander :surkov from comment #41)
> GetFlattenedParent() returns GetParent() by default. However you call
> GetParent() if GetFlattenedParent() failed. Are you trying to fix some
> broken bindings by that or what is it?

GetFlattenedTreeParent only returns nsIContent parents. If we reach the root content node of a document, we want to get the document node as well, which is not an nsIContent.

If we first tried to GetParentNode() and then use GetFlattenedTreeParent() for shadow root fallback, we would end up on the document fragment node, which we have no interest in, and it is not a content node, so it does not have GetFlattenedTreeParent(), and thus we are stuck.

bz said maybe a GetFlattenedTreeParentNode() method on nsINode makes sense. Possibly.
(In reply to Eitan Isaacson [:eeejay] from comment #44)
> (In reply to alexander :surkov from comment #41)
> > GetFlattenedParent() returns GetParent() by default. However you call
> > GetParent() if GetFlattenedParent() failed. Are you trying to fix some
> > broken bindings by that or what is it?
> 
> GetFlattenedTreeParent only returns nsIContent parents. If we reach the root
> content node of a document, we want to get the document node as well, which
> is not an nsIContent.

its seems to me we just want to use GetCrossShadowRootParentElement or whatever that thing is called.
(In reply to Trevor Saunders (:tbsaunde) from comment #43)
> (In reply to alexander :surkov from comment #42)
> > would be good to get comment #41 answered :) thanks
> 
> I have no clue its not my patch.

sorry, wrong ID :)

(In reply to Eitan Isaacson [:eeejay] from comment #44)
> (In reply to alexander :surkov from comment #41)
> bz said maybe a GetFlattenedTreeParentNode() method on nsINode makes sense.
> Possibly.

that sounds good and less suspicious than current code, wanna go this way?

(In reply to Trevor Saunders (:tbsaunde) from comment #45)
> (In reply to Eitan Isaacson [:eeejay] from comment #44)
> its seems to me we just want to use GetCrossShadowRootParentElement or
> whatever that thing is called.

can you point out exact name pls? I don't recall this method
(In reply to alexander :surkov from comment #46)
> (In reply to Trevor Saunders (:tbsaunde) from comment #43)
> > (In reply to alexander :surkov from comment #42)
> > > would be good to get comment #41 answered :) thanks
> > 
> > I have no clue its not my patch.
> 
> sorry, wrong ID :)
> 
> (In reply to Eitan Isaacson [:eeejay] from comment #44)
> > (In reply to alexander :surkov from comment #41)
> > bz said maybe a GetFlattenedTreeParentNode() method on nsINode makes sense.
> > Possibly.
> 
> that sounds good and less suspicious than current code, wanna go this way?
> 
> (In reply to Trevor Saunders (:tbsaunde) from comment #45)
> > (In reply to Eitan Isaacson [:eeejay] from comment #44)
> > its seems to me we just want to use GetCrossShadowRootParentElement or
> > whatever that thing is called.
> 
> can you point out exact name pls? I don't recall this method

nsINode::GetParentElementCrossingShadowRoot apparently because who likes consistancy anyway.
(In reply to Trevor Saunders (:tbsaunde) from comment #47)
> nsINode::GetParentElementCrossingShadowRoot apparently because who likes
> consistancy anyway.

From listening in to a conversation you and smaug were having on IRC, I got the impression that method does not work for crossing from the light dom into the shadow dom. I might have totally misunderstood.
(In reply to Trevor Saunders (:tbsaunde) from comment #47)

> nsINode::GetParentElementCrossingShadowRoot apparently because who likes
> consistancy anyway.

it seems it doesn't take into account XBL stuff but anyway, it has same issue as GetFlattenedParent(), i.e. it doesn't return document node.
(In reply to alexander :surkov from comment #49)
> (In reply to Trevor Saunders (:tbsaunde) from comment #47)
> 
> > nsINode::GetParentElementCrossingShadowRoot apparently because who likes
> > consistancy anyway.
> 
> it seems it doesn't take into account XBL stuff but anyway, it has same

Well, neither does GetParentNode afaik

> issue as GetFlattenedParent(), i.e. it doesn't return document node.

I think that's actually fine so long as you don't ask for a direct child of the root doc which I don't think we ever will.
(In reply to Trevor Saunders (:tbsaunde) from comment #50)
> (In reply to alexander :surkov from comment #49)
> > (In reply to Trevor Saunders (:tbsaunde) from comment #47)
> > 
> > > nsINode::GetParentElementCrossingShadowRoot apparently because who likes
> > > consistancy anyway.
> > 
> > it seems it doesn't take into account XBL stuff but anyway, it has same
> 
> Well, neither does GetParentNode afaik

it does

> > issue as GetFlattenedParent(), i.e. it doesn't return document node.
> 
> I think that's actually fine so long as you don't ask for a direct child of
> the root doc which I don't think we ever will.

yes but it's not what Eitan's comment #44 is about
Flags: qe-verify-
Switching the 2.1?->2.1+, on these fixed bugs as these are regression.

Nothing to land here, its just flag-cleanup of 2.1? list. Please Ni me if there is confusion/disagreement.
blocking-b2g: 2.1? → 2.1+
VERIFIED FIXED on v2.1.

Environmental Variables:
Device: Flame 2.1 KK (319 MB)
BuildID: 20141011000201 (full flash)
Gaia: f5d4ff60ffed8961f7d0380ada9d0facfdfd56b1
Gecko: d813d79d3eae
Gonk: 52c909e821d107d414f851e267dedcd7aae2cebf
Version: 34.0a2 (2.1)
Firmware: V180
User Agent: Mozilla/5.0 (Mobile; rv:34.0) Gecko/34.0 Firefox/34.0

Smart collection add and launch functions as expected.
Status: RESOLVED → VERIFIED
QA Whiteboard: [QAnalyst-Triage+][lead-review+] → [QAnalyst-Triage?][lead-review+]
Flags: needinfo?(ktucker)
QA Whiteboard: [QAnalyst-Triage?][lead-review+] → [QAnalyst-Triage+][lead-review+]
Flags: needinfo?(ktucker)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: