Closed Bug 1219153 Opened 9 years ago Closed 9 years ago

Web component on homescreen w/ pinned tabs triggers MOZ_ASSERT(IsFrameListSorted<IsLessThanOrEqual>(aFrameList), "After we sort a frame list, it should be in sorted order..."); at layout/generic/nsIFrame.h:3550

Categories

(Firefox OS Graveyard :: Gaia, defect, P2)

defect

Tracking

(blocking-b2g:2.5+, b2g-v2.5 fixed)

RESOLVED FIXED
blocking-b2g 2.5+
Tracking Status
b2g-v2.5 --- fixed

People

(Reporter: gwagner, Assigned: dholbert)

References

Details

Attachments

(5 files, 1 obsolete file)

Attached file gdb bt
Seen on B2G bit current tip and debug gecko.

STR: I added a few pinned pages via the browser
Build ID               20151028125855
Gaia Revision          ed72b65f23af569a763e94dfde4c71242c4f3072
Gaia Date              2015-10-27 17:55:37
Gecko Revision         n/a
Gecko Version          44.0a1
Device Name            aries
Firmware(Release)      4.4.2
Firmware(Incremental)  eng.naoki.20151007.074137
Firmware Date          Wed Oct  7 07:41:46 PDT 2015
Bootloader             s1
[Blocking Requested - why for this release]:

This happens in the homescreen app and I get an endless homescreen reboot cycle. Restarting the phone doesn't fix the problem.
blocking-b2g: --- → 2.5?
Roc, do you know if anything around this code changed recently?
Flags: needinfo?(roc)
I don't know of anything. Shouldn't be hard to get a regression range.
Flags: needinfo?(roc)
Updating keyword to get Regression window.
I cannot reproduce this bug on today's Aries. Added 7 pinned pages and 5 pinned sites and didn't see this error in logcat and didn't see anything visually weird on Homescreen.

Device: Aries Master
BuildID: 20151029124810
Gaia: 91cac94948094cfdcd00cba5c6483e27e80cb3b0
Gecko: acb3f4ac5424181d3d4d73283668162137918cf1
Gonk: 2916e2368074b5383c80bf5a0fba3fc83ba310bd
Version: 45.0a1 (Master) 
Firmware Version: D5803_23.1.A.1.28_NCB.ftf
User Agent: Mozilla/5.0 (Mobile; rv:45.0) Gecko/45.0 Firefox/45.0
Gregor can we get more information from you here?  All you did was pin some pages to the homescreen and then it caused your homescreen to reboot constantly?

Can we possibly get a list (incomplete is fine) of the pages you added or at least an estimate of how many?

Thanks
Flags: needinfo?(anygregor)
That only happens on debug gecko builds :/
Flags: needinfo?(anygregor)
Maybe a flexbox issue given the top stack frames in the attachment?

Is the homescreen using flexbox layout with the 'order' CSS property by any chance?
Flags: needinfo?(dholbert)
Yup, it looks like that this is a flex container w/ "order" set on some children. (And MergeSort is failing to sort them, somehow.)

This is surprising & somewhat troubling, but unlikely to be dangerous; for now, we can just soften the assertion to NS_ASSERTION (spammy but non-fatal) to fix the reboot loop. It's only fatal (MOZ_ASSERT) right now because it's so obvious that it should succeed, not because anything particularly bad happens if it fails. (just potentially broken layout)

A frame-tree dump at the time of the assertion-failure would be somewhat handy (combined with the "this" pointer of the nsFlexContainerFrame at stack-level 2), though I'm not sure how easy it is to extract that from the homescreen on B2G....
(In reply to Daniel Holbert [:dholbert] from comment #10)
> Yup, it looks like that this is a flex container w/ "order" set on some
> children.

(I guess technically we don't know if 'order' has been set at this point, but it seems likely.  We do know, however, that the children have been reordered in the DOM, which is why we're using IsOrderLEQWithDOMFallback instead of just IsOrderLEQ)
Flags: needinfo?(dholbert)
Here's a band-aid patch which we could take if we need to.

First, though, I'm curious if we have concrete STR, and if it's possible to get a regression range using debug builds.
(ni=gwagner to see if this is reproducible)
[not sure if your phone's still stuck in reboot-mode or if you've restored it to working order & could try to repro again]
Flags: needinfo?(anygregor)
See Also: → 1221112
Ah, bug 1059138 is one way this can happen. And bug 1221112 may have given us a testcase...
See Also: → 1059138
(In reply to Daniel Holbert [:dholbert] from comment #13)
> (ni=gwagner to see if this is reproducible)
> [not sure if your phone's still stuck in reboot-mode or if you've restored
> it to working order & could try to repro again]

Thanks Daniel! That fixes the homescreen reboot loop.
Flags: needinfo?(anygregor)
The assertion-silencing patch does, you mean? That's good news.

With any luck, the patch I landed this morning on bug 1221112 may fix this for reals.
We were unable to reprduce this issue but it sounds like the window here is no longer necessary.  If I'm incorrect please add the tag again.
Gregor, could you check if bug 1221112 (which merged to m-c 12 hours ago) fixed this issue?
Flags: needinfo?(anygregor)
(In reply to Daniel Holbert [:dholbert] from comment #18)
> Gregor, could you check if bug 1221112 (which merged to m-c 12 hours ago)
> fixed this issue?

No, I still see it :/
Flags: needinfo?(anygregor)
Jet, We have made this a blocker for 2.5. Can you please find an owner to fix this and put it up for approval once fixed?

Thanks
blocking-b2g: 2.5? → 2.5+
Flags: needinfo?(bugs)
Priority: -- → P2
I would be the obvious owner, so I'll chime in here.

As a 2.5 blocker, for the moment, I'd suggest just softening the assertion with the attached patch. As noted in comment 10, this is unexpected, but worst-case it just means the layout might end up slightly wrong. (Or it might be right, and we're just getting something wrong & failing this assertion when we're trying to verify that it's right.)

In the meantime, I can take some time to flash a debug build onto a device to see if I can reproduce & investigate further into what's actually going on.
Flags: needinfo?(bugs) → needinfo?(dholbert)
Good news; I can reproduce the fatal assert on a desktop b2g build, with a fresh gaia profile & 2 pinned pages.

Also, I get zillions of this related nonfatal assertion before the fatal assertion happens:
[Parent 20491] ###!!! ASSERTION: Child frames aren't sorted correctly: '(!mFrames.IsEmpty() && mFrames.FirstChild()->GetContent()->GetContainingShadow()) || nsIFrame::IsFrameListSorted<IsOrderLEQWithDOMFallback>(mFrames)', file /scratch/work/builds/b2g-inbound/mozilla/layout/generic/nsFlexContainerFrame.cpp, line 1986

More info shortly, after lunch + a gdb session...
(and the zillions of nonfatal assertions start happening pretty much right away, before I've done anything.)
gdb shows that this is a version of bug 1059138. Basically:
 - There's a flex container ("gaia-container") which has some children which are part of a shadow DOM, and some children which are not.
 - It also has some children which have "order" set, so it has to sort its children (and asserts that it sorted them correctly)
 - It sorts by "order", though among items with equal "order" values (like the default value) it sorts by DOM order, using nsContentUtils::PositionIsBefore().
 - BUT, unfortunately, nsContentUtils::PositionIsBefore() is not reliable (it always returns false) if one of the elements is in a shadow DOM.

So, this means the sort operation is unreliable, because our "lessThan" operator effectively returns false more often than it should, and so when we go through the just-sorted list to be sure it's actually sorted, we find two adjacent elements where "lessThan" surprisingly returns false (because one of them is in the shadow DOM).

For the time being, I think we should just soften this assertion, and keep an eye out for layout issues.  And really, we should extend nsContentUtils::PositionIsBefore() to actually work correctly w/ elements in a shadow DOM.
Depends on: 1059138
Flags: needinfo?(dholbert)
(Note that this is only a problem if webcomponents are preffed on, and they're still preffed off by default in Nightly, but b2g has them enabled.)
Comment on attachment 8681352 [details] [diff] [review]
possible band-aid: use non-fatal assertion

gregor, is reducing the assertion-severity OK with you as a way of addressing this startup-abort, for the time being? As noted above, a real fix will require some work from someone who knows a bit more about shadow DOM than I do, I think. (probably over on bug 1059138)

In the meantime, there may be some assertion-spam as a result of this (in debug builds), but it's better than an abort at least...
Attachment #8681352 - Flags: review?(anygregor)
(I may be able to take the shadow DOM work, but I can't really dive in at the moment, and I don't feel comfortable committing to figuring that out & fixing & backporting that as a FxOS 2.5 blocker, at this point.)
Assignee: nobody → dholbert
(In reply to Daniel Holbert [:dholbert] from comment #26)
> Comment on attachment 8681352 [details] [diff] [review]
> possible band-aid: use non-fatal assertion
> 
> gregor, is reducing the assertion-severity OK with you as a way of
> addressing this startup-abort, for the time being? As noted above, a real
> fix will require some work from someone who knows a bit more about shadow
> DOM than I do, I think. (probably over on bug 1059138)
> 
> In the meantime, there may be some assertion-spam as a result of this (in
> debug builds), but it's better than an abort at least...

Yeah thats fine with me. At least we can test the feature with debug mode.
Comment on attachment 8681352 [details] [diff] [review]
possible band-aid: use non-fatal assertion

Thanks. I suppose I should get review from a fellow layout person - tagging mats.

Mats, see comment 24 for what's going on here.  Basically, we're fatally-asserting something that we can't reliably enforce right now, when web components (shadow DOM, really) are enabled.

I hope I or a DOM person can make this better in bug 1059138, but for now, this can't be fatal because b2g debug is tripping over it.

Commit message will be something like:
>Bug 1219153: Reduce the severity of a flex item sortedness assertion that can be made to fail with web components.
Attachment #8681352 - Flags: review?(anygregor) → review?(mats)
Comment on attachment 8681352 [details] [diff] [review]
possible band-aid: use non-fatal assertion

Actually, I think I found a gaia hackaround which will prevent gaia from hitting this (at least in this spot).  Canceling review.
Attachment #8681352 - Flags: review?(mats)
This removes all sortedness assertions from my debug output (both fatal & non-fatal).

The node in question here, which was part of the shadow-DOM, was an all-whitespace node (which I'm deleting here by removing this whitespace).

(tangent: normally this node wouldn't even show up in the frame tree -- we don't make anonymous flex items around all-whitespace runs -- but in this case there are some adjacent placeholder frames which get grouped together with it into an anonymous flex item, so the all-whitespace run *does* end up in the frame tree [as the first child of this anonymous flex item], and so it gets used when we try to sort that anonymous flex item by DOM order.  If we remove this node, then the anonymous flex item will just have the abspos placeholders, and we'll use them for sorting instead, and it seems they're not part of a shadow DOM so they don't break stuff in the way that this whitespace broke stuff.)
Comment on attachment 8684445 [details] [diff] [review]
gaia hackaround: remove whitespace from gaia-container web component

gwagner, one last "please test this" request from me here (I think) -- could you give this patch a try and see if it fixes the abort, to sanity-check my promising local results?
Attachment #8684445 - Flags: feedback?(anygregor)
Looks like @CWiiis has done most of the work on apps/homescreen/bower_components/gaia-container/script.js (according to git blame at least), so I'll tag him on for review on this.
Attachment #8684459 - Flags: review?(chrislord.net)
Attachment #8684456 - Attachment is obsolete: true
Comment on attachment 8684459 [details] [review]
gaia hackaround, as pull request [later redone as pull request to "gaia-container" project]

I'm happy with this change and it's an r+ from me, but this is an external component, so the change needs to be checked in here: https://gitlab.com/Cwiiis/gaia-container and then the component needs to be updated and checked in to Gaia (by installing bower, and running 'bower update gaia-container' in the homescreen directory).

If you could open a merge request for the gaia-container repository and a pull-request for Gaia, that'd be appreciated :) If you don't have bower installed, I don't mind handling the latter half of that.
Attachment #8684459 - Flags: review?(chrislord.net) → feedback+
(In reply to Chris Lord [:cwiiis] from comment #35)
> I'm happy with this change and it's an r+ from me, but this is an external
> component, so the change needs to be checked in here:
> https://gitlab.com/Cwiiis/gaia-container

Gotcha -- created an account & cloned & submitted a merge request there:
  https://gitlab.com/Cwiiis/gaia-container/merge_requests/11

> and then the component needs to be
> updated and checked in to Gaia (by installing bower, and running 'bower
> update gaia-container' in the homescreen directory).
> 
> If you could open a merge request for the gaia-container repository and a
> pull-request for Gaia, that'd be appreciated :) If you don't have bower
> installed, I don't mind handling the latter half of that.

I don't have bower installed.  If you'd be kind enough to handle that second half, I would be much obliged. Thanks!
Comment on attachment 8684551 [details] [review]
[gaia] Cwiiis:bug1219153-gaia-container-gecko-bug-workaround > mozilla-b2g:master

Will land when tests come back green.
Attachment #8684551 - Flags: review+
Merged: https://github.com/mozilla-b2g/gaia/commit/844661cdb33ef72c01066593111a8a8070b3f4d9

Not sure if this bug should be marked as fixed or not, leaving n?dholbert
Flags: needinfo?(dholbert)
Attachment #8684459 - Attachment description: gaia hackaround, as pull request → gaia hackaround, as pull request [later redone as pull request to "gaia-container" project]
Flags: needinfo?(dholbert)
Thanks! I think we can mark this FIXED; the underlying platform issue is still tracked in bug 1059138, but this instance w/ the homescreen should no longer be a problem, based on my earlier testing.  (Gregor or QA can reopen if I'm wrong on this.)
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Comment on attachment 8684445 [details] [diff] [review]
gaia hackaround: remove whitespace from gaia-container web component

(Converting feedback request (for sanity-checking that this fixes things) to needinfo, now that this has been merged to mainline gaia.)
Flags: needinfo?(anygregor)
Attachment #8684445 - Flags: feedback?(anygregor)
Once we've confirmed that this fixes things on Gaia trunk, we can request approval to uplift for 2.5, I guess. (since this is a 2.5 blocker)
[reclassifying bug as Firefox OS | Gaia to make the component & the code-that-is-changing-to-fix-this match. There is also a Gecko bug here, which is tracked in bug 1059138 as noted above.]
Component: Layout → Gaia
Product: Core → Firefox OS
Summary: MOZ_ASSERT(IsFrameListSorted<IsLessThanOrEqual>(aFrameList), "After we sort a frame list, it should be in sorted order..."); at layout/generic/nsIFrame.h:3550 → Web component on homescreen w/ pinned tabs triggers MOZ_ASSERT(IsFrameListSorted<IsLessThanOrEqual>(aFrameList), "After we sort a frame list, it should be in sorted order..."); at layout/generic/nsIFrame.h:3550
All good now!
Flags: needinfo?(anygregor)
That's great news!

Cwiiis, any chance I could lean on you to generate the uplift-to-Gaia-2.5 pull request here, using the same process you used to generate the gaia Trunk pull request?  (If you post an attachment for that pull request here, I can take care of requesting approval & filling out the approval Q&A.)
Flags: needinfo?(chrislord.net)
Comment on attachment 8684551 [details] [review]
[gaia] Cwiiis:bug1219153-gaia-container-gecko-bug-workaround > mozilla-b2g:master

[Approval Request Comment]
[Bug caused by] (feature/regressing bug #):
[User impact] if declined: Crashing when using pinned pages
[Testing completed]: On master for a while
[Risk to taking this patch] (and alternatives if risky): Very low risk (just whitespace change in markup)
[String changes made]: None
Flags: needinfo?(chrislord.net)
Attachment #8684551 - Flags: approval-gaia-v2.5?
(In reply to Daniel Holbert [:dholbert] from comment #45)
> That's great news!
> 
> Cwiiis, any chance I could lean on you to generate the uplift-to-Gaia-2.5
> pull request here, using the same process you used to generate the gaia
> Trunk pull request?  (If you post an attachment for that pull request here,
> I can take care of requesting approval & filling out the approval Q&A.)

We should just be able to flag master PRs for uplift at the moment :)
Comment on attachment 8684551 [details] [review]
[gaia] Cwiiis:bug1219153-gaia-container-gecko-bug-workaround > mozilla-b2g:master

Approved for 2.5 uplift. 

Thanks
Attachment #8684551 - Flags: approval-gaia-v2.5? → approval-gaia-v2.5+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: