Closed Bug 1299736 Opened 3 years ago Closed 3 years ago
_CRASH(unexpected child list) in ns Block Frame::Remove Frame
This bug was filed from the Socorro interface and is report bp-3b3d40b0-3b32-4f01-9da5-939892160901. ============================================================= Crash that first appeared in the 8-30 Nightly, and is already the #5 top crash for that Nightly with 38 crashes. Regression range is , but I'm not sure of the exact bug.  https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=1a5b53a831e5a6c20de1b081c774feb3ff76756c&tochange=26e22af660e543ebb69930f082188b69ec756185
Taking a longer view this crash has come and gone for a while now, there does seem to be a spike in 51.0a1 starting on 8-30 though. https://crash-stats.mozilla.com/search/?signature=~nsBlockFrame%3A%3ARemoveFrame&date=%3E%3D2016-03-30&date=%3C2016-08-31&_sort=-date&_facets=signature&_facets=version&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#crash-reports
[Tracking Requested - why for this release]: possible crash regression I was able to reproduce this in a Linux Nightly using the set of URLs in bug 1300123 comment 2: bp-a680f2d8-431b-4670-b3d2-c46432160902
... and again by reloading a few tabs: bp-e5abd5b3-25d6-4f9b-a81b-04f512160902
I haven't managed to crash it in a debug build yet (I always get bug 1300123 instead!), but it seems quite clear from the above crashes that since we're coming from nsPlaceholderFrame::DestroyFrom that the OOF is an abspos frame and that we should've taken the GetAbsoluteContainingBlock() path in nsFrameManager::RemoveFrame but didn't: https://hg.mozilla.org/mozilla-unified/annotate/506facea6316/layout/base/nsFrameManager.cpp#l505 The most likely culprits in the regression range seems to be bug 1251075 and bug 1258911.
Tracking 51+ - makes sense to track this crash to see if we can diagnose what might be going on.
The regression range from comment 0 is quite solid due to the large number of crashes. A better crash-stats link is perhaps: https://crash-stats.mozilla.com/search/?signature=~nsBlockFrame%3A%3ARemoveFrame&date=%3E%3D2016-03-30&moz_crash_reason=%3DMOZ_CRASH%28unexpected%20child%20list%29&_sort=-date&_facets=signature&_facets=version&_facets=build_id&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=moz_crash_reason&_columns=url&_columns=user_comments#facet-build_id
I read through the relevant code and I suspect the problem may be the code in RestyleManagerBase::ProcessRestyledFrames that calls cont->MarkAsNotAbsoluteContainingBlock().
It took me a little longer than I thought to figure out how to use that code to get the crash to occur, but here's a testcase. Explanation shortly...
(1) So, in this testcase, style change processing will generate nsChangeHint_UpdateContainingBlock for *two* different frames, A and B. (2) When we process the hint for A, NeedToReframeForAddingOrRemovingTransform *incorrectly* returns false, since FrameHasPositionedPlaceholderDescendants incorrectly returns false, since it calls f->IsAbsPosContainingBlock and f->IsFixedPosContainingBlock on B, which return the *new* state on B (reflecting the results of the style change) rather than the *old* state on B (which is the state of the current frame tree, since we haven't processed the change hints yet). (3) Because of this, RestyleManagerBase::ProcessRestyledFrames goes on to test IsAbsPosContainingBlock() on A, which returns false, and it thus calls cont->MarkAsNotAbsoluteContainingBlock() on A. (That call is somewhat dangerous and I suspect not a particularly valuable optimization. At the very least we should condition it on actually checking for a lack of absolute frames, with an assertion if we've left any.) (4) Then we process the change hint for B, where NeedToReframeForAddingOrRemovingTransform will return true, as it should. When we reframe, we try to remove the frame for C from A, but nsFrameManager::RemoveFrame takes the wrong path because IsAbsoluteContainer() returns false for A, because of point (2) above. I think we need to fix two things here: (A) the issue in (2) that the patch in https://hg.mozilla.org/mozilla-central/rev/6f9fa77780cc is considering the post-style-change state of descendants when it needs to be considering the current state of the frame tree, which is pre-style-change. (I think this was partly at my suggestion during the review. bz's original patch posted for review did incorrectly depend on post-style-change state, but the particular dependency this testcase uses was introduced during review at my suggestion (bug 1258911 comment 23). I haven't thought much about whether I could trigger the bug without the change I suggested.) (B) I think we should make the code in (3) somewhat more defensive, so that if there are remaining absolute frames, it will assert, *instead of* (even in opt builds) calling MarkAsNotAbsoluteContainingBlock.
I suppose it's possible (and if it is possible, probably more efficient) to skip fixing (A), assuming we can guarantee that the cases where we fail to reframe on an ancestor will reframe a child and fix up the frame tree anyway. I'd want to think through more carefully what happens when changing transforms the opposite way (adding to ancestor and removing from descendant), though.
Oh, and I should clarify that I didn't actually verify (1) through (4) in comment 9, but they were the among the observations (looking at the code) that led me to write a testcase that crashed where I expected it to, so I believe them to be correct.
Assignee: nobody → dbaron
Status: NEW → ASSIGNED
I believe this should be safe because we don't use IsAbsoluteContainer (which just returns the bit that MarkAs(Not)AbsoluteContainingBlock sets) for decisions about whether a frame *is* an absolute containing block, other than to check that the frame we've chosen actually supports absolute children. MozReview-Commit-ID: 4DYtXqNYfPq
I suppose leaving this code in is a little risky, but so far I haven't been able to think of things that will go wrong once the other fix is present. MozReview-Commit-ID: FStXA2IyPnP
David, thank you for digging into this. > assuming we can guarantee that the cases where we fail to reframe > on an ancestor will reframe a child and fix up the frame tree anyway. I was certainly assuming that. That's the only thing that makes any of the "is the child a containing block of the right sort already?" optimizations from bug 1258911 at all ok. I _think_ this really does work.. > because we don't use IsAbsoluteContainer (which just returns > the bit that MarkAs(Not)AbsoluteContainingBlock sets) for > decisions about whether a frame *is* an absolute containing block We sort of do. See nsCSSFrameConstructor::GetAbsoluteContainingBlock which will stop at the first IsAbsPosContainingBlock() that is also IsAbsoluteContainer() but will skip over things that are IsAbsPosContainingBlock but not IsAbsoluteContainer(). I guess your real claim is that there are no cases when having IsAbsoluteContainer() incorrectly test true will cause a thing that was not IsAbsPosContainingBlock() to test as one. I _think_ that's true at least for the frame constructor. But I'm not sure what nsGridContainerFrame::Grid::PlaceGridItems is doing with IsAbsoluteContainer()... I guess it might be ok as long as the absolute child list is empty? Similar for nsGridContainerFrame::ReflowChildren. In general, it worries me that we can toggle style back and forth and leave the frame tree in a different state than we started with in terms of the absolute-container bit. That seems like fragility ripe for something breaking. That said, I did walk through the IsAbsoluteContainer callers and they all _seem_ OK at first glance with this setup... It still seems to me like it might be simpler to just take out the optimization attempts in FrameHasPositionedPlaceholderDescendants altogether. I was somewhat dubious about them to start with... Another option might be to twiddle the absolute-container bit after all the other changehint processing is done. At that point we have done whatever reframing of descendants we plan to do, right? Would that also solve the problem?
Flags: needinfo?(bzbarsky) → needinfo?(dbaron)
(In reply to Boris Zbarsky [:bz] from comment #14) > But I'm not sure what > nsGridContainerFrame::Grid::PlaceGridItems / ReflowChildren is doing with > IsAbsoluteContainer()... Both of those are just optimizations. The assumption is that if IsAbsoluteContainer() is false then the frame can't have any child frames on its absolute child list. > I guess it might be ok as long as the absolute > child list is empty? Yes.
~800 crashes on Aurora/51 and 52 in the last week... We should land and uplift a fix for this. I hit it twice in the last 10 minutes on a particular site (details if needed). https://crash-stats.mozilla.com/report/index/327942ec-3081-4994-aecd-589e82161003 https://crash-stats.mozilla.com/report/index/6cb94736-8b96-4a3a-9ba3-842882161003 David? Boris?
Yes, we need to land and uplift a fix for this. I'm waiting for David to get back to me on comment 14...
Crash volume for signature 'nsBlockFrame::RemoveFrame': - nightly (version 52): 562 crashes from 2016-09-19. - aurora (version 51): 462 crashes from 2016-09-19. - beta (version 50): 1 crash from 2016-09-20. - release (version 49): 1 crash from 2016-09-05. - esr (version 45): 0 crashes from 2016-06-01. Crash volume on the last weeks (Week N is from 10-03 to 10-09): W. N-1 W. N-2 - nightly 267 295 - aurora 413 49 - beta 1 0 - release 1 0 - esr 0 0 Affected platforms: Windows, Mac OS X, Linux Crash rank on the last 7 days: Browser Content Plugin - nightly #12 #4 - aurora #13 #8 - beta #3781 - release #32503 - esr
Comment on attachment 8788694 [details] [diff] [review] Don't call MarkAsNotAbsoluteContainingBlock if frame still has absolute children New patch posted, but I can't seem to request review to you. And, also, mozreview decided it was a different patch despite my intentionally keeping the same mozreview-request-id.
Attachment #8788694 - Attachment is obsolete: true
Attachment #8788695 - Attachment is obsolete: true
Comment on attachment 8797313 [details] Bug 1299736 - Remove unsafe optimizations from FrameHasPositionedPlaceholderDescendants. https://reviewboard.mozilla.org/r/82910/#review81812 r=me
Attachment #8797313 - Flags: review+
https://hg.mozilla.org/integration/mozilla-inbound/rev/9d592efc88512b8ba5a2b38a61ae4a2f3cbd846a Bug 1299736 - Remove unsafe optimizations from FrameHasPositionedPlaceholderDescendants. r=bzbarsky
Fixing the normal screwups by the incompetent-as-always release management account bot.
Comment on attachment 8797313 [details] Bug 1299736 - Remove unsafe optimizations from FrameHasPositionedPlaceholderDescendants. Approval Request Comment [Feature/regressing bug #]: bug 1258911 [User impact if declined]: crashes [Describe test coverage new/current, TreeHerder]: crashtest in the tree [Risks and why]: low risk; removes an incorrect optimization added in bug 1258911 [String/UUID change made/needed]: no
Attachment #8797313 - Flags: approval-mozilla-aurora?
Comment on attachment 8797313 [details] Bug 1299736 - Remove unsafe optimizations from FrameHasPositionedPlaceholderDescendants. Fix a crash. Take it in 51 aurora.
Attachment #8797313 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
You need to log in before you can comment on or make changes to this bug.