Closed
Bug 1261351
Opened 8 years ago
Closed 8 years ago
B2G debug segfault on MOZ_ASSERT(!mOldRootNode, "Should have GCed old root node"), because undisplayed map (for handling display:none) doesn't work right for children of shadow roots
Categories
(Core :: CSS Parsing and Computation, defect)
Core
CSS Parsing and Computation
Tracking
()
RESOLVED
FIXED
mozilla50
Tracking | Status | |
---|---|---|
firefox50 | --- | fixed |
People
(Reporter: gerard-majax, Unassigned)
References
Details
(Keywords: assertion, crash, Whiteboard: fixed-in-pine)
Attachments
(4 files, 2 obsolete files)
Hitting this when loading gaia on mulet debug. More to come ...
Reporter | ||
Comment 1•8 years ago
|
||
Reporter | ||
Comment 2•8 years ago
|
||
Bobby, does it looks like something you can easily find why out of the stack?
Flags: needinfo?(bobbyholley)
Reporter | ||
Comment 3•8 years ago
|
||
Removing the assert, I hit:
> Assertion failure: !geckoStyleSet || geckoStyleSet->GetRuleTree() == mSource.AsGeckoRuleNode()->RuleTree() || geckoStyleSet->IsInRuleTreeReconstruct() (destroying style context from old rule tree too late), at /home/alex/codaz/Mozilla/gecko-cinnabar/layout/style/nsStyleContext.cpp:198
At shutdown.
Reporter | ||
Comment 4•8 years ago
|
||
Updated•8 years ago
|
Comment 5•8 years ago
|
||
Both assertions indicate to me that somebody is holding a reference to an nsStyleContext such that it stays alive past a style reconstruction. I think this could mean either that some nodes in the tree aren't getting properly restyled (and thus keeping their old style context), or that somebody is holding an external reference to it. The reason there are two assertions is that there are two trees, the nsStyleContext tree and the nsRuleNode tree. The former holds strong references to the latter. So after a reconstruction, we first assert that all the nsRuleNodes have been destroyed (culminating in destroying the root, since children hold strong refs to parents). If this isn't true, this means that somebody is holding something in the nsRuleNode tree alive, most likely an nsStyleContext. The second assertion confirms this hypothesis: an nsStyleContext finally gets destroyed, and when it does we notice that it references an nsRuleNode in an out-of-date tree. And the second assertion stacktrace _also_ tells us who was holding a style context alive past the reconstruction: a mozilla::UndisplayedNode, which does indeed seem to hold a strong reference to an nsStyleContext. I don't know enough about undisplayed nodes and the invariants of reconstruction to tell exactly what the bug is here, but presumably reconstruction should traverse the hash table of undisplayed nodes.
Flags: needinfo?(bobbyholley) → needinfo?(dbaron)
Reporter | ||
Comment 6•8 years ago
|
||
Right, thanks for this explanation. Out of curiosity, is it possible there is some relationship with what got fixed recently in bug 1253788 ?
Flags: needinfo?(bobbyholley)
Comment 7•8 years ago
|
||
(In reply to Alexandre LISSY :gerard-majax from comment #6) > Right, thanks for this explanation. Out of curiosity, is it possible there > is some relationship with what got fixed recently in bug 1253788 ? This is almost certainly related to bug 1258017. But I'm not convinced it's a regression, since the new asserts may just be uncovering latent bugs.
Flags: needinfo?(bobbyholley)
Reporter | ||
Comment 8•8 years ago
|
||
Not sure if it is of any help, but tapping into AddRef/Release method, I can see the MOZ_ASSERT() gets triggered while mRefCnt is 5:
> nsRuleNode::Release(): this=0x7f2b51cd2020 mRefCnt=5
> nsStyleSet::GCRuleTrees(): This is going to be a problem, mOldRootNode=0x7f2b51cd2020
Reporter | ||
Comment 9•8 years ago
|
||
including restyle logs
Reporter | ||
Comment 10•8 years ago
|
||
I am not sure if that huge log file brings anything useful, maybe you can find the undisplayed items inside ? Anyway it's huge once uncompressed (437M)
Flags: needinfo?(bobbyholley)
Comment 11•8 years ago
|
||
The next step here is to get input from dbaron (or another style system peer) on how this is supposed to work.
Updated•8 years ago
|
Flags: needinfo?(bobbyholley)
(In reply to Bobby Holley (busy) from comment #5) > I don't know enough about undisplayed nodes and the invariants of > reconstruction to tell exactly what the bug is here, but presumably > reconstruction should traverse the hash table of undisplayed nodes. It does, via ElementRestyler::RestyleUndisplayedNodes. Presumably fuzzers should be able to find a simple testcase for this (or perhaps fuzzing-related tools could reduce gaia to a simple testcase); I don't think it's worth debugging gaia directly.
Flags: needinfo?(dbaron)
Comment 13•8 years ago
|
||
(In reply to David Baron [:dbaron] ⌚️UTC-7 from comment #12) > (In reply to Bobby Holley (busy) from comment #5) > > I don't know enough about undisplayed nodes and the invariants of > > reconstruction to tell exactly what the bug is here, but presumably > > reconstruction should traverse the hash table of undisplayed nodes. > > It does, via ElementRestyler::RestyleUndisplayedNodes. I see. > Presumably fuzzers should be able to find a simple testcase for this (or > perhaps fuzzing-related tools could reduce gaia to a simple testcase); I > don't think it's worth debugging gaia directly. Well, I assume Alexandre wants to debug this. Alexandre, the way to debug this is to figure out why we don't call ElementRestyler::RestyleUndisplayedNodes between and StyleSet::BeginReconstruct and StyleSet::EndReconstruct.
Reporter | ||
Comment 14•8 years ago
|
||
(In reply to Bobby Holley (busy) from comment #13) > (In reply to David Baron [:dbaron] ⌚️UTC-7 from comment #12) > > (In reply to Bobby Holley (busy) from comment #5) > > > I don't know enough about undisplayed nodes and the invariants of > > > reconstruction to tell exactly what the bug is here, but presumably > > > reconstruction should traverse the hash table of undisplayed nodes. > > > > It does, via ElementRestyler::RestyleUndisplayedNodes. > > I see. > > > Presumably fuzzers should be able to find a simple testcase for this (or > > perhaps fuzzing-related tools could reduce gaia to a simple testcase); I > > don't think it's worth debugging gaia directly. > > Well, I assume Alexandre wants to debug this. Yes we do, this is breaking any B2G transition debug build (mulet or device, as Alberto reported today). > > Alexandre, the way to debug this is to figure out why we don't call > ElementRestyler::RestyleUndisplayedNodes between and > StyleSet::BeginReconstruct and StyleSet::EndReconstruct. Perfect, I will investigate around this then!
Reporter | ||
Comment 15•8 years ago
|
||
I'm having a hard time finding a way to link calls to ElementRestyler::RestyleUndisplayedNodes with the MOZ_ASSERT()
Reporter | ||
Comment 16•8 years ago
|
||
Changing title since Alberto reported this also happens on device.
Summary: Mulet debug segfault on MOZ_ASSERT(!mOldRootNode, "Should have GCed old root node"); → B2G debug segfault on MOZ_ASSERT(!mOldRootNode, "Should have GCed old root node");
Comment 17•8 years ago
|
||
A tool like https://github.com/MozillaSecurity/lithium/ can help you make a reduced testcase, as long as you know which CSS/HTML files are involved.
We can help you more if we know how to reproduce this on our side. Is there an easy way to get mulet running? I downloaded and ran the firefox-*.dmg (actually firefox within FirefoxNightly.app/Contents/MacOS/) in: https://archive.mozilla.org/pub/b2g/nightly/2016/04/2016-04-05-15-02-59-mozilla-central/ and went to: chrome://b2g/content/shell.html but didn't seem to hit anything. Perhaps I'm doing it wrong?
Reporter | ||
Comment 19•8 years ago
|
||
(In reply to Jesse Ruderman from comment #17) > A tool like https://github.com/MozillaSecurity/lithium/ can help you make a > reduced testcase, as long as you know which CSS/HTML files are involved. Good to know but as of now, I still don't know which files are involved :)
Reporter | ||
Comment 20•8 years ago
|
||
(In reply to Gary Kwong [:gkw] [:nth10sd] from comment #18) > We can help you more if we know how to reproduce this on our side. Is there > an easy way to get mulet running? > > I downloaded and ran the firefox-*.dmg (actually firefox within > FirefoxNightly.app/Contents/MacOS/) in: > > https://archive.mozilla.org/pub/b2g/nightly/2016/04/2016-04-05-15-02-59- > mozilla-central/ > > and went to: > > chrome://b2g/content/shell.html > > but didn't seem to hit anything. Perhaps I'm doing it wrong? Yes, you are doing it wrong :). You need to buid a debug B2G or Mulet from pine tree https://hg.mozilla.org/projects/pine/ and a Gaia from kanikani branch. We don't have (yet) debug builds out of pine on taskcluster :(
Reporter | ||
Comment 21•8 years ago
|
||
Crash is bypassed if I remove *both* <gaia-container> elements from homescreen/index.html: - https://github.com/mozilla-b2g/gaia/blob/eb053a9a544a3cd011a9cf8d6500fb96ec75a84f/apps/homescreen/index.html#L74 - https://github.com/mozilla-b2g/gaia/blob/eb053a9a544a3cd011a9cf8d6500fb96ec75a84f/apps/homescreen/index.html#L105 So far, I could find a LOT of calls to RestyleUndisplayedNodes() between BeginReconstruct() and EndReconstruct()
Reporter | ||
Comment 22•8 years ago
|
||
(In reply to Alexandre LISSY :gerard-majax from comment #21) > Crash is bypassed if I remove *both* <gaia-container> elements from > homescreen/index.html: > - > https://github.com/mozilla-b2g/gaia/blob/ > eb053a9a544a3cd011a9cf8d6500fb96ec75a84f/apps/homescreen/index.html#L74 > - > https://github.com/mozilla-b2g/gaia/blob/ > eb053a9a544a3cd011a9cf8d6500fb96ec75a84f/apps/homescreen/index.html#L105 > > So far, I could find a LOT of calls to RestyleUndisplayedNodes() between > BeginReconstruct() and EndReconstruct() Keeping index.html unchanged EXCEPT removing <link> elements pointing to window.css *and* grid.css, crash is not happening anymore.
Reporter | ||
Comment 23•8 years ago
|
||
Hacking into gaia-container's component, I can safely assert: crash: > --- a/apps/homescreen/bower_components/gaia-container/script.js > +++ b/apps/homescreen/bower_components/gaia-container/script.js > @@ -761,9 +761,7 @@ window.GaiaContainer = (function(exports) { > > var template = document.createElement('template'); > > - template.innerHTML = > - `<style>gaia-container { position: relative; display: block; }</style>` + > - `<content select='*'></content>`; > + template.innerHTML = `<style></style><content select='*'></content>`; > > function GaiaContainerChild(element) { > this._element = element; no crash: > --- a/apps/homescreen/bower_components/gaia-container/script.js > +++ b/apps/homescreen/bower_components/gaia-container/script.js > @@ -761,9 +761,7 @@ window.GaiaContainer = (function(exports) { > > var template = document.createElement('template'); > > - template.innerHTML = > - `<style>gaia-container { position: relative; display: block; }</style>` + > - `<content select='*'></content>`; > + template.innerHTML = `<content select='*'></content>`; > > function GaiaContainerChild(element) { > this._element = element;
Comment 24•8 years ago
|
||
Ok, so this is related to html <template> elements and their undisplayed children. I'm helping Alexandre debug this in rr over IRC. Hopefully he should have it figured out soon.
Comment 25•8 years ago
|
||
So, from debugging this with Alexandre, it looks like the culprit is Shadow DOM content. In particular, we end up adding a <style> element (whose parent is a shadow root) to the undisplayed content map like so: https://pastebin.mozilla.org/8866640 https://pastebin.mozilla.org/8866641 And then we never end up traversing that with ElementRestyler, which is supposed to call RestyleUndisplayedNodes. wchen, how is this supposed to work?
Flags: needinfo?(wchen)
Comment 26•8 years ago
|
||
(The one thing I haven't been able to figure out yet is exactly which caller adds the shadow content to the list of undisplayed items, which we use at [1]. Alexandre is going to figure that out with rr) [1] https://hg.mozilla.org/mozilla-central/file/494289c72ba3/layout/base/nsCSSFrameConstructor.h#l800
Comment 27•8 years ago
|
||
(In reply to Bobby Holley (busy) from comment #26) > [1]. Alexandre is going to figure that out with rr) Ok, the backtrace is here: https://pastebin.mozilla.org/8866647 Which means that this is the code that adds the content to the list of undisplayed items: https://hg.mozilla.org/mozilla-central/file/494289c72ba3/layout/base/nsCSSFrameConstructor.cpp#l5767 So apparently the parent frame of this shadow <template> content is a colgroup frame? The plot thickens, but I don't understand shadow DOM stuff well enough to go further.
Reporter | ||
Comment 28•8 years ago
|
||
The WebComponent is https://github.com/mozilla-b2g/gaia/blob/eb053a9a544a3cd011a9cf8d6500fb96ec75a84f/apps/homescreen/bower_components/gaia-container/script.js and being used at https://github.com/mozilla-b2g/gaia/blob/eb053a9a544a3cd011a9cf8d6500fb96ec75a84f/apps/homescreen/index.html#L74 and https://github.com/mozilla-b2g/gaia/blob/eb053a9a544a3cd011a9cf8d6500fb96ec75a84f/apps/homescreen/index.html#L105
Reporter | ||
Comment 29•8 years ago
|
||
I'm not sure why we do the call to SetAsUndisplayedContent, given at least one condition of the if() seems not true. Am I missing something obvious? (rr) f 1 #1 0x00007fd3f6407835 in nsCSSFrameConstructor::AddFrameConstructionItemsInternal (this=this@entry=0x7fd3b7463d00, aState=..., aContent=aContent@entry=0x7fd3e6a13cc0, aParentFrame=aParentFrame@entry=0x7fd3cfefa788, aTag=0x7fd3e9676790, aNameSpaceID=3, aSuppressWhiteSpaceOptimizations=true, aStyleContext=0x7fd3cfefac80, aFlags=3, aAnonChildren=0x0, aItems=...) at /home/alex/codaz/Mozilla/gecko-cinnabar/layout/base/nsCSSFrameConstructor.cpp:5767 5767 SetAsUndisplayedContent(aState, aItems, aContent, styleContext, isGeneratedContent); (rr) l 5762 // Inside colgroups, suppress everything except columns. 5763 if (aParentFrame && 5764 aParentFrame->GetType() == nsGkAtoms::tableColGroupFrame && 5765 (!(bits & FCDATA_IS_TABLE_PART) || 5766 display->mDisplay != NS_STYLE_DISPLAY_TABLE_COLUMN)) { 5767 SetAsUndisplayedContent(aState, aItems, aContent, styleContext, isGeneratedContent); 5768 return; 5769 } 5770 5771 bool canHavePageBreak = (rr) print aParentFrame $45 = (nsContainerFrame *) 0x7fd3cfefa788 (rr) print aParentFrame->GetType() $46 = (nsIAtom *) 0x7fd3e9672a30 (rr) print *(aParentFrame->GetType()) $47 = {<nsISupports> = {_vptr.nsISupports = 0x7fd3f9aaf090 <vtable for PermanentAtomImpl+16>}, mLength = 18, mHash = 3713396032, mString = 0x7fd3f9f58988 <flexContainerFrame_buffer+8> u"FlexContainerFrame"} (rr) print nsGkAtoms::tableColGroupFrame $48 = (nsIAtom *) 0x7fd3e9673280 (rr) print *nsGkAtoms::tableColGroupFrame $49 = {<nsISupports> = {_vptr.nsISupports = 0x7fd3f9aaf090 <vtable for PermanentAtomImpl+16>}, mLength = 18, mHash = 416995712, mString = 0x7fd3f9f580a8 <tableColGroupFrame_buffer+8> u"TableColGroupFrame"} (rr) Are nsGkAtoms::tableColGroupFrame and nsGkAtoms::flexContainerFrame equals?
Comment 30•8 years ago
|
||
I'm also hitting this when trying to print the following on Windows with latest m-c: https://html.spec.whatwg.org/multipage/iana.html#iana
Comment 31•8 years ago
|
||
> Are nsGkAtoms::tableColGroupFrame and nsGkAtoms::flexContainerFrame equals?
No, they are not.
Reporter | ||
Comment 32•8 years ago
|
||
Right, so maybe the stuff about nsGkAtoms was bogus, I did a new rr session and I'm getting a little bit different backtrace. The nsStyleContext triggering assert at shutdown has been added to the map of undisplayed elements at https://dxr.mozilla.org/mozilla-central/rev/68c0b7d6f16ce5bb023e08050102b5f2fe4aacd8/layout/base/nsCSSFrameConstructor.cpp#5740
Attachment #8737794 -
Attachment is obsolete: true
Attachment #8737811 -
Attachment is obsolete: true
Reporter | ||
Comment 33•8 years ago
|
||
William, last time we spoke you were investigating this because it was not just a leaking issue but something more serious you said. Any news? Thanks!
Flags: needinfo?(wchen)
Assignee: nobody → khuey
I rediscovered this bug when noticing that the testcase in bug 1264380 would crash the browser when opened in e10s and then resized. After minimization, searching for a programmatic substitute for resizing the browser, and some debugging with btseng and dbaron we located the root cause.
Attachment #8741187 -
Flags: review?(dbaron)
Comment 35•8 years ago
|
||
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #34) > Created attachment 8741187 [details] [diff] [review] > Patch > > I rediscovered this bug when noticing that the testcase in bug 1264380 would > crash the browser when opened in e10s and then resized. After minimization, > searching for a programmatic substitute for resizing the browser, and some > debugging with btseng and dbaron we located the root cause. Still hits the assertion when trying to print as in comment 30 with this patch applied.
Reporter | ||
Comment 36•8 years ago
|
||
Comment on attachment 8741187 [details] [diff] [review] Patch Fixes all the crashes of my debug mulet :=)
Attachment #8741187 -
Flags: feedback+
Reporter | ||
Comment 37•8 years ago
|
||
Also works on device but does not fix bug 1261612
Comment 38•8 years ago
|
||
(In reply to Alexandre LISSY :gerard-majax from comment #36) > Comment on attachment 8741187 [details] [diff] [review] > Patch > > Fixes all the crashes of my debug mulet :=) Nice! (In reply to Alexandre LISSY :gerard-majax from comment #37) > Also works on device but does not fix bug 1261612 It's not clear to me why it would. (In reply to Bob Owen (:bobowen) from comment #35) > Still hits the assertion when trying to print as in comment 30 with this > patch applied. That's more worrisome. Kyle, is it related or something entirely different? Would still be nice to get bug 1258017 uplifted to fix the style crashes, but it's getting pretty late in the cycle...
Updated•8 years ago
|
Flags: needinfo?(khuey)
Reporter | ||
Comment 39•8 years ago
|
||
(In reply to Bobby Holley (busy) from comment #38) > (In reply to Alexandre LISSY :gerard-majax from comment #37) > > Also works on device but does not fix bug 1261612 > > It's not clear to me why it would. > Well I hoped that we could be lucky and that the white was because of this bug as the previous web component bug that got fixed was breaking all styles related to them :). But it's not the case and anyway kanru found why we have that other bug, so it's okay :)
(In reply to Bobby Holley (busy) from comment #38) > (In reply to Bob Owen (:bobowen) from comment #35) > > Still hits the assertion when trying to print as in comment 30 with this > > patch applied. > > That's more worrisome. Kyle, is it related or something entirely different? I don't see why crashes when printing would be related to the shadow DOM ... that should probably go in another bug.
Flags: needinfo?(khuey)
One other thought: maybe the assertion should be changed back to an NS_ASSERTION (like the assertion it replaced) rather than a MOZ_ASSERT so that it's safer to backport? (Isn't that why these are looking like "regressions"?)
Flags: needinfo?(bobbyholley)
Er, let's discuss that in bug 1258017.
Flags: needinfo?(bobbyholley)
Comment on attachment 8741187 [details] [diff] [review] Patch >+ MOZ_ASSERT_IF(!parent, !aContent->GetParent()); // No non-Elements! Please don't use MOZ_ASSERT_IF in layout code. (Its parameters are named backwards. If it's called MOZ_ASSERT_IF, it should assert param1 if param2, since the name has assert before if. Standard smalltalk naming convention, which is really the only sane way to name functions where each word in the function's name corresponds to a parameter of the function.) r=dbaron with that unwound into a single MOZ_ASSERT
Attachment #8741187 -
Flags: review?(dbaron) → review+
Reporter | ||
Comment 44•8 years ago
|
||
landed in this shape on pine to unblock other work: https://hg.mozilla.org/projects/pine/rev/e417ce4b94e4
Whiteboard: fixed-in-pine
Ok, this is above my pay grade. The attached patch doesn't work because we're not being consistent about always skipping the ShadowRoot whenever we use the UndisplayedMap. I tried to fix that, but ran into another problem, which is that GetParentElementCrossingShadowRoot behaves weirdly with nested ShadowRoots. Specifically, in <my-custom-element> *ShadowRoot* <div id=1> *ShadowRoot* <div id=2> Calling GetParentElementCrossingShadowRoot on div2 will return my-custom-element. Which is not what we want here. Similarly, GetFlattenedTreeParent appears to do the same thing, and also screws around with XBL. Since GetParentElementCrossingShadowRoot doesn't actually do what it says (what is really does is GetParentElementUnlessParentIsShadowRootThenGetHostElement) I don't know what to do here. I also don't know whether or not the other uses of GetParentElementCrossingShadowRoot are correct or whether they are also misbehaving in the case of nested ShadowRoots.
Assignee: khuey → nobody
Flags: needinfo?(wchen)
Flags: needinfo?(bzbarsky)
Comment 46•8 years ago
|
||
Is there a standalone testcase showing the problem around somewhere? I'm not seeing one in the bug offhand. :(
Flags: needinfo?(khuey)
There's one in the patch.
Flags: needinfo?(khuey)
Summary: B2G debug segfault on MOZ_ASSERT(!mOldRootNode, "Should have GCed old root node"); → B2G debug segfault on MOZ_ASSERT(!mOldRootNode, "Should have GCed old root node"), because undisplayed map (for handling display:none) doesn't work right for children of shadow roots
Comment 48•8 years ago
|
||
William told me this is happening because of multiple shadow roots and that feature is gone from the spec. William will likely land a patch to remove that from our current implementation in the near future.
Flags: needinfo?(wchen)
Reporter | ||
Comment 49•8 years ago
|
||
(In reply to Andrew Overholt [:overholt] from comment #48) > William told me this is happening because of multiple shadow roots and that > feature is gone from the spec. William will likely land a patch to remove > that from our current implementation in the near future. So it means we should update gaia web components to not make use of this feature anymore I guess?
Flags: needinfo?(overholt)
Comment 50•8 years ago
|
||
(In reply to Alexandre LISSY :gerard-majax from comment #49) > (In reply to Andrew Overholt [:overholt] from comment #48) > > William told me this is happening because of multiple shadow roots and that > > feature is gone from the spec. William will likely land a patch to remove > > that from our current implementation in the near future. > > So it means we should update gaia web components to not make use of this > feature anymore I guess? I believe the problem that Kyle was seeing was related to having multiple shadow roots on the same element with the use of a shadow insertion point. Those features have been removed in the latest spec. As far as I know, Gaia only uses nested shadow DOM (which is fine) and does not use multiple shadow roots on the same element or shadow insertion points.
Flags: needinfo?(overholt)
Reporter | ||
Comment 51•8 years ago
|
||
If we don't care about nested shadowroot then maybe comment 45 from kyle does not stand anymore and this is okay?
Flags: needinfo?(wchen)
Comment 52•8 years ago
|
||
Pushed by wchen@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/f31f71abd551 Look past ShadowRoots when looking for the parent for the UndisplayedMap. r=dbaron
Comment 53•8 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/f31f71abd551
Status: NEW → RESOLVED
Closed: 8 years ago
status-firefox50:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla50
Comment 54•8 years ago
|
||
The patch in bug 1274539 fixes the inconsistency with using GetParentElementCrossingShadowRoot().
Flags: needinfo?(wchen)
Flags: needinfo?(bzbarsky)
You need to log in
before you can comment on or make changes to this bug.
Description
•