Closed Bug 1258017 Opened 4 years ago Closed 4 years ago

nsStyleContext destructor release and potentially GCs rule node before a call to mRuleNode->IsRoot()

Categories

(Core :: CSS Parsing and Computation, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla48
Tracking Status
firefox47 --- fixed
firefox48 --- fixed

People

(Reporter: bholley, Assigned: bholley)

References

Details

Attachments

(3 files, 4 obsolete files)

This isn't a security issue, because IsRoot() isn't virtual, but potentially a correctness one. I'll write a patch for this issue (which we may want to backport) as well as some additional cleanup.
This allows us to release the rule node as our final act in the destructor, which makes
it something we can turn into a smart pointer member in future patches.
Attachment #8732457 - Flags: review?(dbaron)
Attachment #8732458 - Flags: review?(dbaron)
Comment on attachment 8732456 [details] [diff] [review]
Part 1 - Release the rule node after detaching from the parent. v1

>   }
> 
>+
>   // Free up our data structs.

No need for a double blank line here.

>-nsStyleSet::NotifyStyleContextDestroyed(nsStyleContext* aStyleContext)
>+nsStyleSet::NotifyStyleContextDestroyed(nsStyleContext* aStyleContext, bool hadParent)

hadParent -> aHadParent.  (Same for the header file.)



I'd like to see this patch get backported to branches, since it might fix some crashes we're seeing.

r=dbaron
Attachment #8732456 - Flags: review?(dbaron) → review+
Comment on attachment 8732457 [details] [diff] [review]
Part 2 - Trigger GC in RuleNodeUnused rather than NotifyStyleContextDestroyed. v1

I think you should pass a boolean to RuleNodeUnused (named with an "a" argument, please), and only do the GC from RuleNodeUnused when it's called from nsRuleNode::Release... and not when it's called from nsRuleNode::nsRuleNode.

r=dbaron with that
Attachment #8732457 - Flags: review?(dbaron) → review+
Comment on attachment 8732458 [details] [diff] [review]
Part 3 - Use a RefPtr for mRuleNode. v1

>-  nsRuleNode* const       mRuleNode;
>+  RefPtr<nsRuleNode> const       mRuleNode;

Please write this as:
  const RefPtr<nsRuleNode> mRuleNode;
i.e., with const at the start, and without the gobs of extra spaces (which were previously lining things up with the other members below)

r=dbaron with that
Attachment #8732458 - Flags: review?(dbaron) → review+
So Part 1 actually creates a GC hazard. By removing ourselves as a child of the parent, we don't get marked in a GC triggered by the NotifyStyleContextDestroyed call, which means that the eventual Release() call can operate on freed memory. This can be observed by changing kGCInterval from 300 to 10 and then running tests.

I think that this whole mark-and-sweep scheme is really error-prone, and could be lot simpler. We can use a reference count to track ownership, but maintain the deferred-finalize behavior. Boris and I talked about it, and I think I can make it work.
Depends on: 1259677
Attachment #8734813 - Flags: review?(bzbarsky)
Try push in comment 9 looked generally green modulo some opt build fail-on-warning. Pushing again to be sure:

remote:   https://treeherder.mozilla.org/#/jobs?repo=try&revision=53ccd7e1d75a
remote: 
remote: It looks like this try push has talos jobs. Compare performance against a baseline revision:
remote:   https://treeherder.mozilla.org/perf.html#/comparechooser?newProject=try&newRevision=53ccd7e1d75a
Attachment #8732456 - Attachment is obsolete: true
Attachment #8732457 - Attachment is obsolete: true
Attachment #8732458 - Attachment is obsolete: true
Blocks: stylo
Oh, and here is a commit message for the big patch that I just added:

    The basic idea here is as follows:
    * Rule nodes are reference-counted, but releasing them adds them to a linked
      list rather than freeing them. This allows for the reuse that motivated the
      original GC scheme.
    * We get rid of the marking, and instead rely on the reference count.
    * Sweeping no longer requires a complicated traversal. We just pop items
      off the free list until it's empty. When a child is destroyed, its parent
      may go onto the free list.
    * We remove special handling for the root node, and use a regular reference-counted
      edge from the style set.
    * The free list automatically asserts that it's empty (meaning all nodes have been
      freed) in its destructor, which runs when the style set is destroyed.
    * We get rid of the list of style context roots on the style set. We still need
      a count though, because of the HasCachedStyleData check.
The testcase that motivated bz to implement deferred finalization was the one in bug 501847. The metric of that testcase, presumably, is that we should avoid running a rule GC at all when loading that testcase. I confirmed that this is still the case with my patch.

The testcase that motivated dbaron to implement the original GC was bug 117316. For that one, we presumably want to avoid memory spiraling out of control, and also prevent GCs from getting significantly slower.

I verified both of those, both on the testcase (which doesn't seem to trigger GCs at all on modern Gecko), and on a few sites like gawker.com. On the web, threshold GCs never go much longer than .7 ms on my machine in either case. My patches seemed to be a modest (~10%) improvement in GC times on gawker, but hard to say precisely.

Talos results also look fine.
> The testcase that motivated bz to implement deferred finalization was the one in bug 501847.

I am guessing you mistyped the bug#?
(In reply to Boris Zbarsky [:bz] from comment #16)
> > The testcase that motivated bz to implement deferred finalization was the one in bug 501847.
> 
> I am guessing you mistyped the bug#?

I don't think so? I'm basing this off bug 514773#c0.
Oh, I see.  OK, then!
(In reply to Bobby Holley (busy) from comment #14)
>     The basic idea here is as follows:
>     * Rule nodes are reference-counted, but releasing them adds them to a linked
>       list rather than freeing them. This allows for the reuse that motivated the
>       original GC scheme.

Rule nodes can be reused when on the free list (and thus taken off the free list), right?

>     * Sweeping no longer requires a complicated traversal. We just pop items
>       off the free list until it's empty. When a child is destroyed, its parent
>       may go onto the free list.

Parent and child can still be destroyed in a single sweeping pass, right?

>     * We get rid of the list of style context roots on the style set. We still need
>       a count though, because of the HasCachedStyleData check.

Do you still retain mOldRuleTrees, and the assertions relating to it?
(In reply to David Baron [:dbaron] ⌚️UTC (less responsive until April 4) from comment #19)
> (In reply to Bobby Holley (busy) from comment #14)
> >     The basic idea here is as follows:
> >     * Rule nodes are reference-counted, but releasing them adds them to a linked
> >       list rather than freeing them. This allows for the reuse that motivated the
> >       original GC scheme.
> 
> Rule nodes can be reused when on the free list (and thus taken off the free
> list), right?

Yes. They insert/remove themselves from the free list when their reference count hits/leaves zero. They are only removed from the tree in a GC.

> 
> >     * Sweeping no longer requires a complicated traversal. We just pop items
> >       off the free list until it's empty. When a child is destroyed, its parent
> >       may go onto the free list.
> 
> Parent and child can still be destroyed in a single sweeping pass, right?

Right. Destroying a child can cause the parent to end up in the free list. We iteratively pop the free list until it's empty.

> 
> >     * We get rid of the list of style context roots on the style set. We still need
> >       a count though, because of the HasCachedStyleData check.
> 
> Do you still retain mOldRuleTrees, and the assertions relating to it?

I removed it, because with the new scheme it was only being used for the assertions, and it seemed like the extra complexity wasn't justified. Granted, I don't have a good sense of how critical those assertions are, and whether the bugs they're trying to catch are likely to pop up with the new system.
(In reply to Bobby Holley (busy) from comment #20)
> > Do you still retain mOldRuleTrees, and the assertions relating to it?
> 
> I removed it, because with the new scheme it was only being used for the
> assertions, and it seemed like the extra complexity wasn't justified.
> Granted, I don't have a good sense of how critical those assertions are, and
> whether the bugs they're trying to catch are likely to pop up with the new
> system.

What replaces it?  Just allowing the old rule trees to hang around without anything complaining?

I think those assertions are pretty important, since they tend to be good at catching bugs in the restyling code, which is quite tricky.

That said, you may well be able to retain the assertions without keeping the mOldRuleTrees list.
(In reply to David Baron [:dbaron] ⌚️UTC (less responsive until April 4) from comment #21)
> (In reply to Bobby Holley (busy) from comment #20)
> > > Do you still retain mOldRuleTrees, and the assertions relating to it?
> > 
> > I removed it, because with the new scheme it was only being used for the
> > assertions, and it seemed like the extra complexity wasn't justified.
> > Granted, I don't have a good sense of how critical those assertions are, and
> > whether the bugs they're trying to catch are likely to pop up with the new
> > system.
> 
> What replaces it?  Just allowing the old rule trees to hang around without
> anything complaining?

Right. In BeginReconstruct, we do:

> mInReconstruct = true;
...
> mRuleTree = nsRuleNode::CreateRootNode(mRuleTree->PresContext());

mRuleTree is a RefPtr, so this drops the strong ref to the old rule tree. But it doesn't get collected, since mInReconstruct blocks GCs.

In EndReconstruct, we set mInReconstruct to false, and then GC. Assuming that nobody else is holding an external reference to it or any of its descendants, it will be collected.


> I think those assertions are pretty important, since they tend to be good at
> catching bugs in the restyling code, which is quite tricky.

Can you be specific about what precisely is important to assert? Is it that every root style context is hooked up to the new rule tree at the end of reconstruction?
Flags: needinfo?(dbaron)
(In reply to Bobby Holley (busy) from comment #22)
> Can you be specific about what precisely is important to assert? Is it that
> every root style context is hooked up to the new rule tree at the end of
> reconstruction?

Every style context, not just the roots.  The bugs are places where we fail to reach some style contexts during restyling, because we're traversing the frame tree to find style contexts incorrectly.
Flags: needinfo?(dbaron)
(In reply to David Baron [:dbaron] ⌚️UTC (less responsive until April 4) from comment #23)
> (In reply to Bobby Holley (busy) from comment #22)
> > Can you be specific about what precisely is important to assert? Is it that
> > every root style context is hooked up to the new rule tree at the end of
> > reconstruction?
> 
> Every style context, not just the roots.  The bugs are places where we fail
> to reach some style contexts during restyling, because we're traversing the
> frame tree to find style contexts incorrectly.

Ok, but the old code just checks the roots, right?

https://hg.mozilla.org/mozilla-central/file/d1d47ba19ce9/layout/style/nsStyleSet.cpp#l311

I guess I could add back a debug-only nsTArray of roots, and pass |this| to nsStyleContext::RootStyleContext{Added,Removed}.
(In reply to Bobby Holley (busy) from comment #24)
> Ok, but the old code just checks the roots, right?
> 
> https://hg.mozilla.org/mozilla-central/file/d1d47ba19ce9/layout/style/
> nsStyleSet.cpp#l311

That's effectively checking all contexts, since mRuleNode is a const member, and there's also an assertion in nsStyleContext::nsStyleContext.  (I'd forgotten that that was how it did it, though.)

> I guess I could add back a debug-only nsTArray of roots, and pass |this| to
> nsStyleContext::RootStyleContext{Added,Removed}.

I think that would probably be helpful.  Thanks.
David, do you just want to steal this review, since you have this paged in way more than I do at this point?
Flags: needinfo?(dbaron)
Sure, I can, though I won't do it tonight.
Flags: needinfo?(dbaron)
Attachment #8734811 - Flags: review?(bzbarsky) → review+
Attachment #8734812 - Flags: review?(bzbarsky) → review+
Comment on attachment 8734813 [details] [diff] [review]
Part 3 - Redesign and simplify rule tree GC. v1

As far as the assertions go, I think it may be simpler to retain
mOldRuleTrees and use that for the assertions (i.e., assert they're all
destroyed) than to retain mRoots.  In fact, I think it's sufficient to
retain only a single "old rule tree", and just find a way to assert
in EndReconstruct that it has been destroyed by the GCRuleTrees call.
(One slightly ugly way would be to hold a reference to it, assert on
its reference count after the first GCRuleTrees, drop the reference,
and then just call GCRuleTrees again.)


nsRuleNode::Destroy should assert that the rule node has no children
(!HaveChildren()).


>+ * presentation of the document goes away. Its entries are reference-
>+ * counted, with strong references held by child nodes, style structs
>+ * and (for the root), the style set.

This should probably mention that they're not immediately destroyed.

>+class nsRuleNode : public mozilla::LinkedListElement<nsRuleNode> {

This is confusing because rule nodes are part of linked lists in
two different ways (the next-sibling list, and the unused list).  At
the very least, it should be clearly commented.  (I might prefer
a more traditional "nsRuleNode* mNextUnusedRuleNode" instead, since
this is otherwise more tempting to use, but I'm ok with this.)

>-  nsRuleNode* mNextSibling; // This value should be used only by the
>-                            // parent, since the parent may store
>-                            // children in a hash, which means this
>-                            // pointer is not meaningful.  Order of
>-                            // siblings is also not meaningful.
>+  nsRuleNode* mNextSibling;

You're not changing the concept that mNextSibling is sometimes
not meaningful because the children are hashed, so you should leave
this comment.

>+  // Free all the rules with reference-count zero.
>   void GCRuleTrees();

The comment should mention that it continues freeing rule nodes
until none have reference count of zero (i.e., to free parents too).

nsStyleSet::GCRuleTrees should probably have a comment explaining that
it's important that it destroys nodes whose reference count becomes
zero during GCRuleTrees, because that's what allows it to destroy
chains of ancestors up to the root.

nsStyleSet::Shutdown should probably assert that mRootStyleContextCount
is 0.

>+  // Free all the rules with reference-count zero.

This is freeing rule nodes, not rules.  And it frees the ones that
become zero during the GC too.

Putting every rule node we construct into the unused list and then
immediately removing it is a bit annoying.  I guess it's not that
horrible, though.

Other than that, I think this looks good, but I'd like to have a quick
look at the revisions.
Attachment #8734813 - Flags: review?(bzbarsky) → review-
(In reply to David Baron [:dbaron] ⌚️UTC+1 (less responsive until April 4) from comment #28)
> Comment on attachment 8734813 [details] [diff] [review]
> Part 3 - Redesign and simplify rule tree GC. v1
> 
> As far as the assertions go, I think it may be simpler to retain
> mOldRuleTrees and use that for the assertions (i.e., assert they're all
> destroyed) than to retain mRoots.  In fact, I think it's sufficient to
> retain only a single "old rule tree", and just find a way to assert
> in EndReconstruct that it has been destroyed by the GCRuleTrees call.
> (One slightly ugly way would be to hold a reference to it, assert on
> its reference count after the first GCRuleTrees, drop the reference,
> and then just call GCRuleTrees again.)

How about just keeping a weak ref to it as a member on nsStyleSet in debug builds, checking for it (and null it out when found) before every destroy call in GCRuleTrees, and asserting that it's null at the end of GCRuleTrees? That allows us to keep it all debug-only, and avoid having any side-effects on the reference count.

> This is confusing because rule nodes are part of linked lists in
> two different ways (the next-sibling list, and the unused list).  At
> the very least, it should be clearly commented.  (I might prefer
> a more traditional "nsRuleNode* mNextUnusedRuleNode" instead, since
> this is otherwise more tempting to use, but I'm ok with this.)

It needs to be a doubly-linked list for constant-time removal. And hand-written doubly-linked-list manipulation is definitely something to avoid IMO. I'll add a comment.

I probably won't get to writing the patch up until tomorrow, but wanted to check in about the assertion strategy.
(In reply to Bobby Holley (busy) from comment #29)
> How about just keeping a weak ref to it as a member on nsStyleSet in debug
> builds, checking for it (and null it out when found) before every destroy
> call in GCRuleTrees, and asserting that it's null at the end of GCRuleTrees?
> That allows us to keep it all debug-only, and avoid having any side-effects
> on the reference count.

Yes, that's fine, and probably actually better.  I was thinking that sort of approach is more expensive, but being able to be debug-only is indeed an advantage and avoids the actual problem of being more expensive.
I added the setup proposed in comment 29.

(In reply to David Baron [:dbaron] ⌚️UTC+1 (less responsive until April 4) from comment #28)
> nsRuleNode::Destroy should assert that the rule node has no children
> (!HaveChildren()).

This is already asserted in ~nsRuleNode.

> >+ * presentation of the document goes away. Its entries are reference-
> >+ * counted, with strong references held by child nodes, style structs
> >+ * and (for the root), the style set.
> 
> This should probably mention that they're not immediately destroyed.

Done.

> 
> >+class nsRuleNode : public mozilla::LinkedListElement<nsRuleNode> {
> 
> This is confusing because rule nodes are part of linked lists in
> two different ways (the next-sibling list, and the unused list).  At
> the very least, it should be clearly commented.  (I might prefer
> a more traditional "nsRuleNode* mNextUnusedRuleNode" instead, since
> this is otherwise more tempting to use, but I'm ok with this.)

Commented.

> 
> >-  nsRuleNode* mNextSibling; // This value should be used only by the
> >-                            // parent, since the parent may store
> >-                            // children in a hash, which means this
> >-                            // pointer is not meaningful.  Order of
> >-                            // siblings is also not meaningful.
> >+  nsRuleNode* mNextSibling;
> 
> You're not changing the concept that mNextSibling is sometimes
> not meaningful because the children are hashed, so you should leave
> this comment.

Fixed.
> 
> >+  // Free all the rules with reference-count zero.
> >   void GCRuleTrees();
> 
> The comment should mention that it continues freeing rule nodes
> until none have reference count of zero (i.e., to free parents too).

Done.
 
> nsStyleSet::GCRuleTrees should probably have a comment explaining that
> it's important that it destroys nodes whose reference count becomes
> zero during GCRuleTrees, because that's what allows it to destroy
> chains of ancestors up to the root.

This is the same as the above, right?

> 
> nsStyleSet::Shutdown should probably assert that mRootStyleContextCount
> is 0.

Done.

> 
> >+  // Free all the rules with reference-count zero.
> 
> This is freeing rule nodes, not rules.  And it frees the ones that
> become zero during the GC too.

Same here?

> 
> Putting every rule node we construct into the unused list and then
> immediately removing it is a bit annoying.  I guess it's not that
> horrible, though.
> 
> Other than that, I think this looks good, but I'd like to have a quick
> look at the revisions.
Attachment #8735344 - Flags: review?(dbaron)
Attachment #8734813 - Attachment is obsolete: true
Comment on attachment 8735344 [details] [diff] [review]
Redesign and simplify rule tree GC. v2

>   nsRuleNode* mNextSibling; // This value should be used only by the
>                             // parent, since the parent may store
>                             // children in a hash, which means this
>                             // pointer is not meaningful.  Order of
>                             // siblings is also not meaningful.
>-
>   struct Key {
>     nsIStyleRule* mRule;
>     mozilla::SheetType mLevel;
>     bool mIsImportantRule;

Please leave the blank line, too.


>+  // Free all the rules with reference-count zero. This continues iterating over
>+  // the free list until it is empty, which allows immediate collection of nodes
>+  // whose reference-count drops to zero during the destruction of a child node.
>+  // This allows the collection of entire trees at once, since children hold their
>+  // parents alive.

Please wrap comment at something less than 80 characters.


Thanks for fixing this; r=dbaron.
Attachment #8735344 - Flags: review?(dbaron) → review+
Thanks for the fast reviews!
I'm inclined to think we should just backport this whole thing to aurora, since there are a bunch of crashes that we see in crash-stats that this probably fixes.  What do you think?
Flags: needinfo?(bobbyholley)
(In reply to David Baron [:dbaron] ⌚️UTC-7 from comment #38)
> I'm inclined to think we should just backport this whole thing to aurora,
> since there are a bunch of crashes that we see in crash-stats that this
> probably fixes.  What do you think?

That seems sensible, though do you think we should wait for Alexandre to finish debugging bug 1261351 first? I gave him some guidance today, and he's going to take a crack at it tomorrow in rr.
Flags: needinfo?(dbaron)
So I think the "regressions" are because what was an NS_ASSERTION in the old code is now a fatal MOZ_ASSERT in the new code, so people are noticing it more.

Maybe you can backport now, but with the relevant MOZ_ASSERT (or MOZ_ASSERTs) changed back to NS_ASSERTION?
Flags: needinfo?(dbaron)
Comment on attachment 8735344 [details] [diff] [review]
Redesign and simplify rule tree GC. v2

Uplift applies to all the patches in this bug.

Approval Request Comment
[Feature/regressing bug #]: Longstanding
[User impact if declined]: We believe that the simplifications introduced in this patch are likely to fix various crashes in the style system we see on crash-stats. See comment 5. 
[Describe test coverage new/current, TreeHerder]: No explicit test coverage, but this is core style system code, for which there is a ton of test coverage.
[Risks and why]: It's a large change, but it's been baking on m-c for several weeks. There were some "regressions", but it turns out they were just the result of my patches having turned an NS_ASSERTION into a MOZ_CRASH, which made certain pre-existing bugs more noticeable. We can backport with an NS_ASSERTION.
[String/UUID change made/needed]: None
Flags: needinfo?(bobbyholley)
Attachment #8735344 - Flags: approval-mozilla-aurora?
Comment on attachment 8735344 [details] [diff] [review]
Redesign and simplify rule tree GC. v2

Crash fixes, has baked on Nightly for a bit, Aurora47+
Attachment #8735344 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
I will do the backport here.
This broke Aurora on the uplift. My Try uplift simulations caught this same bustage as well.
https://treeherder.mozilla.org/logviewer.html#?job_id=2397811&repo=mozilla-aurora
Flags: needinfo?(cbook)
Flags: needinfo?(bobbyholley)
Flags: needinfo?(wkocher)
bholley can you fix this quickly, or do we need to backout ?
Flags: needinfo?(cbook) → needinfo?(rkothari)
I will look at this later today, out why this is aurora-only, and decide what to do here.
Oh. Didn't catch this was brought up already before I backed it out in https://hg.mozilla.org/releases/mozilla-aurora/rev/789ad2053247

Feel free to reland a fixed version.
Flags: needinfo?(wkocher)


(In reply to Wes Kocher (:KWierso) from comment #49)
> This also appears to have caused a different failure in
> https://treeherder.mozilla.org/logviewer.html#?job_id=2397745&repo=mozilla-
> aurora

I annotated this reftest to expect 1-2 assertions.

(In reply to Wes Kocher (:KWierso) from comment #48)
> Oh. Didn't catch this was brought up already before I backed it out in
> https://hg.mozilla.org/releases/mozilla-aurora/rev/789ad2053247
> 
> Feel free to reland a fixed version.

This is just a case where I didn't fully neuter the impact of the newly-introduced fatal assertions. This should do the trick, though I can't verify locally since it's android-only.

remote:   https://hg.mozilla.org/releases/mozilla-aurora/rev/59d900f70453
remote:   https://hg.mozilla.org/releases/mozilla-aurora/rev/27a056f5a548
remote:   https://hg.mozilla.org/releases/mozilla-aurora/rev/b2cc32b7bbe6
Flags: needinfo?(bobbyholley)
Please make sure your follow-up changes land on trunk as well so that this doesn't re-break on Aurora next week.
Flags: needinfo?(bobbyholley)
Flags: needinfo?(bobbyholley)
You need to log in before you can comment on or make changes to this bug.