Closed Bug 479655 Opened 11 years ago Closed 9 years ago

optimize setting of style attributes to avoid rerunning selector matching

Categories

(Core :: CSS Parsing and Computation, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla2.0b1
Tracking Status
blocking2.0 --- -
status2.0 --- wanted

People

(Reporter: dbaron, Assigned: bzbarsky)

References

(Blocks 3 open bugs)

Details

(Keywords: perf)

Attachments

(9 files, 2 obsolete files)

2.77 KB, patch
jst
: review+
Details | Diff | Splinter Review
5.54 KB, patch
dbaron
: review+
Details | Diff | Splinter Review
21.24 KB, patch
dbaron
: review+
Details | Diff | Splinter Review
9.16 KB, patch
dbaron
: review+
Details | Diff | Splinter Review
14.62 KB, patch
dbaron
: review+
Details | Diff | Splinter Review
3.08 KB, patch
dbaron
: review+
Details | Diff | Splinter Review
2.91 KB, patch
jst
: review+
Details | Diff | Splinter Review
30.82 KB, patch
dbaron
: review+
Details | Diff | Splinter Review
18.27 KB, patch
Details | Diff | Splinter Review
I've come up with what I think is a reasonably simple strategy that allows us to avoid re-running selector matching when authors set style attributes (or, for that matter, otherwise change style rules... I think).  I'm not sure how much this would gain us, but I wanted to file a bug on it before I forgot about it.

Part of what's tricky here is that we don't want to defeat style changes being asynchronous.

So to avoid doing that, the way we could do this is by building up our set of buffered style changes in a tree rather than (as we now do) in a list.  (We could, in theory, use bits on the frame tree itself, but that would be unnecessarily slow for frames that have large numbers of children only one of which has a style change, and would also require some unnecessary complexity to deal with UndisplayedNodes.  So I think we should have an out-of-band tree just like the ReflowTree that we used to have.)

So, the way this would work, is, when we enqueue a style change for a frame or undisplayed node (which has a "parent" frame), we would walk its parent chain up to the root frame and build this out-of-band tree back down to the node.

Each node in this out-of-band tree would have the following states (possibly more than one):
 * has a descendant in one of the states below (this could be represented as "no state")
 * has had a matched style rule change (we might need more than one state here to distinguish between style attributes and CSS rules if we want to handle more than style attributes here... I haven't thought about that case much)
 * needs selector matching rerun (nodes in this state should destroy or suppress construction of child nodes)

Then, to process a style change, we would walk this tree of nodes.

When a node simply exists because it has a descendant that needs a change, we'd just walk down to that descendant.

When a node needs selector matching rerun, we'd just call ReResolveStyleContext on it.

The tricky part is when a node has had a matched style rule change.  We'd need to build a new style context based on the new versions of the existing rules for its rule node (e.g., all rules the same except the new version of the style attribute; this is the reason I haven't thought through the non-style-attribute case).  Then, we'd need to call ReParentStyleContext on all its descendants, but simultaneously process the out-of-band tree so we didn't repeat lots of work.  So we would integrate processing of this out-of-band tree into ReParentStyleContext, or something like that.  (Then again, even without this integration, it would be an improvement over what we do now.)

So we'd win both by avoiding rerunning of selector matching in many cases where we currently rerun it, and (maybe not completely) by coalescing style changes in tree form rather than list form, which would do better ancestor/descendant coalescing.
I just did some profiling of various testcases I have that toggle .style.*, and it looks like selector matching is somewhere between 5 and 7% of total time (depending on whether I include the pseudo-probing that ReResolveStyleContext does) on those testcases.  Restyle processing in general is at about 25%.

The tree idea does make sense; I considered doing it when I was doing the async restyling initially and decided to take the simpler approach as a first cut.
Blocks: 493651
Blocks: 500410
I was thinking about this some more today, and one option for doing the "build a new style context" part of this is to have separate restyle hints per-ruleprocessor.  If we move the style attribute checking for AttributeChanged into the right ruleprocessor (nsHTMLCSSStyleSheet), then a change to inline style would mean that we could start at the old rulenode, walk up to a rulenode at a level before the changed one, then walk the rule processors from there as needed, or walk the rules in our existing rulenode parent chain as needed.  For the inline style case, just walking rule processors should be fine, since the only things coming after eStyleAttrSheet are the override and transition sheets.  But if we want to make it possible to change preshint attrs quickly, then we'd need to do something to avoid matching on the eDocSheet level.  I'm tempted to not worry about it, since it would involve building up this array of rules (or equivalent) going up the rulenode tree, and would slow down the style attr case.  People probably don't change preshints much anyway...

The proposal above would, on its own, also mean typically not having to redo the ua/preshint stuff if something changes that involves an eDocSheet level reresolve (classname/id/whatever change, say).
Oh, and the point is that right now nsReStyleHint is 2 bits.  Bug 494117 might make it 3 bits.  eSheetTypeCount is currently 8.  So stuffing one nsReStyleHint per ruleprocessor into a single 32-bit restyle hint is quite simple.  We could even set aside a few bits for things like "apply this hint to all ruleprocessors" and the like if there's a need.
I just realized that comment 2 (and bug 494117) are no good without a solution to the issue raised in comment 0 of this bug.  Specifically, for descendants processing the pending style changes while reparenting is not just an optimization: if it's not done we can end up with GetStyleData() on rulenodes that are no longer backed by sane CSSDeclarations (and which would no longer match the node if full selector matching had been done on that node); I don't think we want to go there.
Blocks: 494117
Boris and I discussed this further today.  We can:

 * store the style change data in a hashtable by content node, and set a bit on the content node to indicate that it has such data

 * when posting a style change, we can walk from the content node to the root looking for that bit.  If the bit is *not* present, we append the content node into a (strong-pointer) array of "style change roots", which maintains the invariant that it never contains a descendant of a node *following* its ancestor

 * we process the style changes in the roots array in reverse order, skipping any roots that no longer have the style change data and corresponding bit (see below).  This ensures we never process any subtree twice.

 * when processing a style change, if we hit a content node that has the bit set, we incorporate its style change hints and *remove the bit and the style change data*


We can then make further optimizations to:
 * split eReStyle_Self into three parts:
   + eReStyle_StyleAttribute
   + eReStyle_Self
   + eReStyle_Descendants
to avoid rerunning selector matching on descendants (and, for style attributes, even self).  Distinguishing the latter two in nsCSSRuleProcessor depends on bug 508466.
Blocks: 467452
I filed bug 534335 on comment 2.
Blocks: 534335
Oh, and we probably want two copies of everything described in comment 5: one for animation restyles, one for normal ones.
One open question unanswered in comment 5 is how we want to handle ~ and + combinators in this setup.  Presumably the "incorporate its style change" bit will add the later siblings to the hashtable or something... Or we have a preceding step to do it.  Or something.  Since processing restyles along the frame tree doesn't necessarily process nodes in DOM order due to XBL, the extra step might be needed.
Oh, and I guess when we walk to the root that needs to be a walk along the XBL flattened tree parent chain, not the DOM parent chain...
Blocks: 522930
Blocks: 547128
Blocks: 455690
Taking.
Assignee: nobody → bzbarsky
Depends on: 565809
So the plan in comment 5 doesn't quite handle eRestyle_LaterSiblings well.  In particular, having just that bit set tells you nothing useful about the node's descendants (as in, that node can't really be a restyle root)...  Thinking about how to make this work.
Put each of the later siblings in the list instead?
Sure; the question is when.  In testcases where kids of a node which have :nth-something selectors applied get rearranged (say sorting rows in a table) via the DOM, adding at DOM mutation time is bad.  We don't want to be adding all the kids to the list on every DOM mutation there, since that will make the list length O(N^2) in number of kids.

The other obvious option is adding right before we process restyles; that can still end up with O(N^2) behavior in the worst case if we hash the later-siblings stuff and happen to process in reverse DOM order.  But that's just a hazard of any approach that doesn't sort the nodes before marking the siblings; I think it should be ok.

Another thing I was thinking about earlier today is a hybrid approach.  Add the later-siblings stuff for "root" nodes before processing restyles and for others when we hit them in ReResolveStyleContext.  This would give us processing in the "right" order for non-roots (modulo XBL).  But this might be too complex to really make me happy, esp. if I want to handle the XBL case right too.
Blocks: 567944
OK.  So I'm about to start posting patches here.  General plan:

This bug: infrastructure with restyle tracker and changes to
          ReResolveStyleContext to merge in descendant data.
Bug 494117: Reparenting style contexts as needed instead of reresolving.
Bug 534335 (upcoming): Make it so inline style changes on a node don't involve
                       any selector matching even on that node (not doing it on
                       kids is done in bug 494117).
Attachment #447813 - Flags: superreview?(jst)
Attachment #447813 - Flags: review?(dbaron)
Attached patch Part 7. Main part of the patch (obsolete) — Splinter Review
If you want I can split this up a bit more (add the API on restyle tracker for GetStyleData first, add the frame constructor bit twiddling, then the rest).  Let me know.
Attachment #447826 - Flags: review?(dbaron)
This is effectively an alpha5 blocker because it blocks bug 567944.
blocking2.0: --- → ?
I'm going to spot-fix bug 567944.  I do think this should block beta, though.
Attachment #447812 - Flags: review?(jst) → review+
Attachment #447813 - Flags: superreview?(jst) → superreview+
Comment on attachment 447813 [details] [diff] [review]
New Element bits to track stuff

r=dbaron
Attachment #447813 - Flags: review?(dbaron) → review+
Comment on attachment 447814 [details] [diff] [review]
Part 3.  Introduce RestyleTracker

Could you put file descriptions just below the license block to show
up in MXR (for both the .h and .cpp)?

>+} // namespacs mozilla

namespace


r=dbaron with that
Attachment #447814 - Flags: review?(dbaron) → review+
Comment on attachment 447817 [details] [diff] [review]
Part 4.  Move handling of eRestyle_LaterSiblings to RestyleTracker

Should this comment be put into RestyleTracker.h? :

>-  // Note: It's the caller's responsibility to make sure to wrap a
>-  // ProcessOneRestyle call in a view update batch.
>-  // This function does not call ProcessAttachedQueue() on the binding manager.
>-  // If the caller wants that to happen synchronously, it needs to handle that
>-  // itself.

I think if you really want ProcessOneRestyle to be inlined usefully,
it should be before its caller.

>+  if (aElement->GetCurrentDoc() != mFrameConstructor->mDocument) {
>+    // Content node has been removed from our document; nothing else
>+    // to do here
>+    return;
>+  }

Maybe assert before this that mFrameConstructor->mDocument is non-null?

r=dbaron with that
Attachment #447817 - Flags: review?(dbaron) → review+
Comment on attachment 447823 [details] [diff] [review]
Part 5.  Pass the restyle tracker down through style reresolution

r=dbaron
Attachment #447823 - Flags: review?(dbaron) → review+
Comment on attachment 447825 [details] [diff] [review]
Part 6.  Tweak the AddPendingRestyle API a bit

r=dbaron.  I guess I'll see what this is for in the next patch.
Attachment #447825 - Flags: review?(dbaron) → review+
The patch for bug 564863 makes us not have enough bits for this patch series.  Trying to figure out how to deal with that now...
Attachment #450033 - Flags: review?(jst) → review+
The previous patch didn't have the masking in GetScriptTypeID (or SetScriptTypeID for that matter) right, so caused test fails of various sorts.
Attachment #450979 - Flags: review?(jst)
Attachment #450033 - Attachment is obsolete: true
Attachment #450979 - Flags: review?(jst) → review+
Comment on attachment 447826 [details] [diff] [review]
Part 7.  Main part of the patch

I think it's confusing to use RestyleTracker::Element.  It makes it seem
like a different type of element.  I'd rather either see
mozilla::dom::Element, dom::Element, or Element.

I think the inline RestyleTracker::Document method should probably be
before its uses.  Same for AddPendingRestylesForRange.

nsRestyleTracker.h:

>+  if ((aRestyleHint & eRestyle_Self) ||
>+      (aMinChangeHint & nsChangeHint_ReconstructFrame)) {

Why this particular check on aMinChangeHint (and not aMinChangeHint != 0)?

>+    for (const Element* cur = aElement; !cur->HasFlag(RootBit()); ) {
>+      nsIContent* parent = cur->GetFlattenedTreeParent();
>+      // Stop if we have no parent or the parent is not an element or
>+      // we're part of the viewport scrollbars (because those are not
>+      // frametree descendants of the primary frame of the root
>+      // element).
>+      // XXXbz maybe the primary frame of the root should be the root scrollframe?
>+      if (!parent || !parent->IsElement() ||
>+          // Check that we've hit the root via a native anonymous kid
>+          // and that this native anonymous kid is not obviously a
>+          // descendant of the root's primary frame.  If so, assume
>+          // we're under the root scrollbars.

I think this would be clearer with s/Check that/If/ and s/.  If so// .

>+          (cur->IsInNativeAnonymousSubtree() && !parent->GetParent() &&
>+           cur->GetPrimaryFrame() &&
>+           cur->GetPrimaryFrame()->GetParent() != parent->GetPrimaryFrame())) {
>+        mRestyleRoots.AppendElement(aElement);
>+        break;
>+      }
>+      cur = parent->AsElement();
>+    }
>+    aElement->SetFlags(RootBit());

You should add a comment explaining that you can set the root bit on an
element even though it's an ancestor of that element that's in
mRestyleRoots, and this can speed up future searching for a root.

>+  }

>+  // A hashtable that maps elements to RestyleData structs.  The
>+  // values only make sense if the element's current document is our
>+  // document and it has our RestyleBit() flag set.
>   PendingRestyleTable mPendingRestyles;

Could you give examples of what causes the restyle bit to get out of
sync with mPendingRestyles?  (It looks like having frame reconstruction
not directly from restyling is a possibility.)

>+  // An array that keeps track of our possible restyle roots.  This
>+  // maintains the invariant that if A and B are both restyle roots
>+  // and A is an ancestor of B then A will come after B in the array.
>+  RestyleRootArray mRestyleRoots;

It's probably worth saying that we maintain that invariant by checking
whether an element's has an ancestor with the restyle root bit set before
we append it to the array.



nsRestyleTracker.cpp:

>+  PRUint32 count;
>+  while ((count = mPendingRestyles.Count())) {

|count| is unused; you can just drop it (and should, since it's
confusable with other things).


Above the various checks that RestyleBit() is set, it might help to have
a comment that the reason to check it is that it might have already been
restyled by a restyle on one of its ancestors (generated later than the
restyle on it).

Why does RestyleTracker::GetRestyleData need to deal with the
possibility that mPendingRestyles.Get returns false?  Shouldn't the
bit-check already exclude that possibility?

>+      // Do the document check before calling GetStyleData, since we
>+      // don't want to do the sibling-processing GetStyleData does if
>+      // the node is no longer relevant.

Seems like GetRestyleData also ought to assert that this is true (even
if the assertion comment only says that the reason not to do it is that
it's a waste of time, which, if that's the case, it probably should say).
ProcessOneRestyle should probably also assert about it.

Also, s/GetStyleData/GetRestyleData/g.

>+    // At this point, the hashtable still contains entries that didn't
>+    // have RootBit() set on the element.  These come in two
>+    // varieties: those that have eRestyle_LaterSiblings set in their
>+    // restyle hint and those that don't.  Process the former first,
>+    // since they will commonly modify mRestyleRoots; we want to do
>+    // this against a "live" hashtable because that lets us optimize
>+    // away multiple siblings with eRestyle_LaterSiblings all set.
>+    // But we don't want to modify the hashtable while enumerating it,
>+    // so collect up the relevant elements first.

I find this comment a little misleading, since the two categories aren't
really distinct; a single restyle entry could fall in both categories.



I'm not a big fan of the approach you took for handling restyling of
later siblings; it seems unnecessarily complicated (esp. the code in
HandleLaterSiblings).  Why not handle eReStyle_LaterSiblings either (a)
eagerly (i.e., when enqueuing restyle) or (b) where you now do (i.e.,
after processing later-enqueued restyles that might restyle all the
siblings via an ancestor) by making an entry in the restyle table for
each sibling, but adding a bit to that entry (maybe even the
eRestyle_LaterSiblings bit) that indicates that all its siblings are
already in the table.  This means that when expanding to all siblings
you can (1) no-op if there's already a restyle for the element with that
bit set and (2) break out of the loop over siblings if you hit that bit.
Choosing option (b) means you'll avoid expanding to the full list of
later siblings in slightly more cases (i.e., when there's a
later-enqueued restyle of an ancestor), but (a) gives better coalescing
(i.e., with a later-enqueued restyle in one of the siblings or one of
its descendants) and is also simpler.  I think I probably prefer (a).

Doing this would simplify a bunch of the pieces of added code in
the restyle tracker and (a) would also eliminate the need for patch 6
and the avoiding of unsetting all flags during frame construction.

(Sorry for not seeing this earlier in response to your bugzilla comments
above.)


Again, please put the implementations of the inline methods before
any of their uses.


nsCSSFrameConstructor.cpp:

If you take (a) above, you can remove all the bits in all cases, I think.

nsFrameManager.cpp:

>+          // Nothing to do with it for now; when we don't
>+          // automatically restyle our kids we'll need to handle that
>+          // here.  We do want the GetRestyleData call, though, to
>+          // preserve the restyle tracker's invariants.

Why would we ever need to do something for an undisplayed node
beyond what we're already going to do?

Or does this comment belong on the previous hunk?  We will need more
handling there in the next patch.
Attachment #447826 - Flags: review?(dbaron) → review-
Er, actually, I guess (a) doesn't work because of the mutation cases you mentioned.  But I'd still prefer doing what I suggested with bits either at the start of restyle processing or after handling the roots.
(In reply to comment #32)
> I think it's confusing to use RestyleTracker::Element.

OK.  Changed to dom::Element.

> I think the inline RestyleTracker::Document method should probably be
> before its uses.  Same for AddPendingRestylesForRange.

Done.

> >+  if ((aRestyleHint & eRestyle_Self) ||
> >+      (aMinChangeHint & nsChangeHint_ReconstructFrame)) {
> 
> Why this particular check on aMinChangeHint (and not aMinChangeHint != 0)?

Added a comment:

  // We can only treat this element as a restyle root if we would
  // actually restyle its descendants (so either call
  // ReResolveStyleContext on it or just reframe it).

In particular, values of aMinChangeHint that don't include a reconstruct wouldn't ensure that we restyle descendants, and if we marked the element as a root we'd suppress adding its descendants as roots.

> I think this would be clearer with s/Check that/If/ and s/.  If so// .


Done.

> You should add a comment explaining that you can set the root bit on an
> element even though it's an ancestor of that element that's in
> mRestyleRoots, and this can speed up future searching for a root.

Done:

    // At this point some ancestor of aElement (possibly aElement
    // itself) is in mRestyleRoots.  Set the root bit on aElement, to
    // speed up searching for an existing root on its descendants.

> Could you give examples of what causes the restyle bit to get out of
> sync with mPendingRestyles?  (It looks like having frame reconstruction
> not directly from restyling is a possibility.)

Precisely.  In particular being removed from the DOM and then readded to it would do that.  Modified the comment to list that example.

> It's probably worth saying that we maintain that invariant by checking
> whether an element's has an ancestor with the restyle root bit set before
> we append it to the array.

Done.

> |count| is unused

Good catch.  Dropped it.  I think some early version of the patch used it for something....

> Above the various checks that RestyleBit() is set, it might help to have
> a comment that the reason to check it is that it might have already been
> restyled by a restyle on one of its ancestors (generated later than the
> restyle on it).

Or the element could have been moved in the DOM, yes.  This is in the two collector callback methods, right?

> Why does RestyleTracker::GetRestyleData need to deal with the
> possibility that mPendingRestyles.Get returns false?  Shouldn't the
> bit-check already exclude that possibility?

Hmm.  Yes, it should.  I'm not sure why I added the check.  Removed and replaced with an assertion.

> >+      // Do the document check before calling GetStyleData, since we
> >+      // don't want to do the sibling-processing GetStyleData does if
> >+      // the node is no longer relevant.
> 
> Seems like GetRestyleData also ought to assert that this is true (even
> if the assertion comment only says that the reason not to do it is that
> it's a waste of time, which, if that's the case, it probably should say).
> ProcessOneRestyle should probably also assert about it.

Yes.  Added both asserts.

> Also, s/GetStyleData/GetRestyleData/g.

Done.

> I find this comment a little misleading, since the two categories aren't
> really distinct; a single restyle entry could fall in both categories.

Reworded the comment as follows:

    // At this point, the hashtable still contains entries that didn't have
    // RootBit() set on the element but have eRestyle_LaterSiblings set in
    // their restyle hint.  Process those, since they will commonly modify
    // mRestyleRoots; we want to do this against a "live" hashtable because
    // that lets us optimize away multiple siblings with
    // eRestyle_LaterSiblings all set.  But we don't want to modify the
    // hashtable while enumerating it, so collect up the relevant elements
    // first.

> Why not handle eReStyle_LaterSiblings either (a) eagerly

As you noted, this has poor performance when mutating the DOM, per comment 13.

> (b) where you now do (i.e., after processing later-enqueued restyles that
> might restyle all the siblings via an ancestor) by making an entry in the
> restyle table for each sibling, but adding a bit to that entry (maybe
> even the eRestyle_LaterSiblings bit) that indicates that all its siblings
> are already in the table.

For what it's worth, this is what I initially implemented.  It has the following failure mode.  Say we have a bunch of siblings with the eRestyle_LaterSiblings and eRestyle_Self bits _both_ set in the hashtable (e.g. we inserted a bunch of kids under an element whose kids have applicable :nth-* selectors and back out the changes from bug 567944, which I did locally as part of this patch after landing that bug, so the following element of each kid we inserted is marked with both bits) and no ancestor in the hashtable.  We now process these in an order which is the reverse of the order they were added in; worst case this is just reverse DOM order.  Since they're all restyle roots, we'll take the last one, call GetRestyleData on it, flag all its later siblings, and add them to the restyle roots, in order.  Then we will process its eRestyle_Self hint.  Then we'll go back to the restyle root array, grab the last sibling from it, process its eRestyle_Self, hint, etc.  By the time we get back to what used to be the next-to-last restyle root, which has eRestyle_LaterSiblings on it, we've processed all those restyles we created for our later siblings, and don't know that we already restyled them.  So we have to do it again.  This isn't a hypothetical situation.  We hit it on the HTML5 spec at the moment if we restyle later siblings instead of the parent for :nth-* stuff, and making the change from (b) to what I have now made the pageload ther drop from 50s to 25s.

Now I suppose we could do option (b) plus an out-of-band data structure that keeps track of what we've already restyled or something, but that would rely on restyle processing not posting any eRestyle_LaterSiblings restyles.  I can try to do this 

> Again, please put the implementations of the inline methods before
> any of their uses.

Done.

> Why would we ever need to do something for an undisplayed node
> beyond what we're already going to do?

In bug 494117 we use the restyle data to decide whether we need to redo selector matching on the undisplayed node's element.  That is, if the parent doesn't force selector rematching, and the restyle data doesn't either, we'll skip it, but if either one does then we'll have to do it.
> For what it's worth, this is what I initially implemented.

David wasn't suggesting what I though he was suggesting.  He was suggesting that the later-siblings enumerate move earlier in the loop, so that we handle them all at once before doing root processing.
Attachment #452171 - Flags: review?(dbaron)
Attachment #447826 - Attachment is obsolete: true
Comment on attachment 452171 [details] [diff] [review]
Part 7 updated to comments

r=dbaron
Attachment #452171 - Flags: review?(dbaron) → review+
I forgot to address the comment 25 review comments in my initial push; will push a changeset to do that.

(In reply to comment #26)
> Should this comment be put into RestyleTracker.h? :

That leads to include hell.

> Maybe assert before this that mFrameConstructor->mDocument is non-null?

Added, in said additional changeset.
Gah.  And http://hg.mozilla.org/mozilla-central/rev/41bdc98c32bd to fix the comment positioning.
Depends on: 573127
Depends on: 573221
No longer depends on: 573221
Depends on: 573354
Depends on: 574238
Depends on: 575172
Blocks: 977991
You need to log in before you can comment on or make changes to this bug.