Bug 1213122 (Open) - Opened 9 years ago, updated 1 year ago

Loading diff with dozens of file changes is slow due to time spent restyling

Categories

(Core :: CSS Parsing and Computation, defect)


People

(Reporter: rnewman, Unassigned)

Attachments

(2 files, 1 obsolete file)

The slowdown seems to mostly be computing the 'dots', but this review request pretty much locks up my Firefox. Switching back to Splinter.

https://bugzilla.mozilla.org/attachment.cgi?id=8668327
We should probably reduce the page size of the diff viewer pagination. At the moment this seems to be set to 80 files per page, although the default value is 20. Steven, do you know where that value is set?
Flags: needinfo?(smacleod)
We increased the files-per-page limit because people like writing large patches and were complaining about having to sort through multiple pages.

Also, the "dots" are ReviewBoard doing XHR to retrieve the diff content. Chrome handles large reviews like https://reviewboard.mozilla.org/r/20953/ much better than Firefox. It might be worth asking a DOM expert (or similar) to see whether Review Board is doing something that could be better optimized, either in Firefox or in Review Board.

I also think glandium filed a similar issue a few months back.
Summary: MozReview perf issue → Loading diff with dozens of file changes is slow
(In reply to Mauro Doglio [:mdoglio] from comment #1)
> We should probably reduce the page size of the diff viewer pagination. At
> the moment this seems to be set to 80 files per page, although the default
> value is 20. Steven, do you know where that value is set?
This is set under the diff settings in the admin panel.

Yes, this was changed due to complaints about having paginated diffs.
Flags: needinfo?(smacleod)
A +1 for this, but from a slightly different angle.  I got flagged for a review in https://reviewboard.mozilla.org/r/32477/ and I wanted to look at the entire diff:

https://reviewboard.mozilla.org/r/32477/diff/3#index_header

to see how the xpcom change worked in the larger context of the patch.  Loading the diff is an exercise in patience, with just 58 files changed according to diffstat.  I can't imagine doing reviews on large refactoring patches that could easily touch a hundred files...
I profiled this and it's spending a bunch of time in the restyler. The profile is too big to attach; let's see how long this link works: https://cleopatra.io/#report=ef76034852019ee59d0437490aba109331c05b60
This issue is almost certainly an upstream issue and the fix will originate there.

I'm not sure when a MozReview developer will have time to look into this. But if you discover the source of the slowdown (restyling seems to be a candidate), upstream bugs can be reported at https://hellosplat.com/s/beanbag/tickets/.
CC dbaron, since he is next to me debugging a Review Board CPU explosion and I figured he would be interested in this one too :)
Summary: Loading diff with dozens of file changes is slow → Loading diff with dozens of file changes is slow due to time spent restyling
Product: Developer Services → MozReview
Cameron, do you have time to look at the restyler bits here?
Flags: needinfo?(cam)
I'm on PTO from today for two weeks so won't be able to look at it before then.
Flags: needinfo?(cam)
Cameron, do you have any time now?
Flags: needinfo?(cam)
It doesn't appear to be the number of files alone that drives this slowdown.
https://reviewboard.mozilla.org/r/44623/diff/1 touches hundreds of files and doesn't appear to trigger this issue.
Assignee: nobody → cam
Status: NEW → ASSIGNED
Flags: needinfo?(cam)
For each diff view that is loaded, a <style> element is added to the document that looks like this:

https://github.com/reviewboard/reviewboard/blob/3f6a996f2c488e70164aa0eb756a37ce0c4e9d7e/reviewboard/static/rb/js/diffviewer/views/diffReviewableView.js#L17

where "id" expands out to a string like "#file12345" (which identifies a <table> that contains the diff for a single file).  When the <style> element is added to the document initially, it is empty (which we don't need to process):

https://github.com/reviewboard/reviewboard/blob/3f6a996f2c488e70164aa0eb756a37ce0c4e9d7e/reviewboard/static/rb/js/diffviewer/views/diffReviewableView.js#L100

but when its inline style contents are updated:

https://github.com/reviewboard/reviewboard/blob/3f6a996f2c488e70164aa0eb756a37ce0c4e9d7e/reviewboard/static/rb/js/diffviewer/views/diffReviewableView.js#L521

we blow away the cascade and restyle the entire document.  We currently can't do anything smart when style sheets are added/updated like this -- we don't know what parts of the rule cascade are still valid, so we recompute it entirely, and then since we don't know which elements might have changed which rules they match, we restyle with eRestyle_Subtree from the root element.

On the diff from comment 0, we end up with 70 <style> elements inserted into the document, one after each diff gets loaded, so we spend quadratic time restyling the document.  Interestingly, if you ignore the #fileNNNNN part of the selectors, we only end up generating four unique style sheet texts, which suggests that ReviewBoard could do something smarter such as generating style sheets like:

  .unique-column-widths-1 td pre,
  .unique-column-widths-1 .revision-row th.revision-col {
    min-width: ...;
    max-width: ...;
  }

and then set the appropriate class on the #fileNNNNN element.  That should avoid all but four of the whole document restyles.
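
A minimal sketch of that approach on the ReviewBoard side, assuming a hypothetical applyColumnWidths() helper that is called once per loaded file diff (illustrative JavaScript, not actual ReviewBoard code):

  // Map from declaration text to a shared class name, so each unique set
  // of column widths gets exactly one <style> element for the whole page.
  const classForWidths = new Map();

  function applyColumnWidths(fileTable, declarations) {
    let cls = classForWidths.get(declarations);
    if (!cls) {
      cls = 'unique-column-widths-' + (classForWidths.size + 1);
      classForWidths.set(declarations, cls);
      const style = document.createElement('style');
      style.textContent =
        '.' + cls + ' td pre, ' +
        '.' + cls + ' .revision-row th.revision-col { ' + declarations + ' }';
      document.head.appendChild(style);
    }
    // Reusing an existing sheet: adding a class only restyles this
    // file's subtree, instead of inserting yet another <style> element.
    fileTable.classList.add(cls);
  }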


In terms of what we could do to make style sheet changes faster: it's going to be hard to update the cascade incrementally, so I think we'll continue to need to re-run the cascade entirely.  But we could do something to limit the elements that get restyled.

Similarly to how we now optimize restyles due to rules like:

  .a .b .c { ... }

where, if we toggle the "a" class somewhere, we no longer restyle the entire tree beneath the element with the toggled class but instead traverse down and search for elements that match .c -- we could look at the rightmost selectors in the style sheet that was just updated, and generate an eRestyle_SomeDescendants on the root element.  That would mean we effectively search the document for elements that match -- in the case of ReviewBoard's style sheets -- pre, th.revision-col or th, and re-run selector matching and restyle them.
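
A toy model of that search, written from script against the DOM rather than as the Gecko-internal traversal it would really be (the restyle step is stubbed out as a comment; the selector strings are the ReviewBoard ones from above):

  // Walk the document once, testing each element against only the
  // rightmost selectors from the newly added sheet; only matching
  // elements would get full selector matching and a restyle.
  function findRestyleCandidates(rightmostSelectors) {
    const selector = rightmostSelectors.join(', ');
    const candidates = [];
    const walker = document.createTreeWalker(document.documentElement,
                                             NodeFilter.SHOW_ELEMENT);
    for (let el = walker.currentNode; el; el = walker.nextNode()) {
      if (el.matches(selector)) {
        candidates.push(el);  // in Gecko: re-run matching, restyle self
      }
    }
    return candidates;
  }

  findRestyleCandidates(['pre', 'th.revision-col']);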

A problem with this is that we've blown away the old rule tree.  Maybe we can restyle the entire document, resolve new style contexts that point at (what will be) equivalent rules in the new rule tree, but know that they can't have updated style values and so keep around the old style contexts to avoid computing them?  Sounds tricky (although maybe less tricky regarding style struct ownership once we do bug 1188721).

Or perhaps when we add a new style sheet, we don't blow away the rule tree, but keep it around and add to it when re-running the cascade?  Maybe that would let the elements that we know can't be affected by the new rules keep their old style contexts with their old rule nodes?
I tried modifying nsStyleLinkElement::DoUpdateStyleSheet to set any inline style sheet's contents to the empty string if they begin with "#file", so as to ignore these 70 <style> elements, and the page loads somewhat more quickly -- about 30 seconds rather than 3 minutes on my machine -- and is much more responsive while loading.
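
A rough page-side approximation of the same experiment, runnable from the console without patching Gecko (it races the style flush, so it is only an approximation; the "#file" prefix check mirrors the hack above):

  // Blank any <style> whose text targets a per-file id before the next
  // style flush processes it.  MutationObserver callbacks run as
  // microtasks, so this usually, but not always, wins that race.
  new MutationObserver(() => {
    for (const style of document.querySelectorAll('style')) {
      if (style.textContent.startsWith('#file')) {
        style.textContent = '';  // like the DoUpdateStyleSheet hack
      }
    }
  }).observe(document.documentElement,
             { childList: true, subtree: true, characterData: true });
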
Cameron, thanks for hunting that down!

I can confirm (via experiments in adding style elements to https://html.spec.whatwg.org/ via the console) that Chrome generally seems to handle addition of a new <style> element that doesn't affect any elements (or affects only a small number of elements?) better than we do.

> so I think we'll continue to need to re-run the cascade entirely

I think that should be fine.  Rebuilding the cascade is fairly cheap in most cases.  I checked on the link from comment 0, and these are the numbers I see in a current nightly on my hardware:

1)  Inserting a style element and flushing layout: 1400ms.
2)  Doing the same when body is |display:none|: 2.3ms.
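
That sort of measurement can be scripted from the console, for example (a sketch; reading offsetHeight after the insertion forces the style and layout flush being timed):

  function timeStyleInsertion(cssText) {
    const style = document.createElement('style');
    style.textContent = cssText;
    const start = performance.now();
    document.head.appendChild(style);
    document.body.offsetHeight;  // forces the style + layout flush
    const elapsed = performance.now() - start;
    style.remove();
    return elapsed;
  }

  timeStyleInsertion('h1 { color: red !important; }');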

> A problem with this is that we've blown away the old rule tree.

Where do you see this happening?  I don't see us doing that anywhere obvious.  PresShell::StyleSheetAdded just ends up setting mStylesHaveChanged, which then in PresShell::EndUpdate calls nsIPresShell::ReconstructStyleDataInternal which just posts a eRestyle_Subtree at the root.  And I assume we do a ClearRuleCascades in there, but that also shouldn't blow away the old rule tree.

I think the only thing that blows away the old rule tree is nsStyleSet::BeginReconstruct, which is called from RestyleManager::StartRebuildAllStyleData, which is called, afaict, only if rem units change or if RestyleManager::RebuildAllStyleData or RestyleManager::PostRebuildAllStyleDataEvent are called.  Do we actually reach either of those on adding a sheet to the document?  I'm not seeing it obviously by code inspection, or via setting a breakpoint in nsStyleSet::BeginReconstruct and then adding a <style> element with rules in it; the breakpoint is not getting hit.
(In reply to Boris Zbarsky [:bz] from comment #15)
> Where do you see this happening?  I don't see us doing that anywhere
> obvious.

You are right.  (I saw the calls to ReconstructStyleData in e.g. PresShell::AddAuthorSheet and thought that did a reconstruct, but it's deceptively named.)

Barring better ideas, I'll try generating an eRestyle_SomeDescendants based on the rules in the sheet that was updated.  Not sure whether we should have a limit on the number of rules that we do this for.  (And probably we should make {RestyleHintData,ElementRestyler}::mSelectorsForDescendants an nsHashtable of nsCSSSelectors rather than an array, so we don't have to deal with duplicates.)
The tradeoff here is basically how long it will take to find all the relevant elements, right?  I guess that search would happen during the actual restyle, not while posting the pending restyle, so during the restyle we would be comparing everything in the document tree to all the rules in the new sheet, bypassing the normal rulehash optimizations.  Or am I misunderstanding how it would work?
Yes, that's right.  We're traversing the whole document anyway, though, so the "how long it will take to find all the relevant elements" is the time to evaluate all of the selectors in mSelectorsForDescendants against each element in the document, and we just need that time to be shorter than the time made up by not needing to restyle some elements in the document.

Worst case would be mSelectorsForDescendants has very many entries that we test against each element, and the last entry is something that matches most or all elements in the document.  Then we'll take the normal eRestyle_Subtree time plus the "wasted" time testing against the mSelectorsForDescendants entries.

One thing to note is that as soon as you have a rule like:

  * { ... }

or

  .a > * { ... }

then you've lost.


Is there information stored in the rulehash that can help us do better here?
And note that each entry in mSelectorsForDescendants is a single selector without combinators.  If it matches, then we re-run selector matching and restyle as usual, as if we had an eRestyle_Self hint on that element.
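
So only the rightmost compound of each selector is stored. A naive illustration of that split (not how Gecko parses selectors; it also mishandles attribute selectors that contain combinator characters):

  // '.a .b > .c'  ->  '.c' : everything after the last combinator.
  function rightmostCompound(selector) {
    return selector.trim().split(/\s*[>+~]\s*|\s+/).pop();
  }

  rightmostCompound('.a .b > .c');                     // '.c'
  rightmostCompound('.revision-row th.revision-col');  // 'th.revision-col'
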
Ah, if we're just storing the rightmost selector that helps a bit.

The rulehash is all about quickly optimizing out compares against selectors that can't possibly match the element, in the limit of "one element, many potentially matching selectors".

Is the worst case really when the last entry matches many elements?  Once we convert to an eRestyle_Subtree, we stop checking mSelectorsForDescendants on that subtree, no?  It seems like a worst case could be a huge number of rules that we end up having to match one by one against every single element in the tree.  Think 10,000 ID rules for nonexistent IDs, leading to 10000*(number of elements) GetID() calls; that sort of thing.
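
That worst case is easy to construct as a stress test, for example:

  // A sheet of 10,000 ID rules for IDs that exist nowhere in the
  // document: every element must be tested against each entry, with
  // nothing to show for it.
  const rules = [];
  for (let i = 0; i < 10000; i++) {
    rules.push('#nonexistent-' + i + ' { color: red; }');
  }
  const style = document.createElement('style');
  style.textContent = rules.join('\n');
  document.head.appendChild(style);
  document.body.offsetHeight;  // flush styles and layout
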
(In reply to Boris Zbarsky [:bz] from comment #20)
> Is the worst case really when the last entry matches many elements?  Once we
> convert to an eRestyle_Subtree, we stop checking mSelectorsForDescendants on
> that subtree, no?

We convert it to eRestyle_Self, not eRestyle_Subtree.

> It seems like a worst case could be a huge number of
> rules that we end up having to match one by one against every single element
> in the tree.  Think 10,000 ID rules for nonexistent IDs, leading to
> 10000*(number of elements) GetID() calls; that sort of thing.

So yes, 10,000 entries in mSelectorsForDescendants is almost the worst case, but having the final entry match is slightly worse since that will cause style contexts to be resolved.  If no entries in mSelectorsForDescendants match any elements, we won't resolve any new style contexts at all.  (We switch to a mode in RestyleManager where we call ConditionallyRestyleChildren rather than ReparentStyleContext down the tree.)

(If we converted a matching mSelectorsForDescendants into eRestyle_Subtree, and then ignored mSelectorsForDescendants for the subtree from that point onward, then I think you are right that an all non-matching mSelectorsForDescendants array could be worse.)
Another thought: instead of just testing against a single nsCSSSelector object before deciding whether to fully perform selector matching and restyle an element, we could match against the entire selector (i.e. the chain of nsCSSSelector objects), and only then decide to restyle the element if it matches.  To avoid having to duplicate the selector matching effort, we could temporarily cache the result of matching the Element against that entire selector somewhere so that once we come to it in nsCSSRuleProcessor::SelectorMatchesTree, we could return true immediately.

This would let us handle sheets with rules like:

  div > * { ... }

without having to give up and restyle the entire document.
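
A toy model of that caching idea (in Gecko the cache would live on the restyle state rather than in a long-lived map; the names here are illustrative):

  // Remember which (element, selector) pairs already matched during the
  // descendant search, so a later full match can return true immediately.
  const matchCache = new WeakMap();  // Element -> Set of selector strings

  function matchesWithCache(el, selector) {
    let cached = matchCache.get(el);
    if (cached && cached.has(selector)) {
      return true;  // already proven earlier in the restyle
    }
    if (el.matches(selector)) {  // full match, combinators and all
      if (!cached) {
        cached = new Set();
        matchCache.set(el, cached);
      }
      cached.add(selector);
      return true;
    }
    return false;
  }
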
An idea that might be too much effort:

If we are performing this optimization only when we add a sheet to the document, then we might even be able to skip selector matching and instead re-build the path in the rule tree to include the newly matched rule (a bit like what we do with eRestyle_StyleAttribute).  We would need to work out where in the tree the new rule node needs to be inserted, though, which isn't straightforward.
Feel free to move this bug to the layout component. I reckon Review Board could implement the suggested workaround in comment #13 if we don't want to wait on a fix for this performance issue. Please file a new bug to track that if you move this one.
Note that I filed https://hellosplat.com/s/beanbag/tickets/4389/ on the reviewboard side of this.
> we could match against the entire selector

That seems like it could go very slow, esp. if we don't have the bloom filter in place there.  Or do we have it?

> We would need to work out where in the tree the new rule node needs to be inserted

And which elements are affected, yes?
(In reply to Boris Zbarsky [:bz] from comment #26)
> > we could match against the entire selector
>
> That seems like it could go very slow, esp. if we don't have the bloom
> filter in place there.  Or do we have it?

We will have a TreeMatchContext available, since we'll do this selector matching in the middle of restyling.  (We would have to push ancestors inside ConditionallyRestyleContentChildren, which we don't do currently.)

> > We would need to work out where in the tree the new rule node needs to be inserted
> 
> And which elements are affected, yes?

Yes.  So we still traverse the document, but at each element we can just check its rule node's rule.


I'm going to try the eRestyle_SomeDescendants approach first, and if that's good enough, look at comment 22 as a followup.
Attached patch WIP — Splinter Review
This is the approach I'm trying.  Unfortunately, it only reduces the page loading time to around 2 minutes on my machine.  Looking at restyle logs, we still end up restyling 40% of the elements in the document (the "pre" selector matches most of those).
Attached patch WIP v2 (obsolete) — Splinter Review
Here's the comment 22 approach, where we evaluate the entire selector chain.  Compared to attachment 8743047 [details] [diff] [review], I dropped the removal of duplicate entries in mSelectorsForDescendants, which isn't as important when considering the entire selector chain.

For the comment 0 mozreview page, I now get 45s from page load until all the patch hunks finish loading.

Another test: with the single page HTML spec, if I insert the element

  <style>h1 { color: red !important; }</style>

into the document and flush styles, without the patch I get around 2100ms
and with the patch it's down to 95ms.
Component: General → CSS Parsing and Computation
Product: MozReview → Core
Version: Production → unspecified
Attached patch WIP v2.1 — Splinter Review
Boris, what do you think of this?

I don't have a good feeling for what kMaxSelectorsForRestyle should be.  It's a trade-off between selector complexity and document size, so a fixed number isn't really appropriate.  Do you think a limit is necessary?
Attachment #8743128 - Attachment is obsolete: true
Attachment #8743172 - Flags: feedback?(bzbarsky)
Comment on attachment 8743172 [details] [diff] [review]
WIP v2.1

We should probably check that the @import matches the pres context too, if it has media, before diving into it.

I also don't have a good feel for how big kMaxSelectorsForRestyle should be.  Could we try just doing some measurements with a few different values and different sheet/document sizes?  My gut feeling is that it should be closer to "10" than "1000".

We could probably extend this setup to handle StyleRuleAdded as a followup, if we really care....
Attachment #8743172 - Flags: feedback?(bzbarsky) → feedback+
All that said, I'd be really curious to find out how Blink handles this situation.
(In reply to Boris Zbarsky [:bz] (still a bit busy) from comment #33)
> All that said, I'd be really curious to find out how Blink handles this
> situation.

Just arrived here somehow and had a local Blink debug build around so felt curious...

So doing the same test as heycam (adding "h1 { color: red!important; }" to the HTML spec) seemed to recalc the whole world. I was already making conjectures about why Blink's style system was so much faster, looking at all the caches they have and similar stuff, when I found the cause of this specific difference.

They do what they call "Style invalidation analysis", that mainly marks as needing style recalc only subtrees that match a given scope.

What this means, in practice, is that when the stylesheet is added, they go through all the rules[1], and if they find only class or id rules (or a few more, but let's simplify), they only invalidate the subtree of elements matching those selectors[2], which means they'll only restyle that table in the specific Reviewboard case.

Otherwise they'll issue the same full-document restyle as Gecko.

[1]: https://cs.chromium.org/chromium/src/third_party/WebKit/Source/core/css/invalidation/StyleSheetInvalidationAnalysis.cpp?dr=C&sq=package:chromium&rcl=1479284732&l=123
[2]: https://cs.chromium.org/chromium/src/third_party/WebKit/Source/core/css/invalidation/StyleSheetInvalidationAnalysis.cpp?dr=C&sq=package:chromium&rcl=1479284732&l=187
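
A toy version of that analysis, expressed from script (the real implementations work on their internal rule representations; this sketch only scopes selectors whose leftmost compound is a bare id or class selector):

  // If every selector can be scoped to an id or class subtree, return
  // those subtrees; otherwise return null, meaning "restyle everything".
  function invalidationScopes(selectors) {
    const scopes = [];
    for (const sel of selectors) {
      const leftmost = sel.trim().split(/\s*[>+~]\s*|\s+/)[0];
      if (/^#[\w-]+$/.test(leftmost)) {
        const el = document.getElementById(leftmost.slice(1));
        if (el) scopes.push(el);
      } else if (/^\.[\w-]+$/.test(leftmost)) {
        scopes.push(...document.getElementsByClassName(leftmost.slice(1)));
      } else {
        return null;  // not analyzable: full-document restyle
      }
    }
    return scopes;
  }

  invalidationScopes(['#file12345 td pre']);  // -> just that diff table
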
Emilio, thank you!

So that's similar to the idea from comment 16, but possibly scoped to selectors for which they can very quickly find the set of matching elements (assuming that's true for class selectors in their case).  We can certainly do that for id selectors, using the id table on the document...

Cameron, any chance you'll have time to finish this up?
Flags: needinfo?(cam)
Ah, forgot I dropped this bug.  I don't have time currently to do more analysis, which we might want to do to select the kMaxSelectorsForRestyle value, but I could get the patch ready for review besides that.  I guess it would be safer to start with a smaller value that we know helps for ReviewBoard, and then we could look at fine tuning it later.

The other thing would be to consider, as you say, using the document's ID table to look up the element to post a restyle for directly, instead of doing a search through the document with eRestyle_SomeDescendants.  I think this could be a followup too; the current WIP patch will already do a much better job for rules like |#abc { ... }| if the element is deep in the document than we do at the moment.
Flags: needinfo?(cam)
Two problems I just found.

The first is that for @import rules, we shouldn't be traversing into the referenced sheet, since the referenced sheet is added separately to the document and we will have already handled inspecting its rules.  Not a problem; we should just skip doing this.  I think that child sheet should already have its media list set due to the @import rule, so we can still check that.

The second: handling multiple style sheet additions before flushing doesn't work so well, since the first style sheet addition will post SomeDescendants hints, and the second addition will clear all those out and convert them to Subtree hints due to the ClearSelectors call under ClearRuleCascades.

We clear those selectors because the style sheet update might have destroyed the nsCSSSelector objects.  So I'm wondering now whether we should be cloning those nsCSSSelectors so that the RestyleData can own them, or just make them refcounted.  The latter is probably easier, given the way we take selectors out of the RestyleData and then append them to ElementRestyler::mSelectorsForDescendants as we go down the tree.
Blocks: 1273303
Assignee: cam → nobody
Status: ASSIGNED → NEW
Severity: normal → S3