Closed Bug 1342009 Opened 7 years ago Closed 7 years ago

A fast path for BuildDisplayListForChild() improves more than 20% of time.

Categories

(Core :: Web Painting, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla55
Performance Impact high
Tracking Status
firefox55 --- fixed

People

(Reporter: sinker, Assigned: sinker)

References

Details

Attachments

(2 files, 12 obsolete files)

5.69 KB, patch
Details | Diff | Splinter Review
28.26 KB, patch
sinker
: review+
Details | Diff | Splinter Review
Attached patch frame-fast-path.diff (obsolete) — Splinter Review
I found for most of time |nsFrame::BuildDisplayListForChild()| are running along the same path for most of all frames.  In the case of JavaScript item on wikipedia, it runs the same path for 90+% of frames for painting; no stacking context and no pseudo stacking context.  I have a patch to remember it on frames.  So, next time when the same frame is built it does exact what is needed to do without recomputing and checking conditions.

With my simple experiments, the JavaScript page on wikipedia, it takes about 4.3ms for building display list w/ the patch, and about 5.5ms w/o the patch.

The steps of experiments.
 - with frame-scopetime.diff
 - w/ or w/o frame-fast-path.diff
 - load the wikipedia page
 - in JS conole, run |vx = 0; window.setInterval(function(){document.body.style=vx%2 ? "padding: 10px;" : "padding: 0px"; vx = vx + 1; }, 1000)|
 - wait for 30s
 - |print gFrameScopeAcc / gFrameScopeCount| in gdb
Blocks: gdoc_pageend
Comment on attachment 8840389 [details] [diff] [review]
frame-fast-path.diff

Review of attachment 8840389 [details] [diff] [review]:
-----------------------------------------------------------------

Hi David,
I have two questions about invalidation of styles and display items.

::: layout/base/RestyleManager.cpp
@@ +1384,5 @@
>    // optmization is possible if nsStyleChangeList::AppendChange could coalesce
>    for (const nsStyleChangeData& data : aChangeList) {
>      if (data.mFrame) {
>        propTable->Set(data.mFrame, ChangeListProperty(), true);
> +      data.mFrame->DisableFastPath();

Is it reasonable to invalidate a flag relative to style on frames at here?  Is there any better solution?

::: layout/generic/nsFrame.cpp
@@ +2751,5 @@
> +  if (child->DoFastPath() &&
> +      !aBuilder->GetSelectedFramesOnly() &&
> +      !aBuilder->GetIncludeAllOutOfFlows() &&
> +      aBuilder->IsPaintingToWindow() &&
> +      aBuilder->IsBuildingLayerEventRegions()) {

I am not sure if the assumption here right.  The decision making during the previous painting is still valid here if these conditions are met without restyling in between painting.  How do you think?
Attachment #8840389 - Flags: feedback?(dbaron)
Attached patch frame-fast-path.diff, v2 (obsolete) — Splinter Review
Disable fast path for frames in |RestyleSelf()|.
Attachment #8840389 - Attachment is obsolete: true
Attachment #8840389 - Flags: feedback?(dbaron)
Comment on attachment 8841318 [details] [diff] [review]
frame-fast-path.diff, v2

So a few preliminary comments here:

(1) We don't normally add booleans to nsIFrame; we used state bits in
mState.  (At some point we might need to add another 32 bits worth,
perhaps using a bitfield instead of an integer, but we should structure
it in a way that's easily extensible.)

(2) "ValidFastPath" is much to vague/general a name to be clear what it
means in the context of frames; the naming here would need to be much
more specific.

As far as your first question goes:

Hooking in to RestyleSelf seems unlikely to be the right thing.  If
you want something that resets when a frame's style context changes, you
could put code in nsFrame::DidSetStyleContext, which will both be in a
more logical place and a better single point to catch what you need to
catch.  However, the more normal way to handle things that need to be
invalidated by style changes it to use change hint processing, either by
doing something (in RestyleManager::ProcessRestyledFrames or something
it calls) in response to existing nsChangeHint values, or by adding a
new nsChangeHint value, producing it in the appropriate structs in
nsStyleStruct.{h,cpp} (CalcDifference methods, and the helper methods
like MaxDifference that are used for assertions), and then using it in
ProcessRestyledFrames.


I don't think I'm qualified to answer the question at the end of comment
2; you should ask Matt Woodrow.
Attachment #8841318 - Flags: feedback-
Just as a head's up, the try push here:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=1b58a03d704f617d729b071fcc61799a593c0215

...has taskcluster jobs that time out at one hour, but before doing so generate in some cases 300MB+ logs that are causing backlogs in Treeherder log ingestion.

Treeherder should definitely handle this case gracefully (see bug 1343831), but in the meantime if it's possible to reduce the number of platforms you run at once when experimenting/debugging, it should reduce the impact :-)
Depends on: 1343840
No longer depends on: 1343840
(In reply to Ed Morley (Away 3rd-5th March) [:emorley] from comment #5)
> ...has taskcluster jobs that time out at one hour, but before doing so
> generate in some cases 300MB+ logs that are causing backlogs in Treeherder
> log ingestion.
Sorry for that!
I would be careful!
Attached patch frame-fast-path.diff, v3 (obsolete) — Splinter Review
This patch is adapted to what dbaron's suggestion of |DidSetStyleContext()|.
Attachment #8841318 - Attachment is obsolete: true
Attachment #8843290 - Attachment is obsolete: true
Hi Shako,
could you do a hasal test for my patch?
I need the comparison with and without the patch here for gsuite, especially for the case of 200 pages document.
Flags: needinfo?(wachen)
Flags: needinfo?(sho)
Hi Thinker,

  Sure, the test result as below:

                              MEDIAN         	AVG            	STD            	SI             	PSI  
test_firefox_gdoc_load_file   8940.0            8940.0          1775.378704     1725            1265   (Latest Nightly Build)
test_firefox_gdoc_load_file   9080.0         	9080.0         	218.4172779    	2160           	1814   (1b58a03d704f617d729b071fcc61799a593c0215)
Flags: needinfo?(sho)
I have done tests for NYTimes and gdoc with following scripts

 - NYTImes: var stop=0; var durations=[]; var timer=window.setInterval(function () { if (stop == 1 || durations.length >= 30) {window.clearInterval(timer); durations.sort(function (a,b) {return a - b;}); console.log("median: " + durations[Math.floor(durations.length / 2)]); console.log("shortest: " + durations[0]); return;} var stepi= 0; var starttm=Date.now(); (function step() { if (stepi % 2) window.scrollTo(0,window.scrollMaxY); else window.scrollTo(0,window.scrollMaxY / 2); stepi = stepi + 1; if (stepi < 10) window.requestAnimationFrame(step); else { var duration=Date.now() - starttm; durations.push(duration); console.log(duration);}})();}, 5000);

 - gdoc: var scrollframe = document.getElementsByClassName("kix-appview-editor")[0]; var stop=0; var step=0; var timer=window.setInterval(function () { if (stop == 1 || step >= 60) {window.clearInterval(timer); return;} if (step % 2) scrollframe.scrollTo(0,scrollframe.scrollHeight); else scrollframe.scrollTo(0,scrollframe.scrollHeight / 2); step=step+1;}, 500);

These script are triggered as bookmarklets.  They just do scrolling repetitively to see how much would be spent in building display list.  They are used to test the main page of NYTimes, and one of gdoc with 60 pages.

And use the patch of attachment 8840390 [details] [diff] [review] to track the time spent in building display list for painting to window.
The results are,

 - w/o patch:
   - NYTimes 3204, 3403, 3212
   - gdoc 4664, 5064, 4818, 4602, 5012
 - w   patch:
   - NYTimes 2974, 2823, 3024
   - gdoc 4307, 3925, 4176, 4243, 4104

For NYTimes, the patch improves 11%.
For gdoc, the patch improves 16%.
Assignee: nobody → tlee
Whiteboard: [qf:p1]
Thanks to Walter!
He do some test for my updates, an give me following numbers for
loading 200+pages gdoc.

pc2 72428ae: si 1577 psi 729 med 6160 std_dev 284 avg 6110
pc2 b51cd81: si 1887 psi 775 med 5970 std_dev 195 avg 5979
pc4 72428ae: si 2273 psi 1825 med 9080 std_dev x avg 9080
pc4 b51cd81: si 2120 psi 1411 med 8600 std_dev 320 avg 8600

pc2 and pc4 are two separated machines.  The changeset 72428ae is the parent of
my patch queue, b51cd81 is the top of my latest patch.

The medium value has been improved 3% and 5% for pc2 and pc4.  In consideration
of that content's Javascript is heavy during page loading, the improvement is
considerable.
Flags: needinfo?(wachen)
Matt, I found most of time BuildDisplayListForChild() is running along the same path.  This patch remember the fact that a child frame was handled by that major path, and provide a shortcut for next time.

The cached result would be clear in |nsFrame::DidSetStyleContext()| whenever the frame is restyled.
Attachment #8843875 - Attachment is obsolete: true
Attachment #8849048 - Flags: review?(matt.woodrow)
Comment on attachment 8849048 [details] [diff] [review]
Provide a shortcut in BuildDisplayListForChild, v2

Review of attachment 8849048 [details] [diff] [review]:
-----------------------------------------------------------------

I need to think about this more over the weekend.

I really like the idea (and the performance improvement), but it's really hard to verify that it's correct and we're not missing anything.

It looks like the initial place that sets pseudoStackingContext = true needs to be counted as away from the major path?

::: layout/generic/nsIFrame.h
@@ +3870,5 @@
> +  /**
> +   * True if the frame can be handled with the shortcut in
> +   * BuildDisplayListForChild().
> +   */
> +  bool mCanDoDisplayListShortcut: 1;

Use a frame state bit for this, rather than making nsIFrame bigger.
Comment on attachment 8849048 [details] [diff] [review]
Provide a shortcut in BuildDisplayListForChild, v2

Review of attachment 8849048 [details] [diff] [review]:
-----------------------------------------------------------------

::: layout/generic/nsFrame.cpp
@@ +2938,5 @@
>        // to enter to reach other out-of-flow frames that are visible.
>        dirty.SetEmpty();
>      }
>      pseudoStackingContext = true;
> +    awayMajorPath = true;

The problem is here why I don't remove this line.

@@ +3100,5 @@
>          if (eventRegions) {
>            eventRegions->AddFrame(aBuilder, child);
>          }
> +        if (!awayMajorPath &&
> +            !pseudoStackingContext &&

Since we have this line, |pseudoStackingContext| does not imply to have to set |awayMajorPath|.  Or, always to set |awayMajorPath| for |pseudoStackingContext|, and remove this line.
Status: NEW → ASSIGNED
(In reply to Thinker Li [:sinker] from comment #15)

> Since we have this line, |pseudoStackingContext| does not imply to have to
> set |awayMajorPath|.  Or, always to set |awayMajorPath| for
> |pseudoStackingContext|, and remove this line.

I feel like the second option would be easier to follow.
Comment on attachment 8849048 [details] [diff] [review]
Provide a shortcut in BuildDisplayListForChild, v2

Review of attachment 8849048 [details] [diff] [review]:
-----------------------------------------------------------------

::: layout/generic/nsFrame.cpp
@@ +984,5 @@
>    }
>  
>    RemoveStateBits(NS_FRAME_SIMPLE_EVENT_REGIONS);
> +
> +  DisableDisplayListShortcut();

How about copying the event regions shortcut style above?

Make the frame state bit NS_FRAME_SIMPLE_DISPLAY_LIST, and just call RemoveStateBits(NS_FRAME_SIMPLE_DISPLAY_LIST) here.

I'm not sure we really need the helpers in nsIFrame.h if we do that.

@@ +2043,5 @@
>  /**
>   * If the CSS 'overflow' property applies to this frame, and is not
>   * handled by constructing a dedicated nsHTML/XULScrollFrame, set up clipping
>   * for that overflow in aBuilder->ClipState() to clip all containing-block
>   * descendants.

Add 'Returns true if clipping was applied' to this comment.

@@ +2793,5 @@
> +  return true;
> +}
> +
> +#ifndef NO_DISPLAY_LIST_SHORTCUT
> +typedef bool MPFlag_t;

I think we can just use bool and not worry about this.

@@ +2836,5 @@
>    nsIFrame* child = aChild;
>    if (child->GetStateBits() & NS_FRAME_TOO_DEEP_IN_FRAME_TREE)
>      return;
>  
> +  if (child->CanDoDisplayListShortcut() &&

It would be nice if we could rearrange the code so that the fast path code is shared with the normal path.

Is it possible to have the checks/clipping etc at the start, and just skip them if CanDoDisplayListShortcut is true?

That way we run the same code, we just get there faster if we're on the shortcut.

@@ +2893,5 @@
>  
> +  // It is raised if the control flow strays off the major path.
> +  // The major path is the most common one of THE COMMON CASE
> +  // mentioned later.
> +  MPFlag_t awayMajorPath = false;

Maybe from awayFromCommonPath?

::: layout/generic/nsIFrame.h
@@ +3864,5 @@
>  
>    virtual nsresult  GetFrameName(nsAString& aResult) const = 0;
>  #endif
>  
> +#ifndef NO_DISPLAY_LIST_SHORTCUT

I don't think having these defines is worth it. It makes the code harder to read, and this should be really easy to disable at the time if we need to backout.
(In reply to Matt Woodrow (:mattwoodrow) from comment #17)
> Comment on attachment 8849048 [details] [diff] [review]
> Provide a shortcut in BuildDisplayListForChild, v2
> 
> Review of attachment 8849048 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Make the frame state bit NS_FRAME_SIMPLE_DISPLAY_LIST, and just call
> RemoveStateBits(NS_FRAME_SIMPLE_DISPLAY_LIST) here.

We have run out of bits.  So, I will add another 64bits flags here.

> > +  if (child->CanDoDisplayListShortcut() &&
> 
> It would be nice if we could rearrange the code so that the fast path code
> is shared with the normal path.
> 
> Is it possible to have the checks/clipping etc at the start, and just skip
> them if CanDoDisplayListShortcut is true?
> 
> That way we run the same code, we just get there faster if we're on the
> shortcut.

With this way, I should touch almost every single line before the COMMON CASE
to make sure no addition task having been done here.  If it doesn't annoy people,
I would like to do it.
I mean no redundant task at run time.
Fix the problems mentioned above, except using state bits.  State bits are full.  Do we add another state bits?
Attachment #8849048 - Attachment is obsolete: true
Attachment #8849048 - Flags: review?(matt.woodrow)
Attachment #8856924 - Flags: review?(matt.woodrow)
Refactor code as |BeforeBuildDisplayList()| and |BuildDisplayListCommonCase()| to make it more maintainable.

|BeforeBuildDisplayList() includes tasks that is necessary to be done before building display list for a frame.

|BuildDisplayListCommonCase()| includes tasks that is performed for COMMON CASE to build display list for a frame.
Attachment #8856928 - Flags: review?(matt.woodrow)
(In reply to Thinker Li [:sinker] from comment #20)
> Created attachment 8856924 [details] [diff] [review]
> Part 1: Provide a shortcut in BuildDisplayListForChild, v3
> 
> Fix the problems mentioned above, except using state bits.  State bits are
> full.  Do we add another state bits?

Yeah, I think we need to, as adding a bool:1 is going to round the object up another 64bits anyway.

Bug 1257732 was filed for solving this, so we probably need to push that through and then rebase this patch on top of it. Sorry that's you've ended up being blocked on this :(

Another alternative is to see if any of the existing bits are no longer necessary, but that's just putting this problem off for a bit longer.

Another another alternative is bug 1331718. If we put the FrameProperties pointer on the frame directly, then we'll remove the cost of the hashtable lookup (which should be a boost to DL/FLB!). That will likely remove the need for quite a few existing frame state bits (some of them are just so we can skip the FrameProperties hashtable search if we know it won't be found).
That might be a better use for the extra 64 bits, as it'll give us some perf wins as well as providing new state bits.

Putting FrameProperites on nsIFrame directly wasn't popular since it makes all frames bigger, even ones that don't have any FrameProperties. If we have to add another 64bits of frame state regardless, then this is no longer a concern (as we're adding 64bits regardless) and it becomes more attractive.

We'll need dbaron to sign off on the change to the sizeof(nsIFrame) regardless of which path we go down.
First patch looks good though! Will r+ once the frame state situation is sorted out.
(In reply to Thinker Li [:sinker] from comment #20)
> State bits are full.  Do we add another state bits?

Bit 57 looks free to me:

http://searchfox.org/mozilla-central/rev/2fc8c8d483d9ec9fd0ec319c6c53807f7fa8e8a2/layout/generic/nsFrameStateBits.h#274,278
(In reply to Mats Palmgren (:mats) from comment #24)
> (In reply to Thinker Li [:sinker] from comment #20)
> > State bits are full.  Do we add another state bits?
> 
> Bit 57 looks free to me:
> 
> http://searchfox.org/mozilla-central/rev/
> 2fc8c8d483d9ec9fd0ec319c6c53807f7fa8e8a2/layout/generic/nsFrameStateBits.
> h#274,278

Good spotting, thanks Mats!


Some other ideas for freeing state bits:

NS_FRAME_HAS_INVALID_RECT and NS_FRAME_HAS_LAYER_ACTIVITY_PROPERTY both could probably just use NS_FRAME_HAS_PROPERTIES.

NS_FRAME_NEEDS_PAINT could probably just mark the DisplayItemData objects for that frame as invalid, instead of marking the frame state bit.

NS_FRAME_UPDATE_LAYER_TREE is only used on display roots, doesn't need to be super performant, we could mark it with a frame property instead.

NS_FRAME_SVG_LAYOUT is only added on SVG specific frame types, shouldn't need to occupy a Generic bit.
Use state bits.
Attachment #8856924 - Attachment is obsolete: true
Attachment #8856924 - Flags: review?(matt.woodrow)
Attachment #8857399 - Flags: review?(matt.woodrow)
rebase
Attachment #8856928 - Attachment is obsolete: true
Attachment #8856928 - Flags: review?(matt.woodrow)
Attachment #8857401 - Flags: review?(matt.woodrow)
Attachment #8857399 - Flags: review?(matt.woodrow) → review+
Comment on attachment 8857401 [details] [diff] [review]
Part 2: Refactoring for COMMON CASE and BeforeBuildDi splayList, v2

Review of attachment 8857401 [details] [diff] [review]:
-----------------------------------------------------------------

Actually, I'm not convinced that this is worth it. You should just go ahead and land the first patch.
Attachment #8857401 - Flags: review?(matt.woodrow)
Attachment #8857401 - Attachment is obsolete: true
please check-in attachment #8859825 [details] [diff] [review], Thank you!
Keywords: checkin-needed
Pushed by cbook@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/af7955546902
Provide a shortcut in BuildDisplayListForChild. r=mattwoodrow
Keywords: checkin-needed
backed out for mochitest failures like https://treeherder.mozilla.org/logviewer.html#?job_id=92935570&repo=mozilla-inbound
Flags: needinfo?(tlee)
Backout by cbook@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/5b9b767015e2
Backed out changeset af7955546902 for mochitest failures
AGR status of scrollable and SVG frames would be affected by active scrolling and fragment identifiers in respective.  This patch handles these problems.
Attachment #8859825 - Attachment is obsolete: true
Flags: needinfo?(tlee)
Attachment #8862743 - Flags: review?(matt.woodrow)
Comment on attachment 8862743 [details] [diff] [review]
Provide a shortcut in BuildDisplayListForChild, v4

Review of attachment 8862743 [details] [diff] [review]:
-----------------------------------------------------------------

::: layout/painting/nsDisplayList.cpp
@@ +1534,5 @@
>    return false;
>  }
>  
> +bool
> +nsDisplayListBuilder::MaybeAnimatedGeometryRoot(const nsIFrame* aFrame)

Why do we need this to be separate from IsAnimatedGeometryRoot?

Can we combine them somehow so we don't have to keep two functions in sync?

Possibly have the function return an enum Yes/No/Maybe?
This patch is changed with the suggestion in last comment.
Attachment #8862743 - Attachment is obsolete: true
Attachment #8862743 - Flags: review?(matt.woodrow)
Attachment #8864022 - Flags: review?(matt.woodrow)
Comment on attachment 8864022 [details] [diff] [review]
Provide a shortcut in BuildDisplayListForChild, v5

Review of attachment 8864022 [details] [diff] [review]:
-----------------------------------------------------------------

::: layout/painting/nsDisplayList.cpp
@@ +1503,5 @@
>    nsIFrame* parent = nsLayoutUtils::GetCrossDocParentFrame(aFrame);
>    if (!parent)
> +    return AGR_YES;
> +
> +  bool maybe = false; // Possible to transite the state from not AGR

Possible to transition from not being an AGR to an AGR without a style change.
Attachment #8864022 - Flags: review?(matt.woodrow) → review+
Keywords: checkin-needed
Pushed by ihsiao@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/c23e896dcbb1
Provide a shortcut in BuildDisplayListForChild. r=mattwoodrow
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/c23e896dcbb1
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
Performance Impact: --- → P1
Whiteboard: [qf:p1]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: