Closed
Bug 1351313
Opened 8 years ago
Closed 7 years ago
Reuse nsDisplayText for frames
Categories
(Core :: Web Painting, enhancement)
Core
Web Painting
Tracking
()
People
(Reporter: sinker, Assigned: u459114)
References
(Depends on 1 open bug)
Details
Attachments
(1 file, 3 obsolete files)
59 bytes,
text/x-review-board-request
|
Details |
|nsDisplayText| is usually the most popular type of display items. It would improve overall time of building display list to reuse the instances of |nsDislayText| created for frames.
The idea here is to set the instances of |nsDisplayText| as one of properties of the associated frame. So, next time, it is reused to avoid memory allocation, initialization and a lot of checks.
The problem is |nsDisplayItem| has some member variables; related to clips, AGR, and ASR, that would be destroyed every time after use. We should/can keep it as a part of frames too, to avoid recompute them every time.
Reporter | ||
Comment 1•8 years ago
|
||
clips, AGR and ASR can be kept in a struct with a global shared arena. Since they are fixed size, a free list can be used for reusing memory once they are destroyed.
Reporter | ||
Comment 2•8 years ago
|
||
For |TryMerge()|, a flag is necessary to avoid doing it more than one time.
Reporter | ||
Updated•8 years ago
|
Whiteboard: [qf]
Reporter | ||
Comment 3•8 years ago
|
||
For some web-sites like nytimes, BBC, ..., nsDisplayText is counted up to 50+% of all items.
Updated•8 years ago
|
Whiteboard: [qf] → [qf:investigate:p1]
Have matt, I know you are working on retaining display list. From your point of view, is it worthy to work on text specific optimization?
Flags: needinfo?(matt.woodrow)
Comment 5•8 years ago
|
||
Yeah, I think this is still worth doing.
For one, it's a smaller project, so we can likely land it earlier and get wins.
The other is that retained display lists have to rebuild all items that intersect a changed item, so that it can figure out how to sort the items correctly.
Often this will mean we'll rebuild items that haven't actually changed, so making building of items faster is still valuable.
Flags: needinfo?(matt.woodrow)
1.
nsDisplayImte::mClipChain/mClip/mActiveScrolledRoot are all created and hosted in nsDisplayListBuilder.
To persist nsDisplayText, we need to move these data out of nsDisplayListBuilder. Name it as nsBullderStorage? And carry this object in nsDocument?
2.
How to know a cached nsDisplayText is corrputed? Thinker's suggestion is just flush out all cached items when style change(SetStyleContext??)
Most operations used in nsDisplayText's constructor are O(1)
* mBounds = mFrame->GetVisualOverflowRectRelativeToSelf() + ToReferenceFrame();
O(1)
* mClipChain(aBuilder->ClipState().GetCurrentCombinedClipChain(aBuilder))
Can be O(N*M), where N is the lenth of mClipChainContentDescendants/ M is the length of mClipChainContainingBlockDescendants. But most of time, it O(1) after bug 1298218 landed.
* mClip(DisplayItemClipChain::ClipForASR(mClipChain, aActiveScrolledRoot))
Conceptually, it's O(N), but most of time, it is O(1).
* mReferenceFrame = aBuilder->FindReferenceFrameFor(aFrame, &mToReferenceFrame);
O(N), but unfortunately, mToReferenceFrame can not be persisted. We need to re-compute this value everytime.
* mAnimatedGeometryRoot = aBuilder->FindAnimatedGeometryRootFor(aFrame);
Can be O(N), but most of time, it's O(1) for nsDisplayText.
I am not sure whether reuse nsDisplayText is still a good idea now.
But I do find some place that we can fine tune in text painting code path while profiling. Woking on it right now.
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Reporter | ||
Comment 11•7 years ago
|
||
This is about time complexity, it is more about absolute overhead in time.
So, it is about to reduce a big part of overhead of rendering even its time complexity will be not improved.
IIRC, bug 1351881 intents to fix problems including FindReferenceFrameFor().
By invalidate items properly, items are built only necessary.
It means for most of time, we don't need to call FindReferenceFrameFor() for each item.
According previous profiling results, builddisplaylist costs roughly about 30% (I don't remember its exactly number).
And, text items are about 50% for some situations, for NYC, IIRC, it is about 30%~40%.
So, even it improves only 10% in average, it ends to 2% of improvement overall.
Think it is very worth if it is not too complicated and there is no other low hand fruit.
Reporter | ||
Comment 12•7 years ago
|
||
sorry, it is bug 1351811.
Comment hidden (mozreview-request) |
Attachment #8884556 -
Attachment is obsolete: true
Attachment #8884557 -
Attachment is obsolete: true
Attachment #8884558 -
Attachment is obsolete: true
Assignee | ||
Comment 14•7 years ago
|
||
(In reply to Thinker Li [:sinker] from comment #11)
> This is about time complexity, it is more about absolute overhead in time.
> So, it is about to reduce a big part of overhead of rendering even its time
> complexity will be not improved.
>
> IIRC, bug 1351881 intents to fix problems including FindReferenceFrameFor().
> By invalidate items properly, items are built only necessary.
> It means for most of time, we don't need to call FindReferenceFrameFor() for
> each item.
When nsDisplayText uses nsDisplayListBuilder::FindReferenceFrameFor, mostly, it eary returns at
https://hg.mozilla.org/mozilla-central/file/93af9545a6f4/layout/painting/nsDisplayList.cpp#l1440
So, I did not even notice time used by FindReferenceFrameFor while profiling nytime or bbc.
At the time I profiled, nsDisplayListBuilder::GetWidgetLayerManager occupied almost all executing time of nsDisplayText::nsDisplayText
>> And, text items are about 50% for some situations, for NYC, IIRC, it is about 30%~40%.
>> So, even it improves only 10% in average, it ends to 2% of improvement overall.
I can not find this situation on current nightly. The way I profile bbc/nytimes is
1. Load a news page.
2. Wait until loading finish.
3. Scroll up and down
4. Start profiler and scroll up/down several times.
5. Capture.
Take bbc as an example, nsDisplayText::nsDisplayText takes abound 0.1% of running time.
I notice that you said "some situations" it may to to 30~40%, may I know the scenario that you did profiling?
Assignee | ||
Comment 15•7 years ago
|
||
I tried to cache both reference frame and AGR frame, so that we do not to search them in nsDisplayItem::nsDisplayItem, but can not see much perf improvement.
Assignee | ||
Comment 16•7 years ago
|
||
Base on profiling result on current nightly, I do not see a reason that we should keep working on this.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → WONTFIX
Comment 17•7 years ago
|
||
(In reply to C.J. Ku[:cjku](UTC+8) from comment #16)
> Base on profiling result on current nightly, I do not see a reason that we
> should keep working on this.
I'm surprised to hear this. Can you share the tests & profile that led to this conclusion? I would have expected a significant speed improvement on pages like https://html.spec.whatwg.org/
Flags: needinfo?(cku)
Assignee | ||
Comment 18•7 years ago
|
||
(In reply to Jet Villegas (:jet) from comment #17)
> (In reply to C.J. Ku[:cjku](UTC+8) from comment #16)
> > Base on profiling result on current nightly, I do not see a reason that we
> > should keep working on this.
>
> I'm surprised to hear this. Can you share the tests & profile that led to
> this conclusion? I would have expected a significant speed improvement on
> pages like https://html.spec.whatwg.org/
Profile data of https://html.spec.whatwg.org/
https://perfht.ml/2vCOvWd
And the profile data of bbc
https://perfht.ml/2vD9Lez
In both cases, try to look throuth any time slice of "DisplayList", either you can not find any time spent on nsDisplayItem::nsDisplayItem or time it take is real minor.
Several reasons that I decided to drop this bug:
1. Since heap allocation in displaylistbuilder is very efficient, and most of functions been called in nsDisplayItem ctor are O(1).
2. The most important, profile data tells me that we can not gain much by doing this optimization.
Flags: needinfo?(cku)
Assignee | ||
Comment 19•7 years ago
|
||
Take FindReferenceFrameFor as an example, when thinker did profile long time ago, this function is in his radar. But, on Nightly, it's already difficult to find out this function appear in a profile data.
Reporter | ||
Comment 20•7 years ago
|
||
Hi CJ,
I don't know what is happening, maybe Gecko changes so much out of my imagination. One thing I have noticed is
that your profile data includes only painting, no any samples are about building display items. I can not believe
that building display items is so fast that we don't see any sample in the profile data. In the other word, samples
in |nsLayoutUtils::PaintFrames()| comprising only |nsDisplayList::PaintRoot()| is too weird for me.
Comment 21•7 years ago
|
||
(In reply to Thinker Li [:sinker] from comment #20)
> Hi CJ,
>
> I don't know what is happening, maybe Gecko changes so much out of my
> imagination. One thing I have noticed is
> that your profile data includes only painting, no any samples are about
> building display items. I can not believe
> that building display items is so fast that we don't see any sample in the
> profile data. In the other word, samples
> in |nsLayoutUtils::PaintFrames()| comprising only
> |nsDisplayList::PaintRoot()| is too weird for me.
For this profile https://perfht.ml/2vCOvWd
BuildDisplayList is hidden by the filter. If I filter by nsDisplayText::nsDisplayText the total time is about 5ms.
nsDisplayText::Paint is about 254ms.
Assignee | ||
Comment 22•7 years ago
|
||
(In reply to Kan-Ru Chen [:kanru] (UTC+8) from comment #21)
> For this profile https://perfht.ml/2vCOvWd
>
> BuildDisplayList is hidden by the filter. If I filter by
> nsDisplayText::nsDisplayText the total time is about 5ms.
>
> nsDisplayText::Paint is about 254ms.
In full range, what I saw is nsDisplayText::nsDisplayText consumed 2ms of 23310ms.
nsDisplayText::Paint is not relative to this bug. What we want to save is the construction time of nsDisplayText.
Assignee | ||
Comment 23•7 years ago
|
||
(In reply to Thinker Li [:sinker] from comment #20)
> Hi CJ,
>
> I don't know what is happening, maybe Gecko changes so much out of my
> imagination. One thing I have noticed is
> that your profile data includes only painting, no any samples are about
> building display items. I can not believe
> that building display items is so fast that we don't see any sample in the
> profile data. In the other word, samples
> in |nsLayoutUtils::PaintFrames()| comprising only
> |nsDisplayList::PaintRoot()| is too weird for me.
Oh, please go back to full range. You can still see BuildDisplayList things.
Reporter | ||
Comment 24•7 years ago
|
||
In the profile data, |nsDisplayText::nsDisplayText| has taken 5ms.
Meanwhile, |nsIFrame::BuildDisplayListForStackingContext| has taken 383ms.
It is about 1.3%. It is a long tail improvement, collect several one percent improvement to make a 10% improvement.
Another point is that this bug is a part of a more big plan.
A big part of changes should be reusable for the project of display list cache working by Matt.
Reporter | ||
Comment 25•7 years ago
|
||
(In reply to C.J. Ku[:cjku](UTC+8) from comment #22)
> In full range, what I saw is nsDisplayText::nsDisplayText consumed 2ms of
> 23310ms.
> nsDisplayText::Paint is not relative to this bug. What we want to save is
> the construction time of nsDisplayText.
Apparently, something is wrong here. Kanru and I get 5ms with the scope of 1.8sec.
But, you get only 2ms for the full range.
Assignee | ||
Comment 26•7 years ago
|
||
(In reply to Thinker Li [:sinker] from comment #25)
> (In reply to C.J. Ku[:cjku](UTC+8) from comment #22)
> > In full range, what I saw is nsDisplayText::nsDisplayText consumed 2ms of
> > 23310ms.
> > nsDisplayText::Paint is not relative to this bug. What we want to save is
> > the construction time of nsDisplayText.
>
> Apparently, something is wrong here. Kanru and I get 5ms with the scope of
> 1.8sec.
> But, you get only 2ms for the full range.
16 ms for full range. Ok, 5ms for 1.8 seconds range I see
Reporter | ||
Comment 27•7 years ago
|
||
For full range, |nsIFrame::BuildDisplayListForStackingContext| where Gecko build display lists is 2635ms.
It takes 0.6% of 2635ms. (The number 23310ms includes a lot of computation that we don't like to include.)
However, please consider this bug would benefit items of other types because most of changes are about nsDisplayItem
, not nsDisplayText itself.
Reporter | ||
Comment 28•7 years ago
|
||
And, some gains are out of the scope of |nsDisplayText::nsDisplayText|.
For example, computation for AGR, ASR, reference frame ...etc.
Assignee | ||
Comment 29•7 years ago
|
||
In 1.8 seconds range: 5ms/ 1833 ms = 0.2% improvement.
Even we cache nsDisplayText, we still need to compute scroll position before painting, so we can not save 5ms totally. For most web pages, it improve even less then 0.2%. On bbc/cnn/yahoo news, there is no sample data of nsDisplayText::nsDisplayText. That why I close this bug. But if we get stronger evidence that we should still do this optimization, it's welcome to reopen it.
For computation of AGR/ASR and reference frame, like I said, they are mostly O(1).
Take nsDisplayListBuilder::FindReferenceFrameFor as an example, although it's an O(N) function, but most of time, we early return at:
http://searchfox.org/mozilla-central/source/layout/painting/nsDisplayList.cpp#1448
same for AGR and ASR searching functions. That's why I can not see noticeable improvement after persist nsDisplayText.
Updated•3 years ago
|
Performance Impact: --- → P1
Whiteboard: [qf:investigate:p1]
You need to log in
before you can comment on or make changes to this bug.
Description
•