Closed Bug 1351313 Opened 8 years ago Closed 7 years ago

Reuse nsDisplayText for frames

Categories

(Core :: Web Painting, enhancement)

enhancement
Not set
normal

Tracking

()

RESOLVED WONTFIX
Performance Impact high
Tracking Status
firefox55 --- affected

People

(Reporter: sinker, Assigned: u459114)

References

(Depends on 1 open bug)

Details

Attachments

(1 file, 3 obsolete files)

|nsDisplayText| is usually the most popular type of display items. It would improve overall time of building display list to reuse the instances of |nsDislayText| created for frames. The idea here is to set the instances of |nsDisplayText| as one of properties of the associated frame. So, next time, it is reused to avoid memory allocation, initialization and a lot of checks. The problem is |nsDisplayItem| has some member variables; related to clips, AGR, and ASR, that would be destroyed every time after use. We should/can keep it as a part of frames too, to avoid recompute them every time.
clips, AGR and ASR can be kept in a struct with a global shared arena. Since they are fixed size, a free list can be used for reusing memory once they are destroyed.
For |TryMerge()|, a flag is necessary to avoid doing it more than one time.
Depends on: 1351811
Whiteboard: [qf]
For some web-sites like nytimes, BBC, ..., nsDisplayText is counted up to 50+% of all items.
Whiteboard: [qf] → [qf:investigate:p1]
Assignee: nobody → cku
Have matt, I know you are working on retaining display list. From your point of view, is it worthy to work on text specific optimization?
Flags: needinfo?(matt.woodrow)
Yeah, I think this is still worth doing. For one, it's a smaller project, so we can likely land it earlier and get wins. The other is that retained display lists have to rebuild all items that intersect a changed item, so that it can figure out how to sort the items correctly. Often this will mean we'll rebuild items that haven't actually changed, so making building of items faster is still valuable.
Flags: needinfo?(matt.woodrow)
1. nsDisplayImte::mClipChain/mClip/mActiveScrolledRoot are all created and hosted in nsDisplayListBuilder. To persist nsDisplayText, we need to move these data out of nsDisplayListBuilder. Name it as nsBullderStorage? And carry this object in nsDocument? 2. How to know a cached nsDisplayText is corrputed? Thinker's suggestion is just flush out all cached items when style change(SetStyleContext??)
Most operations used in nsDisplayText's constructor are O(1) * mBounds = mFrame->GetVisualOverflowRectRelativeToSelf() + ToReferenceFrame(); O(1) * mClipChain(aBuilder->ClipState().GetCurrentCombinedClipChain(aBuilder)) Can be O(N*M), where N is the lenth of mClipChainContentDescendants/ M is the length of mClipChainContainingBlockDescendants. But most of time, it O(1) after bug 1298218 landed. * mClip(DisplayItemClipChain::ClipForASR(mClipChain, aActiveScrolledRoot)) Conceptually, it's O(N), but most of time, it is O(1). * mReferenceFrame = aBuilder->FindReferenceFrameFor(aFrame, &mToReferenceFrame); O(N), but unfortunately, mToReferenceFrame can not be persisted. We need to re-compute this value everytime. * mAnimatedGeometryRoot = aBuilder->FindAnimatedGeometryRootFor(aFrame); Can be O(N), but most of time, it's O(1) for nsDisplayText. I am not sure whether reuse nsDisplayText is still a good idea now. But I do find some place that we can fine tune in text painting code path while profiling. Woking on it right now.
Depends on: 1379404
This is about time complexity, it is more about absolute overhead in time. So, it is about to reduce a big part of overhead of rendering even its time complexity will be not improved. IIRC, bug 1351881 intents to fix problems including FindReferenceFrameFor(). By invalidate items properly, items are built only necessary. It means for most of time, we don't need to call FindReferenceFrameFor() for each item. According previous profiling results, builddisplaylist costs roughly about 30% (I don't remember its exactly number). And, text items are about 50% for some situations, for NYC, IIRC, it is about 30%~40%. So, even it improves only 10% in average, it ends to 2% of improvement overall. Think it is very worth if it is not too complicated and there is no other low hand fruit.
sorry, it is bug 1351811.
Attachment #8884556 - Attachment is obsolete: true
Attachment #8884557 - Attachment is obsolete: true
Attachment #8884558 - Attachment is obsolete: true
(In reply to Thinker Li [:sinker] from comment #11) > This is about time complexity, it is more about absolute overhead in time. > So, it is about to reduce a big part of overhead of rendering even its time > complexity will be not improved. > > IIRC, bug 1351881 intents to fix problems including FindReferenceFrameFor(). > By invalidate items properly, items are built only necessary. > It means for most of time, we don't need to call FindReferenceFrameFor() for > each item. When nsDisplayText uses nsDisplayListBuilder::FindReferenceFrameFor, mostly, it eary returns at https://hg.mozilla.org/mozilla-central/file/93af9545a6f4/layout/painting/nsDisplayList.cpp#l1440 So, I did not even notice time used by FindReferenceFrameFor while profiling nytime or bbc. At the time I profiled, nsDisplayListBuilder::GetWidgetLayerManager occupied almost all executing time of nsDisplayText::nsDisplayText >> And, text items are about 50% for some situations, for NYC, IIRC, it is about 30%~40%. >> So, even it improves only 10% in average, it ends to 2% of improvement overall. I can not find this situation on current nightly. The way I profile bbc/nytimes is 1. Load a news page. 2. Wait until loading finish. 3. Scroll up and down 4. Start profiler and scroll up/down several times. 5. Capture. Take bbc as an example, nsDisplayText::nsDisplayText takes abound 0.1% of running time. I notice that you said "some situations" it may to to 30~40%, may I know the scenario that you did profiling?
Depends on: 1381503
I tried to cache both reference frame and AGR frame, so that we do not to search them in nsDisplayItem::nsDisplayItem, but can not see much perf improvement.
Base on profiling result on current nightly, I do not see a reason that we should keep working on this.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → WONTFIX
(In reply to C.J. Ku[:cjku](UTC+8) from comment #16) > Base on profiling result on current nightly, I do not see a reason that we > should keep working on this. I'm surprised to hear this. Can you share the tests & profile that led to this conclusion? I would have expected a significant speed improvement on pages like https://html.spec.whatwg.org/
Flags: needinfo?(cku)
(In reply to Jet Villegas (:jet) from comment #17) > (In reply to C.J. Ku[:cjku](UTC+8) from comment #16) > > Base on profiling result on current nightly, I do not see a reason that we > > should keep working on this. > > I'm surprised to hear this. Can you share the tests & profile that led to > this conclusion? I would have expected a significant speed improvement on > pages like https://html.spec.whatwg.org/ Profile data of https://html.spec.whatwg.org/ https://perfht.ml/2vCOvWd And the profile data of bbc https://perfht.ml/2vD9Lez In both cases, try to look throuth any time slice of "DisplayList", either you can not find any time spent on nsDisplayItem::nsDisplayItem or time it take is real minor. Several reasons that I decided to drop this bug: 1. Since heap allocation in displaylistbuilder is very efficient, and most of functions been called in nsDisplayItem ctor are O(1). 2. The most important, profile data tells me that we can not gain much by doing this optimization.
Flags: needinfo?(cku)
Take FindReferenceFrameFor as an example, when thinker did profile long time ago, this function is in his radar. But, on Nightly, it's already difficult to find out this function appear in a profile data.
Hi CJ, I don't know what is happening, maybe Gecko changes so much out of my imagination. One thing I have noticed is that your profile data includes only painting, no any samples are about building display items. I can not believe that building display items is so fast that we don't see any sample in the profile data. In the other word, samples in |nsLayoutUtils::PaintFrames()| comprising only |nsDisplayList::PaintRoot()| is too weird for me.
(In reply to Thinker Li [:sinker] from comment #20) > Hi CJ, > > I don't know what is happening, maybe Gecko changes so much out of my > imagination. One thing I have noticed is > that your profile data includes only painting, no any samples are about > building display items. I can not believe > that building display items is so fast that we don't see any sample in the > profile data. In the other word, samples > in |nsLayoutUtils::PaintFrames()| comprising only > |nsDisplayList::PaintRoot()| is too weird for me. For this profile https://perfht.ml/2vCOvWd BuildDisplayList is hidden by the filter. If I filter by nsDisplayText::nsDisplayText the total time is about 5ms. nsDisplayText::Paint is about 254ms.
(In reply to Kan-Ru Chen [:kanru] (UTC+8) from comment #21) > For this profile https://perfht.ml/2vCOvWd > > BuildDisplayList is hidden by the filter. If I filter by > nsDisplayText::nsDisplayText the total time is about 5ms. > > nsDisplayText::Paint is about 254ms. In full range, what I saw is nsDisplayText::nsDisplayText consumed 2ms of 23310ms. nsDisplayText::Paint is not relative to this bug. What we want to save is the construction time of nsDisplayText.
(In reply to Thinker Li [:sinker] from comment #20) > Hi CJ, > > I don't know what is happening, maybe Gecko changes so much out of my > imagination. One thing I have noticed is > that your profile data includes only painting, no any samples are about > building display items. I can not believe > that building display items is so fast that we don't see any sample in the > profile data. In the other word, samples > in |nsLayoutUtils::PaintFrames()| comprising only > |nsDisplayList::PaintRoot()| is too weird for me. Oh, please go back to full range. You can still see BuildDisplayList things.
In the profile data, |nsDisplayText::nsDisplayText| has taken 5ms. Meanwhile, |nsIFrame::BuildDisplayListForStackingContext| has taken 383ms. It is about 1.3%. It is a long tail improvement, collect several one percent improvement to make a 10% improvement. Another point is that this bug is a part of a more big plan. A big part of changes should be reusable for the project of display list cache working by Matt.
(In reply to C.J. Ku[:cjku](UTC+8) from comment #22) > In full range, what I saw is nsDisplayText::nsDisplayText consumed 2ms of > 23310ms. > nsDisplayText::Paint is not relative to this bug. What we want to save is > the construction time of nsDisplayText. Apparently, something is wrong here. Kanru and I get 5ms with the scope of 1.8sec. But, you get only 2ms for the full range.
(In reply to Thinker Li [:sinker] from comment #25) > (In reply to C.J. Ku[:cjku](UTC+8) from comment #22) > > In full range, what I saw is nsDisplayText::nsDisplayText consumed 2ms of > > 23310ms. > > nsDisplayText::Paint is not relative to this bug. What we want to save is > > the construction time of nsDisplayText. > > Apparently, something is wrong here. Kanru and I get 5ms with the scope of > 1.8sec. > But, you get only 2ms for the full range. 16 ms for full range. Ok, 5ms for 1.8 seconds range I see
For full range, |nsIFrame::BuildDisplayListForStackingContext| where Gecko build display lists is 2635ms. It takes 0.6% of 2635ms. (The number 23310ms includes a lot of computation that we don't like to include.) However, please consider this bug would benefit items of other types because most of changes are about nsDisplayItem , not nsDisplayText itself.
And, some gains are out of the scope of |nsDisplayText::nsDisplayText|. For example, computation for AGR, ASR, reference frame ...etc.
In 1.8 seconds range: 5ms/ 1833 ms = 0.2% improvement. Even we cache nsDisplayText, we still need to compute scroll position before painting, so we can not save 5ms totally. For most web pages, it improve even less then 0.2%. On bbc/cnn/yahoo news, there is no sample data of nsDisplayText::nsDisplayText. That why I close this bug. But if we get stronger evidence that we should still do this optimization, it's welcome to reopen it. For computation of AGR/ASR and reference frame, like I said, they are mostly O(1). Take nsDisplayListBuilder::FindReferenceFrameFor as an example, although it's an O(N) function, but most of time, we early return at: http://searchfox.org/mozilla-central/source/layout/painting/nsDisplayList.cpp#1448 same for AGR and ASR searching functions. That's why I can not see noticeable improvement after persist nsDisplayText.
Performance Impact: --- → P1
Whiteboard: [qf:investigate:p1]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: