Closed Bug 1351313 Opened 8 years ago Closed 7 years ago

Reuse nsDisplayText for frames

Tracking

()

Status:

RESOLVED WONTFIX

Project Flags:

Performance Impact

high

Tracking Flags:

Tracking

Status

firefox55

---

affected

People

(Reporter: sinker, Assigned: u459114)

References

(Depends on 1 open bug)

Details

Attachments

(1 file, 3 obsolete files)

Bug 1351313 - Part 1. Prevent using gfxContetext::Save/Restore in nsTextFrame. 7 years ago u459114 59 bytes, text/x-review-board-request		Details
Bug 1351313 - Part 2. Prevent using gfxContext::Save/Restore in gfxFont. 7 years ago u459114 59 bytes, text/x-review-board-request		Details
Bug 1351313 - Part 3. Prevent using gfxContetext::Save/Restore in gfxTextRun. 7 years ago u459114 59 bytes, text/x-review-board-request		Details
Bug 1351313 - Part 1. Cache reference frame. 7 years ago u459114 59 bytes, text/x-review-board-request		Details

Thinker Li [:sinker]

Reporter

Description

•

8 years ago

|nsDisplayText| is usually the most popular type of display items. It would improve overall time of building display list to reuse the instances of |nsDislayText| created for frames. The idea here is to set the instances of |nsDisplayText| as one of properties of the associated frame. So, next time, it is reused to avoid memory allocation, initialization and a lot of checks. The problem is |nsDisplayItem| has some member variables; related to clips, AGR, and ASR, that would be destroyed every time after use. We should/can keep it as a part of frames too, to avoid recompute them every time.

Thinker Li [:sinker]

Reporter

Comment 1

•

8 years ago

clips, AGR and ASR can be kept in a struct with a global shared arena. Since they are fixed size, a free list can be used for reusing memory once they are destroyed.

Thinker Li [:sinker]

Reporter

Comment 2

•

8 years ago

For |TryMerge()|, a flag is necessary to avoid doing it more than one time.

Thinker Li [:sinker]

Reporter

Updated

•

8 years ago

Depends on: 1351811

Thinker Li [:sinker]

Reporter

Updated

•

8 years ago

Whiteboard: [qf]

Thinker Li [:sinker]

Reporter

Comment 3

•

8 years ago

For some web-sites like nytimes, BBC, ..., nsDisplayText is counted up to 50+% of all items.

Naveed Ihsanullah [:naveed]

Updated

•

8 years ago

Whiteboard: [qf] → [qf:investigate:p1]

u459114

Assignee

Updated

•

8 years ago

Assignee: nobody → cku

u459114

Assignee

Comment 4

•

8 years ago

Have matt, I know you are working on retaining display list. From your point of view, is it worthy to work on text specific optimization?

Flags: needinfo?(matt.woodrow)

Matt Woodrow (:mattwoodrow)

Comment 5

•

8 years ago

Yeah, I think this is still worth doing. For one, it's a smaller project, so we can likely land it earlier and get wins. The other is that retained display lists have to rebuild all items that intersect a changed item, so that it can figure out how to sort the items correctly. Often this will mean we'll rebuild items that haven't actually changed, so making building of items faster is still valuable.

Flags: needinfo?(matt.woodrow)

u459114

Assignee

Comment 6

•

7 years ago

1. nsDisplayImte::mClipChain/mClip/mActiveScrolledRoot are all created and hosted in nsDisplayListBuilder. To persist nsDisplayText, we need to move these data out of nsDisplayListBuilder. Name it as nsBullderStorage? And carry this object in nsDocument? 2. How to know a cached nsDisplayText is corrputed? Thinker's suggestion is just flush out all cached items when style change(SetStyleContext??)

u459114

Assignee

Comment 7

•

7 years ago

Most operations used in nsDisplayText's constructor are O(1) * mBounds = mFrame->GetVisualOverflowRectRelativeToSelf() + ToReferenceFrame(); O(1) * mClipChain(aBuilder->ClipState().GetCurrentCombinedClipChain(aBuilder)) Can be O(N*M), where N is the lenth of mClipChainContentDescendants/ M is the length of mClipChainContainingBlockDescendants. But most of time, it O(1) after bug 1298218 landed. * mClip(DisplayItemClipChain::ClipForASR(mClipChain, aActiveScrolledRoot)) Conceptually, it's O(N), but most of time, it is O(1). * mReferenceFrame = aBuilder->FindReferenceFrameFor(aFrame, &mToReferenceFrame); O(N), but unfortunately, mToReferenceFrame can not be persisted. We need to re-compute this value everytime. * mAnimatedGeometryRoot = aBuilder->FindAnimatedGeometryRootFor(aFrame); Can be O(N), but most of time, it's O(1) for nsDisplayText. I am not sure whether reuse nsDisplayText is still a good idea now. But I do find some place that we can fine tune in text painting code path while profiling. Woking on it right now.

Comment hidden (mozreview-request)

u459114

Assignee

Updated

•

7 years ago

Depends on: 1379404

Thinker Li [:sinker]

Reporter

Comment 11

•

7 years ago

This is about time complexity, it is more about absolute overhead in time. So, it is about to reduce a big part of overhead of rendering even its time complexity will be not improved. IIRC, bug 1351881 intents to fix problems including FindReferenceFrameFor(). By invalidate items properly, items are built only necessary. It means for most of time, we don't need to call FindReferenceFrameFor() for each item. According previous profiling results, builddisplaylist costs roughly about 30% (I don't remember its exactly number). And, text items are about 50% for some situations, for NYC, IIRC, it is about 30%~40%. So, even it improves only 10% in average, it ends to 2% of improvement overall. Think it is very worth if it is not too complicated and there is no other low hand fruit.

Thinker Li [:sinker]

Reporter

Comment 12

•

7 years ago

sorry, it is bug 1351811.

Comment hidden (mozreview-request)

u459114

Assignee

Updated

•

7 years ago

Attachment #8884556 - Attachment is obsolete: true

u459114

Assignee

Updated

•

7 years ago

Attachment #8884557 - Attachment is obsolete: true

u459114

Assignee

Updated

•

7 years ago

Attachment #8884558 - Attachment is obsolete: true

u459114

Assignee

Comment 14

•

7 years ago

(In reply to Thinker Li [:sinker] from comment #11) > This is about time complexity, it is more about absolute overhead in time. > So, it is about to reduce a big part of overhead of rendering even its time > complexity will be not improved. > > IIRC, bug 1351881 intents to fix problems including FindReferenceFrameFor(). > By invalidate items properly, items are built only necessary. > It means for most of time, we don't need to call FindReferenceFrameFor() for > each item. When nsDisplayText uses nsDisplayListBuilder::FindReferenceFrameFor, mostly, it eary returns at https://hg.mozilla.org/mozilla-central/file/93af9545a6f4/layout/painting/nsDisplayList.cpp#l1440 So, I did not even notice time used by FindReferenceFrameFor while profiling nytime or bbc. At the time I profiled, nsDisplayListBuilder::GetWidgetLayerManager occupied almost all executing time of nsDisplayText::nsDisplayText >> And, text items are about 50% for some situations, for NYC, IIRC, it is about 30%~40%. >> So, even it improves only 10% in average, it ends to 2% of improvement overall. I can not find this situation on current nightly. The way I profile bbc/nytimes is 1. Load a news page. 2. Wait until loading finish. 3. Scroll up and down 4. Start profiler and scroll up/down several times. 5. Capture. Take bbc as an example, nsDisplayText::nsDisplayText takes abound 0.1% of running time. I notice that you said "some situations" it may to to 30~40%, may I know the scenario that you did profiling?

u459114

Assignee

Updated

•

7 years ago

Depends on: 1381503

u459114

Assignee

Comment 15

•

7 years ago

I tried to cache both reference frame and AGR frame, so that we do not to search them in nsDisplayItem::nsDisplayItem, but can not see much perf improvement.

u459114

Assignee

Comment 16

•

7 years ago

Base on profiling result on current nightly, I do not see a reason that we should keep working on this.

Status: NEW → RESOLVED

Closed: 7 years ago

Resolution: --- → WONTFIX

Jet Villegas (inactive)

Comment 17

•

7 years ago

(In reply to C.J. Ku[:cjku](UTC+8) from comment #16) > Base on profiling result on current nightly, I do not see a reason that we > should keep working on this. I'm surprised to hear this. Can you share the tests & profile that led to this conclusion? I would have expected a significant speed improvement on pages like https://html.spec.whatwg.org/

Flags: needinfo?(cku)

u459114

Assignee

Comment 18

•

7 years ago

(In reply to Jet Villegas (:jet) from comment #17) > (In reply to C.J. Ku[:cjku](UTC+8) from comment #16) > > Base on profiling result on current nightly, I do not see a reason that we > > should keep working on this. > > I'm surprised to hear this. Can you share the tests & profile that led to > this conclusion? I would have expected a significant speed improvement on > pages like https://html.spec.whatwg.org/ Profile data of https://html.spec.whatwg.org/ https://perfht.ml/2vCOvWd And the profile data of bbc https://perfht.ml/2vD9Lez In both cases, try to look throuth any time slice of "DisplayList", either you can not find any time spent on nsDisplayItem::nsDisplayItem or time it take is real minor. Several reasons that I decided to drop this bug: 1. Since heap allocation in displaylistbuilder is very efficient, and most of functions been called in nsDisplayItem ctor are O(1). 2. The most important, profile data tells me that we can not gain much by doing this optimization.

Flags: needinfo?(cku)

u459114

Assignee

Comment 19

•

7 years ago

Take FindReferenceFrameFor as an example, when thinker did profile long time ago, this function is in his radar. But, on Nightly, it's already difficult to find out this function appear in a profile data.

Thinker Li [:sinker]

Reporter

Comment 20

•

7 years ago

Hi CJ, I don't know what is happening, maybe Gecko changes so much out of my imagination. One thing I have noticed is that your profile data includes only painting, no any samples are about building display items. I can not believe that building display items is so fast that we don't see any sample in the profile data. In the other word, samples in |nsLayoutUtils::PaintFrames()| comprising only |nsDisplayList::PaintRoot()| is too weird for me.

Kan-Ru Chen [:kanru] (UTC+9)

Comment 21

•

7 years ago

(In reply to Thinker Li [:sinker] from comment #20) > Hi CJ, > > I don't know what is happening, maybe Gecko changes so much out of my > imagination. One thing I have noticed is > that your profile data includes only painting, no any samples are about > building display items. I can not believe > that building display items is so fast that we don't see any sample in the > profile data. In the other word, samples > in |nsLayoutUtils::PaintFrames()| comprising only > |nsDisplayList::PaintRoot()| is too weird for me. For this profile https://perfht.ml/2vCOvWd BuildDisplayList is hidden by the filter. If I filter by nsDisplayText::nsDisplayText the total time is about 5ms. nsDisplayText::Paint is about 254ms.

u459114

Assignee

Comment 22

•

7 years ago

(In reply to Kan-Ru Chen [:kanru] (UTC+8) from comment #21) > For this profile https://perfht.ml/2vCOvWd > > BuildDisplayList is hidden by the filter. If I filter by > nsDisplayText::nsDisplayText the total time is about 5ms. > > nsDisplayText::Paint is about 254ms. In full range, what I saw is nsDisplayText::nsDisplayText consumed 2ms of 23310ms. nsDisplayText::Paint is not relative to this bug. What we want to save is the construction time of nsDisplayText.

u459114

Assignee

Comment 23

•

7 years ago

(In reply to Thinker Li [:sinker] from comment #20) > Hi CJ, > > I don't know what is happening, maybe Gecko changes so much out of my > imagination. One thing I have noticed is > that your profile data includes only painting, no any samples are about > building display items. I can not believe > that building display items is so fast that we don't see any sample in the > profile data. In the other word, samples > in |nsLayoutUtils::PaintFrames()| comprising only > |nsDisplayList::PaintRoot()| is too weird for me. Oh, please go back to full range. You can still see BuildDisplayList things.

Thinker Li [:sinker]

Reporter

Comment 24

•

7 years ago

In the profile data, |nsDisplayText::nsDisplayText| has taken 5ms. Meanwhile, |nsIFrame::BuildDisplayListForStackingContext| has taken 383ms. It is about 1.3%. It is a long tail improvement, collect several one percent improvement to make a 10% improvement. Another point is that this bug is a part of a more big plan. A big part of changes should be reusable for the project of display list cache working by Matt.

Thinker Li [:sinker]

Reporter

Comment 25

•

7 years ago

(In reply to C.J. Ku[:cjku](UTC+8) from comment #22) > In full range, what I saw is nsDisplayText::nsDisplayText consumed 2ms of > 23310ms. > nsDisplayText::Paint is not relative to this bug. What we want to save is > the construction time of nsDisplayText. Apparently, something is wrong here. Kanru and I get 5ms with the scope of 1.8sec. But, you get only 2ms for the full range.

u459114

Assignee

Comment 26

•

7 years ago

(In reply to Thinker Li [:sinker] from comment #25) > (In reply to C.J. Ku[:cjku](UTC+8) from comment #22) > > In full range, what I saw is nsDisplayText::nsDisplayText consumed 2ms of > > 23310ms. > > nsDisplayText::Paint is not relative to this bug. What we want to save is > > the construction time of nsDisplayText. > > Apparently, something is wrong here. Kanru and I get 5ms with the scope of > 1.8sec. > But, you get only 2ms for the full range. 16 ms for full range. Ok, 5ms for 1.8 seconds range I see

Thinker Li [:sinker]

Reporter

Comment 27

•

7 years ago

For full range, |nsIFrame::BuildDisplayListForStackingContext| where Gecko build display lists is 2635ms. It takes 0.6% of 2635ms. (The number 23310ms includes a lot of computation that we don't like to include.) However, please consider this bug would benefit items of other types because most of changes are about nsDisplayItem , not nsDisplayText itself.

Thinker Li [:sinker]

Reporter

Comment 28

•

7 years ago

And, some gains are out of the scope of |nsDisplayText::nsDisplayText|. For example, computation for AGR, ASR, reference frame ...etc.

u459114

Assignee

Comment 29

•

7 years ago

In 1.8 seconds range: 5ms/ 1833 ms = 0.2% improvement. Even we cache nsDisplayText, we still need to compute scroll position before painting, so we can not save 5ms totally. For most web pages, it improve even less then 0.2%. On bbc/cnn/yahoo news, there is no sample data of nsDisplayText::nsDisplayText. That why I close this bug. But if we get stronger evidence that we should still do this optimization, it's welcome to reopen it. For computation of AGR/ASR and reference frame, like I said, they are mostly O(1). Take nsDisplayListBuilder::FindReferenceFrameFor as an example, although it's an O(N) function, but most of time, we early return at: http://searchfox.org/mozilla-central/source/layout/painting/nsDisplayList.cpp#1448 same for AGR and ASR searching functions. That's why I can not see noticeable improvement after persist nsDisplayText.

Dave Hunt [:davehunt] [he/him] ⌚BST

Updated

•

3 years ago

Performance Impact: --- → P1

Whiteboard: [qf:investigate:p1]

You need to log in before you can comment on or make changes to this bug.