Closed Bug 1426044 Opened 8 years ago Closed 7 years ago

Optimize RetainedDisplayListBuilder::MergeDisplayLists, use of hashtables seems overkill

Tracking

()

Status:

RESOLVED DUPLICATE of bug 1439809

Tracking Flags:

Tracking

Status

firefox59

---

affected

People

(Reporter: mozbugz, Assigned: mozbugz)

References

(Blocks 1 open bug)

Details

Attachments

(7 files)

dl-test.html testing display lists with 10k items 8 years ago Gerald Squelart (he/him) (not at Mozilla since 2022-09-15) 715 bytes, text/html		Details
Bug 1426044 - Move list lookup into separate class, lazily construct hashtable - 8 years ago Gerald Squelart (he/him) (not at Mozilla since 2022-09-15) 59 bytes, text/x-review-board-request	mattwoodrow : review+	Details
Bug 1426044 - Merging expects old and new list to mostly match already - 8 years ago Gerald Squelart (he/him) (not at Mozilla since 2022-09-15) 59 bytes, text/x-review-board-request	mattwoodrow : review+	Details
Bug 1426044 - Use flat array when merging small lists - 8 years ago Gerald Squelart (he/him) (not at Mozilla since 2022-09-15) 59 bytes, text/x-review-board-request	mattwoodrow : review+	Details
Bug 1426044 - Restart linear search from after previously-found item - 8 years ago Gerald Squelart (he/him) (not at Mozilla since 2022-09-15) 59 bytes, text/x-review-board-request	mattwoodrow : review+	Details
bug1426044_benchmarks.patch 8 years ago Gerald Squelart (he/him) (not at Mozilla since 2022-09-15) 6.26 KB, patch		Details \| Diff \| Splinter Review
bug1426044_benchmarks.ods 8 years ago Gerald Squelart (he/him) (not at Mozilla since 2022-09-15) 253.77 KB, application/vnd.oasis.opendocument.spreadsheet		Details

Gerald Squelart (he/him) (not at Mozilla since 2022-09-15)

Assignee

Description

•

8 years ago

Attached file dl-test.html testing display lists with 10k items — Details

Currently, RDL merging uses hashtables to find items from one display list in another one. The creation of, and search in, these hashtables is visible in profiling runs. Some quick printf-debugging showed me that in most cases, display lists only contain a few items, and most lists match 1:1 (with some skipping sometimes; didn't notice any reordering), a full hashtable search seems wasteful. Instead, just going through the list, starting at index 0 or (previous search + 1), should take care of most real-world cases much more efficiently. The worst case with this would be for items that are not in the searched list, for which we would need to go through the whole list. This could probably be optimized by looking at the display items linked with the associated frame (nsIFrame::DisplayItems -- but we need to be careful there, as we don't know which lists these items belong to). The attached dl-test.html may be used to measure such cases -- we can accept some performance impact from a new algorithm, but if it looks much worse, we may need to keep the old algorithm around, to be used for such big lists. I will try to gather more profiling data first, then design and optimize an algorithm that should handle most cases, ensuring that worst-case scenarios don't blow up. Feedback & help welcome.

Matt Woodrow (:mattwoodrow)

Updated

•

8 years ago

Priority: -- → P2

Matt Woodrow (:mattwoodrow)

Updated

•

8 years ago

Blocks: RDLPerf

Comment hidden (mozreview-request)

Matt Woodrow (:mattwoodrow)

Comment 5

•

8 years ago

mozreview-review

Comment on attachment 8941250 [details] Bug 1426044 - Move list lookup into separate class, lazily construct hashtable - https://reviewboard.mozilla.org/r/211520/#review217328 ::: layout/painting/RetainedDisplayListBuilder.cpp:321 (Diff revision 1) > + template<typename F> > + explicit ListLookup(nsDisplayList* aList, F&& aItemAction) > + : mList(aList) > + , mTable(aList->Count() ? Table(Undecided()) : Table(Empty())) > + { > + for (nsDisplayItem* i = aList->GetBottom(); i != nullptr; If we're running this in the constructor (not doing it lazily), it might be worth just making the caller doing it manually rather than making ListLookup code more complicated. Unless of course there is more code in a latter patch that reuses this functionality. ::: layout/painting/RetainedDisplayListBuilder.cpp:329 (Diff revision 1) > + } > + } > + > + nsDisplayItem* GetSame(nsDisplayItem* aItem) > + { > + struct MatcherForGet Bit unfortunate how much boilerplate these matcher classes add, it would be nice if there was a version of Variant::match that just took N lambdas.

Attachment #8941250 - Flags: review?(matt.woodrow) → review+

Matt Woodrow (:mattwoodrow)

Comment 6

•

8 years ago

mozreview-review

Comment on attachment 8941251 [details] Bug 1426044 - Merging expects old and new list to mostly match already - https://reviewboard.mozilla.org/r/211522/#review217332

Attachment #8941251 - Flags: review?(matt.woodrow) → review+

Matt Woodrow (:mattwoodrow)

Comment 7

•

8 years ago

mozreview-review

Gerald Squelart (he/him) (not at Mozilla since 2022-09-15)

Assignee

Comment 8

•

8 years ago

mozreview-review-reply

Comment on attachment 8941250 [details] Bug 1426044 - Move list lookup into separate class, lazily construct hashtable - https://reviewboard.mozilla.org/r/211520/#review217328 > If we're running this in the constructor (not doing it lazily), it might be worth just making the caller doing it manually rather than making ListLookup code more complicated. > > Unless of course there is more code in a latter patch that reuses this functionality. Hopefully the 3rd patch will make it clearer why I do this (using the list visitation to do something else that's relevant to the lookup). > Bit unfortunate how much boilerplate these matcher classes add, it would be nice if there was a version of Variant::match that just took N lambdas. Agreed! :-) Except in this case, it would mean *each* lambda would have to carry its own capture, instead of having one shared capture in the struct.

Matt Woodrow (:mattwoodrow)

Comment 9

•

8 years ago

mozreview-review

Comment on attachment 8941252 [details] Bug 1426044 - Use flat array when merging small lists - https://reviewboard.mozilla.org/r/211524/#review217338 ::: layout/painting/RetainedDisplayListBuilder.cpp:374 (Diff revision 1) > + // finding the old item at the same relative location, which should now > + // be at the bottom (previous ones having been consumed). > + nsDisplayItem* bottomItem = mSelf.mList->GetBottom(); > + if (!bottomItem || IsSameItem(mItem, bottomItem)) { > + return bottomItem; > + } Looks like we have these same 4 lines in all match impls (except for the empty). Can we just run this before calling match() and have it once? ::: layout/painting/RetainedDisplayListBuilder.cpp:460 (Diff revision 1) > // Yet-undecided state, may create a table if later needed. > struct Undecided > { > }; > > - // Hash table (frame, key) -> item. > + struct ItemInfo Maybe consider moving this struct, the constants, and the using Array= bits outside of this class, so that the code for ListLookup stays smaller and more concise. ::: layout/painting/RetainedDisplayListBuilder.cpp:481 (Diff revision 1) > + return mFrame == aItem->Frame() && > + mPerFrameKey == aItem->GetPerFrameKey(); > + } > + }; > + // Inline array storage, created eagerly on the first list reading. > + static constexpr size_t scInlineArraySize = I think it might be simpler to pick a single number for all platforms where we're confident it's a win (10?), and not get too worried about trying to maximize it completely. Either way, please add a comment detailing how/why the numbers were chosen, so if we ever get more data on the behaviour of this, we can evaluate it against the data that lead to the current choices.

Attachment #8941252 - Flags: review?(matt.woodrow) → review+

Matt Woodrow (:mattwoodrow)

Comment 10

•

8 years ago

mozreview-review

Comment on attachment 8941253 [details] Bug 1426044 - Restart linear search from after previously-found item - https://reviewboard.mozilla.org/r/211526/#review217344 ::: layout/painting/RetainedDisplayListBuilder.cpp:400 (Diff revision 1) > - } > + } > + if (iter == start) { > - return nullptr; > + return nullptr; > - } > + } > + } > + } Breakout out of the array container and doing pointer math iteration always makes me feel a bit uneasy. Can't we do something like: MOZ_ASSERT(mNextStartIndex < len); uint32_t start = mNextStartIndex; do { const ItemInfo* elem = aArray.mArray[mNextStartIndex]; mNextStartIndex++; if (mNextStartIndex == len) { mNextStartIndex = 0; } if (elem == mItem) { return elem->mItem; } } while (mNextStartIndex != start) ::: layout/painting/RetainedDisplayListBuilder.cpp:438 (Diff revision 1) > } > - void match(Array& aArray) { aArray.RemoveElement(mItem); } > + void match(Array& aArray) > + { > + if (aArray.mArray.RemoveElement(mItem)) { > + if (aArray.mNextStartIndex != 0) { > + --aArray.mNextStartIndex; Shouldn't we only decrement if the removed element was before the index? Or are we just assuming it's cheaper to always decrement, and take the +1 iteration hit on the next search?

Jeff Muizelaar [:jrmuizel]

Comment 11

•

8 years ago

How much do these patches improve performance?

Gerald Squelart (he/him) (not at Mozilla since 2022-09-15)

Assignee

Updated

•

8 years ago

Blocks: 1429616

Gerald Squelart (he/him) (not at Mozilla since 2022-09-15)

Assignee

Comment 12

•

8 years ago

mozreview-review-reply

Comment on attachment 8941252 [details] Bug 1426044 - Use flat array when merging small lists - https://reviewboard.mozilla.org/r/211524/#review217338 > Looks like we have these same 4 lines in all match impls (except for the empty). > > Can we just run this before calling match() and have it once? Agreed, I'll pull the common code out. And even in the 'Empty' case, having that extra test will be compensated by not running the Variant.match() code at all. > Maybe consider moving this struct, the constants, and the using Array= bits outside of this class, so that the code for ListLookup stays smaller and more concise. I always try to encapsulate structs to avoid polluting the surrounding namespace, and to prevent unexpected outside use. I'll see if I can cleanly separate them. > I think it might be simpler to pick a single number for all platforms where we're confident it's a win (10?), and not get too worried about trying to maximize it completely. > > Either way, please add a comment detailing how/why the numbers were chosen, so if we ever get more data on the behaviour of this, we can evaluate it against the data that lead to the current choices. Based on my benchmarks, picking single numbers would harm some platforms (e.g., inline>10 shows a perf cliff on Linux, but inline<24 wastes perf on Mac.) So I would really prefer to keep these separate numbers at least for now. I'll add a comment in the code referring to this bug, and add my benchmark details to the bug.

Gerald Squelart (he/him) (not at Mozilla since 2022-09-15)

Assignee

Comment 13

•

8 years ago

mozreview-review-reply

Comment on attachment 8941253 [details] Bug 1426044 - Restart linear search from after previously-found item - https://reviewboard.mozilla.org/r/211526/#review217344 > Breakout out of the array container and doing pointer math iteration always makes me feel a bit uneasy. > > Can't we do something like: > > > MOZ_ASSERT(mNextStartIndex < len); > uint32_t start = mNextStartIndex; > do { > const ItemInfo* elem = aArray.mArray[mNextStartIndex]; > > mNextStartIndex++; > if (mNextStartIndex == len) { > mNextStartIndex = 0; > } > > if (elem == mItem) { > return elem->mItem; > } > } while (mNextStartIndex != start) Yeah, clearer and safer, thank you for the suggestion. > Shouldn't we only decrement if the removed element was before the index? > > Or are we just assuming it's cheaper to always decrement, and take the +1 iteration hit on the next search? My first pointer-based attempt was too complicated, so yes I went for easier & cheap-enough. But I found nsTArray::IndexOf, much easier and allows me to conditionally move mNextStartIndex when appropriate.

Comment hidden (mozreview-request)

Matt Woodrow (:mattwoodrow)

Comment 18

•

8 years ago

mozreview-review

Comment on attachment 8941253 [details] Bug 1426044 - Restart linear search from after previously-found item - https://reviewboard.mozilla.org/r/211526/#review217690

Attachment #8941253 - Flags: review?(matt.woodrow) → review+

Gerald Squelart (he/him) (not at Mozilla since 2022-09-15)

Assignee

Comment 19

•

8 years ago

Attached patch bug1426044_benchmarks.patch — Details — Splinter Review

Benchmarking code used to compare AutoTArray<ItemInfo, N> vs nsDataHashtable<DisplayItemHashEntry, nsDisplayItem*> (i.e., as used in this bug's main patches). Run with `./mach gtest DisplayListBenchmark*`. Numbers are microseconds, lower is better.

Gerald Squelart (he/him) (not at Mozilla since 2022-09-15)

Assignee

Comment 20

•

8 years ago

Attached file bug1426044_benchmarks.ods — Details

Results from running the benchmark code on Try, taken from 'opt' runs on different platforms. (win64 numbers missing, but a local run showed me they were similar to win32 numbers.)

Gerald Squelart (he/him) (not at Mozilla since 2022-09-15)

Assignee

Updated

•

8 years ago

Depends on: 1420737

Gerald Squelart (he/him) (not at Mozilla since 2022-09-15)

Assignee

Comment 21

•

8 years ago

(In reply to Jeff Muizelaar [:jrmuizel] from comment #11) > How much do these patches improve performance? I haven't forgotten you! I've been trying to measure performance improvements, but so far I'm afraid I'm not seeing them. How I've tested so far: I time the calls to MergeDisplayLists in RetainedDisplayListBuilder.cpp, gathering simple stats, and run RefTests on Try, hoping to get stable-enough numbers (because tests are working with the same pages, in the same environments). E.g., without my patches: https://treeherder.mozilla.org/#/jobs?repo=try&revision=0d30dc880bd7be1c4331583771eda2bca84af58e With patches: https://treeherder.mozilla.org/#/jobs?repo=try&revision=0ccf7eee62e05e8d4b14387caff76e6b1453a6f7 Unfortunately, results so far are pointing at similar or slightly worse results with the patches. For example, 'R' tests on Mac, without patches: average=15.1381394 min=0.8744 q1=3.544 med=6.3114 q3=13.8602 max=3121.8202 (count=26699) with patches: average=15.013578 min=1.0092 q1=4.0178 med=7.1412 q3=12.3158 max=3130.7498 (count=26801) (Numbers are microseconds, lower is better.) So the average looks a little bit better with patches, however the lower quartiles are worse, when I would have hoped these patches would have helped the most. And if the numbers are correct, 1% improvements may not justify the extra complexity brought by these patches. It's possible I'm not measuring things properly, or not measuring the right things -- suggestions? -- but I'm seeing this trend in lots of tests I've conducted over the past few days and across platforms. Maybe it's good news, as I've "proven" that the current code is already optimal? I'll try to better understand what could be wrong with my patches. In the meantime, I've found a couple of other actual improvements nearby, I'll open new bugs for them soon.

Gerald Squelart (he/him) (not at Mozilla since 2022-09-15)

Assignee

Updated

•

8 years ago

Depends on: 1430494

Matt Woodrow (:mattwoodrow)

Updated

•

7 years ago

Updated

•

7 years ago

Status: NEW → RESOLVED

Closed: 7 years ago

Resolution: --- → DUPLICATE

You need to log in before you can comment on or make changes to this bug.