Closed
Bug 1374112
Opened 6 years ago
Closed 6 years ago
FrameConstructionItemList is malloc heavy
Categories
(Core :: Layout, defect, P3)
Core
Layout
Tracking
()
Tracking | Status | |
---|---|---|
firefox57 | --- | fixed |
People
(Reporter: smaug, Assigned: MatsPalmgren_bugz)
References
(Blocks 1 open bug)
Details
Attachments
(4 files, 3 obsolete files)
1.85 KB,
patch
|
dholbert
:
review+
|
Details | Diff | Splinter Review |
3.82 KB,
patch
|
Details | Diff | Splinter Review | |
4.88 KB,
patch
|
MatsPalmgren_bugz
:
review+
|
Details | Diff | Splinter Review |
40.17 KB,
patch
|
dholbert
:
review+
|
Details | Diff | Splinter Review |
dtor calls delete several items. Could SegmentedVector be used here, allocate items from the vector and not one by one using new?
Reporter | ||
Comment 1•6 years ago
|
||
Or if SegmentedVector doesn't work, perhaps allocate the items from an arena and recycle the objects?
Reporter | ||
Comment 2•6 years ago
|
||
When running Speedometer, the list has usually 0 or 1 entries, but there are cases when there are more than 200 entries.
Comment 3•6 years ago
|
||
Or maybe make FrameConstructionItem a movable type and just use nsTArray<FrameConstructionItem>? Trying to improve the memory locality of these objects might be helpful.
Assignee | ||
Comment 4•6 years ago
|
||
I've started investigating this a bit... using a dedicated arena + free list. Olli, what's the STR for reproducing the performance problem exactly?
Assignee: nobody → mats
Flags: needinfo?(bugs)
Assignee | ||
Comment 6•6 years ago
|
||
OK, here's some statistics for FrameConstructionItem memory allocation/deallocation while running http://speedometer2.benj.me/ ~47000 items are allocated in total during the run, the high-water mark of items in use is 323. Measuring the associated "malloc/free" time only, it takes ~100ms in total, out of roughly 3 min 40 seconds it takes to run this test to completion. With my patch that does arena allocation + free list, it takes ~30ms in total instead. So for this test I don't see why FrameConstructionItem allocation is an issue. It's just 0.045% of the time it takes to run the test. However, on a different test: loading the single-page HTML spec (https://html.spec.whatwg.org/) the numbers are a bit different: ~355000 items allocated in total, 20130 the high-water mark. ~60ms in total, out of roughly 6 seconds it takes to render this page (from a saved copy), or 1% is spent on [de]allocating items. With my patch the [de]allocation takes ~16ms instead. So, it looks like the [de]allocation got about four times faster. Note, I'm only measuring the time of the actual [de]allocation calls, so there might be some other wins, e.g. avoiding jemalloc lock contention (if there is any) on other threads that I'm not seeing with these measurements. All my testing is on 64-bit Opt Linux build, so the numbers might be different on other platforms. (I'll try to measure it on OSX too, which I think has relatively slow memory allocation, IIRC.)
Comment 7•6 years ago
|
||
How do overall memory usage look with the patches? Did we end up fragmenting more or less?
Assignee | ||
Comment 8•6 years ago
|
||
I'd be happy to measure it if you can suggest a tool (on OSX or Linux) that can measure that.
Assignee | ||
Comment 9•6 years ago
|
||
As expected, the malloc/free performance is worse on OSX, so the relative improvement is better there: before ~160ms, after ~31ms.
Comment 10•6 years ago
|
||
(In reply to Mats Palmgren (:mats) from comment #8) > I'd be happy to measure it if you can suggest a tool (on OSX or Linux) that > can measure that. I was just think about:memory. Minimize and then measure to see if the RSS/USS are dramatically different. You could also do an awsy try run with your patches.
Comment 11•6 years ago
|
||
Is the arena in your patches a static or per window/TabGroup? I wonder if this would cause any problems with quantum DOM in the mix.
Assignee | ||
Comment 12•6 years ago
|
||
(In reply to Ben Kelly [reviewing, but slowly][:bkelly] from comment #10) > I was just think about:memory. Minimize and then measure to see if the > RSS/USS are dramatically different. I don't see any significant change there. Note that the memory used for FrameConstructionItems is entirely temporary - once the frame tree is created they all go away... unless we decide to explicitly retain some of that memory for performance. This is under our control. Currently, I'm retaining one arena chunk, which currently is 16kb. (In reply to Ben Kelly [reviewing, but slowly][:bkelly] from comment #11) > Is the arena in your patches a static or per window/TabGroup? It's one arena per nsCSSFrameConstructor instance, which is "one per document" roughly. > I wonder if this would cause any problems with quantum DOM in the mix. I don't know anything about quantum DOM so I can't answer this. I'll attach a WIP patch so you can test it yourself if you want...
Assignee | ||
Comment 13•6 years ago
|
||
Attachment #8884372 -
Flags: review?(dholbert)
Assignee | ||
Comment 14•6 years ago
|
||
Attachment #8884374 -
Flags: review?(erahm)
Assignee | ||
Comment 15•6 years ago
|
||
Comment on attachment 8884374 [details] [diff] [review] part 2 - Introduce ArenaAllocator::ClearRetainOneChunk() that Clears the arena but retains one chunk to avoid malloc/free churn when the arena is re-used a lot (FYI, this is exactly as Clear(), but retains one chunk.)
Assignee | ||
Comment 16•6 years ago
|
||
Attachment #8884377 -
Flags: review?(dholbert)
Assignee | ||
Comment 17•6 years ago
|
||
Apparently, Visual C++ has a bug b/c it won't accept: void operator delete(void*) = delete; even though it's never used.
Attachment #8884378 -
Flags: review?(dholbert)
Assignee | ||
Comment 18•6 years ago
|
||
https://treeherder.mozilla.org/#/jobs?repo=try&revision=e75efdb7872263d552a56f459d4fdaa2ee67ced2
Updated•6 years ago
|
Attachment #8884372 -
Flags: review?(dholbert) → review+
Comment 19•6 years ago
|
||
Comment on attachment 8884377 [details] [diff] [review] part 3 - Introduce nsCSSFrameConstructor::AllocateFCItem/FreeFCItem for allocating FrameConstructionItems from an arena/free list Review of attachment 8884377 [details] [diff] [review]: ----------------------------------------------------------------- r=me. Nothing to change, just two thoughts: ::: layout/base/nsCSSFrameConstructor.cpp @@ +13367,5 @@ > + return item; > +} > + > +void > +nsCSSFrameConstructor::FreeFCItem(FrameConstructionItem* aItem) BTW: I reviewed the AllocFCItem/FreeFCItem impls here by comparing them to the similar impls in TimerThread.cpp: https://dxr.mozilla.org/mozilla-central/rev/20f32734df750bddada9d1edca665c2ea53946f0/xpcom/threads/TimerThread.cpp#213-240 Based on that, they look good. ::: layout/base/nsCSSFrameConstructor.h @@ +2180,5 @@ > nsContainerFrame* mDocElementContainingBlock; > nsIFrame* mPageSequenceFrame; > + > + // FrameConstructionItem arena + list of freed items available for re-use. > + mozilla::ArenaAllocator<4096, 8> mFCItemPool; Thinking out loud about the memory usage of this new member: Given that we clean up this structure using ClearRetainOneChunk(), I think this will increase the typical "resting" per-nsCSSFrameConstructor memory usage by 4KB. I don't have a good feel for our current per-document memory usage... but I'm imagining the potentially-worst-case being a news site that has e.g. 300 tiny ad/tracker iframes. So if each iframe gets its own nsCSSFrameConstructor, this changeset will cause a 300 * 4KB = 1200KB = 1.2MB increase in resting memory usage for such a site. Correct? That feels reasonably small for a "~worst case" site, and feels like a reasonable tradeoff IMO.
Attachment #8884377 -
Flags: review?(dholbert) → review+
Comment 20•6 years ago
|
||
Comment on attachment 8884378 [details] [diff] [review] part 4 - Use AllocateFCItem/FreeFCItem exclusively for allocating FrameConstructionItems Review of attachment 8884378 [details] [diff] [review]: ----------------------------------------------------------------- Partial review of part 4 (haven't made it all the way through, but switching gears for a bit and want to post what I've got so far): ::: layout/base/nsCSSFrameConstructor.cpp @@ +13265,5 @@ > } > > void > nsCSSFrameConstructor::FrameConstructionItemList:: > +Iterator::AppendItemsToList(nsCSSFrameConstructor* aFctor, const Iterator& aEnd, Nit: you've got "aFctor" here with a lowercase "c", but it should be an uppercase C for consistency with aFCtor/mFCtor usages elsewhere in the patch. ::: layout/base/nsCSSFrameConstructor.h @@ +1102,5 @@ > + MOZ_COUNT_CTOR(FrameConstructionItemList); > + memset(mDesiredParentCounts, 0, sizeof(mDesiredParentCounts)); > + } > + > + // Not allocated from the stack, except as member of FrameConstructionItem. This comment is slightly confusing -- this struct *is* allocated on the stack as instances of its derived class AutoFrameConstructionItemList, right? Might be worth adding another line to this comment to mention that. @@ +1165,5 @@ > + , mFCtor(aFCtor) > + {} > + ~AutoFrameConstructionItemList() { Destructor(mFCtor); } > + private: > + nsCSSFrameConstructor* mFCtor; Let's make this a const pointer: nsCSSFrameConstructor* const mFCtor; ...to enforce & document that its pointer-value never changes during the lifetime of this object. Also, maybe worth asserting that this pointer is non-null in the (currently-empty) constructor? (both as a sanity-check and as documentation of the expectations here) @@ +1347,5 @@ > + ~AutoFrameConstructionItem() { mItem->Delete(mFCtor); } > + operator FrameConstructionItem&() { return *mItem; } > + private: > + nsCSSFrameConstructor* mFCtor; > + FrameConstructionItem* mItem; As above, let's make these nsCSSFrameConstructor* const FrameConstructionItem* const ...and consider asserting that mFCtor is non-null, in the constructor here.
Assignee | ||
Comment 21•6 years ago
|
||
(In reply to Daniel Holbert [:dholbert] from comment #19) Correct. I guess we could use a timer to kill the last chunk too, if say it hasn't been used in 30 seconds or so. I'll file follow-up bug on investigating that...
Comment 22•6 years ago
|
||
(In reply to Mats Palmgren (:mats) from comment #21) > Correct. I guess we could use a timer to kill the last chunk too, > if say it hasn't been used in 30 seconds or so. I'll file > follow-up bug on investigating that... Or an idle callback?
Assignee | ||
Comment 23•6 years ago
|
||
Comment on attachment 8884378 [details] [diff] [review] [diff] [review] part 4 - Use AllocateFCItem/FreeFCItem exclusively for allocating FrameConstructionItems > Nit: you've got "aFctor" here with a lowercase "c", Fixed. > + // Not allocated from the stack, except as member of FrameConstructionItem. > This comment is slightly confusing -- this struct *is* allocated > on the stack as instances of its derived class AutoFrameConstructionItemList, right? Right. It probably wasn't a good idea to have a long explanatory comment here though. So I moved that bit to the class doc instead: + /* A class representing a list of FrameConstructionItems. Instances of this + class are only created as AutoFrameConstructionItemList, or as a member + of FrameConstructionItem. */ And changed the dtor doc to: + // Prevent stack instances (except as AutoFrameConstructionItemList). which is more on point for why the dtor is protected. >Let's make this a const pointer: > nsCSSFrameConstructor* const mFCtor; Fixed. > Also, maybe worth asserting that this pointer is non-null Sure. > nsCSSFrameConstructor* const > FrameConstructionItem* const >...and consider asserting that mFCtor is non-null, in the constructor here. Fixed.
Comment 24•6 years ago
|
||
Comment on attachment 8884374 [details] [diff] [review] part 2 - Introduce ArenaAllocator::ClearRetainOneChunk() that Clears the arena but retains one chunk to avoid malloc/free churn when the arena is re-used a lot Review of attachment 8884374 [details] [diff] [review]: ----------------------------------------------------------------- Good idea! Looks good, just a few minor questions. Also can you update the tests to add this function? ::: xpcom/ds/ArenaAllocator.h @@ +110,5 @@ > + * > + * NB: This will not run destructors of any objects that were allocated from > + * the arena. > + */ > + void ClearRetainOneChunk() Maybe just calling this |Reset| would be good enough? I don't want to overthink this, so if you feel strongly about keeping the name as-is that's fine. @@ +120,5 @@ > + return; > + } > + auto first = a; > + a = a->next; > + while (a) { We could avoid this code duplication for freeing by doing something like: > mHead->next = a->next; > Clear(); I don't feel too strongly about this. @@ +131,5 @@ > + mHead.next = first; > + mCurrent = first; > + // Re-init first chunk to be empty. > + size_t size = first->header.tail - uintptr_t(first); > + new (first) ArenaChunk(size); We should mimic what we do in AllocateChunk [1] and use the KnownNotNull attribute and mark the memory for the sanitizers. One key difference is that the memory will not be junk filled like a fresh malloc is (nor poisoned like when we free). We might consider manually junk filling / poisoning it in this case. [1] http://searchfox.org/mozilla-central/rev/5dadcbe55b4ddd1e448c06c77390ff6483aa009b/xpcom/ds/ArenaAllocator.h#184-186
Attachment #8884374 -
Flags: review?(erahm) → feedback+
Comment 25•6 years ago
|
||
ni for Mats when he gets back so we don't lose this bug.
Flags: needinfo?(mats)
Comment 26•6 years ago
|
||
Comment on attachment 8884378 [details] [diff] [review] part 4 - Use AllocateFCItem/FreeFCItem exclusively for allocating FrameConstructionItems Review of attachment 8884378 [details] [diff] [review]: ----------------------------------------------------------------- Finished reviewing this. r- for now because of my concerns about Reset. ::: layout/base/nsCSSFrameConstructor.h @@ +870,2 @@ > public: > + void Destructor(nsCSSFrameConstructor* aFCtor) Two things: (1) I think this needs a rename -- it's quite confusing to have a method called "Destructor()" as well as the *actual* destructor, ~FrameConstructionItemList(). Particularly given that they're entirely separate -- i.e. neither one calls the other one. Maybe name this "Destroy"? or "PreDestructor"? Or something describing its actual operation? (though it does a few things at once) Something so that the term "the FrameConstructionItemList destructor" can go back to being unambiguous. :) (2) Could you move this method down to the "protected" section (along with the constructor & actual-destructor)? It doesn't seem like something that people should be arbitrarily calling on instances of this class. (Its current callsites all seem to have "protected" access, by virtue of being in this class, or a subclass, or a friend.) @@ +884,1 @@ > mUndisplayedItems[0].mStyleContext->PresContext()->FrameManager(); Question on the chunk that this ^^^ line is from -- I'll quote a bit more of its surrounding code (which isn't fully shown in the patch context): # // We could store the frame manager in a member, but just # // getting it off the style context is not too bad. # nsFrameManager *mgr = # mUndisplayedItems[0].mStyleContext->PresContext()->FrameManager(); I think this chunk doesn't make sense anymore -- the nsFrameManager is just our associated nsCSSFrameConstructor (which derives from nsFrameManager), right? So can we just get rid of those four lines and use aFCtor instead of "mgr"? (This removal might make sense as a followup patch if you want to avoid doing too much refactoring all at once.) @@ +892,5 @@ > + void Reset(nsCSSFrameConstructor* aFCtor) > + { > + Destructor(aFCtor); > + this->~FrameConstructionItemList(); > + new (this) FrameConstructionItemList(); Is it safe for this "Reset" function to be available on our AutoFrameConstructionItemList subclass? (as a public method, which could be called by anyone working with a "FrameConstructionItemList&" which may or may not be an instance of the subclass) Specifically: is it safe for someone to invoke this method & directly call our superclass destructor/construtor on an instance of AutoFrameConstructionItemList, without invoking the destructor/constructor for that subclass? That seems fragile, if not dangerous, in general... (Even if we made it protected/private, I'm not sure that really helps -- its callsite is from our Iterator class, which I suspect doesn't know which type of list it's iterating.) @@ +1155,5 @@ > bool mTriedConstructingFrames; > }; > > + /* A class representing a list of FrameConstructionItems on the stack. */ > + struct MOZ_RAII AutoFrameConstructionItemList final Nit: s/A class/A struct/ @@ +1320,5 @@ > bool mIsForSVGAElement:1; > > private: > + // Not allocated from the general heap - use AllocateFCItem/FreeFCItem. > + void* operator new(size_t) = delete; The instruction at the end of this comment -- "use AllocateFCItem/FreeFCItem" -- is probably suboptimal advice. (Nobody should actually directly call those functions -- instead they should use the blessed wrappers new/Delete, right?) Maybe instead: // Not allocated from the general heap - instead, use the new/Delete APIs // that take a nsCSSFrameConstructor* (which manages our arena allocation). @@ +1337,5 @@ > + MOZ_ASSERT(mChildItems.IsEmpty(), "leaking"); > + } > + }; > + > + struct MOZ_RAII AutoFrameConstructionItem final This struct needs some documentation. In particular, one thing that confused me initially: FCItemList/AutoFCItemList and FCItem/AutoFCItem have the same naming pattern, but they don't have the same relationship. The "list" structs are related to each other through inheritance, but the "item" structs are not. (Though, in both cases, the Auto version is castable to the non-Auto version.) Possible documentation: /** * Convenience struct to assist in managing a temporary FrameConstructionItem * using a local variable. Castable to FrameConstructionItem so that it can * be passed transparently to functions that expect that type. * (This struct exists because FrameConstructionItem is arena-allocated, and * it's nice to abstract away its allocation/deallocation.) */
Attachment #8884378 -
Flags: review?(dholbert) → review-
Assignee | ||
Comment 27•6 years ago
|
||
Part 2 with Eric's feedback addressed. Attaching it here for posterity, but I've decided not to use it.
Attachment #8884374 -
Attachment is obsolete: true
Flags: needinfo?(mats)
Assignee | ||
Comment 28•6 years ago
|
||
(In reply to Daniel Holbert [:dholbert] from comment #19) > Thinking out loud about the memory usage of this new member: Given that we > clean up this structure using ClearRetainOneChunk(), I think this will > increase the typical "resting" per-nsCSSFrameConstructor memory usage by > 4KB. I don't have a good feel for our current per-document memory usage... > but I'm imagining the potentially-worst-case being a news site that has e.g. > 300 tiny ad/tracker iframes. So if each iframe gets its own > nsCSSFrameConstructor, this changeset will cause a 300 * 4KB = 1200KB = > 1.2MB increase in resting memory usage for such a site. Correct? > > That feels reasonably small for a "~worst case" site, and feels like a > reasonable tradeoff IMO. Correct. I think in general that 4KB per shell is acceptable for a demonstratable performance improvement. However, I went back and measured how much of an improvement ClearRetainOneChunk really gives, and it's neglible. Even for huge pages like the single-page-html-spec it's just a 1ms improvement in wall clock time of the page load, which is in the 8 sec range. It might be better on some pages that generate (re-)frames very frequently but that (also) seems like an edge case. So I decided that, for now at least, just call Clear() instead. We can always add it later if we find it's valuable on some sites. Perhaps also with a timer that would remove the retained chunk if it isn't used within a certain time to avoid the memory bloat. (Forwarding r+ since it's just a s/ClearRetainOneChunk/Clear/ change.)
Attachment #8906106 -
Flags: review+
Assignee | ||
Updated•6 years ago
|
Attachment #8906096 -
Attachment description: part 2 - Introduce ArenaAllocator::ClearRetainOneChunk() that Clears the arena but retains one chunk to avoid malloc/free churn when the arena is re-used a lot → (not landed) Introduce ArenaAllocator::ClearRetainOneChunk() that Clears the arena but retains one chunk to avoid malloc/free churn when the arena is re-used a lot
Updated•6 years ago
|
Priority: -- → P3
Assignee | ||
Comment 29•6 years ago
|
||
(In reply to Daniel Holbert [:dholbert] from comment #26) > Maybe name this "Destroy"? or "PreDestructor"? > (2) Could you move this method down to the "protected" section Sure, I renamed it Destroy and made it protected. > # // We could store the frame manager in a member, but just > # // getting it off the style context is not too bad. > # nsFrameManager *mgr = > # mUndisplayedItems[0].mStyleContext->PresContext()->FrameManager(); > > I think this chunk doesn't make sense anymore -- the nsFrameManager is just > our associated nsCSSFrameConstructor (which derives from nsFrameManager), Removed as suggested. > Specifically: is it safe for someone to invoke this method & directly call > our superclass destructor/construtor on an instance of > AutoFrameConstructionItemList, without invoking the destructor/constructor > for that subclass? That seems fragile, if not dangerous, in general... Yes, I believe it should be safe also for that case. But yes, I agree it's a bit fragile. However, I believe frame construction will soon-ish be moved to Rust so perhaps we can live with this for a while. Also, it's extremely rare that someone changes these classes in any substantial way... > The instruction at the end of this comment -- "use > AllocateFCItem/FreeFCItem" -- is probably suboptimal advice. (Nobody should > actually directly call those functions -- instead they should use the > blessed wrappers new/Delete, right?) > > Maybe instead: > // Not allocated from the general heap - instead, use the new/Delete APIs > // that take a nsCSSFrameConstructor* (which manages our arena > allocation). Fixed as suggested. > > + struct MOZ_RAII AutoFrameConstructionItem final > > This struct needs some documentation. Fixed as suggested.
Attachment #8884377 -
Attachment is obsolete: true
Attachment #8884378 -
Attachment is obsolete: true
Attachment #8906157 -
Flags: review?(dholbert)
Comment 30•6 years ago
|
||
Comment on attachment 8906157 [details] [diff] [review] part 3 - Use AllocateFCItem/FreeFCItem exclusively for allocating FrameConstructionItems r=me
Attachment #8906157 -
Flags: review?(dholbert) → review+
Comment 31•6 years ago
|
||
We should probably track this for [qf]. (Note that it was filed for speedometer (comment 2, comment 5), and mats found there to be perf improvements at least with early versions of the patch (comment 6) -- a reduction from 100ms to 30ms (i.e. a 70ms improvement) in malloc/free time on speedometer.)
Whiteboard: [qf]
Comment 32•6 years ago
|
||
Pushed by mpalmgren@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/7ee14a111f8c part 1 - Move the mUndisplayedItems field to the start to avoid alignment spill after a bool. r=dholbert https://hg.mozilla.org/integration/mozilla-inbound/rev/8bc4d3b25dfb part 2 - Introduce nsCSSFrameConstructor::AllocateFCItem/FreeFCItem for allocating FrameConstructionItems from an arena/free list. r=dholbert https://hg.mozilla.org/integration/mozilla-inbound/rev/9dcca3ac0aaa part 3 - Use AllocateFCItem/FreeFCItem exclusively for allocating FrameConstructionItems. r=dholbert
![]() |
||
Comment 33•6 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/7ee14a111f8c https://hg.mozilla.org/mozilla-central/rev/8bc4d3b25dfb https://hg.mozilla.org/mozilla-central/rev/9dcca3ac0aaa
Status: NEW → RESOLVED
Closed: 6 years ago
status-firefox57:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla57
Updated•1 year ago
|
Performance Impact: --- → ?
Whiteboard: [qf]
You need to log in
before you can comment on or make changes to this bug.
Description
•