Closed Bug 1374112 Opened 7 years ago Closed 7 years ago

FrameConstructionItemList is malloc heavy

Categories

(Core :: Layout, defect, P3)

defect

Tracking

()

RESOLVED FIXED
mozilla57
Performance Impact ?
Tracking Status
firefox57 --- fixed

People

(Reporter: smaug, Assigned: MatsPalmgren_bugz)

References

(Blocks 1 open bug)

Details

Attachments

(4 files, 3 obsolete files)

1.85 KB, patch
dholbert
: review+
Details | Diff | Splinter Review
3.82 KB, patch
Details | Diff | Splinter Review
4.88 KB, patch
MatsPalmgren_bugz
: review+
Details | Diff | Splinter Review
40.17 KB, patch
dholbert
: review+
Details | Diff | Splinter Review
dtor calls delete several items.
Could SegmentedVector be used here, allocate items from the vector and not one by one using new?
Or if SegmentedVector doesn't work, perhaps allocate the items from an arena and recycle the objects?
When running Speedometer, the list has usually 0 or 1 entries, but there are cases when there are more than 200 entries.
Or maybe make FrameConstructionItem a movable type and just use nsTArray<FrameConstructionItem>?  Trying to improve the memory locality of these objects might be helpful.
I've started investigating this a bit... using a dedicated arena + free list.

Olli, what's the STR for reproducing the performance problem exactly?
Assignee: nobody → mats
Flags: needinfo?(bugs)
I was just running speedometer v2.
Flags: needinfo?(bugs)
OK, here's some statistics for FrameConstructionItem memory
allocation/deallocation while running http://speedometer2.benj.me/

~47000 items are allocated in total during the run,
the high-water mark of items in use is 323.
Measuring the associated "malloc/free" time only, it takes
~100ms in total, out of roughly 3 min 40 seconds it takes
to run this test to completion.

With my patch that does arena allocation + free list, it takes
~30ms in total instead.

So for this test I don't see why FrameConstructionItem allocation
is an issue.  It's just 0.045% of the time it takes to run the test.


However, on a different test: loading the single-page HTML spec
(https://html.spec.whatwg.org/) the numbers are a bit different:

~355000 items allocated in total, 20130 the high-water mark.
~60ms in total, out of roughly 6 seconds it takes to render this
page (from a saved copy), or 1% is spent on [de]allocating items.
With my patch the [de]allocation takes ~16ms instead.

So, it looks like the [de]allocation got about four times faster.
Note, I'm only measuring the time of the actual [de]allocation calls,
so there might be some other wins, e.g. avoiding jemalloc lock
contention (if there is any) on other threads that I'm not seeing
with these measurements.  All my testing is on 64-bit Opt Linux build,
so the numbers might be different on other platforms.  (I'll try to
measure it on OSX too, which I think has relatively slow memory
allocation, IIRC.)
How do overall memory usage look with the patches?  Did we end up fragmenting more or less?
I'd be happy to measure it if you can suggest a tool (on OSX or Linux) that can measure that.
As expected, the malloc/free performance is worse on OSX, so the relative
improvement is better there: before ~160ms, after ~31ms.
(In reply to Mats Palmgren (:mats) from comment #8)
> I'd be happy to measure it if you can suggest a tool (on OSX or Linux) that
> can measure that.

I was just think about:memory.  Minimize and then measure to see if the RSS/USS are dramatically different.

You could also do an awsy try run with your patches.
Is the arena in your patches a static or per window/TabGroup?

I wonder if this would cause any problems with quantum DOM in the mix.
(In reply to Ben Kelly [reviewing, but slowly][:bkelly] from comment #10)
> I was just think about:memory.  Minimize and then measure to see if the
> RSS/USS are dramatically different.

I don't see any significant change there.  Note that the memory used for
FrameConstructionItems is entirely temporary - once the frame tree is
created they all go away... unless we decide to explicitly retain some
of that memory for performance.  This is under our control.  Currently,
I'm retaining one arena chunk, which currently is 16kb.

(In reply to Ben Kelly [reviewing, but slowly][:bkelly] from comment #11)
> Is the arena in your patches a static or per window/TabGroup?

It's one arena per nsCSSFrameConstructor instance, which is "one per document" roughly.

> I wonder if this would cause any problems with quantum DOM in the mix.

I don't know anything about quantum DOM so I can't answer this.

I'll attach a WIP patch so you can test it yourself if you want...
Comment on attachment 8884374 [details] [diff] [review]
part 2 - Introduce ArenaAllocator::ClearRetainOneChunk() that Clears the arena but retains one chunk to avoid malloc/free churn when the arena is re-used a lot

(FYI, this is exactly as Clear(), but retains one chunk.)
Apparently, Visual C++ has a bug b/c it won't accept:
  void operator delete(void*) = delete;
even though it's never used.
Attachment #8884378 - Flags: review?(dholbert)
Attachment #8884372 - Flags: review?(dholbert) → review+
Comment on attachment 8884377 [details] [diff] [review]
part 3 - Introduce nsCSSFrameConstructor::AllocateFCItem/FreeFCItem for allocating FrameConstructionItems from an arena/free list

Review of attachment 8884377 [details] [diff] [review]:
-----------------------------------------------------------------

r=me. Nothing to change, just two thoughts:

::: layout/base/nsCSSFrameConstructor.cpp
@@ +13367,5 @@
> +  return item;
> +}
> +
> +void
> +nsCSSFrameConstructor::FreeFCItem(FrameConstructionItem* aItem)

BTW: I reviewed the AllocFCItem/FreeFCItem impls here by comparing them to the similar impls in TimerThread.cpp:
https://dxr.mozilla.org/mozilla-central/rev/20f32734df750bddada9d1edca665c2ea53946f0/xpcom/threads/TimerThread.cpp#213-240

Based on that, they look good.

::: layout/base/nsCSSFrameConstructor.h
@@ +2180,5 @@
>    nsContainerFrame*   mDocElementContainingBlock;
>    nsIFrame*           mPageSequenceFrame;
> +
> +  // FrameConstructionItem arena + list of freed items available for re-use.
> +  mozilla::ArenaAllocator<4096, 8> mFCItemPool;

Thinking out loud about the memory usage of this new member: Given that we clean up this structure using ClearRetainOneChunk(), I think this will increase the typical "resting" per-nsCSSFrameConstructor memory usage by 4KB. I don't have a good feel for our current per-document memory usage... but I'm imagining the potentially-worst-case being a news site that has e.g. 300 tiny ad/tracker iframes.  So if each iframe gets its own nsCSSFrameConstructor, this changeset will cause a 300 * 4KB = 1200KB = 1.2MB increase in resting memory usage for such a site.  Correct?

That feels reasonably small for a "~worst case" site, and feels like a reasonable tradeoff IMO.
Attachment #8884377 - Flags: review?(dholbert) → review+
Comment on attachment 8884378 [details] [diff] [review]
part 4 - Use AllocateFCItem/FreeFCItem exclusively for allocating FrameConstructionItems

Review of attachment 8884378 [details] [diff] [review]:
-----------------------------------------------------------------

Partial review of part 4 (haven't made it all the way through, but switching gears for a bit and want to post what I've got so far):

::: layout/base/nsCSSFrameConstructor.cpp
@@ +13265,5 @@
>  }
>  
>  void
>  nsCSSFrameConstructor::FrameConstructionItemList::
> +Iterator::AppendItemsToList(nsCSSFrameConstructor* aFctor, const Iterator& aEnd,

Nit: you've got "aFctor" here with a lowercase "c", but it should be an uppercase C for consistency with aFCtor/mFCtor usages elsewhere in the patch.

::: layout/base/nsCSSFrameConstructor.h
@@ +1102,5 @@
> +      MOZ_COUNT_CTOR(FrameConstructionItemList);
> +      memset(mDesiredParentCounts, 0, sizeof(mDesiredParentCounts));
> +    }
> +
> +    // Not allocated from the stack, except as member of FrameConstructionItem.

This comment is slightly confusing -- this struct *is* allocated on the stack as instances of its derived class AutoFrameConstructionItemList, right?

Might be worth adding another line to this comment to mention that.

@@ +1165,5 @@
> +      , mFCtor(aFCtor)
> +    {}
> +    ~AutoFrameConstructionItemList() { Destructor(mFCtor); }
> +  private:
> +    nsCSSFrameConstructor* mFCtor;

Let's make this a const pointer:
  nsCSSFrameConstructor* const mFCtor;
...to enforce & document that its pointer-value never changes during the lifetime of this object.

Also, maybe worth asserting that this pointer is non-null in the (currently-empty) constructor? (both as a sanity-check and as documentation of the expectations here)

@@ +1347,5 @@
> +    ~AutoFrameConstructionItem() { mItem->Delete(mFCtor); }
> +    operator FrameConstructionItem&() { return *mItem; }
> +  private:
> +    nsCSSFrameConstructor* mFCtor;
> +    FrameConstructionItem* mItem;

As above, let's make these 
 nsCSSFrameConstructor* const
 FrameConstructionItem* const

...and consider asserting that mFCtor is non-null, in the constructor here.
(In reply to Daniel Holbert [:dholbert] from comment #19)

Correct.  I guess we could use a timer to kill the last chunk too,
if say it hasn't been used in 30 seconds or so.  I'll file
follow-up bug on investigating that...
(In reply to Mats Palmgren (:mats) from comment #21)
> Correct.  I guess we could use a timer to kill the last chunk too,
> if say it hasn't been used in 30 seconds or so.  I'll file
> follow-up bug on investigating that...

Or an idle callback?
Comment on attachment 8884378 [details] [diff] [review] [diff] [review]
part 4 - Use AllocateFCItem/FreeFCItem exclusively for allocating FrameConstructionItems

> Nit: you've got "aFctor" here with a lowercase "c",

Fixed.

> +    // Not allocated from the stack, except as member of FrameConstructionItem.

> This comment is slightly confusing -- this struct *is* allocated
> on the stack as instances of its derived class AutoFrameConstructionItemList, right?

Right.  It probably wasn't a good idea to have a long explanatory
comment here though.  So I moved that bit to the class doc instead:

+  /* A class representing a list of FrameConstructionItems.  Instances of this
+     class are only created as AutoFrameConstructionItemList, or as a member
+     of FrameConstructionItem. */

And changed the dtor doc to:

+    // Prevent stack instances (except as AutoFrameConstructionItemList).

which is more on point for why the dtor is protected.

>Let's make this a const pointer:
>  nsCSSFrameConstructor* const mFCtor;

Fixed.

> Also, maybe worth asserting that this pointer is non-null

Sure.

> nsCSSFrameConstructor* const
> FrameConstructionItem* const
>...and consider asserting that mFCtor is non-null, in the constructor here.

Fixed.
Comment on attachment 8884374 [details] [diff] [review]
part 2 - Introduce ArenaAllocator::ClearRetainOneChunk() that Clears the arena but retains one chunk to avoid malloc/free churn when the arena is re-used a lot

Review of attachment 8884374 [details] [diff] [review]:
-----------------------------------------------------------------

Good idea! Looks good, just a few minor questions. Also can you update the tests to add this function?

::: xpcom/ds/ArenaAllocator.h
@@ +110,5 @@
> +   *
> +   * NB: This will not run destructors of any objects that were allocated from
> +   * the arena.
> +   */
> +  void ClearRetainOneChunk()

Maybe just calling this |Reset| would be good enough? I don't want to overthink this, so if you feel strongly about keeping the name as-is that's fine.

@@ +120,5 @@
> +      return;
> +    }
> +    auto first = a;
> +    a = a->next;
> +    while (a) {

We could avoid this code duplication for freeing by doing something like:

> mHead->next = a->next;
> Clear();

I don't feel too strongly about this.

@@ +131,5 @@
> +    mHead.next = first;
> +    mCurrent = first;
> +    // Re-init first chunk to be empty.
> +    size_t size = first->header.tail - uintptr_t(first);
> +    new (first) ArenaChunk(size);

We should mimic what we do in AllocateChunk [1] and use the KnownNotNull attribute and mark the memory for the sanitizers.

One key difference is that the memory will not be junk filled like a fresh malloc is (nor poisoned like when we free). We might consider manually junk filling / poisoning it in this case.

[1] http://searchfox.org/mozilla-central/rev/5dadcbe55b4ddd1e448c06c77390ff6483aa009b/xpcom/ds/ArenaAllocator.h#184-186
Attachment #8884374 - Flags: review?(erahm) → feedback+
ni for Mats when he gets back so we don't lose this bug.
Flags: needinfo?(mats)
Comment on attachment 8884378 [details] [diff] [review]
part 4 - Use AllocateFCItem/FreeFCItem exclusively for allocating FrameConstructionItems

Review of attachment 8884378 [details] [diff] [review]:
-----------------------------------------------------------------

Finished reviewing this. r- for now because of my concerns about Reset.

::: layout/base/nsCSSFrameConstructor.h
@@ +870,2 @@
>    public:
> +    void Destructor(nsCSSFrameConstructor* aFCtor)

Two things:
 (1) I think this needs a rename -- it's quite confusing to have a method called "Destructor()" as well as the *actual* destructor, ~FrameConstructionItemList().  Particularly given that they're entirely separate -- i.e. neither one calls the other one.

Maybe name this "Destroy"? or "PreDestructor"?  Or something describing its actual operation? (though it does a few things at once)  Something so that the term "the FrameConstructionItemList destructor" can go back to being unambiguous. :)

 (2) Could you move this method down to the "protected" section (along with the constructor & actual-destructor)?  It doesn't seem like something that people should be arbitrarily calling on instances of this class.  (Its current callsites all seem to have "protected" access, by virtue of being in this class, or a subclass, or a friend.)

@@ +884,1 @@
>            mUndisplayedItems[0].mStyleContext->PresContext()->FrameManager();

Question on the chunk that this ^^^ line is from -- I'll quote a bit more of its surrounding code (which isn't fully shown in the patch context):
 #       // We could store the frame manager in a member, but just
 #       // getting it off the style context is not too bad.
 #       nsFrameManager *mgr =
 #         mUndisplayedItems[0].mStyleContext->PresContext()->FrameManager();

I think this chunk doesn't make sense anymore -- the nsFrameManager is just our associated nsCSSFrameConstructor (which derives from nsFrameManager), right?  So can we just get rid of those four lines and use aFCtor instead of "mgr"?

(This removal might make sense as a followup patch if you want to avoid doing too much refactoring all at once.)

@@ +892,5 @@
> +    void Reset(nsCSSFrameConstructor* aFCtor)
> +    {
> +      Destructor(aFCtor);
> +      this->~FrameConstructionItemList();
> +      new (this) FrameConstructionItemList();

Is it safe for this "Reset" function to be available on our AutoFrameConstructionItemList subclass? (as a public method, which could be called by anyone working with a "FrameConstructionItemList&" which may or may not be an instance of the subclass)

Specifically: is it safe for someone to invoke this method & directly call our superclass destructor/construtor on an instance of AutoFrameConstructionItemList, without invoking the destructor/constructor for that subclass?  That seems fragile, if not dangerous, in general...

(Even if we made it protected/private, I'm not sure that really helps -- its callsite is from our Iterator class, which I suspect doesn't know which type of list it's iterating.)

@@ +1155,5 @@
>      bool mTriedConstructingFrames;
>    };
>  
> +  /* A class representing a list of FrameConstructionItems on the stack. */
> +  struct MOZ_RAII AutoFrameConstructionItemList final

Nit: s/A class/A struct/

@@ +1320,5 @@
>      bool mIsForSVGAElement:1;
>  
>    private:
> +    // Not allocated from the general heap - use AllocateFCItem/FreeFCItem.
> +    void* operator new(size_t) = delete;

The instruction at the end of this comment -- "use AllocateFCItem/FreeFCItem" -- is probably suboptimal advice. (Nobody should actually directly call those functions -- instead they should use the blessed wrappers new/Delete, right?)

Maybe instead:
    // Not allocated from the general heap - instead, use the new/Delete APIs
    // that take a nsCSSFrameConstructor* (which manages our arena allocation).

@@ +1337,5 @@
> +      MOZ_ASSERT(mChildItems.IsEmpty(), "leaking");
> +    }
> +  };
> +
> +  struct MOZ_RAII AutoFrameConstructionItem final

This struct needs some documentation.

In particular, one thing that confused me initially: FCItemList/AutoFCItemList and FCItem/AutoFCItem have the same naming pattern, but they don't have the same relationship. The "list" structs are related to each other through inheritance, but the "item" structs are not.  (Though, in both cases, the Auto version is castable to the non-Auto version.)

Possible documentation:
  /**
   * Convenience struct to assist in managing a temporary FrameConstructionItem
   * using a local variable. Castable to FrameConstructionItem so that it can
   * be passed transparently to functions that expect that type.
   * (This struct exists because FrameConstructionItem is arena-allocated, and
   * it's nice to abstract away its allocation/deallocation.)
   */
Attachment #8884378 - Flags: review?(dholbert) → review-
Part 2 with Eric's feedback addressed.  Attaching it here for posterity,
but I've decided not to use it.
Attachment #8884374 - Attachment is obsolete: true
Flags: needinfo?(mats)
(In reply to Daniel Holbert [:dholbert] from comment #19)
> Thinking out loud about the memory usage of this new member: Given that we
> clean up this structure using ClearRetainOneChunk(), I think this will
> increase the typical "resting" per-nsCSSFrameConstructor memory usage by
> 4KB. I don't have a good feel for our current per-document memory usage...
> but I'm imagining the potentially-worst-case being a news site that has e.g.
> 300 tiny ad/tracker iframes.  So if each iframe gets its own
> nsCSSFrameConstructor, this changeset will cause a 300 * 4KB = 1200KB =
> 1.2MB increase in resting memory usage for such a site.  Correct?
> 
> That feels reasonably small for a "~worst case" site, and feels like a
> reasonable tradeoff IMO.

Correct.  I think in general that 4KB per shell is acceptable for
a demonstratable performance improvement.  However, I went back and
measured how much of an improvement ClearRetainOneChunk really gives,
and it's neglible.  Even for huge pages like the single-page-html-spec
it's just a 1ms improvement in wall clock time of the page load, which
is in the 8 sec range.  It might be better on some pages that generate
(re-)frames very frequently but that (also) seems like an edge case.
So I decided that, for now at least, just call Clear() instead.

We can always add it later if we find it's valuable on some sites.
Perhaps also with a timer that would remove the retained chunk
if it isn't used within a certain time to avoid the memory bloat.

(Forwarding r+ since it's just a s/ClearRetainOneChunk/Clear/ change.)
Attachment #8906106 - Flags: review+
Attachment #8906096 - Attachment description: part 2 - Introduce ArenaAllocator::ClearRetainOneChunk() that Clears the arena but retains one chunk to avoid malloc/free churn when the arena is re-used a lot → (not landed) Introduce ArenaAllocator::ClearRetainOneChunk() that Clears the arena but retains one chunk to avoid malloc/free churn when the arena is re-used a lot
Priority: -- → P3
(In reply to Daniel Holbert [:dholbert] from comment #26)
> Maybe name this "Destroy"? or "PreDestructor"? 
>  (2) Could you move this method down to the "protected" section

Sure, I renamed it Destroy and made it protected.

>  #       // We could store the frame manager in a member, but just
>  #       // getting it off the style context is not too bad.
>  #       nsFrameManager *mgr =
>  #         mUndisplayedItems[0].mStyleContext->PresContext()->FrameManager();
> 
> I think this chunk doesn't make sense anymore -- the nsFrameManager is just
> our associated nsCSSFrameConstructor (which derives from nsFrameManager),

Removed as suggested.

> Specifically: is it safe for someone to invoke this method & directly call
> our superclass destructor/construtor on an instance of
> AutoFrameConstructionItemList, without invoking the destructor/constructor
> for that subclass?  That seems fragile, if not dangerous, in general...

Yes, I believe it should be safe also for that case.
But yes, I agree it's a bit fragile.  However, I believe frame construction
will soon-ish be moved to Rust so perhaps we can live with this for a while.
Also, it's extremely rare that someone changes these classes in any substantial way...

> The instruction at the end of this comment -- "use
> AllocateFCItem/FreeFCItem" -- is probably suboptimal advice. (Nobody should
> actually directly call those functions -- instead they should use the
> blessed wrappers new/Delete, right?)
> 
> Maybe instead:
>     // Not allocated from the general heap - instead, use the new/Delete APIs
>     // that take a nsCSSFrameConstructor* (which manages our arena
> allocation).

Fixed as suggested.

> > +  struct MOZ_RAII AutoFrameConstructionItem final
> 
> This struct needs some documentation.

Fixed as suggested.
Attachment #8884377 - Attachment is obsolete: true
Attachment #8884378 - Attachment is obsolete: true
Attachment #8906157 - Flags: review?(dholbert)
Comment on attachment 8906157 [details] [diff] [review]
part 3 - Use AllocateFCItem/FreeFCItem exclusively for allocating FrameConstructionItems

r=me
Attachment #8906157 - Flags: review?(dholbert) → review+
We should probably track this for [qf].

(Note that it was filed for speedometer (comment 2, comment 5), and mats found there to be perf improvements at least with early versions of the patch (comment 6) -- a reduction from 100ms to 30ms (i.e. a 70ms improvement) in malloc/free time on speedometer.)
Whiteboard: [qf]
Pushed by mpalmgren@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/7ee14a111f8c
part 1 - Move the mUndisplayedItems field to the start to avoid alignment spill after a bool.  r=dholbert
https://hg.mozilla.org/integration/mozilla-inbound/rev/8bc4d3b25dfb
part 2 - Introduce nsCSSFrameConstructor::AllocateFCItem/FreeFCItem for allocating FrameConstructionItems from an arena/free list.  r=dholbert
https://hg.mozilla.org/integration/mozilla-inbound/rev/9dcca3ac0aaa
part 3 - Use AllocateFCItem/FreeFCItem exclusively for allocating FrameConstructionItems.  r=dholbert
Performance Impact: --- → ?
Whiteboard: [qf]
You need to log in before you can comment on or make changes to this bug.