Closed Bug 1368369 Opened 3 years ago Closed 3 years ago

Use an nsTArray rather than AutoTArray to hold frame properties and embed FrameProperties directly in nsIFrame to reduce RAM footprint

Categories

(Core :: Layout, enhancement)

enhancement
Not set

Tracking

()

RESOLVED FIXED
mozilla55
Tracking Status
firefox55 --- fixed

People

(Reporter: jfkthame, Assigned: jfkthame)

References

Details

Attachments

(2 files, 2 obsolete files)

The FrameProperties object in bug 1365982 uses an AutoTArray<PropertyValue,1>, so that a single property can be stored in the FrameProperties object itself. When more than one property is present, an additional heap allocation will be needed to create a separate array contents buffer.

It might be worth making the autoarray buffer slightly larger, so that the need for an additional allocation becomes rarer, but of course there'll be a space trade-off to be considered.

I did some general browsing with an instrumented build (visiting major sites like wikipedia, ebay, amazon, nytimes, bbc news, gmail, ...). Altogether, about 170K frames of all types were created during the (short) session. Of these, 66% never had any frame properties set, and so no FrameProperties object was created for them.

Of the 34% (57K) frames that did get properties, about 60% only ever had a single property, and so it was stored in the autoarray's inline buffer, with no additional allocation.

A further 22% had two properties, and a further 8% had three. Higher numbers of properties on a frame, as would be expected, become increasingly rare.

This suggests that if we bumped the autoarray buffer size from 1 to 3 (increasing the size of FrameProperties, assuming a 64-bit build, from 32 to 64 bytes), we'd avoid separate array buffer allocation for about 90% of the frames that have any properties attached (up from 60% now).

Profiling suggests this may significantly reduce the time spent under FrameProperties methods. With a current trunk build, loading the HTML5 spec and filtering the profile for FrameProperties, I generally see around 18-20ms. With the autoarray buffer increased to 3, this drops to around 10-12ms.
Maybe worth doing, for the slight extra perf (see above)?
Attachment #8872190 - Flags: review?(mats)
Assignee: nobody → jfkthame
Status: NEW → ASSIGNED
Another way of saying that is that increasing the AutoTArray size to 3 makes an additional
10.2% (96.6% - 86.4%) of the total number of frames avoid one extra malloc during its lifetime?
Using your numbers:
66% of all frames have no properties
20.4% have 1 property   => 86.4%  (with <=1)
7.48% have 2 properties => 93.88% (with <=2)
2.72% have 3 properties => 96.6%  (with <=3)

I looked into the allocation size overhead a bit, which increases significantly with this change:

AutoTArray<PropertyValue,1>
0: 0 (i.e. I'm not including the 8 bytes for nsIFrame:mProperties in any numbers)
1: required_sz:16 actual_sz:32 mallocs:1 (overhead: 16 -- for 34199 frames that's 547184 bytes)
2: required_sz:32 actual_sz:96 mallocs:2 (overhead: 64 -- for 12539 frames that's 802496 bytes)
3: required_sz:48 actual_sz:96 mallocs:2 (overhead: 48 -- for 4559 frames that's 218832 bytes)
4: required_sz:64 actual_sz:160 mallocs:3 (overhead: 96)
5: required_sz:80 actual_sz:160 mallocs:3 (overhead: 80)
6: required_sz:96 actual_sz:160 mallocs:3 (overhead: 64)
7: required_sz:112 actual_sz:160 mallocs:3 (overhead: 48)
8: required_sz:128 actual_sz:288 mallocs:4 (overhead: 160)

AutoTArray<PropertyValue,3>
0: 0 (i.e. I'm not including the 8 bytes for nsIFrame:mProperties in any numbers)
1: required_sz:16 actual_sz:64 mallocs:1 (overhead: 48 -- for 34199 frames that's 1641552 bytes)
2: required_sz:32 actual_sz:64 mallocs:1 (overhead: 32 -- for 12539 frames that's 401248 bytes)
3: required_sz:48 actual_sz:64 mallocs:1 (overhead: 16 -- for 4559 frames that's 72944 bytes)
4: required_sz:64 actual_sz:192 mallocs:2 (overhead: 128)
5: required_sz:80 actual_sz:192 mallocs:2 (overhead: 112)
6: required_sz:96 actual_sz:160 mallocs:2 (overhead: 64)
7: required_sz:112 actual_sz:160 mallocs:2 (overhead: 48)
8: required_sz:128 actual_sz:288 mallocs:3 (overhead: 160)

The "overhead" here is the sum of moz_malloc_usable_size for the allocations made
minus the sizeof(mProperties[0]) * number of properties), which is also an
*underestimation* since we're wasting 8 bytes for the key for each value,
but for comparing N=1 vs N=3 this doesn't matter much.

So, currently for N=1, we have: total overhead 1568512, 68395 mallocs
and for N=3, we have: total overhead 2115744, 51297 mallocs.
So with an additional 547232 bytes overhead, we save 17098 mallocs.
(only counting frames with 3 properties or less for which you gave frame counts)

I'm not sure these stats helps much to make a decision, but I wanted to save
them for the record anyway.


> ... loading the HTML5 spec and filtering the profile for FrameProperties,
> I generally see around 18-20ms. With the autoarray buffer increased to 3,
> this drops to around 10-12ms.

While it's an impressive improvement in terms of time spent in FrameProperties,
we're still "just" saving 8ms for a page load that takes about 6s, so I'm
kind of leaning towards it's not worth it given the additional memory overhead.

Can you provide more details about the page load improvement for some important
web site(s)?  How much memory is typically allocated for that page and what is
the typical load time? how many operations is performed for each of the sizes?
how much FrameProperties time did we save by using N=3?
This would make it easier to see if the memory cost is worth it or not.
Flags: needinfo?(jfkthame)
Actually, what I think we should do, to make it easier to explore this, is to address the fact that currently frame properties are not accounted for at all in about:memory. :( Prior to bug 1365982, the size of the FramePropertyTable was included in the layout/pres-contexts figure, but now that the properties are no longer managed by the presContext, that's not the case. But they're also not accounted for by the individual frames, which are just reported via the arena stats.

So we need to add a reporter/mechanism that handles the FrameProperties. I'll file a separate bug on this.
Flags: needinfo?(jfkthame)
Depends on: 1368654
Another possibility here, actually, would be to replace the AutoTArray with a plain nsTArray, and then in nsIFrame, make the mProperties field directly hold a FrameProperties object instead of a pointer.

The size of FrameProperties would just be the size of an nsTArray (i.e. the same as a pointer); a frame with no properties would do no additional allocations; when the first property is added, the nsTArray would allocate storage (where currently we create the FrameProperties object at that time); as more properties are added, array storage allocation just continues as usual.

Exactly how this compares (in speed and memory use) to the AutoTArray<PropertyValue,1> version would depend on details of how array allocation is managed, but it should be pretty close. In simple profiling, it seems to show more time spent in FrameProperties methods; but I suspect that's misleading, because the current code has the overhead of checking "if (!mProperties) ..." in the nsIFrame accessors (so it doesn't show up under FrameProperties in the profile), whereas the nsTArray version can drop these checks; in effect, they happen implicitly inside the FrameProperties methods.
Here's the alternative approach mentioned above. I rather like the simplification this gives. Better memory reporting would be helpful in evaluating the trade-offs here, though, before making any actual decisions
Using the patch in bug 1368654, here are snippets from about:memory showing frame-properties memory usage for a few typical pages. (These initial results are with current trunk code where FrameProperties has an AutoTArray<PropertyValue,1>.)

eBay search results page
│  │  │   │  │   ├────133,792 B (00.06%) ── frame-properties [4]
nytimes article
│   │  │  │  │   ├─────69,008 B (00.04%) ── frame-properties [3]
bbc news home page
│   │  │   │  ├─────76,784 B (00.04%) ── frame-properties
amazon search results
│   │  │  │   ├────403,120 B (00.33%) ── frame-properties
wikipedia article
│  │  │   ├────489,712 B (00.50%) ── frame-properties


Updating FrameProperties to use AutoTArray<PropertyValue,3>, and loading the same pages, we get:

eBay
│   │  │  │  │   ├────206,784 B (00.10%) ── frame-properties [4]
nytimes
│   │  │  │  │   ├────111,904 B (00.06%) ── frame-properties [2]
bbc
│      │   │  ├────117,232 B (00.07%) ── frame-properties
amazon
│   │  │  │   ├────447,472 B (00.36%) ── frame-properties
wikipedia
│  │  │   ├────818,832 B (00.83%) ── frame-properties

(The test pages may not be identical, due to search results changing between runs, etc., but should be pretty comparable.)

So that's a pretty substantial increase in RAM footprint, which I don't think looks worthwhile for the sake of a millisecond here or there.

OTOH, replacing the AutoTArray with an nsTArray<PropertyValue>, as in the alternative patch above (comment 5) is much more interesting:

eBay
│   │  │  │   ├─────60,032 B (00.03%) ── frame-properties [4]
nytimes
│   │  │  │  │   ├─────26,416 B (00.01%) ── frame-properties [2]
bbc
│   │  │   │  ├─────36,320 B (00.02%) ── frame-properties
amazon
│   │  │  │   ├────150,160 B (00.12%) ── frame-properties
wikipedia
│   │  │   ├────169,104 B (00.17%) ── frame-properties

Given this, I think it's clear we should switch to the nsTArray implementation and get rid of the AutoTArray overhead altogether.
Summary: Consider increasing the autoarray buffer size in FrameProperties, to reduce number of separate allocations → Use an nsTArray rather than AutoTArray to hold frame properties and embed FrameProperties directly in nsIFrame to reduce RAM footprint
Attachment #8872606 - Attachment is obsolete: true
You might want to re-measure, see bug 1368654 comment 3. :-)
(In reply to Mats Palmgren (:mats) from comment #8)
> You might want to re-measure, see bug  comment 3. :-)

Indeed, thanks for catching that!

Re-measuring with the part 2 patch from bug 1368654 included:

Current (FrameProperties uses AutoTArray<PropertyValue,1>):

eBay
│   │  │  │  │   ├────151,968 B (00.08%) ── frame-properties [4]
nytimes
│   │  │  │   ├─────71,808 B (00.04%) ── frame-properties
bbc news
│   │  │   │  │  ├─────95,392 B (00.05%) ── frame-properties
amazon
│   │  │  │   ├────264,928 B (00.23%) ── frame-properties
wikipedia
│   │  │   ├────488,928 B (00.47%) ── frame-properties


AutoTArray<PropertyValue,3>:

eBay
│  │  │   │  │   ├────174,976 B (00.08%) ── frame-properties [4]
nytimes
│   │  │  │   ├─────92,288 B (00.04%) ── frame-properties
bbc news
│   │  │   │  │  ├─────95,936 B (00.04%) ── frame-properties
amazon
│   │  │  │   ├────371,776 B (00.32%) ── frame-properties
wikipedia
│   │  │   ├────647,936 B (00.63%) ── frame-properties


nsTArray<PropertyValue>:

eBay
│   │  │  │  │   ├────118,400 B (00.07%) ── frame-properties [4]
nytimes
│   │  │  │  │   ├─────54,688 B (00.03%) ── frame-properties
bbc news
│   │  │   │  │  ├─────68,960 B (00.04%) ── frame-properties
amazon
│   │  │  │   ├────222,880 B (00.19%) ── frame-properties
wikipedia
│   │  │   ├────366,112 B (00.36%) ── frame-properties
Attachment #8872190 - Attachment is obsolete: true
Attachment #8872190 - Flags: review?(mats)
Comment on attachment 8872706 [details] [diff] [review]
Use nsTArray instead of AutoTArray in FrameProperties, and embed the FrameProperties object directly in nsIFrame

> -    MOZ_ASSERT(mMaxLength > 0, "redundant FrameProperties!");

It looks like this was the only use of mMaxLength, so we can
now remove it?

Also, I was looking through the Set/Add/Remove/DeleteInternal
methods in the .cpp file and I think those are trivial enough
to be moved to the .h file to give the compiler the chance to
optimize these calls better.  It will then see all the way from
call site -> nsIFrame -> FrameProperties -> nsTArray.
(Alternatively, add a FramePropertiesInlines.h file if you
think it's too much code for the .h file)
Feel free to tack that on as a part 2 here if you agree.
Attachment #8872706 - Flags: review?(mats) → review+
Added patch to move the internal methods to the header, as suggested. (I've also updated the first patch locally to remove mMaxLength, now redundant.)
Attachment #8873078 - Flags: review?(mats)
Comment on attachment 8873078 [details] [diff] [review]
part 2 - Move internal FrameProperties methods to the header as inlines, for better optimization opportunities

Thanks, r=mats

>+  void DeleteAll(const nsIFrame* aFrame) {
>+    for (auto& prop : mProperties) {
>+      prop.DestroyValueFor(aFrame);

Perhaps this is overkill, but it wouldn't hurt to assert that Length()
doesn't change while we iterate here.  Something like this?:

#ifdef DEBUG
    size_t len = mProperties.Length();
#endif
    for (auto& prop : mProperties) {
      prop.DestroyValueFor(aFrame);
      MOZ_ASSERT(len == mProperties.Length());
Attachment #8873078 - Flags: review?(mats) → review+
https://hg.mozilla.org/integration/mozilla-inbound/rev/4abbee1979c215ffbf126534d45bf1ca76c0b29b
Bug 1368369 - pt 1 - Use nsTArray instead of AutoTArray in FrameProperties, and embed the FrameProperties object directly in nsIFrame. r=mats

https://hg.mozilla.org/integration/mozilla-inbound/rev/0ea05f32de96073ce623e0831c0f79a65daebda0
Bug 1368369 - pt 2 - Move internal FrameProperties methods to the header as inlines, for better optimization opportunities. r=mats
https://hg.mozilla.org/mozilla-central/rev/4abbee1979c2
https://hg.mozilla.org/mozilla-central/rev/0ea05f32de96
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
You need to log in before you can comment on or make changes to this bug.