For example loading the single page HTML spec spends ~3% in the slow hashtable lookups related to FramePropertyTable. Could we perhaps move some properties to be member variables of nsIFrame or some kind of slots structure? (This bug might be a dup of some other bug I've filed years ago)
Do I recall correctly that dholbert is looking at some layout perf issues. This bug is about the generic overhead.
I've already started working on this actually... I'm gathering some data on which properties are used on which types of frames and the average count and such things...
The other thing we do quite a bit (and could do more, depending on where the perf problems are and what the usage patterns are) is guard lookups in the frame property table with frame state bits.
Created attachment 8848871 [details] frame property stats on html.spec.whatwg.org Notes: This is from a debug build, so ignore DebugInitialOverflowPropertyApplied. These properties are in use at the time PresShell::Destroy() is called, i.e. any transient properties, for example only used during reflow, are not captured here. Those are likely *in addition* to these properties. Data for each PresShell is separated by: ======================================== <URL> Thus, data for the main document starts at: ======================================== https://html.spec.whatwg.org/
Created attachment 8848929 [details] frame property stats on html.spec.whatwg.org This is from a Linux Opt build.
ALL: 385756 frames, 214606 (55.6%) had properties; total 264015 properties The two frame properties that stand out are: OverflowAreasProperty (type nsOverflowAreas): on 63383 frames (29.5%) BidiDataProperty (type mozilla::SmallValueHolder<mozilla::FrameBidiData>): on 166957 frames (77.8%) (the percentages at the end are relative the frames with one or more properties (55.6%), so roughly half that number of all frames) I filed bug 1348697 to investigate OverflowAreasProperty. The number of BidiDataProperty is just way too much... :-( Why do we need that many? This document doesn't contain much RTL text does it?
(In reply to Mats Palmgren (:mats) from comment #6) > The number of BidiDataProperty is just way too much... :-( > Why do we need that many? > This document doesn't contain much RTL text does it? AIUI, it doesn't matter whether it contains "much". The only thing matters is whether it contains any. If there is any, bidi algorithm would be triggered for all content and fill probably every inline frame with BidiDataProperty. I'm not sure whether it's worth doing a finer-grained bidi detection.
(In reply to Xidorn Quan [:xidorn] (UTC+10) from comment #7) > (In reply to Mats Palmgren (:mats) from comment #6) > > The number of BidiDataProperty is just way too much... :-( > > Why do we need that many? > > This document doesn't contain much RTL text does it? > > AIUI, it doesn't matter whether it contains "much". The only thing matters > is whether it contains any. If there is any, bidi algorithm would be > triggered for all content and fill probably every inline frame with > BidiDataProperty. As soon as we encounter the first bidi character in a textnode, all bets are lost :( <http://searchfox.org/mozilla-central/rev/9af9b1c7946ebef6056d2ee4c368d496395cf1c8/dom/base/nsGenericDOMDataNode.cpp#357>
To Mats per comment 2
Given that the hashtable lookup seems to be a substantial part of the cost here, I wonder if one option would be to hang the list of properties directly off nsIFrame? At the cost of adding one pointer to nsIFrame, we'd avoid the hashtable search entirely and go directly to the individual frame's PropertyValue [array]. Re BidiDataProperty: this is a 3-byte struct, and as it happens, there are currently 3 bytes of padding in nsIFrame, so we could store the BidiDataProperty value directly in nsIFrame without increasing its size. (Better still, of course, if we could avoid the need for it altogether in more cases, but that may be hard...)
Something to figure out too: What is the size of typical nsIFrame implementations? How do they fit into FrameArena and does FrameArena use jemalloc the optimal way? Do we possibly have some spare bytes there?