Closed Bug 1364805 Opened 7 years ago Closed 7 years ago

Add a nsIFrame::mClass field with the concrete frame class ID to devirtualize various frame methods

Categories

(Core :: Layout, enhancement, P2)

enhancement

Tracking

()

RESOLVED FIXED
mozilla55
Tracking Status
firefox55 --- fixed

People

(Reporter: MatsPalmgren_bugz, Assigned: MatsPalmgren_bugz)

References

(Blocks 2 open bugs)

Details

(Keywords: perf)

Attachments

(4 files)

In bug 1360241 we added mType to devirtualize GetType(), which is great.
In bug 1362886 we tried to devirtualize IsLeaf() by piggybacking a
table lookup based on this field, which did not end well.  We could
simply add a bit for that, but there are other things we want to
devirtualize like IsFrameOfType, do_QueryFrame etc and storing what
is mostly static class data on every frame for each of those seems
wasteful.

In this bug I will replace mType with a mClass field (also a byte) to
store the unique nsQueryFrame::FrameIID which every concrete frame class
has (it's used for allocating it from the pres shell arena).
This can be used to devirtualize all these methods using a table-based
lookup.

I'll refactor Type() to use this field so that it becomes
"return sLayoutFrameTypes[mClass]".
Keywords: perf
Blocks: 1364813
Blocks: 1364815
Mostly just boilerplate changes.  The relevant change is to nsFrame.h.
Attachment #8867979 - Flags: review?(jfkthame)
The interesting changes are in nsIFrame.h, nsFrame.* and nsQueryFrame.h.
I'm adding a nsQueryFrame::ClassID enum class and a nsIFrame::mClass
field of that type.  I'm also adding a kClass static const to every
concrete frame class, and then propagating that value up the ctor
chain to nsIFrame.  I've also added an assertion in nFrame::Init:
  MOZ_ASSERT(nsQueryFrame::FrameIID(mClass) == GetFrameId());
which should catch any errors.

(I'm intentionally keeping mType for now, it'll be removed in part 3)
Attachment #8867980 - Flags: review?(jfkthame)
This adds the Type() values for each concrete frame class in nsFrameIdList.h
so that we can generate an array with those values to look them up
in Type(), which becomes "return sLayoutFrameTypes[mClass]".

Fwiw, I did a Try run that asserted that mType == sLayoutFrameTypes[mClass]
before I removed that field and it passed all regression tests.
Attachment #8867982 - Flags: review?(jfkthame)
This sounds reasonable, just wondered if you have been able to see any impact on profiles or is it all so low-level that it gets lost in the noise?
I suspect that each of these suggested improvements (see depends list) is probably
too small to move the needle on something like Tp5.  I've seen them show up in
perf logs, and I think others (including Ehsan) have too; usually in the ~0.1%
range though.  I do think it's worth optimizing though, since it's pretty easy
to do and even small improvements eventually adds up.
I'll try to do a talos Try run with all of them when I get the chance...
How does the cost of looking up a global variable compare to the cost of a virtual function call?  (Does this vary by OS, depending on how the adjustments for loading libraries at different base addresses are handled?)
Comment on attachment 8867979 [details] [diff] [review]
part 1 - Make every concrete frame class be a NS_DECL_QUERYFRAME_TARGET.

Review of attachment 8867979 [details] [diff] [review]:
-----------------------------------------------------------------

I confess I didn't individually check all the boilerplate changes!
Attachment #8867979 - Flags: review?(jfkthame) → review+
Comment on attachment 8867980 [details] [diff] [review]
part 2 - Add a nsIFrame::mClass field and propagate the concrete class' value up the ctor chain.

Review of attachment 8867980 [details] [diff] [review]:
-----------------------------------------------------------------

::: layout/forms/nsFormControlFrame.h
@@ +24,5 @@
>      * Main constructor
>      * @param aContent the content representing this frame
>      * @param aParentFrame the parent frame
>      */
> +  explicit nsFormControlFrame(nsStyleContext*, nsIFrame::ClassID);

We could drop the 'explicit' here (and in other places where a 1-arg constructor is becoming a 2-arg one), though I guess it's harmless to leave it.

::: layout/generic/nsIFrame.h
@@ +606,4 @@
>  
>    NS_DECL_QUERYFRAME_TARGET(nsIFrame)
>  
> +  explicit nsIFrame(ClassID aID, mozilla::LayoutFrameType aType)

Drop 'explicit'.

(Looks like there are quite a few more places we could do this; I'm not noting them individually.)
Attachment #8867980 - Flags: review?(jfkthame) → review+
Comment on attachment 8867982 [details] [diff] [review]
part 3 - Implement Type() by indexing a static array using the mClass field.  Remove the nsIFrame::mType field.

Review of attachment 8867982 [details] [diff] [review]:
-----------------------------------------------------------------

::: layout/generic/nsIFrame.h
@@ +2643,5 @@
>     *
>     * @see mozilla::LayoutFrameType
>     */
> +  mozilla::LayoutFrameType Type() const {
> +    MOZ_ASSERT(uint8_t(mClass) < MOZ_ARRAY_LENGTH(sLayoutFrameTypes));

s/MOZ_ARRAY_LENGTH/mozilla::ArrayLength/, I think.
Attachment #8867982 - Flags: review?(jfkthame) → review+
Comment on attachment 8867980 [details] [diff] [review]
part 2 - Add a nsIFrame::mClass field and propagate the concrete class' value up the ctor chain.

Review of attachment 8867980 [details] [diff] [review]:
-----------------------------------------------------------------

::: layout/generic/nsIFrame.h
@@ +606,4 @@
>  
>    NS_DECL_QUERYFRAME_TARGET(nsIFrame)
>  
> +  explicit nsIFrame(ClassID aID, mozilla::LayoutFrameType aType)

Actually, given that patch 4 will then remove the second (original) param, you might as well leave this alone at this stage.
Attachment #8867983 - Flags: review?(jfkthame) → review+
(In reply to David Baron :dbaron: ⌚️UTC-7 from comment #8)
> How does the cost of looking up a global variable compare to the cost of a
> virtual function call?  (Does this vary by OS, depending on how the
> adjustments for loading libraries at different base addresses are handled?)

Intuitively, I'd expect the global variable to be slightly cheaper; both need to look something up -- the global var or the class's vtable -- but the global variable doesn't involve a PC jump, so instruction cache behavior should be better. But the combination of optimizing compilers and CPU features like branch prediction can easily play havoc with my intuition about stuff...!
(In reply to David Baron :dbaron: ⌚️UTC-4 from comment #8)
> How does the cost of looking up a global variable compare to the cost of a
> virtual function call?  (Does this vary by OS, depending on how the
> adjustments for loading libraries at different base addresses are handled?)

I posted some performance analysis in bug 1362886 comment 17.
It seems to me that an array lookup is 1.3-2.0 times faster than a virtual call,
even under ideal conditions for virtual calls (no I-cache misses).
Pushed by mpalmgren@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/45bf72e9b3b3
part 1 - Make every concrete frame class be a NS_DECL_QUERYFRAME_TARGET.  r=jfkthame
https://hg.mozilla.org/integration/mozilla-inbound/rev/47ed59045f9b
part 2 - Add a nsIFrame::mClass field and propagate the concrete class' value up the ctor chain.  r=jfkthame
https://hg.mozilla.org/integration/mozilla-inbound/rev/9164c8933697
part 3 - Implement Type() by indexing a static array using the mClass field.  Remove the nsIFrame::mType field.  r=jfkthame
https://hg.mozilla.org/integration/mozilla-inbound/rev/78e989ad55af
part 4 - Remove the now unused LayoutFrameType values from the ctors.  r=jfkthame
According to AreWeFastYet.com, this contributed a 20% improvement to speedometer-misc-Preact-TodoMVC-Adding100Items-sync.

https://arewefastyet.com/#machine=17&view=single&suite=speedometer-misc&subtest=Preact-TodoMVC-Adding100Items-sync
Also some nice improvements in speedometer-misc-VueJS-TodoMVC-Adding100Items-sync:

https://arewefastyet.com/#machine=17&view=single&suite=speedometer-misc&subtest=VueJS-TodoMVC-Adding100Items-sync
Whiteboard: [sp3]
Whiteboard: [sp3]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: