Closed Bug 104336 Opened 23 years ago Closed 23 years ago

Eliminate nsIRuleNode, nsIRuleWalker; inline some of nsIFrame

Categories

(Core :: CSS Parsing and Computation, defect)

x86
Windows 2000
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla0.9.6

People

(Reporter: hyatt, Assigned: hyatt)

References

(Blocks 1 open bug)

Details

(Keywords: perf)

Attachments

(2 files)

As established in performance work (both profiling internally and with what we
know from the Intel guys), we know that the virtual function overhead of
GetStyleData is killing us.  This bug proposes to remedy the problem by
eliminating nsIStyleContext and nsIRuleNode.

Here is the proposed strategy for doing so:
(1) We need to implement a smart pointer class that can ref count objects
without all of the extra COM baggage.  dbaron proposed the name "nsRefPtr." 
This new pointer will support all the same ref-counting mechanisms as nsCOMPtr,
e.g., getter_AddRefs.
(2) Armed with this pointer, we eliminate nsIStyleContext and nsIRuleNode from
the tree and eliminate their virtual functions, i.e., in nsStyleContext and
nsRuleNode the functions become non-virtual.  The rule node and style context
files move into content/shared so that they can be used by both layout and content.
(3) GetStyleData is expanded to take the pres context as an argument, so that
the mPresContext member variable of rule nodes can be eliminated (thus enabling
brutal sharing of rule trees, which will give us a footprint win in the chrome).
(4) We eliminate nsIFrame::GetStyleData, and make sure people go through the
style context instead.
Status: NEW → ASSIGNED
Keywords: perf
Target Milestone: --- → mozilla0.9.7
(4) I'm not sure whether we want to eliminate nsIFrame::GetStyleData or do the
same transition from nsIFrame to an nsAFrame that's more like nsAString --
having some concrete methods (perhaps all inline, at least at first, to avoid
linkage problems?).

(2) Also, an alternative to moving these things into content/shared would be to
move all the style stuff back into layout, where I think it belongs -- after
all, layout is a CSS-based layout engine.  This would probably require
redesigning attribute mapping, which might not be a bad thing anyway.

(3) I've been wondering whether some of the parts of the pres context that
describe longer-lived state (e.g., per-medium) should be on a longer lived
object, or whether the lifetimes of the pres context and pres shell should be
different.  But that's a different bug entirely.
(1) Thanks to the overloading of getter_AddRefs (there's one that means the same
thing as dont_AddRef), we couldn't actually use the name getter_AddRefs. 
Although we could change nsCOMPtr to eliminate that overloading and deprecate
|template <class T> inline const already_AddRefed<T> getter_AddRefs( T* aRawPtr
)| in favor of |dont_AddRef|, which is something I wouldn't mind doing anyway...
So a simple first pass for the frame problem would be to move nsFrame's members
into nsIFrame and then un-virtualize and inline some of the common functions
(like GetStyleData and GetContent).  Then nsIFrame's GetStyleData wouldn't have
to be eliminated, since the vptr overhead has been eliminated.  Then as time
permits we could make the name change from nsIFrame to nsAFrame.

Moving data members into nsIFrame and making non-virtual functions? I think this
is a bad idea - I am not ready to abandon the interface yet, especially since we
have not formulated out plans for extensible frames, user-implemtned frames etc.
I'm not sure about this - it seems like just eliminating GetStyleData from the
nsIFrame would be better. On the other hand, if we want to rearchitect nsIFrame
and nsFrame then that is cool, but that is a big analysis and design job. See
bug 51684
Is there a hack that we could do cheaply to verify that we'd see gain here? (Not
to check in -- just to verify.)

attinasi: this showed up as _the_ hottest spot (in the entire browser!) in terms
of the number of missed branch predictions. IOW, vtables are screwing us badly here.
I think before we worry about the bath water or the baby, we need to finish the
bathing process and scrub our performance-baby till it shines.  nsIFrame is not
a proper interface, so not a viable extension mechanism at present or in any
near term (mozilla1.0).  The COM costs are bad enough when paid within a DLL,
where there is no credible need, plan, or hope for extensibility via public
XPCOM APIs.  It's even sillier for not-quite-COM (NQCOM) code that gives us all
the costs and none of the gains, apart from maybe QI.

We need to factor out ref-counting and an nsRefPtr so code that currently uses
the XPCOM hammer on all nails, but that would rather use ref-counted concrete
classes and non-virtual methods, can do so.

Our performance story is hurt in lots of ways by gratuitous XPCOM and NQCOM
usage.  L2 cache miss rate is too high on P4, in addition to mispredicted
branches, although that may be due more to data traffic than to gratuitously
many (due to virtual methods, or otherwise) instructions.  We're not only taking
too many cycles compared to IE (due to misprediction stalls, cache misses, etc.)
-- we're simply issuing too many instructions per page load.  Code size bloat is
another target.

Let a sensible de-COMtamination process begin!

/be
Can we get vtune data out somewhere where we can poke over it?  (Not that I
think there's anything but "win" in pushing common handling code into a common
nsIFrame base class.)
XBL also uses way too much COM.  The problem I see here is that we have many
objects (in layout and content) that only implement one interface, e.g.,
nsIStyleContext/nsStyleContext, nsIRuleNode/nsRuleNode,
nsIXBLBinding/nsXBLBinding) and that have no subclasses.  These objects are not
extensible, and they are used by only two DLLs, layout and content.  All of
these objects should be converted away from COM.

As for layout's extensibility story (i.e., XLCs or extensible layout
components), I wrote up a design for how such an extensibility story would work
based off our layout technical meeting for Vidur and Dave Barrowman.  This story
does not involve exposing nsIFrame.  In this design, nsIFrame becomes nsAFrame
and is not exposed outside of the layout system.  

nsIFrame should never morph into anything extensible.  There will always be a
need for a fast internal representation of our frame tree, our style data, and
our XBL bindings, and that representation should contain no COM.  

There should be no hesitation over eliminating nsIFrame or any worries that our
extensibility story will suffer if nsIFrame acquires concrete non-virtual
methods.  Believe me, it won't.
For the record, I'm not _against_ removing nsIFrame - I just have not given it
much thought and I'm really really (painfully?) cautious :)  I also do not have
the benefit of reading Hyatt's doc on XLC's (hint hint).

Also, given what waterson said about this being...

"_the_ hottest spot (in the entire browser!) in terms of the number of missed
branch predictions"

What does thime mean in terms of real, user-perceived performance problems? I
guess that is the 'hack' Chris was referring to.
Blocks: 71668
I cut 4-5% off page load time (cached pages tested) by moving nsFrame's member
variables into nsIFrame and then inlining all getter/setter functions on those
member variables.

I've also fully deCOMtaminated nsRuleNode and done some cleanup work there.  The
perf win from this was pretty negligible, but the footprint win is quite large
(shaving off 8 bytes per rule node, since I can ditch the nsIRuleNode and
mRefCnt bytes).
Target Milestone: mozilla0.9.7 → mozilla0.9.6
Summary: Eliminate nsIStyleContext and nsIRuleNode → Eliminate nsIRuleNode, nsIRuleWalker, inline nsIFrame funcs
This bug is ready for r/sr.
Do you have a diff of the old nsRuleNode.cpp and the new one (since they moved
locations)?
Blocks: deCOM
Loks like nsIFrame needs to be renamed in this patch since it is not an
interface but a class.

Rather than keep the nsIFrame class at all, why not eliminate it entirely, make
the methods you care about inline in nsFrame, and then eliminate the references
to nsIFrame (or typedef it to nsFrame so you don't have to whack every file in
Layout)? I don't think that anyone besides nsFrame inherits from nsIFrame, and
nsIFrame adds nothing to the object model (or am I wrong?). Also, what happens
on platforms that cannot sucessfully inline the larger inline methods? Will we
get linkage problems?
Compilers that can't inline them will just bloat each compilation unit with a
copy of an out-of-line function, maybe folding them to shared versions at link time.

Waterson: didn't you say that the Intel guys preferred out-of-line stuff to
avoid blowing the instruction cache?
They didn't seem to like inline functions, but I think that dislike really only
makes sense for large inline functions.  So maybe we need to recombine layout
and content sooner rather than later?  Perhaps a better first pass would be just
the nsIRuleNode and nsIRuleWalker changes?


BTW, this looks like an accidental backout of Bernd's rv cleanup:

@@ -4115,10 +3971,8 @@
   if (!view)
   {
     nsresult rv = GetParentWithView(aPresContext, &parent);
-    if (NS_FAILED(rv)) 
-      return rv;
-    if (!parent) 
-      return NS_ERROR_FAILURE;
+    if (!parent || NS_FAILED(rv))
+      return rv?rv:NS_ERROR_FAILURE;
     parent->GetView(aPresContext,&view);
   }
 
@@ -4149,11 +4003,8 @@
   {
     nsIFrame *parent;//might be THIS frame thats ok
     nsresult rv = GetParentWithView(aPresContext, &parent);
-    if (NS_FAILED(rv)) 
-      return rv;
-    if (!parent) 
-      return NS_ERROR_FAILURE;
-
+    if (!parent || NS_FAILED(rv))
+      return rv?rv:NS_ERROR_FAILURE;
     parent->GetView(aPresContext,&view);
   }
   nsCOMPtr<nsIViewManager> viewMan;

I cannot rename nsIFrame without patching all of the call sites.  The problem is
that nsIFrame is used all over the code base like it is an interface (e.g., with
COM ptrs, etc.).  A simple typedef won't cut it.

The task of eliminating nsIFrame as an actual interface is a huge wrist-burner,
and there's no way I can do it.  I still recommend that my patch go in as is,
since it gives an immediate boost to our performance.  

Arguments of XPCOM elegance don't wash with me, given that nsIFrame is already
not a real interface anyway.
XPCOM elegance be damned ;) Seriously, we can just make nsFrame extend
nsISupports, get rid of nsIFrame and update all of the callers to use nsFrame. I
volunteer to do the grunt work of updating the callers, and that can be done
after this is committed if we use a typedef for now.

Removing the intermediary (what was nsIFrame) will simplify the model, allow
this bitchin' performance gain, and make it easier to explain and understand
(nsIFrame was a mutant anyway, not supporting AddRef and Release).

Dave agreed to rip out nsIFrame and just make nsFrame extend nsISupports. Then
we'll export nsFrame.h, typedef nsIFrame to nsFrame, and file a bug on me to
change references to nsIFrame to nsFrame after this lands. All in favor, say 'Aye'.
AYE! :)
Aye!  And then disinherit nsFrame from nsISupports, eh?

/be
Some frames are still using QI (and maybe even AddRef / Release?). Hopefully
those frames can just implement nsIsupports thenselves and not get it from nsFrame.
If some frames are using AddRef and Release, aren't they in a lot of trouble
already?
Err, so I spent a while attempting the typedef and ended up giving up.  It's
pretty complex, and I'm fairly convinced that without doing the remerge of the
DLLs that there are going to be linkage issues.

So I'm just going to make this patch be about rule node and rule walker and
request a review of only those changes.  I give up on the frame thing even
though it provided most of the perf gain.  It's just too much typing.

Summary: Eliminate nsIRuleNode, nsIRuleWalker, inline nsIFrame funcs → Eliminate nsIRuleNode, nsIRuleWalker
Can I get r/sr just on the rule node/rule walker changes?  I'd like to move that
along independent of the frame stuff.
Is there a URL or other reference to this data about mispredicts and such?

FYI, I have been working on a system to automatically devirtualize XPCOM
interface calls when some simple class analysis shows there is only one
implementation of a given interface method. It has the advantage that no source
code changes are required, so tradeoffs between extensibility and performance
can be chosen at build time. It's mostly a hack to the IDL compiler. No C++
compiler changes are required, although to catch some big fish I added a
Java-like "final" method attribute to gcc3. It currently "works" but I need to
spend more time to make it really solid, and I have higher priority work
blocking it until end of November.
Is there a URL or other reference to this data about mispredicts and such?

FYI, I have been working on a system to automatically devirtualize XPCOM
interface calls when some simple class analysis shows there is only one
implementation of a given interface method. It has the advantage that no source
code changes are required, so tradeoffs between extensibility and performance
can be chosen at build time. It's mostly a hack to the IDL compiler. No C++
compiler changes are required, although to catch some big fish I added a
Java-like "final" method attribute to gcc3. It currently "works" but I need to
spend more time to make it really solid, and I have higher priority work
blocking it until end of November.
For the sake of symmetry, you should add a static Create method to nsRuleList
(which inline calls |operator new|), and make |operator new| and |operator
delete| protected. Destroy should also be static; e.g.,

  nsRuleList::Destroy(mChildren, mPresContext);

Otherwise, looks fine. r= or sr=waterson.

Comment on attachment 54079 [details] [diff] [review]
A diff only of nsRuleNode.cpp

r= or sr=waterson
Attachment #54079 - Flags: review+
Actually, I'm going to ask again if it's ok to land this as is.  I think the
typedef of nsIFrame is rather complex (and bears some thought about what should
be done with all the extra methods on nsFrame).  Can't we take the performance
boost now?  I can even add a great big "// XXX" comment to nsIFrame explaining
that it is not a real interface and that it will be going away real soon now...

Thoughts?


Well, as much as I'd like to see this fixed the ``right way'', I'd rather take
the performance gains (and then hold the line on them) than wait for the sweater
to unravel (i.e., getting rid of nsIFrame, which depends on merging content and
layout, which depends on us having a decent extensibility story, etc.).
In fact, we could even start adding something like:

"// XXX deCOMtamination in progress"

or come up with some standard comment, since I expect this will start happening 
to a lot of interfaces as we start decomifying them.
Comments on this patch:

  nsStyleContext::nsStyleContext, nsStyleContext::RemoveChild, and
  nsStyleContext::FindChildWithRules should just say |if
  (ruleNode->IsRoot())| rather than assigning to an |isRoot| variable.

  Ditto for |isAtRoot| somewhere in nsStyleSet.cpp

  Ditto for |atRoot| in nsHTMLCSSStyleSheet.cpp, twice.

  The indentation in html/style/public/Makefile.in and
  html/style/src/Makefile.in is wrong -- you probably need to use tabs
  rather than spaces.  (Same for html/style/src/makefile.win.)  And
  content/shared/src/Makefile.in.  And content/shared/src/makefile.win
  (the REQUIRES change).

  In nsIFrame::GetStyleData and GetStyle, could you change the null
  check on |mStyleContext| to an assertion, and just dereference
  mStyleContext?

  Could you add a comment at the top of nsIFrame.h saying that nsIFrame
  is not really a COM interface, and it is in transition from being a
  NQCOM interface to either being collapsed with nsFrame or being just
  the publicly accessible base class of nsFrame?

  Could you not undo Bernd's changes to nsFrame.cpp (the last 2 changes
  in the diff)?

  In nsRuleWalker.h, could you switch the order of the member
  initializers so they don't cause massive numbers of warnings on gcc?

  In nsRuleNode.cpp, the following code:

+  if (!mParent && mRootChildren)
+    mRootChildren->Enumerate(ClearCachedDataInSubtreeHelper, (void*)aRule);
+  else if (mParent && mChildren)
+    for (nsRuleList* curr = mChildren; curr; curr = curr->mNext)
+      curr->mRuleNode->ClearCachedDataInSubtree(curr->mRuleNode->mRule);

  would probably compile to something simpler when written as:

+  if (!mParent) {
+    if (mRootChildren)
+      mRootChildren->Enumerate(ClearCachedDataInSubtreeHelper, (void*)aRule);
+  } else
+    for (nsRuleList* curr = mChildren; curr; curr = curr->mNext)
+      curr->mRuleNode->ClearCachedDataInSubtree(curr->mRuleNode->mRule);

Have you done leak tests?  You should probably instrument nsRuleNode and
nsRuleWalker with MOZ_COUNT_[C|D]TOR.

With those comments fixed, and some simple leak tests done, r=dbaron.


Further cleanup that should happen after this lands:

  Make nsIFrame not inherit from nsISupports (but still have QI).

  Make nsRuleNode::Transition return its result.
  Make nsIStyleContext::GetRuleNode return its result. (?)

  Make nsIFrame::SetParent take an |nsIFrame*| (look ma, no const!).

  Removing many of the |nsresult| return types.
Summary: Eliminate nsIRuleNode, nsIRuleWalker → Eliminate nsIRuleNode, nsIRuleWalker; inline some of nsIFrame
> I can even add a great big "// XXX" comment to nsIFrame explaining
> that it is not a real interface and that it will be going away real soon now...
> Thoughts?

OK, let's take the perf boost. Do we need to open a bug or something to track
the progress of the dependant work (merging of layout and content) and make sure
the follow up gets done?



Er, I am not happy at all hearing all this talk about remerging content and
layout... There were reasons for it and they have not changed (decoupling things
that should not be coupled and preparing for a DOM/content module that stands
independent of layout, to be used on the server for example). Also it has made
the life easier for people who work on content or layout on a daily basis
because the time to compile content is only about half of the old layout (same
for the new layout when compared to old layout). I would be ok to move style
back to layout, though.
Fixed.  A bug has been opened on the remerging of content/layout filed to
dbaron.   Let's debate that over there rather than in this bug.  

I have the decomify XBL bug.  We still need to file a new bug for the followup
work to nsIFrame.  Marc, you want to file that on yourself?
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
filed a bug on followup work to nsIFrame (bug 107021)
merge contnet and layout dll is bug 106161, dp is working on it.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: