Closed Bug 104336 Opened 23 years ago Closed 23 years ago

Eliminate nsIRuleNode, nsIRuleWalker; inline some of nsIFrame

Categories

(Core :: CSS Parsing and Computation, defect)

x86
Windows 2000
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla0.9.6

People

(Reporter: hyatt, Assigned: hyatt)

References

(Blocks 1 open bug)

Details

(Keywords: perf)

Attachments

(2 files)

As established in performance work (both profiling internally and with what we know from the Intel guys), we know that the virtual function overhead of GetStyleData is killing us. This bug proposes to remedy the problem by eliminating nsIStyleContext and nsIRuleNode. Here is the proposed strategy for doing so: (1) We need to implement a smart pointer class that can ref count objects without all of the extra COM baggage. dbaron proposed the name "nsRefPtr." This new pointer will support all the same ref-counting mechanisms as nsCOMPtr, e.g., getter_AddRefs. (2) Armed with this pointer, we eliminate nsIStyleContext and nsIRuleNode from the tree and eliminate their virtual functions, i.e., in nsStyleContext and nsRuleNode the functions become non-virtual. The rule node and style context files move into content/shared so that they can be used by both layout and content. (3) GetStyleData is expanded to take the pres context as an argument, so that the mPresContext member variable of rule nodes can be eliminated (thus enabling brutal sharing of rule trees, which will give us a footprint win in the chrome). (4) We eliminate nsIFrame::GetStyleData, and make sure people go through the style context instead.
Status: NEW → ASSIGNED
Keywords: perf
Target Milestone: --- → mozilla0.9.7
(4) I'm not sure whether we want to eliminate nsIFrame::GetStyleData or do the same transition from nsIFrame to an nsAFrame that's more like nsAString -- having some concrete methods (perhaps all inline, at least at first, to avoid linkage problems?). (2) Also, an alternative to moving these things into content/shared would be to move all the style stuff back into layout, where I think it belongs -- after all, layout is a CSS-based layout engine. This would probably require redesigning attribute mapping, which might not be a bad thing anyway. (3) I've been wondering whether some of the parts of the pres context that describe longer-lived state (e.g., per-medium) should be on a longer lived object, or whether the lifetimes of the pres context and pres shell should be different. But that's a different bug entirely.
(1) Thanks to the overloading of getter_AddRefs (there's one that means the same thing as dont_AddRef), we couldn't actually use the name getter_AddRefs. Although we could change nsCOMPtr to eliminate that overloading and deprecate |template <class T> inline const already_AddRefed<T> getter_AddRefs( T* aRawPtr )| in favor of |dont_AddRef|, which is something I wouldn't mind doing anyway...
So a simple first pass for the frame problem would be to move nsFrame's members into nsIFrame and then un-virtualize and inline some of the common functions (like GetStyleData and GetContent). Then nsIFrame's GetStyleData wouldn't have to be eliminated, since the vptr overhead has been eliminated. Then as time permits we could make the name change from nsIFrame to nsAFrame.
Moving data members into nsIFrame and making non-virtual functions? I think this is a bad idea - I am not ready to abandon the interface yet, especially since we have not formulated out plans for extensible frames, user-implemtned frames etc. I'm not sure about this - it seems like just eliminating GetStyleData from the nsIFrame would be better. On the other hand, if we want to rearchitect nsIFrame and nsFrame then that is cool, but that is a big analysis and design job. See bug 51684
Is there a hack that we could do cheaply to verify that we'd see gain here? (Not to check in -- just to verify.) attinasi: this showed up as _the_ hottest spot (in the entire browser!) in terms of the number of missed branch predictions. IOW, vtables are screwing us badly here.
I think before we worry about the bath water or the baby, we need to finish the bathing process and scrub our performance-baby till it shines. nsIFrame is not a proper interface, so not a viable extension mechanism at present or in any near term (mozilla1.0). The COM costs are bad enough when paid within a DLL, where there is no credible need, plan, or hope for extensibility via public XPCOM APIs. It's even sillier for not-quite-COM (NQCOM) code that gives us all the costs and none of the gains, apart from maybe QI. We need to factor out ref-counting and an nsRefPtr so code that currently uses the XPCOM hammer on all nails, but that would rather use ref-counted concrete classes and non-virtual methods, can do so. Our performance story is hurt in lots of ways by gratuitous XPCOM and NQCOM usage. L2 cache miss rate is too high on P4, in addition to mispredicted branches, although that may be due more to data traffic than to gratuitously many (due to virtual methods, or otherwise) instructions. We're not only taking too many cycles compared to IE (due to misprediction stalls, cache misses, etc.) -- we're simply issuing too many instructions per page load. Code size bloat is another target. Let a sensible de-COMtamination process begin! /be
Can we get vtune data out somewhere where we can poke over it? (Not that I think there's anything but "win" in pushing common handling code into a common nsIFrame base class.)
XBL also uses way too much COM. The problem I see here is that we have many objects (in layout and content) that only implement one interface, e.g., nsIStyleContext/nsStyleContext, nsIRuleNode/nsRuleNode, nsIXBLBinding/nsXBLBinding) and that have no subclasses. These objects are not extensible, and they are used by only two DLLs, layout and content. All of these objects should be converted away from COM. As for layout's extensibility story (i.e., XLCs or extensible layout components), I wrote up a design for how such an extensibility story would work based off our layout technical meeting for Vidur and Dave Barrowman. This story does not involve exposing nsIFrame. In this design, nsIFrame becomes nsAFrame and is not exposed outside of the layout system. nsIFrame should never morph into anything extensible. There will always be a need for a fast internal representation of our frame tree, our style data, and our XBL bindings, and that representation should contain no COM. There should be no hesitation over eliminating nsIFrame or any worries that our extensibility story will suffer if nsIFrame acquires concrete non-virtual methods. Believe me, it won't.
For the record, I'm not _against_ removing nsIFrame - I just have not given it much thought and I'm really really (painfully?) cautious :) I also do not have the benefit of reading Hyatt's doc on XLC's (hint hint). Also, given what waterson said about this being... "_the_ hottest spot (in the entire browser!) in terms of the number of missed branch predictions" What does thime mean in terms of real, user-perceived performance problems? I guess that is the 'hack' Chris was referring to.
Blocks: 71668
I cut 4-5% off page load time (cached pages tested) by moving nsFrame's member variables into nsIFrame and then inlining all getter/setter functions on those member variables. I've also fully deCOMtaminated nsRuleNode and done some cleanup work there. The perf win from this was pretty negligible, but the footprint win is quite large (shaving off 8 bytes per rule node, since I can ditch the nsIRuleNode and mRefCnt bytes).
Target Milestone: mozilla0.9.7 → mozilla0.9.6
Summary: Eliminate nsIStyleContext and nsIRuleNode → Eliminate nsIRuleNode, nsIRuleWalker, inline nsIFrame funcs
This bug is ready for r/sr.
Do you have a diff of the old nsRuleNode.cpp and the new one (since they moved locations)?
Blocks: deCOM
Loks like nsIFrame needs to be renamed in this patch since it is not an interface but a class. Rather than keep the nsIFrame class at all, why not eliminate it entirely, make the methods you care about inline in nsFrame, and then eliminate the references to nsIFrame (or typedef it to nsFrame so you don't have to whack every file in Layout)? I don't think that anyone besides nsFrame inherits from nsIFrame, and nsIFrame adds nothing to the object model (or am I wrong?). Also, what happens on platforms that cannot sucessfully inline the larger inline methods? Will we get linkage problems?
Compilers that can't inline them will just bloat each compilation unit with a copy of an out-of-line function, maybe folding them to shared versions at link time. Waterson: didn't you say that the Intel guys preferred out-of-line stuff to avoid blowing the instruction cache?
They didn't seem to like inline functions, but I think that dislike really only makes sense for large inline functions. So maybe we need to recombine layout and content sooner rather than later? Perhaps a better first pass would be just the nsIRuleNode and nsIRuleWalker changes? BTW, this looks like an accidental backout of Bernd's rv cleanup: @@ -4115,10 +3971,8 @@ if (!view) { nsresult rv = GetParentWithView(aPresContext, &parent); - if (NS_FAILED(rv)) - return rv; - if (!parent) - return NS_ERROR_FAILURE; + if (!parent || NS_FAILED(rv)) + return rv?rv:NS_ERROR_FAILURE; parent->GetView(aPresContext,&view); } @@ -4149,11 +4003,8 @@ { nsIFrame *parent;//might be THIS frame thats ok nsresult rv = GetParentWithView(aPresContext, &parent); - if (NS_FAILED(rv)) - return rv; - if (!parent) - return NS_ERROR_FAILURE; - + if (!parent || NS_FAILED(rv)) + return rv?rv:NS_ERROR_FAILURE; parent->GetView(aPresContext,&view); } nsCOMPtr<nsIViewManager> viewMan;
I cannot rename nsIFrame without patching all of the call sites. The problem is that nsIFrame is used all over the code base like it is an interface (e.g., with COM ptrs, etc.). A simple typedef won't cut it. The task of eliminating nsIFrame as an actual interface is a huge wrist-burner, and there's no way I can do it. I still recommend that my patch go in as is, since it gives an immediate boost to our performance. Arguments of XPCOM elegance don't wash with me, given that nsIFrame is already not a real interface anyway.
XPCOM elegance be damned ;) Seriously, we can just make nsFrame extend nsISupports, get rid of nsIFrame and update all of the callers to use nsFrame. I volunteer to do the grunt work of updating the callers, and that can be done after this is committed if we use a typedef for now. Removing the intermediary (what was nsIFrame) will simplify the model, allow this bitchin' performance gain, and make it easier to explain and understand (nsIFrame was a mutant anyway, not supporting AddRef and Release). Dave agreed to rip out nsIFrame and just make nsFrame extend nsISupports. Then we'll export nsFrame.h, typedef nsIFrame to nsFrame, and file a bug on me to change references to nsIFrame to nsFrame after this lands. All in favor, say 'Aye'.
AYE! :)
Aye! And then disinherit nsFrame from nsISupports, eh? /be
Some frames are still using QI (and maybe even AddRef / Release?). Hopefully those frames can just implement nsIsupports thenselves and not get it from nsFrame.
If some frames are using AddRef and Release, aren't they in a lot of trouble already?
Err, so I spent a while attempting the typedef and ended up giving up. It's pretty complex, and I'm fairly convinced that without doing the remerge of the DLLs that there are going to be linkage issues. So I'm just going to make this patch be about rule node and rule walker and request a review of only those changes. I give up on the frame thing even though it provided most of the perf gain. It's just too much typing.
Summary: Eliminate nsIRuleNode, nsIRuleWalker, inline nsIFrame funcs → Eliminate nsIRuleNode, nsIRuleWalker
Can I get r/sr just on the rule node/rule walker changes? I'd like to move that along independent of the frame stuff.
Is there a URL or other reference to this data about mispredicts and such? FYI, I have been working on a system to automatically devirtualize XPCOM interface calls when some simple class analysis shows there is only one implementation of a given interface method. It has the advantage that no source code changes are required, so tradeoffs between extensibility and performance can be chosen at build time. It's mostly a hack to the IDL compiler. No C++ compiler changes are required, although to catch some big fish I added a Java-like "final" method attribute to gcc3. It currently "works" but I need to spend more time to make it really solid, and I have higher priority work blocking it until end of November.
Is there a URL or other reference to this data about mispredicts and such? FYI, I have been working on a system to automatically devirtualize XPCOM interface calls when some simple class analysis shows there is only one implementation of a given interface method. It has the advantage that no source code changes are required, so tradeoffs between extensibility and performance can be chosen at build time. It's mostly a hack to the IDL compiler. No C++ compiler changes are required, although to catch some big fish I added a Java-like "final" method attribute to gcc3. It currently "works" but I need to spend more time to make it really solid, and I have higher priority work blocking it until end of November.
For the sake of symmetry, you should add a static Create method to nsRuleList (which inline calls |operator new|), and make |operator new| and |operator delete| protected. Destroy should also be static; e.g., nsRuleList::Destroy(mChildren, mPresContext); Otherwise, looks fine. r= or sr=waterson.
Comment on attachment 54079 [details] [diff] [review] A diff only of nsRuleNode.cpp r= or sr=waterson
Attachment #54079 - Flags: review+
Actually, I'm going to ask again if it's ok to land this as is. I think the typedef of nsIFrame is rather complex (and bears some thought about what should be done with all the extra methods on nsFrame). Can't we take the performance boost now? I can even add a great big "// XXX" comment to nsIFrame explaining that it is not a real interface and that it will be going away real soon now... Thoughts?
Well, as much as I'd like to see this fixed the ``right way'', I'd rather take the performance gains (and then hold the line on them) than wait for the sweater to unravel (i.e., getting rid of nsIFrame, which depends on merging content and layout, which depends on us having a decent extensibility story, etc.).
In fact, we could even start adding something like: "// XXX deCOMtamination in progress" or come up with some standard comment, since I expect this will start happening to a lot of interfaces as we start decomifying them.
Comments on this patch: nsStyleContext::nsStyleContext, nsStyleContext::RemoveChild, and nsStyleContext::FindChildWithRules should just say |if (ruleNode->IsRoot())| rather than assigning to an |isRoot| variable. Ditto for |isAtRoot| somewhere in nsStyleSet.cpp Ditto for |atRoot| in nsHTMLCSSStyleSheet.cpp, twice. The indentation in html/style/public/Makefile.in and html/style/src/Makefile.in is wrong -- you probably need to use tabs rather than spaces. (Same for html/style/src/makefile.win.) And content/shared/src/Makefile.in. And content/shared/src/makefile.win (the REQUIRES change). In nsIFrame::GetStyleData and GetStyle, could you change the null check on |mStyleContext| to an assertion, and just dereference mStyleContext? Could you add a comment at the top of nsIFrame.h saying that nsIFrame is not really a COM interface, and it is in transition from being a NQCOM interface to either being collapsed with nsFrame or being just the publicly accessible base class of nsFrame? Could you not undo Bernd's changes to nsFrame.cpp (the last 2 changes in the diff)? In nsRuleWalker.h, could you switch the order of the member initializers so they don't cause massive numbers of warnings on gcc? In nsRuleNode.cpp, the following code: + if (!mParent && mRootChildren) + mRootChildren->Enumerate(ClearCachedDataInSubtreeHelper, (void*)aRule); + else if (mParent && mChildren) + for (nsRuleList* curr = mChildren; curr; curr = curr->mNext) + curr->mRuleNode->ClearCachedDataInSubtree(curr->mRuleNode->mRule); would probably compile to something simpler when written as: + if (!mParent) { + if (mRootChildren) + mRootChildren->Enumerate(ClearCachedDataInSubtreeHelper, (void*)aRule); + } else + for (nsRuleList* curr = mChildren; curr; curr = curr->mNext) + curr->mRuleNode->ClearCachedDataInSubtree(curr->mRuleNode->mRule); Have you done leak tests? You should probably instrument nsRuleNode and nsRuleWalker with MOZ_COUNT_[C|D]TOR. With those comments fixed, and some simple leak tests done, r=dbaron. Further cleanup that should happen after this lands: Make nsIFrame not inherit from nsISupports (but still have QI). Make nsRuleNode::Transition return its result. Make nsIStyleContext::GetRuleNode return its result. (?) Make nsIFrame::SetParent take an |nsIFrame*| (look ma, no const!). Removing many of the |nsresult| return types.
Summary: Eliminate nsIRuleNode, nsIRuleWalker → Eliminate nsIRuleNode, nsIRuleWalker; inline some of nsIFrame
> I can even add a great big "// XXX" comment to nsIFrame explaining > that it is not a real interface and that it will be going away real soon now... > Thoughts? OK, let's take the perf boost. Do we need to open a bug or something to track the progress of the dependant work (merging of layout and content) and make sure the follow up gets done?
Er, I am not happy at all hearing all this talk about remerging content and layout... There were reasons for it and they have not changed (decoupling things that should not be coupled and preparing for a DOM/content module that stands independent of layout, to be used on the server for example). Also it has made the life easier for people who work on content or layout on a daily basis because the time to compile content is only about half of the old layout (same for the new layout when compared to old layout). I would be ok to move style back to layout, though.
Fixed. A bug has been opened on the remerging of content/layout filed to dbaron. Let's debate that over there rather than in this bug. I have the decomify XBL bug. We still need to file a new bug for the followup work to nsIFrame. Marc, you want to file that on yourself?
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
filed a bug on followup work to nsIFrame (bug 107021) merge contnet and layout dll is bug 106161, dp is working on it.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: