Closed Bug 53620 Opened 24 years ago Closed 24 years ago

5% of mail scrolling time in nsStyleUtil::IsSimpleXLink()

Categories

(Core :: CSS Parsing and Computation, defect, P3)

x86
Windows 2000
defect

Tracking


VERIFIED FIXED
mozilla0.8

People

(Reporter: waterson, Assigned: attinasi)

References

Details

(Keywords: perf)

Attachments

(6 files)

Profiling page-down-style scrolling in mailnews shows that we spend 5% of the
total time in nsStyleUtil::IsSimpleXLink() (called from SelectorMatches()).
This call is the single most expensive operation in SelectorMatches(),
accounting for 33% of its CPU time.

Can we avoid doing this? A simple hack would be to *never* check for XLinks in
XUL documents. However, I'm concerned that we need to come up with a better
long-term solution, because this'll start to show up on XHTML profiles eventually...
Severity: normal → major
Keywords: perf
IsSimpleXLink should be blazing if the element is NOT an XLink.

Currently the IsSimpleXLink method checks:
1) if we have an XML element (QI for nsIXMLContent)
2) if the type attribute is 'simple' (GetAttribute, nsString::operator ==
NS_LITERAL_STRING('simple'))

It seems strange to me that a QI, a GetAttribute call, and a string compare
could be that expensive. What about the NS_LITERAL_STRING macro? It looks like
it creates an nsLiteralString instance; could that be the CPU-sucker? Can we
avoid creating the temporary nsLiteralString by using a global constant instead?
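For illustration, here is a standalone sketch of that idea (the types are
simplified stand-ins, not the real nsIContent interfaces), hoisting the
literal out of the hot path:

  #include <string>

  // Simplified stand-in for the element under test; the real code QIs
  // for nsIXMLContent and fetches the xlink:type attribute.
  struct Element {
    bool isXMLContent;
    std::string xlinkType;
  };

  // Built once, instead of constructing a temporary literal-string
  // wrapper on each of the ~100,000 calls per scroll.
  static const std::string kSimple("simple");

  bool IsSimpleXLink(const Element& e) {
    return e.isXMLContent && e.xlinkType == kSimple;
  }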

Checking for a XUL document seems reasonable, if we are sure XUL documents will
never have XLinks. However, I'd like to first get a handle on why the method is
looking so slow...
Status: NEW → ASSIGNED
Probably not entirely safe to assume no XLinks in XUL, and waterson: you're
probably right about it showing up in XHTML soon. The code to check looks good
to me, though -- I don't see where it would be slow...
Well, Quantify don't lie, so saying the code "looks fast" ain't gonna make it
*be* fast.

The problem is that it's called 93,748 times when scrolling through 107
messages. FWIW, here's where the time goes:

Method                                %Time Calls Time(us)
nsXULElement::GetAttribute            28.29 93514 19093.49
Compare                               27.55 93724 18594.04
nsCOMPtr_base::assign_from_helper     12.29 93748  8294.77
nsAutoString::nsAutoString            11.36 93724  7663.76
nsAutoString::~nsAutoString            6.33 93724  4271.60
nsCOMPtr_base::~nsCOMPtr_base          2.23 93748  1508.03
nsQueryInterface::nsQueryInterface     1.30 93748   879.67
nsQueryInterface::nsQueryInterface     1.12 93748   754.01
AnonymousElement::GetAttribute         0.02   210    16.05

As you can see, calling GetAttribute() and doing a string compare is *not* fast
when you do it 100,000 times. At this scale, even AddRef() and Release() count.

Could we only do this check if we're looking at a rule that could potentially be
affected by a :link (etc.) pseudo-class?
Chris, thanks for the Quantify data - it is very illuminating.

First, we only call IsSimpleXLink for rules that have selectors with link pseudos:

From SelectorMatches:

else if (IsLinkPseudo(pseudoClass->mAtom)) {
  if (!contentTag) aContent->GetTag(*getter_AddRefs(contentTag));
  if (nsStyleUtil::IsHTMLLink(aContent, contentTag, aPresContext, &linkState) ||
      nsStyleUtil::IsSimpleXlink(aContent, aPresContext, &linkState)) {
 ...

So we actually call IsHTMLLink more often (or the same number of times if there
are no links), and the reason that routine is not showing up is that it first
checks the tag:

  if ((aTag == nsHTMLAtoms::a) ||
      (aTag == nsHTMLAtoms::link) ||
      (aTag == nsHTMLAtoms::area)) {

and last time I checked pointer comparisons were pretty fast, faster than
GetAttribute calls anyway!

I'll investigate further and try to eliminate the GetAttribute calls when
possible...
As far as I know, the chrome contains no rules with :link... so the question is, 
why are we coming into this code at all?

David, what about the rules from ua.css? Aren't those applied too?
I'll try to QI for a XULElement first and bail if it succeeds - that should help
avoid the costly GetAttribute and Comparisons.
Only to HTML elements.  There shouldn't be a :link rule in the XUL namespace.
There's a hideous descendant selector rule in html.css.  We have to be more 
careful about letting inefficient rules like this slip by.

:link img, :visited img, img[usemap], object[usemap] {
  border: 2px solid;
  cursor: pointer;
}

With this rule, now anytime anything goes into :link, it probably grovels down 
into the kids looking for an img (or something equally inefficient).
Hmmm. html.css is defaulted to the HTML namespace though, so I still don't see 
why XUL would be affected by this...

lxr reveals html.css as the only css file with :link rules. 
QA Contact: chrisd → petersen
It looks to me like we are not applying the default namespace properly.

I instrumented the code a bit and I see that the :link pseudo is getting the
namespaceID of -1, or kNameSpaceID_Unknown, so it is never getting screened out.

I'm debugging the application of the default namespace now. If it is the case
that we are ignoring the default namespace then we can get HUGE performance
improvements in the chrome (and many XML documents) by fixing it, not just for
:link but for all of the rules in html.css
Digging a little deeper it is apparent that we have some namespace handling
problems in CSSParserImpl::ParseSelector. Specifically, if the selector starts
with an ID, a class, a pseudo-class, or an attribute, then we do not set the
default namespaceID into the selector. That means that all of the rules in
html.css (and quirk.css) that don't start with an element name will be applied
to XUL elements because they will have the kNameSpaceID_Unknown namespace.

Brewing up a patch to fix this now...
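In rough form (a standalone model with illustrative values; the real change
is in CSSParserImpl::ParseSelector), the fix looks something like:

  enum NamespaceID { kNameSpaceID_Unknown = -1, kNameSpaceID_HTML = 2 };

  struct Selector {
    NamespaceID nameSpaceID;     // starts out kNameSpaceID_Unknown
    bool hasExplicitNamespace;   // false for #id, .class, :pseudo, [attr]
  };

  // Stamp the sheet's default namespace onto selectors that gave no
  // element name, so rules from html.css stop matching XUL elements.
  void ApplyDefaultNamespace(Selector& sel, NamespaceID defaultNS) {
    if (!sel.hasExplicitNamespace &&
        sel.nameSpaceID == kNameSpaceID_Unknown) {
      sel.nameSpaceID = defaultNS;
    }
  }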
I have a patch that sets the namespace of the selector to the default
namespace if there is no namespace in the selector for the id, class, pseudo,
and attribute selector cases that were not getting it set before. This fixes
the problem reported in this bug because we now skip even applying the rules
from html.css to the XUL elements (thus the :link rules are never even run
through the body of SelectorMatches).

One problem I saw is that the text controls in the chrome (location bar, edits
in the dialogs) are now getting a different background color. I'm hoping that
there is just a simple bug in the css for those controls in the chrome, probably
the wrong namespace? Anyway, if somebody familiar with chrome styling can check
it out I'd appreciate it.

Patch coming...
Depends on: 53796
Added dependency on new bug that attacks the namespace application problem.
That won't fix it. The rule in html.css is there because it needs to be there --
once 53796 is fixed it will have to be changed to read

   *|*:link img, *|*:visited img, img[usemap], object[usemap] {
     border: 2px solid;
     cursor: pointer;
   }

...since it is perfectly valid for an XLink in some other namespace to contain
an IMG in the HTML namespace, and then you would want that IMG to have the blue
border and the pointer (hand) cursor.

The fix to this bug is to use a Link Manager that caches which nodes are links.
This was requested several months ago now... ;-)
The comment about the ':link img' rule doesn't make sense to me, unless there
are IMG elements in the tree that is being scrolled.  We do the selector
matching from the bottom up, so we would only check the :link rule for parents
of IMG.  Right?

The testing of HTML links has been heavily optimized, but it is harder to do the
same for XLinks because they can be on any element.  It would probably be nice
to have a HasAttribute call (that is something other than a wrapped GetAttribute
that ignores the returned attribute) -- that might help some...
OK, given these facts:
1) the :link and :visited rules should apply to all namespaces
2) an XLink can be on any element
3) this performance problem is real and needs to be addressed for RTM
4) we do not have time to re-architect link management for RTM

I propose that we 'tweak' the IsSimpleXLink method to improve the performance 
for scrolling (and all XUL element style resolution, in general).

NOTE: This proposal assumes that XUL elements will NEVER contain simple XLinks!

In IsSimpleXLink(), first QI for an nsIXULContent and if it succeeds, return 
FALSE. That gets rid of the costliest components of the method, GetAttribute and 
the comparison. Alternatively, we could check for a XUL Document, as Chris W. 
suggested, and skip checking for XLinks (and HTMLLinks) if it is.

If the assumption stated above is true, and this is deemed important enough, 
then I'd be happy to make the change.
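
In rough standalone form, with simplified stand-in types (the real code 
would QI aContent for nsIXULContent), the tweak looks like:

  #include <string>

  struct Content {
    bool isXULContent;     // stands in for a successful nsIXULContent QI
    bool isXMLContent;     // stands in for a successful nsIXMLContent QI
    std::string xlinkType;
  };

  bool IsSimpleXLink(const Content& c) {
    // Assumption stated above: XUL elements never carry simple XLinks,
    // so bail out before the costly GetAttribute and string compare.
    if (c.isXULContent)
      return false;
    return c.isXMLContent && c.xlinkType == "simple";
  }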

Long term: we need, as Ian has asserted many times, a more intelligent and 
optimized LinkManager that can cache the target and status of links of all 
types. This is obviously a post-RTM endeavor.
Profiling menus (moving around in them) etc., 7% of the time is spent in 
IsSimpleXLink.
I'd prefer the nsIXULContent approach to the nsIXULDocument approach, simply 
because a XUL doc could contain elements from non-XUL namespaces that did 
support XLinks.  I am ok with XUL elements in Netscape 6.0 not supporting :link. 
If we make this fix, we should file a bug on XUL not supporting :link and note 
why.

I do believe we should check these default namespace fixes in for RTM however, 
as there are other rules in html.css that we are probably wasting valuable time 
testing over in XUL.  Do we want a separate bug on improper handling of default 
namespaces, or is one already filed?
Thanks for the input David. My assertion was that a XUL element cannot be an
XLink, not that :link won't apply.

Agreed that the fix for default namespace application on selectors with no tag
(or explicit universal selector) should wait for 6.01, for risk-management
purposes only. Bug 53796 covers default namespace problem, I'll open another
about XUL not supporting styling of XLinks once the change is in.

I'm going to nominate this for RTM now, since we seem to agree that the solution
(exclude XULContent from IsSimpleXLink checks) will give a measurable
performance boost, and it is of acceptable risk (i.e. very low). I'll attach a
patch with the fix as well.
Keywords: rtm
Marc, are you sure the latest attachment is the one you meant it to be?
doh! wrong patch, sorry. And thanks for catching it, Andreas. Correct patch
coming up.
Marc, I think that there may be a better way to do this.

What if we added some flags to ContentEnumData:

  mIsHTMLContent
  mIsHTMLLink
  mIsSimpleXLink

as well as another pointer:

  mTag

We could then hoist the computation of these values out from SelectorMatches()
to CSSRuleProcessor::RulesMatching(). I'm seeing about a 10x fanout from
RulesMatching() to SelectorMatches() in mailnews, so at least in this case, the
hoist makes sense.
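
In rough outline (field types illustrative), the hoisted data might look like:

  // Per-element facts computed once in RulesMatching() and reused by
  // each of the ~10 SelectorMatches() calls for that element.
  struct ContentEnumData {
    bool        mIsHTMLContent;
    bool        mIsHTMLLink;
    bool        mIsSimpleXLink;
    const void* mTag;  // interned tag atom; pointer compares are cheap
  };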

Here's a more detailed breakdown:

- 14% of the time is spent QI()'ing the element to see if it's HTML Content
  at line 2820.

- 6% of the time is spent in GetTag, which is dominated by
  PR_AtomicIncrement and PR_AtomicDecrement.

- 2% of the time is spent in nsStyleUtil::IsHTMLLink()

(I've commented out the call to IsSimpleXLink in my tree, but as discussed
earlier, this dominated the function before!)
Chris, are you saying we call SelectorMatches 10 times more than RulesMatching?
In that case I totally agree that we should cache the tasty information in the
ContentEnumData in RulesMatching, and pass it to SelectorMatches (and
SelectorMatchesTree I think) from the ContentEnumFunc. I'll give it a go if you
haven't already - what a great idea!
Right, there are about 10,000 calls to RulesMatching(), and 100,000 calls to 
SelectorMatches().

The only caveat is that this was observed scrolling the mailnews threadpane, so 
it might not accurately reflect what's going on in "general" chrome or HTML. 
(...But, I suspect that the situation is similar.)

If we do hoist these invariants out, I think that the worst thing that happens 
is we do an extra QI() or two, right?
I'm not worried about the possible extra QI (on the tag). However, some
selectors don't have anything to do with links, and we currently never bother to
check for HTMLlinks or XLinks for them. We should probably not cache the
mIsHTMLLink or mIsXLink state for those selectors, otherwise we are doing more
work. (This is currently encapsulated in the method IsLinkPseudo()) This means
we have to get the link-state values in ContentEnumFunc (where the selector is
known) rather than the RulesMatching method. I presume the fan out is the same
10:1 for the ContentEnumFunc?

I think the worst problem is that we are locally fixing something that we
already plan to provide a more global fix for later (LinkManager
implementation), thus we are _wasting_ some effort. However, it seems to be a
small amount of effort and could even be leveraged further when we do the
general, LinkManager solution in the future (we can still cache the link-state
in the enum data, even if the LinkManager does make the lookup faster).

BTW: I think we should hoist the linkState value as well, as we need that
whenever we have an HTMLLink or an XLink.
Target Milestone: --- → M19
I'm not going to get to work on this any time soon... Sorry. If you wanna take
it over or use the existing patch, that's cool. Otherwise, post-RTM I'll get to
it, when performance issues are brought to the top of the PDT priority list.
Target Milestone: M19 → mozilla0.6
rtm-, not enough improvement for whatever risk.  Glad to have you working on 
something more important instead.
Whiteboard: [rtm-]
Targeting for M0.8
Keywords: rtm
Whiteboard: [rtm-]
Target Milestone: mozilla0.6 → mozilla0.8
Keywords: nsbeta1
What is interesting in the analysis posted by waterson on 2000-09-29 16:27 is
that QI shows up so prominently. We have not yet optimized our QI
implementations. We could speed them up by changing to table-driven
implementations. alecf has been meaning to do that but hasn't yet.
If you check out nsXULElement::QueryInterface you can see that it has to be slow
- it has to check with the BindingManager in the case where it doesn't know
about the IID, and the check for IsHTMLContent has to hit that (it also has to
check for a lot of IIDs!).

http://lxr.mozilla.org/seamonkey/source/rdf/content/src/nsXULElement.cpp#594

Checking if an element is a XUL Element _before_ checking if it is an HTML element
might be a good optimization, as it will avoid QI'ing XUL elements for
HTMLContent. I am incorporating this into the changes I am making to move checks
for isHTMLContent, isHTMLLink, isSimpleXLink, linkState, isQuirkMode and
contentTag out of SelectorMatches and into RulesMatching, but I'll have to
test to see if it is a net win or not. My guess is that it will be, as most
other elements can QI much faster.
I have a patch that hoists the computation of the following out to the EnumData:
ContentTag, IsHTMLContent, IsHTMLLink, IsSimpleXLink, LinkState, and QuirksMode.

I have not Quantified this and was hoping that somebody else who is set up to
Quantify would try it out for me (wink, wink, nudge, nudge, know what I mean,
Waterson?) I'll attach my patch...
I'm taking some jprof profiles right now.  However, this patch seems to cause
some pretty obvious regressions.  In the classic skin, the menubars have arrows
next to them and the text inputs look a bit strange (an extra border).  My
homepage also has some headers in bold that shouldn't be...
I profiled a reasonably small testcase (opening a new mail window).  The
performance improvement was significant:  the time spent within
CSSRuleProcessor::RulesMatching fell from 32 and 36 to 24 and 20 (2 profiles
before and 2 profiles after), where the numbers are the number of
timer-generated stacks containing the function.  You didn't yet pull out some of
the biggest factors:

 * [4 and 3 stacks in the 2 after profiles] the GetAttribute call (It's not
possible to pull this one out -- we really need to discuss better string
solutions for this with scc!)
 * [2 stacks in both after profiles] The GetID call and the refcounting of the
associated atom -- this should probably come out too.  (Refcounting atoms is
slow because they're threadsafe.)

I'm wondering if there are any other computations we can pull out -- maybe even
some within the body of SelectorMatches.

One other question:  Would you want to make one of the arguments to
SelectorMatches a StyleSheetEnumData struct rather than passing each argument
separately?  (You could then create separate variables within it for the most
frequently used variables, e.g., content, to avoid dereferencing too often,
although I don't know if that would make much of a difference.)
Thanks, David. I'm not seeing the menu problem or the text-input problem on my
Linux build (Classic or Blue skin). But, I'll admit I have not fully tested the
patch yet. I posted it as soon as it was working in hopes of learning if there
is any advantage. I'll continue testing and see if I can observe the regressions
you mentioned (or others).

And thanks for the profile data too! I think you are correct about the ID
getting pulled out, I'll add that. Regarding your question about passing the
SheetLoadData to SelectorMatches[Tree]: I started out this way, but I changed it
back before posting the patch. I'm still not sure which way I like better, mostly
I am concerned about SelectorMatching being coupled to the EnumData, since it
should be possible to call SelectorMatches elsewhere. Maybe I'd like it better
if I just changed the name to SelectorMatchesData...

One more thing, what do you think of my 'optimization' to QI for XULElements
before QI'ing for HTMLContent?
2 more nits:
 * please untabify the one tabbed line "PRBool isCaseSensitive = ..."
 * Your change to put the :lang pseudo stuff in an #ifdef is substantive, and
I'd rather not make that now.  We should probably file another bug to disable it
the correct way (in the parser) if we're not going to support it soon, although
the way it's disabled before your changes is close to correct (i.e., it is
correct for non-grouped selectors).
I think that further optimization for QIing for XULContent first is basically
useless now.  The constructor for StyleSheetEnumData doesn't show up in the
"after" profiles at all.  With these changes, it doesn't matter anymore...
The #ifdef of the :lang pseudo will cause no change. Since it always returns
false, it will act exactly the same if we ignore it since it will be treated as
an unknown pseudo, which always returns false (and it will save a costly
pointer-comparison ;) You are correct that it should be done under another bug,
however, so I'll put it back for now. Thanks.

BTW: the change, as posted, has the optimization for XULElements QI'ing for
HTMLContent turned ON already.
It looks like we can pull out the content parent as well, and possibly the
eventstate (and eventStateManager). I'm definitely going to have to pass in a
structure now, we are already up to 10 new arguments to SelectorMatches[Tree]
and there are probably one or two more hiding in there...
OK, yeah, I missed the |result = PR_FALSE; // unknown pseudo-class| at the
bottom of the function.

Changing to a struct does have the big advantage that it's very easy to add
things.  (For example, we might want to add some representation of all the
attributes present on the element to make the attribute selector testing a
lot faster.)

I think you can skip that optimization for XUL Content now.   Although if
we're sure that nothing will ever implement both nsIXULContent and
nsIHTMLContent (are we??), then we could leave it.
Here's a thought:  In the SelectorMatchesData, have the result of a call to
nsIContent::GetAttributeCount (perhaps stored as a bool, hasAttributes).  If
hasAttributes is false and there's an attribute selector, set the result to
false and short-circuit the call to GetAttribute.  (I wonder how much that will
help.  What percentage of elements have attributes?)
Yes, I changed the argument to SelectorMatches[Tree] to take the struct - it was 
getting out of control with arguments. Also, I found the regression you saw - it 
was a problem in the recursion in SelectorMatchesTree - and I have it fixed now.

I like your idea of checking if the content has any attributes at all and using 
that as a short-circuit: very smart. I'll do a test first to see how often we 
find elements with 0 attributes.
I noticed many (thousands of) elements with no attributes: mousing through menus, 
for example, and hovering around on a simple test page, caused my debug output 
(from SelectorMatchesData's ctor) to spit out 0-attribute indicators, so I think 
it will be a win. It also appears that the call to GetAttributeCount is very 
fast.

This can be done for classes too; determining the class-count is more 
expensive, as it iterates over the class list, but it might be worth it...
Not surprisingly, that patch takes away some of the gain.  (I did one profile of
the same thing, and it had 27 stacks in RulesMatching.  5 stacks were within the
call to IsSimpleXLink within SelectorMatchesTree.)  I suspect it would be better
to make SelectorMatchesData a linked list that has a parent chain of
SelectorMatchesData all the way up to the root element, so that you never have
to reinitialize it in SelectorMatchesTree.  (You'd also change the GetParent
call on the element to one on the SelectorMatchesData.)
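In sketch form (types illustrative):

  // Chain each element's data to its parent's, so SelectorMatchesTree
  // can walk ancestors without re-running the per-element setup.
  struct SelectorMatchesData {
    const void*          mContent;     // the element this data describes
    SelectorMatchesData* mParentData;  // lazily built chain to the root
  };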

The hasAttributes check doesn't seem to do as much as I had hoped, though.
er, parent style context, not parent content... (I think)
As Marc pointed out by email, my idea won't work for sibling combinators. 
However, it might be worthwhile to do it even if it only works for descendant
and child combinators, although it would make the code a bit more complicated.
Maybe we should just work on what's here and work on the rest later?
Another small change that might speed things up (if it is correct): if the
element has no attributes then it cannot be a SimpleXLink, and so there is no
need to test for that. I'm pretty sure it cannot be an HTML link either, so no
need to check for that either. It is really fast to get the attribute count, and
we do that anyway for SelectorMatches to use, so it should be a win.
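
As a standalone model (names hypothetical; the real count comes from
nsIContent::GetAttributeCount):

  struct ElementFacts {
    int attributeCount;  // fetched once per element; cheap to read
  };

  // With zero attributes an element cannot match an attribute selector,
  // cannot be a simple XLink (xlink:type is an attribute), and -- per
  // the reasoning above -- cannot be an HTML link either, so all three
  // tests can be skipped outright.
  bool NeedLinkAndAttributeChecks(const ElementFacts& f) {
    return f.attributeCount > 0;
  }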
I'm going to test and see if computing the SelectorMatchesData all the way up
the content tree is any better than doing it on demand in SelectorMatchesTree -
I cannot tell from inspection, but it seems to me that there is the chance that
precomputing the SelectorMatchesData tree up to the root when any content is
being sent through RulesMatching will be more costly than doing it in
SelectorMatchesTree - I'm not sure. I think, however, that what we have now is a
good candidate for Quantify timings and maybe good enough to close this bug.
Optimizing SelectorMatches should be a lifelong endeavor anyway...
Correct, an element cannot be a simple XLink if it does not have attributes.
Does hasAttributes/GetAttributeCount() check for default attributes defined in
the DTD (or internal subset in our case since we do not read external DTDs)?
Heikki - GetAttributeCount looks like it just returns the number of elements in
a list (void array actually) of attributes (for XUL and HTML at least - I
couldn't find out what XML Elements do). So, the question is, do the default
attributes get pushed into the attrubute collections? I am not sure and will
probably have to test it, but I do not know how to get default attributes into
an element. Can you possibly give me a hint (via email)?
After some testing (and discussion with rick g.) it looks like default
attributes are pushed into the attributes collection for XML elements, so
checking for no attributes should be fine.
This is great stuff, Marc!

- You could save a virtual dispatch by getting the content element's
  namespace up-front.

- I hate to see us keeping the tortured state variable logic now that
  you're universally using nsCOMPtr's. I think that early returns
  would result in more efficient code (by saving cascading test-and-branch
  to the end of the routine), and could make the logic a lot cleaner. And
  since you're practically re-writing the whole thing anyway...

Take 'em or leave 'em. [r/sr]=waterson, whatever helps.
These comments are all really minor (except for the one about potential
build bustage) so r=dbaron with or without them.  There's a bit more
room for optimization at a number of places, but let's get this in, since
it's a big improvement.

Should SelectorMatchesData::SelectorMatchesData be written as inline?
I don't think any compiler will actually inline it, so maybe it should
be separate from the class definition?  It would also be easier to read
the class definition.

In SelectorMatchesData::SelectorMatchesData, I think preprocessor directives
should always begin at the beginning of the line -- I think some compilers
don't like it if they don't.  (I'm not sure which, but if it were OK
to have them start elsewhere I'm sure it would be more common...)

There's a tab on the line:
+         mIsHTMLContent = PR_TRUE;

In SelectorMatchesTree, the line
+          if (SelectorMatchesTree(newdata,selector)) {
is missing a space between the arguments.

Thanks David, I appreciate your comments.

According to the C++ Prog. Lang. 2nd Ed. (section r.16):
'Lines beginning with #, optionally preceded by space or horizontal tab
characters, (also called "directives") communicate with this preprocessor.' -
referring to the macro preprocessor. I have used indented preprocessor
directives for years, on many platforms, and have never had any problems, but
there is little value in this case so I'll move them out to the start of the
line (because I'm paranoid). Also, I moved the ctor to be out of the class
declaration; you are correct that it is much nicer to look at and should not
(cannot) be inline. Finally, I fixed the tabs and spaces as you noted.

When this bug is closed I'll open another for future improvements to
SelectorMatches and related methods.
Checked in nsCSSStyleSheet.cpp, v 3.138
Status: ASSIGNED → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
*** Bug 63711 has been marked as a duplicate of this bug. ***
Marking verified per last comments.
Status: RESOLVED → VERIFIED