reduce repeated string conversions

NEW
Unassigned

Status

()

Core
Layout: Misc Code
14 years ago
8 years ago

People

(Reporter: Sam Emrick, Unassigned)

Tracking

({perf})

Trunk
x86
All
Points:
---

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment)

(Reporter)

Description

14 years ago
User-Agent:       Mozilla/5.0(compatible;MSIE 5.5;Windows 98)
Build Identifier: 

In several areas of layout, strings are repeatedly converted from ascii to ucs,
in  either comparison or assignment. Performance can be improved by storing the
ucs string and doing ucs-to-ucs comparisons and assignments. The purpose of this
bug will be to produce a patch that implements improvements in this area. The
focus of this is only to reduce the mainline, frequently repeated, string
conversion operations.

Reproducible: Always

Steps to Reproduce:
(Reporter)

Comment 1

14 years ago
My initial investigation has led me to the following areas of code producing
many/most of the repeated conversions. Often, the conversions are a form of 
  this.EqualsIgnoreCase("FooBar"); 
Where it would be more efficient to create a static ucs with value 'FooBar', and
then, do something more like this: 
  this.Equals(mFooBar,nsCaseInsensitiveComparitor)
This can be much faster because, as a string object, most cases simply the
length will/can be compared to determine inequality. With a non-object string
literal, the compare is character-by charater.  
 
Here is a list of code areas proposed to be improved.  
(This list is not in a particular order).
1. nsDomEvent::nsDomEvent
http://lxr.mozilla.org/seamonkey/source/content/events/src/nsDOMEvent.cpp#150
Here, there is a additional kind of fix. In the big if/else clause here, 
the most frequent kind of if (eventType.EqualsIgnoreCase("xxxx")) is the
eventType of "Events". A check for "Events" should be made as the first if in
the set of if/else clauses. 

2. nsEventListenerManager::CreateEvent
http://lxr.mozilla.org/seamonkey/source/content/events/src/nsEventListenerManager.cpp#1536
Again here, "Events" is the most common case and should be moved to first in list. 

3.nsMenuFrame::IsSizedToPopup
http://lxr.mozilla.org/seamonkey/source/layout/xul/base/src/nsMenuFrame.cpp#946
the checks for "always" and "pref" are many times repeated. These strings should
be static ucs string objects, comparison made against those.

4.CSSParserImpl::ParseAttributeSelector
http://lxr.mozilla.org/seamonkey/source/content/html/style/src/nsCSSParser.cpp#1901
More than one area of opportunity here, the most important one is 
2065             while ((htmlAttr = caseSensitiveHTMLAttribute[i++])) {
2066               if (attr.EqualsIgnoreCase(htmlAttr)) {
running through the list of seven attributes. These should be static ucs objects
rather than local ascii char*, to reduce repeated creation and conversion.  

5.Slightly different flavor, same kinda problem: PresShell::NotifyReflowObservers
http://lxr.mozilla.org/seamonkey/source/layout/html/base/src/nsPresShell.cpp#2699
takes aData as parm and always does  NS_ConvertASCIItoUCS2(aData).get(). The
strings used as input to this routine ("RESIZE REFLOW") should be crated as ucs
statics and passed in that way, eliminating the need for conversion. 

That's my starting list. This so far looks to be worth 2%-4% performance gain
for typical busy pages, such as cnn.com.
  
Comments or questions?
Keywords: perf
(Reporter)

Comment 2

14 years ago
Note, I am particularly interested, at the moment, in Linux performance, where
NS_LITERAL_STRING is not compiled away, but rather, results in runtime
NS_ConvertASCIItoUTF16(). I would also like to reduce the number of these are
repeated as well. 
Why do we even have this NotifyReflowObservers method?  I can't seem to find any
observers for that topic in the tree....
Oh, and NS_LITERAL_STRING is "compiled away" with GCC 3.x.  It's just GCC 2.9x
that does the NS_ConvertASCIItoUTF16.
(Reporter)

Comment 5

14 years ago
Thanks Boris for the quick feedback. 

On Reflow, I dunno, I could not find the use for them either, but the 
scope of all-of-reflow is beyond my brain.  

On gcc versions, do we all not care about gcc 2.96 ?? Real question, 
not rhetorical, I don't know. I thought there might be concerns/issues 
about binary compatibility, and plugins, derivitives, and etc???? 
Some platforms have problems with static constructors and destructors, so we try
not to use them (if that's what you meant by "a static ucs").

In most cases, comparison that has a conversion within it really shouldn't be
much slower than comparison without a conversion within it, since comparisons
usually fail within the first few characters.  We need some work on string
comparison APIs, though.  (I'm not sure what plans already exist.)

Also, NS_LITERAL_STRING is compiled away on not only gcc 3.x but any of the
newer RedHat gcc 2.96 packages.  I'm not particularly interested in optimizing
performance for the bad NS_LITERAL_STRING case, unless there are a very small
number of callsites that can make a significant improvement.

Does anyone depend on PresShell::NotifyReflowObservers, or can it just be removed?
(Reporter)

Comment 7

14 years ago
Question: is code of this form okay?
static CONST NS_NAMED_LITERAL_STRING(mFooBar, "FooBar");

I thought these kind of static initializers were 'A Bad Thing',
but find them used heavily, for example, 
http://lxr.mozilla.org/seamonkey/source/extensions/webservices/soap/src
/nsDefaultSOAPEncoder.cpp 
http://lxr.mozilla.org/seamonkey/source/content/xul/document/src/nsXULC
ontentSink.cpp#1116
http://lxr.mozilla.org/seamonkey/source/htmlparser/src/nsHTMLTokens.cpp
#590

so now I am confused.. Okay to do this?
Also, it sounds like at least part of what this bug is reporting is that the
equivalent of an EqualsIgnoreCase that takes an nsASingleFragmentCString& would
be useful, since it could do a length check first.
I think the initializers in comment 7 may be the cause of our portability
problems on OpenBSD.  Function-scope statics with destructors are also
problematic on Linux if we ever do library unloading.  So the first two examples
above are things that I would prefer not to imitate.
(Reporter)

Comment 10

14 years ago
David, yes, agree, you edge onto a couple things that are pertinent, but maybe 
I was trying to avoid. 

Changed/new string library calls would be useful here. Problem I worry about 
is potential long debate about thawing strings and adding or changing function 
there. In addition to what you said, other things could be better. Right now, 
EqualsIgnoreCase comparison flows into general compare (less, equal, or 
greater), and does more work than simpler 'equal or not-equal' requires. Also 
in strings, today, IgnoreCase compares do a ToLower of both of the strings. In 
most cases (text literal in the code), the compare string is known already all 
lower case, so there's no need to ToLower that side of the compare, and so on.    

Second, not clear to me why/where the ref'd EqualIgnoreCase calls should be 
doing 'ignore case' comparisons. I think XUL related stuff is spec'd to be 
case sensitive. Also where the strings are being internally-generated (i.e 
they come from other code, not from an input document) then the case of the 
string is known. Case conversion ahould not be needed in a lot of cases. 
Problem here is, regardless of spec or whatever, where code today accepts 
ignore-case, then 'fixing it' to not ignore-case might break things. 

Because these issues get into things more complex, I was just trying to stay 
in more simple, safe areas of the problem.

I also agree compare code now is NOT horrible. Only reason there is perf 
potential is because its very heavily used function. Saving a few microseconds 
is small, but saving that tens of thousands of times adds up.       
(Reporter)

Comment 11

14 years ago
Not sure it is a useful thing to whine about... doing it anyway... its a shame 
to have limited use of static and/or static const because of compiler bugs on 
a rather low-importance platform... Its not just a coding inconvience issue. 
In some cases, the compiler generated code is dramaticly improved when data 
involved is known by compiler to be static and/or const. (lacking this, the 
compiler usually constantly reloads data's address and then then reload  
contents of that address). Can dramaticly expand the code in a tight code 
loop. Okay, anyway, got that off my chest.... now, off I go to do code to deal 
with it...       
There's nothing wrong with using statics if you handle the contstruction and
destruction manually, and there's nothing wrong with using const.
(Reporter)

Comment 13

14 years ago
Okay, another way to improve this. Its probably the most fast and least changed
code fix. However its pretty ugly source code wise. Approach is, only call
EqualsIgnoreCase when strings are equal length. Compare lengths in calling-code
before calling EqualsIgnoreCase. 

Example: 
Another function to add to my list above is mozilla/gfx/src/nsFonts.cpp
nsFont::GetGenericID
http://lxr.mozilla.org/seamonkey/source/gfx/src/nsFont.cpp#208
today looks like this:

void nsFont::GetGenericID(const nsString& aGeneric, PRUint8* aID)
 {
   *aID = kGenericFont_NONE;
   if (aGeneric.EqualsIgnoreCase("-moz-fixed")) *aID = GenericFont_moz_fixed;
   else if (aGeneric.EqualsIgnoreCase("serif"))*aID = kGenericFont_serif;
   else if (aGeneric.EqualsIgnoreCase("sans-serif")) *aID = kGenericFont_sans_serif;
   else if (aGeneric.EqualsIgnoreCase("cursive"))    *aID = kGenericFont_cursive;
   else if (aGeneric.EqualsIgnoreCase("fantasy"))    *aID = kGenericFont_fantasy;
   else if (aGeneric.EqualsIgnoreCase("monospace"))  *aID = kGenericFont_monospace;
 }

changed to this:
void nsFont::GetGenericID(const nsString& aGeneric, PRUint8* aID)              
                 
{                                                                              
                 
  *aID = kGenericFont_NONE;                                                    
                 
  uint len = aGeneric.Length();                                                
                 
                                                                               
                 
  if (len==10 && aGeneric.EqualsIgnoreCase("-moz-fixed")) *aID =
kGenericFont_moz_fixed;    
  else if (len==5  && aGeneric.EqualsIgnoreCase("serif")) *aID =
kGenericFont_serif;        
  else if (len==10 && aGeneric.EqualsIgnoreCase("sans-serif")) *aID =
kGenericFont_sans_serif;   
  else if (len==7  && aGeneric.EqualsIgnoreCase("cursive")) *aID =
kGenericFont_cursive;      
  else if (len==7  && aGeneric.EqualsIgnoreCase("fantasy")) *aID =
kGenericFont_fantasy;      
  else if (len==9  && aGeneric.EqualsIgnoreCase("monospace"))  *aID =
kGenericFont_monospace;    
}                                                                              
                 

To be sure, its not very pretty. On the other hand, just the routine above
reduces number of calls to EqualsIgnoreCase by several hundred calls per web
page. By doing this to all the code areas mentioned above, causes about
two-thirds reduction in the number of repeated (non-init-tine) calls to
EqualsIgnoreCase. 

(Although ugly, its much easier change than alternatives. Maybe it'd be okay to
use such code for just the worst places????? I think its an easy low-risk patch
like this. The other ways to do it 'as well or better' are big and risky
patches. IMHO)

Comments please.

  
Filed bug 223012 to deal with the "should xul be case sensitive" issue.
Status: UNCONFIRMED → NEW
Ever confirmed: true

Comment 15

14 years ago
How about
#define EqualsLiteralIgnoreCase(aString, aLiteral) \
(aString.Length() == sizeof(aLiteral) - 1 && aString.EqualsIgnoreCase(aLiteral))

Comment 16

14 years ago
(I probably missed out some ()s above)
It's a pity you can't template on the length of an array:
template <class C> inline PRBool EqualsIgnoreCase(C) {
  return sizeof(C) - 1 == Length() && EqualsIgnoreCase((const char *)C);
}

Comment 17

14 years ago
Created attachment 133716 [details]
Works, sort of...

Ok, so stepping through this in gdb does what I expect :-)
We could have something like

EqualsUTF8IgnoreCase(nsAString&, nsACString&)

and then just do

EqualsUTF8IgnoreCase(aSource, NS_LITERAL_CSTRING("FooBar"))
We have a Layout: Misc Code component for a reason. (Namely, so I don't get 
notifications from bugs like these. :-P)
Assignee: other → misc
Component: Layout → Layout: Misc Code
QA Contact: ian → nobody
Assignee: layout.misc-code → nobody
QA Contact: nobody → layout.misc-code
You need to log in before you can comment on or make changes to this bug.