Open
Bug 221026
Opened 22 years ago
Updated 3 years ago
reduce repeated string conversions
Categories
(Core :: Layout, defect)
Tracking
()
NEW
People
(Reporter: emrick, Unassigned)
Details
(Keywords: perf)
Attachments
(1 file)
605 bytes,
text/plain
|
Details |
User-Agent: Mozilla/5.0(compatible;MSIE 5.5;Windows 98)
Build Identifier:
In several areas of layout, strings are repeatedly converted from ascii to ucs,
in either comparison or assignment. Performance can be improved by storing the
ucs string and doing ucs-to-ucs comparisons and assignments. The purpose of this
bug will be to produce a patch that implements improvements in this area. The
focus of this is only to reduce the mainline, frequently repeated, string
conversion operations.
Reproducible: Always
Steps to Reproduce:
Reporter | ||
Comment 1•22 years ago
|
||
My initial investigation has led me to the following areas of code producing
many/most of the repeated conversions. Often, the conversions are a form of
this.EqualsIgnoreCase("FooBar");
Where it would be more efficient to create a static ucs with value 'FooBar', and
then, do something more like this:
this.Equals(mFooBar,nsCaseInsensitiveComparitor)
This can be much faster because, as a string object, most cases simply the
length will/can be compared to determine inequality. With a non-object string
literal, the compare is character-by charater.
Here is a list of code areas proposed to be improved.
(This list is not in a particular order).
1. nsDomEvent::nsDomEvent
http://lxr.mozilla.org/seamonkey/source/content/events/src/nsDOMEvent.cpp#150
Here, there is a additional kind of fix. In the big if/else clause here,
the most frequent kind of if (eventType.EqualsIgnoreCase("xxxx")) is the
eventType of "Events". A check for "Events" should be made as the first if in
the set of if/else clauses.
2. nsEventListenerManager::CreateEvent
http://lxr.mozilla.org/seamonkey/source/content/events/src/nsEventListenerManager.cpp#1536
Again here, "Events" is the most common case and should be moved to first in list.
3.nsMenuFrame::IsSizedToPopup
http://lxr.mozilla.org/seamonkey/source/layout/xul/base/src/nsMenuFrame.cpp#946
the checks for "always" and "pref" are many times repeated. These strings should
be static ucs string objects, comparison made against those.
4.CSSParserImpl::ParseAttributeSelector
http://lxr.mozilla.org/seamonkey/source/content/html/style/src/nsCSSParser.cpp#1901
More than one area of opportunity here, the most important one is
2065 while ((htmlAttr = caseSensitiveHTMLAttribute[i++])) {
2066 if (attr.EqualsIgnoreCase(htmlAttr)) {
running through the list of seven attributes. These should be static ucs objects
rather than local ascii char*, to reduce repeated creation and conversion.
5.Slightly different flavor, same kinda problem: PresShell::NotifyReflowObservers
http://lxr.mozilla.org/seamonkey/source/layout/html/base/src/nsPresShell.cpp#2699
takes aData as parm and always does NS_ConvertASCIItoUCS2(aData).get(). The
strings used as input to this routine ("RESIZE REFLOW") should be crated as ucs
statics and passed in that way, eliminating the need for conversion.
That's my starting list. This so far looks to be worth 2%-4% performance gain
for typical busy pages, such as cnn.com.
Comments or questions?
Keywords: perf
Reporter | ||
Comment 2•22 years ago
|
||
Note, I am particularly interested, at the moment, in Linux performance, where
NS_LITERAL_STRING is not compiled away, but rather, results in runtime
NS_ConvertASCIItoUTF16(). I would also like to reduce the number of these are
repeated as well.
![]() |
||
Comment 3•22 years ago
|
||
Why do we even have this NotifyReflowObservers method? I can't seem to find any
observers for that topic in the tree....
![]() |
||
Comment 4•22 years ago
|
||
Oh, and NS_LITERAL_STRING is "compiled away" with GCC 3.x. It's just GCC 2.9x
that does the NS_ConvertASCIItoUTF16.
Reporter | ||
Comment 5•22 years ago
|
||
Thanks Boris for the quick feedback.
On Reflow, I dunno, I could not find the use for them either, but the
scope of all-of-reflow is beyond my brain.
On gcc versions, do we all not care about gcc 2.96 ?? Real question,
not rhetorical, I don't know. I thought there might be concerns/issues
about binary compatibility, and plugins, derivitives, and etc????
Some platforms have problems with static constructors and destructors, so we try
not to use them (if that's what you meant by "a static ucs").
In most cases, comparison that has a conversion within it really shouldn't be
much slower than comparison without a conversion within it, since comparisons
usually fail within the first few characters. We need some work on string
comparison APIs, though. (I'm not sure what plans already exist.)
Also, NS_LITERAL_STRING is compiled away on not only gcc 3.x but any of the
newer RedHat gcc 2.96 packages. I'm not particularly interested in optimizing
performance for the bad NS_LITERAL_STRING case, unless there are a very small
number of callsites that can make a significant improvement.
Does anyone depend on PresShell::NotifyReflowObservers, or can it just be removed?
Reporter | ||
Comment 7•22 years ago
|
||
Question: is code of this form okay?
static CONST NS_NAMED_LITERAL_STRING(mFooBar, "FooBar");
I thought these kind of static initializers were 'A Bad Thing',
but find them used heavily, for example,
http://lxr.mozilla.org/seamonkey/source/extensions/webservices/soap/src
/nsDefaultSOAPEncoder.cpp
http://lxr.mozilla.org/seamonkey/source/content/xul/document/src/nsXULC
ontentSink.cpp#1116
http://lxr.mozilla.org/seamonkey/source/htmlparser/src/nsHTMLTokens.cpp
#590
so now I am confused.. Okay to do this?
Also, it sounds like at least part of what this bug is reporting is that the
equivalent of an EqualsIgnoreCase that takes an nsASingleFragmentCString& would
be useful, since it could do a length check first.
I think the initializers in comment 7 may be the cause of our portability
problems on OpenBSD. Function-scope statics with destructors are also
problematic on Linux if we ever do library unloading. So the first two examples
above are things that I would prefer not to imitate.
Reporter | ||
Comment 10•22 years ago
|
||
David, yes, agree, you edge onto a couple things that are pertinent, but maybe
I was trying to avoid.
Changed/new string library calls would be useful here. Problem I worry about
is potential long debate about thawing strings and adding or changing function
there. In addition to what you said, other things could be better. Right now,
EqualsIgnoreCase comparison flows into general compare (less, equal, or
greater), and does more work than simpler 'equal or not-equal' requires. Also
in strings, today, IgnoreCase compares do a ToLower of both of the strings. In
most cases (text literal in the code), the compare string is known already all
lower case, so there's no need to ToLower that side of the compare, and so on.
Second, not clear to me why/where the ref'd EqualIgnoreCase calls should be
doing 'ignore case' comparisons. I think XUL related stuff is spec'd to be
case sensitive. Also where the strings are being internally-generated (i.e
they come from other code, not from an input document) then the case of the
string is known. Case conversion ahould not be needed in a lot of cases.
Problem here is, regardless of spec or whatever, where code today accepts
ignore-case, then 'fixing it' to not ignore-case might break things.
Because these issues get into things more complex, I was just trying to stay
in more simple, safe areas of the problem.
I also agree compare code now is NOT horrible. Only reason there is perf
potential is because its very heavily used function. Saving a few microseconds
is small, but saving that tens of thousands of times adds up.
Reporter | ||
Comment 11•22 years ago
|
||
Not sure it is a useful thing to whine about... doing it anyway... its a shame
to have limited use of static and/or static const because of compiler bugs on
a rather low-importance platform... Its not just a coding inconvience issue.
In some cases, the compiler generated code is dramaticly improved when data
involved is known by compiler to be static and/or const. (lacking this, the
compiler usually constantly reloads data's address and then then reload
contents of that address). Can dramaticly expand the code in a tight code
loop. Okay, anyway, got that off my chest.... now, off I go to do code to deal
with it...
There's nothing wrong with using statics if you handle the contstruction and
destruction manually, and there's nothing wrong with using const.
Reporter | ||
Comment 13•22 years ago
|
||
Okay, another way to improve this. Its probably the most fast and least changed
code fix. However its pretty ugly source code wise. Approach is, only call
EqualsIgnoreCase when strings are equal length. Compare lengths in calling-code
before calling EqualsIgnoreCase.
Example:
Another function to add to my list above is mozilla/gfx/src/nsFonts.cpp
nsFont::GetGenericID
http://lxr.mozilla.org/seamonkey/source/gfx/src/nsFont.cpp#208
today looks like this:
void nsFont::GetGenericID(const nsString& aGeneric, PRUint8* aID)
{
*aID = kGenericFont_NONE;
if (aGeneric.EqualsIgnoreCase("-moz-fixed")) *aID = GenericFont_moz_fixed;
else if (aGeneric.EqualsIgnoreCase("serif"))*aID = kGenericFont_serif;
else if (aGeneric.EqualsIgnoreCase("sans-serif")) *aID = kGenericFont_sans_serif;
else if (aGeneric.EqualsIgnoreCase("cursive")) *aID = kGenericFont_cursive;
else if (aGeneric.EqualsIgnoreCase("fantasy")) *aID = kGenericFont_fantasy;
else if (aGeneric.EqualsIgnoreCase("monospace")) *aID = kGenericFont_monospace;
}
changed to this:
void nsFont::GetGenericID(const nsString& aGeneric, PRUint8* aID)
{
*aID = kGenericFont_NONE;
uint len = aGeneric.Length();
if (len==10 && aGeneric.EqualsIgnoreCase("-moz-fixed")) *aID =
kGenericFont_moz_fixed;
else if (len==5 && aGeneric.EqualsIgnoreCase("serif")) *aID =
kGenericFont_serif;
else if (len==10 && aGeneric.EqualsIgnoreCase("sans-serif")) *aID =
kGenericFont_sans_serif;
else if (len==7 && aGeneric.EqualsIgnoreCase("cursive")) *aID =
kGenericFont_cursive;
else if (len==7 && aGeneric.EqualsIgnoreCase("fantasy")) *aID =
kGenericFont_fantasy;
else if (len==9 && aGeneric.EqualsIgnoreCase("monospace")) *aID =
kGenericFont_monospace;
}
To be sure, its not very pretty. On the other hand, just the routine above
reduces number of calls to EqualsIgnoreCase by several hundred calls per web
page. By doing this to all the code areas mentioned above, causes about
two-thirds reduction in the number of repeated (non-init-tine) calls to
EqualsIgnoreCase.
(Although ugly, its much easier change than alternatives. Maybe it'd be okay to
use such code for just the worst places????? I think its an easy low-risk patch
like this. The other ways to do it 'as well or better' are big and risky
patches. IMHO)
Comments please.
Filed bug 223012 to deal with the "should xul be case sensitive" issue.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Comment 15•22 years ago
|
||
How about
#define EqualsLiteralIgnoreCase(aString, aLiteral) \
(aString.Length() == sizeof(aLiteral) - 1 && aString.EqualsIgnoreCase(aLiteral))
Comment 16•22 years ago
|
||
(I probably missed out some ()s above)
It's a pity you can't template on the length of an array:
template <class C> inline PRBool EqualsIgnoreCase(C) {
return sizeof(C) - 1 == Length() && EqualsIgnoreCase((const char *)C);
}
Comment 17•22 years ago
|
||
Ok, so stepping through this in gdb does what I expect :-)
We could have something like
EqualsUTF8IgnoreCase(nsAString&, nsACString&)
and then just do
EqualsUTF8IgnoreCase(aSource, NS_LITERAL_CSTRING("FooBar"))
Comment 19•22 years ago
|
||
We have a Layout: Misc Code component for a reason. (Namely, so I don't get
notifications from bugs like these. :-P)
Assignee: other → misc
Component: Layout → Layout: Misc Code
QA Contact: ian → nobody
Updated•16 years ago
|
Assignee: layout.misc-code → nobody
QA Contact: nobody → layout.misc-code
Updated•7 years ago
|
Product: Core → Core Graveyard
Assignee | ||
Updated•7 years ago
|
Component: Layout: Misc Code → Layout
Product: Core Graveyard → Core
Updated•3 years ago
|
Severity: normal → S3
You need to log in
before you can comment on or make changes to this bug.
Description
•