Open Bug 221026 Opened 22 years ago Updated 3 years ago

reduce repeated string conversions

Categories

(Core :: Layout, defect)

x86
All
defect

Tracking

()

People

(Reporter: emrick, Unassigned)

Details

(Keywords: perf)

Attachments

(1 file)

User-Agent: Mozilla/5.0(compatible;MSIE 5.5;Windows 98) Build Identifier: In several areas of layout, strings are repeatedly converted from ascii to ucs, in either comparison or assignment. Performance can be improved by storing the ucs string and doing ucs-to-ucs comparisons and assignments. The purpose of this bug will be to produce a patch that implements improvements in this area. The focus of this is only to reduce the mainline, frequently repeated, string conversion operations. Reproducible: Always Steps to Reproduce:
My initial investigation has led me to the following areas of code producing many/most of the repeated conversions. Often, the conversions are a form of this.EqualsIgnoreCase("FooBar"); Where it would be more efficient to create a static ucs with value 'FooBar', and then, do something more like this: this.Equals(mFooBar,nsCaseInsensitiveComparitor) This can be much faster because, as a string object, most cases simply the length will/can be compared to determine inequality. With a non-object string literal, the compare is character-by charater. Here is a list of code areas proposed to be improved. (This list is not in a particular order). 1. nsDomEvent::nsDomEvent http://lxr.mozilla.org/seamonkey/source/content/events/src/nsDOMEvent.cpp#150 Here, there is a additional kind of fix. In the big if/else clause here, the most frequent kind of if (eventType.EqualsIgnoreCase("xxxx")) is the eventType of "Events". A check for "Events" should be made as the first if in the set of if/else clauses. 2. nsEventListenerManager::CreateEvent http://lxr.mozilla.org/seamonkey/source/content/events/src/nsEventListenerManager.cpp#1536 Again here, "Events" is the most common case and should be moved to first in list. 3.nsMenuFrame::IsSizedToPopup http://lxr.mozilla.org/seamonkey/source/layout/xul/base/src/nsMenuFrame.cpp#946 the checks for "always" and "pref" are many times repeated. These strings should be static ucs string objects, comparison made against those. 4.CSSParserImpl::ParseAttributeSelector http://lxr.mozilla.org/seamonkey/source/content/html/style/src/nsCSSParser.cpp#1901 More than one area of opportunity here, the most important one is 2065 while ((htmlAttr = caseSensitiveHTMLAttribute[i++])) { 2066 if (attr.EqualsIgnoreCase(htmlAttr)) { running through the list of seven attributes. These should be static ucs objects rather than local ascii char*, to reduce repeated creation and conversion. 5.Slightly different flavor, same kinda problem: PresShell::NotifyReflowObservers http://lxr.mozilla.org/seamonkey/source/layout/html/base/src/nsPresShell.cpp#2699 takes aData as parm and always does NS_ConvertASCIItoUCS2(aData).get(). The strings used as input to this routine ("RESIZE REFLOW") should be crated as ucs statics and passed in that way, eliminating the need for conversion. That's my starting list. This so far looks to be worth 2%-4% performance gain for typical busy pages, such as cnn.com. Comments or questions?
Keywords: perf
Note, I am particularly interested, at the moment, in Linux performance, where NS_LITERAL_STRING is not compiled away, but rather, results in runtime NS_ConvertASCIItoUTF16(). I would also like to reduce the number of these are repeated as well.
Why do we even have this NotifyReflowObservers method? I can't seem to find any observers for that topic in the tree....
Oh, and NS_LITERAL_STRING is "compiled away" with GCC 3.x. It's just GCC 2.9x that does the NS_ConvertASCIItoUTF16.
Thanks Boris for the quick feedback. On Reflow, I dunno, I could not find the use for them either, but the scope of all-of-reflow is beyond my brain. On gcc versions, do we all not care about gcc 2.96 ?? Real question, not rhetorical, I don't know. I thought there might be concerns/issues about binary compatibility, and plugins, derivitives, and etc????
Some platforms have problems with static constructors and destructors, so we try not to use them (if that's what you meant by "a static ucs"). In most cases, comparison that has a conversion within it really shouldn't be much slower than comparison without a conversion within it, since comparisons usually fail within the first few characters. We need some work on string comparison APIs, though. (I'm not sure what plans already exist.) Also, NS_LITERAL_STRING is compiled away on not only gcc 3.x but any of the newer RedHat gcc 2.96 packages. I'm not particularly interested in optimizing performance for the bad NS_LITERAL_STRING case, unless there are a very small number of callsites that can make a significant improvement. Does anyone depend on PresShell::NotifyReflowObservers, or can it just be removed?
Question: is code of this form okay? static CONST NS_NAMED_LITERAL_STRING(mFooBar, "FooBar"); I thought these kind of static initializers were 'A Bad Thing', but find them used heavily, for example, http://lxr.mozilla.org/seamonkey/source/extensions/webservices/soap/src /nsDefaultSOAPEncoder.cpp http://lxr.mozilla.org/seamonkey/source/content/xul/document/src/nsXULC ontentSink.cpp#1116 http://lxr.mozilla.org/seamonkey/source/htmlparser/src/nsHTMLTokens.cpp #590 so now I am confused.. Okay to do this?
Also, it sounds like at least part of what this bug is reporting is that the equivalent of an EqualsIgnoreCase that takes an nsASingleFragmentCString& would be useful, since it could do a length check first.
I think the initializers in comment 7 may be the cause of our portability problems on OpenBSD. Function-scope statics with destructors are also problematic on Linux if we ever do library unloading. So the first two examples above are things that I would prefer not to imitate.
David, yes, agree, you edge onto a couple things that are pertinent, but maybe I was trying to avoid. Changed/new string library calls would be useful here. Problem I worry about is potential long debate about thawing strings and adding or changing function there. In addition to what you said, other things could be better. Right now, EqualsIgnoreCase comparison flows into general compare (less, equal, or greater), and does more work than simpler 'equal or not-equal' requires. Also in strings, today, IgnoreCase compares do a ToLower of both of the strings. In most cases (text literal in the code), the compare string is known already all lower case, so there's no need to ToLower that side of the compare, and so on. Second, not clear to me why/where the ref'd EqualIgnoreCase calls should be doing 'ignore case' comparisons. I think XUL related stuff is spec'd to be case sensitive. Also where the strings are being internally-generated (i.e they come from other code, not from an input document) then the case of the string is known. Case conversion ahould not be needed in a lot of cases. Problem here is, regardless of spec or whatever, where code today accepts ignore-case, then 'fixing it' to not ignore-case might break things. Because these issues get into things more complex, I was just trying to stay in more simple, safe areas of the problem. I also agree compare code now is NOT horrible. Only reason there is perf potential is because its very heavily used function. Saving a few microseconds is small, but saving that tens of thousands of times adds up.
Not sure it is a useful thing to whine about... doing it anyway... its a shame to have limited use of static and/or static const because of compiler bugs on a rather low-importance platform... Its not just a coding inconvience issue. In some cases, the compiler generated code is dramaticly improved when data involved is known by compiler to be static and/or const. (lacking this, the compiler usually constantly reloads data's address and then then reload contents of that address). Can dramaticly expand the code in a tight code loop. Okay, anyway, got that off my chest.... now, off I go to do code to deal with it...
There's nothing wrong with using statics if you handle the contstruction and destruction manually, and there's nothing wrong with using const.
Okay, another way to improve this. Its probably the most fast and least changed code fix. However its pretty ugly source code wise. Approach is, only call EqualsIgnoreCase when strings are equal length. Compare lengths in calling-code before calling EqualsIgnoreCase. Example: Another function to add to my list above is mozilla/gfx/src/nsFonts.cpp nsFont::GetGenericID http://lxr.mozilla.org/seamonkey/source/gfx/src/nsFont.cpp#208 today looks like this: void nsFont::GetGenericID(const nsString& aGeneric, PRUint8* aID) { *aID = kGenericFont_NONE; if (aGeneric.EqualsIgnoreCase("-moz-fixed")) *aID = GenericFont_moz_fixed; else if (aGeneric.EqualsIgnoreCase("serif"))*aID = kGenericFont_serif; else if (aGeneric.EqualsIgnoreCase("sans-serif")) *aID = kGenericFont_sans_serif; else if (aGeneric.EqualsIgnoreCase("cursive")) *aID = kGenericFont_cursive; else if (aGeneric.EqualsIgnoreCase("fantasy")) *aID = kGenericFont_fantasy; else if (aGeneric.EqualsIgnoreCase("monospace")) *aID = kGenericFont_monospace; } changed to this: void nsFont::GetGenericID(const nsString& aGeneric, PRUint8* aID) { *aID = kGenericFont_NONE; uint len = aGeneric.Length(); if (len==10 && aGeneric.EqualsIgnoreCase("-moz-fixed")) *aID = kGenericFont_moz_fixed; else if (len==5 && aGeneric.EqualsIgnoreCase("serif")) *aID = kGenericFont_serif; else if (len==10 && aGeneric.EqualsIgnoreCase("sans-serif")) *aID = kGenericFont_sans_serif; else if (len==7 && aGeneric.EqualsIgnoreCase("cursive")) *aID = kGenericFont_cursive; else if (len==7 && aGeneric.EqualsIgnoreCase("fantasy")) *aID = kGenericFont_fantasy; else if (len==9 && aGeneric.EqualsIgnoreCase("monospace")) *aID = kGenericFont_monospace; } To be sure, its not very pretty. On the other hand, just the routine above reduces number of calls to EqualsIgnoreCase by several hundred calls per web page. By doing this to all the code areas mentioned above, causes about two-thirds reduction in the number of repeated (non-init-tine) calls to EqualsIgnoreCase. (Although ugly, its much easier change than alternatives. Maybe it'd be okay to use such code for just the worst places????? I think its an easy low-risk patch like this. The other ways to do it 'as well or better' are big and risky patches. IMHO) Comments please.
Filed bug 223012 to deal with the "should xul be case sensitive" issue.
Status: UNCONFIRMED → NEW
Ever confirmed: true
How about #define EqualsLiteralIgnoreCase(aString, aLiteral) \ (aString.Length() == sizeof(aLiteral) - 1 && aString.EqualsIgnoreCase(aLiteral))
(I probably missed out some ()s above) It's a pity you can't template on the length of an array: template <class C> inline PRBool EqualsIgnoreCase(C) { return sizeof(C) - 1 == Length() && EqualsIgnoreCase((const char *)C); }
Attached file Works, sort of...
Ok, so stepping through this in gdb does what I expect :-)
We could have something like EqualsUTF8IgnoreCase(nsAString&, nsACString&) and then just do EqualsUTF8IgnoreCase(aSource, NS_LITERAL_CSTRING("FooBar"))
We have a Layout: Misc Code component for a reason. (Namely, so I don't get notifications from bugs like these. :-P)
Assignee: other → misc
Component: Layout → Layout: Misc Code
QA Contact: ian → nobody
Assignee: layout.misc-code → nobody
QA Contact: nobody → layout.misc-code
Product: Core → Core Graveyard
Component: Layout: Misc Code → Layout
Product: Core Graveyard → Core
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: