Closed Bug 196506 Opened 21 years ago Closed 21 years ago

Excessive inlining in string libs: Substring()

Categories

(Core :: XPCOM, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: dwitte, Assigned: jag+mozilla)

References

Details

Attachments

(1 file, 3 obsolete files)

So during my rewrite of cookie code, alecf noticed that one patch in particular
had increased codesize by ~4k. This was determined to be due to converting
cookie code to use the string classes instead of char*. We kinda let that one
lie, but I've just finished another patch for cookies which increases codesize
by ~3k again. After digging around, mvl found the culprit... Substring().

It turns out that Substring() is inlined. Substring() calls either
nsDependent{C}Substring or nsDependentSingleFragment{C}Substring, whose ctors
are themselves inlined. Which call other ctors/functions that are inlined. This
would be fine if the resulting functions were small, but it turns out that each
instance of any of the above three functions are 200-300bytes in opt builds!

nsDependent{C}Substring: ~200 bytes per instance
nsDependentSingleFragment{C}Substring: ~300 bytes per instance
Substring: either of the above, depending what type is handed to it

We did some further testing and found that un-inlining the two ctors mentioned
above, results in a ~57k codesize reduction. mvl is unable to reliably measure
any impact on Ts/Tp/Txul (which jag recommended measuring), due to too much
fluctuation. However, a simple test on a 1,000,000-iteration |while| loop that
calls Substring() on an nsString and an nsCString, showed:

non-inline is 0.6% slower than inline (with a standard deviation of ~0.3%,
averaged over 6 runs). Details: platform is linux, compiler is gcc 3.2.2 debian
prerelease, compiled with -o2.

This can be expected to make even less difference in "real" code, and could even
result in a speedup due to less icache blowage etc, but that's purely
speculation... my point is that we shouldn't blindly expect unlining to reduce
performance, without testing it out.

I'm attaching a patch (used to get that 57k reduction figure) that uninlines all
four ctors mentioned above. Since it's been difficult to get reliable Ts/Tp/Txul
figures on dev boxes, it might be worth checking this into the tree and letting
the tboxes get a few build cycles in, just to see what difference it makes. What
do you guys think?
Attached patch uninlining patch (obsolete) — Splinter Review
Blocks: 188224
Comment on attachment 116670 [details] [diff] [review]
uninlining patch

it'd be nice if the lines tried to honor 80cols :)
Attachment #116670 - Flags: review+
Checked in for dwitte. We'll watch tinderbox to check for issues.
Status: NEW → RESOLVED
Closed: 21 years ago
Resolution: --- → FIXED
We have issues.

bbaetz kindly checked in during the wee hours of Monday morning, to see what
impact the patch will have across the platforms. However, it killed all the clbr
builds (dep was fine), due to a linker error...

embedstring.lib(nsDependentSubstring.obj) : error LNK2001: unresolved external
symbol "unsigned int __cdecl Distance(class nsReadingIterator<char> const
&,class nsReadingIterator<char> const &)"
(?Distance@@YAIABV?$nsReadingIterator@D@@0@Z)

no idea what's going on here, perhaps we need to add nsStringIteratorUtils.h or
nsReadableUtils.h to the string embedding makefile, string/embed/Makefile.in?

BTW, we got a few good tbox results:
a) from linux comet dep: -78k codesize, *no change* in Txul/Ts (if anything,
Txul dropped by an amount slightly outside the experimental error, but this
could be noise)...
b) from MacOSX monkey dep: no change in Tp (within exp error), perhaps a slight
increase in Ts (~1%?)
c) WINNT beast dep: no change in Txul, inconclusive data for Ts (fluctuates all
over the place). interestingly, codesize was +755b on this platform (->0 after
stripping?), so i guess this compiler already doesn't inline those functions.
reopening
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
nsEmbedString is a very small implementation of the AString and ACString.  If
you un-inline stuff, you may have to make a copy of that method in string/embed/.
ah, okay. I'm not very familiar with string/embed - can you explain why it broke
there (if you know)? What would I need to change - just add something to a
makefile, or copy-paste the function impl's completely?
Basically, string/embed has a small implementation of the AString and ACString.
 It compiles the bare minimun source files in string/* to implement these two
classes.  Now, if you change an inline method in a header file file, the embed
string implementation of this method may no longer be available to the library
or the consumer of the library.  Therefore, you might have to copy this
no-longer-inline method into the string/embed library.
Attached patch fix embedstring (obsolete) — Splinter Review
Patch which includes copying Distance() into nsEmbedString.cpp
I am not sure if it is the right thing to do. Maybe it would be better to
create a new file?
Comment on attachment 116784 [details] [diff] [review]
fix embedstring

this actually seems like the right thing to do, as long as you add a comment
explaining where the code came from. sr=alecf as long as the r= is from dougt
Attachment #116784 - Flags: superreview+
I just had another thought about this - is there any way we could just stop
inling Substring()? If substring (and not EVERY use of nsDependentString) is the
cause of the bloat, then it would make sense to make Substring() a non-inline
call, and then nsDependentString would be inlined inside of that. 

That said, if this causes no noticable performance degredation, then we should
surely stop inlining nsDependentString, as the patch does.
Attached patch Include comment (obsolete) — Splinter Review
Now with comment. Carrying forward sr=
Attachment #116784 - Attachment is obsolete: true
Attachment #116795 - Flags: superreview+
Attachment #116795 - Flags: review?(dougt)
Initially I thought it'd be nice to uninline all of them, for consistency
(they're the same functions really...), but I was hoping you guys'd have a
comment on that.

Looks like we're getting no degradation (luna even showed some slight
improvements), so doesn't look like it's a big deal. If the iterator-based ctors
are small (the length-based ones aren't), maybe we could leave those as inline.
Attachment #116670 - Attachment is obsolete: true
Attachment #116795 - Flags: review?(dougt) → review+
Speaking of inlining, has anyone tried to inline ns{C}AutoString constructors?
That might be a perf win.
I don't think inlining those constructors would be a win -- I think there's
enough inside of them that we can't inline the whole thing, and, given that,
we'd be better off leaving them non-inline and perhaps inlining or simplifying
the things inside of them (see, e.g., bug 188828).
okay, this caused bustage again - even though it was tested on linux. This is
strange.

the link error looks like libembedstring and libstring are picking up multiple
definitions of Distance() now, instead of not enough (as previously). they're
finding them in nsReadableUtils.o - looks like we need some Makefile changes to
go along with this patch.

embedstring_s.lib(nsEmbedString.obj) : error LNK2005: "unsigned int __cdecl
Distance(class nsReadingIterator<unsigned short> const &,class
nsReadingIterator<unsigned short> const &)"
(?Distance@@YAIABV?$nsReadingIterator@G@@0@Z) already defined in
string_s.lib(nsReadableUtils.obj)
embedstring_s.lib(nsEmbedString.obj) : error LNK2005: "unsigned int __cdecl
Distance(class nsReadingIterator<char> const &,class nsReadingIterator<char>
const &)" (?Distance@@YAIABV?$nsReadingIterator@D@@0@Z) already defined in
string_s.lib(nsReadableUtils.obj)
embedstring_s.lib(nsEmbedString.obj) : warning LNK4006: "unsigned int __cdecl
Distance(class nsReadingIterator<unsigned short> const &,class
nsReadingIterator<unsigned short> const &)"
(?Distance@@YAIABV?$nsReadingIterator@G@@0@Z) already defined in
string_s.lib(nsReadableUtils.obj); second definition ignored
embedstring_s.lib(nsEmbedString.obj) : warning LNK4006: "unsigned int __cdecl
Distance(class nsReadingIterator<char> const &,class nsReadingIterator<char>
const &)" (?Distance@@YAIABV?$nsReadingIterator@D@@0@Z) already defined in
string_s.lib(nsReadableUtils.obj); second definition ignored
   Creating library mozilla.lib and object mozilla.exp
mozilla.exe : fatal error LNK1169: one or more multiply defined symbols found

dougt, is there any preferred way you'd like us to fix this?
Attached patch new patchSplinter Review
after talking with dbaron, we agreed that all that a solution was to mask out
the the CalculateLength, Distance_Impl, and Distance(*) methods from the
non-standalone version of the embedstring.  To do this, I added a new build
flag STRING_STANDALONE which is included defined in the string/embed/standalone
makefile.  If this define is true, the CalculateLength, Distance_Impl, and
Distance(*) methods are compilied into the lib, otherwise they are not.
Attachment #116795 - Attachment is obsolete: true
Comment on attachment 117648 [details] [diff] [review]
new patch

sounds great to me.
sr=alecf, assuming r=dbaron
Attachment #117648 - Flags: superreview+
Comment on attachment 117648 [details] [diff] [review]
new patch

>Index: embed/Makefile.in
>===================================================================
>RCS file: /cvsroot/mozilla/string/embed/Makefile.in,v
>retrieving revision 1.1
>diff -u -1 -0 -r1.1 Makefile.in
>--- embed/Makefile.in	14 Jan 2003 03:34:58 -0000	1.1
>+++ embed/Makefile.in	18 Mar 2003 22:53:19 -0000
>@@ -91,11 +91,11 @@
> GARBAGE		+= $(addprefix $(srcdir)/,$(STRING_LCSRCS))
> endif
> 
> SRCS_IN_OBJDIR	= 1
> 
> include $(topsrcdir)/config/rules.mk
> 
> export:: $(STRING_CSRCS)
> 	$(INSTALL) $^ .
> 
>-DEFINES		+= -DXPCOM_GLUE
>+DEFINES		+= -DXPCOM_GLUE 

Better not to add extra whitespace at the end (which is what this change is).

Also, I would slightly prefer that the macro be called EMBEDSTRING_STANDALONE
rather than STRING_STANDALONE, but if you want to leave it that's fine as well.

r=dbaron
Attachment #117648 - Flags: review+
neat, thanks a lot for fixing it up dougt :)

let's hope it was worth it, after all that.

by the way - now that this low-hanging fruit has been picked, do you have any
opinion on optimizing any of the string impls? it just irks me that these beasts
are 300+ bytes to start with, when you can do the same thing in char* for
probably 4.

maybe one day when i'm really bored, i'll start playing in string/ ...
third time lucky, though not without bustage.
Status: REOPENED → RESOLVED
Closed: 21 years ago21 years ago
Resolution: --- → FIXED
Dan Witte wrote:
> third time lucky, though not without bustage

Is anyone looking at the "nebiros" bustage ? AFAIK it misses a small |#include
"nsReadableUtils.h"| to go back to "green" state again...
Component: String → XPCOM
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: