Closed
Bug 276366
Opened 20 years ago
Closed 8 years ago
need alloca() support in configure.in
Categories
(Firefox Build System :: General, enhancement)
Firefox Build System
General
Tracking
(Not tracked)
RESOLVED
INCOMPLETE
People
(Reporter: tim.ruehsen, Unassigned)
Details
(Keywords: perf)
Attachments
(2 files)
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.5) Gecko/20041219 Firefox/1.0 (Debian package 1.0+dfsg.1-1)
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.5) Gecko/20041219 Firefox/1.0 (Debian package 1.0+dfsg.1-1)
We have *many* functions which use malloc() (e.g. in the form of PR_Malloc())
for temporary buffers. malloc() is (very) slow in comparison with alloca(),
implies the possibility of memory leaks, crashes (if freed twice) and memory
fragmentation.
So, why not using alloca() on systems that support it (at least most gcc/glibc
systems)?
Here is a typical (bit stupid) example of malloc usage, found in many routines:
buf = PR_MALLOC(strlen(s1)+strlen(s2)+1);
if (buf) {
strcpy(buf,s1);
strcat(buf,s2);
some_function(buf);
PR_FREE(buf);
}
or
buf = PR_smprintf("%s%s",s1,s2);
if (buf) {
some_function(buf);
PR_FREE(buf);
}
with alloca(), not nicer but faster:
#ifdef HAVE_ALLOCA
buf = alloca(strlen(s1)+strlen(s2)+1);
if (buf) {
strcpy(buf,s1);
strcat(buf,s2);
some_function(buf);
}
#else
... malloc code...
#endif
For this, we need to change the configure script. It is just these two lines to
be inserted into configure.in:
AC_CHECK_HEADERS(alloca.h)
AC_FUNC_ALLOCA
Reproducible: Always
Comment 1•20 years ago
|
||
of course the other possibility for such code would be nsCAutoString/nsPrintfCString...
| Reporter | ||
Comment 2•20 years ago
|
||
-alloca() is available to C as well -construction/destruction of C++ classes is slow (just to mention, but please - the discussion is beyond the scope of this report) -nsPrintfCString malloc()s as soon as the resulting buffer is bigger than 15 bytes I can't find the class definition and implementation for nsAutoString - please give a pointer.
Comment 3•20 years ago
|
||
http://lxr.mozilla.org/seamonkey/source/xpcom/string/public/nsTString.h#492 (Available both as nsCAutoString and nsAutoString)
the last thing we need is a way for people to confuse yet another allocator. let's not.
| Reporter | ||
Comment 5•20 years ago
|
||
I was a bit confused yesterday (from C++ sources looking like C sources). We do not need alloca() for C++ since constructs like char buf[strlen(s1)+strlen(s2)+1]; are allowed by C++ definition. So, we are just talking about C sources. The above construct is not a problem for 'modern' C compilers like gcc, but still it is an extension to the C standard. alloca() does the same, but is also available outside the variable definition scope. alloca() is worth using for tuning C programms. It is not meant as 'another allocator' for general purpose. But it should be available for developers who are working on tuning. Anyway, we already can use it: #ifdef __GNUC__ use alloca() #else use malloc()/free() #endif But I would not claim this as save. HAVE_ALLOCA and HAVE_ALLOCA_H would be more reliable. So, why not adding 2 lines to configure.in?
Comment 6•20 years ago
|
||
> char buf[strlen(s1)+strlen(s2)+1];
>are allowed by C++ definition.
no they are not. C99 allows variable-length arrays, and GNU C++ allows them, but
standard C++ doesn't, and compilers like MSVC reject them.
Comment 7•20 years ago
|
||
(see for example http://www.redhat.com/docs/manuals/enterprise/RHEL-3-Manual/gcc/variable-length.html - "Variable-length automatic arrays are allowed in ISO C99, and as an extension GCC accepts them in C89 mode and in C++.")
| Reporter | ||
Comment 8•20 years ago
|
||
thx for enlightenment. these are good reasons to support alloca() via configure.in.
| Reporter | ||
Comment 9•20 years ago
|
||
Just a last word before I go home...
I made a quick and dirty speed comparison between nsCAutoString and alloca():
each 10 million times:
a)
void test(char *s1, char *s2)
{
nsCAutoString as(s1);
as+=s2;
}
b)
void test(char *s1, char *s2)
{
int len1=strlen(s1), len2=strlen(s2);
char *buf;
buf=(char *)PR_Malloc(len1+len2+1);
if (buf) {
strcpy(buf,s1);
strcpy(buf+len1,s2);
PR_Free(buf);
}
}
c)
void test(char *s1, char *s2)
{
int len1=strlen(s1), len2=strlen(s2);
char *buf;
buf=(char *)alloca(len1+len2+1);
if (buf) {
strcpy(buf,s1);
strcpy(buf+len1,s2);
}
}
a) 4.10 seconds
b) 2.25 seconds
c) 1.32 seconds
Comment 10•20 years ago
|
||
nsAutoString is sometimes used as a member variable in classes that are intended to be instantiated on the stack. Could such use cases be a concern? Moreover, right now when nsAutoString overflows to the heap, it allocates a sharable buffer. If alloca is used instead of malloc, then I don't think it will be possible to share the buffer with any arbitrary nsSubstring.
| Reporter | ||
Comment 11•20 years ago
|
||
Darin, i am not shure that we are talking about the same thing. Maybe you can be more precise. What exactly is 'sharable'(I can just guess what you are talking about, but i am not 100% shure)? Can you give a code example to demonstrate your point? Don't think alloca() is ment to be a replacement for malloc()! My examples should show you in which cases it might worth using it...
Comment 12•20 years ago
|
||
Tim:
Take this for example:
nsAutoString a;
// make |a|'s buffer contain a copy of "foopy", which happens to fit within
// |a|'s stack buffer, so no heap allocation is required.
a.Assign("foopy", 5);
// make |a|'s buffer contain an allocated copy of the first 1024 elements of
// the character array pointed to by |blah|.
a.Assign(blah, 1024);
nsString b;
b = a;
PRBool sameBuffer = b.get() == a.get();
I'm saying that |sameBuffer| would be true given that whenever our string code
implements copy-on-write buffer sharing. In other words, whenever it is
necessary to allocate a string buffer from the heap, an additional few bytes are
allocated at the front of the string buffer to store a reference count (and a
few other things).
So, I'm not sure how alloca would be useful for nsAutoString.
Comment 13•20 years ago
|
||
alloca's pretty righteous if you just want fast LIFO storage of variable size. I know of cases where nsAutoString or nsCAutoString is used today that would benefit from it, although it's hard to say the perf gain in real apps would be measurable. Maybe we need an unshared (thread-local) auto-string template type? /be
| Reporter | ||
Comment 14•20 years ago
|
||
Darin: I did not talk about using alloca() withing nsAutoString. First of all, we should discuss if we allow using alloca() at all. If we come to the point where we say 'yes', we can discuss using alloca() within nsAutoString with all it's implications within a second bug/request. I just say this because i am afraid that the original request is getting out of sight. Brendan: (real app perf measurement) This is a question for later... once we say 'yes' to alloca() we have the possibility to optimize code that seems to be 'slow' (e.g. loading the mysql documentation from hard disk takes much too long in firefox... and this is definitely an issue with string parsing and copying). (unshared auto-string template) this is also an issue that we should discuss after we said 'yes' to alloca(). After all, you see i am absolutely convinced about using alloca(). I used for several year in many threaded applications. Especially in heavily used daemons. Not only speed but also to lower (sometimes even avoid) memory fragmentation. I have lot's of mozilla source code here that needs to be fixed. And once I touch it, I would like to make a complete review/rewrite. In several cases I would like to introduce alloca() (and besides, some other functions like PR_asprintf and PL_strlcpy - but thats another bug report...)
Comment 15•20 years ago
|
||
(note that NSPR uses its own configure script, in nsprpub/, which would also need the ALLOCA checks if you want to use it there)
Comment 16•20 years ago
|
||
> a) 4.10 seconds
> b) 2.25 seconds
> c) 1.32 seconds
wow... i wouldn't have expected there to be so much difference between case a
and case b. perhaps we should investigate that testcase further because it may
reveal a great opportunity to improve performance.
anyways, in response to your previous comment, i understand that this bug is
just about adding configure magic for alloca. i was just responding to the
claim that alloca might be useful as an allocator for one or more of our string
types. i'm concerned that that may be difficult to implement given the way the
string classes are used today.
as for alloca alone, it would be nice to have some idea of where it may be used
in mozilla code. do you have some mozilla profiles that reveal significant
cases that could be optimized with some #ifdef HAVE_ALLOCA fun?
also, you should know that mozilla is mostly a single threaded app. it is true
that there are some background threads, but they probably account for no more
than 1% of the CPU time spent inside the mozilla process for typical page loads.
Comment 17•20 years ago
|
||
Of course, case a was slower. It should be written like this instead:
void test(char *s1, char *s2)
{
nsCAutoString as;
as = nsDependentCString(s1) + nsDependentCString(s2);
}
If the size of s1 and s2 combined fits within as's stack buffer (64 bytes), then
I would expect this testcase to perform similarly to case c. Otherwise, it
would perform like case b. There is extra overhead in this testcase though
because of the ctor/dtor's for nsCAutoString and the nsDependentCString objects.
A better testcase would take two nsCString objects as parameters or somehow
mitigate the overhead of those wrapper classes since that overhead would skew
the results.| Reporter | ||
Comment 18•20 years ago
|
||
I will make a test d) at the beginning of next week (than i am back in the office). Of course the overhead of ctors/dtors is relative big when using small strings. With larger strings - i assume - the relative difference between a),b) and c) will be smaller. I try to find some time to make graphs of time consumed as a f(strlen(s1)+strlen(s2)) for a)-d). That should give us an exact overview of when using alloca is reasonable. Happy new year!
| Reporter | ||
Comment 19•20 years ago
|
||
Here is the promised graphic.
Test case e) (just to make it complete):
void test_e(char *s1, char *s2)
{
char buf[256];
PRUint32 len1, len2;
len1=PL_strlcpy(buf,s1,sizeof(buf));
if (len1>=sizeof(buf))
test_c(s1,s2);
else {
len2=PL_strlcpy(buf+len1,s2,sizeof(buf));
if (len2>=sizeof(buf)-len1)
test_c(s1,s2);
}
}
The function PL_strlcpy() has been filed as bug #276845.
Before actually starting the test, it tried to fragment the malloc a bit. Just
to build up a 'real' environment. Here is the code:
static void *mem[1024];
for (it=0;it<1024;it++) mem[it]=0;
for (it=0;it<10000;it++) {
r=rand()%1024;
if (mem[r]) { PR_Free(mem); mem[r]=0; }
else mem[r]=PR_Malloc(rand()%1024);
}
I wonder about getting 4763 lines of error (all lines the same):
free(): invalid pointer 0x804d280!
Any explanation?
But however, I ignored this and made a measurement like this
for (len=0;len<=127;len++) {
memset(s,'a',len);
s[len]=0;
t1=getmicros();
for (it=0;it<1000000;it++) {
test_c(s,s);
}
printf("%d %lld\n",len*2,(getmicros()-t1)/1000);
}
#include <sys/time.h>
#include <time.h>
long long getmicros()
{
struct timeval tv;
gettimeofday(&tv,NULL);
return tv.tv_sec*1000000+tv.tv_usec;
}
Just one conclusion: c++ ctors/dtors are a big overhead. This is not compiler
dependant but a design decision (or flaw) of the c++ inventor(s). There had
been a good articel about c++ vs. java in a german computer magazin (c't). It
showed and proved that ctors/dtors in java are faster than in c++ (by design).
If we want fast routines we should always think about using (temporary)
classes. Sometimes we should avoid them even if the code does not look
'objectoriented' or 'c++ish' any more.
OK, here is my next question: how long do i have to wait for alloca()? ;-)
Comment 20•20 years ago
|
||
Tim: this is very interesting. can you please provide complete source code for the various test cases? thanks!
| Reporter | ||
Comment 21•20 years ago
|
||
please do not wonder. I raped 'TestStrings.cpp' from xpcom/tests, so I did not have to hassle with the makefiles or anything. I made output for a-e and put this into a gfx tool, which I wrote a while ago. I can't give you gfx tool, it belongs the company i am working for.
Comment 22•20 years ago
|
||
There is something misleading about these tests since test (a) and test (e) should overlap up until the point where the nsCAutoString in test (a) is forced to allocate from the heap. I think that the overhead of the nsCAutoString API is obscuring the picture here. Yes, it shows that we should reduce the overhead of the nsCAutoString API. A better way to test this would be to modify the code for nsCAutoString and simply change the allocator. For test (e) you could change nsCAutoString to have a 256 byte stack buffer, and for test (c) you could change nsCAutoString's allocator to use alloca instead of malloc. Then, you would be doing a fair comparison.
Comment 23•20 years ago
|
||
what options do you compile with? may I ask which c't edition contains that comparison? this interests me...
| Reporter | ||
Comment 24•20 years ago
|
||
I compile 'non-optimized', set in .mozconfig. But notice, that I do not claim the results are 'water-proof'. I just wanted to show a tendency and make the developers aware of dtor/ctor implications. Hmmm... after all I just want to use alloca for further patches within C and C++ sources.
| Reporter | ||
Comment 25•20 years ago
|
||
The articles are in c't 19/03 and 21/03.
Comment 26•20 years ago
|
||
--enable-debug build disable inlining. I do not think that such benchmarks can show usable results...
| Reporter | ||
Comment 27•20 years ago
|
||
this is the same with statistics: never trust them if you didn't fake them ;-) before we go into too many iterations: just make the comparisons with your own favorite settings (the source code is attached). i assume that the result will differ in quantity but not in quality. the next iteration will be with optimization on ;-)
Comment 28•20 years ago
|
||
Tim: it's pointless to compare allocators in the raw vs. the allocators used by nsAutoString internally. either hack nsAutoString to use different allocators and test that or don't include nsAutoString in the comparisons. more importantly: i can see that alloca might be useful in some cases, but what are those cases? have you collected a list of places in the mozilla codebase where you would make use of alloca?
| Reporter | ||
Comment 29•20 years ago
|
||
biesi: I completely recompiled the mozilla source without debug and made the test again, no gfx, i just compared the numbers roughly: a) and d) are now slightly slower (???) b) ist slightly faster c) stays the same but maybe something is wrong with 'make clean' - sometimes i experienced compiling problems (then I had to: 'make clean','./configure','make clean') Here is how TestStrings compiled: TestStrings.cpp c++ -o TestStrings.o -c -DOSTYPE=\"Linux2.6\" -DOSARCH=\"Linux\" -I./../ds -I./services -I../../dist/include/string -I../../dist/include/xpcom -I../../dist/include -I/usr/oms/src/mozilla/mozilla/dist/include/nspr -I/usr/X11R6/include -fPIC -I/usr/X11R6/include -fno-rtti -fno-exceptions -Wall -Wconversion -Wpointer-arith -Wcast-align -Woverloaded-virtual -Wsynth -Wno-ctor-dtor-privacy -Wno-non-virtual-dtor -Wno-long-long -pedantic -fshort-wchar -pthread -pipe -DNDEBUG -DTRIMMED -O -I/usr/X11R6/include -DMOZILLA_CLIENT -include ../../mozilla-config.h -Wp,-MD,.deps/TestStrings.pp TestStrings.cpp TestStrings.cpp: In function `int main(int, char**)': TestStrings.cpp:697: warning: unused variable `char*buf' c++ -I/usr/X11R6/include -fno-rtti -fno-exceptions -Wall -Wconversion -Wpointer-arith -Wcast-align -Woverloaded-virtual -Wsynth -Wno-ctor-dtor-privacy -Wno-non-virtual-dtor -Wno-long-long -pedantic -fshort-wchar -pthread -pipe -DNDEBUG -DTRIMMED -O -o TestStrings TestStrings.o -L../../dist/bin -L../../dist/lib -L../../dist/bin -lxpcom -lxpcom_core -L/usr/oms/src/mozilla/mozilla/dist/lib -lplds4 -lplc4 -lnspr4 -lpthread -ldl -ldl -lm
| Reporter | ||
Comment 30•20 years ago
|
||
Darin, it was never my intention to make nsCAutoString obsolete. The only reason why I made the comparisons were comments #1 and #4. Maybe the comparison was not useless and helps investigating for a faster nsCAutoString (but this is definitely not my job). Just one example of needless nsCAutoString usage: http://lxr.mozilla.org/mozilla1.7/source/browser/components/migration/src/nsOperaProfileMigrator.cpp#503 (lines 503-505 could be replaced by *aResult = atoi(val); The comparison shows that strlen+alloca+strcpy is faster than strlen+malloc+strcpy+free. Much faster for short strings. Ok, back to alloca. http://lxr.mozilla.org/mozilla1.7/source/dbm/src/nsres.c#284 (either use alloca instead of malloc or use a C99 variable-length array. Question: do we still support C89 C compilers, if yes why and which?) http://lxr.mozilla.org/mozilla1.7/source/embedding/browser/activex/src/control/ActiveScriptSite.cpp#172 (same here) http://lxr.mozilla.org/mozilla1.7/source/expat/xmlwf/readfilemap.c#54 (same here) http://lxr.mozilla.org/mozilla1.7/source/mailnews/local/src/nsMailboxService.cpp#97 (same here) http://lxr.mozilla.org/mozilla1.7/source/mailnews/addrbook/src/nsVCardObj.cpp#588 (an example of implicit malloc. can be replaced by alloca+strcpy) http://lxr.mozilla.org/mozilla1.7/source/mailnews/extensions/mdn/src/nsMsgMdnGenerator.cpp#473 (ughh... nothing to do with alloca but I stumbled over this ****. maybe the sourcecode should be completely rewritten) http://lxr.mozilla.org/mozilla1.7/source/mailnews/local/src/nsPop3Protocol.cpp#2688 (could be replaced by a static char buffer) http://lxr.mozilla.org/mozilla1.7/source/mailnews/mime/cthandlers/vcard/mimevcrd.cpp#575 (good example for alloca. since we have several lines doing memory allocation but just one line for freeing)
Comment 31•20 years ago
|
||
OK, that list is interesting. Did you generate that list by inspecting the code or did you use some profiling tool to determine that those are hot spots worth fixing?
| Reporter | ||
Comment 32•20 years ago
|
||
I manually search for hot spots, which are easy to find (typical misuse of e.g. strcpy, strcat, strncpy, strncat, sprintf, malloc for temp buffers etc.). When compiling the mozilla project i also see lots of warnings (gcc 3.3), some of them might be dangerous like 'use of uninitialized variable'. Also, did you know that gcc is able to check the format string against the argument types for printf like functions? I can not see that this feature is used within the mozilla code. I will create a bug with patches for PR_smprintf and similar functions.
Updated•20 years ago
|
Version: unspecified → 1.7 Branch
Updated•20 years ago
|
Version: 1.7 Branch → Trunk
Comment 33•19 years ago
|
||
This is an automated message, with ID "auto-resolve01". This bug has had no comments for a long time. Statistically, we have found that bug reports that have not been confirmed by a second user after three months are highly unlikely to be the source of a fix to the code. While your input is very important to us, our resources are limited and so we are asking for your help in focussing our efforts. If you can still reproduce this problem in the latest version of the product (see below for how to obtain a copy) or, for feature requests, if it's not present in the latest version and you still believe we should implement it, please visit the URL of this bug (given at the top of this mail) and add a comment to that effect, giving more reproduction information if you have it. If it is not a problem any longer, you need take no action. If this bug is not changed in any way in the next two weeks, it will be automatically resolved. Thank you for your help in this matter. The latest beta releases can be obtained from: Firefox: http://www.mozilla.org/projects/firefox/ Thunderbird: http://www.mozilla.org/products/thunderbird/releases/1.5beta1.html Seamonkey: http://www.mozilla.org/projects/seamonkey/
Updated•19 years ago
|
Status: UNCONFIRMED → NEW
Ever confirmed: true
Product: Mozilla Application Suite → Core
Comment 34•8 years ago
|
||
I'm going to guess that memory changes in Gecko in the 12 years since this bug was updated have addressed matters.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → INCOMPLETE
Updated•7 years ago
|
Product: Core → Firefox Build System
You need to log in
before you can comment on or make changes to this bug.
Description
•