Closed
Bug 94580
Opened 24 years ago
Closed 24 years ago
speedup JS_strdup() ?
Categories
(Core :: JavaScript Engine, defect)
Tracking
()
VERIFIED
FIXED
People
(Reporter: bernard.alleysson, Assigned: bernard.alleysson)
Details
(Keywords: perf)
Attachments
(2 files, 1 obsolete file)
|
2.76 KB,
patch
|
Details | Diff | Splinter Review | |
|
628 bytes,
patch
|
brendan
:
review+
shaver
:
superreview+
|
Details | Diff | Splinter Review |
from jsapi.c
1431 JS_PUBLIC_API(char *)
1432 JS_strdup(JSContext *cx, const char *s)
1433 {
1434 char *p = (char *) JS_malloc(cx, strlen(s) + 1);
1435 if (!p)
1436 return NULL;
1437 return strcpy(p, s);
1438 }
I've always thought that
"
JS_PUBLIC_API(char *)
JS_strdup(JSContext *cx, const char *s)
{
int (or size_t ?) n = strlen(s) + 1;
char *p = (char *) JS_malloc(cx, n);
if (!p)
return NULL;
return memcpy(p, s, n);
}
"
would be faster because it uses the fact that we know the string len in advance
maybe I'm just wrong ?
this code seems ok because it is in fact already used in the JS engine, at least
in js_NewStringCopyZ() in an inlined form
if one will correct this then the same bug applies to NSPR (function PL_strdup)
Comment 1•24 years ago
|
||
cc'ing Brendan on this one -
Assignee: rogerl → khanson
Status: UNCONFIRMED → NEW
Ever confirmed: true
Comment 2•24 years ago
|
||
I have no idea whether the patch helps or hurts performance. strcpy can
actually be faster than memcpy because it tests the last-copied char against
'\0', instead of having to maintain a count-down induction variable or check a
pointer against a fencepost. Without some evidence that this change actually
speeds things up, I question whether this bug is valid.
I'm also curious to see compiled code (optimized) for before and after, on at
least MSVC and modern gcc.
/be
Comment 3•24 years ago
|
||
IMHO memcpy is usualy faster (at least on Intel), because it uses 32-bit copy
instructions (rep movsd - I tested it on Watcom 10.x) which automaticaly
decrements counter register (ECX). Other libraries/compilers may use other
processor specific instructions (such as MMX which copies 8 bytes per cycle
which is even faster). Of course the code is not as simple as for strcpy,
because it must cope with rest of copied block (if it does not round up to 4
bytes), so copying of longer strings would benefit more. But maybe I'm missing
something....
| Assignee | ||
Comment 4•24 years ago
|
||
| Assignee | ||
Comment 5•24 years ago
|
||
OK, so as the reporter of that stupid bug I made some measurments
I'm running Win2K with VC++6.0
I've just attached strdup.cpp I used with my tests
Here are the results
MaxSpeed (/O2) MinSize (/O1)
Slow(strdup_slow()) 4350 2490
Fast(strdup_fast()) 3160 2420
Numbers shown are the average results of 10 run of each test (please note that
the test will eat up to 200Mo of memory, feel free to add free() if you like)
I've compared times with /O2 where strcpy() and memcpy() are inlined with very
simple assembly code against /O1 where strcpy() and memcpy() use some (tricky)
assembly code from the MS runtime library (src are available)
Results:
1) strdup_fast() is always a win (not by much with /O1)
2)inlined functions (/O2 implies /Oi) are slower than sophisticated library
functions
Comments are welcome
| Assignee | ||
Comment 6•24 years ago
|
||
Here is the generated code (VC++ with /O2, strlen, strcpy)
(/O1 version code is useless because the complexity is hidden into library
functions)
1)strdup_fast() (memcpy instead of strcpy)
?strdup_fast@@YAPADPBD@Z PROC NEAR ; strdup_fast, COMDAT
; 7 : {
push ebx
push esi
; 8 : void *s;
; 9 : size_t n = strlen(source) + 1;
mov esi, DWORD PTR _source$[esp+4]
push edi
mov edi, esi
or ecx, -1
xor eax, eax
repne scasb
not ecx
dec ecx
mov ebx, ecx
inc ebx
; 10 : if (NULL == (s = malloc(n)))
push ebx
call _malloc
add esp, 4
test eax, eax
jne SHORT $L52071
pop edi
pop esi
pop ebx
; 13 : }
ret 0
$L52071:
; 11 : return NULL;
; 12 : return (char *)memcpy(s, source, n);
mov ecx, ebx
mov edi, eax
mov edx, ecx
shr ecx, 2
rep movsd
mov ecx, edx
and ecx, 3
rep movsb
pop edi
pop esi
pop ebx
; 13 : }
2)strdup_slow() (from JS)
?strdup_slow@@YAPADPBD@Z PROC NEAR ; strdup_slow, COMDAT
; 16 : {
push esi
; 17 : char *s;
; 18 : size_t n = strlen(source) + 1;
mov esi, DWORD PTR _source$[esp]
push edi
mov edi, esi
or ecx, -1
xor eax, eax
repne scasb
not ecx
; 19 : if (NULL == (s = (char *)malloc(n)))
push ecx
call _malloc
mov edx, eax
add esp, 4
test edx, edx
jne SHORT $L52079
pop edi
pop esi
; 22 : }
ret 0
$L52079:
; 20 : return NULL;
; 21 : return strcpy(s, source);
mov edi, esi
or ecx, -1
xor eax, eax
push ebx
repne scasb
not ecx
sub edi, ecx
mov esi, edi
mov ebx, ecx
mov edi, edx
mov eax, edi
shr ecx, 2
rep movsd
mov ecx, ebx
pop ebx
and ecx, 3
rep movsb
pop edi
pop esi
; 22 : }
Comment 7•24 years ago
|
||
memcpy wins on x86/Windows -- thanks. Probably with GCC-latest too, which would
be good to test. Anyway, I'm sold. How about a patch against the JS C source,
for credit and code-review?
/be
Assignee: khanson → balleysson
Comment 8•24 years ago
|
||
My results with gcc 2.96-98 -O2:
strdup_slow: 1.309 sec
strdup_fast: 1.268 sec
strdup(3): 1.315 sec (which surprises me quite a bit)
My results with gcc 3.0.1-3 -O2:
strdup_slow: 1.229 sec
strdup_fast: 1.178 sec
strdup(3): 1.280 sec
Looks like a win for me (Athlon processor). I don't understand why strdup(3)
sucks so much, but hey.
But if I turn off optimizations with gcc3, I get:
strdup_slow: 1.217 sec
strdup_fast: 1.212 sec
strdup(3): 1.316 sec
With gcc 2.96-98 and no optimations, strdup_fast is 1.190.
I don't get it.
| Assignee | ||
Comment 9•24 years ago
|
||
so with gcc strdup_fast() is always the fastest
... and strdup() always the slowest !
patch coming...
| Assignee | ||
Comment 10•24 years ago
|
||
Comment 11•24 years ago
|
||
Comment on attachment 51266 [details] [diff] [review]
use of memcpy instead of strcpy
Nit, not something you need to care about unless you're feeling generous:
JS "house style" sorts declarations by first-use order and has an empty line between
decls and statements, usually.
shaver, sr?
/be
Attachment #51266 -
Flags: review+
| Assignee | ||
Comment 12•24 years ago
|
||
Updated•24 years ago
|
Attachment #51266 -
Attachment is obsolete: true
Comment 13•24 years ago
|
||
Comment on attachment 51276 [details] [diff] [review]
update after brendan's comment
Thanks! r=brendan@mozilla.org, upgrade to sr= if you need to (shaver is usually fast to follow up my r=).
/be
Attachment #51276 -
Flags: review+
Comment 14•24 years ago
|
||
Comment on attachment 51276 [details] [diff] [review]
update after brendan's comment
I'm so predictable. sr=shaver, thanks for the great research.
Attachment #51276 -
Flags: superreview+
Comment 15•24 years ago
|
||
Patch checked in:
[~/src/trunk/mozilla/js/src]$ cvs ci -m"Check in patch for bug 94580, thanks to
Bernard Alleysson <balleysson@bigfoot.com> for researching and writing it (r=me,
sr=shaver)." jsapi.c
Checking in jsapi.c;
/cvsroot/mozilla/js/src/jsapi.c,v <-- jsapi.c
new revision: 3.111; previous revision: 3.110
done
Thanks,
/be
Status: NEW → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
Comment 16•24 years ago
|
||
Great fix! Can we have this in PL_strdup (strdup.c) too?
You need to log in
before you can comment on or make changes to this bug.
Description
•