Closed
Bug 94580
Opened 23 years ago
Closed 23 years ago
speedup JS_strdup() ?
Categories
(Core :: JavaScript Engine, defect)
Tracking
()
VERIFIED
FIXED
People
(Reporter: bernard.alleysson, Assigned: bernard.alleysson)
Details
(Keywords: perf)
Attachments
(2 files, 1 obsolete file)
2.76 KB,
patch
|
Details | Diff | Splinter Review | |
628 bytes,
patch
|
brendan
:
review+
shaver
:
superreview+
|
Details | Diff | Splinter Review |
from jsapi.c 1431 JS_PUBLIC_API(char *) 1432 JS_strdup(JSContext *cx, const char *s) 1433 { 1434 char *p = (char *) JS_malloc(cx, strlen(s) + 1); 1435 if (!p) 1436 return NULL; 1437 return strcpy(p, s); 1438 } I've always thought that " JS_PUBLIC_API(char *) JS_strdup(JSContext *cx, const char *s) { int (or size_t ?) n = strlen(s) + 1; char *p = (char *) JS_malloc(cx, n); if (!p) return NULL; return memcpy(p, s, n); } " would be faster because it uses the fact that we know the string len in advance maybe I'm just wrong ? this code seems ok because it is in fact already used in the JS engine, at least in js_NewStringCopyZ() in an inlined form if one will correct this then the same bug applies to NSPR (function PL_strdup)
Comment 1•23 years ago
|
||
cc'ing Brendan on this one -
Assignee: rogerl → khanson
Status: UNCONFIRMED → NEW
Ever confirmed: true
Comment 2•23 years ago
|
||
I have no idea whether the patch helps or hurts performance. strcpy can actually be faster than memcpy because it tests the last-copied char against '\0', instead of having to maintain a count-down induction variable or check a pointer against a fencepost. Without some evidence that this change actually speeds things up, I question whether this bug is valid. I'm also curious to see compiled code (optimized) for before and after, on at least MSVC and modern gcc. /be
Comment 3•23 years ago
|
||
IMHO memcpy is usualy faster (at least on Intel), because it uses 32-bit copy instructions (rep movsd - I tested it on Watcom 10.x) which automaticaly decrements counter register (ECX). Other libraries/compilers may use other processor specific instructions (such as MMX which copies 8 bytes per cycle which is even faster). Of course the code is not as simple as for strcpy, because it must cope with rest of copied block (if it does not round up to 4 bytes), so copying of longer strings would benefit more. But maybe I'm missing something....
Assignee | ||
Comment 4•23 years ago
|
||
Assignee | ||
Comment 5•23 years ago
|
||
OK, so as the reporter of that stupid bug I made some measurments I'm running Win2K with VC++6.0 I've just attached strdup.cpp I used with my tests Here are the results MaxSpeed (/O2) MinSize (/O1) Slow(strdup_slow()) 4350 2490 Fast(strdup_fast()) 3160 2420 Numbers shown are the average results of 10 run of each test (please note that the test will eat up to 200Mo of memory, feel free to add free() if you like) I've compared times with /O2 where strcpy() and memcpy() are inlined with very simple assembly code against /O1 where strcpy() and memcpy() use some (tricky) assembly code from the MS runtime library (src are available) Results: 1) strdup_fast() is always a win (not by much with /O1) 2)inlined functions (/O2 implies /Oi) are slower than sophisticated library functions Comments are welcome
Assignee | ||
Comment 6•23 years ago
|
||
Here is the generated code (VC++ with /O2, strlen, strcpy) (/O1 version code is useless because the complexity is hidden into library functions) 1)strdup_fast() (memcpy instead of strcpy) ?strdup_fast@@YAPADPBD@Z PROC NEAR ; strdup_fast, COMDAT ; 7 : { push ebx push esi ; 8 : void *s; ; 9 : size_t n = strlen(source) + 1; mov esi, DWORD PTR _source$[esp+4] push edi mov edi, esi or ecx, -1 xor eax, eax repne scasb not ecx dec ecx mov ebx, ecx inc ebx ; 10 : if (NULL == (s = malloc(n))) push ebx call _malloc add esp, 4 test eax, eax jne SHORT $L52071 pop edi pop esi pop ebx ; 13 : } ret 0 $L52071: ; 11 : return NULL; ; 12 : return (char *)memcpy(s, source, n); mov ecx, ebx mov edi, eax mov edx, ecx shr ecx, 2 rep movsd mov ecx, edx and ecx, 3 rep movsb pop edi pop esi pop ebx ; 13 : } 2)strdup_slow() (from JS) ?strdup_slow@@YAPADPBD@Z PROC NEAR ; strdup_slow, COMDAT ; 16 : { push esi ; 17 : char *s; ; 18 : size_t n = strlen(source) + 1; mov esi, DWORD PTR _source$[esp] push edi mov edi, esi or ecx, -1 xor eax, eax repne scasb not ecx ; 19 : if (NULL == (s = (char *)malloc(n))) push ecx call _malloc mov edx, eax add esp, 4 test edx, edx jne SHORT $L52079 pop edi pop esi ; 22 : } ret 0 $L52079: ; 20 : return NULL; ; 21 : return strcpy(s, source); mov edi, esi or ecx, -1 xor eax, eax push ebx repne scasb not ecx sub edi, ecx mov esi, edi mov ebx, ecx mov edi, edx mov eax, edi shr ecx, 2 rep movsd mov ecx, ebx pop ebx and ecx, 3 rep movsb pop edi pop esi ; 22 : }
Comment 7•23 years ago
|
||
memcpy wins on x86/Windows -- thanks. Probably with GCC-latest too, which would be good to test. Anyway, I'm sold. How about a patch against the JS C source, for credit and code-review? /be
Assignee: khanson → balleysson
My results with gcc 2.96-98 -O2: strdup_slow: 1.309 sec strdup_fast: 1.268 sec strdup(3): 1.315 sec (which surprises me quite a bit) My results with gcc 3.0.1-3 -O2: strdup_slow: 1.229 sec strdup_fast: 1.178 sec strdup(3): 1.280 sec Looks like a win for me (Athlon processor). I don't understand why strdup(3) sucks so much, but hey. But if I turn off optimizations with gcc3, I get: strdup_slow: 1.217 sec strdup_fast: 1.212 sec strdup(3): 1.316 sec With gcc 2.96-98 and no optimations, strdup_fast is 1.190. I don't get it.
Assignee | ||
Comment 9•23 years ago
|
||
so with gcc strdup_fast() is always the fastest ... and strdup() always the slowest ! patch coming...
Assignee | ||
Comment 10•23 years ago
|
||
Comment 11•23 years ago
|
||
Comment on attachment 51266 [details] [diff] [review] use of memcpy instead of strcpy Nit, not something you need to care about unless you're feeling generous: JS "house style" sorts declarations by first-use order and has an empty line between decls and statements, usually. shaver, sr? /be
Attachment #51266 -
Flags: review+
Assignee | ||
Comment 12•23 years ago
|
||
Updated•23 years ago
|
Attachment #51266 -
Attachment is obsolete: true
Comment 13•23 years ago
|
||
Comment on attachment 51276 [details] [diff] [review] update after brendan's comment Thanks! r=brendan@mozilla.org, upgrade to sr= if you need to (shaver is usually fast to follow up my r=). /be
Attachment #51276 -
Flags: review+
Comment on attachment 51276 [details] [diff] [review] update after brendan's comment I'm so predictable. sr=shaver, thanks for the great research.
Attachment #51276 -
Flags: superreview+
Comment 15•23 years ago
|
||
Patch checked in: [~/src/trunk/mozilla/js/src]$ cvs ci -m"Check in patch for bug 94580, thanks to Bernard Alleysson <balleysson@bigfoot.com> for researching and writing it (r=me, sr=shaver)." jsapi.c Checking in jsapi.c; /cvsroot/mozilla/js/src/jsapi.c,v <-- jsapi.c new revision: 3.111; previous revision: 3.110 done Thanks, /be
Status: NEW → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
Comment 16•23 years ago
|
||
Great fix! Can we have this in PL_strdup (strdup.c) too?
You need to log in
before you can comment on or make changes to this bug.
Description
•