Closed Bug 94580 Opened 23 years ago Closed 23 years ago

speedup JS_strdup() ?

Categories

(Core :: JavaScript Engine, defect)

x86
Windows 2000
defect
Not set
trivial

Tracking

()

VERIFIED FIXED

People

(Reporter: bernard.alleysson, Assigned: bernard.alleysson)

Details

(Keywords: perf)

Attachments

(2 files, 1 obsolete file)

from jsapi.c

1431 JS_PUBLIC_API(char *)
1432 JS_strdup(JSContext *cx, const char *s)
1433 {
1434 char *p = (char *) JS_malloc(cx, strlen(s) + 1);
1435 if (!p)
1436 return NULL;
1437 return strcpy(p, s);
1438 }

I've always thought that 

"
JS_PUBLIC_API(char *)
JS_strdup(JSContext *cx, const char *s)
{
  int (or size_t ?) n = strlen(s) + 1;
  char *p = (char *) JS_malloc(cx, n);
  if (!p)
    return NULL;
  return memcpy(p, s, n);
}
"

would be faster because it uses the fact that we know the string len in advance
 maybe I'm just wrong ?

this code seems ok because it is in fact already used in the JS engine, at least
in js_NewStringCopyZ() in an inlined form

if one will correct this then the same bug applies to NSPR (function PL_strdup)
Keywords: perf
cc'ing Brendan on this one - 
Assignee: rogerl → khanson
Status: UNCONFIRMED → NEW
Ever confirmed: true
I have no idea whether the patch helps or hurts performance.  strcpy can
actually be faster than memcpy because it tests the last-copied char against
'\0', instead of having to maintain a count-down induction variable or check a
pointer against a fencepost.  Without some evidence that this change actually
speeds things up, I question whether this bug is valid.

I'm also curious to see compiled code (optimized) for before and after, on at
least MSVC and modern gcc.

/be
IMHO memcpy is usualy faster (at least on Intel), because it uses 32-bit copy
instructions (rep movsd - I tested it on Watcom 10.x) which automaticaly
decrements counter register (ECX). Other libraries/compilers may use other
processor specific instructions (such as MMX which copies 8 bytes per cycle
which is even faster). Of course the code is not as simple as for strcpy,
because it must cope with rest of copied block (if it does not round up to 4
bytes), so copying of longer strings would benefit more. But maybe I'm missing
something....
Attached patch strdup.cppSplinter Review
OK, so as the reporter of that stupid bug I made some measurments

I'm running Win2K with VC++6.0
I've just attached strdup.cpp I used with my tests

Here are the results

                        MaxSpeed (/O2)          MinSize (/O1)

Slow(strdup_slow())       4350                     2490
Fast(strdup_fast())       3160                     2420


Numbers shown are the average results of 10 run of each test (please note that
the test will eat up to 200Mo of memory, feel free to add free() if you like)

I've compared times with /O2 where strcpy() and memcpy() are inlined with very
simple assembly code against /O1 where strcpy() and memcpy() use some (tricky)
assembly code from the MS runtime library (src are available)

Results:

1) strdup_fast() is always a win (not by much with /O1)

2)inlined functions (/O2 implies /Oi) are slower than sophisticated library
functions

Comments are welcome



Here is the generated code (VC++ with /O2, strlen, strcpy) 
(/O1 version code is useless because the complexity is hidden into library
functions)

1)strdup_fast() (memcpy instead of strcpy)

?strdup_fast@@YAPADPBD@Z PROC NEAR			; strdup_fast, COMDAT

; 7    : {

	push	ebx
	push	esi

; 8    : 	void *s;
; 9    : 	size_t n = strlen(source) + 1;

	mov	esi, DWORD PTR _source$[esp+4]
	push	edi
	mov	edi, esi
	or	ecx, -1
	xor	eax, eax
	repne scasb
	not	ecx
	dec	ecx
	mov	ebx, ecx
	inc	ebx

; 10   : 	if (NULL == (s = malloc(n)))

	push	ebx
	call	_malloc
	add	esp, 4
	test	eax, eax
	jne	SHORT $L52071
	pop	edi
	pop	esi
	pop	ebx

; 13   : }

	ret	0
$L52071:

; 11   : 		return NULL;
; 12   : 	return (char *)memcpy(s, source, n);

	mov	ecx, ebx
	mov	edi, eax
	mov	edx, ecx
	shr	ecx, 2
	rep movsd
	mov	ecx, edx
	and	ecx, 3
	rep movsb
	pop	edi
	pop	esi
	pop	ebx

; 13   : }

2)strdup_slow() (from JS)

?strdup_slow@@YAPADPBD@Z PROC NEAR			; strdup_slow, COMDAT

; 16   : {

	push	esi

; 17   : 	char *s;
; 18   : 	size_t n = strlen(source) + 1;

	mov	esi, DWORD PTR _source$[esp]
	push	edi
	mov	edi, esi
	or	ecx, -1
	xor	eax, eax
	repne scasb
	not	ecx

; 19   : 	if (NULL == (s = (char *)malloc(n)))

	push	ecx
	call	_malloc
	mov	edx, eax
	add	esp, 4
	test	edx, edx
	jne	SHORT $L52079
	pop	edi
	pop	esi

; 22   : }

	ret	0
$L52079:

; 20   : 		return NULL;
; 21   : 	return strcpy(s, source);

	mov	edi, esi
	or	ecx, -1
	xor	eax, eax
	push	ebx
	repne scasb
	not	ecx
	sub	edi, ecx
	mov	esi, edi
	mov	ebx, ecx
	mov	edi, edx
	mov	eax, edi
	shr	ecx, 2
	rep movsd
	mov	ecx, ebx
	pop	ebx
	and	ecx, 3
	rep movsb
	pop	edi
	pop	esi

; 22   : }

memcpy wins on x86/Windows -- thanks.  Probably with GCC-latest too, which would
be good to test.  Anyway, I'm sold.  How about a patch against the JS C source,
for credit and code-review?

/be
Assignee: khanson → balleysson
My results with gcc 2.96-98 -O2:

strdup_slow: 1.309 sec
strdup_fast: 1.268 sec
strdup(3):   1.315 sec (which surprises me quite a bit)
My results with gcc 3.0.1-3 -O2:

strdup_slow: 1.229 sec
strdup_fast: 1.178 sec
strdup(3):   1.280 sec
Looks like a win for me (Athlon processor).  I don't understand why strdup(3)
sucks so much, but hey.

But if I turn off optimizations with gcc3, I get:

strdup_slow: 1.217 sec
strdup_fast: 1.212 sec
strdup(3):   1.316 sec
With gcc 2.96-98 and no optimations, strdup_fast is 1.190.

I don't get it.
so with gcc strdup_fast() is always the fastest
... and strdup() always the slowest !

patch coming...

Attached patch use of memcpy instead of strcpy (obsolete) — Splinter Review
Comment on attachment 51266 [details] [diff] [review]
use of memcpy instead of strcpy

Nit, not something you need to care about unless you're feeling generous:
JS "house style" sorts declarations by first-use order and has an empty line between
decls and statements, usually.

shaver, sr?
/be
Attachment #51266 - Flags: review+
Attachment #51266 - Attachment is obsolete: true
Comment on attachment 51276 [details] [diff] [review]
update after brendan's comment

Thanks!  r=brendan@mozilla.org, upgrade to sr= if you need to (shaver is usually fast to follow up my r=).

/be
Attachment #51276 - Flags: review+
Comment on attachment 51276 [details] [diff] [review]
update after brendan's comment

I'm so predictable.  sr=shaver, thanks for the great research.
Attachment #51276 - Flags: superreview+
Patch checked in:

[~/src/trunk/mozilla/js/src]$ cvs ci -m"Check in patch for bug 94580, thanks to
Bernard Alleysson <balleysson@bigfoot.com> for researching and writing it (r=me,
sr=shaver)." jsapi.c
Checking in jsapi.c;
/cvsroot/mozilla/js/src/jsapi.c,v  <--  jsapi.c
new revision: 3.111; previous revision: 3.110
done

Thanks,

/be
Status: NEW → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
Great fix! Can we have this in PL_strdup (strdup.c) too?
Marking Verified - 
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: