Closed Bug 94580 Opened 23 years ago Closed 23 years ago

speedup JS_strdup() ?

Tracking

()

Status:

VERIFIED FIXED

People

(Reporter: bernard.alleysson, Assigned: bernard.alleysson)

Details

(Keywords: perf)

Attachments

(2 files, 1 obsolete file)

strdup.cpp 23 years ago Bernard Alleysson 2.76 KB, patch		Details \| Diff \| Splinter Review
use of memcpy instead of strcpy 23 years ago Bernard Alleysson 625 bytes, patch	brendan : review+	Details \| Diff \| Splinter Review
update after brendan's comment 23 years ago Bernard Alleysson 628 bytes, patch	brendan : review+ shaver : superreview+	Details \| Diff \| Splinter Review

Bernard Alleysson

Assignee

Description

•

23 years ago

from jsapi.c

1431 JS_PUBLIC_API(char *)
1432 JS_strdup(JSContext *cx, const char *s)
1433 {
1434 char *p = (char *) JS_malloc(cx, strlen(s) + 1);
1435 if (!p)
1436 return NULL;
1437 return strcpy(p, s);
1438 }

I've always thought that 

"
JS_PUBLIC_API(char *)
JS_strdup(JSContext *cx, const char *s)
{
  int (or size_t ?) n = strlen(s) + 1;
  char *p = (char *) JS_malloc(cx, n);
  if (!p)
    return NULL;
  return memcpy(p, s, n);
}
"

would be faster because it uses the fact that we know the string len in advance
 maybe I'm just wrong ?

this code seems ok because it is in fact already used in the JS engine, at least
in js_NewStringCopyZ() in an inlined form

if one will correct this then the same bug applies to NSPR (function PL_strdup)

Jarrod Gray

Updated

•

23 years ago

Keywords: perf

Phil Schwartau

Comment 1

•

23 years ago

cc'ing Brendan on this one -

Assignee: rogerl → khanson

Status: UNCONFIRMED → NEW

Ever confirmed: true

Brendan Eich [:brendan]

Comment 2

•

23 years ago

I have no idea whether the patch helps or hurts performance.  strcpy can
actually be faster than memcpy because it tests the last-copied char against
'\0', instead of having to maintain a count-down induction variable or check a
pointer against a fencepost.  Without some evidence that this change actually
speeds things up, I question whether this bug is valid.

I'm also curious to see compiled code (optimized) for before and after, on at
least MSVC and modern gcc.

/be

Petr Kures

Comment 3

•

23 years ago

IMHO memcpy is usualy faster (at least on Intel), because it uses 32-bit copy
instructions (rep movsd - I tested it on Watcom 10.x) which automaticaly
decrements counter register (ECX). Other libraries/compilers may use other
processor specific instructions (such as MMX which copies 8 bytes per cycle
which is even faster). Of course the code is not as simple as for strcpy,
because it must cope with rest of copied block (if it does not round up to 4
bytes), so copying of longer strings would benefit more. But maybe I'm missing
something....

Bernard Alleysson

Assignee

Comment 4

•

23 years ago

Attached patch strdup.cpp — Details — Splinter Review

Bernard Alleysson

Assignee

Comment 5

•

23 years ago

OK, so as the reporter of that stupid bug I made some measurments

I'm running Win2K with VC++6.0
I've just attached strdup.cpp I used with my tests

Here are the results

                        MaxSpeed (/O2)          MinSize (/O1)

Slow(strdup_slow())       4350                     2490
Fast(strdup_fast())       3160                     2420


Numbers shown are the average results of 10 run of each test (please note that
the test will eat up to 200Mo of memory, feel free to add free() if you like)

I've compared times with /O2 where strcpy() and memcpy() are inlined with very
simple assembly code against /O1 where strcpy() and memcpy() use some (tricky)
assembly code from the MS runtime library (src are available)

Results:

1) strdup_fast() is always a win (not by much with /O1)

2)inlined functions (/O2 implies /Oi) are slower than sophisticated library
functions

Comments are welcome

Bernard Alleysson

Assignee

Comment 6

•

23 years ago

Here is the generated code (VC++ with /O2, strlen, strcpy) 
(/O1 version code is useless because the complexity is hidden into library
functions)

1)strdup_fast() (memcpy instead of strcpy)

?strdup_fast@@YAPADPBD@Z PROC NEAR			; strdup_fast, COMDAT

; 7    : {

	push	ebx
	push	esi

; 8    : 	void *s;
; 9    : 	size_t n = strlen(source) + 1;

	mov	esi, DWORD PTR _source$[esp+4]
	push	edi
	mov	edi, esi
	or	ecx, -1
	xor	eax, eax
	repne scasb
	not	ecx
	dec	ecx
	mov	ebx, ecx
	inc	ebx

; 10   : 	if (NULL == (s = malloc(n)))

	push	ebx
	call	_malloc
	add	esp, 4
	test	eax, eax
	jne	SHORT $L52071
	pop	edi
	pop	esi
	pop	ebx

; 13   : }

	ret	0
$L52071:

; 11   : 		return NULL;
; 12   : 	return (char *)memcpy(s, source, n);

	mov	ecx, ebx
	mov	edi, eax
	mov	edx, ecx
	shr	ecx, 2
	rep movsd
	mov	ecx, edx
	and	ecx, 3
	rep movsb
	pop	edi
	pop	esi
	pop	ebx

; 13   : }

2)strdup_slow() (from JS)

?strdup_slow@@YAPADPBD@Z PROC NEAR			; strdup_slow, COMDAT

; 16   : {

	push	esi

; 17   : 	char *s;
; 18   : 	size_t n = strlen(source) + 1;

	mov	esi, DWORD PTR _source$[esp]
	push	edi
	mov	edi, esi
	or	ecx, -1
	xor	eax, eax
	repne scasb
	not	ecx

; 19   : 	if (NULL == (s = (char *)malloc(n)))

	push	ecx
	call	_malloc
	mov	edx, eax
	add	esp, 4
	test	edx, edx
	jne	SHORT $L52079
	pop	edi
	pop	esi

; 22   : }

	ret	0
$L52079:

; 20   : 		return NULL;
; 21   : 	return strcpy(s, source);

	mov	edi, esi
	or	ecx, -1
	xor	eax, eax
	push	ebx
	repne scasb
	not	ecx
	sub	edi, ecx
	mov	esi, edi
	mov	ebx, ecx
	mov	edi, edx
	mov	eax, edi
	shr	ecx, 2
	rep movsd
	mov	ecx, ebx
	pop	ebx
	and	ecx, 3
	rep movsb
	pop	edi
	pop	esi

; 22   : }

Brendan Eich [:brendan]

Comment 7

•

23 years ago

memcpy wins on x86/Windows -- thanks.  Probably with GCC-latest too, which would
be good to test.  Anyway, I'm sold.  How about a patch against the JS C source,
for credit and code-review?

/be

Assignee: khanson → balleysson

Mike Shaver (:shaver -- probably not reading bugmail closely)

Comment 8

•

23 years ago

My results with gcc 2.96-98 -O2:

strdup_slow: 1.309 sec
strdup_fast: 1.268 sec
strdup(3):   1.315 sec (which surprises me quite a bit)
My results with gcc 3.0.1-3 -O2:

strdup_slow: 1.229 sec
strdup_fast: 1.178 sec
strdup(3):   1.280 sec
Looks like a win for me (Athlon processor).  I don't understand why strdup(3)
sucks so much, but hey.

But if I turn off optimizations with gcc3, I get:

strdup_slow: 1.217 sec
strdup_fast: 1.212 sec
strdup(3):   1.316 sec
With gcc 2.96-98 and no optimations, strdup_fast is 1.190.

I don't get it.

Bernard Alleysson

Assignee

Comment 9

•

23 years ago

so with gcc strdup_fast() is always the fastest
... and strdup() always the slowest !

patch coming...

Bernard Alleysson

Assignee

Comment 10

•

23 years ago

Attached patch use of memcpy instead of strcpy (obsolete) — Details — Splinter Review

Brendan Eich [:brendan]

Comment 11

•

23 years ago

Comment on attachment 51266 [details] [diff] [review]
use of memcpy instead of strcpy

Nit, not something you need to care about unless you're feeling generous:
JS "house style" sorts declarations by first-use order and has an empty line between
decls and statements, usually.

shaver, sr?
/be

Attachment #51266 - Flags: review+

Bernard Alleysson

Assignee

Comment 12

•

23 years ago

Attached patch update after brendan's comment — Details — Splinter Review

Brendan Eich [:brendan]

Updated

•

23 years ago

Attachment #51266 - Attachment is obsolete: true

Brendan Eich [:brendan]

Comment 13

•

23 years ago

Comment on attachment 51276 [details] [diff] [review]
update after brendan's comment

Thanks!  r=brendan@mozilla.org, upgrade to sr= if you need to (shaver is usually fast to follow up my r=).

/be

Attachment #51276 - Flags: review+

Mike Shaver (:shaver -- probably not reading bugmail closely)

Comment 14

•

23 years ago

Comment on attachment 51276 [details] [diff] [review]
update after brendan's comment

I'm so predictable.  sr=shaver, thanks for the great research.

Attachment #51276 - Flags: superreview+

Brendan Eich [:brendan]

Comment 15

•

23 years ago

Patch checked in:

[~/src/trunk/mozilla/js/src]$ cvs ci -m"Check in patch for bug 94580, thanks to
Bernard Alleysson <balleysson@bigfoot.com> for researching and writing it (r=me,
sr=shaver)." jsapi.c
Checking in jsapi.c;
/cvsroot/mozilla/js/src/jsapi.c,v  <--  jsapi.c
new revision: 3.111; previous revision: 3.110
done

Thanks,

/be

Status: NEW → RESOLVED

Closed: 23 years ago

Resolution: --- → FIXED

Arthur

Comment 16

•

23 years ago

Great fix! Can we have this in PL_strdup (strdup.c) too?

Phil Schwartau

Comment 17

•

23 years ago

Marking Verified -

Status: RESOLVED → VERIFIED

You need to log in before you can comment on or make changes to this bug.