Status

()

Core
JavaScript Engine
--
trivial
VERIFIED FIXED
17 years ago
16 years ago

People

(Reporter: Bernard Alleysson, Assigned: Bernard Alleysson)

Tracking

({perf})

Trunk
x86
Windows 2000
Points:
---

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(2 attachments, 1 obsolete attachment)

(Assignee)

Description

17 years ago
from jsapi.c

1431 JS_PUBLIC_API(char *)
1432 JS_strdup(JSContext *cx, const char *s)
1433 {
1434 char *p = (char *) JS_malloc(cx, strlen(s) + 1);
1435 if (!p)
1436 return NULL;
1437 return strcpy(p, s);
1438 }

I've always thought that 

"
JS_PUBLIC_API(char *)
JS_strdup(JSContext *cx, const char *s)
{
  int (or size_t ?) n = strlen(s) + 1;
  char *p = (char *) JS_malloc(cx, n);
  if (!p)
    return NULL;
  return memcpy(p, s, n);
}
"

would be faster because it uses the fact that we know the string len in advance
 maybe I'm just wrong ?

this code seems ok because it is in fact already used in the JS engine, at least
in js_NewStringCopyZ() in an inlined form

if one will correct this then the same bug applies to NSPR (function PL_strdup)

Updated

17 years ago
Keywords: perf

Comment 1

17 years ago
cc'ing Brendan on this one - 
Assignee: rogerl → khanson
Status: UNCONFIRMED → NEW
Ever confirmed: true
I have no idea whether the patch helps or hurts performance.  strcpy can
actually be faster than memcpy because it tests the last-copied char against
'\0', instead of having to maintain a count-down induction variable or check a
pointer against a fencepost.  Without some evidence that this change actually
speeds things up, I question whether this bug is valid.

I'm also curious to see compiled code (optimized) for before and after, on at
least MSVC and modern gcc.

/be

Comment 3

17 years ago
IMHO memcpy is usualy faster (at least on Intel), because it uses 32-bit copy
instructions (rep movsd - I tested it on Watcom 10.x) which automaticaly
decrements counter register (ECX). Other libraries/compilers may use other
processor specific instructions (such as MMX which copies 8 bytes per cycle
which is even faster). Of course the code is not as simple as for strcpy,
because it must cope with rest of copied block (if it does not round up to 4
bytes), so copying of longer strings would benefit more. But maybe I'm missing
something....
(Assignee)

Comment 4

17 years ago
Created attachment 51230 [details] [diff] [review]
strdup.cpp
(Assignee)

Comment 5

17 years ago
OK, so as the reporter of that stupid bug I made some measurments

I'm running Win2K with VC++6.0
I've just attached strdup.cpp I used with my tests

Here are the results

                        MaxSpeed (/O2)          MinSize (/O1)

Slow(strdup_slow())       4350                     2490
Fast(strdup_fast())       3160                     2420


Numbers shown are the average results of 10 run of each test (please note that
the test will eat up to 200Mo of memory, feel free to add free() if you like)

I've compared times with /O2 where strcpy() and memcpy() are inlined with very
simple assembly code against /O1 where strcpy() and memcpy() use some (tricky)
assembly code from the MS runtime library (src are available)

Results:

1) strdup_fast() is always a win (not by much with /O1)

2)inlined functions (/O2 implies /Oi) are slower than sophisticated library
functions

Comments are welcome



(Assignee)

Comment 6

17 years ago
Here is the generated code (VC++ with /O2, strlen, strcpy) 
(/O1 version code is useless because the complexity is hidden into library
functions)

1)strdup_fast() (memcpy instead of strcpy)

?strdup_fast@@YAPADPBD@Z PROC NEAR			; strdup_fast, COMDAT

; 7    : {

	push	ebx
	push	esi

; 8    : 	void *s;
; 9    : 	size_t n = strlen(source) + 1;

	mov	esi, DWORD PTR _source$[esp+4]
	push	edi
	mov	edi, esi
	or	ecx, -1
	xor	eax, eax
	repne scasb
	not	ecx
	dec	ecx
	mov	ebx, ecx
	inc	ebx

; 10   : 	if (NULL == (s = malloc(n)))

	push	ebx
	call	_malloc
	add	esp, 4
	test	eax, eax
	jne	SHORT $L52071
	pop	edi
	pop	esi
	pop	ebx

; 13   : }

	ret	0
$L52071:

; 11   : 		return NULL;
; 12   : 	return (char *)memcpy(s, source, n);

	mov	ecx, ebx
	mov	edi, eax
	mov	edx, ecx
	shr	ecx, 2
	rep movsd
	mov	ecx, edx
	and	ecx, 3
	rep movsb
	pop	edi
	pop	esi
	pop	ebx

; 13   : }

2)strdup_slow() (from JS)

?strdup_slow@@YAPADPBD@Z PROC NEAR			; strdup_slow, COMDAT

; 16   : {

	push	esi

; 17   : 	char *s;
; 18   : 	size_t n = strlen(source) + 1;

	mov	esi, DWORD PTR _source$[esp]
	push	edi
	mov	edi, esi
	or	ecx, -1
	xor	eax, eax
	repne scasb
	not	ecx

; 19   : 	if (NULL == (s = (char *)malloc(n)))

	push	ecx
	call	_malloc
	mov	edx, eax
	add	esp, 4
	test	edx, edx
	jne	SHORT $L52079
	pop	edi
	pop	esi

; 22   : }

	ret	0
$L52079:

; 20   : 		return NULL;
; 21   : 	return strcpy(s, source);

	mov	edi, esi
	or	ecx, -1
	xor	eax, eax
	push	ebx
	repne scasb
	not	ecx
	sub	edi, ecx
	mov	esi, edi
	mov	ebx, ecx
	mov	edi, edx
	mov	eax, edi
	shr	ecx, 2
	rep movsd
	mov	ecx, ebx
	pop	ebx
	and	ecx, 3
	rep movsb
	pop	edi
	pop	esi

; 22   : }

memcpy wins on x86/Windows -- thanks.  Probably with GCC-latest too, which would
be good to test.  Anyway, I'm sold.  How about a patch against the JS C source,
for credit and code-review?

/be
Assignee: khanson → balleysson
My results with gcc 2.96-98 -O2:

strdup_slow: 1.309 sec
strdup_fast: 1.268 sec
strdup(3):   1.315 sec (which surprises me quite a bit)
My results with gcc 3.0.1-3 -O2:

strdup_slow: 1.229 sec
strdup_fast: 1.178 sec
strdup(3):   1.280 sec
Looks like a win for me (Athlon processor).  I don't understand why strdup(3)
sucks so much, but hey.

But if I turn off optimizations with gcc3, I get:

strdup_slow: 1.217 sec
strdup_fast: 1.212 sec
strdup(3):   1.316 sec
With gcc 2.96-98 and no optimations, strdup_fast is 1.190.

I don't get it.
(Assignee)

Comment 9

17 years ago
so with gcc strdup_fast() is always the fastest
... and strdup() always the slowest !

patch coming...

(Assignee)

Comment 10

17 years ago
Created attachment 51266 [details] [diff] [review]
use of memcpy instead of strcpy
Comment on attachment 51266 [details] [diff] [review]
use of memcpy instead of strcpy

Nit, not something you need to care about unless you're feeling generous:
JS "house style" sorts declarations by first-use order and has an empty line between
decls and statements, usually.

shaver, sr?
/be
Attachment #51266 - Flags: review+
(Assignee)

Comment 12

17 years ago
Created attachment 51276 [details] [diff] [review]
update after brendan's comment

Updated

17 years ago
Attachment #51266 - Attachment is obsolete: true
Comment on attachment 51276 [details] [diff] [review]
update after brendan's comment

Thanks!  r=brendan@mozilla.org, upgrade to sr= if you need to (shaver is usually fast to follow up my r=).

/be
Attachment #51276 - Flags: review+
Comment on attachment 51276 [details] [diff] [review]
update after brendan's comment

I'm so predictable.  sr=shaver, thanks for the great research.
Attachment #51276 - Flags: superreview+
Patch checked in:

[~/src/trunk/mozilla/js/src]$ cvs ci -m"Check in patch for bug 94580, thanks to
Bernard Alleysson <balleysson@bigfoot.com> for researching and writing it (r=me,
sr=shaver)." jsapi.c
Checking in jsapi.c;
/cvsroot/mozilla/js/src/jsapi.c,v  <--  jsapi.c
new revision: 3.111; previous revision: 3.110
done

Thanks,

/be
Status: NEW → RESOLVED
Last Resolved: 17 years ago
Resolution: --- → FIXED

Comment 16

17 years ago
Great fix! Can we have this in PL_strdup (strdup.c) too?

Comment 17

17 years ago
Marking Verified - 
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.