win32/xptcinvoke.cpp should be streamlined

RESOLVED FIXED

Status

()

Core
XPCOM
RESOLVED FIXED
16 years ago
16 years ago

People

(Reporter: John Bandhauer, Assigned: John Bandhauer)

Tracking

Trunk
x86
Windows 2000
Points:
---

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment)

(Assignee)

Description

16 years ago
jdunn pointed out that it was not optimal (speedwise) to do separate calls out
of XPTC_InvokeByIndex to count the stack words needed and then to copy the
words. We can instead be pessimistic about the space required rather than being
exact - and then just clean up the extra space on the way out.

I've experimented with this and discovered that this can reduce almost half of
the extra call overhead added by XPTC_InvokeByIndex (where call overhead is
measured as the difference betwen calling though XPTC_InvokeByIndex and calling
the target object directly).

With a test I'll attach I see something like this:

(old code...)
Doing 100000000 direct call iterations...
Doing 100000000 invoked call iterations...
 direct took 1.71 seconds
 invoke took 10.48 seconds
 So, invoke overhead was ~ 8.77 seconds (~ 84%)

(new code...)
Doing 100000000 direct call iterations...
Doing 100000000 invoked call iterations...
 direct took 1.69 seconds
 invoke took 6.65 seconds
 So, invoke overhead was ~ 4.96 seconds (~ 75%)

This varies from run to run on my laptop. But, the above is typical. I doubt
this translates into much in the browser. But, every little bit...

The patch adds __declspec(naked) to allow more control over the prolog and
epilog. I also added a short circuit path for cases where the method takes no
params.
This worksforme in debug and release builds. I'll give it a try under Purify and
run with it for a while.
(Assignee)

Comment 1

16 years ago
Created attachment 59071 [details] [diff] [review]
proposed change

Comment 2

16 years ago
Why not just eliminate the use of i

-    for(PRUint32 i = paramCount; i > 0; i--, d++, s++)
+    for(;paramCount > 0; paramCount--, d++, s++)

Under VC++ it generates the same code, not sure about other compilers. Might
generate better code under some compilers.
Comment on attachment 59071 [details] [diff] [review]
proposed change

sr=shaver.  (I don't know the assembly or Win32 calling conventions well 
enough to r= confidently.)
Attachment #59071 - Flags: superreview+
(Assignee)

Comment 4

16 years ago
dbradley: sure, why not. That change is in my tree. (Note that we don't compile
this code on other compilers :)

Can you bless this with an r=?

Comment 5

16 years ago
Comment on attachment 59071 [details] [diff] [review]
proposed change

Everything looks fine. I double checked the stack frame setup against the
compiler, just to make sure. There's so many ways to setup the frame on Intel.
Attachment #59071 - Flags: review+
(Assignee)

Comment 6

16 years ago
checked in.
Status: NEW → RESOLVED
Last Resolved: 16 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.