jdunn pointed out that it was not optimal (speedwise) to do separate calls out of XPTC_InvokeByIndex to count the stack words needed and then to copy the words. We can instead be pessimistic about the space required rather than being exact - and then just clean up the extra space on the way out. I've experimented with this and discovered that this can reduce almost half of the extra call overhead added by XPTC_InvokeByIndex (where call overhead is measured as the difference betwen calling though XPTC_InvokeByIndex and calling the target object directly). With a test I'll attach I see something like this: (old code...) Doing 100000000 direct call iterations... Doing 100000000 invoked call iterations... direct took 1.71 seconds invoke took 10.48 seconds So, invoke overhead was ~ 8.77 seconds (~ 84%) (new code...) Doing 100000000 direct call iterations... Doing 100000000 invoked call iterations... direct took 1.69 seconds invoke took 6.65 seconds So, invoke overhead was ~ 4.96 seconds (~ 75%) This varies from run to run on my laptop. But, the above is typical. I doubt this translates into much in the browser. But, every little bit... The patch adds __declspec(naked) to allow more control over the prolog and epilog. I also added a short circuit path for cases where the method takes no params. This worksforme in debug and release builds. I'll give it a try under Purify and run with it for a while.
Why not just eliminate the use of i - for(PRUint32 i = paramCount; i > 0; i--, d++, s++) + for(;paramCount > 0; paramCount--, d++, s++) Under VC++ it generates the same code, not sure about other compilers. Might generate better code under some compilers.
Comment on attachment 59071 [details] [diff] [review] proposed change sr=shaver. (I don't know the assembly or Win32 calling conventions well enough to r= confidently.)
dbradley: sure, why not. That change is in my tree. (Note that we don't compile this code on other compilers :) Can you bless this with an r=?
Comment on attachment 59071 [details] [diff] [review] proposed change Everything looks fine. I double checked the stack frame setup against the compiler, just to make sure. There's so many ways to setup the frame on Intel.