Closed Bug 249652 Opened 21 years ago Closed 21 years ago

Use registers for passing parameters on gcc/x86

Categories

(SeaMonkey :: General, defect)

x86
Linux
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: bryner, Assigned: bryner)

Details

(Keywords: perf)

Attachments

(1 file, 1 obsolete file)

gcc has an attribute 'regparm' on x86 that allows using up to 3 registers to pass arguments, rather than pushing them on the stack. It nearly always results in smaller and faster code. (MSVC does this for the |this| pointer in C++ by default). Unfortunately, we can't flip this globally since there's no way for it to know that system library functions don't use this calling convention (the headers aren't annotated with regparm(0)). However, since this is a per-function-call overhead we can get the biggest wins by using regparm on the most frequently called functions. I compiled such a list using Quantify on Windows, and then applied regparm to the top 10 or so non-inlined functions that showed up. The result is about a 150KB reduction in code size and about a 6% Tp win, using gcc 3.3.3. (gcc 3.3.3 fixed a bug in applying attributes to destructors, which is huge because nsCOMPtr_base::~nsCOMPtr base is perhaps the single most frequently called function). Purely looking at code size, any function which has many call sites would be great to change to regparm. That probably has a fair amount of overlap with the most frequently called functions but there could be some that I missed. We have to be careful about using this in cases where we need to preserve binary compatibility; it also can't be used on scriptable methods unless we hack xptcall so that it can know the calling convention to use (we'd have to support both unless we decided to break binary compatibility across the board).
Attached patch patch (obsolete) — Splinter Review
In case anyone is wondering about nsRuleNode::GetStyleData, it ends up not being inlined at -Os. Also, I duplicated the #ifdefs in jsdhash/pldhash because I didn't want to introduce new header dependencies... if this went into NSPR I guess that wouldn't be needed. Darin, I tried applying this to nsTString.h but for some reason that actually increased code size. Worked well on nsTAString.h though.
Attachment #152229 - Flags: superreview?(darin)
Attachment #152229 - Flags: review?(dbaron)
1) I suggest a broader macro, say NS_FASTCALL, because you can a) do things like "__attribute__ ((regparm(3),stdcall))" and b) the macro name isn't tied to a particular attribute. Maybe have a stdcall macro. Regparm isn't always a perfirmance win. 2) I think you want a NS_CDECL macro so you can force the compiler to compile the xptcall stuff correctly. A long time ago I played around with regparm for xptcall and I didn't see any speed improvement. You have to save the function arguments somewhere so you wind up putting them on the stack anyway.
(In reply to comment #2) > 1) I suggest a broader macro, say NS_FASTCALL, because you can a) do things like > "__attribute__ ((regparm(3),stdcall))" and b) the macro name isn't tied to a > particular attribute. Maybe have a stdcall macro. Regparm isn't always a > perfirmance win. You're suggesting a broader macro, then you're suggesting finer-grained macros. How about we keep NS_REGPARM as I have it here? stdcall won't be relevant for most of the functions in this patch because they don't take > 3 arguments. > 2) I think you want a NS_CDECL macro so you can force the compiler to compile > the xptcall stuff correctly. A long time ago I played around with regparm for > xptcall and I didn't see any speed improvement. You have to save the function > arguments somewhere so you wind up putting them on the stack anyway. I'm not quite sure what you mean here; are you talking about making cdecl part of the NS_IMETHOD macro? That may be a good idea while our xptcall code depends on it being that way.
> You're suggesting a broader macro, then you're suggesting finer-grained > macros. I'm suggesting a broader xp quickcall macro whose actual implementation depends on the platform. Even on a given platform, the optimum may well depend on the compiler version, etc. For example, regparm(2) might be more useful than regparm(3) at certain optimization levels. Also, using both regparm and stdcall does no harm. The first three parameters are passed in the registers and the rest (if any) get the stdcall treatment. > I'm not quite sure what you mean here; are you talking about making cdecl part > of the NS_IMETHOD macro? That may be a good idea while our xptcall code > depends on it being that way. The xptcall stuff is hand-coded in assembly. The compiler has no say in its calling convention so, yes, I'm saying the convention should be explicitly stated, particularly if you're using non-standard conventions.
Similar optimization applies to MSVC builds too, if you #define NS_REGPARM __fastcall it will work on MSVC, 2 registers will be used for the first 2 parameters. In fact, a while ago I changed PR_EXTERN/JS_EXTERN and PR_CALLBACK/JS_DLL_CALLBACK to use __fastcall and it works very well, even if I had to add missing or misplaced PR_CALLBACK in function declarations / implementations all over the code. I also tried to use __fastcall globally, there's no problem with xpt call on MSVC because NS_IMETHOD explicitly use __stdcall (pascal calling convention) but problems were with NSS code because a) most of NSS exports don't use PR_EXTERN/NSS_EXTERN b) NSS loads external modules and use function pointers that needs __cdecl explicitly. Since then I've sticked with a __cdecl build with __fastcall PR_CALLBACK/PR_EXTERN. Hope this helps.
Attachment #152229 - Flags: superreview?(darin)
Attachment #152229 - Flags: review?(dbaron)
Attached patch patchSplinter Review
changed to NS_FASTCALL with stdcall, added regparm(0)+cdecl to NS_IMETHOD. (stdcall seems to make no difference in my tests, fwiw)
Attachment #152229 - Attachment is obsolete: true
Attachment #152262 - Flags: superreview?(darin)
Attachment #152262 - Flags: review?(dbaron)
the patch applies NS_FASTCALL to nsTAString.h but the implementation work is done in nsTSubstring.h. Looking at assembly code, I guess that if NS_FASTCALL was applied to nsTSubstring.h too, it would be more efficient because register parameters could be kept as is up to the real implementation call in nsTSubstring.h (at the moment they have to be pushed on the stack to call nsTSubstring.h methods)
(In reply to comment #7) > the patch applies NS_FASTCALL to nsTAString.h but the implementation work is > done in nsTSubstring.h. Looking at assembly code, I guess that if NS_FASTCALL > was applied to nsTSubstring.h too, it would be more efficient because register > parameters could be kept as is up to the real implementation call in > nsTSubstring.h (at the moment they have to be pushed on the stack to call > nsTSubstring.h methods) Good catch, I will try that.
I wonder if this is worth backporting to the AVIARY_1_0_20040515_BRANCH branch. It would be nice to get an extra 6% perf increase in Firefox and Thunderbird 1.0 if we can. I'm setting blocking-aviary1.0? so that the right people get their eyes on this.
Flags: blocking-aviary1.0?
Comment on attachment 152262 [details] [diff] [review] patch >Index: js/src/jsdhash.c >+static JSDHashEntryHdr * DHASH_REGPARM maybe rename this macro to DHASH_FASTCALL to be consistent with the change from NS_REGPARM to NS_FASTCALL? >Index: security/nss/cmd/addbuiltin/addbuiltin.c >Index: security/nss/lib/asn1/Makefile >Index: security/nss/lib/asn1/asn1.c >Index: security/nss/lib/asn1/asn1.h etc. looks like these changes aren't meant to be part of the patch. sr=darin
Attachment #152262 - Flags: superreview?(darin) → superreview+
Comment on attachment 152262 [details] [diff] [review] patch what darin said, plus you could follow the naming convention and use JS_DHASH_FASTCALL as the name (plify_jsdhash.sed in js/src will auto-translate that to what xpcom/ds/pldhash.[ch] need). /be
The patch applies NS_FASTCALL to nsQueryInterface operator() but it looks like it can be removed and inlined instead, I filed bug 250107 about this (with a patch)
Comment on attachment 152262 [details] [diff] [review] patch r=dbaron given the changes in comment 7, comment 10, and comment 11. Also, if there are issues where the compiler will not give errors if the NS_FASTCALL part of the signature doesn't match, please document that in the comments (e.g., when overriding a virtual/non-virtual function).
Attachment #152262 - Flags: review?(dbaron) → review+
This broke a whole bunch of tinderboxes, so I added && (__GNUC__ >= 3) to the ifdefs.
Marking as fixed. For whatever reason, this didn't help Tp on our tinderboxes, I'll be continuing to investigate that.
Status: NEW → RESOLVED
Closed: 21 years ago
Resolution: --- → FIXED
Brian, could you take a look at the patch in bug 250107 ? Would you like to review it ? I expect the changes in that patch to be the same order of magnitude as the NS_FASTCALL changes. Maybe it would help you in your performance effort.
Flags: blocking-aviary1.0? → blocking-aviary1.0-
Product: Browser → Seamonkey
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: