Closed
Bug 249652
Opened 21 years ago
Closed 21 years ago
Use registers for passing parameters on gcc/x86
Categories
(SeaMonkey :: General, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: bryner, Assigned: bryner)
Details
(Keywords: perf)
Attachments
(1 file, 1 obsolete file)
|
122.13 KB,
patch
|
dbaron
:
review+
darin.moz
:
superreview+
|
Details | Diff | Splinter Review |
gcc has an attribute 'regparm' on x86 that allows using up to 3 registers to
pass arguments, rather than pushing them on the stack. It nearly always results
in smaller and faster code. (MSVC does this for the |this| pointer in C++ by
default).
Unfortunately, we can't flip this globally since there's no way for it to know
that system library functions don't use this calling convention (the headers
aren't annotated with regparm(0)). However, since this is a per-function-call
overhead we can get the biggest wins by using regparm on the most frequently
called functions. I compiled such a list using Quantify on Windows, and then
applied regparm to the top 10 or so non-inlined functions that showed up. The
result is about a 150KB reduction in code size and about a 6% Tp win, using gcc
3.3.3. (gcc 3.3.3 fixed a bug in applying attributes to destructors, which is
huge because nsCOMPtr_base::~nsCOMPtr base is perhaps the single most frequently
called function).
Purely looking at code size, any function which has many call sites would be
great to change to regparm. That probably has a fair amount of overlap with the
most frequently called functions but there could be some that I missed.
We have to be careful about using this in cases where we need to preserve binary
compatibility; it also can't be used on scriptable methods unless we hack
xptcall so that it can know the calling convention to use (we'd have to support
both unless we decided to break binary compatibility across the board).
| Assignee | ||
Comment 1•21 years ago
|
||
In case anyone is wondering about nsRuleNode::GetStyleData, it ends up not
being inlined at -Os. Also, I duplicated the #ifdefs in jsdhash/pldhash
because I didn't want to introduce new header dependencies... if this went into
NSPR I guess that wouldn't be needed.
Darin, I tried applying this to nsTString.h but for some reason that actually
increased code size. Worked well on nsTAString.h though.
| Assignee | ||
Updated•21 years ago
|
Attachment #152229 -
Flags: superreview?(darin)
Attachment #152229 -
Flags: review?(dbaron)
1) I suggest a broader macro, say NS_FASTCALL, because you can a) do things like
"__attribute__ ((regparm(3),stdcall))" and b) the macro name isn't tied to a
particular attribute. Maybe have a stdcall macro. Regparm isn't always a
perfirmance win.
2) I think you want a NS_CDECL macro so you can force the compiler to compile
the xptcall stuff correctly. A long time ago I played around with regparm for
xptcall and I didn't see any speed improvement. You have to save the function
arguments somewhere so you wind up putting them on the stack anyway.
| Assignee | ||
Comment 3•21 years ago
|
||
(In reply to comment #2)
> 1) I suggest a broader macro, say NS_FASTCALL, because you can a) do things like
> "__attribute__ ((regparm(3),stdcall))" and b) the macro name isn't tied to a
> particular attribute. Maybe have a stdcall macro. Regparm isn't always a
> perfirmance win.
You're suggesting a broader macro, then you're suggesting finer-grained macros.
How about we keep NS_REGPARM as I have it here? stdcall won't be relevant for
most of the functions in this patch because they don't take > 3 arguments.
> 2) I think you want a NS_CDECL macro so you can force the compiler to compile
> the xptcall stuff correctly. A long time ago I played around with regparm for
> xptcall and I didn't see any speed improvement. You have to save the function
> arguments somewhere so you wind up putting them on the stack anyway.
I'm not quite sure what you mean here; are you talking about making cdecl part
of the NS_IMETHOD macro? That may be a good idea while our xptcall code depends
on it being that way.
> You're suggesting a broader macro, then you're suggesting finer-grained
> macros.
I'm suggesting a broader xp quickcall macro whose actual implementation depends
on the platform. Even on a given platform, the optimum may well depend on the
compiler version, etc. For example, regparm(2) might be more useful than
regparm(3) at certain optimization levels.
Also, using both regparm and stdcall does no harm. The first three parameters
are passed in the registers and the rest (if any) get the stdcall treatment.
> I'm not quite sure what you mean here; are you talking about making cdecl part
> of the NS_IMETHOD macro? That may be a good idea while our xptcall code
> depends on it being that way.
The xptcall stuff is hand-coded in assembly. The compiler has no say in its
calling convention so, yes, I'm saying the convention should be explicitly
stated, particularly if you're using non-standard conventions.
Comment 5•21 years ago
|
||
Similar optimization applies to MSVC builds too, if you
#define NS_REGPARM __fastcall
it will work on MSVC, 2 registers will be used for the first 2 parameters.
In fact, a while ago I changed PR_EXTERN/JS_EXTERN and
PR_CALLBACK/JS_DLL_CALLBACK to use __fastcall and it works very well, even if I
had to add missing or misplaced PR_CALLBACK in function declarations /
implementations all over the code.
I also tried to use __fastcall globally, there's no problem with xpt call on
MSVC because NS_IMETHOD explicitly use __stdcall (pascal calling convention) but
problems were with NSS code because a) most of NSS exports don't use
PR_EXTERN/NSS_EXTERN b) NSS loads external modules and use function pointers
that needs __cdecl explicitly. Since then I've sticked with a __cdecl build with
__fastcall PR_CALLBACK/PR_EXTERN.
Hope this helps.
| Assignee | ||
Updated•21 years ago
|
Attachment #152229 -
Flags: superreview?(darin)
Attachment #152229 -
Flags: review?(dbaron)
| Assignee | ||
Comment 6•21 years ago
|
||
changed to NS_FASTCALL with stdcall, added regparm(0)+cdecl to NS_IMETHOD.
(stdcall seems to make no difference in my tests, fwiw)
Attachment #152229 -
Attachment is obsolete: true
| Assignee | ||
Updated•21 years ago
|
Attachment #152262 -
Flags: superreview?(darin)
Attachment #152262 -
Flags: review?(dbaron)
Comment 7•21 years ago
|
||
the patch applies NS_FASTCALL to nsTAString.h but the implementation work is
done in nsTSubstring.h. Looking at assembly code, I guess that if NS_FASTCALL
was applied to nsTSubstring.h too, it would be more efficient because register
parameters could be kept as is up to the real implementation call in
nsTSubstring.h (at the moment they have to be pushed on the stack to call
nsTSubstring.h methods)
| Assignee | ||
Comment 8•21 years ago
|
||
(In reply to comment #7)
> the patch applies NS_FASTCALL to nsTAString.h but the implementation work is
> done in nsTSubstring.h. Looking at assembly code, I guess that if NS_FASTCALL
> was applied to nsTSubstring.h too, it would be more efficient because register
> parameters could be kept as is up to the real implementation call in
> nsTSubstring.h (at the moment they have to be pushed on the stack to call
> nsTSubstring.h methods)
Good catch, I will try that.
Comment 9•21 years ago
|
||
I wonder if this is worth backporting to the AVIARY_1_0_20040515_BRANCH branch.
It would be nice to get an extra 6% perf increase in Firefox and Thunderbird 1.0
if we can.
I'm setting blocking-aviary1.0? so that the right people get their eyes on this.
Flags: blocking-aviary1.0?
Comment 10•21 years ago
|
||
Comment on attachment 152262 [details] [diff] [review]
patch
>Index: js/src/jsdhash.c
>+static JSDHashEntryHdr * DHASH_REGPARM
maybe rename this macro to DHASH_FASTCALL to be consistent
with the change from NS_REGPARM to NS_FASTCALL?
>Index: security/nss/cmd/addbuiltin/addbuiltin.c
>Index: security/nss/lib/asn1/Makefile
>Index: security/nss/lib/asn1/asn1.c
>Index: security/nss/lib/asn1/asn1.h
etc.
looks like these changes aren't meant to be part of the patch.
sr=darin
Attachment #152262 -
Flags: superreview?(darin) → superreview+
Comment 11•21 years ago
|
||
Comment on attachment 152262 [details] [diff] [review]
patch
what darin said, plus you could follow the naming convention and use
JS_DHASH_FASTCALL as the name (plify_jsdhash.sed in js/src will auto-translate
that to what xpcom/ds/pldhash.[ch] need).
/be
Comment 12•21 years ago
|
||
The patch applies NS_FASTCALL to nsQueryInterface operator() but it looks like
it can be removed and inlined instead, I filed bug 250107 about this (with a patch)
Comment on attachment 152262 [details] [diff] [review]
patch
r=dbaron given the changes in comment 7, comment 10, and comment 11.
Also, if there are issues where the compiler will not give errors if the
NS_FASTCALL part of the signature doesn't match, please document that in the
comments (e.g., when overriding a virtual/non-virtual function).
Attachment #152262 -
Flags: review?(dbaron) → review+
This broke a whole bunch of tinderboxes, so I added && (__GNUC__ >= 3) to the
ifdefs.
| Assignee | ||
Comment 15•21 years ago
|
||
Marking as fixed. For whatever reason, this didn't help Tp on our tinderboxes,
I'll be continuing to investigate that.
Status: NEW → RESOLVED
Closed: 21 years ago
Resolution: --- → FIXED
Comment 16•21 years ago
|
||
Brian, could you take a look at the patch in bug 250107 ? Would you like to
review it ? I expect the changes in that patch to be the same order of magnitude
as the NS_FASTCALL changes. Maybe it would help you in your performance effort.
Updated•21 years ago
|
Flags: blocking-aviary1.0? → blocking-aviary1.0-
Updated•21 years ago
|
Product: Browser → Seamonkey
You need to log in
before you can comment on or make changes to this bug.
Description
•