Closed
Bug 92144
Opened 23 years ago
Closed 4 years ago
investigate inlining of nsCOMPtr methods
Categories
(Core :: XPCOM, defect, P3)
Core
XPCOM
Tracking
()
RESOLVED
INCOMPLETE
Future
People
(Reporter: dbaron, Unassigned)
References
(Blocks 1 open bug)
Details
(Keywords: perf)
There are a few nsCOMPtr methods that consistently show up as a bit of time
(function, excluding descendents) on jprof profles because we call them so much
that the overhead of a function call may be a significant amount of time.
We should investigate (on multiple platforms and perhaps on multiple versions of
gcc on Linux) the code size and performance affects of inlining:
* ~nsCOMPtr_base (which would be done by un-defining the
NSCAP_FEATURE_FACTOR_DESTRUCTOR macro), and shows up a little above 1% of time
on page loading profiles taken in jprof
* nsCOMPtr_base::begin_assignment, which shows up around 0.8% of time on page
loading profiles taken in jprof (which would be done by defining
NSCAP_FEATURE_INLINE_STARTASSIGMENT)
* nsCOMPtr_base::assign_from_helper, which shows up around 0.3% of time on page
loading profiles taken in jprof. (This one is more interesting because it might
allow the |operator()| on helpers to be inlined by good compilers rather than be
executed as a virtual function call, although it would also probably increase
code size more than the other two.)
Reporter | ||
Updated•23 years ago
|
Comment 1•23 years ago
|
||
On Intel x86, inlining nsCOMPtr's destructor will yield the same
number of instructions in the callee, but the number of bytes emitted
will increase by five bytes per nsCOMPtr.
I started with this routine, using RH7.1's gcc-2.96 compiler with
optimization set at -O2.
void
comptr_test(nsIRDFService* aService)
{
nsCOMPtr<nsIRDFResource> resource;
aService->GetResource("rdf:null", getter_AddRefs(resource));
}
With our current nsCOMPtr.h (dtor out-of-line), we get:
00000000 <comptr_test__FP13nsIRDFService>:
0: 55 push %ebp
1: 89 e5 mov %esp,%ebp
3: 56 push %esi
4: 53 push %ebx
5: 83 ec 1c sub $0x1c,%esp
8: 8d 75 e8 lea 0xffffffe8(%ebp),%esi
b: 8b 5d 08 mov 0x8(%ebp),%ebx
e: 56 push %esi
f: c7 45 e8 00 00 00 00 movl $0x0,0xffffffe8(%ebp)
16: e8 fc ff ff ff call 17 <comptr_test__FP13nsIRDFService+0x17>
1b: 83 c4 0c add $0xc,%esp
1e: 8b 13 mov (%ebx),%edx
20: 50 push %eax
21: 68 00 00 00 00 push $0x0
26: 53 push %ebx
27: ff 52 14 call *0x14(%edx)
2a: 58 pop %eax
2b: 5a pop %edx
2c: 6a 00 push $0x0
2e: 56 push %esi
2f: e8 fc ff ff ff call 30 <comptr_test__FP13nsIRDFService+0x30>
34: 83 c4 10 add $0x10,%esp
37: 8d 65 f8 lea 0xfffffff8(%ebp),%esp
3a: 5b pop %ebx
3b: 5e pop %esi
3c: 5d pop %ebp
3d: c3 ret
3e: 89 f6 mov %esi,%esi
That's 27 instructions, assuming that the |mov %esi,%esi| instruction
was emitted to align the function.
With nsCOMPtr::~nsCOMPtr inlined, we get:
00000000 <comptr_test__FP13nsIRDFService>:
0: 55 push %ebp
1: 89 e5 mov %esp,%ebp
3: 53 push %ebx
4: 83 ec 20 sub $0x20,%esp
7: 8d 45 e8 lea 0xffffffe8(%ebp),%eax
a: 8b 5d 08 mov 0x8(%ebp),%ebx
d: 50 push %eax
e: c7 45 e8 00 00 00 00 movl $0x0,0xffffffe8(%ebp)
15: e8 fc ff ff ff call 16 <comptr_test__FP13nsIRDFService+0x16>
1a: 83 c4 0c add $0xc,%esp
1d: 8b 13 mov (%ebx),%edx
1f: 50 push %eax
20: 68 00 00 00 00 push $0x0
25: 53 push %ebx
26: ff 52 14 call *0x14(%edx)
29: 8b 55 e8 mov 0xffffffe8(%ebp),%edx
2c: 83 c4 10 add $0x10,%esp
2f: 85 d2 test %edx,%edx
31: 74 0c je 3f <comptr_test__FP13nsIRDFService+0x3f>
33: 83 ec 0c sub $0xc,%esp
36: 8b 02 mov (%edx),%eax
38: 52 push %edx
39: ff 50 10 call *0x10(%eax)
3c: 83 c4 10 add $0x10,%esp
3f: 8b 5d fc mov 0xfffffffc(%ebp),%ebx
42: c9 leave
43: c3 ret
Again, 27 instructions.
For reference, re-writing the function without nsCOMPtr, thus:
void
comptr_test(nsIRDFService* aService)
{
nsIRDFResource* resource;
aService->GetResource("rdf:null", &resource);
NS_RELEASE(resource);
}
yielded:
00000000 <comptr_test__FP13nsIRDFService>:
0: 55 push %ebp
1: 89 e5 mov %esp,%ebp
3: 83 ec 0c sub $0xc,%esp
6: 8b 55 08 mov 0x8(%ebp),%edx
9: 8d 45 fc lea 0xfffffffc(%ebp),%eax
c: 8b 0a mov (%edx),%ecx
e: 50 push %eax
f: 68 00 00 00 00 push $0x0
14: 52 push %edx
15: ff 51 14 call *0x14(%ecx)
18: 8b 45 fc mov 0xfffffffc(%ebp),%eax
1b: 8b 10 mov (%eax),%edx
1d: 89 04 24 mov %eax,(%esp,1)
20: ff 52 10 call *0x10(%edx)
23: 83 c4 10 add $0x10,%esp
26: c9 leave
27: c3 ret
Only 17 instructions here.
Comment 2•23 years ago
|
||
I'm reading the inlining as a net loss (added memory usage, w/ no instruction
gain) then.?
Reporter | ||
Comment 3•23 years ago
|
||
No, since the inlined version has a direct call to the release method whereas
the non-inlined version calls a function that calls release. Furthermore, the
function that calls release is probably more likely to lead to mispredicted
branches since the processor's cache of where we branched the last time will be
meaningless when every release goes through the same code, which would lead to
more pipeline halts and also more instruction cache misses which require going
back to memory.
In other words, we'd need to do performance testing to find out what's faster.
Reporter | ||
Updated•23 years ago
|
Target Milestone: mozilla0.9.7 → mozilla0.9.8
Comment 4•23 years ago
|
||
Would inlining all nsCOMPtr methods (constructor, getter_AddRefs, destructor)
yield the same code as not using an nsCOMPtr at all?
I mean the result should be the same and if the compiler can see the full method
it should be able (maybe only in my dreams) to generate the same machine code.
Reporter | ||
Comment 5•23 years ago
|
||
Given a good compiler, one would hope it would, although there's a bit of
nulling, null-checking, and making sure to addref before releasing that it
wouldn't be able to optimize. And it would certainly stop defeating some of the
branch-prediction stuff in the newer Pentium processors. (~nsCOMPtr_base calls
Release on tons of different interfaces).
Reporter | ||
Updated•23 years ago
|
Target Milestone: mozilla0.9.8 → mozilla0.9.9
Reporter | ||
Updated•23 years ago
|
Target Milestone: mozilla0.9.9 → Future
Comment 7•22 years ago
|
||
Is bug 139986 related to this (see profile results there)?
Reporter | ||
Comment 8•21 years ago
|
||
Not really.
Updated•18 years ago
|
QA Contact: scc → xpcom
Reporter | ||
Updated•4 years ago
|
Assignee: dbaron → nobody
Status: ASSIGNED → NEW
Comment 9•4 years ago
|
||
Probably better to file a new bug if nsCOMPtr methods show up in profiles. Hopefully with all of the PGO and whatnot that happens we don't have the manually inline things as much.
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → INCOMPLETE
You need to log in
before you can comment on or make changes to this bug.
Description
•