Closed Bug 92144 Opened 23 years ago Closed 4 years ago

investigate inlining of nsCOMPtr methods

Categories

(Core :: XPCOM, defect, P3)

defect

Tracking

()

RESOLVED INCOMPLETE
Future

People

(Reporter: dbaron, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: perf)

There are a few nsCOMPtr methods that consistently show up as a bit of time (function, excluding descendents) on jprof profles because we call them so much that the overhead of a function call may be a significant amount of time. We should investigate (on multiple platforms and perhaps on multiple versions of gcc on Linux) the code size and performance affects of inlining: * ~nsCOMPtr_base (which would be done by un-defining the NSCAP_FEATURE_FACTOR_DESTRUCTOR macro), and shows up a little above 1% of time on page loading profiles taken in jprof * nsCOMPtr_base::begin_assignment, which shows up around 0.8% of time on page loading profiles taken in jprof (which would be done by defining NSCAP_FEATURE_INLINE_STARTASSIGMENT) * nsCOMPtr_base::assign_from_helper, which shows up around 0.3% of time on page loading profiles taken in jprof. (This one is more interesting because it might allow the |operator()| on helpers to be inlined by good compilers rather than be executed as a virtual function call, although it would also probably increase code size more than the other two.)
Status: NEW → ASSIGNED
Keywords: perf
Priority: -- → P3
Target Milestone: --- → mozilla1.0.1
Blocks: 71668
Blocks: deCOM
Target Milestone: mozilla1.0.1 → mozilla0.9.7
On Intel x86, inlining nsCOMPtr's destructor will yield the same number of instructions in the callee, but the number of bytes emitted will increase by five bytes per nsCOMPtr. I started with this routine, using RH7.1's gcc-2.96 compiler with optimization set at -O2. void comptr_test(nsIRDFService* aService) { nsCOMPtr<nsIRDFResource> resource; aService->GetResource("rdf:null", getter_AddRefs(resource)); } With our current nsCOMPtr.h (dtor out-of-line), we get: 00000000 <comptr_test__FP13nsIRDFService>: 0: 55 push %ebp 1: 89 e5 mov %esp,%ebp 3: 56 push %esi 4: 53 push %ebx 5: 83 ec 1c sub $0x1c,%esp 8: 8d 75 e8 lea 0xffffffe8(%ebp),%esi b: 8b 5d 08 mov 0x8(%ebp),%ebx e: 56 push %esi f: c7 45 e8 00 00 00 00 movl $0x0,0xffffffe8(%ebp) 16: e8 fc ff ff ff call 17 <comptr_test__FP13nsIRDFService+0x17> 1b: 83 c4 0c add $0xc,%esp 1e: 8b 13 mov (%ebx),%edx 20: 50 push %eax 21: 68 00 00 00 00 push $0x0 26: 53 push %ebx 27: ff 52 14 call *0x14(%edx) 2a: 58 pop %eax 2b: 5a pop %edx 2c: 6a 00 push $0x0 2e: 56 push %esi 2f: e8 fc ff ff ff call 30 <comptr_test__FP13nsIRDFService+0x30> 34: 83 c4 10 add $0x10,%esp 37: 8d 65 f8 lea 0xfffffff8(%ebp),%esp 3a: 5b pop %ebx 3b: 5e pop %esi 3c: 5d pop %ebp 3d: c3 ret 3e: 89 f6 mov %esi,%esi That's 27 instructions, assuming that the |mov %esi,%esi| instruction was emitted to align the function. With nsCOMPtr::~nsCOMPtr inlined, we get: 00000000 <comptr_test__FP13nsIRDFService>: 0: 55 push %ebp 1: 89 e5 mov %esp,%ebp 3: 53 push %ebx 4: 83 ec 20 sub $0x20,%esp 7: 8d 45 e8 lea 0xffffffe8(%ebp),%eax a: 8b 5d 08 mov 0x8(%ebp),%ebx d: 50 push %eax e: c7 45 e8 00 00 00 00 movl $0x0,0xffffffe8(%ebp) 15: e8 fc ff ff ff call 16 <comptr_test__FP13nsIRDFService+0x16> 1a: 83 c4 0c add $0xc,%esp 1d: 8b 13 mov (%ebx),%edx 1f: 50 push %eax 20: 68 00 00 00 00 push $0x0 25: 53 push %ebx 26: ff 52 14 call *0x14(%edx) 29: 8b 55 e8 mov 0xffffffe8(%ebp),%edx 2c: 83 c4 10 add $0x10,%esp 2f: 85 d2 test %edx,%edx 31: 74 0c je 3f <comptr_test__FP13nsIRDFService+0x3f> 33: 83 ec 0c sub $0xc,%esp 36: 8b 02 mov (%edx),%eax 38: 52 push %edx 39: ff 50 10 call *0x10(%eax) 3c: 83 c4 10 add $0x10,%esp 3f: 8b 5d fc mov 0xfffffffc(%ebp),%ebx 42: c9 leave 43: c3 ret Again, 27 instructions. For reference, re-writing the function without nsCOMPtr, thus: void comptr_test(nsIRDFService* aService) { nsIRDFResource* resource; aService->GetResource("rdf:null", &resource); NS_RELEASE(resource); } yielded: 00000000 <comptr_test__FP13nsIRDFService>: 0: 55 push %ebp 1: 89 e5 mov %esp,%ebp 3: 83 ec 0c sub $0xc,%esp 6: 8b 55 08 mov 0x8(%ebp),%edx 9: 8d 45 fc lea 0xfffffffc(%ebp),%eax c: 8b 0a mov (%edx),%ecx e: 50 push %eax f: 68 00 00 00 00 push $0x0 14: 52 push %edx 15: ff 51 14 call *0x14(%ecx) 18: 8b 45 fc mov 0xfffffffc(%ebp),%eax 1b: 8b 10 mov (%eax),%edx 1d: 89 04 24 mov %eax,(%esp,1) 20: ff 52 10 call *0x10(%edx) 23: 83 c4 10 add $0x10,%esp 26: c9 leave 27: c3 ret Only 17 instructions here.
I'm reading the inlining as a net loss (added memory usage, w/ no instruction gain) then.?
No, since the inlined version has a direct call to the release method whereas the non-inlined version calls a function that calls release. Furthermore, the function that calls release is probably more likely to lead to mispredicted branches since the processor's cache of where we branched the last time will be meaningless when every release goes through the same code, which would lead to more pipeline halts and also more instruction cache misses which require going back to memory. In other words, we'd need to do performance testing to find out what's faster.
Target Milestone: mozilla0.9.7 → mozilla0.9.8
Would inlining all nsCOMPtr methods (constructor, getter_AddRefs, destructor) yield the same code as not using an nsCOMPtr at all? I mean the result should be the same and if the compiler can see the full method it should be able (maybe only in my dreams) to generate the same machine code.
Given a good compiler, one would hope it would, although there's a bit of nulling, null-checking, and making sure to addref before releasing that it wouldn't be able to optimize. And it would certainly stop defeating some of the branch-prediction stuff in the newer Pentium processors. (~nsCOMPtr_base calls Release on tons of different interfaces).
Target Milestone: mozilla0.9.8 → mozilla0.9.9
Target Milestone: mozilla0.9.9 → Future
collecting under |nsCOMPtr| tracking bug # 178174
Is bug 139986 related to this (see profile results there)?
QA Contact: scc → xpcom
Assignee: dbaron → nobody
Status: ASSIGNED → NEW

Probably better to file a new bug if nsCOMPtr methods show up in profiles. Hopefully with all of the PGO and whatnot that happens we don't have the manually inline things as much.

Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → INCOMPLETE
You need to log in before you can comment on or make changes to this bug.