92144 - investigate inlining of nsCOMPtr methods

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Reporter

Description

23 years ago

There are a few nsCOMPtr methods that consistently show up as a bit of time (function, excluding descendants) on jprof profiles because we call them so often that the overhead of a function call may be a significant amount of time. We should investigate (on multiple platforms and perhaps on multiple versions of gcc on Linux) the code size and performance effects of inlining:

 * ~nsCOMPtr_base (which would be done by un-defining the NSCAP_FEATURE_FACTOR_DESTRUCTOR macro), which shows up a little above 1% of time on page loading profiles taken in jprof

 * nsCOMPtr_base::begin_assignment (which would be done by defining NSCAP_FEATURE_INLINE_STARTASSIGMENT), which shows up around 0.8% of time on page loading profiles taken in jprof

 * nsCOMPtr_base::assign_from_helper, which shows up around 0.3% of time on page loading profiles taken in jprof. (This one is more interesting because it might allow the |operator()| on helpers to be inlined by good compilers rather than be executed as a virtual function call, although it would also probably increase code size more than the other two.)
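For readers unfamiliar with this kind of switch, here is a minimal sketch of how a feature macro can move a destructor between one shared out-of-line body and an inline body expanded at each use site. The names SKETCH_FACTOR_DESTRUCTOR, Counted and PtrBase are hypothetical and are not taken from nsCOMPtr.h; the real macros may be wired differently.

```cpp
// Hypothetical stand-ins; none of these names come from nsCOMPtr.h.
struct Counted {
  int refs = 1;
  void AddRef() { ++refs; }
  void Release() { --refs; }  // a real object would delete itself at zero
};

class PtrBase {
 public:
  explicit PtrBase(Counted* p) : mRaw(p) {}
  PtrBase(const PtrBase&) = delete;
  PtrBase& operator=(const PtrBase&) = delete;
#ifdef SKETCH_FACTOR_DESTRUCTOR
  ~PtrBase();  // declared only: one shared body, reached by a function call
#else
  ~PtrBase() { if (mRaw) mRaw->Release(); }  // body visible at every use site
#endif
 private:
  Counted* mRaw;
};

#ifdef SKETCH_FACTOR_DESTRUCTOR
// The single out-of-line copy of the destructor body.
PtrBase::~PtrBase() { if (mRaw) mRaw->Release(); }
#endif

// Either way the observable behavior is the same: one Release per PtrBase.
int refs_after_scope() {
  Counted c;           // refs == 1
  c.AddRef();          // refs == 2, the extra reference is handed to PtrBase
  { PtrBase p(&c); }   // destructor releases it again
  return c.refs;
}
```

Building the same file with and without -DSKETCH_FACTOR_DESTRUCTOR and comparing the object sizes is one way to measure the code size side of the trade-off.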

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Reporter

Updated

23 years ago

Status: NEW → ASSIGNED

Keywords: perf

Priority: -- → P3

Target Milestone: --- → mozilla1.0.1

Cathleen

Updated

23 years ago

Blocks: 71668

Chris Waterson

Updated

23 years ago

Blocks: deCOM

Cathleen

Updated

23 years ago

Target Milestone: mozilla1.0.1 → mozilla0.9.7

Chris Waterson

Comment 1

23 years ago

On Intel x86, inlining nsCOMPtr's destructor will yield the same number of instructions in the callee, but the number of bytes emitted will increase by five bytes per nsCOMPtr. I started with this routine, using RH7.1's gcc-2.96 compiler with optimization set at -O2.

  void comptr_test(nsIRDFService* aService)
  {
    nsCOMPtr<nsIRDFResource> resource;
    aService->GetResource("rdf:null", getter_AddRefs(resource));
  }

With our current nsCOMPtr.h (dtor out-of-line), we get:

  00000000 <comptr_test__FP13nsIRDFService>:
     0:   55                      push   %ebp
     1:   89 e5                   mov    %esp,%ebp
     3:   56                      push   %esi
     4:   53                      push   %ebx
     5:   83 ec 1c                sub    $0x1c,%esp
     8:   8d 75 e8                lea    0xffffffe8(%ebp),%esi
     b:   8b 5d 08                mov    0x8(%ebp),%ebx
     e:   56                      push   %esi
     f:   c7 45 e8 00 00 00 00    movl   $0x0,0xffffffe8(%ebp)
    16:   e8 fc ff ff ff          call   17 <comptr_test__FP13nsIRDFService+0x17>
    1b:   83 c4 0c                add    $0xc,%esp
    1e:   8b 13                   mov    (%ebx),%edx
    20:   50                      push   %eax
    21:   68 00 00 00 00          push   $0x0
    26:   53                      push   %ebx
    27:   ff 52 14                call   *0x14(%edx)
    2a:   58                      pop    %eax
    2b:   5a                      pop    %edx
    2c:   6a 00                   push   $0x0
    2e:   56                      push   %esi
    2f:   e8 fc ff ff ff          call   30 <comptr_test__FP13nsIRDFService+0x30>
    34:   83 c4 10                add    $0x10,%esp
    37:   8d 65 f8                lea    0xfffffff8(%ebp),%esp
    3a:   5b                      pop    %ebx
    3b:   5e                      pop    %esi
    3c:   5d                      pop    %ebp
    3d:   c3                      ret
    3e:   89 f6                   mov    %esi,%esi

That's 27 instructions, assuming that the |mov %esi,%esi| instruction was emitted to align the function.

With nsCOMPtr::~nsCOMPtr inlined, we get:

  00000000 <comptr_test__FP13nsIRDFService>:
     0:   55                      push   %ebp
     1:   89 e5                   mov    %esp,%ebp
     3:   53                      push   %ebx
     4:   83 ec 20                sub    $0x20,%esp
     7:   8d 45 e8                lea    0xffffffe8(%ebp),%eax
     a:   8b 5d 08                mov    0x8(%ebp),%ebx
     d:   50                      push   %eax
     e:   c7 45 e8 00 00 00 00    movl   $0x0,0xffffffe8(%ebp)
    15:   e8 fc ff ff ff          call   16 <comptr_test__FP13nsIRDFService+0x16>
    1a:   83 c4 0c                add    $0xc,%esp
    1d:   8b 13                   mov    (%ebx),%edx
    1f:   50                      push   %eax
    20:   68 00 00 00 00          push   $0x0
    25:   53                      push   %ebx
    26:   ff 52 14                call   *0x14(%edx)
    29:   8b 55 e8                mov    0xffffffe8(%ebp),%edx
    2c:   83 c4 10                add    $0x10,%esp
    2f:   85 d2                   test   %edx,%edx
    31:   74 0c                   je     3f <comptr_test__FP13nsIRDFService+0x3f>
    33:   83 ec 0c                sub    $0xc,%esp
    36:   8b 02                   mov    (%edx),%eax
    38:   52                      push   %edx
    39:   ff 50 10                call   *0x10(%eax)
    3c:   83 c4 10                add    $0x10,%esp
    3f:   8b 5d fc                mov    0xfffffffc(%ebp),%ebx
    42:   c9                      leave
    43:   c3                      ret

Again, 27 instructions.

For reference, re-writing the function without nsCOMPtr, thus:

  void comptr_test(nsIRDFService* aService)
  {
    nsIRDFResource* resource;
    aService->GetResource("rdf:null", &resource);
    NS_RELEASE(resource);
  }

yielded:

  00000000 <comptr_test__FP13nsIRDFService>:
     0:   55                      push   %ebp
     1:   89 e5                   mov    %esp,%ebp
     3:   83 ec 0c                sub    $0xc,%esp
     6:   8b 55 08                mov    0x8(%ebp),%edx
     9:   8d 45 fc                lea    0xfffffffc(%ebp),%eax
     c:   8b 0a                   mov    (%edx),%ecx
     e:   50                      push   %eax
     f:   68 00 00 00 00          push   $0x0
    14:   52                      push   %edx
    15:   ff 51 14                call   *0x14(%ecx)
    18:   8b 45 fc                mov    0xfffffffc(%ebp),%eax
    1b:   8b 10                   mov    (%eax),%edx
    1d:   89 04 24                mov    %eax,(%esp,1)
    20:   ff 52 10                call   *0x10(%edx)
    23:   83 c4 10                add    $0x10,%esp
    26:   c9                      leave
    27:   c3                      ret

Only 17 instructions here.
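To try a comparison like this outside the Mozilla tree, a self-contained stand-in for the test routine may help. The sketch below assumes mock types (MockResource, MockService, OwningPtr are hypothetical names, and StartAssignment is only a rough analogue of getter_AddRefs); it mirrors the call shape of the two comptr_test variants so they can be compiled and disassembled, but it is not the real XPCOM code and will not reproduce the exact listings.

```cpp
// Mock stand-ins for the XPCOM types used in the comment; hypothetical.
struct MockResource {
  int refs = 0;
  void AddRef() { ++refs; }
  void Release() { --refs; }  // a real object would delete itself at zero
};

struct MockService {
  MockResource res;
  // Hands out an owning reference through an out-parameter.
  void GetResource(const char* /*uri*/, MockResource** out) {
    res.AddRef();
    *out = &res;
  }
};

// Minimal owning pointer: null on construction, Release() on destruction.
class OwningPtr {
 public:
  OwningPtr() : mRaw(nullptr) {}
  OwningPtr(const OwningPtr&) = delete;
  OwningPtr& operator=(const OwningPtr&) = delete;
  ~OwningPtr() { if (mRaw) mRaw->Release(); }
  // Drops any old value and exposes the slot for an out-parameter.
  MockResource** StartAssignment() {
    if (mRaw) { mRaw->Release(); mRaw = nullptr; }
    return &mRaw;
  }
 private:
  MockResource* mRaw;
};

// Smart-pointer variant of the test routine.
void test_with_owning_ptr(MockService* s) {
  OwningPtr resource;
  s->GetResource("rdf:null", resource.StartAssignment());
}

// Hand-written variant with an explicit release.
void test_with_raw_ptr(MockService* s) {
  MockResource* resource;
  s->GetResource("rdf:null", &resource);
  resource->Release();
}

// Runs one variant and reports how many references are still outstanding.
int refs_left(void (*fn)(MockService*)) {
  MockService s;
  fn(&s);
  return s.res.refs;
}
```

Both variants leave no outstanding references, so any difference between them shows up only in the generated code, which can be inspected with objdump -d on the compiled object.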

Judson Valeski

Comment 2

23 years ago

So I'm reading the inlining as a net loss (added memory usage, with no reduction in instruction count), then?

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Reporter

Comment 3

23 years ago

No, since the inlined version has a direct call to the release method, whereas the non-inlined version calls a function that calls release. Furthermore, the function that calls release is probably more likely to lead to mispredicted branches, since the processor's cache of where we branched the last time will be meaningless when every release goes through the same code. That would lead to more pipeline stalls and also more instruction cache misses that require going back to memory. In other words, we'd need to do performance testing to find out what's faster.
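The difference between the two call shapes can be sketched as follows. This is only an illustration with hypothetical types (Base, A, B, release_shared, release_inline); a modern compiler given both definitions in one translation unit may inline or devirtualize either version, so it shows the structure being discussed rather than a measurable benchmark.

```cpp
// Hypothetical interface hierarchy standing in for many XPCOM interfaces.
struct Base {
  virtual void Release() = 0;
  virtual ~Base() {}
};
struct A : Base {
  int released = 0;
  void Release() override { ++released; }
};
struct B : Base {
  int released = 0;
  void Release() override { ++released; }
};

// Shared out-of-line helper: when this is not inlined, every caller's
// Release goes through the single indirect call in this body, so one
// branch-target history is shared by all object types.
void release_shared(Base* p) {
  if (p) p->Release();
}

// Inline form: each call site gets its own copy of the indirect call,
// so a site that always sees the same type keeps its own history.
inline void release_inline(Base* p) {
  if (p) p->Release();
}

int run_shared() {
  A a;
  B b;
  release_shared(&a);  // same indirect call site, target A::Release
  release_shared(&b);  // same indirect call site, target B::Release
  return a.released + b.released;
}

int run_inline() {
  A a;
  B b;
  release_inline(&a);  // own call site
  release_inline(&b);  // own call site
  return a.released + b.released;
}
```

The results are identical in both forms; whether the per-site version is actually faster is the open question that would need profiling.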

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Reporter

Updated

23 years ago

Target Milestone: mozilla0.9.7 → mozilla0.9.8

Daniel Bratell

Comment 4

23 years ago

Would inlining all nsCOMPtr methods (constructor, getter_AddRefs, destructor) yield the same code as not using an nsCOMPtr at all? I mean the result should be the same and if the compiler can see the full method it should be able (maybe only in my dreams) to generate the same machine code.

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Reporter

Comment 5

23 years ago

Given a good compiler, one would hope it would, although there's a bit of nulling, null-checking, and making sure to addref before releasing that it wouldn't be able to optimize. And it would certainly stop defeating some of the branch-prediction stuff in the newer Pentium processors. (~nsCOMPtr_base calls Release on tons of different interfaces).
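As an illustration of the extra work mentioned here, this is a minimal sketch of an owning-pointer assignment that nulls on construction, null-checks, and addrefs the new value before releasing the old one. Counted and Holder are hypothetical names, and the real nsCOMPtr assignment path is assumed, not shown.

```cpp
// Hypothetical reference-counted object.
struct Counted {
  int refs = 0;
  void AddRef() { ++refs; }
  void Release() { --refs; }  // a real object would delete itself at zero
};

class Holder {
 public:
  Holder() : mRaw(nullptr) {}                // nulling on construction
  Holder(const Holder&) = delete;
  Holder& operator=(const Holder&) = delete;
  ~Holder() { if (mRaw) mRaw->Release(); }   // null check before release

  void Assign(Counted* next) {
    if (next) next->AddRef();    // addref first, so assigning the value
    Counted* old = mRaw;         // already held never drops the count to
    mRaw = next;                 // zero in between
    if (old) old->Release();
  }

 private:
  Counted* mRaw;
};

// Assigns the same object twice; returns -1 if the count is ever wrong.
int refs_after_self_assign() {
  Counted c;
  {
    Holder h;
    h.Assign(&c);                // refs: 0 -> 1
    h.Assign(&c);                // refs: 1 -> 2 -> 1
    if (c.refs != 1) return -1;
  }                              // destructor: refs 1 -> 0
  return c.refs;
}
```

A hand-written version that knows the pointer starts out null and is assigned once can skip all three checks, which is part of why the raw-pointer code can be shorter.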

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Reporter

Updated

23 years ago

Target Milestone: mozilla0.9.8 → mozilla0.9.9

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Reporter

Updated

23 years ago

Target Milestone: mozilla0.9.9 → Future

Scott Collins

Comment 6

22 years ago

collecting under |nsCOMPtr| tracking bug # 178174

Blocks: nsCOMPtr_tracking

Markus Hübner

Comment 7

22 years ago

Is bug 139986 related to this (see profile results there)?

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Reporter

Comment 8

21 years ago

Not really.

:Gavin Sharp [email: gavin@gavinsharp.com]

Updated

18 years ago

QA Contact: scc → xpcom

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Reporter

Updated

4 years ago

Assignee: dbaron → nobody

Status: ASSIGNED → NEW

Andrew McCreight [:mccr8]

Comment 9

4 years ago

Probably better to file a new bug if nsCOMPtr methods show up in profiles. Hopefully, with all of the PGO and whatnot that happens, we don't have to manually inline things as much.

Status: NEW → RESOLVED

Closed: 4 years ago

Resolution: --- → INCOMPLETE

Categories

(Core :: XPCOM, defect, P3)

Tracking

()

People

(Reporter: dbaron, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: perf)

Crash Data

Security

(public)
