Closed Bug 771266 Opened 12 years ago Closed 12 years ago

Inline nsCOMPtr_base's destructor

Categories

(Core :: XPCOM, defect)

x86
macOS
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla16

People

(Reporter: ehsan.akhgari, Assigned: ehsan.akhgari)

Details

Attachments

(1 file)

Today we were looking at some profiles, and we saw that ~nsCOMPtr_base() is not inlined. It should probably be declared as such. :-)
Attached patch Patch (v1)Splinter Review
Assignee: nobody → ehsan
Status: NEW → ASSIGNED
Attachment #639425 - Flags: review?(benjamin)
Attachment #639425 - Flags: review?(benjamin) → review?(jmuizelaar)
Attachment #639425 - Flags: review?(jmuizelaar) → review+
Attachment #639425 - Flags: review+ → review?(justin.lebar+bug)
Attachment #639425 - Flags: review?(justin.lebar+bug) → review+
For the record, the gcc produced body in a release build is: 125fd6a: 55 push %rbp 125fd6b: 48 8b 3f mov (%rdi),%rdi 125fd6e: 48 89 e5 mov %rsp,%rbp 125fd71: 48 85 ff test %rdi,%rdi 125fd74: 74 0a je 125fd80 <_ZN13nsCOMPtr_baseD1Ev+0x16> 125fd76: 48 8b 07 mov (%rdi),%rax 125fd79: 48 8b 40 10 mov 0x10(%rax),%rax 125fd7d: c9 leaveq 125fd7e: ff e0 jmpq *%rax 125fd80: c9 leaveq 125fd81: c3 retq and we have 21717 calls to it in XUL :-)
Wow, I should not have r+ed this so quickly!
(In reply to comment #5) > Wow, I should not have r+ed this so quickly! If you're worried about the code size, I'll watch it and backout if necessary.
Sounds good. I'm also worried about icache locality, but that's obviously a trade-off, and we'll see how our benchmarks are affected.
(In reply to Ehsan Akhgari [:ehsan] from comment #3) > Also, https://hg.mozilla.org/integration/mozilla-inbound/rev/34c8a2369cbc, > with irc-review from Neil. Why?
(In reply to comment #8) > (In reply to Ehsan Akhgari [:ehsan] from comment #3) > > Also, https://hg.mozilla.org/integration/mozilla-inbound/rev/34c8a2369cbc, > > with irc-review from Neil. > > Why? See the bug that Jeff filed.
> See the bug that Jeff filed. Bug 771317.
Codesighs on Mac: before: 42432300 after: 42531300 difference: 99000 For Linux32 non-PGO, seems like we have not gathered enough info on the graph server, and the difference is sort of lost. But this shouldn't make a difference on Windows and Linux, since we do PGO there and the compiler is supposed to be smart enough to inline these calls...
(In reply to Rafael Ávila de Espíndola (:espindola) from comment #4) > For the record, the gcc produced body in a release build is: > > 125fd6a: 55 push %rbp > 125fd6b: 48 8b 3f mov (%rdi),%rdi > 125fd6e: 48 89 e5 mov %rsp,%rbp > 125fd71: 48 85 ff test %rdi,%rdi > 125fd74: 74 0a je 125fd80 > <_ZN13nsCOMPtr_baseD1Ev+0x16> > 125fd76: 48 8b 07 mov (%rdi),%rax > 125fd79: 48 8b 40 10 mov 0x10(%rax),%rax > 125fd7d: c9 leaveq > 125fd7e: ff e0 jmpq *%rax > 125fd80: c9 leaveq > 125fd81: c3 retq > > and we have 21717 calls to it in XUL :-) Note that a lot of this is just from frame overhead. You should see in the inlined version just mov (%rdi), %rdi test %rdi, %rdi je +N mov (%rdi), %rax mov 0x10(%rax),%rax call *%rax That's 17 bytes if I can count correctly, 9 bytes if the compiler can optimize array the null check, and 6 bytes or fewer if the vtable pointer can be reused, compared to a baseline of 5 bytes for the call itself.
> For Linux32 non-PGO, seems like we have not gathered enough info on the > graph server, and the difference is sort of lost. But this shouldn't make a > difference on Windows and Linux, since we do PGO there and the compiler is > supposed to be smart enough to inline these calls... Note that linux pgo doesn't use LTO, so this should help there too (if there is a hot call). Cheers, Rafael
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: