Closed
Bug 771266
Opened 12 years ago
Closed 12 years ago
Inline nsCOMPtr_base's destructor
Categories
(Core :: XPCOM, defect)
Tracking
()
RESOLVED
FIXED
mozilla16
People
(Reporter: ehsan.akhgari, Assigned: ehsan.akhgari)
Details
Attachments
(1 file)
1.31 KB,
patch
|
justin.lebar+bug
:
review+
|
Details | Diff | Splinter Review |
Today we were looking at some profiles, and we saw that ~nsCOMPtr_base() is not inlined. It should probably be declared as such. :-)
Assignee | ||
Comment 1•12 years ago
|
||
Assignee | ||
Updated•12 years ago
|
Attachment #639425 -
Flags: review?(benjamin) → review?(jmuizelaar)
Updated•12 years ago
|
Attachment #639425 -
Flags: review?(jmuizelaar) → review+
Assignee | ||
Updated•12 years ago
|
Attachment #639425 -
Flags: review+ → review?(justin.lebar+bug)
Updated•12 years ago
|
Attachment #639425 -
Flags: review?(justin.lebar+bug) → review+
Assignee | ||
Comment 2•12 years ago
|
||
Target Milestone: --- → mozilla16
Assignee | ||
Comment 3•12 years ago
|
||
Also, https://hg.mozilla.org/integration/mozilla-inbound/rev/34c8a2369cbc, with irc-review from Neil.
For the record, the gcc produced body in a release build is:
125fd6a: 55 push %rbp
125fd6b: 48 8b 3f mov (%rdi),%rdi
125fd6e: 48 89 e5 mov %rsp,%rbp
125fd71: 48 85 ff test %rdi,%rdi
125fd74: 74 0a je 125fd80 <_ZN13nsCOMPtr_baseD1Ev+0x16>
125fd76: 48 8b 07 mov (%rdi),%rax
125fd79: 48 8b 40 10 mov 0x10(%rax),%rax
125fd7d: c9 leaveq
125fd7e: ff e0 jmpq *%rax
125fd80: c9 leaveq
125fd81: c3 retq
and we have 21717 calls to it in XUL :-)
Comment 5•12 years ago
|
||
Wow, I should not have r+ed this so quickly!
Assignee | ||
Comment 6•12 years ago
|
||
(In reply to comment #5)
> Wow, I should not have r+ed this so quickly!
If you're worried about the code size, I'll watch it and backout if necessary.
Comment 7•12 years ago
|
||
Sounds good. I'm also worried about icache locality, but that's obviously a trade-off, and we'll see how our benchmarks are affected.
(In reply to Ehsan Akhgari [:ehsan] from comment #3)
> Also, https://hg.mozilla.org/integration/mozilla-inbound/rev/34c8a2369cbc,
> with irc-review from Neil.
Why?
Assignee | ||
Comment 9•12 years ago
|
||
(In reply to comment #8)
> (In reply to Ehsan Akhgari [:ehsan] from comment #3)
> > Also, https://hg.mozilla.org/integration/mozilla-inbound/rev/34c8a2369cbc,
> > with irc-review from Neil.
>
> Why?
See the bug that Jeff filed.
Comment 10•12 years ago
|
||
> See the bug that Jeff filed.
Bug 771317.
Assignee | ||
Comment 11•12 years ago
|
||
(In reply to Ehsan Akhgari [:ehsan] from comment #3)
> Also, https://hg.mozilla.org/integration/mozilla-inbound/rev/34c8a2369cbc,
> with irc-review from Neil.
Backed out: https://hg.mozilla.org/integration/mozilla-inbound/rev/57ae1fdbcfc5
Assignee | ||
Comment 12•12 years ago
|
||
Codesighs on Mac:
before: 42432300
after: 42531300
difference: 99000
For Linux32 non-PGO, seems like we have not gathered enough info on the graph server, and the difference is sort of lost. But this shouldn't make a difference on Windows and Linux, since we do PGO there and the compiler is supposed to be smart enough to inline these calls...
Comment 13•12 years ago
|
||
(In reply to Rafael Ávila de Espíndola (:espindola) from comment #4)
> For the record, the gcc produced body in a release build is:
>
> 125fd6a: 55 push %rbp
> 125fd6b: 48 8b 3f mov (%rdi),%rdi
> 125fd6e: 48 89 e5 mov %rsp,%rbp
> 125fd71: 48 85 ff test %rdi,%rdi
> 125fd74: 74 0a je 125fd80
> <_ZN13nsCOMPtr_baseD1Ev+0x16>
> 125fd76: 48 8b 07 mov (%rdi),%rax
> 125fd79: 48 8b 40 10 mov 0x10(%rax),%rax
> 125fd7d: c9 leaveq
> 125fd7e: ff e0 jmpq *%rax
> 125fd80: c9 leaveq
> 125fd81: c3 retq
>
> and we have 21717 calls to it in XUL :-)
Note that a lot of this is just from frame overhead. You should see in the inlined version just
mov (%rdi), %rdi
test %rdi, %rdi
je +N
mov (%rdi), %rax
mov 0x10(%rax),%rax
call *%rax
That's 17 bytes if I can count correctly, 9 bytes if the compiler can optimize array the null check, and 6 bytes or fewer if the vtable pointer can be reused, compared to a baseline of 5 bytes for the call itself.
Comment 14•12 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/b84af5d0c13c
https://hg.mozilla.org/mozilla-central/rev/34c8a2369cbc
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
> For Linux32 non-PGO, seems like we have not gathered enough info on the
> graph server, and the difference is sort of lost. But this shouldn't make a
> difference on Windows and Linux, since we do PGO there and the compiler is
> supposed to be smart enough to inline these calls...
Note that linux pgo doesn't use LTO, so this should help there too (if there is a hot call).
Cheers,
Rafael
You need to log in
before you can comment on or make changes to this bug.
Description
•