Closed
Bug 1499507
Opened 6 years ago
Closed 6 years ago
Remove some code size overhead from WebIDL profiling stack frames
Categories
(Core :: Gecko Profiler, enhancement)
Core
Gecko Profiler
Tracking
()
RESOLVED
FIXED
mozilla65
People
(Reporter: mstange, Assigned: mstange)
References
Details
Attachments
(9 files)
46 bytes,
text/x-phabricator-request
|
Details | Review | |
46 bytes,
text/x-phabricator-request
|
Details | Review | |
46 bytes,
text/x-phabricator-request
|
Details | Review | |
46 bytes,
text/x-phabricator-request
|
Details | Review | |
46 bytes,
text/x-phabricator-request
|
Details | Review | |
46 bytes,
text/x-phabricator-request
|
Details | Review | |
46 bytes,
text/x-phabricator-request
|
Details | Review | |
46 bytes,
text/x-phabricator-request
|
Details | Review | |
46 bytes,
text/x-phabricator-request
|
Details | Review |
Bug 785440 added profiling stack frames to all WebIDL APIs and caused some code size increases, see bug 1437772.
In this bug I want to land some improvements to this situation, but it probably won't fix the whole regression.
Assignee | ||
Updated•6 years ago
|
Assignee: nobody → mstange
Status: NEW → ASSIGNED
Assignee | ||
Comment 1•6 years ago
|
||
Here are the results from the patches I'm about to attach:
XUL section sizes opt:
Linux x64 Mac x64 Win x64 Android 32
93300567 116789591 92883950 58319924
fold checks -158576 -239520 -189936 +124
93141991 116550071 92694014 58320048
infallible capacity -94672 -88624 -98744 -128
93047319 116461447 92595270 58319920
no line numbers -101120 -107968 -98112
92946199 116353479 n/a 58221808
remove strings -160 -201744 -208
92946039 116151735 n/a 58221600
stackPointer load -159568 -54320 -129088
92786471 116097415 n/a 58092512
I introduced some build failures in the Windows part of the "no line numbers" patch, which is why I don't have numbers for the Windows builds yet, but I'll get them by tomorrow. I'm a bit surprised that only Mac is showing a reduction from the "remove strings" patch, but my guess is that that's because the "XUL section sizes opt" number doesn't measure all sections of the binary. But I haven't verified that assumption.
Assignee | ||
Comment 2•6 years ago
|
||
This eliminates a few instructions from every profiler label and saves code size.
We have around 9000 WebIDL constructors + methods + getters + setters which all
have an inlined instance of this code.
This change reduces the binary size on Linux x64 by around 160KB.
Here's a diff of the impact on the code generated for Attr_Binding::get_specified
in the Mac build:
movq %rsp, %rbp
pushq %r15
pushq %r14
pushq %r12
pushq %rbx
subq $0x10, %rsp
movq %rcx, %r14
movq %rdx, %r15
- movq __ZN7mozilla8profiler6detail12RacyFeatures18sActiveAndFeaturesE@GOT, %rax ; __ZN7mozilla8profiler6detail12RacyFeatures18sActiveAndFeaturesE@GOT
- movl (%rax), %eax
- testl %eax, %eax
- js loc_xxxxx
-
- movq $0x0, -40(%rbp)
- jmp loc_xxxxx
-
- movq 0x78(%rdi), %rbx
+ movq 0x80(%rdi), %rbx
movq %rbx, -40(%rbp)
testq %rbx, %rbx
je loc_xxxxx
movl 0x10(%rbx), %r12d
cmpl %r12d, (%rbx)
jbe loc_xxxxx
Assignee | ||
Comment 3•6 years ago
|
||
This eliminates a few instructions from each inlined instance of
AutoProfilerLabel because we no longer need to handle allocation failure in the
inlined code.
I think this allocation should be fine to make infallible: The allocation size
is limited by the thread's stack depth, and we only hit this code path when the
stack is the deepest it's ever been during the thread's life time.
This change reduces the binary size on Linux x64 by around 100KB.
Here's a diff of the impact on the code generated for Attr_Binding::get_specified
in the Mac build, it really just eliminates one test and one jump at the very end
of the method:
@@ -9,30 +9,29 @@
movq %rcx, %r14
movq %rdx, %r15
movq 0x80(%rdi), %rbx
movq %rbx, -40(%rbp)
testq %rbx, %rbx
je loc_xxxxx
movl 0x10(%rbx), %r12d
- cmpl %r12d, (%rbx)
- jbe loc_xxxxx
+ cmpl (%rbx), %r12d
+ jae loc_xxxxx
movq 0x8(%rbx), %rax
movq %r12, %rcx
shlq $0x5, %rcx
leaq aGetAttrspecifi, %rdx ; "get Attr.specified"
movq %rdx, (%rax,%rcx)
movq $0x0, 0x8(%rax,%rcx)
leaq -40(%rbp), %rdx
movq %rdx, 0x10(%rax,%rcx)
movl $0x106, 0x18(%rax,%rcx)
movl $0x1c, 0x1c(%rax,%rcx)
-
leal 0x1(%r12), %eax
movl %eax, 0x10(%rbx)
movq %r15, %rdi
call __ZNK7mozilla3dom4Attr9SpecifiedEv ; mozilla::dom::Attr::Specified() const
movzxl %al, %eax
movabsq $0xfff9000000000000, %rcx
orq %rax, %rcx
@@ -50,12 +49,9 @@
popq %r14
popq %r15
popq %rbp
ret
; endp
movq %rbx, %rdi
call __ZN14ProfilingStack18ensureCapacitySlowEv ; ProfilingStack::ensureCapacitySlow()
- testb %al, %al
- jne loc_xxxxx
-
jmp loc_xxxxx
Depends on D9192
Assignee | ||
Comment 4•6 years ago
|
||
They were not displayed in the UI, and the instructions to initialize the line
field of a stack frame increased code size unnecessarily.
This change reduces the binary size on Linux x64 by around 100KB.
Here's a diff of the impact on the code generated for Attr_Binding::get_specified
in the Mac build:
@@ -20,17 +20,16 @@
movq 0x8(%rbx), %rax
movq %r12, %rcx
shlq $0x5, %rcx
leaq aGetAttrspecifi, %rdx ; "get Attr.specified"
movq %rdx, (%rax,%rcx)
movq $0x0, 0x8(%rax,%rcx)
leaq -40(%rbp), %rdx
movq %rdx, 0x10(%rax,%rcx)
- movl $0x106, 0x18(%rax,%rcx)
movl $0x1c, 0x1c(%rax,%rcx)
leal 0x1(%r12), %eax
movl %eax, 0x10(%rbx)
movq %r15, %rdi
call __ZNK7mozilla3dom4Attr9SpecifiedEv ; mozilla::dom::Attr::Specified() const
movzxl %al, %eax
movabsq $0xfff9000000000000, %rcx
Depends on D9193
Assignee | ||
Comment 5•6 years ago
|
||
This code is run during JSON serialization so performance is not a big concern.
Depends on D9195
Assignee | ||
Comment 6•6 years ago
|
||
This makes it easier to add more flags.
Depends on D9197
Assignee | ||
Comment 7•6 years ago
|
||
These flags will be used by WebIDL APIs in an upcoming patch.
Depends on D9199
Assignee | ||
Comment 8•6 years ago
|
||
Depends on D9203
Assignee | ||
Comment 9•6 years ago
|
||
This means that our binary does not need to include concatenated strings such
as "set CanvasRenderingContext2D.fillStyle". It only needs to contain the
individual strings "CanvasRenderingContext2D" and "fillStyle" which are most
likely already present in the binary.
This change reduces the binary size on macOS x64 by around 200KB.
Here's a diff of the impact on the code generated for Attr_Binding::get_specified
in the Mac build. This change makes us generate slightly more code, which is
very much offset by the reduction in the amount of strings we ship.
@@ -15,22 +15,23 @@
movl 0x10(%rbx), %r12d
cmpl (%rbx), %r12d
jae loc_xxxxx
movq 0x8(%rbx), %rax
movq %r12, %rcx
shlq $0x5, %rcx
- leaq aGetAttrspecifi, %rdx ; "get Attr.specified"
+ leaq aAttr, %rdx ; "Attr"
movq %rdx, (%rax,%rcx)
- movq $0x0, 0x8(%rax,%rcx)
+ leaq aSpecified, %rdx ; "specified"
+ movq %rdx, 0x8(%rax,%rcx)
leaq -40(%rbp), %rdx
movq %rdx, 0x10(%rax,%rcx)
- movl $0x71, 0x1c(%rax,%rcx)
+ movl $0x3a1, 0x1c(%rax,%rcx)
leal 0x1(%r12), %eax
movl %eax, 0x10(%rbx)
movq %r15, %rdi
call __ZNK7mozilla3dom4Attr9SpecifiedEv ; mozilla::dom::Attr::Specified() const
movzxl %al, %eax
movabsq $0xfff9000000000000, %rcx
orq %rax, %rcx
Depends on D9204
Assignee | ||
Comment 10•6 years ago
|
||
This change reduces the binary size on macOS x64 by around 50KB.
Here's a diff of the impact on the code generated for Attr_Binding::get_specified
in the Mac build. It's a bit hard to read because %r12 and %rbx swap their
function, but what happens in this method is that "movq %r12, %rcx" goes
away, and the two instructions "leal 0x1(%r12) %eax" and
"movl %eax, 0x10(%rbx)" turn into an "incl 0x10(%r12)".
So the old code was preserving the original value of profilingStack->stackPointer
in a register, and then using it later to compute the incremented stackPointer.
The new code uses an "incl" instruction for the stackPointer increment and
doesn't worry that the stackPointer value might have changed since the stack
size check at the start of the function. (It can't have changed.)
before: %rbx has the ProfilingStack*, %r12 has profilingStack->stackPointer
after: %r12 has the ProfilingStack*, %rbx has profilingStack->stackPointer
@@ -3,37 +3,35 @@
movq %rsp, %rbp
pushq %r15
pushq %r14
pushq %r12
pushq %rbx
subq $0x10, %rsp
movq %rcx, %r14
movq %rdx, %r15
- movq 0x80(%rdi), %rbx
- movq %rbx, -40(%rbp)
- testq %rbx, %rbx
+ movq 0x80(%rdi), %r12
+ movq %r12, -40(%rbp)
+ testq %r12, %r12
je loc_xxxxx
- movl 0x10(%rbx), %r12d
- cmpl (%rbx), %r12d
+ movl 0x10(%r12), %ebx
+ cmpl (%r12), %ebx
jae loc_xxxxx
- movq 0x8(%rbx), %rax
- movq %r12, %rcx
- shlq $0x5, %rcx
- leaq aAttr, %rdx ; "Attr"
- movq %rdx, (%rax,%rcx)
- leaq aSpecified, %rdx ; "specified"
- movq %rdx, 0x8(%rax,%rcx)
- leaq -40(%rbp), %rdx
- movq %rdx, 0x10(%rax,%rcx)
- movl $0x3a1, 0x1c(%rax,%rcx)
- leal 0x1(%r12), %eax
- movl %eax, 0x10(%rbx)
+ movq 0x8(%r12), %rax
+ shlq $0x5, %rbx
+ leaq aAttr, %rcx ; "Attr"
+ movq %rcx, (%rax,%rbx)
+ leaq aSpecified, %rcx ; "specified"
+ movq %rcx, 0x8(%rax,%rbx)
+ leaq -40(%rbp), %rcx
+ movq %rcx, 0x10(%rax,%rbx)
+ movl $0x3a1, 0x1c(%rax,%rbx)
+ incl 0x10(%r12)
movq %r15, %rdi
call __ZNK7mozilla3dom4Attr9SpecifiedEv ; mozilla::dom::Attr::Specified() const
movzxl %al, %eax
movabsq $0xfff9000000000000, %rcx
orq %rax, %rcx
movq %rcx, (%r14)
movq -40(%rbp), %rax
@@ -47,11 +45,11 @@
popq %rbx
popq %r12
popq %r14
popq %r15
popq %rbp
ret
; endp
- movq %rbx, %rdi
+ movq %r12, %rdi
call __ZN14ProfilingStack18ensureCapacitySlowEv ; ProfilingStack::ensureCapacitySlow()
jmp loc_xxxxx
Depends on D9205
Assignee | ||
Comment 11•6 years ago
|
||
Updated numbers from successful try pushes:
XUL section sizes opt:
Linux x64 Mac x64 Win x64 Android 32
93300567 116789591 92883950 58319924
fold checks -158576 -239520 -189936 +124
93141991 116550071 92694014 58320048
infallible capacity -94672 -88624 -98744 -128
93047319 116461447 92595270 58319920
no line numbers -101120 -107968 -129552 -98112
92946199 116353479 92465718 58221808
remove strings -160 -201744 -164800 -208
92946039 116151735 92300918 58221600
stackPointer load -159568 -54320 +6608 -129088
92786471 116097415 92307526 58092512
Try pushes:
> baseline: https://treeherder.mozilla.org/#/jobs?repo=try&revision=721d857b06913ecaacc0d65e2d760cb5ef56613d
> after "fold checks": https://treeherder.mozilla.org/#/jobs?repo=try&revision=f9d367e5467c11ebf05d4193f9039bde67ff47b7
> after "infallible capacity": https://treeherder.mozilla.org/#/jobs?repo=try&revision=17d6eae0985be3108f31ff4f4499eb967e84e6fc
> after "no line numbers": https://treeherder.mozilla.org/#/jobs?repo=try&revision=b5be11594dc0a50a8785ad233d9b596b9bd981e9
> after "remove strings": https://treeherder.mozilla.org/#/jobs?repo=try&revision=870a7409e58c039a48c5cd327243c30dbabc241d
> after "stackPointer load": https://treeherder.mozilla.org/#/jobs?repo=try&revision=8fe115c646ebec9f964f68b0966d05124bd7d7e1
Comment 12•6 years ago
|
||
Pushed by mstange@themasta.com:
https://hg.mozilla.org/integration/autoland/rev/4a9c9a91182c
Fold the 'profiler is active' check into the 'JSContext has a non-null PseudoStack' check. r=sfink
https://hg.mozilla.org/integration/autoland/rev/bc134fe1722a
Make ensureCapacitySlow infallible. r=emilio
https://hg.mozilla.org/integration/autoland/rev/16d6c90333de
Don't collect line numbers for profiling stack frames. r=njn
https://hg.mozilla.org/integration/autoland/rev/ca23a517da63
Use AppendPrintf to concatenate the label with the dynamic string. r=njn
https://hg.mozilla.org/integration/autoland/rev/9225e9aea377
Convert the ProfilingStackFrame kind into a set of flags. r=njn
https://hg.mozilla.org/integration/autoland/rev/58dc19fb2b76
Add ProfilingStackFrame flags for to choose the string template that is used to combine the label with the dynamic string. r=njn
https://hg.mozilla.org/integration/autoland/rev/f427afc392b0
Add AUTO_PROFILER_LABEL_DYNAMIC_FAST which allows specifying flags. r=njn
https://hg.mozilla.org/integration/autoland/rev/8a3f4acbad3b
Use AUTO_PROFILER_DYNAMIC_FAST for WebIDL APIs. r=bzbarsky
https://hg.mozilla.org/integration/autoland/rev/541186291b88
Allow the compiler to generate a non-atomic increment instruction for the stack pointer increment. r=njn
Comment 13•6 years ago
|
||
Backed out for bustages on ProfileBuffer.cpp
Backout link: https://hg.mozilla.org/integration/autoland/rev/848152c22f8bd814a564a2306bd249b88099aba8
Push link: https://treeherder.mozilla.org/#/jobs?repo=autoland&resultStatus=testfailed%2Cbusted%2Cexception%2Crunnable&revision=541186291b888899e895b8aa92d3823cf3475905
Log link: https://treeherder.mozilla.org/logviewer.html#?job_id=209888855&repo=autoland&lineNumber=26806
Flags: needinfo?(mstange)
Updated•6 years ago
|
Attachment #9018501 -
Attachment description: Bug 1499507 - Use AUTO_PROFILER_DYNAMIC_FAST for WebIDL APIs. r?bzbarsky → Bug 1499507 - Use AUTO_PROFILER_LABEL_DYNAMIC_FAST for WebIDL APIs. r?bzbarsky
Comment 14•6 years ago
|
||
Pushed by mstange@themasta.com:
https://hg.mozilla.org/integration/autoland/rev/24a9494155fe
Fold the 'profiler is active' check into the 'JSContext has a non-null PseudoStack' check. r=sfink
https://hg.mozilla.org/integration/autoland/rev/30476b824eb4
Make ensureCapacitySlow infallible. r=emilio
https://hg.mozilla.org/integration/autoland/rev/41cede6bc7d2
Don't collect line numbers for profiling stack frames. r=njn
https://hg.mozilla.org/integration/autoland/rev/6256446f16c4
Use AppendPrintf to concatenate the label with the dynamic string. r=njn
https://hg.mozilla.org/integration/autoland/rev/76dd85b9aaf7
Convert the ProfilingStackFrame kind into a set of flags. r=njn
https://hg.mozilla.org/integration/autoland/rev/cf40b044af3f
Add ProfilingStackFrame flags for to choose the string template that is used to combine the label with the dynamic string. r=njn
https://hg.mozilla.org/integration/autoland/rev/9254e6c721b3
Add AUTO_PROFILER_LABEL_DYNAMIC_FAST which allows specifying flags. r=njn
https://hg.mozilla.org/integration/autoland/rev/35d05a53e0a9
Use AUTO_PROFILER_LABEL_DYNAMIC_FAST for WebIDL APIs. r=bzbarsky
https://hg.mozilla.org/integration/autoland/rev/4c79a5557d7b
Allow the compiler to generate a non-atomic increment instruction for the stack pointer increment. r=njn
Comment 15•6 years ago
|
||
Backed out 9 changesets (Bug 1499507) for build bustages on /profiler/core/platform.cpp.
Backout: https://hg.mozilla.org/integration/autoland/rev/ccf46ccf351988116723f6a78c887757941c5348
Push with failures: https://treeherder.mozilla.org/#/jobs?repo=autoland&resultStatus=pending%2Crunning%2Csuccess%2Ctestfailed%2Cbusted%2Cexception&revision=4c79a5557d7b48f90dc46206d798b277ff8f1606&selectedJob=209914255
Failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=209914255&repo=autoland&lineNumber=21641
Assignee | ||
Comment 16•6 years ago
|
||
Uh oh :( I thought I could do without a debug build try push...
Flags: needinfo?(mstange)
Assignee | ||
Comment 17•6 years ago
|
||
Fixed the Android build failure, it compiles locally. Going to try again.
Comment 18•6 years ago
|
||
Pushed by mstange@themasta.com:
https://hg.mozilla.org/integration/autoland/rev/18a8aa4a02c3
Fold the 'profiler is active' check into the 'JSContext has a non-null PseudoStack' check. r=sfink
https://hg.mozilla.org/integration/autoland/rev/1f19af3b4374
Make ensureCapacitySlow infallible. r=emilio
https://hg.mozilla.org/integration/autoland/rev/cd066bdf78f5
Don't collect line numbers for profiling stack frames. r=njn
https://hg.mozilla.org/integration/autoland/rev/76602ccd5ca3
Use AppendPrintf to concatenate the label with the dynamic string. r=njn
https://hg.mozilla.org/integration/autoland/rev/be8a9a7f29bd
Convert the ProfilingStackFrame kind into a set of flags. r=njn
https://hg.mozilla.org/integration/autoland/rev/2bc3310f933c
Add ProfilingStackFrame flags for to choose the string template that is used to combine the label with the dynamic string. r=njn
https://hg.mozilla.org/integration/autoland/rev/408a37a9e42b
Add AUTO_PROFILER_LABEL_DYNAMIC_FAST which allows specifying flags. r=njn
https://hg.mozilla.org/integration/autoland/rev/5ec157cbbdd1
Use AUTO_PROFILER_LABEL_DYNAMIC_FAST for WebIDL APIs. r=bzbarsky
https://hg.mozilla.org/integration/autoland/rev/50f1edbfb52c
Allow the compiler to generate a non-atomic increment instruction for the stack pointer increment. r=njn
Comment 19•6 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/18a8aa4a02c3
https://hg.mozilla.org/mozilla-central/rev/1f19af3b4374
https://hg.mozilla.org/mozilla-central/rev/cd066bdf78f5
https://hg.mozilla.org/mozilla-central/rev/76602ccd5ca3
https://hg.mozilla.org/mozilla-central/rev/be8a9a7f29bd
https://hg.mozilla.org/mozilla-central/rev/2bc3310f933c
https://hg.mozilla.org/mozilla-central/rev/408a37a9e42b
https://hg.mozilla.org/mozilla-central/rev/5ec157cbbdd1
https://hg.mozilla.org/mozilla-central/rev/50f1edbfb52c
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
status-firefox65:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla65
Updated•6 years ago
|
You need to log in
before you can comment on or make changes to this bug.
Description
•