Closed Bug 1499507 Opened 6 years ago Closed 6 years ago

Remove some code size overhead from WebIDL profiling stack frames

Categories

(Core :: Gecko Profiler, enhancement)

enhancement
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla65
Tracking Status
firefox64 --- wontfix
firefox65 --- fixed

People

(Reporter: mstange, Assigned: mstange)

References

Details

Attachments

(9 files)

46 bytes, text/x-phabricator-request
Details | Review
46 bytes, text/x-phabricator-request
Details | Review
46 bytes, text/x-phabricator-request
Details | Review
46 bytes, text/x-phabricator-request
Details | Review
46 bytes, text/x-phabricator-request
Details | Review
46 bytes, text/x-phabricator-request
Details | Review
46 bytes, text/x-phabricator-request
Details | Review
46 bytes, text/x-phabricator-request
Details | Review
46 bytes, text/x-phabricator-request
Details | Review
Bug 785440 added profiling stack frames to all WebIDL APIs and caused some code size increases, see bug 1437772. In this bug I want to land some improvements to this situation, but it probably won't fix the whole regression.
Assignee: nobody → mstange
Status: NEW → ASSIGNED
Here are the results from the patches I'm about to attach: XUL section sizes opt: Linux x64 Mac x64 Win x64 Android 32 93300567 116789591 92883950 58319924 fold checks -158576 -239520 -189936 +124 93141991 116550071 92694014 58320048 infallible capacity -94672 -88624 -98744 -128 93047319 116461447 92595270 58319920 no line numbers -101120 -107968 -98112 92946199 116353479 n/a 58221808 remove strings -160 -201744 -208 92946039 116151735 n/a 58221600 stackPointer load -159568 -54320 -129088 92786471 116097415 n/a 58092512 I introduced some build failures in the Windows part of the "no line numbers" patch, which is why I don't have numbers for the Windows builds yet, but I'll get them by tomorrow. I'm a bit surprised that only Mac is showing a reduction from the "remove strings" patch, but my guess is that that's because the "XUL section sizes opt" number doesn't measure all sections of the binary. But I haven't verified that assumption.
This eliminates a few instructions from every profiler label and saves code size. We have around 9000 WebIDL constructors + methods + getters + setters which all have an inlined instance of this code. This change reduces the binary size on Linux x64 by around 160KB. Here's a diff of the impact on the code generated for Attr_Binding::get_specified in the Mac build: movq %rsp, %rbp pushq %r15 pushq %r14 pushq %r12 pushq %rbx subq $0x10, %rsp movq %rcx, %r14 movq %rdx, %r15 - movq __ZN7mozilla8profiler6detail12RacyFeatures18sActiveAndFeaturesE@GOT, %rax ; __ZN7mozilla8profiler6detail12RacyFeatures18sActiveAndFeaturesE@GOT - movl (%rax), %eax - testl %eax, %eax - js loc_xxxxx - - movq $0x0, -40(%rbp) - jmp loc_xxxxx - - movq 0x78(%rdi), %rbx + movq 0x80(%rdi), %rbx movq %rbx, -40(%rbp) testq %rbx, %rbx je loc_xxxxx movl 0x10(%rbx), %r12d cmpl %r12d, (%rbx) jbe loc_xxxxx
This eliminates a few instructions from each inlined instance of AutoProfilerLabel because we no longer need to handle allocation failure in the inlined code. I think this allocation should be fine to make infallible: The allocation size is limited by the thread's stack depth, and we only hit this code path when the stack is the deepest it's ever been during the thread's life time. This change reduces the binary size on Linux x64 by around 100KB. Here's a diff of the impact on the code generated for Attr_Binding::get_specified in the Mac build, it really just eliminates one test and one jump at the very end of the method: @@ -9,30 +9,29 @@ movq %rcx, %r14 movq %rdx, %r15 movq 0x80(%rdi), %rbx movq %rbx, -40(%rbp) testq %rbx, %rbx je loc_xxxxx movl 0x10(%rbx), %r12d - cmpl %r12d, (%rbx) - jbe loc_xxxxx + cmpl (%rbx), %r12d + jae loc_xxxxx movq 0x8(%rbx), %rax movq %r12, %rcx shlq $0x5, %rcx leaq aGetAttrspecifi, %rdx ; "get Attr.specified" movq %rdx, (%rax,%rcx) movq $0x0, 0x8(%rax,%rcx) leaq -40(%rbp), %rdx movq %rdx, 0x10(%rax,%rcx) movl $0x106, 0x18(%rax,%rcx) movl $0x1c, 0x1c(%rax,%rcx) - leal 0x1(%r12), %eax movl %eax, 0x10(%rbx) movq %r15, %rdi call __ZNK7mozilla3dom4Attr9SpecifiedEv ; mozilla::dom::Attr::Specified() const movzxl %al, %eax movabsq $0xfff9000000000000, %rcx orq %rax, %rcx @@ -50,12 +49,9 @@ popq %r14 popq %r15 popq %rbp ret ; endp movq %rbx, %rdi call __ZN14ProfilingStack18ensureCapacitySlowEv ; ProfilingStack::ensureCapacitySlow() - testb %al, %al - jne loc_xxxxx - jmp loc_xxxxx Depends on D9192
They were not displayed in the UI, and the instructions to initialize the line field of a stack frame increased code size unnecessarily. This change reduces the binary size on Linux x64 by around 100KB. Here's a diff of the impact on the code generated for Attr_Binding::get_specified in the Mac build: @@ -20,17 +20,16 @@ movq 0x8(%rbx), %rax movq %r12, %rcx shlq $0x5, %rcx leaq aGetAttrspecifi, %rdx ; "get Attr.specified" movq %rdx, (%rax,%rcx) movq $0x0, 0x8(%rax,%rcx) leaq -40(%rbp), %rdx movq %rdx, 0x10(%rax,%rcx) - movl $0x106, 0x18(%rax,%rcx) movl $0x1c, 0x1c(%rax,%rcx) leal 0x1(%r12), %eax movl %eax, 0x10(%rbx) movq %r15, %rdi call __ZNK7mozilla3dom4Attr9SpecifiedEv ; mozilla::dom::Attr::Specified() const movzxl %al, %eax movabsq $0xfff9000000000000, %rcx Depends on D9193
This code is run during JSON serialization so performance is not a big concern. Depends on D9195
This makes it easier to add more flags. Depends on D9197
These flags will be used by WebIDL APIs in an upcoming patch. Depends on D9199
This means that our binary does not need to include concatenated strings such as "set CanvasRenderingContext2D.fillStyle". It only needs to contain the individual strings "CanvasRenderingContext2D" and "fillStyle" which are most likely already present in the binary. This change reduces the binary size on macOS x64 by around 200KB. Here's a diff of the impact on the code generated for Attr_Binding::get_specified in the Mac build. This change makes us generate slightly more code, which is very much offset by the reduction in the amount of strings we ship. @@ -15,22 +15,23 @@ movl 0x10(%rbx), %r12d cmpl (%rbx), %r12d jae loc_xxxxx movq 0x8(%rbx), %rax movq %r12, %rcx shlq $0x5, %rcx - leaq aGetAttrspecifi, %rdx ; "get Attr.specified" + leaq aAttr, %rdx ; "Attr" movq %rdx, (%rax,%rcx) - movq $0x0, 0x8(%rax,%rcx) + leaq aSpecified, %rdx ; "specified" + movq %rdx, 0x8(%rax,%rcx) leaq -40(%rbp), %rdx movq %rdx, 0x10(%rax,%rcx) - movl $0x71, 0x1c(%rax,%rcx) + movl $0x3a1, 0x1c(%rax,%rcx) leal 0x1(%r12), %eax movl %eax, 0x10(%rbx) movq %r15, %rdi call __ZNK7mozilla3dom4Attr9SpecifiedEv ; mozilla::dom::Attr::Specified() const movzxl %al, %eax movabsq $0xfff9000000000000, %rcx orq %rax, %rcx Depends on D9204
This change reduces the binary size on macOS x64 by around 50KB. Here's a diff of the impact on the code generated for Attr_Binding::get_specified in the Mac build. It's a bit hard to read because %r12 and %rbx swap their function, but what happens in this method is that "movq %r12, %rcx" goes away, and the two instructions "leal 0x1(%r12) %eax" and "movl %eax, 0x10(%rbx)" turn into an "incl 0x10(%r12)". So the old code was preserving the original value of profilingStack->stackPointer in a register, and then using it later to compute the incremented stackPointer. The new code uses an "incl" instruction for the stackPointer increment and doesn't worry that the stackPointer value might have changed since the stack size check at the start of the function. (It can't have changed.) before: %rbx has the ProfilingStack*, %r12 has profilingStack->stackPointer after: %r12 has the ProfilingStack*, %rbx has profilingStack->stackPointer @@ -3,37 +3,35 @@ movq %rsp, %rbp pushq %r15 pushq %r14 pushq %r12 pushq %rbx subq $0x10, %rsp movq %rcx, %r14 movq %rdx, %r15 - movq 0x80(%rdi), %rbx - movq %rbx, -40(%rbp) - testq %rbx, %rbx + movq 0x80(%rdi), %r12 + movq %r12, -40(%rbp) + testq %r12, %r12 je loc_xxxxx - movl 0x10(%rbx), %r12d - cmpl (%rbx), %r12d + movl 0x10(%r12), %ebx + cmpl (%r12), %ebx jae loc_xxxxx - movq 0x8(%rbx), %rax - movq %r12, %rcx - shlq $0x5, %rcx - leaq aAttr, %rdx ; "Attr" - movq %rdx, (%rax,%rcx) - leaq aSpecified, %rdx ; "specified" - movq %rdx, 0x8(%rax,%rcx) - leaq -40(%rbp), %rdx - movq %rdx, 0x10(%rax,%rcx) - movl $0x3a1, 0x1c(%rax,%rcx) - leal 0x1(%r12), %eax - movl %eax, 0x10(%rbx) + movq 0x8(%r12), %rax + shlq $0x5, %rbx + leaq aAttr, %rcx ; "Attr" + movq %rcx, (%rax,%rbx) + leaq aSpecified, %rcx ; "specified" + movq %rcx, 0x8(%rax,%rbx) + leaq -40(%rbp), %rcx + movq %rcx, 0x10(%rax,%rbx) + movl $0x3a1, 0x1c(%rax,%rbx) + incl 0x10(%r12) movq %r15, %rdi call __ZNK7mozilla3dom4Attr9SpecifiedEv ; mozilla::dom::Attr::Specified() const movzxl %al, %eax movabsq $0xfff9000000000000, %rcx orq %rax, %rcx movq %rcx, (%r14) movq -40(%rbp), %rax @@ -47,11 +45,11 @@ popq %rbx popq %r12 popq %r14 popq %r15 popq %rbp ret ; endp - movq %rbx, %rdi + movq %r12, %rdi call __ZN14ProfilingStack18ensureCapacitySlowEv ; ProfilingStack::ensureCapacitySlow() jmp loc_xxxxx Depends on D9205
Updated numbers from successful try pushes: XUL section sizes opt: Linux x64 Mac x64 Win x64 Android 32 93300567 116789591 92883950 58319924 fold checks -158576 -239520 -189936 +124 93141991 116550071 92694014 58320048 infallible capacity -94672 -88624 -98744 -128 93047319 116461447 92595270 58319920 no line numbers -101120 -107968 -129552 -98112 92946199 116353479 92465718 58221808 remove strings -160 -201744 -164800 -208 92946039 116151735 92300918 58221600 stackPointer load -159568 -54320 +6608 -129088 92786471 116097415 92307526 58092512 Try pushes: > baseline: https://treeherder.mozilla.org/#/jobs?repo=try&revision=721d857b06913ecaacc0d65e2d760cb5ef56613d > after "fold checks": https://treeherder.mozilla.org/#/jobs?repo=try&revision=f9d367e5467c11ebf05d4193f9039bde67ff47b7 > after "infallible capacity": https://treeherder.mozilla.org/#/jobs?repo=try&revision=17d6eae0985be3108f31ff4f4499eb967e84e6fc > after "no line numbers": https://treeherder.mozilla.org/#/jobs?repo=try&revision=b5be11594dc0a50a8785ad233d9b596b9bd981e9 > after "remove strings": https://treeherder.mozilla.org/#/jobs?repo=try&revision=870a7409e58c039a48c5cd327243c30dbabc241d > after "stackPointer load": https://treeherder.mozilla.org/#/jobs?repo=try&revision=8fe115c646ebec9f964f68b0966d05124bd7d7e1
Blocks: 1500467
Pushed by mstange@themasta.com: https://hg.mozilla.org/integration/autoland/rev/4a9c9a91182c Fold the 'profiler is active' check into the 'JSContext has a non-null PseudoStack' check. r=sfink https://hg.mozilla.org/integration/autoland/rev/bc134fe1722a Make ensureCapacitySlow infallible. r=emilio https://hg.mozilla.org/integration/autoland/rev/16d6c90333de Don't collect line numbers for profiling stack frames. r=njn https://hg.mozilla.org/integration/autoland/rev/ca23a517da63 Use AppendPrintf to concatenate the label with the dynamic string. r=njn https://hg.mozilla.org/integration/autoland/rev/9225e9aea377 Convert the ProfilingStackFrame kind into a set of flags. r=njn https://hg.mozilla.org/integration/autoland/rev/58dc19fb2b76 Add ProfilingStackFrame flags for to choose the string template that is used to combine the label with the dynamic string. r=njn https://hg.mozilla.org/integration/autoland/rev/f427afc392b0 Add AUTO_PROFILER_LABEL_DYNAMIC_FAST which allows specifying flags. r=njn https://hg.mozilla.org/integration/autoland/rev/8a3f4acbad3b Use AUTO_PROFILER_DYNAMIC_FAST for WebIDL APIs. r=bzbarsky https://hg.mozilla.org/integration/autoland/rev/541186291b88 Allow the compiler to generate a non-atomic increment instruction for the stack pointer increment. r=njn
Attachment #9018501 - Attachment description: Bug 1499507 - Use AUTO_PROFILER_DYNAMIC_FAST for WebIDL APIs. r?bzbarsky → Bug 1499507 - Use AUTO_PROFILER_LABEL_DYNAMIC_FAST for WebIDL APIs. r?bzbarsky
Pushed by mstange@themasta.com: https://hg.mozilla.org/integration/autoland/rev/24a9494155fe Fold the 'profiler is active' check into the 'JSContext has a non-null PseudoStack' check. r=sfink https://hg.mozilla.org/integration/autoland/rev/30476b824eb4 Make ensureCapacitySlow infallible. r=emilio https://hg.mozilla.org/integration/autoland/rev/41cede6bc7d2 Don't collect line numbers for profiling stack frames. r=njn https://hg.mozilla.org/integration/autoland/rev/6256446f16c4 Use AppendPrintf to concatenate the label with the dynamic string. r=njn https://hg.mozilla.org/integration/autoland/rev/76dd85b9aaf7 Convert the ProfilingStackFrame kind into a set of flags. r=njn https://hg.mozilla.org/integration/autoland/rev/cf40b044af3f Add ProfilingStackFrame flags for to choose the string template that is used to combine the label with the dynamic string. r=njn https://hg.mozilla.org/integration/autoland/rev/9254e6c721b3 Add AUTO_PROFILER_LABEL_DYNAMIC_FAST which allows specifying flags. r=njn https://hg.mozilla.org/integration/autoland/rev/35d05a53e0a9 Use AUTO_PROFILER_LABEL_DYNAMIC_FAST for WebIDL APIs. r=bzbarsky https://hg.mozilla.org/integration/autoland/rev/4c79a5557d7b Allow the compiler to generate a non-atomic increment instruction for the stack pointer increment. r=njn
Uh oh :( I thought I could do without a debug build try push...
Flags: needinfo?(mstange)
Fixed the Android build failure, it compiles locally. Going to try again.
Pushed by mstange@themasta.com: https://hg.mozilla.org/integration/autoland/rev/18a8aa4a02c3 Fold the 'profiler is active' check into the 'JSContext has a non-null PseudoStack' check. r=sfink https://hg.mozilla.org/integration/autoland/rev/1f19af3b4374 Make ensureCapacitySlow infallible. r=emilio https://hg.mozilla.org/integration/autoland/rev/cd066bdf78f5 Don't collect line numbers for profiling stack frames. r=njn https://hg.mozilla.org/integration/autoland/rev/76602ccd5ca3 Use AppendPrintf to concatenate the label with the dynamic string. r=njn https://hg.mozilla.org/integration/autoland/rev/be8a9a7f29bd Convert the ProfilingStackFrame kind into a set of flags. r=njn https://hg.mozilla.org/integration/autoland/rev/2bc3310f933c Add ProfilingStackFrame flags for to choose the string template that is used to combine the label with the dynamic string. r=njn https://hg.mozilla.org/integration/autoland/rev/408a37a9e42b Add AUTO_PROFILER_LABEL_DYNAMIC_FAST which allows specifying flags. r=njn https://hg.mozilla.org/integration/autoland/rev/5ec157cbbdd1 Use AUTO_PROFILER_LABEL_DYNAMIC_FAST for WebIDL APIs. r=bzbarsky https://hg.mozilla.org/integration/autoland/rev/50f1edbfb52c Allow the compiler to generate a non-atomic increment instruction for the stack pointer increment. r=njn
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: