Closed Bug 1437772 Opened 6 years ago Closed 6 years ago

0.35 - 0.58% installer size (linux32, linux64, osx-cross, windows2012-32, windows2012-64) regression on push fbd6ca22f8417eb5833e2f7c585f69252de10216 (Mon Feb 12 2018)

Categories

(Core :: Gecko Profiler, defect, P2)

defect

Tracking

()

RESOLVED FIXED
Tracking Status
firefox-esr52 --- unaffected
firefox-esr60 --- unaffected
firefox58 --- unaffected
firefox59 --- unaffected
firefox60 - fixed
firefox61 - fixed
firefox62 - fixed
firefox63 + wontfix
firefox64 --- wontfix
firefox65 --- fixed

People

(Reporter: igoldan, Assigned: mstange)

References

Details

(Keywords: regression)

Attachments

(1 file)

We have detected a build metrics regression from push:

https://hg.mozilla.org/integration/autoland/pushloghtml?changeset=fbd6ca22f8417eb5833e2f7c585f69252de10216

As author of one of the patches included in that push, we need your help to address this regression.

Regressions:

  1%  installer size windows2012-32 pgo      57,656,558.00 -> 57,990,707.75
  0%  installer size linux32 pgo             64,197,304.08 -> 64,464,600.00
  0%  installer size linux64 pgo             63,493,886.67 -> 63,732,961.75
  0%  installer size osx-cross opt           67,332,418.21 -> 67,579,900.58
  0%  installer size windows2012-64 pgo      61,738,288.25 -> 61,955,546.25


You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=11517

On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the jobs in a pushlog format.

To learn more about the regressing test(s), please see: https://developer.mozilla.org/en-US/docs/Mozilla/Performance/Automated_Performance_Testing_and_Sheriffing/Build_Metrics
Component: Untriaged → Gecko Profiler
Product: Firefox → Core
:mstange Changes made in bug 785440 caused >200KB increase in installer size. Please take a look over this issue, to try and reduce this size.
Flags: needinfo?(mstange)
Bug 785440 adds AUTO_PROFILER_LABEL_FAST to lots of different places.  The implementation of that has tons of inlining to avoid problems with the profiler perturbing performance too much...

But the upshot is that we end up with a fair amount of code added.  :(
Here's a size comparison of the "XUL" library from the Mac builds from before and after this change:

$ size -m before/XUL                $ size -m after/XUL
Segment __TEXT: 89382912            Segment __TEXT: 90836992             +1454080
  Section __text: 73350208            Section __text: 74587536           +1237328
  Section __stubs: 11892              Section __stubs: 11892             +0
  Section __stub_helper: 19786        Section __stub_helper: 19786       +0
  Section __const: 3824954            Section __const: 3824954           +0
  Section __cstring: 2472348          Section __cstring: 2685148         +212800
  Section __ustring: 78902            Section __ustring: 78902           +0
  Section __gcc_except_tab: 106828    Section __gcc_except_tab: 106828   +0
  Section __objc_methname: 33822      Section __objc_methname: 33822     +0
  Section __objc_classname: 1660      Section __objc_classname: 1660     +0
  Section __objc_methtype: 36027      Section __objc_methtype: 36027     +0
  Section __eh_frame: 9439752         Section __eh_frame: 9441752        +2000
  total 89376179                      total 90828307                     +1452128
Segment __DATA: 5234688             Segment __DATA: 5234688              +0
  [...]                               [...]
  total 5230984                       total 5230984                      +0
Segment __LINKEDIT: 63348736        Segment __LINKEDIT: 63365120         +16384
total 157966336                     total 159436800                      +1470464

Of the total change of 1.4MB, 1.2MB are due to extra generated code and around 200KB due to extra strings.

I know what to do about the strings but I don't know what to do about the code.

I'm currently comparing the assembly of a few binding methods to get a breakdown of what parts are costing how much. I'm also trying to get a count of the number of methods we instrumented; I think it's somewhere between 7000 and 10000.
Assignee: nobody → mstange
Status: NEW → ASSIGNED
Flags: needinfo?(mstange)
$ otool -tV after/XUL | grep __ZN7mozilla8profiler6detail12RacyFeatures18sActiveAndFeaturesE | wc -l
    8857

So probably around 8800 WebIDL methods + getters + setters. Each one of those grew, on average, by 140 bytes.

Here's AttrBinding::get_specified, which grew by 126 bytes:


Before, 0x10ff27d - 0x10ff250 = 0x2D = 45 bytes:

>                      __ZN7mozilla3dom11AttrBindingL13get_specifiedEP9JSContextN2JS6HandleIP8JSObjectEEPNS0_4AttrE19JSJitGetterCallArgs:        // mozilla::dom::AttrBinding::get_specified(JSContext*, JS::Handle<JSObject*>, mozilla::dom::Attr*, JSJitGetterCallArgs)
> 00000000010ff250         push       rbp                                         ; Begin of unwind block (FDE at 0x4f0e6d4)
> 00000000010ff251         mov        rbp, rsp
> 00000000010ff254         push       rbx
> 00000000010ff255         push       rax
> 00000000010ff256         mov        rbx, rcx
> 00000000010ff259         mov        rdi, rdx
> 00000000010ff25c         call       __ZNK7mozilla3dom4Attr9SpecifiedEv          ; mozilla::dom::Attr::Specified() const
> 00000000010ff261         movzx      eax, al
> 00000000010ff264         movabs     rcx, 0xfff9000000000000
> 00000000010ff26e         or         rcx, rax
> 00000000010ff271         mov        qword [rbx], rcx
> 00000000010ff274         mov        al, 0x1
> 00000000010ff276         add        rsp, 0x8
> 00000000010ff27a         pop        rbx
> 00000000010ff27b         pop        rbp
> 00000000010ff27c         ret
> 00000000010ff27d

After, 0x11043bb - 0x1104310 = 0xAB = 171 bytes (increase of 126 bytes):

>                      __ZN7mozilla3dom11AttrBindingL13get_specifiedEP9JSContextN2JS6HandleIP8JSObjectEEPNS0_4AttrE19JSJitGetterCallArgs:        // mozilla::dom::AttrBinding::get_specified(JSContext*, JS::Handle<JSObject*>, mozilla::dom::Attr*, JSJitGetterCallArgs)
> 0000000001104310         push       rbp                                         ; Begin of unwind block (FDE at 0x5070f3c)
> 0000000001104311         mov        rbp, rsp
> 0000000001104314         push       rbx
> 0000000001104315         push       rax
> 0000000001104316         mov        rbx, rcx
> 0000000001104319         lea        rax, qword [__ZN7mozilla8profiler6detail12RacyFeatures18sActiveAndFeaturesE] ; __ZN7mozilla8profiler6detail12RacyFeatures18sActiveAndFeaturesE
> 0000000001104320         mov        eax, dword [rax]
> 0000000001104322         test       eax, eax
> 0000000001104324         js         loc_1104330
>
> 0000000001104326         mov        qword [rbp+var_10], 0x0
> 000000000110432e         jmp        loc_1104388
> 
>                      loc_1104330:
> 0000000001104330         mov        rax, qword [rdi+0x78]                       ; CODE XREF=__ZN7mozilla3dom11AttrBindingL13get_specifiedEP9JSContextN2JS6HandleIP8JSObjectEEPNS0_4AttrE19JSJitGetterCallArgs+20
> 0000000001104334         mov        qword [rbp+var_10], rax
> 0000000001104338         test       rax, rax
> 000000000110433b         je         loc_1104388
> 
> 000000000110433d         mov        ecx, dword [rax+0x8000]
> 0000000001104343         cmp        ecx, 0x3ff
> 0000000001104349         ja         loc_1104382
> 
> 000000000110434b         mov        ecx, dword [rax+0x8000]
> 0000000001104351         shl        rcx, 0x5
> 0000000001104355         lea        rsi, qword [aGetAttrspecifi]                ; "get Attr.specified"
> 000000000110435c         mov        qword [rax+rcx], rsi
> 0000000001104360         mov        qword [rax+rcx+8], 0x0
> 0000000001104369         lea        rsi, qword [rbp+var_10]
> 000000000110436d         mov        qword [rax+rcx+0x10], rsi
> 0000000001104372         mov        dword [rax+rcx+0x18], 0xfe
> 000000000110437a         mov        dword [rax+rcx+0x1c], 0x10
> 
>                      loc_1104382:
> 0000000001104382         inc        dword [rax+0x8000]                          ; CODE XREF=__ZN7mozilla3dom11AttrBindingL13get_specifiedEP9JSContextN2JS6HandleIP8JSObjectEEPNS0_4AttrE19JSJitGetterCallArgs+57
> 
>                      loc_1104388:
> 0000000001104388         mov        rdi, rdx                                    ; CODE XREF=__ZN7mozilla3dom11AttrBindingL13get_specifiedEP9JSContextN2JS6HandleIP8JSObjectEEPNS0_4AttrE19JSJitGetterCallArgs+30, __ZN7mozilla3dom11AttrBindingL13get_specifiedEP9JSContextN2JS6HandleIP8JSObjectEEPNS0_4AttrE19JSJitGetterCallArgs+43
> 000000000110438b         call       __ZNK7mozilla3dom4Attr9SpecifiedEv          ; mozilla::dom::Attr::Specified() const
> 0000000001104390         movzx      eax, al
> 0000000001104393         movabs     rcx, 0xfff9000000000000
> 000000000110439d         or         rcx, rax
> 00000000011043a0         mov        qword [rbx], rcx
> 00000000011043a3         mov        rax, qword [rbp+var_10]
> 00000000011043a7         test       rax, rax
> 00000000011043aa         je         loc_11043b2
> 
> 00000000011043ac         dec        dword [rax+0x8000]
> 
>                      loc_11043b2:
> 00000000011043b2         mov        al, 0x1                                     ; CODE XREF=__ZN7mozilla3dom11AttrBindingL13get_specifiedEP9JSContextN2JS6HandleIP8JSObjectEEPNS0_4AttrE19JSJitGetterCallArgs+154
> 00000000011043b4         add        rsp, 0x8
> 00000000011043b8         pop        rbx
> 00000000011043b9         pop        rbp
> 00000000011043ba         ret
> 00000000011043bb


The breakdown for this function is as follows:

 - Check if the profiler is active: +13 bytes
 - Remember if the profiler is inactive so that the stack pointer decrement at the end of the function can be skipped: +10 bytes
 - Check whether the JSContext has a PseudoStack: +13 bytes
 - Check whether there's space available on the PseudoStack: +14 bytes
 - Putting data into the ProfileEntry: +55 bytes
 - Increment the PseudoStack's stack pointer: +6 bytes
 - Conditionally decrement the PseudoStack's stack pointer: +15 bytes

13 + 10 + 13 + 14 + 55 + 6 + 15 = 126
Bug 785440 also caused these Talos regressions:

== Change summary for alert #11542 (as of Mon, 12 Feb 2018 16:47:24 GMT) ==

Regressions:

  2%  tp5n main_startup_fileio windows7-32 pgo e10s stylo     71,153,840.83 -> 72,604,103.42
  2%  tp5n main_startup_fileio windows7-32 opt e10s stylo     68,777,990.83 -> 70,166,662.17

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=11542
I think I can get us 290KB of the 1.4MB back: 90KB from code size reductions by combining two duplicated loads (stackPointer and mPseudoStack), and 200KB from string reductions, by concatenating strings at runtime instead of at compile time.

These savings are measured on the uncompressed XUL library on Mac. I don't know how well they translate into installer size savings; the try pushes suggest that some of the savings even lead to installer size increases.

Baseline: https://treeherder.mozilla.org/#/jobs?repo=try&revision=0424a75d218a4592c23d377d727b9f2ca9fd04f6
 + don't duplicate stackPointer load: https://treeherder.mozilla.org/#/jobs?repo=try&revision=abbdb85b7ce143eb1b2358d6d33fa27e85d9ff22
 + save unnecessary mPseudoStack load: https://treeherder.mozilla.org/#/jobs?repo=try&revision=6555d91b54cd89b6802e45464082543683f8a94a
 + don't concatenate strings at compile time: https://treeherder.mozilla.org/#/jobs?repo=try&revision=b9f327e3313f84c8f9c47b880b87ee9e4491bcfb (needs more code changes to give the same result in the profiler)
:mstange Please let us know when you're making a fix for this, even if you're using a separate bug.
Flags: needinfo?(mstange)
(In reply to Ionuț Goldan [:igoldan], Performance Sheriffing from comment #7)
> :mstange Please let us know when you're making a fix for this, even if
> you're using a separate bug.

I should have said "when you're pushing the fix".
Will this make 60 cutoff?
Don't think so. We can back out the last patch in bug 785440 from Beta once we've merged; that will undo the regression.
Flags: needinfo?(mstange)
Depends on: 1443830
[Tracking Requested - why for this release]: Make sure it doesn't get lost.
So we have a few options here in addition to just decreasing the code size:

1)  We could enable this only in nightly-or-early-beta.
2)  We could enable this only for certain interfaces, based on whitelists or blacklists or something.

It would be good to understand the use cases we have here so we can figure out whether any of that makes sense....
Flags: needinfo?(mstange)
Flags: needinfo?(ehsan)
(In reply to Boris Zbarsky [:bz] (no decent commit message means r-) from comment #12)
> So we have a few options here in addition to just decreasing the code size:
> 
> 1)  We could enable this only in nightly-or-early-beta.

I'm happy to #ifdef NIGHTLY_BUILD this until this affects a web developer facing feature. More specifically, I think we can make a better case for shipping this in release builds once perf.html has a proper web developer centric view, and once the integration of perf.html into the Firefox devtools is further along.

> 2)  We could enable this only for certain interfaces, based on whitelists or
> blacklists or something.

This is worth revisiting once we've landed some improvements to the code size.

I've had some more ideas for how to reduce code size since my last update to this bug. Ordered from "obvious" to "desperate":
 - Fold the "is profiler running" check into the "is the PseudoStack on the JSContext non-null" check by having a new field that points to the PseudoStack only if the profiler is running and is null otherwise
 - Don't write to the line number field, it's useless. (Would need to make sure we don't access that field for WebIDL PseudoStack frames)
 - Handle PseudoStack stack size overflow using a guard page instead of manual checks
 - Have a side-table that lists all WebIDL APIs so that we only need to push an index, instead of two strings and a category
Flags: needinfo?(mstange)
What advantage would this add to Nightly given that Nightly builds already have --enable-profiling? Is it just to make sure the code doesn't bitrot?
It does not require symbolication, it's present when stackwalking fails (e.g. through JIT code), and it can be treated specially by perf.html without having to make perf.html pattern-match C++ function names.
Boris, I'm tracking 1443830 for 60.
[Tracking Requested - why for this release]: see comment 11
Comment 13 sounds great to me.
Flags: needinfo?(ehsan)
Hi Markus, any recent updates on this bug? :)
Flags: needinfo?(mstange)
Priority: -- → P2
No update yet. I have some patches locally but they all depend on a bigger reworking that I haven't put up for review yet.

We should do the same partial backout that we did for 60, for 61.
Flags: needinfo?(mstange)
Markus, do you plan to do a similar backout for 62? We will be building beta 1 in a couple of weeks.
Flags: needinfo?(mstange)
Flags: needinfo?(mstange)
Is this still affecting 62? Do you plan to do a backout on beta?  I may have already asked on irc but if so, I lost that thread. Does this still affect 63?  Thanks!
Flags: needinfo?(mstange)
Emailing Markus to follow up.
Sorry, yes, this still affects 62, and we need another backout. Hopefully for the last time.

I was hoping to get a proper fix in time but that didn't work out. Sorry I didn't respond sooner.
Flags: needinfo?(mstange)
Markus, any luck here? We still have 3 weeks till the first merge/RC builds. But I'd like to get this change into 62 as soon as possible.
Flags: needinfo?(mstange)
Ryan, can you do the same partial backout again on 62 that you did the previous times?
Flags: needinfo?(mstange) → needinfo?(ryanvm)
The backout patch doesn't apply cleanly. It'll need rebasing.
Flags: needinfo?(ryanvm) → needinfo?(mstange)
Here's the rebased patch.
Flags: needinfo?(mstange) → needinfo?(ryanvm)
(In reply to Markus Stange [:mstange] from comment #25)
> Sorry, yes, this still affects 62, and we need another backout. Hopefully
> for the last time.

Markus, does it still affect 63? Thanks
Flags: needinfo?(mstange)
Marking as wontfix for 63 as we have our last beta today.
Depends on: 1499507
(In reply to Ionuț Goldan [:igoldan], Performance Sheriffing from comment #7)
> :mstange Please let us know when you're making a fix for this, even if
> you're using a separate bug.

Bug 1499507 just landed which should help a bit. Can you check whether it made a difference?
Flags: needinfo?(mstange) → needinfo?(igoldan)
(In reply to Markus Stange [:mstange] from comment #32)
> (In reply to Ionuț Goldan [:igoldan], Performance Sheriffing from comment #7)
> > :mstange Please let us know when you're making a fix for this, even if
> > you're using a separate bug.
> 
> Bug 1499507 just landed which should help a bit. Can you check whether it
> made a difference?

Yes, it did made a difference. I'd say it fixed Windows 10 and OS X, Windows 7 partially, Linux 64 by a very small amount.
Flags: needinfo?(igoldan)
(In reply to Ionuț Goldan [:igoldan], Performance Sheriffing from comment #33)
> (In reply to Markus Stange [:mstange] from comment #32)
> > (In reply to Ionuț Goldan [:igoldan], Performance Sheriffing from comment #7)
> > > :mstange Please let us know when you're making a fix for this, even if
> > > you're using a separate bug.
> > 
> > Bug 1499507 just landed which should help a bit. Can you check whether it
> > made a difference?
> 
> Yes, it did made a difference. I'd say it fixed Windows 10 and OS X, Windows
> 7 partially, Linux 64 by a very small amount.

That's probably good enough.
Thanks for checking! I'll mark this bug as resolved then. I don't think it's worth uplifting these patches.
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: