Closed Bug 1437772 Opened 7 years ago Closed 6 years ago

0.35 - 0.58% installer size (linux32, linux64, osx-cross, windows2012-32, windows2012-64) regression on push fbd6ca22f8417eb5833e2f7c585f69252de10216 (Mon Feb 12 2018)

Categories

(Core :: Gecko Profiler, defect, P2)

defect

Tracking

()

RESOLVED FIXED
Tracking Status
firefox-esr52 --- unaffected
firefox-esr60 --- unaffected
firefox58 --- unaffected
firefox59 --- unaffected
firefox60 - fixed
firefox61 - fixed
firefox62 - fixed
firefox63 + wontfix
firefox64 --- wontfix
firefox65 --- fixed

People

(Reporter: igoldan, Assigned: mstange)

References

Details

(Keywords: regression)

Attachments

(1 file)

We have detected a build metrics regression from push: https://hg.mozilla.org/integration/autoland/pushloghtml?changeset=fbd6ca22f8417eb5833e2f7c585f69252de10216 As author of one of the patches included in that push, we need your help to address this regression. Regressions: 1% installer size windows2012-32 pgo 57,656,558.00 -> 57,990,707.75 0% installer size linux32 pgo 64,197,304.08 -> 64,464,600.00 0% installer size linux64 pgo 63,493,886.67 -> 63,732,961.75 0% installer size osx-cross opt 67,332,418.21 -> 67,579,900.58 0% installer size windows2012-64 pgo 61,738,288.25 -> 61,955,546.25 You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=11517 On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the jobs in a pushlog format. To learn more about the regressing test(s), please see: https://developer.mozilla.org/en-US/docs/Mozilla/Performance/Automated_Performance_Testing_and_Sheriffing/Build_Metrics
Component: Untriaged → Gecko Profiler
Product: Firefox → Core
:mstange Changes made in bug 785440 caused >200KB increase in installer size. Please take a look over this issue, to try and reduce this size.
Flags: needinfo?(mstange)
Bug 785440 adds AUTO_PROFILER_LABEL_FAST to lots of different places. The implementation of that has tons of inlining to avoid problems with the profiler perturbing performance too much... But the upshot is that we end up with a fair amount of code added. :(
Here's a size comparison of the "XUL" library from the Mac builds from before and after this change: $ size -m before/XUL $ size -m after/XUL Segment __TEXT: 89382912 Segment __TEXT: 90836992 +1454080 Section __text: 73350208 Section __text: 74587536 +1237328 Section __stubs: 11892 Section __stubs: 11892 +0 Section __stub_helper: 19786 Section __stub_helper: 19786 +0 Section __const: 3824954 Section __const: 3824954 +0 Section __cstring: 2472348 Section __cstring: 2685148 +212800 Section __ustring: 78902 Section __ustring: 78902 +0 Section __gcc_except_tab: 106828 Section __gcc_except_tab: 106828 +0 Section __objc_methname: 33822 Section __objc_methname: 33822 +0 Section __objc_classname: 1660 Section __objc_classname: 1660 +0 Section __objc_methtype: 36027 Section __objc_methtype: 36027 +0 Section __eh_frame: 9439752 Section __eh_frame: 9441752 +2000 total 89376179 total 90828307 +1452128 Segment __DATA: 5234688 Segment __DATA: 5234688 +0 [...] [...] total 5230984 total 5230984 +0 Segment __LINKEDIT: 63348736 Segment __LINKEDIT: 63365120 +16384 total 157966336 total 159436800 +1470464 Of the total change of 1.4MB, 1.2MB are due to extra generated code and around 200KB due to extra strings. I know what to do about the strings but I don't know what to do about the code. I'm currently comparing the assembly of a few binding methods to get a breakdown of what parts are costing how much. I'm also trying to get a count of the number of methods we instrumented; I think it's somewhere between 7000 and 10000.
Assignee: nobody → mstange
Status: NEW → ASSIGNED
Flags: needinfo?(mstange)
$ otool -tV after/XUL | grep __ZN7mozilla8profiler6detail12RacyFeatures18sActiveAndFeaturesE | wc -l 8857 So probably around 8800 WebIDL methods + getters + setters. Each one of those grew, on average, by 140 bytes. Here's AttrBinding::get_specified, which grew by 126 bytes: Before, 0x10ff27d - 0x10ff250 = 0x2D = 45 bytes: > __ZN7mozilla3dom11AttrBindingL13get_specifiedEP9JSContextN2JS6HandleIP8JSObjectEEPNS0_4AttrE19JSJitGetterCallArgs: // mozilla::dom::AttrBinding::get_specified(JSContext*, JS::Handle<JSObject*>, mozilla::dom::Attr*, JSJitGetterCallArgs) > 00000000010ff250 push rbp ; Begin of unwind block (FDE at 0x4f0e6d4) > 00000000010ff251 mov rbp, rsp > 00000000010ff254 push rbx > 00000000010ff255 push rax > 00000000010ff256 mov rbx, rcx > 00000000010ff259 mov rdi, rdx > 00000000010ff25c call __ZNK7mozilla3dom4Attr9SpecifiedEv ; mozilla::dom::Attr::Specified() const > 00000000010ff261 movzx eax, al > 00000000010ff264 movabs rcx, 0xfff9000000000000 > 00000000010ff26e or rcx, rax > 00000000010ff271 mov qword [rbx], rcx > 00000000010ff274 mov al, 0x1 > 00000000010ff276 add rsp, 0x8 > 00000000010ff27a pop rbx > 00000000010ff27b pop rbp > 00000000010ff27c ret > 00000000010ff27d After, 0x11043bb - 0x1104310 = 0xAB = 171 bytes (increase of 126 bytes): > __ZN7mozilla3dom11AttrBindingL13get_specifiedEP9JSContextN2JS6HandleIP8JSObjectEEPNS0_4AttrE19JSJitGetterCallArgs: // mozilla::dom::AttrBinding::get_specified(JSContext*, JS::Handle<JSObject*>, mozilla::dom::Attr*, JSJitGetterCallArgs) > 0000000001104310 push rbp ; Begin of unwind block (FDE at 0x5070f3c) > 0000000001104311 mov rbp, rsp > 0000000001104314 push rbx > 0000000001104315 push rax > 0000000001104316 mov rbx, rcx > 0000000001104319 lea rax, qword [__ZN7mozilla8profiler6detail12RacyFeatures18sActiveAndFeaturesE] ; __ZN7mozilla8profiler6detail12RacyFeatures18sActiveAndFeaturesE > 0000000001104320 mov eax, dword [rax] > 0000000001104322 test eax, eax > 0000000001104324 js loc_1104330 > > 0000000001104326 mov qword [rbp+var_10], 0x0 > 000000000110432e jmp loc_1104388 > > loc_1104330: > 0000000001104330 mov rax, qword [rdi+0x78] ; CODE XREF=__ZN7mozilla3dom11AttrBindingL13get_specifiedEP9JSContextN2JS6HandleIP8JSObjectEEPNS0_4AttrE19JSJitGetterCallArgs+20 > 0000000001104334 mov qword [rbp+var_10], rax > 0000000001104338 test rax, rax > 000000000110433b je loc_1104388 > > 000000000110433d mov ecx, dword [rax+0x8000] > 0000000001104343 cmp ecx, 0x3ff > 0000000001104349 ja loc_1104382 > > 000000000110434b mov ecx, dword [rax+0x8000] > 0000000001104351 shl rcx, 0x5 > 0000000001104355 lea rsi, qword [aGetAttrspecifi] ; "get Attr.specified" > 000000000110435c mov qword [rax+rcx], rsi > 0000000001104360 mov qword [rax+rcx+8], 0x0 > 0000000001104369 lea rsi, qword [rbp+var_10] > 000000000110436d mov qword [rax+rcx+0x10], rsi > 0000000001104372 mov dword [rax+rcx+0x18], 0xfe > 000000000110437a mov dword [rax+rcx+0x1c], 0x10 > > loc_1104382: > 0000000001104382 inc dword [rax+0x8000] ; CODE XREF=__ZN7mozilla3dom11AttrBindingL13get_specifiedEP9JSContextN2JS6HandleIP8JSObjectEEPNS0_4AttrE19JSJitGetterCallArgs+57 > > loc_1104388: > 0000000001104388 mov rdi, rdx ; CODE XREF=__ZN7mozilla3dom11AttrBindingL13get_specifiedEP9JSContextN2JS6HandleIP8JSObjectEEPNS0_4AttrE19JSJitGetterCallArgs+30, __ZN7mozilla3dom11AttrBindingL13get_specifiedEP9JSContextN2JS6HandleIP8JSObjectEEPNS0_4AttrE19JSJitGetterCallArgs+43 > 000000000110438b call __ZNK7mozilla3dom4Attr9SpecifiedEv ; mozilla::dom::Attr::Specified() const > 0000000001104390 movzx eax, al > 0000000001104393 movabs rcx, 0xfff9000000000000 > 000000000110439d or rcx, rax > 00000000011043a0 mov qword [rbx], rcx > 00000000011043a3 mov rax, qword [rbp+var_10] > 00000000011043a7 test rax, rax > 00000000011043aa je loc_11043b2 > > 00000000011043ac dec dword [rax+0x8000] > > loc_11043b2: > 00000000011043b2 mov al, 0x1 ; CODE XREF=__ZN7mozilla3dom11AttrBindingL13get_specifiedEP9JSContextN2JS6HandleIP8JSObjectEEPNS0_4AttrE19JSJitGetterCallArgs+154 > 00000000011043b4 add rsp, 0x8 > 00000000011043b8 pop rbx > 00000000011043b9 pop rbp > 00000000011043ba ret > 00000000011043bb The breakdown for this function is as follows: - Check if the profiler is active: +13 bytes - Remember if the profiler is inactive so that the stack pointer decrement at the end of the function can be skipped: +10 bytes - Check whether the JSContext has a PseudoStack: +13 bytes - Check whether there's space available on the PseudoStack: +14 bytes - Putting data into the ProfileEntry: +55 bytes - Increment the PseudoStack's stack pointer: +6 bytes - Conditionally decrement the PseudoStack's stack pointer: +15 bytes 13 + 10 + 13 + 14 + 55 + 6 + 15 = 126
Bug 785440 also caused these Talos regressions: == Change summary for alert #11542 (as of Mon, 12 Feb 2018 16:47:24 GMT) == Regressions: 2% tp5n main_startup_fileio windows7-32 pgo e10s stylo 71,153,840.83 -> 72,604,103.42 2% tp5n main_startup_fileio windows7-32 opt e10s stylo 68,777,990.83 -> 70,166,662.17 For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=11542
I think I can get us 290KB of the 1.4MB back: 90KB from code size reductions by combining two duplicated loads (stackPointer and mPseudoStack), and 200KB from string reductions, by concatenating strings at runtime instead of at compile time. These savings are measured on the uncompressed XUL library on Mac. I don't know how well they translate into installer size savings; the try pushes suggest that some of the savings even lead to installer size increases. Baseline: https://treeherder.mozilla.org/#/jobs?repo=try&revision=0424a75d218a4592c23d377d727b9f2ca9fd04f6 + don't duplicate stackPointer load: https://treeherder.mozilla.org/#/jobs?repo=try&revision=abbdb85b7ce143eb1b2358d6d33fa27e85d9ff22 + save unnecessary mPseudoStack load: https://treeherder.mozilla.org/#/jobs?repo=try&revision=6555d91b54cd89b6802e45464082543683f8a94a + don't concatenate strings at compile time: https://treeherder.mozilla.org/#/jobs?repo=try&revision=b9f327e3313f84c8f9c47b880b87ee9e4491bcfb (needs more code changes to give the same result in the profiler)
:mstange Please let us know when you're making a fix for this, even if you're using a separate bug.
Flags: needinfo?(mstange)
(In reply to Ionuț Goldan [:igoldan], Performance Sheriffing from comment #7) > :mstange Please let us know when you're making a fix for this, even if > you're using a separate bug. I should have said "when you're pushing the fix".
Will this make 60 cutoff?
Don't think so. We can back out the last patch in bug 785440 from Beta once we've merged; that will undo the regression.
Flags: needinfo?(mstange)
Depends on: 1443830
[Tracking Requested - why for this release]: Make sure it doesn't get lost.
So we have a few options here in addition to just decreasing the code size: 1) We could enable this only in nightly-or-early-beta. 2) We could enable this only for certain interfaces, based on whitelists or blacklists or something. It would be good to understand the use cases we have here so we can figure out whether any of that makes sense....
Flags: needinfo?(mstange)
Flags: needinfo?(ehsan)
(In reply to Boris Zbarsky [:bz] (no decent commit message means r-) from comment #12) > So we have a few options here in addition to just decreasing the code size: > > 1) We could enable this only in nightly-or-early-beta. I'm happy to #ifdef NIGHTLY_BUILD this until this affects a web developer facing feature. More specifically, I think we can make a better case for shipping this in release builds once perf.html has a proper web developer centric view, and once the integration of perf.html into the Firefox devtools is further along. > 2) We could enable this only for certain interfaces, based on whitelists or > blacklists or something. This is worth revisiting once we've landed some improvements to the code size. I've had some more ideas for how to reduce code size since my last update to this bug. Ordered from "obvious" to "desperate": - Fold the "is profiler running" check into the "is the PseudoStack on the JSContext non-null" check by having a new field that points to the PseudoStack only if the profiler is running and is null otherwise - Don't write to the line number field, it's useless. (Would need to make sure we don't access that field for WebIDL PseudoStack frames) - Handle PseudoStack stack size overflow using a guard page instead of manual checks - Have a side-table that lists all WebIDL APIs so that we only need to push an index, instead of two strings and a category
Flags: needinfo?(mstange)
What advantage would this add to Nightly given that Nightly builds already have --enable-profiling? Is it just to make sure the code doesn't bitrot?
It does not require symbolication, it's present when stackwalking fails (e.g. through JIT code), and it can be treated specially by perf.html without having to make perf.html pattern-match C++ function names.
Boris, I'm tracking 1443830 for 60.
[Tracking Requested - why for this release]: see comment 11
Comment 13 sounds great to me.
Flags: needinfo?(ehsan)
Hi Markus, any recent updates on this bug? :)
Flags: needinfo?(mstange)
Priority: -- → P2
No update yet. I have some patches locally but they all depend on a bigger reworking that I haven't put up for review yet. We should do the same partial backout that we did for 60, for 61.
Flags: needinfo?(mstange)
Markus, do you plan to do a similar backout for 62? We will be building beta 1 in a couple of weeks.
Flags: needinfo?(mstange)
Flags: needinfo?(mstange)
Is this still affecting 62? Do you plan to do a backout on beta? I may have already asked on irc but if so, I lost that thread. Does this still affect 63? Thanks!
Flags: needinfo?(mstange)
Emailing Markus to follow up.
Sorry, yes, this still affects 62, and we need another backout. Hopefully for the last time. I was hoping to get a proper fix in time but that didn't work out. Sorry I didn't respond sooner.
Flags: needinfo?(mstange)
Markus, any luck here? We still have 3 weeks till the first merge/RC builds. But I'd like to get this change into 62 as soon as possible.
Flags: needinfo?(mstange)
Ryan, can you do the same partial backout again on 62 that you did the previous times?
Flags: needinfo?(mstange) → needinfo?(ryanvm)
The backout patch doesn't apply cleanly. It'll need rebasing.
Flags: needinfo?(ryanvm) → needinfo?(mstange)
Here's the rebased patch.
Flags: needinfo?(mstange) → needinfo?(ryanvm)
(In reply to Markus Stange [:mstange] from comment #25) > Sorry, yes, this still affects 62, and we need another backout. Hopefully > for the last time. Markus, does it still affect 63? Thanks
Flags: needinfo?(mstange)
Marking as wontfix for 63 as we have our last beta today.
Depends on: 1499507
(In reply to Ionuț Goldan [:igoldan], Performance Sheriffing from comment #7) > :mstange Please let us know when you're making a fix for this, even if > you're using a separate bug. Bug 1499507 just landed which should help a bit. Can you check whether it made a difference?
Flags: needinfo?(mstange) → needinfo?(igoldan)
(In reply to Markus Stange [:mstange] from comment #32) > (In reply to Ionuț Goldan [:igoldan], Performance Sheriffing from comment #7) > > :mstange Please let us know when you're making a fix for this, even if > > you're using a separate bug. > > Bug 1499507 just landed which should help a bit. Can you check whether it > made a difference? Yes, it did made a difference. I'd say it fixed Windows 10 and OS X, Windows 7 partially, Linux 64 by a very small amount.
Flags: needinfo?(igoldan)
(In reply to Ionuț Goldan [:igoldan], Performance Sheriffing from comment #33) > (In reply to Markus Stange [:mstange] from comment #32) > > (In reply to Ionuț Goldan [:igoldan], Performance Sheriffing from comment #7) > > > :mstange Please let us know when you're making a fix for this, even if > > > you're using a separate bug. > > > > Bug 1499507 just landed which should help a bit. Can you check whether it > > made a difference? > > Yes, it did made a difference. I'd say it fixed Windows 10 and OS X, Windows > 7 partially, Linux 64 by a very small amount. That's probably good enough.
Thanks for checking! I'll mark this bug as resolved then. I don't think it's worth uplifting these patches.
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: