Closed
Bug 1445147
Opened 7 years ago
Closed 7 years ago
2.01 - 2.59% installer size (windows2012-32, windows2012-64) regression on push a7ab282e1d4a1aa1726017a05e04102c7adc9e33 (Tue Mar 13 2018)
Categories
(Firefox Build System :: General, defect, P2)
Tracking
(firefox-esr52 unaffected, firefox-esr60 unaffected, firefox59 unaffected, firefox60 unaffected, firefox61+ fixed)
RESOLVED
FIXED
mozilla61
Tracking | Status | |
---|---|---|
firefox-esr52 | --- | unaffected |
firefox-esr60 | --- | unaffected |
firefox59 | --- | unaffected |
firefox60 | --- | unaffected |
firefox61 | + | fixed |
People
(Reporter: igoldan, Assigned: froydnj)
References
Details
(Keywords: regression)
Attachments
(1 file)
2.10 MB,
text/plain
|
Details |
We have detected a build metrics regression from push: https://hg.mozilla.org/integration/autoland/pushloghtml?changeset=a7ab282e1d4a1aa1726017a05e04102c7adc9e33 As author of one of the patches included in that push, we need your help to address this regression. Regressions: 3% installer size windows2012-64 pgo 61,624,094.50 -> 63,217,362.75 2% installer size windows2012-32 pgo 57,674,681.25 -> 58,833,725.25 You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=12088 On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the jobs in a pushlog format. To learn more about the regressing test(s), please see: https://developer.mozilla.org/en-US/docs/Mozilla/Performance/Automated_Performance_Testing_and_Sheriffing/Build_Metrics
Reporter | ||
Updated•7 years ago
|
Component: Untriaged → General
Product: Firefox → Firefox Build System
Version: 56 Branch → Trunk
Reporter | ||
Comment 1•7 years ago
|
||
:RyanVM Bug 1424281 increased the installer size by a big amount. Can we reduce them to where they were previously?
Flags: needinfo?(ryanvm)
I'll run this through the various Chromium code size tools today: https://www.chromium.org/developers/windows-binary-sizes
I spot-checked a few of the most-grown functions and we seem to have more aggressive inlining (as expected). This is particularly bad in OnMessageReceived generated code where we have a lot of repetition: https://searchfox.org/mozilla-central/source/__GENERATED__/ipc/ipdl/PFilePickerParent.cpp#112 VS 15.6 inlined all of those helpers where the previous version didn't. Multiply that by all the IPDL classes and that's a lot. I recommend sticking a MOZ_NEVER_INLINE on anything that sounds like an error path: FatalError, SentinelReadError, etc. Perhaps we should also mark these conditions MOZ_UNLIKELY for the sake of gcc/clang but MSVC won't benefit. Given the sheer number of these things we might also want to MOZ_NEVER_INLINE the non-error functions: ReadSentinel, IPDLParamTraits<P>::Read (which is the guts of ReadIPDLParam).
Something interesting to consider -- when inlining functions, in addition to the obvious cost of copy-pasted code, there is another, more subtle, cost: more inlining means more variables live in a given function's stack frame. When the code needs to access variables more than 0xFF bytes away from the base pointer, a four-byte instruction turns into a seven-byte instruction: xul!ShortestPaths+0x106 [z:\build\build\src\js\src\builtin\testingfunctions.cpp @ 3610]: 3610 00000001`830cde32 48895518 mov qword ptr [rbp+18h],rdx 3610 00000001`830cde36 488b542440 mov rdx,qword ptr [rsp+40h] 3610 00000001`830cde3b 4883c220 add rdx,20h 3610 00000001`830cde3f 48895508 mov qword ptr [rbp+8],rdx 3610 00000001`830cde43 488b0a mov rcx,qword ptr [rdx] 3610 00000001`830cde46 48894d10 mov qword ptr [rbp+10h],rcx 3610 00000001`830cde4a 488d4d08 lea rcx,[rbp+8] 3610 00000001`830cde4e 48890a mov qword ptr [rdx],rcx 3611 00000001`830cde51 4c8b4518 mov r8,qword ptr [rbp+18h] 3611 00000001`830cde55 498b4818 mov rcx,qword ptr [r8+18h] 3611 00000001`830cde59 448b71f4 mov r14d,dword ptr [rcx-0Ch] xul!ShortestPaths+0x11d [z:\build\build\src\js\src\builtin\testingfunctions.cpp @ 3610]: 3610 00000001`834aa085 488995c8010000 mov qword ptr [rbp+1C8h],rdx 3610 00000001`834aa08c 488b542448 mov rdx,qword ptr [rsp+48h] 3610 00000001`834aa091 4883c220 add rdx,20h 3610 00000001`834aa095 488995b8010000 mov qword ptr [rbp+1B8h],rdx 3610 00000001`834aa09c 488b0a mov rcx,qword ptr [rdx] 3610 00000001`834aa09f 48898dc0010000 mov qword ptr [rbp+1C0h],rcx 3610 00000001`834aa0a6 488d8db8010000 lea rcx,[rbp+1B8h] 3610 00000001`834aa0ad 48890a mov qword ptr [rdx],rcx 3611 00000001`834aa0b0 4c8b85c8010000 mov r8,qword ptr [rbp+1C8h] 3611 00000001`834aa0b7 498b4818 mov rcx,qword ptr [r8+18h] 3611 00000001`834aa0bb 448b79f4 mov r15d,dword ptr [rcx-0Ch]
Reporter | ||
Comment 5•7 years ago
|
||
Noticed many big perf improvements, but also another regression: == Change summary for alert #12102 (as of Mon, 12 Mar 2018 21:27:50 GMT) == Regressions: 4% tp5n main_startup_fileio windows7-32 pgo e10s stylo 72,419,122.33 -> 75,131,795.50 3% tp5n main_startup_fileio windows7-32 opt e10s stylo 69,906,224.58 -> 72,267,734.92 Improvements: 13% tsvg_static windows7-32 opt e10s stylo 65.24 -> 56.43 9% a11yr windows7-32 pgo e10s stylo 417.36 -> 381.65 8% displaylist_mutate windows7-32 opt e10s stylo9,550.61 -> 8,751.33 7% tp5o_webext responsiveness windows10-64 pgo e10s stylo2.99 -> 2.78 7% stylebench windows7-32 opt e10s stylo 27.07 -> 28.92 7% a11yr windows7-32 opt e10s stylo 698.99 -> 652.84 6% displaylist_mutate windows10-64 opt e10s stylo7,122.12 -> 6,677.56 6% a11yr windows10-64 pgo e10s stylo 393.74 -> 369.54 5% tp5o windows7-32 opt e10s stylo 285.94 -> 271.01 5% stylebench windows7-32 pgo e10s stylo 40.86 -> 42.89 4% speedometer windows7-32 pgo e10s stylo 43.69 -> 45.30 3% speedometer windows7-32 opt e10s stylo 32.99 -> 34.03 3% tp5o_webext windows10-64 pgo e10s stylo 293.19 -> 285.52 2% speedometer windows10-64 pgo e10s stylo 41.48 -> 42.41 For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=12102
Comment 6•7 years ago
|
||
I guess it's plausible that the main_startup_fileio regression is related to the libxul size increase.
Flags: needinfo?(ryanvm)
Updated•7 years ago
|
status-firefox59:
--- → unaffected
status-firefox60:
--- → unaffected
status-firefox61:
--- → affected
status-firefox-esr52:
--- → unaffected
tracking-firefox61:
--- → +
Updated•7 years ago
|
Reporter | ||
Comment 7•7 years ago
|
||
Platform_microbench test also noticed perf regressions: == Change summary for alert #12104 (as of Mon, 12 Mar 2018 22:52:56 GMT) == Regressions: 52% Strings PerfStripCharsWhitespace windows10-64 opt 320,864.17 -> 488,867.75 50% Strings PerfStripCharsWhitespace windows7-32 opt 323,906.00 -> 484,405.58 49% Strings PerfStripCharsCRLF windows10-64 opt 191,063.42 -> 285,057.75 19% Strings PerfStripCRLF windows7-32 opt 86,098.42 -> 102,425.58 19% Strings PerfStripWhitespace windows7-32 opt 73,790.58 -> 87,737.33 9% Strings PerfStripCRLF windows10-64 opt 84,739.04 -> 92,109.50 Improvements: 10% Stylo Gecko_nsCSSParser_ParseSheet_Bench windows7-32 opt 73,499.08 -> 66,322.25 For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=12104
Comment 8•7 years ago
|
||
(In reply to Ionuț Goldan [:igoldan], Performance Sheriffing from comment #5) > Noticed many big perf improvements, but also another regression: Those speedometer wins are pretty compelling. (In reply to Ryan VanderMeulen [:RyanVM] from comment #6) > I guess it's plausible that the main_startup_fileio regression is related to > the libxul size increase. Yes, I agree. Realistically, we're unlikely to downgrade the compiler, especially in the light of these perf wins. Some of the more-aggressive inlining is presumably helping us here, but I'm willing to bet that 90% is dead weight. Given that we're looking at multiple megabytes of size regression here, and given that we routinely spend multiple engineer-days to fix codesize regressions that are twenty times smaller, I think we should definitely invest a fair amount of time to iterate on the before/after comparison and move things out of line. David, are you planning to do this, or should we work with management to find someone?
Flags: needinfo?(dmajor)
I'm testing this change which ought to be the biggest bang-per-buck: https://hg.mozilla.org/try/rev/6cd1b351e18a4493d35e9fc5d7dbc907e3dcf6f3 I don't really want to sign up for more detailed subsequent work if that's not enough.
Flags: needinfo?(dmajor)
Comment 10•7 years ago
|
||
(In reply to Ionuț Goldan [:igoldan], Performance Sheriffing from comment #7) > Platform_microbench test also noticed perf regressions: Just a note that these are GTest-based performance tests, which means we won't get PGO numbers for them (GTests don't work with our Windows PGO builds), so it's kinda hard to say how severe this is :(.
Comment 11•7 years ago
|
||
Ok, ok, one more: https://hg.mozilla.org/try/rev/508d8700c861ac0bbc56186bfb80e87684218761
Comment 12•7 years ago
|
||
Here's my approach in case anyone wants to keep going even further
- Find the second "Increases in Total Size" header of attachment 8958620 [details]
- Look for repeating patterns (I saw ::OnMessageReceived, ::ToObjectInternal, and ...Binding::get...)
- In WinDbg open xul.dll from Ryan's m-c landing and the one before, side by side
- Dump a function: uf xul!mozilla::dom::HttpConnectionElement::ToObjectInternal (it takes ages)
- Scroll right so you can see line numbers like "compositioneventbinding.cpp @ 259"
- Scroll through the before and after builds at the same time, looking for places where the "after" stays on the same line for very many instructions
- Find the corresponding non-inlined `call` instruction in the "before" build.
It would be great if we had a tool that could automate this: for each function, report how many times it was inlined, and roughly how many instructions that cost us. It would probably need compiler cooperation.
Comment 13•7 years ago
|
||
Sounds good. NI David to report back on the results of the work in comment 9 once we have them. Once we have that, we can see what's left and decide how to proceed.
Flags: needinfo?(dmajor)
Comment 14•7 years ago
|
||
My first push got us about 400k: https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-central&originalRevision=80b4777a6421&newProject=try&newRevision=ed4b9b48a0ce&framework=1&filter=startup_fileio Adding the second one brings it to about 470k: https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-central&originalRevision=80b4777a6421&newProject=try&newRevision=62e710f22fdd&framework=1&filter=startup_fileio
Flags: needinfo?(dmajor)
Comment 15•7 years ago
|
||
Ok. Sounds like there's potentially enough further improvements here that we should spend some more time here. Anthony, can you find someone to work on this? Ideally someone would spend at least a couple of days poking at the steps in comment 12 (as well as any other ideas they may come up with). If we hit a point of diminishing returns, we should check with Product to determine how much more time to sink.
Flags: needinfo?(ajones)
Assignee | ||
Comment 16•7 years ago
|
||
Can we report something on this to Microsoft? I cannot believe that the inliner suddenly become stupid enough to inline a lot of those IPC error functions; they should never be hit in PGO profiling, so they ought to be frigidly cold code...
Ryan - the JS engine currently uses -O2. Can you try a build with -O1 -Oi to see if that reduces the amount of inlining?
Flags: needinfo?(ajones) → needinfo?(ryanvm)
Updated•7 years ago
|
Flags: needinfo?(ajones)
Comment 18•7 years ago
|
||
I pushed the below to Try: https://hg.mozilla.org/try/rev/92152d085974cda5137fc4bc15583cab60aee4cc win32 before: 59,167,406 after: 59,185,916 diff: +18,510 (?!) win64 before: 63,530,164 after: 63,454,440 diff: -75,724
Flags: needinfo?(ryanvm)
Comment 20•7 years ago
|
||
(In reply to Ryan VanderMeulen [:RyanVM] from comment #19) > Bobby, any thoughts on what to do next here? My suggestion from comment 15 still holds. Time is dwindling here - this is a large regression, and the backout options aren't good. IMO this is on Anthony's plate, and he needs to do one of the following: (1) Find resources (either on his team or elsewhere) to continue looking at this. (2) Get signoff from product to accept the regression without further investigation. If I were Product, I would at least insist that we pursue the steps from comment 12 for a few more days, and pursue this with Microsoft via comment 16 to see if there might be some easy fix.
Flags: needinfo?(bobbyholley)
Comment 21•7 years ago
|
||
Anthony, we've got roughly two weeks until the first merge of Gecko 61 to Beta. Is there anyone able to look into this?
Flags: needinfo?(ajones)
Updated•7 years ago
|
Flags: needinfo?(ajones)
Updated•7 years ago
|
Priority: -- → P2
Updated•7 years ago
|
Severity: normal → critical
Comment 22•7 years ago
|
||
Can you find someone to look at this?
Assignee | ||
Comment 23•7 years ago
|
||
I am getting set up to poke at this. I think we should at least land most of dmajor's ipc/-related changes, at least the bits on the error paths. I'm not in a position to make a judgement on the js/ changes: https://hg.mozilla.org/try/diff/508d8700c861/js/src/jsapi.cpp
Assignee | ||
Comment 24•7 years ago
|
||
Fun fact: ipc/glue/ProtocolUtils.cpp already declares a bunch of functions as MOZ_NEVER_INLINE, mostly the same functions dmajor made MOZ_NEVER_INLINE in https://hg.mozilla.org/try/rev/6cd1b351e18a4493d35e9fc5d7dbc907e3dcf6f3 . Maybe MS botched their logic for inlining things?
(In reply to Bobby Holley (:bholley) from comment #20) > Time is dwindling here - this is a large regression, and the backout options > aren't good. Why can't we back it out?
Flags: needinfo?(ajones)
Updated•7 years ago
|
Flags: needinfo?(ajones)
Assignee | ||
Comment 26•7 years ago
|
||
(In reply to Anthony Jones (:kentuckyfriedtakahe, :k17e) from comment #25) > (In reply to Bobby Holley (:bholley) from comment #20) > > Time is dwindling here - this is a large regression, and the backout options > > aren't good. > > Why can't we back it out? That's a reasonable question: bug 1424281 comment 0 said that there are "interesting optimizations" for us, but reading through the release notes, it looks like the notable thing might be that the compiler is slightly faster, not that it produces better code?
Flags: needinfo?(ryanvm)
Comment 27•7 years ago
|
||
I'd say the bigger issue at this point is that we've been landing a number of patches over the last month which depend on the newer version of the compiler. Trying to backout all of those changes and/or rework patches sounds like a terrible idea a week before the first beta. And comment 5 in this bug enumerates the performance wins we saw from this update.
Flags: needinfo?(ryanvm)
Assignee | ||
Comment 28•7 years ago
|
||
(In reply to Ryan VanderMeulen [:RyanVM] from comment #27) > I'd say the bigger issue at this point is that we've been landing a number > of patches over the last month which depend on the newer version of the > compiler. Trying to backout all of those changes and/or rework patches > sounds like a terrible idea a week before the first beta. Well, this is a compiler update from 2017 -> 2017, not 2015 -> 2017, right? If this was the latter, yes, that's basically impossible now. But the former shouldn't be too big of a problem, I think? > And comment 5 in this bug enumerates the performance wins we saw from this > update. Those are pretty nice!
Comment 29•7 years ago
|
||
(In reply to Nathan Froyd [:froydnj] from comment #28) > Well, this is a compiler update from 2017 -> 2017, not 2015 -> 2017, right? > If this was the latter, yes, that's basically impossible now. But the > former shouldn't be too big of a problem, I think? Microsoft changed their release model in 2017. The more significant part is that we updated from version 15.4 to 15.6 which brought along a toolset change, which includes new C++ features. See also: https://docs.microsoft.com/en-us/visualstudio/productinfo/vs2017-release-rhythm
Assignee | ||
Comment 30•7 years ago
|
||
It's also worth noting that, for reasons that I do not understand, the installer size for x64 Windows pgo on central is down to ~60.2MB, so it's ~5% smaller than the numbers cited in comment 0 (!). Likewise for x86. I think it's still worth applying some of dmajor's ipc/-related patches, because those will get us closer to the size numbers on release--at least assuming they don't tank Talos performance...
Comment 31•7 years ago
|
||
Wait, I'm confused. Do you mean that they went up in comment 0, and then went down again at some later point? If so, what is that point?
Flags: needinfo?(nfroyd)
Assignee | ||
Comment 32•7 years ago
|
||
(In reply to Bobby Holley (:bholley) from comment #31) > Wait, I'm confused. Do you mean that they went up in comment 0, and then > went down again at some later point? If so, what is that point? There's a big drop from bug 1435230, e.g. bug 1435230 comment 12, when we stopped building SkiaPDF for Windows: https://treeherder.mozilla.org/perf.html#/graphs?timerange=5184000&series=mozilla-inbound,1457369,1,2&zoom=1520917605190.7139,1521216207585.17,60513779.417519145,63614008.60864218 It looks like the installer size has been slowly but steadily trending downwards after that point as well. Bug 1455071 seemed to help (although the installer size doesn't stay permanently lowered), and so did bug 1454592 (likewise on permanent lowering).
Flags: needinfo?(nfroyd)
Comment 33•7 years ago
|
||
(In reply to Nathan Froyd [:froydnj] from comment #24) > Fun fact: ipc/glue/ProtocolUtils.cpp already declares a bunch of functions > as MOZ_NEVER_INLINE, mostly the same functions dmajor made MOZ_NEVER_INLINE > in https://hg.mozilla.org/try/rev/6cd1b351e18a4493d35e9fc5d7dbc907e3dcf6f3 . > Maybe MS botched their logic for inlining things? I assume you mean that ProtocolUtils.**h** already sets MOZ_NEVER_INLINE; I can't find any in the cpp. Offhand I can't find any source to back me up, and I could well be imagining this, but I thought __declspec(noinline) had to go on the function definition rather than the declaration.
Assignee | ||
Comment 34•7 years ago
|
||
(In reply to David Major [:dmajor] from comment #33) > (In reply to Nathan Froyd [:froydnj] from comment #24) > > Fun fact: ipc/glue/ProtocolUtils.cpp already declares a bunch of functions > > as MOZ_NEVER_INLINE, mostly the same functions dmajor made MOZ_NEVER_INLINE > > in https://hg.mozilla.org/try/rev/6cd1b351e18a4493d35e9fc5d7dbc907e3dcf6f3 . > > Maybe MS botched their logic for inlining things? > > I assume you mean that ProtocolUtils.**h** already sets MOZ_NEVER_INLINE; I > can't find any in the cpp. I do indeed. > Offhand I can't find any source to back me up, and I could well be imagining > this, but I thought __declspec(noinline) had to go on the function > definition rather than the declaration. We are of both minds of this in various places throughout the tree. OTOH, I did a test push that tacked on MOZ_NEVER_INLINE in a bunch of places, similar to your original ipc/ patch: https://treeherder.mozilla.org/perf.html#/comparesubtest?originalProject=mozilla-central&originalRevision=0e45c13b34e815cb42a9f08bb44142d1a81e186e&newProject=try&newRevision=5edc0cac0c5ca15f89bd44ef8963281b3e8a53c3&originalSignature=6fe30befd371a869a2ea65b68ce54c22e6f129a6&newSignature=6fe30befd371a869a2ea65b68ce54c22e6f129a6&framework=2 that got a reduction of ~500k in xul.dll size. I then did a test push that tacked on MOZ_NEVER_INLINE *only* on Pickle::ReadSentinel's definition (should have tried it with the declaration, doh) with the patches above, and got: https://treeherder.mozilla.org/perf.html#/comparesubtest?originalProject=mozilla-central&originalRevision=0e45c13b34e815cb42a9f08bb44142d1a81e186e&newProject=try&newRevision=6b3ad2c1fb42e129a9888b064b51931c6fca05d1&originalSignature=6fe30befd371a869a2ea65b68ce54c22e6f129a6&newSignature=6fe30befd371a869a2ea65b68ce54c22e6f129a6&framework=2 that is, a reduction of ~800k in xul.dll size. I don't understand MSVC PGO. :(
Assignee | ||
Comment 35•7 years ago
|
||
That is, redundantly adding MOZ_NEVER_INLINE to the definitions of the IPC error functions seems to *increase* the size of xul.dll by 300k. So it would seem that MOZ_NEVER_INLINE on the declaration is fine...? Or maybe not inlining ReadSentinel also encourages MSVC to not inline the error functions, or something?
Comment 36•7 years ago
|
||
Per discussion with Nathan, we've clawed back the easy wins we're going to get from bug 1456192. Given that our current installer size on 61 is smaller than it was when this bug was first reported, I think this is as good as it's going to get without a lot more engineering time than it's worth. Calling this "fixed" as a result.
Assignee: nobody → nfroyd
Status: NEW → RESOLVED
Closed: 7 years ago
status-firefox-esr60:
--- → unaffected
Flags: needinfo?(ajones)
Resolution: --- → FIXED
Target Milestone: --- → mozilla61
You need to log in
before you can comment on or make changes to this bug.
Description
•