Use -import-instr-limit to mitigate size growth from ThinLTO
Categories
(Firefox Build System :: Toolchains, enhancement)
Tracking
(firefox72 fixed)
Tracking | Status | |
---|---|---|
firefox72 | --- | fixed |
People
(Reporter: away, Assigned: away)
Attachments
(1 file, 1 obsolete file)
When we first enabled ThinLTO on our builds, we got great performance gains, but also large size increases due to aggressive inlining. There is an LLVM option called -import-instr-limit that limits the size of functions that may be imported (the threshold is subject to modification by PGO). Chromium found a good balance between speed and size by using a value of 10. In my initial testing, that value can save us many megabytes from libxul (9MB on Windows/Linux, 17MB on Mac!) without noticeable speed regressions.
It is taking me many attempts to get the spelling right across all our linkers. Currently testing https://hg.mozilla.org/try/rev/0626e2fbdd47cdacfd1735592c0636daf01e104e.
When we first enabled ThinLTO on our builds, we got great performance gains, but also large size increases due to aggressive inlining. There is an LLVM option called -import-instr-limit that limits the size of functions that may be imported (the threshold is subject to modification by PGO). Chromium found a good balance between speed and size by using a value of 10. In initial testing, on Windows and Linux that value can save us many megabytes from libxul without noticeable speed regressions. For Mac, which doesn't yet have PGO, we have to use a higher limit to avoid over-restricting the optimizer, which caused slowdowns on my try pushes.
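The difficulty of spelling the option across linkers can be illustrated with a minimal sketch. The function name and the linker labels below are hypothetical and are not taken from the actual patch. Only the `-plugin-opt=` form is confirmed by the configure log later in this bug; the `-mllvm` forms are assumptions based on the general LLVM pass-through options of lld, ld64, and lld-link.

```python
def import_instr_limit_flag(linker, limit):
    """Return a linker argument that forwards -import-instr-limit to LLVM.

    `linker` is a hypothetical label, not a real build-system value.
    """
    option = f"-import-instr-limit={limit}"
    if linker == "gold-plugin":
        # Spelling that appears in the configure log in this bug.
        return f"-Wl,-plugin-opt={option}"
    if linker in ("lld", "ld64"):
        # Assumption: generic -mllvm pass-through, routed via the compiler driver.
        return f"-Wl,-mllvm,{option}"
    if linker == "lld-link":
        # Assumption: lld-link takes LLVM options in the -mllvm:<option> form.
        return f"-mllvm:{option}"
    raise ValueError(f"unknown linker label: {linker}")


if __name__ == "__main__":
    for name in ("gold-plugin", "lld", "ld64", "lld-link"):
        print(name, import_instr_limit_flag(name, 10))
```

The limit is a parameter because the description above says Mac needs a higher value than the 10 used elsewhere; the exact Mac value is not stated here.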
Comment 3•6 years ago
In Bug 1591725 I've been testing optimization options on Android builds. Setting -import-instr-limit leads to significant reductions in size:
Optimization | AArch | Size, MB (geckoview_example.apk) |
---|---|---|
-Oz | 32 | 44.0 |
-O2,instr-limit=0 | 32 | 49.2 |
-O2,instr-limit=1 | 32 | 49.2 |
-O2,instr-limit=3 | 32 | 49.5 |
-O2,instr-limit=5 | 32 | 50.2 |
-O2,instr-limit=10 | 32 | 50.9 |
-O2 | 32 | 53.7 |
-O3 | 32 | 54.9 |
-Oz | 64 | 50.2 |
-O2,instr-limit=0 | 64 | 55.5 |
-O2,instr-limit=1 | 64 | 55.5 |
-O2,instr-limit=3 | 64 | 55.7 |
-O2,instr-limit=5 | 64 | 56.2 |
-O2,instr-limit=10 | 64 | 57.0 |
-O2 | 64 | 60.9 |
-O3 | 64 | 62.6 |
So far I've only seen performance, relative to -O2, degrade (via Speedometer) with import-instr-limit=1 and lower.
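To put the measurements above in relative terms, here is a small sketch that computes the savings of each import-instr-limit setting against plain -O2. The sizes are copied from the table in this comment; the dictionary layout and function name are only illustrative.

```python
# APK sizes in MB from the table above, keyed by AArch width (32 or 64).
SIZES_MB = {
    32: {"-O2": 53.7, "-O2,instr-limit=0": 49.2, "-O2,instr-limit=1": 49.2,
         "-O2,instr-limit=3": 49.5, "-O2,instr-limit=5": 50.2,
         "-O2,instr-limit=10": 50.9},
    64: {"-O2": 60.9, "-O2,instr-limit=0": 55.5, "-O2,instr-limit=1": 55.5,
         "-O2,instr-limit=3": 55.7, "-O2,instr-limit=5": 56.2,
         "-O2,instr-limit=10": 57.0},
}


def savings_vs_o2(arch):
    """Map each limited config to (MB saved, percent saved) relative to -O2."""
    baseline = SIZES_MB[arch]["-O2"]
    result = {}
    for config, size in SIZES_MB[arch].items():
        if config == "-O2":
            continue
        saved = baseline - size
        result[config] = (round(saved, 1), round(saved / baseline * 100, 1))
    return result


if __name__ == "__main__":
    for arch in (32, 64):
        for config, (mb, pct) in savings_vs_o2(arch).items():
            print(f"AArch{arch} {config}: {mb} MB smaller ({pct}%)")
```

With a limit of 10, this works out to 2.8 MB (5.2%) for the 32-bit build and 3.9 MB (6.4%) for the 64-bit build.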
Comment 5•6 years ago
bugherder
Comment 6•6 years ago
== Change summary for alert #23712 (as of Tue, 05 Nov 2019 09:25:17 GMT) ==
Improvements:
8% raptor-webaudio-firefox windows7-32-shippable opt 205.67 -> 189.92
For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=23712
Comment 7•6 years ago
== Change summary for alert #23686 (as of Mon, 04 Nov 2019 22:48:29 GMT) ==
Improvements:
31% build times android-4-2-x86 opt taskcluster-m5.4xlarge 2,018.30 -> 1,399.07
29% build times android-4-0-armv7-api16 opt taskcluster-c5.4xlarge 1,806.51 -> 1,285.88
28% build times android-5-0-x86_64 opt taskcluster-c5.4xlarge 2,208.62 -> 1,591.75
28% build times linux64-shippable opt nightly taskcluster-c5d.4xlarge 3,827.68 -> 2,765.54
27% build times android-5-0-x86_64 opt taskcluster-m5.4xlarge 2,151.94 -> 1,566.25
26% build times android-5-0-aarch64 opt taskcluster-c5.4xlarge 1,662.64 -> 1,222.31
26% build times linux64-shippable opt nightly taskcluster-m5.4xlarge 4,068.36 -> 2,997.44
26% build times linux32-shippable opt nightly taskcluster-c5d.4xlarge 3,923.70 -> 2,897.06
26% build times linux32-shippable opt nightly taskcluster-m5.4xlarge 4,177.20 -> 3,089.26
25% build times windows2012-aarch64 opt aarch64-no-eme nightly taskcluster-c5.4xlarge 3,281.75 -> 2,454.48
25% build times linux32-shippable opt nightly taskcluster-c5.4xlarge 3,959.06 -> 2,977.81
24% build times linux64-shippable opt nightly taskcluster-c5.4xlarge 3,773.01 -> 2,866.45
23% build times windows2012-aarch64 opt aarch64-no-eme nightly taskcluster-c4.4xlarge 3,965.53 -> 3,036.61
23% build times windows2012-64-shippable opt nightly taskcluster-c5.4xlarge 4,001.90 -> 3,088.03
23% build times windows2012-32-shippable opt nightly taskcluster-c5.4xlarge 3,793.12 -> 2,935.06
22% build times windows2012-64-shippable opt nightly taskcluster-c4.4xlarge 4,783.91 -> 3,750.17
21% build times windows2012-32-shippable opt nightly taskcluster-c4.4xlarge 4,627.16 -> 3,634.70
18% build times windows2012-aarch64 opt aarch64 taskcluster-c4.4xlarge 3,965.57 -> 3,255.33
16% build times android-4-0-armv7-api16 pgo taskcluster-m5.4xlarge 2,624.75 -> 2,207.55
16% build times android-4-0-armv7-api16 pgo taskcluster-c5d.4xlarge 2,330.61 -> 1,967.63
14% build times android-5-0-aarch64 pgo taskcluster-c5d.4xlarge 2,282.39 -> 1,955.86
14% build times android-5-0-aarch64 pgo taskcluster-m5.4xlarge 2,506.19 -> 2,153.65
13% build times android-4-0-armv7-api16 pgo taskcluster-c5.4xlarge 2,403.27 -> 2,087.00
8% build times osx-shippable opt nightly taskcluster-c5d.4xlarge 3,601.65 -> 3,314.12
For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=23686
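The percentages in the alert summary can be recomputed from the before and after values. The following is a rough sketch, assuming each alert line has the shape `N% <name> <before> -> <after>` as in the list above; the regular expression and function name are illustrative and are not part of any Perfherder tooling.

```python
import re

# Assumed line shape: "31% build times ... 2,018.30 -> 1,399.07"
ALERT_LINE = re.compile(r"^(\d+)%\s+(.*?)\s+([\d,.]+) -> ([\d,.]+)$")


def parse_alert_line(line):
    """Return (name, reported_percent, computed_percent) or None if no match."""
    match = ALERT_LINE.match(line.strip())
    if match is None:
        return None
    reported, name, before, after = match.groups()
    before = float(before.replace(",", ""))
    after = float(after.replace(",", ""))
    # Improvement expressed as a share of the old value.
    computed = (before - after) / before * 100
    return name, int(reported), computed


if __name__ == "__main__":
    sample = ("31% build times android-4-2-x86 opt "
              "taskcluster-m5.4xlarge 2,018.30 -> 1,399.07")
    name, reported, computed = parse_alert_line(sample)
    print(f"{name}: reported {reported}%, computed {computed:.1f}%")
```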
Comment 8•6 years ago
== Change summary for alert #23710 (as of Tue, 05 Nov 2019 09:08:27 GMT) ==
Improvements:
3% Explicit Memory windows7-32 opt 295,940,910.84 -> 285,632,278.54
3% Explicit Memory windows7-32-shippable opt 295,216,826.77 -> 285,805,301.59
2% Explicit Memory windows7-32-shippable opt stylo tp6 391,692,290.18 -> 383,544,099.35
For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=23710
(In reply to Alexandru Ionescu :alexandrui from comment #8)
== Change summary for alert #23710 (as of Tue, 05 Nov 2019 09:08:27 GMT) ==
Improvements:
3% Explicit Memory windows7-32 opt 295,940,910.84 -> 285,632,278.54
3% Explicit Memory windows7-32-shippable opt 295,216,826.77 -> 285,805,301.59
2% Explicit Memory windows7-32-shippable opt stylo tp6 391,692,290.18 -> 383,544,099.35
For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=23710
Judging by https://treeherder.mozilla.org/perf.html#/graphs?highlightAlerts=1&series=autoland,1959367,1,4&timerange=1209600&zoom=1572921491070,1572922800207,283626021.2749412,303780436.2993224 , I believe this more rightly belongs to bug 1584101.
Comment 10•6 years ago
This seems to have broken compiling Firefox with GCC when LTO is enabled.
It seems that the linker doesn't understand -import-instr-limit:
0:11.14 checking what kind of list files are supported by the linker... configure: error: Couldn't find one that works
0:11.15 DEBUG: <truncated - see config.log for full output>
0:11.15 DEBUG: configure:10778: /usr/bin/x86_64-pc-linux-gnu-g++ -o conftest -march=znver1 -pipe -flifetime-dse=1 -Wno-psabi -Wno-class-memaccess -Wno-int-in-bool-context -Wno-multistatement-macros -Wno-maybe-uninitialized -Wno-deprecated-declarations -fno-exceptions -fno-strict-aliasing -fno-rtti -ffunction-sections -fdata-sections -fno-exceptions -fno-math-errno -pthread -lpthread -Wl,-O1 -Wl,--as-needed -Wl,-rpath=/usr/lib64/firefox,--enable-new-dtags -Wl,--compress-debug-sections=zlib -fuse-ld=gold -Wl,-z,noexecstack -Wl,-z,text -Wl,-z,relro -Wl,-z,nocopyreloc -Wl,-Bsymbolic-functions -Wl,--icf=safe conftest.C -ldl 1>&5
0:11.15 DEBUG: configure:10864: checking for -pipe support
0:11.15 DEBUG: configure:10891: checking what kind of list files are supported by the linker
0:11.15 DEBUG: configure:10896: /usr/bin/x86_64-pc-linux-gnu-gcc -std=gnu99 -o conftest.o -c -flto -flifetime-dse=1 -march=znver1 -pipe -fno-strict-aliasing -ffunction-sections -fdata-sections -fno-math-errno -pthread -fPIC -pipe conftest.c 1>&5
0:11.15 DEBUG: configure:10903: /usr/bin/x86_64-pc-linux-gnu-gcc -std=gnu99 -o conftest -flto=12 -flifetime-dse=1 -Wl,-plugin-opt=-import-instr-limit=10 -lpthread -Wl,-O1 -Wl,--as-needed -Wl,-rpath=/usr/lib64/firefox,--enable-new-dtags -Wl,--compress-debug-sections=zlib -fuse-ld=gold -Wl,-z,noexecstack -Wl,-z,text -Wl,-z,relro -Wl,-z,nocopyreloc -Wl,-Bsymbolic-functions -Wl,--icf=safe conftest.list -ldl 1>&5
0:11.15 DEBUG: x86_64-pc-linux-gnu-gcc: error: unrecognized command line option '-import-instr-limit=10'
0:11.15 DEBUG: lto-wrapper: fatal error: /usr/bin/x86_64-pc-linux-gnu-gcc returned 1 exit status
0:11.15 DEBUG: compilation terminated.
0:11.15 DEBUG: /usr/lib/gcc/x86_64-pc-linux-gnu/9.2.0/../../../../x86_64-pc-linux-gnu/bin/ld.gold: fatal error: lto-wrapper failed
0:11.15 DEBUG: collect2: error: ld returned 1 exit status
0:11.15 DEBUG: configure:10907: /usr/bin/x86_64-pc-linux-gnu-gcc -std=gnu99 -o conftest -flto=12 -flifetime-dse=1 -Wl,-plugin-opt=-import-instr-limit=10 -lpthread -Wl,-O1 -Wl,--as-needed -Wl,-rpath=/usr/lib64/firefox,--enable-new-dtags -Wl,--compress-debug-sections=zlib -fuse-ld=gold -Wl,-z,noexecstack -Wl,-z,text -Wl,-z,relro -Wl,-z,nocopyreloc -Wl,-Bsymbolic-functions -Wl,--icf=safe -Wl,-filelist,conftest.list -ldl 1>&5
0:11.15 DEBUG: /usr/lib/gcc/x86_64-pc-linux-gnu/9.2.0/../../../../x86_64-pc-linux-gnu/bin/ld.gold: fatal error: -f/--auxiliary may not be used without -shared
0:11.15 DEBUG: collect2: error: ld returned 1 exit status
0:11.15 DEBUG: configure:10909: /usr/bin/x86_64-pc-linux-gnu-gcc -std=gnu99 -o conftest -flto=12 -flifetime-dse=1 -Wl,-plugin-opt=-import-instr-limit=10 -lpthread -Wl,-O1 -Wl,--as-needed -Wl,-rpath=/usr/lib64/firefox,--enable-new-dtags -Wl,--compress-debug-sections=zlib -fuse-ld=gold -Wl,-z,noexecstack -Wl,-z,text -Wl,-z,relro -Wl,-z,nocopyreloc -Wl,-Bsymbolic-functions -Wl,--icf=safe @conftest.list -ldl 1>&5
0:11.15 DEBUG: x86_64-pc-linux-gnu-gcc: error: unrecognized command line option '-import-instr-limit=10'
0:11.15 DEBUG: lto-wrapper: fatal error: /usr/bin/x86_64-pc-linux-gnu-gcc returned 1 exit status
0:11.15 DEBUG: compilation terminated.
0:11.15 DEBUG: /usr/lib/gcc/x86_64-pc-linux-gnu/9.2.0/../../../../x86_64-pc-linux-gnu/bin/ld.gold: fatal error: lto-wrapper failed
0:11.15 DEBUG: collect2: error: ld returned 1 exit status
0:11.15 DEBUG: configure: error: Couldn't find one that works
0:11.15 ERROR: old-configure failed
0:11.19 *** Fix above errors and then restart with
0:11.19 "./mach build"
0:11.19 gmake: *** [client.mk:115: configure] Error 1
With clang/lld, there's no problem.
Comment 11•6 years ago
0:11.15 DEBUG: x86_64-pc-linux-gnu-gcc: error: unrecognized command line option '-import-instr-limit=10'
Looks like it needs to be clang specific.
Could you please open a new bug?
Thanks
Comment 12•6 years ago
Reverting the patch from this bug allows the LTO wrappers to be used again, so yes, this change should be made clang-specific. I'm going to open a new bug and mark this one as the one causing the regression.
Comment 13•6 years ago
GCC doesn't understand the import-instr-limit option.
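A minimal sketch of what making the flag clang-specific could look like. The function name and the compiler-type strings are hypothetical and do not reproduce the actual fix, which lives in bug 1602355. The flag spelling is the one visible in the failing configure log above.

```python
def thinlto_import_limit_ldflags(compiler_type, limit=10):
    """Return extra LTO link flags, or nothing for non-clang compilers.

    `compiler_type` is a hypothetical label such as "clang" or "gcc".
    """
    if compiler_type != "clang":
        # GCC's lto-wrapper rejects the option, so pass nothing at all.
        return []
    # Spelling taken from the configure log in this bug.
    return [f"-Wl,-plugin-opt=-import-instr-limit={limit}"]


if __name__ == "__main__":
    print(thinlto_import_limit_ldflags("gcc"))
    print(thinlto_import_limit_ldflags("clang"))
```

Returning an empty list for GCC keeps the rest of the link line unchanged, which matches the observation that reverting the patch restored GCC LTO builds.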
Comment 14•6 years ago
I'm sorry, I was confused by the discussion being in this bug. tt_1 opened bug 1602355 for the regression. I'll move the patch over there.
Comment 15•6 years ago
Comment on attachment 9114553 [details]
Fix GCC LTO build break
Revision D56366 was moved to bug 1602355. Setting attachment 9114553 [details] to obsolete.