Optimize clang-cl builds for size in places where speed is not critical

RESOLVED WONTFIX

Status

enhancement
RESOLVED WONTFIX
11 months ago
4 months ago

People

(Reporter: dmajor, Assigned: dmajor)

Tracking

3 Branch

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(2 attachments)

clang-cl generates code that is fast but large. To mitigate the size regression when switching from MSVC, we should aim for small code in the parts of the tree that don't contribute to our perf metrics.

Likely this would take the form of a moz.build variable to that tells the build to optimize for size. Then we could examine a xul.dll using symbolsort + webtreemap to identify the directories that contribute most to binary size, and shrink them if it doesn't mess with perf.

The tricky thing here is that we'd want a way to propagate this variable down to subdirectories, or else the number of moz.build files we'd need to touch would get out of hand.
Comment 1 might just be simple enough to work (and pass review). We would put:

CLANG_CL_OPTIMIZE_FOR_SIZE = True
export('CLANG_CL_OPTIMIZE_FOR_SIZE')


in the moz.build at the root of whichever sub-trees we want to build with -Os.
Comment hidden (mozreview-request)
Attachment #8983226 - Flags: feedback?(dmajor)
Comment on attachment 8983226 [details]
Bug 1465633 - Introduce a moz.build variable to pass -Os instead of other optimize flags when building with clang-cl.

Thanks Chris, this seems to work!

Noting one snag here for my future self: Exporting this variable from media/moz.build does not work, because that file is not part of any DIRS-traversal. To cover the media/ directory, I had to do the export from config/external/moz.build (for the codecs) and media/webrtc/moz.build (for webrtc).
Attachment #8983226 - Flags: feedback?(dmajor) → feedback+
Attachment #8983226 - Flags: review?(core-build-config-reviews)
Assignee: nobody → dmajor
Attachment #8983642 - Flags: review?(core-build-config-reviews)

Comment 6

11 months ago
mozreview-review
Comment on attachment 8983226 [details]
Bug 1465633 - Introduce a moz.build variable to pass -Os instead of other optimize flags when building with clang-cl.

https://reviewboard.mozilla.org/r/249090/#review255830

I have questions.

::: python/mozbuild/mozbuild/frontend/context.py:1229
(Diff revision 2)
> +    'CLANG_CL_OPTIMIZE_FOR_SIZE': (bool, bool,
> +        """Optimize code in this directory for size when compiling with clang-cl.
> +        Passes -Os instead of other optimize flags.

Do we believe there's anything super-special about `clang-cl` in this regard?  That is, should this be `FORCE_OPTIMIZE_FOR_SIZE` (name bikeshedding welcome) instead?

We generally optimize for size anyway (see `old-configure.in`); can we document why this particular variable is different from our default configure settings?

::: python/mozbuild/mozbuild/frontend/emitter.py:1039
(Diff revision 2)
> +        if clang_cl_optimize_for_size and context.config.substs.get('CLANG_CL'):
> +            computed_flags.resolve_flags('OPTIMIZE', ['-Os'])

I know doing the Android work that `-Oz` was the flag that actually gave us size wins, rather than `-Os`; does `clang-cl` support `-Oz` and should we be using that instead?
(In reply to Nathan Froyd [:froydnj] from comment #6)
> ::: python/mozbuild/mozbuild/frontend/context.py:1229
> (Diff revision 2)
> > +    'CLANG_CL_OPTIMIZE_FOR_SIZE': (bool, bool,
> > +        """Optimize code in this directory for size when compiling with clang-cl.
> > +        Passes -Os instead of other optimize flags.
> 
> Do we believe there's anything super-special about `clang-cl` in this
> regard?  That is, should this be `FORCE_OPTIMIZE_FOR_SIZE` (name
> bikeshedding welcome) instead?
> 
> We generally optimize for size anyway (see `old-configure.in`); can we
> document why this particular variable is different from our default
> configure settings?

dmajor has pointed out that the order of the day tends to be *not* optimizing for size (Darwin, non-DEBUG Linux builds, clang-cl); time for me to update my mental model of our optimization policy!

In any event, do we think that it could be just as beneficial to optimize these things for size in e.g. Linux PGO builds, or Mac builds in a future fantasy world?
(In reply to Nathan Froyd [:froydnj] from comment #7)
> In any event, do we think that it could be just as beneficial to optimize
> these things for size in e.g. Linux PGO builds, or Mac builds in a future
> fantasy world?

Yeah, I agree that the current setup is a bit on the hard-code-y side.

(In reply to Nathan Froyd [:froydnj] from comment #6)
> I know doing the Android work that `-Oz` was the flag that actually gave us
> size wins, rather than `-Os`; does `clang-cl` support `-Oz` and should we be
> using that instead?

I'll try it.
Ok, clang-cl accepts -Xclang -Oz but I was only able to test Win64 as the compiler crashed on Win32.

Win64 sizes:
installer size opt: 60272949  # MSVC PGO
installer size opt: 63207231  # clang-cl -O2
installer size opt: 60379894  # clang-cl -O2/-Os
installer size opt: 59594236  # clang-cl -O2/-Oz

Win64 -Os to -Oz: https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&
originalRevision=851bfd3c0546&newProject=try&newRevision=4d384ec7d4c6&framework=1&showOnlyConfident=1

The size gains are pretty modest compared to -Os, but the perf hit is large, several percent. Let's stick with -Os.
Comment on attachment 8983642 [details] [diff] [review]
Optimize clang-cl builds for size in a few modules

Review of attachment 8983642 [details] [diff] [review]:
-----------------------------------------------------------------

This is unfortunate but hopefully future work on PGO will let us remove this. Unblocking the switch to clang-cl is a good enough reason to do this.
Attachment #8983642 - Flags: review?(core-build-config-reviews) → review+

Comment 11

10 months ago
mozreview-review
Comment on attachment 8983226 [details]
Bug 1465633 - Introduce a moz.build variable to pass -Os instead of other optimize flags when building with clang-cl.

https://reviewboard.mozilla.org/r/249090/#review258012

I think it's worth considering froydnj's question here just to make sure we're making good choices for future work, but I'm OK landing this either way because it's important to unblock switching to clang-cl.

::: python/mozbuild/mozbuild/frontend/context.py:1229
(Diff revision 2)
> +    'CLANG_CL_OPTIMIZE_FOR_SIZE': (bool, bool,
> +        """Optimize code in this directory for size when compiling with clang-cl.
> +        Passes -Os instead of other optimize flags.

Per discussion elsewhere we currently get this behavior from PGO with MSVC and gcc on Linux, we just don't want to gate the MSVC to clang-cl switchover on getting working PGO with clang-cl, so this ought to be a short-term fix.

You do have a point in that this would probably be generally useful for other platforms where we don't currently build with PGO like macOS and Android. I don't want to block this work on bulding a fully general solution but we could consider a half-measure where this gets renamed to just  `OPTIMIZE_FOR_SIZE`, make the build backend error if you try to set it and the build is *not* using clang-cl, and then change dmajor's patch so all the hunks look like:
```
if CONFIG['CLANG_CL']:
    OPTIMIZE_FOR_SIZE = True
    export('OPTIMIZE_FOR_SIZE')
```

We could document that it's currently only implemented for clang-cl and file a followup for fleshing out a more general mechanism. (We'd likely need to have each compiler define optimization flags for size/speed so we don't have to hard-code them here.)

::: python/mozbuild/mozbuild/frontend/emitter.py:1040
(Diff revision 2)
>              if v in context and context[v]:
>                  computed_flags.resolve_flags('MOZBUILD_%s' % v, context[v])
>  
> +        clang_cl_optimize_for_size = context['CLANG_CL_OPTIMIZE_FOR_SIZE']
> +        if clang_cl_optimize_for_size and context.config.substs.get('CLANG_CL'):
> +            computed_flags.resolve_flags('OPTIMIZE', ['-Os'])

Could we stick this in toolchain.configure or something like that with a name, maybe `CLANG_OPTIMIZE_SIZE_FLAGS`? Having compiler flags squirreled away in the emitter isn't great, and at least there it'd be a bit easier to locate.
Attachment #8983226 - Flags: review+

Comment 12

10 months ago
mozreview-review-reply
Comment on attachment 8983226 [details]
Bug 1465633 - Introduce a moz.build variable to pass -Os instead of other optimize flags when building with clang-cl.

https://reviewboard.mozilla.org/r/249090/#review255830

> I know doing the Android work that `-Oz` was the flag that actually gave us size wins, rather than `-Os`; does `clang-cl` support `-Oz` and should we be using that instead?

Per dmajor's investigation it sounds like `-Os` is the right choice for Windows.
Attachment #8983226 - Flags: review?(core-build-config-reviews)
This bug was opened at a time when it looked like PGO support (bug 1341525) would be far away, but we've managed to get it enabled with much less difficulty than expected.

PGO achieves the goal of this bug, in a much more precise and data-driven way than a human ever could. We'll have official numbers from m-i soon, but in my try pushes I'm seeing up to 8% wins in Talos while still making the binary several megabytes smaller.

Our PGO data really shows the Pareto principle in action:

>llvm-profdata.exe show -detailed-summary merged.profdata
Instrumentation level: Front-end
Total functions: 782910
Maximum function count: 1270757776
Maximum internal block count: 1184666314
Detailed summary:
Total number of blocks: 2000537
Total count: 117932831678
2852 blocks with count >= 6068268 account for 80 percentage of the total counts.
6553 blocks with count >= 1821069 account for 90 percentage of the total counts.
11663 blocks with count >= 722490 account for 95 percentage of the total counts.
29621 blocks with count >= 83232 account for 99 percentage of the total counts.
71484 blocks with count >= 5635 account for 99.9 percentage of the total counts.
126421 blocks with count >= 467 account for 99.99 percentage of the total counts.
179860 blocks with count >= 60 account for 99.999 percentage of the total counts.
Status: NEW → RESOLVED
Last Resolved: 9 months ago
Resolution: --- → WONTFIX
Version: Version 3 → 3 Branch
You need to log in before you can comment on or make changes to this bug.