Closed Bug 1444171 Opened 6 years ago Closed 6 years ago

Use an order file for clang-cl builds

Categories

(Firefox Build System :: General, enhancement)

3 Branch
enhancement
Not set
normal

Tracking

(firefox63 fixed)

RESOLVED FIXED
mozilla63
Tracking Status
firefox63 --- fixed

People

(Reporter: away, Assigned: froydnj)

References

(Blocks 1 open bug)

Details

Attachments

(6 files, 9 obsolete files)

4.38 KB, patch
Details | Diff | Splinter Review
1.44 KB, patch
gps
: review+
Details | Diff | Splinter Review
1.04 KB, patch
gps
: review+
Details | Diff | Splinter Review
4.21 KB, patch
glandium
: review+
Details | Diff | Splinter Review
22.60 KB, patch
glandium
: review+
mshal
: feedback+
Details | Diff | Splinter Review
1.71 KB, patch
glandium
: review+
mshal
: feedback+
Details | Diff | Splinter Review
An order file gives us a 3-4% general perf improvement (and possibly even better for startup scenarios) on clang-cl builds, where we don't have PGO to inform function-locality decisions.

The files can be generated with Chromium's instrumentation library:
https://cs.chromium.org/search/?q=cygprofile_win&type=cs
https://cs.chromium.org/search/?q=cygprofile_instrumentation&type=cs
Instructions in the commit message
Assignee: nobody → dmajor
Attached patch The order files themselves (obsolete) — Splinter Review
These were generated at -O2 without ThinLTO. We can adjust as needed.
Attachment #8964003 - Flags: review?(core-build-config-reviews)
Um, :build won't come up as a reviewer now...
Attachment #8964005 - Flags: review?
I'm sure I'll get the question of whether we can generate these per build, in the same spirit as PGO builds. We probably could. I'm not convinced that it's worth the effort. If we manage to get actual PGO going on clang-cl builds, that will likely make these order files obsolete.
The build account got disabled because something was sending it emails and Bugzilla converted the bounces into a disabled account. The BMO team is aware of it and are going to fix it.
Comment on attachment 8964003 [details] [diff] [review]
The order files themselves

Review of attachment 8964003 [details] [diff] [review]:
-----------------------------------------------------------------

I have a different question: could we just make these available as artifacts for automation to use?  The patch is almost 4MB...that's a lot of stuff to check in for things that--at least for the forseeable future--aren't really going to useful to local developers.
Attachment #8964003 - Flags: review?(core-build-config-reviews)
A fair point!
Discussion today in #build seems to favor doing this the "proper" way: generate the files in the build task itself.

One possibility is to repurpose MOZ_PGO on clang-cl to generate and use order files rather than profiles. We'd need something roughly like:

- During MOZ_PROFILE_GENERATE:
  - Build cygprofile.lib before anything else
    - Requires checking in Chromium's cygprofile.cc
  - Make sure cygprofile.lib is in our lib path
  - Build the tree using the flags in attachment 8963784 [details] [diff] [review]
- During maybe_clobber_profilebuild:
  - Unlike MSVC we'd want to actually clobber here (but retain the order files)
- During MOZ_PROFILE_USE:
  - Link xul.dll with -ORDER:...

The above build work is probably within my capability but I fear it would take me a long time, and I don't think we want to block clang-cl on me flailing around in the build system. Can anyone else take on this bug? froydnj?
Flags: needinfo?(nfroyd)
Depends on: 1470488
(In reply to David Major [:dmajor] from comment #9)
> One possibility is to repurpose MOZ_PGO on clang-cl to generate and use
> order files rather than profiles. We'd need something roughly like:
> 
> - During MOZ_PROFILE_GENERATE:
>   - Build cygprofile.lib before anything else
>     - Requires checking in Chromium's cygprofile.cc

This seems workable. It would certainly be easier if the two build passes were actually separate builds. Maybe we could just do something like build it super early (right after we build the clang-plugin ought to work):
https://dxr.mozilla.org/mozilla-central/rev/681eb7dfa324dd50403c382888929ea8b8b11b00/moz.build#90

Do we need to link this into every shared library or would just linking it into libxul be enough? If we need to link it for everything then we could stick it in a conditional in the `SharedLibrary` template:
https://dxr.mozilla.org/mozilla-central/rev/681eb7dfa324dd50403c382888929ea8b8b11b00/build/templates.mozbuild#71

We might need to invent some new moz.build-ism for "USE_LIBS, but only for MOZ_PROFILE_GENERATE" to make this work properly.

>   - Make sure cygprofile.lib is in our lib path
>   - Build the tree using the flags in attachment 8963784 [details] [diff] [review]

I can't actually find anything about what `-finstrument-functions-after-inlining` does. What's that needed for?

> - During maybe_clobber_profilebuild:
>   - Unlike MSVC we'd want to actually clobber here (but retain the order
> files)

Do we actually need to clobber or just re-link libxul?

> - During MOZ_PROFILE_USE:
>   - Link xul.dll with -ORDER:...

This much should be straightforward enough, we could just set this in PROFILE_USE_LDFLAGS (although we might need to special-case things if we only want it for libxul).
These patches are what I've been working on; the next step is to figure out how
to use the generated order file during the link phase.  I think Ted is correct
insofar as we don't actually have to rebuild everything, we just need to
relink.  It may be more expedient to just rebuild and relink, though, so that
we don't have to remove special cases when clang-cl gets "real" PGO.

Completely untested!
For clang-cl, we want to add code to libxul that only exists during the
PGO generation phase, so we can collect data.  The most expedient way to
do that is to enable certain files in SOURCES to be marked as to only be
compiled during the PGO generation step.
(In reply to (PTO June 23rd-30th) Ted Mielczarek [:ted.mielczarek] from comment #10)
> I can't actually find anything about what
> `-finstrument-functions-after-inlining` does. What's that needed for?

This flag is what enables the necessary instrumentation for generating order files: it'll make the compiler insert code at the top of every function to call __cyg_profile_func_enter.

> Do we actually need to clobber or just re-link libxul?

We'd need to build again without the injected __cyg_profile_func_enter calls.
I'm not sure if I'll actually get to the use-the-order-file bits today, but I am clearly working on this!
Flags: needinfo?(nfroyd)
Assignee: dmajor → nfroyd
Depends on: 1472789
The changes in this version of this patch are:

- We use the correct COMPUTED_ variable in config.mk (:doh:)

- We only attach instrumentation flags to files that are being compiled for
  libxul.  This is a little gross, but I couldn't figure out a good way to
  include the profiling hooks for *all* objects.  Ideas on that score welcome.
Attachment #8987166 - Attachment is obsolete: true
This version of the patch moves the profiling code to mozglue so we don't have
to worry about weird webrtc code that is compiled for libxul, but also built as
standalone webrtc tests.
Attachment #8987167 - Attachment is obsolete: true
This is the patch that brings it all together, where we generate the order file
and use it during the PGO relink.  It works!

I'm not 100% sure that we rebuild all objects with this patch; I think we might
wind up with a libxul with enter/exit profiling hooks and a mozglue with the
profiling hooks exposed.  Will need to experiment more tomorrow.
Trying to eliminate the __cygprofile_* hooks from object and library files turned up some issues.

Turns out that for libraries, we determine the list of objects to link into the library statically at build-the-backend time, and attachment 8989318 [details] [diff] [review] wasn't taking that into account.  So we needed to propagate pgo-generate-onlyness farther into the backend to make libraries (i.e. mozglue.dll) regenerate properly.  And to have differences between what gets compiled at profile-generate time and profile-use time, we had to make use of this support:

https://searchfox.org/mozilla-central/source/python/mozbuild/mozbuild/backend/recursivemake.py#1297-1303
https://searchfox.org/mozilla-central/source/config/config.mk#425-438

for clang-cl as well.  Which makes a certain amount of sense: GCC-style PGO, as we use it, relies on instrumenting the binary, running the binary to get a bunch of information, and then recompiling all the objects to make use of that information and (as a nice side effect) effectively remove the instrumentation.  We're doing the same thing with clang-cl...at least for now.

What's not clear to me is that it looks like we can add PGO profile-generate flags when we're *not* going to force recompilation.  For instance:

https://searchfox.org/mozilla-central/source/config/config.mk#157-174

You might think that, based on that block, that NO_PROFILE_GUIDED_OPTIMIZE can only be not defined or 1.  You would be mistaken!

https://searchfox.org/mozilla-central/source/python/mozbuild/mozbuild/frontend/emitter.py#874

lets us set the variable to a *list* of strings of sources that we are choosing to not optimize.  And we do weird things when computing the list of objects needed for libraries:

https://searchfox.org/mozilla-central/source/python/mozbuild/mozbuild/backend/common.py#218-222

so that the net effect, AFAICT--and assuming I haven't missed some `if GNU_CC` thing I need to modify--is that we compile *lots* of files only once during PGO, and that those files all have profiling instrumentation added.  But that seems like so much nonsense, really; lots of things ought to break if that were the case, and surely somebody would have noticed our official binaries dumping .gcda files all over the place.  I must be misunderstanding something here.
There's a lot of terminology I'm not familiar with here, but if I understand correctly, you're trying to make it so that the correct set of things gets recompiled during the profile-use stage? Why not just clobber and rebuild the world? A bit wasteful but at least it won't miss anything?
The comment in config.mk says it all, really.
Attachment #8989607 - Flags: review?(core-build-config-reviews)
clang-cl-style PGO is more akin to what we have for GCC rather than
MSVC, so we should do a full clobber for maybe_clobber_profiledbuild.
Attachment #8989608 - Flags: review?(core-build-config-reviews)
Just adding the appropriate clang-cl flags at the appropriate points.

Note that these flags are relatively recent (added late last year), so you need
a clang SVN checkout to use them.
Attachment #8989609 - Flags: review?(core-build-config-reviews)
For clang-cl, we want to add code to libxul that only exists during the
PGO generation phase, so we can collect data.  The most expedient way to
do that is to enable certain files in SOURCES to be marked as to only be
compiled during the PGO generation step.

This is a pretty gnarly patch that touches quite a lot, but ideally the idea is
straightforward.
Attachment #8989610 - Flags: review?(core-build-config-reviews)
Attachment #8964003 - Attachment is obsolete: true
Attachment #8964005 - Attachment is obsolete: true
Attachment #8989317 - Attachment is obsolete: true
Attachment #8989318 - Attachment is obsolete: true
Attachment #8989319 - Attachment is obsolete: true
Attachment #8964005 - Flags: review?
Now that we've generated an order file of the first N functions invoked
during startup, let's tell the linker about said functions so it can
cluster them appropriately.
Attachment #8989611 - Flags: review?(core-build-config-reviews)
Want to review these patches out of the common pool? :D
Flags: needinfo?(mh+mozilla)
Attachment #8989607 - Flags: review?(core-build-config-reviews) → review+
Attachment #8989608 - Flags: review?(core-build-config-reviews) → review+
Attachment #8989609 - Flags: review?(core-build-config-reviews) → review+
Comment on attachment 8989610 [details] [diff] [review]
add pgo-generate-only source functionality

Review of attachment 8989610 [details] [diff] [review]:
-----------------------------------------------------------------

::: config/rules.mk
@@ +186,5 @@
>  TARGETS			= $(LIBRARY) $(SHARED_LIBRARY) $(PROGRAM) $(SIMPLE_PROGRAMS) $(HOST_LIBRARY) $(HOST_PROGRAM) $(HOST_SIMPLE_PROGRAMS)
>  endif
>  
> +ifdef MOZ_PROFILE_USE
> +CPPSRCS := $(filter-out $(PGO_GEN_ONLY_CPPSRCS),$(CPPSRCS))

It feels like the other way around would be better: add PGO_GEN_ONLY_CPPSRCS to CPPSRCS when MOZ_PROFILE_GENERATE.

::: mozglue/build/cygprofile.cpp
@@ +44,5 @@
> +  char filename[MAX_PATH];
> +  const char* objdir = ::getenv("MOZ_OBJDIR");
> +
> +  if (!objdir) {
> +    printf("ERROR: cannot determine objdir\n");

fprintf(stderr, ...)?

@@ +52,5 @@
> +  SprintfLiteral(filename, kDumpFile, objdir);
> +
> +  FILE* f = fopen(filename, "w");
> +  if (!f) {
> +    printf("ERROR: Cannot open %s\n", filename);

Likewise.

::: mozglue/build/moz.build
@@ +32,5 @@
>      ]
>  
> +    if CONFIG['MOZ_PGO'] and CONFIG['CC_TYPE'] == 'clang-cl':
> +        SOURCES += ['cygprofile.cpp']
> +        SOURCES['cygprofile.cpp'].pgo_generate_only = True

Oh man, I so wish PGO generate and PGO use were two different builds.
Attachment #8989610 - Flags: review?(core-build-config-reviews)
Comment on attachment 8989611 [details] [diff] [review]
use the order file during re-link with clang-cl

Review of attachment 8989611 [details] [diff] [review]:
-----------------------------------------------------------------

::: Makefile.in
@@ +223,5 @@
>  	$(call BUILDSTATUS,TIER_START pgo_clobber)
>  	$(MAKE) maybe_clobber_profiledbuild
>  	$(call BUILDSTATUS,TIER_FINISH pgo_clobber)
>  	$(call BUILDSTATUS,TIER_START pgo_profile_use)
> +ifdef CLANG_CL

you can do $(MAKE) default MOZ_PROFILE_USE=1 $(if $(CLANG_CL),MOZ_PROFILE_ORDER_FILE=...)
Attachment #8989611 - Flags: review?(core-build-config-reviews)
Flags: needinfo?(mh+mozilla)
Thanks for looking at these.

(In reply to Mike Hommey [:glandium] from comment #26)
> ::: config/rules.mk
> @@ +186,5 @@
> >  TARGETS			= $(LIBRARY) $(SHARED_LIBRARY) $(PROGRAM) $(SIMPLE_PROGRAMS) $(HOST_LIBRARY) $(HOST_PROGRAM) $(HOST_SIMPLE_PROGRAMS)
> >  endif
> >  
> > +ifdef MOZ_PROFILE_USE
> > +CPPSRCS := $(filter-out $(PGO_GEN_ONLY_CPPSRCS),$(CPPSRCS))
> 
> It feels like the other way around would be better: add PGO_GEN_ONLY_CPPSRCS
> to CPPSRCS when MOZ_PROFILE_GENERATE.

It seemed harder to me to handle this in the emitter, but taking a second look, maybe it's not so bad.  I'll give it a shot.

> ::: mozglue/build/cygprofile.cpp
> @@ +44,5 @@
> > +  char filename[MAX_PATH];
> > +  const char* objdir = ::getenv("MOZ_OBJDIR");
> > +
> > +  if (!objdir) {
> > +    printf("ERROR: cannot determine objdir\n");
> 
> fprintf(stderr, ...)?

Bah, bad merges from my Windows machine.  Yes, these should be using stderr.
Now with things added to CPPSRCS per glandium's suggestion, and better errors
in cygprofile.cpp.
Attachment #8989843 - Flags: review?(mh+mozilla)
Attachment #8989610 - Attachment is obsolete: true
With some small tweaks to Makefile.in.
Attachment #8989844 - Flags: review?(mh+mozilla)
Attachment #8989611 - Attachment is obsolete: true
Comment on attachment 8989843 [details] [diff] [review]
add pgo-generate-only source functionality

Review of attachment 8989843 [details] [diff] [review]:
-----------------------------------------------------------------

I feel like mshal or chmanchester should look at this as well.
Attachment #8989843 - Flags: review?(mshal)
Attachment #8989843 - Flags: review?(mh+mozilla)
Attachment #8989843 - Flags: review?(cmanchester)
Attachment #8989843 - Flags: review+
Attachment #8989844 - Flags: review?(mshal)
Attachment #8989844 - Flags: review?(mh+mozilla)
Attachment #8989844 - Flags: review?(cmanchester)
Attachment #8989844 - Flags: review+
Depends on: 1473954
Comment on attachment 8989843 [details] [diff] [review]
add pgo-generate-only source functionality

Thanks for looping me in. I don't see any major issues with these patches as far as the tup backend goes. We don't currently support Windows and/or PGO there, so these are basically a no-op for now, and we'll have to figure it out along with the rest of PGO when we get to that point.

However, it would be helpful if the tup backend doesn't break in the process due to the extra return value from _expand_libs(). We can just ignore the pgo_gen_objs for now, with something like:

-        objs, _, shared_libs, os_libs, static_libs = self._expand_libs(shlib)
+        objs, _, _, shared_libs, os_libs, static_libs = self._expand_libs(shlib)
Attachment #8989843 - Flags: review?(mshal) → feedback+
Attachment #8989844 - Flags: review?(mshal) → feedback+
Thanks mshal! I'll make that change to tup.py and land this on behalf of Nathan.
Attachment #8989843 - Flags: review?(cmanchester)
Attachment #8989844 - Flags: review?(cmanchester)
Pushed by dmajor@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/465d19904f09
Make clang-cl work with OBJS_VAR_SUFFIX as well; r=gps
https://hg.mozilla.org/integration/mozilla-inbound/rev/7aacbb4551be
Perform an actual clobber for profiledbuild with clang-cl; r=gps
https://hg.mozilla.org/integration/mozilla-inbound/rev/e96225163a76
Generate profiling instrumentation for clang-cl in PGO builds; r=glandium
https://hg.mozilla.org/integration/mozilla-inbound/rev/eb28d28e0071
Add pgo-generate-only source functionality; r=glandium
https://hg.mozilla.org/integration/mozilla-inbound/rev/1a27b17ba6a6
use the order file during re-link with clang-cl; r=glandium
Depends on: 1475444
Depends on: 1492149
Version: Version 3 → 3 Branch
Blocks: 1632542
You need to log in before you can comment on or make changes to this bug.