Open Bug 1043140 Opened 10 years ago Updated 2 years ago

configure takes way too long on Windows (6 minutes, actual build takes 18)

Categories

(Firefox Build System :: General, defect)

x86
Windows 8.1
defect

Tracking

(Not tracked)

People

(Reporter: vlad, Unassigned)

References

(Depends on 2 open bugs)

Details

Attachments

(1 file)

We really need a replacement for configure on windows.  Running configure takes *6 minutes* on this laptop; after it finishes, a full build takes only 18.  Taken together, running configure -- for which the results will be 99% identical from invocation to invocation on Windows -- takes up 25% of the build time.  This might even be a significant issue on the tbpl build machines as well.
> Running configure takes *6 minutes* on this laptop

How long does it take if you don't clobber?
There is low-hanging fruit that can improve the situation substantially without replacing configure (which is a huge endeavour).
Depends on: 1043265, 1043268, 1043262
(In reply to Mike Hommey [:glandium] from comment #1)
> > Running configure takes *6 minutes* on this laptop
> 
> How long does it take if you don't clobber?

Something else I'd be interested in, is to know what kind of hardware you have. Because the machine I use to do windows builds does a full build in 40 minutes and only takes 3 minutes for configure (starting from a fresh clone and no objdir).
(In reply to comment #3)
> (In reply to Mike Hommey [:glandium] from comment #1)
> > > Running configure takes *6 minutes* on this laptop
> > 
> > How long does it take if you don't clobber?
> 
> Something else I'd be interested in, is to know what kind of hardware you have.
> Because the machine I use to do windows builds does a full build in 40 minutes
> and only takes 3 minutes for configure (starting from a fresh clone and no
> objdir).

I have a beefy 24 core Windows machine, and a crappy Windows VM on my MBP, and configure takes around 6 minutes on a clobber build on both of them.
So why is replacing configure considered hard? I assume most configure output on Windows looks eerily similar. We could just keep a template around and replace a few placeholders in it to adjust for differing header paths, etc. Is that super-complicated?
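
A minimal sketch of the template idea, assuming a pre-generated header with a handful of placeholders. The placeholder names (`msvc_version`, `sdk_include`) and the header contents are hypothetical and are not taken from the real configure output:

```python
from string import Template

# Hypothetical fragment of a pre-generated config header for Windows.
# Only the $placeholders vary from machine to machine.
CONFIG_TEMPLATE = Template(
    "#define HAVE_WINDOWS_H 1\n"
    "#define MSVC_VERSION $msvc_version\n"
    "/* SDK include path: $sdk_include */\n"
)


def render_config(values):
    """Fill in the template; raises KeyError if a placeholder has no value."""
    return CONFIG_TEMPLATE.substitute(values)


rendered = render_config(
    {"msvc_version": "1800", "sdk_include": "C:/sdk/include"}
)
print(rendered)
```

`Template.substitute` fails loudly on a missing placeholder, which is probably what you want here: a silently empty path would be harder to debug than a failed configure.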
That was my thinking, but there are also a bunch of sub-configures.. and we'd have to handle the various configuration options that we have (that mostly twiddle #defines, but still).  That's probably the hardest part; splitting out the things that are "gecko build configuration flags" from "detecting system build environment".  The latter part is probably 95% of what the configure step does, but it's all mixed together.
We already skip most of the platform check foo on Windows. And, yes, MozillaBuild likely means a lot of the tool checks could be hardcoded as well.

But there is logic that runs depending on configure flags, mozconfig settings, etc.

Configure is inherently shell based. That means Windows is going to suck (due to new process overhead). Period. Even if we put configure on a massive diet (which we've already done to some degree), we still have configure.

Long-term, we want to move to something that isn't shell-based. glandium has ideas for integrating "configuration" directly into moz.build evaluation. e.g. we define a mapping of AC_SUBST variables to Python functions and we don't evaluate the variable unless moz.build needs it. I think we can agree that lazy evaluation would be good for everyone. Perhaps we can start this by adding this support in and not adding new AC_SUBST-only variables to configure.
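
A rough sketch of what lazy evaluation could look like, assuming each variable maps to a zero-argument Python function. `LazySubsts` and the two `detect_*` functions are made-up names for illustration, not existing build system code:

```python
class LazySubsts:
    """Maps variable names to functions that run only on first access."""

    def __init__(self, checks):
        self._checks = dict(checks)
        self._cache = {}
        self.evaluated = []  # names of the checks that actually ran

    def __getitem__(self, name):
        if name not in self._cache:
            # Run the (possibly expensive) check once and remember the result.
            self._cache[name] = self._checks[name]()
            self.evaluated.append(name)
        return self._cache[name]


def detect_compiler():
    # Stand-in for an expensive probe of the toolchain.
    return "cl.exe"


def detect_yasm():
    # Stand-in for another probe; never called below.
    return "yasm"


substs = LazySubsts({"CC": detect_compiler, "YASM": detect_yasm})
cc = substs["CC"]        # runs detect_compiler
cc_again = substs["CC"]  # served from the cache
print(cc, substs.evaluated)
```

In this sketch the YASM check never runs because nothing asks for it, which is the whole point: a moz.build tree that doesn't reference a variable doesn't pay for it.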

We can also do things like start to rewrite parts of configure in Python. e.g. we should batch all of the "is program available" checks into a single Python program invocation that does all the stat(), etc in one process, as native system calls. We already have precedent for "intercepting" some of the built-in m4 macros and converting them to Python. It's an ugly hack, but a hack that works.
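
As an illustration of the batching idea, the sketch below resolves a whole list of programs in one Python process using the standard library's `shutil.which`. The tool list is an arbitrary example, not the set of programs configure actually checks for:

```python
import shutil

# Example tool names only; the last one is deliberately bogus.
TOOLS = ["make", "perl", "no-such-tool-12345"]


def find_programs(names):
    """Return {name: absolute path or None}, with no child processes spawned."""
    return {name: shutil.which(name) for name in names}


found = find_programs(TOOLS)
for name, path in found.items():
    print(f"checking for {name}... {path or 'no'}")
```

Each lookup is a PATH walk done with in-process file system calls, so the cost of checking N programs is N lookups rather than N shell process launches.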

The low-hanging fruit bugs have been filed. I think we should see where those get us. We should also look at defining AC_SUBST only variables in Python so we don't introduce new checks to configure.
(In reply to Gregory Szorc [:gps] from comment #7)
> We already skip most of the platform check foo on Windows. And, yes,
> MozillaBuild likely means a lot of the tool checks could be hardcoded as
> well.

Can you elaborate a bit on the "we already skip most of the platform check foo" part?  I'm not really seeing this, but maybe the issue is all the sub-configures don't skip it?
(In reply to comment #8)
> (In reply to Gregory Szorc [:gps] from comment #7)
> > We already skip most of the platform check foo on Windows. And, yes,
> > MozillaBuild likely means a lot of the tool checks could be hardcoded as
> > well.
> 
> Can you elaborate a bit on the "we already skip most of the platform check
> foo" part?  I'm not really seeing this, but maybe the issue is all the
> sub-configures don't skip it?

Yes.  The majority of those 6 minutes is taken by sub-configures (5+ minutes.)
Would it be reasonable to try to eliminate as many of the subconfigures as possible?  We generally always build them in "the same way"; we should be able to create moz.build files with some DEFINES/CFLAGS/etc. and be fine.
(In reply to Vladimir Vukicevic [:vlad] [:vladv] from comment #10)
> Would it be reasonable to try to eliminate as many of the subconfigures as
> possible?  We generally always build them in "the same way"; we should be
> able to create moz.build files with some DEFINES/CFLAGS/etc. and be fine.

I think bug 1043268 and selective clobbering (other two bugs blocking this) will do a sufficient job of improving the situation.

Personally, I want the build system to just download a pre-built ICU, etc from automation and silently incorporate this into the build. My ultimate goal is to have a similar build mode for Firefox/JS-centric developers that downloads a pre-built libxul. It is grossly inefficient to make Firefox developers sit through 10 minutes of C++ compilation when they aren't touching C++. In some circles, this is known as a "build artifact cache." Facebook and Google both run them. Facebook has publicly said that their cache hit rate is 97%. But that's with a more advanced-than-make build system that knows how to do that. For now, I want to bolt something on top of the hg/git commit SHA-1 + `hg status` results. AFAIK nobody is staffed to work on this.
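
One possible shape for such a key, sketched under the assumption that the commit SHA-1 and the text output of `hg status` are passed in as strings. `artifact_cache_key` is a hypothetical helper, and nothing here talks to hg or to any real cache:

```python
import hashlib


def artifact_cache_key(commit_sha1, status_output):
    """Derive a cache key from a commit id plus the working-copy status."""
    # Normalize the status listing so ordering and blank lines don't matter.
    lines = sorted(
        line.rstrip() for line in status_output.splitlines() if line.strip()
    )
    digest = hashlib.sha1()
    digest.update(commit_sha1.encode("ascii"))
    for line in lines:
        digest.update(b"\0" + line.encode("utf-8"))
    return digest.hexdigest()


clean = artifact_cache_key("abc123", "")
dirty = artifact_cache_key("abc123", "M configure.in\n")
print(clean != dirty)
```

A key like this only knows which files are reported as changed, not what changed in them, so a real implementation would likely have to treat any non-empty status as a cache miss or hash the file contents as well.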
> that their cache hit rate is 97%

They must have a hell of an imposed build environment. Everyone building in the same path with the same configuration and the same compiler.
Note that if you use the same paths and toolchain as automation, you could use sccache and pull from our s3 storage.
(In reply to Mike Hommey [:glandium] from comment #12)
> > that their cache hit rate is 97%
> 
> They must have a hell of an imposed build environment. Everyone building in
> the same path with the same configuration and the same compiler.

Well, when you work on software that runs behind a corporate firewall, it's an easy decision to make: why would you want to support more than one platform and configuration?

If Mozilla wanted to move faster, we could quasi-deprecate all build configurations that aren't what's used in official automation and Firefox developers could see similar wins.
Dockerize all the things or have the build system transparently build within a chroot or LXC so paths are normalized to the same as automation.
(In reply to Gregory Szorc [:gps] from comment #15)
> Dockerize all the things or have the build system transparently build within
> a chroot or LXC so paths are normalized to the same as automation.

Windows
Hell, most developers are not even using the same MSVC version as automation.
(In reply to Mike Hommey [:glandium] from comment #17)
> Hell, most developers are not even using the same MSVC version as automation.

Sure, but there's a pretty finite set of MSVC versions that they are using.  If it would help builds, there's no reason why we can't throw up additional builds that happen a few times a day with the various versions.  Ideally though it would be easy to set up one of these cache machines in any location -- downloading all the debug info over the network doesn't seem like an efficient use of bandwidth.  But now we're going a little beyond the scope of this bug.
(In reply to Mike Hommey [:glandium] from comment #1)
> > Running configure takes *6 minutes* on this laptop
> 
> How long does it take if you don't clobber?

3 min 45 seconds.  (Tested by running configure in a new build dir, then running configure again.)

Better, but ideally it should be <1 min in the clobber case.
My shitty Windows machine also takes 3 minutes to configure from clobber.

I recall others saying that configure inside virtual machines tends to be much worse than native.
BTW, what's the total wall time reported after Reticulating splines, on your machines?
Actually, could you all just paste the output from "Reticulating splines" up to the end when you run `mach configure`?
Attached file configure-log.txt
Output from:

  time (MOZCONFIG=../fc/debug-64 ../mozilla-central/configure | gawk '{print strftime("%H:%M:%S ",systime()) $0 }') | tee -a configure-log.txt

Final time was 4 minutes, this is on my (very) beefy desktop at home (SSD, Haswell, 32GB), and after I'd run configure twice before (though clean dir, nothing cached).
The breakdown, per-configure:

(seconds)
 30 toplevel
 55 ICU
 67 libffi
 48 jemalloc
 20 nsprpub
 12 js/src

So basically if we got rid of building ICU, libffi, and jemalloc using configure and just created direct moz.build and config.h files for them, we'd save ~75% of the total configure time.
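
For anyone who wants to check the arithmetic, here is a small sketch that recomputes the share from the per-configure numbers listed in the breakdown (timings copied from that list, in seconds):

```python
# Per-configure wall time in seconds, as listed in the breakdown.
timings = {
    "toplevel": 30,
    "ICU": 55,
    "libffi": 67,
    "jemalloc": 48,
    "nsprpub": 20,
    "js/src": 12,
}

total = sum(timings.values())
removable = sum(timings[name] for name in ("ICU", "libffi", "jemalloc"))
share = removable / total

print(f"{removable}s of {total}s = {share:.0%}")  # prints "170s of 232s = 73%"
```

That puts the three sub-configures at roughly 73% of the measured total, in line with the ~75% estimate.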
In case you weren't aware, I approached some privacy folk a few months back about adding "metrics" to mach and the build system. The intent is to see what users in the wild are experiencing so we can target improvements to build and UX intelligently. I have preliminary privacy sign-off. Just need someone to write the code.
(In reply to Vladimir Vukicevic [:vlad] [:vladv] from comment #23)
> Created attachment 8463791 [details]
> configure-log.txt
> 
> Output from:
> 
>   time (MOZCONFIG=../fc/debug-64 ../mozilla-central/configure | gawk '{print
> strftime("%H:%M:%S ",systime()) $0 }') | tee -a configure-log.txt
> 
> Final time was 4 minutes, this is on my (very) beefy desktop at home (SSD,
> Haswell, 32GB), and after I'd run configure twice before (though clean dir,
> nothing cached).

Try again without a clean dir.
Depends on: 1091505
Product: Core → Firefox Build System
Severity: normal → S3