Closed Bug 903118 Opened 11 years ago Closed 11 years ago

Automation needs custom environment variables set during make

Categories

(Firefox Build System :: General, defect)

defect
Not set
major

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 911634
mozilla26

People

(Reporter: mozilla, Assigned: mshal)

References

Details

Attachments

(3 files, 2 obsolete files)

The way we point to vs2010 is via mozconfig env vars:
http://hg.mozilla.org/releases/mozilla-beta/file/e7215f239d6a/build/win32/mozconfig.vs2010

However, these are only used in configure, and not during make commands.
:ted and :mshal found that using AC_SUBST will likely export those env vars in the makefiles.


This is breaking:
* windows l10n release repacks. We have hardcoded the env in the script, which is not a great long term solution, but we have a workaround.
* desktop b2g windows builds, per :Mossop
* unknown, but most likely anything using vs10 has a suspect make env... aiui that's the majority of our windows builds.


For more context: yes, vs8 and/or vs9 are in the env, but we need a way to specify a different version of Visual Studio, or we have to split the pool.

Splitting the pool would:
* slow down windows builds even further, due to significantly less parallelization
* make testing a new version on Try extremely time+effort prohibitive
* require significant amounts of lead time to switch compilers.  Think of the current amount of time, and then add hardware purchasing (including approval time), shipping, racking/imaging, creating a whole new machine type in puppet and buildbot-configs, and cross-team coordination on top of that.  We'd also have to make sure to change which pool each branch builds on during every merge day, which is not as seamless as riding the trains via code merges.

Specifying via mozconfig allows us to skip all of that.
I thought you could define environment variables in buildbot/mozharness jobs. Why are you throwing out splitting the pool as the alternative here?
Changing bug summary to not imply a specific solution, as there are many on the table.
Summary: mozconfig env vars are only used in configure → Automation needs custom environment variables set during make
aki, can you confirm that the 'make' process that was breaking is not using client.mk? There seem to be 3 different places the environment variables from mozconfig can be used:

1) During configure
2) During a make using client.mk
3) During a subdirectory make (not using client.mk)

If the mozconfig file has:

export INCLUDE=...

Then we get INCLUDE in part 1, but not 2 or 3. When we chatted in IRC, I thought this is what was going on, but I was mistaken. I missed some crucial lines in the rest of the mozconfig.

The build/win32/mozconfig.vs2010 also does the equivalent of:

mk_add_options "export INCLUDE=..."

(via mk_export_correct_style() in mozconfig.vs2010-common). With this, we get INCLUDE in part 1 and 2, but not 3. So if your build is doing a sub-make, it would not be getting the export, since that only gets included from client.mk.

I believe we can address this case by adding a check in rules.mk, and pulling in mozconfig if it hasn't already been handled by client.mk

Or, we can use AC_SUBST(INCLUDE) in configure.in, along with explicit 'export INCLUDE' lines in rules.mk to white-list specific environment variables to be passed along. If we do that, I would think we'd want to remove the mozconfig->client.mk importing step. I'm missing some historical context here on how mozconfig is normally used & supposed to interact with make, though, so that might not be a good idea.
Flags: needinfo?(aki)
Attached patch config-environment.patch (obsolete) — Splinter Review
Ted, does this look like a sane approach? The logic is just straight copied from client.mk (with TOPSRCDIR -> topsrcdir), and works to get environment variables imported from mozconfig using 'mk_add_options "export FOO=bar"' even in a subdirectory build (case 3).
Attachment #787809 - Flags: feedback?(ted)
(In reply to Michael Shal [:mshal] from comment #3)
> aki, can you confirm that the 'make' process that was breaking is not using
> client.mk? There seem to be 3 different places the environment variables
> from mozconfig can be used:
> 
> 1) During configure
> 2) During a make using client.mk
> 3) During a subdirectory make (not using client.mk)

For the desktop b2g builds at least the problem is 3, during the multilocale part of the build it runs |make.py -C b2g/app| and hits problems.
Blocks: 879370
Yes, IIRC 3) was the issue we were hitting (make in tier_base or tier_nspr without going through client.mk).
Flags: needinfo?(aki)
Comment on attachment 787809 [details] [diff] [review]
config-environment.patch

Review of attachment 787809 [details] [diff] [review]:
-----------------------------------------------------------------

I'm not wild about this, mostly because it adds overhead to every invocation of make in a directory. I think what we probably want to do is just AC_SUBST the values we care about and then export them if necessary in config.mk or somewhere.

I don't know why we didn't do that in the first place, I suspect we simply didn't consider this use case (and it didn't break anything up until now).

That ought to be a pretty simple patch, I *think* we can get away with just exporting LIB/INCLUDE, since the toolchain ought to be picked up with a full path if PATH is set in configure.
Attachment #787809 - Flags: feedback?(ted) → feedback-
As a quick fix, can we just include a shell script in the tree and modify the job configs to source that file before invoking make?
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #7)
> Comment on attachment 787809 [details] [diff] [review]
> config-environment.patch
> 
> Review of attachment 787809 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> I'm not wild about this, mostly because it adds overhead to every invocation
> of make in a directory.

I don't think there is much overhead - the mozconfig2client-mk script exports FOUND_MOZCONFIG, so the 'ifndef FOUND_MOZCONFIG' causes the mozconfig lines to be imported only once (whether in client.mk, or the first time rules.mk is included). For example, a build in obj/xpcom will load mozconfig during the xpcom/ make, which puts FOUND_MOZCONFIG and the mozconfig exports (INCLUDE, LIB) in the environment. The sub-makes now have those variables in the environment, so they don't read mozconfig again.

> I think what we probably want to do is just AC_SUBST
> the values we care about and then export them if necessary in config.mk or
> somewhere.
> 
> I don't know why we didn't do that in the first place, I suspect we simply
> didn't consider this use case (and it didn't break anything up until now).
> 
> That ought to be a pretty simple patch, I *think* we can get away with just
> exporting LIB/INCLUDE, since the toolchain ought to be picked up with a full
> path if PATH is set in configure.

I thought this would be easy too, but I'm finding the Windows vs. msys paths quite confusing. The default mozconfigs (eg: build/win32/mozconfig.vs2010) use msys paths. However, if we just AC_SUBST/export those values, then cl fails when pymake kicks it off because it needs Windows-style paths. This is hacked around currently by using mk_export_correct_style() in build/mozconfig.vs2010-common, but that only makes those paths available to client.mk, so it doesn't help here.

One way I was able to get things to work is to use the cmd.exe /c hack (from mk_export_correct_style) in configure along with AC_SUBST and an export in config.mk. The paths in mozconfig files are still msys-style, but the value that ends up in config.status is a Windows path. The Windows path is exported by config.mk, so that works even for sub-directory builds.
(In reply to Gregory Szorc [:gps] from comment #8)
> As a quick fix, can we just include a shell script in the tree and modify
> the job configs to source that file before invoking make?

aki, is this a possible solution?
Flags: needinfo?(aki)
I think this is what's necessary for AC_SUBST to work. The mozconfig just needs to export the variable using msys-style paths (as is done currently), but the mk_export_correct_style() function is not needed anymore.

I should note that this patch only adds the 4 Windows variables to the 'export' list in config.mk. Do we want to export PATH as well? That is currently exported by mozconfig.vs2010, so it would probably be needed to support what aki was trying to do. Just seems dangerous :)
Attachment #789260 - Flags: feedback?(ted)
(In reply to Michael Shal [:mshal] from comment #10)
> (In reply to Gregory Szorc [:gps] from comment #8)
> > As a quick fix, can we just include a shell script in the tree and modify
> > the job configs to source that file before invoking make?
> 
> aki, is this a possible solution?

I think it's possible, but we would have to source the file at the beginning of every buildbot step, rather than at the beginning of the job, since buildbot resets env for every step.

If the file doesn't exist everywhere (e.g. rides the trains), we could be looking at a whole lot of ugliness... a mini shell script per step to look for the file, source it, and then proceed.

If we needed this for linux at any point, then the sourcing would also have to work with mock, which would be even more complex to implement.

Possible solution, probably.  Good solution, I'm not very sure about.
Flags: needinfo?(aki)
Not blocking 24.0b1 anymore
No longer blocks: 902112
Comment on attachment 789260 [details] [diff] [review]
AC_SUBST/export patch

Review of attachment 789260 [details] [diff] [review]:
-----------------------------------------------------------------

I don't feel like we ought to have to save $PATH, but looking at $objdir/config/autoconf.mk on my Windows machine shows that we aren't saving the full path to CC/CXX (they're just "cl"), so we either have to go that route or fix configure to save full paths for the compiler (LD as well).

::: config/config.mk
@@ +556,5 @@
> +ifeq ($(OS_ARCH),WINNT)
> +export LIB
> +export LIBPATH
> +export INCLUDE
> +export WIN32_REDIST_DIR

This last one isn't needed, it's only used in our build system:
http://mxr.mozilla.org/mozilla-central/source/build/win32/Makefile.in#24

::: configure.in
@@ +8783,5 @@
> +if test "${OS_TARGET}" = "WINNT"; then
> +# Convert from msys paths to Windows-style paths.
> +INCLUDE=$(cmd.exe //c echo %INCLUDE%)
> +LIB=$(cmd.exe //c echo %LIB%)
> +LIBPATH=$(cmd.exe //c echo %LIBPATH%)

Pretty horrible, but no more horrible than the rest of this, I guess.
Attachment #789260 - Flags: feedback?(ted) → feedback+
Assignee: nobody → mshal
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #14)
> Comment on attachment 789260 [details] [diff] [review]
> AC_SUBST/export patch
> 
> Review of attachment 789260 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> I don't feel like we ought to have to save $PATH, but looking at
> $objdir/config/autoconf.mk on my Windows machine shows that we aren't saving
> the full path to CC/CXX (they're just "cl"), so we either have to go that
> route or fix configure to save full paths for the compiler (LD as well).

I'm starting to have second thoughts on the AC_SUBST approach, mostly because it changes the nature of how the environment interacts with the build process in a fairly significant way. Specifically, if we don't define INCLUDE/LIB/etc in the mozconfig but expect to have it come from the environment, we need to re-run configure to pick up environment changes. I also think is a bit confusing if some environment changes require  a re-configure, but others (like PATH) do not. I feel it is a bit risky, since it has a potential to impact both developers and automation in unforeseen ways. (At least, unforeseen by me :)

Can we revisit the config-environment approach? This just makes sure that subdirectory builds pick up the same mozconfig environment as client.mk builds, but otherwise doesn't change behavior. I think this fixes the original problem with the mozconfig environment not being sourced, with hopefully minimal side-effects.
Updated patch to source mozconfig in sub-directory makes.
Attachment #787809 - Attachment is obsolete: true
Seems like it's about the same either way, honestly. gps: do you have an opinion here?
Flags: needinfo?(gps)
I think I prefer the AC_SUBST approach, because an environment change can affect the build in subtle ways. This very bug is proof. There are many other ways in which environment can affect the build, and it's usually safer to re-configure.
If a PATH change post-configure impacts the build, I think that's a bug in configure. Are there any cases where we wouldn't want to abspath all paths during configure so we aren't susceptible to PATH changes?
Flags: needinfo?(gps)
(In reply to Mike Hommey [:glandium] from comment #18)
> I think I prefer the AC_SUBST approach, because an environment change can
> affect the build in subtle ways. This very bug is proof. There are many
> other ways in which environment can affect the build, and it's usually safer
> to re-configure.

This bug is not about an environment change, it is about exports in mozconfig getting pulled in to make only if client.mk is used. Since some of the automation skips client.mk and builds in a subdirectory, the mozconfig exports are unused.
glandium, is this the right approach for getting full paths saved from configure?

The problem I'm having with it in Windows is that my cl.exe is in a path with a space in it, so configure fails trying to run some tests after finding the compiler (/c/Program: No such file or directory). I'm also a bit concerned that it is an MSYS path - when I skip trying this in configure and just put a /c/... path manually into config.status for CXX, I can no longer compile files. I have to use c:/Program\ Files\ \(x86\)/.../cl.exe for it to work.
Attachment #791360 - Flags: feedback?(mh+mozilla)
.(In reply to Michael Shal [:mshal] from comment #21)
> Created attachment 791360 [details] [diff] [review]
> 
> The problem I'm having with it in Windows is that my cl.exe is in a path
> with a space in it, so configure fails trying to run some tests after
> finding the compiler (/c/Program: No such file or directory). I'm also a bit
> concerned that it is an MSYS path - when I skip trying this in configure and
> just put a /c/... path manually into config.status for CXX, I can no longer
> compile files. I have to use c:/Program\ Files\ \(x86\)/.../cl.exe for it to
> work.

I'm likely missing something, but why are the MSYS paths that configure is generated not properly escaped? With a properly escaped MSYS path, I can run cl:
$ /c/Program\ Files\ \(x86\)/Microsoft\ Visual\ Studio\ 10.0/VC/bin/cl.exe
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 16.00.30319.01 for 80x86
Copyright (C) Microsoft Corporation.  All rights reserved.

usage: cl [ option... ] filename... [ /link linkoption... ]

This seems like a deeper problem than what you were originally trying to fix in this bug, but while we're in here it seems like it ought to be solved as well.
(In reply to Clint Talbert ( :ctalbert ) from comment #22)
> I'm likely missing something, but why are the MSYS paths that configure is
> generated not properly escaped? With a properly escaped MSYS path, I can run
> cl:
> $ /c/Program\ Files\ \(x86\)/Microsoft\ Visual\ Studio\ 10.0/VC/bin/cl.exe
> Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 16.00.30319.01 for
> 80x86
> Copyright (C) Microsoft Corporation.  All rights reserved.
> 
> usage: cl [ option... ] filename... [ /link linkoption... ]
> 
> This seems like a deeper problem than what you were originally trying to fix
> in this bug, but while we're in here it seems like it ought to be solved as
> well.

Thanks for the feedback - I'm not sure how to correctly escape the paths from configure. I can also run cl on the command-line with an msys path, but when I try to set that as CXX in config.status, pymake (or cl.py?) has trouble actually running it. So if we can get the escaped paths from configure, I think the msys path will still give us trouble. Specifically, I'm doing:

1) ./mach build
2) edit objdir/config.status
change:
 (''' CXX ''', r''' cl '''),
to:
 (''' CXX ''', r''' /c/Program\ Files\ \(x86\)/Microsoft\ Visual\ Studio\ 11.0/VC/BIN/cl.exe '''),

(hand-typed, just a heads up in case there are any typos).

3) ./config.status
4) ./mach build netwerk/cookie

And it fails to find cl (WindowsError: [Error 2] The system cannot find the file specified)

If instead I change the '/c/Program' to 'c:/Program' and re-run config.status, then pymake/cl.py can find cl.exe.
Comment on attachment 791360 [details] [diff] [review]
WIP - Save full path of tools (CC, LD, etc.)

Nevermind on the feedback for this. According to vlad, just running cl.exe with a full path but with the wrong PATH will cause cl to try to load the wrong dlls. So we would still need to AC_SUBST/export PATH for this to work.
Attachment #791360 - Flags: feedback?(mh+mozilla)
Any news on this? This is blocking bug 879370 which is causing sheriff pain.
(In reply to Ed Morley [:edmorley UTC+1] from comment #25)
> Any news on this? This is blocking bug 879370 which is causing sheriff pain.

We aren't able to reproduce the releng issue easily, so I don't know if any of these patches actually fix this bug. I'm trying to reproduce 879370 to see if the patches work there - I should have some more info today.
(In reply to Michael Shal [:mshal] from comment #26)
> (In reply to Ed Morley [:edmorley UTC+1] from comment #25)
> > Any news on this? This is blocking bug 879370 which is causing sheriff pain.
> 
> We aren't able to reproduce the releng issue easily, so I don't know if any
> of these patches actually fix this bug. I'm trying to reproduce 879370 to
> see if the patches work there - I should have some more info today.

Is the patch in comment 24 the latest version? I can probably test it today
(In reply to Dave Townsend (:Mossop) from comment #27)
> (In reply to Michael Shal [:mshal] from comment #26)
> > (In reply to Ed Morley [:edmorley UTC+1] from comment #25)
> > > Any news on this? This is blocking bug 879370 which is causing sheriff pain.
> > 
> > We aren't able to reproduce the releng issue easily, so I don't know if any
> > of these patches actually fix this bug. I'm trying to reproduce 879370 to
> > see if the patches work there - I should have some more info today.
> 
> Is the patch in comment 24 the latest version? I can probably test it today

I was able to reproduce the b2g issue locally using the instructions you sent. Specifically, if I have the environment setup for VS9 and the mozconfig setup for VS10, running pymake in objdir/b2g/app results in the compiler failing to find stdint.h. After applying the clientmk patch, it compiles successfully. I have not been able to get the ACSUBST or the other WIP patches to work correctly.
Here's the latest patch for sourcing mozconfig environment variables in submakes the same way they are sourced from client.mk. There seem to be a few misconceptions about this patch:

1) Because INCLUDED_MOZCONFIG_MK is exported to the environment, the logic to read mozconfig is only executed once for the whole make process tree. So if client.mk pulls in mozconfig, no subdirs need to read mozconfig. If we run make in objdir/xpcom, only the top-level make pulls in mozconfig, not xpcom/base, xpcom/ds, etc. So there is no per-directory performance impact.

2) For both this bug, and for bug 879370, the environment is not changing between make invocations. The problem is that the environment is defined in mozconfig, but only client.mk sources mozconfig. Both of these bugs involve running make in a subdirectory, which skips over the mozconfig logic. This patch just makes mozconfig sourcing consistent no matter where make is run.

Having environment changes require a re-configure is not required for either of these bugs. I also haven't been able to get the AC_SUBST method working correctly in Windows due to all the path funkiness around msys & Windows paths in both configure & pymake. If we want to better support changing environments, I recommend filing a separate bug since it isn't directly related to 903118 or 879370.
Attachment #790352 - Attachment is obsolete: true
Attachment #797024 - Flags: review?(gps)
Current try results: https://tbpl.mozilla.org/?tree=Try&rev=0f0d8c54b1b2

Not sure what is up with the segfault in OSX, but it disappeared the second time.
We hit periodic segfault while building ICU on a pretty regular basis.
Comment on attachment 797024 [details] [diff] [review]
0001-Bug-903118-set-mozconfig-environment-variables-for-s.patch

Review of attachment 797024 [details] [diff] [review]:
-----------------------------------------------------------------

This patch is technically correct. But it terrifies me.

AFAICT this patch effectively amounts to a giant hack to work around badness elsewhere. Furthermore, it gives the automation configs a "back door" to "monkeypatch" the build system config. Since these automation configs frequently get checked in without build peer review (perhaps we should change that), I'm somewhat worried about what abuse may follow from this new freedom. We've already seen gross hacks and I worry this will just breed the culture of more gross hacks.

I also worry about the impact this will have on local developer builds. We've lived without this hack for so long and nobody is complaining. I'd feel much better if this hack were only enabled where it is required. As a mitigation against adverse effects, can we put the inclusion of mozconfig.mk behind an ifdef in rules.mk and only set that variable from release automation? I'll grant r+ once that question is answered (hopefully a yes and accompanied with a corresponding patch).
Attachment #797024 - Flags: review?(gps) → feedback+
(In reply to Gregory Szorc [:gps] from comment #32)
> Comment on attachment 797024 [details] [diff] [review]
> 0001-Bug-903118-set-mozconfig-environment-variables-for-s.patch
> 
> Review of attachment 797024 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> This patch is technically correct. But it terrifies me.
> 
> AFAICT this patch effectively amounts to a giant hack to work around badness
> elsewhere. Furthermore, it gives the automation configs a "back door" to
> "monkeypatch" the build system config. Since these automation configs
> frequently get checked in without build peer review (perhaps we should
> change that), I'm somewhat worried about what abuse may follow from this new
> freedom. We've already seen gross hacks and I worry this will just breed the
> culture of more gross hacks.

There is no new freedom from this patch. The only badness it works around is the fact that mozconfig is sourced for environment variables from a top-level make, but not a subdirectory make.

If you want to remove the ability to specify the environment in a mozconfig, I think there should be a separate bug & discussion around that.

> 
> I also worry about the impact this will have on local developer builds.
> We've lived without this hack for so long and nobody is complaining. I'd
> feel much better if this hack were only enabled where it is required. As a
> mitigation against adverse effects, can we put the inclusion of mozconfig.mk
> behind an ifdef in rules.mk and only set that variable from release
> automation? I'll grant r+ once that question is answered (hopefully a yes
> and accompanied with a corresponding patch).

It is required anytime:

1) Environment variables are defined in mozconfig, and
2) Make is executed in a subdirectory

I see no reason why we'd want sub-directory makes not to source the mozconfig for environment variables when a top-level make does.

In the tree right now, client.mk already sources mozconfig for environment variables. This patch doesn't grant any extra capabilities that aren't already there.
I would argue that the goal is for the entire configuration to be encapsulated at configure time and then any subsequent |make| (not going through client.mk) will get that "build config." Any modification to the environment variables (with few execptions - for things like controlling build output) should have no impact on the build.

We are currently so far from this ideal. It's probably not worth worrying about today. Let's land this and fix it later.
Attachment #797024 - Flags: feedback+ → review+
https://hg.mozilla.org/mozilla-central/rev/6fe5a446b775
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla26
This was backed out and re-fixed, differently, in bug 911634
https://hg.mozilla.org/integration/mozilla-inbound/rev/88f62538d582
Resolution: FIXED → DUPLICATE
Product: Core → Firefox Build System
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: