Closed Bug 1355464 Opened 3 years ago Closed 3 years ago
Stylo build broken on Windows
In bug 1350001 I made the `cargo build` steps of gecko run in a "clean environment" when being run from a MozillaBuild shell. This was desirable to fix cross-compilation build failures (32-bit builds on 64-bit machines) which resulted from cargo trying to use the 32-bit MSVC env vars set in vcvars.bat to do a 64-bit "host" compile for build.rs files. The clean environment removed the env vars set by vcvars.bat and allowed cargo do use its built-in magic and fixed the problem. However the clean environment also removes all the other env vars - some of which, it turns out, are needed for the stylo build. So now doing a stylo build from a MozillaBuild shell doesn't work. To reproduce one variant of the problem, put this in your mozconfig: ac_add_options --enable-stylo ac_add_options --disable-stylo-build-bindgen ac_add_options --target=x86_64-pc-mingw32 ac_add_options --host=x86_64-pc-mingw32 and do a build. This fails because it can't find the python executable. I had a patch to fix this in attachment 8856510 [details] [diff] [review]. However, there is another variant of the problem, when you try to do a stylo build with bindgen (I haven't tried to reproduce this yet, but presumably you need to remove the --disable-stylo-build-bindgen above). In this scenario, it fails because "the Stylo bindgen build script crashed because we are resetting some environment that means it fails to find certain Win 10 SDK files. It seems like we actually depend on the environment setup by vcvars." (says Brian, bug 1350001 comment 28).
I grabbed a dump of all the environment variables on my system using `env -i` at the point that the cargo library build runs, and am attaching them here. Ideally we can carve out a subset of these that we either leave in or take out when doing a cargo build such that it will allow both the webrender and the stylo builds to work. We know that leaving everything in will fix stylo and break webrender, and taking everything out will fix webrender and break stylo.
I wonder, if vcvars.bat already says that it should compile to 32bit, why does Rust decide to do something different? It sounds like something should be fixed by adding argument to cargo to specify the toolchain rather than blindly striping all the variables. Maybe we just need variables like INCLUDE and maybe LIBPATH, but the solution in bug 1350001 really sounds like a terrible hack, especially given that we rely on passing environment variables in to affect the build script in several other ways, e.g. we sometimes pass STYLO_BUILD_LOG var to enable logging of bindgen to analyse issues there. Adding all those variables into a whitelist and only for Windows doesn't seem to be something sensible...
(In reply to Xidorn Quan [:xidorn] UTC+10 (less responsive 15/Apr-3/May) from comment #2) > I wonder, if vcvars.bat already says that it should compile to 32bit, why > does Rust decide to do something different? It sounds like something should > be fixed by adding argument to cargo to specify the toolchain rather than > blindly striping all the variables. If we can get cargo to fix this problem upstream that would be great. However it will still take time to reach a stable version of Rust/cargo, so we need a workaround in our build system regardless. I'm doing a build now with bindgen enabled to try and reproduce this problem.
For the record the exact build failure I'm getting is that clang (when running part of the cargo build) can't find corecrt.h. This is #include'd from the MSVC header, but actually lives in the Program Files (x86)\Windows Kits\... dir. That dir is in the INCLUDE environment variable that gets stripped out. I think the way forward is to just clear out PATH, LIB, and LIBPATH when doing the cargo build (and add PYTHON). That seems to work for me when doing both stylo and webrender builds. I looked at the various vcvars.bat files and those are the three variables that are different when doing 32-bit vs 64-bit builds, and so those are the variables that we need to clear out at minimum to fix bug 1350001.
Comment on attachment 8857059 [details] Bug 1355464 - Only clean out the environment variables that are affected by 32- vs 64-bit builds. Please give this one a whirl and confirm it fixes the problem for you as well. Thanks!
So, the core problem is that cargo doesn't allow to tell it what build.rs's target is. Has anyone filed a bug against cargo for that? (resetting PATH, LIB and LIBPATH seems like a terrible workaround)
I filed https://github.com/rust-lang/cargo/issues/3915 against cargo.
See Also: → https://github.com/rust-lang/cargo/issues/3915
Comment on attachment 8857059 [details] Bug 1355464 - Only clean out the environment variables that are affected by 32- vs 64-bit builds. This patch works for me, so I'd f+ it, although I still suspect whether it is the right way to do so... Probably without proper support from cargo, we have to have some (even terrible) workaround...
Attachment #8857059 - Flags: feedback?(xidorn+moz) → feedback+
Comment on attachment 8857059 [details] Bug 1355464 - Only clean out the environment variables that are affected by 32- vs 64-bit builds. Clearing feedback request since I don't have anything to add to Xidorn's comment.
Comment on attachment 8857059 [details] Bug 1355464 - Only clean out the environment variables that are affected by 32- vs 64-bit builds. https://reviewboard.mozilla.org/r/128964/#review135864 It'd be nice to figure out a more sensible way to interact with cargo here, but I think this our least-worst option currently.
Attachment #8857059 - Flags: review?(ted) → review+
Pushed by email@example.com: https://hg.mozilla.org/integration/autoland/rev/6266e52aa3a7 Only clean out the environment variables that are affected by 32- vs 64-bit builds. r=ted
You need to log in before you can comment on or make changes to this bug.