Closed Bug 1311729 (wsl-build) Opened 3 years ago Closed 7 months ago

Try to get a Windows MSVC build working in WSL (Windows Subsystem for Linux)

Categories: Firefox Build System :: General, defect
Tracking: Not tracked
Status: RESOLVED WONTFIX
People: Reporter: ted, Unassigned
References: Depends on 1 open bug, Blocks 1 open bug

Microsoft announced that the latest version of WSL supports running native Windows binaries:
https://blogs.msdn.microsoft.com/wsl/2016/10/19/windows-and-ubuntu-interoperability/

There are some caveats, in that it doesn't pass environment variables, and it doesn't do msys-style path translation. I was able to compile and link a simple test program with MSVC inside a WSL shell by explicitly passing every path from $INCLUDE to the compiler with -I, and explicitly passing every path from $LIB to the linker as -LIBPATH:, like so:
luser@dello:/mnt/c/build$ python runcl.py -Fohello.obj -c hello.c
/mnt/c/Program Files (x86)/Microsoft Visual Studio 14.0/VC/bin/amd64/cl.exe -I C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\INCLUDE -I C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\ATLMFC\INCLUDE -I C:\Program Files (x86)\Windows Kits\10\include\10.0.10586.0\ucrt -I C:\Program Files (x86)\Windows Kits\NETFXSDK\4.6.1\include\um -I C:\Program Files (x86)\Windows Kits\10\include\10.0.10586.0\shared -I C:\Program Files (x86)\Windows Kits\10\include\10.0.10586.0\um -I C:\Program Files (x86)\Windows Kits\10\include\10.0.10586.0\winrt -Fohello.obj -c hello.c
Microsoft (R) C/C++ Optimizing Compiler Version 19.00.23918 for x64
Copyright (C) Microsoft Corporation.  All rights reserved.

hello.c
luser@dello:/mnt/c/build$ python runlink.py -OUT:hello.exe hello.obj
/mnt/c/Program Files (x86)/Microsoft Visual Studio 14.0/VC/bin/amd64/link.exe -LIBPATH:C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\LIB\amd64 -LIBPATH:C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\ATLMFC\LIB\amd64 -LIBPATH:C:\Program Files (x86)\Windows Kits\10\lib\10.0.10586.0\ucrt\x64 -LIBPATH:C:\Program Files (x86)\Windows Kits\NETFXSDK\4.6.1\lib\um\x64 -LIBPATH:C:\Program Files (x86)\Windows Kits\10\lib\10.0.10586.0\um\x64 -OUT:hello.exe hello.obj
Microsoft (R) Incremental Linker Version 14.00.23918.0
Copyright (C) Microsoft Corporation.  All rights reserved.

luser@dello:/mnt/c/build$ ./hello.exe
hello world
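
The runcl.py and runlink.py wrappers used above are not shown in this bug. A minimal sketch of the flag-building step they would need, assuming the INCLUDE and LIB values were captured from a Visual Studio command prompt as semicolon-separated strings (the function names here are hypothetical):

```python
def include_flags(include_value):
    """Turn a semicolon-separated INCLUDE value into ['-I', path, ...]."""
    flags = []
    for path in include_value.split(";"):
        path = path.strip()
        if path:  # skip empty entries such as a trailing ';'
            flags.extend(["-I", path])
    return flags


def libpath_flags(lib_value):
    """Turn a semicolon-separated LIB value into ['-LIBPATH:path', ...]."""
    return ["-LIBPATH:" + p.strip() for p in lib_value.split(";") if p.strip()]
```

A wrapper would then prepend these lists to the user's own arguments before launching cl.exe or link.exe, for example with subprocess.call.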

It should be possible to do a Windows build with MSVC now, but it'll be a little tricky. I tried just setting PATH='/mnt/c/Program Files (x86)/Microsoft Visual Studio 14.0/VC/bin/amd64/':$PATH, CC=cl.exe, CXX=cl.exe, and LD=link.exe, and configure bails pretty quickly:
# Pastebin snj78yml
 0:06.46 checking for the target C compiler... '/mnt/c/Program Files (x86)/Microsoft Visual Studio 14.0/VC/bin/amd64/cl.exe'
 0:06.48 checking whether the target C compiler can be used...
 0:06.48 DEBUG: <truncated - see config.log for full output>
 0:06.48 DEBUG: | #elif _WIN32 || __CYGWIN__
 0:06.49 DEBUG: | %KERNEL "WINNT"
 0:06.49 DEBUG: | #elif __NetBSD__
 0:06.49 DEBUG: | %KERNEL "NetBSD"
 0:06.49 DEBUG: | #elif __APPLE__
 0:06.49 DEBUG: | %KERNEL "Darwin"
 0:06.49 DEBUG: | #endif
 0:06.49 DEBUG: | #if _MSC_VER || __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
 0:06.49 DEBUG: | %ENDIANNESS "little"
 0:06.49 DEBUG: | #elif __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
 0:06.49 DEBUG: | %ENDIANNESS "big"
 0:06.49 DEBUG: | #endif
 0:06.49 DEBUG: Executing: `cl.exe -E /tmp/conftest.DWw80o.c`
 0:06.49 DEBUG: The command returned non-zero exit status 2.
 0:06.49 DEBUG: Its error output was:
 0:06.49 DEBUG: | Microsoft (R) C/C++ Optimizing Compiler Version 19.00.23918 for x64
 0:06.49 DEBUG: | Copyright (C) Microsoft Corporation.  All rights reserved.
 0:06.49 DEBUG: |
 0:06.49 DEBUG: | cl : Command line warning D9002 : ignoring unknown option '/tmp/conftest.DWw80o.c'
 0:06.49 DEBUG: | cl : Command line error D8003 : missing source filename
 0:06.49 ERROR: Command `cl.exe -E /tmp/conftest.DWw80o.c` failed with exit status 2.
 0:06.51 *** Fix above errors and then restart with\
 0:06.51                "/usr/bin/make -f client.mk build"
 0:06.51 make: *** [configure] Error 1

There are two problems here:
1) /tmp/ isn't a path that's accessible to native Windows programs. We'd have to point the temp path somewhere on /mnt/c/ (should be possible to just set TEMP).
2) We can't pass unix-style paths to Windows programs. We'd have to do some path translation. This could get pretty hairy since we'd have to do it *only* for Windows programs. WSL isn't lenient like msys where binaries will accept either style of path--WSL binaries are actual Linux binaries and only take unix paths, and Windows binaries need Windows paths.

The latter might be fixable. We should try and see.
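
To illustrate the kind of path translation problem 2 calls for, here is a rough sketch that maps a WSL drive-mount path to a Windows path. It assumes drives are mounted at /mnt/<letter>, as in the transcript above, and the function name is hypothetical. Paths outside /mnt/<letter> (such as /tmp) are rejected, since native Windows programs cannot see them:

```python
import re

# Matches /mnt/<single drive letter>, optionally followed by /rest/of/path.
_DRIVE_MOUNT = re.compile(r"^/mnt/([a-zA-Z])(?:/(.*))?$")


def to_windows_path(path):
    """Translate /mnt/c/foo/bar to C:\\foo\\bar; raise for other paths."""
    m = _DRIVE_MOUNT.match(path)
    if m is None:
        raise ValueError("not under /mnt/<drive>: %r" % path)
    drive, rest = m.group(1), m.group(2) or ""
    return "%s:\\%s" % (drive.upper(), rest.replace("/", "\\"))
```

The hard part described above remains: this translation would have to be applied only to arguments destined for Windows binaries, never to those passed to Linux binaries.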
Duplicate of this bug: 1352751
From Bas' bug:
> As of the Creators Update, which is available through the Insider Preview program and will be released in the coming weeks, WSL will be able to execute Win32 executables, including cl.exe. It seems that this should make it possible to run our build system in WSL while using win32 as a build target. This will probably offer a big speed improvement (at least the GCC build on WSL takes the same time for me as it does on native Linux), and will hopefully, long-term, convince more developers to work on Windows, as they'll be working in a shell they're comfortable with, with reasonable build times.

This is interesting! When WSL was first made available on the insider channel I built Firefox for Linux inside it and my build times weren't any better than they were building for Win32 outside of WSL. Do you have build timings handy on your machine?
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #2)
> From Bas' bug:
> > As of the Creators Update, which is available through the Insider Preview program and will be released in the coming weeks, WSL will be able to execute Win32 executables, including cl.exe. It seems that this should make it possible to run our build system in WSL while using win32 as a build target. This will probably offer a big speed improvement (at least the GCC build on WSL takes the same time for me as it does on native Linux), and will hopefully, long-term, convince more developers to work on Windows, as they'll be working in a shell they're comfortable with, with reasonable build times.
> 
> This is interesting! When WSL was first made available on the insider
> channel I built Firefox for Linux inside it and my build times weren't any
> better than they were building for Win32 outside of WSL. Do you have build
> timings handy on your machine?

I did, but I've reinstalled the preview build since then. (I too ran it just after it came out) I'll grab new times.
I think I only noted this in IRC, so I'll jot it here for posterity.

My experiments with WSL show that the pico processes they run in the NT kernel remove about half of the overhead that full win32 processes have on Windows compared to Linux. We know that a win32 process has substantial startup overhead compared to an ELF process on Linux. WSL makes about half that overhead go away.

Since so much of the Firefox build system runs many short-lived processes, merely switching to running make, Python, etc. from WSL should make the build significantly faster on Windows. Wall and CPU time will still be dominated by compilation. But for light or artifact builds, WSL could make a major difference.
(In reply to Gregory Szorc [:gps] from comment #4)
> I think I only noted this in IRC, so I'll jot it here for posterity.
> 
> My experiments with WSL show that the pico processes they run in the NT
> kernel remove about half of the overhead that full win32 processes have on
> Windows compared to Linux. We know that a win32 process has substantial
> startup overhead compared to an ELF process on Linux. WSL makes about half
> that overhead go away.
> 
> Since so much of the Firefox build system runs many short-lived
> processes, merely switching to running make, Python, etc. from WSL should
> make the build significantly faster on Windows. Wall and CPU time will still
> be dominated by compilation. But for light or artifact builds, WSL could
> make a major difference.

Yes, this matches my experience. Still working on good numbers, sadly. Add to this that cl.exe execution time seems to go down for me with VS2017, and that process overhead is still significant for builds on my 12-core machine, and I think we'll be able to improve a bunch of situations. Not to mention that having WSL as the build environment could help draw developers who prefer a good command line with tools they're comfortable with to the Windows OS, which I think could really benefit Firefox.
(for further reference for those somewhat uninitiated: https://blogs.msdn.microsoft.com/wsl/2016/05/23/pico-process-overview/)
Has anyone tried getting watchman/fsmonitor working? I was really hoping that WSL would alleviate the problem of |hg status| being slow on Windows, but actually it makes things much worse: 12s in WSL compared to 2s in Win32 (both on warm trees).
watchman running under WSL is what, emulated filesystem events derived from events occurring in the Windows kernel? I would not expect watchman/fsmonitor under WSL to be faster than native - either Linux or Windows.

Also, `hg st` completes in <250ms for me on Windows 10 with hg 4.1.2 and watchman 4.1.0. Not sure why it is taking 2s for you. There is a default watchman timeout of 2s in fsmonitor. Perhaps it is timing out and falling back to slow mode? Tack --profile on any hg command to see where it is spending time.
I upgraded to Windows 10 Creators Update today. The new cmd.exe with full support for 16 bit color and other terminal goodies is really nice. Ubuntu 16.04 seems to mostly just work in WSL.

This is so much nicer than MozillaBuild.

I dare say we want the future of building and contributing on Windows to revolve around WSL.

I think we should change configure so it detects WSL and automatically targets Windows and configures the MSVC toolchain. That way, once you get your WSL environment up and running, you just need to type `mach build` and you've basically replaced MozillaBuild.

But first, we need to make configure work with MSVC from WSL and then coerce the build itself into working. That's likely to be a long tail.

FWIW, if anyone wants to know how to detect WSL from Python, look for "Microsoft" in platform.uname(). At a lower level, look for "Microsoft" in /proc/version.
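
The detection described above could look roughly like this. It is only a sketch: the function name and its parameters are made up for illustration, and the injectable arguments exist just so the check can be exercised on a non-WSL machine.

```python
import platform


def is_wsl(uname_fields=None, proc_version_path="/proc/version"):
    """Return True if 'Microsoft' shows up in uname or /proc/version."""
    if uname_fields is None:
        uname_fields = platform.uname()
    # First check: any uname field (release, version, ...) mentions Microsoft.
    if any("Microsoft" in field for field in uname_fields):
        return True
    # Lower-level fallback: look at the kernel version string directly.
    try:
        with open(proc_version_path) as f:
            return "Microsoft" in f.read()
    except (IOError, OSError):
        return False
```

Calling is_wsl() with no arguments inspects the running system.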
The build system doesn't cope well when the target and host environments differ radically in their compiler, linker, and everything around them (flags, file suffixes, etc.). That's a long-standing issue, and one that I want to fix for various reasons including this one, or the fact that adding support for host shared libraries is currently a PITA. But it's going to take a while.
Mingw builds on linux kind of work because mingw is still GCC.
Fwiw, my hg st experience (not using Watchman, I don't know what that is :p), matched dmajor's, 3.6s in win32, 16.9s in WSL.
Here's some build timings I did on a fast machine. I'm not sure what's different from the previous time I tried, but as Ted mentioned, it doesn't seem to be faster:

Win32: Clobber: 17:05 Small change, mach build: 1:45, Small change, mach build binaries: 0:27
WSL: Clobber: 16:27 Small change, mach build: 2:45, Small change, mach build binaries: 0:59

What's interesting is that the processes do indeed seem to be faster. But WSL calls seem to interact poorly with the NTFS file system: an rm -rf, for example, is slower than a Windows delete.
WSL in Creators Update supports inotify. So watchman should work under WSL. But given the nature of what watchman is doing, I wouldn't at all be surprised if it were buggy under WSL.

Also, Creators Update supports creating symlinks without administrator privileges: you just need developer mode enabled. That could potentially be profound for e.g. test file installation.
Assignee: nobody → ted
Depends on: 1384557
I got everything in moz.configure working in WSL, but a few things in old-configure are breaking. I'm just going to either skip them (where they're not important) or move them to moz.configure since it'll be easier to make everything work properly there.
Here's my WIP so far:
https://hg.mozilla.org/users/tmielczarek_mozilla.com/mc/rev/7a75c33c06c380deefcf9c9a0a154df586b55656

I'm going to fix up my patch in bug 1384557 because old-configure is currently breaking on the showIncludes parsing check, and fixing that seems like more work than reviving that patch and doing it in moz.configure.
Summary: Try to get a Windows MSVC build working in WSL → Try to get a Windows MSVC build working in WSL (Windows Subsystem for Linux)
Depends on: 1396993
Depends on: 1397263
I got through all of configure in WSL, here's my current patch:
https://hg.mozilla.org/users/tmielczarek_mozilla.com/mc/rev/16e39d4e1d29

It depends on the patches in the bugs blocking this one, which are mostly just moving stuff from old-configure to moz.configure. Making compiler checks work in old-configure seems more painful and not worthwhile, I'd rather just move everything in the way into moz.configure.

Most of it is just fixing path translation in various places, but there were a few more invasive bits, like:
* Added additional checks in `check_build_environment` to verify that the srcdir/objdir are both on a Windows filesystem and not a Linux filesystem, since native Windows programs can't read or write files on the Linux filesystem: https://hg.mozilla.org/users/tmielczarek_mozilla.com/mc/rev/16e39d4e1d29#l2.47
* Changed host system detection to be WSL-aware and explicitly set the host to 'x86_64-pc-mingw32': https://hg.mozilla.org/users/tmielczarek_mozilla.com/mc/rev/16e39d4e1d29#l2.86 . I don't know if this is the *best* solution, but it makes things work.
* Changing `get_registry_values` to call reg.exe in WSL instead of using `_winreg`, since the Python we're using for the build is a Linux Python: https://hg.mozilla.org/users/tmielczarek_mozilla.com/mc/rev/16e39d4e1d29#l5.185
* Changing the -showIncludes prefix check to explicitly pass -I for all the paths we set in INCLUDES: https://hg.mozilla.org/users/tmielczarek_mozilla.com/mc/rev/16e39d4e1d29#l6.158 . We're going to need to do the same thing for actual compilation as well.
* Hacking mozpath.normsep to just always replace forward slashes with backslashes: https://hg.mozilla.org/users/tmielczarek_mozilla.com/mc/rev/16e39d4e1d29#l10.7
* Changing find_program to be able to locate binaries with an optional .exe extension in WSL: https://hg.mozilla.org/users/tmielczarek_mozilla.com/mc/rev/16e39d4e1d29#l5.122
* Skipped or removed a bunch of irrelevant tests in old-configure.
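
As an illustration of the find_program item in the list above, here is a rough sketch of a lookup that also tries a .exe suffix. This is not the code from the linked patch; the function name and signature are hypothetical.

```python
import os


def find_program_wsl(name, search_dirs):
    """Return the first match for name or name + '.exe', else None."""
    candidates = [name]
    # From WSL, Windows binaries must be invoked with their .exe suffix,
    # so try that spelling too when the caller did not supply it.
    if not name.lower().endswith(".exe"):
        candidates.append(name + ".exe")
    for directory in search_dirs:
        for candidate in candidates:
            full = os.path.join(directory, candidate)
            if os.path.isfile(full):
                return full
    return None
```

In practice search_dirs would come from splitting PATH on os.pathsep, so that asking for "cl" finds cl.exe in the MSVC bin directory.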

I'm open to any feedback anyone has on anything in this patch! Some parts of it feel very clunky, so if you have suggestions for better ways to fix things I'm all ears. One thing that I'm not super fond of are the definitions of `win_path` / `user_path`, which I cribbed from `normalize_path`: https://hg.mozilla.org/users/tmielczarek_mozilla.com/mc/rev/16e39d4e1d29#l5.70
In the meantime, I started working on things that are broken during the build itself. The first few things we hit are scripts for `GENERATED_FILES` that do things like invoke the preprocessor. I fixed the first one I hit ( https://dxr.mozilla.org/mozilla-central/source/js/src/builtin/embedjs.py ) without too much trouble by simply making it use relative paths when referring to source files. I suspect most of the other fixes are going to look very similar to that. I haven't actually hit any C++ compilation yet, but I know that's going to be broken for sure, since we currently rely on the compiler and linker picking up INCLUDE/LIB from the environment, and that doesn't work when running Windows binaries from WSL. When I did my initial testing of running MSVC in WSL I was able to get things working by just passing all of those paths as -I / -LIBPATH: arguments to the compiler and linker, respectively, so I don't think that will be hard to fix.
Depends on: 881446
bug 881446 comment 10 has some details on an issue I hit with midl. After some hacking I did find a viable workaround, albeit a little ugly: I added a `CONFIGURE_SUBST_FILE` to create a .bat file that sets INCLUDE/LIB/PATH to the values we need and runs a command in that environment. I can then run midl using that in WSL and it works. However, rather than figure out the awfulness needed to invoke cmd.exe properly from a Makefile I'm going to first port the MIDL execution to moz.build and add a Python script to execute it, and then add the WSL shim invocation on top of that.
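
For readers wondering what such a .bat shim might contain, here is a hedged sketch that generates one. The real file comes from a `CONFIGURE_SUBST_FILE` template that is not shown in this bug, so the function name and exact layout below are assumptions.

```python
def make_env_shim(include, lib, path):
    """Build the text of a .bat file that sets INCLUDE/LIB/PATH and then
    runs whatever command line it was given."""
    lines = [
        "@echo off",
        "set INCLUDE=%s" % include,
        "set LIB=%s" % lib,
        "set PATH=%s" % path,
        "%*",  # in a batch file, %* expands to all arguments passed to it
    ]
    # Batch files conventionally use CRLF line endings.
    return "\r\n".join(lines) + "\r\n"
```

The shim would be written somewhere under /mnt/c and run through cmd.exe, so that midl sees the environment it expects.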
Depends on: 1299959
Depends on: 1399870
Depends on: 1399877
Depends on: 1399878
Depends on: 1399882
Depends on: 1382182
So I got a full build working, with some caveats:
1) I've been hitting very frequent errors invoking the compiler. They look like:
/mnt/c/Program Files (x86)/Microsoft Visual Studio/2017/Community/VC/Tools/MSVC/14.10.25017/bin/HostX64/x64/cl.exe: Invalid argument

I hit a similar error invoking link.exe when I had screwed up the use of response files and the generated commandline was too long, but this seems intermittent. It will fail compiling some source file, but restarting the build will make it succeed. I tried compiling a single source file in a loop overnight and was not able to reproduce, so there must be some interaction with the rest of the build running.

2) There may be some other factors here, but when I got a build to succeed it was not fast: real    32m21.961s

I pushed my patches up here:
https://hg.mozilla.org/users/tmielczarek_mozilla.com/mc/log/7739c7ce5fe8

I'm on PTO next week but when I get back I'll try to get everything cleaned up and up for review.
> it was not fast: real    32m21.961s

... without saying how long it takes for a native windows build on the same machine with the same MSVC version, it's hard to tell how "not fast" that is.
There's a "Fall Creators Update" due out next month. Maybe it'll have some WSL fixes/enhancements? I wonder if it's worth testing on a preview build and/or just waiting for the release.
Duplicate of this bug: 1403826
(In reply to Mike Hommey [:glandium] from comment #21)
> > it was not fast: real    32m21.961s
> 
> ... without saying how long it takes for a native windows build on the same
> machine with the same MSVC version, it's hard to tell how "not fast" that is.

Sorry, this is on a Lenovo Thinkstation P710 with dual Intel(R) Xeon(R) CPU E5-2637 v4 @ 3.50GHz, 32GB ram, SSD running Windows 10 x64. When I first set this machine up I timed Windows builds in MozillaBuild at ~14-15 minutes.

It's possible that some of the slowness is due to quirks in my patch, of course. It might be useful to do some small benchmarks of parts of the build system to see how it compares for the WSL build vs. a build in MozillaBuild. I'm thinking things like `./mach build install-manifests`, building a single .cpp file, building just the rust bits, etc.
Product: Core → Firefox Build System
Depends on: 1446066
I rebased my patch queue and got everything working again, and it still doesn't look much better than the last time I tried it. My system has the Fall Creators Update, but the build is still slow. I'm still seeing intermittent compile failures as well, but they seem less frequent than they were before, so maybe something has improved there.
There are known major performance issues concerning i/o in WSL:
https://github.com/Microsoft/WSL/issues/873
My guess is that we're not going to see comparable build times until those are fixed.
(In reply to Ray Donnelly from comment #27)
> Or you could just use MSYS2 and things would run at native or near-native
> speed. Can anyone explain this decision?

Because getting the build to run in MSYS2 has already been tried, and people have decided the trouble of making that work is not worth it.  The promise of WSL is that the builds might actually be *faster* than native Windows, because of less process overhead, and that the build setup is much closer to our other platforms, so there are fewer special cases to worry about.  Neither of those things is true for MSYS2, AFAIK.
(While somewhat interesting, this discussion is off-topic for this bug. Since it looks like there is going to be a bunch of conversation around this topic, would you mind filing a new bug to discuss getting Firefox building under MSYS2?  Thanks!)
(In reply to James Teh [:Jamie] from comment #26)
> There are known major performance issues concerning i/o in WSL:
> https://github.com/Microsoft/WSL/issues/873
> My guess is that we're not going to see comparable build times until those
> are fixed.

Version 1803 seems to improve disk I/O performance. My Android build on WSL is faster than it was on version 1709.
Depends on: 1489211
Depends on: 1490054
Depends on: 1490130
Alias: wsl-build
Depends on: 1490463
Assignee: ted → nobody

I made several attempts at this and it just didn't work very well. I believe future effort should be directed at getting a full Linux->Windows cross compile working and supported, at which point running that build inside WSL should be trivial.

Status: NEW → RESOLVED
Closed: 7 months ago
Resolution: --- → WONTFIX