Closed Bug 1508828 Opened 11 months ago Closed 9 months ago

Stand up webrender CI on Firefox CI (Windows edition)

Categories

(Core :: Graphics: WebRender, defect, P2)

Other Branch
Unspecified
Windows
defect

RESOLVED FIXED
mozilla66
Tracking Status
firefox66 --- fixed

People

(Reporter: kats, Assigned: kats)

References

(Depends on 1 open bug, Blocks 1 open bug)

Details

(Whiteboard: [gfx-noted])

Attachments

(1 file)

+++ This bug was initially created as a clone of Bug #1507884 +++

Same deal as bug 1507884, but for windows
Depends on: 1503756
Priority: -- → P2
Quick update here: I've been plugging away on this for a while now [1] and making slow but steady progress. The latest attempt is at https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&revision=e1f74b71df028448ce187709119764a5e108fcef. It has the taskcluster job running the windows test commands (after a nightmare of a time ensuring msvc is set up, because rustc requires link.exe to link), but it's now failing because we need cmake for servo-freetype-sys.

[1] https://treeherder.mozilla.org/#/jobs?repo=try&author=kgupta%40mozilla.com&group_state=expanded&collapsedPushes=412356%2C412400%2C412411%2C411827&fromchange=88a485107318a9a13f41d0050415a384d5c42a94&tochange=e1f74b71df028448ce187709119764a5e108fcef
Looks like the cmake 3.6.2 repack that's currently on tooltool isn't new enough to generate stuff for Visual Studio 2017. I tried using a newer cmake and that worked better, but it couldn't actually find the Visual Studio "installation" (probably because it's looking for registry keys or some such, while all we have is a tarball from tooltool). I'm not entirely sure where to go from here; the cmake documentation doesn't say much about how to tell it where to find Visual Studio.
A couple of possible options:
1) Disable the pathfinder build, which should eliminate the servo-freetype-sys build and avoid this whole problem. That will at least get me unblocked to whatever the next problem is, and we can circle back to the pathfinder build later.
2) Try to have cmake generate a different build (e.g. ninja + cl.exe instead of VisualStudio + cl.exe) that we can actually set up and execute. Right now the cmake-rs crate assumes that if the rust target is an msvc target, then we must want to use VisualStudio to do the build. That isn't necessarily true: it should be possible to use a different build system with the cl.exe compiler and still generate msvc ABI binaries. So removing the cmake-rs assumption and allowing cmake to fall back to some other build system should in theory work.
With the pathfinder stuff removed, things seem to generally work. There are some reftest failures though, with what looks like fuzzable differences:

https://mozilla.staktrace.com/tmp/wr-win-failures.xhtml#logurl=https://taskcluster-artifacts.net/agYTDNJtQqewArP6guRS3Q/0/public/logs/live_backing.log
Using ninja+MSVC seems to work. But to do that we'll need a patch to either cmake-rs or servo-freetype-sys. I've filed an issue to figure out which is better. Now it's just the reftests that need to be dealt with, I think.
Switching to a non-GPU worker (and using the PowerShell script to set the screen resolution to what we use in appveyor) seems to fix the reftest differences.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=fb41ba8f14b45ff1bfa33770dc3741428740fe4a
Unfortunately after rebasing I'm getting intermittent windows build failures for the pathfinder build:

https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&revision=215d0bffbf5043ff8cbaacfc8c40493ee8230d38

link.exe sometimes fails to find advapi32.dll. All the environment paths and files should be the exact same between passing and failing runs, so I'm not really sure what's going on here.
Even doing a `cargo clean` before the pathfinder build doesn't help:

https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&revision=d4e0a787a3c839da0388a1d39477e44ce9ddd786

And just to make sure I'm not crazy I did some retriggers on a push from a few days ago:

https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&revision=710ca37dd47f7de3f9a5b1b36bb618432487fba1

and that's all green. I could just be lucky, or something could have regressed. I think we moved from rustc 1.30 to 1.31 in that interval, which might be related.
Interesting: the try push at https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&revision=0536b7cf70d715c8b8d84612f207fd3ae6f0fb74 shows that in passing runs the LIB environment variable has Windows paths, but in failing runs it has msys-style paths:

Passing:
LIB=z:\task_1545096103\build\src\vs2017_15.8.4\VC\lib\x64;z:\task_1545096103\build\src\vs2017_15.8.4\VC\atlmfc\lib\x64;z:\task_1545096103\build\src\vs2017_15.8.4\SDK\Lib\10.0.17134.0\ucrt\x64;z:\task_1545096103\build\src\vs2017_15.8.4\SDK\Lib\10.0.17134.0\um\x64;z:\task_1545096103\build\src\vs2017_15.8.4\DIA SDK\lib\amd64

Failing:
LIB=/z/task_1545096211/build/src/vs2017_15.8.4/VC/lib/x64:/z/task_1545096211/build/src/vs2017_15.8.4/VC/atlmfc/lib/x64:/z/task_1545096211/build/src/vs2017_15.8.4/SDK/Lib/10.0.17134.0/ucrt/x64:/z/task_1545096211/build/src/vs2017_15.8.4/SDK/Lib/10.0.17134.0/um/x64:/z/task_1545096211/build/src/vs2017_15.8.4/DIA SDK/lib/amd64

AFAICT the last `export LIB=` that we actually run sets the msys path, so it's not clear to me what's doing the conversion into Windows paths, or why it doesn't happen every time. Must be a race condition of some sort.
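
The difference between the two LIB values above can be detected mechanically, which might help surface the problem before link.exe runs. The following is only a sketch: the helper name `is_msys_style` is hypothetical and not part of the task scripts, and it only inspects the first entry of the list.

```shell
# Hypothetical helper (not from the actual patch): succeeds when the value
# starts with an msys-style drive prefix such as /z/, fails otherwise.
is_msys_style() {
    case "$1" in
        /[A-Za-z]/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example guard before invoking the build: warn when LIB is still in the
# msys form seen in the failing runs.
if is_msys_style "${LIB:-}"; then
    echo "warning: LIB is in msys format: $LIB" >&2
fi
```

A guard like this would not explain the nondeterminism, but it would make failing runs easier to spot in the log.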
Still happening: https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&revision=ffe999eaf3f97194f37e2d163741c1b5d575d48d

:ahal, do you have any insight here? I'm at a bit of a loss as to where to look. In a nutshell: I have a task that I'm trying to run on windows via run-task, the meat of which can be seen at [1]. What's happening is that when it gets to running the cmd.exe lines towards the end, the LIB environment variable (and INCLUDE) is sometimes of the form z:\... and sometimes of the form /z/... (see try push logs and search for "LIB is" if you want the full thing). When it's in the windows format it passes and when it's in the other format the linker can't find the DLL it needs and the build job fails.

I am doing some dubious stuff via the setup-msvc-env script, which I copied from [2]. That's the most likely source of problems, but I don't see why it would be non-deterministic like this.

:glandium, also if you have any feedback on this approach or suggestions for alternatives that would bypass this problem I'm all ears.

(Also this is not super urgent, so don't let it interrupt any PTO/holiday stuff. I just wanted to write it down while it's fresh)

[1] https://hg.mozilla.org/try/rev/5c16310b793587e3a1cae9c3011a06f6fdc1550f#l2.99
[2] https://searchfox.org/mozilla-central/rev/9528360768d81b1fc84258b5fb3601b5d4f40076/js/src/devtools/automation/winbuildenv.sh#1-6
Flags: needinfo?(mh+mozilla)
Flags: needinfo?(ahal)
That's pretty strange! The only thing I can think of is that it must have something to do with the state of the hosts, e.g. maybe some of them have an artifact lying around from a previous task that is somehow causing a different code path to get hit.

Since your modified VSPATH shows up in the LIB env, it looks like this must be where that env is getting set:
https://searchfox.org/mozilla-central/source/taskcluster/scripts/misc/build-gn-win32.sh#13

I'm not familiar with how this stuff works, but maybe glandium will have a bit of extra insight.
Flags: needinfo?(ahal)
Oh wait, I guess it's here:
https://searchfox.org/mozilla-central/source/build/win64/mozconfig.vs2017#18

That mk_export_correct_style LIB at the bottom looks suspicious.
So the mk_export_correct_style is at [1] and invokes mk_add_options, which I've gutted via [2], so that should effectively be a no-op. It should really just set the env var from the line you linked and that's it. If mk_export_correct_style is responsible, I still don't see why it's nondeterministic. :(

[1] https://searchfox.org/mozilla-central/rev/9528360768d81b1fc84258b5fb3601b5d4f40076/build/mozconfig.vs-common#3
[2] https://hg.mozilla.org/try/rev/5c16310b793587e3a1cae9c3011a06f6fdc1550f#l3.8
This is some kind of madness. In a fresh local mozilla build shell:

MozillaBuild Install Directory: C:\mozilla-build\
kats@kgupta-win ~$ export LIB=$PWD
kats@kgupta-win ~$ echo $LIB; cmd /c 'echo %LIB%'; cmd.exe /c 'echo %LIB%'
/c/Users/kats
/c/Users/kats
c:/Users/kats
kats@kgupta-win ~$ echo $SHELL
/bin/sh
kats@kgupta-win ~$ bash  # nested shell
kats@kgupta-win ~$ echo $SHELL
/bin/sh
kats@kgupta-win ~$ echo $LIB; cmd /c 'echo %LIB%'; cmd.exe /c 'echo %LIB%'
/c/Users/kats
/c/Users/kats
/c/Users/kats
kats@kgupta-win ~$ exit
exit
kats@kgupta-win ~$ echo $LIB; cmd /c 'echo %LIB%'; cmd.exe /c 'echo %LIB%'
/c/Users/kats
/c/Users/kats
c:/Users/kats
kats@kgupta-win ~$ export LIB=$PWD:$PWD/foo
kats@kgupta-win ~$ echo $LIB; cmd /c 'echo %LIB%'; cmd.exe /c 'echo %LIB%'
/c/Users/kats:/c/Users/kats/foo
/c/Users/kats:/c/Users/kats/foo
c:\Users\kats;c:\Users\kats\foo
kats@kgupta-win ~$ bash  # nested shell
kats@kgupta-win ~$ echo $LIB; cmd /c 'echo %LIB%'; cmd.exe /c 'echo %LIB%'
/c/Users/kats:/c/Users/kats/foo
/c/Users/kats:/c/Users/kats/foo
/c/Users/kats:/c/Users/kats/foo
kats@kgupta-win ~$


AFAICT the rules here seem to be:
a) The local bash variable is always fine
b) Echoing via `cmd` gives the same result as (a). Note that `cmd` is a wrapper that just runs "$COMSPEC" "$@"
c) Echoing via `cmd.exe` converts to c:/ format (forward slashes), except if there are multiple paths, in which case it converts to c:\ format (backslashes)
d) In a nested bash shell `cmd.exe` produces the same result as `cmd`

Further experimentation shows that the PATH variable doesn't follow rule (b) and always emits c:/ format (forward slashes) via `cmd`. In nested bash shells, `cmd.exe` also emits forward slashes for %PATH%.
Welcome to msys hell.
Flags: needinfo?(mh+mozilla)
Indeed.

As far as I can tell from digging around in the source, there are 6 environment variables that are supposed to undergo automatic conversion [1]: PATH, HOME, LD_LIBRARY_PATH, TMP, TEMP, and TMPDIR. In Cygwin this works as described. In msys (which AFAICT is a fork of cygwin, but with useful stuff removed and bugs sprinkled liberally) this does not work as described, and for some reason works as described in comment 15 instead.

So my options here are:
1) Skip the pathfinder build, but that's really just kicking the can down the road, and we'll have to deal with this again later
2) Hard-code the path in Windows format in my task description
3) Generate the path in Unix format and convert it myself to Windows format either using a regex or some other mechanism
4) ???

[1] https://sourceforge.net/p/mingw/msys-runtime/ci/MSYS-1_0_19/tree/winsup/cygwin/environ.cc#l61
I tried making a toolchain out of cygpath.exe and cygwin1.dll from a cygwin installation and using that to do path conversion, but it doesn't work quite right. It converts something like "/z/foo/bar:/z/baz" into "z;Z:\foo\bar;z;Z:\baz" - while that does technically work it seems like it might be a footgun later, and maybe mixing cygwin stuff with msys isn't the best idea.

I did get it working with sed to do the path conversion. Will clean up the patches and do retriggers to ensure it's green before posting for review.
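
For illustration, a sed-based conversion along these lines could look like the sketch below. This is not the expression from the actual patch, which may differ; the function name `msys_to_win_pathlist` is hypothetical, and the sketch assumes every entry in the input is an msys-style absolute path of the form /<drive>/... with no Windows-style entries mixed in.

```shell
# Hypothetical helper (the sed expression in the real patch may differ).
# Converts a ':'-separated msys path list such as /z/foo/bar:/z/baz into a
# ';'-separated Windows path list.
msys_to_win_pathlist() {
    printf '%s\n' "$1" | sed \
        -e 's|:|;|g' \
        -e 's|^/\([A-Za-z]\)/|\1:\\|' \
        -e 's|;/\([A-Za-z]\)/|;\1:\\|g' \
        -e 's|/|\\|g'
}

msys_to_win_pathlist '/z/foo/bar:/z/baz'   # prints z:\foo\bar;z:\baz
```

The separators are rewritten first so that the colon introduced by each drive letter is not mistaken for a list separator. An input that already contains Windows-style entries would be mangled by that first step, so the sketch should only be applied to values known to be in msys format.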
Pushed by kgupta@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/507fae6b3eb4
Add a task to run standalone WebRender CI scripts on Windows. r=glandium,jrmuizel
Status: NEW → RESOLVED
Closed: 9 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla66