Get win32/win64 debug/opt Firefox builds working on Try with TaskCluster using official mozconfigs

RESOLVED FIXED

Status

Release Engineering
General Automation
3 years ago
a year ago

People

(Reporter: coop, Assigned: grenade)

Tracking

(Blocks: 1 bug)

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [bb2tc])

Attachments

(1 attachment, 1 obsolete attachment)

(Reporter)

Description

3 years ago
Now that we have a basic generic worker to work with (bug 1147206), we should take the next step and implement Windows build in try using the generic worker.

Some permutations to consider here:
* 32-bit vs 64-bit
* PGO
* do we use hardware machines or the new AWS Windows instances?
(Reporter)

Updated

3 years ago
Depends on: 1180843
In bug 1180190 I'm in the process of making some screencasts to step you through the process for this.

The first video is ready. Unfortunately the quality is rather low, and I need to work out how to fix that. It should be good enough for understanding, though:

https://youtu.be/cez1uGY5u8A

This first video shows you how to submit tasks against the generic worker.

The next video (not yet uploaded) steps you through creating a Windows environment in AWS, capturing an AMI from it, then creating a worker type to use this AMI, and finally triggering a task to run against the newly created worker type.
Depends on: 1180190
I had a chat with Pete about this, he showed me around the generic worker and gave me a fairly thorough live demo. Based on our chat, it seems like the initial steps are roughly as follows:

(1) Take one of the AMIs used by Releng for Windows builds in AWS (even if they aren't "production" quality yet).
(2) Modify this by hand to add/remove whatever tools necessary.
(3) Figure out the commands needed to run the build by tinkering around manually.
(4) Create an AMI for a new WindowsBuild worker type or similar based on step (2)
(5) Create a task payload based on step (2), (3), (4).
(6) Attempt to launch worker and build on windows.
(7) Success.
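As a rough illustration of step (5), a task definition for such a worker type could be assembled as below. This is only a sketch: the worker type name, the provisioner id, the command string and the metadata values are all placeholders chosen for illustration, and the exact payload schema accepted by the generic worker should be checked against its documentation.

```python
import json
from datetime import datetime, timedelta, timezone


def make_task(worker_type, command, max_run_time=7200):
    """Build a task definition dict for a (hypothetical) Windows build worker type."""
    now = datetime.now(timezone.utc)
    return {
        "provisionerId": "aws-provisioner-v1",  # assumption: provisioner name
        "workerType": worker_type,              # from step (4); placeholder below
        "created": now.isoformat(),
        "deadline": (now + timedelta(hours=4)).isoformat(),
        "payload": {
            "command": command,          # commands worked out by hand in step (3)
            "maxRunTime": max_run_time,  # seconds
        },
        "metadata": {
            "name": "Windows build experiment",
            "description": "Manual test of a Windows build on the generic worker",
            "owner": "nobody@example.com",    # placeholder
            "source": "https://example.com/",  # placeholder
        },
    }


task = make_task("windows-build-example", ["echo build commands go here"])
print(json.dumps(task, indent=2))
```

Keeping the payload in a small function like this makes it easy to iterate on the command list while the AMI from step (2) is still changing.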

Initially the approach will be to use as much of the existing environment as is available and make modifications to permissions, tools, system settings only if and when necessary.
I published the screencasts to: http://docs.taskcluster.net/presentations

This is essentially the content that we went through together anyway.
Depends on: 1182335
No longer depends on: 1182335
Depends on: 1182335
(Reporter)

Updated

3 years ago
Blocks: 1186437
(Reporter)

Updated

3 years ago
Assignee: ffledgling → amiyaguchi
Depends on: 1189597
Hi Anthony,

I'm keen to get some windows jobs moved over - would we be able to meet up and work out how we can split the work up? I'm based in Germany, so Pacific coast early mornings would be my late afternoon/evening, or middle-of-the-night Pacific would be my mornings!

I'm also happy to walk you through any taskcluster stuff if it helps.

Thanks!
Pete
Depends on: 1194798
I've also done some work on this. I have a basic Firefox desktop build working in taskcluster, see:
https://tools.taskcluster.net/task-inspector/#EqhjrgOhSSWu5kfd7VmUcg/

This successfully published the firefox.exe.

I set up a new worker type (win2012r2) which is based on Windows Server 2012 R2, with Visual Studio Community Edition 2013, MozillaBuild 2.0.0, Windows SDK 8.1.

Work still to do:

  * support object directory caching
  * vcs cache for gecko (tc-vcs/hgtool.py/some mercurial extension/upgrade -- need to work out the details)
  * writing the powershell script to generate the AMI used in the worker type (and publishing it back to gecko-tree for transparency)
  * publishing binary artifacts to tooltool, supporting
  * optimising instance type / specs
  * integrating in-tree in gecko under /testing/taskcluster

I'll create sub bugs for all these things.
I've been meaning to update this issue too, but I haven't been devoting much time to it recently after banging my head a bit too much with it. I haven't made as much progress as Pete (awesome work :D), but I don't want to have absolutely nothing to show for it.

In any case, with the image I got from the relops (spot-y-2008-2015-08-13-08-57, a try image), I've managed to get closer to a working builder. I'm still unable to build anything with complete success. 

I'll outline a few of the workarounds I've had to make in order to get a semi-working build.

- Update mozilla-build from 1.9 to 1.10
This resolves a puzzling issue where `mach configure` (and in turn `mach build`) fails to read the nightly mozconfig pulled from `browser/config/mozconfigs/win32/nightly` during a mozharness build. The error looks like the following when run with `start-shell-msvc-2013.bat`. 

    0:07.47 client.mk:117: *** missing separator (did you mean TAB instead of 8 spaces?).  Stop.

When started with `start-shell-l10n.bat`, configuration succeeds, but compilation fails since none of the compilers have been added to the path.

- Disable remote sccache (to a certain extent)
This involves setting `SCCACHE_DIR=` to a local directory, otherwise boto will fail due to authorization errors. This is probably due to the lack of a .boto with valid keys, which can be solved later.
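A minimal sketch of the sccache workaround, assuming the build is launched from a Python wrapper: it copies the environment and points `SCCACHE_DIR` at a local directory. The helper name and the directory are made up for illustration; only the `SCCACHE_DIR` variable comes from the description above.

```python
import os
import tempfile


def local_sccache_env(cache_dir, base_env=None):
    """Return a copy of the environment with SCCACHE_DIR set to a local directory."""
    env = dict(os.environ if base_env is None else base_env)
    os.makedirs(cache_dir, exist_ok=True)  # make sure the cache directory exists
    env["SCCACHE_DIR"] = cache_dir
    return env


# Hypothetical local cache location under the system temp directory.
cache_dir = os.path.join(tempfile.gettempdir(), "sccache-local-example")
env = local_sccache_env(cache_dir, base_env={"PATH": os.environ.get("PATH", "")})
print(env["SCCACHE_DIR"])
```

The returned dict can be passed as `env=` to `subprocess` calls so the change does not leak into the parent process.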

So, not much gained from trying to build with mozharness with the existing configs.
Assignee: acmiyaguchi → nobody
Created attachment 8667281 [details] [diff] [review]
As this is getting so close now, attaching a patch with work in progress...

I'm at the point now where:

  * the builds are defined in-tree, triggered on try for both debug and opt, for win32 and win64 builds
  * I have code checked in in-tree for generating the windows worker type that the firefox desktop builds use
  * the builds are using the official mozconfigs
  * the mozconfigs that had hardcoded references to buildbot environment have been reworked
  * the builds are running pretty much to the end, probably just a couple more iterations to run through of fix/rerun to get them working 100%
  * for now, not using sccache (need to discuss if we can use existing releng buckets or create new s3 buckets in taskcluster aws account)
  * currently calling ./mach build directly rather than mozharness - when everything works 100% this way, we can look at adapting the buildbot steps in mozharness
  * currently artifacts are listed explicitly - probably we want the lists to be generated dynamically to ease future cross-platform changes
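On the last point, one possible way to generate the artifact list dynamically is to walk the build output directory and match file name patterns. The sketch below is an assumption about how that could look, not what the patch does; the function name, the patterns and the demo file names are illustrative only.

```python
import fnmatch
import os
import tempfile


def find_artifacts(dist_dir, patterns=("*.zip", "*.exe", "*.txt")):
    """Return sorted paths (relative to dist_dir, '/'-separated) matching any pattern."""
    found = []
    for root, _dirs, files in os.walk(dist_dir):
        for name in files:
            if any(fnmatch.fnmatch(name, p) for p in patterns):
                rel = os.path.relpath(os.path.join(root, name), dist_dir)
                found.append(rel.replace(os.sep, "/"))
    return sorted(found)


# Demo against a throwaway directory standing in for the real dist directory.
with tempfile.TemporaryDirectory() as dist:
    os.makedirs(os.path.join(dist, "bin"))
    for rel in ("bin/firefox.exe", "package.zip", "notes.md"):
        open(os.path.join(dist, *rel.split("/")), "w").close()
    artifacts = find_artifacts(dist)

print(artifacts)
```

Because the patterns are data rather than hard-coded paths, the same helper could serve other platforms by swapping the pattern tuple.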

Please note I'm attaching as work-in-progress, and will be updating as and when the last issues are ironed out.
Assignee: nobody → pmoore
Status: NEW → ASSIGNED
... and I've just pushed to https://treeherder.mozilla.org/#/jobs?repo=try&revision=d58ada3fad78 if you'd like to see what the builds look like at this point in time...
(In reply to Pete Moore [:pmoore][:pete] from comment #7)

>   * currently calling ./mach build directly rather than mozharness - when
> everything works 100% this way, we can look at adapting the buildbot steps
> in mozharness

As this may be a controversial point, I felt it warranted some further explanation, to avoid any misunderstandings.

The target build process may look like this:

taskcluster task definitions sit on top of mozharness, which sits on top of mach, which sits on top of the make build system, which is using the gnu build tools (such as autoconf) which run over msys, which call out to compiler toolchains....

Due to the number of layers here, it can be difficult to isolate problems. There are potential references to buildbot slave environment settings in mozharness, mach, and even the make build system. Therefore, it made sense to get things working with mach directly initially, to fix any assumptions that buildbot is being used, and then to step up to the next layer, mozharness, to fix assumptions in that layer.

In other words: the choice of using mach directly is simply to first iron out buildbot references from the build system, *before* tackling fixing the buildbot references in mozharness.

Some might argue we have enough layers, and mach should do the full job, and we shouldn't need mozharness. Regardless of the position on this discussion, it still makes sense to fix things lower down first, hence the attack strategy I adopted.
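To make the "mach directly" layer concrete, here is a small sketch of invoking `./mach build` with an explicit mozconfig, bypassing mozharness. The `MOZCONFIG` environment variable is what the build system reads to locate a mozconfig; the helper names and the checkout directory `src` are assumptions for illustration, and the mozconfig path is the one mentioned earlier in this bug.

```python
import os
import subprocess


def mach_build_command(src_dir, mozconfig, base_env=None):
    """Return (command, environment) for running mach directly in src_dir."""
    env = dict(os.environ if base_env is None else base_env)
    env["MOZCONFIG"] = os.path.join(src_dir, mozconfig)  # select the official mozconfig
    return ["./mach", "build"], env


def run_build(src_dir, mozconfig):
    """Run the build and return its exit code (not called in this sketch)."""
    cmd, env = mach_build_command(src_dir, mozconfig)
    return subprocess.call(cmd, cwd=src_dir, env=env)


cmd, env = mach_build_command(
    "src", "browser/config/mozconfigs/win32/nightly", base_env={}
)
print(cmd, env["MOZCONFIG"])
```

Separating command construction from execution keeps the layer easy to inspect when hunting for leftover buildbot assumptions.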
... and of course signing is not implemented, as there are a bunch of security matters to attend to before that can be added ...
Heading off for the evening now - just kicked this one off, hopefully it will be green. =)

https://treeherder.mozilla.org/#/jobs?repo=try&revision=e7c4d81dffd5
Getting things green with just mach is an eminently sensible place to start. Once you've got that green (and it sounds like you do!) it would be good to get them running in mozharness to match the buildbot builds, just for sanity of comparison.

The Linux/Mac Taskcluster builds use build.sh:
https://dxr.mozilla.org/mozilla-central/source/testing/docker/desktop-build/bin/build.sh
...which does most of its work in build-linux.sh:
https://dxr.mozilla.org/mozilla-central/source/testing/taskcluster/scripts/builder/build-linux.sh
(In reply to Pete Moore [:pmoore][:pete] from comment #7)
>   * for now, not using sccache (need to discuss if we can use existing
> releng buckets or create new s3 buckets in taskcluster aws account)

FYI, we're not using sccache for Taskcluster Linux builds yet either: bug 1187257
Thanks for this input Ted!

I'm going to be handing the ball over to Rob as I have some other priorities for Q4 2015 - so Rob I'd recommend taking a look at the links from Ted.

Now is quite a good time to hand over as things are green - so I think next steps are moving to mozharness, and then *lots* of testing to see how the builds hold out.

There may also be room for cutting down some of the config - I'm sure I'm setting more environment variables than are actually needed - so a little bit of cleanup will be useful to strip the tasks down to the bare minimum of needed config, and the worker type setup too...

I've blogged about how the work in this bug has been done here:

http://petemoore.github.io/general/taskcluster/2015/09/30/building-firefox-for-windows-on-try-using-taskcluster.html

I think I'll update the title of this bug to be about getting green try builds, and then we can create separate bugs about rolling it out, migrating to mozharness, etc.
Summary: Implement Windows builds using the generic worker → Get win32/win64 debug/opt Firefbox builds working on Try with TaskCluster using official mozconfigs
Summary: Get win32/win64 debug/opt Firefbox builds working on Try with TaskCluster using official mozconfigs → Get win32/win64 debug/opt Firefox builds working on Try with TaskCluster using official mozconfigs
Created attachment 8667948 [details] [diff] [review]
bug1180775_gecko_v2.patch
Attachment #8667281 - Attachment is obsolete: true
Attachment #8667948 - Flags: review?(ted)
Attachment #8667948 - Flags: review?(garndt)
Comment on attachment 8667948 [details] [diff] [review]
bug1180775_gecko_v2.patch

Review of attachment 8667948 [details] [diff] [review]:
-----------------------------------------------------------------

I glossed over some of the non-testing/taskcluster/* pieces as I'm not as familiar with them.  This looks good.  Since this touches the linux task definitions, it might be nice to kick off a task graph that includes all of those, just to make sure nothing was terribly broken (the changes are minor so I don't expect that).

::: testing/taskcluster/tasks/builds/firefox_windows_base.yml
@@ +164,5 @@
> +    # Rather then enforcing particular conventions we require that all build
> +    # tasks provide the "build" extra field to specify where the build and tests
> +    # files are located.
> +    locations:
> +      build: "src/{{object_dir}}/dist/bin/firefox.exe"

Hrm, I might be overlooking it, where do these values come from?
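For context on the `{{object_dir}}` placeholder, the sketch below shows how mustache-style substitution of such a value generally works. It does not answer where the patch defines `object_dir`; the `render` helper and the value `obj-example` are hypothetical.

```python
import re


def render(template, params):
    """Replace each {{name}} in template with params[name]; fail on unknown names."""
    def replace(match):
        key = match.group(1)
        if key not in params:
            raise KeyError("no value supplied for {{%s}}" % key)
        return str(params[key])

    return re.sub(r"\{\{\s*(\w+)\s*\}\}", replace, template)


location = render(
    "src/{{object_dir}}/dist/bin/firefox.exe",
    {"object_dir": "obj-example"},  # hypothetical value
)
print(location)
```

Raising on a missing key, rather than leaving the placeholder in place, surfaces an undefined parameter at task-generation time instead of at build time.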
Attachment #8667948 - Flags: review?(garndt) → review+
I've updated the worker type to run a *** cygwin ssh daemon *** and am regenerating ami etc - so a bit later I'll do another push to make sure everything still works.

(technically it is the open ssh daemon, ported to cygwin)

Having a decent ssh daemon should simplify troubleshooting especially on low bandwidth connections as we currently only have a graphical interface via RDP.

FTR, it was a case of adding the security group "ssh-only" and adding the following Powershell code to the userdata used for setting up the AMI:



$client = New-Object system.net.WebClient  # line already existed, but is needed, so shown here just for reference


# download cygwin
$client.DownloadFile("https://www.cygwin.com/setup-x86_64.exe", "C:\cygwin-setup-x86_64.exe")

# install cygwin
$p = Start-Process "C:\cygwin-setup-x86_64.exe" -ArgumentList "--quiet-mode --no-admin --no-startmenu --no-desktop --no-shortcuts --root C:\cygwin --site http://cygwin.mirror.constant.com -P openssh" -wait -NoNewWindow -PassThru -RedirectStandardOutput "C:\cygwin_install.log" -RedirectStandardError "C:\cygwin_install.err"

# open up firewall for ssh daemon
New-NetFirewallRule -DisplayName "Allow SSH inbound" -Direction Inbound -LocalPort 22 -Protocol TCP -Action Allow

# configure sshd
$p = Start-Process "C:\cygwin\bin\bash.exe" -ArgumentList "--login -c `"ssh-host-config -y -c 'ntsec mintty' -w '*********'`"" -wait -NoNewWindow -PassThru -RedirectStandardOutput "C:\cygrunsrv.log" -RedirectStandardError "C:\cygrunsrv.err"

# start sshd
$p = Start-Process "net" -ArgumentList "start sshd" -wait -NoNewWindow -PassThru -RedirectStandardOutput "C:\net_start_sshd.log" -RedirectStandardError "C:\net_start_sshd.err"

# download bash setup script
$client.DownloadFile("https://raw.githubusercontent.com/petemoore/myscrapbook/master/setup.sh", "C:\cygwin\home\Administrator\setup.sh")

# run bash setup script
$p = Start-Process "C:\cygwin\bin\bash.exe" -ArgumentList "--login -c 'chmod a+x setup.sh; ./setup.sh'" -wait -NoNewWindow -PassThru -RedirectStandardOutput "C:\Administrator_cygwin_setup.log" -RedirectStandardError "C:\Administrator_cygwin_setup.err"


Please note we should find a way to obscure the password of the cygwin user that runs the daemon - several options for doing this, just trying it out as a proof of concept at the moment, and making sure it doesn't interfere with msys regarding build of firefox.

Also note it downloads a bash script to execute after setting up the environment, so any further automation can be done in bash rather than powershell. This is also just a temporary solution - in the end this should be checked into tree, or published as a package, etc.
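Once the daemon is installed, a quick way to confirm it is reachable is to try a TCP connection to port 22 on the instance. The following is a small sketch of such a check; the helper name is made up, and the demo connects only to a throwaway local listener rather than to a real worker.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Demo: listen on an ephemeral local port and check that it is reachable.
listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen(1)
demo_port = listener.getsockname()[1]
reachable = port_is_open("127.0.0.1", demo_port)
listener.close()
print(reachable)
```

Against a real instance the call would be `port_is_open(instance_address, 22)`, which also exercises the "ssh-only" security group and the firewall rule.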
Some strange things going on - contacting the cygwin mailing list:
https://sourceware.org/ml/cygwin/2015-10/msg00036.html
Assignee: pmoore → rthijssen
Ted, if you like we can meet up (vidyo) and go over the patch together, which might help explaining some context etc. In isolation it might be trickier to review, as there are a few different parts to it. Let me know if that appeals. Thanks!
Flags: needinfo?(ted)
I'll look it over today and let you know if I need that.
Flags: needinfo?(ted)
(Assignee)

Updated

2 years ago
Depends on: 1244750
Comment on attachment 8667948 [details] [diff] [review]
bug1180775_gecko_v2.patch

It probably makes more sense now for this to be reviewed when :grenade has a new version, so no need to review this now.
Attachment #8667948 - Flags: review?(ted)

Updated

2 years ago
Blocks: 1228604
is there more work to do here?  I have taskcluster windows builds on try
(Assignee)

Comment 27

a year ago
yeah, this is ancient history
Status: ASSIGNED → RESOLVED
Last Resolved: a year ago
Resolution: --- → FIXED