Closed
Bug 1180775
Opened 9 years ago
Closed 8 years ago
Get win32/win64 debug/opt Firefox builds working on Try with TaskCluster using official mozconfigs
Categories
(Release Engineering :: General, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: coop, Assigned: grenade)
References
Details
(Whiteboard: [bb2tc])
Attachments
(1 file, 1 obsolete file)
36.58 KB patch (garndt: review+)
Now that we have a basic generic worker to work with (bug 1147206), we should take the next step and implement Windows builds on Try using the generic worker.
Some permutations to consider here:
* 32-bit vs 64-bit
* PGO
* do we use hardware machines or the new AWS Windows instances?
Comment 1•9 years ago
In bug 1180190 I'm in the process of making some screencasts to step you through the process for this.
The first video is ready. Unfortunately the quality is rather low (I need to work out how to fix that), but it should be good enough for understanding:
https://youtu.be/cez1uGY5u8A
This first video shows you how to submit tasks against the generic worker.
The next video (not yet uploaded) steps you through creating a Windows environment in AWS, capturing an AMI from it, then creating a worker type to use this AMI, and finally triggering a task to run against the newly created worker type.
Depends on: 1180190
Comment 2•9 years ago
I had a chat with Pete about this, he showed me around the generic worker and gave me a fairly thorough live demo. Based on our chat, it seems like the initial steps are roughly as follows:
(1) Take one of the AMIs used by Releng for Windows builds in AWS (even if they aren't "production" quality yet).
(2) Modify this by hand to add/remove whatever tools necessary.
(3) Figure out the commands needed to run the build by tinkering around manually.
(4) Create an AMI for a new WindowsBuild worker type or similar based on step (2)
(5) Create a task payload based on steps (2), (3) and (4).
(6) Attempt to launch worker and build on windows.
007. Success.
Initially the approach will be to use as much of the existing environment as is available and make modifications to permissions, tools and system settings only if and when necessary.
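Steps (4) and (5) end with a task submitted against the new worker type. As a rough illustration, here is a minimal Python sketch of such a task definition. It assumes the Taskcluster task layout (`provisionerId`, `workerType`, `created`, `deadline`, `payload`, `metadata`) and a generic-worker payload with `command`, `env`, `maxRunTime` and `artifacts` fields; the exact schema should be checked against the generic-worker release actually deployed. The worker type name `win2012r2` comes from comment 5. The command names `checkout.cmd`/`build.cmd`, the metadata values and the timeouts are hypothetical placeholders.

```python
import datetime
import json


def iso_utc(dt):
    # Taskcluster timestamps are ISO 8601 strings; emit UTC with a trailing "Z"
    return dt.strftime("%Y-%m-%dT%H:%M:%S.000Z")


def make_build_task(worker_type, commands, artifact_paths, env=None, hours=24):
    """Return a task definition dict for a (hypothetical) Windows build.

    Field names are an assumption based on the Taskcluster task format and
    the generic-worker payload; verify them against the schema in use.
    """
    now = datetime.datetime.now(datetime.timezone.utc)
    deadline = now + datetime.timedelta(hours=hours)
    artifact_expiry = now + datetime.timedelta(days=14)
    return {
        "provisionerId": "aws-provisioner-v1",  # assumption: AWS-provisioned workers
        "workerType": worker_type,
        "created": iso_utc(now),
        "deadline": iso_utc(deadline),
        "payload": {
            # each entry is run as a separate command, in order
            "command": list(commands),
            "env": dict(env or {}),
            "maxRunTime": 7200,  # seconds; placeholder value
            "artifacts": [
                {"type": "file", "path": p, "expires": iso_utc(artifact_expiry)}
                for p in artifact_paths
            ],
        },
        "metadata": {
            "name": "Windows build (sketch)",
            "description": "Hypothetical example task",
            "owner": "nobody@example.com",
            "source": "https://example.com/hypothetical-source",
        },
    }


task = make_build_task(
    "win2012r2",
    ["checkout.cmd", "build.cmd"],  # hypothetical wrapper scripts
    ["public/build/firefox.exe"],
)
print(json.dumps(task, indent=2))
```

The artifact path is relative to the task's working directory, so the build step would need to copy `firefox.exe` into `public/build/` (again an assumption about the generic-worker conventions).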
Comment 3•9 years ago
I published the screencasts to: http://docs.taskcluster.net/presentations
This is essentially the content that we went through together anyway.
Reporter
Updated•9 years ago
Assignee: ffledgling → amiyaguchi
Comment 4•9 years ago
Hi Anthony,
I'm keen to get some windows jobs moved over - would we be able to meet up and work out how we can split the work up? I'm based in Germany, so Pacific coast early mornings would be my late afternoon/evening, or middle-of-the-night Pacific would be my mornings!
I'm also happy to walk you through any taskcluster stuff if it helps.
Thanks!
Pete
Comment 5•9 years ago
I've also done some work on this. I have a basic Firefox desktop build working in taskcluster, see:
https://tools.taskcluster.net/task-inspector/#EqhjrgOhSSWu5kfd7VmUcg/
This successfully published the firefox.exe.
I set up a new worker type (win2012r2) which is based on Windows Server 2012 R2, with Visual Studio Community Edition 2013, MozillaBuild 2.0.0, Windows SDK 8.1.
Work still to do:
* support object directory caching
* vcs cache for gecko (tc-vcs/hgtool.py/some mercurial extension/upgrade -- need to work out the details)
* writing the powershell script to generate the AMI used in the worker type (and publishing it back to gecko-tree for transparency)
* publishing binary artifacts to tooltool, supporting
* optimising instance type / specs
* integrating in-tree in gecko under /testing/taskcluster
I'll create sub bugs for all these things.
Comment 6•9 years ago
I've been meaning to update this issue too, but I haven't been devoting much time to it recently after banging my head against it a bit too much. I haven't made as much progress as Pete (awesome work :D), but I don't want to have absolutely nothing to show for it.
In any case, with the image I got from relops (spot-y-2008-2015-08-13-08-57, a try image), I've managed to get closer to a working builder, but I'm still unable to build anything with complete success.
I'll outline a few of the workarounds I've had to make in order to get a semi-working build.
- Update mozilla-build from 1.9 to 1.10
This resolves a puzzling issue where `mach configure` (and in turn `mach build`) fails to read the nightly mozconfig pulled from `browser/config/mozconfigs/win32/nightly` during a mozharness build. The error looks like the following when run with `start-shell-msvc-2013.bat`.
0:07.47 client.mk:117: *** missing separator (did you mean TAB instead of 8 spaces?). Stop.
When started with `start-shell-l10n.bat`, configuration succeeds, but compilation fails since none of the compilers have been added to the path.
- Disable remote sccache (to a certain extent)
This involves setting `SCCACHE_DIR` to a local directory; otherwise boto fails with authorization errors. This is probably due to the lack of a .boto file with valid keys, which can be sorted out later.
So, not much gained from trying to build with mozharness with the existing configs.
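The sccache workaround above amounts to handing the build a modified environment. A minimal Python sketch of that, assuming the sccache in use only goes to S3 (through boto) when `SCCACHE_BUCKET` is set and otherwise caches under `SCCACHE_DIR` on local disk; the `SCCACHE_BUCKET` behaviour and the cache path are assumptions, not taken from this bug:

```python
import os


def local_sccache_env(base_env, cache_dir):
    """Return a copy of base_env that points sccache at a local cache directory.

    Assumption: sccache only talks to S3 via boto when SCCACHE_BUCKET is set,
    and otherwise stores its cache under SCCACHE_DIR.
    """
    env = dict(base_env)  # copy, so the caller's environment is left untouched
    env.pop("SCCACHE_BUCKET", None)  # no bucket: no S3 access, no boto credentials needed
    env["SCCACHE_DIR"] = cache_dir
    return env


# hypothetical local cache location on the builder
build_env = local_sccache_env(os.environ, r"C:\sccache-local")
```

The returned dict can be passed as the `env` of whatever process launches the build.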
Updated•9 years ago
Assignee: acmiyaguchi → nobody
Updated•9 years ago
Blocks: q3-bb-tc-migration
Comment 7•9 years ago
I'm at the point now where:
* the builds are defined in-tree and triggered on try, for both debug and opt, on win32 and win64
* I have code checked in in-tree for generating the windows worker type that the firefox desktop builds use
* the builds are using the official mozconfigs
* the mozconfigs that had hardcoded references to buildbot environment have been reworked
* the builds are running pretty much to the end, probably just a couple more iterations to run through of fix/rerun to get them working 100%
* for now, not using sccache (need to discuss if we can use existing releng buckets or create new s3 buckets in taskcluster aws account)
* currently calling ./mach build directly rather than mozharness - when everything works 100% this way, we can look at adapting the buildbot steps in mozharness
* currently artifacts are listed explicitly - probably we want the lists to be generated dynamically to ease future cross-platform changes
Please note I'm attaching as work-in-progress, and will be updating as and when the last issues are ironed out.
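On the last point (explicit artifact lists), a minimal Python sketch of generating the list dynamically: walk the build's `dist` directory and keep every file matching a set of glob patterns. The patterns and file names used here are hypothetical; turning the resulting paths into payload entries is left out.

```python
import fnmatch
import os


def discover_artifacts(dist_dir, patterns):
    """Return '/'-separated paths, relative to dist_dir, of files matching any glob pattern."""
    found = []
    for root, _dirs, files in os.walk(dist_dir):
        for name in files:
            if any(fnmatch.fnmatch(name, pattern) for pattern in patterns):
                rel = os.path.relpath(os.path.join(root, name), dist_dir)
                found.append(rel.replace(os.sep, "/"))
    return sorted(found)  # stable order makes task definitions reproducible
```

Because only the patterns are platform specific, the same helper could serve Linux, Mac and Windows builds with different pattern lists.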
Assignee: nobody → pmoore
Status: NEW → ASSIGNED
Comment 8•9 years ago
... and I've just pushed to https://treeherder.mozilla.org/#/jobs?repo=try&revision=d58ada3fad78 if you'd like to see what the builds look like at this point in time...
Comment 9•9 years ago
(In reply to Pete Moore [:pmoore][:pete] from comment #7)
> * currently calling ./mach build directly rather than mozharness - when
> everything works 100% this way, we can look at adapting the buildbot steps
> in mozharness
As this may be a controversial point, I felt it warranted some further explanation, to avoid any misunderstandings.
The target build process may look like this:
taskcluster task definitions sit on top of mozharness, which sits on top of mach, which sits on top of the make build system, which is using the gnu build tools (such as autoconf) which run over msys, which call out to compiler toolchains....
Due to the number of layers here, it can be difficult to isolate problems. There are potential references to buildbot slave environment settings in mozharness, mach, and even the make build system. Therefore, it made sense to get things working with mach directly initially, to fix any assumptions that buildbot is being used, and then to step up to the next layer, mozharness, to fix assumptions in that layer.
In other words: the choice of using mach directly is simply to first iron out buildbot references from the build system, *before* tackling fixing the buildbot references in mozharness.
Some might argue we have enough layers, and mach should do the full job, and we shouldn't need mozharness. Regardless of the position on this discussion, it still makes sense to fix things lower down first, hence the attack strategy I adopted.
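To make the "bottom layer first" strategy concrete, here is a minimal Python sketch of the direct call: run `mach build` with `MOZCONFIG` pointing at one of the official in-tree mozconfigs. It assumes those live at `browser/config/mozconfigs/<platform>/<variant>` (as with `win32/nightly` above) and that `src_dir` is an absolute path, since a relative `MOZCONFIG` may be resolved against a different directory than expected. The helper name is hypothetical.

```python
import posixpath


def mach_build_invocation(src_dir, platform, variant, base_env):
    """Return (argv, env) for running `mach build` directly with an in-tree mozconfig.

    platform is e.g. "win32" or "win64"; variant is e.g. "nightly" or "debug".
    src_dir should be absolute so MOZCONFIG is unambiguous.
    """
    mozconfig = posixpath.join(src_dir, "browser", "config", "mozconfigs", platform, variant)
    env = dict(base_env)
    env["MOZCONFIG"] = mozconfig  # mach picks up the mozconfig path from this variable
    argv = ["python", posixpath.join(src_dir, "mach"), "build"]
    return argv, env
```

Once this layer is green, the same mozconfig selection is what the mozharness layer would need to reproduce.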
Comment 10•9 years ago
... and of course signing is not implemented, as there are a bunch of security matters to attend to before it can be added ...
Comment 11•9 years ago
Heading off for the evening now - just kicked this one off, hopefully it will be green. =)
https://treeherder.mozilla.org/#/jobs?repo=try&revision=e7c4d81dffd5
Comment 12•9 years ago
Wasn't green because of https://github.com/taskcluster/generic-worker/commit/e853c6b190729ff38d0d24eec01389d2ffb6f871
Released a new generic worker with the fix (https://github.com/taskcluster/generic-worker/releases/tag/v1.0.12), rebuilt the AMI, updated the worker type, and submitted new jobs, which are now running:
https://tools.taskcluster.net/task-graph-inspector/#CJAvzc1BRfqYNDEYhprMSQ/
Comment 13•9 years ago
Comment 14•9 years ago
Getting things green with just mach is an eminently sensible place to start. Once you've got that green (and it sounds like you do!) it would be good to get them running in mozharness to match the buildbot builds, just for sanity of comparison.
The Linux/Mac Taskcluster builds use build.sh:
https://dxr.mozilla.org/mozilla-central/source/testing/docker/desktop-build/bin/build.sh
...which does most of its work in build-linux.sh:
https://dxr.mozilla.org/mozilla-central/source/testing/taskcluster/scripts/builder/build-linux.sh
Comment 15•9 years ago
(In reply to Pete Moore [:pmoore][:pete] from comment #7)
> * for now, not using sccache (need to discuss if we can use existing
> releng buckets or create new s3 buckets in taskcluster aws account)
FYI, we're not using sccache for Taskcluster Linux builds yet either: bug 1187257
Comment 16•9 years ago
Thanks for this input Ted!
I'm going to be handing the ball over to Rob as I have some other priorities for Q4 2015 - so Rob I'd recommend taking a look at the links from Ted.
Now is quite a good time to hand over as things are green - so I think next steps are moving to mozharness, and then *lots* of testing to see how the builds hold out.
There may also be room for cutting down some of the config. I'm sure I'm setting more environment variables than are actually needed, so a little bit of cleanup will be useful to strip the tasks, and the worker type setup too, down to the bare minimum of needed config...
I've blogged about how the work in this bug has been done here:
http://petemoore.github.io/general/taskcluster/2015/09/30/building-firefox-for-windows-on-try-using-taskcluster.html
I think I'll update the title of this bug to be about getting green try builds, and then we can create separate bugs about rolling it out, migrating to mozharness, etc.
Updated•9 years ago
Summary: Implement Windows builds using the generic worker → Get win32/win64 debug/opt Firefbox builds working on Try with TaskCluster using official mozconfigs
Updated•9 years ago
Summary: Get win32/win64 debug/opt Firefbox builds working on Try with TaskCluster using official mozconfigs → Get win32/win64 debug/opt Firefox builds working on Try with TaskCluster using official mozconfigs
Comment 17•9 years ago
Attachment #8667281 -
Attachment is obsolete: true
Attachment #8667948 -
Flags: review?(ted)
Attachment #8667948 -
Flags: review?(garndt)
Comment 18•9 years ago
Comment on attachment 8667948 [details] [diff] [review]
bug1180775_gecko_v2.patch
Review of attachment 8667948 [details] [diff] [review]:
-----------------------------------------------------------------
I skimmed over some of the non-testing/taskcluster/* pieces as I'm less familiar with them. This looks good. Since this touches the Linux task definitions, it might be nice to kick off a task graph including all of those, just to make sure nothing was terribly broken (the changes are minor, so I don't expect that).
::: testing/taskcluster/tasks/builds/firefox_windows_base.yml
@@ +164,5 @@
> + # Rather then enforcing particular conventions we require that all build
> + # tasks provide the "build" extra field to specify where the build and tests
> + # files are located.
> + locations:
> + build: "src/{{object_dir}}/dist/bin/firefox.exe"
Hrm, I might be overlooking it, but where do these values come from?
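For readers unfamiliar with the `{{object_dir}}` syntax: it is a mustache-style placeholder that gets filled from a set of parameters when the task graph is generated. The thread does not say where `object_dir` is defined, so the following is only a minimal Python sketch of that kind of substitution, assuming plain `{{name}}` replacement from a parameters dict; it is not the in-tree graph generator's code, and the `obj-firefox` value is hypothetical.

```python
import re

# {{ name }} with optional whitespace inside the braces
_PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def render(template, params):
    """Replace each {{name}} in template with str(params[name]); fail loudly on unknown names."""
    def substitute(match):
        key = match.group(1)
        if key not in params:
            raise KeyError("no value supplied for {{%s}}" % key)
        return str(params[key])
    return _PLACEHOLDER.sub(substitute, template)


print(render("src/{{object_dir}}/dist/bin/firefox.exe", {"object_dir": "obj-firefox"}))
```

Raising on a missing key, rather than substituting an empty string, makes an unset parameter show up at graph-generation time instead of as a broken artifact path.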
Attachment #8667948 -
Flags: review?(garndt) → review+
Comment 19•9 years ago
I've updated the worker type to run a *** cygwin ssh daemon *** and am regenerating ami etc - so a bit later I'll do another push to make sure everything still works.
(technically it is the open ssh daemon, ported to cygwin)
Having a decent ssh daemon should simplify troubleshooting especially on low bandwidth connections as we currently only have a graphical interface via RDP.
FTR, it was a case of adding the security group "ssh-only" and adding the following Powershell code to the userdata used for setting up the AMI:
$client = New-Object System.Net.WebClient  # this line already existed in the userdata; repeated here for reference since the steps below use $client
# download cygwin
$client.DownloadFile("https://www.cygwin.com/setup-x86_64.exe", "C:\cygwin-setup-x86_64.exe")
# install cygwin
$p = Start-Process "C:\cygwin-setup-x86_64.exe" -ArgumentList "--quiet-mode --no-admin --no-startmenu --no-desktop --no-shortcuts --root C:\cygwin --site http://cygwin.mirror.constant.com -P openssh" -wait -NoNewWindow -PassThru -RedirectStandardOutput "C:\cygwin_install.log" -RedirectStandardError "C:\cygwin_install.err"
# open up firewall for ssh daemon
New-NetFirewallRule -DisplayName "Allow SSH inbound" -Direction Inbound -LocalPort 22 -Protocol TCP -Action Allow
# configure sshd
$p = Start-Process "C:\cygwin\bin\bash.exe" -ArgumentList "--login -c `"ssh-host-config -y -c 'ntsec mintty' -w '*********'`"" -wait -NoNewWindow -PassThru -RedirectStandardOutput "C:\cygrunsrv.log" -RedirectStandardError "C:\cygrunsrv.err"
# start sshd
$p = Start-Process "net" -ArgumentList "start sshd" -wait -NoNewWindow -PassThru -RedirectStandardOutput "C:\net_start_sshd.log" -RedirectStandardError "C:\net_start_sshd.err"
# download bash setup script
$client.DownloadFile("https://raw.githubusercontent.com/petemoore/myscrapbook/master/setup.sh", "C:\cygwin\home\Administrator\setup.sh")
# run bash setup script
$p = Start-Process "C:\cygwin\bin\bash.exe" -ArgumentList "--login -c 'chmod a+x setup.sh; ./setup.sh'" -wait -NoNewWindow -PassThru -RedirectStandardOutput "C:\Administrator_cygwin_setup.log" -RedirectStandardError "C:\Administrator_cygwin_setup.err"
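Each `Start-Process ... -Wait -PassThru` call above keeps the process object in `$p` but never looks at it, so a failed cygwin install or `ssh-host-config` run would go unnoticed until later. In PowerShell the check could be `if ($p.ExitCode -ne 0) { throw "..." }` after each call. The same pattern (wait, log stdout/stderr to files, fail on a non-zero exit) is sketched below in Python; the helper name and the `.log`/`.err` naming are illustrative, mirroring the file names used in the userdata.

```python
import subprocess


def run_logged(args, log_base):
    """Run args to completion, stdout to <log_base>.log and stderr to <log_base>.err.

    Raises RuntimeError if the command exits with a non-zero code.
    """
    with open(log_base + ".log", "wb") as out, open(log_base + ".err", "wb") as err:
        result = subprocess.run(args, stdout=out, stderr=err)
    if result.returncode != 0:
        raise RuntimeError(
            "%r exited with code %d; see %s.err" % (args, result.returncode, log_base)
        )
```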
Please note we should find a way to obscure the password of the cygwin user that runs the daemon - several options for doing this, just trying it out as a proof of concept at the moment, and making sure it doesn't interfere with msys regarding build of firefox.
Also note it downloads a bash script to execute after setting up the environment, so any further automation can be done in bash rather than powershell. This is also just a temporary solution - in the end this should be checked into tree, or published as a package, etc.
Comment 20•9 years ago
Some strange things going on - contacting the cygwin mailing list:
https://sourceware.org/ml/cygwin/2015-10/msg00036.html
Updated•9 years ago
Whiteboard: [bb2tc]
Updated•9 years ago
No longer blocks: q3-bb-tc-migration
Updated•9 years ago
Assignee: pmoore → rthijssen
Comment 21•9 years ago
Google doc notes from our meeting: https://docs.google.com/document/d/1DM9wDwsf_0Xexes08rqFodY00BQYPsXWAgcKiqAA0Ww/edit#heading=h.5pirqdp0zxl4
Comment 22•9 years ago
Comment 23•9 years ago
Ted, if you like we can meet up (vidyo) and go over the patch together, which might help explain some of the context. In isolation it might be trickier to review, as there are a few different parts to it. Let me know if that appeals. Thanks!
Flags: needinfo?(ted)
Comment 24•9 years ago
I'll look it over today and let you know if I need that.
Flags: needinfo?(ted)
Comment 25•9 years ago
Comment on attachment 8667948 [details] [diff] [review]
bug1180775_gecko_v2.patch
It probably makes more sense now for this to be reviewed when :grenade has a new version, so no need to review this now.
Attachment #8667948 -
Flags: review?(ted)
Updated•9 years ago
Comment 26•8 years ago
is there more work to do here? I have taskcluster windows builds on try
Assignee
Comment 27•8 years ago
yeah, this is ancient history
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Updated•7 years ago
Component: General Automation → General