Closed Opened 6 years ago Closed 6 years ago

Not set
normal

RESOLVED FIXED
mozilla52

## Attachments

### (16 files, 1 obsolete file)

 4.02 KB, patch dustin : review+ Details | Diff | Splinter Review 2.21 KB, patch dustin : review+ Details | Diff | Splinter Review 2.81 KB, patch dustin : review+ Details | Diff | Splinter Review 3.00 KB, patch ehsan.akhgari : review+ Details | Diff | Splinter Review 5.16 KB, patch ehsan.akhgari : review+ Details | Diff | Splinter Review 2.76 KB, patch ehsan.akhgari : review+ Details | Diff | Splinter Review 3.33 KB, patch ehsan.akhgari : review+ Details | Diff | Splinter Review 1.28 KB, patch ehsan.akhgari : review+ Details | Diff | Splinter Review 1.25 KB, patch ehsan.akhgari : review+ Details | Diff | Splinter Review 1.92 KB, patch ehsan.akhgari : review+ Details | Diff | Splinter Review 2.18 KB, patch ehsan.akhgari : review+ Details | Diff | Splinter Review 1.46 KB, patch ehsan.akhgari : review+ Details | Diff | Splinter Review 922 bytes, patch ehsan.akhgari : review+ Details | Diff | Splinter Review 1.00 KB, patch ehsan.akhgari : review+ Details | Diff | Splinter Review 1.70 KB, patch pmoore : review+ grenade : review+ Details | Diff | Splinter Review 2.52 KB, patch dustin : review+ Details | Diff | Splinter Review
It's much more convenient to build clang-cl on infra than it is on your local
machine.
I'm new to this, and I'm kind of cargo-culting my way through this, so I'm just
asking for feedback at this point.  All of this code is completely untested.
Attachment #8796612 - Flags: feedback?(dustin)
I'm new to this, and I'm kind of cargo-culting my way through this, so I'm just
asking for feedback at this point.  All of this code is completely untested.
(I am pretty sure that I'm going to have to set up some environment variables,
etc., to make MSVC work, like our Windows mozconfigs do now that MSVC comes via
tooltool rather than being installed on the build machines.)
Attachment #8796613 - Flags: feedback?(dustin)
Comment on attachment 8796612 [details] [diff] [review]
part 1 - move linux toolchain builds to a separate linux.yml file

Review of attachment 8796612 [details] [diff] [review]:
-----------------------------------------------------------------

@@ +10,5 @@
> +	tier: 1
> +    run:
> +        using: toolchain-script
> +        script: build-clang-linux.sh
> +    worker-type: aws-provisioner-v1/gecko-{level}-b-linux

Much of what was captured in the job-defaults was there to allow the individual job stanzas to be more concise.  As it stands, the repetition isn't *too* bad, but I wouldn't want to see it get too much worse.  Another option aside from job-defaults is to add a transform that applies some defaults.
Attachment #8796612 - Flags: feedback?(dustin) → review+
Comment on attachment 8796613 [details] [diff] [review]
part 2 - add clang-cl toolchain builds

Review of attachment 8796613 [details] [diff] [review]:
-----------------------------------------------------------------

This is a good start :)

You'll need to write the bit that connects using: toolchain-script on generic worker to a task definition that will actually cause that script to run in generic-worker.  The parallel for docker-worker is in taskcluster/taskgraph/transforms/job/toolchain.py so you would need to add your decorated function there.
Attachment #8796613 - Flags: feedback?(dustin) → feedback+
(In reply to Dustin J. Mitchell [:dustin] from comment #3)
> Comment on attachment 8796612 [details] [diff] [review]
> part 1 - move linux toolchain builds to a separate linux.yml file
>
> Review of attachment 8796612 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> @@ +10,5 @@
> > +	tier: 1
> > +    run:
> > +        using: toolchain-script
> > +        script: build-clang-linux.sh
> > +    worker-type: aws-provisioner-v1/gecko-{level}-b-linux
>
> Much of what was captured in the job-defaults was there to allow the
> individual job stanzas to be more concise.  As it stands, the repetition
> isn't *too* bad, but I wouldn't want to see it get too much worse.  Another
> option aside from job-defaults is to add a transform that applies some
> defaults.

Thanks for the review.  Is job-defaults even permitted here?  Judging from other files, it's difficult to tell (e.g. I see other \$PLATFORM.yml files, included via jobs-from, that could seemingly benefit from job-defaults, but they don't use job-defaults...) and the documentation doesn't even mention job-defaults (assuming I'm looking at the right place: http://gecko.readthedocs.io/en/latest/taskcluster/taskcluster/index.html).

(In reply to Dustin J. Mitchell [:dustin] from comment #4)
> This is a good start :)
>
> You'll need to write the bit that connects using: toolchain-script on
> generic worker to a task definition that will actually cause that script to
> run in generic-worker.  The parallel for docker-worker is in
> taskcluster/taskgraph/transforms/job/toolchain.py so you would need to add

Thank you for the pointer!
Flags: needinfo?(dustin)
You can keep job-defaults in kind.yml, but then it's farther from the jobs it's giving defaults for, which can lead to confusion for the reader.

Yes, you're looking in the right spot for docs.  Those docs are fairly high-level, though, with more detail in the in-source comments.  In this case, job-defaults and jobs are handled by taskcluster/taskgraph/task/transform.py.
Flags: needinfo?(dustin)
OK, I added what seems to be enough to make my script run:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=96d6cf41e1b8

But the log just says:

with no output, even with the |set -x -v| bits inside the script.  I can imagine things in the script erroring, but that's not what appears to be happening; everything seems to be running in less than a second.  Do you have ideas about what I'm doing wrong here, or what I should be poking at?  Should I just be rewriting everything in Python for more friendliness towards Windows?
Flags: needinfo?(dustin)
That one is a little over my head.  Do you need to run that as "c:\path\to\bash .\build\src\..."?  I'm totally guessing on that account..
Flags: needinfo?(dustin) → needinfo?(pmoore)
(In reply to Dustin J. Mitchell [:dustin] from comment #8)
> That one is a little over my head.  Do you need to run that as
> "c:\path\to\bash .\build\src\..."?  I'm totally guessing on that account..

That appears to be it, now we are getting a little farther.  Thanks!
Flags: needinfo?(pmoore)
Nathan, let me know if you get stuck again with regard to tasks not running as expected on Windows.
Finally got something that builds clang-cl on taskcluster!

https://treeherder.mozilla.org/#/jobs?repo=try&revision=16600ac5fe0b
We need an toolchain-script runner for generic-worker worker types.

This patch depends on bug 1309337, and in particular the WIP patch I uploaded
there.  As we've discussed in that bug, that patch will need some
modifications; I plan on reflecting whatever modifications are made into the
patch here, but I don't expect substantial changes to be necessary to this part.
Attachment #8802970 - Flags: review?(dustin)
The build-clang-windows.sh is basically an exact copy of the Linux one
at the moment; we'll fix it up for all the Windows-specific taskcluster
bits in a future commit.
Attachment #8802971 - Flags: review?(dustin)
Attachment #8796613 - Attachment is obsolete: true
There are a number of changes here:

- Taskcluster machines don't have MSVC installed, so we have to setup all
the MSVC infrastructure ourselves;
- Gecko contains useful Python packages to use, so we need to setup a
of that for us;
- We need to add build tools to our PATH, as they are not pre-installed;
- We need to define UPLOAD_PATH ourself.

These are all of the build-clang-windows.sh changes; future changes will focus
on the build-clang.py script.
Attachment #8802972 - Flags: review?(ehsan)
We cannot depend on a fixed location for cl.exe in a taskcluster world.
We therefore need to make our build-clang.py script accomodate relative
path names for cc/cxx and assume those are binaries that should be
looked up on PATH.

We also need to modify the Linux build script so that the virtualenv is
used to look up the new 'which' package.
Attachment #8802973 - Flags: review?(ehsan)
It doesn't seem good to tie ourselves to the Gecko tooltool manifest for
building clang-cl; we want to stick with something we can update on
clang-cl's schedule, not Gecko's.
Attachment #8802974 - Flags: review?(ehsan)
We need these tools to build, and they are no longer installed on the build
machines for us.
Attachment #8802975 - Flags: review?(ehsan)
This change makes the build somewhat less noisy and somewhat faster, by
virtue of not having to print out all the status messages.
Attachment #8802976 - Flags: review?(ehsan)
CMake is unhappy if we don't provide this piece of information.
Attachment #8802977 - Flags: review?(ehsan)
Due to CMake oddities, we need to escape whatever paths we pass in here.
Attachment #8802978 - Flags: review?(ehsan)
In a taskcluster world, we cannot used fixed directories, since we don't
know the absolute path of the directory we're building in ahead of time.
(We could pass it in to the build script, or discover it in the script
itself, but that wouldn't really solve the next problem.)  This change
does make the builds not reproducible, but as we're using clang-cl
purely for secondary purposes on Windows, rather than for shipping
Firefox binaries (as we would on Mac, say), I don't feel bad about
punting the reproducibility issue down the road a bit.
Attachment #8802979 - Flags: review?(ehsan)
clang-cl would normally derive its MSVC emulation bits from the
installed MSVC version, but we don't have an installed MSVC in this
scenario, so we have to use command-line options instead.  We use
similar options for Gecko builds.
Attachment #8802980 - Flags: review?(ehsan)
Newer clang-cl is more better.
Attachment #8802981 - Flags: review?(ehsan)
Taskcluster builds live in a different place than our buildbot builds did.
Attachment #8802982 - Flags: review?(ehsan)
Attachment #8802972 - Flags: review?(ehsan) → review+
Attachment #8802973 - Flags: review?(ehsan) → review+
Attachment #8802974 - Flags: review?(ehsan) → review+
Attachment #8802975 - Flags: review?(ehsan) → review+
Attachment #8802976 - Flags: review?(ehsan) → review+
Attachment #8802977 - Flags: review?(ehsan) → review+
Attachment #8802978 - Flags: review?(ehsan) → review+
Comment on attachment 8802979 [details] [diff] [review]
part 11 - use a relative path for the build directory on windows

Review of attachment 8802979 [details] [diff] [review]:
-----------------------------------------------------------------

Fair.
Attachment #8802979 - Flags: review?(ehsan) → review+
Attachment #8802980 - Flags: review?(ehsan) → review+
Attachment #8802981 - Flags: review?(ehsan) → review+
Comment on attachment 8802982 [details] [diff] [review]
part 14 - correct tar package substitution for new taskcluster scheme

Review of attachment 8802982 [details] [diff] [review]:
-----------------------------------------------------------------

r=me with the nit below fixed.

::: build/build-clang/build-clang.py
@@ +77,5 @@
>      name = os.path.realpath(name)
>      # On Windows, we have to convert this into an msys path so that tar can
>      # understand it.
>      if is_windows():
> +        name = name.replace('\\', '/').replace('Z:', '/z')

This would make the script almost impossible to run locally since nobody has a Z: drive unless they can mount a network drive or some such.

Can you please switch to use some regex magic to make this work with all drive names?
Attachment #8802982 - Flags: review?(ehsan) → review+
Comment on attachment 8802970 [details] [diff] [review]
part 2 - add windows toolchain build transform

Review of attachment 8802970 [details] [diff] [review]:
-----------------------------------------------------------------

(with this change)

@@ +81,5 @@
> +
> +    # We fetch LLVM SVN into this.
> +    # XXX can we get rid of the level-1- part?
> +    svn_cache = 'level-1-toolchain-clang-cl-build-svn'

It should be 'level-{}-toolchain-clang-cl-build-svn'.format(config.params['level']).  We want to avoid sharing caches between different levels to avoid cache-poisoning attacks.
Attachment #8802970 - Flags: review?(dustin) → review+
Attachment #8802971 - Flags: review?(dustin) → review+
Pushed by nfroyd@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/d89b13772976
part 1 - move linux toolchain builds to a separate linux.yml file; r=dustin
https://hg.mozilla.org/integration/mozilla-inbound/rev/e2a6502b8b05
part 2 - add windows toolchain build transform; r=dustin
https://hg.mozilla.org/integration/mozilla-inbound/rev/b952a9b4914d
part 3 - add clang-cl toolchain builds; r=dustin
https://hg.mozilla.org/integration/mozilla-inbound/rev/244d6edb84c4
part 4 - modify build-clang-windows.sh for running under taskcluster; r=ehsan
https://hg.mozilla.org/integration/mozilla-inbound/rev/00ed949e1477
part 5 - modify clang-cl toolchain config to look for just cl.exe; r=ehsan
https://hg.mozilla.org/integration/mozilla-inbound/rev/139b91fda85f
part 6 - use a different tooltool manifest for clang-cl builds; r=ehsan
https://hg.mozilla.org/integration/mozilla-inbound/rev/dacd0fa7deb4
part 7 - add SVN, CMake, and Ninja to the clang-cl toolchain build; r=ehsan
https://hg.mozilla.org/integration/mozilla-inbound/rev/2cd4658bd132
part 8 - run svn {co,up} with the -q flag; r=ehsan
https://hg.mozilla.org/integration/mozilla-inbound/rev/79d9df8c1d4a
part 9 - add CMAKE_ASM_COMPILER to cmake invocation; r=ehsan
https://hg.mozilla.org/integration/mozilla-inbound/rev/e7b3d2a53ea1
part 10 - slashify paths for cmake; r=ehsan
https://hg.mozilla.org/integration/mozilla-inbound/rev/19ecf350a02a
part 11 - use a relative path for the build directory on windows; r=ehsan
https://hg.mozilla.org/integration/mozilla-inbound/rev/686e30bda72c
part 12 - explicitly select MSVC version for clang-cl to emulate in stage 2+; r=ehsan
https://hg.mozilla.org/integration/mozilla-inbound/rev/402494fd3465
part 13 - update clang-cl revision to something that builds OK with ASan and MSVC2015u3; r=ehsan
https://hg.mozilla.org/integration/mozilla-inbound/rev/3762c5ea3950
part 14 - correct tar package substitution for new taskcluster scheme; r=ehsan
https://hg.mozilla.org/mozilla-central/rev/d89b13772976
https://hg.mozilla.org/mozilla-central/rev/e2a6502b8b05
https://hg.mozilla.org/mozilla-central/rev/b952a9b4914d
https://hg.mozilla.org/mozilla-central/rev/244d6edb84c4
https://hg.mozilla.org/mozilla-central/rev/00ed949e1477
https://hg.mozilla.org/mozilla-central/rev/139b91fda85f
https://hg.mozilla.org/mozilla-central/rev/dacd0fa7deb4
https://hg.mozilla.org/mozilla-central/rev/2cd4658bd132
https://hg.mozilla.org/mozilla-central/rev/79d9df8c1d4a
https://hg.mozilla.org/mozilla-central/rev/e7b3d2a53ea1
https://hg.mozilla.org/mozilla-central/rev/19ecf350a02a
https://hg.mozilla.org/mozilla-central/rev/686e30bda72c
https://hg.mozilla.org/mozilla-central/rev/402494fd3465
https://hg.mozilla.org/mozilla-central/rev/3762c5ea3950
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla52
Funny story: on your push, we got three burned gecko-decision jobs, a la https://queue.taskcluster.net/v1/task/PUHEG5gHQxGfoKlFTMAyDQ/runs/0/artifacts/public%2Flogs%2Flive_backing.log but because we feel that our clownshoes make us look rather dashing, we just didn't worry about that.

Apparently using whatever random magic pixie dust and unicorn farts are used to decide when to run things, that meant we never did try a toolchain-win32-clang-cl build on m-i on any subsequent pushes, waiting until it was merged to m-c and then to fx-team and autoland to run it burning on every single push thereafter, because it thinks it can |Z:\task_1477531665>"c:\Program Files\Mercurial\hg.exe" share c:\builds\hg-shared\mozilla-central .\build\src| and as https://queue.taskcluster.net/v1/task/UyBe5PpBRlO2_VN9F2Zk1w/runs/0/artifacts/public%2Flogs%2Flive_backing.log says, can't do that, ain't no repo there to share.

Hidden on every tree.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(In reply to Phil Ringnalda (:philor) from comment #30)
> Funny story: on your push, we got three burned gecko-decision jobs, a la
> artifacts/public%2Flogs%2Flive_backing.log but because we feel that our
> clownshoes make us look rather dashing, we just didn't worry about that.

This was because we didn't have the proper permissions for caches setup on non-try builds.

> Apparently using whatever random magic pixie dust and unicorn farts are used
> to decide when to run things, that meant we never did try a
> toolchain-win32-clang-cl build on m-i on any subsequent pushes, waiting

This is because the toolchain-win32-clang-cl builds are only run when there are changes to the relevant files; we don't need to be building a toolchain every push.

> until it was merged to m-c and then to fx-team and autoland to run it
> burning on every single push thereafter, because it thinks it can
> c:\builds\hg-shared\mozilla-central .\build\src| and as
> artifacts/public%2Flogs%2Flive_backing.log says, can't do that, ain't no
> repo there to share.

This I have no idea about; it *did* work OK on try.  Pete, do you have any insight with this?
Flags: needinfo?(pmoore)
I suspect related to bug 1313539. Rob, are any changes needed, or does the fix from bug 1313539 sort this out here too? Thanks!
Flags: needinfo?(pmoore) → needinfo?(rthijssen)
c:\builds\hg-shared no longer exists on taskcluster windows workers (except try builders (gecko-1-b-win2012)) as part of the move from hg share ... to hg robustcheckout ... (see bug 1305485). repositories need two patches to make this work. they are: https://hg.mozilla.org/mozilla-central/rev/d99d0f49ac4a and https://hg.mozilla.org/mozilla-central/rev/9803c0158f92
Flags: needinfo?(rthijssen)
Nathan, sounds like a rebase should fix it.
Flags: needinfo?(nfroyd)
(In reply to Pete Moore [:pmoore][:pete] from comment #34)
> Nathan, sounds like a rebase should fix it.

Applying the robustcheckout part is easy; applying the vcs_share_base part is trickier, because these builds don't use mozharness.  Will y:\ be mounted by default, so I can just checkout into y:\hg-shared and then |hg share| that into my z:\ directory?
Flags: needinfo?(nfroyd) → needinfo?(pmoore)
there's probably no need for the vcs_share_base stuff. just do a robustcheckout into your z:\task* directory, providing the --sharebase y:\hg-shared flag and you're done.
We should probably use a cache for all shared vcs object storage. The storage location would then be managed by the generic worker, and it would take care of mounting the cache for tasks that require it. That would also get you scope-protection for the vcs share too - so this would be my preferred solution.

I don't know about robustcheckout - so I'd defer to :grenade / :gps on that topic.

The docson UI is now working properly for the generic worker payload - so take a look at https://docs.taskcluster.net/manual/execution/workers/generic-worker to see the specifics.
Flags: needinfo?(pmoore) → needinfo?(nfroyd)
(In reply to Rob Thijssen (:grenade - GMT) from comment #36)
> there's probably no need for the vcs_share_base stuff. just do a
> y:\hg-shared flag and you're done.

This works, thanks!

(In reply to Pete Moore [:pmoore][:pete] from comment #37)
> We should probably use a cache for all shared vcs object storage. The
> storage location would then be managed by the generic worker, and it would
> take care of mounting the cache for tasks that require it. That would also
> get you scope-protection for the vcs share too - so this would be my
> preferred solution.
>
> I don't know about robustcheckout - so I'd defer to :grenade / :gps on that
> topic.

I don't really understand why the cache solution would be preferred over the robustcheckout stuff and the shared y: directory.  Wouldn't the mozharness builds want to be using that if so?
Flags: needinfo?(nfroyd)
(In reply to Nathan Froyd [:froydnj] from comment #38)
opic.
> I don't really understand why the cache solution would be preferred over the
> robustcheckout stuff and the shared y: directory.  Wouldn't the mozharness
> builds want to be using that if so?

Ideally everything required for a task can be defined in-tree, and the worker is something flexible that can run any type of task it is given. This might be a gecko build/test task, but ideally we'd be able to use e.g. a windows 2012 worker to also run arbitrary github CI jobs that need to run against windows 2012 (just as an example).

In the big scheme of things, it doesn't make so much difference in this particular case, but using caches allows worker types to be more general-purpose, spanning project domains, and also provides better granularity of control over which tasks can access which caches, and offers some benefits over cache retention... Also at some point soon, caches can be purged if needed e.g. by Sheriffs (this is still a TODO for generic worker, but is already implemented in docker worker).

In short, you just get better management and bookkeeping, more control over purging / retention / access controls, and less hard-coding things into specific worker types (so worker types are more general purpose and require less project-specific maintenance).
This change is needed after recent changes removed the shared hg checkout on C:.

I can believe that the cache might be more better, but I would prefer to simply
mirror what the regular mozharness-based TC builds do for now.
Attachment #8807209 - Flags: review?(pmoore)
This seems like something that the in-tree taskcluster infrastructure
ought to be able to handle, but in the absence of that functionality,
this is the next best option.
Attachment #8807210 - Flags: review?(dustin)
Attachment #8807210 - Flags: review?(dustin) → review+
Comment on attachment 8807209 [details] [diff] [review]
part 15 - update windows toolchain builds to use |hg robustcheckout|

Looks good to me. OK with you Rob?
Attachment #8807209 - Flags: review?(rthijssen)
Attachment #8807209 - Flags: review?(pmoore)
Attachment #8807209 - Flags: review+
Comment on attachment 8807209 [details] [diff] [review]
part 15 - update windows toolchain builds to use |hg robustcheckout|

Review of attachment 8807209 [details] [diff] [review]:
-----------------------------------------------------------------

we no longer have to specify the full path to hg as the environment PATH now includes "C:\Program Files\Mercurial" early. it does no harm, but isn't a requirement anymore.
Attachment #8807209 - Flags: review?(rthijssen) → review+
Pushed by nfroyd@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/1c0816657de0
part 15 - update windows toolchain builds to use |hg robustcheckout|; r=pmoore
https://hg.mozilla.org/integration/mozilla-inbound/rev/b135c8db68b7
part 16 - make the toolchain builds also depend on their taskcluster transforms; r=dustin
The ClangCL job is green on the push from comment 44.  What needs to be done to unhide the job?
Flags: needinfo?(philringnalda)
Wait until that's merged around, then poke me.
Flags: needinfo?(philringnalda)
https://hg.mozilla.org/mozilla-central/rev/1c0816657de0
https://hg.mozilla.org/mozilla-central/rev/b135c8db68b7
Status: REOPENED → RESOLVED
Closed: 6 years ago6 years ago
Resolution: --- → FIXED
Is it possible to run clang-cl builds on try? I want to confirm that bug 1313280 will not break clang-cl builds.
Here's some failed attempts:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=a0f1aff44966&exclusion_profile=false
https://treeherder.mozilla.org/#/jobs?repo=try&revision=70f76429da8c967ac9f7881f2ece71923c0c3404&exclusion_profile=false
https://treeherder.mozilla.org/#/jobs?repo=try&revision=482e5b78c07d84b19e82c6df3939a2ca191e3bd3&exclusion_profile=false
Flags: needinfo?(nfroyd)
https://dxr.mozilla.org/mozilla-central/source/taskcluster/ci/toolchain/windows.yml#19 - files-changed builds don't care what you ask for, nor do they care that you don't ask for them, they run when their files are touched and don't when they aren't, so to force it you just have to make any random no-op change to one of those files, like https://treeherder.mozilla.org/#/jobs?repo=try&revision=adca276d6156b09745879d9450faf5d3e3cdc754

(No need for the &exclusion_profile=false now, I've unhidden it.)
Flags: needinfo?(nfroyd)
(In reply to Phil Ringnalda (:philor) from comment #49)
> windows.yml#19 - files-changed builds don't care what you ask for, nor do
> they care that you don't ask for them, they run when their files are touched
> and don't when they aren't, so to force it you just have to make any random
> no-op change to one of those files, like
> https://treeherder.mozilla.org/#/
That task does a build of clang-cl.exe from LLVM source. If I understand correctly, I think :emk wants to test the build of the Mozilla tree, using clang-cl as the compiler?
Actually we should probably not block on that. Bug 1313280 looks pretty critical so landing it should be more important than preserving clang-cl builds. I can deal with bustage later.
(In reply to David Major [:dmajor] from comment #51)
You are right. Do I have to download the built binary and build Mozilla locally?
Yeah, I _think_ local builds are the only option at the present time.