Closed Bug 1349980 Opened 3 years ago Closed 3 years ago

Migrate some OS X workers from taskcluster-worker to generic-worker and run some OS X gecko tests

Categories

(Taskcluster :: Workers, enhancement)

enhancement
Not set

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: pmoore, Assigned: pmoore)

References

Details

Attachments

(7 files, 3 obsolete files)

In order to see if generic-worker is a valid worker to use for initial rollout of taskcluster on OS X, move some of the existing taskcluster-worker pool over to generic-worker, and get some jobs running, to see how results compare.
Attached patch bug1349980_puppet_v1.patch (obsolete) — Splinter Review
Hi Aki,

I thought you'd be a good person to review as you've set up other workers in scl3. This first patch doesn't achieve a lot, since it only migrates one machine from taskcluster-worker to generic-worker, and that machine is already manually pinned to my source branch checked out on releng-puppet2 however ...

Once in-tree task generation is able to correctly generate task payloads for generic worker mac jobs, I'll make a second patch which just moves over more nodes in the pool, rather than just one - so this is a chance to check everything is ok, before we move over a larger part of the pool over.

Note, the binaries and secrets referenced in this patch are already deployed in puppet. Here is an example task group that demonstrate the worker is running:

https://tools.taskcluster.net/task-group-inspector/#/T47kVjR4TASpbgH-_-_O4w?_k=zbibwf

Note, in that task group, the OS X tasks are failing because the payload definitions are not yet valid - which is the problem I want to solve before moving more of the pool over from taskcluster-worker to generic-worker. However, it does demonstrate that the worker is running on t-yosemite-r7-0049, and that it is able to claim tasks.
Assignee: nobody → pmoore
Attachment #8851659 - Flags: review?(aki)
Comment on attachment 8851659 [details] [diff] [review]
bug1349980_puppet_v1.patch

Overall, r+, with a question about why we're keeping toplevel/worker.

>diff --git a/manifests/moco-nodes.pp b/manifests/moco-nodes.pp
>--- a/manifests/moco-nodes.pp
>+++ b/manifests/moco-nodes.pp
<snip>
>-node /t-yosemite-r7-004[0-9]\.test\.releng\.scl3\.mozilla\.com/ {
>+# bug 1349980: 9/10 available workers to use taskcluster-worker for now
>+# see below for 1 worker definition that uses generic-worker
>+node /t-yosemite-r7-004[0-8]\.test\.releng\.scl3\.mozilla\.com/ {
>     $aspects = [ 'low-security' ]
>     $slave_trustlevel = 'try'
>-    include toplevel::worker::releng::test::gpu
>+    include toplevel::taskcluster_worker::releng::test::gpu

I like making the top level name more explicit.  Is there a reason we're still keeping toplevel::worker around?  Just a shared inherited class?

>+    # Bug 1338557 - Please add pmoore public key to authorized_keys file of cltbld/root user of t-yosemite-r7-{0040..0049}
>+    #realize(Users::Person["pmoore"])

We can get rid of the above commented "realize" line if we don't need it.

>+# bug 1349980: only one worker to use generic-worker for now, until in-tree task generation workingcorrectly
>+node /t-yosemite-r7-0049\.test\.releng\.scl3\.mozilla\.com/ {
>+    $aspects = [ 'low-security' ]
>+    $slave_trustlevel = 'try'
>+    include toplevel::generic_worker::releng::test::gpu
>
>     # Bug 1338557 - Please add pmoore public key to authorized_keys file of cltbld/root user of t-yosemite-r7-{0040..0049}
>     #realize(Users::Person["pmoore"])

Same here.

For the below, I'm not entirely sure why we have worker as opposed to taskcluster_worker or generic_worker.  Is this just to be an analog to toplevel/manifests/slave/ ?

If there's a good reason for it, let's keep it.  If it makes more sense to be explicit and specify the worker implementation, let's do that.  All else equal, I'd lean towards the latter.

>diff --git a/modules/toplevel/manifests/worker/releng.pp b/modules/toplevel/manifests/worker/releng.pp
>--- a/modules/toplevel/manifests/worker/releng.pp
>+++ b/modules/toplevel/manifests/worker/releng.pp
>@@ -1,13 +1,13 @@
> # This Source Code Form is subject to the terms of the Mozilla Public
> # License, v. 2.0. If a copy of the MPL was not distributed with this
> # file, You can obtain one at http://mozilla.org/MPL/2.0/.
>
>-class toplevel::worker::releng inherits toplevel::worker {
>+class toplevel::taskcluster_worker::releng inherits toplevel::worker {

>diff --git a/modules/toplevel/manifests/worker/releng/test.pp b/modules/toplevel/manifests/worker/releng/test.pp
>--- a/modules/toplevel/manifests/worker/releng/test.pp
>+++ b/modules/toplevel/manifests/worker/releng/test.pp
>@@ -1,13 +1,13 @@
> # This Source Code Form is subject to the terms of the Mozilla Public
> # License, v. 2.0. If a copy of the MPL was not distributed with this
> # file, You can obtain one at http://mozilla.org/MPL/2.0/.
>
>-class toplevel::worker::releng::test inherits toplevel::worker::releng {
>+class toplevel::generic_worker::releng::test inherits toplevel::generic_worker::releng {

>diff --git a/modules/toplevel/manifests/worker/releng/test/gpu.pp b/modules/toplevel/manifests/worker/releng/test/gpu.pp
>--- a/modules/toplevel/manifests/worker/releng/test/gpu.pp
>+++ b/modules/toplevel/manifests/worker/releng/test/gpu.pp
>@@ -1,13 +1,13 @@
> # This Source Code Form is subject to the terms of the Mozilla Public
> # License, v. 2.0. If a copy of the MPL was not distributed with this
> # file, You can obtain one at http://mozilla.org/MPL/2.0/.
>
>-class toplevel::worker::releng::test::gpu inherits toplevel::worker::releng::test {
>+class toplevel::generic_worker::releng::test::gpu inherits toplevel::generic_worker::releng::test {
Attachment #8851659 - Flags: review?(aki) → review+
Hey Aki,

Many thanks for your swift review! :-)

> We can get rid of the above commented "realize" line if we don't need it.

Nice spot, will get rid of those lines.

> I like making the top level name more explicit.  Is there a reason we're still keeping toplevel::worker around?  Just a shared inherited class? For the below, I'm not entirely sure why we have worker as opposed to taskcluster_worker or generic_worker.  Is this just to be an analog to toplevel/manifests/slave/ ?

There is likely to be some overlap of having both generic-worker and taskcluster-worker deployed while fixing up generic-worker, until it can take over tasks wholesale from taskcluster-worker. I anticipate the same in reverse when we decommission generic-worker at some point in the future (i.e. for a while, we'll have them both running in parallel), so I've tried to keep commonality localised in the worker module, and then only include worker-specific content in the respective worker modules (executable location, launch profile, config file, ...) so that as much as possible, we differences in task results can be attributed to worker implementation differences, rather than being inconsistencies in the puppet setup.

I'll upload a patch later today. Thanks!
Regarding the toplevel hierarchy, remember it's a single hierarchy and (mixins aside) a host can only be in one place in that heirarchy.  So it's OK to have toplevel::generic_worker and toplevel::taskcluster_worker, but a host can't be both toplevel::worker::<something> and toplevel::generic_worker::<something>.

It would probably make sense to add the worker implementation after "releng", for example:

toplevel::worker::releng::generic_worker::test::gpu
toplevel::worker::releng::taskcluster_worker::test::gpu

then modules/toplevel/manifests/worker/relneg/generic_worker.pp would install generic-worker (via "include generic_worker").
Attached patch bug1349980_puppet_v2.patch (obsolete) — Splinter Review
Hi Aki, Dustin,

After looking more carefully, I realised I had made a bit of a mess of the inheritance tree, as you quite rightly spotted - so I've reworked the patch now.

I've tested on t-yosemite-r7-0048 (taskcluster-worker) and t-yosemite-r7-0049 (generic-worker) and I believe both machines are still functioning correctly...

Sorry for all the movement - hope this patch is a bit cleaner.

I've adopted the inheritance scheme suggested by Dustin:

  toplevel::worker::releng::{generic,taskcluster}_worker::test::gpu

Let me know if I've committed any crimes!

Many thanks!
Pete
Attachment #8851659 - Attachment is obsolete: true
Attachment #8852117 - Flags: review?(aki)
Attachment #8852117 - Flags: feedback?(dustin)
One sec - need to quickly fix that patch!!!
Sorry about that!
Attachment #8852117 - Attachment is obsolete: true
Attachment #8852117 - Flags: review?(aki)
Attachment #8852117 - Flags: feedback?(dustin)
Attachment #8852127 - Flags: review?(aki)
Attachment #8852127 - Flags: feedback?(dustin)
Attachment #8852127 - Flags: review?(aki) → review+
Alin kindly merged to production for me, but there is a bug. Will attach a patch now to fix (tested by pinning slave to my environment).
Attachment #8852462 - Flags: review?(dustin)
Attachment #8852462 - Flags: review?(dustin) → review+
Attachment #8852127 - Flags: feedback?(dustin) → feedback+
Component: Worker → Generic-Worker
Release 8.1.0 of generic-worker added "name" as an optional property in a task payload artifact, such that the artifact name can be different to the filesystem path location if required.

This is required for OS X.

This change is backwardly compatible.

Upgrading to 8.1.1 to align release version with other workers (that are also currently being upgraded to 8.1.1).

Please note, we can't deploy this patch yet, as we need to make the binary release available on the puppet masters. However, atm it looks like I can't log on as root into releng-puppet2.srv.releng.scl3.mozilla.com in order to get it added. However, that shouldn't block the review, so putting out for review already.
Attachment #8854391 - Flags: review?(rthijssen)
Comment on attachment 8854391 [details] [diff] [review]
bug1349980_puppet_upgrade-gw-8.1.1_v1.patch

Review of attachment 8854391 [details] [diff] [review]:
-----------------------------------------------------------------

also copied binary to /data/repos/EXEs/generic-worker-v8.1.1-darwin-amd64 on distinguished master
Attachment #8854391 - Flags: review?(rthijssen) → review+
Awesome, many thanks Rob!
Landed, and merged to production branch.
This patch takes 4 more workers for generic-worker, so the pool will be split evenly:

  * taskcluster-worker: 5 machines
  * generic-worker: 5 machines

taskcluster-worker machines:
  * t-yosemite-r7-0040
  * t-yosemite-r7-0041
  * t-yosemite-r7-0042
  * t-yosemite-r7-0043
  * t-yosemite-r7-0044

generic-worker machines:
  * t-yosemite-r7-0045
  * t-yosemite-r7-0046
  * t-yosemite-r7-0047
  * t-yosemite-r7-0048
  * t-yosemite-r7-0049
Attachment #8854854 - Flags: review?(wcosta)
Attachment #8854854 - Flags: review?(wcosta) → review+
Attached patch bug1349980_gecko_v1.patch (obsolete) — Splinter Review
Hey Dustin,

This might not be the absolute last revision, based on try results that come in, but I at least wanted to get review/feedback for the work so far, so that when it comes for final review, there aren't any major surprises!

That said, if try pushes go well, this might be the final version - I just wanted to call out that there could be some further changes prior to landing.

I currently have a try push running here:
 *  https://treeherder.mozilla.org/#/jobs?repo=try&revision=1e87b67b0f9c88e44993956d7a555e59a74b8ddb

However, some of those Mac tasks failed because of some manual kills of the worker, which probably left a firefox browser open.

Previously generic worker only ran on Windows, so the in-tree code mostly assumed generic-worker => windows, so this change uses same worker implementation (generic-worker) but checks build_platform or test_platform when needed, in order to determine correct task payload to generate.

In addition, it supports using the flag '-g' in try syntax in order to trigger the generic worker implementation to be used. This complements the existing '-w' flag to indicate to run taskcluster-worker tasks.

Any questions, let me know!

Many thanks!
Pete
Attachment #8854876 - Flags: review?(dustin)
Dustin, I've spotted I had some flake8 lint issues, I'll address these together with your review feedback.


TEST-UNEXPECTED-ERROR | /home/worker/checkouts/gecko/taskcluster/taskgraph/transforms/job/mozharness.py:239:100 | line too long (125 > 99 characters) (E501)
 
TEST-UNEXPECTED-ERROR | /home/worker/checkouts/gecko/taskcluster/taskgraph/transforms/job/mozharness_test.py:267:7 | continuation line over-indented for visual indent (E127)
 
TEST-UNEXPECTED-ERROR | /home/worker/checkouts/gecko/taskcluster/taskgraph/transforms/job/mozharness_test.py:269:100 | line too long (113 > 99 characters) (E501)
 
TEST-UNEXPECTED-ERROR | /home/worker/checkouts/gecko/taskcluster/taskgraph/transforms/task.py:224:34 | at least two spaces before inline comment (E261)

TEST-UNEXPECTED-ERROR | /home/worker/checkouts/gecko/taskcluster/taskgraph/transforms/tests.py:345:100 | line too long (106 > 99 characters) (E501)
 
TEST-UNEXPECTED-ERROR | /home/worker/checkouts/gecko/taskcluster/taskgraph/transforms/tests.py:395:100 | line too long (115 > 99 characters) (E501)
Comment on attachment 8854876 [details] [diff] [review]
bug1349980_gecko_v1.patch

Review of attachment 8854876 [details] [diff] [review]:
-----------------------------------------------------------------

A few minor things, but in general I like the form of this.

::: taskcluster/taskgraph/transforms/job/mozharness.py
@@ +167,5 @@
>  
>  
>  # We use the generic worker to run tasks on Windows
>  @run_job_using("generic-worker", "mozharness", schema=mozharness_run_schema)
> +def mozharness_on_generic_worker(config, job, taskdesc):

I think eventually we'll need some kind of worker-os or something like that.  Maybe it's 'worker-config', to parallel the conversation about workerTypes we had yesterday. When I first set this up, I figured tc-worker would rule the world, and that we'd have a different engine for each OS, but neither of those are really the case now.  Anyway, just thinking about the future :)

@@ +186,5 @@
>  
>      worker = taskdesc['worker']
>  
>      worker['artifacts'] = [{
> +        'path': r'public/build',

Is this (and the other `/` below) OK for windows too?

::: taskcluster/taskgraph/transforms/job/mozharness_test.py
@@ +224,3 @@
>  
>      installer_url = get_artifact_url(
> +        '<build>', mozharness['build-artifact-name'])

nit: can be one line

@@ +238,5 @@
>  
>      worker['max-run-time'] = test['max-run-time']
>      worker['artifacts'] = artifacts
>  
> +    # not really sure if all of these are needed or not

Where did these come from?  Most of them aren't needed -- IDLEIZER, in particluar, is a moz-buildbot thing.  mozharness_test_on_native_engine has a better list (and includes some things omitted here)

@@ +261,5 @@
> +      ['python2.7', '-u', 'mozharness/scripts/' + mozharness['script']] \
> +        if build_platform.startswith('macosx') else \
> +      ['c:\\mozilla-build\\python\\python.exe', '-u', 'mozharness\\scripts\\' + normpath(mozharness['script'])] \
> +        if build_platform.startswith('win') else \
> +      ['python', '-u', 'mozharness/scripts/' + mozharness['script']]

This is some very tortured syntax, especially with the trailing `\`.  Please reformat into a regular multi-statement conditional.

@@ +267,4 @@
>      for mh_config in mozharness['config']:
> +        cfg_path = 'mozharness/configs/' + mh_config
> +        if build_platform.startswith('win'):
> +            cfg_path = normpath(cfg_path)

Hm, so I guess `/` vs `\` matters in some places?

::: taskcluster/taskgraph/transforms/task.py
@@ +221,5 @@
> +        # a string array
> +        Required('command'): Any(
> +            [taskref_or_string],  # Windows
> +            [[taskref_or_string]] # Linux / OS X
> +        ),

This doesn't correspond to https://docs.taskcluster.net/reference/workers/generic-worker/payload which specifies it must always be an array of strings..

@@ +232,5 @@
>  
>              # task image path from which to read artifact
>              'path': basestring,
> +
> +            Optional('name'): basestring

Need a comment here
Attachment #8854876 - Flags: review?(dustin) → review-
(In reply to Dustin J. Mitchell [:dustin] from comment #18)
> Comment on attachment 8854876 [details] [diff] [review]
> bug1349980_gecko_v1.patch
> 
> Review of attachment 8854876 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> A few minor things, but in general I like the form of this.
> 
> ::: taskcluster/taskgraph/transforms/job/mozharness.py
> @@ +167,5 @@
> >  
> >  
> >  # We use the generic worker to run tasks on Windows
> >  @run_job_using("generic-worker", "mozharness", schema=mozharness_run_schema)
> > +def mozharness_on_generic_worker(config, job, taskdesc):
> 
> I think eventually we'll need some kind of worker-os or something like that.
> Maybe it's 'worker-config', to parallel the conversation about workerTypes
> we had yesterday. When I first set this up, I figured tc-worker would rule
> the world, and that we'd have a different engine for each OS, but neither of
> those are really the case now.  Anyway, just thinking about the future :)

Yeah good point. Shall we keep at as is for now?

> @@ +186,5 @@
> >  
> >      worker = taskdesc['worker']
> >  
> >      worker['artifacts'] = [{
> > +        'path': r'public/build',
> 
> Is this (and the other `/` below) OK for windows too?

Yes, the path works either way on Windows. I could special-case it to make the path with backslashes on Windows, but it isn't technically necessary.

> 
> ::: taskcluster/taskgraph/transforms/job/mozharness_test.py
> @@ +224,3 @@
> >  
> >      installer_url = get_artifact_url(
> > +        '<build>', mozharness['build-artifact-name'])
> 
> nit: can be one line

OK

> 
> @@ +238,5 @@
> >  
> >      worker['max-run-time'] = test['max-run-time']
> >      worker['artifacts'] = artifacts
> >  
> > +    # not really sure if all of these are needed or not
> 
> Where did these come from?  Most of them aren't needed -- IDLEIZER, in
> particluar, is a moz-buildbot thing.  mozharness_test_on_native_engine has a
> better list (and includes some things omitted here)

This was combined from the list of ones used in native engine implementation, plus the additional ones set for native engine in the taskcluster worker puppet config:

https://hg.mozilla.org/build/puppet/file/a5ba4999d9a5/modules/taskcluster_worker/templates/taskcluster-worker.yml.erb#l31

After that, I trimmed the mozharness / gecko ones that were used by test-macosx.sh (since we don't use that, but embed the tasks in the task payload):

https://hg.mozilla.org/mozilla-central/file/720b9177c685/taskcluster/scripts/tester/test-macosx.sh


> @@ +261,5 @@
> > +      ['python2.7', '-u', 'mozharness/scripts/' + mozharness['script']] \
> > +        if build_platform.startswith('macosx') else \
> > +      ['c:\\mozilla-build\\python\\python.exe', '-u', 'mozharness\\scripts\\' + normpath(mozharness['script'])] \
> > +        if build_platform.startswith('win') else \
> > +      ['python', '-u', 'mozharness/scripts/' + mozharness['script']]
> 
> This is some very tortured syntax, especially with the trailing `\`.  Please
> reformat into a regular multi-statement conditional.

OK

> 
> @@ +267,4 @@
> >      for mh_config in mozharness['config']:
> > +        cfg_path = 'mozharness/configs/' + mh_config
> > +        if build_platform.startswith('win'):
> > +            cfg_path = normpath(cfg_path)
> 
> Hm, so I guess `/` vs `\` matters in some places?

Yes, here it matters, so we have to do this.

> 
> ::: taskcluster/taskgraph/transforms/task.py
> @@ +221,5 @@
> > +        # a string array
> > +        Required('command'): Any(
> > +            [taskref_or_string],  # Windows
> > +            [[taskref_or_string]] # Linux / OS X
> > +        ),
> 
> This doesn't correspond to
> https://docs.taskcluster.net/reference/workers/generic-worker/payload which
> specifies it must always be an array of strings..

That is a limitation, that at the moment we only publish a single payload schema - I should fix that, so we publish both, and get the docs to list both.

Until now we only ran on Windows, so I hadn't invested effort into splitting this up, but probably would be wise to do now.

That payload schema actually comes from https://github.com/taskcluster/generic-worker/blob/09daf0b191b06d54259fe8db19fb3362b41cf45a/publish-payload-schema.sh#L6

The linux/mac schema is based on https://github.com/taskcluster/generic-worker/blob/09daf0b191b06d54259fe8db19fb3362b41cf45a/all-unix-style.yml which currently is not published.... So I'll need to do that, nice spot!

> @@ +232,5 @@
> >  
> >              # task image path from which to read artifact
> >              'path': basestring,
> > +
> > +            Optional('name'): basestring
> 
> Need a comment here

OK, will add.
(In reply to Pete Moore [:pmoore][:pete] from comment #19)
> Yeah good point. Shall we keep at as is for now?

Yes.

> Yes, the path works either way on Windows. I could special-case it to make
> the path with backslashes on Windows, but it isn't technically necessary.

OK, just checking :)

> > Where did these come from?  Most of them aren't needed -- IDLEIZER, in
> > particluar, is a moz-buildbot thing.  mozharness_test_on_native_engine has a
> > better list (and includes some things omitted here)
> 
> This was combined from the list of ones used in native engine
> implementation, plus the additional ones set for native engine in the
> taskcluster worker puppet config:
> 
> https://hg.mozilla.org/build/puppet/file/a5ba4999d9a5/modules/
> taskcluster_worker/templates/taskcluster-worker.yml.erb#l31
> 
> After that, I trimmed the mozharness / gecko ones that were used by
> test-macosx.sh (since we don't use that, but embed the tasks in the task
> payload):
> 
> https://hg.mozilla.org/mozilla-central/file/720b9177c685/taskcluster/scripts/
> tester/test-macosx.sh

Ah, ok.  Is it not possible to specify task env vars in any other way than in the task body?  Ideally most of those would still come from the host configuration, set via puppet.
(In reply to Dustin J. Mitchell [:dustin] from comment #20)

> Ah, ok.  Is it not possible to specify task env vars in any other way than
> in the task body?  Ideally most of those would still come from the host
> configuration, set via puppet.

Any env vars set in the generic-worker process would also be in the task environment (since for OS X we are currently not sandboxing users). Since I wasn't sure what these were, or if they were needed, my plan was to add them here, and then systematically remove entries until I found out which were really required. I'll check with wcosta if he knows what they are all for, maybe he knows from when they were added to the taskcluster-worker config file.
See Also: → 1354088
(In reply to Pete Moore [:pmoore][:pete] from comment #21)
> (In reply to Dustin J. Mitchell [:dustin] from comment #20)
> 
> > Ah, ok.  Is it not possible to specify task env vars in any other way than
> > in the task body?  Ideally most of those would still come from the host
> > configuration, set via puppet.
> 
> Any env vars set in the generic-worker process would also be in the task
> environment (since for OS X we are currently not sandboxing users). Since I
> wasn't sure what these were, or if they were needed, my plan was to add them
> here, and then systematically remove entries until I found out which were
> really required. I'll check with wcosta if he knows what they are all for,
> maybe he knows from when they were added to the taskcluster-worker config
> file.

I've created bug 1354088 and assigned to myself, to take care of whittling down this list of env vars at some point in the future. Since this rollout is key to getting the OS X migration completed on Mac, I don't want to block this bug on reducing the env var list, since it would add unnecessary delay; but cleanup in a quieter period would certainly be a good idea.
Addressed all review comments, and fixed lint issues. Also now uses generic-worker mounts feature to mount mozharness.zip from build task, rather than wget'ing it.

A couple of doc fixes improvements too.

Try push: https://treeherder.mozilla.org/#/jobs?repo=try&revision=dcc0da283da2ee98b58a0d924f6e18d88d9d49df
Attachment #8854876 - Attachment is obsolete: true
Attachment #8855297 - Flags: review?(dustin)
Comment on attachment 8855297 [details] [diff] [review]
bug1349980_gecko_v2.patch

Review of attachment 8855297 [details] [diff] [review]:
-----------------------------------------------------------------

I think this looks good!  If you can fix up the .rst I can get it landed.

::: taskcluster/docs/taskgraph.rst
@@ +183,5 @@
> +By default, the above commands will only output a list of tasks. Use `-J` flag
> +to output full task definitions. For example, from your mozilla-central
> +checkout:
> +
> +``~/hg/mozilla-central $ ./mach taskgraph optimized -J -p ~/Downloads/parameters.yml``

Multi-line code looks like

```
.. code-block: none

    $ ./mach taskgraph optimized -J -p ~/Downloads/parameters.yml
```

There are some examples in how-tos.rst.

By the way, it'd be nice to have this in a separate commit.  That's a lot easier to request if you're using mozreview (which supports reviewing multiple commits), so just FYI :)

::: taskcluster/taskgraph/transforms/job/mozharness.py
@@ +203,5 @@
>  
> +    if not job['attributes']['build_platform'].startswith('win'):
> +        raise Exception(
> +            "Task generation for mozharness build jobs currently only supported on Windows"
> +        )

I like this!!

@@ +232,5 @@
> +            dedent('''\
> +            :: sccache currently uses the full compiler commandline as input to the
> +            :: cache hash key, so create a symlink to the task dir and build from
> +            :: the symlink dir to get consistent paths.
> +            if exist z:\\build rmdir z:\\build'''),

This is a big part of why I don't like embedding the build script in the task.  I see why we do it for now though :)
Attachment #8855297 - Flags: review?(dustin) → review+
Pushed by pmoore@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/baade5ccb7c9
update task generation to support -g in try syntax for running OS X tasks in generic-worker,r=dustin
I've pushed without the taskcluster/docs/taskgraph.rst change - i'll do that in a second patch. I should learn how to use mozreview, but it is a simple patch, so I'll just do old-school for now, and learn mozreview when I'm back from PTO. :-)
See Also: → 1311966
Here, just the taskgraph.rst docs patch in isolation. Thanks!
Attachment #8855399 - Flags: review?(dustin)
Attachment #8855399 - Flags: review?(dustin) → review+
Pushed by pmoore@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/3fb5694f8961
Highlight the '-J' flag of mach taskgraph command in taskcluster docs,r=dustin
This is a pretty simple change to make sure OS X workers reboot between tasks. It is actually the same approach we use on Windows 7 in AWS[1]. We have a wrapper script that calls generic worker, and this runs generic worker, and then reboots. Additionally, in the generic worker config, we specify that the worker should run one task and then exit. Combining these features means that the worker reboots after running a task.

There isn't control over which tasks trigger a reboot - we simply reboot after every task. This should serve our initial needs, and we can always optimise this behaviour later, but it at least gets machines in a clean state at the start of the task, which they are not now.

Tested successfully on t-yosemite-r7-0049.test.releng.scl3.mozilla.com by pinning it to pmoore user env on releng-puppet2.srv.releng.scl3.mozilla.com. Also unpinned, after successful testing.

------

[1] https://github.com/mozilla-releng/OpenCloudConfig/blob/8867028aefc9a5ef2ffd1f2827d2c6f7a69bce5c/userdata/Configuration/GenericWorker/run-generic-worker-format-and-reboot.bat#L24
Attachment #8855802 - Flags: review?(dustin)
Attachment #8855802 - Attachment filename: bug1349980_puppet_reboot-os-x-workers-every-task_v1.sh → bug1349980_puppet_reboot-os-x-workers-every-task_v1.patch
Attachment #8855802 - Attachment is patch: true
Attachment #8855802 - Attachment mime type: application/x-sh → text/plain
Attachment #8855802 - Flags: review?(dustin) → review+
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
See Also: → 1357712
Component: Generic-Worker → Workers
You need to log in before you can comment on or make changes to this bug.