Closed Bug 1359976 Opened 3 years ago Closed 3 years ago

Distinguish host OS from worker implementation

Categories

(Firefox Build System :: Task Configuration, task)

task
Not set

Tracking

(Not tracked)

RESOLVED FIXED
mozilla55

People

(Reporter: dustin, Assigned: dustin)

Details

Attachments

(3 files, 1 obsolete file)

We've sort of survived so far guessing the host OS by the worker implementation: 
 generic-worker -> windows
 native-worker -> OS X
 docker-worker -> Linux
etc.  But the above is already wrong and in general won't last for long.

Options off the top of my head:
 - keep `worker-implementation` but extend it (e.g., `generic-worker-windows`)
 - keep `worker-implementation` and add `worker-os`

I think this will just affect tests, as that's where we try to run the "same thing" (e.g., reftests) across multiple platforms.
Looking through how this works right now, it's a little crufty, so there will be some cleanup.  After that:

 * worker-implementation will grow a suffix like "..-on-windows" where it's not clear from the name
 * worker-implementation will be defined based on the worker-type, using a static mapping from worker-type to implementation
Comment on attachment 8865687 [details]
Bug 1359976: clean out un-handled optionally_keyed_by;

https://reviewboard.mozilla.org/r/137324/#review140330
Attachment #8865687 - Flags: review?(wcosta) → review+
Comment on attachment 8865688 [details]
Bug 1359976: remove references to now-unused test-platform-phylum;

https://reviewboard.mozilla.org/r/137326/#review140332
Attachment #8865688 - Flags: review?(wcosta) → review+
Comment on attachment 8865689 [details]
Bug 1359976: base worker payload generation on worker-type

https://reviewboard.mozilla.org/r/137328/#review140490

::: taskcluster/taskgraph/transforms/job/mozharness.py:108
(Diff revision 1)
>          'MOZHARNESS_SCRIPT': run['script'],
>          'MH_BRANCH': config.params['project'],
>          'MH_BUILD_POOL': 'taskcluster',
>          'MOZ_BUILD_DATE': config.params['moz_build_date'],
>          'MOZ_SCM_LEVEL': config.params['level'],
> +        'MOZ_AUTOMATION': '1',

This seems sane, but unrelated to this bug. Is the addition of `'MOZ_AUTOMATION': '1'` intentional?

::: taskcluster/taskgraph/transforms/job/mozharness_test.py:52
(Diff revision 1)
>      Required('test'): test_description_schema,
>  })
>  
>  
> +def test_packages_url(taskdesc):
> +    """Account for ifferent platforms that name their test packages differently"""

ifferent? :-)

::: taskcluster/taskgraph/transforms/job/mozharness_test.py:103
(Diff revision 1)
>          'MOZHARNESS_SCRIPT': mozharness['script'],
>          'MOZILLA_BUILD_URL': {'task-reference': installer_url},
>          'NEED_PULSEAUDIO': 'true',
>          'NEED_WINDOW_MANAGER': 'true',
>          'ENABLE_E10S': str(bool(test.get('e10s'))).lower(),
> +        'MOZ_AUTOMATION': '1',

As above, did you intend to include this `MOZ_AUTOMATION` change?

::: taskcluster/taskgraph/transforms/job/mozharness_test.py:192
(Diff revision 1)
>      test = taskdesc['run']['test']
>      mozharness = test['mozharness']
>      worker = taskdesc['worker']
>  
> +    is_macosx = worker['implementation'] == 'generic-worker-on-macosx'
> +    is_windows = worker['implementation'] == 'generic-worker-on-windows'

nice!

::: taskcluster/taskgraph/transforms/job/mozharness_test.py:221
(Diff revision 1)
>  
>      worker['max-run-time'] = test['max-run-time']
>      worker['artifacts'] = artifacts
>  
> +    env = worker.setdefault('env', {})
> +    env['MOZ_AUTOMATION'] = '1'

I guess it was intentional as now this is the third reference. :D

::: taskcluster/taskgraph/transforms/job/mozharness_test.py
(Diff revision 1)
> -    target = 'firefox-{}.en-US.{}'.format(get_firefox_version(), 'mac') \
> -        if build_platform == 'macosx64' and build_type == 'opt' else 'target'
> -
>      installer_url = get_artifact_url('<build>', mozharness['build-artifact-name'])
> -    test_packages_url = get_artifact_url('<build>',
> -                                         'public/build/{}.test_packages.json'.format(target))

nice cleanup

::: taskcluster/taskgraph/transforms/job/mozharness_test.py:351
(Diff revision 1)
>      command = worker['command'] = ["./test-macosx.sh"]
>      if mozharness.get('no-read-buildbot-config'):
>          command.append("--no-read-buildbot-config")
>      command.extend([
>          {"task-reference": "--installer-url=" + installer_url},
> -        {"task-reference": "--test-packages-url=" + test_packages_url},
> +        {"task-reference": "--test-packages-url=" + test_packages_url()},

Shouldn't this be `test_packages_url(taskdesc)` ?

::: taskcluster/taskgraph/transforms/tests.py:754
(Diff revision 1)
>              worker['loopback-video'] = test['loopback-video']
>              worker['loopback-audio'] = test['loopback-audio']
>              worker['max-run-time'] = test['max-run-time']
>              worker['retry-exit-status'] = test['retry-exit-status']
> +        else:
> +            raise Exception("unknown worker implementation {}".format(implementation))

nice!
Comment on attachment 8865689 [details]
Bug 1359976: base worker payload generation on worker-type

https://reviewboard.mozilla.org/r/137328/#review140490

> This seems sane, but unrelated to this bug. Is the addition of `'MOZ_AUTOMATION': '1'` intentional?

Yes, it used to be added in the task transform to a whitelist of worker implementations, which didn't make a lot of sense -- we don't necessarily want MOZ_AUTOMATION on every docker-worker invocation; we want it on everything that runs mach.

> As above, did you intend to include this `MOZ_AUTOMATION` change?

yes

> Shouldn't this be `test_packages_url(taskdesc)` ?

Indeed.  That's the downside of carrying around dead code while we transition things :(
Comment on attachment 8865689 [details]
Bug 1359976: base worker payload generation on worker-type

https://reviewboard.mozilla.org/r/137328/#review145106

I couldn't see the fixes from my last review - apologies if you made the fixes but I was looking at the wrong diff context etc.

::: taskcluster/taskgraph/transforms/job/mozharness_test.py:52
(Diff revision 2)
>      Required('test'): test_description_schema,
>  })
>  
>  
> +def test_packages_url(taskdesc):
> +    """Account for ifferent platforms that name their test packages differently"""

This still says `ifferent` as I can see it - maybe I don't understand mozreview well enough - is this fixed now?

::: taskcluster/taskgraph/transforms/job/mozharness_test.py:351
(Diff revision 2)
>      command = worker['command'] = ["./test-macosx.sh"]
>      if mozharness.get('no-read-buildbot-config'):
>          command.append("--no-read-buildbot-config")
>      command.extend([
>          {"task-reference": "--installer-url=" + installer_url},
> -        {"task-reference": "--test-packages-url=" + test_packages_url},
> +        {"task-reference": "--test-packages-url=" + test_packages_url()},

Didn't we agree this should be `test_packages_url(taskdesc)`?

I suspect I'm just not understanding the mozreview workflow, and looking at an old version, but raising issue, just to be safe.
Attachment #8865689 - Flags: review?(pmoore) → review-
I think you're doing mozreview right. I suspect I failed to operate mercurial correctly and it threw my changes out. Sorry, fix coming up..
Wondering if it is not worker-type that should carry os information. I see host os and work-implementation as orthogonal features. The in-tree config code that glue both together.
Yeah, the worker type implies both, really.  Now that you mention it, I missed

>  * worker-implementation will be defined based on the worker-type, using a static mapping from worker-type to implementation

so I guess the question is, should that be a map from
  "gecko-b-1-win2012r2": "generic-worker-on-windows"
or
  "gecko-b-1-win2012r2": ("generic-worker", "windows")

Thoughts, wander?
Comment on attachment 8865689 [details]
Bug 1359976: base worker payload generation on worker-type

withdrawing this review for the moment while I figure that out
Attachment #8865689 - Flags: review?(pmoore)
(In reply to Dustin J. Mitchell [:dustin] from comment #17)
> Yeah, the worker type implies both, really.  Now that you mention it, I
> missed
> 
> >  * worker-implementation will be defined based on the worker-type, using a static mapping from worker-type to implementation
> 
> so I guess the question is, should that be a map from
>   "gecko-b-1-win2012r2": "generic-worker-on-windows"
> or
>   "gecko-b-1-win2012r2": ("generic-worker", "windows")
> 
> Thoughts, wander?

The second option seems better to me. Will you assembly the work-implementation from this info? My point is that worker-implementation should tell which worker to use, and the info about which os it is should be the worker-type. I see so because in the in-tree config, tasks are in general assembled by worker payload, which is the same independent of the OS.
I see what you mean.  I think workerType is too specific (we often have different workerTypes with the same implementation and OS), but a worker-os field makes sense.  Then we can select payload builder based on worker implementation, and pass the worker OS to that builder.  The last patch in this sequence suggests this will work well, since both generic-worker-on-* are implemented with the same function.
(In reply to Dustin J. Mitchell [:dustin] from comment #20)
> I see what you mean.  I think workerType is too specific (we often have
> different workerTypes with the same implementation and OS), but a worker-os
> field makes sense.  Then we can select payload builder based on worker
> implementation, and pass the worker OS to that builder.  The last patch in
> this sequence suggests this will work well, since both generic-worker-on-*
> are implemented with the same function.

I mean that the os information could be part of worker-type, but not restrict to it. But I am not opposed to worker-os field. I let you decide what's the best fit.
Attachment #8870836 - Attachment is obsolete: true
Comment on attachment 8865689 [details]
Bug 1359976: base worker payload generation on worker-type

https://reviewboard.mozilla.org/r/137328/#review146704

I'm not as familiar with where we need MOZ_AUTOMATION and test settings.  The portions I am familiar with look like good changes.  If you need a more in-depth review, let me know.

::: taskcluster/taskgraph/util/workertypes.py:33
(Diff revision 4)
> +    'aws-provisioner-v1/gecko-t-win7-32-gpu': ('generic-worker', 'windows'),
> +    'aws-provisioner-v1/taskcluster-generic': ('docker-worker', 'linux'),
> +    'buildbot-bridge/buildbot-bridge': ('buildbot-bridge', None),
> +    'invalid/invalid': ('invalid', None),
> +    'null-provisioner/human-breakpoint': ('push-apk-breakpoint', None),
> +    'null-provisioner/human-breakpoint': ('push-apk-breakpoint', None),

We may use the human breakpoint workerType elsewhere (other than pushapk), but aiui nothing depends on this impl, so we should be able to rename at any point.  It does look like you have a duplicate line here, though.
Attachment #8865689 - Flags: review?(aki) → review+
Comment on attachment 8865689 [details]
Bug 1359976: base worker payload generation on worker-type

https://reviewboard.mozilla.org/r/137328/#review146936

::: taskcluster/taskgraph/transforms/job/mozharness_test.py:306
(Diff revision 4)
>  
> -    if build_platform.startswith('win'):
> +    if is_windows:
>          worker['command'] = [
>              {'task-reference': ' '.join(mh_command)}
>          ]
> -    else:
> +    else:  # is_macosx

Maybe we should add en `elif is_macosx` here and the fall through `else` clause raises an exception.

::: taskcluster/taskgraph/transforms/task.py:404
(Diff revision 4)
>          Required('implementation'): 'push-apk-breakpoint',
>          Required('payload'): object,
>  
>      }, {
> +        Required('implementation'): 'invalid',
> +        # an invalid task is one whih should never actually be created; this is used in

s/whih/which
Attachment #8865689 - Flags: review?(wcosta) → review+
We're sorry, Autoland could not rebase your commits for you automatically. Please manually rebase your commits and try again.

hg error in cmd: hg rebase -s 8d57d210ea31 -d 9309326b2723: rebasing 399001:8d57d210ea31 "Bug 1359976: clean out un-handled optionally_keyed_by; r=wcosta"
merging taskcluster/taskgraph/transforms/job/__init__.py
rebasing 399002:60b45a99b078 "Bug 1359976: remove references to now-unused test-platform-phylum; r=wcosta"
merging taskcluster/taskgraph/transforms/tests.py
rebasing 399003:b12ba627ffda "Bug 1359976: base worker payload generation on worker-type; r=aki,wcosta" (tip)
merging taskcluster/ci/android-stuff/kind.yml
merging taskcluster/ci/build/android.yml
merging taskcluster/ci/build/linux.yml
merging taskcluster/ci/build/macosx.yml
merging taskcluster/ci/build/windows.yml
merging taskcluster/ci/upload-symbols/kind.yml
merging taskcluster/taskgraph/transforms/job/__init__.py
merging taskcluster/taskgraph/transforms/task.py
merging taskcluster/taskgraph/transforms/tests.py
warning: conflicts while merging taskcluster/ci/build/linux.yml! (edit, then use 'hg resolve --mark')
warning: conflicts while merging taskcluster/ci/build/macosx.yml! (edit, then use 'hg resolve --mark')
warning: conflicts while merging taskcluster/ci/build/windows.yml! (edit, then use 'hg resolve --mark')
unresolved conflicts (see hg resolve, then hg rebase --continue)
We're sorry, Autoland could not rebase your commits for you automatically. Please manually rebase your commits and try again.

hg error in cmd: hg rebase -s 8d57d210ea31 -d 02c2ade8e3ec: rebasing 399024:8d57d210ea31 "Bug 1359976: clean out un-handled optionally_keyed_by; r=wcosta"
merging taskcluster/taskgraph/transforms/job/__init__.py
rebasing 399025:60b45a99b078 "Bug 1359976: remove references to now-unused test-platform-phylum; r=wcosta"
merging taskcluster/taskgraph/transforms/tests.py
rebasing 399026:b12ba627ffda "Bug 1359976: base worker payload generation on worker-type; r=aki,wcosta" (tip)
merging taskcluster/ci/android-stuff/kind.yml
merging taskcluster/ci/build/android.yml
merging taskcluster/ci/build/linux.yml
merging taskcluster/ci/build/macosx.yml
merging taskcluster/ci/build/windows.yml
merging taskcluster/ci/upload-symbols/kind.yml
merging taskcluster/taskgraph/transforms/job/__init__.py
merging taskcluster/taskgraph/transforms/task.py
merging taskcluster/taskgraph/transforms/tests.py
warning: conflicts while merging taskcluster/ci/build/linux.yml! (edit, then use 'hg resolve --mark')
warning: conflicts while merging taskcluster/ci/build/macosx.yml! (edit, then use 'hg resolve --mark')
warning: conflicts while merging taskcluster/ci/build/windows.yml! (edit, then use 'hg resolve --mark')
unresolved conflicts (see hg resolve, then hg rebase --continue)
We're sorry, Autoland could not rebase your commits for you automatically. Please manually rebase your commits and try again.

hg error in cmd: hg rebase -s af4c1e331103 -d 445cb4d47d4f: rebasing 399021:af4c1e331103 "Bug 1359976: clean out un-handled optionally_keyed_by; r=wcosta"
rebasing 399022:56f69a946970 "Bug 1359976: remove references to now-unused test-platform-phylum; r=wcosta"
rebasing 399023:96567908e121 "Bug 1359976: base worker payload generation on worker-type; r=aki,wcosta" (tip)
merging taskcluster/ci/build/linux.yml
merging taskcluster/ci/build/macosx.yml
merging taskcluster/ci/build/windows.yml
warning: conflicts while merging taskcluster/ci/build/linux.yml! (edit, then use 'hg resolve --mark')
warning: conflicts while merging taskcluster/ci/build/macosx.yml! (edit, then use 'hg resolve --mark')
warning: conflicts while merging taskcluster/ci/build/windows.yml! (edit, then use 'hg resolve --mark')
unresolved conflicts (see hg resolve, then hg rebase --continue)
                                     _
OK, autoland, I've had it up to here   with you.  Inbound it is!
Product: TaskCluster → Firefox Build System
You need to log in before you can comment on or make changes to this bug.