Closed
Bug 1436037
Opened 7 years ago
Closed 6 years ago
Support generic-worker in run-task
Categories
(Firefox Build System :: Task Configuration, task)
Tracking
(firefox-esr60 fixed, firefox66 fixed)
RESOLVED
FIXED
People
(Reporter: ahal, Assigned: ahal)
References
(Blocks 2 open bugs)
Details
Attachments
(4 files, 2 obsolete files)
The end goal here is to run source-test tasks on OSX and Windows (but OSX is more important). So if I get anything wrong, please modify the bug as appropriate. Source-test tasks are based on the run_task infrastructure. Currently we have targets for 'docker-worker' and 'native-engine', but no support for 'generic-worker' (which is the implementation used to run OSX and Windows tests). Despite the fact that this has been an issue for over a year and I'm only filing a bug now (oops), it should be high priority as we don't have any place to run python-test tasks on OSX (since OSX builds are cross-compiled). We still run python-tests as part of the build on Windows so it's not as important. Though being able to pull the tests out of the build tasks would sure be nice. About 7 months ago, I started a patch but hit some kind of roadblock or ran out of time (I don't remember anymore). I'll push what I had and maybe that will be a good starting point.
Comment 2•7 years ago
mozreview-review
Comment on attachment 8948715 [details] Bug 1436037 - WIP Support generic-worker with run-task
https://reviewboard.mozilla.org/r/218106/#review223938

Code analysis found 2 defects in this patch:
- 2 defects found by mozlint

You can run this analysis locally with:
- `./mach lint path/to/file` (JS/Python)

If you see a problem in this automated review, please report it here: http://bit.ly/2y9N9Vx

::: taskcluster/taskgraph/transforms/job/run_task.py:132 (Diff revision 1)
> +
> +    if run.get('cache-dotcache'):
> +        raise Exception("No cache support on generic-worker; can't use cache-dotcache")
> +
> +    run_command = run['command']
> +    if isinstance(run_command, basestring):

Error: Undefined name 'basestring' [flake8: F821]

::: taskcluster/taskgraph/transforms/task.py:358 (Diff revision 1)
>      Optional('retry-exit-status'): Any(
>          int,
>          [int],
>      ),
> +    Optional('retry-exit-status'): int,
> +    Optional('context'): basestring,

Error: Undefined name 'basestring' [flake8: F821]
Comment 4•7 years ago
mozreview-review
Comment on attachment 8948715 [details] Bug 1436037 - WIP Support generic-worker with run-task
https://reviewboard.mozilla.org/r/218106/#review223948

Code analysis found 2 defects in this patch:
- 2 defects found by mozlint

You can run this analysis locally with:
- `./mach lint path/to/file` (JS/Python)

If you see a problem in this automated review, please report it here: http://bit.ly/2y9N9Vx

::: taskcluster/taskgraph/transforms/job/run_task.py:132 (Diff revision 2)
> +
> +    if run.get('cache-dotcache'):
> +        raise Exception("No cache support on generic-worker; can't use cache-dotcache")
> +
> +    run_command = run['command']
> +    if isinstance(run_command, basestring):

Error: Undefined name 'basestring' [flake8: F821]

::: taskcluster/taskgraph/transforms/task.py:357 (Diff revision 2)
>      # the exit status code(s) that indicates the task should be retried
>      Optional('retry-exit-status'): Any(
>          int,
>          [int],
>      ),
> +    Optional('context'): basestring,

Error: Undefined name 'basestring' [flake8: F821]
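For context on the F821 reports: `basestring` is a Python 2 builtin, so a linter running under Python 3 sees it as an undefined name. Below is a minimal sketch of a string check that works on both interpreters. The names `string_types` and `normalize_command` are hypothetical and not taken from the patch, and the `bash -cx` wrapping is only an illustration.

```python
import sys

# `basestring` exists only on Python 2; Python 3 has just `str`.
if sys.version_info[0] >= 3:
    string_types = (str,)
else:
    string_types = (basestring,)  # noqa: F821 (only evaluated on Python 2)


def normalize_command(run_command):
    """Return the command as a list, wrapping a bare string in a shell call.

    Hypothetical helper for illustration; not the code from the patch.
    """
    if isinstance(run_command, string_types):
        return ['bash', '-cx', run_command]
    return list(run_command)
```

An alternative that avoids the shim is to lint the file with the interpreter it targets, so that the Python 2 builtin is recognized.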
Comment 5•7 years ago
Hm, this is going to get annoying.
Assignee
Comment 6•7 years ago
I filed https://github.com/mozilla-releng/services/issues/819
Assignee
Comment 7•7 years ago
I think the next step in my patch was to actually make the 'generic_worker_run_task' function set things up properly for generic-worker (I'm not sure what this entails). We might also need modifications to the run-task script itself, especially if we start supporting Windows.
Assignee
Comment 8•7 years ago
I'm still poking around at this. Not sure if I'll end up getting this resolved, but don't want anyone else to pick this up and duplicate efforts in the meantime.
Assignee: nobody → ahalberstadt
Status: NEW → ASSIGNED
Comment 9•7 years ago
Thanks Andrew! I will concentrate on getting the mozprocess unit tests running on Windows in the meantime.
Assignee
Comment 11•7 years ago
I've been making some good progress, but might need some help now. The latest push gets the task running, and I am currently working through issues in run-task. The latest error happens because 'hg' is not installed on the generic-worker (I assume we'll also have to figure out how to get 'robustcheckout' installed): https://public-artifacts.taskcluster.net/EL4hZGyqRLqHnuFZbW3-ZA/0/public/logs/live_backing.log I'm told this will involve a change to https://github.com/mozilla-releng/OpenCloudConfig , though I don't see any reference to OSX in there.
Assignee
Comment 12•7 years ago
Filed: https://github.com/mozilla-releng/OpenCloudConfig/issues/120
Assignee
Comment 13•7 years ago
(p.s, looks like my in-tree fix to get the reviewbot using python2 worked \o/)
Comment 14•7 years ago
OCC is for Windows only. I thought that's what you meant when you said "generic-worker" (we wouldn't install hg in the generic worker, obviously -- it'd be installed on the host..) Macs are configured with https://wiki.mozilla.org/ReleaseEngineering/PuppetAgain. The old mac builders had hg and robustcheckout installed, so doing so for the testers shouldn't be too hard.
Updated•7 years ago
Product: TaskCluster → Firefox Build System
Comment 15•6 years ago
I have a desire to make this work as well. But I'll be aiming for Windows rather than macOS. Same difference (more or less). Since it looks like there is no activity on this bug, I hope you don't mind me stealing it.
Assignee: ahalberstadt → gps
Blocks: 1350956
Assignee
Comment 16•6 years ago
Please do! Sorry it got stalled out.
Comment 17•6 years ago
Unassigning myself since I'm not working on this right this moment and it looks like ahal has a new need for this.
Assignee: gps → nobody
Status: ASSIGNED → NEW
Assignee
Comment 18•6 years ago
For the record this isn't blocking my immediate needs and I don't have plans to start working on it again anytime soon. I will be working on bug 1465181 however, which will likely have a little bit of overlap with this one.
See Also: → 1465181
Assignee
Comment 19•6 years ago
I did look into this a little bit more for OSX, but run-task now requires python 3 and the generic-worker OSX hosts don't have that installed.
Comment 21•6 years ago
(In reply to Andrew Halberstadt [:ahal] from comment #0)
> The end goal here is to run source-test tasks on OSX and Windows (but OSX is
> more important).

Note this is needed for linux too now that we are migrating these source-test tasks from taskcluster-worker to generic-worker. I figure it makes sense to tackle all three platforms (windows, linux, macOS) at once rather than split into separate bugs, as the code to generate the task definition should be generic (with the noted exception that the format of task.payload.command is platform-specific).
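The platform-specific shape of task.payload.command could be isolated in one small helper so the rest of the task definition stays generic. This is a minimal sketch under an assumption that is not confirmed in this bug: that Windows workers expect a list of command-line strings, while macOS and Linux workers expect a list of argv arrays. `format_command` is a hypothetical name.

```python
import subprocess


def format_command(argv, platform):
    """Shape an argv list for task.payload.command.

    The two output formats are assumptions (see the text above), not
    something verified against the generic-worker schema here.
    """
    if platform == 'windows':
        # Assumed: one command-line string per command.
        return [subprocess.list2cmdline(argv)]
    # Assumed: one argv array per command on macOS and Linux.
    return [list(argv)]
```

Keeping the branch in a single function means a transform can build one argv list and only decide the final shape at the end.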
See Also: → 1478364
Comment 22•6 years ago
mozreview-review
Comment on attachment 8948715 [details] Bug 1436037 - WIP Support generic-worker with run-task
https://reviewboard.mozilla.org/r/218106/#review266358

This looks like a great start, thanks!

::: taskcluster/taskgraph/transforms/job/common.py:84 (Diff revision 3)
>      reserved for ``run-task`` tasks.
>      """
>      level = config.params['level']
>
> -    # native-engine does not support caches (yet), so we just do a full clone
> -    # every time :(
> +    # native-engine and generic-worker do not support caches (yet), so we just
> +    # do a full clone every time :(

generic-worker does support caches, see "Writable Directory Cache" in https://docs.taskcluster.net/docs/reference/workers/generic-worker/docs/payload

::: taskcluster/taskgraph/transforms/job/mach.py:29 (Diff revision 3)
>
>
>  @run_job_using("docker-worker", "mach", schema=mach_schema, defaults={'comm-checkout': False})
>  @run_job_using("native-engine", "mach", schema=mach_schema, defaults={'comm-checkout': False})
> +@run_job_using("generic-worker", "mach", schema=mach_schema, defaults={'comm-checkout': False})
>  def docker_worker_mach(config, job, taskdesc):

should this function be renamed?

::: taskcluster/taskgraph/transforms/job/run_task.py:121 (Diff revision 3)
> +    worker = taskdesc['worker'] = job['worker']
> +    command = ['./run-task']
> +    common_setup(config, job, taskdesc, command)
> +
> +    if run.get('cache-dotcache'):
> +        raise Exception("No cache support on generic-worker; can't use cache-dotcache")

Caches are supported - see https://docs.taskcluster.net/docs/reference/workers/generic-worker/docs/payload

::: tools/tryselect/tasks.py:72 (Diff revision 3)
>          params.check()
>      except ParameterMismatch as e:
>          print(PARAMETER_MISMATCH.format(e.args[0]))
>          sys.exit(1)
>
> -    taskgraph.fast = True
> +    taskgraph.fast = False

What does this boolean change do? Could you add a comment? Thanks!
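To illustrate the review point about caches, here is a sketch of how a task definition might request a writable directory cache through `task.payload.mounts`. The `cacheName` and `directory` keys follow my reading of the generic-worker payload documentation linked in the review; the helper names and the cache naming scheme are hypothetical.

```python
def cache_mount(cache_name, directory):
    """Build one writable directory cache entry (keys assumed from the payload docs)."""
    return {'cacheName': cache_name, 'directory': directory}


def add_checkout_cache(payload, level, directory='checkouts'):
    """Append a checkout cache to payload['mounts'] and return the cache name."""
    # Hypothetical naming scheme; the real tree may name caches differently.
    cache_name = 'level-{}-checkouts'.format(level)
    payload.setdefault('mounts', []).append(cache_mount(cache_name, directory))
    return cache_name


# Usage: start from a payload that only has a command and add the cache.
payload = {'command': [['./run-task', '--', 'ls']]}
add_checkout_cache(payload, level='1')
```

A real task would presumably also need a scope granting access to the named cache; that part is left out of the sketch.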
Comment 23•6 years ago
I have a patch series at https://hg.mozilla.org/users/gszorc_mozilla.com/firefox/rev/2d16aac7 and https://hg.mozilla.org/users/gszorc_mozilla.com/firefox/rev/a5484c1e that may be useful in implementing this functionality. I have no immediate plans to resume that work.
Assignee
Comment 24•6 years ago
mozreview-review-reply
Comment on attachment 8948715 [details] Bug 1436037 - WIP Support generic-worker with run-task
https://reviewboard.mozilla.org/r/218106/#review266358

> What does this boolean change do? Could you add a comment? Thanks!

Ah, this was just to make things easier for me to debug with |mach try| (we skip schema validation in `fast` mode which causes strange errors). This hunk should be reverted before landing.
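The trade-off behind a `fast` switch can be shown with a toy validator. This is only a sketch of the general pattern, assuming a made-up schema; `REQUIRED_KEYS` and `check_task` are hypothetical and unrelated to the real taskgraph code.

```python
REQUIRED_KEYS = ('label', 'worker-type')  # hypothetical schema


def check_task(task, fast=False):
    """Validate `task` unless `fast` is set, then return it unchanged."""
    if not fast:
        missing = [key for key in REQUIRED_KEYS if key not in task]
        if missing:
            raise ValueError('task is missing keys: {}'.format(', '.join(missing)))
    return task
```

With `fast=True` a malformed task passes through silently and fails later in a less obvious place, which matches the "strange errors" described in the reply.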
Assignee
Comment 25•6 years ago
This might have just been fixed as part of bug 1474570. I'll try and use the work in there to stand up some OSX python-test tasks and see if I run into any problems. If not, we can resolve this.
Depends on: 1474570
Assignee
Comment 26•6 years ago
I have patches that enable Windows generic-worker tasks to work with run-task: https://treeherder.mozilla.org/#/jobs?repo=try&revision=3b3149fec49ce3e34add364dbeba94dd8ed6ba3b However I created a bit of collateral damage around toolchains and caches. Reverting the change that caused this would be easy, but I have a feeling the existing behaviour is wrong and want to get some feedback before continuing. Because of support_vcs_checkout, we mount a hardcoded cache at /builds/worker/checkouts: https://searchfox.org/mozilla-central/source/taskcluster/taskgraph/transforms/job/common.py#87 This is where most tasks checkout Gecko. However some tasks (I think just toolchains) checkout Gecko to /builds/worker/workspace/build instead: https://searchfox.org/mozilla-central/source/taskcluster/taskgraph/transforms/job/toolchain.py#135 I believe this mismatch between cache and checkout dir is a bug, but just want to verify it isn't expected. Assuming it is a bug, is there any reason not to change toolchains to use /builds/worker/checkouts like everything else?
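The suspected mismatch between cache and checkout directory can be expressed as a simple path check. A minimal sketch, assuming absolute POSIX paths; `covering_cache` is a hypothetical helper and not part of taskgraph.

```python
import posixpath


def covering_cache(checkout_dir, cache_dirs):
    """Return the first cache dir that contains checkout_dir, or None.

    Paths are assumed to be absolute POSIX paths.
    """
    checkout_dir = posixpath.normpath(checkout_dir)
    for cache_dir in cache_dirs:
        cache_dir = posixpath.normpath(cache_dir)
        # A cache covers the checkout if it is the common ancestor of both.
        if posixpath.commonpath([checkout_dir, cache_dir]) == cache_dir:
            return cache_dir
    return None
```

With only `/builds/worker/checkouts` mounted, a checkout at `/builds/worker/workspace/build` has no covering cache, which is the situation being asked about.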
Flags: needinfo?(gps)
Comment 27•6 years ago
(In reply to Andrew Halberstadt [:ahal] from comment #26)
> Because of support_vcs_checkout, we mount a hardcoded cache at
> /builds/worker/checkouts:
> https://searchfox.org/mozilla-central/source/taskcluster/taskgraph/transforms/job/common.py#87
>
> This is where most tasks checkout Gecko. However some tasks (I think just
> toolchains) checkout Gecko to /builds/worker/workspace/build instead:

All builds also use that directory (among others): https://searchfox.org/mozilla-central/search?q=%7Bworkdir%7D%2Fworkspace%2Fbuild&path=taskcluster and that directory is also cached: https://searchfox.org/mozilla-central/source/taskcluster/taskgraph/transforms/job/common.py#22-33

As I understand it, the original intention was that things in /builds/worker/checkouts were supposed to not modify the checkout directory, whereas ones at /builds/worker/workspace were allowed to. Given that we always clobber the checkout anyway, perhaps this distinction isn't valuable anymore. I seem to recall :gps expressing concern about having the hg-share directory (/builds/worker/checkouts/hg-share) on a different filesystem than the checkout, which occurs in the /builds/worker/workspace case, so it might be worth changing things on that basis.
Assignee
Comment 28•6 years ago
Ok I'll keep the needinfo to let gps chime in if he wants, but sounds like if there is work to do here it would be better left to a follow-up anyway. I'll refactor my patch to try and preserve the existing behaviour.
Assignee
Comment 29•6 years ago
Also looks like toolchains never call that method so aren't using the workspace cache.
Assignee
Comment 30•6 years ago
Also also they pass in sparse=True: https://searchfox.org/mozilla-central/source/taskcluster/taskgraph/transforms/job/toolchain.py#132 This value only gets used to build the checkouts cache, so that sort of implies the intent is that they *are* supposed to be using that cache.
Assignee
Comment 31•6 years ago
This enables Windows generic-worker based tasks to use the run-task script. MozReview-Commit-ID: C07FANaYzf7
Assignee
Comment 32•6 years ago
The following python-test paths are being moved out of 'make check' and into their own task:
- python/mozlint
- testing/mozbase
- tools/lint

The following python-test paths previously did not run on Windows:
- python/mozterm
- testing/marionette
- testing/raptor
- tools/tryselect

MozReview-Commit-ID: C07FANaYzf7

Depends on D10758
Assignee
Comment 33•6 years ago
https://treeherder.mozilla.org/#/jobs?repo=try&revision=ef0d47c19767074b011cba0bbbe9053d1a3f7b6f
Assignee
Updated•6 years ago
Assignee: nobody → ahal
Status: NEW → ASSIGNED
Comment 34•6 years ago
Pushed by ahalberstadt@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/e072757bf691
[taskgraph] Support Windows generic-worker with run-task, r=gps
https://hg.mozilla.org/integration/autoland/rev/914a7a899dd1
[python] Create Windows python-test tasks, r=gps
Assignee
Comment 35•6 years ago
Have an OSX patch nearly ready to go, going to re-use this bug.
Flags: needinfo?(gps)
Keywords: leave-open
Comment 36•6 years ago
Backed out for jsbench failures.
Push that started the failures: https://treeherder.mozilla.org/#/jobs?repo=autoland&resultStatus=pending%2Crunning%2Ctestfailed%2Cbusted%2Cexception&classifiedState=unclassified&group_state=expanded&revision=914a7a899dd1418ad957e8162eb93fa63df37507&selectedJob=211893650
Failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=211893650&repo=autoland&lineNumber=107
Backout: https://hg.mozilla.org/integration/autoland/rev/09be3daa07878cd46f6b30281bf159038ca0c2fe
Flags: needinfo?(ahal)
Comment 37•6 years ago
Also, this is blocked by bug 1503756 relanding.
Assignee
Comment 38•6 years ago
(In reply to Mike Hommey [:glandium] from comment #37) > Also, this is blocked by bug 1503756 relanding. Sorry, forgot about that!
Flags: needinfo?(ahal)
Comment 40•6 years ago
This is relevant to my interests (the windows support more urgently, but also macOS). If this is stalled due to lack of developer time let me know and I can help out.
Blocks: 1508828
Assignee
Comment 41•6 years ago
The Windows patch is finished, reviewed and ready to land (and I have the OSX patch working locally, but haven't polished it up for review yet). Unfortunately it's blocked on bug 1503756.

Basically there's a bug in the docker image generation such that when an image is re-built, a bunch of system changes get pulled in. In that bug it was discovered that some system changes had caused a mochitest to start failing and the emulator on Android to fail at startup. So any modification to a file that would cause an image re-build (like run-task in this case) starts failing tests.

So one of two things needs to happen:
1) Fix our image generation to not pull in random changes (ideal)
2) Figure out why the emulator stops working with those changes (band-aid, but at least unblocks this)

There is *a lot* of stuff blocked on that other bug and it looks like :jmaher is trying to ring some bells. Feel free to chime in over there to further bump up the priority.
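Why touching run-task triggers an image re-build can be illustrated with a content hash over an image's input files. This is a minimal sketch under the assumption that an image's identity is derived from the contents of its context files; `context_hash` and the file names are illustrative only and not the actual taskgraph implementation.

```python
import hashlib


def context_hash(files):
    """Hash a mapping of path -> bytes in a stable (sorted) order."""
    digest = hashlib.sha256()
    for path in sorted(files):
        digest.update(path.encode('utf-8'))
        digest.update(b'\0')
        digest.update(files[path])
        digest.update(b'\0')
    return digest.hexdigest()


before = context_hash({'Dockerfile': b'FROM base', 'run-task': b'v1'})
after = context_hash({'Dockerfile': b'FROM base', 'run-task': b'v2'})
```

Here `before` and `after` differ, so any change to an input file yields a new image. If the build itself is not reproducible, that new image also picks up whatever system packages changed in the meantime.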
Comment 42•6 years ago
Thanks for the quick summary! I'll see if I can make any progress on that bug as my background task.
Comment 43•6 years ago
This is now unblocked from landing. However, the patch appears to be bit rotted with merge conflicts. Could you please rebase?
Flags: needinfo?(ahal)
Assignee
Comment 44•6 years ago
Thanks for unblocking this! Rebased and try run in progress here: https://treeherder.mozilla.org/#/jobs?repo=try&revision=bb20c8898ff855a6eced924e433224412592c5b4
Flags: needinfo?(ahal)
Comment 45•6 years ago
Pushed by ahalberstadt@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/df5747b8931b
[taskgraph] Support Windows generic-worker with run-task, r=gps
https://hg.mozilla.org/integration/autoland/rev/c88d2cb951ca
[python] Create Windows python-test tasks, r=gps
Comment 46•6 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/df5747b8931b https://hg.mozilla.org/mozilla-central/rev/c88d2cb951ca
Assignee
Comment 47•6 years ago
This is staying open for OSX support, which I had working a while back. I just need to rebase, test and get it ready for review.
Assignee
Comment 48•6 years ago
This isn't a hard blocker, but will make it easier and there is a patch uploaded. So might as well wait.
Assignee
Comment 49•6 years ago
This enables OSX generic-worker based tasks to use the run-task script.
Assignee
Comment 50•6 years ago
Depends on D14900
Assignee
Updated•6 years ago
Attachment #8948715 -
Attachment is obsolete: true
Assignee
Comment 51•6 years ago
Latest try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=b2de4dc1786ac0763a0d705f2ada0d1c0de98a33
Assignee
Comment 52•6 years ago
I broke the jsshell-bench tasks, will need to investigate further tomorrow.
Updated•6 years ago
Attachment #9032260 -
Attachment description: Bug 1436037 - [taskgraph] Support OSX generic-worker in run-task, r?gps → Bug 1436037 - [taskgraph] Support OSX generic-worker in run-task, r?Callek
Assignee
Comment 53•6 years ago
https://treeherder.mozilla.org/#/jobs?repo=try&revision=e7a1b6b337c1d9e07948c5b1fe92c80191e33390
Assignee
Updated•6 years ago
Keywords: leave-open
Comment 54•6 years ago
Pushed by ahalberstadt@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/6ffe1e52008f
[taskgraph] Support OSX generic-worker in run-task, r=Callek
https://hg.mozilla.org/integration/autoland/rev/0b74d496b797
[ci] Run mozbase and mozlint python-test tasks on OSX, r=jmaher
Comment 55•6 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/6ffe1e52008f https://hg.mozilla.org/mozilla-central/rev/0b74d496b797
Comment 56•6 years ago
Comment 57•6 years ago
bugherder uplift
https://hg.mozilla.org/releases/mozilla-esr60/rev/3187f84bf01a
https://hg.mozilla.org/releases/mozilla-esr60/rev/135c44a78e6f
status-firefox-esr60:
--- → fixed
Updated•5 years ago
Attachment #9047285 -
Attachment is obsolete: true