Closed Bug 1449981 Opened 6 years ago Closed 6 years ago

gecko-3-b-2012 can't find taskcluster-proxy

Categories

(Infrastructure & Operations :: RelOps: General, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: mozilla, Unassigned)

Details

Background: in bug 1442793 we need to support private artifacts. In docker-worker, enabling taskcluster-proxy and downloading artifacts via http://taskcluster/queue/... URLs works. I tried to do the same with generic-worker.

The docs [1] state that generic-worker >=10.6.0 supports the taskclusterProxy feature. The logs [2] show that we're running generic-worker 10.6.0, but they also show `Could not start taskcluster proxy: exec: "taskcluster-proxy": executable file not found in %PATH%`

Could we add taskcluster-proxy to %PATH% on gecko-3-b-2012?

[1] https://docs.taskcluster.net/reference/workers/generic-worker/docs/features#feature-taskclusterproxy
[2] https://taskcluster-artifacts.net/HyLv2HeIT6-cunfK46duPg/0/public/logs/live_backing.log
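For illustration, a minimal Python sketch of how such proxy URLs are put together. It assumes the proxy exposes each service as `http://<host>/<service>/v1/...` (as in the `http://taskcluster/queue/...` URLs above) and uses the queue's artifact paths: `task/<taskId>/artifacts/<name>` for the latest run and `task/<taskId>/runs/<runId>/artifacts/<name>` for a specific run (the latter form appears in the logs later in this bug). The function name `proxy_artifact_url` is hypothetical.

```python
from typing import Optional
from urllib.parse import quote


def proxy_artifact_url(task_id: str, artifact_name: str,
                       host: str = "taskcluster",
                       run_id: Optional[int] = None) -> str:
    """Build a queue artifact URL routed through taskcluster-proxy.

    Assumption: the proxy signs the request with the task's credentials,
    so a plain HTTP GET from inside the task can fetch private artifacts.
    """
    # Artifact names use "/" separators (e.g. public/logs/live_backing.log);
    # keep those as path separators but percent-encode anything else unsafe.
    name = quote(artifact_name, safe="/")
    if run_id is None:
        # Latest run of the task.
        return f"http://{host}/queue/v1/task/{task_id}/artifacts/{name}"
    return f"http://{host}/queue/v1/task/{task_id}/runs/{run_id}/artifacts/{name}"
```

Per the later comments in this bug, `host="localhost"` works as soon as the proxy is installed, while `host="taskcluster"` additionally needs a hosts-file entry on Windows.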
Hi Aki,

Indeed, this might be the first time taskcluster-proxy is being used on gecko-3-b-win2012.

It will need to be installed as part of the worker type environment setup. The binaries are available here: https://github.com/taskcluster/taskcluster-proxy/releases

You can use the livelog.exe installation step as a template for this:

https://github.com/mozilla-releng/OpenCloudConfig/blob/4902e29a9664e890db01502f8154d70c730ab39c/userdata/Manifest/gecko-3-b-win2012.json#L1007-L1019

The taskcluster-proxy download step will essentially be the same as the livelog download step, but point to the taskcluster-proxy binary instead of the livelog binary, have a different SHA512, be downloaded to a different filename, etc. I believe it may need to be published to tooltool as an internal artifact first (although grenade can confirm, because he might have made this automatic).
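Assuming the manifest's SHA512 is used to validate the downloaded binary, the check amounts to hashing the file and comparing hex digests. This is a hedged Python illustration of that idea, not OpenCloudConfig code; the names `sha512_of_file` and `verify_sha512` are hypothetical.

```python
import hashlib


def sha512_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-512 digest of a file, reading it in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha512(path: str, expected_hex: str) -> str:
    """Raise ValueError unless the file's SHA-512 matches expected_hex."""
    actual = sha512_of_file(path)
    if actual != expected_hex.strip().lower():
        raise ValueError(
            f"SHA512 mismatch for {path}: expected {expected_hex}, got {actual}"
        )
    return actual
```

For example, `verify_sha512(r"C:\generic-worker\taskcluster-proxy.exe", expected)`, where `expected` is the digest recorded for the release binary.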

It would probably also be best to add "taskclusterProxyExecutable" to the global config file, so that it is set in any generic-worker 10+ deployments we do in the future:

https://github.com/mozilla-releng/OpenCloudConfig/blob/master/userdata/Configuration/GenericWorker/generic-worker.config

See `generic-worker --help` for more information about generic-worker config settings.

I think this is everything. I'm on PTO this week, so I'm not around much, but grenade can help with OCC stuff. If there are any problems in the worker, I'll be back on Tuesday next week (3 April); otherwise, people in #taskcluster should be able to help.

Good luck!
Assignee: nobody → relops
Component: Generic-Worker → RelOps
Product: Taskcluster → Infrastructure & Operations
QA Contact: pmoore → klibby
For consistency, I'd recommend downloading to C:\generic-worker\taskcluster-proxy.exe, which is where the other worker-related binaries are. I'd also recommend installing the latest release rather than an earlier one, since the latest release has some security fixes that are specific to running as a native executable rather than inside a docker container.
Thanks Rob!

You may also need to set "taskclusterProxyExecutable" in the generic-worker configs in the worker type definitions.

Let me know if you hit any problems.
beta worker type definitions updated with:
        "taskclusterProxyExecutable": "C:\\generic-worker\\taskcluster-proxy.exe",
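As an illustration only, the change amounts to one extra key in each generic-worker config object. A minimal Python sketch (the helper name `with_proxy_executable` is hypothetical, not part of OpenCloudConfig) that adds the key without overwriting an existing value; serializing with `json.dumps` produces the doubled backslashes seen in the line above.

```python
import json

# Path recommended in this bug for the proxy binary.
PROXY_EXE = "C:\\generic-worker\\taskcluster-proxy.exe"


def with_proxy_executable(config: dict, path: str = PROXY_EXE) -> dict:
    """Return a copy of a generic-worker config dict with
    taskclusterProxyExecutable set, keeping any value already present."""
    updated = dict(config)
    updated.setdefault("taskclusterProxyExecutable", path)
    return updated


if __name__ == "__main__":
    # json.dumps escapes each backslash, giving the "C:\\generic-worker\\..."
    # form used in the worker type definitions.
    print(json.dumps(with_proxy_executable({}), indent=2))
```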
(In reply to Rob Thijssen (:grenade UTC+2) PTO till Apr 2 from comment #5)
> beta worker type definitions updated with:
>         "taskclusterProxyExecutable":
> "C:\\generic-worker\\taskcluster-proxy.exe",

Thanks Rob!

I'm afraid I've only just remembered, but if Aki wishes to use URLs like http://taskcluster/<service>/v1/<path> to be consistent with docker-worker, then we'll also need to update the C:\Windows\System32\drivers\etc\hosts file with an entry mapping "taskcluster" to 127.0.0.1:

127.0.0.1 taskcluster

I forgot to mention this before, apologies. Name resolution can't be handled by generic-worker directly; it is a matter for the host environment's name resolution.

Note, though, that even without the hosts file fix, it *is* still possible to use taskcluster-proxy as it is now, by using "localhost" or "127.0.0.1" as the URL host (e.g. http://localhost/<service>/v1/<path>).
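If a script wanted to pick the host name depending on whether that entry exists, a hedged Python sketch could parse the hosts file text. The helper names `hosts_maps` and `proxy_host` are hypothetical; the code only reads text and does not modify the hosts file.

```python
# Location of the hosts file on Windows, as given above.
HOSTS_PATH = r"C:\Windows\System32\drivers\etc\hosts"


def hosts_maps(hosts_text: str, hostname: str, address: str = "127.0.0.1") -> bool:
    """True if hosts_text has an uncommented line mapping hostname to address."""
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding space
        if not line:
            continue
        fields = line.split()
        aliases = [name.lower() for name in fields[1:]]
        if fields[0] == address and hostname.lower() in aliases:
            return True
    return False


def proxy_host(hosts_text: str) -> str:
    """Prefer the docker-worker style host name; fall back to localhost."""
    return "taskcluster" if hosts_maps(hosts_text, "taskcluster") else "localhost"


# Usage on a worker (not run here):
#     with open(HOSTS_PATH) as f:
#         host = proxy_host(f.read())
```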
Aki, I've just reread the context here (sorry, I've been on PTO and away from my computer) and see that if this is just about downloading private artifacts, taskcluster-proxy is not needed: this is a native feature of generic-worker. See https://bugzilla.mozilla.org/show_bug.cgi?id=1442793#c82 for details.
Flags: needinfo?(aki)
[taskcluster 2018-03-30T13:30:50.446Z] Worker Type (gecko-1-b-win2012-beta) settings:
[taskcluster 2018-03-30T13:30:50.446Z]   {
[taskcluster 2018-03-30T13:30:50.446Z]     "aws": {
[taskcluster 2018-03-30T13:30:50.446Z]       "ami-id": "ami-bd1302dd",
[taskcluster 2018-03-30T13:30:50.446Z]       "availability-zone": "us-west-1b",
[taskcluster 2018-03-30T13:30:50.446Z]       "instance-id": "i-065329f2a22addbe5",
[taskcluster 2018-03-30T13:30:50.446Z]       "instance-type": "c4.4xlarge",
[taskcluster 2018-03-30T13:30:50.446Z]       "local-ipv4": "10.143.30.74",
[taskcluster 2018-03-30T13:30:50.446Z]       "public-hostname": "ec2-54-153-121-157.us-west-1.compute.amazonaws.com",
[taskcluster 2018-03-30T13:30:50.446Z]       "public-ipv4": "54.153.121.157"
[taskcluster 2018-03-30T13:30:50.446Z]     },
[taskcluster 2018-03-30T13:30:50.446Z]     "config": {
[taskcluster 2018-03-30T13:30:50.446Z]       "deploymentId": "27d2d40b6bec",
[taskcluster 2018-03-30T13:30:50.446Z]       "runTasksAsCurrentUser": false
[taskcluster 2018-03-30T13:30:50.446Z]     },
[taskcluster 2018-03-30T13:30:50.446Z]     "generic-worker": {
[taskcluster 2018-03-30T13:30:50.446Z]       "go-arch": "amd64",
[taskcluster 2018-03-30T13:30:50.446Z]       "go-os": "windows",
[taskcluster 2018-03-30T13:30:50.446Z]       "go-version": "go1.10",
[taskcluster 2018-03-30T13:30:50.446Z]       "release": "https://github.com/taskcluster/generic-worker/releases/tag/v10.7.7",
[taskcluster 2018-03-30T13:30:50.446Z]       "revision": "dd9f0a6e8e7e6b0bce6d3f385d14204817339b31",
[taskcluster 2018-03-30T13:30:50.446Z]       "source": "https://github.com/taskcluster/generic-worker/commits/dd9f0a6e8e7e6b0bce6d3f385d14204817339b31",
[taskcluster 2018-03-30T13:30:50.446Z]       "version": "10.7.7"
[taskcluster 2018-03-30T13:30:50.446Z]     },
[taskcluster 2018-03-30T13:30:50.446Z]     "machine-setup": {
[taskcluster 2018-03-30T13:30:50.446Z]       "ami-created": "2018-03-29 18:03:11.674Z",
[taskcluster 2018-03-30T13:30:50.446Z]       "manifest": "https://github.com/mozilla-releng/OpenCloudConfig/blob/27d2d40b6beca9414a36ccccaf44a7c6ddef1df6/userdata/Manifest/gecko-1-b-win2012-beta.json"
[taskcluster 2018-03-30T13:30:50.446Z]     }
[taskcluster 2018-03-30T13:30:50.446Z]   }
[taskcluster 2018-03-30T13:30:50.448Z] Task ID: Os4KRQLmS2eYryrLbDhV6g
[taskcluster 2018-03-30T13:30:50.448Z] === Task Starting ===
[taskcluster 2018-03-30T13:30:51.416Z] Uploading redirect artifact public/logs/live.log to URL https://g2mxthiaaaawe52rie4v4vsjisvq62af7x5owkuqnwbfspx6.taskcluster-worker.net:60023/log/ZFroU3wYQiy1cBbjyJk6yQ with mime type "text/plain; charset=utf-8" and expiry 2018-03-30T14:30:36.601Z
[taskcluster 2018-03-30T13:30:52.745Z] Executing command 0: wget -O- -q http://localhost/queue/v1/task/%TASK_ID%

Z:\task_1522416594>wget -O- -q http://localhost/queue/v1/task/Os4KRQLmS2eYryrLbDhV6g 
{
  "provisionerId": "aws-provisioner-v1",
  "workerType": "gecko-1-b-win2012-beta",
  "schedulerId": "-",
  "taskGroupId": "Os4KRQLmS2eYryrLbDhV6g",
  "dependencies": [],
  "requires": "all-completed",
  "routes": [],
  "priority": "lowest",
  "retries": 5,
  "created": "2018-03-30T13:30:40.012Z",
  "deadline": "2018-03-31T13:30:40.012Z",
  "expires": "2019-03-31T13:30:40.012Z",
  "scopes": [],
  "payload": {
    "features": {
      "taskclusterProxy": true
    },
    "maxRunTime": 3600,
    "command": [
      "wget -O- -q http://localhost/queue/v1/task/%TASK_ID%"
    ]
  },
  "metadata": {
    "name": "taskcluster-proxy-test-win2012",
    "description": "Test taskcluster-proxy on gecko-1-b-win2012-beta",
    "owner": "pmoore@mozilla.com",
    "source": "https://github.com/taskcluster/taskcluster-proxy"
  },
  "tags": {},
  "extra": {}
}[taskcluster 2018-03-30T13:30:53.029Z]    Exit Code: 0
[taskcluster 2018-03-30T13:30:53.029Z]    User Time: 0s
[taskcluster 2018-03-30T13:30:53.029Z]  Kernel Time: 0s
[taskcluster 2018-03-30T13:30:53.029Z]    Wall Time: 283.0003ms
[taskcluster 2018-03-30T13:30:53.029Z]       Result: SUCCEEDED
[taskcluster 2018-03-30T13:30:53.029Z] === Task Finished ===
[taskcluster 2018-03-30T13:30:53.029Z] Task Duration: 284.0037ms
[taskcluster 2018-03-30T13:30:53.557Z] Uploading redirect artifact public/logs/live.log to URL https://queue.taskcluster.net/v1/task/Os4KRQLmS2eYryrLbDhV6g/runs/0/artifacts/public/logs/live_backing.log with mime type "text/plain; charset=utf-8" and expiry 2019-03-31T13:30:40.012Z
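The test task in this log can be approximated with a small Python sketch that builds an equivalent task definition. Field values are copied from the log; the helper names, the `owner` argument, and the deadline/expiry offsets (1 day, then a further 365 days, which reproduce the timestamps shown) are assumptions. Submitting the task (e.g. via the queue's `createTask`) is left out.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def tc_timestamp(dt: datetime) -> str:
    """Format an aware datetime as UTC with milliseconds, e.g. 2018-03-30T13:30:40.012Z."""
    utc = dt.astimezone(timezone.utc)
    return utc.isoformat(timespec="milliseconds").replace("+00:00", "Z")


def proxy_test_task(worker_type: str, url: str, owner: str,
                    created: Optional[datetime] = None) -> dict:
    """Build a task definition shaped like the one in the log above.

    %TASK_ID% is left in the command for cmd.exe to expand on the worker.
    """
    created = created or datetime.now(timezone.utc)
    deadline = created + timedelta(days=1)
    expires = deadline + timedelta(days=365)
    return {
        "provisionerId": "aws-provisioner-v1",
        "workerType": worker_type,
        "created": tc_timestamp(created),
        "deadline": tc_timestamp(deadline),
        "expires": tc_timestamp(expires),
        "payload": {
            # Asks generic-worker to start taskcluster-proxy for this task.
            "features": {"taskclusterProxy": True},
            "maxRunTime": 3600,
            "command": [f"wget -O- -q {url}"],
        },
        "metadata": {
            "name": "taskcluster-proxy-test-win2012",
            "description": f"Test taskcluster-proxy on {worker_type}",
            "owner": owner,
            "source": "https://github.com/taskcluster/taskcluster-proxy",
        },
    }
```

The logged task has `"scopes": []`, which is enough here because fetching a task definition is public. Fetching a private artifact through the proxy would, as far as I know, additionally require a scope such as `queue:get-artifact:<name>` on the task.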
(In reply to Pete Moore [:pmoore][:pete] from comment #7)
> Aki - I've just reread the context here (sorry I've been on PTO and away
> from my computer) - and see if this is just about downloading private
> artifacts, taskcluster-proxy is not needed, this is a native feature of
> generic-worker. See https://bugzilla.mozilla.org/show_bug.cgi?id=1442793#c82
> for details.

a) Awesome.
b) From the taskgraph and script side, consistency is better. We already have it set up so that all repackage tasks enable taskcluster-proxy if there are private artifacts to download, and the repackage script takes a URL to download and extract. The mount may be better supported on generic-worker, but we'd have to branch the taskgraph and script code to behave differently on Windows than on Linux. It sounds like we'll be able to use the same workflow on generic-worker as on docker-worker once we resolve this bug, correct?
Flags: needinfo?(aki)
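To illustrate the URL-based workflow in (b), here is a hypothetical Python helper (not the actual repackage script) that downloads a URL, such as a taskcluster-proxy artifact URL, and extracts it. Handling only zip archives is an assumption made to keep it short; the point is that the same code runs unchanged on Windows and Linux.

```python
import shutil
import tempfile
import urllib.request
import zipfile
from pathlib import Path


def download_and_extract(url: str, dest: Path) -> Path:
    """Download a zip archive from url and extract it into dest."""
    dest.mkdir(parents=True, exist_ok=True)
    with tempfile.TemporaryDirectory() as tmp:
        archive = Path(tmp) / "artifact.zip"
        # Stream the response to disk rather than holding it in memory.
        with urllib.request.urlopen(url) as resp, open(archive, "wb") as out:
            shutil.copyfileobj(resp, out)
        if not zipfile.is_zipfile(archive):
            raise ValueError(f"{url} did not return a zip archive")
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(dest)
    return dest
```

Inside a task this would be called with something like `download_and_extract("http://taskcluster/queue/v1/task/<taskId>/artifacts/<name>", Path("unpacked"))`, with the placeholders filled in by the taskgraph.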
(In reply to Aki Sasaki [:aki] from comment #10)
> (In reply to Pete Moore [:pmoore][:pete] from comment #7)
> > Aki - I've just reread the context here (sorry I've been on PTO and away
> > from my computer) - and see if this is just about downloading private
> > artifacts, taskcluster-proxy is not needed, this is a native feature of
> > generic-worker. See https://bugzilla.mozilla.org/show_bug.cgi?id=1442793#c82
> > for details.
> 
> a) Awesome.
> b) From the taskgraph and script side, consistency is better. We already
> have it set so all repackage tasks have taskcluster-proxy set if there are
> private artifacts to download, and the repackage script takes a URL to
> download and extract. The mount may be better supported on generic-worker,
> but we'd have to branch the taskgraph and script code to behave differently
> on Windows than linux. It sounds like we'll be able to use the same workflow
> on generic-worker as on docker-worker once we resolve this bug, correct?

c) If we could add a feature like generic-worker's `mounts` to docker-worker, we could keep consistency across platforms and have the worker download the artifact. This may actually be preferable to (b): if the worker downloads the artifact, the worker could potentially chain-of-trust (CoT) verify it in the future. I'm guessing we won't do this until we roll out tc-worker everywhere.
(In reply to Aki Sasaki [:aki] from comment #10)

> b) From the taskgraph and script side, consistency is better. We already
> have it set so all repackage tasks have taskcluster-proxy set if there are
> private artifacts to download, and the repackage script takes a URL to
> download and extract. The mount may be better supported on generic-worker,
> but we'd have to branch the taskgraph and script code to behave differently
> on Windows than linux. It sounds like we'll be able to use the same workflow
> on generic-worker as on docker-worker once we resolve this bug, correct?

Indeed - using taskcluster-proxy also works.
Note, we'll still need to apply the change described in comment 6. I can take a look at this on Tuesday if needed when I'm back from PTO, or someone else can take a look at it if they wish.
(In reply to Pete Moore [:pmoore][:pete] from comment #13)
> Note, we'll still need to apply the change described in comment 6. I can
> take a look at this on Tuesday if needed when I'm back from PTO, or someone
> else can take a look at it if they wish.

I've done this on the beta worker types, and this seems to do the trick:

> [taskcluster 2018-04-03T17:33:59.507Z] Worker Type (gecko-1-b-win2012-beta) settings:
> [taskcluster 2018-04-03T17:33:59.507Z]   {
> [taskcluster 2018-04-03T17:33:59.507Z]     "aws": {
> [taskcluster 2018-04-03T17:33:59.507Z]       "ami-id": "ami-9daaf676",
> [taskcluster 2018-04-03T17:33:59.507Z]       "availability-zone": "eu-central-1a",
> [taskcluster 2018-04-03T17:33:59.507Z]       "instance-id": "i-08d8121e72d3a5512",
> [taskcluster 2018-04-03T17:33:59.507Z]       "instance-type": "c4.4xlarge",
> [taskcluster 2018-04-03T17:33:59.507Z]       "local-ipv4": "10.147.26.27",
> [taskcluster 2018-04-03T17:33:59.507Z]       "public-hostname": "ec2-54-93-91-61.eu-central-1.compute.amazonaws.com",
> [taskcluster 2018-04-03T17:33:59.507Z]       "public-ipv4": "54.93.91.61"
> [taskcluster 2018-04-03T17:33:59.507Z]     },
> [taskcluster 2018-04-03T17:33:59.507Z]     "config": {
> [taskcluster 2018-04-03T17:33:59.507Z]       "deploymentId": "fdfdfdf",
> [taskcluster 2018-04-03T17:33:59.507Z]       "runTasksAsCurrentUser": false
> [taskcluster 2018-04-03T17:33:59.507Z]     },
> [taskcluster 2018-04-03T17:33:59.507Z]     "generic-worker": {
> [taskcluster 2018-04-03T17:33:59.507Z]       "go-arch": "amd64",
> [taskcluster 2018-04-03T17:33:59.507Z]       "go-os": "windows",
> [taskcluster 2018-04-03T17:33:59.507Z]       "go-version": "go1.10",
> [taskcluster 2018-04-03T17:33:59.507Z]       "release": "https://github.com/taskcluster/generic-worker/releases/tag/v10.7.8",
> [taskcluster 2018-04-03T17:33:59.507Z]       "revision": "49698e74d33e2f45f3dc4e95f118fce2d85d0a37",
> [taskcluster 2018-04-03T17:33:59.507Z]       "source": "https://github.com/taskcluster/generic-worker/commits/49698e74d33e2f45f3dc4e95f118fce2d85d0a37",
> [taskcluster 2018-04-03T17:33:59.507Z]       "version": "10.7.8"
> [taskcluster 2018-04-03T17:33:59.507Z]     },
> [taskcluster 2018-04-03T17:33:59.507Z]     "machine-setup": {
> [taskcluster 2018-04-03T17:33:59.507Z]       "ami-created": "2018-04-01 16:00:57.923Z",
> [taskcluster 2018-04-03T17:33:59.507Z]       "manifest": "https://github.com/mozilla-releng/OpenCloudConfig/blob/96b534f31302b954f3374b311be8a691e49c6ef5/userdata/Manifest/gecko-1-b-win2012-beta.json"
> [taskcluster 2018-04-03T17:33:59.507Z]     }
> [taskcluster 2018-04-03T17:33:59.507Z]   }
> [taskcluster 2018-04-03T17:33:59.507Z] Task ID: MT_f2T0ZSnelhxOQNdMR8g
> [taskcluster 2018-04-03T17:33:59.507Z] === Task Starting ===
> [taskcluster 2018-04-03T17:34:00.680Z] Uploading redirect artifact public/logs/live.log to URL https://gzovwpiaaaawfdgjpwapywtqyaiouz64txe57bpzkhf6ahjn.taskcluster-worker.net:60023/log/fWkOEmoERuuTyslPznqqtw with mime type "text/plain; charset=utf-8" and expiry 2018-04-03T18:33:57.888Z
> [taskcluster 2018-04-03T17:34:02.135Z] Executing command 0: wget -O- -q http://taskcluster/queue/v1/task/%TASK_ID%
> 
> Z:\task_1522776786>wget -O- -q http://taskcluster/queue/v1/task/MT_f2T0ZSnelhxOQNdMR8g 
> {
>   "provisionerId": "aws-provisioner-v1",
>   "workerType": "gecko-1-b-win2012-beta",
>   "schedulerId": "-",
>   "taskGroupId": "Os4KRQLmS2eYryrLbDhV6g",
>   "dependencies": [],
>   "requires": "all-completed",
>   "routes": [],
>   "priority": "lowest",
>   "retries": 5,
>   "created": "2018-04-03T16:55:41.724Z",
>   "deadline": "2018-04-04T16:55:41.724Z",
>   "expires": "2019-04-04T16:55:41.724Z",
>   "scopes": [],
>   "payload": {
>     "features": {
>       "taskclusterProxy": true
>     },
>     "maxRunTime": 3600,
>     "command": [
>       "wget -O- -q http://taskcluster/queue/v1/task/%TASK_ID%"
>     ]
>   },
>   "metadata": {
>     "name": "taskcluster-proxy-test-win2012-with-taskcluster-hostname",
>     "description": "Test taskcluster-proxy on gecko-1-b-win2012-beta using hostname taskcluster",
>     "owner": "pmoore@mozilla.com",
>     "source": "https://github.com/taskcluster/taskcluster-proxy"
>   },
>   "tags": {},
>   "extra": {}
> }[taskcluster 2018-04-03T17:34:02.779Z]    Exit Code: 0
> [taskcluster 2018-04-03T17:34:02.779Z]    User Time: 0s
> [taskcluster 2018-04-03T17:34:02.779Z]  Kernel Time: 0s
> [taskcluster 2018-04-03T17:34:02.779Z]    Wall Time: 642.999ms
> [taskcluster 2018-04-03T17:34:02.779Z]       Result: SUCCEEDED
> [taskcluster 2018-04-03T17:34:02.779Z] === Task Finished ===
> [taskcluster 2018-04-03T17:34:02.779Z] Task Duration: 644.0006ms
> [taskcluster 2018-04-03T17:34:03.727Z] Uploading redirect artifact public/logs/live.log to URL https://queue.taskcluster.net/v1/task/MT_f2T0ZSnelhxOQNdMR8g/runs/0/artifacts/public/logs/live_backing.log with mime type "text/plain; charset=utf-8" and expiry 2019-04-04T16:55:41.724Z


I'm currently updating all the Windows worker types to have the entry for taskcluster -> 127.0.0.1 in their hosts file...
Any luck here? Guessing you've been having fun with the test pool :)
The taskcluster-proxy is installed and working on:

* gecko-1-b-win2012   
* gecko-1-b-win2012-beta   
* gecko-2-b-win2012   
* gecko-3-b-win2012   
* gecko-t-win10-64   
* gecko-t-win10-64-beta   
* gecko-t-win10-64-gpu   
* gecko-t-win10-64-gpu-b   
* gecko-t-win7-32   
* gecko-t-win7-32-beta   
* gecko-t-win7-32-gpu   
* gecko-t-win7-32-gpu-b
Flags: needinfo?(aki)
Still having issues verifying, but that is likely due to issues in our taskgraph artifact paths. I'll clear the needinfo when we have a green run.
Green! Thank you!
Status: NEW → RESOLVED
Closed: 6 years ago
Flags: needinfo?(aki)
Resolution: --- → FIXED