Closed Bug 1374366: opened 8 years ago, closed 8 years ago

"InsufficientScopes" for OCC configured hardware Windows 10

Categories

(Infrastructure & Operations :: RelOps: General, task)

Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: markco, Assigned: markco)

Details

After OCC has been applied, the machine does not pick up tests. Currently the task user logs on automatically and the generic-worker service is running. The conversation started in bug 1371780:

(In reply to Mark Cornmesser [:markco] from comment #25)
> I think the generic worker may be causing a reboot before OCC is finished:
>
> Jun 16 13:50:58 T-W864-IX-011.wintest.releng.scl3.mozilla.com generic-worker: {}#015
> Jun 16 13:50:58 T-W864-IX-011.wintest.releng.scl3.mozilla.com generic-worker: 2017/06/16 20:50:57 Exiting worker with exit code 67#015
> Jun 16 13:50:58 T-W864-IX-011.wintest.releng.scl3.mozilla.com generic-worker: 2017/06/16 20:50:57 Immediate reboot being issued...#015
>
> In the event viewer there is a shutdown with the comment "generic-worker
> requested reboot".

(In reply to Pete Moore [:pmoore][:pete] from comment #26)
> Glad the task user login is working now.
>
> The generic worker should only be started after the OCC run has completed.
>
> I think this isn't happening because you removed these sections:
>
> https://github.com/mozilla-releng/OpenCloudConfig/blob/a6b9c35660672f9077c491b5a2546225fc61655a/userdata/Configuration/GenericWorker/run-generic-worker-format-and-reboot.bat#L6-L12
>
> and
>
> https://github.com/mozilla-releng/OpenCloudConfig/blob/a6b9c35660672f9077c491b5a2546225fc61655a/userdata/Configuration/GenericWorker/run-generic-worker-format-and-reboot.bat#L31
>
> when creating:
>
> https://github.com/mozilla-releng/OpenCloudConfig/blob/26010943c1def56d0c30ef24638ca1bb34205ba8/userdata/Configuration/GenericWorker/run-hw-generic-worker-and-reboot.bat
>
> If you add those back in, I think you'll get the desired behaviour.
>
> Good luck!
> Pete

(In reply to Pete Moore [:pmoore][:pete] from comment #27)
> Sorry, you removed only the second part, but note: **you'll need to add that
> back in, otherwise after the reboot following one task, the worker won't start
> up again**.
>
> The OCC mechanism for signalling to generic-worker that OCC has finished and
> generic-worker can start up is writing the file
> C:\dsc\task-claim-state.valid
>
> If generic-worker is starting up before that file is written, it suggests
> that either:
>
> 1) the script C:\generic-worker\run-generic-worker.bat is the wrong one
> (i.e. not a copy of
> https://github.com/mozilla-releng/OpenCloudConfig/blob/26010943c1def56d0c30ef24638ca1bb34205ba8/userdata/Configuration/GenericWorker/run-hw-generic-worker-and-reboot.bat), or
> 2) the file C:\dsc\task-claim-state.valid is being created prematurely.
>
> I recommend manually comparing the run-generic-worker.bat script in 1) to
> make sure the patched version is the one deployed (not the default version
> installed by generic worker), and also making sure that
> C:\dsc\task-claim-state.valid is only created once OCC finishes.
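The signalling contract described above (OCC writes C:\dsc\task-claim-state.valid when it is done, and the wrapper script must not start generic-worker before that file exists) can be sketched in Python. This is only an illustration under assumptions: the real logic lives in run-hw-generic-worker-and-reboot.bat and rundsc.ps1, and the function names, timeout, and poll interval below are hypothetical.

```python
import tempfile
import time
from pathlib import Path

# Marker path named in this bug; everything else here is a hypothetical sketch.
TASK_CLAIM_MARKER = Path(r"C:\dsc\task-claim-state.valid")


def signal_occ_finished(marker: Path) -> None:
    """OCC side: write an empty marker file once configuration has completed."""
    marker.parent.mkdir(parents=True, exist_ok=True)
    marker.touch()


def wait_for_occ(marker: Path, timeout_s: float = 3600.0, poll_s: float = 5.0) -> bool:
    """Wrapper side: poll until the marker exists.

    Returns True once OCC has signalled completion, or False if timeout_s
    elapses first (in which case generic-worker should not be started).
    """
    deadline = time.monotonic() + timeout_s
    while not marker.exists():
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
    return True


if __name__ == "__main__":
    # Demo against a temporary directory rather than the real C:\dsc path.
    demo_marker = Path(tempfile.mkdtemp()) / "dsc" / "task-claim-state.valid"
    print(wait_for_occ(demo_marker, timeout_s=0.0))  # False: OCC has not finished
    signal_occ_finished(demo_marker)
    print(wait_for_occ(demo_marker, timeout_s=0.0))  # True: marker present
```

Polling with a deadline keeps the wrapper from hanging forever if OCC never completes; what the wrapper should do on timeout is left open here.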
Assignee: relops → mcornmesser
Blocks: 1358558
I added this line to run-hw-generic-worker-and-reboot.bat:

<nul (set/p z=) >C:\dsc\task-claim-state.valid

Since there is now a file creation for the task-claim file in the worker bat, should we remove it from rundsc.ps1?
https://github.com/mozilla-releng/OpenCloudConfig/blob/master/userdata/rundsc.ps1#L897-L899

What I am currently seeing is:
OCC is applied.
The task claim file is created, but the in-progress.lock file is never cleared.
The machine reboots.
Task user logs in.
Generic worker service is running.
The task claim file is removed and never recreated.
Eventually another reboot.

Initially I thought we were going to run rundsc.ps1 on startup after its first run. Is that the desired behavior?
I am trying to run this task: https://tools.taskcluster.net/task-inspector/#Whd5TL55Tv-x-87m6hGSFA/0

The worker picks up the task and then hits:

Jun 20 14:20:25 T-W864-IX-011.wintest.releng.scl3.mozilla.com generic-worker: {#015
Jun 20 14:20:25 T-W864-IX-011.wintest.releng.scl3.mozilla.com generic-worker: "code": "InsufficientScopes",#015
Jun 20 14:20:25 T-W864-IX-011.wintest.releng.scl3.mozilla.com generic-worker: "message": "You do not have sufficient scopes. This request requires you\nto have one of the following sets of scopes:\n[\n [\n \"queue:resolve-task\",\n \"assume:worker-id:scl3/t-w864-ix-011\"\n ],\n [\n \"queue:resolve-task:Whd5TL55Tv-x-87m6hGSFA/0\"\n ]\n]\n\nYou only have the scopes:\n[\n \"assume:worker-type:scl3-puppet/gecko-t-win10-64-hw\",\n \"auth:sentry:generic-worker\",\n \"auth:statsum:generic-worker\",\n \"queue:claim-task\",\n \"queue:claim-task:scl3-puppet/gecko-t-win10-64-hw\",\n \"queue:claim-work:scl3-puppet/*\",\n \"queue:create-artifact:*\",\n \"queue:pending-tasks:scl3-puppet/*\",\n \"queue:poll-task-urls\",\n \"queue:resolve-task\",\n \"queue:worker-id:*\"\n]\n\nIn other words you are missing scopes from one of the options:\n * Option 0:\n - \"assume:worker-id:scl3/t-w864-ix-011\"\n * Option 1:\n - \"queue:resolve-task:Whd5TL55Tv-x-87m6hGSFA/0\"\n----\nmethod: reportFailed\nerrorCode: InsufficientScopes\nstatusCode: 403\ntime: 2017-06-20T21:20:24.245Z",#015
Jun 20 14:20:25 T-W864-IX-011.wintest.releng.scl3.mozilla.com generic-worker: "requestInfo": {#015
Jun 20 14:20:25 T-W864-IX-011.wintest.releng.scl3.mozilla.com generic-worker: "method": "reportFailed",#015
Jun 20 14:20:25 T-W864-IX-011.wintest.releng.scl3.mozilla.com generic-worker: "params": {#015
Jun 20 14:20:25 T-W864-IX-011.wintest.releng.scl3.mozilla.com generic-worker: "taskId": "Whd5TL55Tv-x-87m6hGSFA",#015
Jun 20 14:20:25 T-W864-IX-011.wintest.releng.scl3.mozilla.com generic-worker: "runId": "0"#015
Jun 20 14:20:25 T-W864-IX-011.wintest.releng.scl3.mozilla.com generic-worker: },#015
Jun 20 14:20:25 T-W864-IX-011.wintest.releng.scl3.mozilla.com generic-worker: "payload": {},#015
Jun 20 14:20:25 T-W864-IX-011.wintest.releng.scl3.mozilla.com generic-worker: "time": "2017-06-20T21:20:24.245Z"#015
Jun 20 14:20:25 T-W864-IX-011.wintest.releng.scl3.mozilla.com generic-worker: }#015
Jun 20 14:20:25 T-W864-IX-011.wintest.releng.scl3.mozilla.com generic-worker: }#015
Jun 20 14:20:25 T-W864-IX-011.wintest.releng.scl3.mozilla.com generic-worker: 2017/06/20 21:20:24 Querying queue to get latest status for task Whd5TL55Tv-x-87m6hGSFA...#015
Jun 20 14:20:25 T-W864-IX-011.wintest.releng.scl3.mozilla.com generic-worker: 2017/06/20 21:20:24 Latest status: Claimed#015
Jun 20 14:20:25 T-W864-IX-011.wintest.releng.scl3.mozilla.com generic-worker: 2017/06/20 21:20:24 ERROR encountered: Task not successful due to following exception(s):#015
Jun 20 14:20:25 T-W864-IX-011.wintest.releng.scl3.mozilla.com generic-worker: Exception 1)#015
Jun 20 14:20:25 T-W864-IX-011.wintest.releng.scl3.mozilla.com generic-worker: []#015
Jun 20 14:20:25 T-W864-IX-011.wintest.releng.scl3.mozilla.com generic-worker: #015
Jun 20 14:20:25 T-W864-IX-011.wintest.releng.scl3.mozilla.com generic-worker: Exit code: 4294967295#015
Jun 20 14:20:25 T-W864-IX-011.wintest.releng.scl3.mozilla.com generic-worker: Exception 2)#015
Jun 20 14:20:25 T-W864-IX-011.wintest.releng.scl3.mozilla.com generic-worker: (Permanent) HTTP response code 403#015
Jun 20 14:20:25 T-W864-IX-011.wintest.releng.scl3.mozilla.com generic-worker: HTTP/1.1 403 Forbidden#015
Jun 20 14:20:25 T-W864-IX-011.wintest.releng.scl3.mozilla.com generic-worker: Content-Length: 1263#015
Jun 20 14:20:25 T-W864-IX-011.wintest.releng.scl3.mozilla.com generic-worker: Access-Control-Allow-Headers: X-Requested-With,Content-Type,Authorization,Accept,Origin#015
Jun 20 14:20:25 T-W864-IX-011.wintest.releng.scl3.mozilla.com generic-worker: Access-Control-Allow-Methods: OPTIONS,GET,HEAD,POST,PUT,DELETE,TRACE,CONNECT#015
Jun 20 14:20:25 T-W864-IX-011.wintest.releng.scl3.mozilla.com generic-worker: Access-Control-Allow-Origin: *#015
Jun 20 14:20:25 T-W864-IX-011.wintest.releng.scl3.mozilla.com generic-worker: Access-Control-Request-Method: *#015
Jun 20 14:20:25 T-W864-IX-011.wintest.releng.scl3.mozilla.com generic-worker: Connection: keep-alive#015
Jun 20 14:20:25 T-W864-IX-011.wintest.releng.scl3.mozilla.com generic-worker: Content-Type: application/json; charset=utf-8#015
Jun 20 14:20:25 T-W864-IX-011.wintest.releng.scl3.mozilla.com generic-worker: Date: Tue, 20 Jun 2017 21:20:24 GMT#015
Jun 20 14:20:25 T-W864-IX-011.wintest.releng.scl3.mozilla.com generic-worker: Server: Cowboy#015
Jun 20 14:20:25 T-W864-IX-011.wintest.releng.scl3.mozilla.com generic-worker: Strict-Transport-Security: max-age=7776000#015
Jun 20 14:20:25 T-W864-IX-011.wintest.releng.scl3.mozilla.com generic-worker: Via: 1.1 vegur#015
Jun 20 14:20:25 T-W864-IX-011.wintest.releng.scl3.mozilla.com generic-worker: X-Powered-By: Express#015
Jun 20 14:20:25 T-W864-IX-011.wintest.releng.scl3.mozilla.com generic-worker: #015
Jun 20 14:20:25 T-W864-IX-011.wintest.releng.scl3.mozilla.com generic-worker: {#015
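Raw syslog output like this is hard to read: each line carries a date/host prefix and ends in #015, which appears to be rsyslog's octal escape for the carriage return (octal 015 = 13) that terminated each line of the worker's Windows output. A small Python filter can strip both and, for the run of lines that forms the JSON error body, parse them back into an object so the \n escapes in "message" become real line breaks. The line format assumed by the regular expression is inferred from the excerpts in this bug, and the function names are hypothetical.

```python
import json
import re

# Assumed line shape: "<Mon> <day> <HH:MM:SS> <host> generic-worker: <text>#015"
SYSLOG_LINE = re.compile(
    r"^[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2} \S+ generic-worker: ?(?P<text>.*?)(?:#015)?$"
)


def strip_syslog(lines):
    """Return the generic-worker payload of each syslog line, minus any trailing #015."""
    out = []
    for line in lines:
        m = SYSLOG_LINE.match(line.rstrip("\n"))
        if m:
            out.append(m.group("text"))
    return out


def decode_error_body(lines):
    """Re-join stripped lines that together form one JSON error body and parse it.

    json.loads turns the escaped \\n sequences inside "message" into real newlines,
    which is what makes the scope lists readable.
    """
    return json.loads("\n".join(strip_syslog(lines)))
```

Usage: pass only the lines from the opening { through the matching closing } (the first thirteen lines above); the plain-text lines that follow are not JSON and should go through strip_syslog alone.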
pmoore: What worker type should be used in the config files?
Flags: needinfo?(pmoore)
Summary: OCC configured hardware is not picking up tests → "InsufficientScopes" for OCC configured hardware Windows 10
Formatting the message in comment 2, we see:

You do not have sufficient scopes. This request requires you
to have one of the following sets of scopes:
[
  [
    "queue:resolve-task",
    "assume:worker-id:scl3/t-w864-ix-011"
  ],
  [
    "queue:resolve-task:Whd5TL55Tv-x-87m6hGSFA/0"
  ]
]

You only have the scopes:
[
  "assume:worker-type:scl3-puppet/gecko-t-win10-64-hw",
  "auth:sentry:generic-worker",
  "auth:statsum:generic-worker",
  "queue:claim-task",
  "queue:claim-task:scl3-puppet/gecko-t-win10-64-hw",
  "queue:claim-work:scl3-puppet/*",
  "queue:create-artifact:*",
  "queue:pending-tasks:scl3-puppet/*",
  "queue:poll-task-urls",
  "queue:resolve-task",
  "queue:worker-id:*"
]

In other words you are missing scopes from one of the options:
 * Option 0:
   - "assume:worker-id:scl3/t-w864-ix-011"
 * Option 1:
   - "queue:resolve-task:Whd5TL55Tv-x-87m6hGSFA/0"
----
method: reportFailed
errorCode: InsufficientScopes
statusCode: 403
time: 2017-06-20T21:20:24.245Z

This is kind of bizarre, since assume: scopes should never be required scopes. I'll look into where this is coming from.
(In reply to Mark Cornmesser [:markco] from comment #1)
> Since there is now a file creation for the task-claim file in the worker bat,
> should we remove it from rundsc.ps1?
> https://github.com/mozilla-releng/OpenCloudConfig/blob/master/userdata/rundsc.ps1#L897-L899
>
> What I am currently seeing is:
> OCC is applied.
> The task claim file is created, but the in-progress.lock file is never
> cleared.
> The machine reboots.
> Task user logs in.
> Generic worker service is running.
> The task claim file is removed and never recreated.
> Eventually another reboot.
>
>
> Initially I thought we were going to run rundsc.ps1 on startup after its
> first run. Is that the desired behavior?

Rob is the OCC expert, I'll redirect to him. :)
Flags: needinfo?(pmoore) → needinfo?(rthijssen)
rundsc.ps1 is the script that manages creating and removing C:\dsc\in-progress.lock. Whenever rundsc reboots the machine, it removes the lock file. So if something other than rundsc.ps1 reboots the computer while rundsc is running, that would explain why the file still exists and is never removed.
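The lock-file behaviour described above can be modelled in a few lines of Python. This is a hypothetical sketch, not a translation of rundsc.ps1: the names are invented, and it only shows why an external reboot leaves C:\dsc\in-progress.lock behind. Cleanup runs only when the owning script leaves the block itself, which it would do before issuing its own reboot; if something else reboots the machine, the process is killed before cleanup and the file survives.

```python
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def in_progress(lock: Path):
    """Create `lock` for the duration of a configuration run, then remove it.

    The owning script is expected to leave this block (removing the lock)
    before it reboots the machine itself. If something else reboots the
    machine mid-run, the process dies before `finally` executes and the
    lock file is left on disk.
    """
    lock.parent.mkdir(parents=True, exist_ok=True)
    lock.touch()
    try:
        yield lock
    finally:
        lock.unlink(missing_ok=True)


def previous_run_interrupted(lock: Path) -> bool:
    """At startup, a leftover lock means the last run never reached its cleanup."""
    return lock.exists()


if __name__ == "__main__":
    lock = Path(tempfile.mkdtemp()) / "dsc" / "in-progress.lock"
    with in_progress(lock):
        pass  # configuration work would happen here
    print(previous_run_interrupted(lock))  # False: clean exit removed the lock
    lock.touch()  # simulate a lock left behind by an external reboot
    print(previous_run_interrupted(lock))  # True
```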
Flags: needinfo?(rthijssen)
(In reply to Pete Moore [:pmoore][:pete] from comment #4)
> Formatting the message in comment 2, we see:
>
> You do not have sufficient scopes. This request requires you
> to have one of the following sets of scopes:
> [
>   [
>     "queue:resolve-task",
>     "assume:worker-id:scl3/t-w864-ix-011"
>   ],
>   [
>     "queue:resolve-task:Whd5TL55Tv-x-87m6hGSFA/0"
>   ]
> ]
>
> You only have the scopes:
> [
>   "assume:worker-type:scl3-puppet/gecko-t-win10-64-hw",
>   "auth:sentry:generic-worker",
>   "auth:statsum:generic-worker",
>   "queue:claim-task",
>   "queue:claim-task:scl3-puppet/gecko-t-win10-64-hw",
>   "queue:claim-work:scl3-puppet/*",
>   "queue:create-artifact:*",
>   "queue:pending-tasks:scl3-puppet/*",
>   "queue:poll-task-urls",
>   "queue:resolve-task",
>   "queue:worker-id:*"
> ]
>
> In other words you are missing scopes from one of the options:
>  * Option 0:
>    - "assume:worker-id:scl3/t-w864-ix-011"
>  * Option 1:
>    - "queue:resolve-task:Whd5TL55Tv-x-87m6hGSFA/0"
> ----
> method: reportFailed
> errorCode: InsufficientScopes
> statusCode: 403
> time: 2017-06-20T21:20:24.245Z
>
>
> This is kind of bizarre, since assume: scopes should never be required
> scopes. I'll look into where this is coming from.

The generic worker uses two different sets of credentials. There is a static set of credentials granted to the worker in its config, which it uses for claiming and resolving tasks; and there are the credentials returned by the queue when it claims (or reclaims) a task, which it uses for performing actions on behalf of that task.

The reportFailed queue API endpoint is called by the worker to resolve a task as failed. For this call it uses its own credentials, not the credentials of the task it is resolving.
Looking at the role definitions, it seems correct that this call fails, since the role doesn't satisfy the given scope requirements:
https://tools.taskcluster.net/auth/roles/#worker-type:scl3-puppet%252f*

However, looking at a random generic-worker aws-provisioner-v1 worker type, that *also* seems not to satisfy the requirements:
https://tools.taskcluster.net/auth/roles/#worker-type:aws-provisioner-v1%252fgecko-t-win10-64

So based on the assumption that a given AWS worker type is only given temporary credentials with the scope "assume:worker-type:aws-provisioner-v1/<workerType>", I cannot see why this isn't failing on our AWS workers too. Maybe this assumption is wrong, I will now check...
(In reply to Rob Thijssen (:grenade - UTC+3) from comment #6)
> rundsc.ps1 is the script that manages creating and removing
> C:\dsc\in-progress.lock
> if something other than rundsc.ps1 reboots the computer while rundsc is
> running, it would explain why the file exists and isn't removed. whenever
> rundsc reboots the machine, it removes the lock file.

@markco: based on this, I'd suggest setting "disableReboots" to true in the generic worker config, and checking that your run-generic-worker.bat script lets rundsc reboot the machine, rather than rebooting from run-generic-worker.bat itself. generic-worker --help explains all the config settings.
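If the generic-worker config on these machines is a JSON file (an assumption here; generic-worker --help is the authority on the settings, and the config path is machine-specific), the suggested change amounts to setting one key. A minimal Python sketch that sets "disableReboots" while leaving every other setting untouched; the function names and the example key below are hypothetical.

```python
import json
from pathlib import Path


def with_reboots_disabled(config_text: str) -> str:
    """Return the JSON config text with "disableReboots" set to true."""
    config = json.loads(config_text)
    config["disableReboots"] = True  # leave rebooting to rundsc.ps1 instead
    return json.dumps(config, indent=2, sort_keys=True) + "\n"


def disable_worker_reboots(config_path: Path) -> None:
    """Rewrite a generic-worker config file in place (path is an assumption)."""
    text = config_path.read_text(encoding="utf-8")
    config_path.write_text(with_reboots_disabled(text), encoding="utf-8")


if __name__ == "__main__":
    print(with_reboots_disabled('{"workerType": "gecko-t-win10-64-hw"}'))
```

Separating the pure text transform from the file I/O keeps the change easy to review before it touches a real machine.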
(In reply to Pete Moore [:pmoore][:pete] from comment #7)
> So based on the assumption that a given AWS worker type is only given
> temporary credentials with the scope
> "assume:worker-type:aws-provisioner-v1/<workerType>" I cannot see why this
> isn't failing on our AWS workers too. Maybe this assumption is wrong, I will
> now check...

Mystery solved: indeed, this assumption was wrong. AWS workers are given:

"assume:worker-type:aws-provisioner-v1/<workerType>"
"assume:worker-id:*"

I'll create a separate bug for us to clean this up, since we don't have any roles prefixed with "worker-id:" so we probably should be using this.

In any case, I'll fix this by adding "assume:worker-id:*" to role "worker-type:scl3-puppet/*".

I'm curious now which role(s) and provisioner/workerType settings our OS X test workers have in scl3, as I would have expected them to hit this issue too.
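The failure and the fix can be checked with a small Python model of scope satisfaction. The error lists alternative sets of required scopes; a caller passes if it covers every scope in at least one set, and a held scope ending in * covers any required scope that starts with the text before the *. This sketch works on an already-expanded scope list and deliberately ignores role expansion (holding an assume: scope also grants the scopes of the matching role), so it illustrates the rule rather than reproducing the auth service's implementation. The scope lists are copied from the error in comment 2; the function names are hypothetical.

```python
from typing import Iterable, Sequence

# Required scope sets from the reportFailed error: covering ANY one inner list suffices.
REQUIRED: list[list[str]] = [
    ["queue:resolve-task", "assume:worker-id:scl3/t-w864-ix-011"],
    ["queue:resolve-task:Whd5TL55Tv-x-87m6hGSFA/0"],
]

# Scopes the worker's own credentials held at the time (from the same error).
HELD_BEFORE: list[str] = [
    "assume:worker-type:scl3-puppet/gecko-t-win10-64-hw",
    "auth:sentry:generic-worker",
    "auth:statsum:generic-worker",
    "queue:claim-task",
    "queue:claim-task:scl3-puppet/gecko-t-win10-64-hw",
    "queue:claim-work:scl3-puppet/*",
    "queue:create-artifact:*",
    "queue:pending-tasks:scl3-puppet/*",
    "queue:poll-task-urls",
    "queue:resolve-task",
    "queue:worker-id:*",
]


def scope_matches(held: str, required: str) -> bool:
    """True if one held scope covers one required scope (exact or trailing-* prefix)."""
    if held.endswith("*"):
        return required.startswith(held[:-1])
    return held == required


def satisfies(held: Iterable[str], required_sets: Sequence[Sequence[str]]) -> bool:
    """True if every scope of at least one required set is covered by some held scope."""
    held = list(held)
    return any(
        all(any(scope_matches(h, r) for h in held) for r in option)
        for option in required_sets
    )


if __name__ == "__main__":
    print(satisfies(HELD_BEFORE, REQUIRED))  # False: nothing covers assume:worker-id:...
    # The fix: the worker-type role now also grants assume:worker-id:*
    print(satisfies(HELD_BEFORE + ["assume:worker-id:*"], REQUIRED))  # True
```

The same model shows that a queue:worker-id:... scope can never stand in for an assume:worker-id:... requirement, because the prefixes differ.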
See Also: → 1374978
(In reply to Pete Moore [:pmoore][:pete] from comment #9)
> I'm curious now which role(s) and provisioner/workerType settings our OS X
> test workers have in scl3, as I would have expected them to hit this issue
> too.

In the case of the OS X machines, the scope assume:worker-id:* is assigned directly to the client, rather than to the worker-type role:
https://tools.taskcluster.net/auth/clients/#project%252freleng%252fgeneric-worker%252fos-x%252fproduction

I noticed that the scope "queue:worker-id:scl3/t-w864-ix-*" had been assigned to client "project/releng/worker/scl3-puppet/gecko-t-win10-64-hw". Now I understand how this mistake came about, because "queue:worker-id:*" must have at some point been changed to "queue:worker-id:scl3/t-w864-ix-*" because this was what the restricted workerIds were intended to look like.
https://tools.taskcluster.net/auth/clients/#project%252freleng%252fworker%252fscl3-puppet%252fgecko-t-win10-64-hw

The reason this didn't work is that we have an abuse of the assume: scopes in our APIs. The assume: scopes define identity, whereas other scopes define permissions. Our APIs should depend on clients having certain permissions, not on having certain identities. Therefore, APIs should not require assume: scopes.
Mark, please try again, now that I have updated role "worker-type:scl3-puppet/*".
Flags: needinfo?(mcornmesser)
The task seems to get this far and then just stops:

Jun 21 10:05:04 T-W864-IX-011 generic-worker: 2017/06/21 17:05:03 Extracting zip file 'C:\downloads\DqOLbsc3QQaWH4-bsPwoRQ' to 'C:\Users\task_1498064616'#015
Jun 21 10:05:11 T-W864-IX-011 generic-worker: 2017/06/21 17:05:10 Executing command 0: C:\Users\task_1498064616\command_000000_wrapper.bat#015
Jun 21 10:05:11 T-W864-IX-011 generic-worker: 2017/06/21 17:05:10 Making system call VerSetConditionMask with args: [0 2 3]#015
Jun 21 10:05:11 T-W864-IX-011 generic-worker: 2017/06/21 17:05:10 Result: 8000000000000018 0 The operation completed successfully.#015
Jun 21 10:05:11 T-W864-IX-011 generic-worker: 2017/06/21 17:05:10 Making system call VerSetConditionMask with args: [8000000000000018 1 3]#015
Jun 21 10:05:11 T-W864-IX-011 generic-worker: 2017/06/21 17:05:10 Result: 800000000000001B 0 The operation completed successfully.#015
Jun 21 10:05:11 T-W864-IX-011 generic-worker: 2017/06/21 17:05:10 Making system call VerSetConditionMask with args: [800000000000001B 20 3]#015
Jun 21 10:05:11 T-W864-IX-011 generic-worker: 2017/06/21 17:05:10 Result: 800000000001801B 0 The operation completed successfully.#015
Jun 21 10:05:11 T-W864-IX-011 generic-worker: 2017/06/21 17:05:10 Making system call VerSetConditionMask with args: [800000000001801B 10 3]#015
Jun 21 10:05:11 T-W864-IX-011 generic-worker: 2017/06/21 17:05:10 Result: 800000000001B01B 0 The operation completed successfully.#015
Jun 21 10:05:11 T-W864-IX-011 generic-worker: 2017/06/21 17:05:10 Making system call VerifyVersionInfoW with args: [C04222C360 33 800000000001B01B]#015
Jun 21 10:05:11 T-W864-IX-011 generic-worker: 2017/06/21 17:05:10 Result: 1 0 The operation completed successfully.#015
Jun 21 10:05:11 T-W864-IX-011 generic-worker: 2017/06/21 17:05:10 Making system call CreateEnvironmentBlock with args: [C0423CC1E0 510 0]#015
Jun 21 10:05:11 T-W864-IX-011 generic-worker: 2017/06/21 17:05:10 Result: 1 0 The system could not find the environment option that was entered.#015
Jun 21 10:05:11 T-W864-IX-011 generic-worker: 2017/06/21 17:05:10 Making system call DestroyEnvironmentBlock with args: [1206AC0]#015
Jun 21 10:05:11 T-W864-IX-011 generic-worker: 2017/06/21 17:05:10 Result: 1 0 The operation completed successfully.#015
Jun 21 10:05:11 T-W864-IX-011 generic-worker: 2017/06/21 17:05:10 Making system call CreateProcessAsUserW with args: [510 0 C0423A4150 0 0 1 614 C0424BA000 C0420E4300 C0423A40E0 C0420C9040]#015
Jun 21 10:05:11 T-W864-IX-011 generic-worker: 2017/06/21 17:05:10 Result: 1 0 The operation completed successfully.#015
Jun 21 10:05:11 T-W864-IX-011 generic-worker: time="2017-06-21T17:05:10Z" level=info msg="processInfo: 0xc042028048" #015
Jun 21 10:05:11 T-W864-IX-011 generic-worker: 2017/06/21 17:05:10 Making system call SetPriorityClass with args: [5A4 8000]#015
Jun 21 10:05:11 T-W864-IX-011 generic-worker: 2017/06/21 17:05:10 Result: 1 0 The operation completed successfully.#015
Jun 21 10:05:11 T-W864-IX-011 generic-worker: 2017/06/21 17:05:10 Making system call ResumeThread with args: [5A0]#015
Jun 21 10:05:11 T-W864-IX-011 generic-worker: 2017/06/21 17:05:10 Result: 1 0 The operation completed successfully.#015
Flags: needinfo?(mcornmesser)
Eventually this error appears:

Jun 21 10:21:07 T-W864-IX-011 generic-worker: Server: AmazonS3#015
Jun 21 10:21:07 T-W864-IX-011 generic-worker: X-Amz-Id-2: hKUQc6UPJiDGOxaUOFDlJqDXOxpKAB8gjGKLP6AUdrv+KgN55Aa4+O5S5wg/JHePwqJP221eqFM=#015
Jun 21 10:21:07 T-W864-IX-011 generic-worker: X-Amz-Request-Id: 86138B453A1E4FAA#015
Jun 21 10:21:07 T-W864-IX-011 generic-worker: X-Amz-Version-Id: uff8ylmqWVreXc54FzFf3FVtPFO.z4uL#015
Jun 21 10:21:07 T-W864-IX-011 generic-worker: #015
Jun 21 10:21:07 T-W864-IX-011 generic-worker: 2017/06/21 17:21:06 Resolving task...#015
Jun 21 10:21:07 T-W864-IX-011 generic-worker: 2017/06/21 17:21:07 ERROR encountered: Task not successful due to following exception(s):#015
Jun 21 10:21:07 T-W864-IX-011 generic-worker: Exception 1)#015

This is followed by the generic worker exiting and a reboot:

Jun 21 10:21:09 T-W864-IX-011 generic-worker: 2017/06/21 17:21:08 Exiting worker with exit code 67#015

Just to note, I do currently have reboots disabled in the generic worker config.
Hi Mark,

This denotes a task failure, not a worker exception. The worker ran successfully and, as the exit code 67 shows, went on to request a reboot in order to run the next task.

The task that ran failed because it could not fetch node-win32.exe from tooltool. See https://tools.taskcluster.net/task-inspector/#Cpt7LnhSQVW39K44PoG8HA/0

17:08:51 INFO - Copy/paste: c:\mozilla-build\python\python.exe C:\mozilla-build\tooltool.py --url https://api.pub.build.mozilla.org/tooltool/ fetch -m C:\Users\task_1498064616\build\tests\config/tooltool-manifests/win32/nodejs.manifest -o
17:08:51 INFO - Calling ['c:\\mozilla-build\\python\\python.exe', 'C:\\mozilla-build\\tooltool.py', '--url', 'https://api.pub.build.mozilla.org/tooltool/', 'fetch', '-m', 'C:\\Users\\task_1498064616\\build\\tests\\config/tooltool-manifests/win32/nodejs.manifest', '-o'] with output_timeout 600
17:08:51 INFO - INFO - Attempting to fetch from 'https://api.pub.build.mozilla.org/tooltool/'...
17:08:52 INFO - INFO - ...failed to fetch 'node-win32.exe' from https://api.pub.build.mozilla.org/tooltool/
17:08:53 ERROR - ERROR - The following files failed: 'node-win32.exe'

Perhaps some tooltool auth is required? Maybe have a look at C:\mozilla-build\tooltool.py and see whether it uses a config file by default, and where it looks for it on the file system (e.g. maybe it is missing, or has bad credentials?).

So to be clear, the worker ran correctly: it claimed a task, executed it, resolved it (correctly) as failed (since the task had a command that exited with code 4294967295 = 0xFFFFFFFF), and then requested a reboot.

So the next step is to find out why this tooltool fetch of node-win32.exe failed with this exit code (0xFFFFFFFF/-1/4294967295). But as I say, that isn't a worker problem; it is a problem either with the task or with the tooltool auth on the machine (i.e. related to machine setup/bootstrapping).
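The three spellings of the exit code above (0xFFFFFFFF/-1/4294967295) are the same 32 bits. Windows reports process exit codes as unsigned 32-bit values, so a command that exits with -1 shows up as 4294967295. A small Python helper, hypothetical and only for illustration, converts between the two readings:

```python
def as_signed32(code: int) -> int:
    """Reinterpret a 32-bit exit code (as Windows reports it) as a signed int."""
    code &= 0xFFFFFFFF  # keep only the low 32 bits
    return code - (1 << 32) if code & 0x80000000 else code


def as_unsigned32(code: int) -> int:
    """Inverse view: a signed exit status as the unsigned value Windows reports."""
    return code & 0xFFFFFFFF


if __name__ == "__main__":
    print(as_signed32(4294967295), hex(4294967295))  # -1 0xffffffff
    print(as_signed32(67))  # 67: small positive codes are unchanged
```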
(In reply to Pete Moore [:pmoore][:pete] from comment #10)
> I noticed that the scope "queue:worker-id:scl3/t-w864-ix-*" had been
> assigned to client "project/releng/worker/scl3-puppet/gecko-t-win10-64-hw".
> Now I understand how this mistake came about, because "queue:worker-id:*"
> must have at some point been changed to
> "queue:worker-id:scl3/t-w864-ix-*" because this was what the restricted
> workerIds were intended to look like.

This was the root cause of the bug, which has now been solved more generally by adding "assume:worker-id:*" to role "worker-type:scl3-puppet/*".

I'm therefore marking this bug as closed, since the tooltool issue you are now hitting is a separate, unrelated matter.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED