Closed Bug 1486532 Opened Last year Closed Last year

Don't generate optional payload properties in generic-worker payloads


(Firefox Build System :: Task Configuration, task)

Not set


(firefox-esr60 fixed, firefox63 fixed)

Tracking Status
firefox-esr60 --- fixed
firefox63 --- fixed


(Reporter: pmoore, Assigned: pmoore)




(2 files, 1 obsolete file)

I have mixed feelings about this - curious what others think.

My motivation for this bug comes from bug 1480412 comment 10. I discovered that even though osGroups is currently not supported on non-windows platforms, we were generating an empty list of OS groups for macOS test tasks.

Since several payload properties are optional, I only wanted to include them in generated tasks if non-empty values are provided.

In other words, rather than generating e.g. a task with "osGroups": [] we simply would not include "osGroups" in the generated task payload.

The same goes for all other optional properties ("env", "artifacts", "mounts", "features", .....)

I also wanted to explicitly raise an exception if a task tries to use a feature that is not available for the given platform (such as a mac/linux task that tries to specify osGroups).

I have mixed feelings about removing empty properties because sometimes it could be useful for a user to see that e.g. no groups have been specified, as it brings attention to the features available on the worker. However by the same token, generating long task payloads with several empty declarations can also distract the user from the non-empty values.

So I'm putting this out there to see what others think. My minimilist side appreciates only including required properties. A competing inclination is to prefer generation of empty objects/lists to be more explicit. So I'm not sure!

An argument for not putting the check in here that osGroups is only used on Windows is that at some point it could well be available on e.g. mac, and at that point we have duplicated logic in the worker payload validation and the in-tree task generation code. However an advantage of duplicating that logic here is that you don't need to wait for tasks to run and fail (and wait for their dependencies to successfully complete) before finding out that the feature is not available on the platform... This is certainly a time saver and will help diagnose problems early in the decision task modification process.

Please pitch in with opinions! :-)
Attachment #9004303 - Flags: review?(dustin)
Attachment #9004303 - Flags: review?(ahal)
Assignee: nobody → pmoore
Comment on attachment 9004303 [details]
Don't generate optional payload properties in generic-worker payloads

Andrew Halberstadt [:ahal] has approved the revision.
Attachment #9004303 - Flags: review+
Duplicate of this bug: 1360198
Depends on bug 1443589 since some of the properties (such as artifact expiry) were not optional in earlier versions of generic-worker which are still running in production.
Depends on: 1443589
Comment on attachment 9004303 [details]
Don't generate optional payload properties in generic-worker payloads

Something weird happened with the phabricator -> bz review flags. At any rate, I r+'ed this (and phabricator confirms I have no outstanding reviews there).
Attachment #9004303 - Flags: review?(ahal)
Pushed by
Don't generate optional payload properties in generic-worker payloads r=ahal
(In reply to Andreea Pavel [:apavel] from comment #8)
> Backed out for OSX exceptions
> Push that started the failures:
> jobs?repo=autoland&revision=bce9cb28665c0bb100267cbb956290a00c447721
> Failure log:
> UQ2qLVT0-d5_D9A/runs/0/logs/public%2Flogs%2Flive_backing.log#L23
> Backout:
> b38bcb2c577c0e5052ae595ef572b301aa88046e

Thanks Andreea!

I've spotted the mistake, and am making a new try push with a fix. In this try push, I'll trigger a bunch of tasks that previously failed to make sure they pass now.

Sorry for the bad landing!
Flags: needinfo?(pmoore)
Assuming I interacted correctly with treeherder, I expect some new tasks to appear in that try push shortly; a random selection of tasks that broke in the previous version.

The new push review I'll put up will be a regular patch file rather than a phabricator patch, as I wasn't quite sure how to apply a change to a landed patch via phabricator.

The incremental diff of the change I'm hoping will fix things is:

diff --git a/taskcluster/taskgraph/transforms/ b/taskcluster/taskgraph/transforms/
--- a/taskcluster/taskgraph/transforms/
+++ b/taskcluster/taskgraph/transforms/
@@ -1007,22 +1007,22 @@ def build_generic_worker_payload(config,
     #   * 'task-id'    -> 'taskId'
     # All other key names are already suitable, and don't need renaming.
     mounts = deepcopy(worker.get('mounts', []))
     for mount in mounts:
         if 'cache-name' in mount:
             mount['cacheName'] = mount.pop('cache-name')
         if 'content' in mount:
             if 'task-id' in mount['content']:
                 mount['content']['taskId'] = mount['content'].pop('task-id')
-    if worker.get('mounts', []):
-        task_def['payload']['mounts'] = worker['mounts']
+    if mounts:
+        task_def['payload']['mounts'] = mounts
     if worker.get('os-groups', []):
         task_def['payload']['osGroups'] = worker['os-groups']
     features = {}
     if worker.get('chain-of-trust'):
         features['chainOfTrust'] = True
     if worker.get('taskcluster-proxy'):

As you see above, the problem was that the mounts variable is prepared for insertion into the task payload, as a _mutated_ copy of worker['mounts'], but then worker['mounts'] was being inserted into the payload, rather than the mutated version in the mounts variable.

Schoolboy error!
(In reply to Pete Moore [:pmoore][:pete] from comment #9)

> I've spotted the mistake, and am making a new try push with a fix. In this
> try push, I'll trigger a bunch of tasks that previously failed to make sure
> they pass now.

The only failures in this push are from scriptworker tasks, but I have a fix for that here:
Currently, scriptworker guesses worker implementation based on properties that are included in the task payloads. Therefore, if these properties disappear, this guessing function unfortunately breaks. However, the good news is, there is trustworthy metadata contained in the gecko decision-task generated task definitions which provides this information, which is the worker-implementation tag. I suspect this didn't exist when the worker implementation guess code was written, but now that it does, it should be a reliable source of information.

I'm not sure if scriptworker evaluates tasks that are generated outside of the gecko decision task which may not have the worker-implementation tag, such as release tasks? Therefore, rather than assuming the presence of the worker-implementation tag, the previous mechanics of guessing the worker implementation have been left in place essentially as a fallback (with the added advantage that if they conflict with the tag, an exception will be raised).

Note, if the worker-implementation tag turns out to be present in all tasks that scriptworker evaluates, we could (and probably should) remove the former guessing code.
Attachment #9005147 - Flags: review?(aki)
Hey Dustin,

This is the same change as the previous phabricator patch, plus the diff shown in comment 10, attached as a regular patch. Note, this patch depends on the above scriptworker PR landing (and being rolled out) first.

Attachment #9004303 - Attachment is obsolete: true
Attachment #9005148 - Flags: review?(dustin)
Attachment #9005148 - Flags: review?(dustin) → review+
Comment on attachment 9005147 [details] [review]
Github Pull Request for scriptworker

This is released as scriptworker 15.0.1.
Attachment #9005147 - Flags: review?(aki) → review+
I've retriggered the unsuccessful Bs jobs in which are now correctly evaluating the worker. The new exception I get is e.g.

> CRITICAL - Can't find task signing ZLNUcKvoQWS4_nMgHenaBQ in signing:parent BIApp2ESQVKqnnUrafdw8g task-graph.json!

This is expected since I manually added the Bs task to the push after the decision task had run.

Note, in we do see the following successful line at identifying the worker:

> 2018-09-03T10:39:42    DEBUG -      signing:build ZYLbxmYMRjSiDa6xYtecBw is generic-worker

This is great, because looking at the task definition for ZYLbxmYMRjSiDa6xYtecBw ( ) we see that it does not contain `osGroups` or `mounts` properties, which suggests it successfully recognised the task as being for generic-worker using the `worker-implementation` as per the scriptworker patch above.

So it seems the release has been successfully deployed on the scriptworker hosts, and we should be ready to roll out the gecko patch.

I'll therefore try to land the gecko patch again now.

Thanks all!
Pushed by
Don't generate optional payload properties in generic-worker payloads r=dustin
Closed: Last year
Resolution: --- → FIXED
Target Milestone: --- → mozilla63
You need to log in before you can comment on or make changes to this bug.