add schemas for in-tree generic-worker tasks for mounting directories

RESOLVED FIXED in mozilla52

Status

RESOLVED FIXED
2 years ago
2 years ago

People

(Reporter: froydnj, Assigned: froydnj)

Tracking

unspecified
mozilla52

Details

Attachments

(1 attachment, 1 obsolete attachment)

(Assignee)

Description

2 years ago
In discussions with pmoore today, I had a problem that was solved by using
caches and directing generic workers to mount that cache appropriately.
Unfortunately, in-tree task generation doesn't support specifying mounts for
generic workers.  The soon-to-be-attached patch adds the appropriate bits to
task schemas for generic workers, and works well enough that gecko-decision-opt
doesn't complain about my task definition, at least.
(Assignee)

Comment 1

2 years ago
Created attachment 8799922 [details] [diff] [review]
add schemas for in-tree generic-worker tasks for mounting directories

These are based on cargo-culting from:

https://docs.taskcluster.net/manual/execution/workers/generic-worker

and seem to work OK.

Reading from the schema in the link above, it occurs to me that 'content' and
'format' are either both optional or both required for a read-write directory
cache (unless you have initial contents that are just a binary file or
something?), but I'm not sure how to specify that in the above-linked schema,
or in the Python schema.
Attachment #8799922 - Flags: feedback?(pmoore)
Comment on attachment 8799922 [details] [diff] [review]
add schemas for in-tree generic-worker tasks for mounting directories

Review of attachment 8799922 [details] [diff] [review]:
-----------------------------------------------------------------

The schema for task descriptions doesn't necessarily need to follow the schema for the payload.  That makes a little bit more work for the transform at the bottom of this file, but it's not bad.  And it's worthwhile to simplify the task description.

Pete, you are of course welcome to adopt these modifications in the generic-worker, at which point there's only one place to change things in-tree (the transform).

::: taskcluster/taskgraph/transforms/task.py
@@ +36,5 @@
> +    },
> +    # downloading content from a URL
> +    {
> +        Required('url'): Url(),
> +    })

As you've noted, the payload schema is a little wonky, so maybe here it makes more sense to include "format" in both of these alternatives.

@@ +42,5 @@
> +directory_cache_or_read_only_directory = Any(
> +    # read-write cache
> +    {
> +        # a unique name for the cache volume
> +        Required('cacheName'): basestring,

Similarly here, it is not obvious to a reader that the presence of "cacheName" is what enabled write access to the directory.  So it may be worth adding an extra attribute here, maybe "mode", with two options.

And while I'm here, for the task descriptions I've been consistently using lower-case, dashed identifiers -- so, `cache-name` and `task-id`.
(In reply to Dustin J. Mitchell [:dustin] from comment #2)
> As you've noted, the payload schema is a little wonky, so maybe here it
> makes more sense to include "format" in both of these alternatives.

What do you mean by wonky?
Flags: needinfo?(dustin)
I was referring to the last paragraph of comment 1.
Flags: needinfo?(dustin)
(Assignee)

Comment 5

2 years ago
(In reply to Dustin J. Mitchell [:dustin] from comment #2)
> The schema for task descriptions doesn't necessarily need to follow the
> schema for the payload.  That makes a little bit more work for the transform
> at the bottom of this file, but it's not bad.  And it's worthwhile to
> simplify the task description.

Wait, so the schema I was modifying is actually the schema for the in-tree .yml files?  So when I made modifications to my toolchain task, those changes were going to be validated against the schema I modified, and not the schema for what taskcluster actually expects?  And whatever transforms from the in-tree format to the taskcluster format just passes these new values through?

I feel like I keep learning that my understanding of how all these parts fit together is woefully incomplete.
Flags: needinfo?(dustin)
Yes, the Python schemas serve two purposes: to define the structure of data formats that are *not* task definitions; and to provide an in-source place to hang documentation in the form of comments.

So the schema you're referencing is that for task descriptions, which are a simpler, more gecko-oriented format describing a task.  They tend to smooth out differences between worker implementations (for example, artifacts are always in the same format), and also include lots of gecko-specific stuff like index routes, treeherder routes, per-branch expiration times, etc.

In this case where, as you've noted, the task *definition* schema (specifically the generic-worker payload, https://docs.taskcluster.net/manual/execution/workers/generic-worker) is "a little wonky", but that wonkiness doesn't necessarily have to appear in the task description.
Flags: needinfo?(dustin)
(Assignee)

Comment 7

2 years ago
(In reply to Dustin J. Mitchell [:dustin] from comment #6)
> Yes, the Python schemas serve two purposes: to define the structure of data
> formats that are *not* task definitions; and to provide an in-source place
> to hang documentation in the form of comments.
> 
> So the schema you're referencing is that for task descriptions, which are a
> simpler, more gecko-oriented format describing a task.  They tend to smooth
> out differences between worker implementations (for example, artifacts are
> always in the same format), and also include lots of gecko-specific stuff
> like index routes, treeherder routes, per-branch expiration times, etc.
> 
> In this case where, as you've noted, the task *definition* schema
> (specifically the generic-worker payload,
> https://docs.taskcluster.net/manual/execution/workers/generic-worker) is "a
> little wonky", but that wonkiness doesn't necessarily have to appear in the
> task description.

OK, so I think what you're suggesting is:

1. Change the in-tree task definition schema to be cleaner (comment 2 and any associated comments Pete might have).
2. Add code to build_generic_worker_payload in task.py:

https://dxr.mozilla.org/mozilla-central/source/taskcluster/taskgraph/transforms/task.py#389

to transform between what the task defintion schema requires and what the task definition schema requires.

This has the added bonus that I think I can just implement what I need for clang-cl toolchain builds (read/write directory caches) and leave off the read-only directory cache stuff.  Correct?
Flags: needinfo?(dustin)
Yup on all counts :)
Flags: needinfo?(dustin)
(Assignee)

Comment 9

2 years ago
Created attachment 8804437 [details] [diff] [review]
add schemas for in-tree generic-worker tasks for mounting directories

All right, much shorter reviewable version based on Dustin's feedback.
Attachment #8804437 - Flags: review?(pmoore)
(Assignee)

Updated

2 years ago
Attachment #8799922 - Attachment is obsolete: true
Attachment #8799922 - Flags: feedback?(pmoore)
Comment on attachment 8804437 [details] [diff] [review]
add schemas for in-tree generic-worker tasks for mounting directories

Review of attachment 8804437 [details] [diff] [review]:
-----------------------------------------------------------------

lgtm, thanks!
Attachment #8804437 - Flags: review?(pmoore) → review+

Comment 11

2 years ago
Pushed by nfroyd@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/0840f0ac4394
add schemas for in-tree generic-worker tasks for mounting directories; r=pmoore

Comment 12

2 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/0840f0ac4394
Status: NEW → RESOLVED
Last Resolved: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla52
(In reply to Dustin J. Mitchell [:dustin] from comment #4)
> I was referring to the last paragraph of comment 1.

We discussed this, and the misunderstanding was that in the schema, content refers to a file, which may or may not be an archive. It is not only used by directory mounts (read only and writable) but also by file mounts. Therefore you can't put the "format" of the mount directly in the "content" since file mounts also refer to content, which isn't an archive. Therefore, directory mounts (both writable directory caches and read only directories) need to specify the format of the archive content, rather than the content itself.

These are kind of internal details related to the structuring of the schema and can be glossed over for the most part. It avoids having URLFileContent, URLDirectoryContent, ArtifactFileContent, ArtifactDirectoryContent as four separate types - instead just having one URLContent type and one ArtifactContentType, and letting the type that depends on it decide to apply a format, if it is an archive.
You need to log in before you can comment on or make changes to this bug.