Closed Bug 1157242 Opened 9 years ago Closed 7 years ago

figure out how to schedule buildbot backed builds in taskcluster

Categories

(Release Engineering :: General, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: bhearsum, Unassigned)

References

Details

(Whiteboard: [bbb])

Attachments

(2 files, 2 obsolete files)

The Taskcluster folks gave me an overview of how this works for existing jobs that are triggered by Mercurial pushes. A decision task (such as http://docs.taskcluster.net/tools/task-graph-inspector/#_LjL5STHSiipff-2mcpbOQ/2x_t31fBQMy_QdYejylafA) is created, which ends up calling out to mach and extending the task graph with all of the jobs that push requires. That graph is calculated by analyzing the yml files in https://dxr.mozilla.org/mozilla-central/source/testing/taskcluster/tasks.

For example, the entry point for an Alder push is https://dxr.mozilla.org/mozilla-central/source/testing/taskcluster/tasks/branches/alder/job_flags.yml, which references tons of other yml files in there (including through inheritance). AFAICT, all of the existing jobs there are rooted in files such as https://dxr.mozilla.org/mozilla-central/source/testing/taskcluster/tasks/build.yml which define provisionerId and workerType, so we'll need to start a new chain of those, because the Bridge has unique values for them.
Moving the scheduling of builds that happen in response to a push seems like a good starting point, e.g. everything on this graph: http://people.mozilla.org/~bhearsum/mozilla-scheduler-graphs/alder-firefox%20scheduler.svg

I'll need to remove this Scheduler from Buildbot before adding it to Taskcluster, otherwise everything would get scheduled twice.
Depends on: 1157310
Combined with the patch from bug 1157310, this will disable all of the build schedulers on Alder so I can start scheduling them through Taskcluster. Test schedulers are untouched for now. I'm thinking that may end up being pushed out for a while, since it's not strictly necessary to make progress on the work that blocks moving to S3.
Attachment #8600965 - Flags: review?(catlee)
This patch is very early days but since I'm touching some of the graph generation code I want to get feedback on it ASAP.

There seem to be some built-in assumptions right now that don't hold true for bridge tasks. Specifically, bridge tasks will NOT have:
* Artifact urls up-front
* Locations defined
* Tests

I'm pretty sure these parts are a no-op for everything else - I'm mostly just shuffling around code and adding .get()s to protect against missing keys. However, all verifications related to "locations" are commented out, which actually does affect current tasks.
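
A minimal sketch of the .get() pattern described above, not the actual patch. The key names ("extra", "locations", "tests") and the helper name are assumptions chosen for illustration; the real graph generation code may lay these out differently.

```python
def optional_task_fields(task):
    """Read fields that bridge tasks may lack, without raising KeyError.

    The key names here are illustrative assumptions, not the real schema.
    """
    extra = task.get("extra", {})
    locations = extra.get("locations", {})
    return {
        "locations": locations,
        "tests": task.get("tests", []),
        # Bridge tasks define no locations, so no artifact URLs are known up front.
        "has_artifact_urls": bool(locations),
    }


# A bridge task carries none of the optional keys.
bridge_task = {"provisionerId": "buildbot-bridge", "workerType": "buildbot-bridge"}
# A regular task (hypothetical values) defines where its build ends up.
regular_task = {"extra": {"locations": {"build": "public/build/target.tar.bz2"}}}

print(optional_task_fields(bridge_task))
print(optional_task_fields(regular_task))
```

With defaults supplied at every lookup, tasks that do define these keys behave as before, and bridge tasks simply fall through with empty values.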

The rest of this patch is changes to job flags + task definitions. I've created a base task for bridge jobs which defines just about everything they need. I'm not really sure about a lot of the metadata, like name and description; I just chose reasonable-looking things there.
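
To illustrate the base-task idea, here is a rough sketch in Python of how a shared bridge base could be combined with per-job overrides. The in-tree definitions are yml files with their own inheritance mechanism, so merge_dicts and BRIDGE_BASE are hypothetical names used only for this example; the field values are taken from the output pasted in this comment.

```python
import copy


def merge_dicts(base, override):
    """Return a new dict: override wins, nested dicts are merged recursively."""
    result = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge_dicts(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


# Hypothetical base shared by every bridge job.
BRIDGE_BASE = {
    "provisionerId": "buildbot-bridge",
    "workerType": "buildbot-bridge",
    "schedulerId": "task-graph-scheduler",
    "metadata": {"owner": "release+taskcluster@mozilla.com"},
}

# A single job only states what differs from the base.
linux_opt = merge_dicts(BRIDGE_BASE, {
    "metadata": {"name": "Linux Opt Build", "description": "Linux Opt Build"},
    "payload": {"buildername": "Linux alder build"},
})

print(linux_opt)
```

The point of the split is that provisionerId and workerType are stated once, so individual jobs cannot drift away from the values the Bridge expects.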

I'm even more unsure about the job flags. I mostly copied the way the "_gecko" builds work for that. I removed all of the non-bridge jobs from Alder for now to make it easier to verify my work (and not waste compute time when I actually start pushing). This is what I ended up with when running taskcluster-graph locally:
./mach taskcluster-graph --project alder --owner bhearsum --head-repository https://hg.mozilla.org/projects/alder --head-rev abcdef123456 
{
    "scopes": [
        "queue:route:index.gecko.v1.alder.revision.linux.abcdef123456.firefox_linux32.opt", 
        "queue:route:index.gecko.v1.alder.latest.linux.firefox_linux32.opt", 
        "queue:define-task:aws-provisioner/buildbot-bridge"
    ], 
    "tasks": [
        {
            "task": {
                "workerType": "buildbot-bridge", 
                "tags": {
                    "createdForUser": "bhearsum"
                }, 
                "extra": {
                    "index": {
                        "rank": 0
                    }, 
                    "treeherder": {
                        "groupSymbol": "tc", 
                        "collection": {
                            "opt": true
                        }, 
                        "machine": {
                            "platform": "linux"
                        }, 
                        "groupName": "Submitted by taskcluster", 
                        "build": {
                            "platform": "linux"
                        }, 
                        "symbol": "B"
                    }
                }, 
                "created": "2015-05-04T14:56:49.216489", 
                "schedulerId": "task-graph-scheduler", 
                "deadline": "2015-05-05T18:56:49.218488Z", 
                "routes": [
                    "index.gecko.v1.alder.revision.linux.abcdef123456.firefox_linux32.opt", 
                    "index.gecko.v1.alder.latest.linux.firefox_linux32.opt"
                ], 
                "payload": {
                    "buildername": "Linux alder build", 
                    "sourcestamp": {
                        "branch": "https://hg.mozilla.org/projects/alder", 
                        "revision": "abcdef123456"
                    }
                }, 
                "provisionerId": "buildbot-bridge", 
                "metadata": {
                    "owner": "release+taskcluster@mozilla.com", 
                    "source": "http://todo.com/soon", 
                    "name": "Linux Opt Build", 
                    "description": "Linux Opt Build"
                }
            }, 
            "taskId": "NDcboS9vRgOkNHO07sJ-kQ"
        }
    ], 
    "metadata": {
        "owner": "bhearsum", 
        "source": "http://todo.com/what/goes/here", 
        "description": "Task graph generated via ./mach taskcluster-graph", 
        "name": "task graph local"
    }
}
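
The index routes and their matching scopes in the output above follow a visible pattern. The sketch below reproduces that pattern; the function names and the parameter split are assumptions made for illustration, and the real generator may assemble these strings differently.

```python
def index_routes(project, platform, revision, build_name, build_type):
    """Build the 'revision' and 'latest' index routes seen in the output above."""
    prefix = "index.gecko.v1.{}".format(project)
    suffix = "{}.{}".format(build_name, build_type)
    return [
        "{}.revision.{}.{}.{}".format(prefix, platform, revision, suffix),
        "{}.latest.{}.{}".format(prefix, platform, suffix),
    ]


def route_scopes(routes):
    """Each route in the output has a graph scope of the form 'queue:route:<route>'."""
    return ["queue:route:" + route for route in routes]


routes = index_routes("alder", "linux", "abcdef123456", "firefox_linux32", "opt")
scopes = route_scopes(routes)
print(routes)
print(scopes)
```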
Attachment #8601036 - Flags: feedback?(jopsen)
Comment on attachment 8601036 [details] [diff] [review]
early stages of patch to generate buildbot bridge tasks on alder

Summary:
 - Looks good to me
 - taskGraph.metadata.owner must be an email
   (see the docs for metadata details: http://docs.taskcluster.net/scheduler/api-docs/#createTaskGraph)
 - The important parts, which you seem to have right, are:
   - task.provisionerId: 'buildbot-bridge'
   - task.workerType:    'buildbot-bridge',
   - task.payload:       {whatever buildbot-bridge needs}
   - taskGraph.scopes:   ['queue:define-task:aws-provisioner/buildbot-bridge', task.scopes]
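
The checklist above could be turned into a quick local sanity check on the generated graph. This is only a sketch: check_bridge_graph is a hypothetical helper, the email test is a deliberately loose regular expression rather than whatever validation the scheduler performs, and the scope string is copied verbatim from this thread.

```python
import re

BRIDGE = "buildbot-bridge"
# Scope string as quoted in this thread.
DEFINE_SCOPE = "queue:define-task:aws-provisioner/buildbot-bridge"
# Loose check only: something@something.something
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def check_bridge_graph(graph):
    """Return a list of human-readable problems found in a task graph dict."""
    problems = []
    owner = graph.get("metadata", {}).get("owner", "")
    if not EMAIL_RE.match(owner):
        problems.append("taskGraph.metadata.owner is not an email: " + owner)
    if DEFINE_SCOPE not in graph.get("scopes", []):
        problems.append("taskGraph.scopes is missing " + DEFINE_SCOPE)
    for entry in graph.get("tasks", []):
        task = entry.get("task", {})
        for field in ("provisionerId", "workerType"):
            if task.get(field) != BRIDGE:
                problems.append("task.{} should be {}".format(field, BRIDGE))
        if not task.get("payload"):
            problems.append("task.payload is empty")
    return problems


# Trimmed-down version of the graph from the earlier comment.
graph = {
    "scopes": [DEFINE_SCOPE],
    "metadata": {"owner": "bhearsum"},
    "tasks": [{"task": {"provisionerId": BRIDGE, "workerType": BRIDGE,
                        "payload": {"buildername": "Linux alder build"}}}],
}
print(check_bridge_graph(graph))
```

Run against the graph from the earlier comment, a check like this would flag only the owner field.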

----------------
> * Artifact urls up-front
I might not be up-to-date on mozharness integration. But if it uploads to the taskId/runId that the buildbot-bridge was assigned, then you should have a predictable artifact URL.
(granted you give your artifacts a static name, ie. don't include version numbers, timestamps, etc.)
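
As a sketch of what "predictable" means here: with a known taskId, runId and a static artifact name, the URL can be composed before the job runs. The base URL and path layout below are assumptions about the queue's artifact endpoint, and artifact_url and the artifact name are hypothetical, so check them against the queue API docs before relying on this.

```python
from urllib.parse import quote

# Assumed base URL and path layout for the queue's artifact endpoint.
QUEUE_BASE = "https://queue.taskcluster.net/v1"


def artifact_url(task_id, run_id, name, base=QUEUE_BASE):
    """Compose an artifact URL from values that are known before the job runs."""
    return "{}/task/{}/runs/{}/artifacts/{}".format(base, task_id, run_id, quote(name))


# taskId from the graph above; the artifact name is a made-up static name.
print(artifact_url("NDcboS9vRgOkNHO07sJ-kQ", 0, "public/build/target.tar.bz2"))
```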

>        "owner": "bhearsum", 
Must be a valid email. You can use something like: "release+taskcluster@mozilla.com"
It should be the email of the person who made the push. Unfortunately, we don't always have valid
emails for these people, so a dummy email is sometimes used.
Hopefully, we'll either redefine owner so that it doesn't have to be an email, or we'll get valid emails
for everybody. There are some useful features in having emails (but that's an unrelated discussion).

>        "source": "http://todo.com/what/goes/here", 
The docs say: "Link to source of this task-graph, should specify file, revision and repository"
Basically, we once had a "maintainer" field, but we check task configuration into the tree, and
many different people edit those files. So it was suggested that we instead have a link to the source,
ideally somewhere like hgweb, where there is a "blame" link, so you can trace who the task belongs to.

>        "name": "task graph local"
>        "description": "Task graph generated via ./mach taskcluster-graph", 
Free form strings (markdown)... Feel free to write whatever makes sense to you.
A hardcoded constant is great for now. Maybe in the future we can write nice things like
"Push to `alder`", but don't worry about that (I'm just explaining why it's there).
Attachment #8601036 - Flags: feedback?(jopsen) → feedback+
Thanks for the fast turnaround! Pretty much everything you said makes perfect sense. Just one thing to clarify below:

(In reply to Jonas Finnemann Jensen (:jonasfj) from comment #4)
> > * Artifact urls up-front
> I might not be up-to-date on mozharness integration. But if it uploads to
> the taskId/runId that the buildbot-bridge was assigned, then you should have
> a predictable artifact URL.
> (granted you give your artifacts a static name, ie. don't include version
> numbers, timestamps, etc.)

Not all of the jobs that run in Buildbot are run through Mozharness. Some jobs (particularly some release ones) don't have artifacts (other than logs) at all either, in fact! I'm not sure if release jobs will end up using this same code yet, or if we'll have something else that generates their task graphs. Even just thinking about CI jobs, I don't think the assumption of everything having predictable artifact locations up front will be valid while we still have so many jobs implemented in Buildbot.
Attachment #8600965 - Flags: review?(catlee) → review+
Attachment #8600965 - Flags: checked-in+
Here's an updated version of my gecko patch that adds all of the desktop firefox opt and debug builds to taskcluster-graph. You can see it working over at: https://treeherder.allizom.org/#/jobs?repo=alder&revision=a4233f1a49ea (there's a bug in the bridge causing the buildbot versions of the jobs not to show up on TH prod at the moment, hence the link to stage).

I'm mostly using this to validate changes to the buildbot bridge right now, it probably won't merge to m-c anytime soon. It's a good proof of concept for how to do this later, though.
Attachment #8601036 - Attachment is obsolete: true
Attached patch remove alder test schedulers (obsolete) — Splinter Review
Get rid of all of the Alder test schedulers, which will let me try to get them going in a task graph. Depends on the patch in bug 1157310.
Attachment #8606541 - Flags: review?(jlund)
Comment on attachment 8606541 [details] [diff] [review]
remove alder test schedulers

Actually, let's not do this yet...it's going to be tough to add tests to the taskcluster-graph until we have the binaries/test packages attached as artifacts.
Attachment #8606541 - Attachment is obsolete: true
Attachment #8606541 - Flags: review?(jlund)
Unassigning myself because I won't be taking this any further for now. I've got enough builds going to verify buildbot bridge work, and what's attached here is a decent start for porting over CI schedulers. I'll be focusing on release automation scheduling now, and I'm not sure if I'll be the one driving this bug home.
Assignee: bhearsum → nobody
No longer blocks: bbb
Whiteboard: [bbb]
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Component: General Automation → General